Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix exec hang when tx channel is full #316

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

ningmingxiao
Copy link
Contributor

@ningmingxiao ningmingxiao commented Sep 27, 2024

@Burning1020 @mxpv another way to fix #270, this pr don't need to use unbounded_channel
can you review this pr ? Thank you.

@mxpv
Copy link
Member

mxpv commented Oct 3, 2024

Could you pls rebase against latest main to fix the CI ?

@ningmingxiao
Copy link
Contributor Author

done

@mxpv
Copy link
Member

mxpv commented Oct 4, 2024

Trying to understand this change. Can you elaborate how exactly this fixes the issue?

You spawn an extra thread on each iteration and each new thread still writes to the channel. There are no consumers on this side, right? So why does this help?

@ningmingxiao
Copy link
Contributor Author

ningmingxiao commented Oct 4, 2024

Trying to understand this change. Can you elaborate how exactly this fixes the issue?

You spawn an extra thread on each iteration and each new thread still writes to the channel. There are no consumers on this side, right? So why does this help?

I add some log find that containers.lock() can't get lock when chan is full.(and I add some log I can't get who hold the lock),
So I split it into two tokio::spawn to make sure consumer can consume the channel quickly.

❯ cat container.json 
{
  "metadata": {
    "name": "gpu"
  },
  "image": {
    "image": "docker.io/library/busybox:1.28"
  },
  "command": [
    "top"
  ],
  "tty": true,
  "linux": {
  }
}

❯ cat pod.json 
{
  "metadata": {
    "name": "sandbox-a10798895196912d15189a32112",
    "namespace": "default",
    "attempt": 1,
    "uid": "hdishd83djaidwnduwk28bcsb"
  },
  "log_directory": "/tmp",
  "linux": {
  }
}

crictl  run    --no-pull container.json  pod.json

main.go

package main

import (
        "fmt"
        "math/rand"
        "os/exec"
        "sync"
        "time"
)

func main() {
        test()
        fmt.Println("done")
}

func runCmd(command string) {
        cmd := exec.Command("bash", "-c", command)
        err := cmd.Run()
        if err != nil {
                fmt.Println(err)
        }
}

func test() {
        var wg sync.WaitGroup
        const numContainers = 500
        for i := 1; i <= numContainers; i++ {
                wg.Add(1)
                go func(i int) {
                        rand.Seed(time.Now().UnixNano())
                        randomNumber := rand.Intn(10) + 1
                        cmdcri := "crictl exec 2380a  sleep " + fmt.Sprintf("%d", randomNumber)
                        fmt.Println(cmdcri)
                        defer wg.Done()
                        runCmd(cmdcri)
                }(i)
        }
        wg.Wait()
}

and limit work_threads to 2 make it easier to reproduce

#[tokio::main(flavor = "multi_thread", worker_threads = 2)]
async fn main() {
    parse_version();
    run::<Service>("io.containerd.runc.v2-rs", None).await;
}

@ningmingxiao
Copy link
Contributor Author

ping @mxpv @Burning1020

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C-runc-shim Runc shim C-shim Containerd shim
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants