Introduction
A Goroutine is a lightweight function that executes concurrently with other Goroutines in the same address space. It doesn’t fall into the category of either a thread or a process. Unlike multithreaded programming in other languages, where threads are managed directly by the OS kernel, Goroutines are managed by the Go runtime.
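As a minimal sketch of what that looks like (the `runWorkers` helper here is my own, not from any library): the `go` keyword hands a function call to the Go scheduler, and `sync.WaitGroup` lets us wait for the Goroutines to finish.

```go
package main

import (
	"fmt"
	"sync"
)

// runWorkers starts n Goroutines that each increment a shared counter,
// waits for all of them, and returns the final count.
func runWorkers(n int) int {
	var (
		wg    sync.WaitGroup
		mu    sync.Mutex
		count int
	)
	for i := 0; i < n; i++ {
		wg.Add(1)
		// The go keyword starts a new Goroutine, scheduled by the Go runtime.
		go func() {
			defer wg.Done()
			mu.Lock()
			count++
			mu.Unlock()
		}()
	}
	wg.Wait() // block until every Goroutine has finished
	return count
}

func main() {
	fmt.Println(runWorkers(3)) // prints 3
}
```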
While Go’s concurrency model is indeed powerful, we still need to pay attention to the possible pitfalls of concurrent programming, such as deadlocks, race conditions, and so on. This post mainly discusses Goroutine leaks and how we can handle them.
Goroutine leaks
A Goroutine leak is a type of memory leak. It happens when a Goroutine is never terminated and keeps hanging in the background. The main effect is that it keeps consuming your computer’s memory and eventually degrades your application’s performance, so we really need to avoid Goroutine leaks. First, let me show you one example of how a Goroutine leak might occur.
Consider you have 3 workers, each doing the same task, and you want to return from the main function as soon as any of them completes. So you make use of a channel and Goroutines to take advantage of concurrency. However, before the function returns, you can see that there are still 2 Goroutines hanging.
A Goroutine is blocked when it writes to a channel with no receiver: notice that we break out of the receive loop as soon as one Goroutine returns a true value, while the other 2 are still executing. Their sends on the channel then block forever.
package main

import (
	"fmt"
	"runtime"
	"time"
)

func main() {
	workers := []worker{worker1{}, worker2{}, worker3{}}
	c := make(chan bool)
	for _, w := range workers {
		go func(w worker) {
			res := w.executing()
			c <- res
		}(w)
	}
	for data := range c {
		if data {
			fmt.Println("work done")
			break
		}
	}
	time.Sleep(time.Second * 1)
	fmt.Printf("Number of hanging goroutines: %d", runtime.NumGoroutine()-1)
}

type worker interface {
	executing() bool
}

type worker1 struct{}

func (worker1) executing() bool {
	time.Sleep(10 * time.Millisecond)
	return true
}

type worker2 struct{}

func (worker2) executing() bool {
	time.Sleep(500 * time.Millisecond)
	return true
}

type worker3 struct{}

func (worker3) executing() bool {
	time.Sleep(1500 * time.Millisecond)
	return false
}
Go Playground: https://go.dev/play/p/PDQo_Ce-yAX
Buffered channel
The good thing about using a buffered channel is that a send operation only blocks when the buffer is full. Since our buffered channel never fills up, all our Goroutines can return.
However, this solution might not be scalable if we have a large number of workers, since the buffer would have to grow with the number of workers.
package main

import (
	"fmt"
	"runtime"
	"time"
)

func main() {
	workers := []worker{worker1{}, worker2{}, worker3{}}
	c := make(chan bool, 2)
	for _, w := range workers {
		go func(w worker) {
			res := w.executing()
			c <- res
		}(w)
	}
	for data := range c {
		if data {
			fmt.Println("work done")
			break
		}
	}
	time.Sleep(time.Second * 5)
	fmt.Printf("Number of hanging goroutines: %d", runtime.NumGoroutine()-1)
}

type worker interface {
	executing() bool
}

type worker1 struct{}

func (worker1) executing() bool {
	time.Sleep(10 * time.Millisecond)
	return true
}

type worker2 struct{}

func (worker2) executing() bool {
	time.Sleep(500 * time.Millisecond)
	return true
}

type worker3 struct{}

func (worker3) executing() bool {
	time.Sleep(1500 * time.Millisecond)
	return false
}
Go Playground: https://go.dev/play/p/oDRrr82s6hv
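To make the buffering behaviour concrete, here is a small sketch (the `sendsBeforeBlocking` helper is hypothetical, not part of the example above) showing that, with no receiver, sends on a buffered channel succeed only until the buffer is full:

```go
package main

import "fmt"

// sendsBeforeBlocking reports how many sends succeed on a buffered
// channel with no receiver before a send would block.
func sendsBeforeBlocking(capacity, attempts int) int {
	c := make(chan bool, capacity)
	sent := 0
	for i := 0; i < attempts; i++ {
		select {
		case c <- true: // succeeds while the buffer has room
			sent++
		default: // buffer is full; a plain send here would block forever
			return sent
		}
	}
	return sent
}

func main() {
	// With capacity 2 and no receiver, exactly 2 of 3 sends succeed.
	fmt.Println(sendsBeforeBlocking(2, 3)) // prints 2
}
```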
Done channel
You can see we make use of a new channel to send a done signal; when a Goroutine receives the signal, it returns.
However, this approach still does not fully solve our scalability problem (we need one send operation per worker to signal “done”, which is a lot of sends if we have a large number of workers); we need a more robust way to broadcast the signal.
package main

import (
	"fmt"
	"runtime"
	"time"
)

func main() {
	workers := []worker{worker1{}, worker2{}, worker3{}}
	c := make(chan bool)
	doneCh := make(chan bool)
	for _, w := range workers {
		go func(w worker) {
			for {
				select {
				case <-doneCh:
					fmt.Println("exiting Goroutine")
					return
				case c <- w.executing():
				}
			}
		}(w)
	}
	for data := range c {
		if data {
			fmt.Println("work done")
			break
		}
	}
	for range workers {
		doneCh <- true
	}
	time.Sleep(time.Second * 5)
	fmt.Printf("Number of hanging goroutines: %d", runtime.NumGoroutine()-1)
}

type worker interface {
	executing() bool
}

type worker1 struct{}

func (worker1) executing() bool {
	time.Sleep(10 * time.Millisecond)
	return true
}

type worker2 struct{}

func (worker2) executing() bool {
	time.Sleep(500 * time.Millisecond)
	return true
}

type worker3 struct{}

func (worker3) executing() bool {
	time.Sleep(1500 * time.Millisecond)
	return false
}
Go Playground: https://go.dev/play/p/2xPZ93agpGz
Close()
Notice that after the receive loop we use close() to signal the closing of the done channel, and all Goroutines return. A single close() reaches every receiver, so this approach solves the scalability problem mentioned previously.
package main

import (
	"fmt"
	"runtime"
	"time"
)

func main() {
	workers := []worker{worker1{}, worker2{}, worker3{}}
	c := make(chan bool)
	doneCh := make(chan bool)
	for _, w := range workers {
		go func(w worker) {
			for {
				select {
				case <-doneCh:
					fmt.Println("exiting Goroutine")
					return
				case c <- w.executing():
				}
			}
		}(w)
	}
	for data := range c {
		if data {
			fmt.Println("work done")
			break
		}
	}
	// A single close broadcasts to every Goroutine receiving on doneCh.
	close(doneCh)
	time.Sleep(time.Second * 5)
	fmt.Printf("Number of hanging goroutines: %d", runtime.NumGoroutine()-1)
}

type worker interface {
	executing() bool
}

type worker1 struct{}

func (worker1) executing() bool {
	time.Sleep(10 * time.Millisecond)
	return true
}

type worker2 struct{}

func (worker2) executing() bool {
	time.Sleep(500 * time.Millisecond)
	return true
}

type worker3 struct{}

func (worker3) executing() bool {
	time.Sleep(1500 * time.Millisecond)
	return false
}
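The reason a single close() works as a broadcast is that a receive on a closed channel returns immediately with the zero value. A small sketch of this (the `waitForClose` helper is hypothetical, not part of the example above):

```go
package main

import (
	"fmt"
	"sync"
)

// waitForClose starts n Goroutines that all block receiving on the same
// done channel; a single close(done) releases every one of them.
func waitForClose(n int) int {
	done := make(chan struct{})
	released := make(chan int, n)
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func(id int) {
			defer wg.Done()
			<-done // a receive on a closed channel returns immediately
			released <- id
		}(i)
	}
	close(done) // one call broadcasts to all n receivers
	wg.Wait()
	return len(released)
}

func main() {
	fmt.Println(waitForClose(5)) // prints 5
}
```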
Thank you for reading this, and I hope you enjoyed the content. Lastly, credit to my senior for mentoring me while learning this topic.