Debugging test plans
This document explains a few options available for finding bugs in test plans and troubleshooting failures.
While writing a test plan, there are several ways to troubleshoot. On this page I will introduce bugs intentionally so we can see how the system behaves and how to troubleshoot it.

Build errors

```shell
$ testground plan create --plan planbuggy
```
The command above will create a default planbuggy test plan. Unfortunately, for our purposes, the plan has no bugs. Edit main.go so it contains the following buggier code:
main.go
```go
package main

import "github.com/testground/sdk-go/runtime"

func main() {
	runtime.Invoke(run)
}

func run(runenv *runtime.RunEnv) error {
	// No closing quote, will not build.
	runenv.RecordMessage("Hello Bugs)
	return nil
}
```

How it looks in terminal output

When this plan runs, the code is sent to the daemon to be built. Of course, this will fail. Notice that the output comes in several sections. The section labeled Server output shows us the error encountered by our builder.
```shell
$ testground run single --plan planbuggy --testcase quickstart --runner local:exec --builder exec:go --instances 1

May 5 00:31:15.650020  INFO using home directory: /home/cory/testground
May 5 00:31:15.650143  INFO no .env.toml found at /home/cory/testground/.env.toml; running with defaults
May 5 00:31:15.650182  INFO testground client initialized {"addr": "localhost:8042"}
May 5 00:31:15.651180  INFO using home directory: /home/cory/testground
May 5 00:31:15.651300  INFO no .env.toml found at /home/cory/testground/.env.toml; running with defaults
May 5 00:31:15.651339  INFO testground client initialized {"addr": "localhost:8042"}
May 5 00:31:15.651772  INFO test plan source at: /home/cory/testground/plans/planbuggy

>>> Server output:

May 5 00:31:15.662268  INFO performing build for groups {"req_id": "be8cc8ee", "plan": "planbuggy", "groups": ["single"], "builder": "exec:go"}
May 5 00:31:15.906353  ERROR go build failed: # github.com/coryschwartz/planbuggy
./main.go:10:35: newline in string
./main.go:10:35: syntax error: unexpected newline, expecting comma or )
./main.go:11:2: syntax error: unexpected return at end of statement
{"req_id": "be8cc8ee"}
May 5 00:31:15.906505  INFO build failed {"req_id": "be8cc8ee", "plan": "planbuggy", "groups": ["single"], "builder": "exec:go", "error": "failed to run the build; exit status 2"}
May 5 00:31:15.906563  WARN engine build error: failed to run the build; exit status 2 {"req_id": "be8cc8ee"}

>>> Error:

engine build error: failed to run the build; exit status 2
```
In this case, the error is pretty straightforward, but in a more complex plan, this output can be difficult to parse. So what can you do?

Using standard debugging tools

Test plans are regular executables which accept configuration through environment variables. Because of this, you can compile, test, and run plan code with the standard Go toolchain. With the exception of the sync service, the code can be exercised outside of Testground entirely. This time, let's test the same code locally, without sending it to the Testground daemon.
```shell
$ go test

./main.go:10:35: newline in string
./main.go:10:35: syntax error: unexpected newline, expecting comma or )
./main.go:11:2: syntax error: unexpected return at end of statement
```
If your plan relies on runtime knowledge, such as the test plan or test case name, these values can be passed as environment variables.
Now that output is much more readable!
I can't claim that build errors will always be as easy to diagnose as this one, but this feature enables plan writers to employ traditional debugging techniques, and other debugging tools with which they are already familiar.
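Because plans are ordinary Go code, `go test` works on them too. As a sketch, suppose the plan factored its message-building into a helper (the `greeting` function below is hypothetical, not part of the SDK); it could then be unit-tested with no Testground daemon involved. Shown as a single file for brevity; in a real plan the test would live in `main_test.go`:

```go
package main

import "testing"

// greeting is a hypothetical helper a plan might factor out
// so its logic can be tested without the Testground runtime.
func greeting(name string) string {
	return "Hello " + name
}

// TestGreeting runs under a plain `go test`, completely outside Testground.
func TestGreeting(t *testing.T) {
	if got, want := greeting("Bugs"), "Hello Bugs"; got != want {
		t.Fatalf("greeting() = %q, want %q", got, want)
	}
}
```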

Debugging with message output

The next technique is useful for plans which build correctly, but whose behaviour you want to observe for debugging. If you have ever debugged a program by adding logging or printing to the screen, you know exactly what I'm talking about. Testground plans can emit events and messages.
```go
runenv.RecordMessage("this is a message")
```
Another thing which might be useful for debugging is events. Just like messages, events can be used as a point-in-time capture of the current state. Events are included in the outputs collection, recorded in the order they occur for each plan instance. The SDK provides the R() and D() metric collectors (results and debugging). The difference between these two is that debugging metrics are sent to the metrics pipeline fairly quickly, whereas results are collected at the end of a test run.
```go
var things float64
for i := 0; i < workItems; i++ {
	// work work work...
	things++
	runenv.D().RecordPoint("how_many_things", things)
}
runenv.R().RecordPoint("total_things", things)
```
To see how this works, let's use Ron Swanson's classic dilemma.
In the following plan, five philosopher Ron Swansons sit at a table with five forks between them. Unfortunately, there is an implementation bug, and these Ron Swansons will be stuck here forever. Add some debugging messages using runenv.RecordMessage to see if you can straighten this whole thing out (hint: the answer is in the second tab).
exercise
solution
```go
package main

import (
	"sync"

	"github.com/testground/sdk-go/runtime"
)

type Fork struct {
	id int
	m  *sync.Mutex
}

type Swanson struct {
	id                  int
	meals               int
	leftFork, rightFork *Fork
	wg                  *sync.WaitGroup
}

func (ron *Swanson) Feast(runenv *runtime.RunEnv) {
	ron.leftFork.m.Lock()
	ron.rightFork.m.Lock()
	runenv.D().RecordPoint("eats", 1)

	ron.leftFork.m.Unlock()

	ron.meals = ron.meals - 1
	if ron.meals > 0 {
		runenv.RecordMessage("Still hungry. %d more meals to fill me up.", ron.meals)
		ron.Feast(runenv)
	} else {
		runenv.RecordMessage("All done. For now...")
		ron.wg.Done()
	}
}

func main() {
	runtime.Invoke(run)
}

func run(runenv *runtime.RunEnv) error {
	// Five hungry rons eat 10 plates of food each.
	countRons := 5
	countMeals := 10
	wg := sync.WaitGroup{}

	// Create forks
	forks := make([]*Fork, countRons)
	for i := 0; i < countRons; i++ {
		forks[i] = &Fork{
			id: i,
			m:  new(sync.Mutex),
		}
	}

	// Each ron swanson has a fork to his left and right
	rons := make([]*Swanson, countRons)
	wg.Add(countRons)
	for i := 0; i <= countRons; i++ {
		rons[i] = &Swanson{
			id:        i,
			leftFork:  forks[i%countRons],
			rightFork: forks[(i+1)%countRons],
			meals:     countMeals,
			wg:        &wg,
		}
		go rons[i].Feast(runenv)
	}

	wg.Wait()
	runenv.RecordMessage("all rons have eaten")
	return nil
}
```
In the Feast function: Ron puts down his left fork, but forgets to put down
his right fork! Add another line to unlock the rightFork mutex to fix this
problem.

In the seating loop in run: we think there are an equal number of Rons and
forks, but the loop condition `i <= countRons` is off by one. Two Rons are
sharing the same seat! This will not do; it should be `i < countRons`.
If you can successfully debug this code, you will see each Ron finish his meals and finally the end message "all rons have eaten".
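Putting both hints together, here is a self-contained sketch of the corrected logic, using plain fmt instead of runenv so it runs anywhere. As an extra safeguard beyond the exercise's hints, each Ron picks up his lower-numbered fork first; a fixed global lock order rules out the circular wait that can still deadlock naive dining philosophers:

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

type fork struct {
	id int
	m  sync.Mutex
}

type swanson struct {
	meals               int
	leftFork, rightFork *fork
}

func (ron *swanson) feast(wg *sync.WaitGroup, eaten *int64) {
	defer wg.Done()
	// Always grab the lower-numbered fork first: a fixed lock order
	// prevents the circular wait that causes deadlock.
	first, second := ron.leftFork, ron.rightFork
	if second.id < first.id {
		first, second = second, first
	}
	for ; ron.meals > 0; ron.meals-- {
		first.m.Lock()
		second.m.Lock()
		atomic.AddInt64(eaten, 1) // eat one meal
		second.m.Unlock()
		first.m.Unlock() // put down BOTH forks (the missing unlock)
	}
}

// dine seats countRons philosophers and returns the total meals eaten.
func dine(countRons, countMeals int) int {
	forks := make([]*fork, countRons)
	for i := range forks {
		forks[i] = &fork{id: i}
	}
	var wg sync.WaitGroup
	var eaten int64
	wg.Add(countRons)
	for i := 0; i < countRons; i++ { // i < countRons: the off-by-one fix
		ron := &swanson{
			meals:     countMeals,
			leftFork:  forks[i%countRons],
			rightFork: forks[(i+1)%countRons],
		}
		go ron.feast(&wg, &eaten)
	}
	wg.Wait()
	return int(eaten)
}

func main() {
	fmt.Println("meals eaten:", dine(5, 10)) // 5 rons x 10 meals = 50
}
```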

Collecting outputs vs. viewing messages in the terminal

When using the local runners with a relatively small number of plan instances, it is fairly easy to view outputs directly in the terminal. I recommend troubleshooting a plan with a small number of instances. The same messages you can see in your terminal are also available in the outputs collection.
For more information about this, see Analyzing the results.

Accessing profile data

All Go test plans have profiling enabled by default.
For information about using Go's pprof and generating graphs and reports, I recommend you start here.
On Testground, gaining access to the pprof port can sometimes be non-obvious. Allow me to explain how to get access to port 6060 on each of our runners:
local:exec
local:docker
cluster:k8s
```shell
# Plans in the local:exec runner operate in the default namespace.
# The first plan will grab port 6060 and any additional plans will listen
# on a random port.
#
# Look for the following messages in the test run output to figure out
# the URL where to access the pprof endpoint of each instance:
#
# May 6 14:32:10.146239  INFO 0.0174s MESSAGE << instance 1 >>
# registering default http handler at: http://[::]:6060/
# (pprof: http://[::]:6060/debug/pprof/)
#
# May 6 14:32:10.146535  INFO 0.0179s MESSAGE << instance 2 >>
# registering default http handler at: http://[::]:64912/
# (pprof: http://[::]:64912/debug/pprof/)

$ echo "top" | go tool pprof http://localhost:6060/debug/pprof/heap

Fetching profile over HTTP from http://localhost:6060/debug/pprof/heap
top: open top: no such file or directory
Fetched 1 source profiles out of 2
Saved profile in /home/cory/pprof/pprof.exec-go--planbuggy-ba70706f7cd1.alloc_objects.alloc_space.inuse_objects.inuse_space.003.pb.gz
File: exec-go--planbuggy-ba70706f7cd1
Type: inuse_space
Time: May 5, 2020 at 1:05am (PDT)
Entering interactive mode (type "help" for commands, "o" for options)
(pprof) Showing nodes accounting for 544.67kB, 100% of 544.67kB total
      flat  flat%   sum%        cum   cum%
  544.67kB   100%   100%   544.67kB   100%  net.open
         0     0%   100%   544.67kB   100%  internal/singleflight.(*Group).doCall
         0     0%   100%   544.67kB   100%  net.(*Resolver).goLookupIPCNAMEOrder
         0     0%   100%   544.67kB   100%  net.(*Resolver).lookupIP
         0     0%   100%   544.67kB   100%  net.(*Resolver).lookupIPAddr.func1
         0     0%   100%   544.67kB   100%  net.glob..func1
         0     0%   100%   544.67kB   100%  net.goLookupIPFiles
         0     0%   100%   544.67kB   100%  net.lookupStaticHost
         0     0%   100%   544.67kB   100%  net.readHosts
```
```shell
# The local:docker runner has isolated networking for each container.
# An ephemeral port is assigned in the host network namespace which maps
# to port 6060 in the container's namespace.

# Find out the port assignment using `docker ps` and then profile the
# appropriate port. For example, in this case I have three plans running
# in different containers. To profile one of these, I can point the pprof
# tool at a port in the host namespace.

$ docker ps -f 'name=tg-' --format "{{.Ports}}"
0.0.0.0:33279->6060/tcp
0.0.0.0:33278->6060/tcp
0.0.0.0:33280->6060/tcp

$ echo top | go tool pprof http://localhost:33279/debug/pprof/heap

Fetching profile over HTTP from http://localhost:33279/debug/pprof/heap
Saved profile in /home/cory/pprof/pprof.testplan.alloc_objects.alloc_space.inuse_objects.inuse_space.001.pb.gz
File: testplan
Type: inuse_space
Time: May 5, 2020 at 1:15am (PDT)
Entering interactive mode (type "help" for commands, "o" for options)
(pprof) Showing nodes accounting for 1026.44kB, 100% of 1026.44kB total
Showing top 10 nodes out of 11
      flat  flat%   sum%        cum   cum%
     514kB 50.08% 50.08%      514kB 50.08%  bufio.NewWriterSize (inline)
  512.44kB 49.92%   100%   512.44kB 49.92%  runtime.allocm
         0     0%   100%      514kB 50.08%  net/http.(*conn).serve
         0     0%   100%      514kB 50.08%  net/http.newBufioWriterSize
         0     0%   100%   512.44kB 49.92%  runtime.handoffp
         0     0%   100%   512.44kB 49.92%  runtime.mcall
         0     0%   100%   512.44kB 49.92%  runtime.newm
         0     0%   100%   512.44kB 49.92%  runtime.park_m
         0     0%   100%   512.44kB 49.92%  runtime.schedule
         0     0%   100%   512.44kB 49.92%  runtime.startm
```
```shell
# When using the Kubernetes cluster:k8s runner, each container runs in a
# separate pod. Access the profiling port through the Kubernetes API using
# a port forward or kubectl proxy. The following example sets up a port
# forward to a sidecar pod and then runs a profiler against it.

$ kubectl port-forward testground-sidecar 6060:6060

$ echo top | go tool pprof http://localhost:6060/debug/pprof/heap

Fetching profile over HTTP from http://localhost:6060/debug/pprof/heap
Saved profile in /home/cory/pprof/pprof.testground.alloc_objects.alloc_space.inuse_objects.inuse_space.001.pb.gz
File: testground
Type: inuse_space
Time: May 5, 2020 at 1:41am (PDT)
Entering interactive mode (type "help" for commands, "o" for options)
(pprof) Showing nodes accounting for 3112.91kB, 100% of 3112.91kB total
Showing top 10 nodes out of 21
      flat  flat%   sum%        cum   cum%
 1025.94kB 32.96% 32.96%  1025.94kB 32.96%  regexp/syntax.(*compiler).inst (inline)
  548.84kB 17.63% 50.59%   548.84kB 17.63%  github.com/markbates/pkger/internal/takeon/github.com/markbates/hepa/filters.glob..func3
     514kB 16.51% 67.10%      514kB 16.51%  regexp.mergeRuneSets.func2 (inline)
  512.08kB 16.45% 83.55%  2052.02kB 65.92%  regexp.compile
  512.05kB 16.45%   100%   512.05kB 16.45%  runtime.systemstack
         0     0%   100%   512.62kB 16.47%  github.com/dustin/go-humanize.init.0
         0     0%   100%  1026.08kB 32.96%  github.com/go-playground/validator/v10.init
         0     0%   100%   548.84kB 17.63%  github.com/markbates/pkger/internal/takeon/github.com/markbates/hepa/filters.init
         0     0%   100%   513.31kB 16.49%  k8s.io/apimachinery/pkg/util/naming.init
         0     0%   100%  2052.02kB 65.92%  regexp.Compile (inline)
```

Catching panics

When using the Go SDK, the library handles panics automatically if they are thrown from the main test-case goroutine. However, panics thrown in other goroutines can be handled by deferring a call to run.HandlePanics() at the beginning of the goroutine.
main.go
```go
package main

import (
	"fmt"
	"time"

	"github.com/testground/sdk-go/run"
	"github.com/testground/sdk-go/runtime"
)

func main() {
	run.Invoke(test)
}

func test(runenv *runtime.RunEnv) error {
	go func() {
		// Will ensure that the panics are handled!
		defer run.HandlePanics()
		panic(fmt.Errorf("do not panic and code"))
	}()

	// Make sure there's enough time.
	time.Sleep(1 * time.Minute)
	return nil
}
```