Occasionally the test framework would fail with a timeout due to a
virtual machine not phoning home in time. This seems to happen
whenever qemu can't bind the VNC or SSH ports for a virtual machine.
This was fixed by taking the following actions:
1. Don't listen on VNC unless the `-use-vnc` flag is passed. This
removes the need to listen on VNC at all in most cases. The option to
use VNC is still left in for debugging virtual machines, but leaving
it off by default makes port allocation easier to deal with (VNC uses
this odd system of "displays" that are mapped to ports starting at
5900, and qemu doesn't offer a decent way to use a normal port number,
so disabling VNC by default is the compromise).
2. Use a (hopefully) inactive port for SSH. In an ideal world I'd just
have the VM's SSH port be exposed via a Unix socket, but the QEMU
documentation doesn't really say whether you can do this. While I
do more research, this stopgap will have to do.
3. Strictly tie more VM resource lifetimes to the tests themselves.
Previously the disk image layers for virtual machines were only
cleaned up at the end of the test and existed in the parent
test-scoped temporary folder. That could make your tmpfs run out of
space, which is not ideal. This change should minimize the use of
temporary storage as much as I know how to.
4. Strictly tie the qemu process lifetime to the lifetime of the test
using testing.T#Cleanup. Previously a defer statement cleaned up the
qemu process, but if the tests timed out that defer never ran. This
left behind an orphaned qemu process that had to be killed manually.
This change ensures that all qemu processes exit when their relevant
tests finish.
Signed-off-by: Christine Dodrill <xe@tailscale.com>