Commit Graph

92 Commits

Author SHA1 Message Date
Christine Dodrill 6b234323a0
tstest/integration/vms: fix flake when testing (#2145)
Occasionally the test framework would fail with a timeout due to a
virtual machine not phoning home in time. This seems to be happen
whenever qemu can't bind the VNC or SSH ports for a virtual machine.
This was fixed by taking the following actions:

1. Don't listen on VNC unless the `-use-vnc` flag is passed, this
   removes the need to listen on VNC at all in most cases. The option to
   use VNC is still left in for debugging virtual machines, but removing
   this makes it easier to deal with (VNC uses this odd system of
   "displays" that are mapped to ports above 5900, and qemu doesn't
   offer a decent way to use a normal port number, so we just disable
   VNC by default as a compromise).
2. Use a (hopefully) inactive port for SSH. In an ideal world I'd just
   have the VM's SSH port be exposed via a Unix socket, however the QEMU
   documentation doesn't really say if you can do this or not. While I
   do more research, this stopgap will have to make do.
3. Strictly tie more VM resource lifetimes to the tests themselves.
   Previously the disk image layers for virtual machines were only
   cleaned up at the end of the test and existed in the parent
   test-scoped temporary folder. This can make your tmpfs run out of
   space, which is not ideal. This should minimize the use of temporary
   storage as much as I know how to.
4. Strictly tie the qemu process lifetime to the lifetime of the test
   using testing.T#Cleanup. Previously it used a defer statement to
   clean up the qemu process, however if the tests timed out this defer
   was not run. This left around an orphaned qemu process that had to be
   killed manually. This change ensures that all qemu processes exit
   when their relevant tests finish.

Signed-off-by: Christine Dodrill <xe@tailscale.com>
2021-06-25 14:45:12 -04:00
Brad Fitzpatrick 45e64f2e1a net/dns{,/resolver}: refactor DNS forwarder, send out of right link on macOS/iOS
Fixes #2224
Fixes tailscale/corp#2045

Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2021-06-23 16:04:10 -07:00
Brad Fitzpatrick a0c632f6b5 tstest/integration: fix a race
Noticed on a CI failure.

Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2021-06-22 10:24:13 -07:00
David Crawshaw 297b3d6fa4 staticcheck.conf: turn off noisy lint errors
Signed-off-by: David Crawshaw <crawshaw@tailscale.com>
2021-06-18 15:48:20 -07:00
Brad Fitzpatrick 2f4817fe20 tstest/integration: fix race flake
Fixes #2172

Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2021-06-18 10:10:23 -07:00
Christine Dodrill e0f0d10672
tstest/integration/vms: log to t.Logf directly (#2147)
Previously we used t.Logf indirectly via package log. This worked, but
it was not ideal for our needs. It could cause the streams of output to
get crossed. This change uses a logger.FuncWriter every place log.Output
was previously used, which will more correctly write log information to
the right test output stream.

Signed-off-by: Christine Dodrill <xe@tailscale.com>
2021-06-16 14:57:11 -04:00
Brad Fitzpatrick 082cc1b0a7 tstest/integration: reenable TestAddPingRequest
Failure understood now; see:
https://github.com/tailscale/tailscale/pull/2088#issuecomment-859896598

As of 333e9e75d4, PingRequest is
now safe for the server to send multiple times, without fear
of the client handling it multiple times.

Fixes #2079

Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2021-06-15 12:41:08 -07:00
Denton Gentry c61d777705 tstest/integration: disable TestAddPingRequest
Failing often now, we don't want people to get used to
routinely ignoring test failures.

Can be re-enabled when
https://github.com/tailscale/tailscale/issues/2079
is resolved.

Signed-off-by: Denton Gentry <dgentry@tailscale.com>
2021-06-14 22:24:27 -07:00
Christine Dodrill 8b2b899989
tstest/integration: test Alpine Linux (#2098)
Alpine Linux[1] is a minimal Linux distribution built around musl libc.
It boots very quickly, requires very little ram and is as close as you
can get to an ideal citizen for testing Tailscale on musl. Alpine has a
Tailscale package already[2], but this patch also makes it easier for us
to provide an Alpine Linux package off of pkgs in the future.

Alpine only offers Tailscale on the rolling-release edge branch.

[1]: https://alpinelinux.org/
[2]: https://pkgs.alpinelinux.org/packages?name=tailscale&branch=edge

Updates #1988

Signed-off-by: Christine Dodrill <xe@tailscale.com>
2021-06-11 09:20:13 -04:00
Brad Fitzpatrick 0affcd4e12 tstest/integration: add some debugging for TestAddPingRequest flakes
This fails pretty reliably with a lot of output now showing what's
happening:

TS_DEBUG_MAP=1 go test --failfast -v -run=Ping -race -count=20 ./tstest/integration --verbose-tailscaled

I haven't dug into the details yet, though.

Updates #2079
2021-06-10 15:13:14 -07:00
Brad Fitzpatrick ee3df2f720 tstest/integration: rename ambiguous --verbose test flag
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2021-06-10 11:24:01 -07:00
Christine Dodrill b402e76185
.github/workflows: add integration test with a custom runner (#2044)
This runner is in my homelab while we muse about a better, more
permanent home for these tests.

Updates #1988

Signed-off-by: Christine Dodrill <xe@tailscale.com>
2021-06-08 12:49:23 -04:00
Christine Dodrill 622dc7b093
tstest/integration/vms: download images from s3 (#2035)
This makes integration tests pull pristine VM images from Amazon S3 if
they don't exist on disk. If the S3 fetch fails, it will fall back to
grabbing the image from the public internet. The VM images on the public
internet are known to be updated without warning and thusly change their
SHA256 checksum. This is not ideal for a test that we want to be able to
fire and forget, then run reliably for a very long time.

This requires an AWS profile to be configured at the default path. The
S3 bucket is rigged so that the requester pays. The VM images are
currently about 6.9 gigabytes. Please keep this in mind when running
these tests on your machine.

Documentation was added to the integration test folder to aid others in
running these tests on their machine.

Some wording in the logs of the tests was altered.

Updates #1988

Signed-off-by: Christine Dodrill <xe@tailscale.com>
2021-06-08 12:47:24 -04:00
Christine Dodrill 3f1405fa2a
tstest/integration/vms: bump images, fix caching bug (#2052)
Before this redownloaded the image every time. Now it only redownloads
it when it needs to.

Signed-off-by: Christine Dodrill <xe@tailscale.com>
2021-06-08 10:15:59 -04:00
Simeng He 0141390365 tstest/integration/testcontrol: add Server.AddPingRequest
Signed-off-by: Simeng He <simeng@tailscale.com>
2021-06-07 13:40:35 -04:00
Christine Dodrill adaecd83c8
tstest/integration/vms: add DownloadImages test to download images (#2039)
The image downloads can take a significant amount of time for the tests.
This creates a new test that will download every distro image into the
local cache in parallel, optionally matching the distribution regex.

Updates #1988

Signed-off-by: Christine Dodrill <xe@tailscale.com>
2021-06-04 15:30:58 -04:00
Christine Dodrill 607b7ab692
tstest/integration/vms: aggressively re-verify shasums (#2050)
I've run into a couple issues where the tests time out while a VM image
is being downloaded, making the cache poisoned for the next run. This
moves the hash checking into its own function and calls it much sooner
in the testing chain. If the hash check fails, the OS is redownloaded.

Signed-off-by: Christine Dodrill <xe@tailscale.com>
2021-06-04 15:27:03 -04:00
Christine Dodrill 6ce77b8eca
tstest/integration/vms: log qemu output (#2047)
Most of the time qemu will output nothing when it is running. This is
expected behavior. However when qemu is unable to start due to some
problem, it prints that to either stdout or stderr. Previously this
output wasn't being captured. This patch captures that output to aid in
debugging qemu issues.

Updates #1988

Signed-off-by: Christine Dodrill <xe@tailscale.com>
2021-06-04 14:44:04 -04:00
Brad Fitzpatrick 58cc2cc921 tstest/integration/testcontrol: add Server.nodeLocked 2021-06-04 08:19:23 -07:00
Christine Dodrill 0a655309c6
tstest/integration/vms: only build binaries once (#2042)
Previously this built the binaries for every distro. This is a bit
overkill given we are using static binaries. This patch makes us only
build once.

There was also a weird issue with how processes were being managed.
Previously we just killed qemu with Process.Kill(), however that was
leaving behind zombies. This has been mended to not only kill qemu but
also waitpid() the process so it doesn't become a zombie.

Updates #1988

Signed-off-by: Christine Dodrill <xe@tailscale.com>
2021-06-03 10:58:35 -04:00
Christine Dodrill a282819026
tstest/integration/vms: fix OpenSUSE Leap 15.1 (#2038)
The OpenSUSE 15.1 image we are using (and conseqentially the only one
that is really available easily given it is EOL) has cloud-init
hardcoded to use the OpenStack metadata thingy. Other OpenSUSE Leap
images function fine with the NoCloud backend, but this one seems to
just not work with it. No bother, we can just pretend to be OpenStack.

Thanks to Okami for giving me an example OpenStack configuration seed
image.

Updates #1988

Signed-off-by: Christine Dodrill <xe@tailscale.com>
2021-06-03 09:29:07 -04:00
Christine Dodrill 4da5e79c39
tstest/integration/vms: test on Arch Linux (#2040)
Arch is a bit of a weirder distro, however as a side effect it is much
more of a systemd purist experience. Adding it to our test suite will
make sure that we are working in the systemd happy path.

Updates #1988

Signed-off-by: Christine Dodrill <xe@tailscale.com>
2021-06-03 09:09:18 -04:00
Christine Dodrill ca96357d4b
tstest/integration/vms: add OpenSUSE Leap 15.3 (#2026)
This distro is about to be released. OpenSUSE has historically had the
least coverage for functional testing, so this may prove useful in the
future.

Signed-off-by: Christine Dodrill <xe@tailscale.com>
2021-06-01 11:08:45 -04:00
Christine Dodrill 2802a01b81
tstest/integration/vms: test vms as they are ready (#2022)
Instead of testing all the VMs at once when they are all ready, this
patch changes the testing logic so that the vms are tested as soon as
they register with testcontrol. Also limit the amount of VM ram used at
once with the `-ram-limit` flag. That uses a semaphore to guard resource
use.

Also document CentOS' sins.

Updates #1988

Signed-off-by: Christine Dodrill <xe@tailscale.com>
2021-05-31 17:04:49 -04:00
Christine Dodrill 36cb69002a
tstest/integration/vms: regex-match distros using a flag (#2021)
If you set `-distro-regex` to match a subset of distros, only those
distros will be tested. Ex:

    $ go test -run-vm-tests -distro-regex='opensuse'

Signed-off-by: Christine Dodrill <xe@tailscale.com>
2021-05-31 13:23:38 -04:00
Christine Dodrill e1b994f7ed
tstest/integration/vms: maintain distro info (#2020)
This lets us see the names of distros in our tests.

Updates #1988

Signed-off-by: Christine Dodrill <xe@tailscale.com>
2021-05-31 13:14:30 -04:00
Brad Fitzpatrick fa548c5b96
tstest/integration/vms: fix bindhost lookup (#2012)
Don't try to do heuristics on the name. Use the net/interfaces package
which we already have to do this sort of stuff.

Fixes #2011

Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2021-05-31 12:00:50 -04:00
Christine Dodrill 14c1113d2b
tstest/integration/vms: copy locally built binaries (#2006)
Instead of pulling packages from pkgs.tailscale.com, we should use the
tailscale binaries that are local to this git commit. This exposes a bit
of the integration testing stack in order to copy the binaries
correctly.

This commit also bumps our version of github.com/pkg/sftp to the latest
commit.

If you run into trouble with yaml, be sure to check out the
commented-out alpine linux image complete with instructions on how to
use it.

Updates #1988

Signed-off-by: Christine Dodrill <xe@tailscale.com>
2021-05-31 11:35:01 -04:00
Brad Fitzpatrick f21982f854 tstest/integration/vms: skip a test for now
Updates #2011

Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2021-05-28 20:31:36 -07:00
Christine Dodrill 4cfaf489ac
tstest/integration/vms: t.Log for VM output (#2007)
Previously we spewed a lot of output to stdout and stderr, even when
`-v` wasn't set. This is sub-optimal for various reasons. This patch
shunts that output to test logs so it only shows up when `-v` is set.

Updates #1988

Signed-off-by: Christine Dodrill <xe@tailscale.com>
2021-05-28 14:19:44 -04:00
Christine Dodrill 1f72b6f812
tstest/integration/vms: use dynamically discovered bindhost (#1992)
Instead of relying on a libvirtd bridge address that you probably won't
have on your system.

Updates #1988

Signed-off-by: Christine Dodrill <xe@tailscale.com>
2021-05-28 08:05:17 -04:00
Christine Dodrill 35749ec297
tstest/integration/vms: small cleanups (#1989)
Updates #1988

Signed-off-by: Christine Dodrill <xe@tailscale.com>
2021-05-27 14:29:29 -04:00
Christine Dodrill ba59c0391b
tstest/integration: add experimental integration test (#1966)
This will spin up a few vms and then try and make them connect to a
testcontrol server.

Updates #1988

Signed-off-by: Christine Dodrill <xe@tailscale.com>
2021-05-26 14:10:10 -04:00
simenghe dd0b690e7b
Added new Addresses / AllowedIPs fields to testcontrol when creating tailcfg.Node (#1948)
* Added new Addresses / AllowedIPs fields to testcontrol when creating new &tailcfg.Node

Signed-off-by: Simeng He <simeng@tailscale.com>

* Added single node test to check Addresses and AllowedIPs

Signed-off-by: Simeng He <simeng@tailscale.com>

Co-authored-by: Simeng He <simeng@tailscale.com>
2021-05-18 16:20:29 -04:00
Josh Bleecher Snyder 25df067dd0 all: adapt to opaque netaddr types
This commit is a mishmash of automated edits using gofmt:

gofmt -r 'netaddr.IPPort{IP: a, Port: b} -> netaddr.IPPortFrom(a, b)' -w .
gofmt -r 'netaddr.IPPrefix{IP: a, Port: b} -> netaddr.IPPrefixFrom(a, b)' -w .

gofmt -r 'a.IP.Is4 -> a.IP().Is4' -w .
gofmt -r 'a.IP.As16 -> a.IP().As16' -w .
gofmt -r 'a.IP.Is6 -> a.IP().Is6' -w .
gofmt -r 'a.IP.As4 -> a.IP().As4' -w .
gofmt -r 'a.IP.String -> a.IP().String' -w .

And regexps:

\w*(.*)\.Port = (.*)  ->  $1 = $1.WithPort($2)
\w*(.*)\.IP = (.*)  ->  $1 = $1.WithIP($2)

And lots of manual fixups.

Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
2021-05-16 14:52:00 -07:00
Brad Fitzpatrick 5a7c6f1678 tstest/integration{,/testcontrol}: add node update support, two node test
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2021-05-12 14:43:43 -07:00
Brad Fitzpatrick d32667011d tstest/integration: build test binaries with -race if test itself is
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2021-05-12 13:13:08 -07:00
Brad Fitzpatrick ed9d825552 tstest/integration: fix integration test on linux/386
Apparently can't use GOBIN with GOARCH.

Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2021-05-12 11:56:00 -07:00
Brad Fitzpatrick c0158bcd0b tstest/integration{,/testcontrol}: add testcontrol.RequireAuth mode, new test
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2021-05-12 11:37:27 -07:00
Josh Bleecher Snyder 78d4c561b5 types/logger: add key grinder stats lines to rate-limiting exemption list
Updates #1749

Co-authored-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
2021-05-05 08:25:15 -07:00
Brad Fitzpatrick 68fb51b833 tstest/integration: misc cleanups
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2021-05-03 14:22:18 -07:00
Brad Fitzpatrick 3237e140c4 tstest/integration: add testNode.AwaitListening, DERP+STUN, improve proxy trap
Updates #1840
2021-05-03 12:14:20 -07:00
Brad Fitzpatrick 98d7c28faa tstest/integration: start factoring test types out to clean things up
To enable easy multi-node testing (including inter-node traffic) later.

Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2021-04-30 20:27:05 -07:00
Brad Fitzpatrick 5e9e11a77d tstest/integration/testcontrol: add start of test control server
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2021-04-29 22:51:22 -07:00
Brad Fitzpatrick a07a504b16 tstest/integration: use go binary from runtime.GOROOT
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2021-04-29 17:04:29 -07:00
Brad Fitzpatrick f342d10dc5 tstest/integration: set an HTTP_PROXY to catch bogus requests
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2021-04-29 16:00:02 -07:00
Brad Fitzpatrick 80429b97e5 testing: add start of an integration test
Only minimal tailscale + tailscaled for now.

And a super minimal in-memory logcatcher.

No control ... yet.

Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2021-04-29 15:32:27 -07:00
Brad Fitzpatrick 08782b92f7 tstest: add WaitFor helper
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2021-04-29 14:43:46 -07:00
Josh Bleecher Snyder d31eff8473 tstest/natlab: use net.ErrClosed
We are now on 1.16.
And wgconn.NetErrClosed has been removed upstream.

Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
2021-03-24 09:46:36 -07:00
Filippo Valsorda 39f7a61e9c tstest/staticcheck: import the main package to fix "go mod tidy"
Importing the non-main package was missing some dependencies that
"go mod tidy" would then cleanup. Also added a non-ignore build tag to
avoid other tools getting upset about importing a main package.

Signed-off-by: Filippo Valsorda <hi@filippo.io>
2021-02-20 09:53:47 -08:00