Commit Graph

2955 Commits

Author SHA1 Message Date
Brad Fitzpatrick 5bacbf3744 wgengine/magicsock, health, ipn/ipnstate: track DERP-advertised health
And add health check errors to ipnstate.Status (tailscale status --json).

Updates #2746
Updates #2775

Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2021-09-02 10:20:25 -07:00
Brad Fitzpatrick 722942dd46 tsweb: restore CPU profiling handler
It was accidentally deleted in the earlier 0022c3d2e (#2143) refactor.
Lock it in with a test.

Fixes tailscale/corp#2503

Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2021-09-02 09:24:29 -07:00
David Anderson daf54d1253 control/controlclient: remove TS_DEBUG_USE_DISCO=only.
It was useful early in development when disco clients were the
exception and tailscale logs were noisier than today, but now
non-disco is the exception.

Updates #2752

Signed-off-by: David Anderson <danderson@tailscale.com>
2021-09-01 18:11:32 -07:00
David Anderson 39748e9562 net/dns/resolver: authoritatively return NXDOMAIN for reverse zones we own.
Fixes #2774

Signed-off-by: David Anderson <danderson@tailscale.com>
2021-09-01 18:11:32 -07:00
David Anderson 954064bdfe wgengine/wgcfg/nmcfg: don't configure peers who can't DERP or disco.
Fixes #2770

Signed-off-by: David Anderson <danderson@tailscale.com>
2021-09-01 18:11:32 -07:00
David Anderson f90ac11bd8 wgengine: remove unnecessary magicConnStarted channel.
Having removed magicconn.Start, there's no need to synchronize startup
of other things to it any more.

Signed-off-by: David Anderson <danderson@tailscale.com>
2021-09-01 18:11:32 -07:00
David Anderson bb10443edf wgengine/wgcfg: use just the hexlified node key as the WireGuard endpoint.
The node key is all magicsock needs to find the endpoint that WireGuard
needs.

Updates #2752

Signed-off-by: David Anderson <danderson@tailscale.com>
2021-09-01 15:13:21 -07:00
David Anderson d00341360f wgengine/magicsock: remove unused debug knob.
Signed-off-by: David Anderson <danderson@tailscale.com>
2021-09-01 15:13:21 -07:00
David Anderson dfd978f0f2 wgengine/magicsock: use NodeKey, not DiscoKey, as the trigger for lazy reconfig.
Updates #2752

Signed-off-by: David Anderson <danderson@tailscale.com>
2021-09-01 15:13:21 -07:00
David Anderson 4c27e2fa22 wgengine/magicsock: remove Start method from Conn.
Over time, other magicsock refactors have made Start effectively a
no-op, except that some other functions choose to panic if called
before Start.

Signed-off-by: David Anderson <danderson@tailscale.com>
2021-09-01 15:13:21 -07:00
David Anderson 1a899344bd wgengine/magicsock: don't store tailcfg.Nodes alongside endpoints.
Updates #2752

Signed-off-by: David Anderson <danderson@tailscale.com>
2021-09-01 15:13:21 -07:00
David Anderson b2181608b5 wgengine/magicsock: eagerly create endpoints in SetNetworkMap.
Updates #2752

Signed-off-by: David Anderson <danderson@tailscale.com>
2021-09-01 15:13:21 -07:00
Maisem Ali 0842e2f45b ipn/store: add ability to store data as k8s secrets.
Signed-off-by: Maisem Ali <maisem@tailscale.com>
2021-09-01 12:50:59 -07:00
David Crawshaw f53792026e tstest/integration/vms: move build tags from linux to !windows
The tests build fine on other Unixes, they just can't run there.
But there is already a t.Skip by default, so `go test` ends up
working fine elsewhere and checks the code compiles.

Signed-off-by: David Crawshaw <crawshaw@tailscale.com>
2021-09-01 11:38:18 -07:00
Brad Fitzpatrick 7f29dcaac1 cmd/tailscale/cli: make up block until state Running, not just Starting
At "Starting", the DERP connection isn't yet up. After the first netmap
and DERP connect, then it transitions into "Running".

Fixes #2708

Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2021-09-01 08:25:42 -07:00
Brad Fitzpatrick fb8b821710 tsnet: fix typo in comment
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2021-09-01 07:55:25 -07:00
Brad Fitzpatrick 7a7aa8f2b0 cmd/derper: also add port 80 timeouts
Didn't notice this one in earlier 00b3c1c042

Updates tailscale/corp#2486

Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2021-08-31 21:18:36 -07:00
Brad Fitzpatrick 3c8ca4b357 client/tailscale, cmd/tailscale/cli: move version mismatch check to CLI
So people can use the package for whois checks etc without version
skew errors.

The earlier change faa891c1f2 for #1905
was a bit too aggressive.

Fixes #2757

Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2021-08-31 15:27:25 -07:00
Brad Fitzpatrick 8744394cde version: bump date
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2021-08-31 15:27:25 -07:00
Brad Fitzpatrick 21cb0b361f safesocket: add connect retry loop to wait for tailscaled
Updates #2708

Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2021-08-31 15:13:42 -07:00
Brad Fitzpatrick a59b389a6a derp: add new health update and server restarting frame types
Updates #2756
Updates #2746

Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2021-08-31 13:31:51 -07:00
Christine Dodrill 0b9e938152 tstest/integration/vms: test DNS configuration
This uses a neat little tool to dump the output of DNS queries to
standard out. This is the first end-to-end test of DNS that runs against
actual Linux systems. The /etc/resolv.conf test may look superfluous,
however this will help for correlating system state if one of the DNS
tests fails.

Signed-off-by: Christine Dodrill <xe@tailscale.com>
2021-08-31 12:31:54 -07:00
Brad Fitzpatrick 00b3c1c042 cmd/derper: add missing read/write timeouts
Updates tailscale/corp#2486

Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2021-08-31 10:23:53 -07:00
David Crawshaw 9b7fc2ed1f .github: add Ubuntu VM test
Signed-off-by: David Crawshaw <crawshaw@tailscale.com>
2021-08-31 08:50:55 -07:00
Brad Fitzpatrick 73280595a8 derp: accept dup clients without closing prior's connection
A public key should only have max one connection to a given
DERP node (or really: one connection to a node in a region).

But if people clone their machine keys (e.g. clone their VM, Raspberry
Pi SD card, etc), then we can get into a situation where a public key
is connected multiple times.

Originally, the DERP server handled this by just kicking out prior
connections whenever a new one came. But this led to reconnect fights
where 2+ nodes were in hard loops trying to reconnect and kicking out
their peer.

Then a909d37a59 tried to add rate
limiting to how often that dup-kicking can happen, but empirically it
just doesn't work and ~leaks a bunch of goroutines and TCP
connections, tying them up for an hour or more while more and more accumulate
and waste memory. Mostly because we were doing a time.Sleep forever
while not reading from their TCP connections.

Instead, just accept multiple connections per public key but track
which is the most recent. And if two are both writing back & forth,
then optionally disable them both. That last part is only enabled in
tests for now. The current default policy is just last-sender-wins
while we gather the next round of stats.

Updates #2751

Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2021-08-31 08:21:21 -07:00
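The last-sender-wins policy described in the commit above can be sketched as below. This is a simplified model under stated assumptions, not the DERP server's code: `conn`, `dupSet`, `registry`, and their methods are hypothetical names, keys are plain strings, and locking and connection cleanup are omitted.

```go
package main

import "fmt"

// conn stands in for one client connection (hypothetical type).
type conn struct{ id int }

// dupSet tracks every live connection that shares one public key and
// remembers which of them connected or sent most recently.
type dupSet struct {
	conns  map[*conn]bool
	active *conn // connection that currently receives traffic for the key
}

// registry maps a public key (a string here, for brevity) to its connections.
type registry map[string]*dupSet

// add registers a new connection without closing earlier ones.
func (r registry) add(key string, c *conn) {
	s := r[key]
	if s == nil {
		s = &dupSet{conns: map[*conn]bool{}}
		r[key] = s
	}
	s.conns[c] = true
	s.active = c
}

// noteSend records that c just sent a packet, making it the active connection.
func (r registry) noteSend(key string, c *conn) {
	if s := r[key]; s != nil && s.conns[c] {
		s.active = c
	}
}

// target returns the connection that should receive packets for key.
func (r registry) target(key string) *conn {
	if s := r[key]; s != nil {
		return s.active
	}
	return nil
}

func main() {
	r := registry{}
	a, b := &conn{id: 1}, &conn{id: 2}
	r.add("nodekey", a)
	r.add("nodekey", b) // a stays connected; b is now the most recent
	fmt.Println(r.target("nodekey").id) // prints 2
	r.noteSend("nodekey", a) // a sends, so a wins
	fmt.Println(r.target("nodekey").id) // prints 1
}
```

Because no connection is kicked out when a duplicate arrives, the duplicates have no reason to enter a reconnect loop against each other.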
David Crawshaw debaaebf3b tstest/integration/vms: turn on logcatcher logging by default
Absolutely vital to debugging failures.

Signed-off-by: David Crawshaw <crawshaw@tailscale.com>
2021-08-31 06:40:28 -07:00
David Crawshaw a1f1020042 tstest/integration/vms: avoid log after test completion
Avoids a panic in the Go testing package if a late log comes in.

Signed-off-by: David Crawshaw <crawshaw@tailscale.com>
2021-08-31 06:40:28 -07:00
David Crawshaw 583af7c1a6 tstest/integration/vms: give guest multiple cores and use generic machine
Speeds up tests.
Allows the use of more versions of qemu.

Signed-off-by: David Crawshaw <crawshaw@tailscale.com>
2021-08-31 06:40:28 -07:00
David Crawshaw 8668103f06 tstest/integration/vms: print qemu console output, fix printing issues
Fix a few test printing issues when tests fail.

Qemu console output is super useful when something is wrong in the
harness and we cannot even bring up the tests.
Also useful for figuring out where all the time goes in tests.

A little noisy, but not too noisy as long as you're only running one VM
as part of the tests, which is my plan.

Signed-off-by: David Crawshaw <crawshaw@tailscale.com>
2021-08-31 06:40:28 -07:00
David Crawshaw 1a9fba5b04 tstest/integration/vms: fix ubuntu URLs
Also remove extra distros for now.
We can bring them back later if useful.
Though our most important distros are these two Ubuntu releases, Debian stable,
and Raspbian (not currently supported).
And before doing more Linux, we should do Windows.

Signed-off-by: David Crawshaw <crawshaw@tailscale.com>
2021-08-31 06:40:28 -07:00
Emmanuel T Odeke 0daa32943e all: add (*testing.B).ReportAllocs() to every benchmark
This ensures that we can properly track and catch allocation
slippages that could otherwise have been missed.

Fixes #2748
2021-08-30 21:41:04 -07:00
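For readers unfamiliar with the call named in the commit above, here is a minimal sketch of a benchmark that uses `(*testing.B).ReportAllocs()`. The benchmark body `benchAppend` and the `sink` variable are hypothetical and not taken from the repository. The sketch runs the benchmark through `testing.Benchmark` so that it works as a standalone program.

```go
package main

import (
	"fmt"
	"testing"
)

// sink keeps the result alive so the allocation is not optimized away.
var sink []byte

// benchAppend is a hypothetical benchmark body. Calling b.ReportAllocs()
// makes `go test -bench` print B/op and allocs/op for this benchmark
// even when -benchmem is not passed.
func benchAppend(b *testing.B) {
	b.ReportAllocs()
	for i := 0; i < b.N; i++ {
		s := make([]byte, 0, 1024)
		sink = append(s, 'x')
	}
}

func main() {
	testing.Init() // registers testing flags when not running under `go test`
	res := testing.Benchmark(benchAppend)
	fmt.Println(res.String(), res.MemString())
}
```

In a real package the function would be named `BenchmarkAppend` and live in a `_test.go` file.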
David Anderson 44d71d1e42 wgengine/magicsock: fix race in test shutdown, again.
We were returning an error almost, but not quite, like errConnClosed in
a single codepath, which could still trip the panic on reconfig in the
test logic.

Signed-off-by: David Anderson <danderson@tailscale.com>
2021-08-30 21:26:38 -07:00
David Anderson f09ede9243 wgengine/magicsock: don't configure eager WireGuard handshaking in tests.
Our prod code doesn't eagerly handshake, because our disco layer enables
on-demand handshaking. Configuring both peers to eagerly handshake leads
to WireGuard handshake races that make TestTwoDevicePing flaky.

Signed-off-by: David Anderson <danderson@tailscale.com>
2021-08-30 17:28:12 -07:00
David Anderson 86d1c4eceb wgengine/magicsock: ignore close races even harder.
Signed-off-by: David Anderson <danderson@tailscale.com>
2021-08-30 17:09:45 -07:00
David Anderson 8bacfe6a37 wgengine/magicsock: remove unused sendLogLimit limiter.
Magicsock these days gets its logs limited by the global log limiter.

Signed-off-by: David Anderson <danderson@tailscale.com>
2021-08-30 17:09:45 -07:00
David Anderson e151b74f93 wgengine/magicsock: remove opts.SimulatedNetwork.
It only existed to override one test-only behavior with a
different test-only behavior, in both cases working around
an annoying feature of our CI environments. Instead, handle
that weirdness entirely in the test code, with a tweaked
TestOnlyPacketListener that gets injected.

Signed-off-by: David Anderson <danderson@tailscale.com>
2021-08-30 17:09:45 -07:00
David Anderson 58c1f7d51a wgengine/magicsock: rename opts.PacketListener to TestOnlyPacketListener.
The docstring said it was meant for use in tests, but it's specifically a
special codepath that is _only_ used in tests, so make the claim stronger.

Signed-off-by: David Anderson <danderson@tailscale.com>
2021-08-30 17:09:45 -07:00
David Anderson 8049063d35 wgengine/magicsock: rename discoEndpoint to just endpoint.
Signed-off-by: David Anderson <danderson@tailscale.com>
2021-08-30 17:09:45 -07:00
David Anderson f2d949e2db wgengine/magicsock: fold findEndpoint into its only remaining caller.
Signed-off-by: David Anderson <danderson@tailscale.com>
2021-08-30 17:09:45 -07:00
David Anderson fe2f89deab wgengine/magicsock: fix rare shutdown race in test.
Signed-off-by: David Anderson <danderson@tailscale.com>
2021-08-30 14:33:07 -07:00
David Anderson 97693f2e42 wgengine/magicsock: delete legacy AddrSet endpoints.
Instead of using the legacy codepath, teach discoEndpoint to handle
peers that have a home DERP, but no disco key. We can still communicate
with them, but only over DERP.

Signed-off-by: David Anderson <danderson@tailscale.com>
2021-08-30 14:33:07 -07:00
David Anderson 61c62f48d9 wgengine/bench: disable unused benchmark that relies on legacy magicsock.
Signed-off-by: David Anderson <danderson@tailscale.com>
2021-08-30 14:33:07 -07:00
David Anderson 54bc3b7d97 util/deephash: remove soon to be deleted field from wgcfg.
Signed-off-by: David Anderson <danderson@tailscale.com>
2021-08-30 14:33:07 -07:00
David Anderson 923c98cd8f types/wgkey: add TODO for a future API change.
Signed-off-by: David Anderson <danderson@tailscale.com>
2021-08-30 14:33:07 -07:00
Brad Fitzpatrick 065c4ffc2c net/dns: add start of Linux newOSConfigurator tests
Only one test case so far.

Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2021-08-30 14:16:12 -07:00
Brad Fitzpatrick 09a47ea3f1 net/dns: prep for writing manager_linux tests; pull some stuff out
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2021-08-30 13:49:02 -07:00
Joe Tsai 3f1317e3e5 util/deephash: fix TestArrayAllocs
Unfortunately this test fails on certain architectures.
The problem comes down to inconsistencies in the Go escape analysis
where specific variables are marked as escaping on certain architectures.
The variables escaping to the heap are unfortunately in crypto/sha256,
which makes it impossible to fix this locally in deephash.

For now, fix the test by compensating for the allocations that
occur from calling sha256.digest.Sum.

See golang/go#48055

Fixes #2727

Signed-off-by: Joe Tsai <joetsai@digital-static.net>
2021-08-30 10:47:21 -07:00
Joe Tsai 30458c71c8 tstime/rate: deflake TestLongRunningQPS
This test is highly dependent on the accuracy of OS timers.
Reduce the number of failures by decreasing the required
accuracy from 0.999 to 0.995.
Also, switch from repeated time.Sleep to using a time.Ticker
for improved accuracy.

Updates #2727

Signed-off-by: Joe Tsai <joetsai@digital-static.net>
2021-08-30 10:46:41 -07:00
David Crawshaw bb47feca44 tstest/integration: prefix logs with logid
The VM test has two tailscaled instances running and interleaves the
logs. Without a prefix it is impossible to figure out what is going on.

It might be even better to include the [ABCD] node prefix here as well.
Unfortunately lots of interesting logs happen before tailscaled has a
node key, so it wouldn't be a replacement for a short ID.

Signed-off-by: David Crawshaw <crawshaw@tailscale.com>
2021-08-30 10:20:32 -07:00
Maisem Ali fd4838dc57 wgengine/userspace: add support to automatically enable/disable the tailscale
protocol in BIRD, when the node is a primary subnet router as determined
by control.

Signed-off-by: Maisem Ali <maisem@tailscale.com>
2021-08-30 10:18:05 -07:00