Commit Graph

2976 Commits

Author SHA1 Message Date
David Anderson efe8020dfa wgengine/magicsock: fix race condition in tests.
AFAICT this was always present, the log read mid-execution was never safe.
But it seems like the recent magicsock refactoring made the race much
more likely.

Signed-off-by: David Anderson <danderson@tailscale.com>
2021-09-05 17:42:33 -07:00
Evan Anderson 000f90d4d7 wgengine/wglog: Fix docstring on wireguardGoString to match args
@danderson linked this on Twitter and I noticed the mismatch.

Signed-off-by: Evan Anderson <evan.k.anderson@gmail.com>
2021-09-05 15:52:16 -07:00
David Anderson 69c897a763 net/dnsfallback: run go generate to pick up new derp9s.
Signed-off-by: David Anderson <danderson@tailscale.com>
2021-09-05 00:00:16 -07:00
David Anderson bb6fdfb243 net/dns: fix the build on freebsd (missing default case in switch)
Signed-off-by: David Anderson <danderson@tailscale.com>
2021-09-05 00:00:16 -07:00
David Anderson b3b1c06b3a net/dns: only restart systemd-resolved if we changed /etc/resolv.conf.
Reported on IRC: in an edge case, you can end up with a directManager DNS
manager and --accept-dns=false, in which case we should do nothing, but
actually end up restarting resolved whenever the netmap changes, even though
the user told us to not manage DNS.

Signed-off-by: David Anderson <danderson@tailscale.com>
2021-09-05 00:00:16 -07:00
David Anderson 10547d989d net/dns: exhaustively test DNS selection paths for linux.
Signed-off-by: David Anderson <danderson@tailscale.com>
2021-09-04 23:40:48 -07:00
David Anderson c071bcda33 net/dns: relax systemd-resolved detection.
Reported on IRC: a resolv.conf that contained two entries for
"nameserver 127.0.0.53", which defeated our "is resolved actually
in charge" check. Relax that check to allow any number of nameservers,
as long as they're all 127.0.0.53.

Signed-off-by: David Anderson <danderson@tailscale.com>
2021-09-04 22:32:28 -07:00
Dave Anderson 980acc38ba
types/key: add a special key with custom serialization for control private keys (#2792)
* Revert "Revert "types/key: add MachinePrivate and MachinePublic.""

This reverts commit 61c3b98a24.

Signed-off-by: David Anderson <danderson@tailscale.com>

* types/key: add ControlPrivate, with custom serialization.

ControlPrivate is just a MachinePrivate that serializes differently
in JSON, to be compatible with how the Tailscale control plane
historically serialized its private key.

Signed-off-by: David Anderson <danderson@tailscale.com>
2021-09-03 13:17:46 -07:00
David Anderson 61c3b98a24 Revert "types/key: add MachinePrivate and MachinePublic."
Broke the tailscale control plane due to surprise different serialization.

This reverts commit 4fdb88efe1.
2021-09-03 11:34:34 -07:00
David Anderson 4fdb88efe1 types/key: add MachinePrivate and MachinePublic.
Plumb throughout the codebase as a replacement for the mixed use of
tailcfg.MachineKey and wgkey.Private/Public.

Signed-off-by: David Anderson <danderson@tailscale.com>
2021-09-03 10:07:15 -07:00
David Anderson 4ce091cbd8 version: use `go` from the current toolchain to compile in tests.
Signed-off-by: David Anderson <danderson@tailscale.com>
2021-09-02 20:11:20 -07:00
Brad Fitzpatrick d1cb7a2639 metrics: use SYS_OPENAT
New systems like arm64 don't even have SYS_OPEN.
2021-09-02 15:28:19 -07:00
David Anderson 159d88aae7 go.mod: tidy.
Signed-off-by: David Anderson <danderson@tailscale.com>
2021-09-02 14:26:27 -07:00
David Anderson b96159e820 go.mod: update github.com/ulikunitz/xz for https://github.com/advisories/GHSA-25xm-hr59-7c27
Our code is not vulnerable to the issue in question: it only happens in the decompression
path for untrusted inputs, and we only use xz as part of mkpkg, which is write-only
and operates on trusted build system outputs to construct deb and rpm packages.

Still, it's nice to keep the dependabot dashboard clean.

Signed-off-by: David Anderson <danderson@tailscale.com>
2021-09-02 14:02:57 -07:00
Brad Fitzpatrick 99a1c74a6a metrics: optimize CurrentFDs to not allocate on Linux
It was 50% of our allocs on one of our servers. (!!)

Updates #2784

Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2021-09-02 13:28:39 -07:00
Brad Fitzpatrick db3586cd43 go.mod: upgrade staticcheck
It was crashing on a PR of mine and this fixes it.
2021-09-02 13:13:42 -07:00
Brad Fitzpatrick ec2b7c7da6 all: bump minimum Go to 1.17
In prep for using 1.17 features.

Note the go.mod changes are due to:
https://golang.org/doc/go1.17#go-command

Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2021-09-02 12:51:11 -07:00
Brad Fitzpatrick fc160f80ee metrics: move currentFDs code to the metrics package
Updates #2784

Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2021-09-02 11:14:14 -07:00
Brad Fitzpatrick 5d800152d9 cmd/derper: increase port 80's WriteTimeout to permit longer CPU profiles
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2021-09-02 10:49:14 -07:00
Chuangbo Li e4e4d336d9
cmd/derper: listen on host of flag server addr for port 80 and 3478 (#2768)
cmd/derper: listen on host of flag server addr for port 80 and 3478

When using custom derp on the server with multiple IP addresses,
we would like to bind derp 80, 443 and stun 3478 to a certain IP.

derp command provides flag `-a` to customize which address to bind
for port 443. But port :80 and :3478 were hard-coded.

Fixes #2767

Signed-off-by: Li Chuangbo <im@chuangbo.li>
2021-09-02 10:42:27 -07:00
David Crawshaw 4e18cca62e testcontrol: replace panic with error
I have seen this once in the VM test (caused by an EOF, I believe on
shutdown) that didn't need to cause the test to fail.

Signed-off-by: David Crawshaw <crawshaw@tailscale.com>
2021-09-02 10:39:32 -07:00
Brad Fitzpatrick 5bacbf3744 wgengine/magicsock, health, ipn/ipnstate: track DERP-advertised health
And add health check errors to ipnstate.Status (tailscale status --json).

Updates #2746
Updates #2775

Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2021-09-02 10:20:25 -07:00
Brad Fitzpatrick 722942dd46 tsweb: restore CPU profiling handler
It was accidentally deleted in the earlier 0022c3d2e (#2143) refactor.
Lock it in with a test.

Fixes tailscale/corp#2503

Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2021-09-02 09:24:29 -07:00
David Anderson daf54d1253 control/controlclient: remove TS_DEBUG_USE_DISCO=only.
It was useful early in development when disco clients were the
exception and tailscale logs were noisier than today, but now
non-disco is the exception.

Updates #2752

Signed-off-by: David Anderson <danderson@tailscale.com>
2021-09-01 18:11:32 -07:00
David Anderson 39748e9562 net/dns/resolver: authoritatively return NXDOMAIN for reverse zones we own.
Fixes #2774

Signed-off-by: David Anderson <danderson@tailscale.com>
2021-09-01 18:11:32 -07:00
David Anderson 954064bdfe wgengine/wgcfg/nmcfg: don't configure peers who can't DERP or disco.
Fixes #2770

Signed-off-by: David Anderson <danderson@tailscale.com>
2021-09-01 18:11:32 -07:00
David Anderson f90ac11bd8 wgengine: remove unnecessary magicConnStarted channel.
Having removed magicconn.Start, there's no need to synchronize startup
of other things to it any more.

Signed-off-by: David Anderson <danderson@tailscale.com>
2021-09-01 18:11:32 -07:00
David Anderson bb10443edf wgengine/wgcfg: use just the hexlified node key as the WireGuard endpoint.
The node key is all magicsock needs to find the endpoint that WireGuard
needs.

Updates #2752

Signed-off-by: David Anderson <danderson@tailscale.com>
2021-09-01 15:13:21 -07:00
David Anderson d00341360f wgengine/magicsock: remove unused debug knob.
Signed-off-by: David Anderson <danderson@tailscale.com>
2021-09-01 15:13:21 -07:00
David Anderson dfd978f0f2 wgengine/magicsock: use NodeKey, not DiscoKey, as the trigger for lazy reconfig.
Updates #2752

Signed-off-by: David Anderson <danderson@tailscale.com>
2021-09-01 15:13:21 -07:00
David Anderson 4c27e2fa22 wgengine/magicsock: remove Start method from Conn.
Over time, other magicsock refactors have made Start effectively a
no-op, except that some other functions choose to panic if called
before Start.

Signed-off-by: David Anderson <danderson@tailscale.com>
2021-09-01 15:13:21 -07:00
David Anderson 1a899344bd wgengine/magicsock: don't store tailcfg.Nodes alongside endpoints.
Updates #2752

Signed-off-by: David Anderson <danderson@tailscale.com>
2021-09-01 15:13:21 -07:00
David Anderson b2181608b5 wgengine/magicsock: eagerly create endpoints in SetNetworkMap.
Updates #2752

Signed-off-by: David Anderson <danderson@tailscale.com>
2021-09-01 15:13:21 -07:00
Maisem Ali 0842e2f45b ipn/store: add ability to store data as k8s secrets.
Signed-off-by: Maisem Ali <maisem@tailscale.com>
2021-09-01 12:50:59 -07:00
David Crawshaw f53792026e tstest/integration/vms: move build tags from linux to !windows
The tests build fine on other Unix's, they just can't run there.
But there is already a t.Skip by default, so `go test` ends up
working fine elsewhere and checks the code compiles.

Signed-off-by: David Crawshaw <crawshaw@tailscale.com>
2021-09-01 11:38:18 -07:00
Brad Fitzpatrick 7f29dcaac1 cmd/tailscale/cli: make up block until state Running, not just Starting
At "Starting", the DERP connection isn't yet up. After the first netmap
and DERP connect, then it transitions into "Running".

Fixes #2708

Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2021-09-01 08:25:42 -07:00
Brad Fitzpatrick fb8b821710 tsnet: fix typo in comment
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2021-09-01 07:55:25 -07:00
Brad Fitzpatrick 7a7aa8f2b0 cmd/derper: also add port 80 timeouts
Didn't notice this one in earlier 00b3c1c042

Updates tailscale/corp#2486

Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2021-08-31 21:18:36 -07:00
Brad Fitzpatrick 3c8ca4b357 client/tailscale, cmd/tailscale/cli: move version mismatch check to CLI
So people can use the package for whois checks etc without version
skew errors.

The earlier change faa891c1f2 for #1905
was a bit too aggressive.

Fixes #2757

Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2021-08-31 15:27:25 -07:00
Brad Fitzpatrick 8744394cde version: bump date
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2021-08-31 15:27:25 -07:00
Brad Fitzpatrick 21cb0b361f safesocket: add connect retry loop to wait for tailscaled
Updates #2708

Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2021-08-31 15:13:42 -07:00
Brad Fitzpatrick a59b389a6a derp: add new health update and server restarting frame types
Updates #2756
Updates #2746

Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2021-08-31 13:31:51 -07:00
Christine Dodrill 0b9e938152 tstest/integration/vms: test DNS configuration
This uses a neat little tool to dump the output of DNS queries to
standard out. This is the first end-to-end test of DNS that runs against
actual linux systems. The /etc/resolv.conf test may look superflous,
however this will help for correlating system state if one of the DNS
tests fails.

Signed-off-by: Christine Dodrill <xe@tailscale.com>
2021-08-31 12:31:54 -07:00
Brad Fitzpatrick 00b3c1c042 cmd/derper: add missing read/write timeouts
Updates tailscale/corp#2486

Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2021-08-31 10:23:53 -07:00
David Crawshaw 9b7fc2ed1f .github: add Ubuntu VM test
Signed-off-by: David Crawshaw <crawshaw@tailscale.com>
2021-08-31 08:50:55 -07:00
Brad Fitzpatrick 73280595a8 derp: accept dup clients without closing prior's connection
A public key should only have max one connection to a given
DERP node (or really: one connection to a node in a region).

But if people clone their machine keys (e.g. clone their VM, Raspbery
Pi SD card, etc), then we can get into a situation where a public key
is connected multiple times.

Originally, the DERP server handled this by just kicking out a prior
connections whenever a new one came. But this led to reconnect fights
where 2+ nodes were in hard loops trying to reconnect and kicking out
their peer.

Then a909d37a59 tried to add rate
limiting to how often that dup-kicking can happen, but empirically it
just doesn't work and ~leaks a bunch of goroutines and TCP
connections, tying them up for hour+ while more and more accumulate
and waste memory. Mostly because we were doing a time.Sleep forever
while not reading from their TCP connections.

Instead, just accept multiple connections per public key but track
which is the most recent. And if two both are writing back & forth,
then optionally disable them both. That last part is only enabled in
tests for now. The current default policy is just last-sender-wins
while we gather the next round of stats.

Updates #2751

Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2021-08-31 08:21:21 -07:00
David Crawshaw debaaebf3b tstest/integration/vms: turn on logcatcher logging by default
Absolutely vital to debugging failures.

Signed-off-by: David Crawshaw <crawshaw@tailscale.com>
2021-08-31 06:40:28 -07:00
David Crawshaw a1f1020042 tstest/integration/vms: avoid log after test completion
Avoids a panic in the Go testing package if a late log comes in.

Signed-off-by: David Crawshaw <crawshaw@tailscale.com>
2021-08-31 06:40:28 -07:00
David Crawshaw 583af7c1a6 tstest/integration/vms: give guest multiple cores and use generic machine
Speeds up tests.
Allows the use of more version of qemu.

Signed-off-by: David Crawshaw <crawshaw@tailscale.com>
2021-08-31 06:40:28 -07:00
David Crawshaw 8668103f06 tstest/integration/vms: print qemu console output, fix printing issues
Fix a few test printing issues when tests fail.

Qemu console output is super useful when something is wrong in the
harness and we cannot even bring up the tests.
Also useful for figuring out where all the time goes in tests.

A little noisy, but not too noisy as long as you're only running one VM
as part of the tests, which is my plan.

Signed-off-by: David Crawshaw <crawshaw@tailscale.com>
2021-08-31 06:40:28 -07:00