Commit Graph

2691 Commits

Author SHA1 Message Date
Brad Fitzpatrick 5e9e11a77d tstest/integration/testcontrol: add start of test control server
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2021-04-29 22:51:22 -07:00
Avery Pennarun 19c3e6cc9e types/logger: rate limited: more hysteresis, better messages.
- Switch to our own simpler token bucket, since x/time/rate is missing
  necessary stuff (can't provide your own time func; can't check the
  current bucket contents) and it's overkill anyway.

- Add tests that actually include advancing time.

- Don't remove the rate limit on a message until there's enough room to
  print at least two more of them. When we do, we'll also print how
  many we dropped, as a contextual reminder that some were previously
  lost. (This is more like how the Linux kernel does it.)

- Reformat the [RATE LIMITED] messages to be shorter, and to not
  corrupt original message. Instead, we print the message, then print
  its format string.

- Use %q instead of \"%s\", for more accurate parsing later, if the
  format string contained quotes.

Fixes #1772

Signed-off-by: Avery Pennarun <apenwarr@tailscale.com>
2021-04-30 01:01:15 -04:00
Josh Bleecher Snyder 20e04418ff net/dns: add GOOS build tags
Fixes #1786

Signed-off-by: Josh Bleecher Snyder <josharian@gmail.com>
2021-04-29 21:34:55 -07:00
Avery Pennarun b7e31ab1a4 ipn: mock controlclient.Client; big ipn.Backend state machine test.
A very long unit test that verifies the way the controlclient and
ipn.Backend interact.

This is a giant sequential test of the state machine. The test passes,
but only because it's asserting all the wrong behaviour. I marked all
the behaviour I think is wrong with BUG comments, and several
additional test opportunities with TODO.

Note: the new test supercedes TestStartsInNeedsLoginState, which was
checking for incorrect behaviour (although the new test still checks
for the same incorrect behaviour) and assumed .Start() would converge
before returning, which it happens to do, but only for this very
specific case, for the current implementation. You're supposed to wait
for the notifications.

Updates: tailscale/corp#1660

Signed-off-by: Avery Pennarun <apenwarr@tailscale.com>
2021-04-30 00:09:35 -04:00
Avery Pennarun b4d04a065f controlclient: extract a Client interface and rename Client->Auto.
This will let us create a mock or fake Client implementation for use
with ipn.Backend.

Signed-off-by: Avery Pennarun <apenwarr@tailscale.com>
2021-04-30 00:09:35 -04:00
Avery Pennarun cc3119e27e controlclient: extract State and Status stuff into its own file.
No changes other than moving stuff around.

Signed-off-by: Avery Pennarun <apenwarr@tailscale.com>
2021-04-29 23:18:25 -04:00
Brad Fitzpatrick a07a504b16 tstest/integration: use go binary from runtime.GOROOT
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2021-04-29 17:04:29 -07:00
David Anderson bf5fc8edda go.mod: update wireguard-go.
Signed-off-by: David Anderson <danderson@tailscale.com>
2021-04-29 16:36:55 -07:00
David Anderson 1d7e7b49eb ipn/ipnlocal: be authoritative for the entire MagicDNS record tree.
With this change, shared node names resolve correctly on split DNS-supporting
operating systems.

Fixes tailscale/corp#1706

Signed-off-by: David Anderson <danderson@tailscale.com>
2021-04-29 16:06:01 -07:00
Brad Fitzpatrick f342d10dc5 tstest/integration: set an HTTP_PROXY to catch bogus requests
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2021-04-29 16:00:02 -07:00
Brad Fitzpatrick 80429b97e5 testing: add start of an integration test
Only minimal tailscale + tailscaled for now.

And a super minimal in-memory logcatcher.

No control ... yet.

Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2021-04-29 15:32:27 -07:00
Brad Fitzpatrick 08782b92f7 tstest: add WaitFor helper
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2021-04-29 14:43:46 -07:00
Josh Bleecher Snyder 4037fc25c5 types/wgkey: use value receiver with MarshalJSON
Pointer receivers used with MarshalJSON are code rakes.

https://github.com/golang/go/issues/22967
https://github.com/dominikh/go-tools/issues/911

I just stepped on one, and it hurt. Turn it over.
While we're here, optimize the code a bit.

name           old time/op    new time/op    delta
MarshalJSON-8     184ns ± 0%      44ns ± 0%  -76.03%  (p=0.000 n=20+19)

name           old alloc/op   new alloc/op   delta
MarshalJSON-8      184B ± 0%       80B ± 0%  -56.52%  (p=0.000 n=20+20)

name           old allocs/op  new allocs/op  delta
MarshalJSON-8      4.00 ± 0%      1.00 ± 0%  -75.00%  (p=0.000 n=20+20)

Signed-off-by: Josh Bleecher Snyder <josharian@gmail.com>
2021-04-29 14:14:34 -07:00
Josh Bleecher Snyder 7ee891f5fd all: delete wgcfg.Key and wgcfg.PrivateKey
For historical reasons, we ended up with two near-duplicate
copies of curve25519 key types, one in the wireguard-go module
(wgcfg) and one in the tailscale module (types/wgkey).
Then we moved wgcfg to the tailscale module.
We can now remove the wgcfg key type in favor of wgkey.

Signed-off-by: Josh Bleecher Snyder <josharian@gmail.com>
2021-04-29 14:14:34 -07:00
David Anderson bf9ef1ca27 net/dns: stop NetworkManager breaking v6 connectivity when setting DNS.
Tentative fix for #1699

Signed-off-by: David Anderson <danderson@tailscale.com>
2021-04-29 12:25:47 -07:00
David Anderson 72b6d98298 net/interfaces: return all Tailscale addresses from Tailscale().
Signed-off-by: David Anderson <danderson@tailscale.com>
2021-04-29 12:25:47 -07:00
Brad Fitzpatrick b7a497a30b ipn/ipnlocal: make FileTargets check IPN state first
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2021-04-29 10:26:53 -07:00
Denton Gentry b9f8dc7867 workflows: remove coverage
This workflow has been disabled for some time.
It can come back later, when appropriate.

Signed-off-by: Denton Gentry <dgentry@tailscale.com>
2021-04-28 17:04:30 -07:00
Brad Fitzpatrick 0c5c16327d version: add IsMacSysExt func
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2021-04-28 14:57:04 -07:00
Josh Bleecher Snyder ae36b57b71 go.mod: upgrade wireguard-go
This should be the last bump before 1.8.

Signed-off-by: Josh Bleecher Snyder <josharian@gmail.com>
2021-04-28 11:25:52 -07:00
Josh Bleecher Snyder 9d542e08e2 wgengine/magicsock: always run ReceiveIPv6
One of the consequences of the 	bind refactoring in 6f23087175
is that attempting to bind an IPv6 socket will always
result in c.pconn6.pconn being non-nil.
If the bind fails, it'll be set to a placeholder packet conn
that blocks forever.

As a result, we can always run ReceiveIPv6 and health check it.
This removes IPv4/IPv6 asymmetry and also will allow health checks
to detect any IPv6 receive func failures.

Signed-off-by: Josh Bleecher Snyder <josharian@gmail.com>
2021-04-28 11:07:14 -07:00
Josh Bleecher Snyder fe50ded95c health: track whether we have a functional udp4 bind
Suggested-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Signed-off-by: Josh Bleecher Snyder <josharian@gmail.com>
2021-04-28 11:07:14 -07:00
Josh Bleecher Snyder 7dc7078d96 wgengine/magicsock: use netaddr.IP in listenPacket
It must be an IP address; enforce that at the type level.

Suggested-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Signed-off-by: Josh Bleecher Snyder <josharian@gmail.com>
2021-04-28 11:07:14 -07:00
Brad Fitzpatrick 4bf6939ee0 ipn/ipnlocal: remove t.Parallel from recently added test
The test modifies a global; it shouldn't be parallel.

Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2021-04-28 11:02:56 -07:00
Josh Bleecher Snyder 3c543c103a wgengine/magicsock: unify initial bind and rebind
We had two separate code paths for the initial UDP listener bind
and any subsequent rebinds.

IPv6 got left out of the rebind code.
Rather than duplicate it there, unify the two code paths.
Then improve the resulting code:

* Rebind had nested listen attempts to try the user-specified port first,
  and then fall back to :0 if that failed. Convert that into a loop.
* Initial bind tried only the user-specified port.
  Rebind tried the user-specified port and 0.
  But there are actually three ports of interest:
  The one the user specified, the most recent port in use, and 0.
  We now try all three in order, as appropriate.
* In the extremely rare case in which binding to port 0 fails,
  use a dummy net.PacketConn whose reads block until close.
  This will keep the wireguard-go receive func goroutine alive.

As a pleasant side-effect of this, if we decide that
we need to resuscitate #1796, it will now be much easier.

Fixes #1799

Co-authored-by: David Anderson <danderson@tailscale.com>
Signed-off-by: Josh Bleecher Snyder <josharian@gmail.com>
2021-04-28 10:39:28 -07:00
Josh Bleecher Snyder 8fb66e20a4 wgengine/magicsock: remove DefaultPort const
Assume it'll stay at 0 forever, so hard-code it
and delete code conditional on it being non-0.

Signed-off-by: Josh Bleecher Snyder <josharian@gmail.com>
2021-04-28 10:39:28 -07:00
Josh Bleecher Snyder a8f61969b9 wgengine/magicsock: remove context arg from listenPacket
It was set to context.Background by all callers, for the same reasons.
Set it locally instead, to simplify call sites.

Signed-off-by: Josh Bleecher Snyder <josharian@gmail.com>
2021-04-28 10:39:28 -07:00
Brad Fitzpatrick a48c8991f1 ipn/ipnlocal: add a test for earlier lazy machine key generation change
Updates #1573

Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2021-04-28 08:52:04 -07:00
Brad Fitzpatrick 1e6d512bf0 cmd/tailscale: improve file cp error message in macOS GUI version
Fixes tailscale/corp#1684

Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2021-04-28 08:35:55 -07:00
Brad Fitzpatrick 4512aad889 version: add IsSandboxedMacOS func
For when we need to tweak behavior or errors as a function of which of
3 macOS Tailscale variants we're using. (more accessors coming later
as needed)

Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2021-04-28 08:34:19 -07:00
Brad Fitzpatrick 8efc7834f2 go.mod: bump wireguard-go
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2021-04-28 08:04:26 -07:00
David Anderson 306a094d4b ipn/ipnlocal: remove IPv6 records from MagicDNS.
Fixes #1813.

Signed-off-by: David Anderson <danderson@tailscale.com>
2021-04-28 01:01:56 -07:00
Brad Fitzpatrick 2840afabba version: bump date
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2021-04-27 19:10:25 -07:00
David Anderson 44c2b7dc79 net/dns: on windows, skip site-local v6 resolvers.
Further refinement for tailscale/corp#1662.

Signed-off-by: David Anderson <danderson@tailscale.com>
2021-04-27 18:24:54 -07:00
Brad Fitzpatrick 8554694616 cmd/tailscale: add 'tailscale file get' subcommand
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2021-04-27 15:28:50 -07:00
Brad Fitzpatrick cafa037de0 cmd/tailscale/cli: rename 'tailscale push' to 'tailscale file cp'
And reverse order, require final colon, and support multiple files.

Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2021-04-27 13:58:56 -07:00
Brad Fitzpatrick bb2141e0cf wgengine: periodically poll engine status for logging side effect
Fixes tailscale/corp#1560

Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2021-04-27 13:55:47 -07:00
Brad Fitzpatrick 3c9dea85e6 wgengine: update a log line from 'weird' to conventional 'unexpected'
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2021-04-27 09:59:25 -07:00
Brad Fitzpatrick 3bdc9e9cb2 ipn/ipnlocal: prevent a now-expected [unexpected] log message on Windows
Updates #1620

Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2021-04-27 09:58:05 -07:00
Ross Zurowski b062ac5e86
cmd/tailscale: fix typo in error message (#1807)
Signed-off-by: Ross Zurowski <ross@rosszurowski.com>
2021-04-27 10:16:08 -04:00
Brad Fitzpatrick 5ecc7c7200 cmd/tailscale: make the new 'up' errors prettier and more helpful
Fixes #1746

Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2021-04-26 21:26:29 -07:00
Josh Bleecher Snyder 744de615f1 health, wgenegine: fix receive func health checks for the fourth time
The old implementation knew too much about how wireguard-go worked.
As a result, it missed genuine problems that occurred due to unrelated bugs.

This fourth attempt to fix the health checks takes a black box approach.
A receive func is healthy if one (or both) of these conditions holds:

* It is currently running and blocked.
* It has been executed recently.

The second condition is required because receive functions
are not continuously executing. wireguard-go calls them and then
processes their results before calling them again.

There is a theoretical false positive if wireguard-go go takes
longer than one minute to process the results of a receive func execution.
If that happens, we have other problems.

Updates #1790

Signed-off-by: Josh Bleecher Snyder <josharian@gmail.com>
2021-04-26 17:35:49 -07:00
Josh Bleecher Snyder 0d4c8cb2e1 health: delete ReceiveFunc health checks
They were not doing their job.
They need yet another conceptual re-think.
Start by clearing the decks.

Signed-off-by: Josh Bleecher Snyder <josharian@gmail.com>
2021-04-26 17:35:49 -07:00
Josh Bleecher Snyder 99705aa6b7 net/tstun: split TUN events channel into up/down and MTU
We had a long-standing bug in which our TUN events channel
was being received from simultaneously in two places.

The first is wireguard-go.

At wgengine/userspace.go:366, we pass e.tundev to wireguard-go,
which starts a goroutine (RoutineTUNEventReader)
that receives from that channel and uses events to adjust the MTU
and bring the device up/down.

At wgengine/userspace.go:374, we launch a goroutine that
receives from e.tundev, logs MTU changes, and triggers
state updates when up/down changes occur.

Events were getting delivered haphazardly between the two of them.

We don't really want wireguard-go to receive the up/down events;
we control the state of the device explicitly by calling device.Up.
And the userspace.go loop MTU logging duplicates logging that
wireguard-go does when it received MTU updates.

So this change splits the single TUN events channel into up/down
and other (aka MTU), and sends them to the parties that ought
to receive them.

I'm actually a bit surprised that this hasn't caused more visible trouble.
If a down event went to wireguard-go but the subsequent up event
went to userspace.go, we could end up with the wireguard-go device disappearing.

I believe that this may also (somewhat accidentally) be a fix for #1790.

Signed-off-by: Josh Bleecher Snyder <josharian@gmail.com>
2021-04-26 17:16:51 -07:00
David Anderson 97d2fa2f56 net/dns: work around WSL DNS implementation flaws.
Fixes tailscale/corp#1662

Signed-off-by: David Anderson <danderson@tailscale.com>
2021-04-26 16:54:50 -07:00
Brad Fitzpatrick ffe6c8e335 cmd/tailscale/cli: don't do a simple up when in state NeedsLogin
Fixes #1780

Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2021-04-26 11:38:56 -07:00
Brad Fitzpatrick 138921ae40 ipn/ipnlocal: always write files to partial files, even in buffered mode
The intention was always that files only get written to *.partial
files and renamed at the end once fully received, but somewhere in the
process that got lost in buffered mode and *.partial files were only
being used in direct receive mode. This fix prevents WaitingFiles
from returning files that are still being transferred.

Updates tailscale/corp#1626

Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2021-04-26 11:34:05 -07:00
Brad Fitzpatrick 5e268e6153 ipn/ipnlocal: use delete marker files to work around Windows delete problems
If DeleteFile fails on Windows due to another process (anti-virus,
probably) having our file open, instead leave a marker file that the
file is logically deleted, and remove it from API calls and clean it
up lazily later.

Updates tailscale/corp#1626

Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2021-04-26 10:59:25 -07:00
Avery Pennarun a7fe1d7c46 wgengine/bench: improved rate selection.
The old decay-based one took a while to converge. This new one (based
very loosely on TCP BBR) seems to converge quickly on what seems to be
the best speed.

Signed-off-by: Avery Pennarun <apenwarr@tailscale.com>
2021-04-26 03:51:13 -04:00
Avery Pennarun a92b9647c5 wgengine/bench: speed test for channels, sockets, and wireguard-go.
This tries to generate traffic at a rate that will saturate the
receiver, without overdoing it, even in the event of packet loss. It's
unrealistically more aggressive than TCP (which will back off quickly
in case of packet loss) but less silly than a blind test that just
generates packets as fast as it can (which can cause all the CPU to be
absorbed by the transmitter, giving an incorrect impression of how much
capacity the total system has).

Initial indications are that a syscall about every 10 packets (TCP bulk
delivery) is roughly the same speed as sending every packet through a
channel. A syscall per packet is about 5x-10x slower than that.

The whole tailscale wireguard-go + magicsock + packet filter
combination is about 4x slower again, which is better than I thought
we'd do, but probably has room for improvement.

Note that in "full" tailscale, there is also a tundev read/write for
every packet, effectively doubling the syscall overhead per packet.

Given these numbers, it seems like read/write syscalls are only 25-40%
of the total CPU time used in tailscale proper, so we do have
significant non-syscall optimization work to do too.

Sample output:

$ GOMAXPROCS=2 go test -bench . -benchtime 5s ./cmd/tailbench
goos: linux
goarch: amd64
pkg: tailscale.com/cmd/tailbench
cpu: Intel(R) Core(TM) i7-4785T CPU @ 2.20GHz
BenchmarkTrivialNoAlloc/32-2         	56340248	        93.85 ns/op	 340.98 MB/s	         0 %lost	       0 B/op	       0 allocs/op
BenchmarkTrivialNoAlloc/124-2        	57527490	        99.27 ns/op	1249.10 MB/s	         0 %lost	       0 B/op	       0 allocs/op
BenchmarkTrivialNoAlloc/1024-2       	52537773	       111.3 ns/op	9200.39 MB/s	         0 %lost	       0 B/op	       0 allocs/op
BenchmarkTrivial/32-2                	41878063	       135.6 ns/op	 236.04 MB/s	         0 %lost	       0 B/op	       0 allocs/op
BenchmarkTrivial/124-2               	41270439	       138.4 ns/op	 896.02 MB/s	         0 %lost	       0 B/op	       0 allocs/op
BenchmarkTrivial/1024-2              	36337252	       154.3 ns/op	6635.30 MB/s	         0 %lost	       0 B/op	       0 allocs/op
BenchmarkBlockingChannel/32-2           12171654	       494.3 ns/op	  64.74 MB/s	         0 %lost	    1791 B/op	       0 allocs/op
BenchmarkBlockingChannel/124-2          12149956	       507.8 ns/op	 244.17 MB/s	         0 %lost	    1792 B/op	       1 allocs/op
BenchmarkBlockingChannel/1024-2         11034754	       528.8 ns/op	1936.42 MB/s	         0 %lost	    1792 B/op	       1 allocs/op
BenchmarkNonlockingChannel/32-2          8960622	      2195 ns/op	  14.58 MB/s	         8.825 %lost	    1792 B/op	       1 allocs/op
BenchmarkNonlockingChannel/124-2         3014614	      2224 ns/op	  55.75 MB/s	        11.18 %lost	    1792 B/op	       1 allocs/op
BenchmarkNonlockingChannel/1024-2        3234915	      1688 ns/op	 606.53 MB/s	         3.765 %lost	    1792 B/op	       1 allocs/op
BenchmarkDoubleChannel/32-2          	 8457559	       764.1 ns/op	  41.88 MB/s	         5.945 %lost	    1792 B/op	       1 allocs/op
BenchmarkDoubleChannel/124-2         	 5497726	      1030 ns/op	 120.38 MB/s	        12.14 %lost	    1792 B/op	       1 allocs/op
BenchmarkDoubleChannel/1024-2        	 7985656	      1360 ns/op	 752.86 MB/s	        13.57 %lost	    1792 B/op	       1 allocs/op
BenchmarkUDP/32-2                    	 1652134	      3695 ns/op	   8.66 MB/s	         0 %lost	     176 B/op	       3 allocs/op
BenchmarkUDP/124-2                   	 1621024	      3765 ns/op	  32.94 MB/s	         0 %lost	     176 B/op	       3 allocs/op
BenchmarkUDP/1024-2                  	 1553750	      3825 ns/op	 267.72 MB/s	         0 %lost	     176 B/op	       3 allocs/op
BenchmarkTCP/32-2                    	11056336	       503.2 ns/op	  63.60 MB/s	         0 %lost	       0 B/op	       0 allocs/op
BenchmarkTCP/124-2                   	11074869	       533.7 ns/op	 232.32 MB/s	         0 %lost	       0 B/op	       0 allocs/op
BenchmarkTCP/1024-2                  	 8934968	       671.4 ns/op	1525.20 MB/s	         0 %lost	       0 B/op	       0 allocs/op
BenchmarkWireGuardTest/32-2          	 1403702	      4547 ns/op	   7.04 MB/s	        14.37 %lost	     467 B/op	       3 allocs/op
BenchmarkWireGuardTest/124-2         	  780645	      7927 ns/op	  15.64 MB/s	         1.537 %lost	     420 B/op	       3 allocs/op
BenchmarkWireGuardTest/1024-2        	  512671	     11791 ns/op	  86.85 MB/s	         0.5206 %lost	     411 B/op	       3 allocs/op
PASS
ok  	tailscale.com/wgengine/bench	195.724s

Updates #414.

Signed-off-by: Avery Pennarun <apenwarr@tailscale.com>
2021-04-26 03:51:13 -04:00