Instead of hard-coding the DERP map (except for cmd/tailscale netcheck
for now), get it from the control server at runtime.
And make the DERP map support multiple nodes per region with clients
picking the first one that's available. (The server will balance the
order presented to clients for load balancing)
This deletes the stunner package, merging it into the netcheck package
instead, to minimize all the config hooks that would've been
required.
Also fix some test flakes & races.
Fixes#387 (Don't hard-code the DERP map)
Updates #388 (Add DERP region support)
Fixes#399 (wgengine: flaky tests)
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
If a test calls log.Printf, 'go test' horrifyingly rearranges the
output to no longer be in chronological order, which makes debugging
virtually impossible. Let's stop that from happening by making
log.Printf panic if called from any module, no matter how deep, during
tests.
This required us to change the default error handler in at least one
http.Server, as well as plumbing a bunch of logf functions around,
especially in magicsock and wgengine, but also in logtail and backoff.
To add insult to injury, 'go test' also rearranges the output when a
parent test has multiple sub-tests (all the sub-test's t.Logf is always
printed after all the parent tests t.Logf), so we need to screw around
with a special Logf that can point at the "current" t (current_t.Logf)
in some places. Probably our entire way of using subtests is wrong,
since 'go test' would probably like to run them all in parallel if you
called t.Parallel(), but it definitely can't because the're all
manipulating the shared state created by the parent test. They should
probably all be separate toplevel tests instead, with common
setup/teardown logic. But that's a job for another time.
Signed-off-by: Avery Pennarun <apenwarr@tailscale.com>
The docs on magicsock.Conn stated that they implemented the
wireguard/device.Bind interface, yet this type does not exist. In
reality, the Conn type implements the wireguard/conn.Bind interface.
I also fixed a small typo in the same file.
Signed-off-by: Blake Gentry <blakesgentry@gmail.com>
* remove endpoint discovery noise when results unchanged
* consistently spell derp nodes as "derp-N"
* replace "127.3.3.40:" with "derp-" in CreateEndpoint log output
* stop early DERP setup before SetPrivateKey is called;
it just generates log nosie
* fix stringification of peer ShortStrings (it had an old %x on it,
rendering it garbage)
* describe why derp routes are changing, with one of:
shared home, their home, our home, alt
Add opt-in method to request IPv6 endpoints from the control plane.
For now they should just be skipped. A previous version of this CL was
unconditional and reportedly had problems that I can't reproduce. So
make it a knob until the mystery is solved.
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Breaks something deep in wireguard or magicsock's brainstem, no packets at all
can flow. All received packets fail decryption with "invalid mac1".
This reverts commit 94024355ed.
Signed-off-by: David Anderson <dave@natulte.net>
More steps towards IPv6 transport.
We now send it to tailcontrol, which ignores it.
But it doesn't actually actually support IPv6 yet (outside of STUN).
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Use this when making the ipn state transition from Starting to
Running. This way a network of quiet nodes with no active
handshaking will still transition to Active.
Signed-off-by: David Crawshaw <crawshaw@tailscale.com>
Typically the home DERP server is found and set on startup before
magicsock's SetPrivateKey can be called, so no DERP connection is
established. Make sure one is by kicking the home DERP tires in
SetPrivateKey.
Signed-off-by: David Crawshaw <crawshaw@tailscale.com>
The code as written intended to do this, but it repeated the
comparison of derpNum and c.myDerp after c.myDerp had been
updated, so it never executed.
Signed-off-by: David Crawshaw <crawshaw@tailscale.com>
Before, endpoint updates were constantly being interrupted and resumed
on Linux due to tons of LinkChange messages from over-zealous Linux
netlink messages (from router_linux.go)
Now that endpoint updates are fast and bounded in time anyway, just
let them run to completion, but note that another needs to be
scheduled after.
Now logs went from pages of noise to just:
root@taildoc:~# grep -i -E 'stun|endpoint update' log
2020/03/13 08:51:29 magicsock.Conn: starting endpoint update (initial)
2020/03/13 08:51:30 magicsock.Conn.ReSTUN: endpoint update active, need another later ("link-change-minor")
2020/03/13 08:51:31 magicsock.Conn: starting endpoint update (link-change-minor)
2020/03/13 08:51:31 magicsock.Conn.ReSTUN: endpoint update active, need another later ("link-change-minor")
2020/03/13 08:51:33 magicsock.Conn: starting endpoint update (link-change-minor)
2020/03/13 08:51:33 magicsock.Conn.ReSTUN: endpoint update active, need another later ("link-change-minor")
2020/03/13 08:51:35 magicsock.Conn: starting endpoint update (link-change-minor)
2020/03/13 08:51:35 magicsock.Conn.ReSTUN: endpoint update active, need another later ("link-change-minor")
Or, seen in another run:
2020/03/13 08:45:41 magicsock.Conn: starting endpoint update (periodic)
2020/03/13 08:46:09 magicsock.Conn: starting endpoint update (periodic)
2020/03/13 08:46:21 magicsock.Conn: starting endpoint update (link-change-major)
2020/03/13 08:46:37 magicsock.Conn: starting endpoint update (periodic)
2020/03/13 08:47:05 magicsock.Conn: starting endpoint update (periodic)
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
The TODO above derphttp.NewClient suggests it does network I/O,
but the derphttp client connects lazily and so creating one is
very cheap.
Signed-off-by: David Crawshaw <crawshaw@tailscale.com>
It used to make assumptions based on having Anycast IPs that are super
near. Now we're intentionally going to a bunch of different distant
IPs to measure latency.
Also, optimize how the hairpin detection works. No need to STUN on
that socket. Just use that separate socket for sending, once we know
the other UDP4 socket's endpoint. The trick is: make our test probe
also a STUN packet, so it fits through magicsock's existing STUN
routing.
This drops netcheck from ~5 seconds to ~250-500ms.
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Basically, don't trust the OS-level link monitor to only tell you
interesting things. Sanity check it.
Also, move the interfaces package into the net directory now that we
have it.
The UDP reader goroutine was clobbering `n` and `err` from the
main goroutine, whose accesses are not synchronized the way `b` is.
Signed-off-by: David Anderson <danderson@tailscale.com>
wireguard-go closes magicsock, and expects this to unblock reads
so that its internal goroutines can wind down. We were incorrectly
blocking the read indefinitey and breaking this contract.
Signed-off-by: David Anderson <danderson@tailscale.com>
The previous code would skip the DERP short-circuit if roamAddr
was set, which is not what we wanted. More generally, hitting
any of the fast path conditions is a direct return, so we can
just have 3 standalone branches rather than 'else if' stuff.
Signed-Off-By: David Anderson <danderson@tailscale.com>
The effect is subtle: when we're not spraying packets, and have not yet
figured out a curAddr, and we're not spraying, we end up sending to
whatever the first IP is in the iteration order. In English, that
means "when we have no idea where to send packets, and we've given
up on sending to everyone, just send to the first addr we see in
the list."
This is, in general, what we want, because the addrs are in sorted
preference order, low to high, and DERP is the least preferred
destination. So, when we have no idea where to send, send to DERP,
right?
... Except for very historical reasons, appendDests iterated through
addresses in _reverse_ order, most preferred to least preferred.
crawshaw@ believes this was part of the earliest handshaking
algorithm magicsock had, where it slowly iterated through possible
destinations and poked handshakes to them one at a time.
Anyway, because of this historical reverse iteration, in the case
described above of "we have no idea where to send", the code would
end up sending to the _most_ preferred candidate address, rather
than the _least_ preferred. So when in doubt, we'd end up firing
packets into the blackhole of some LAN address that doesn't work,
and connectivity would not work.
This case only comes up if all your non-DERP connectivity options
have failed, so we more or less failed to detect it because we
didn't have a pathological test box deployed. Worse, codependent
bug 2839854994 made DERP accidentally
work sometimes anyway by incorrectly exploiting roamAddr behavior,
albeit at the cost of making DERP traffic symmetric. In fixing
DERP to once again be asymmetric, we effectively removed the
bandaid that was concealing this bug.
Signed-Off-By: David Anderson <danderson@tailscale.com>
DERP traffic is asymmetric by design, with nodes always sending
to their peer's home DERP server. However, if roamAddr is set,
magicsock will always push data there, rather than let DERP
server selection do its thing, so we end up accidentally
creating a symmetric flow.
Signed-Off-By: David Anderson <danderson@tailscale.com>
I started to write a full DNS caching resolver and I realized it was
overkill and wouldn't work on Windows even in Go 1.14 yet, so I'm
doing this tiny one instead for now, just for all our netcheck STUN
derp lookups, and connections to DERP servers. (This will be caching a
exactly 8 DNS entries, all ours.)
Fixes#145 (can be better later, of course)
The value predates the introduction of AddrSet which replaces
the index by tracking curAddr directly.
Signed-off-by: David Crawshaw <crawshaw@tailscale.com>
In particular, this is designed to catch the case where a
HandshakeInitiation packet is sent out but the intermediate NATs
have not been primed, so the packet passes over DERP.
In that case, the HandshakeResponse also comes back over DERP,
and the connection proceeds via DERP without ever trying to punch
through the NAT.
With this change, the HandshakeResponse (which was sprayed out
and so primed one NAT) triggers an UpdateDst, which triggers
the extra spray logic.
(For this to work, there has to be an initial supply of packets
to send on to a peer for the three seconds following a handshake.
The source of these packets is left as a future exercise.)
Signed-off-by: David Crawshaw <crawshaw@tailscale.com>