In rare circumstances (tailscale/corp#3016), the PublicKey
and Endpoints can diverge.
This by itself doesn't cause any harm, but our early exit
in response did, because it prevented us from recovering from it.
Remove the early exit.
Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
Tailscale 1.18 uses netlink instead of the "ip" command to program the
Linux kernel.
The old way was kept primarily for tests, but this also adds a
TS_DEBUG_USE_IP_COMMAND environment knob to force the old way
temporarily for debugging anybody who might have problems with the
new way in 1.18.
Updates #391
Change-Id: I0236fbfda6c9c05dcb3554fcc27ec0c86456efd9
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
endpoint.discoKey is protected by endpoint.mu.
endpoint.sendDiscoMessage was reading it without holding the lock.
This showed up in a CI failure and is readily reproducible locally.
The fix is in two parts.
First, for Conn.enqueueCallMeMaybe, eliminate the one-line helper method endpoint.sendDiscoMessage; call Conn.sendDiscoMessage directly.
This makes it more natural to read endpoint.discoKey in a context
in which endpoint.mu is already held.
Second, for endpoint.sendDiscoPing, explicitly pass the disco key
as an argument. Again, this makes it easier to read endpoint.discoKey
in a context in which endpoint.mu is already held.
Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
I believe that this should eliminate the flakiness.
If GitHub CI manages to be even slower that can be believed
(and I can believe a lot at this point),
then we should roll this back and make some more invasive changes.
Updates #654Fixes#3247 (I hope)
Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
We can do the "maybe delete" check unilaterally:
In the case of an insert, both oldDiscoKey
and ep.discoKey will be the zero value.
And since we don't use pi again, we can skip
giving it a name, which makes scoping clearer.
Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
wgengine/wgcfg: introduce wgcfg.NewDevice helper to disable roaming
at all call sites (one real plus several tests).
Fixestailscale/corp#3016.
Signed-off-by: David Anderson <danderson@tailscale.com>
Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
And annotate magicsock as a start.
And add localapi and debug handlers with the Prometheus-format
exporter.
Updates #3307
Change-Id: I47c5d535fe54424741df143d052760387248f8d3
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
In DeviceConfig, we did not close r after calling FromUAPI.
If FromUAPI returned early due to an error, then it might
not have read all the data that IpcGetOperation wanted to write.
As a result, IpcGetOperation could hang, as in #3220.
We were also closing the wrong end of the pipe after IpcSetOperation
in ReconfigDevice.
To ensure that we get all available information to diagnose
such a situation, include all errors anytime something goes wrong.
This should fix the immediate crashing problem in #3220.
We'll then need to figure out why IpcGetOperation was failing.
Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
github.com/go-multierror/multierror served us well.
But we need a few feature from it (implement Is),
and it's not worth maintaining a fork of such a small module.
Instead, I did a clean room implementation inspired by its API.
Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
Using temporary netlink fork in github.com/tailscale/netlink until we
get the necessary changes upstream in either vishvananda/netlink
or jsimonetti/rtnetlink.
Updates #391
Change-Id: I6e1de96cf0750ccba53dabff670aca0c56dffb7c
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Even if not in use. We plan to use it for more stuff later.
(not for iOS or macOS-GUIs yet; only tailscaled)
Change-Id: Idaef719d2a009be6a39f158fd8f57f8cca68e0ee
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Pull out the list of policy routing rules to a data structure
now shared between the add & delete paths, but to also be shared
by the netlink paths in a future change.
Updates #391
Change-Id: I119ab1c246f141d639006c808b61c585c3d67924
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
There are a few remaining uses of testing.AllocsPerRun:
Two in which we only log the number of allocations,
and one in which dynamically calculate the allocations
target based on a different AllocsPerRun run.
This also allows us to tighten the "no allocs"
test in wgengine/filter.
Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
Anybody using that one old, unreleased version of Tailscale from over
a year ago should've rebooted their machine by now to get various
non-Tailscale security updates. :)
Change-Id: If9e043cb008b20fcd6ddfd03756b3b23a9d7aeb5
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
And the derper change to add a CORS endpoint for latency measurement.
And a little magicsock change to cut down some log spam on js/wasm.
Updates #3157
Change-Id: I5fd9e6f5098c815116ddc8ac90cbcd0602098a48
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Just something I ran across while debugging an unrelated failure. This
is not in response to any bug/issue.
Signed-off-by: Maisem Ali <maisem@tailscale.com>
Be DERP-only for now. (WebRTC can come later :))
Updates #3157
Change-Id: I56ebb3d914e37e8f4ab651306fd705b817ca381c
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Now that peerMap tracks the set of nodes for a DiscoKey.
Updates #3088
Change-Id: I927bf2bdfd2b8126475f6b6acc44bc799fcb489f
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Continuation of 2aa5df7ac1, remove nil
check because it can never be nil. (It previously was able to be nil.)
Change-Id: I59cd9ad611dbdcbfba680ed9b22e841b00c9d5e6
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
This adds new fields (currently unused) to discoInfo to track what the
last verified (unambiguous) NodeKey a DiscoKey last mapped to, and
when.
Then on CallMeMaybe, Pong and on most Pings, we update the mapping
from DiscoKey to the current NodeKey for that DiscoKey.
Updates #3088
Change-Id: Idc4261972084dec71cf8ec7f9861fb9178eb0a4d
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
This lets clients quickly (sub-millisecond within a local LAN) map
from an ambiguous disco key to a node key without waiting for a
CallMeMaybe (over relatively high latency DERP).
Updates #3088
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
The "go generate" command blindly looks for "//go:generate" anywhere
in the file regardless of whether it is truly a comment.
Prevent this false positive in cloner.go by mangling the string
to look less like "//go:generate".
Signed-off-by: Joe Tsai <joetsai@digital-static.net>
https://github.com/tailscale/tailscale/pull/3014 added a
rebind on STUN failure, which means there can now be a
tailscale.com/wgengine/magicsock.(*RebindingUDPConn).ReadFromNetaddr
in progress at the end of the test waiting for a STUN
response which will never arrive.
This causes a test flake due to the resource leak in those
cases where the Conn decided to rebind. For whatever reason,
it mostly flakes with Windows.
If the Conn is closed, don't Rebind after a send error.
Signed-off-by: Denton Gentry <dgentry@tailscale.com>
Renames only; continuation of earlier 8049063d35
These kept confusing me while working on #3088
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
The one remaining caller of peerMap.endpointForDiscoKey was making the
improper assumption that there's exactly 1 node with a given DiscoKey
in the network. That was the cause of #3088.
Now that all the other callers have been updated to not use
endpointForDiscoKey, there's no need to try to keep maintaining that
prone-to-misuse index.
Updates #3088
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
A DiscoKey maps 1:n to endpoints. When we get a disco pong, we don't
necessarily know which endpoint sent it to us. Ask them all. There
will only usually be 1 (and in rare circumstances 2). So it's easier
to ask all two rather than building new maps from the random ping TxID
to its endpoint.
Updates #3088
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
We can reply to a ping without knowing which exact node it's from. As
long as it's in our netmap, it's safe to reply. If there's more than
one node with that discokey, it doesn't matter who we're relpying to.
Updates #3088
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
As more prep for removing the false assumption that you're able to
map from DiscoKey to a single peer, move the lastPingFrom and lastPingTime
fields from the endpoint type to a new discoInfo type, effectively upgrading
the old sharedDiscoKey map (which only held a *[32]byte nacl precomputed key
as its value) to discoInfo which then includes that naclbox key.
Then start plumbing it into handlePing in prep for removing the need
for handlePing to take an endpoint parameter.
Updates #3088
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
The pass just after in this method handles cleaning up sharedDiscoKey.
No need to do it wrong (assuming DiscoKey => 1 node) earlier.
Updates #3088
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
It's not valid to assume that a discokey is globally unique.
This removes the first two of the four callers.
Updates #3088
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
On iOS (and possibly other platforms), sometimes our UDP socket would
get stuck in a state where it was bound to an invalid interface (or no
interface) after a network reconfiguration. We can detect this by
actually checking the error codes from sending our STUN packets.
If we completely fail to send any STUN packets, we know something is
very broken. So on the next STUN attempt, let's rebind the UDP socket
to try to correct any problems.
This fixes a problem where iOS would sometimes get stuck using DERP
instead of direct connections until the backend was restarted.
Fixes#2994
Signed-off-by: Avery Pennarun <apenwarr@tailscale.com>
Windows has a public dns.Flush used in router_windows.go.
However that won't work for platforms like Linux, where
we need a different flush mechanism for resolved versus
other implementations.
We're instead adding a FlushCaches method to the dns Manager,
which can be made to work on all platforms as needed.
Fixes https://github.com/tailscale/tailscale/issues/2132
Signed-off-by: Denton Gentry <dgentry@tailscale.com>
Spelling out the command to run for every type
means that changing the command makes for a large, repetitive diff.
Stop doing that.
Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
We currently plumb full URLs for DNS resolvers from the control server
down to the client. But when we pass the values into the net/dns
package, we throw away any URL that isn't a bare IP. This commit
continues the plumbing, and gets the URL all the way to the built in
forwarder. (It stops before plumbing URLs into the OS configurations
that can handle them.)
For #2596
Signed-off-by: David Crawshaw <crawshaw@tailscale.com>
And in the process, fix the related confusing error messages from
pinging your own IP or hostname.
Fixes#2803
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
AFAICT this was always present, the log read mid-execution was never safe.
But it seems like the recent magicsock refactoring made the race much
more likely.
Signed-off-by: David Anderson <danderson@tailscale.com>
And add health check errors to ipnstate.Status (tailscale status --json).
Updates #2746
Updates #2775
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
It was useful early in development when disco clients were the
exception and tailscale logs were noisier than today, but now
non-disco is the exception.
Updates #2752
Signed-off-by: David Anderson <danderson@tailscale.com>
Having removed magicconn.Start, there's no need to synchronize startup
of other things to it any more.
Signed-off-by: David Anderson <danderson@tailscale.com>
Over time, other magicsock refactors have made Start effectively a
no-op, except that some other functions choose to panic if called
before Start.
Signed-off-by: David Anderson <danderson@tailscale.com>
We were returning an error almost, but not quite like errConnClosed in
a single codepath, which could still trip the panic on reconfig in the
test logic.
Signed-off-by: David Anderson <danderson@tailscale.com>
Our prod code doesn't eagerly handshake, because our disco layer enables
on-demand handshaking. Configuring both peers to eagerly handshake leads
to WireGuard handshake races that make TestTwoDevicePing flaky.
Signed-off-by: David Anderson <danderson@tailscale.com>
It only existed to override one test-only behavior with a
different test-only behavior, in both cases working around
an annoying feature of our CI environments. Instead, handle
that weirdness entirely in the test code, with a tweaked
TestOnlyPacketListener that gets injected.
Signed-off-by: David Anderson <danderson@tailscale.com>
The docstring said it was meant for use in tests, but it's specifically a
special codepath that is _only_ used in tests, so make the claim stronger.
Signed-off-by: David Anderson <danderson@tailscale.com>
Instead of using the legacy codepath, teach discoEndpoint to handle
peers that have a home DERP, but no disco key. We can still communicate
with them, but only over DERP.
Signed-off-by: David Anderson <danderson@tailscale.com>
This log is quite verbose, it was only to be left in for one
unstable build to help debug a user issue.
This reverts commit 1dd2552032.
Signed-off-by: Denton Gentry <dgentry@tailscale.com>
Intended to help in resolving customer issue with
DNS caching.
We currently exec `ipconfig /flushdns` from two
places:
- SetDNS(), which logs before invoking
- here in router_windows, which doesn't
We'd like to see a positive indication in logs that flushdns
is being run.
As this log is expected to be spammy, it is proposed to
leave this in just long enough to do an unstable 1.13.x build
and then revert it. They won't run an unsigned image that
I build.
Signed-off-by: Denton Gentry <dgentry@tailscale.com>
The number of peers we have will be pretty stable across time.
Allocate roughly the right slice size.
This reduces memory usage when there are many peers.
Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
Two optimizations.
Use values instead of pointers.
We were using pointers to make track the "peer in progress" easier.
It's not too hard to do it manually, though.
Make two passes through the data, so that we can size our
return value accurately from the beginning.
This is cheap enough compared to the allocation,
which grows linearly in the number of peers,
that it is worth doing.
Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
Still very much a prototype (hard-coded IPs, etc) but should be
non-invasive enough to submit at this point and iterate from here.
Updates #2589
Co-Author: David Crawshaw <crawshaw@tailscale.com>
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
This is a simplified rate limiter geared for exactly our needs:
A fast, mono.Time-based rate limiter for use in tstun.
It was generated by stripping down the x/time/rate rate limiter
to just our needs and switching it to use mono.Time.
It removes one time.Now call per packet.
Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
magicsock makes multiple calls to Now per packet.
Move to mono.Now. Changing some of the calls to
use package mono has a cascading effect,
causing non-per-packet call sites to also switch.
Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
TCP was done in 662fbd4a09.
This does the same for UDP.
Tested by hand. Integration tests will have to come later. I'd wanted
to do it in this commit, but the SOCKS5 server needed for interop
testing between two userspace nodes doesn't yet support UDP and I
didn't want to invent some whole new userspace packet injection
interface at this point, as SOCKS seems like a better route, but
that's its own bug.
Fixes#2302
RELNOTE=netstack mode can now UDP relay to subnets
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Prep for #1591 which will need to make Linux's router react to changes
that the link monitor observes.
The router package already depended on the monitor package
transitively. Now it's explicit.
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
The fact that Hash returns a [sha256.Size]byte leaks details about
the underlying hash implementation. This could very well be any other
hashing algorithm with a possible different block size.
Abstract this implementation detail away by declaring an opaque type
that is comparable. While we are changing the signature of UpdateHash,
rename it to just Update to reduce stutter (e.g., deephash.Update).
Signed-off-by: Joe Tsai <joetsai@digital-static.net>
The earlier 2ba36c294b started listening
for ip rule changes and only cared about DELRULE events, buts its subscription
included all rule events, including new ones, which meant we were then
catching our own ip rule creations and logging about how they were unknown.
Stop that log spam.
Updates #1591
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
For debugging & working on #1591 where certain versions of systemd-networkd
delete Tailscale's ip rule entries.
Updates #1591
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
logBufWriter had no serialization.
It just so happens that none of its users currently ever log concurrently.
Make it safe for concurrent use.
Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
As Brad suggested, mem.RO allows for a lot of easy perf gains. There were also some smaller
changes outside of mem.RO, such as using hex.Decode instead of hex.DecodeString.
```
name old time/op new time/op delta
FromUAPI-8 14.7µs ± 3% 12.3µs ± 4% -16.58% (p=0.008 n=5+5)
name old alloc/op new alloc/op delta
FromUAPI-8 9.52kB ± 0% 7.04kB ± 0% -26.05% (p=0.008 n=5+5)
name old allocs/op new allocs/op delta
FromUAPI-8 77.0 ± 0% 29.0 ± 0% -62.34% (p=0.008 n=5+5)
```
Signed-off-by: julianknodt <julianknodt@gmail.com>
Adds a benchmark for FromUAPI in wgcfg.
It appears that it's not actually that slow, the main allocations are from the scanner and new
config.
Updates #1912.
Signed-off-by: julianknodt <julianknodt@gmail.com>
The DERPTestPort int meant two things before: which port to use, and
whether to disable TLS verification. Users would like to set the port
without disabling TLS, so break it into two options.
Updates #1264
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
For instance, ephemeral nodes with only IPv6 addresses can now
SOCKS5-dial out to names like "foo" and resolve foo's IPv6 address
rather than foo's IPv4 address and get a "no route"
(*tcpip.ErrNoRoute) error from netstack's dialer.
Per https://github.com/tailscale/tailscale/issues/2268#issuecomment-870027626
which is only part of the isuse.
Updates #2268
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
After allowing for custom DERP maps, it's convenient to be able to see their latency in
netcheck. This adds a query to the local tailscaled for the current DERPMap.
Updates #1264
Signed-off-by: julianknodt <julianknodt@gmail.com>
This uses a debug envvar to optionally disable filter logging rate
limits by setting the environment variable
TS_DEBUG_FILTER_RATE_LIMIT_LOGS to "all", and if it matches,
the code will effectively disable the limits on the log rate by
setting the limit to 1 millisecond. This should make sure that all
filter logs will be captured.
Signed-off-by: Christine Dodrill <xe@tailscale.com>
Unused so far, but eventually we'll want this for SOCKS5 UDP binds (we
currently only do TCP with SOCKS5), and also for #2102 for forwarding
MagicDNS upstream to Tailscale IPs over netstack.
Updates #2102
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
The route creation for the `tun` device was augmented in #1469 but
didn't account for adding IPv4 vs. IPv6 routes. There are 2 primary
changes as a result:
* Ensure that either `-inet` or `-inet6` was used in the
[`route(8)`](https://man.openbsd.org/route) command
* Use either the `localAddr4` or `localAddr6` for the gateway argument
depending which destination network is being added
The basis for the approach is based on the implementation from
`router_userspace_bsd.go`, including the `inet()` helper function.
Fixes#2048
References #1469
Signed-off-by: Fletcher Nichol <fnichol@nichol.ca>
It is a bit faster.
But more importantly, it matches upstream byte-for-byte,
which ensures there'll be no corner cases in which we disagree.
name old time/op new time/op delta
SetPeers-8 3.58µs ± 0% 3.16µs ± 2% -11.74% (p=0.016 n=4+5)
name old alloc/op new alloc/op delta
SetPeers-8 2.53kB ± 0% 2.53kB ± 0% ~ (all equal)
name old allocs/op new allocs/op delta
SetPeers-8 99.0 ± 0% 99.0 ± 0% ~ (all equal)
Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
Pull in the latest version of wireguard-windows.
Switch to upstream wireguard-go.
This requires reverting all of our import paths.
Unfortunately, this has to happen at the same time.
The wireguard-go change is very low risk,
as that commit matches our fork almost exactly.
(The only changes are import paths, CI files, and a go.mod entry.)
So if there are issues as a result of this commit,
the first place to look is wireguard-windows changes.
Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
We repeat many peers each time we call SetPeers.
Instead of constructing strings for them from scratch every time,
keep strings alive across iterations.
name old time/op new time/op delta
SetPeers-8 3.58µs ± 1% 2.41µs ± 1% -32.60% (p=0.000 n=9+10)
name old alloc/op new alloc/op delta
SetPeers-8 2.53kB ± 0% 1.30kB ± 0% -48.73% (p=0.000 n=10+10)
name old allocs/op new allocs/op delta
SetPeers-8 99.0 ± 0% 16.0 ± 0% -83.84% (p=0.000 n=10+10)
We could reduce alloc/op 12% and allocs/op 23% if strs had
type map[string]strCache instead of map[string]*strCache,
but that wipes out the execution time impact.
Given that re-use is the most common scenario, let's optimize for it.
Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
Our wireguard-go fork used different values from upstream for
package device's memory limits on iOS.
This was the last blocker to removing our fork.
These values are now vars rather than consts for iOS.
c27ff9b9f6
Adjust them on startup to our preferred values.
Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
magicsock.Conn.ParseEndpoint requires a peer's public key,
disco key, and legacy ip/ports in order to do its job.
We currently accomplish that by:
* adding the public key in our wireguard-go fork
* encoding the disco key as magic hostname
* using a bespoke comma-separated encoding
It's a bit messy.
Instead, switch to something simpler: use a json-encoded struct
containing exactly the information we need, in the form we use it.
Our wireguard-go fork still adds the public key to the
address when it passes it to ParseEndpoint, but now the code
compensating for that is just a couple of simple, well-commented lines.
Once this commit is in, we can remove that part of the fork
and remove the compensating code.
Signed-off-by: Josh Bleecher Snyder <josharian@gmail.com>
The new code is ugly, but much faster and leaner.
name old time/op new time/op delta
SetPeers-8 7.81µs ± 1% 3.59µs ± 1% -54.04% (p=0.000 n=9+10)
name old alloc/op new alloc/op delta
SetPeers-8 7.68kB ± 0% 2.53kB ± 0% -67.08% (p=0.000 n=10+10)
name old allocs/op new allocs/op delta
SetPeers-8 237 ± 0% 99 ± 0% -58.23% (p=0.000 n=10+10)
Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
Because it showed up on hello profiles.
Cycle through some moderate-sized sets of peers.
This should cover the "small tweaks to netmap"
and the "up/down cycle" cases.
Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
Yes, it printed, but that was an implementation detail for hashing.
And coming optimization will make it print even less.
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
On benchmark completion, we shut down the wgengine.
If we happen to poll for status during shutdown,
we get an "engine closing" error.
It doesn't hurt anything; ignore it.
Fixestailscale/corp#1776
Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
Without any synchronization here, the "first packet" callback can
be delayed indefinitely, while other work continues.
Since the callback starts the benchmark timer, this could skew results.
Worse, if the benchmark manages to complete before the benchmark timer begins,
it'll cause a data race with the benchmark shutdown performed by package testing.
That is what is reported in #1881.
This is a bit unfortunate, in that it means that users of TrafficGen have
to be careful to keep this callback speedy and lightweight and to avoid deadlocks.
Fixes#1881
Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
It is possible to get multiple status callbacks from an Engine.
We need to wait for at least one from each Engine.
Without limiting to one per Engine,
wait.Wait can exit early or can panic due to a negative counter.
Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
This reduces the speed with which these benchmarks exhaust their supply fds.
Not to zero unfortunately, but it's still helpful when doing long runs.
Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
Fields rename only.
Part of the general effort to make our code agnostic about endpoint formatting.
It's just a name, but it will soon be a misleading one; be more generic.
Do this as a separate commit because it generates a lot of whitespace changes.
Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
Upstream wireguard-go renamed the interface method
from CreateEndpoint to ParseEndpoint.
I missed some comments. Fix them.
Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
By using conn.NewDefaultBind, this test requires that our endpoints
be comprehensible to wireguard-go. Instead, use a no-op bind that
treats endpoints as opaque strings.
Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
Legacy endpoints (addrSet) currently reconstruct their dst string when requested.
Instead, store the dst string we were given to begin with.
In addition to being simpler and cheaper, this makes less code
aware of how to interpret endpoint strings.
Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
Prefer the error from the actual wireguard-go device method call,
not {To,From}UAPI, as those tend to be less interesting I/O errors.
Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
When wireguard-go's UAPI interface fails with an error, ReconfigDevice hangs.
Fix that by buffering the channel and closing the writer after the call.
The code now matches the corresponding code in DeviceConfig, where I got it right.
Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
It is unused, and has been since early Feb 2021 (Tailscale 1.6).
We can't get delete the DeviceOptions entirely yet;
first #1831 and #1839 need to go in, along with some wireguard-go changes.
Deleting this chunk of code now will make the later commits more clearly correct.
Pingers can now go too.
Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
The earlier eb06ec172f fixed
the flaky SSH issue (tailscale/corp#1725) by making sure that packets
addressed to Tailscale IPs in hybrid netstack mode weren't delivered
to netstack, but another issue remained:
All traffic handled by netstack was also potentially being handled by
the host networking stack, as the filter hook returned "Accept", which
made it keep processing. This could lead to various random racey chaos
as a function of OS/firewalls/routes/etc.
Instead, once we inject into netstack, stop our caller's packet
processing.
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Prior to wireguard-go using printf-style logging,
all wireguard-go logging occurred using format string "%s".
We fixed that but continued to use %s when we rewrote
peer identifiers into Tailscale style.
This commit removes that %sl, which makes rate limiting work correctly.
As a happy side-benefit, it should generate less garbage.
Instead of replacing all wireguard-go peer identifiers
that might occur anywhere in a fully formatted log string,
assume that they only come from args.
Check all args for things that look like *device.Peers
and replace them with appropriately reformatted strings.
There is a variety of ways that this could go wrong
(unusual format verbs or modifiers, peer identifiers
occurring as part of a larger printed object, future API changes),
but none of them occur now, are likely to be added,
or would be hard to work around if they did.
Signed-off-by: Josh Bleecher Snyder <josharian@gmail.com>
The "stop phrases" we use all occur in wireguard-go in the format string.
We can avoid doing a bunch of fmt.Sprintf work when they appear.
Signed-off-by: Josh Bleecher Snyder <josharian@gmail.com>
For historical reasons, we ended up with two near-duplicate
copies of curve25519 key types, one in the wireguard-go module
(wgcfg) and one in the tailscale module (types/wgkey).
Then we moved wgcfg to the tailscale module.
We can now remove the wgcfg key type in favor of wgkey.
Signed-off-by: Josh Bleecher Snyder <josharian@gmail.com>
One of the consequences of the bind refactoring in 6f23087175
is that attempting to bind an IPv6 socket will always
result in c.pconn6.pconn being non-nil.
If the bind fails, it'll be set to a placeholder packet conn
that blocks forever.
As a result, we can always run ReceiveIPv6 and health check it.
This removes IPv4/IPv6 asymmetry and also will allow health checks
to detect any IPv6 receive func failures.
Signed-off-by: Josh Bleecher Snyder <josharian@gmail.com>
It must be an IP address; enforce that at the type level.
Suggested-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Signed-off-by: Josh Bleecher Snyder <josharian@gmail.com>
We had two separate code paths for the initial UDP listener bind
and any subsequent rebinds.
IPv6 got left out of the rebind code.
Rather than duplicate it there, unify the two code paths.
Then improve the resulting code:
* Rebind had nested listen attempts to try the user-specified port first,
and then fall back to :0 if that failed. Convert that into a loop.
* Initial bind tried only the user-specified port.
Rebind tried the user-specified port and 0.
But there are actually three ports of interest:
The one the user specified, the most recent port in use, and 0.
We now try all three in order, as appropriate.
* In the extremely rare case in which binding to port 0 fails,
use a dummy net.PacketConn whose reads block until close.
This will keep the wireguard-go receive func goroutine alive.
As a pleasant side-effect of this, if we decide that
we need to resuscitate #1796, it will now be much easier.
Fixes#1799
Co-authored-by: David Anderson <danderson@tailscale.com>
Signed-off-by: Josh Bleecher Snyder <josharian@gmail.com>
Assume it'll stay at 0 forever, so hard-code it
and delete code conditional on it being non-0.
Signed-off-by: Josh Bleecher Snyder <josharian@gmail.com>
It was set to context.Background by all callers, for the same reasons.
Set it locally instead, to simplify call sites.
Signed-off-by: Josh Bleecher Snyder <josharian@gmail.com>
The old implementation knew too much about how wireguard-go worked.
As a result, it missed genuine problems that occurred due to unrelated bugs.
This fourth attempt to fix the health checks takes a black box approach.
A receive func is healthy if one (or both) of these conditions holds:
* It is currently running and blocked.
* It has been executed recently.
The second condition is required because receive functions
are not continuously executing. wireguard-go calls them and then
processes their results before calling them again.
There is a theoretical false positive if wireguard-go go takes
longer than one minute to process the results of a receive func execution.
If that happens, we have other problems.
Updates #1790
Signed-off-by: Josh Bleecher Snyder <josharian@gmail.com>
They were not doing their job.
They need yet another conceptual re-think.
Start by clearing the decks.
Signed-off-by: Josh Bleecher Snyder <josharian@gmail.com>
We had a long-standing bug in which our TUN events channel
was being received from simultaneously in two places.
The first is wireguard-go.
At wgengine/userspace.go:366, we pass e.tundev to wireguard-go,
which starts a goroutine (RoutineTUNEventReader)
that receives from that channel and uses events to adjust the MTU
and bring the device up/down.
At wgengine/userspace.go:374, we launch a goroutine that
receives from e.tundev, logs MTU changes, and triggers
state updates when up/down changes occur.
Events were getting delivered haphazardly between the two of them.
We don't really want wireguard-go to receive the up/down events;
we control the state of the device explicitly by calling device.Up.
And the userspace.go loop MTU logging duplicates logging that
wireguard-go does when it received MTU updates.
So this change splits the single TUN events channel into up/down
and other (aka MTU), and sends them to the parties that ought
to receive them.
I'm actually a bit surprised that this hasn't caused more visible trouble.
If a down event went to wireguard-go but the subsequent up event
went to userspace.go, we could end up with the wireguard-go device disappearing.
I believe that this may also (somewhat accidentally) be a fix for #1790.
Signed-off-by: Josh Bleecher Snyder <josharian@gmail.com>
The old decay-based one took a while to converge. This new one (based
very loosely on TCP BBR) seems to converge quickly on what seems to be
the best speed.
Signed-off-by: Avery Pennarun <apenwarr@tailscale.com>
This tries to generate traffic at a rate that will saturate the
receiver, without overdoing it, even in the event of packet loss. It's
unrealistically more aggressive than TCP (which will back off quickly
in case of packet loss) but less silly than a blind test that just
generates packets as fast as it can (which can cause all the CPU to be
absorbed by the transmitter, giving an incorrect impression of how much
capacity the total system has).
Initial indications are that a syscall about every 10 packets (TCP bulk
delivery) is roughly the same speed as sending every packet through a
channel. A syscall per packet is about 5x-10x slower than that.
The whole tailscale wireguard-go + magicsock + packet filter
combination is about 4x slower again, which is better than I thought
we'd do, but probably has room for improvement.
Note that in "full" tailscale, there is also a tundev read/write for
every packet, effectively doubling the syscall overhead per packet.
Given these numbers, it seems like read/write syscalls are only 25-40%
of the total CPU time used in tailscale proper, so we do have
significant non-syscall optimization work to do too.
Sample output:
$ GOMAXPROCS=2 go test -bench . -benchtime 5s ./cmd/tailbench
goos: linux
goarch: amd64
pkg: tailscale.com/cmd/tailbench
cpu: Intel(R) Core(TM) i7-4785T CPU @ 2.20GHz
BenchmarkTrivialNoAlloc/32-2 56340248 93.85 ns/op 340.98 MB/s 0 %lost 0 B/op 0 allocs/op
BenchmarkTrivialNoAlloc/124-2 57527490 99.27 ns/op 1249.10 MB/s 0 %lost 0 B/op 0 allocs/op
BenchmarkTrivialNoAlloc/1024-2 52537773 111.3 ns/op 9200.39 MB/s 0 %lost 0 B/op 0 allocs/op
BenchmarkTrivial/32-2 41878063 135.6 ns/op 236.04 MB/s 0 %lost 0 B/op 0 allocs/op
BenchmarkTrivial/124-2 41270439 138.4 ns/op 896.02 MB/s 0 %lost 0 B/op 0 allocs/op
BenchmarkTrivial/1024-2 36337252 154.3 ns/op 6635.30 MB/s 0 %lost 0 B/op 0 allocs/op
BenchmarkBlockingChannel/32-2 12171654 494.3 ns/op 64.74 MB/s 0 %lost 1791 B/op 0 allocs/op
BenchmarkBlockingChannel/124-2 12149956 507.8 ns/op 244.17 MB/s 0 %lost 1792 B/op 1 allocs/op
BenchmarkBlockingChannel/1024-2 11034754 528.8 ns/op 1936.42 MB/s 0 %lost 1792 B/op 1 allocs/op
BenchmarkNonlockingChannel/32-2 8960622 2195 ns/op 14.58 MB/s 8.825 %lost 1792 B/op 1 allocs/op
BenchmarkNonlockingChannel/124-2 3014614 2224 ns/op 55.75 MB/s 11.18 %lost 1792 B/op 1 allocs/op
BenchmarkNonlockingChannel/1024-2 3234915 1688 ns/op 606.53 MB/s 3.765 %lost 1792 B/op 1 allocs/op
BenchmarkDoubleChannel/32-2 8457559 764.1 ns/op 41.88 MB/s 5.945 %lost 1792 B/op 1 allocs/op
BenchmarkDoubleChannel/124-2 5497726 1030 ns/op 120.38 MB/s 12.14 %lost 1792 B/op 1 allocs/op
BenchmarkDoubleChannel/1024-2 7985656 1360 ns/op 752.86 MB/s 13.57 %lost 1792 B/op 1 allocs/op
BenchmarkUDP/32-2 1652134 3695 ns/op 8.66 MB/s 0 %lost 176 B/op 3 allocs/op
BenchmarkUDP/124-2 1621024 3765 ns/op 32.94 MB/s 0 %lost 176 B/op 3 allocs/op
BenchmarkUDP/1024-2 1553750 3825 ns/op 267.72 MB/s 0 %lost 176 B/op 3 allocs/op
BenchmarkTCP/32-2 11056336 503.2 ns/op 63.60 MB/s 0 %lost 0 B/op 0 allocs/op
BenchmarkTCP/124-2 11074869 533.7 ns/op 232.32 MB/s 0 %lost 0 B/op 0 allocs/op
BenchmarkTCP/1024-2 8934968 671.4 ns/op 1525.20 MB/s 0 %lost 0 B/op 0 allocs/op
BenchmarkWireGuardTest/32-2 1403702 4547 ns/op 7.04 MB/s 14.37 %lost 467 B/op 3 allocs/op
BenchmarkWireGuardTest/124-2 780645 7927 ns/op 15.64 MB/s 1.537 %lost 420 B/op 3 allocs/op
BenchmarkWireGuardTest/1024-2 512671 11791 ns/op 86.85 MB/s 0.5206 %lost 411 B/op 3 allocs/op
PASS
ok tailscale.com/wgengine/bench 195.724s
Updates #414.
Signed-off-by: Avery Pennarun <apenwarr@tailscale.com>
The existing implementation was completely, embarrassingly conceptually broken.
We aren't able to see whether wireguard-go's receive function goroutines
are running or not. All we can do is model that based on what we have done.
This commit fixes that model.
Fixes#1781
Signed-off-by: Josh Bleecher Snyder <josharian@gmail.com>
Avery reported a sub-ms health transition from "receiveIPv4 not running" to "ok".
To avoid these transient false-positives, be more precise about
the expected lifetime of receive funcs. The problematic case is one in which
they were started but exited prior to a call to connBind.Close.
Explicitly represent started vs running state, taking care with the order of updates.
Signed-off-by: Josh Bleecher Snyder <josharian@gmail.com>
The connection failure diagnostic code was never updated enough for
exit nodes, so disable its misleading output when the node it picks
(incorrectly) to diagnose is only an exit node.
Fixes#1754
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
I've spent two days searching for a theoretical wireguard-go bug
around receive functions exiting early.
I've found many bugs, but none of the flavor we're looking for.
Restore wireguard-go's logging around starting and stopping receive functions,
so that we can definitively rule in or out this particular theory.
Signed-off-by: Josh Bleecher Snyder <josharian@gmail.com>
I see a bunch of these in some logs I'm looking at,
separated only by a few seconds.
Log the error so we can tell what's going on here.
Signed-off-by: Josh Bleecher Snyder <josharian@gmail.com>
These were getting rate-limited for nodes with many peers.
Consolate the output into single lines, which are nicer anyway.
Signed-off-by: Josh Bleecher Snyder <josharian@gmail.com>
We were accidentally logging oldPort -> oldPort.
Log oldPort as well as c.port; if we failed to get the preferred port
in a previous rebind, oldPort might differ from c.port.
Signed-off-by: Josh Bleecher Snyder <josharian@gmail.com>
Track endpoints internally with a new tailcfg.Endpoint type that
includes a typed netaddr.IPPort (instead of just a string) and
includes a type for how that endpoint was discovered (STUN, local,
etc).
Use []tailcfg.Endpoint instead of []string internally.
At the last second, send it to the control server as the existing
[]string for endpoints, but also include a new parallel
MapRequest.EndpointType []tailcfg.EndpointType, so the control server
can start filtering out less-important endpoint changes from
new-enough clients. Notably, STUN-discovered endpoints can be filtered
out from 1.6+ clients, as they can discover them amongst each other
via CallMeMaybe disco exchanges started over DERP. And STUN endpoints
change a lot, causing a lot of MapResposne updates. But portmapped
endpoints are worth keeping for now, as they they work right away
without requiring the firewall traversal extra RTT dance.
End result will be less control->client bandwidth. (despite negligible
increase in client->control bandwidth)
Updates tailscale/corp#1543
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
On FreeBSD, we add the interface IP as a /48 to work around a kernel
bug, so we mustn't then try to add a /48 route to the Tailscale ULA,
since that will fail as a dupe.
Signed-off-by: David Anderson <danderson@tailscale.com>