Commit Graph

7980 Commits

Author SHA1 Message Date
Andrew Dunham 30f8d8199a ipn/ipnlocal: fix data race in tests
We can observe a data race in tests when logging after a test is
finished. `b.onHealthChange` is called in a goroutine after being
registered with `health.Tracker.RegisterWatcher`, which calls callbacks
in `setUnhealthyLocked` in a new goroutine.

See: https://github.com/tailscale/tailscale/actions/runs/9672919302/job/26686038740

Updates #12054

Signed-off-by: Andrew Dunham <andrew@du.nham.ca>
Change-Id: Ibf22cc994965d88a9e7236544878d5373f91229e
2024-06-25 21:43:22 -07:00
Aaron Klotz da078b4c09 util/winutil: add package for logging into Windows via Service-for-User (S4U)
This PR ties together pseudoconsoles, user profiles, s4u logons, and
process creation into what is (hopefully) a simple API for various
Tailscale services to obtain Windows access tokens without requiring
knowledge of any Windows passwords. It works both for domain-joined
machines (Kerberos) and non-domain-joined machines. The former case
is fairly straightforward as it is fully documented. OTOH, the latter
case is not documented, though it is fully defined in the C headers in
the Windows SDK. The documentation blanks were filled in by reading
the source code of Microsoft's Win32 port of OpenSSH.

We need to do a bit of acrobatics to make conpty work correctly while
creating a child process with an s4u token; see the doc comments above
startProcessInternal for details.

Updates #12383

Signed-off-by: Aaron Klotz <aaron@tailscale.com>
2024-06-25 22:05:52 -06:00
Andrew Dunham 53a5d00fff net/dns: ensure /etc/resolv.conf is world-readable even with a umask
Previously, if we had a umask set (e.g. 0027) that prevented creating a
world-readable file, /etc/resolv.conf would be created without the o+r
bit and thus other users may be unable to resolve DNS.

Since a umask only applies to file creation, chmod the file after
creation and before renaming it to ensure that it has the appropriate
permissions.

Updates #12609

Signed-off-by: Andrew Dunham <andrew@du.nham.ca>
Change-Id: I2a05d64f4f3a8ee8683a70be17a7da0e70933137
2024-06-26 00:02:05 -04:00
Andrew Dunham 8161024176 wgengine/magicsock: always set home DERP if no control conn
The logic we added in #11378 would prevent selecting a home DERP if we
have no control connection.

Updates tailscale/corp#18095

Signed-off-by: Andrew Dunham <andrew@du.nham.ca>
Change-Id: I44bb6ac4393989444e4961b8cfa27dc149a33c6e
2024-06-25 23:31:14 -04:00
Andrew Dunham a475c435ec net/dns/resolver: fix test failure
Updates #cleanup

Signed-off-by: Andrew Dunham <andrew@du.nham.ca>
Change-Id: I0e815a69ee44ca0ff7c0ea0ca3c6904bbf67ed1f
2024-06-25 23:08:08 -04:00
Jonathan Nobels 27033c6277
net/dns: recheck DNS config on SERVFAIL errors (#12547)
Fixes tailscale/corp#20677

Replaces the original attempt to rectify this (by injecting a netMon
event) which was both heavy handed, and missed cases where the
netMon event was "minor".

On apple platforms, the fetching the interface's nameservers can
and does return an empty list in certain situations.   Apple's API
in particular is very limiting here.  The header hints at notifications
for dns changes which would let us react ahead of time, but it's all
private APIs.

To avoid remaining in the state where we end up with no
nameservers but we absolutely need them, we'll react
to a lack of upstream nameservers by attempting to re-query
the OS.

We'll rate limit this to space out the attempts.   It seems relatively
harmless to attempt a reconfig every 5 seconds (triggered
by an incoming query) if the network is in this broken state.

Missing nameservers might possibly be a persistent condition
(vs a transient error), but that would  also imply that something
out of our control is badly misconfigured.

Tested by randomly returning [] for the nameservers.   When switching
between Wifi networks, or cell->wifi, this will randomly trigger
the bug, and we appear to reliably heal the DNS state.

Signed-off-by: Jonathan Nobels <jonathan@tailscale.com>
2024-06-25 14:56:13 -04:00
Brad Fitzpatrick d5e692f7e7 ipn/ipnlocal: check operator user via osuser package
So non-local users (e.g. Kerberos on FreeIPA) on Linux can be looked
up. Our default binaries are built with pure Go os/user which only
supports the classic /etc/passwd and not any libc-hooked lookups.

Updates #12601

Change-Id: I9592db89e6ca58bf972f2dcee7a35fbf44608a4f
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2024-06-25 10:56:32 -07:00
Jordan Whited 94415e8029
cmd/stunstamp: remove sqlite DB and API (#12604)
stunstamp now sends data to Prometheus via remote write, and Prometheus
can serve the same data. Retaining and cleaning up old data in sqlite
leads to long probing pauses, and it's not worth investing more effort
to optimize the schema and/or concurrency model.

Updates tailscale/corp#20344

Signed-off-by: Jordan Whited <jordan@tailscale.com>
2024-06-25 10:21:40 -07:00
Brad Fitzpatrick 3485e4bf5a derp: make RunConnectionLoop funcs take Messages, support PeerPresentFlags
PeerPresentFlags was added in 5ffb2668ef but wasn't plumbed through to
the RunConnectionLoop. Rather than add yet another parameter (as
IP:port was added earlier), pass in the raw PeerPresentMessage and
PeerGoneMessage struct values, which are the same things, plus two
fields: PeerGoneReasonType for gone and the PeerPresentFlags from
5ffb2668ef.

Updates tailscale/corp#17816

Change-Id: Ib19d9f95353651ada90656071fc3656cf58b7987
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2024-06-25 09:47:25 -07:00
Fran Bull 7eb8a77ac8 appc: don't schedule advertisement of 0 routes
When the store-appc-routes flag is on for a tailnet we are writing the
routes more often than seems necessary. Investigation reveals that we
are doing so ~every time we observe a dns response, even if this causes
us not to advertise any new routes. So when we have no new routes,
instead do not advertise routes.

Fixes #12593

Signed-off-by: Fran Bull <fran@tailscale.com>
2024-06-25 08:12:51 -07:00
Irbe Krumina 24a40f54d9
util/linuxfw: verify that IPv6 if available if (#12598)
nftable runner for an IPv6 address gets requested.

Updates tailscale/tailscale#12215

Signed-off-by: Irbe Krumina <irbe@tailscale.com>
2024-06-25 14:13:49 +01:00
Brad Fitzpatrick d91e5c25ce derp: redo, simplify how mesh update writes are queued/written
I couldn't convince myself the old way was safe and couldn't lose
writes.

And it seemed too complicated.

Updates tailscale/corp#21104

Change-Id: I17ba7c7d6fd83458a311ac671146a1f6a458a5c1
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2024-06-24 21:42:14 -07:00
Brad Fitzpatrick ded7734c36 derp: account for increased size of peerPresent messages in mesh updates
sendMeshUpdates tries to write as much as possible without blocking,
being careful to check the bufio.Writer.Available size before writes.

Except that regressed in 6c791f7d60 which made those messages larger, which
meants we were doing network I/O with the Server mutex held.

Updates tailscale/corp#13945

Change-Id: Ic327071d2e37de262931b9b390cae32084811919
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2024-06-24 16:21:01 -07:00
Andrew Dunham 200d92121f types/lazy: add Peek method to SyncValue
This adds the ability to "peek" at the value of a SyncValue, so that
it's possible to observe a value without computing this.

Updates tailscale/corp#17122

Signed-off-by: Andrew Dunham <andrew@du.nham.ca>
Co-authored-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Change-Id: I06f88c22a1f7ffcbc7ff82946335356bb0ef4622
2024-06-24 12:41:00 -07:00
Aaron Klotz 7dd76c3411 net/netns: add Windows support for bind-to-interface-by-route
This is implemented via GetBestInterfaceEx. Should we encounter errors
or fail to resolve a valid, non-Tailscale interface, we fall back to
returning the index for the default interface instead.

Fixes #12551

Signed-off-by: Aaron Klotz <aaron@tailscale.com>
2024-06-24 10:43:34 -06:00
tailscale-license-updater[bot] 591979b95f
licenses: update license notices (#12414)
Signed-off-by: License Updater <noreply+license-updater@tailscale.com>
Co-authored-by: License Updater <noreply+license-updater@tailscale.com>
2024-06-24 09:20:34 -07:00
Brad Fitzpatrick 91786ff958 cmd/derper: add debug endpoint to adjust mutex profiling rate
Updates #3560

Change-Id: I474421ce75c79fb66e1c306ed47daebc5a0e069e
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2024-06-24 09:05:31 -07:00
Brad Fitzpatrick 5ffb2668ef derp: add PeerPresentFlags bitmask to Watch messages
Updates tailscale/corp#17816

Change-Id: Ib5baf6c981a6a4c279f8bbfef02048cfbfb3323b
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2024-06-22 20:38:25 -07:00
Aaron Klotz d7a4f9d31c net/dns: ensure multiple hosts with the same IP address are combined into a single HostEntry
This ensures that each line has a unique IP address.

Fixes #11939

Signed-off-by: Aaron Klotz <aaron@tailscale.com>
2024-06-21 13:16:49 -06:00
Jordan Whited 0d6e71df70
cmd/stunstamp: add explicit metric to track timeout events (#12564)
Timeouts could already be identified as NaN values on
stunstamp_derp_stun_rtt_ns, but we can't use NaN effectively with
promql to visualize them. So, this commit adds a timeouts metric that
we can use with rate/delta/etc promql functions.

Updates tailscale/corp#20689

Signed-off-by: Jordan Whited <jordan@tailscale.com>
2024-06-21 09:17:35 -07:00
Kristoffer Dalby dcb0f189cc cmd/proxy-to-grafana: add flag for alternative control server
Fixes #12571

Signed-off-by: Kristoffer Dalby <kristoffer@tailscale.com>
2024-06-21 12:17:39 +02:00
Brad Fitzpatrick 5ec01bf3ce wgengine/filter: support FilterRules matching on srcIP node caps [capver 100]
See #12542 for background.

Updates #12542

Change-Id: Ida312f700affc00d17681dc7551ee9672eeb1789
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2024-06-20 12:27:04 -07:00
Irbe Krumina 07063bc5c7
ssh/tailssh: fix integration test (#12562)
Updates#cleanup

Signed-off-by: Irbe Krumina <irbe@tailscale.com>
2024-06-20 19:30:19 +01:00
Brad Fitzpatrick fd3efd9bad
control/controlclient: add more Screen Time blocking detection
Updates #9658
Updates #12545

Change-Id: Iec1dad354a75f145567b4055d77b1c1db27c89e2
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Co-authored-by: Andrea Gottardo <andrea@gottardo.me>
2024-06-20 11:09:50 -07:00
Keli bd50a3457d
wgengine/filter: add "Accept" TCP log lines to verbose logging (#12525)
Changes "Accept" TCP logs to display in verbose logs only,
and removes lines from default logging behavior.

Updates #12158

Signed-off-by: Keli Velazquez <keli@tailscale.com>
2024-06-20 13:24:46 -04:00
Percy Wegmann 730f0368d0 ssh/tailssh: replace incubator process with su instead of running su as child
This allows the SSH_AUTH_SOCK environment variable to work inside of
su and agent forwarding to succeed.

Fixes #12467

Signed-off-by: Percy Wegmann <percy@tailscale.com>
2024-06-20 10:11:03 -05:00
Andrew Dunham 24976b5bfd cmd/tailscale/cli: actually perform Noise request in 'debug ts2021'
This actually performs a Noise request in the 'debug ts2021' command,
instead of just exiting once we've dialed a connection. This can help
debug certain forms of captive portals and deep packet inspection that
will allow a connection, but will RST the connection when trying to send
data on the post-upgraded TCP connection.

Updates #1634

Signed-off-by: Andrew Dunham <andrew@du.nham.ca>
Change-Id: I1e46ca9c9a0751c55f16373a6a76cdc24fec1f18
2024-06-19 19:56:20 -04:00
Andrew Dunham 732605f961 control/controlclient: move noiseConn to internal package
So that it can be later used in the 'tailscale debug ts2021' function in
the CLI, to aid in debugging captive portals/WAFs/etc.

Updates #1634

Signed-off-by: Andrew Dunham <andrew@du.nham.ca>
Change-Id: Iec9423f5e7570f2c2c8218d27fc0902137e73909
2024-06-19 19:56:20 -04:00
Brad Fitzpatrick 0004827681
control/controlhttp: add health warning for macOS filtering blocking Tailscale (#12546)
Updates #9658
Updates #12545

Change-Id: I6612b9b65eb193a1a651e219b5198c7c20ed94e1
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Co-authored-by: Andrea Gottardo <andrea@tailscale.com>
2024-06-19 13:22:14 -07:00
Brad Fitzpatrick 1023b2a82c util/deephash: fix test regression on 32-bit
Fix regression from bd93c3067e where I didn't notice the
32-bit test failure was real and not its usual slowness-related
regression. Yay failure blindness.

Updates #12526

Change-Id: I00e33bba697e2cdb61a0d76a71b62406f6c2eeb9
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2024-06-19 12:25:53 -07:00
Andrea Gottardo d7619d273b health: fix nil DERPMap dereference panic
Looks like a DERPmap might not be available when we try to get the
name associated with a region ID, and that was causing an intermittent
panic in CI.

Fixes #12534

Change-Id: I4ace53681bf004df46c728cff830b27339254243
Signed-off-by: Andrea Gottardo <andrea@gottardo.me>
2024-06-19 12:20:44 -07:00
Brad Fitzpatrick 25eeafde23 derp: don't verify mesh peers when --verify-clients is set
Updates tailscale/corp#20654

Change-Id: I33c7ca3c7a3c4e492797b73c66eefb699376402c
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2024-06-19 08:59:34 -07:00
Brad Fitzpatrick 4b39b6f7ce derp: fix fmt verb for nodekeys
It was hex-ifying the String() form of key.NodePublic, which was already hex.
I noticed in some logs:

    "client 6e6f64656b65793a353537353..."

And thought that 6x6x6x6x looked strange. It's "nodekey:" in hex.

Updates tailscale/corp#20844

Change-Id: Ib9f2d63b37e324420b86efaa680668a9b807e465
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2024-06-19 08:40:47 -07:00
Brad Fitzpatrick 21460a5b14 tailcfg, wgengine/filter: remove most FilterRule.SrcBits code
The control plane hasn't sent it to clients in ages.

Updates tailscale/corp#20965

Change-Id: I1d71a4b6dd3f75010a05c544ee39827837c30772
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2024-06-18 21:45:22 -07:00
Brad Fitzpatrick 162d593514 net/flowtrack: fix, test String method
I meant to do this in the earlier change and had a git fail.

To atone, add a test too while I'm here.

Updates #12486
Updates #12507

Change-Id: I4943b454a2530cb5047636f37136aa2898d2ffc7
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2024-06-18 21:44:44 -07:00
Brad Fitzpatrick 9e0a5cc551 net/flowtrack: optimize Tuple type for use as map key
This gets UDP filter overhead closer to TCP. Still ~2x, but no longer ~3x.

    goos: darwin
    goarch: arm64
    pkg: tailscale.com/wgengine/filter
                                       │   before    │                after                │
                                       │   sec/op    │   sec/op     vs base                │
    FilterMatch/tcp-not-syn-v4-8         15.43n ± 3%   15.38n ± 5%        ~ (p=0.339 n=10)
    FilterMatch/udp-existing-flow-v4-8   42.45n ± 0%   34.77n ± 1%  -18.08% (p=0.000 n=10)
    geomean                              25.59n        23.12n        -9.65%

Updates #12486

Change-Id: I595cfadcc6b7234604bed9c4dd4261e087c0d4c4
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2024-06-18 21:31:48 -07:00
Andrea Gottardo d6a8fb20e7
health: include DERP region name in bad derp notifications (#12530)
Fixes tailscale/corp#20971

We added some Warnables for DERP failure situations, but their Text currently spits out the DERP region ID ("10") in the UI, which is super ugly. It would be better to provide the RegionName of the DERP region that is failing. We can do so by storing a reference to the last-known DERP map in the health package whenever we fetch one, and using it when generating the notification text.

This way, the following message...

> Tailscale could not connect to the relay server '10'. The server might be temporarily unavailable, or your Internet connection might be down.

becomes:

> Tailscale could not connect to the 'Seattle' relay server. The server might be temporarily unavailable, or your Internet connection might be down.

which is a lot more user-friendly.

Signed-off-by: Andrea Gottardo <andrea@gottardo.me>
2024-06-18 16:03:17 -07:00
Andrea Gottardo 8eb15d3d2d
cli/netcheck: fail with output if we time out fetching a derpmap (#12528)
Updates tailscale/corp#20969

Right now, when netcheck starts, it asks tailscaled for a copy of the DERPMap. If it doesn't have one, it makes a HTTPS request to controlplane.tailscale.com to fetch one.

This will always fail if you're on a network with a captive portal actively blocking HTTPS traffic. The code appears to hang entirely because the http.Client doesn't have a Timeout set. It just sits there waiting until the request succeeds or fails.

This adds a timeout of 10 seconds, and logs more details about the status of the HTTPS request.

Signed-off-by: Andrea Gottardo <andrea@gottardo.me>
2024-06-18 15:04:43 -07:00
Jordan Whited a93173b56a
cmd/xdpderper,derp/xdp: implement mode that drops STUN packets (#12527)
This is useful during maintenance as a method for shedding home client
load.

Updates tailscale/corp#20689

Signed-off-by: Jordan Whited <jordan@tailscale.com>
2024-06-18 14:06:00 -07:00
Andrea Gottardo d55b105dae
health: expose DependsOn to local API via UnhealthyState (#12513)
Updates #4136

Small PR to expose the health Warnables dependencies to the GUI via LocalAPI, so that we can only show warnings for root cause issues, and filter out unnecessary messages before user presentation.

Signed-off-by: Andrea Gottardo <andrea@gottardo.me>
2024-06-18 13:34:55 -07:00
Brad Fitzpatrick bd93c3067e wgengine/filter/filtertype: make Match.IPProto a view
I noticed we were allocating these every time when they could just
share the same memory. Rather than document ownership, just lock it
down with a view.

I was considering doing all of the fields but decided to just do this
one first as test to see how infectious it became.  Conclusion: not
very.

Updates #cleanup (while working towards tailscale/corp#20514)

Change-Id: I8ce08519de0c9a53f20292adfbecd970fe362de0
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2024-06-18 13:30:55 -07:00
Flakes Updater bfb775ce62 go.mod.sri: update SRI hash for go.mod changes
Signed-off-by: Flakes Updater <noreply+flakes-updater@tailscale.com>
2024-06-18 11:26:57 -07:00
Tom Proctor 3099323976
cmd/k8s-operator,k8s-operator,go.{mod,sum}: publish proxy status condition for annotated services (#12463)
Adds a new TailscaleProxyReady condition type for use in corev1.Service
conditions.

Also switch our CRDs to use metav1.Condition instead of
ConnectorCondition. The Go structs are seralized identically, but it
updates some descriptions and validation rules. Update k8s
controller-tools and controller-runtime deps to fix the documentation
generation for metav1.Condition so that it excludes comments and
TODOs.

Stop expecting the fake client to populate TypeMeta in tests. See
kubernetes-sigs/controller-runtime#2633 for details of the change.

Finally, make some minor improvements to validation for service hostnames.

Fixes #12216

Co-authored-by: Irbe Krumina <irbe@tailscale.com>
Signed-off-by: Tom Proctor <tomhjp@users.noreply.github.com>
2024-06-18 19:01:40 +01:00
Andrew Dunham 45d2f4301f proxymap, various: distinguish between different protocols
Previously, we were registering TCP and UDP connections in the same map,
which could result in erroneously removing a mapping if one of the two
connections completes while the other one is still active.

Add a "proto string" argument to these functions to avoid this.
Additionally, take the "proto" argument in LocalAPI, and plumb that
through from the CLI and add a new LocalClient method.

Updates tailscale/corp#20600

Signed-off-by: Andrew Dunham <andrew@du.nham.ca>
Change-Id: I35d5efaefdfbf4721e315b8ca123f0c8af9125fb
2024-06-18 13:29:41 -04:00
Aaron Klotz 2cb408f9b1 hostinfo: update Windows hostinfo to include MSIDist registry value
We need to expand our enviornment information to include info about
the Windows store. Thinking about future plans, it would be nice
to include both the packaging mechanism and the distribution mechanism.

In this PR we change packageTypeWindows to check a new registry value
named MSIDist, and concatenate that value to "msi/" when present.

We also remove vestigial NSIS detection.

Updates https://github.com/tailscale/corp/issues/2790

Signed-off-by: Aaron Klotz <aaron@tailscale.com>
2024-06-18 10:19:00 -06:00
James Tucker 87c5ad4c2c derp: add a verifyClients check to the consistency check
Only implemented for the local tailscaled variant for now.

Updates tailscale/corp#20844

Signed-off-by: James Tucker <james@tailscale.com>
2024-06-17 16:22:48 -07:00
Joe Tsai 2db2d04a37
types/logid: add Add method (#12478)
The Add method derives a new ID by adding a signed integer
to the ID, treating it as an unsigned 256-bit big-endian integer.

We also add Less and Compare methods to PrivateID to provide
feature parity with existing methods on PublicID.

Updates tailscale/corp#11038

Signed-off-by: Joe Tsai <joetsai@digital-static.net>
2024-06-17 16:03:44 -07:00
Jordan Whited 315f3d5df1
derp/xdp: fix handling of zero value UDP checksums (#12510)
validate_udp_checksum was previously indeterminate (not zero) at
declaration, and IPv4 zero value UDP checksum packets were being passed
to the kernel.

Updates tailscale/corp#20689

Signed-off-by: Jordan Whited <jordan@tailscale.com>
2024-06-17 14:06:53 -07:00
Irbe Krumina 8cc2738609
cmd/{containerboot,k8s-operator}: store proxy device ID early to help with cleanup for broken proxies (#12425)
* cmd/containerboot: store device ID before setting up proxy routes.

For containerboot instances whose state needs to be stored
in a Kubernetes Secret, we additonally store the device's
ID, FQDN and IPs.
This is used, between other, by the Kubernetes operator,
who uses the ID to delete the device when resources need
cleaning up and writes the FQDN and IPs on various kube
resource statuses for visibility.

This change shifts storing device ID earlier in the proxy setup flow,
to ensure that if proxy routing setup fails,
the device can still be deleted.

Updates tailscale/tailscale#12146

Signed-off-by: Irbe Krumina <irbe@tailscale.com>

* code review feedback

Signed-off-by: Irbe Krumina <irbe@tailscale.com>

---------

Signed-off-by: Irbe Krumina <irbe@tailscale.com>
2024-06-17 18:50:50 +01:00
Andrew Lytvynov 674c998e93
cmd/tailscale/cli: do not allow update --version on macOS (#12508)
We do not support specific version updates or track switching on macOS.
Do not populate the flag to avoid confusion.

Updates #cleanup

Signed-off-by: Andrew Lytvynov <awly@tailscale.com>
2024-06-17 10:33:26 -07:00