Commit Graph

29 Commits

Author SHA1 Message Date
Brad Fitzpatrick 3f8e185003 health: add Warnable, move ownership of warnable items to callers
The health package was turning into a rando dumping ground. Make a new
Warnable type instead that callers can request an instance of, and
then Set it locally in their code without the health package being
aware of all the things that are warnable. (For plenty of things the
health package will want to know details of how Tailscale works so it
can better prioritize/suppress errors, but lots of the warnings are
pretty leaf-y and unrelated)

This just moves two of the health warnings. Can probably move more
later.

Change-Id: I51e50e46eb633f4e96ced503d3b18a1891de1452
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2022-11-13 08:00:27 -08:00
Brad Fitzpatrick 08e110ebc5 cmd/tailscale: make "up", "status" warn if routes and --accept-routes off
Example output:

    # Health check:
    #     - Some peers are advertising routes but --accept-routes is false

Also, move "tailscale status" health checks to the bottom, where they
won't be lost in large netmaps.

Updates #2053
Updates #6266

Change-Id: I5ae76a0cd69a452ce70063875cd7d974bfeb8f1a
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2022-11-11 10:56:50 -08:00
Brad Fitzpatrick e55ae53169 tailcfg: add Node.UnsignedPeerAPIOnly to let server mark node as peerapi-only
capver 48

Change-Id: I20b2fa81d61ef8cc8a84e5f2afeefb68832bd904
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2022-11-02 21:55:04 -07:00
Brad Fitzpatrick 3562b5bdfa envknob, health: support Synology, show parse errors in status
Updates #5114

Change-Id: I8ac7a22a511f5a7d0dcb8cac470d4a403aa8c817
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2022-09-17 08:42:41 -07:00
Brad Fitzpatrick 74674b110d envknob: support changing envknobs post-init
Updates #5114

Change-Id: Ia423fc7486e1b3f3180a26308278be0086fae49b
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2022-09-15 15:04:02 -07:00
Jordan Whited 43f9c25fd2
cmd/tailscale: surface authentication errors in status.Health (#4748)
Fixes #3713

Signed-off-by: Jordan Whited <jordan@tailscale.com>
2022-06-03 10:52:07 -07:00
Brad Fitzpatrick 2ff481ff10 net/dns: add health check for particular broken-ish Linux DNS config
Updates #3937 (need to write docs before closing)

Change-Id: I1df7244cfbb0303481e2621ee750d21358bd67c6
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2022-02-16 10:40:04 -08:00
Brad Fitzpatrick 41fd4eab5c envknob: add new package for all the strconv.ParseBool(os.Getenv(..))
A new package can also later record/report which knobs are checked and
set. It also makes the code cleaner & easier to grep for env knobs.

Change-Id: Id8a123ab7539f1fadbd27e0cbeac79c2e4f09751
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2022-01-24 11:51:23 -08:00
Brad Fitzpatrick 0aa4c6f147 net/dns/resolver: add debug HTML handler to see what DNS traffic was forwarded
Change-Id: I6b790e92dcc608515ac8b178f2271adc9fd98f78
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2021-12-21 14:32:36 -08:00
Brad Fitzpatrick 8f43ddf1a2 ipn/ipnlocal, health: populate self node's Online bit in tailscale status
One option was to just hide "offline" in the text output, but that
doesn't fix the JSON output.

The next option was to lie and say it's online in the JSON (which then
fixes the "offline" in the text output).

But instead, this sets the self node's "Online" to whether we're in an
active map poll.

Fixes #3564

Change-Id: I9b379989bd14655198959e37eec39bb570fb814a
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2021-12-16 10:14:08 -08:00
David Anderson 6c82cebe57 health: add a health state for net/dns.OSConfigurator.
Lets the systemd-resolved OSConfigurator report health changes
for out of band config resyncs.

Updates #3327

Signed-off-by: David Anderson <danderson@tailscale.com>
2021-11-19 11:09:32 -08:00
Josh Bleecher Snyder 3fd5f4380f util/multierr: new package
github.com/go-multierror/multierror served us well.
But we need a few feature from it (implement Is),
and it's not worth maintaining a fork of such a small module.

Instead, I did a clean room implementation inspired by its API.

Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
2021-11-02 17:50:15 -07:00
Brad Fitzpatrick 09e692e318 health: don't look for UDP goroutines in js/wasm health check
Updates #3157

Change-Id: I43d97e6876eeb2d1936fc567835134568bb8615c
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2021-10-22 09:12:00 -07:00
Brad Fitzpatrick aae622314e tailcfg, health: add way for control plane to add problems to health check
So if the control plane knows that something's broken about the node, it can
include problem(s) in MapResponse and "tailscale status" will show it.
(and GUIs in the future, as it's in ipnstate.Status/JSON)

This also bumps the MapRequest.Version, though it's not strictly
required. Doesn't hurt.

Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2021-09-19 17:55:49 -07:00
Brad Fitzpatrick 4549d3151c cmd/tailscale: make status show health check problems
Fixes #2775

RELNOTE=tailscale status now shows health check problems

Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2021-09-16 11:24:59 -07:00
Brad Fitzpatrick 5bacbf3744 wgengine/magicsock, health, ipn/ipnstate: track DERP-advertised health
And add health check errors to ipnstate.Status (tailscale status --json).

Updates #2746
Updates #2775

Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2021-09-02 10:20:25 -07:00
Josh Bleecher Snyder 9d542e08e2 wgengine/magicsock: always run ReceiveIPv6
One of the consequences of the 	bind refactoring in 6f23087175
is that attempting to bind an IPv6 socket will always
result in c.pconn6.pconn being non-nil.
If the bind fails, it'll be set to a placeholder packet conn
that blocks forever.

As a result, we can always run ReceiveIPv6 and health check it.
This removes IPv4/IPv6 asymmetry and also will allow health checks
to detect any IPv6 receive func failures.

Signed-off-by: Josh Bleecher Snyder <josharian@gmail.com>
2021-04-28 11:07:14 -07:00
Josh Bleecher Snyder fe50ded95c health: track whether we have a functional udp4 bind
Suggested-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Signed-off-by: Josh Bleecher Snyder <josharian@gmail.com>
2021-04-28 11:07:14 -07:00
Josh Bleecher Snyder 744de615f1 health, wgenegine: fix receive func health checks for the fourth time
The old implementation knew too much about how wireguard-go worked.
As a result, it missed genuine problems that occurred due to unrelated bugs.

This fourth attempt to fix the health checks takes a black box approach.
A receive func is healthy if one (or both) of these conditions holds:

* It is currently running and blocked.
* It has been executed recently.

The second condition is required because receive functions
are not continuously executing. wireguard-go calls them and then
processes their results before calling them again.

There is a theoretical false positive if wireguard-go go takes
longer than one minute to process the results of a receive func execution.
If that happens, we have other problems.

Updates #1790

Signed-off-by: Josh Bleecher Snyder <josharian@gmail.com>
2021-04-26 17:35:49 -07:00
Josh Bleecher Snyder 0d4c8cb2e1 health: delete ReceiveFunc health checks
They were not doing their job.
They need yet another conceptual re-think.
Start by clearing the decks.

Signed-off-by: Josh Bleecher Snyder <josharian@gmail.com>
2021-04-26 17:35:49 -07:00
Josh Bleecher Snyder 8d7f7fc7ce health, wgenegine: fix receive func health checks yet again
The existing implementation was completely, embarrassingly conceptually broken.

We aren't able to see whether wireguard-go's receive function goroutines
are running or not. All we can do is model that based on what we have done.
This commit fixes that model.

Fixes #1781

Signed-off-by: Josh Bleecher Snyder <josharian@gmail.com>
2021-04-23 08:42:04 -07:00
Josh Bleecher Snyder 5835a3f553 health, wgengine/magicsock: avoid receive function false positives
Avery reported a sub-ms health transition from "receiveIPv4 not running" to "ok".

To avoid these transient false-positives, be more precise about
the expected lifetime of receive funcs. The problematic case is one in which
they were started but exited prior to a call to connBind.Close.
Explicitly represent started vs running state, taking care with the order of updates.

Signed-off-by: Josh Bleecher Snyder <josharian@gmail.com>
2021-04-22 12:48:10 -07:00
Josh Bleecher Snyder f845aae761 health: track whether magicsock receive functions are running
Signed-off-by: Josh Bleecher Snyder <josharian@gmail.com>
2021-04-22 08:57:36 -07:00
David Anderson f007a9dd6b health: add DNS subsystem and plumb errors in.
Signed-off-by: David Anderson <danderson@tailscale.com>
2021-04-05 10:55:35 -07:00
Brad Fitzpatrick 85138d3183 health: track whether any network interface is up
Fixes #1562

Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2021-03-22 21:42:14 -07:00
Brad Fitzpatrick 9eb65601ef health, ipn/ipnlocal: track, log overall health
Updates #1505

Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2021-03-16 09:12:39 -07:00
Brad Fitzpatrick 232cfda280 wgengine/router: report to control when setPrivateNetwork fails
Fixes #1503

Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2021-03-15 16:19:40 -07:00
Brad Fitzpatrick ba8c6d0775 health, controlclient, ipn, magicsock: tell health package state of things
Not yet checking anything. Just plumbing states into the health package.

Updates #1505

Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2021-03-15 15:20:55 -07:00
Brad Fitzpatrick fd8e070d01 health, control/controlclient, wgengine: report when router unhealthy
Updates tailscale/corp#1338

Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2021-02-18 11:48:48 -08:00