net/dns: re-query system resolvers on no-upstream resolver failure on apple platforms (#12398)
Fixes tailscale/corp#20677

On macOS sleep/wake, we're encountering a condition where we reconfigure the network a little too quickly, before Apple has set the nameservers for our interface. This results in a persistent condition where we have no upstream resolver and fail all forwarded DNS queries.

Having no upstream nameservers is a legitimate configuration, and we have no (good) way of determining when Apple is ready. But if we need to forward a query and we have no nameservers, then something has gone badly wrong and the network is very broken.

A simple fix is to inject a netMon event when we hit the SERVFAIL condition, which goes through the configuration dance again.

Tested by artificially/randomly returning [] for the list of nameservers in the bespoke ipn-bridge code responsible for getting the nameservers.

Signed-off-by: Jonathan Nobels <jonathan@tailscale.com>
parent d0f1a838a6
commit 02e3c046aa
@@ -14,6 +14,7 @@ import (
 	"net/http"
 	"net/netip"
 	"net/url"
+	"runtime"
 	"sort"
 	"strings"
 	"sync"
@@ -881,6 +882,24 @@ func (f *forwarder) forwardWithDestChan(ctx context.Context, query packet, respo
 	if len(resolvers) == 0 {
 		metricDNSFwdErrorNoUpstream.Add(1)
 		f.logf("no upstream resolvers set, returning SERVFAIL")
+
+		if runtime.GOOS == "darwin" || runtime.GOOS == "ios" {
+			// On Apple platforms, having no upstream resolvers here is the result of a race condition where
+			// we've tried a reconfig after a major link change but the system has not yet set
+			// the resolvers for the new link. We use SystemConfiguration to query nameservers, and
+			// the timing of when that will give us the "right" answer is non-deterministic.
+			//
+			// This will typically happen on sleep-wake cycles with a Wi-Fi interface where
+			// it takes some random amount of time (after telling us that the interface exists)
+			// for the system to configure the DNS servers.
+			//
+			// Repolling the network monitor here is a bit odd, but if we're
+			// seeing DNS queries, it's likely that the network is now fully configured, and it's
+			// an ideal time to requery for the nameservers.
+			f.logf("injecting network monitor event to attempt to refresh the resolvers")
+			f.netMon.InjectEvent()
+		}
+
 		res, err := servfailResponse(query)
 		if err != nil {
 			f.logf("building servfail response: %v", err)
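For readers who want to see the recovery path in isolation, here is a minimal, self-contained sketch of the idea. It is not the Tailscale implementation: `nameserverSource`, `flakySource`, `monitor`, `newForwarder`, `forward`, and `isApple` are hypothetical names invented for this illustration, and the real forwarder and network monitor do far more. The flaky source imitates the test approach described in the commit message by reporting an empty nameserver list for its first few calls.

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
)

// nameserverSource is a hypothetical stand-in for the platform code
// that reports the system's nameservers.
type nameserverSource interface {
	Nameservers() []string
}

// flakySource returns an empty list for its first `failures` calls,
// imitating a system that has not configured its nameservers yet.
type flakySource struct {
	mu       sync.Mutex
	failures int
	servers  []string
}

func (s *flakySource) Nameservers() []string {
	s.mu.Lock()
	defer s.mu.Unlock()
	if s.failures > 0 {
		s.failures--
		return nil
	}
	return s.servers
}

// monitor is a hypothetical, heavily simplified network monitor.
type monitor struct {
	mu        sync.Mutex
	callbacks []func()
}

func (m *monitor) RegisterChangeCallback(cb func()) {
	m.mu.Lock()
	defer m.mu.Unlock()
	m.callbacks = append(m.callbacks, cb)
}

// InjectEvent runs every registered callback, as if a link change occurred.
func (m *monitor) InjectEvent() {
	m.mu.Lock()
	cbs := append([]func(){}, m.callbacks...)
	m.mu.Unlock()
	for _, cb := range cbs {
		cb()
	}
}

// isApple mirrors the platform check used in the diff.
func isApple() bool {
	return runtime.GOOS == "darwin" || runtime.GOOS == "ios"
}

type forwarder struct {
	mu             sync.Mutex
	resolvers      []string
	src            nameserverSource
	mon            *monitor
	requeryOnEmpty bool
}

func newForwarder(src nameserverSource, mon *monitor, requeryOnEmpty bool) *forwarder {
	f := &forwarder{src: src, mon: mon, requeryOnEmpty: requeryOnEmpty}
	mon.RegisterChangeCallback(f.reconfigure)
	f.reconfigure() // initial configuration; may race with the system
	return f
}

// reconfigure re-reads the nameservers from the source.
func (f *forwarder) reconfigure() {
	servers := f.src.Nameservers()
	f.mu.Lock()
	f.resolvers = servers
	f.mu.Unlock()
}

// forward fails the current query when no resolver is known, but asks the
// monitor to re-run configuration so that later queries can succeed.
func (f *forwarder) forward(name string) string {
	f.mu.Lock()
	resolvers := f.resolvers
	f.mu.Unlock()
	if len(resolvers) == 0 {
		if f.requeryOnEmpty {
			f.mon.InjectEvent()
		}
		return "SERVFAIL"
	}
	return fmt.Sprintf("forwarded %s to %s", name, resolvers[0])
}
```

In this sketch a caller would construct the forwarder with `newForwarder(src, mon, isApple())`. The query that discovers the empty resolver list still fails; only subsequent queries benefit from the refreshed configuration, which matches the behaviour in the diff, where the SERVFAIL response is still built after the event is injected.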