MonotonicCoarse provides a coarse monotonic time.
On some platforms, it is implemented in assembly,
which lets us do much less work than time.Now,
which gets both a high-precision monotonic time and
a high-precision wall time.
The assembly code is tied to a particular Go release
because it reaches into the Go internals
in order to switch to the system stack for the vDSO call.
On my darwin/arm64 machine, there is no measurable performance difference.
On my linux/amd64 machine, MonotonicCoarse is 5x faster (50ns -> 10ns).
On my linux/arm64 VM, MonotonicCoarse is 16x faster (64ns -> 4ns).
We could also use this in the rate limiter and magicsock,
which are two other uses of time.Now that show up in the CPU pprof
when doing throughput benchmarking.

Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>