Commit Graph

23 Commits

Author SHA1 Message Date
Brad Fitzpatrick 8c5c87be26 util/deephash: fix collisions between different types
Updates #4883

Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2022-06-21 22:29:04 -07:00
Brad Fitzpatrick 757ecf7e80 util/deephash: fix map hashing when key & element have the same type
Regression from 09afb8e35b, in which the
same reflect.Value scratch value was being used as the map iterator
copy destination.

Also: make nil and empty maps hash differently, add test.

Fixes #4871

Co-authored-by: Josh Bleecher Snyder <josharian@gmail.com>
Change-Id: I67f42524bc81f694c1b7259d6682200125ea4a66
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2022-06-16 22:29:47 -07:00
Brad Fitzpatrick f31588786f util/deephash: don't track cycles on non-recursive types
name              old time/op    new time/op    delta
Hash-8              67.3µs ±20%    76.5µs ±16%     ~     (p=0.143 n=10+10)
HashMapAcyclic-8    63.0µs ± 2%    56.3µs ± 1%  -10.65%  (p=0.000 n=10+8)
TailcfgNode-8       9.18µs ± 2%    6.52µs ± 3%  -28.96%  (p=0.000 n=9+10)
HashArray-8          732ns ± 3%     709ns ± 1%   -3.21%  (p=0.000 n=10+10)

name              old alloc/op   new alloc/op   delta
Hash-8               24.0B ± 0%     24.0B ± 0%     ~     (all equal)
HashMapAcyclic-8     0.00B          0.00B          ~     (all equal)
TailcfgNode-8        0.00B          0.00B          ~     (all equal)
HashArray-8          0.00B          0.00B          ~     (all equal)

name              old allocs/op  new allocs/op  delta
Hash-8                1.00 ± 0%      1.00 ± 0%     ~     (all equal)
HashMapAcyclic-8      0.00           0.00          ~     (all equal)
TailcfgNode-8         0.00           0.00          ~     (all equal)
HashArray-8           0.00           0.00          ~     (all equal)

Change-Id: I28642050d837dff66b2db54b2b0e6d272a930be8
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2022-06-16 18:39:10 -07:00
Brad Fitzpatrick 36ea837736 util/deephash: fix map hashing to actually hash elements
Fixes #4868

Change-Id: I574fd139cb7f7033dd93527344e6aa0e625477c7
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2022-06-16 11:32:12 -07:00
Josh Bleecher Snyder 0868329936 all: use any instead of interface{}
My favorite part of generics.

Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
2022-03-17 11:35:09 -07:00
Josh Bleecher Snyder 97a01b7b17 util/deephash: remove Tailscale toolchain compatibility shim
The future is now.

Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
2022-03-16 12:45:28 -07:00
slowy07 ac0353e982 fix: typo spelling grammar
Signed-off-by: slowy07 <slowy.arfy@gmail.com>
2021-08-24 07:55:04 -07:00
Joe Tsai 9d0c86b6ec
util/deephash: remove unnecessary formatting for structs and slices (#2571)
The index for every struct field or slice element and
the number of fields for the struct is unncessary.

The hashing of Go values is unambiguous because every type (except maps)
encodes in a parsable manner. So long as we know the type information,
we could theoretically decode every value (except for maps).

At a high level:
* numbers are encoded as fixed-width records according to precision.
* strings (and AppendTo output) are encoded with a fixed-width length,
followed by the contents of the buffer.
* slices are prefixed by a fixed-width length, followed by the encoding
of each value. So long as we know the type of each element, we could
theoretically decode each element.
* arrays are encoded just like slices, but elide the length
since it is determined from the Go type.
* maps are encoded first with a byte indicating whether it is a cycle.
If a cycle, it is followed by a fixed-width index for the pointer,
otherwise followed by the SHA-256 hash of its contents. The encoding of maps
is not decodeable, but a SHA-256 hash is sufficient to avoid ambiguities.
* interfaces are encoded first with a byte indicating whether it is nil.
If not nil, it is followed by a fixed-width index for the type,
and then the encoding for the underlying value. Having the type be encoded
first ensures that the value could theoretically be decoded next.
* pointers are encoded first with a byte indicating whether it is
1) nil, 2) a cycle, or 3) newly seen. If a cycle, it is followed by
a fixed-width index for the pointer. If newly seen, it is followed by
the encoding for the pointed-at value.

Removing unnecessary details speeds up hashing:

	name              old time/op    new time/op    delta
	Hash-8              76.0µs ± 1%    55.8µs ± 2%  -26.62%        (p=0.000 n=10+10)
	HashMapAcyclic-8    61.9µs ± 0%    62.0µs ± 0%     ~             (p=0.666 n=9+9)
	TailcfgNode-8       10.2µs ± 1%     7.5µs ± 1%  -26.90%         (p=0.000 n=10+9)
	HashArray-8         1.07µs ± 1%    0.70µs ± 1%  -34.67%         (p=0.000 n=10+9)

Signed-off-by: Joe Tsai <joetsai@digital-static.net>
2021-08-03 20:35:57 -07:00
Joe Tsai d8fbce7eef
util/deephash: hash uint{8,16,32,64} explicitly (#2502)
Instead of hashing the humanly formatted forms of a number,
hash the native machine bits of the integers themselves.

There is a small performance gain for this:
	name              old time/op    new time/op    delta
	Hash-8              75.7µs ± 1%    76.0µs ± 2%    ~            (p=0.315 n=10+9)
	HashMapAcyclic-8    63.1µs ± 3%    61.3µs ± 1%  -2.77%        (p=0.000 n=10+10)
	TailcfgNode-8       10.3µs ± 1%    10.2µs ± 1%  -1.48%        (p=0.000 n=10+10)
	HashArray-8         1.07µs ± 1%    1.05µs ± 1%  -1.79%        (p=0.000 n=10+10)

Signed-off-by: Joe Tsai <joetsai@digital-static.net>
2021-08-02 21:44:13 -07:00
Joe Tsai 01d4dd331d
util/deephash: simplify hasher.hashMap (#2503)
The swapping of bufio.Writer between hasher and mapHasher is subtle.
Just embed a hasher in mapHasher to avoid complexity here.

No notable change in performance:
	name              old time/op    new time/op    delta
	Hash-8              76.7µs ± 1%    77.0µs ± 1%    ~            (p=0.182 n=9+10)
	HashMapAcyclic-8    62.4µs ± 1%    62.5µs ± 1%    ~            (p=0.315 n=10+9)
	TailcfgNode-8       10.3µs ± 1%    10.3µs ± 1%  -0.62%         (p=0.004 n=10+9)
	HashArray-8         1.07µs ± 1%    1.06µs ± 1%  -0.98%          (p=0.001 n=8+9)

Signed-off-by: Joe Tsai <joetsai@digital-static.net>
2021-08-02 21:29:14 -07:00
Joe Tsai d145c594ad
util/deephash: improve cycle detection (#2470)
The previous algorithm used a map of all visited pointers.
The strength of this approach is that it quickly prunes any nodes
that we have ever visited before. The detriment of the approach
is that pruning is heavily dependent on the order that pointers
were visited. This is especially relevant for hashing a map
where map entries are visited in a non-deterministic manner,
which would cause the map hash to be non-deterministic
(which defeats the point of a hash).

This new algorithm uses a stack of all visited pointers,
similar to how github.com/google/go-cmp performs cycle detection.
When we visit a pointer, we push it onto the stack, and when
we leave a pointer, we pop it from the stack.
Before visiting a pointer, we first check whether the pointer exists
anywhere in the stack. If yes, then we prune the node.
The detriment of this approach is that we may hash a node more often
than before since we do not prune as aggressively.

The set of visited pointers up until any node is only the
path of nodes up to that node and not any other pointers
that may have been visited elsewhere. This provides us
deterministic hashing regardless of visit order.
We can now delete hashMapFallback and associated complexity,
which only exists because the previous approach was non-deterministic
in the presence of cycles.

This fixes a failure of the old algorithm where obviously different
values are treated as equal because the pruning was too aggresive.
See https://github.com/tailscale/tailscale/issues/2443#issuecomment-883653534

The new algorithm is slightly slower since it prunes less aggresively:
	name              old time/op    new time/op    delta
	Hash-8              66.1µs ± 1%    68.8µs ± 1%   +4.09%        (p=0.000 n=19+19)
	HashMapAcyclic-8    63.0µs ± 1%    62.5µs ± 1%   -0.76%        (p=0.000 n=18+19)
	TailcfgNode-8       9.79µs ± 2%    9.88µs ± 1%   +0.95%        (p=0.000 n=19+17)
	HashArray-8          643ns ± 1%     653ns ± 1%   +1.64%        (p=0.000 n=19+19)
However, a slower but more correct algorithm seems
more favorable than a faster but incorrect algorithm.

Signed-off-by: Joe Tsai <joetsai@digital-static.net>
2021-07-22 15:22:48 -07:00
Joe Tsai d666bd8533
util/deephash: disambiguate hashing of AppendTo (#2483)
Prepend size to AppendTo output.

Fixes #2443

Signed-off-by: Joe Tsai <joetsai@digital-static.net>
2021-07-21 11:29:08 -07:00
Joe Tsai 23ad028414
util/deephash: include type as part of hash for interfaces (#2476)
A Go interface may hold any number of different concrete types.
Just because two underlying values hash to the same thing
does not mean the two values are identical if they have different
concrete types. As such, include the type in the hash.
2021-07-21 10:26:04 -07:00
Joe Tsai a5fb8e0731
util/deephash: introduce deliberate instability (#2477)
Seed the hash upon first use with the current time.
This ensures that the stability of the hash is bounded within
the lifetime of one program execution.
Hopefully, this prevents future bugs where someone assumes that
this hash is stable.

Signed-off-by: Joe Tsai <joetsai@digital-static.net>
2021-07-21 09:23:04 -07:00
Joe Tsai 9a0c8bdd20 util/deephash: make hash type opaque
The fact that Hash returns a [sha256.Size]byte leaks details about
the underlying hash implementation. This could very well be any other
hashing algorithm with a possible different block size.

Abstract this implementation detail away by declaring an opaque type
that is comparable. While we are changing the signature of UpdateHash,
rename it to just Update to reduce stutter (e.g., deephash.Update).

Signed-off-by: Joe Tsai <joetsai@digital-static.net>
2021-07-20 11:03:25 -07:00
Brad Fitzpatrick ddb8726c98 util/deephash: don't reflect.Copy if element type is a defined uint8
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2021-07-07 11:58:04 -07:00
Brad Fitzpatrick 6dc38ff25c util/deephash: optimize hashing of byte arrays, reduce allocs in Hash
name              old time/op    new time/op    delta
Hash-6               173µs ± 4%     101µs ± 3%   -41.69%  (p=0.000 n=10+9)
HashMapAcyclic-6     101µs ± 5%     105µs ± 3%    +3.52%  (p=0.001 n=9+10)
TailcfgNode-6       29.4µs ± 2%    16.4µs ± 3%   -44.25%  (p=0.000 n=8+10)

name              old alloc/op   new alloc/op   delta
Hash-6              3.60kB ± 0%    1.13kB ± 0%   -68.70%  (p=0.000 n=10+10)
HashMapAcyclic-6    2.53kB ± 0%    2.53kB ± 0%      ~     (p=0.137 n=10+8)
TailcfgNode-6         528B ± 0%        0B       -100.00%  (p=0.000 n=10+10)

name              old allocs/op  new allocs/op  delta
Hash-6                84.0 ± 0%      40.0 ± 0%   -52.38%  (p=0.000 n=10+10)
HashMapAcyclic-6       202 ± 0%       202 ± 0%      ~     (all equal)
TailcfgNode-6         11.0 ± 0%       0.0       -100.00%  (p=0.000 n=10+10)

Updates tailscale/corp#2130

Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2021-07-07 11:30:49 -07:00
Brad Fitzpatrick 3962744450 util/deephash: prevent infinite loop on map cycle
Fixes #2340

Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2021-07-07 10:57:46 -07:00
Brad Fitzpatrick aceaa70b16 util/deephash: move funcs to methods
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2021-07-07 08:17:18 -07:00
Brad Fitzpatrick 14f901da6d util/deephash: fix sync.Pool usage
Whoops.

From yesterday's 9ae3bd0939 (not yet
used by anything, fortunately)

Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2021-07-05 22:21:44 -07:00
Brad Fitzpatrick bf9f279768 util/deephash: optimize CPU a bit by by avoiding fmt in more places
name              old time/op    new time/op    delta
Hash-6               179µs ± 5%     173µs ± 4%   -3.12%  (p=0.004 n=10+10)
HashMapAcyclic-6     115µs ± 3%     101µs ± 5%  -11.51%  (p=0.000 n=9+9)
TailcfgNode-6       30.8µs ± 4%    29.4µs ± 2%   -4.51%  (p=0.000 n=10+8)

name              old alloc/op   new alloc/op   delta
Hash-6              3.60kB ± 0%    3.60kB ± 0%     ~     (p=0.445 n=9+10)
HashMapAcyclic-6    2.53kB ± 0%    2.53kB ± 0%     ~     (p=0.065 n=9+10)
TailcfgNode-6         528B ± 0%      528B ± 0%     ~     (all equal)

name              old allocs/op  new allocs/op  delta
Hash-6                84.0 ± 0%      84.0 ± 0%     ~     (all equal)
HashMapAcyclic-6       202 ± 0%       202 ± 0%     ~     (all equal)
TailcfgNode-6         11.0 ± 0%      11.0 ± 0%     ~     (all equal)

Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2021-07-05 21:28:54 -07:00
Brad Fitzpatrick 9ae3bd0939 util/deephash: export a Hash func for use by the control plane
name              old time/op    new time/op    delta
Hash-6              69.4µs ± 6%    68.4µs ± 4%     ~     (p=0.286 n=9+9)
HashMapAcyclic-6     115µs ± 5%     115µs ± 4%     ~     (p=1.000 n=10+10)

name              old alloc/op   new alloc/op   delta
Hash-6              2.29kB ± 0%    1.88kB ± 0%  -18.13%  (p=0.000 n=10+10)
HashMapAcyclic-6    2.53kB ± 0%    2.53kB ± 0%     ~     (all equal)

name              old allocs/op  new allocs/op  delta
Hash-6                58.0 ± 0%      54.0 ± 0%   -6.90%  (p=0.000 n=10+10)
HashMapAcyclic-6       202 ± 0%       202 ± 0%     ~     (all equal)

Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
2021-07-05 11:41:44 -07:00
Brad Fitzpatrick 700badd8f8 util/deephash: move internal/deephash to util/deephash
No code changes. Just a minor package doc addition about lack of API
stability.
2021-07-02 21:33:02 -07:00