
Improve performance of FromBytesOrNil #206

Merged 1 commit on Mar 27, 2025

Conversation

@gaffneyc (Contributor) commented Mar 26, 2025

This pulls code from FromBytes and UnmarshalBinary into FromBytesOrNil which reduces the cost of creating a UUID from a byte slice. It also removes an allocation when the UUID is invalid as it no longer generates an error which is discarded. One downside of this approach is that it duplicates the logic from UnmarshalBinary.

```
goos: linux
goarch: amd64
pkg: github.com/gofrs/uuid/v5
cpu: AMD Ryzen 9 5950X 16-Core Processor
                        │    old.txt     │            new.txt            │
                        │     sec/op     │    sec/op     vs base         │
FromBytesOrNil/valid-32      3.814n ± 0%    1.118n ± 1%  -70.69% (n=100)
FromBytesOrNil/empty-32   135.3500n ± 0%   0.6514n ± 1%  -99.52% (n=100)
geomean                      22.72n        0.8534n       -96.24%

                        │   old.txt    │                 new.txt                  │
                        │     B/op     │    B/op     vs base                      │
FromBytesOrNil/valid-32   0.000 ± 0%     0.000 ± 0%         ~ (p=1.000 n=100) ¹
FromBytesOrNil/empty-32   96.00 ± 0%      0.00 ± 0%  -100.00% (n=100)

                        │   old.txt    │                 new.txt                  │
                        │  allocs/op   │ allocs/op   vs base                      │
FromBytesOrNil/valid-32   0.000 ± 0%     0.000 ± 0%         ~ (p=1.000 n=100) ¹
FromBytesOrNil/empty-32   2.000 ± 0%     0.000 ± 0%  -100.00% (n=100)
```

Benchmark for previous implementation

```
benchmark                            old ns/op     new ns/op     delta
BenchmarkFromBytesOrNil/valid-10     5.03          5.68          +12.82%
BenchmarkFromBytesOrNil/empty-10     114           2.04          -98.21%

benchmark                            old allocs     new allocs     delta
BenchmarkFromBytesOrNil/valid-10     0              0              +0.00%
BenchmarkFromBytesOrNil/empty-10     2              0              -100.00%

benchmark                            old bytes     new bytes     delta
BenchmarkFromBytesOrNil/valid-10     0             0             +0.00%
BenchmarkFromBytesOrNil/empty-10     96            0             -100.00%
```

codecov bot commented Mar 26, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 100.00%. Comparing base (1404e37) to head (0854f62).
Report is 1 commit behind head on master.

Additional details and impacted files
```
@@            Coverage Diff            @@
##            master      #206   +/-   ##
=========================================
  Coverage   100.00%   100.00%           
=========================================
  Files            5         5           
  Lines          548       552    +4     
=========================================
+ Hits           548       552    +4     
```


@kohenkatz (Contributor) commented Mar 26, 2025

Increasing the execution time by 13% for valid cases seems like a potential downside to me - not everyone's workload is the same as yours with many invalid values.

On the other hand, the actual time change is really not that much - we're talking about 0.65 ns/op - vs. the major improvement in the invalid case.

I'm leaning toward accepting this, but I'm not totally sure. Let's see what anyone else has to say.

@cameracker (Collaborator) commented Mar 26, 2025

@kohenkatz Not sure either. The speed up for the invalid cases is rather substantial, but how often in the wild would this case actually be encountered and on the happy path of a consumer's workflow? It's also a fairly specific invalid case of empty/partial string.

The sub nanosecond time increase on the valid case doesn't worry me that much. Willing to accept this trade but will probably give this a couple of weeks to see if anybody pipes up and raises a concern.

@gaffneyc would you mind adding a unit test to make sure we're at 100% coverage?

@dylan-bourque (Member)

I'd lean towards a no because of the +13% increase in the valid case: there are existing consumers that generate a very large volume of valid values (multiple trillions per day). While 0.65 nanoseconds is an extremely small amount of time, adding that to 3 trillion+ operations is an additional 22.5 hours of processing for every 24-hour window.

@gaffneyc (Contributor, Author) commented Mar 27, 2025

I took a look at the code coverage change: since we're checking the only case where FromBytes can return an error, there was no way to test the conditional, which is what caused the drop in test coverage. My intention with the original change was to make the smallest change necessary, as there may be other cases in the future where FromBytes could return an error (though I can't think of any).

That said, I've pushed an alternative implementation which pulls code from FromBytes and UnmarshalBinary into FromBytesOrNil. This avoids checking the length of the input twice as well as the overhead involved with calling multiple functions but duplicates the (minimal) logic for unmarshalling from a byte slice. I've also rerun the benchmarks on a less busy system and switched to benchstat for comparison.

By the way, the extra overhead of 0.65ns across 3 trillion calls would be 32 minutes and 30 seconds rather than 22 hours and 30 minutes.
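For reference, the arithmetic behind that correction can be checked directly (a standalone sketch, not part of the PR):

```go
package main

import "fmt"

func main() {
	const perCallSeconds = 0.65e-9 // 0.65 ns of extra overhead per call
	const calls = 3e12             // 3 trillion calls per day
	total := perCallSeconds * calls
	fmt.Printf("%.0f seconds = %.1f minutes\n", total, total/60)
	// prints: 1950 seconds = 32.5 minutes
}
```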

I've added the benchmarks in the main comment for the updated implementation. In the example with 3 trillion requests this would save ~2 hours and 15 minutes of compute time per day for valid UUIDs and ~112 hours and 15 minutes if all of those UUIDs were empty.

@gaffneyc gaffneyc changed the title Improve performance of FromBytesOrNil when UUID is invalid Improve performance of FromBytesOrNil Mar 27, 2025
@kohenkatz (Contributor)

@gaffneyc I like this new version!

> ... but duplicates the (minimal) logic for unmarshalling from a byte slice.

I am not concerned about that, but I think there should be a comment here explaining that it has been done on purpose for performance reasons so that some future contributor (or ourselves in 6 months) doesn't try to "fix" this and consolidate it again.

@gaffneyc (Contributor, Author)

@kohenkatz I've added a comment though let me know if you think it could be more clear. Thanks!

@kohenkatz (Contributor)

Looks reasonable to me.

Let's see if @cameracker and @dylan-bourque (or anyone else) have thoughts.

@dylan-bourque (Member)

I ran the benchmarks locally and saw improvements in both time and allocs.

```
$ benchstat before.txt after.txt
goos: darwin
goarch: arm64
pkg: github.com/gofrs/uuid/v5
cpu: Apple M1 Max
                        │   before.txt   │              after.txt               │
                        │     sec/op     │    sec/op     vs base                │
FromBytesOrNil/valid-10      5.186n ± 2%    1.790n ± 2%  -65.49% (p=0.000 n=20)
FromBytesOrNil/empty-10   115.0000n ± 1%   0.5857n ± 2%  -99.49% (p=0.000 n=20)
geomean                      24.42n         1.024n       -95.81%

                        │  before.txt  │                after.txt                │
                        │     B/op     │    B/op     vs base                     │
FromBytesOrNil/valid-10   0.000 ± 0%     0.000 ± 0%         ~ (p=1.000 n=20) ¹
FromBytesOrNil/empty-10   96.00 ± 0%      0.00 ± 0%  -100.00% (p=0.000 n=20)
geomean                              ²               ?                       ² ³
¹ all samples are equal
² summaries must be >0 to compute geomean
³ ratios must be >0 to compute geomean

                        │  before.txt  │                after.txt                │
                        │  allocs/op   │ allocs/op   vs base                     │
FromBytesOrNil/valid-10   0.000 ± 0%     0.000 ± 0%         ~ (p=1.000 n=20) ¹
FromBytesOrNil/empty-10   2.000 ± 0%     0.000 ± 0%  -100.00% (p=0.000 n=20)
geomean                              ²               ?                       ² ³
¹ all samples are equal
² summaries must be >0 to compute geomean
³ ratios must be >0 to compute geomean
```

@gaffneyc did you change something significant since this PR was first created that would have improved the behavior?

Regardless, my objection was to the ~13% increase, which doesn't seem to be manifesting anymore.

@dylan-bourque (Member)

Another option here to avoid allocating errors that are just discarded would be to change UnmarshalBinary() to return a constant error rather than using fmt.Errorf().

@gaffneyc (Contributor, Author)

@dylan-bourque Yes, I rewrote the implementation pulling code from FromBytes and UnmarshalBinary into FromBytesOrNil. This was the only way without changing UnmarshalBinary to avoid checking the input's length twice and to get code coverage back to 100%.

> Another option here to avoid allocating errors that are just discarded would be to change UnmarshalBinary() to return a constant error rather than using fmt.Errorf()

Yeah, that could work as well, though it would be a change in behavior. To keep the same behavior you could implement a custom type that wraps ErrIncorrectByteLength, stores the invalid size, and builds the full message only when Error() is called. Not sure what impact either option would have without benchmarking it.

The original implementation was:

```go
func FromBytesOrNil(input []byte) UUID {
	if len(input) != Size {
		return Nil
	}

	uuid, err := FromBytes(input)
	if err != nil {
		return Nil
	}
	return uuid
}
```

The new implementation is:

```go
func FromBytesOrNil(input []byte) UUID {
	if len(input) != Size {
		return Nil
	}

	uuid := UUID{}
	copy(uuid[:], input)

	return uuid
}
```

@cameracker (Collaborator)

@gaffneyc I'm thankful for the time you took to contribute this. Really solid improvement.

@cameracker cameracker merged commit a85a526 into gofrs:master Mar 27, 2025
7 checks passed