
Improve performance of FromBytesOrNil #206

Merged 1 commit on Mar 27, 2025

Conversation

@gaffneyc (Contributor) commented Mar 26, 2025

This pulls code from FromBytes and UnmarshalBinary into FromBytesOrNil which reduces the cost of creating a UUID from a byte slice. It also removes an allocation when the UUID is invalid as it no longer generates an error which is discarded. One downside of this approach is that it duplicates the logic from UnmarshalBinary.

```
goos: linux
goarch: amd64
pkg: github.com/gofrs/uuid/v5
cpu: AMD Ryzen 9 5950X 16-Core Processor
                        │    old.txt     │            new.txt            │
                        │     sec/op     │    sec/op     vs base         │
FromBytesOrNil/valid-32      3.814n ± 0%    1.118n ± 1%  -70.69% (n=100)
FromBytesOrNil/empty-32   135.3500n ± 0%   0.6514n ± 1%  -99.52% (n=100)
geomean                      22.72n        0.8534n       -96.24%

                        │   old.txt    │                 new.txt                  │
                        │     B/op     │    B/op     vs base                      │
FromBytesOrNil/valid-32   0.000 ± 0%     0.000 ± 0%         ~ (p=1.000 n=100) ¹
FromBytesOrNil/empty-32   96.00 ± 0%      0.00 ± 0%  -100.00% (n=100)

                        │   old.txt    │                 new.txt                  │
                        │  allocs/op   │ allocs/op   vs base                      │
FromBytesOrNil/valid-32   0.000 ± 0%     0.000 ± 0%         ~ (p=1.000 n=100) ¹
FromBytesOrNil/empty-32   2.000 ± 0%     0.000 ± 0%  -100.00% (n=100)
```

Benchmark for previous implementation

```
benchmark                            old ns/op     new ns/op     delta
BenchmarkFromBytesOrNil/valid-10     5.03          5.68          +12.82%
BenchmarkFromBytesOrNil/empty-10     114           2.04          -98.21%

benchmark                            old allocs     new allocs     delta
BenchmarkFromBytesOrNil/valid-10     0              0              +0.00%
BenchmarkFromBytesOrNil/empty-10     2              0              -100.00%

benchmark                            old bytes     new bytes     delta
BenchmarkFromBytesOrNil/valid-10     0             0             +0.00%
BenchmarkFromBytesOrNil/empty-10     96            0             -100.00%
```

codecov bot commented Mar 26, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 100.00%. Comparing base (1404e37) to head (0854f62).
Report is 1 commit behind head on master.

Additional details and impacted files
```
@@            Coverage Diff            @@
##            master      #206   +/-   ##
=========================================
  Coverage   100.00%   100.00%           
=========================================
  Files            5         5           
  Lines          548       552    +4     
=========================================
+ Hits           548       552    +4     
```


@kohenkatz (Contributor) commented Mar 26, 2025

Increasing the execution time by 13% for valid cases seems like a potential downside to me - not everyone's workload is the same as yours with many invalid values.

On the other hand, the actual time change is really not that much - we're talking about 0.65 ns/op - vs. the major improvement in the invalid case.

I'm leaning toward accepting this, but I'm not totally sure. Let's see what anyone else has to say.

@cameracker (Collaborator) commented Mar 26, 2025

@kohenkatz Not sure either. The speed up for the invalid cases is rather substantial, but how often in the wild would this case actually be encountered and on the happy path of a consumer's workflow? It's also a fairly specific invalid case of empty/partial string.

The sub nanosecond time increase on the valid case doesn't worry me that much. Willing to accept this trade but will probably give this a couple of weeks to see if anybody pipes up and raises a concern.

@gaffneyc would you mind adding a unit test to make sure we're at 100% coverage?

@dylan-bourque (Member)

I'd lean towards a no because of the +13% increase in the valid case: there are existing consumers that generate a very large volume of valid values (multiple trillions per day). While 0.65 nanoseconds is an extremely small amount of time, adding that to 3 trillion+ operations is an additional 22.5 hours of processing for every 24-hour window.

@gaffneyc (Contributor, Author) commented Mar 27, 2025

I took a look at the code coverage change: since we're checking the only case where FromBytes can return an error, there was no way to test the conditional, which is what caused the drop in test coverage. My intention with the original change was to make the smallest change necessary, as there may be other cases in the future where FromBytes could return an error (though I can't think of any).

That said, I've pushed an alternative implementation which pulls code from FromBytes and UnmarshalBinary into FromBytesOrNil. This avoids checking the length of the input twice as well as the overhead involved with calling multiple functions but duplicates the (minimal) logic for unmarshalling from a byte slice. I've also rerun the benchmarks on a less busy system and switched to benchstat for comparison.

By the way, the extra overhead of 0.65ns across 3 trillion calls would be 32 minutes and 30 seconds rather than 22 hours and 30 minutes.
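For reference, the arithmetic behind that correction can be checked directly (a standalone sketch, not part of the PR):

```go
package main

import "fmt"

func main() {
	const perCallSeconds = 0.65e-9 // 0.65 ns of extra overhead per call
	const calls = 3e12             // 3 trillion calls per day
	total := perCallSeconds * calls
	fmt.Printf("%.0f seconds = %.1f minutes\n", total, total/60)
	// prints: 1950 seconds = 32.5 minutes
}
```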

I've added the benchmarks in the main comment for the updated implementation. In the example with 3 trillion requests this would save ~2 hours and 15 minutes of compute time per day for valid UUIDs and ~112 hours and 15 minutes if all of those UUIDs were empty.

@gaffneyc gaffneyc changed the title Improve performance of FromBytesOrNil when UUID is invalid Improve performance of FromBytesOrNil Mar 27, 2025
@kohenkatz (Contributor)

@gaffneyc I like this new version!

> ... but duplicates the (minimal) logic for unmarshalling from a byte slice.

I am not concerned about that, but I think there should be a comment here explaining that it has been done on purpose for performance reasons so that some future contributor (or ourselves in 6 months) doesn't try to "fix" this and consolidate it again.

@gaffneyc (Contributor, Author)

@kohenkatz I've added a comment though let me know if you think it could be more clear. Thanks!

@kohenkatz (Contributor)

Looks reasonable to me.

Let's see if @cameracker and @dylan-bourque (or anyone else) have thoughts.

@dylan-bourque (Member)

I ran the benchmarks locally and saw improvements in both time and allocs.

```
$ benchstat before.txt after.txt
goos: darwin
goarch: arm64
pkg: github.com/gofrs/uuid/v5
cpu: Apple M1 Max
                        │   before.txt   │              after.txt               │
                        │     sec/op     │    sec/op     vs base                │
FromBytesOrNil/valid-10      5.186n ± 2%    1.790n ± 2%  -65.49% (p=0.000 n=20)
FromBytesOrNil/empty-10   115.0000n ± 1%   0.5857n ± 2%  -99.49% (p=0.000 n=20)
geomean                      24.42n         1.024n       -95.81%

                        │  before.txt  │                after.txt                │
                        │     B/op     │    B/op     vs base                     │
FromBytesOrNil/valid-10   0.000 ± 0%     0.000 ± 0%         ~ (p=1.000 n=20) ¹
FromBytesOrNil/empty-10   96.00 ± 0%      0.00 ± 0%  -100.00% (p=0.000 n=20)
geomean                              ²               ?                       ² ³
¹ all samples are equal
² summaries must be >0 to compute geomean
³ ratios must be >0 to compute geomean

                        │  before.txt  │                after.txt                │
                        │  allocs/op   │ allocs/op   vs base                     │
FromBytesOrNil/valid-10   0.000 ± 0%     0.000 ± 0%         ~ (p=1.000 n=20) ¹
FromBytesOrNil/empty-10   2.000 ± 0%     0.000 ± 0%  -100.00% (p=0.000 n=20)
geomean                              ²               ?                       ² ³
¹ all samples are equal
² summaries must be >0 to compute geomean
³ ratios must be >0 to compute geomean
```

@gaffneyc did you change something significant since this PR was first created that would have improved the behavior?

Regardless, my objection was to the ~13% increase, which doesn't seem to be manifesting anymore.

@dylan-bourque (Member)

Another option here to avoid allocating errors that are just discarded would be to change UnmarshalBinary() to return a constant error rather than using fmt.Errorf().

@gaffneyc (Contributor, Author)

@dylan-bourque Yes, I rewrote the implementation pulling code from FromBytes and UnmarshalBinary into FromBytesOrNil. This was the only way without changing UnmarshalBinary to avoid checking the input's length twice and to get code coverage back to 100%.

> Another option here to avoid allocating errors that are just discarded would be to change UnmarshalBinary() to return a constant error rather than using fmt.Errorf()

Yeah, that could work as well, though it would be a change in behavior. To keep the same behavior you could implement a custom type that wraps ErrIncorrectByteLength, stores the invalid size, and builds the full message only when Error() is called. Not sure what impact either option would have without benchmarking it.

The original implementation was:

```go
func FromBytesOrNil(input []byte) UUID {
	if len(input) != Size {
		return Nil
	}

	uuid, err := FromBytes(input)
	if err != nil {
		return Nil
	}
	return uuid
}
```

The new implementation is:

```go
func FromBytesOrNil(input []byte) UUID {
	if len(input) != Size {
		return Nil
	}

	uuid := UUID{}
	copy(uuid[:], input)

	return uuid
}
```

@cameracker (Collaborator)

@gaffneyc I'm thankful for the time you took to contribute this. Really solid improvement.

@cameracker cameracker merged commit a85a526 into gofrs:master Mar 27, 2025
7 checks passed