Improve performance of FromBytesOrNil #206
Conversation
Codecov Report: all modified and coverable lines are covered by tests ✅

```
@@           Coverage Diff            @@
##            master      #206   +/-  ##
=========================================
  Coverage   100.00%   100.00%
=========================================
  Files            5         5
  Lines          548       552     +4
=========================================
+ Hits           548       552     +4
```
Increasing the execution time by 13% for valid cases seems like a potential downside to me; not everyone's workload is like yours, with many invalid values. On the other hand, the actual time change is really not that much (we're talking about 0.65 ns/op) versus the major improvement in the invalid case. I'm leaning toward accepting this, but I'm not totally sure. Let's see whether anyone else weighs in.
@kohenkatz Not sure either. The speed-up for the invalid cases is rather substantial, but how often would this case actually be encountered in the wild, on the happy path of a consumer's workflow? It's also a fairly specific invalid case: empty or partial input. The sub-nanosecond time increase on the valid case doesn't worry me that much. I'm willing to accept this trade-off, but will probably give this a couple of weeks to see if anybody pipes up and raises a concern. @gaffneyc would you mind adding a unit test to make sure we're at 100% coverage?
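For reference, a minimal sketch of the kind of unit test being requested, exercising the valid, empty, and too-short paths of `FromBytesOrNil`; the test actually added to the PR may look different:

```go
package uuid

import (
	"bytes"
	"testing"
)

// Sketch only: covers the valid path plus the length-mismatch paths that
// return Nil, so the short-circuit added in this PR stays covered.
func TestFromBytesOrNilSketch(t *testing.T) {
	valid := []byte{0x6b, 0xa7, 0xb8, 0x10, 0x9d, 0xad, 0x11, 0xd1,
		0x80, 0xb4, 0x00, 0xc0, 0x4f, 0xd4, 0x30, 0xc8}

	if got := FromBytesOrNil(valid); !bytes.Equal(got[:], valid) {
		t.Errorf("expected UUID built from input bytes, got %s", got)
	}
	if got := FromBytesOrNil([]byte{}); got != Nil {
		t.Errorf("expected Nil for empty input, got %s", got)
	}
	if got := FromBytesOrNil(valid[:8]); got != Nil {
		t.Errorf("expected Nil for short input, got %s", got)
	}
}
```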
I'd lean towards a no because of the +13% increase in the valid case, since there are existing consumers that generate a very large volume of valid values (multiple trillions per day). While 0.65 nanoseconds is an extremely small amount of time, adding that to 3 trillion+ operations is an additional 22.5 hours of processing for every 24-hour window.
I took a look at the code coverage change: since we're now checking the length up front, the only case where `FromBytes` can return an error is no longer reachable from `FromBytesOrNil`.

That said, I've pushed an alternative implementation which pulls the relevant code from `FromBytes` and `UnmarshalBinary` directly into `FromBytesOrNil`.

By the way, the extra overhead of 0.65 ns across 3 trillion calls would be 32 minutes and 30 seconds rather than 22 hours and 30 minutes.

I've added the benchmarks for the updated implementation to the main comment. In the example with 3 trillion requests, this would save ~2 hours and 15 minutes of compute time per day for valid UUIDs, and ~112 hours and 15 minutes if all of those UUIDs were empty.
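A quick back-of-the-envelope check of that correction (a standalone snippet, not part of the library; the per-day call count is the figure quoted above):

```go
package main

import "fmt"

func main() {
	const extraNanosPerOp = 0.65 // measured extra time per valid call
	const callsPerDay = 3e12     // "multiple trillions of values per day"

	totalSeconds := extraNanosPerOp * callsPerDay / 1e9
	fmt.Printf("%.0f s ≈ %.1f minutes per day\n", totalSeconds, totalSeconds/60)
	// Output: 1950 s ≈ 32.5 minutes per day
}
```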
@gaffneyc I like this new version!
I am not concerned about the duplication, but I think there should be a comment here explaining that it has been done on purpose for performance reasons, so that some future contributor (or ourselves in 6 months) doesn't try to "fix" this and consolidate it again.
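For illustration, the requested comment might read roughly like this; this is a sketch only, and the wording that actually landed in the PR may differ:

```go
// FromBytesOrNil intentionally duplicates the decoding logic from
// FromBytes/UnmarshalBinary instead of calling them: going through the
// error-returning path allocates an error that is immediately discarded.
// Please don't "consolidate" this back; see PR #206 for benchmarks.
func FromBytesOrNil(input []byte) UUID {
	if len(input) != Size {
		return Nil
	}
	uuid := UUID{}
	copy(uuid[:], input)
	return uuid
}
```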
This pulls code from `FromBytes` and `UnmarshalBinary` into `FromBytesOrNil`, which reduces the cost of creating a UUID from a byte slice. It also removes an allocation when the UUID is invalid, as it no longer generates an error which is discarded. One downside of this approach is that it duplicates the logic from `UnmarshalBinary`.

```
goos: linux
goarch: amd64
pkg: github.com/gofrs/uuid/v5
cpu: AMD Ryzen 9 5950X 16-Core Processor

                          │   old.txt     │              new.txt               │
                          │    sec/op     │    sec/op     vs base              │
FromBytesOrNil/valid-32       3.814n ± 0%    1.118n ± 1%  -70.69% (n=100)
FromBytesOrNil/empty-32     135.3500n ± 0%   0.6514n ± 1%  -99.52% (n=100)
geomean                       22.72n         0.8534n      -96.24%

                          │   old.txt     │              new.txt               │
                          │     B/op      │     B/op      vs base              │
FromBytesOrNil/valid-32       0.000 ± 0%     0.000 ± 0%        ~ (p=1.000 n=100) ¹
FromBytesOrNil/empty-32       96.00 ± 0%      0.00 ± 0%  -100.00% (n=100)

                          │   old.txt     │              new.txt               │
                          │   allocs/op   │   allocs/op   vs base              │
FromBytesOrNil/valid-32       0.000 ± 0%     0.000 ± 0%        ~ (p=1.000 n=100) ¹
FromBytesOrNil/empty-32       2.000 ± 0%     0.000 ± 0%  -100.00% (n=100)
```
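As a rough sketch, benchmarks along these lines could produce output in that shape; the benchmark names and setup in the repository may differ:

```go
package uuid_test

import (
	"testing"

	"github.com/gofrs/uuid/v5"
)

// Sketch of a valid/empty benchmark pair for FromBytesOrNil, run against the
// old and new implementations and compared with benchstat.
func BenchmarkFromBytesOrNil(b *testing.B) {
	valid := make([]byte, uuid.Size) // any 16-byte slice is accepted
	empty := []byte{}

	b.Run("valid", func(b *testing.B) {
		for i := 0; i < b.N; i++ {
			_ = uuid.FromBytesOrNil(valid)
		}
	})
	b.Run("empty", func(b *testing.B) {
		for i := 0; i < b.N; i++ {
			_ = uuid.FromBytesOrNil(empty)
		}
	})
}
```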
@kohenkatz I've added a comment, though let me know if you think it could be clearer. Thanks!
Looks reasonable to me. Let's see if @cameracker and @dylan-bourque (or anyone else) have thoughts.
I ran the benchmarks locally and saw improvements in both time and allocs.
@gaffneyc did you change something significant since this PR was first created that would have improved the behavior? Regardless, my objection was to the ~13% increase, which doesn't seem to be manifesting anymore.
Another option here to avoid allocating errors that are just discarded would be to change how the error is constructed so that it doesn't allocate on every call.
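If that suggestion is along the lines of returning a package-level sentinel error, a minimal sketch might look like the following; the name `errInvalidLength` and the helper are hypothetical and not part of the library:

```go
package uuid

import "errors"

// Hypothetical sentinel: returning a predefined error avoids constructing a
// new one on every failed call, at the cost of no longer embedding the
// actual input length in the message.
var errInvalidLength = errors.New("uuid: UUID must be exactly 16 bytes long")

func fromBytesSentinel(input []byte) (UUID, error) {
	if len(input) != Size {
		return Nil, errInvalidLength
	}
	var u UUID
	copy(u[:], input)
	return u, nil
}
```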
@dylan-bourque Yes, I rewrote the implementation since this PR was first created, pulling code from `FromBytes` and `UnmarshalBinary` directly into `FromBytesOrNil`.

Yeah, that could work as well, though it would be a change in behavior. To keep the same behavior you could implement a custom error type that carries the same information.

The original implementation was:

```go
func FromBytesOrNil(input []byte) UUID {
	if len(input) != Size {
		return Nil
	}
	uuid, err := FromBytes(input)
	if err != nil {
		return Nil
	}
	return uuid
}
```

The new implementation is:

```go
func FromBytesOrNil(input []byte) UUID {
	if len(input) != Size {
		return Nil
	}
	uuid := UUID{}
	copy(uuid[:], input)
	return uuid
}
```
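For completeness, a small usage example; the caller-visible behavior is the same before and after the change (module path taken from the benchmark output above):

```go
package main

import (
	"fmt"

	"github.com/gofrs/uuid/v5"
)

func main() {
	raw := []byte{0x6b, 0xa7, 0xb8, 0x10, 0x9d, 0xad, 0x11, 0xd1,
		0x80, 0xb4, 0x00, 0xc0, 0x4f, 0xd4, 0x30, 0xc8}

	fmt.Println(uuid.FromBytesOrNil(raw))     // 6ba7b810-9dad-11d1-80b4-00c04fd430c8
	fmt.Println(uuid.FromBytesOrNil(raw[:4])) // 00000000-0000-0000-0000-000000000000 (Nil)
}
```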
@gaffneyc I'm thankful for the time you took to contribute this. Really solid improvement.