-
Notifications
You must be signed in to change notification settings - Fork 19
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Optimize the no-escape case with strpbrk #29
Conversation
Typically, strpbrk(3) is optimized pretty well with SIMD instructions. Just using it makes this as fast as a SIMD-based implementation for the no-escape case. Not utilizing this for escaped cases because memory allocation would be a more significant bottleneck for many strings anyway. Also, there'll be some overhead in calling a C function (strpbrk) many times because we're not using SIMD instructions directly. So using strpbrk all the time might not necessarily be faster.
(ruby/erb#29) Typically, strpbrk(3) is optimized pretty well with SIMD instructions. Just using it makes this as fast as a SIMD-based implementation for the no-escape case. Not utilizing this for escaped cases because memory allocation would be a more significant bottleneck for many strings anyway. Also, there'll be some overhead in calling a C function (strpbrk) many times because we're not using SIMD instructions directly. So using strpbrk all the time might not necessarily be faster.
@@ -38,6 +38,12 @@ escaped_length(VALUE str) | |||
static VALUE | |||
optimized_escape_html(VALUE str) | |||
{ | |||
// Optimize the most common, no-escape case with strpbrk(3). Not using it after | |||
// this because calling a C function many times could be slower for some cases. | |||
if (strpbrk(RSTRING_PTR(str), "'&\"<>") == NULL) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I get not calling strpbrk
after that, but if it returned a pointer on the first call we could use that to directly skip to the first char to escape, no?
Maybe not worth the extra complexity?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried writing it for a bit and then stopped doing it for the extra complexity. If someone brings a simple enough patch and it's meaningfully better in benchmarks, I'd merge it though. Also note that there's f58476b too now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Interesting. Are your benchmarks committed anywhere? I may try that if I get bored in the plane.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please use ruby/ruby@f276d5a for it.
Out of curiosity, have you ran that same benchmark on M1? |
Ugh, actually it was way worse than I imagined. Perhaps it should have an #ifdef or be just reverted... Before
After
|
Yeah that's the problem with SIMD and such, it's always a bit of a gamble when you are multi-platform.
M1 perf isn't as critical given that it's very unlikely to be a production plafform. But with other ARM targets such as graviton etc, that will become important (if not already). I wonder if it's macOS or clang |
because it's much slower on M1 #29. It'd be too complicated to switch the implementation based on known optimized platforms / versions. Besides, short strings are the most common usages of this method and SIMD doesn't really help that case. All in all, I can't justify the existence of this code.
because it's much slower on M1 ruby/erb#29. It'd be too complicated to switch the implementation based on known optimized platforms / versions. Besides, short strings are the most common usages of this method and SIMD doesn't really help that case. All in all, I can't justify the existence of this code. ruby/erb@30691c8995
Here's my take for now 30691c8. Given that We need a benchmark that is convincing enough to ignore all the other slowdown, a better patch that prevents a slowdown in all cases/environments, or SIMD-based code that is simple enough. |
I know some people manage to get SIMD "for free" by writing their code in a way that compiler can easily vectorize. I've never done it myself though, so not too sure how hard it is. |
Actually using a Ruby Regexp on TruffleRuby gives you that :) (https://twitter.com/eregontp/status/1596493699160367104) |
This PR makes
ERB::Util.html_escape
even more optimized towards the most common, no-escaped case.Typically, strpbrk(3) is optimized pretty well with SIMD instructions. Just using it makes this as fast as a SIMD-based implementation for the no-escape case.
Not utilizing this for escaped cases because memory allocation would be a more significant bottleneck for many strings anyway. Also, there'll be some overhead in calling a C function (strpbrk) many times because we're not using SIMD instructions directly. So using strpbrk all the time might not necessarily be faster.
Benchmark
No escape ("abcde" * 300)
Based on a benchmark in ruby/ruby, this is how it defeats an SIMD-based implementation:
Benchmark script
Based on benchmark/cgi_escape_html.yml in ruby/ruby, but modified to compare different implementations:
Using hescape.gem 0.1.1.
Before
After
Other benchmarks
The following benchmarks are "After"-only. Only the string part of the "Benchmark script" is modified and reused. Please look at https://github.com/ruby/ruby/blob/f276d5a7fe28ae4fd2934af4befd920bb78cfa9e/benchmark/cgi_escape_html.yml to see the actual strings I used.
No escape ("")
No escape ("abcde")
Escaped ("abcd<")
Escaped ("'&"<>")
Escaped ("'&"<>" * 10)
Looking at a local variable
str = "'&\"<>" * 10
Escaped (example.com)
Some results in "Other benchmarks" are improved in: #30