-
Notifications
You must be signed in to change notification settings - Fork 10.3k
Add option to interpret request headers as Latin1 #18255
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
{ | ||
_context = context; | ||
|
||
ServerOptions = ServiceContext.ServerOptions; | ||
HttpRequestHeaders = new HttpRequestHeaders(reuseHeaderValues: !ServerOptions.DisableStringReuse); | ||
_useLatin1 = isHttp1 && ServerOptions.Latin1RequestHeaders; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why just HTTP1? HTTP/2 would be affected the same way.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did that come up specifically? I don't recall...
Taking a quick look at the HTTP/2 and HPACK specs it doesn't appear that a character set encoding is specified for the header values - all I can see is a reference to them as octets.
I would say the server shouldn't take the transport version into account. Especially because intermediate gateways between client and server could shift the connection between HTTP/1.* and HTTP/2 --- the header values would be the same end-to-end but the http version is only the same hop-by-hop.
// 64-bit: Loop longs by default | ||
while (input <= end - sizeof(long)) | ||
{ | ||
output[0] = (char)input[0]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You could use Bmi2.X64.ParallelBitDeposit
if supported. Same for the 32-bit path.
See
aspnetcore/src/Shared/ServerInfrastructure/StringUtilities.cs
Lines 121 to 137 in 80a77ff
if (Bmi2.X64.IsSupported) | |
{ | |
// BMI2 will work regardless of the processor's endianness. | |
((ulong*)output)[0] = Bmi2.X64.ParallelBitDeposit((ulong)value, 0x00FF00FF_00FF00FFul); | |
((ulong*)output)[1] = Bmi2.X64.ParallelBitDeposit((ulong)(value >> 32), 0x00FF00FF_00FF00FFul); | |
} | |
else | |
{ | |
output[0] = (char)input[0]; | |
output[1] = (char)input[1]; | |
output[2] = (char)input[2]; | |
output[3] = (char)input[3]; | |
output[4] = (char)input[4]; | |
output[5] = (char)input[5]; | |
output[6] = (char)input[6]; | |
output[7] = (char)input[7]; | |
} |
do | ||
{ | ||
var vector = Unsafe.AsRef<Vector<sbyte>>(input); | ||
Vector.Widen( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If this method is hot, you could add explicit paths for AVX2, SSE2, because
- Vector operates on the widest op-set available, i.e. AVX, so some iterations need to be done sequentially instead of parallel as SSE would allow
- Vector-methods may result in somewhat not ideal codegen.
For reference see
public static unsafe bool TryGetAsciiString(byte* input, char* output, int count) |
Except CheckBytesInAsciiRange
both methods are similar. So they could be combined to one, and with some tricks let the JIT generate the proper code for both flavors.
I could do this as part of my PR, if this one gets merged first (also add the BMI-path, comment above).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't expect this method to be hot. This is mostly here so Bing can accept non-UTF8/ASCII encoded headers from legacy clients. This is opt-in via an undocumented IConfiguration flag. Even when opted-in, this method will only be hit to decode non-ASCII request headers.
I'll look into merging #17556 to master soon. I'll convert GetLatin1String to use your TryGetAsciiString optimizations when I do.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That is if we don't just end up using Encoding.Latin1 in .NET 5 which we very well might.
From #17399 (comment)
For the latter point I don't see any check in this PR. Will this case be handled at other parts in the stack? Should it be handled early in |
{ | ||
resultString = HeaderValueEncoding.GetString(buffer, span.Length); | ||
StringUtilities.GetLatin1String(buffer, output, span.Length); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should take this opportunity to refactor this code not to create a string
instance and mutate its contents (see line 145 above).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can get to that in 5.0. We'll want to limit changes for the 3.x patch.
@@ -115,6 +115,93 @@ public static unsafe bool TryGetAsciiString(byte* input, char* output, int count | |||
return isValid; | |||
} | |||
|
|||
[MethodImpl(MethodImplOptions.AggressiveOptimization)] | |||
public static unsafe void GetLatin1String(byte* input, char* output, int count) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Honestly I'd rather this logic not be in aspnetcore. If this is pri-0 for you, let the API review team know that having a Encoding.Latin1
static property (see proposal at https://github.com/dotnet/corefx/issues/42686) is a necessity for you, and keep the optimized logic wholly within the runtime itself.
Ah, didn't realize this was the 3.x branch. Ignore my comments about waiting for runtime changes then. :) |
The HttpParser checks there are no carriage returns, and GetAsciiOrUTF8StringNonNullCharacters already checks there are no nulls. The previous version of GetAsciiOrUTF8StringNonNullCharacters did not reject other unprintable ASCII characters, and I decided not to add that extra validation as part of this patch. |
if (IntPtr.Size == 8) // Use Intrinsic switch for branch elimination | ||
{ | ||
// 64-bit: Loop longs by default | ||
while (input <= end - sizeof(long)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This clause should be end - input >= sizeof(long)
.
do | ||
{ | ||
// If Vector not-accelerated or remaining less than vector size | ||
if (!Vector.IsHardwareAccelerated || input > end - Vector<sbyte>.Count) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This clause should be end - input <= Vector<sbyte>.Count
.
input += sizeof(long); | ||
output += sizeof(long); | ||
} | ||
if (input <= end - sizeof(int)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
end - input >= sizeof(int)
} | ||
else | ||
{ | ||
// 32-bit: Loop ints by default |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's no need for this branch. Just do loops of stride 8 on all platforms, and ignore the IntPtr.Size
value.
output += sizeof(int); | ||
} | ||
} | ||
if (input <= end - sizeof(short)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
end - input >= sizeof(short)
// do/while as entry condition already checked | ||
do | ||
{ | ||
var vector = Unsafe.AsRef<Vector<sbyte>>(input); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unsafe.ReadUnaligned
, not Unsafe.AsRef
.
var vector = Unsafe.AsRef<Vector<sbyte>>(input); | ||
Vector.Widen( | ||
vector, | ||
out Unsafe.AsRef<Vector<short>>(output), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
out these results to a local, then Unsafe.WriteUnaligned
the local to the destination address.
|
||
input += Vector<sbyte>.Count; | ||
output += Vector<sbyte>.Count; | ||
} while (input <= end - Vector<sbyte>.Count); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
end - input >= Vector<sbyte>.Count
FWIW, my comments about using the pattern |
@GrabYourPitchforks GetLatin1String is a copy of TryGetAsciiString. I plan on leaving it as-is in this patch assuming there's nothing actually broken with this implementation as it's being used today. @gfoidl was looking at optimizing TryGetAsciiString with #17556. I was planning on merging #17556 into master soon and then changing the GetLatin1String implementation to match. Can you review #17556? Thanks. |
@halter73 Then at the very least |
src/Servers/Kestrel/Core/src/Internal/Infrastructure/HttpUtilities.cs
Outdated
Show resolved
Hide resolved
I like that approach, having latin1 as a parity copy of ascii and optimizing them together or becoming direct calls to core API when appropriate. Adding to that - for partity it would be better to have If It would also not change the existing ascii+utf8 extension method's structure at all, which could be a small plus because that's the code path used by all existing customers. |
79b68cc
to
e24e0bc
Compare
@lodejard I added GetLatin1StringNonNullCharacters like you suggested.
TryGetLatin1String now calls into CheckBytesNotNull. CheckBytesNotNull doesn't verify there are no bytes in the 0x01-0x1f range like CheckBytesInLatin1Range would, because TryGetAsciiString also doesn't verify it. HttpParser.TakeHeader rejects carriage returns inside a header, but that's it. While I agree it would be better to reject these characters, I don't want to start rejecting requests that would have been accepted in the default ASCII fast path as part of a patch. |
@halter73 That makes sense - no reason to put them out of sync for that. Thanks! Did you see the other remark about http1? I forgot to tag you so not sure if it was missed... |
I saw that. I now support latin1 encoding for http/2 request headers as well. I added tests for both HTTP/1 and HTTP/2. |
new KeyValuePair<string, string>(HeaderNames.Method, "GET"), | ||
new KeyValuePair<string, string>(HeaderNames.Path, "/"), | ||
new KeyValuePair<string, string>(HeaderNames.Scheme, "http"), | ||
// The HPackEncoder will encode £ as 0xA3 aka Latin1 encoding. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's interesting that the encoder didn't need to be modified.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah. I though I was going to have to use a custom HPackEncoder, but it turns out that it's just casting chars to bytes. The response header's are normally validated by Kestrel's HttpResponseHeaders dictionary.
TODONE? Or still TODO? |
TODONE |
@@ -115,6 +117,111 @@ public static unsafe bool TryGetAsciiString(byte* input, char* output, int count | |||
return isValid; | |||
} | |||
|
|||
[MethodImpl(MethodImplOptions.AggressiveOptimization)] | |||
public static unsafe bool TryGetLatin1String(byte* input, char* output, int count) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why are we using unsafe code here instead of Span and Span? To match TryGetAsciiString?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah. I'm trying to keep this as close to TryGetAsciiString as possible since it's conceptually quite similar.
[Tactics] - ensure Bing validates. |
Hello @halter73! Because this pull request has the p.s. you can customize the way I help with merging this pull request, such as holding this pull request until a specific person approves. Simply @mention me (
|
Apologies, while this PR appears ready to be merged, it looks like |
Bing confirmed this change works for them. Can you please merge this @anurse? |
Branches aren't open for March, so no :). I'll do so as soon as they open though! |
Slightly nervous about the passing build being a month ago, but I'll merge it ;) |
TODONE: Add tests.
Fixes #17399