Add option to interpret request headers as Latin1 #18255

halter73 · 2020-01-10T03:57:04Z

TODONE: Add tests.

Tratcher · 2020-01-10T04:36:51Z

src/Servers/Kestrel/Core/src/Internal/Http/HttpProtocol.cs

        {
            _context = context;

            ServerOptions = ServiceContext.ServerOptions;
-            HttpRequestHeaders = new HttpRequestHeaders(reuseHeaderValues: !ServerOptions.DisableStringReuse);
+            _useLatin1 = isHttp1 && ServerOptions.Latin1RequestHeaders;


Why just HTTP1? HTTP/2 would be affected the same way.

When we talked about it in the meeting with @lodejard, @mjsabby and @timmydo, they said this change was for legacy clients and that requiring UTF-8 for HTTP/2 clients was fine.

Did that come up specifically? I don't recall...

Taking a quick look at the HTTP/2 and HPACK specs it doesn't appear that a character set encoding is specified for the header values - all I can see is a reference to them as octets.

I would say the server shouldn't take the transport version into account. Especially because intermediate gateways between client and server could shift the connection between HTTP/1.* and HTTP/2 --- the header values would be the same end-to-end but the http version is only the same hop-by-hop.

gfoidl · 2020-01-10T09:06:05Z

src/Servers/Kestrel/Core/src/Internal/Infrastructure/StringUtilities.cs

+                        // 64-bit: Loop longs by default
+                        while (input <= end - sizeof(long))
+                        {
+                            output[0] = (char)input[0];


You could use Bmi2.X64.ParallelBitDeposit if supported. Same for the 32-bit path.
See

aspnetcore/src/Shared/ServerInfrastructure/StringUtilities.cs

Lines 121 to 137 in 80a77ff

if (Bmi2.X64.IsSupported)

{

// BMI2 will work regardless of the processor's endianness.

((ulong*)output)[0] = Bmi2.X64.ParallelBitDeposit((ulong)value, 0x00FF00FF_00FF00FFul);

((ulong*)output)[1] = Bmi2.X64.ParallelBitDeposit((ulong)(value >> 32), 0x00FF00FF_00FF00FFul);

}

else

{

output[0] = (char)input[0];

output[1] = (char)input[1];

output[2] = (char)input[2];

output[3] = (char)input[3];

output[4] = (char)input[4];

output[5] = (char)input[5];

output[6] = (char)input[6];

output[7] = (char)input[7];

}

gfoidl · 2020-01-10T09:13:01Z

src/Servers/Kestrel/Core/src/Internal/Infrastructure/StringUtilities.cs

+                do
+                {
+                    var vector = Unsafe.AsRef<Vector<sbyte>>(input);
+                    Vector.Widen(


If this method is hot, you could add explicit paths for AVX2, SSE2, because

Vector operates on the widest op-set available, i.e. AVX, so some iterations need to be done sequentially instead of parallel as SSE would allow

Vector-methods may result in somewhat not ideal codegen.

For reference see

aspnetcore/src/Shared/ServerInfrastructure/StringUtilities.cs

Line 19 in 80a77ff

public static unsafe bool TryGetAsciiString(byte* input, char* output, int count)

from #17556

Except CheckBytesInAsciiRange both methods are similar. So they could be combined to one, and with some tricks let the JIT generate the proper code for both flavors.
I could do this as part of my PR, if this one gets merged first (also add the BMI-path, comment above).

We don't expect this method to be hot. This is mostly here so Bing can accept non-UTF8/ASCII encoded headers from legacy clients. This is opt-in via an undocumented IConfiguration flag. Even when opted-in, this method will only be hit to decode non-ASCII request headers.

I'll look into merging #17556 to master soon. I'll convert GetLatin1String to use your TryGetAsciiString optimizations when I do.

That is if we don't just end up using Encoding.Latin1 in .NET 5 which we very well might.

gfoidl · 2020-01-10T09:22:53Z

From #17399 (comment)

header-value octets 0x00-0x1F and 0x7F are unprintable USASCII and not expected to map to unicode char in any request header value strings

some unprintable ASCII is already consumed as http protocol delimiters

other unprintable ASCII will cause the request to be rejected as malformed

For the latter point I don't see any check in this PR. Will this case be handled at other parts in the stack? Should it be handled early in StringUtilities.GetLatin1String?

GrabYourPitchforks · 2020-01-10T20:24:34Z

src/Servers/Kestrel/Core/src/Internal/Infrastructure/HttpUtilities.cs

                    {
-                        resultString = HeaderValueEncoding.GetString(buffer, span.Length);
+                        StringUtilities.GetLatin1String(buffer, output, span.Length);


Should take this opportunity to refactor this code not to create a string instance and mutate its contents (see line 145 above).

We can get to that in 5.0. We'll want to limit changes for the 3.x patch.

GrabYourPitchforks · 2020-01-10T20:29:49Z

src/Servers/Kestrel/Core/src/Internal/Infrastructure/StringUtilities.cs

@@ -115,6 +115,93 @@ public static unsafe bool TryGetAsciiString(byte* input, char* output, int count
            return isValid;
        }

+        [MethodImpl(MethodImplOptions.AggressiveOptimization)]
+        public static unsafe void GetLatin1String(byte* input, char* output, int count)


Honestly I'd rather this logic not be in aspnetcore. If this is pri-0 for you, let the API review team know that having a Encoding.Latin1 static property (see proposal at https://github.com/dotnet/corefx/issues/42686) is a necessity for you, and keep the optimized logic wholly within the runtime itself.

GrabYourPitchforks · 2020-01-10T20:50:01Z

Ah, didn't realize this was the 3.x branch. Ignore my comments about waiting for runtime changes then. :)

halter73 · 2020-01-10T20:51:14Z

For the latter point I don't see any check in this PR. Will this case be handled at other parts in the stack? Should it be handled early in StringUtilities.GetLatin1String?

The HttpParser checks there are no carriage returns, and GetAsciiOrUTF8StringNonNullCharacters already checks there are no nulls. The previous version of GetAsciiOrUTF8StringNonNullCharacters did not reject other unprintable ASCII characters, and I decided not to add that extra validation as part of this patch.

GrabYourPitchforks · 2020-01-10T20:51:48Z

src/Servers/Kestrel/Core/src/Internal/Infrastructure/StringUtilities.cs

+                    if (IntPtr.Size == 8) // Use Intrinsic switch for branch elimination
+                    {
+                        // 64-bit: Loop longs by default
+                        while (input <= end - sizeof(long))


This clause should be end - input >= sizeof(long).

GrabYourPitchforks · 2020-01-10T20:52:12Z

src/Servers/Kestrel/Core/src/Internal/Infrastructure/StringUtilities.cs

+            do
+            {
+                // If Vector not-accelerated or remaining less than vector size
+                if (!Vector.IsHardwareAccelerated || input > end - Vector<sbyte>.Count)


This clause should be end - input <= Vector<sbyte>.Count.

GrabYourPitchforks · 2020-01-10T20:53:08Z

src/Servers/Kestrel/Core/src/Internal/Infrastructure/StringUtilities.cs

+                            input += sizeof(long);
+                            output += sizeof(long);
+                        }
+                        if (input <= end - sizeof(int))


end - input >= sizeof(int)

GrabYourPitchforks · 2020-01-10T20:56:34Z

src/Servers/Kestrel/Core/src/Internal/Infrastructure/StringUtilities.cs

+                    }
+                    else
+                    {
+                        // 32-bit: Loop ints by default


There's no need for this branch. Just do loops of stride 8 on all platforms, and ignore the IntPtr.Size value.

GrabYourPitchforks · 2020-01-10T20:56:55Z

src/Servers/Kestrel/Core/src/Internal/Infrastructure/StringUtilities.cs

+                            output += sizeof(int);
+                        }
+                    }
+                    if (input <= end - sizeof(short))


end - input >= sizeof(short)

GrabYourPitchforks · 2020-01-10T20:58:26Z

src/Servers/Kestrel/Core/src/Internal/Infrastructure/StringUtilities.cs

+                // do/while as entry condition already checked
+                do
+                {
+                    var vector = Unsafe.AsRef<Vector<sbyte>>(input);


Unsafe.ReadUnaligned, not Unsafe.AsRef.

GrabYourPitchforks · 2020-01-10T20:59:01Z

src/Servers/Kestrel/Core/src/Internal/Infrastructure/StringUtilities.cs

+                    var vector = Unsafe.AsRef<Vector<sbyte>>(input);
+                    Vector.Widen(
+                        vector,
+                        out Unsafe.AsRef<Vector<short>>(output),


out these results to a local, then Unsafe.WriteUnaligned the local to the destination address.

GrabYourPitchforks · 2020-01-10T20:59:24Z

src/Servers/Kestrel/Core/src/Internal/Infrastructure/StringUtilities.cs

+
+                    input += Vector<sbyte>.Count;
+                    output += Vector<sbyte>.Count;
+                } while (input <= end - Vector<sbyte>.Count);


end - input >= Vector<sbyte>.Count

GrabYourPitchforks · 2020-01-10T21:05:38Z

FWIW, my comments about using the pattern end - start >= length rather than start <= end - length is that it future-proofs the method against inputs like start = null, count = 0 being passed in the future. I realize that this isn't possible with the current PR, but utility methods like this have a tendency to be copied to other libraries or to gain more callers in the future, and those callers might not know about the implied "input cannot be null" assumption within this method. Rewriting the bounds checks as suggested here will future-proof the code against this possibility so that we don't inadvertently cause a buffer overrun in the future.

halter73 · 2020-01-10T21:13:03Z

@GrabYourPitchforks GetLatin1String is a copy of TryGetAsciiString. I plan on leaving it as-is in this patch assuming there's nothing actually broken with this implementation as it's being used today.

@gfoidl was looking at optimizing TryGetAsciiString with #17556. I was planning on merging #17556 into master soon and then changing the GetLatin1String implementation to match. Can you review #17556? Thanks.

GrabYourPitchforks · 2020-01-10T21:15:29Z

@halter73 Then at the very least Debug.Assert at the start of the method that neither input is null. This will make clear that the method isn't resilient to this case.

src/Servers/Kestrel/Core/src/Internal/Infrastructure/HttpUtilities.cs

lodejard · 2020-01-10T21:59:00Z

@GrabYourPitchforks GetLatin1String is a copy of TryGetAsciiString. I plan on leaving it as-is in this patch assuming there's nothing actually broken with this implementation as it's being used today.

I like that approach, having latin1 as a parity copy of ascii and optimizing them together or becoming direct calls to core API when appropriate.

Adding to that - for partity it would be better to have TryGetLatin1String with its own copy of CheckBytesInLatin1Range -- it will still need isValid to be false for 0x00-0x1f.

If TryGetLatin1String only was called by a value.GetLatin1StringNonNullCharacters() (when useLatin1 is true) then there is no fallback logic needed at all and it will have the same performance as the ascii+utf8 case.

It would also not change the existing ascii+utf8 extension method's structure at all, which could be a small plus because that's the code path used by all existing customers.

halter73 · 2020-01-10T23:08:53Z

@lodejard I added GetLatin1StringNonNullCharacters like you suggested.

Adding to that - for partity it would be better to have TryGetLatin1String with its own copy of CheckBytesInLatin1Range -- it will still need isValid to be false for 0x00-0x1f.

TryGetLatin1String now calls into CheckBytesNotNull. CheckBytesNotNull doesn't verify there are no bytes in the 0x01-0x1f range like CheckBytesInLatin1Range would, because TryGetAsciiString also doesn't verify it.

HttpParser.TakeHeader rejects carriage returns inside a header, but that's it. While I agree it would be better to reject these characters, I don't want to start rejecting requests that would have been accepted in the default ASCII fast path as part of a patch.

lodejard · 2020-01-11T03:15:18Z

@halter73 That makes sense - no reason to put them out of sync for that. Thanks!

Did you see the other remark about http1? I forgot to tag you so not sure if it was missed...

halter73 · 2020-01-13T19:08:41Z

Did you see the other remark about http1? I forgot to tag you so not sure if it was missed...

I saw that. I now support latin1 encoding for http/2 request headers as well. I added tests for both HTTP/1 and HTTP/2.

Tratcher · 2020-01-13T19:41:08Z

src/Servers/Kestrel/test/InMemory.FunctionalTests/Http2/Http2StreamTests.cs

+                new KeyValuePair<string, string>(HeaderNames.Method, "GET"),
+                new KeyValuePair<string, string>(HeaderNames.Path, "/"),
+                new KeyValuePair<string, string>(HeaderNames.Scheme, "http"),
+                // The HPackEncoder will encode £ as 0xA3 aka Latin1 encoding.


It's interesting that the encoder didn't need to be modified.

Yeah. I though I was going to have to use a custom HPackEncoder, but it turns out that it's just casting chars to bytes. The response header's are normally validated by Kestrel's HttpResponseHeaders dictionary.

jkotalik · 2020-01-13T19:43:31Z

@lodejard I chatted with @halter73 and I asked what can we do to let you dogfood this change to make sure it meets your expectations? Can you easily try a nightly build in your current setup?

analogrelay · 2020-01-13T20:47:01Z

TODO: Add tests.

TODONE? Or still TODO?

halter73 · 2020-01-14T00:56:10Z

TODONE

jkotalik · 2020-01-14T18:02:14Z

src/Servers/Kestrel/Core/src/Internal/Infrastructure/StringUtilities.cs

@@ -115,6 +117,111 @@ public static unsafe bool TryGetAsciiString(byte* input, char* output, int count
            return isValid;
        }

+        [MethodImpl(MethodImplOptions.AggressiveOptimization)]
+        public static unsafe bool TryGetLatin1String(byte* input, char* output, int count)


Why are we using unsafe code here instead of Span and Span? To match TryGetAsciiString?

Yeah. I'm trying to keep this as close to TryGetAsciiString as possible since it's conceptually quite similar.

leecow · 2020-02-04T18:50:18Z

[Tactics] - ensure Bing validates.

ghost · 2020-02-06T22:11:24Z

Hello @halter73!

Because this pull request has the auto-merge label, I will be glad to assist with helping to merge this pull request once all check-in policies pass.

p.s. you can customize the way I help with merging this pull request, such as holding this pull request until a specific person approves. Simply @mention me (`@msftbot`) and give me an instruction to get started! Learn more here.

ghost · 2020-02-06T22:11:30Z

Apologies, while this PR appears ready to be merged, it looks like release/3.1 is a protected branch and I have not been granted permission to perform the merge.

halter73 · 2020-02-06T22:11:52Z

Bing confirmed this change works for them. Can you please merge this @anurse?

analogrelay · 2020-02-06T23:06:57Z

Branches aren't open for March, so no :). I'll do so as soon as they open though!

analogrelay · 2020-02-14T17:32:31Z

Slightly nervous about the passing build being a month ago, but I'll merge it ;)

halter73 added 2 commits January 9, 2020 19:06

Add option to interpret request headers as Latin1

a8968a7

Work around inability to make signature changes visible to tests

f8c8829

halter73 requested a review from Tratcher January 10, 2020 03:57

ghost added the area-servers label Jan 10, 2020

Tratcher reviewed Jan 10, 2020

View reviewed changes

gfoidl reviewed Jan 10, 2020

View reviewed changes

GrabYourPitchforks reviewed Jan 10, 2020

View reviewed changes

lodejard reviewed Jan 10, 2020

View reviewed changes

src/Servers/Kestrel/Core/src/Internal/Infrastructure/HttpUtilities.cs Outdated Show resolved Hide resolved

Address PR feedback

e24e0bc

halter73 force-pushed the halter73/17399-3.1 branch from 79b68cc to e24e0bc Compare January 10, 2020 23:03

halter73 added 3 commits January 10, 2020 16:01

Cleanup

6db7e3b

Fix Vector.Widen logic to not fill first byte

15dc9ae

Add tests

9a4d28b

halter73 marked this pull request as ready for review January 11, 2020 01:57

halter73 requested review from analogrelay and jkotalik as code owners January 11, 2020 01:57

Add HTTP/2 tests

273f489

halter73 added the feature-kestrel label Jan 13, 2020

Tratcher approved these changes Jan 13, 2020

View reviewed changes

jkotalik reviewed Jan 14, 2020

View reviewed changes

analogrelay added the Servicing-consider Shiproom approval is required for the issue label Jan 15, 2020

analogrelay added this to the 3.1.x milestone Jan 15, 2020

leecow added Servicing-approved Shiproom has approved the issue and removed Servicing-consider Shiproom approval is required for the issue labels Feb 4, 2020

leecow modified the milestones: 3.1.x, 3.1.3 Feb 4, 2020

halter73 added the auto-merge label Feb 6, 2020

analogrelay removed the auto-merge label Feb 14, 2020

analogrelay merged commit 7fa6d19 into release/3.1 Feb 14, 2020

analogrelay deleted the halter73/17399-3.1 branch February 14, 2020 17:32

analogrelay mentioned this pull request Feb 14, 2020

Add an option to interpret request headers as Latin1 encoded #17399

Closed

JamesNK mentioned this pull request Feb 25, 2020

HTTP2: Fix HttpRequestHeaders no longer being pooled #19329

Merged

jkotalik mentioned this pull request Jun 8, 2020

Allow ANCM in-process to interpret request headers as Latin1 encoded #22675

Closed

GrabYourPitchforks referenced this pull request in GrabYourPitchforks/runtime Apr 15, 2021

Add internal "skip zero-init" code path to string allocation

dd5012c

amcasey added area-networking Includes servers, yarp, json patch, bedrock, websockets, http client factory, and http abstractions and removed area-runtime labels Jun 2, 2023

	if (Bmi2.X64.IsSupported)
	{
	// BMI2 will work regardless of the processor's endianness.
	((ulong*)output)[0] = Bmi2.X64.ParallelBitDeposit((ulong)value, 0x00FF00FF_00FF00FFul);
	((ulong*)output)[1] = Bmi2.X64.ParallelBitDeposit((ulong)(value >> 32), 0x00FF00FF_00FF00FFul);
	}
	else
	{
	output[0] = (char)input[0];
	output[1] = (char)input[1];
	output[2] = (char)input[2];
	output[3] = (char)input[3];
	output[4] = (char)input[4];
	output[5] = (char)input[5];
	output[6] = (char)input[6];
	output[7] = (char)input[7];
	}

Add option to interpret request headers as Latin1 #18255

Add option to interpret request headers as Latin1 #18255

Conversation

halter73 commented Jan 10, 2020 • edited by analogrelay Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

gfoidl commented Jan 10, 2020

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

GrabYourPitchforks commented Jan 10, 2020

halter73 commented Jan 10, 2020

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

GrabYourPitchforks commented Jan 10, 2020

halter73 commented Jan 10, 2020

GrabYourPitchforks commented Jan 10, 2020

lodejard commented Jan 10, 2020

halter73 commented Jan 10, 2020

lodejard commented Jan 11, 2020

halter73 commented Jan 13, 2020

Choose a reason for hiding this comment

Choose a reason for hiding this comment

jkotalik commented Jan 13, 2020

analogrelay commented Jan 13, 2020

halter73 commented Jan 14, 2020

Choose a reason for hiding this comment

Choose a reason for hiding this comment

leecow commented Feb 4, 2020

ghost commented Feb 6, 2020

p.s. you can customize the way I help with merging this pull request, such as holding this pull request until a specific person approves. Simply @mention me (@msftbot) and give me an instruction to get started! Learn more here.

ghost commented Feb 6, 2020

halter73 commented Feb 6, 2020

analogrelay commented Feb 6, 2020

analogrelay commented Feb 14, 2020

halter73 commented Jan 10, 2020 •

edited by analogrelay

Loading

p.s. you can customize the way I help with merging this pull request, such as holding this pull request until a specific person approves. Simply @mention me (`@msftbot`) and give me an instruction to get started! Learn more here.