[Kestrel] Allow UTF-8 characters in request URL #17402
Comments
Just to clarify from our earlier discussions: you're allowing any non-ASCII character to come through, and you're %-escaping it without regard to whether it's valid UTF-8, correct? So given an original requested URL containing such bytes, some other component in the system would eventually say "hmm, that's not well-formed UTF-8" and kick out the request?
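The behavior being clarified here (escape every high octet, with no UTF-8 validity check at the server layer) can be sketched in a few lines. This is an illustrative Python sketch, not Kestrel's actual C# implementation; the helper name `escape_high_octets` is hypothetical:

```python
def escape_high_octets(raw: bytes) -> str:
    """Percent-escape any octet >= 0x80; pass ASCII octets through unchanged.

    Note that no UTF-8 validation happens here: a truncated or malformed
    multi-byte sequence is escaped byte-by-byte just the same, and it is
    left to a later component to reject it if it is not well-formed UTF-8.
    """
    out = []
    for b in raw:
        if b >= 0x80:
            out.append(f"%{b:02X}")
        else:
            out.append(chr(b))
    return "".join(out)

# Valid UTF-8 for "é" (0xC3 0xA9) and a truncated sequence (lone 0xC3)
# are treated identically at this layer.
valid = escape_high_octets(b"/a\xc3\xa9i")   # "/a%C3%A9i"
invalid = escape_high_octets(b"/a\xc3i")     # "/a%C3i" — escaped anyway
```

The point of the question above is exactly this: the server layer escapes blindly, and well-formedness is someone else's problem.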
And also it needs to be optional, with the default being off.
@GrabYourPitchforks yes - that's what I understood. To be honest I'd even frame the feature as: escape any start-line URL octets above 0x7F. But I believe that's exactly right. At the moment of start-line parsing, any notion of non-ASCII character codes is miles away; it's not until the raw URL is handed off to later components that character decoding comes into play.
@blowdart that sounds right - the client has produced a technically malformed request, and the web site author should be making an informed choice to accept it. In the fullness of time the default could change, but you'd want to be dead certain there are zero security implications.
Actually, come to think of it... I think @Tratcher mentioned that some versions of clients are known to send raw high-order octets in these headers. It might be a good idea to open a new, semi-related issue for that case, something like: for Host and Referer headers (known in advance to contain URI components), escape header-value octets above 0x7F.

Same reasoning as percent-escaping in the start-line URL prior to character decoding. Because those headers are known to have a percent-encodable payload, the server can provide a higher-fidelity value that will eventually be handled as a proper escape, while making fewer assumptions about the character encoding in effect. That's a much more conservative way to handle high-order characters in Referer, and it even produces more correct header string values (they technically shouldn't contain non-ASCII in their string form, which UTF-8 decoding at the server level introduces). This would work very well in combination with other features, such as a Latin-1 request-header mode: Host and Referer high-order octets would come through percent-escaped as pure ASCII, which any reasonable Uri component can then correctly decode as UTF-8.
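The claim above, that escaping header octets yields a pure-ASCII value which a downstream URI component can still decode correctly, can be checked with a short sketch. Again this is illustrative Python, not Kestrel code; `escape_header_octets` is a hypothetical helper, and `urllib.parse.unquote` stands in for whatever Uri component eventually does the UTF-8 decode:

```python
from urllib.parse import unquote

def escape_header_octets(raw: bytes) -> str:
    # Hypothetical helper: percent-escape octets >= 0x80, pass ASCII through.
    return "".join(f"%{b:02X}" if b >= 0x80 else chr(b) for b in raw)

# A Referer value whose path contains "é", sent by the client as raw UTF-8 bytes.
referer = escape_header_octets("https://example.com/aéi".encode("utf-8"))

# The escaped header value is pure ASCII, as header string values should be...
assert referer.isascii()
# ...and a downstream component decoding percent-escapes as UTF-8 recovers
# the original text with full fidelity.
assert unquote(referer) == "https://example.com/aéi"
```

The design point is that the escaping layer commits to nothing about the character encoding; the escape survives until a component that actually knows it is decoding a URI takes over.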
We'll follow up with the customer to see if this is needed.
We've moved this issue to the Backlog milestone. This means that it is not going to be worked on for the coming release. We will reassess the backlog following the current release and consider this item at that time. To learn more about our issue management process and to set better expectations regarding different types of issues, you can read our Triage Process.
We haven't gotten any requests for this feature so we're not planning to work on it at this time. Will re-open this if there is more demand. |
We want to support incoming UTF-8 characters in request URLs. We believe other servers support this, and need to do a little research to confirm that.
Today, URL characters must be in the range 0x00-0x7F (excluding control characters). With this change, Kestrel would accept bytes over 0x7F and expose them in the URL string as percent-encoded values. For example, consider /aéi: the raw request bytes are 0x2F 0x61 0xC3 0xA9 0x69, Kestrel would expose the URL string as /a%C3%A9i, and unescaping that percent-encoded value as UTF-8 yields /aéi.
To be clear: this behavior will be off by default and will be enabled via an option.
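The /aéi example above can be reproduced end-to-end with Python's stdlib, standing in for the proposed Kestrel behavior (Kestrel itself would do this in its C# request-line parser). Note that `urllib.parse.quote` escapes all reserved bytes, not only those over 0x7F, but for this input the result is the same:

```python
from urllib.parse import quote, unquote

# Raw start-line bytes for the path /aéi ("é" is 0xC3 0xA9 in UTF-8).
raw = bytes([0x2F, 0x61, 0xC3, 0xA9, 0x69])

# The server would expose the URL string with the high octets percent-encoded.
exposed = quote(raw, safe="/")
assert exposed == "/a%C3%A9i"

# A later component that decodes percent-escapes as UTF-8 recovers /aéi.
assert unquote(exposed) == "/aéi"
```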