[Kestrel] Allow UTF-8 characters in request URL #17402

Closed
analogrelay opened this issue Nov 25, 2019 · 7 comments
Labels
affected-very-few This issue impacts very few customers area-networking Includes servers, yarp, json patch, bedrock, websockets, http client factory, and http abstractions enhancement This issue represents an ask for new feature or an enhancement to an existing one severity-major This label is used by an internal tool
Comments

@analogrelay
Contributor

analogrelay commented Nov 25, 2019

We want to support incoming UTF-8 characters in request URLs. We believe other servers support this, but we need to do a little research to confirm it.

Today, URL characters must be in the range 0x00-0x7F (excluding control characters). With this change, Kestrel would accept bytes above 0x7F and expose them in the URL string as percent-encoded values. For example, consider:

  • Original requested URL: /aéi
  • As UTF-8 bytes: 0x2F 0x61 0xC3 0xA9 0x69
  • IHttpRequestFeature.RawTarget would return: /a%C3%A9i
  • All other derived values, which normally decode percent-encoded raw URLs, would see the original intended value: /aéi
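The proposed transformation can be sketched in a few lines. This is a minimal illustration in Python, not Kestrel's implementation (Kestrel is C#); `escape_high_octets` is a hypothetical helper name, and the sketch deliberately skips the existing control-character validation. It percent-escapes every octet at or above 0x80, leaving ASCII untouched, and then shows that an ordinary UTF-8 percent-decoder recovers the intended path.

```python
from urllib.parse import unquote


def escape_high_octets(raw: bytes) -> str:
    """Hypothetical helper: percent-escape every octet >= 0x80 and pass
    ASCII octets through unchanged, producing a pure-ASCII raw target.
    No UTF-8 validation and no control-character checks are done here."""
    parts = []
    for b in raw:
        if b >= 0x80:
            parts.append(f"%{b:02X}")  # e.g. 0xC3 -> "%C3"
        else:
            parts.append(chr(b))
    return "".join(parts)


request_target = "/aéi".encode("utf-8")  # b'/a\xc3\xa9i'
raw_target = escape_high_octets(request_target)
print(raw_target)           # /a%C3%A9i
print(unquote(raw_target))  # /aéi (unquote decodes escapes as UTF-8 by default)
```

Because the escaped form is pure ASCII, the start-line parser never has to pick a character encoding; that decision is deferred to whatever later decodes the percent escapes.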

To be clear: this behavior will be off by default and enabled via an option.

@GrabYourPitchforks
Member

Just to clarify from our earlier discussions, you're allowing any non-ASCII character to come through, and you're %-escaping it without regard to whether it's valid UTF-8, correct?

For example, original requested URL: /[FF][FF][FF]
As bytes: [ 2F FF FF FF ]
RawTarget would return: /%FF%FF%FF

And then some other component in the system would eventually say "hmm, that's not well-formed UTF-8" and kick out the request?
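One possible shape for that downstream check is sketched below, assuming the later component percent-decodes to bytes and then requires well-formed UTF-8. The thread doesn't say which component performs the check or how it responds; `decode_path_strict` is a hypothetical name, and returning `None` stands in for "reject the request".

```python
from typing import Optional
from urllib.parse import unquote, unquote_to_bytes


def decode_path_strict(raw_target: str) -> Optional[str]:
    """Hypothetical downstream check: percent-decode to bytes, then require
    well-formed UTF-8. Returns None when the request should be rejected."""
    try:
        # bytes.decode("utf-8") uses errors="strict" by default.
        return unquote_to_bytes(raw_target).decode("utf-8")
    except UnicodeDecodeError:
        return None


raw_target = "/%FF%FF%FF"  # what the escaping step yields for [ 2F FF FF FF ]
print(decode_path_strict(raw_target))   # None (caller might answer 400 Bad Request)
print(decode_path_strict("/a%C3%A9i"))  # /aéi

# A lenient decoder hides the problem instead of rejecting it:
# unquote() defaults to errors="replace", giving three U+FFFD characters.
print(unquote(raw_target) == "/" + "\ufffd" * 3)  # True
```

The last line is why the choice of decoder matters: with replacement semantics, malformed input silently turns into a different, valid-looking path rather than being kicked out.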

@blowdart
Contributor

And also it needs to be optional, with the default being off.

@lodejard
Contributor

> Just to clarify from our earlier discussions, you're allowing any non-ASCII character to come through, and you're %-escaping it without regard to whether it's valid UTF-8, correct?

@GrabYourPitchforks Yes, that's what I understood. To be honest, I'd even frame the feature as:

Escape any start-line URL octet in the range 0x80 to 0xFF as the ASCII sequence '%' '8' '0' through '%' 'F' 'F'.

But I believe that's exactly right. At the moment of start-line parsing, any notion of non-ASCII character codes is miles away. It's not until the raw URL is passed to a Uri, PathString, or QueryString that the percent escapes become Unicode characters.

> And also it needs to be optional, with the default being off.

@blowdart That sounds right: the client has produced a technically malformed request, and the website author should be making an informed choice to accept it. In the fullness of time the default could change, but you'd want to be dead certain there are zero security implications.

@lodejard
Contributor

lodejard commented Nov 27, 2019

Actually, come to think of it... I think @Tratcher mentioned that some versions of clients are known to send octets 0x80 to 0xFF in headers like Host and Referer (sic) when international characters are UTF-8 encoded but not percent-escaped.

It might be a good idea to open a new, semi-related issue for that case, for example: for Host and Referer headers (known in advance to contain URI components), escape header-value octets 0x80 to 0xFF as the ASCII sequence '%' '8' '0' through '%' 'F' 'F' before decoding the header value into its string representation.

The reasoning is the same as for percent-escaping the start-line URL before character decoding. Because those headers are known to carry a percent-encoded payload, the server can provide a higher-fidelity value that will eventually be handled as a proper escape, while the server layer makes fewer assumptions about the character encoding in effect. That's a much more conservative way to handle high-order characters in Referer, and it even produces more correct header string values. (They technically shouldn't contain non-ASCII characters in their string form, which UTF-8 decoding at the server level introduces.)

This would work very well in combination with other features, such as a Latin-1 request header mode. High-order octets in Host and Referer would come through as percent-escaped, pure ASCII, which any reasonable URI component can then correctly decode as UTF-8.
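The difference between the two header strategies can be illustrated with a Referer value. This is a sketch under assumptions: `https://example.com/aéi` is a placeholder URL, the client is assumed to send raw UTF-8 bytes, and `escape_high_octets` is a hypothetical helper (the same transformation proposed for the start-line URL), not an existing Kestrel option.

```python
from urllib.parse import unquote


def escape_high_octets(value: bytes) -> str:
    # Hypothetical helper: percent-escape octets 0x80-0xFF, keep ASCII as-is.
    return "".join(f"%{b:02X}" if b >= 0x80 else chr(b) for b in value)


# Client sends the international characters as raw, unescaped UTF-8.
referer_bytes = "https://example.com/aéi".encode("utf-8")

# Decoding the header bytes as Latin-1 maps each octet to one character.
latin1_value = referer_bytes.decode("latin-1")
print(latin1_value)           # https://example.com/aÃ©i (mojibake)

# Escaping first keeps the header string pure ASCII...
escaped_value = escape_high_octets(referer_bytes)
print(escaped_value)          # https://example.com/a%C3%A9i

# ...and a URI component can later decode the escapes as UTF-8.
print(unquote(escaped_value))  # https://example.com/aéi
```

The escaped form never contains non-ASCII characters, so it is the same string regardless of which header decoding mode the server is in.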

@analogrelay analogrelay added this to the 5.0.0-preview1 milestone Dec 2, 2019
@analogrelay analogrelay removed this from the 5.0.0-preview1 milestone Mar 11, 2020
@shirhatti shirhatti added this to the Next sprint planning milestone Mar 25, 2020
@BrennanConroy
Member

We'll follow up with the customer to see if this is needed.

@ghost

ghost commented Jul 24, 2020

We've moved this issue to the Backlog milestone. This means that it is not going to be worked on for the coming release. We will reassess the backlog following the current release and consider this item at that time. To learn more about our issue management process and to set better expectations regarding different types of issues, you can read our Triage Process.

@jkotalik jkotalik added affected-very-few This issue impacts very few customers enhancement This issue represents an ask for new feature or an enhancement to an existing one severity-major This label is used by an internal tool labels Nov 13, 2020 — with ASP.NET Core Issue Ranking
@adityamandaleeka
Member

We haven't gotten any requests for this feature, so we're not planning to work on it at this time. We will reopen this if there is more demand.

@ghost ghost locked as resolved and limited conversation to collaborators Dec 18, 2021
@amcasey amcasey added area-networking Includes servers, yarp, json patch, bedrock, websockets, http client factory, and http abstractions and removed area-runtime labels Aug 24, 2023
9 participants