[Kestrel] Allow UTF-8 characters in request URL #17402
Comments
Just to clarify from our earlier discussions: you're allowing any non-ASCII character to come through, and you're %-escaping it without regard to whether it's valid UTF-8, correct? So given an original requested URL containing such bytes, some other component in the system would eventually say "hmm, that's not well-formed UTF-8" and kick out the request?
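The behavior being clarified here (escape every high octet, with no UTF-8 validity check at the server layer) can be sketched in a few lines. This is an illustrative Python sketch, not Kestrel's actual C# implementation; the helper name `escape_high_octets` is hypothetical:

```python
def escape_high_octets(raw: bytes) -> str:
    """Percent-escape any octet >= 0x80; pass ASCII octets through unchanged.

    Note that no UTF-8 validation happens here: a truncated or malformed
    multi-byte sequence is escaped byte-by-byte just the same, and it is
    left to a later component to reject it if it is not well-formed UTF-8.
    """
    out = []
    for b in raw:
        if b >= 0x80:
            out.append(f"%{b:02X}")
        else:
            out.append(chr(b))
    return "".join(out)

# Valid UTF-8 for "é" (0xC3 0xA9) and a truncated sequence (lone 0xC3)
# are treated identically at this layer.
valid = escape_high_octets(b"/a\xc3\xa9i")   # "/a%C3%A9i"
invalid = escape_high_octets(b"/a\xc3i")     # "/a%C3i" — escaped anyway
```

The point of the question above is exactly this: the server layer escapes blindly, and well-formedness is someone else's problem.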
And also it needs to be optional, with the default being off.
@GrabYourPitchforks yes - that's what I understood. To be honest I'd even frame the feature as: escape any start-line URL octets above 0x7F. But I believe that's exactly right. At the moment of start-line parsing, any notion of non-ASCII character codes is miles away; it's not until the raw URL is handed off to later components that character decoding comes into play.
@blowdart that sounds right - the client has produced a technically malformed request, and the web site author should be making an informed choice to accept it. In the fullness of time the default could change, but you'd want to be dead certain there are zero security implications.
Actually, come to think of it... I think @Tratcher mentioned that some versions of clients are known to send raw high-order octets in these headers. It might be a good idea to open a new, semi-related issue for that case, something like: for Host and Referer headers (known in advance to contain URI components), escape header-value octets above 0x7F.

Same reasoning as percent-escaping in the start-line URL prior to character decoding. Because those headers are known to have a percent-encodable payload, the server can provide a higher-fidelity value that will eventually be handled as a proper escape, while making fewer assumptions about the character encoding in effect. That's a much more conservative way to handle high-order characters in Referer, and it even produces more correct header string values (they technically shouldn't contain non-ASCII in their string form, which UTF-8 decoding at the server level introduces). This would work very well in combination with other features, such as a Latin-1 request-header mode: Host and Referer high-order octets would come through percent-escaped as pure ASCII, which any reasonable Uri component can then correctly decode as UTF-8.
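The claim above, that escaping header octets yields a pure-ASCII value which a downstream URI component can still decode correctly, can be checked with a short sketch. Again this is illustrative Python, not Kestrel code; `escape_header_octets` is a hypothetical helper, and `urllib.parse.unquote` stands in for whatever Uri component eventually does the UTF-8 decode:

```python
from urllib.parse import unquote

def escape_header_octets(raw: bytes) -> str:
    # Hypothetical helper: percent-escape octets >= 0x80, pass ASCII through.
    return "".join(f"%{b:02X}" if b >= 0x80 else chr(b) for b in raw)

# A Referer value whose path contains "é", sent by the client as raw UTF-8 bytes.
referer = escape_header_octets("https://example.com/aéi".encode("utf-8"))

# The escaped header value is pure ASCII, as header string values should be...
assert referer.isascii()
# ...and a downstream component decoding percent-escapes as UTF-8 recovers
# the original text with full fidelity.
assert unquote(referer) == "https://example.com/aéi"
```

The design point is that the escaping layer commits to nothing about the character encoding; the escape survives until a component that actually knows it is decoding a URI takes over.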
We'll follow up with the customer to see if this is needed.
We've moved this issue to the Backlog milestone. This means that it is not going to be worked on for the coming release. We will reassess the backlog following the current release and consider this item at that time. To learn more about our issue management process and to set better expectations regarding different types of issues, you can read our Triage Process.
We haven't gotten any requests for this feature so we're not planning to work on it at this time. Will re-open this if there is more demand. |
We want to support incoming UTF-8 characters in request URLs. We believe other servers support this, and need to do a little research to confirm that.
Today, URL characters must be in the range 0x00-0x7F (excluding control characters). With this change, Kestrel would accept bytes over 0x7F and expose them in the URL string as percent-encoded values. For example, consider /aéi: the raw request bytes are 0x2F 0x61 0xC3 0xA9 0x69, Kestrel would expose the URL string as /a%C3%A9i, and unescaping that percent-encoded value as UTF-8 yields /aéi.
To be clear: this behavior will be off by default and will be enabled via an option.
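The /aéi example above can be reproduced end-to-end with Python's stdlib, standing in for the proposed Kestrel behavior (Kestrel itself would do this in its C# request-line parser). Note that `urllib.parse.quote` escapes all reserved bytes, not only those over 0x7F, but for this input the result is the same:

```python
from urllib.parse import quote, unquote

# Raw start-line bytes for the path /aéi ("é" is 0xC3 0xA9 in UTF-8).
raw = bytes([0x2F, 0x61, 0xC3, 0xA9, 0x69])

# The server would expose the URL string with the high octets percent-encoded.
exposed = quote(raw, safe="/")
assert exposed == "/a%C3%A9i"

# A later component that decodes percent-escapes as UTF-8 recovers /aéi.
assert unquote(exposed) == "/aéi"
```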