-
Notifications
You must be signed in to change notification settings - Fork 56
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add pct encoder #147
Add pct encoder #147
Conversation
This diff reworks the current parser based on regexes in favor of one that uses Angstrom. It also removes the dependency on `ocaml-re`, moving the regular expressions to a new package `uri-re`. This happens to fix mirage#21.
This is useful for reusing the pct encoding function but with custom character sets. For example, RFC 6570 uses some non-standard pct encodings in some template expansions.
99665bc
to
750c3f1
Compare
@avsm I got this to pass tests. I don't know if what I did is correct, and I did not clean up all existing work (specifically there seems to be some But, this is at least mergable in the "build is green" sense of the word. |
I don't think deleting the legacy tests is accurate. I think the problem here is that the tests for the |
I deleted the legacy test because the Uri_legacy model couldn't be found on testing. Missing something else? Is it correct that the legacy tests are verifying that this new implementation matches the old implementation? If so, is that necessary? |
This is what I was trying to express above:
the |
Ah ok. Is that something you want to fix in your branch and I can rebase my changes on it? Or do you can just tell me what to do to fix it. |
I'm not sure how to fix it. I think it's a problem of how CI runs the tests, something along the lines of |
@avsm can you lend us any insight? |
This looks like some internal timeout in the build infrastructure, not sure @avsm
|
Rebuild the failing parts and they pass now. Strongly suspect there was a CI timeout environment issue. |
@tmcgilchrist Nice! Thank you. @avsm How's this change looking to you? |
@avsm @tmcgilchrist @anmonteiro One thing I noticed is the new parser means uri does not work in jsoo. Is this desired? |
@orbitz Could you clarify why that is the case? The only change that I could imagine would cause that is the switch to Angstrom, though that has JSOO support (see inhabitedtype/angstrom#187) |
@anmonteiro ahhhh you're right, I had some old dependencies. Upgraded and everything works. Any idea how to get this version of uri released? |
So, this PR has what @anmonteiro did on #142 if I understand correctly? If it's the case, can we close #142? Then, I can take my time to:
|
So it looks like I got bit by opam caching some results and it does not work. I get the following error:
I created the following topic to see if anyone has suggestions. https://discuss.ocaml.org/t/bigstringaf-blit-from-bytes-not-implemented/6524 |
I tried your PR (with let v = Uri.of_string "http://localhost/" in
Format.printf "%a\n%!" Uri.pp v Which is compiled with
It seems that all works for me when I do: $ nodejs _build/default/main.bs.js
https://localhost/ My version is |
All is good. With some help from @dinosaure I figured out the issue was on my end based on how I'm compiling my program. @dinosaure for your original points above: the interface to Uri should not have changed so I don't think releasing this will impact the ecosystem. |
Yes it's mostly about behavior when |
Let's start for a release 👍 |
Only one way to find out :-) It's all good to me by inspection. This is not a gater to the release, but I'd like to compare the old and new implementation using memtrace to check that the new one is more constant-memory than the old regex based one. |
CHANGES: * sexp: use the sexplib v0.13 ppx directives (@avsm, mirage/ocaml-uri#143). * rework the URI parser with `angstrom` (@anmonteiro, review @avsm & @dinosaure, mirage/ocaml-uri#142). * add simple fuzzer tests between `angstrom` parser and _legacy_ parser (with `re.posix`, mirage/ocaml-uri#142) * add support of modifying pct encoding (with a custom one) (@orbitz, review @anmonteiro, @tmcgilchrist, @avsm & @dinosaure, mirage/ocaml-uri#147) * allow the selection of generic set of safe characters (with `Generic`) (@madroach, review @dinosaure, mirage/ocaml-uri#141)
CHANGES: * sexp: use the sexplib v0.13 ppx directives (@avsm, mirage/ocaml-uri#143). * rework the URI parser with `angstrom` (@anmonteiro, review @avsm & @dinosaure, mirage/ocaml-uri#142). * add simple fuzzer tests between `angstrom` parser and _legacy_ parser (with `re.posix`, mirage/ocaml-uri#142) * add support of modifying pct encoding (with a custom one) (@orbitz, review @anmonteiro, @tmcgilchrist, @avsm & @dinosaure, mirage/ocaml-uri#147) * allow the selection of generic set of safe characters (with `Generic`) (@madroach, review @dinosaure, mirage/ocaml-uri#141)
CHANGES: * Do not mutate the base encoder when using custom percent encoders. This was a bug introduced in mirage/ocaml-uri#147. (mirage/ocaml-uri#156 @aantron) * Disable Travis CI tests and switch Win/Mac tests to GitHub Actions and Linux ones to ocaml-ci (@avsm).
@avsm So this implements the customization we talked about.
I think it works well for
to_string
but I feel it clutters up the individual string accessors. It also elucidates that not all component accessors perform pct encoding, I'm not sure if that is deliberate or not.What do you think?