Skip to content

Permit use of codec registry with N5Store #577

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 5 commits into from
Nov 23, 2020
Merged

Permit use of codec registry with N5Store #577

merged 5 commits into from
Nov 23, 2020

Conversation

d-v-b
Copy link
Contributor

@d-v-b d-v-b commented Jul 30, 2020

We are interested in using lossy jpeg compression for 2D / 3D uint8 data. A java implementation of this exists for n5: https://github.com/saalfeldlab/n5-jpeg, but zarr cannot decode these chunks because jpeg is not in the list of hard-coded compressors in zarr/n5.py . Because this jpeg compression is specialized to the point of being a hack, it doesn't make sense to add it to the list of hard-coded compressors. Instead, this PR enables reading/writing an N5Store with an arbitrary compressor.

TODO:

  • Add unit tests and/or doctests in docstrings
  • Add docstrings and API docs for any new/modified user-facing classes and functions
  • New/modified features documented in docs/tutorial.rst
  • Changes documented in docs/release.rst
  • AppVeyor and Travis CI passes
  • Test coverage is 100% (Coveralls passes)

@pep8speaks
Copy link

pep8speaks commented Jul 30, 2020

Hello @d-v-b! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2020-07-30 13:42:10 UTC

@d-v-b
Copy link
Contributor Author

d-v-b commented Nov 17, 2020

ping @jakirkham @funkey, is there anything more that needs to be done here?

@joshmoore
Copy link
Member

Hi @d-v-b. I brought this up on the community call Wednesday. I'll try to summarize the discussion, but @perlman and @jakirkham should feel to correct me. Thoughts in order from least work but also least value rising, to highest:

  • As written, the change makes zarr.n5 very permissive. This feels like the hack is extending out to the user and could lead to surprises. I don't feel like I can make that call (e.g. if it's a must from your side) but my preference would be to not knee cap ourselves that way.
  • I could see having an API method which would allow either setting permissivity or even allowing to add at codec at runtime to the registry.
  • @jakirkham had previously looked into a JPEG2000 codec in Using JPEG2000 for chunk compression  numcodecs#73 . There of course going to be limitations with any JPEG-like codec, but if it were possible to encode parameters which would make your chunks readable, that'd be pretty cool.

~J.

@d-v-b
Copy link
Contributor Author

d-v-b commented Nov 20, 2020

Why should the n5 compressor logic restrict the user's options to guard against surprises, while the zarr compressor logic does not? My lossy jpeg compressor (https://github.com/d-v-b/zarr-jpeg) that inspired this PR seems just as likely to cause "surprises" whether I use the zarr or n5 format, but the vanilla zarr backend doesn't complain about it, while the n5 backend errors. So I don't understand why there should be an asymmetry in the restrictiveness in the compressor API between these two options.

@joshmoore
Copy link
Member

Hi Davis. Sounds like I may just be clueless. I agree with you that the behavior of the N5Store should be as close to the base zarr implementations as possible (and if we need to add API or fix an implementation, that we do that in sync)

My assumption was that zarr-python would throw an exception on running into a codec that it didn't know. (A quick test of modifying a .zarray file to change the code id gives me ValueError: codec not available: 'unknown'). Can you share a trivial .zarr and .n5 each with your codec for comparison? Is the reason your process isn't throwing because zarr_jpeg is registering itself? https://github.com/d-v-b/zarr-jpeg/blob/main/src/zarr_jpeg/zarr_jpeg.py#L44

All the being said, don't get me wrong: I find "This repo is inspired by the neuroglancer "precomputed" format, which uses jpeg encoding to compress chunks of imaging data." very exciting! (link)

@d-v-b
Copy link
Contributor Author

d-v-b commented Nov 23, 2020

Is the reason your process isn't throwing because zarr_jpeg is registering itself?

Yes. @jakirkham can shed more light on the details, but I think zarr-python is looking for compressors in the codec_registry mapping, the remaining validation is just duck typing:

codec_cls = codec_registry[compression]

The codec lookup occurs outside of n5.py, so the only thing my PR is doing (as far as I can tell) is enabling the full codec registry for n5 chunks, rather than a hand-picked list of codecs.

Copy link
Member

@joshmoore joshmoore left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Gotcha.

Behavior without your PR:

RuntimeError: Unknown compressor with id xblosc

Behavior with your PR:

ValueError: codec not available: 'xblosc'

So I agree this synchronizes zarr.n5.N5Store behavior with other stores for unknown codecs (:+1:) and I'm assuming then that installing zarr_jpeg will allow files with that codec to be read (untested).

@d-v-b
Copy link
Contributor Author

d-v-b commented Nov 23, 2020

Thanks! And I realize in retrospect that the title of the this PR was a source of confusion -- I should not have used the term "arbitrary compressors", but instead I should have referenced the codec registry. Sorry for being unclear.

@joshmoore joshmoore merged commit da4f790 into zarr-developers:master Nov 23, 2020
@joshmoore joshmoore changed the title enable arbitrary compressors for N5Store Permit use of codec registry with N5Store Nov 23, 2020
@d-v-b d-v-b deleted the more-n5-compressors branch November 23, 2020 22:22
@joshmoore
Copy link
Member

joshmoore commented Nov 23, 2020

@d-v-b, I've updated the commit message & title to "Permit use of codec registry with N5Store". If you don't think this captures this PR, please let us know so that @Carreau can prepare the changelog for v2.6 in gh-656

@Carreau Carreau added this to the v2.6 milestone Dec 1, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants