
TypeError: Expected a BytesBytesCodec. Got <class 'numcodecs.blosc.Blosc'> instead. #10032


Open
leoniewgnr opened this issue Feb 6, 2025 · 12 comments
Labels
bug topic-zarr Related to zarr storage library

Comments

@leoniewgnr

This code runs without any problems with zarr2, but gives the following error when run with zarr3:

import pandas as pd
import numpy as np
import xarray as xr
from numcodecs.blosc import Blosc

ds = xr.Dataset(
    {"foo": (("x", "y"), np.random.rand(4, 5))},
    coords={
        "x": [10, 20, 30, 40],
        "y": pd.date_range("2000-01-01", periods=5),
        "z": ("x", list("abcd")),
    },
)

tmp_path = 'tmp.zarr'

# this works
ds.to_zarr(tmp_path, mode="w")
print('Saved to tmp.zarr')

# this does not work 
compressor = Blosc(cname="zstd", clevel=3, shuffle=2)
ds.to_zarr(tmp_path, encoding={"foo": {"compressor": compressor}}, mode="w")
print('Saved to tmp.zarr')

The error message is: TypeError: Expected a BytesBytesCodec. Got <class 'numcodecs.blosc.Blosc'> instead.
The same error occurs in the documentation: https://docs.xarray.dev/en/stable/user-guide/io.html#zarr-compressors-and-filters


welcome bot commented Feb 6, 2025

Thanks for opening your first issue here at xarray! Be sure to follow the issue template!
If you have an idea for a solution, we would really welcome a Pull Request with proposed changes.
See the Contributing Guide for more.
It may take us a while to respond here, but we really value your contribution. Contributors like you help make xarray better.
Thank you!

@TomNicholas TomNicholas added topic-zarr Related to zarr storage library bug labels Feb 6, 2025
@TomNicholas
Member

Thanks for raising this @leoniewgnr ! We're still hunting down all the bugs that the move to zarr 3 created.

> The same error occurs in the documentation:

That's particularly weird - errors in the documentation examples are supposed to lead to errors in the CI...

@keewis
Collaborator

keewis commented Feb 6, 2025

See also #9987

@FedeMPouzols

I think this example needs to be updated for zarr-python 3. Something like this works for me:

diff --git a/doc/user-guide/io.rst b/doc/user-guide/io.rst
index 986d43ce..7f5d6e2b 100644
--- a/doc/user-guide/io.rst
+++ b/doc/user-guide/io.rst
@@ -829,10 +829,10 @@ For example:
     :okwarning:
 
     import zarr
-    from numcodecs.blosc import Blosc
+    from zarr.codecs import BloscCodec
 
-    compressor = Blosc(cname="zstd", clevel=3, shuffle=2)
-    ds.to_zarr("foo.zarr", encoding={"foo": {"compressor": compressor}})
+    compressor = BloscCodec(cname="zstd", clevel=3, shuffle="shuffle")
+    ds.to_zarr("foo.zarr", encoding={"foo": {"compressors": (compressor,)}})
 
 .. note::

(This is my best guess, based on what I see in the backend tests and in some Zarr v3-related PRs. In this particular case, {"compressor": compressor} (without a tuple) also seems to work.)

Perhaps @d-v-b can confirm this is now the proper way to specify encoders/help with this?

@d-v-b
Contributor

d-v-b commented Feb 9, 2025

That looks right, although I'm not too familiar with what ds.to_zarr is doing under the hood. The basic idea in zarr v3 is that there can be multiple codecs that transform an array after it has been flattened to a byte stream (alternately called "compressors" or "BytesBytesCodec"), hence the tuple. But we also accept a single codec, which we wrap in a tuple.

@fowlerovski

My situation with numcodecs 0.15.1 and Zarr 3.0.3 mirrors this: BytesBytesCodec is unavailable in numcodecs.abc, and even numcodecs.Blosc is rejected with TypeError: Expected a BytesBytesCodec.

@roansong

roansong commented Feb 20, 2025

I'm running into this as well, even when using numcodecs.zarr3.Blosc or zarr.codecs.BloscCodec.

@aurelgriesser

@FedeMPouzols, when I tried your suggested {"compressors": (compressor,)} form (a tuple value and the plural key "compressors" instead of the older singular "compressor"), I still got the same "TypeError: Expected a BytesBytesCodec" that leoniewgnr reported.
Ta.

@rsignell

rsignell commented Mar 4, 2025

Didn't work for me either -- here's a reproducible example notebook: https://nbviewer.org/gist/rsignell/066cc39664a0c8b7fe70be1fd7d7e0cb

@jensdebruijn

jensdebruijn commented Mar 14, 2025

Edited: there is a much simpler solution below.

It actually seems that the error comes not from the compressed data array but from the coords. Xarray's default BloscCodec (used when no compressor is specified) inherits from numcodecs.abc.Codec, whereas it should inherit from zarr.abc.codec.BytesBytesCodec to pass zarr v3's isinstance check. Xarray also seems to compress the coords by default, so they get the default compressor that incorrectly inherits from numcodecs.abc.Codec.

The (temporary) solution I found is to use zarr.codecs.BloscCodec for the data var like you would expect, and explicitly tell xarray not to compress the coordinates like so:

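from zarr.codecs import BloscCodec

# Assumes `da` is the object being written and its data variable is named "data".
encoding = {
    "data": {
        "compressor": BloscCodec(
            cname="zstd",
            clevel=6,
        ),
    }
}
# Tell xarray not to compress any of the coordinates.
for coord in da.coords:
    encoding[coord] = {"compressor": None}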

If you do want to compress the coords, explicitly specifying a compressor from zarr.codecs should likely also work (not tested).

@jensdebruijn

Actually, the solution is a lot simpler: import the codecs from numcodecs.zarr3 and it works. Maybe we could give a clear warning and the solution in the error message?

from numcodecs.zarr3 import Blosc

@tinaok

tinaok commented Mar 22, 2025

I also ran into this issue when trying to load a Zarr v2 DataTree and write it to a Zarr v3 store. As suggested, I tried using:

from numcodecs.zarr3 import Blosc

But this gave me the following warning:

/srv/conda/envs/notebook/lib/python3.12/site-packages/numcodecs/zarr3.py:133: UserWarning: Numcodecs codecs are not in the Zarr version 3 specification and may not be supported by other zarr implementations.
  super().__init__(**codec_config)

Maybe I misunderstood something, but to avoid this warning and still use a Zarr v3-compatible compressor, I switched to the other suggestion in this issue, using

from zarr.codecs import BloscCodec

Here’s what worked for me with Sentinel-1 sample data from the EOPF Zarr sample service:

from pathlib import PurePosixPath

import xarray as xr
from zarr.codecs import BloscCodec

# S1A_IW_GRDH_1SDV_20240201T164915_20240201T164940_052368_065517_750E.SAFE
path = (
    "https://objectstore.eodc.eu:2222/e05ab01a9d56408d82ac32d69a5aae2a:sample-data/tutorial_data/"
    "cpm_v253/S1A_IW_GRDH_1SDV_20240201T164915_20240201T164940_052368_065517_750E.zarr"
)
s1_grdh = xr.open_datatree(path, engine="zarr", chunks={})
# s1_grdh.to_zarr('s1_grdh_z2.zarr', zarr_format=2, mode='w')  # ok
# s1_grdh.to_zarr('s1_grdh_z3.zarr', zarr_format=3, mode='w')  # not ok
print(s1_grdh['/S01SIWGRD_20240201T164915_0025_A299_750E_065517_VH/measurements']['grd'].encoding['compressors'])
# from numcodecs.zarr3 import Blosc
# compressor = Blosc(cname="zstd", clevel=3, shuffle=2, blocksize=0)  # warning
compressor = BloscCodec(cname="zstd", clevel=3, shuffle='bitshuffle', blocksize=0)

encoding = {}

for node in s1_grdh.subtree:
    if node.ds is not None:
        group = str(node.path) if node.path != PurePosixPath("") else "."
        encoding[group] = {}
        for var in node.ds.data_vars:
            encoding[group][var] = {"compressors": [compressor]}
        for coord in node.ds.coords:
            encoding[group][coord] = {"compressors": [compressor]}
s1_grdh.to_zarr("s1_grdh_v3.zarr", zarr_format=3, encoding=encoding, mode="w")

This worked for me, with only warnings about consolidated metadata, and produced a valid Zarr v3 store. Does anyone have suggestions for avoiding this messy rewrite of the encoding for DataTree?

P.S. I used:
zarr: 3.0.6
numcodecs: 0.15.1
xarray: 2025.3.0

P.P.S. And thank you, xarray developers, for making DataTree work with Zarr v3!
