Incorrect results with differently ordered coords #360

Closed
khaeru opened this issue Jun 19, 2020 · 4 comments
Labels
bug Indicates an unexpected problem or unintended behavior upstream

@khaeru

khaeru commented Jun 19, 2020

Describe the bug
Multiplying two sparse-backed xr.DataArray objects whose coords on a shared dimension are ordered differently silently produces incorrect results.

To Reproduce
Consider two CSV files.

  • These are based on an in-use example, but I've anonymized the data for simplicity.
  • Focus on the coordinates (a0, b2, c3, c3, d0, e0): the value at these coordinates is about 211 in a.csv and 50 in b.csv, giving a product of about 10556.
  • Note that the "b" dimension labels appear in a different order in the two files.
a.csv
a,b,cx,cy,d,e,value
a0,b1,c0,c1,d0,e0,15.525114
a0,b1,c1,c1,d0,e0,65.011181
a0,b1,c1,c2,d0,e0,65.011181
a0,b1,c2,c2,d0,e0,81.430746
a0,b1,c2,c3,d0,e0,81.430746
a0,b1,c3,c3,d0,e0,129.680365
a0,b3,c0,c1,d0,e0,10.350076
a0,b3,c1,c1,d0,e0,20.224739
a0,b3,c1,c2,d0,e0,20.224739
a0,b3,c2,c2,d0,e0,0.000000
a0,b3,c2,c3,d0,e0,0.000000
a0,b3,c3,c3,d0,e0,0.000000
a0,b2,c1,c1,d0,e0,111.111111
a0,b2,c2,c2,d0,e0,166.666667
a0,b2,c3,c3,d0,e0,211.111111
a0,b0,c1,c1,d0,e0,100.000000
a0,b0,c2,c2,d0,e0,150.000000
a0,b0,c3,c3,d0,e0,190.000000
b.csv
a,b,cx,cy,d,e,value
a0,b1,c0,c1,d0,e0,30.0
a0,b1,c0,c2,d0,e0,30.0
a0,b1,c0,c3,d0,e0,30.0
a0,b1,c1,c1,d0,e0,30.0
a0,b1,c1,c2,d0,e0,30.0
a0,b1,c1,c3,d0,e0,30.0
a0,b1,c2,c2,d0,e0,30.0
a0,b1,c2,c3,d0,e0,30.0
a0,b1,c3,c3,d0,e0,30.0
a0,b2,c0,c1,d0,e0,50.0
a0,b2,c0,c2,d0,e0,50.0
a0,b2,c0,c3,d0,e0,50.0
a0,b2,c1,c1,d0,e0,50.0
a0,b2,c1,c2,d0,e0,50.0
a0,b2,c1,c3,d0,e0,50.0
a0,b2,c2,c2,d0,e0,50.0
a0,b2,c2,c3,d0,e0,50.0
a0,b2,c3,c3,d0,e0,50.0

and the code:

import pandas as pd
import xarray as xr

SPARSE = True

A = xr.DataArray.from_series(
    pd.read_csv("a.csv", index_col=list(range(6)))["value"],
    sparse=SPARSE,
)
B = xr.DataArray.from_series(
    pd.read_csv("b.csv", index_col=list(range(6)))["value"],
    sparse=SPARSE,
)

result = A * B
if SPARSE:
    result.data = result.data.todense()

print(result.to_series())
print(result.sel(a="a0", b="b2", cx="c3", cy="c3", d="d0", e="e0"))
  • When run with SPARSE = False, the final line prints:

    <xarray.DataArray 'value' ()>
    array(10555.55555)
    Coordinates:
        b        <U2 'b2'
        a        <U2 'a0'
        cx       <U2 'c3'
        cy       <U2 'c3'
        d        <U2 'd0'
        e        <U2 'e0'
    
  • When run with SPARSE = True, the final line instead prints an incorrect result: array(nan).

    The output of the second-to-last line, print(result.to_series()), shows that the result contains only the labels 'b1' and 'b2', but all the values for b='b2' are NaN.

Expected behavior
Correct results, or at least an exception or error message indicating that the results will not be correct.
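
As a cross-check that does not depend on xarray or sparse, the expected value can be computed with the standard library alone by keying each value on its full tuple of labels and multiplying where the keys match. This is only a sketch: it uses an excerpt of the two files above (the cx=c3, cy=c3 rows), and the helper names read_labelled, label_order and multiply_by_label are made up for illustration.

```python
import csv
import io

# Excerpts of a.csv and b.csv from above: only the cx=c3, cy=c3 rows.
A_CSV = """a,b,cx,cy,d,e,value
a0,b1,c3,c3,d0,e0,129.680365
a0,b3,c3,c3,d0,e0,0.000000
a0,b2,c3,c3,d0,e0,211.111111
a0,b0,c3,c3,d0,e0,190.000000
"""
B_CSV = """a,b,cx,cy,d,e,value
a0,b1,c3,c3,d0,e0,30.0
a0,b2,c3,c3,d0,e0,50.0
"""


def read_labelled(text):
    """Map each row's tuple of labels (all columns but the last) to its value."""
    rows = csv.reader(io.StringIO(text))
    next(rows)  # skip the header row
    return {tuple(row[:-1]): float(row[-1]) for row in rows}


def label_order(text, column):
    """Return the labels of `column` in order of first appearance."""
    seen = []
    for row in csv.DictReader(io.StringIO(text)):
        if row[column] not in seen:
            seen.append(row[column])
    return seen


def multiply_by_label(x, y):
    """Multiply values that share identical labels; row order is irrelevant."""
    return {key: x[key] * y[key] for key in x.keys() & y.keys()}


a = read_labelled(A_CSV)
b = read_labelled(B_CSV)
product = multiply_by_label(a, b)

print(label_order(A_CSV, "b"))  # "b" labels as ordered in the a.csv excerpt
print(label_order(B_CSV, "b"))  # "b" labels as ordered in the b.csv excerpt
print(product[("a0", "b2", "c3", "c3", "d0", "e0")])
```

Because the lookup is by label rather than by position, the differing order of the "b" labels cannot affect the product, which is the behavior the dense (SPARSE = False) path already shows.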

System
Ubuntu 20.04

~/tmp$ python3 -c "import sparse; print(sparse.__version__)"
0.10.0
~/tmp$ python3 -c "import numba; print(numba.__version__)"
0.49.1
xarray.show_versions()
~/tmp$ python3 -c "import xarray; xarray.show_versions()"

INSTALLED VERSIONS
------------------
commit: None
python: 3.8.2 (default, Apr 27 2020, 15:53:34) 
[GCC 9.3.0]
python-bits: 64
OS: Linux
OS-release: 5.4.0-37-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_CA.UTF-8
LOCALE: en_CA.UTF-8
libhdf5: 1.10.4
libnetcdf: 4.7.3

xarray: 0.15.1
pandas: 1.0.3
numpy: 1.17.4
scipy: 1.3.3
netCDF4: 1.5.3
pydap: None
h5netcdf: 0.7.1
h5py: 2.10.0
Nio: None
zarr: None
cftime: 1.1.0
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: 1.2.1
dask: 2.17.2
distributed: None
matplotlib: 3.2.1
cartopy: 0.17.0
seaborn: 0.10.0
numbagg: None
setuptools: 45.2.0
pip: 20.0.2
conda: None
pytest: 5.4.2
IPython: 7.15.0
sphinx: 3.0.4
@khaeru khaeru added the bug Indicates an unexpected problem or unintended behavior label Jun 19, 2020
@hameerabbasi
Collaborator

Possible duplicate of #340. I'm not too familiar with xarray internals. Is it possible for you to post a minimal verifiable example using only sparse? http://matthewrocklin.com/blog/work/2018/02/28/minimal-bug-reports

@dcherian

This looks like pydata/xarray#4019 but I am not sure.

@khaeru can you try on xarray master? If that doesn't work, then we'll have to narrow it down to an xarray issue or sparse issue.

@khaeru
Author

khaeru commented Jun 19, 2020

@hameerabbasi @dcherian thanks for the quick replies, and sorry about my confusion with where the bug lies. I'll try xarray master and get back to you.

@khaeru
Author

khaeru commented Jun 19, 2020

Can't reproduce this with xarray master; with SPARSE = True I get the correct results.

Thanks again for the quick follow-up! I'll wait for the next xarray release.

@khaeru khaeru closed this as completed Jun 19, 2020