Unexpected time resampling leads to extremely large dask graphs in default installation #2111

Open
2 tasks done
Hem-W opened this issue Mar 20, 2025 · 0 comments · May be fixed by #2112
Labels
bug Something isn't working

Comments

@Hem-W
Contributor

Hem-W commented Mar 20, 2025

Setup Information

  • Xclim version: 0.55.0
  • Python version: 3.11.11
  • Operating System: Ubuntu 22.04

Description

When the input is a dask-backed dataset, setting freq in the standardized indices (SPI and SPEI) can produce unexpectedly large dask graphs, which choke the subsequent computations. Using freq=None avoids this.

Moreover, this issue only occurs when flox, an optional package, is not installed. A default installation does not include flox (although the development environment does), so a user may pass the same freq as the input dataset already has and accidentally trigger these large graph layers, instead of using freq=None or installing flox. It also raises a UserWarning that is confusing, because the user may already have made sure that the time dimension is not chunked (UserWarning: The input data is chunked on time dimension and must be fully rechunked to run fit on groups . Beware, this operation can significantly increase the number of tasks dask has to handle.).

Steps To Reproduce

Please note that this only occurs with the default installation (i.e. without flox installed).
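Before running the reproduction, it may help to confirm which case the current environment falls into. The following is a minimal sketch using only the standard library; the helper name is_installed is hypothetical and not part of xclim or flox.

```python
import importlib.util


def is_installed(package_name: str) -> bool:
    """Return True if the named top-level package can be found for import."""
    return importlib.util.find_spec(package_name) is not None


# Report whether the environment matches the "default installation" case.
if is_installed("flox"):
    print("flox is installed; the large graph is not expected here.")
else:
    print("flox is not installed; this environment should reproduce the issue.")
```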

import xarray as xr
from xclim.testing.utils import open_dataset
import xclim.indices as xci

da = open_dataset("sdba/CanESM2_1950-2100.nc").pr.resample(time="MS").mean().chunk({"time": -1, "location": 1})

spi_wFreq = xci.standardized_precipitation_index(pr=da, freq="MS", window=1, dist="gamma", method="ML")
spi_wFreq
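To put a number on the graph size rather than relying on the rendered repr, one option is to count the keys in the object's dask graph through the dask collection protocol. This is only a sketch: count_dask_tasks is a hypothetical helper, and it assumes the object exposes __dask_graph__() as dask-backed xarray objects do.

```python
def count_dask_tasks(obj) -> int:
    """Count the keys in an object's dask graph, or return 0 if it has none."""
    graph_method = getattr(obj, "__dask_graph__", None)
    if graph_method is None:
        # Not a dask collection at all.
        return 0
    graph = graph_method()
    # xarray returns None from __dask_graph__ for objects that are not dask-backed.
    return 0 if graph is None else len(graph)


# Usage with the object from the reproduction above (left commented out
# because it needs xclim and the test dataset):
# print(count_dask_tasks(spi_wFreq))
```

The same helper can be applied to any other result to compare graph sizes between two calls.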

Additional context

Using freq=None or having flox installed avoids this issue, but I suspect that is not intuitive to users.

spi_woFreq = xci.standardized_precipitation_index(pr=da, freq=None, window=1, dist="gamma", method="ML")
spi_woFreq
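As a user-side workaround, one could drop freq whenever it matches the frequency the data already has, so that no resampling step is requested. The sketch below is an assumption about how a caller might do this, not xclim's behaviour or the fix proposed in the linked pull request; resolve_freq is a hypothetical helper.

```python
from typing import Optional


def resolve_freq(requested: Optional[str], inferred: Optional[str]) -> Optional[str]:
    """Return None when the requested frequency equals the data's own frequency.

    `inferred` is the frequency detected from the time axis (None if unknown).
    """
    if requested is not None and requested == inferred:
        # Resampling to the same frequency would be redundant, so skip it.
        return None
    return requested


# Usage with the objects from the examples above (left commented out because
# it needs xarray, xclim and the test dataset):
# inferred = xr.infer_freq(da.time)
# spi = xci.standardized_precipitation_index(
#     pr=da, freq=resolve_freq("MS", inferred), window=1, dist="gamma", method="ML"
# )
```

The comparison is a plain string match, so equivalent aliases for the same frequency would not be recognized as equal.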

Contribution

  • I would be willing/able to open a Pull Request to address this bug.

Code of Conduct

  • I agree to follow this project's Code of Conduct
@Hem-W Hem-W added the bug Something isn't working label Mar 20, 2025
@Hem-W Hem-W linked a pull request Mar 20, 2025 that will close this issue