Restrict fastpath isel indexes to the case of all PandasIndex #10066

Merged: 3 commits, Feb 24, 2025

Conversation

@benbovy (Member) commented Feb 21, 2025

Implements #10063 (comment). Since we use type(idx) is ... instead of isinstance(idx, ...), this should be even faster.

The only performance regression would be for single-coordinate custom indexes implemented in 3rd-party libraries, which could go through the fastpath as well but are harder to detect. I think that's OK for now.
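To illustrate the check described above (a stand-alone sketch with stand-in classes, not xarray's actual code): type(idx) is PandasIndex rejects subclasses outright, whereas isinstance() would accept them after walking the MRO, which is both slower and too permissive for this fastpath.

```python
class PandasIndex:  # stand-in for xarray.core.indexes.PandasIndex
    pass


class ThirdPartyIndex(PandasIndex):  # e.g. a custom single-coordinate index
    pass


def all_plain_pandas_indexes(indexes):
    # `type(idx) is PandasIndex` is False for subclasses, unlike
    # isinstance(), so custom indexes fall back to the general path.
    return all(type(idx) is PandasIndex for idx in indexes)


print(all_plain_pandas_indexes([PandasIndex(), PandasIndex()]))      # True
print(all_plain_pandas_indexes([PandasIndex(), ThirdPartyIndex()]))  # False
```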

@hmaarrfk @dcherian

@benbovy added the run-benchmark (Run the ASV benchmark workflow) label on Feb 21, 2025
@benbovy mentioned this pull request on Feb 21, 2025
@dcherian added the plan to merge (Final call for comments) label on Feb 21, 2025
@hmaarrfk (Contributor)
Thank you for tackling this in a very cool way!!!

On main:

software_timestamp.nc
h5netcdf  lazy      : 100%|█████████████████| 100000/100000 [00:06<00:00, 16285.88it/s]
h5netcdf  computed  : 100%|█████████████████| 100000/100000 [00:05<00:00, 19672.38it/s]
netcdf4   lazy      : 100%|█████████████████| 100000/100000 [00:06<00:00, 16254.36it/s]
netcdf4   computed  : 100%|█████████████████| 100000/100000 [00:05<00:00, 19496.47it/s]
software_timestamp_float64.nc
h5netcdf  lazy      : 100%|█████████████████| 100000/100000 [00:06<00:00, 16260.72it/s]
h5netcdf  computed  : 100%|█████████████████| 100000/100000 [00:04<00:00, 20045.22it/s]
netcdf4   lazy      : 100%|█████████████████| 100000/100000 [00:06<00:00, 16329.55it/s]
netcdf4   computed  : 100%|█████████████████| 100000/100000 [00:05<00:00, 19780.63it/s]

On this branch:

software_timestamp.nc
h5netcdf  lazy      : 100%|█████████████████| 100000/100000 [00:06<00:00, 15767.95it/s]
h5netcdf  computed  : 100%|█████████████████| 100000/100000 [00:05<00:00, 19327.47it/s]
netcdf4   lazy      : 100%|█████████████████| 100000/100000 [00:06<00:00, 15904.21it/s]
netcdf4   computed  : 100%|█████████████████| 100000/100000 [00:05<00:00, 18807.63it/s]
software_timestamp_float64.nc
h5netcdf  lazy      : 100%|█████████████████| 100000/100000 [00:06<00:00, 15825.22it/s]
h5netcdf  computed  : 100%|█████████████████| 100000/100000 [00:05<00:00, 18786.63it/s]
netcdf4   lazy      : 100%|█████████████████| 100000/100000 [00:06<00:00, 15515.68it/s]
netcdf4   computed  : 100%|█████████████████| 100000/100000 [00:05<00:00, 19033.53it/s]

On v2024.03.0:

software_timestamp.nc
h5netcdf  lazy      : 100%|█████████████████| 100000/100000 [00:09<00:00, 10237.03it/s]
h5netcdf  computed  : 100%|██████████████████| 100000/100000 [00:14<00:00, 6672.00it/s]
netcdf4   lazy      : 100%|██████████████████| 100000/100000 [00:10<00:00, 9718.41it/s]
netcdf4   computed  : 100%|██████████████████| 100000/100000 [00:14<00:00, 6923.85it/s]
software_timestamp_float64.nc
h5netcdf  lazy      : 100%|█████████████████| 100000/100000 [00:09<00:00, 10752.14it/s]
h5netcdf  computed  : 100%|█████████████████| 100000/100000 [00:05<00:00, 17995.04it/s]
netcdf4   lazy      : 100%|█████████████████| 100000/100000 [00:09<00:00, 10957.65it/s]
netcdf4   computed  : 100%|█████████████████| 100000/100000 [00:05<00:00, 17768.37it/s]

It seems that the performance improvements made over the course of 2024 are maintained (I think the ~300 it/s drop is real, but the main gains were in the thousands of it/s in my benchmarks). In 2024 there was also a bug that made timestamps very slow to load, which is why my benchmark also tests against data where the timestamp is stored as a float64.

import xarray
from tqdm import tqdm

for filename in [
    'software_timestamp.nc',
    'software_timestamp_float64.nc',
]:
    print(filename)
    for engine in ['h5netcdf', 'netcdf4']:
        for computed in [False, True]:
            dataset = xarray.open_dataset(filename, engine=engine)
            N_frames = dataset.sizes['frame_number']
            N_iters = 100_000
            software_timestamp = dataset['software_timestamp']
            if computed:
                # load into memory so isel runs on an in-memory array
                software_timestamp = software_timestamp.compute()

            computed_str = "computed" if computed else "lazy"
            desc = f"{engine:10s}{computed_str:10s}"
            # time repeated scalar isel calls, cycling through the frames
            for i in tqdm(range(N_iters), desc=desc):
                software_timestamp.isel(frame_number=i % N_frames)

            dataset.close()

@benbovy (Member, Author) commented Feb 24, 2025

Looking at your benchmark, I think we could (should?) cache Dataset.xindexes, which creates one temporary dict plus two more dicts to store the indexes and variables in the Indexes object, and also does redundant checks. We would just need to ensure that any in-place update of the Dataset's indexed coordinates invalidates the cache.
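A minimal sketch of that caching-plus-invalidation idea (the names below are illustrative, not xarray's actual internals): cache the property result and drop it whenever indexed coordinates are updated in place.

```python
class CachedIndexesMixin:
    """Cache an expensive indexes property; invalidate on in-place updates."""

    def __init__(self):
        self._xindexes_cache = None

    @property
    def xindexes(self):
        if self._xindexes_cache is None:
            # expensive path: builds temporary dicts of indexes and variables
            self._xindexes_cache = self._build_indexes()
        return self._xindexes_cache

    def _build_indexes(self):
        return {}  # placeholder for the real construction

    def _update_coords_inplace(self):
        # any in-place change to indexed coordinates must drop the cache
        self._xindexes_cache = None
```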

We could also cache Indexes.group_by_index() (safe here, since Indexes is immutable). In repeated calls, most time is spent creating the list and dict objects, since Indexes._id_coord_names is already cached.
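Since the object is immutable, the memoization can be as simple as functools.cache on the method (again an illustrative sketch, not xarray's code):

```python
import functools


class Indexes:
    def __init__(self, id_coord_names):
        # assumed already computed and cached, as noted above
        self._id_coord_names = id_coord_names

    @functools.cache
    def group_by_index(self):
        # the uncached call spends most of its time building these
        # fresh list objects; repeated calls return the same result
        return [(idx_id, list(names))
                for idx_id, names in self._id_coord_names.items()]
```

One caveat of functools.cache on a method is that the cache keeps a strong reference to each instance for the lifetime of the class, so it only suits objects that are immutable and not created in unbounded numbers.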

I think this will give benchmark results at least on par with the fastest ones above, while working with all kinds of indexes (both single- and multi-coordinate) and without duplicating the _apply_indexes logic.

@benbovy mentioned this pull request on Feb 24, 2025
@dcherian dcherian merged commit 2475d49 into pydata:main Feb 24, 2025
43 checks passed
Successfully merging this pull request may close these issues: isel returns one index per coordinate for multi-coordinate custom indexes