netCDF: optimize loading of network located dataset with unlimited dimension #7737

Merged
merged 3 commits into from
May 23, 2023

rouault
Member

@rouault rouault commented May 10, 2023

Helps for https://lists.osgeo.org/pipermail/gdal-dev/2023-May/057209.html

The time dimension is unlimited, and is the one used to determine GDAL bands. Retrieving the values of the time variable for the metadata requires reading from many scattered locations within the file, which is unfriendly to network access.
So:

  • do not set the NETCDF_DIM_xxxx_VALUES dataset metadata item for such a use case
  • and defer loading of the NETCDF_DIM_xxxx band metadata item until it is actually required

All of that combined enables access to a given band with a more reasonable number of HTTP requests.

  • 85 requests for gdalwarp "vrt://netCDF:\"/vsicurl/http://localhost:8080/tos_day_ACCESS1-3_rcp85_r1i1p1_20560101-20651231.nc\":tos?bands=1" out2.tif -overwrite
  • 19 requests for the same invocation with the GDAL_NETCDF_BOTTOMUP=NO environment variable set (the output is slightly different, but that must be a side effect of the geoloc transformer)
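
The deferred loading described above can be illustrated with a minimal sketch. This is not the code from this pull request: `LazyBandMetadata` and its loader callback are hypothetical names, and the loader stands in for whatever expensive read (for example, fetching the time variable over HTTP) would fill the NETCDF_DIM_xxxx items. The point is only that the costly read runs on the first metadata query and never before.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <utility>

// Hypothetical stand-in for a band whose metadata is expensive to read.
// Not part of the GDAL API.
class LazyBandMetadata
{
  public:
    using Metadata = std::map<std::string, std::string>;
    using Loader = std::function<Metadata()>;

    explicit LazyBandMetadata(Loader loader) : m_loader(std::move(loader)) {}

    // Runs the loader on the first call only, then serves from the cache.
    // Returns nullptr when the key is absent.
    const std::string *GetMetadataItem(const std::string &key)
    {
        if (!m_loaded)
        {
            m_cache = m_loader();
            m_loaded = true;
        }
        const auto it = m_cache.find(key);
        return it == m_cache.end() ? nullptr : &it->second;
    }

    // True once the expensive read has happened.
    bool IsLoaded() const { return m_loaded; }

  private:
    Loader m_loader;
    Metadata m_cache;
    bool m_loaded = false;
};
```

With this shape, code that only reads pixels from a band never triggers the loader, which is the behaviour the request counts above rely on.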

@rouault rouault added this to the 3.8.0 milestone May 10, 2023
@rouault rouault force-pushed the netcdf_defered_band_metadata branch from ef8e145 to 8be3d1a Compare May 10, 2023 15:55
int Taken = 0;

for (int i = 0; i < nd - 2; i++)
{
    int result;
Collaborator

isn't result being used uninitialized if the following condition is false?

Member Author

no, it is also initialized in the else branch at line 2055
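
The pattern under discussion can be shown in isolation. The function below is a hypothetical illustration, not the code under review: `result` has no initializer at its declaration, but every path through the `if`/`else` assigns it before it is read, so it is never used uninitialized.

```cpp
#include <cassert>

// Hypothetical example: `result` is declared without an initializer,
// yet both branches assign it before the first read.
int PickIndex(bool isTaken, int index)
{
    int result;
    if (isTaken)
    {
        result = index;
    }
    else
    {
        result = -1;
    }
    return result;
}
```

Initializing at the declaration (`int result = -1;`) would make the same guarantee visible without reading both branches, at the cost of a redundant store.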

@rouault rouault force-pushed the netcdf_defered_band_metadata branch from 8be3d1a to 0ae885e Compare May 10, 2023 16:22
@mdsumner
Contributor

this is looking good, I'm not getting blocked from usage now :0

one side question, when I don't use the '/vsicurl' prefix on the bare url I get these vsimem truncated subdataset forms:

gdalinfo https://github.com/OSGeo/gdal/raw/master/autotest/gdrivers/data/netcdf/var_with_column.nc

Files: none associated
Size is 512, 512
Subdatasets:
  SUBDATASET_1_NAME=NETCDF:"/vsimem/http_1/var_with_column.nc":"VAR:NAME"
  SUBDATASET_1_DESC=[2x2] VAR:NAME (32-bit floating-point)

with the prefix it's correct

gdalinfo /vsicurl/https://github.com/OSGeo/gdal/raw/master/autotest/gdrivers/data/netcdf/var_with_column.nc
Driver: netCDF/Network Common Data Format
Files: /vsicurl/https://github.com/OSGeo/gdal/raw/master/autotest/gdrivers/data/netcdf/var_with_column.nc
Size is 512, 512
Subdatasets:
  SUBDATASET_1_NAME=NETCDF:"/vsicurl/https://github.com/OSGeo/gdal/raw/master/autotest/gdrivers/data/netcdf/var_with_column.nc":"VAR:NAME"

is that just a "don't use bare url" situation?

@rouault
Member Author

rouault commented May 11, 2023


is that just a "don't use bare url" situation?

yes, the HTTP driver is a bit of a hack, and is meant for one-time usage of small remote files, as it creates either an in-memory (or on-disk as fallback) temporary file with the whole content of the file before passing it to any other driver that accepts it. This doesn't fly well with subdatasets, at least in command-line usage. Within the same process, and if you keep the original dataset open, opening the subdatasets could potentially work, but I'm not totally sure.
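
As a rough sketch of the "always go through /vsicurl/" advice, the helpers below build a netCDF subdataset name from a URL and a variable name, adding the prefix when a bare http(s) URL is given. Both function names are hypothetical and nothing here calls GDAL; the output format simply follows the `NETCDF:"<path>":<variable>` names shown in this thread.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: prefix bare http(s) URLs with /vsicurl/ so the file
// is read through range requests rather than downloaded whole.
std::string ToVsicurlPath(const std::string &path)
{
    const bool isHttp = path.rfind("http://", 0) == 0 ||
                        path.rfind("https://", 0) == 0;
    return isHttp ? "/vsicurl/" + path : path;
}

// Hypothetical helper: build a NETCDF:"<path>":<variable> subdataset name.
std::string NetCDFSubdatasetName(const std::string &path,
                                 const std::string &variable)
{
    return "NETCDF:\"" + ToVsicurlPath(path) + "\":" + variable;
}
```

Paths that already start with /vsicurl/ (or any other non-URL path) are passed through unchanged. Variable names containing a colon, such as `VAR:NAME` above, would need extra quoting that this sketch does not handle.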

@mdsumner
Contributor

awesome, as always much appreciated 🤘

3 participants