
Feature: add toy data #493

Merged 25 commits from Feature-489-add-toy-data into main on Jan 9, 2025
Conversation

@jsmariegaard (Member) commented Dec 26, 2024:

>>> import modelskill as ms
>>> cc = ms.data.vistula()
>>> cc

<ComparerCollection>
Comparers:
0: Tczew - Discharge [m3/s]
1: Krasnystaw - Discharge [m3/s]
2: Sandomierz - Discharge [m3/s]
3: Szczucin - Discharge [m3/s]
4: Nowy Sacz - Discharge [m3/s]
5: Tryncza - Discharge [m3/s]
6: Ptaki - Discharge [m3/s]
7: Suraz - Discharge [m3/s]

>>> cc = ms.data.oresund()
>>> cc

<ComparerCollection>
Comparers:
0: Drogden - Surface Elevation [meter]
1: Barseback - Surface Elevation [meter]
2: Helsingborg - Surface Elevation [meter]
3: Kobenhavn - Surface Elevation [meter]
4: Koege - Surface Elevation [meter]
5: MalmoHamn - Surface Elevation [meter]
6: Vedbaek - Surface Elevation [meter]

Both datasets are now around 1 MB, and both contain aux data and attrs that could be used for examples and testing in the future (not yet used). The data module has been added to the API docs.

At a later point it would be great to add more datasets: northseawaves, ...

It would also be great to add the new notebook to the examples in docs.
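
As a rough, illustrative sketch (not part of this PR) of how the toy data could later be exercised in examples or tests, assuming the usual ComparerCollection API:

>>> import modelskill as ms
>>> cc = ms.data.vistula()
>>> cc.skill()         # aggregated skill metrics per observation station
>>> cc[0].data.attrs   # attrs (and aux variables) on the Comparer's underlying xarray Dataset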

@jsmariegaard linked an issue (Add toy data) on Dec 26, 2024 that may be closed by this pull request.
@jsmariegaard (Member, Author) commented:

What is an acceptable data file size to include in the package? 1 MB per case? Should we maybe remove some stations in the above examples? Could we reduce to float16 or use other tricks to save file size?

@ecomodeller (Member) commented:

I think we can remove some stations and change to float32.

@jsmariegaard (Member, Author) commented:

Ways to make datasets smaller on disk (a rough sketch follows the list):

  • reduce time period
  • reduce number of observations
  • float32 instead of float64
  • crop modelresult to period covered by obs
  • reduce time resolution of modelresults (e.g. 3 hourly instead of 30 min)
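
For illustration, the float32, cropping and resampling points above might look roughly like this with plain xarray when preparing the files (file and variable names are hypothetical):

import xarray as xr

# Hypothetical input files - for illustration only
obs = xr.open_dataset("obs_discharge.nc")
mr = xr.open_dataset("modelresult.nc")

# Crop the model result to the period covered by the observations
t0, t1 = obs.time.values.min(), obs.time.values.max()
mr = mr.sel(time=slice(t0, t1))

# Reduce time resolution, e.g. 3-hourly instead of 30 min
mr = mr.resample(time="3h").mean()

# Store floating point variables as float32 instead of float64
encoding = {v: {"dtype": "float32"} for v in mr.data_vars}
mr.to_netcdf("modelresult_small.nc", encoding=encoding)

Removing stations would then just be a matter of dropping some of the observation files before building the ComparerCollection.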

Further ideas to reduce disk size (not tried):

@jsmariegaard marked this pull request as ready for review on January 8, 2025 at 15:20.
@jsmariegaard merged commit 5e11449 into main on Jan 9, 2025. 6 checks passed.
@jsmariegaard deleted the Feature-489-add-toy-data branch on January 9, 2025 at 15:43.