This repository was archived by the owner on Mar 15, 2019. It is now read-only.

Support for loose search #16

Open
wants to merge 3 commits into
base: master

Conversation

andersy005
Contributor

@andersy005 andersy005 commented Jan 8, 2019

@andersy005 andersy005 changed the title from "Looser search" to "Support for loose search" on Jan 8, 2019
@martindurant

I haven't looked at the content here, but you might find it interesting to compare with intake-bluesky and the PR to implement passing searches off to an Intake server. Of course, if you only want to do this alternate form of searching in the client (I think this is the case), then things might be simpler for you.

@andersy005
Contributor Author

@martindurant, thank you for chiming in! I've been looking at the intake-bluesky implementation to figure out what I can borrow from it for intake-cmip. However, I seem to be hitting some brick walls, maybe due to my limited experience with intake? :)

source = intake.open_cmip5_search(database='glade', model="CanESM2")

This query returns a list of files that cannot be concatenated by xarray, so the user needs to provide a more fine-grained query. As @danielballan pointed out in pangeo-data/pangeo#522 (comment),

it may be useful to return a dynamically-built Catalog, containing all the matches as entries

I am having a hard time conceptualizing how to return a dynamically-built catalog when some searches return results that cannot be concatenated into an xarray dataset. For intake-cmip to load the results of a query into an xarray dataset, the user must specify the following required arguments: model, experiment, frequency, realm. (The main reason for this restriction is that coordinates and dimensions differ across CMIP data sets.) How should I go about this?

Regarding the use of an intake-server, I am going to look into this idea once I figure out how to do the searching in the client.
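To make the restriction above concrete, here is a minimal sketch (not intake-cmip's actual code) that groups search results by the four required arguments, so each group is a candidate for concatenation. The record layout, the `path` field, the file names, and the example attribute values are all assumptions made for illustration.

```python
from collections import defaultdict

# The four arguments named in the comment above as required for loading.
REQUIRED_KEYS = ("model", "experiment", "frequency", "realm")


def group_by_required_keys(records):
    """Group file records so that each group shares all four required keys.

    `records` is a hypothetical list of dicts; only files within one group
    would be candidates for concatenation into a single dataset.
    """
    groups = defaultdict(list)
    for rec in records:
        key = tuple(rec[k] for k in REQUIRED_KEYS)
        groups[key].append(rec["path"])
    return dict(groups)


# Made-up records standing in for the result of a loose search on model="CanESM2".
records = [
    {"model": "CanESM2", "experiment": "historical", "frequency": "mon",
     "realm": "atmos", "path": "a.nc"},
    {"model": "CanESM2", "experiment": "historical", "frequency": "mon",
     "realm": "atmos", "path": "b.nc"},
    {"model": "CanESM2", "experiment": "rcp85", "frequency": "day",
     "realm": "ocean", "path": "c.nc"},
]

groups = group_by_required_keys(records)
for key, paths in groups.items():
    print(key, paths)
```

A loose search then yields several groups rather than one dataset, and each group could in principle become one catalog entry.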

@martindurant

how to return a dynamically-built catalog when some user's searches might return results that cannot be concatenated into an xarray dataset

The result here would be another intake catalog, listing the various things (models?) that the user can choose between. I am also ignorant of what you expect to get back from your query, but something like this:

cat = intake_cmip(myserver)  # makes a catalogue
list(cat)  # -> all possible datasets
cat_x = cat.search(experiment='bigun')  # also makes a catalogue
list(cat_x)  # -> a subset of possible datasets
onlyone = cat.search(fully-specified-search)  # -> catalogue with one entry

where every catalogue entry/dataset refers to a CMIP source with all required parameters given.

i.e., CMIP data and CMIP catalogue should be two separate classes. In the final case of a query sufficiently well-specified to identify only one dataset, there could be a shortcut to return the data class instance instead of the catalogue with one entry.

By the way, there already is a search method on catalogues, which looks at the names and descriptions of contained data sources and catalogues. In your case, the extra knowledge of what the server can do is important, but the simple version may be illuminating too.
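The two-class split suggested above can be sketched in plain Python. This is an illustration only: `CMIPSource`, `CMIPCatalog`, and `get_single` are hypothetical names, not intake or intake-cmip APIs, and the entry names and parameter values are made up.

```python
class CMIPSource:
    """Hypothetical data class: one CMIP dataset with its parameters fixed."""

    def __init__(self, **params):
        self.params = params


class CMIPCatalog:
    """Hypothetical catalogue class: named entries, each a CMIPSource."""

    def __init__(self, entries):
        self._entries = dict(entries)

    def __iter__(self):
        # Iterating a catalogue yields entry names, so list(cat) lists them.
        return iter(self._entries)

    def __len__(self):
        return len(self._entries)

    def search(self, **query):
        """Return a new catalogue holding only the entries that match `query`."""
        matches = {
            name: src
            for name, src in self._entries.items()
            if all(src.params.get(k) == v for k, v in query.items())
        }
        return CMIPCatalog(matches)

    def get_single(self):
        """Shortcut: return the data object when exactly one entry remains."""
        if len(self) != 1:
            raise ValueError(f"expected exactly one entry, found {len(self)}")
        return next(iter(self._entries.values()))


cat = CMIPCatalog({
    "canesm2_historical": CMIPSource(model="CanESM2", experiment="historical"),
    "canesm2_rcp85": CMIPSource(model="CanESM2", experiment="rcp85"),
})
cat_x = cat.search(model="CanESM2")  # still a catalogue, two entries
onlyone = cat.search(model="CanESM2", experiment="rcp85")  # one entry
source = onlyone.get_single()  # the data class instance itself
```

Because `search` always returns a catalogue, a loose query never has to decide whether its results can be concatenated; that question only arises once a single source is selected.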

def _read_database(self, database):
    if database == "glade":
        database = glade_cmip5_db
    if os.path.exists(database):


Does this have to be a local file, or can we accept remote things like elsewhere in intake?

Contributor Author


It doesn't need to be a local file. I just opted for a local file for prototyping purposes for the time being. As this matures, it makes sense to support remote databases/files.
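As a rough sketch of what supporting remote databases might involve, the check below tells a remote URL apart from a local path before deciding how to open it. The `is_remote` helper, the set of schemes, and the example locations are assumptions for illustration; how the file is actually opened (for example through intake's own file-system layer) is not shown.

```python
from urllib.parse import urlparse

# Assumed set of schemes treated as remote; adjust to whatever is supported.
REMOTE_SCHEMES = {"http", "https", "s3", "gs"}


def is_remote(location):
    """Return True if `location` looks like a remote URL rather than a local path."""
    return urlparse(location).scheme.lower() in REMOTE_SCHEMES


# Hypothetical locations, used only to exercise the helper.
print(is_remote("/tmp/cmip5_db.csv"))
print(is_remote("https://example.org/cmip5_db.csv"))
```

With such a check, the `os.path.exists` test would apply only to local paths, while remote locations would be handed to a URL-aware opener.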

@danielballan

Just in case it's useful, the MongoDB-backed catalog mentioned in my comment above is now documented. Our documentation on search might be slightly helpful.

@andersy005
Contributor Author

Thank you, @danielballan. I've been making some progress on dynamic search and the documentation will definitely be useful.


3 participants