Here are instructions for structuring incoming data and preparing it for ingest:
- The loader script expects columns named according to the values in the data/columns.csv file. Data files must be in CSV (comma-separated value) format.
- basis_of_record: controlled vocabulary
- certainty: controlled vocabulary
- prediction_class: controlled vocabulary
- trait: controlled vocabulary (see the "trait" column)
- datasource: controlled vocabulary covering all datasources we are working with. Put each new datasource in a directory named after the datasource itself; for example, "sample" goes in a directory called "data/sample". Only commit data files with fewer than 10,000 records to GitHub; all others will be added to the .gitignore file.
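The checks above can be scripted before loading. The sketch below assumes data/columns.csv lists one expected column name per line (the exact layout of that file is an assumption), verifies a data file's header against it, and counts records to apply the 10,000-record GitHub limit:

```python
import csv

def expected_columns(columns_file):
    """Read the expected column names, assumed one per line, from columns.csv."""
    with open(columns_file, newline="") as f:
        return [row[0].strip() for row in csv.reader(f) if row]

def check_data_file(data_file, columns_file, record_limit=10_000):
    """Report missing columns and whether the file is small enough for GitHub."""
    expected = set(expected_columns(columns_file))
    with open(data_file, newline="") as f:
        reader = csv.reader(f)
        header = next(reader, [])
        n_records = sum(1 for _ in reader)  # data rows, excluding the header
    return {
        "missing_columns": sorted(expected - set(header)),
        "record_count": n_records,
        "commit_to_github": n_records < record_limit,
    }
```

Files flagged with `"commit_to_github": False` should go into .gitignore rather than the repository.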
We can visualize loaded data at our beta Phenobase Query Page.
```shell
# Load the 09.06.2024 snapshot and do not drop existing records (the default is False)
python loader.py /home/exouser/code/phenobase_data/data/iNaturalist.09.06.2024 False
```
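The second positional argument controls whether existing records are dropped before loading. The loader's actual argument handling is not shown here, but the invocation above is consistent with a sketch like this (parse_args and its behavior are assumptions, not the real loader.py code):

```python
def parse_args(argv):
    """Hypothetical sketch of the loader's CLI: the first argument is the data
    directory, the second an optional drop-existing flag that defaults to
    False (i.e. keep existing records)."""
    data_dir = argv[1]
    drop_existing = len(argv) > 2 and argv[2].lower() == "true"
    return data_dir, drop_existing
```

Under this reading, `parse_args(["loader.py", "data/sample", "False"])` keeps existing records, matching the comment in the command above.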