NOAA Fisheries’ Marine Recreational Information Program (MRIP) conducts annual recreational saltwater fishing surveys at the national level to estimate total recreational catch. This data is used to assess and maintain sustainable fish stocks. Survey data is available from 1981 to 2023.
In this project, survey data will be extracted from an NOAA website and loaded into a data warehouse. The data will then be transformed to make it ready for reporting and analytics. A web app will be used to interact with the transformed data and generate insights.
An end-to-end data product will be built, spanning the extract, load, and transform (ELT) of raw data through to dynamic, interactive visualizations in a web application. The high-level data flow, along with the technologies used, can be seen below:
- Go to the directory where the repo will be cloned
cd <directory>
- Clone repo to directory
git clone https://github.com/lopezj1/noaa_eda.git
- Switch to project directory
cd noaa_eda
- Create the nginx_proxy_manager_default network if it does not exist (this network is needed in production)
docker network inspect nginx_proxy_manager_default >/dev/null 2>&1 || docker network create nginx_proxy_manager_default
- Run docker compose to spin up the containers
docker compose up -d
- Visit the Prefect dashboard at http://localhost:4200
- Wait 1-2 minutes for Prefect Agent to start and Deployments to be created.
- Quick run the ingest flow from Deployments
- The default year range is 2018-2023 to keep the loading time shorter (~5 minutes).
- Quick run the dbt flow from Deployments
- Running all models will take about 5 minutes.
- If running Docker Engine on WSL, you may need to allocate more memory in .wslconfig
- Visit the Streamlit app at http://localhost:8501
- Visit the dbt docs at http://localhost:8080
Survey data is stored at:
https://www.st.nmfs.noaa.gov/st1/recreational/MRIP_Survey_Data/CSV/.
Data is stored as csv files inside zip folders cataloged by year and wave (if multiple surveys were taken that year). The Python script ingest_noaa.py handles the extract and load (EL) of the data. The EL pipeline consists of the following general steps:
- GET request to retrieve folders named by year and wave
- Unzip folders to extract csv files
- Copy csv files to the /tmp folder in the main project directory
- INSERT pandas dataframes into appropriate tables within a persistent DuckDB schema named raw
- To be memory efficient, each individual csv file is processed with the help of the dlt Python library; a sketch of this step is shown below.
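Below is a minimal sketch of this EL step, assuming dlt's DuckDB destination; the helper function, file handling, and table names are illustrative and not the exact contents of ingest_noaa.py:

```python
# Illustrative sketch of the extract-and-load (EL) step; names are assumptions,
# not the project's actual ingest_noaa.py code.
import io
import zipfile

import dlt
import pandas as pd
import requests

BASE_URL = "https://www.st.nmfs.noaa.gov/st1/recreational/MRIP_Survey_Data/CSV/"

def load_zip(zip_name: str) -> None:
    """Download one year/wave zip folder and load its csv files into DuckDB."""
    resp = requests.get(BASE_URL + zip_name, timeout=120)
    resp.raise_for_status()

    # dlt pipeline writing into a persistent DuckDB schema named raw
    pipeline = dlt.pipeline(
        pipeline_name="noaa_mrip",
        destination="duckdb",
        dataset_name="raw",
    )

    with zipfile.ZipFile(io.BytesIO(resp.content)) as zf:
        for csv_name in zf.namelist():
            if not csv_name.endswith(".csv"):
                continue
            # process one csv at a time to keep memory usage bounded
            df = pd.read_csv(zf.open(csv_name), low_memory=False)
            table = csv_name.rsplit("/", 1)[-1].removesuffix(".csv")
            pipeline.run(df, table_name=table)
```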
After loading the source data into DuckDB, dbt data models were created to transform the raw data into feature-rich tables in a separate schema named analytics. Documentation for this dbt project can be found at http://localhost:8080.
The ELT pipeline was orchestrated using Prefect. This allows monitoring of tasks and flows within the pipeline, as well as customization of the input year ranges to process. The Prefect dashboard can be accessed at http://localhost:4200.
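For illustration, here is a hedged sketch of how the flow and its year-range parameters might look in Prefect; the task bodies are placeholders rather than the project's actual flow code:

```python
# Hypothetical Prefect flow; task bodies are placeholders, not the real pipeline.
from prefect import flow, task

@task
def ingest_year(year: int) -> None:
    """Extract and load one year of survey data (see the EL sketch above)."""
    ...

@task
def run_dbt_models() -> None:
    """Run the dbt models that build the analytics schema."""
    ...

@flow(name="noaa-elt")
def noaa_elt(start_year: int = 2018, end_year: int = 2023) -> None:
    # The year range is a flow parameter, so it can be customized per run.
    for year in range(start_year, end_year + 1):
        ingest_year(year)
    run_dbt_models()

if __name__ == "__main__":
    noaa_elt()
```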
A web app was built using Streamlit to allow for self-serve analytics. The web app can be accessed at http://localhost:8501.
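As a rough idea of what a self-serve view could look like, the snippet below assumes a hypothetical analytics.catch_summary table in the DuckDB file; the actual database path, model, and column names may differ:

```python
# Minimal Streamlit sketch; database path, table, and columns are assumptions.
import duckdb
import streamlit as st

st.title("MRIP Recreational Catch Explorer")

con = duckdb.connect("noaa.duckdb", read_only=True)

year = st.slider("Survey year", min_value=2018, max_value=2023, value=2023)
df = con.execute(
    "SELECT * FROM analytics.catch_summary WHERE year = ?", [year]
).df()

st.dataframe(df)
```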
Future work could bring in tidal, weather, and lunar data via APIs in conjunction with this survey data to create predictive ML models of catch success rate.
- About NOAA MRIP
- NOAA MRIP Survey Data
- dbt
- Prefect
- dlt
- duckdb
- streamlit