A comprehensive Apache Airflow implementation for fetching and managing Jira data through multiple APIs. This project provides automated pipelines that retrieve Jira issues, users, and Tempo worklogs and store them in your choice of SQL database (PostgreSQL, MySQL, or MSSQL).
- Multiple data pipelines for:
  - Jira Issues (with detailed field mapping)
  - Jira Users (with account details)
  - Tempo Worklogs (with time tracking)
- Support for multiple SQL databases:
  - PostgreSQL
  - MySQL
  - Microsoft SQL Server
- Advanced error handling with exponential backoff
- Configurable retry mechanisms
- Efficient pagination handling (see the sketch below)
- Docker-based deployment with monitoring tools
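As a hedged illustration of the pagination handling mentioned above: Jira's search API pages results with `startAt`/`maxResults`, so a fetch loop might look like the sketch below. The real logic lives in `dags/jira/jira_fetcher.py` and may be structured differently.

```python
import requests

# Illustrative offset pagination against Jira's search endpoint; the real
# implementation in dags/jira/jira_fetcher.py may structure this differently.
def fetch_all_issues(base_url: str, headers: dict, jql: str, page_size: int = 100) -> list[dict]:
    issues, start_at = [], 0
    while True:
        resp = requests.get(
            f"{base_url}/rest/api/3/search",
            headers=headers,
            params={"jql": jql, "startAt": start_at, "maxResults": page_size},
            timeout=30,
        )
        resp.raise_for_status()
        payload = resp.json()
        issues.extend(payload.get("issues", []))
        start_at += page_size
        if start_at >= payload.get("total", 0):
            return issues
```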
- Docker and Docker Compose
- Jira API Access Token (Basic Auth)
- Tempo API Bearer Token
- Git (for cloning the repository)
- One of the supported databases:
  - PostgreSQL
  - MySQL
  - MSSQL
.
├── dags/
│   ├── fetch_jira_issues.py           # DAG for Jira issues
│   ├── fetch_jira_users.py            # DAG for Jira users
│   ├── fetch_jira_tempo_worklogs.py   # DAG for Tempo worklogs
│   └── jira/
│       ├── db_ops.py                  # Database operations factory
│       ├── jira_fetcher.py            # Core API fetcher
│       ├── jira_base_processor.py     # Base processor class
│       ├── issues/
│       │   └── issue_processor.py     # Issue data processing
│       ├── users/
│       │   └── users_processor.py     # User data processing
│       └── worklogs/
│           └── worklog_processor.py   # Worklog data processing
├── docker-compose.yaml                # Docker services configuration
├── Dockerfile                         # Custom Airflow image build
├── .env                               # Environment variables
├── .dockerignore                      # Docker build exclusions
└── README.md                          # Project documentation
- Clone the repository

  git clone https://github.com/noepinefrin/airflow-jira-etl.git
  cd airflow-jira-etl

- Configure environment variables: create a `.env` file (see the token note after these steps):

  # Airflow settings
  AIRFLOW_UID=50000
  AIRFLOW_GID=0

  # Jira Configuration
  JIRA_ATLASSIAN_DOMAIN_NAME=your-domain
  JIRA_REST_API_TOKEN=your_base64_encoded_token
  TEMPO_API_BEARER_TOKEN=your_tempo_token

  # Database Settings
  JIRA_TEMPO_WORKLOG_SQL_TABLE_NAME=AF_JIRA_WORKLOGS
  JIRA_USER_SQL_TABLE_NAME=AF_JIRA_USERS
  JIRA_ISSUE_SQL_TABLE_NAME=AF_JIRA_ISSUES

  # Database Connection (Choose one based on your database)
  AIRFLOW_CONN_POSTGRES_DEFAULT=postgresql://airflow:airflow@postgres/airflow
  # AIRFLOW_CONN_MYSQL_DEFAULT=mysql://user:pass@mysql/db
  # AIRFLOW_CONN_MSSQL_DEFAULT=mssql://user:pass@mssql/db

- Launch services

  docker compose up -d

- Access interfaces
  - Airflow UI: http://localhost:8080
  - Adminer: http://localhost:8081
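If your Jira token follows Jira Cloud's standard Basic Auth scheme, `JIRA_REST_API_TOKEN` is the Base64 encoding of `email:api_token`. A hedged way to generate it (the email and token below are placeholders):

```python
import base64

# Placeholders only: substitute your Atlassian account email and API token.
email = "you@example.com"
api_token = "your_api_token"
print(base64.b64encode(f"{email}:{api_token}".encode()).decode())
```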
Default DAG configuration:
- Schedule: Daily
- Start Date: 2025-01-01
- Retries: 1
- Retry Delay: 5 minutes
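The defaults above map onto a DAG definition along these lines. This is a sketch only: the `dag_id`, the `catchup` setting, and the Airflow 2.4+ `schedule` argument are assumptions, and the authoritative definitions are the files under `dags/`.

```python
from datetime import datetime, timedelta
from airflow import DAG

# Sketch of the shared defaults; the real DAG files under dags/ may differ.
default_args = {
    "retries": 1,
    "retry_delay": timedelta(minutes=5),
}

with DAG(
    dag_id="fetch_jira_issues",     # illustrative dag_id
    schedule="@daily",
    start_date=datetime(2025, 1, 1),
    default_args=default_args,
    catchup=False,                  # assumption, not confirmed by the repo
) as dag:
    ...                             # tasks are defined in the actual DAG files
```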
`fetch_jira_issues.py`:
- Fetches detailed issue information
- Includes fields like:
  - Resolution details
  - Priority
  - Labels
  - Assignee information
  - Status and progress
  - Project details
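A hedged illustration of that field mapping (column names are hypothetical; the real mapping lives in `issue_processor.py`):

```python
# Hypothetical flattening of a Jira issue payload; issue_processor.py defines
# the actual field mapping and column names.
def normalize_issue(issue: dict) -> dict:
    fields = issue.get("fields", {})
    return {
        "issue_key": issue.get("key"),
        "resolution": (fields.get("resolution") or {}).get("name"),
        "priority": (fields.get("priority") or {}).get("name"),
        "labels": ",".join(fields.get("labels") or []),
        "assignee": (fields.get("assignee") or {}).get("displayName"),
        "status": (fields.get("status") or {}).get("name"),
        "project": (fields.get("project") or {}).get("key"),
    }
```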
`fetch_jira_users.py`:
- Retrieves user account information
- Stores:
  - Account IDs
  - Display names
  - Avatar URLs
  - Account status
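A hedged sketch of how such a user record might be flattened (field names follow Jira Cloud's user JSON; `users_processor.py` defines the real mapping):

```python
# Hypothetical flattening of a Jira user payload; users_processor.py defines
# the actual mapping and column names.
def normalize_user(user: dict) -> dict:
    return {
        "account_id": user.get("accountId"),
        "display_name": user.get("displayName"),
        "avatar_url": (user.get("avatarUrls") or {}).get("48x48"),
        "active": user.get("active"),
    }
```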
`fetch_jira_tempo_worklogs.py`:
- Captures time tracking data
- Records:
  - Time spent
  - Billable time
  - Work descriptions
  - Start dates and times
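A hedged example of fetching such worklogs, assuming Tempo's Cloud REST API v4; the endpoint, parameters, and response shape used by `jira_fetcher.py` may differ.

```python
import requests

TEMPO_API_BEARER_TOKEN = "your_tempo_token"  # placeholder from .env

# Assumed endpoint and parameters; verify against the Tempo API the project targets.
resp = requests.get(
    "https://api.tempo.io/4/worklogs",
    headers={"Authorization": f"Bearer {TEMPO_API_BEARER_TOKEN}"},
    params={"from": "2025-01-01", "to": "2025-01-31", "limit": 50},
    timeout=30,
)
resp.raise_for_status()
worklogs = resp.json().get("results", [])  # "results" key assumed
```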
The project uses a factory pattern for database operations (`DatabaseOperationsFactory`), supporting the following Airflow connection strings (a hedged sketch of the factory idea follows the examples):
AIRFLOW_CONN_POSTGRES_DEFAULT=postgresql://airflow:airflow@postgres/airflow
AIRFLOW_CONN_MYSQL_DEFAULT=mysql://user:pass@mysql/db
AIRFLOW_CONN_MSSQL_DEFAULT=mssql://user:pass@mssql/db
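A minimal sketch of the factory idea, using the standard Airflow provider hooks; the actual class in `dags/jira/db_ops.py` may expose a different interface.

```python
from airflow.providers.postgres.hooks.postgres import PostgresHook
from airflow.providers.mysql.hooks.mysql import MySqlHook
from airflow.providers.microsoft.mssql.hooks.mssql import MsSqlHook

# Sketch only: db_ops.py may expose different methods and names.
class DatabaseOperationsFactory:
    _hooks = {"postgres": PostgresHook, "mysql": MySqlHook, "mssql": MsSqlHook}

    @classmethod
    def get_hook(cls, db_type: str, conn_id: str):
        try:
            hook_cls = cls._hooks[db_type]
        except KeyError:
            raise ValueError(f"Unsupported database type: {db_type!r}")
        # Each provider hook takes its connection id as <db_type>_conn_id.
        return hook_cls(**{f"{db_type}_conn_id": conn_id})
```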
- Retry mechanism with exponential backoff (see the sketch below)
- Configurable retry attempts (default: 5)
- Adjustable wait times:
  - Multiplier: 1
  - Minimum: 4 seconds
  - Maximum: 10 seconds
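Those values line up with a `tenacity`-style retry decorator; the sketch below assumes that library, which the repo may or may not actually use.

```python
import requests
from tenacity import retry, stop_after_attempt, wait_exponential

# Sketch only: jira_fetcher.py may implement retries differently.
@retry(
    stop=stop_after_attempt(5),                          # default: 5 attempts
    wait=wait_exponential(multiplier=1, min=4, max=10),  # 4-10 s exponential backoff
)
def fetch_page(url: str, headers: dict, params: dict) -> dict:
    resp = requests.get(url, headers=headers, params=params, timeout=30)
    resp.raise_for_status()
    return resp.json()
```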
- Create a new processor in the appropriate subdirectory
- Implement the `BaseProcessor` methods
- Update database operations if needed
- Create or modify the DAG file (a hypothetical sketch follows)
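As a hypothetical example of the first two steps; the `BaseProcessor` interface and the import path are assumptions to check against `jira_base_processor.py`.

```python
# Hypothetical new processor; the method name and import path are assumed.
from jira.jira_base_processor import BaseProcessor

class ProjectProcessor(BaseProcessor):
    """Flattens Jira project records into rows for the target table."""

    def process(self, raw_items: list[dict]) -> list[dict]:
        return [
            {"project_key": item.get("key"), "project_name": item.get("name")}
            for item in raw_items
        ]
```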
# Test specific task
docker compose run airflow-worker airflow tasks test [dag_id] [task_id] [date]
# Test database connection
docker compose run airflow-worker airflow connections test [conn_id]
- Permission Issues

  sudo chown -R 50000:0 .

- Port Conflicts

  # Check ports
  sudo lsof -i :5432
  sudo lsof -i :8080

  # Stop conflicting services
  sudo service postgresql stop

- Database Connection Issues
  - Verify connection string format
  - Check database service status
  - Ensure proper credentials

- API Issues
  - Verify API tokens
  - Check domain configuration
  - Review Airflow logs
- Fork the repository
- Create a feature branch
- Implement changes
- Add tests
- Submit a pull request
This project is licensed under the MIT License - see the LICENSE file for details.