As financial markets produce vast volumes of structured and unstructured data, the ability to extract insights and develop predictive models has become increasingly important. Financial Data Science Python Notebooks provide a practical guide for analysts, researchers, and data scientists looking to apply Python and its broad ecosystem of libraries, tools, frameworks, and community resources to financial analysis, econometrics, and machine learning.
Designed to support financial data science workflows, the companion FinDS Python package demonstrates how to use data stores such as SQL databases, Redis, and MongoDB to manage and access large datasets, including:
- Core financial databases such as CRSP, Compustat, IBES, and TAQ
- Public economic data APIs from sources like FRED and the Bureau of Economic Analysis (BEA)
- Structured and unstructured data from academic and research websites
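
As a rough illustration of the SQL-backed storage pattern described above (this is not the FinDS API itself), the sketch below uses Python's built-in `sqlite3` module. The `prices` table, its columns, the placeholder tickers, and the helper names `build_db` and `last_close` are all assumptions made for this example, and the prices are toy values rather than real market data.

```python
import sqlite3

# Toy rows for a hypothetical schema: one row per (ticker, date) with a close.
ROWS = [
    ("AAA", "2025-01-02", 10.0),
    ("AAA", "2025-01-03", 10.5),
    ("BBB", "2025-01-02", 20.0),
    ("BBB", "2025-01-03", 19.0),
]


def build_db(rows):
    """Create an in-memory SQLite database and load the price rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE prices ("
        "ticker TEXT, date TEXT, close REAL, "
        "PRIMARY KEY (ticker, date))"
    )
    # Parameterized inserts avoid building SQL strings by hand.
    conn.executemany("INSERT INTO prices VALUES (?, ?, ?)", rows)
    conn.commit()
    return conn


def last_close(conn, ticker):
    """Return the most recent close for a ticker, or None if it is absent."""
    cur = conn.execute(
        "SELECT close FROM prices WHERE ticker = ? "
        "ORDER BY date DESC LIMIT 1",
        (ticker,),
    )
    row = cur.fetchone()
    return None if row is None else row[0]


conn = build_db(ROWS)
print(last_close(conn, "AAA"))
```

ISO-formatted date strings sort chronologically, which is why ordering by the text `date` column is enough here. A production setup would likely swap the in-memory database for a server-backed engine.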
In addition to data access, the package provides practical examples and templates for applying:
- Financial econometrics and time series modeling
- Graph analytics, event studies, and backtesting strategies
- Machine learning for predictive analytics
- Natural language processing (NLP) to extract insights from financial text
- Neural networks and large language models (LLMs) for advanced decision-making
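
To give a flavor of the econometric building blocks listed above, here is a minimal sketch of a Newey-West standard error for a sample mean, using Bartlett-kernel weights. It is not taken from the notebooks. The function name `newey_west_se`, the toy series, and the lag choice are assumptions for illustration only.

```python
import math


def newey_west_se(x, lags):
    """Newey-West (Bartlett-kernel) standard error of the sample mean of x."""
    n = len(x)
    mean = sum(x) / n
    e = [v - mean for v in x]  # demeaned observations

    # Start from the lag-0 autocovariance (1/n normalization).
    long_run_var = sum(v * v for v in e) / n

    # Add autocovariances up to `lags`, down-weighted linearly (Bartlett).
    for lag in range(1, lags + 1):
        gamma = sum(e[t] * e[t - lag] for t in range(lag, n)) / n
        weight = 1.0 - lag / (lags + 1.0)
        long_run_var += 2.0 * weight * gamma

    return math.sqrt(long_run_var / n)


# Toy, positively autocorrelated series (illustrative values only).
series = [1, 1, 1, -1, -1, -1, 1, 1, 1, -1, -1, -1]
print(newey_west_se(series, lags=0))  # plain standard error, no correction
print(newey_west_se(series, lags=1))  # widened by first-order autocovariance
```

With `lags=0` the estimate reduces to the usual standard error of the mean (under the 1/n variance convention). Positive autocorrelation in the series makes the corrected value larger.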
March 2025: Updated with data through early 2025 and incorporated the latest LLMs -- Microsoft Phi-4-multimodal (released Feb 2025), Google Gemma-3-12B (March 2025), DeepSeek-R1-14B (January 2025), Meta Llama-3.1-8B (July 2024), GPT-4o-mini (July 2024).
notebook | Financial | Data | Science |
---|---|---|---|
1.1_stock_prices | Stock price properties | CRSP stocks | Statistical moments |
1.2_jegadeesh_titman | Price momentum | CRSP stocks | Hypothesis testing, Newey-West estimator |
1.3_fama_french | Value and size | CRSP stocks, Compustat | Linear regression |
1.4_fama_macbeth | CAPM | Fama-French | Non-linear regression, Quadratic optimization |
1.5_contrarian_trading | Mean reversion, Implementation shortfall | CRSP stocks | Structural breaks |
1.6_quant_factors | Factor investing, Backtesting | CRSP stocks, Compustat, IBES | Cluster analysis |
1.7_event_study | Event studies | S&P key developments | Multiple testing, Fourier transforms |
2.1_economic_indicators | Economic data revisions, Employment payrolls | ALFRED | Outlier detection |
2.2_regression_diagnostics | Consumer and producer prices | FRED | Linear regression diagnostics |
2.3_time_series | Industrial production and inflation | FRED | Time series analysis |
2.4_approximate_factors | Approximate factor models | FRED-MD | Unit root test, EM Algorithm |
2.5_economic_states | State space models | FRED-MD | Gaussian mixture, hidden Markov models |
3.1_term_structure | Interest rates | FRED yield curve | Low-rank approximation |
3.2_bond_returns | Bond risk factors | FRED bond returns | Principal component analysis |
3.3_options_pricing | Binomial tree, Black-Scholes-Merton | simulated | Monte Carlo simulations |
3.4_value_at_risk | Value-at-risk | FRED crypto-currencies | Conditional volatility |
3.5_covariance_matrix | Portfolio risk | Fama-French industries | Covariance matrix estimation |
3.6_market_microstructure | Market liquidity | TAQ tick data | High frequency volatility |
3.7_event_risk | Earnings expectations | IBES | Poisson regression, generalized linear model |
4.1_network_graphs | Supply chain | Compustat principal customers | Network graphs |
4.2_community_detection | Industry taxonomy | Hoberg-Phillips | Community detection |
4.3_graph_centrality | Input-output uses | Bureau of Economic Analysis | Graph centrality |
4.4_link_prediction | Product markets | Hoberg-Phillips | Link prediction |
4.5_spatial_regression | Earnings surprises | IBES, Hoberg-Phillips | Spatial regression |
5.1_fomc_topics | FOMC meetings | Federal Reserve | Topic modeling |
5.2_management_sentiment | Management discussions | SEC Edgar, Loughran-McDonald | Sentiment analysis |
5.3_business_textual | Business descriptions | SEC Edgar | Part-of-speech, Density-based clustering |
6.1_classification_models | Industry classification | SEC Edgar | Classification |
6.2_regression_models | Macroeconomic forecasts | FRED-MD | Regression |
6.3_deep_learning | Industry classification | SEC Edgar | Neural networks, word embeddings |
6.4_convolutional_net | Macroeconomic forecasts | FRED-MD | Convolutional neural nets, vector autoregression |
6.5_recurrent_net | Macroeconomic forecasts | FRED-MD | Recurrent neural nets, dynamic factor models |
6.6_reinforcement_learning | Retirement spending | SBBI | Reinforcement learning |
6.7_language_modeling | Fedspeak | Federal Reserve | Language modeling, Transformers |
7.1_large_language_models | Market risk disclosures | SEC Edgar | Text summarization |
7.2_llm_finetuning | Industry classification | SEC Edgar | LLM fine-tuning |
7.3_llm_prompting | Financial news sentiment | Kaggle | Prompt engineering |
7.4_llm_agents | Corporate philanthropy | MVCP textbook | Multi-agents, chatbots, retrieval-augmented generation |
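
The 3.3_options_pricing row above pairs binomial trees and Black-Scholes-Merton with Monte Carlo simulation. The sketch below is not taken from that notebook. It prices a European call three ways with only the standard library, so the methods can be compared side by side. The function names and the parameter values (spot 100, strike 100, 5% rate, 20% volatility, one year) are assumptions chosen for illustration.

```python
import math
import random


def norm_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def bsm_call(s, k, r, sigma, t):
    """Black-Scholes-Merton closed-form price of a European call."""
    d1 = (math.log(s / k) + (r + 0.5 * sigma**2) * t) / (sigma * math.sqrt(t))
    d2 = d1 - sigma * math.sqrt(t)
    return s * norm_cdf(d1) - k * math.exp(-r * t) * norm_cdf(d2)


def binomial_call(s, k, r, sigma, t, steps=200):
    """Cox-Ross-Rubinstein binomial tree price of a European call."""
    dt = t / steps
    u = math.exp(sigma * math.sqrt(dt))      # up factor
    d = 1.0 / u                              # down factor
    p = (math.exp(r * dt) - d) / (u - d)     # risk-neutral up probability
    disc = math.exp(-r * dt)
    # Payoffs at expiry for j up-moves out of `steps`.
    values = [max(s * u**j * d**(steps - j) - k, 0.0) for j in range(steps + 1)]
    # Roll back through the tree one step at a time.
    for _ in range(steps):
        values = [
            disc * (p * values[j + 1] + (1.0 - p) * values[j])
            for j in range(len(values) - 1)
        ]
    return values[0]


def mc_call(s, k, r, sigma, t, n_paths=100_000, seed=0):
    """Monte Carlo price: simulate terminal prices under geometric Brownian motion."""
    rng = random.Random(seed)
    drift = (r - 0.5 * sigma**2) * t
    vol = sigma * math.sqrt(t)
    total = 0.0
    for _ in range(n_paths):
        s_t = s * math.exp(drift + vol * rng.gauss(0.0, 1.0))
        total += max(s_t - k, 0.0)
    return math.exp(-r * t) * total / n_paths


params = dict(s=100.0, k=100.0, r=0.05, sigma=0.2, t=1.0)
print("BSM:        ", bsm_call(**params))
print("Binomial:   ", binomial_call(**params))
print("Monte Carlo:", mc_call(**params))
```

The tree and the simulation should both land close to the closed-form value. The tree error shrinks as `steps` grows, while the Monte Carlo error shrinks roughly with the square root of `n_paths`.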