Python ETL framework for stream processing, real-time analytics, LLM pipelines, and RAG.
Updated Mar 12, 2025 - Python
A lightweight data processing framework built on DuckDB and 3FS.
A light-weight, flexible, and expressive statistical data testing library
Toolkit for Machine Learning, Natural Language Processing, and Text Generation, in TensorFlow. This is part of the CASL project: http://casl-project.ai/
Large-scale pretraining for dialogue
Python Stream Processing
Extract Transform Load for Python 3.5+
Concurrent Python made simple
Data and tools for generating and inspecting OLMo pre-training data.
Large-scale pretrained models for goal-directed dialog
Integrating the Best of TF into PyTorch, for Machine Learning, Natural Language Processing, and Text Generation. This is part of the CASL project: http://casl-project.ai/
All-in-one text de-duplication
Lineage metadata API, artifacts streams, sandbox, API, and spaces for Polyaxon
Deal with bad samples in your dataset dynamically, use Transforms as Filters, and more!
Production-ready data processing made easy and shareable
A multi-cloud framework for big data analytics and embarrassingly parallel jobs that provides a universal API for building parallel applications in the cloud ☁️🚀
📈 PatternPy: A Python package revolutionizing trading analysis with high-speed pattern recognition, leveraging Pandas & NumPy. Effortlessly spot Head & Shoulders, Tops & Bottoms, Supports & Resistances. For experts & beginners. #TradingMadeEasy 🔥
Python Adaptive Signal Processing
Compose multimodal datasets 🎹
Super fast list of dicts to pre-formatted tables conversion library for Python 2/3