Learn how to design, develop, deploy and iterate on production-grade ML applications.
The largest collection of PyTorch image encoders / backbones. Including train, eval, inference, export scripts, and pretrained weights -- ResNet, ResNeXT, EfficientNet, NFNet, Vision Transformer (ViT), MobileNetV4, MobileNet-V3 & V2, RegNet, DPN, CSPNet, Swin Transformer, MaxViT, CoAtNet, ConvNeXt, and more
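PArallel Distributed Deep LEarning: Machine Learning Framework from Industrial Practice (PaddlePaddle (飞桨) core framework: high-performance single-machine and distributed training, plus cross-platform deployment, for deep learning and machine learning)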
👑 Easy-to-use and powerful NLP and LLM library with 🤗 Awesome model zoo, supporting a wide range of NLP tasks from research to industrial applications, including 🗂 Text Classification, 🔍 Neural Search, ❓ Question Answering, ℹ️ Information Extraction, 📄 Document Intelligence, 💌 Sentiment Analysis, etc.
SkyPilot: Run AI and batch jobs on any infra (Kubernetes or 15+ clouds). Get unified execution, cost savings, and high GPU availability via a simple interface.
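Fengshenbang-LM (封神榜大模型) is an open-source large-model ecosystem led by the Cognitive Computing and Natural Language Research Center at the IDEA Research Institute, intended to serve as infrastructure for Chinese AIGC and cognitive intelligence.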
FEDML - The unified and scalable ML library for large-scale distributed training, model serving, and federated learning. FEDML Launch, a cross-cloud scheduler, further enables running any AI job on any GPU cloud or on-premise cluster. Built on this library, TensorOpera AI (https://TensorOpera.ai) is your generative AI platform at scale.
A high performance and generic framework for distributed DNN training
Fast and flexible AutoML with learning guarantees.
Determined is an open-source machine learning platform that simplifies distributed training, hyperparameter tuning, experiment tracking, and resource management. Works with PyTorch and TensorFlow.
Training and serving large-scale neural networks with auto parallelization.
Decentralized deep learning in PyTorch. Built to train models on thousands of volunteers across the world.
DLRover: An Automatic Distributed Deep Learning System
Library for Fast and Flexible Human Pose Estimation
DeepRec is a high-performance recommendation deep learning framework based on TensorFlow. It is hosted as an incubation project at the LF AI & Data Foundation.
Efficient Deep Learning Systems course materials (HSE, YSDA)
Best practice for training LLaMA models in Megatron-LM
Official code for ReLoRA from the paper Stack More Layers Differently: High-Rank Training Through Low-Rank Updates
Resource-adaptive cluster scheduler for deep learning training.
LiBai(李白): A Toolbox for Large-Scale Distributed Parallel Training