Simple and Distributed Machine Learning
Updated Apr 19, 2025 · Scala
Apache Spark is an open-source, distributed, general-purpose cluster-computing framework. It provides an interface for programming entire clusters with implicit data parallelism and fault tolerance.
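A minimal sketch of what "implicit data parallelism" means in practice, assuming Spark 3.x (spark-core and spark-sql) is on the classpath. The names `WordCountSketch` and `wordCounts` are hypothetical and only for illustration. The pipeline reads like ordinary Scala collection code, but Spark splits the input into partitions and runs each step on them in parallel, with no explicit threading in user code.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical example object; not part of Spark itself.
object WordCountSketch {

  // Counts words across all input lines using the RDD API.
  def wordCounts(spark: SparkSession, lines: Seq[String]): Map[String, Long] = {
    // Distribute the input across 4 partitions; later steps run per partition.
    val rdd = spark.sparkContext.parallelize(lines, numSlices = 4)
    rdd
      .flatMap(_.split("\\s+"))  // split each line into tokens
      .filter(_.nonEmpty)        // drop empty tokens from leading whitespace or blank lines
      .map(word => (word, 1L))   // pair each word with a count of 1
      .reduceByKey(_ + _)        // sum counts per word (this step shuffles data between partitions)
      .collect()                 // action: runs the job and returns results to the driver
      .toMap
  }

  def main(args: Array[String]): Unit = {
    // "local[*]" runs Spark in-process on all local cores; on a real cluster
    // the master URL would normally be supplied by spark-submit instead.
    val spark = SparkSession.builder()
      .appName("word-count-sketch")
      .master("local[*]")
      .getOrCreate()
    try {
      val counts = wordCounts(spark, Seq("to be or not to be", "that is the question"))
      counts.toSeq.sortBy(_._1).foreach { case (word, n) => println(s"$word: $n") }
    } finally {
      spark.stop()
    }
  }
}
```

The transformations (`flatMap`, `filter`, `map`, `reduceByKey`) are lazy and are recorded as the RDD's lineage; nothing executes until `collect()`. That lineage is also what lets Spark recompute a lost partition, which is the fault tolerance the description refers to.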
Coolplay Spark (酷玩 Spark): Spark source-code analysis, Spark libraries, and more
Feathr – A scalable, unified data and AI engineering platform for enterprise
This is the GitHub repo for Learning Spark: Lightning-Fast Data Analytics [2nd Edition]
GraphFrames is a package for Apache Spark which provides DataFrame-based Graphs
This is the development repository for sparkMeasure, a tool and library designed for efficient analysis and troubleshooting of Apache Spark jobs. It focuses on easing the collection and examination of Spark metrics, making it a practical choice for both developers and data engineers.
[PROJECT IS NO LONGER MAINTAINED] Code examples that show how to integrate Apache Kafka 0.8+ with Apache Storm 0.9+ and Apache Spark Streaming 1.1+, using Apache Avro as the data serialization format.
Apache Spark enhanced with a native Kubernetes scheduler back-end. NOTE: this repository is being ARCHIVED, as all new development for the Kubernetes scheduler back-end now happens at https://github.com/apache/spark/
A Spark UI and Spark History Server alternative with CPU and Memory metrics! Delight is free, cross-platform, and open-source.
A Spark Atlas connector to track data lineage in Apache Atlas
Enabling Continuous Data Processing with Apache Spark and Azure Event Hubs
A recommender system for discovering GitHub repos, built with Apache Spark
Apache Spark on AWS Lambda
Big Data RDF Processing and Analytics Stack built on Apache Spark and Apache Jena (http://sansa-stack.github.io/SANSA-Stack/)
The Archives Unleashed Toolkit is an open-source toolkit for analyzing web archives.
Project for James' Apache Spark with Scala course
Splash, a flexible Spark shuffle manager that supports user-defined storage backends for shuffle data storage and exchange
Spark connector for reading from and writing to Pulsar
Created by Matei Zaharia
Released May 26, 2014