---
layout: global
title: Third-Party Projects
type: "page singular"
navigation:
  weight: 5
  show: true
---
This page tracks external software projects that supplement Apache Spark and add to its ecosystem.
spark-packages.org
spark-packages.org is an external,
community-managed list of third-party libraries, add-ons, and applications that work with
Apache Spark. You can add a package as long as you have a GitHub repository.
Infrastructure Projects
- Spark Job Server -
REST interface for managing and submitting Spark jobs on the same cluster
(see blog post
for details)
- SparkR - R frontend for Spark
- MLbase - Machine Learning research project on top of Spark
- Apache Mesos - Cluster management system that supports
running Spark
- Alluxio (née Tachyon) - Memory-speed virtual distributed
storage system that supports running Spark
- Spark Cassandra Connector -
Easily load your Cassandra data into Spark and Spark SQL; from DataStax
- FiloDB - a Spark-integrated analytical/columnar
database with an in-memory option, capable of sub-second concurrent queries
- Elasticsearch -
Spark SQL integration
- Spark-Scalding - Easily transition
Cascading/Scalding code to Spark
- Zeppelin - an IPython-like notebook for Spark. There
are also ISpark and the
Spark Notebook.
- IBM Spectrum Conductor with Spark -
cluster management software that integrates with Spark
- EclairJS - enables Node.js developers to code
against Spark, and data scientists to use JavaScript in Jupyter notebooks.
- SnappyData - an open-source
OLTP + OLAP database integrated with Spark on the same JVMs.
- GeoSpark - Geospatial RDDs and joins
- Spark Cluster Deploy Tools for OpenStack
Applications Using Spark
- Apache Mahout - Previously on Hadoop MapReduce,
Mahout has switched to using Spark as the backend
- Apache MRQL - A query processing and optimization
system for large-scale, distributed data analysis, built on top of Apache Hadoop, Hama, and Spark
- BlinkDB - a massively parallel, approximate query engine built
on top of Shark and Spark
- Spindle - Spark/Parquet-based web
analytics query engine
- Spark Spatial - Spatial joins and
processing for Spark
- Thunderain - a framework
for combining stream processing with historical data (think Lambda architecture)
- DF from Ayasdi - a Pandas-like data frame
implementation for Spark
- Oryx - Lambda architecture on Apache Spark and
Apache Kafka for real-time, large-scale machine learning
- ADAM - A framework and CLI for loading,
transforming, and analyzing genomic data using Apache Spark
Additional Language Bindings
C# / .NET
- CLR for Spark
Clojure
- clj-spark
- Sparkling
Groovy
- groovy-spark-example