Big Data Development Environment using Docker¶

Ferry helps you create big data clusters on your local machine. Define your big data stack using YAML and share your application with Dockerfiles. Ferry supports Hadoop, Cassandra, Spark, GlusterFS, and Open MPI.

Here’s an example Hadoop cluster:

backend:
   - storage:
        personality: "hadoop"
        instances: 2
        layers:
           - "hive"
connectors:
   - personality: "hadoop-client"

Then get started by typing ferry start hadoop. This will automatically create a two node Hadoop cluster and a single Linux client. You can customize the Linux client during runtime or define your own using a Dockerfile. In addition to Hadoop, Ferry also supports Spark, Cassandra, GlusterFS, and Open MPI.

Ferry is useful for:

Data scientists that want to experiment and learn about big data technologies
Developers that need a locally accessible big data development environment
Users that want to share big data application quickly and safely

Ferry provides several useful commands for your applications:

Start and stop services
View status and create snapshots
SSH into clients
Copy over log files to a host directory

For example, let’s inspect all the running services

$ ferry ps
UUID Storage  Compute  Connectors  Status   Base  Time
---- ------- --------- ---------- ------- ------- ----
sa-2    se-6 [u'se-7']       se-8 removed hadoop    --
sa-1    se-3 [u'se-4']       se-5 stopped openmpi   --
sa-0    se-0 [u'se-1']       se-2 running cassandra --

Ferry is under active development, so follow us on Twitter to keep up to date.

If you’re interested in collaborating or have any questions, feel free to send an email to info@opencore.io.