Raspberry Pi Cluster

A photo of the Southampton University Raspberry Pi Cluster.
The first time I heard about a computer made up of multiple Raspberry Pis, it was the Southampton University ‘Lego’ computer.

I discovered a blog entry describing a Raspberry Pi cluster built by NVIDIA High Performance Computing Engineer Adam DeConinck, along with some technical details about it. Due to public demand, some details were made available, including a README file on GitHub with more information.


Load Balancing

Load balancing is very possible for the Pi.

There are many different ways to do this. One is to split services (web server, MySQL, FTP) over different Pis. Another is to split the files across Pis (images on one, HTML on another, etc.). But in my opinion, the best (and most complicated) way to achieve load balancing is to set up one Pi as an nginx proxy that distributes the load between two Pis hosting the same files. You can even set how much load is distributed to each Pi!
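
To get a feel for what "how much load is distributed to each Pi" means, here is a minimal Python sketch of weighted round-robin scheduling. It is not nginx itself, only an illustration of the idea behind giving each backend a weight (nginx exposes this as a weight parameter on each server in an upstream group). The hostnames pi-a and pi-b and the weights 3 and 1 are made up for the example.

```python
from collections import Counter


def weighted_round_robin(weights, n_requests):
    """Yield backend names so each gets a share proportional to its weight.

    This is a "smooth" weighted round-robin: every round each backend's
    running score grows by its weight, the highest score wins the request,
    and the winner's score is reduced by the total weight.
    """
    current = {name: 0 for name in weights}
    total = sum(weights.values())
    for _ in range(n_requests):
        for name, weight in weights.items():
            current[name] += weight
        chosen = max(current, key=current.get)
        current[chosen] -= total
        yield chosen


# Hypothetical backends: pi-a should take three times the load of pi-b.
backends = {"pi-a": 3, "pi-b": 1}
schedule = list(weighted_round_robin(backends, 8))
counts = Counter(schedule)
print(schedule)
print(counts)
```

With these weights, three out of every four requests go to pi-a and one goes to pi-b, and the requests for pi-b are spread out rather than bunched together.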


“The Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models. It is designed to scale up from single servers to thousands of machines, each offering local computation and storage. Rather than rely on hardware to deliver high-availability, the library itself is designed to detect and handle failures at the application layer, so delivering a highly-available service on top of a cluster of computers, each of which may be prone to failures.”

– http://hadoop.apache.org/

Basic Modules

  • Hadoop Common – libraries and utilities needed by other modules
  • Hadoop Distributed File System (HDFS) – stores data providing very high aggregate bandwidth across the entire cluster
  • Hadoop YARN – manages cluster resources and schedules users' applications
  • Hadoop MapReduce – programming model for processing large data sets in parallel across the cluster
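
The MapReduce programming model is easier to picture with the classic word count. The sketch below runs in a single Python process and only imitates the three stages (map, shuffle, reduce); it does not use Hadoop at all, and the function names and sample lines are my own for illustration.

```python
from collections import defaultdict


def map_phase(line):
    """Map: turn one line of text into (word, 1) pairs."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle: group all values that share the same key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: collapse the values for one key into a single count."""
    return key, sum(values)


# Sample input; on a real cluster each line could live on a different node.
lines = ["pi cluster", "hadoop on a pi cluster"]
mapped = [pair for line in lines for pair in map_phase(line)]
word_counts = dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())
print(word_counts)
```

On a cluster the map calls would run on many machines at once, and the framework would handle the shuffle between them. Here everything is sequential, which is enough to show the shape of the model.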

Hadoop Distributed File System

A distributed, scalable and portable file system written in Java for the Hadoop framework. It includes a secondary NameNode, which connects to the primary NameNode and builds snapshots of its directory structure.
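
HDFS stores a file as a series of blocks and keeps copies of each block on more than one node. The toy Python sketch below shows that idea only: the node names are hypothetical, the 128 MB block size and replication factor of 3 are commonly used defaults that I have picked for illustration, and the simple round-robin placement is my own simplification rather than the placement policy HDFS really uses.

```python
def place_blocks(file_size, block_size, nodes, replication):
    """Split a file into fixed-size blocks and assign each replica to a node.

    Returns a dict mapping block index -> list of node names. Replicas of
    the same block always land on different nodes.
    """
    if replication > len(nodes):
        raise ValueError("need at least as many nodes as replicas")
    n_blocks = -(-file_size // block_size)  # integer ceiling division
    placement = {}
    for block in range(n_blocks):
        placement[block] = [
            nodes[(block + r) % len(nodes)] for r in range(replication)
        ]
    return placement


MB = 1024 * 1024
nodes = ["pi-1", "pi-2", "pi-3", "pi-4"]  # hypothetical hostnames
layout = place_blocks(300 * MB, 128 * MB, nodes, replication=3)
for block, holders in layout.items():
    print(block, holders)
```

A 300 MB file becomes three blocks here, and losing any single Pi still leaves two copies of every block on the remaining nodes.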

MapReduce Engine

There are different ways to submit and track jobs. MapReduce was overhauled in Hadoop 0.23, and the new version, MRv2, is referred to as YARN. It splits up the two major functions of the JobTracker, resource management and job scheduling/monitoring, into separate daemons. YARN is compatible with MapReduce.

It all started with Google's Bigtable, MapReduce and the Google File System (GFS). People then wanted to be able to access the data with an SQL-style language. Facebook, LinkedIn and Yahoo all had similar products in their stacks.



What to do with my Raspberry Pis…

A photograph of a fresh installation of Raspberry Pi with some Linux books.
Rounded up all the Raspberry Pis and Linux books from around the house.

Most of the time with Raspberry Pis, I look at them and think it would be nice to do something really cool… but then real life takes over and the moment is gone. I’m not alone in this, but now, with refreshed determination… and the fact that I have an Operating Systems and Networks project to complete on anything to do with Linux… my plan is to build a cluster of Raspberry Pis, experiment with Hadoop and see where that takes me, initially operating the environment within a dedicated local wireless network.

First thing to do is install emacs… get some system info and practise shell scripts by way of revision for the course, just waiting for…

sudo apt-get update
sudo apt-get upgrade
sudo apt-get install emacs    # or your preferred editor
# expand the filesystem to use the whole SD card
sudo raspi-config             # then choose the Expand Filesystem menu option