Understing the Hadoop High Availability (HA) Options

Once you start to use Hadoop in your day-to-day business operations, you’ll quickly find that uptime is an important consideration. No one wants to explain to the CEO why a report is not delivered. While most of Hadoop’s architecture is designed to work in the face of node failure (such as the DataNodes), other components such as the NameNode must be configured with an HA option.

The following is a quick and dirty list of Hadoop HA options:

  • Cloudera CDH4 (free)
    • Uses shared storage
  • Hortonworks (free)
    • Option 1: Use Linux HA (Uses shared storage)
    • Option 2: Use VMWare
  • IBM BigInsights ($$$)
    • GPFS-SNC: Provides a shared nothing HA option
  • MapR M5 ($$$)
    • Shared nothing HA for both NameNode and JobTracker

 

If you’re brave, you can also apply Facebook’s patches to Apache Hadoop to get an “Avatar” based HA option. This is what FB uses in production.

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s