Once you start to use Hadoop in your day-to-day business operations, you’ll quickly find that uptime is an important consideration. No one wants to explain to the CEO why a report was not delivered. Most of Hadoop’s architecture is designed to keep working when individual nodes such as DataNodes fail, but other components such as the NameNode are single points of failure and must be configured with a high-availability (HA) option.
The following is a quick and dirty list of Hadoop HA options:
- Cloudera CDH4 (free)
  - Uses shared storage
- Hortonworks (free)
  - Option 1: Use Linux HA (uses shared storage)
  - Option 2: Use VMware
- IBM BigInsights ($$$)
  - GPFS-SNC: Provides a shared-nothing HA option
- MapR M5 ($$$)
  - Shared-nothing HA for both NameNode and JobTracker
If you’re brave, you can also apply Facebook’s patches to Apache Hadoop to get an “Avatar”-based HA option (AvatarNode). This is what Facebook uses in production.
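To give a feel for what the shared-storage approach in the list above involves, here is a rough sketch that generates the HDFS settings for an active/standby NameNode pair. The property names are the ones I know from the Apache Hadoop 2.x HDFS HA documentation; the exact names in a given CDH4 release may differ, so check your distribution's docs. The nameservice ID, host names, port 8020, and the shared edits path are all made-up placeholders, and the helper functions are hypothetical, not part of any Hadoop tooling.

```python
from xml.sax.saxutils import escape


def shared_storage_ha_properties(nameservice, namenodes, shared_edits_dir):
    """Build hdfs-site.xml properties for a NameNode pair sharing an edits dir.

    namenodes maps a NameNode ID (for example "nn1") to its host name.
    Port 8020 is assumed for NameNode RPC; adjust for your cluster.
    """
    props = {
        "dfs.nameservices": nameservice,
        "dfs.ha.namenodes." + nameservice: ",".join(namenodes),
        # Both NameNodes must be able to read and write this location.
        "dfs.namenode.shared.edits.dir": shared_edits_dir,
        "dfs.client.failover.proxy.provider." + nameservice:
            "org.apache.hadoop.hdfs.server.namenode.ha."
            "ConfiguredFailoverProxyProvider",
        # Fencing keeps a failed active NameNode from writing after failover.
        "dfs.ha.fencing.methods": "sshfence",
    }
    for nn_id, host in namenodes.items():
        key = "dfs.namenode.rpc-address.%s.%s" % (nameservice, nn_id)
        props[key] = host + ":8020"
    return props


def to_hadoop_xml(props):
    """Render a dict of properties in Hadoop's <configuration> XML layout."""
    lines = ["<configuration>"]
    for name, value in props.items():
        lines.append("  <property>")
        lines.append("    <name>%s</name>" % escape(name))
        lines.append("    <value>%s</value>" % escape(value))
        lines.append("  </property>")
    lines.append("</configuration>")
    return "\n".join(lines)


if __name__ == "__main__":
    # Placeholder values for illustration only.
    props = shared_storage_ha_properties(
        "mycluster",
        {"nn1": "host1.example.com", "nn2": "host2.example.com"},
        "file:///mnt/filer1/dfs/ha-name-dir-shared",
    )
    print(to_hadoop_xml(props))
```

The shared edits directory is the weak spot of this design: if the storage behind it goes down, so does the NameNode pair, which is why the shared-nothing options in the list avoid it.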