Month: October 2012

Hadoop Distributions

Hadoop

This is a repost of my answer to a question on LinkedIn; I thought it might prove useful to people evaluating Hadoop distributions. What follows is a substantially oversimplified set of choices (in alphabetical order): Amazon: Apache Hadoop provided as a web service. A good solution if your data is collected on Amazon; it saves you the trouble of uploading gigs and gigs of data. Apache: Apache Hadoop is the core code base upon which […]

Understanding the Hadoop High Availability (HA) Options

Hadoop

Once you start to use Hadoop in your day-to-day business operations, you’ll quickly find that uptime is an important consideration. No one wants to explain to the CEO why a report was not delivered. While most of Hadoop’s architecture is designed to keep working in the face of node failure (the loss of a DataNode, for example), other components such as the NameNode must be configured with an HA option. The following is a quick and dirty list of […]