The following is a repost of my answer to a question on LinkedIn, but I thought it may prove useful to people evaluating Hadoop distributions.
The following is a substantially over simplified set of choices (in alphabetical order):
Amazon: Apache Hadoop provided as a web service. Good solution if your data is collected on Amazon…saves you the trouble of uploading gigs and gigs of data.
Apache: Apache Hadoop is the core code based upon which the various distributions are based.
Cloudera: CHD3 is based on Hadoop 1 (the current stable version) and CDH4 is based on Hadoop 2. CDH is based on Apache Hadoop. The only piece that’s not open source (AFAIK) is Cloudera Manager, which allows you to install up to 50 nodes for free before you go to the paid version. Cloudera is an extremely popular solution that runs on a wide variety of operating systems.
Hortonworks: HDP1 is 100% open source and is based on Hadoop 1. HDP is designed to run on RedHat/CentOS/Oracle Linux.
IBM: IBM BigInsights adds the GPFS filesystem to Hadoop, and is a good choice if your company already is an IBM shop…and you need to integrate with other IBM solutions. Free version is available as InfoSphere BigInsights Basic Edition. Basic Edition does not include all of the value add features found in Enterprise Edition (such as GPFS-SNC).
MapR: MapR uses a proprietary file system plus additional changes to Hadoop that addresses issues with the platform. They have a shared nothing architeture for the NameNode and JobTracker. MapR M3 is available for free, while M5 is a paid version with more features (such as the shared nothing NameNode). People who have used MapR tend to like it.