Hadoop big data framework – Hadoop virtual machines

Hadoop is an open-source framework for processing large amounts of data across clusters of computers, using high-level data-processing languages.

Its modules provide easy-to-use languages, graphical interfaces and administration tools for handling petabytes of data on thousands of computers.

Hadoop is fault-tolerant, and its settings allow you to customize data redundancy levels.
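
For example, the redundancy level in HDFS is controlled by the dfs.replication property in hdfs-site.xml. A minimal sketch (3 is HDFS's default; the value here is illustrative and should match your cluster size):

    <!-- hdfs-site.xml: redundancy (replication) level for HDFS blocks -->
    <configuration>
      <property>
        <name>dfs.replication</name>
        <!-- number of copies HDFS keeps of each data block; 3 is the default -->
        <value>3</value>
      </property>
    </configuration>

A higher value tolerates more simultaneous node failures at the cost of extra storage; replication can also be changed per file with the hdfs dfs -setrep command.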

The Hadoop project includes four bundled modules:

  • Hadoop Common: common utilities that support the other modules
  • Hadoop Distributed File System (HDFS): a distributed, high-performance, redundant file system
  • Hadoop YARN: a job scheduler and cluster resource manager
  • Hadoop MapReduce: a system for parallel processing of large data sets (see the sketch after this list)
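
To give a feel for the MapReduce module, here is the classic word-count job in Java, roughly following the example shipped with the Apache Hadoop documentation; the input and output paths are supplied as command-line arguments:

    import java.io.IOException;
    import java.util.StringTokenizer;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class WordCount {

      // Mapper: emits (word, 1) for every token in its input split.
      public static class TokenizerMapper
          extends Mapper<Object, Text, Text, IntWritable> {

        private final static IntWritable one = new IntWritable(1);
        private final Text word = new Text();

        public void map(Object key, Text value, Context context)
            throws IOException, InterruptedException {
          StringTokenizer itr = new StringTokenizer(value.toString());
          while (itr.hasMoreTokens()) {
            word.set(itr.nextToken());
            context.write(word, one);
          }
        }
      }

      // Reducer: sums the per-word counts produced by the mappers.
      public static class IntSumReducer
          extends Reducer<Text, IntWritable, Text, IntWritable> {

        private final IntWritable result = new IntWritable();

        public void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
          int sum = 0;
          for (IntWritable val : values) {
            sum += val.get();
          }
          result.set(sum);
          context.write(key, result);
        }
      }

      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class);  // local pre-aggregation
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
      }
    }

Packaged into a jar (the jar name below is hypothetical), the job would run with hadoop jar wordcount.jar WordCount /input /output, where both paths refer to HDFS directories.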

Tutorials:

  • Setup and installation:

Download virtual machines with Hadoop pre-installed, including a GUI and high-level query languages, for experimentation.

Hortonworks Hadoop

HDP 2, a packaged Hadoop environment – set up and running in 15 minutes. Download

Cloudera Hadoop

CDH, a packaged Hadoop environment – set up and running in 15 minutes. Download
