Since everything moves fast in the IT world, new terminologies are often in their 3rd or 4th generation by the time you get a chance to get your hands dirty with them. Big Data has been one of them: an alluring technology offering massive distributed processing power over large datasets using the famous MapReduce algorithm. Apache Hadoop, inspired by Google's MapReduce and GFS papers, scales to massive proportions and has been in use at tech giants like Yahoo and Facebook.
I decided to start running a Hadoop cluster myself using the following guide as a starter.
https://github.com/GoogleCloudPlatform/solutions-google-compute-engine-cluster-for-hadoop/blob/master/README.md
This version installs Hadoop on Google Compute Engine instances, uses Google Cloud Storage, and allows basic scaling/clustering. I started running the prerequisites on a CentOS 6.4 VM and things were going OK. Then I realized that I needed to go deeper into Hadoop, and maybe run a sample locally, before tackling the cloud version. So I turned to the following:
http://tecadmin.net/steps-to-install-hadoop-on-centosrhel-6/#
The steps were simple enough to get Hadoop installed. Now I am reading Hadoop in Practice by Alex Holmes.
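The canonical first sample to run on a fresh Hadoop install is word count. Before touching the cluster, the map/shuffle/reduce phases it exercises can be sketched in plain Python (no Hadoop required; the function names here are illustrative, not Hadoop APIs):

```python
from collections import defaultdict

def map_phase(document):
    # Like a Hadoop Mapper: emit a (word, 1) pair for each word.
    for word in document.split():
        yield (word.lower(), 1)

def shuffle(pairs):
    # Like Hadoop's shuffle/sort step: group all values by key.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    # Like a Hadoop Reducer: sum the counts for one word.
    return (key, sum(values))

def word_count(documents):
    # Run the full pipeline over a list of input documents.
    pairs = (pair for doc in documents for pair in map_phase(doc))
    grouped = shuffle(pairs)
    return dict(reduce_phase(k, v) for k, v in grouped.items())
```

For example, `word_count(["to be or not to be"])` yields counts of 2 for "to" and "be" and 1 for "or" and "not". In real Hadoop, the mappers and reducers run on different nodes and the framework handles the shuffle over the network, but the data flow is the same.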