
Big Data & Hadoop

Since everything moves fast in the IT world, new terminology is often in its 3rd or 4th generation by the time you get a chance to get your hands dirty with it. Big Data is one such term: an alluring technology that brings massive distributed computing power to large datasets using the famous map-reduce algorithm. Apache Hadoop, which builds on the MapReduce model published by Google, scales to massive proportions and has been in use at tech giants like Yahoo and Facebook.

I decided to start running a Hadoop cluster myself, using the following guide as a starter.

https://github.com/GoogleCloudPlatform/solutions-google-compute-engine-cluster-for-hadoop/blob/master/README.md

This version installs Hadoop locally but uses Google Compute Engine and Google Cloud Storage, and allows basic scaling/clustering. I started working through the prerequisites on a CentOS 6.4 VM and things were going OK. Then I realized that I needed to go deeper into Hadoop and maybe run a sample locally before tackling the cloud version. So I went to the following:

http://tecadmin.net/steps-to-install-hadoop-on-centosrhel-6/#

It had simple enough steps to get Hadoop installed. Now I am reading Hadoop in Action by Alex Holmes.
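To get a feel for the map-reduce idea mentioned above before running anything on a real cluster, here is a minimal word-count sketch in plain Python. It does not use Hadoop or any of its APIs; the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) and the sample lines are made up for illustration, and everything runs in a single process rather than across nodes.

```python
from collections import defaultdict


def map_phase(line):
    """Map step: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group values by key, roughly what a framework does between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce step: collapse all the counts for one word into a total."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle, and reduce in sequence over an iterable of lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    # Hypothetical sample input, standing in for files a cluster would read.
    sample = ["big data big cluster", "data moves fast"]
    print(word_count(sample))
```

In a real cluster the map and reduce steps would run on different machines and the grouping in between would be handled by the framework; the sketch only shows the shape of the computation.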

