
Big Data & Hadoop

Since everything moves fast in the IT world, new technologies are often in their third or fourth generation by the time you get a chance to get your hands dirty with them. Big Data is one of them: an alluring technology that applies massive distributed computing power to large datasets using the famous MapReduce programming model. Apache Hadoop, an open-source framework inspired by Google's MapReduce work, scales to massive proportions and has been used by tech giants like Facebook and Yahoo.
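To make the map-reduce idea concrete, here is a minimal, single-process sketch in Python of the classic word-count example. It only illustrates the model and is not Hadoop code: the function names (`map_phase`, `shuffle_phase`, `reduce_phase`, `word_count`) are hypothetical, and in a real Hadoop job the framework performs the shuffle and spreads the map and reduce tasks across the cluster's nodes.

```python
from collections import defaultdict
from typing import Iterable, Iterator


def map_phase(lines: Iterable[str]) -> Iterator[tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word of every input line."""
    for line in lines:
        for word in line.lower().split():
            yield word, 1


def shuffle_phase(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Shuffle: group every value that was emitted under the same key."""
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups: dict[str, list[int]]) -> dict[str, int]:
    """Reduce: collapse each key's list of counts into a single total."""
    return {word: sum(counts) for word, counts in groups.items()}


def word_count(lines: Iterable[str]) -> dict[str, int]:
    # Chain the three phases the way a MapReduce job would.
    return reduce_phase(shuffle_phase(map_phase(lines)))


if __name__ == "__main__":
    print(word_count(["the quick brown fox", "The lazy dog", "the fox"]))
```

Each mapper call only looks at its own line and each reducer call only at one key's values, which is what lets Hadoop run many of them in parallel on different machines.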

I decided to start running a Hadoop cluster myself, using the following guide as a starting point:

https://github.com/GoogleCloudPlatform/solutions-google-compute-engine-cluster-for-hadoop/blob/master/README.md

This version installs Hadoop locally but uses Google Compute Engine and Google Cloud Storage, and it allows basic scaling/clustering. I started working through the prerequisites on a CentOS 6.4 VM, and things were going OK. Then I realized that I needed to go deeper into Hadoop, and perhaps run a sample locally, before tackling the cloud version. So I turned to the following guide:

http://tecadmin.net/steps-to-install-hadoop-on-centosrhel-6/#

It had simple enough steps to get Hadoop installed. Now I am reading Hadoop in Practice by Alex Holmes.
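Running a sample locally is easy to rehearse even before a cluster exists. The sketch below assumes Hadoop Streaming, the Hadoop utility that runs any executable as a mapper or reducer by piping text lines through stdin/stdout, with each key and value separated by a tab. It is a word count in Python and is not taken from either guide above; the script name `wc_streaming.py` and its `map`/`reduce` argument are hypothetical.

```python
import sys
from itertools import groupby
from typing import Iterable, Iterator


def mapper(lines: Iterable[str]) -> Iterator[str]:
    """Emit one 'word<TAB>1' line per word: Streaming's default key/value text format."""
    for line in lines:
        for word in line.lower().split():
            yield f"{word}\t1"


def reducer(lines: Iterable[str]) -> Iterator[str]:
    """Sum the counts for each word. Relies on the input being sorted by key,
    which Hadoop guarantees between the map and reduce phases."""
    pairs = (line.rstrip("\n").split("\t", 1) for line in lines if line.strip())
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        yield f"{word}\t{sum(int(count) for _, count in group)}"


def main(phase: str) -> None:
    if phase not in ("map", "reduce"):
        sys.exit("usage: wc_streaming.py map|reduce")
    step = mapper if phase == "map" else reducer
    for out in step(sys.stdin):
        print(out)


if __name__ == "__main__" and len(sys.argv) > 1:
    # The phase is chosen on the command line; data flows through stdin/stdout.
    main(sys.argv[1])
```

To try it without a cluster, pipe a text file through it: `cat input.txt | python3 wc_streaming.py map | LC_ALL=C sort | python3 wc_streaming.py reduce`. The `sort` stands in for Hadoop's shuffle, which is why the reducer can simply group adjacent lines. On a cluster, the same script would be handed to the hadoop-streaming jar through its `-mapper` and `-reducer` options (along with `-input` and `-output`); the jar's location varies between Hadoop versions.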



Popular posts from this blog

The feature you are trying to use is on a network resource that is unavailable

You see this error:
The feature you are trying to use is on a network resource that is unavailable.

This happens because the cached installer is missing. In my case, I had run a scan of folder sizes, found the Windows folder that holds all the uninstallers (the cache), and removed them.


Use this:
http://support.microsoft.com/mats/Program_Install_and_Uninstall
Use the tool to uninstall the program that is causing trouble, and voila.

VS 2012 on Windows 7

Microsoft is great at many things, but its application/version handling is mediocre. I got really frustrated with Visual Studio 2012 crashing every time I opened a project:


Problem signature:
  Problem Event Name:CLR20r3
  Problem Signature 01:devenv.exe
  Problem Signature 02:11.0.50727.1
  Problem Signature 03:5011ecaa
  Problem Signature 04:Microsoft.VisualStudio.Progression.LanguageService.CSharp
  Problem Signature 05:11.0.50727.1
  Problem Signature 06:5011cc19
  Problem Signature 07:131
  Problem Signature 08:43
  Problem Signature 09:System.MissingFieldException
  OS Version:6.1.7601.2.1.0.256.1
  Locale ID:1033
  Additional Information 1:0a9e
  Additional Information 2:0a9e372d3b4ad19135b953a78882e789
  Additional Information 3:0a9e
  Additional Information 4:0a9e372d3b4ad19135b953a78882e789

Read our privacy statement online:
  http://go.microsoft.com/fwlink/?linkid=104288&clcid=0x0409

If the online privacy statement is not available, please read our privacy statement offline:
  C:\Windows\s…

npm, NuGet, or other repository problems

I was experiencing an issue where my NuGet packages would not restore their DLLs. The build failed with error CS0234, "The type or namespace name 'Entity' does not exist in the namespace 'System.Data' (are you missing an assembly reference?)", yet NuGet reported that all packages were already installed.

I followed the sage advice of deleting everything in c:\users\\.nuget\packages as well as the packages folder in the solution, and then forcing a restore: http://stackoverflow.com/questions/34543267/nuget-does-not-unpack-assemblies-from-package

I believe the same fix applies to other package repositories when this type of problem comes up. By the way, check out Artifactory if your environment is hostile to public yum, apt-get, npm, NuGet, or other repositories.
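Deleting the two folders by hand works fine; a small script just makes the step repeatable. This Python sketch assumes the global packages folder is NuGet's default `.nuget\packages` under the user profile (the `NUGET_PACKAGES` environment variable can move it) and that the solution keeps a `packages` folder next to its `.sln`. The function names `empty_directory` and `clear_package_caches` are hypothetical. It empties both folders but leaves the folders themselves in place.

```python
import shutil
from pathlib import Path


def empty_directory(folder: Path) -> int:
    """Delete everything inside `folder` but keep the folder itself.
    Returns the number of top-level entries removed (0 if the folder is missing)."""
    if not folder.is_dir():
        return 0
    removed = 0
    for entry in folder.iterdir():
        if entry.is_dir() and not entry.is_symlink():
            shutil.rmtree(entry)  # package folders such as <id>/<version>/...
        else:
            entry.unlink()  # loose files such as repositories.config
        removed += 1
    return removed


def clear_package_caches(global_packages: Path, solution_packages: Path) -> int:
    """Empty both package folders so the next restore must download and unpack every package again."""
    return empty_directory(global_packages) + empty_directory(solution_packages)
```

For example, `clear_package_caches(Path.home() / ".nuget" / "packages", Path(r"C:\src\MySolution\packages"))` (the solution path is hypothetical), followed by `nuget restore MySolution.sln` or a restore from Visual Studio. The NuGet CLI can also clear its machine-wide caches with `nuget locals all -clear`.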