CloudEase - Enterprise Hadoop + Hive + Pig + HBase + Mahout + Cascading + ZooKeeper et al.
Nugget 1: Standalone Hadoop
Being able to start Hadoop in local (standalone) mode allows quick developer sanity checks before deploying to a Hadoop cluster or, worse, to AWS, which charges heavily for running Hadoop.
I assume you have downloaded the latest Hadoop distribution and unzipped it.
If you are on Windows, you need to install Cygwin to try this. Linux distributions need nothing additional.
Open the Cygwin Bash shell and go to the Hadoop home directory.
Create a directory to hold the input files for the local Hadoop job> mkdir input
Copy some files into the input directory> cp conf/* input
Run the example MapReduce job that ships with the Hadoop distribution> bin/hadoop jar hadoop-*-examples.jar grep input output 'dfs[a-z.]+'

The Hadoop job runs and prints its progress to the console.
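One thing to keep in mind when rerunning the example: Hadoop refuses to start a job whose output directory already exists, so a second run fails until 'output' is removed. The following is a minimal sketch of a reset step, assuming the 'input', 'output' and 'conf' directory names used in this post. The function name reset_local_run is hypothetical and not part of Hadoop.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Hadoop): puts the working directories
# used in this post back into a clean state so the example can be rerun.
reset_local_run() {
  # First argument: Hadoop home directory (defaults to the current directory).
  local hadoop_home="${1:-.}"

  # Remove the previous results; Hadoop will not overwrite an existing
  # output directory.
  rm -rf "$hadoop_home/output"

  # Recreate the input directory and refill it from conf/, mirroring the
  # manual mkdir and cp steps.
  mkdir -p "$hadoop_home/input"
  cp "$hadoop_home"/conf/* "$hadoop_home/input"/
}
```

Calling reset_local_run from the Hadoop home directory before bin/hadoop jar ... gives every run the same starting point. Like the manual cp step, it copies only the plain files directly inside conf.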
The job output is stored in the 'output' directory.
Cat the output files to the screen to see their contents> cat output/*
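For a quick cross-check of the result, the same regular expression can be applied with standard Unix tools. This is only a rough sketch: it assumes the example job reports each matched string with its number of occurrences, most frequent first, and the function name preview_grep_counts is hypothetical. It does not run Hadoop at all.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Hadoop): approximates the counts the
# 'grep' example job produces, using only grep, sort, uniq and awk.
preview_grep_counts() {
  local dir="$1" pattern="$2"

  # -o prints only the matched text, -h drops file names,
  # -E enables extended regular expressions (needed for '+').
  grep -ohE -- "$pattern" "$dir"/* 2>/dev/null \
    | sort \
    | uniq -c \
    | awk '{ print $1 "\t" $2 }' \
    | sort -t "$(printf '\t')" -k1,1nr -k2,2
}

# Usage, from the Hadoop home directory:
#   preview_grep_counts input 'dfs[a-z.]+'
```

If the counts printed here differ from what cat output/* shows, the input directory probably changed between the two runs.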
Nugget 2: Standalone Cascading
As with running Hadoop locally, being able to build and run a Cascading job locally allows quick sanity checks. We will run a Cascading job to parse Apache web server logs.
The source code download contains a 'jobs' directory, which should be placed at the same level as the hadoop and cascading directories. Directory names in the build script and commands may need to be tweaked. Run the Ant build by typing 'ant' in the job folder (assuming you have ant on the path). It builds the Cascading job.
The build produces the Cascading job jar 'LocalCascadingJob.jar'. From the job directory (in the Cygwin Bash shell), launch it over Hadoop locally using> ../hadoop-0.20.2/bin/hadoop jar LocalCascadingJob.jar apachelogs output
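Since directory names may differ from one setup to another, it can help to check the pieces before launching. The sketch below assumes the paths used in the command above. The function name check_job_prereqs is hypothetical, and whether 'apachelogs' is a file or a directory is not assumed; the check only tests that it exists.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check (not part of Hadoop or Cascading).
# Reports every missing piece and returns non-zero if any check fails.
check_job_prereqs() {
  local hadoop_bin="$1" job_jar="$2" input_path="$3"
  local status=0

  # The hadoop launcher script must exist and be executable.
  [ -x "$hadoop_bin" ] || { echo "missing or not executable: $hadoop_bin" >&2; status=1; }

  # The jar is produced by the ant build; if it is absent the build did not run.
  [ -f "$job_jar" ] || { echo "job jar not found: $job_jar" >&2; status=1; }

  # The input may be a file or a directory, so only test for existence.
  [ -e "$input_path" ] || { echo "input not found: $input_path" >&2; status=1; }

  return "$status"
}

# Usage, from the job directory:
#   check_job_prereqs ../hadoop-0.20.2/bin/hadoop LocalCascadingJob.jar apachelogs \
#     && ../hadoop-0.20.2/bin/hadoop jar LocalCascadingJob.jar apachelogs output
```

Chaining the launch with && means the job starts only when all three checks pass.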
Print the contents of all the files in the output directory using> cat output/*