CloudEase™ - Enterprise - Hadoop + Hive + Pig + HBase + Mahout + Cascading + ZooKeeper + Nutch...
CloudEase is a ready-to-use (for development) and ready-to-deploy (into production) enterprise stack of Hadoop and related products.
It is available as the following releases:
1) Pseudo Distributed Release
2) Fully Distributed Release [composed of Master & Node parts]
Release (1) is typically used for development and release (2) for production environments.
Release formats are as follows:
Virtual Appliances
  VMware
  VirtualBox
  Open Virtualization Format
Rackspace Image
AWS AMI
The distributions contain the following, pre-installed and ready to use on Ubuntu 64-bit Server OS:
Hadoop Core + HDFS
Hive
Pig
HBase
Mahout
Cascading
ZooKeeper
Nutch
*** Also included are getting started examples for each of the above
The Fully Distributed Releases also contain
Cluster Monitoring Tools, and
Cluster Setup & Management Tools
CloudEase - Pseudo Distributed Distribution of Hadoop, Hive, Pig, HBase, Mahout, Cascading & ZooKeeper
Hadoop is a software platform for processing Big Data (terabytes to petabytes of data). The Hadoop Core contains the basic MapReduce system and HDFS, a distributed file system. HDFS stores the petabytes of data that need to be processed by the Hadoop cluster, while the MapReduce paradigm allows a problem to be broken into thousands or millions of small tasks (Map) that are processed over the cluster, with the results aggregated back (Reduce) into a consistent, usable result. Hive and Pig are used for analytics and data warehousing over the Hadoop cluster. HBase is an internet-scale non-relational database system that runs over Hadoop. Mahout is for internet-scale machine learning systems. Cascading is for data processing workflows. ZooKeeper is a centralized service for maintaining configuration information, naming, providing distributed synchronization, and providing group services.
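To make the Map and Reduce steps concrete, here is a minimal word-count sketch that uses ordinary shell pipes on a single machine rather than a Hadoop cluster. The function names map_words, reduce_counts and word_count are illustrative only and are not part of Hadoop or CloudEase. The sort step stands in for the shuffle phase that groups identical keys before the reduce.

```shell
# Map: emit one "word<TAB>1" pair for every word read from standard input.
map_words() {
    tr -s '[:space:]' '\n' | awk 'NF { print $0 "\t1" }'
}

# Reduce: sum the counts for each key. Input must arrive grouped by key.
reduce_counts() {
    awk -F '\t' '
        $1 != key { if (NR > 1) print key "\t" total; key = $1; total = 0 }
        { total += $2 }
        END { if (NR > 0) print key "\t" total }
    '
}

# Full pipeline: map, group identical keys with sort, then reduce.
word_count() {
    map_words | LC_ALL=C sort | reduce_counts
}

# Example: printf 'b a\na b a\n' | word_count
```

On a real cluster the map and reduce steps run as many parallel tasks over data stored in HDFS. The pipeline above only illustrates the shape of the computation.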
Setting up Hadoop and managing it is a very cumbersome task that requires a dedicated admin team. The VirtualBox CloudEase virtual machine is a pseudo distributed installation of Hadoop and the other tools mentioned above. Everything is installed on a single virtual machine, though it behaves like a cluster. It is perfect for kickstarting Hadoop development and can be run using VirtualBox on Windows, Linux or Mac host operating systems. It is not intended for a production cluster deployment as is. If you need to run it on VMware or any other virtualization platform, you can export an Open Virtualization Format (OVF) VM image using VirtualBox and use that. CloudEase is the only distribution which comes with the above 7 tools pre-installed and pre-configured, ready to develop with.
Specs:
The CloudEase VM is built using Ubuntu 64-bit Server OS with JDK 1.6, Ant and the tools mentioned above. The VM has a 400GB expandable HDD.
Terms:
We do not provide technical support. The VM is provided as is without any liabilities or warranties. It is licensed for use by one 'Named' developer.
Price:
US$ 150/developer license

CloudEase - Pseudo Distributed Distribution of Hadoop & Nutch (Coming Soon!!!)
Hadoop is a software platform for processing Big Data (terabytes to petabytes of data). The Hadoop Core contains the basic MapReduce system and HDFS, a distributed file system. HDFS stores the petabytes of data that need to be processed by the Hadoop cluster, while the MapReduce paradigm allows a problem to be broken into thousands or millions of small tasks (Map) that are processed over the cluster, with the results aggregated back (Reduce) into a consistent, usable result. Nutch is a distributed search engine built to use Hadoop MapReduce for distributed crawling and indexing, and it uses the Hadoop file system (HDFS) to serve search results through a Tomcat-based web application.
Setting up Hadoop and Nutch and managing them is a very cumbersome task that requires a dedicated admin team. The VirtualBox CloudEase virtual machine is a pseudo distributed installation of Hadoop and Nutch. Everything is installed on a single virtual machine, though it behaves like a cluster. It is perfect for kickstarting Hadoop development and can be run using VirtualBox on Windows, Linux or Mac host operating systems. It is not intended for a production cluster deployment as is. If you need to run it on VMware or any other virtualization platform, you can export an Open Virtualization Format (OVF) VM image using VirtualBox and use that. CloudEase is the only distribution which comes with Hadoop & Nutch pre-installed and pre-configured, ready to develop with.
Specs:
The CloudEase VM is built using Ubuntu 64-bit Server OS with JDK 1.6, Ant and the tools mentioned above. The VM has a 400GB expandable HDD.
Terms:
We do not provide technical support. The VM is provided as is without any liabilities or warranties. It is licensed for use by one 'Named' developer.
Price:
US$ 50/developer license

CloudEase - Fully Distributed Distribution of Hadoop, Hive, Pig, HBase, Mahout, Cascading & ZooKeeper (Coming Soon!!!)
CloudEase - Fully Distributed Distribution of Hadoop & Nutch (Coming Soon!!!)
FAQ 1: What is the username and password for the VMs?
All CloudEase VMs use the username 'hadoop' and the password 'hadoop' (without the quotes).
FAQ 2: The network doesn't work!
When you copy or move an Ubuntu VM, the network doesn't start because the MAC address changes. To fix it, do the following once.
1) sudo pico /etc/network/interfaces
<Change all occurrences of eth0 to eth1>
2) sudo /etc/init.d/networking restart
3) ifconfig
<to see the IP address and make sure the network interfaces have started>
Then restart the VM using
4) sudo shutdown -r now
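If you copy several VMs, the manual edit in step 1 can be scripted. The following is a minimal sketch, assuming the only change needed is the eth0 to eth1 rename described above. The function name rename_eth0_to_eth1 and the /tmp working copy are illustrative and not part of CloudEase. A backup of the original file is kept next to it with a .bak suffix.

```shell
# Rename every eth0 reference to eth1 in the given file.
# The original content is preserved in "<file>.bak".
rename_eth0_to_eth1() {
    file="$1"
    cp "$file" "$file.bak" || return 1
    sed 's/eth0/eth1/g' "$file.bak" > "$file"
}

# Example, working on a copy first (hypothetical paths):
#   cp /etc/network/interfaces /tmp/interfaces
#   rename_eth0_to_eth1 /tmp/interfaces
#   sudo cp /tmp/interfaces /etc/network/interfaces
```

Steps 2 to 4 (restarting networking, checking with ifconfig, and rebooting) still apply after the file has been copied back into place.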
FAQ 3: How do I change the VirtualBox virtual disk (.vdi) UUID?
When you copy a virtual disk, you will be unable to add it to VirtualBox because its UUID is the same as that of the disk it was copied from. So when you copy virtual disks to make multiple virtual machines from the same disk image, you will need to change the UUIDs of ALL the copied virtual disks. To do so, run the following for each copied disk:
VBoxManage internalcommands setvdiuuid /path/to/virtualdisk.vdi
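Since every copied disk needs a new UUID, the command can be wrapped in a loop. This is a rough sketch only: the function name reset_vdi_uuids and the VBOXMANAGE override variable are assumptions made for illustration, not VirtualBox features. Setting VBOXMANAGE=echo prints the commands instead of running them. Newer VirtualBox releases name this subcommand sethduuid, so check which one your version provides.

```shell
# Assign a fresh UUID to every disk image passed as an argument.
# VBOXMANAGE defaults to the real VBoxManage binary and can be
# overridden (for example with "echo") for a dry run.
reset_vdi_uuids() {
    for disk in "$@"; do
        "${VBOXMANAGE:-VBoxManage}" internalcommands setvdiuuid "$disk" || return 1
    done
}

# Example dry run (hypothetical disk paths):
#   VBOXMANAGE=echo
#   reset_vdi_uuids /path/to/slave1.vdi /path/to/slave2.vdi
```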
FAQ 4: How do I change the hostname of the Ubuntu machine/OS?
When you set up a cluster, you will copy the slave hard disk multiple times. For the cluster to work properly, you will need to change the hostname of each of the slave virtual machines, e.g. slave1, slave2, etc. To do so, edit the following two files, change the hostname set in them, and restart the virtual machine.
sudo pico /etc/hostname
sudo pico /etc/hosts
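The two edits can also be scripted when there are many slaves. The sketch below is an assumption-laden illustration: the function name set_hostname and its four arguments are hypothetical, and both file paths are passed in so that it can be tried on copies first. It writes the new name to the hostname file and replaces the old name in the hosts file only where it appears as a whole field after the IP address.

```shell
# Usage: set_hostname OLD NEW HOSTNAME_FILE HOSTS_FILE
# Writes NEW to HOSTNAME_FILE and replaces whole-word occurrences of OLD
# in HOSTS_FILE, keeping the original hosts file as "<HOSTS_FILE>.bak".
set_hostname() {
    old="$1"; new="$2"; hostname_file="$3"; hosts_file="$4"
    printf '%s\n' "$new" > "$hostname_file" || return 1
    cp "$hosts_file" "$hosts_file.bak" || return 1
    awk -v old="$old" -v new="$new" '
        { for (i = 2; i <= NF; i++) if ($i == old) $i = new; print }
    ' "$hosts_file.bak" > "$hosts_file"
}

# Example on working copies (hypothetical names and paths):
#   set_hostname master slave1 /tmp/hostname /tmp/hosts
```

As with the manual edit, the virtual machine still needs a restart after the changed files are in place under /etc.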