[Beowulf] hadoop
Matthew Wallis
mattw at madmonks.org
Sat Feb 7 04:40:03 PST 2015
It depends on the nature of the tasks; I'm sure you could use it for back-end processing, and load balancing would come as part of the job distribution.
You probably want to check the website for the types of workloads it supports.
Matt
--
Matthew Wallis
mattw at madmonks.org
> On 7 Feb 2015, at 7:48 pm, Jonathan Aquilina <jaquilina at eagleeyet.net> wrote:
>
> Can it be used, for example, in a web hosting application to process site requests in the form of load balancing etc.?
>
> Sent from my iPhone
>
>> On 07 Feb 2015, at 09:45, Matt Wallis <mattw at madmonks.org> wrote:
>>
>> Hi Jonathan,
>>
>>> On 7 Feb 2015, at 6:20 pm, Jonathan Aquilina <jaquilina at eagleeyet.net> wrote:
>>>
>>> Can someone explain to me what exactly the purpose of hadoop is and what we mean when we say big data? Is this for data storage and retrieval? Number crunching?
>>
>> Hadoop can be thought of as HTC, High Throughput Computing, over a collection of simple servers. Where in HPC you might have hundreds of nodes with a shared file system working on the same copy of the data, Hadoop distributes the data to local storage in each node of the cluster using the Hadoop Distributed File System (HDFS), and then collects the output at the end. I believe it has built-in redundancy, allowing you to distribute the same job to 2 or 3 nodes for fault tolerance. It means your "cluster" can be very simple: no complex parallel filesystems, no specialised networks, no redundancy at the hardware level.
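The block-distribution idea above can be sketched in a few lines. This is a hypothetical illustration, not Hadoop's actual placement policy: each block of a file is copied to a few distinct nodes, so a task can later be scheduled on any node that already holds a local copy (node names, block counts, and the replication factor are made up for the example).

```python
import random

def place_blocks(num_blocks, nodes, replication=3):
    """Return {block_id: [node, ...]} with `replication` distinct copies each."""
    placement = {}
    for block in range(num_blocks):
        # Pick `replication` different nodes to hold this block.
        placement[block] = random.sample(nodes, replication)
    return placement

nodes = [f"node{i}" for i in range(8)]
placement = place_blocks(4, nodes, replication=3)
for block, holders in sorted(placement.items()):
    # A task on this block would run on one of `holders`,
    # avoiding any shared file system or network copy of the data.
    print(block, holders)
```

Scheduling work where the data already sits is the design choice that lets the cluster stay simple: the network only carries code and results, not the bulk data.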
>>
>> Originally built with MapReduce as its core application, Hadoop now supports a number of other applications, which can be found on the Apache website.
>>
>> As for big data, this is basically about taking things like 10 billion tweets, breaking them up into chunks of 500,000 or so, and doing analytics on them. Things like that break up very easily for distribution, as there is usually very little linkage between tweets.
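The chunked analytics described above can be sketched as a minimal MapReduce-style example. This is an illustrative toy, not Hadoop's API: each chunk of tweets is mapped to hashtag counts independently (no linkage between chunks), then the per-chunk results are reduced into one total. The tweet data and tiny chunk size are invented for the example.

```python
from collections import Counter

def map_chunk(tweets):
    """Map step: count hashtags within one chunk of tweets."""
    counts = Counter()
    for tweet in tweets:
        counts.update(word for word in tweet.split() if word.startswith("#"))
    return counts

def reduce_counts(partials):
    """Reduce step: merge the independent per-chunk counts."""
    total = Counter()
    for counts in partials:
        total += counts
    return total

tweets = ["#hadoop is neat", "big data #hadoop #hpc", "no tags here"]
chunks = [tweets[i:i + 2] for i in range(0, len(tweets), 2)]  # toy chunks
totals = reduce_counts(map_chunk(chunk) for chunk in chunks)
print(totals)  # Counter({'#hadoop': 2, '#hpc': 1})
```

Because each map call touches only its own chunk, the chunks can be processed on different nodes in any order; only the small per-chunk counts travel back for the reduce.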
>>
>> Hadoop came out of the need for places like Google, Yahoo, Paypal and eBay to process terabytes of transaction logs an hour. They already had the servers, but they were in data centres all over the world. Rather than hook them all up to some common file server, just build a system to package up the data and the application and send it wherever can process it the quickest. Send it 3 times to make sure it gets done, then pull back the results at the end.
>>
>> Matt.
>> _______________________________________________
>> Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
>> To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
>