[Beowulf] Mature open source hierarchical storage management
    Nifty Tom Mitchell 
    niftyompi at niftyegg.com
       
    Tue Oct 27 18:02:03 PDT 2009
    
    
  
On Fri, Oct 23, 2009 at 04:12:11PM +1100, Carl Thomas wrote:
>    We are currently in the midst of planning a major refresh of our existing
>    HPC cluster.
Carl,
Do add "PowerFile" to your research list.
    http://www.powerfile.com/
My back-of-the-envelope view is that what you are building should have
fast local cluster disks for binaries, swap, libs, /scratch and /tmp, plus a
largish NFS RAID-based filesystem with an archival back end.  Perhaps also a
large staging RAID of slow spinning disks, in the middle or off to the side.
There are multiple "delta equations" that
you need to evaluate.  I know I missed some:
   - delta file change (GB/day).
   - performance delta at each layer.
   - cost delta at each layer.
   - management cost delta.
   - operational cost delta.
   - cost of compliance -- what the law requires, by method.
   - cost of physical storage on and off site, include handling and shipping.
   - cost of user training delta.
   - cost of expansion delta.
   - cost of necessary bandwidth, by layer.
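
To make a couple of those deltas concrete, here is a rough sketch of how the
file change rate and the per-layer cost might be combined.  Everything in it is
an assumption for illustration: the tier names, the retention periods, the
cost figures and the steady-state model (each layer holds change rate times
retention days) are made up, not measured, and a real evaluation would add the
other items on the list.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Tier:
    """One storage layer. All fields are hypothetical planning inputs."""
    name: str
    cost_per_gb_year: float  # assumed cost, in currency units per GB per year
    retention_days: int      # assumed time new data stays in this layer


def steady_state_gb(change_gb_per_day: float, tier: Tier) -> float:
    """Data resident in a layer once inflow and expiry balance out."""
    return change_gb_per_day * tier.retention_days


def yearly_cost_by_tier(change_gb_per_day: float,
                        tiers: List[Tier]) -> Dict[str, float]:
    """Yearly cost of each layer at steady state, keyed by tier name."""
    return {
        t.name: steady_state_gb(change_gb_per_day, t) * t.cost_per_gb_year
        for t in tiers
    }


if __name__ == "__main__":
    # Made-up example figures; substitute your own measurements and quotes.
    tiers = [
        Tier("scratch", cost_per_gb_year=2.0, retention_days=7),
        Tier("nfs_raid", cost_per_gb_year=0.5, retention_days=90),
        Tier("archive", cost_per_gb_year=0.125, retention_days=3650),
    ]
    for name, cost in yearly_cost_by_tier(100.0, tiers).items():
        print(f"{name}: {cost:.2f} per year")
```

Rerunning it with a different GB/day figure or a different retention period
per layer shows which delta dominates the total.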
Clusters are unique in that they have the potential
to host their own distributed RAID (Lustre, Gluster, ZFS),
and with a sufficient archival back end, life could be good.
So select nodes that you can add a second disk to.
The choice of filesystem can help too (see DMAPI and friends).
Have fun.
mitch
    
    