
We built Free Factories to help the PGx team win the Archon X PRIZE for Genomics and to meet the needs of the Personal Genome Project. The prize is awarded for sequencing one hundred complete human genomes in less than ten days. Doing this with Polony sequencing and related technologies, as the PGx team plans to do, will involve distilling many petabytes of raw data to produce about 100 gigabytes of output. This DNA sequencing capacity can be used to help build a database of personal genome-phenome data sets; coupled with a data mining and analysis engine, this will provide opportunities for many new discoveries.
Free Factories: Unified Infrastructure for Data Intensive Web Services
This paper will first be published in the Proceedings of USENIX '08.
Abstract: We introduce the Free Factory, a platform for deploying data-intensive web services using small clusters of commodity hardware and free software. Independently administered virtual machines called Freegols give application developers the flexibility of a general-purpose web server, along with access to distributed batch processing, cache, and storage services. Each cluster exploits idle RAM and disk space for cache, and reserves disks in each node for high-bandwidth storage. The batch processing service uses a variation of the MapReduce model. Virtualization allows every CPU in the cluster to participate in batch jobs. Each 48-node cluster can achieve 4-8 gigabytes per second of disk I/O. Our intent is to use multiple clusters to process hundreds of simultaneous requests on multi-hundred-terabyte data sets. Currently, our applications achieve 1 gigabyte per second of I/O with 123 disks by scheduling batch jobs on two clusters, one of which is located in a remote data center.
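For readers unfamiliar with the MapReduce model mentioned in the abstract, the following is a minimal single-process sketch of the general pattern: a map step emits key/value pairs from each input chunk, the pairs are grouped by key, and a reduce step combines each group. It is not the Free Factory's variation of the model and does not use the Warehouse library. The k-mer counting task and the names map_kmers, shuffle, reduce_counts, and run_job are hypothetical, chosen only for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def map_kmers(chunk: str, k: int = 3) -> List[Tuple[str, int]]:
    """Map step: emit (k-mer, 1) for every length-k window in one input chunk."""
    return [(chunk[i:i + k], 1) for i in range(len(chunk) - k + 1)]


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group the emitted values by key, as a framework would between map and reduce."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return dict(grouped)


def reduce_counts(key: str, values: List[int]) -> int:
    """Reduce step: combine all values emitted for one key into a single count."""
    return sum(values)


def run_job(chunks: Iterable[str], k: int = 3) -> Dict[str, int]:
    """Run map, shuffle, and reduce in sequence (illustrative; one process only)."""
    mapped = [pair for chunk in chunks for pair in map_kmers(chunk, k)]
    grouped = shuffle(mapped)
    return {key: reduce_counts(key, values) for key, values in grouped.items()}


if __name__ == "__main__":
    # Two hypothetical input chunks standing in for blocks of sequence data.
    print(run_job(["ACGTAC", "GTACG"], k=3))
```

Because each map call sees only its own chunk, a real system can run the map step on many nodes at once; only the grouping step needs data to move between them.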
Source Code:
http://dev.freelogy.org/svn/polony/polony-tools/trunk/warehouse/
Warehouse Library Documentation:
http://factories.freelogy.org/pod/
Client Library Installation:
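# Add the Freelogy package repository to the APT sources list
echo "deb http://dev.freelogy.org/apt hardy main contrib non-free" \
| sudo tee -a /etc/apt/sources.list
# Import the repository signing key
wget -q http://dev.freelogy.org/53212765.key -O- | sudo apt-key add -
# Refresh the package index and install the client library
sudo apt-get update
sudo apt-get install libwarehouse-perl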
Sign up for an account on the clusters:
Controlpanel