NAME

Warehouse -- Client library for the storage warehouse.


VERSION

Version 0.01


SYNOPSIS

 use Warehouse;
 my $whc = Warehouse->new;
 my $sample_content = "some binary data";
 # Store a block of data (at most 64MB)
 my $filehash = $whc->store_block ($sample_content)
     or die "write failed: ".$whc->errstr;
 # Store a [possibly >64MB] file into multiple blocks
 $whc->write_start or die "Write failed";
 while(<>) {
     $whc->write_data ($_) or die "Write failed";
 }
 my @filehashes = $whc->write_finish;
 die "Write failed" unless @filehashes && defined $filehashes[0];
 # Retrieve data
 my $content = $whc->fetch_block ($filehash);
 # Retrieve data without verifying hash($content)==$filehash
 $content = $whc->fetch_block ($filehash, 0);
 # Give a manifest a name in the warehouse
 $whc->store_manifest_by_name ($newkey, $oldkey, $name)
     or die "update failed";
 # Retrieve key of a named manifest
 my $key = $whc->fetch_manifest_key_by_name ($name);
 # Get a list of mapreduce jobs
 my $joblist = $whc->job_list;
 $joblist = $whc->job_list (id_min => 123, id_max => 345);
 print map { "job ".$_->{id}." was a ".$_->{mrfunction}.".\n" } @$joblist;
 # Submit a mapreduce job
 my $jobid = $whc->job_new (mrfunction => "zmd5", ...);


METHODS

new

 my $whc = Warehouse->new( %OPTIONS );

Creates a new Warehouse object. Returns the new object on success. Dies on failure.

Options

warehouse_name
Name of a warehouse configured in /etc/warehouse/warehouse-client.conf.

warehouse_servers
Comma-separated list of warehouse servers (host:port,host:port,...). Comes from warehouse-client.conf if not specified.

memcached_servers
Memcached servers (arrayref; see Cache::Memcached(3)). Comes from memcached.conf.pl if not specified.

mogilefs_trackers
Comma-separated list of MogileFS tracker hosts. Comes from warehouse-client.conf if not specified.

mogilefs_domain
MogileFS domain. Comes from warehouse-client.conf if not specified.

mogilefs_directory_class
MogileFS class used for storing directory listings. Comes from warehouse-client.conf if not specified.

mogilefs_file_class
MogileFS class used for storing files. Comes from warehouse-client.conf if not specified.

mogilefs_size_threshold
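Minimum block size to store in MogileFS, in bytes. Default is 0. Do not use a value greater than 1 + memcached_size_threshold; otherwise blocks whose sizes fall between the two thresholds would be stored in neither MogileFS nor memcached.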

memcached_size_threshold
Maximum block size to store in memcached, in bytes. Default is 1000000. Zero means never use memcached for data. Negative means never use memcached for either data or MogileFS paths. Regardless of this setting, blocks are stored in memcached in chunks of at most 1000000 bytes.
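The sketch below illustrates one reading of how the two size thresholds and the 1000000-byte chunk limit interact. It assumes a block goes to MogileFS when its size is at least mogilefs_size_threshold, and to memcached when memcached_size_threshold is positive and the size does not exceed it. The helpers choose_stores and split_chunks are hypothetical and are not part of Warehouse.

```perl
use strict;
use warnings;

# Hypothetical helper: which stores would receive a block of $size bytes,
# under the assumed threshold semantics described above.
sub choose_stores {
    my ($size, %opt) = @_;
    my $mogile_min = $opt{mogilefs_size_threshold}  // 0;
    my $mc_max     = $opt{memcached_size_threshold} // 1_000_000;
    my @stores;
    # Zero or negative memcached_size_threshold disables memcached for data.
    push @stores, 'memcached' if $mc_max > 0 && $size <= $mc_max;
    push @stores, 'mogilefs'  if $size >= $mogile_min;
    return @stores;
}

# Hypothetical helper: split a block into pieces of at most $max bytes,
# mirroring the "chunks of at most 1000000 bytes" rule for memcached.
sub split_chunks {
    my ($data, $max) = @_;
    $max //= 1_000_000;
    my @chunks;
    for (my $off = 0; $off < length $data; $off += $max) {
        push @chunks, substr($data, $off, $max);
    }
    return @chunks;
}

# With the defaults, a 500000-byte block goes to both stores.
print join(',', choose_stores(500_000)), "\n";    # memcached,mogilefs

# If memcached_size_threshold were raised above 1000000, a 2500000-byte
# block would be kept in memcached as three chunks.
my @chunks = split_chunks('x' x 2_500_000);
print scalar(@chunks), "\n";                      # 3
```

Setting mogilefs_size_threshold to exactly memcached_size_threshold + 1 would, under these assumptions, send each block to exactly one store.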

store_block

 my $hash = $whc->store_block ($data)

Stores a chunk of data of at most 64MB. On success, returns a hash which can be used to retrieve the data; on failure, returns undef.

write_start

 $whc->write_start;

Prepares to store a file (possibly larger than 64MB) in the warehouse. Analogous to open(2).

write_data

 $whc->write_data ($data) or die "Write failed.";

Appends some data to a file in the warehouse. Analogous to write(2). Returns true on success. Returns undef on failure.

write_finish

 my @hashes = $whc->write_finish;
 die "Write failed" unless @hashes && defined $hashes[0];

Writes out all data remaining from previous write_data() calls. Analogous to close(2). Returns a list of hashes on success, or undef on failure.
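write_start, write_data and write_finish follow the open/write/close pattern over a sequence of blocks. The sketch below shows that pattern with a hypothetical BlockWriter class: data is buffered, a full block is handed to a store callback whenever the buffer reaches the block size, and write_finish stores the remainder and returns the list of hashes. The callback stands in for store_block; how Warehouse actually chooses block boundaries is not documented here, and the MD5-keyed in-memory store is only a test double.

```perl
use strict;
use warnings;

package BlockWriter;    # hypothetical, for illustration only

# store: code ref that stores one block and returns its hash (or undef).
# block_size: defaults to 64MB, the per-block limit mentioned above.
sub new {
    my ($class, %args) = @_;
    return bless {
        store      => $args{store},
        block_size => $args{block_size} // 64 * 1024 * 1024,
        buf        => '',
        hashes     => [],
    }, $class;
}

# Like write(2): append data; store full blocks as soon as they fill up.
sub write_data {
    my ($self, $data) = @_;
    $self->{buf} .= $data;
    while (length $self->{buf} >= $self->{block_size}) {
        # 4-arg substr removes the first block from the buffer and returns it.
        my $block = substr($self->{buf}, 0, $self->{block_size}, '');
        my $hash  = $self->{store}->($block);
        return unless defined $hash;    # undef on failure
        push @{ $self->{hashes} }, $hash;
    }
    return 1;
}

# Like close(2): store the final, possibly shorter block; return all hashes.
sub write_finish {
    my ($self) = @_;
    if (length $self->{buf}) {
        my $hash = $self->{store}->($self->{buf});
        return unless defined $hash;
        $self->{buf} = '';
        push @{ $self->{hashes} }, $hash;
    }
    return @{ $self->{hashes} };
}

package main;

use Digest::MD5 qw(md5_hex);

# In-memory stand-in for the warehouse, keyed by MD5 of each block.
my %fake_store;
my $w = BlockWriter->new(
    block_size => 4,
    store      => sub { my $h = md5_hex($_[0]); $fake_store{$h} = $_[0]; $h },
);
$w->write_data($_) for qw(ab cdef g);
my @hashes = $w->write_finish;
print join('|', map { $fake_store{$_} } @hashes), "\n";    # abcd|efg
```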

fetch_block

 foreach my $hash (@hashes)
 {
     my $data = $whc->fetch_block ($hash) or die "Read failed";
     print $data;
 }

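Retrieves content previously stored using store_block or write_data. Returns the binary data on success, or undef on failure. By default the content is checked against $hash; pass 0 as a second argument to skip this check, as shown in the SYNOPSIS.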
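The check that fetch_block performs by default amounts to recomputing the hash of the retrieved content and comparing it with the requested key. This document does not name the hash function; the 32-hex-digit keys in the examples are consistent with MD5, so the sketch below assumes MD5. It also assumes the key begins with the digest, since keys returned by store_in_keep may carry extra hints. verify_block is a hypothetical helper, not part of Warehouse.

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);

# Hypothetical helper: 1 if $content hashes to the digest at the start of
# $key (assumed: MD5, 32 lowercase hex digits), else 0.
sub verify_block {
    my ($content, $key) = @_;
    # A failed match yields an empty list, so the assignment is false.
    my ($digest) = $key =~ /\A([0-9a-f]{32})/ or return 0;
    return md5_hex($content) eq $digest ? 1 : 0;
}

my $key = md5_hex("some binary data");
print verify_block("some binary data", $key), "\n";    # 1
print verify_block("tampered data", $key), "\n";       # 0
```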

store_in_keep

 my $data = "foo";
 my ($hash_with_hints, $nnodes) = $whc->store_in_keep (hash => $hash,
                                                       nnodes => 2);

fetch_from_keep

 my $dataref = $whc->fetch_from_keep ($hash);
 die "could not fetch $hash from keep" if !defined $dataref;

_hash_keeps

 ($keeps_arrayref, @probeorder) = $self->_hash_keeps($warehouse_id, $hash);

Returns $keeps_arrayref, a reference to an array of all keepd servers ("host:port"), followed by a list of indexes into that array giving the order in which the servers should be tried when storing $hash.
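The document does not say how _hash_keeps derives the probe order. One common technique for a per-key server order is rendezvous (highest-random-weight) hashing, sketched below purely as a stand-in: each server gets a weight computed from md5_hex($hash . $server), and servers are tried in descending weight order, so every client computes the same order for a given $hash. probe_order and the server names are hypothetical, the $warehouse_id argument is omitted, and the real method may differ.

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);

# Hypothetical stand-in for _hash_keeps: returns the server arrayref and a
# list of indexes into it, ordered by rendezvous-hash weight.
sub probe_order {
    my ($servers, $hash) = @_;
    my @weight = map { md5_hex($hash . $_) } @$servers;
    # Highest weight first. Lowercase hex digests of equal length compare
    # the same way as strings and as numbers.
    my @order = sort { $weight[$b] cmp $weight[$a] } 0 .. $#$servers;
    return ($servers, @order);
}

my @keeps = ('hostA:9000', 'hostB:9000', 'hostC:9000');    # hypothetical
my ($keeps_arrayref, @probeorder) =
    probe_order(\@keeps, 'f171d0aa385d601d13d3f5292a4ed4c5');
print join(' ', map { $keeps_arrayref->[$_] } @probeorder), "\n";
```

A property of this scheme is that adding or removing one server only changes the relative position of that server, so most keys keep their first-choice server.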

fetch_manifest

 my $manifest = $whc->fetch_manifest ($key);

Retrieves the manifest with the given key.

Note: reading a large manifest this way is inefficient; it is better to fetch it one block at a time.

store_manifest_by_name

 $whc->store_manifest_by_name ($newkey, $oldkey, $name)
     or die "failed";

Gives the manifest $newkey the name $name in the warehouse.

fetch_manifest_key_by_name

 my $key = $whc->fetch_manifest_key_by_name ($name);

Looks up a named (signed) manifest.

Returns a key, which can be used to retrieve the manifest with fetch_block. On failure, returns undef.

list_manifests

 my @manifest = $whc->list_manifests;
 foreach (@manifest)
 {
   my ($key, $name) = @$_;
   ...
 }

job_list

    my $joblist = $whc->job_list;
    $joblist = $whc->job_list (id_min => 123, id_max => 345);

job_freeze

    $whc->job_freeze (id => 1234);
    $whc->job_freeze (id => 1234,
                      stop => 1);

job_new

    my $id = $whc->job_new (mrfunction => "zmd5",
                            revision => 836,
                            inputkey => "f171d0aa385d601d13d3f5292a4ed4c5",
                            knobs => "GZIP=yes\nFOO=bar",
                            nodes => 20,
                            stepspernode => 4,
                            photons => 1);
    $id = $whc->job_new (thaw => 1234,
                         nodes => 10,
                         stepspernode => 3,
                         photons => 1);

iostats

 print $whc->iostats;

Returns a human-readable summary of blocks and bytes sent to and received from the warehouse, as well as average I/O bandwidth since the client object was created.
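The average bandwidth in such a summary is the total number of bytes moved divided by the seconds elapsed since the client object was created. The sketch below shows that arithmetic with a hypothetical io_summary function; the counter names and output wording are assumptions, not the actual iostats format, and MB here means 10^6 bytes.

```perl
use strict;
use warnings;

# Hypothetical summary in the spirit of iostats. Expects blocks_out,
# bytes_out, blocks_in, bytes_in and elapsed (seconds) as named arguments.
sub io_summary {
    my (%s) = @_;
    # Treat a zero or negative interval as one second to avoid dividing by zero.
    my $elapsed = $s{elapsed} > 0 ? $s{elapsed} : 1;
    my $mbps = ($s{bytes_out} + $s{bytes_in}) / $elapsed / 1e6;
    return sprintf "sent %d blocks (%d bytes), received %d blocks (%d bytes), %.2f MB/s\n",
        @s{qw(blocks_out bytes_out blocks_in bytes_in)}, $mbps;
}

print io_summary(blocks_out => 3, bytes_out => 150_000_000,
                 blocks_in  => 1, bytes_in  => 50_000_000, elapsed => 10);
# sent 3 blocks (150000000 bytes), received 1 blocks (50000000 bytes), 20.00 MB/s
```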

block_might_exist

 foreach my $hash (@hashes)
 {
     $whc->block_might_exist ($hash) or print "block is missing: $hash\n";
 }

Returns 1 if it seems likely that the specified block exists in the cache.