Thoughts on software and people.


Thrift: Reading Thrift Objects from Disk with Java

02/05/2009
Reading Thrift objects is as easy as writing them to disk. Here's the utility class I use for reading one or more Thrift objects (of the same type) serialized to disk:
import java.io.BufferedInputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.IOException;

import com.facebook.thrift.TBase;
import com.facebook.thrift.TException;
import com.facebook.thrift.protocol.TBinaryProtocol;
import com.facebook.thrift.transport.TIOStreamTransport;

/**
 * A simple class for reading Thrift objects (of a single type) from a file.
 *
 * @author Joel Meyer
 */
public class ThriftReader {
  /**
    * Thrift deserializes by taking an existing object and populating it. ThriftReader
    * needs a way of obtaining instances of the class to be populated and this interface
    * defines the mechanism by which a client provides these instances.
    */
  public static interface TBaseCreator {
    TBase create();
  }

  /** File containing the objects. */
  protected final File file;

  /** Used to create empty objects that will be initialized with values from the file. */
  protected final TBaseCreator creator;
  
  /** For reading the file. */
  private BufferedInputStream bufferedIn;

  /** For reading the binary thrift objects. */
  private TBinaryProtocol binaryIn;
  
  /**
    * Constructor.
    */
  public ThriftReader(File file, TBaseCreator creator) {
    this.file = file;
    this.creator = creator;
  }
  
  /**
    * Opens the file for reading. Must be called before {@link read()}.
    */
  public void open() throws FileNotFoundException {
    bufferedIn = new BufferedInputStream(new FileInputStream(file), 2048);
    binaryIn = new TBinaryProtocol(new TIOStreamTransport(bufferedIn));
  }

  /**
    * Checks if another objects is available by attempting to read another byte from the stream.
    */
  public boolean hasNext() throws IOException {
    bufferedIn.mark(1);
    int val = bufferedIn.read();
    bufferedIn.reset();
    return val != -1;
  }

  /**
    * Reads the next object from the file.
    */
  public TBase read() throws IOException {
    TBase t = creator.create();
    try {
      t.read(binaryIn);
    } catch (TException e) {
      throw new IOException(e);
    }
    return t;
  }

  /**
    * Close the file.
    */
  public ThriftReader close() throws IOException {
    bufferedIn.close();
    return this;
  }
}
This class is useful if you've got a file containing a list of Thrift objects of the same type. Let's say you had a Thrift object called Album and a file containing a list of these Album objects. To read the file you'd do this:
// Create the reader

ThriftReader thriftIn = new ThriftReader(new File("/some/thrift/file.thrifty", new ThriftReader.TBaseCreator() {
  @Override TBase create() {
    return new Album();
  }
});

// Open it

thriftIn.open();

// Read objects

List<Album> albums = new ArrayList<Album>();
while (thriftIn.hasNext()) {
  albums.add((Album)thriftIn.read());
}

// Close reader

thriftIn.close();
As mentioned in the post on writing Thrift objects, TBase is the interface all Thrift objects implement that provides the read(...) and write(...) methods. When you're dealing with generated files you're limited to the code that's generated, so in this case I opted to use the TBase interface since that at least guarantees that the read and write methods will exist. I thought about making the class generic and casting to TBase, but then it's not clear (well, not contractually enforced) that the class is expecting a Thrift object. The downside of my method is that you have an unchecked cast, which I'm not a fan of.

comments powered by Disqus