
Haystack

 

Over the past two years our research group has worked to implement the first version of the Haystack personal information repository system [2]. In August 1997 the first version, written in Perl, was released for public review. The system implemented a number of our design goals. The most abstract of these was to provide users with a repository that supports intelligent storage and retrieval of arbitrary data. Although we are currently limited to textual representations of the data, anything on the user's hard drive or on the Internet should be fair game for becoming an entry in the user's Haystack repository. A user can then easily organize and find information in their repository at a later time. To facilitate these operations, Haystack attempts to extract a textual representation of each data object along with the object's metadata, and it allows users to annotate the object. In the future, we hope to no longer rely on ``textual representations'' and to be able to query images or sounds directly. For now, however, we are largely dependent on the technologies of today.
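To make the notion of a repository entry concrete, the following Python sketch models an entry as the object's location, its extracted text, its metadata, and the user's annotations. The names `Entry`, `Repository`, `add`, `annotate`, and `find` are hypothetical and do not come from the Perl implementation, and retrieval here is a naive case-insensitive substring match assumed purely for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Entry:
    """One object in the repository (hypothetical structure, not Haystack's)."""
    location: str                                      # file path or URL of the original object
    text: str                                          # textual representation extracted from it
    metadata: dict = field(default_factory=dict)       # e.g. checksum, comment
    annotations: list = field(default_factory=list)   # free-form user notes


class Repository:
    """Toy personal repository: stores entries and finds them again later."""

    def __init__(self):
        self.entries = {}  # location -> Entry

    def add(self, location, text, metadata=None):
        entry = Entry(location, text, dict(metadata or {}))
        self.entries[location] = entry
        return entry

    def annotate(self, location, note):
        # User annotations live alongside the extracted text and are searchable.
        self.entries[location].annotations.append(note)

    def find(self, word):
        # Naive retrieval: case-insensitive substring match on text or annotations.
        word = word.lower()
        return sorted(
            loc for loc, e in self.entries.items()
            if word in e.text.lower()
            or any(word in a.lower() for a in e.annotations)
        )


repo = Repository()
repo.add("/home/user/notes.txt", "Meeting notes about Lore")
repo.add("http://www.example.com/paper.html", "A paper on information retrieval")
repo.annotate("http://www.example.com/paper.html", "Read for the Haystack survey")
print(repo.find("haystack"))  # ['http://www.example.com/paper.html']
```

Note that the second entry is found only because of the user's annotation, which is the point of letting users annotate objects.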

At a more detailed level, Haystack consists of a number of core modules. User interface modules (which currently include a personal web server, a command line, and Emacs) provide access to the system. Textifiers provide the machinery to extract text from a variety of file formats. Field finders extract metadata about a given object. Finally, Haystack provides modules that interact bi-directionally with information retrieval (IR) systems. If any textual information is derived from an object, Haystack can index it in any basic information retrieval system. We want this abstraction because different IR systems represent their data differently and present different query options to users. These options range from basic Boolean operations (such as ``all documents that contain 'MIT' and 'Artificial Intelligence' '') to more complex relevance-ranked schemes (e.g., ``documents that are most strongly related to MIT and Artificial Intelligence'').
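The value of the IR abstraction can be illustrated with a single hypothetical interface, `IRBackend` with `index` and `query` methods, behind which two quite different back ends sit. The Boolean back end returns documents containing every query term; the ranked back end scores documents by raw term frequency, a deliberately simple stand-in for real relevance-ranking schemes such as TF-IDF. None of these names come from Haystack itself; this is a sketch under those assumptions.

```python
import re
from abc import ABC, abstractmethod
from collections import Counter


def tokenize(text):
    # Lower-case alphanumeric tokens; real IR systems do far more (stemming, stop words).
    return re.findall(r"[a-z0-9]+", text.lower())


class IRBackend(ABC):
    """Hypothetical interface the rest of the system would program against."""

    @abstractmethod
    def index(self, doc_id, text): ...

    @abstractmethod
    def query(self, terms): ...


class BooleanBackend(IRBackend):
    """Boolean AND: return documents that contain every query term."""

    def __init__(self):
        self.postings = {}  # term -> set of doc ids

    def index(self, doc_id, text):
        for term in tokenize(text):
            self.postings.setdefault(term, set()).add(doc_id)

    def query(self, terms):
        sets = [self.postings.get(t, set()) for t in tokenize(" ".join(terms))]
        return sorted(set.intersection(*sets)) if sets else []


class RankedBackend(IRBackend):
    """Rank documents by total occurrences of the query terms (raw term frequency)."""

    def __init__(self):
        self.counts = {}  # doc id -> Counter of terms

    def index(self, doc_id, text):
        self.counts[doc_id] = Counter(tokenize(text))

    def query(self, terms):
        q = tokenize(" ".join(terms))
        scores = {d: sum(c[t] for t in q) for d, c in self.counts.items()}
        hits = [d for d, s in scores.items() if s > 0]
        # Highest score first; ties broken by doc id for a stable order.
        return sorted(hits, key=lambda d: (-scores[d], d))


docs = {
    "a": "MIT and Artificial Intelligence research",
    "b": "MIT admissions",
    "c": "Artificial Intelligence at MIT: the MIT AI Lab",
}
boolean, ranked = BooleanBackend(), RankedBackend()
for backend in (boolean, ranked):
    for doc_id, text in docs.items():
        backend.index(doc_id, text)

print(boolean.query(["MIT", "Artificial Intelligence"]))  # ['a', 'c']
print(ranked.query(["MIT", "Artificial Intelligence"]))   # ['c', 'a', 'b']
```

Because the rest of the system only calls `index` and `query`, swapping one back end for the other would require no change to the textifiers or field finders.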

As noted above, Haystack extracts both text (through textifiers) and metadata (through field finders) from the data objects it encounters. Metadata ranges from the automatically generated checksum field to the user-generated comment field.
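A minimal sketch of this extraction step follows, assuming that textifiers are chosen by file extension and that each field finder is a function returning a dictionary of metadata fields. The extension table, the function names, and the choice of MD5 for the checksum are all illustrative assumptions rather than Haystack's actual design; the user-supplied comment is simply merged into the same metadata dictionary.

```python
import hashlib
import os
import re
import tempfile


# Hypothetical textifiers: raw bytes in, plain text out.
def textify_plain(raw):
    return raw.decode("utf-8", errors="replace")


def textify_html(raw):
    # Crude tag stripping, a stand-in for a real HTML textifier.
    return re.sub(r"<[^>]+>", " ", raw.decode("utf-8", errors="replace"))


TEXTIFIERS = {".txt": textify_plain, ".html": textify_html}  # assumed dispatch table


# Hypothetical field finders: each returns a dict of automatically generated fields.
def checksum_field(path, raw):
    # MD5 is an assumption; any content hash would serve the same purpose.
    return {"checksum": hashlib.md5(raw).hexdigest()}


def size_field(path, raw):
    return {"size": len(raw)}


FIELD_FINDERS = [checksum_field, size_field]


def process(path, comment=None):
    """Extract text and metadata from one file; `comment` is the user-supplied field."""
    with open(path, "rb") as f:
        raw = f.read()
    textifier = TEXTIFIERS.get(os.path.splitext(path)[1].lower())
    text = textifier(raw) if textifier else ""  # unknown formats yield no text
    metadata = {}
    for finder in FIELD_FINDERS:
        metadata.update(finder(path, raw))
    if comment is not None:
        metadata["comment"] = comment
    return text, metadata


with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "page.html")
    with open(path, "wb") as f:
        f.write(b"<p>Haystack</p>")
    text, metadata = process(path, comment="Found while surveying repositories")

print(text.split())       # ['Haystack']
print(metadata["size"])   # 15
```

Keeping field finders as independent functions means a new kind of automatically generated metadata can be added by appending one function to the list.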



Copyright 1998, Eytan Adar (eytan@alum.mit.edu)