Why All Your Data Should Live in One Application

A couple of days ago Adam Pash at Lifehacker posted a criticism of “everything buckets”—applications aimed at gathering every kind of information you work with into a single place.   I can’t resist responding as the article touches on some of the issues that have framed my past 15 years of research into information management.  It gives me a chance to talk about what’s wrong with today’s application model and about how to create a truly effective everything bucket.

Adam was initially excited to use Evernote as a “universal capture tool” but has since become disenchanted. He builds on a presentation on perfecting digital filing systems by Lifehacker founder Gina Trapani and an even earlier anti-everything-bucket post by Alex Payne. A few quotes from that post summarize Alex’s (and Adam’s) take on everything buckets, though I encourage you to read the entire post:

An Everything Bucket, since you’re probably wondering, is what I call applications that encourage the user to throw anything and everything into them. They’re virtual scrapbooks, applying a lightweight organization system to (often) unrelated data of varying types.

Computers work best with structured data. Everything Buckets discourage the use of structured data by providing a convenient place to commingle “structureless” data like RTF and PDF documents. Rather than forcing the user to figure out the rhyme and reason of their data (for example, by putting receipts in a financial management application and addresses in an address book), Everything Buckets cry: “throw it all in here! Search it!”

This proposition should not sound great. If you think you’re going to save time in the long run by throwing your data into a big bucket now, then sifting through it later, you are mistaken. There are better ways.

Adam and Alex think everything buckets reflect a Faustian bargain: for the sake of short-term convenience, you give up on the data structuring that makes your applications useful information managers.

Below, I respond to this position, arguing that

  1. Taking the Faustian bargain is often rational, because
  2. Using current structured-data apps is just accepting a different Faustian bargain, and
  3. There is a way to escape these bargains and create best-of-all-worlds information management tools

Current Apps

As Alex points out, “when you need to store some data, there are so many wonderful applications to pick from.” So you have to wonder at the perversity of people who not only avoid them, but don’t even put their information into a computer at all! We did, so my two students Michael Bernstein and Max Van Kleek, along with me and my frequent collaborator mc schraefel, carried out an extensive interview-based study to determine what drives people to put information (sometimes copied out of the computer) on pads on their desk, sticky notes attached to their monitor, scraps of paper in their wallet, paper calendars, or the backs of their hands. The results were presented in ACM TOIS. We found several recurring themes.

The first is quick capture. Adam and Alex highlight the benefits of retrieving your information from a structured repository, but ignore the cost of putting it there. Launching an application, navigating through its screens and menus, and thinking about where in the organizational structure to put your new information (or, even worse, about how to modify the organization so your new information fits) is a significant cost that might easily outweigh the benefits of structure at retrieval time. Indeed, recent analysis of the use of our list.it “everything bucket” suggests that much of the information people file is never retrieved. Thus, the benefit of any structured organization must be discounted by the likelihood of retrieval, which may drive it below the cost of careful filing.
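The discounting argument can be made concrete with a little arithmetic. The following is a minimal sketch in Python; the function names and all of the numbers (seconds spent filing, seconds saved at retrieval, retrieval probabilities) are hypothetical illustrations, not figures from the study or from list.it usage data.

```python
def net_benefit_of_filing(filing_cost, retrieval_probability, retrieval_saving):
    """Expected net gain from filing carefully instead of dumping into a bucket.

    filing_cost: extra seconds spent filing the item in a structured app
    retrieval_probability: chance the item is ever looked up again (0..1)
    retrieval_saving: seconds saved at lookup time thanks to the structure
    """
    return retrieval_probability * retrieval_saving - filing_cost


def break_even_probability(filing_cost, retrieval_saving):
    """Retrieval probability above which careful filing starts to pay off."""
    return filing_cost / retrieval_saving


# Hypothetical numbers: 30 seconds to file, 120 seconds saved if retrieved.
rarely_retrieved = net_benefit_of_filing(30, 0.1, 120)
often_retrieved = net_benefit_of_filing(30, 0.8, 120)

print("rarely retrieved:", rarely_retrieved)
print("often retrieved:", often_retrieved)
print("break-even probability:", break_even_probability(30, 120))
```

Under these assumed numbers, careful filing only pays off for items with at least a one-in-four chance of being retrieved; for notes that are rarely looked at again, the quick bucket wins.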

The second is rigidity. If you adopt a particular application, you adopt its schema. You can store only the data that application is prepared to store, and only in the form that application is prepared to store it. You can only look at it the way that application wants to show it to you. If I use Thunderbird’s address book, I can store addresses and phone numbers, but where do I put the dietary preferences of the contacts I invite to dinner? I can store a contact’s birthday, but where do I put their anniversary? I use Winamp for my music collection, but Winamp thinks I’m managing plain old songs, when in fact I’ve got music that I play at the folk dance session I run every week. Where do I put (and use) information about dance choreographer, tempo, style, difficulty, and date of teaching? I can always use the “custom” or “miscellaneous” fields found in many apps, but these are often limited in number and do not offer the same organization or visualization benefits as built-in fields.

A closely related problem is fragmentation. Applications exist to gather up and relate different pieces of information. But they only gather what’s in their own purview; linkages between different applications are extremely sparse. I’ve written about this at length in a CACM article (copy here), later expanded to a book chapter (copy here). This means that to do one task, you often have to open up several different applications, searching in one for information you’ve already found in another, or even retyping it (since the application schemas don’t match, you can’t copy and paste). I know many of the choreographers shoehorned into my music application, but if I’m looking at the song and want to ask them a question, I have to go search for their entry in the address book.

In summary, while unstructured note tools may demand a Faustian bargain to give up on effective retrieval, structured tools often impose a different Faustian bargain around your ability to record and use information the way you want.

Thus, depending on the balance of costs and benefits, I believe that there’s a great role for everything buckets in the short term. We built our own, list.it, based on the insights we gained from our user interviews. It’s a lightweight Firefox extension with about 16,000 users who’ve taken about 120,000 notes and, judging by the Mozilla reviews, seem very enthusiastic about the quick capture and all-in-one-place aspects of the tool. We published a paper in CHI 2009 that studied initial usage and supported the arguments I made above.

Future Apps

Looking further ahead, I agree with Alex and Adam on the dominant role of structured data. But to get there, we need a way around these Faustian bargains. I believe there is an approach that can satisfy our need for structure (to help management) and our need for flexibility and data linking. The solution is to build structured data applications where the users themselves define and adapt the structure to meet their own needs. We’ve built a series of tools aimed at exploring this vision. The first was the Haystack desktop application, a structured everything-bucket. Eytan Adar implemented the first version in Perl (CIKM 1999), while Dennis Quan created the follow-on Java implementation (ISWC 2003). Instead of the textual data models used in today’s everything buckets, Haystack started with a universal structured data model, holding objects of arbitrary types with arbitrary properties and connected to other objects by arbitrary relationships. On top of this generic data model, Haystack provided user-configurable views of subsets of the data, as well as user-configurable operations that could be applied to those items.
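To give a feel for what such a universal data model might look like, here is a minimal sketch in Python. It is not Haystack's actual implementation: the `Store` class, its method names, and the sample identifiers such as `song:hora` and `person:dana` are all hypothetical, chosen only to show arbitrary properties, relationships between objects, and a user-chosen view.

```python
from collections import defaultdict


class Store:
    """A tiny schema-free store: every fact is (subject, property, value)."""

    def __init__(self):
        self._facts = defaultdict(lambda: defaultdict(list))

    def add(self, subject, prop, value):
        """Record a fact; any property name is allowed, no schema required."""
        self._facts[subject][prop].append(value)

    def get(self, subject, prop):
        """Return all values of a property (empty list if never set)."""
        return list(self._facts.get(subject, {}).get(prop, []))

    def subjects_with(self, prop, value):
        """Find every object that has the given property value."""
        return sorted(
            s for s, props in self._facts.items() if value in props.get(prop, [])
        )

    def view(self, subject, props):
        """A user-configured view: only the chosen properties of one object."""
        return {p: self.get(subject, p) for p in props}


store = Store()
# A song carries dance-specific properties no music player anticipates.
store.add("song:hora", "type", "Song")
store.add("song:hora", "tempo", "fast")
store.add("song:hora", "choreographer", "person:dana")
# The choreographer is an ordinary contact with extra, user-defined fields.
store.add("person:dana", "type", "Person")
store.add("person:dana", "email", "dana@example.org")
store.add("person:dana", "dietary preference", "vegetarian")

# Follow the relationship from the song straight to the contact details.
for who in store.get("song:hora", "choreographer"):
    print(who, store.view(who, ["email", "dietary preference"]))
```

Because the song and the contact live in one store, the link from a song to its choreographer's email is a single lookup rather than a search in a second application.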

The Haystack Client

The Haystack Email Client

The Haystack Brain Client

With Haystack’s framework, you could build a reasonable mail-reading application over a set of types such as people and email messages, but you could equally easily (using an “application editor” created by Karun Bakshi) build an application for editing a neuroscience research paper, managing your brain data, relevant publications, and your coauthors.  Because the underlying data was integrated, the on-screen view of your coauthors provided access both to their bibliographies and to their address book entries so you could email them.

I’m still a believer in the Haystack vision, but in practice we found it difficult to convince people to abandon their long-cherished PIM tools in favor of a half-baked research tool. Products like FileMaker and Bento have found a niche but still feel more like database tools than applications. David Huynh wisely recognized that the Web might offer an easier environment for deploying new tools, and developed the Exhibit framework.

Exhibit of U.S. Presidents

Exhibit lets you author (not program) the kind of rich interactive data visualizations you can create with structured data, without pushing you through all the hassle of installing, programming, or operating a database, a templating web server, or Ajax-y JavaScript code. You just publish a (structured) data file, such as a spreadsheet, and stick a few special tags in your HTML document to describe how you want it to be displayed (as a map, a timeline, a scatterplot, a list, or a table) and with what kinds of interactive sorting and filtering. Then you just link to the Exhibit JavaScript, which takes care of making everything happen in the client browser. The data can follow whatever schema you choose, and you’re also in charge of the look and feel of the visualization. People have used it to publish information about restaurants, court cases, pollution, chemical compounds, political scandals, disease outbreaks, bridge safety, classical music, linguistics, legal databases, publications, and much else (you can see others on the project web site and here). Lifehacker founder Gina Trapani, mentioned above, created an exhibit of all the Broadway shows she’s seen.
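Exhibit itself is configured with HTML tags and runs as JavaScript in the browser, so the following Python is not Exhibit's API. It is only a rough sketch, with hypothetical function names, of the underlying idea: sorting and filtering a list of items whose fields follow whatever schema the author chose, including items that lack a field entirely.

```python
# Items are plain records; nothing forces them to share the same fields.
items = [
    {"label": "George Washington", "took_office": 1789},
    {"label": "Abraham Lincoln", "took_office": 1861, "party": "Republican"},
    {"label": "Theodore Roosevelt", "took_office": 1901, "party": "Republican"},
    {"label": "Franklin D. Roosevelt", "took_office": 1933, "party": "Democratic"},
]


def facet_counts(items, field):
    """Count how many items carry each value of a field (a 'facet')."""
    counts = {}
    for item in items:
        value = item.get(field, "(missing)")
        counts[value] = counts.get(value, 0) + 1
    return counts


def filter_items(items, **selected):
    """Keep only the items matching every selected facet value."""
    return [
        item
        for item in items
        if all(item.get(field) == value for field, value in selected.items())
    ]


def sort_items(items, field):
    """Order items by one field, as a sortable list view would."""
    return sorted(items, key=lambda item: item[field])


print(facet_counts(items, "party"))
for item in sort_items(filter_items(items, party="Republican"), "took_office"):
    print(item["label"], item["took_office"])
```

The point of the sketch is that the filtering and sorting code never mentions presidents; swapping in restaurants or Broadway shows only means changing the data and the field names passed in.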

Exhibit is a tool for publishing data, but we’ve recently looked at repurposing it into a tool for managing data.  With Dido, your exhibit becomes an in-place editable structured-data document.   You get all the rich visualization and interaction, but you can WYSIWYG edit the data you’re looking at (as well as its visualization), then save the document to persist your changes.  Like Exhibit, Dido leaves it entirely up to you to decide what kind of data you want to manage and how it should look.

Exhibit and Dido solve the rigidity problem but don’t really address fragmentation.  We’re starting to see sites like Freebase that offer to become everything-buckets in the cloud, unifying every imaginable information entity into a single richly structured data model.  This is great for data of public interest, but we’re going to need a personal version to store our own esoteric data.  And individuals need to create their own flexible visualizations of appropriate slices of their data for specific tasks.

Ultimately, there’s even hope of combining these affordances with quick capture: natural language processing and other machine learning techniques can be used to take information that users jot down quickly with tools like list.it and infer the implicit structured meaning needed to incorporate that information into a structured repository.  We’ve already seen baby examples of this in tools like Google’s quick-add feature for their calendar.
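As a toy illustration of inferring structure from a quickly jotted note, here is a minimal Python sketch. It uses a single regular expression as a stand-in for real natural language processing, and it is not how Google's quick-add or list.it works; the pattern, the function name `parse_quick_note`, and the output fields are all assumptions made for the example.

```python
import re
from datetime import date, timedelta

# Matches notes like "Dinner with Dana tomorrow at 7pm" or "Call Max today 9:30am".
NOTE_PATTERN = re.compile(
    r"^(?P<title>.+?)\s+(?P<day>today|tomorrow)\s+(?:at\s+)?"
    r"(?P<hour>\d{1,2})(?::(?P<minute>\d{2}))?\s*(?P<ampm>am|pm)$",
    re.IGNORECASE,
)


def parse_quick_note(text, today):
    """Turn a jotted note into a structured event, or keep it as a plain note."""
    text = text.strip()
    match = NOTE_PATTERN.match(text)
    if match is None:
        # No recognizable structure: fall back to the unstructured bucket.
        return {"type": "note", "text": text}
    hour = int(match["hour"]) % 12  # 12am -> 0, 12pm -> 12 after the shift below
    if match["ampm"].lower() == "pm":
        hour += 12
    offset = 1 if match["day"].lower() == "tomorrow" else 0
    return {
        "type": "event",
        "title": match["title"],
        "date": (today + timedelta(days=offset)).isoformat(),
        "time": f"{hour:02d}:{int(match['minute'] or 0):02d}",
    }


today = date(2009, 1, 30)  # fixed date so the example is reproducible
print(parse_quick_note("Dinner with Dana tomorrow at 7pm", today))
print(parse_quick_note("buy milk", today))
```

Notes that do not match simply stay as free text, so quick capture never fails; structure is added only when it can be inferred.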

Conclusion

The everything-bucket is here to stay. Today’s version provides the quick capture that is often the most important feature of an information management tool; in its absence, information may not get recorded in the computer at all. Looking ahead, a structured everything bucket is the right way to cope with information fragmentation, letting you link together all the different kinds of information you need to tackle different tasks, instead of replicating or partitioning it among different applications. To effectively use that information, people will be able to author their own task-specific information visualizations that draw appropriate slices from the everything-store.

6 Responses to “Why All Your Data Should Live in One Application”

  • Lars Ludwig says:

    Well, maybe a single everything-bucket versus many applications debate is not a good idea after all. An everything-bucket is more of an information store. Specialized applications can incorporate information stores, or they are designed as mere functional services. So we can have both at the same time. Or, in other words, this is not at all the core of the problem. The core of the problem is another type of fragmentation. The fragmentation of information from knowledge. That is, memory structures are expressed into information structures, and thus are disconnected, frozen, aging, losing connection. The mind cannot easily extend towards these information structures anymore. The fragmentation is between organic and digital information structures. Only a better understanding of memory structures and their relation towards digital/formal information structures will tackle the real problem of ‘information fragmentation’.

    Best

    Lars Ludwig

  • Thank you for this valuable write-up.

    My hunch, however, is that the net itself will become the ultimate everything bucket in the end.

    The road to this isn’t clear yet, although recent advances in decentralized social networking, authentication, authorization, and microdata point in the general direction.

  • Atle Iversen says:

    Very interesting discussion re. Everything Buckets...

    I agree with one of the comments at Lifehacker that refers to it as “Miscellaneous Buckets” instead.

    Yes, some structured data is better served in dedicated applications where you can “work the data” in different ways. However, you will always have a lot of information that you can’t easily put in a standard “bucket”, and the Misc Bucket is for storing these information scraps.

    Important characteristics:
    – Quick Capture (“capture it in case you need it some day”)
    – Quick Search (“when you actually need what you captured earlier”)

    Especially the second part, actually re-finding information that you have stored “somewhere”, is the main case against using specialized applications and web sites. You know you have stored it *somewhere*, but *where* ?

    Any structure you can get automatically is an advantage, and any structure you require from the user is a disadvantage…

    (shameless plug)
    I work for a company that has created a “Miscellaneous Buckets” tool, and would love to hear your opinion about it.

    A short blog article on PpcSoft iKnow vs. Evernote vs. Onenote
    http://www.ppcsoft.com/blog/iknow-onenote-evernote.asp

    A short blog article on the principle behind PpcSoft iKnow:
    http://www.ppcsoft.com/blog/wikipedia-google-iknow.asp

    (/shameless plug)

  • [...] thoughts while wandering the web and reading Why All Your Data Should Live in One Application by David Karger on the MIT Haystack blog… David looks at recently-created applications that [...]

  • [...] just read an interesting post by David Karger about PIM, end-user programming, data publishing, and lots of other interesting HCI [...]