I’ve just returned from the European Semantic Web Conference, where I gave a keynote talk on “The Semantic Web for End Users”. My talk addressed the problem that has interested me for eighteen years: making it easier for end users to manage their information. The thesis was that
- The current state of tools for end users to capture, manage, and communicate their information is terrible (yesterday’s post), and
- The Semantic Web presents a key part of the answer to building better tools (this post), but
- Not enough work is being directed toward this problem by the community (tomorrow).
Since I had a lot to say (217 slides) I’m breaking the summary into three separate posts aligned with these three bullets. Come back tomorrow for the next.
Our story so far
Yesterday, I discussed the dire state of information management for end users. I argued that our traditional applications are designed around fixed schemas, and that whenever an end user wants to use their own schema, or connect information from different schemas, these traditional applications fail them. Users are forced to settle for generic tools like spreadsheets and spread their data over multiple tools. Voida et al.’s Homebrew Database paper (a must read) shows how awful the results are.
The Semantic Web can help
Our first attempt to address the “schema diversity” problem was Haystack, a tool that could be used to store and visualize arbitrary information. Haystack could store arbitrary user-defined entities with arbitrary properties and relations to other entities, and also allowed its user to customize visualizations of those entities. You could create something that looked quite like a traditional application, over whatever schema you decided was useful.
We created the first version of Haystack before the Semantic Web was envisioned. However, it was obvious after the fact that Haystack was a Semantic Web application (more specifically, a Semantic Desktop), and when RDF was released as a web-standard data model, we adopted it as the native model for later versions of Haystack.
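To make “arbitrary entities with arbitrary properties” concrete, here is a minimal sketch of a schema-free store built on subject/property/value triples, the basic shape of the RDF data model. This is not Haystack’s code: the `TripleStore` class, its methods, and the sample entities are all hypothetical and exist only for illustration.

```python
from collections import defaultdict


class TripleStore:
    """Toy schema-free store: every fact is a (subject, property, value) triple."""

    def __init__(self):
        # Maps each subject to the list of (property, value) pairs stated about it.
        self._by_subject = defaultdict(list)

    def add(self, subject, prop, value):
        """Record one fact. No schema is declared anywhere in advance."""
        self._by_subject[subject].append((prop, value))

    def describe(self, subject):
        """Return everything known about a subject as {property: [values]}."""
        result = defaultdict(list)
        for prop, value in self._by_subject.get(subject, []):
            result[prop].append(value)
        return dict(result)

    def subjects_with(self, prop, value):
        """Find every subject that has the given property/value pair."""
        return sorted(
            s for s, facts in self._by_subject.items() if (prop, value) in facts
        )


# Hypothetical sample data: two entity types the user made up on the spot.
store = TripleStore()
store.add("recipe:1", "title", "Lentil soup")
store.add("recipe:1", "recommended_by", "person:ana")
store.add("person:ana", "name", "Ana")
store.add("person:ana", "plays", "ocarina")

print(store.describe("recipe:1"))
print(store.subjects_with("plays", "ocarina"))
```

Because a property is just a string attached to a fact, adding a new kind of entity or a new relation needs no migration step. That is the flexibility a fixed-schema application lacks.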
Haystack reflects what I consider the key novel perspective of the Semantic Web community—the idea of a web of data reflecting vast numbers of distinct schemas. While the database community has devoted significant effort to data integration, their canonical example has been, e.g., the combination of a few large corporate databases when two companies merge. It hasn’t really addressed the far more anarchic situation of a different schema on each web site.
I believe that this setting demands a new kind of application. Instead of traditional applications with their hard-coded schemas and interfaces, we need applications like Haystack whose storage and user interface can effectively present and manipulate information in any schema that their user encounters or creates. This is a challenging task since we tend to rely on knowing the schema to create good user interfaces; however, I believe the challenge can be met.
To support this argument, I presented three of these flexible-schema Semantic Web applications. The first is Related Worksheets, being developed by my student Eirik Bakke. Eirik recognized the incredible dominance of spreadsheets as a schema-flexible data management tool, and asked how we can make spreadsheets better for this task without changing their fundamental nature. His approach is to improve spreadsheets to better present and navigate the entities and relationships represented in them.
A typical spreadsheet may have, e.g., one table consisting of university courses (one row per course) referring to another table consisting of course readings (one row per reading) and another table of course instructors. In a traditional spreadsheet this “reference” is just a textual correspondence—there’s a cell in the course table that names the title of a reading that’s in the readings table. But if you recognize that the reading is actually an entity, you can do better. First, you can present information about each reading nested inside the cell in the course listing table, so you can immediately see more information about the reading without having to go find it in the readings table. Second, you can “teleport” from the reading shown in the course table to the corresponding row in the readings table, where you can see or modify more data about the reading (and, e.g., teleport onward to the author of the reading). A user study showed that these features can significantly improve end users’ speed at extracting information from the worksheet.
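The nesting idea can be sketched in a few lines of Python. This is only an illustration of treating a textual reference as an entity, not Related Worksheets’ implementation; the `nest_references` function and the sample rows are hypothetical.

```python
# Hypothetical sample tables: one row per reading, one row per course.
readings = [
    {"title": "Reading One", "author": "A. Author", "year": 1999},
    {"title": "Reading Two", "author": "B. Writer", "year": 2005},
]
courses = [
    {"course": "Course 101", "reading": "Reading One"},
    {"course": "Course 102", "reading": "Reading Two"},
    {"course": "Course 103", "reading": "Unlisted Reading"},
]


def nest_references(rows, column, target_rows, target_key):
    """Replace the text in `column` with the full matching row from `target_rows`.

    Cells with no matching row are left as plain text, which is how an
    ordinary spreadsheet would have shown them anyway.
    """
    index = {row[target_key]: row for row in target_rows}
    nested = []
    for row in rows:
        resolved = dict(row)  # copy, so the original table is not modified
        resolved[column] = index.get(row[column], row[column])
        nested.append(resolved)
    return nested


nested = nest_references(courses, "reading", readings, "title")
for row in nested:
    print(row["course"], "->", row["reading"])
```

In this sketch, “teleporting” amounts to following the nested value back to its own row in `readings`, where the same trick could be applied again to reach the author.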
I then presented Exhibit, a tool that lets end users author interactive data visualizations on the web. The motivation for Exhibit was the recognition that while professional organizations are able to create fancy data-interactive web sites that offer templating, sorting, faceted browsing, and rich visualizations, end users generally don’t have the programming and database administration skills necessary to do so, and thus tend to publish only text and static images.
My student David Huynh recognized that a large number of the professional sites fit a common pattern, and that a fairly small extension to the HTML vocabulary was sufficient to describe these professional sites just in HTML. The vocabulary describes common elements such as views (lists, tables, maps, timelines), facets (for filtering the data shown in the views), and lenses (HTML templates for individual items). Any end user can drop these extended HTML elements into a web page, point them at a data file (spreadsheet, JSON, or CSV) and instantly publish an interactive data visualization. To make it even easier, Ted Benson and Adam Marcus created Datapress by integrating Exhibit into WordPress, so you can “blog your data” using WordPress’ built-in WYSIWYG editor.
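For readers who haven’t met facets and lenses before, here is a rough Python sketch of what those two concepts do to a list of items. It is not Exhibit’s API (Exhibit is driven by HTML markup, not Python); the function names and the sample items are hypothetical.

```python
from collections import Counter

# Hypothetical data file contents: a flat list of items with arbitrary properties.
items = [
    {"label": "Bridge A", "state": "MA", "status": "failing"},
    {"label": "Bridge B", "state": "MA", "status": "sound"},
    {"label": "Bridge C", "state": "NY", "status": "failing"},
]


def facet_counts(items, prop):
    """A facet lists each value of a property with the number of items having it."""
    return Counter(item[prop] for item in items if prop in item)


def apply_facets(items, selections):
    """Keep items matching every facet selection.

    `selections` maps a property name to the set of values the user ticked;
    an empty set means that facet places no restriction.
    """
    return [
        item
        for item in items
        if all(
            item.get(prop) in allowed
            for prop, allowed in selections.items()
            if allowed
        )
    ]


def render_lens(item, template):
    """A lens is a per-item template; here it is a plain format string."""
    return template.format(**item)


visible = apply_facets(items, {"state": {"MA"}, "status": set()})
for item in visible:
    print(render_lens(item, "{label} ({state}): {status}"))
```

None of these functions knows what a bridge is. They work on whatever properties appear in the data, which is why one small vocabulary can serve such varied data sets.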
There are now over 1800 exhibits online, covering an incredible spectrum of data sets—from ocarinas to failing bridges, European Court for Human Rights cases, pollution measurements in Spain, map stores, classical music composers, strange sports, prescription drugs, mining information, teacher bonuses in Florida and an Urdu-English dictionary.
By the way, anybody who wants to try Exhibit for themselves can just copy one of the ones on the web and start playing with it. For example, if you’re an academic, perhaps you could use a nicer publications page. Just download mine and replace the data with your own. But if you want a more careful introduction, take a look at this tutorial I put together.
The last tool I described was Atomate, built by my student Max van Kleek and Brennan Moore to demonstrate how end users could author automation rules to reduce the effort of handling incoming social media and other information streams. For example, a user might want to be notified when their calendar shows that a certain band is performing and their social media stream reports that a particular friend is in town, so that they can attend the performance together. A big challenge is coming up with a query language that is simple enough for end users. We settled on a controlled natural language: queries that look like English but are actually unambiguous filters over the properties and values in the user’s structured data collection. Drop-down menus and autocomplete ensure that the user is only able to create meaningful queries. You can click the image on the right to see a demonstration video.
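To show what “unambiguous filters over properties and values” might look like underneath the English-like surface, here is a small Python sketch of the band-and-friend rule. It is an assumption about the general shape of such a rule, not Atomate’s actual rule format or engine; the operator names, the `rule_fires` function, and the sample stream are hypothetical.

```python
import operator

# Hypothetical operator vocabulary, the kind a drop-down menu might offer.
OPERATORS = {
    "is": operator.eq,
    "is not": operator.ne,
    "contains": lambda actual, expected: expected in actual,
}


def matches(item, conditions):
    """True when the item satisfies every (property, operator, value) condition."""
    return all(
        prop in item and OPERATORS[op](item[prop], value)
        for prop, op, value in conditions
    )


def rule_fires(rule, items):
    """True when each condition group is satisfied by at least one item."""
    return all(
        any(matches(item, group) for item in items) for group in rule["when"]
    )


# "When an event's performer is The Examples and Sam's location is Boston, notify me."
rule = {
    "when": [
        [("type", "is", "event"), ("performer", "is", "The Examples")],
        [("type", "is", "status"), ("person", "is", "Sam"), ("location", "is", "Boston")],
    ],
    "notify": "The Examples are playing and Sam is in town",
}

# Hypothetical incoming items from a calendar and a social media stream.
stream = [
    {"type": "event", "performer": "The Examples", "city": "Boston"},
    {"type": "status", "person": "Sam", "location": "Boston"},
]

if rule_fires(rule, stream):
    print(rule["notify"])
```

Since every condition is built from a known property, a known operator, and a value, an interface made of menus and autocomplete can guarantee that whatever the user assembles is a well-formed rule.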
A user study of Atomate revealed that users were able to create meaningful queries when given a specific task, that they recognized the general utility of this system, and that they were able to envision specific ways (particular rules they could write) to use it for their own benefit.
I’ve now outlined four applications that, in my mind, leverage the “special sauce” of the Semantic Web: the idea that applications must be designed to work effectively over whatever schemas their users choose to create or import. This creates major challenges in the design of user interfaces, since we often want to leverage a hard-coded schema to determine the ideal domain-specific interface. But there are ways around this problem, either using generic interfaces like spreadsheets (Related Worksheets) or natural language (Atomate), or putting more of the user interface authoring in the hands of the end user (Haystack and Exhibit). Each of these tools demonstrates that it is possible to give an end user a tool that can work with arbitrary schemas.
Given the potential, I’m disappointed with the level of effort being invested in this kind of work by the Semantic Web community. In my next post, I’ll discuss what work I think is missing, how to do it well, and changes we might make to our Semantic Web conferences to encourage it.