I spent last week at the 2012 International Semantic Web Conference. This conference addresses the important topic of structured data on the web. I had two “funny” experiences: one humorous and one peculiar.
At the beginning of the conference, I was amused to see that ISWC, whose central theme is linking the web’s data together into a coherent whole, had more trouble than any other conference I’d been to in picking a Twitter hashtag for the event. Most conferences just announce one at the beginning, but at ISWC it was left to emerge “organically”, which meant tweets were inconsistently tagged as #iswc, #iswc12, #iswc2012, or #iswcboston. I tweeted a joke to this effect. The responses that I got back were classic. Reflecting the philosophy of the Semantic Web, various individuals argued that this was a good thing, and that expecting everyone to agree on a single vocabulary was contrary to the Semantic Web vision of linking disparate ontologies. Another pointed out that if only Twitter were “doing things right”, treating its hashtags as ontological entities and letting different people label each entity differently, then we wouldn’t have this problem. These responses are completely logical but ignore reality. We may know better than Twitter how things ought to be, but in the meantime there’s an easy solution (that most other conferences have adopted) that works fine with the way Twitter is now.
The more seriously funny experience was at the ISWC demo session. The two demos that most impressed me were systems for (i) browsing upcoming events (concerts etc.) and (ii) browsing academics and their publications. Both of these systems were characterized by rich data models and nicely designed user interfaces that delivered valuable information and insights from their chosen domains.
The funny part is that neither of these applications should really be called a “Semantic Web application.” Someone unaware of the Semantic Web, tasked with building these applications, would see a traditional data management and visualization problem that they would solve using traditional database tools (SQL) and web APIs. The fact that these tools are storing their data in a triple store instead of a SQL database is irrelevant to the user experience. And the fact that at least one of them is exposing a SPARQL endpoint for querying the data they are managing is good citizenship, helpful to the next project, but not important for this one.
This story fits what I argued in a talk at an ISWC workshop on programming the semantic web. The original description of the semantic web envisioned applications that could wander through a linked world of hundreds of different ontologies, discovering and learning new ontologies as they went and combining information from all of them to produce valuable answers. It seems to me that the vast majority of applications don’t need this power. Instead, these applications have fixed ontologies imposed by their creators at creation time. They can therefore be created using traditional techniques.
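To make the distinction concrete, here is a minimal sketch in Python. Everything in it is invented for illustration: the sample triples, the predicate names (`venue`, `date`, `wrote`), and both functions are hypothetical and are not taken from any of the demo systems. The first function hard-codes the properties it expects, which is how most applications work; the second assumes nothing about which properties exist and simply reports what it finds.

```python
from collections import defaultdict

# Hypothetical sample data as (subject, predicate, object) triples.
TRIPLES = [
    ("concert42", "type", "Event"),
    ("concert42", "venue", "Symphony Hall"),
    ("concert42", "date", "2012-11-15"),
    ("alice", "type", "Academic"),
    ("alice", "wrote", "paper7"),
]


def event_summary(triples, event_id):
    """Fixed-ontology style: the predicate names are chosen at build time."""
    facts = {p: o for s, p, o in triples if s == event_id}
    # Fails with KeyError if the data does not follow the expected schema.
    return f"{facts['venue']} on {facts['date']}"


def describe(triples, subject):
    """Schema-agnostic style: collect whatever predicates turn up."""
    found = defaultdict(list)
    for s, p, o in triples:
        if s == subject:
            found[p].append(o)
    return dict(found)
```

The first function is easy to write and easy to build a good interface around, but it only works on data shaped the way its author expected. The second works on anything, at the cost of knowing nothing about what the properties mean.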
This raises an obvious broader question: what kinds of work count as Semantic Web research that should appear at ISWC? I ask this not in the interest of jealously guarding the “purity” of a discipline (I like breadth) but in the interest of directing research to the venue where it can be best disseminated and evaluated. The semantic web technology stack is pretty mature at this point. But that means that using a semantic web back-end doesn’t automatically turn your project into semantic-web research. If I build a traditional interactive application on top of a triple store, my contribution is the application, and it should probably go to a conference like CHI that specializes in assessing human-computer interaction. A system that uses natural language processing or machine learning to recognize entities in text doesn’t suddenly become a semantic web contribution by outputting its results in RDF; instead it should be submitted to a venue like NAACL or ICML where it can be assessed by the best researchers in NLP and ML.
One might worry that the Semantic Web is going to suffer the same image problem as AI: that as soon as it works, it isn’t Semantic Web. But I don’t think that’s the case. There are certain research questions that are, and will continue to be, core to the Semantic Web.
With regard to the Semantic Web’s role in traditional applications, I would love to see at ISWC some studies that compared the relative developer effort required to build applications using the traditional and Semantic Web tool stacks. Nobody’s going to argue against making data easier to reuse. But the Semantic Web community still has to prove, I think, that their approach to reusability is better than others. If I’m going to build a traditional application that consumes and manipulates data from one or two fixed sources, does using a triple store instead of a SQL (or NoSQL) database make it easier for me to build that application or maintain it later? Most applications hide their databases behind object-relational mappers, so will it even be noticeable which underlying database technology I am using? When I want to pull data from my target source, does it help me to have that data available via a SPARQL endpoint, or would it be just as effective to present it to me via a SQL endpoint, or an API that returns JSON objects?
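As a rough illustration of the comparison I have in mind, here is a small Python sketch using only the standard library’s `sqlite3` module. It stores the same made-up publication data twice: once in a conventional table, and once in a generic subject/predicate/object table. The second is only a stand-in for a triple store (it is not RDF and is not queried with SPARQL), and the table names, column names, and sample data are all assumptions of mine.

```python
import sqlite3

# Hypothetical sample data: (paper id, title, author).
PAPERS = [
    ("p1", "Linked Data Browsing", "alice"),
    ("p2", "Schema-Free Search", "alice"),
    ("p3", "Event Ontologies", "bob"),
]


def build_database():
    """Load the same data into a relational table and a triple-style table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE paper (id TEXT PRIMARY KEY, title TEXT, author TEXT)")
    conn.execute("CREATE TABLE triple (subject TEXT, predicate TEXT, object TEXT)")
    conn.executemany("INSERT INTO paper VALUES (?, ?, ?)", PAPERS)
    for paper_id, title, author in PAPERS:
        conn.executemany(
            "INSERT INTO triple VALUES (?, ?, ?)",
            [(paper_id, "title", title), (paper_id, "author", author)],
        )
    return conn


def titles_relational(conn, author):
    """Titles by an author, from a table with one column per attribute."""
    rows = conn.execute(
        "SELECT title FROM paper WHERE author = ? ORDER BY title", (author,)
    )
    return [title for (title,) in rows]


def titles_triples(conn, author):
    """The same question against the triple-style table (needs a self-join)."""
    rows = conn.execute(
        """
        SELECT t.object
        FROM triple AS a
        JOIN triple AS t ON t.subject = a.subject
        WHERE a.predicate = 'author' AND a.object = ?
          AND t.predicate = 'title'
        ORDER BY t.object
        """,
        (author,),
    )
    return [title for (title,) in rows]
```

Both functions return the same list for the same author, and once they are wrapped like this the calling code cannot tell which storage model sits underneath. That is exactly why I would like to see measurements of developer effort rather than arguments from first principles.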
If we are able to make a case that the Semantic Web really does help with reuse of data, then there’s a host of ISWC-relevant questions around transitioning the legacy of traditional data repositories to the Semantic Web. For example, this paper shows how to “scrape” a traditional web API so it can be used with other Semantic Web tools.
Then there are the true Semantic Web applications, pan-schematic systems with no built-in assumptions about the schemas to which they are applied. Almost by definition, these systems aren’t designed for domain-specific tasks; however, they can be really useful for general-purpose information seeking, browsing, or organization. Tools like Tabulator try to support generic data browsing; semantic desktops like our old Haystack system provide personal information management over arbitrary schemas. There’s also the Semantic Web search problem, of being able to search data that is structured but has no particular schema, more effectively than we can via text search. Progress on these problems has been far slower than I expected or hoped; it seems like we’re mostly still stuck in the world of “big fat graph” visualizations. This is a place where I’d really like to see ISWC focus its attention. Perhaps this could serve to define a Semantic Web Challenge for next year: build an application that would let you win a scavenger hunt over the linked open data web.