Does the Semantic Web Need Ontologies?

Ever since returning from the 2009 International Semantic Web Conference last week I’ve been bursting to discuss a panel that took place there on the topic “Does the Semantic Web need Ontologies?”.    But the WWW2010 deadline was today and we had 3 papers to write.  With that deadline now 10 minutes past, I can finally post!  When it was first proposed, I was concerned because panels need controversy to be fun, and I didn’t think there’d be debate on this topic.  However, the organizer was confident that he’d be able to arrange different viewpoints on the panel.

When I attended the panel I was sorry to discover that the panelists did in fact all agree.  Far worse, they all said “yes” and wanted to debate what kind of ontologies were needed. Those who’ve followed my slow conversation with Stefano Mazzocchi won’t be surprised at my reaction: a jump to the audience microphone to voice a strong “no!”   I asserted that a bunch of data presented in spreadsheets was already a big step forward over our current unstructured web.  This led to some interesting discussion that helped me clarify some points in my mind that I’ll try to lay out here.

The panelists’ general reaction was amazement that I could be opposed to ontologies.  Without ontologies, how could any tool actually use the data?  What good would that data be without an explanation of what it meant?

Tim Berners-Lee tried to mediate by suggesting that I did support ontologies.  After all, a spreadsheet has an ontology: the ontology specifies rows, columns, cells, and the relationships among them. But by this definition, any structured data necessarily has an (implicit) ontology, and saying “ontology” is just another way of saying “structured data”.   And I think this diverges from the standard meaning of “ontology” in the Semantic Web community, which I would read as “an explicitly recorded, machine-readable description of the ontology of the given data.”   While I am a big proponent of structured data, I’m going to bet that the panelists would not consider such implicit ontologies to be ontologies in the Semantic Web sense.   So we do in fact disagree.

Why then do I think we don’t need (explicit) ontologies?   Because I’m focused on the ways that human beings, rather than machine agents, will consume the data being shared.  And for humans, a machine-readable explanation of the data’s meaning is often unnecessary because the human who is consuming that data can figure it out in other ways.  For example, the meaning of the data elements might be explained in English, a “caption” of the data I am inspecting.     Even without captions, if I get a data table with column headings, I can use my comprehension of English to understand the meaning of those headings and from it infer the roles of the columns.  Even if there aren’t column headings, the “shape” of the data can tell me a lot—I’ll recognize standard person names, phone numbers, addresses, prices, book titles, and such from the textual patterns or from matches to my large wetware database of known entities.  And if I see enough examples I can draw conclusions about the values in the column (indeed, Google Squared suggests that you might not even need a human in the loop to make these inferences).
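
To make the “shape of the data” point concrete, here is a minimal Python sketch of guessing a column’s role from textual patterns alone, with no headings and no ontology.  The patterns, the role names, the 80% threshold, and the function names are all my own assumptions for illustration; this is not how Google Squared or any real tool does it, and real data would need far more careful patterns.

```python
import re

# Hypothetical patterns, chosen for illustration only.
PATTERNS = {
    "phone": re.compile(r"^\(?\d{3}\)?[-.\s]?\d{3}[-.\s]?\d{4}$"),
    "price": re.compile(r"^\$\d+(\.\d{2})?$"),
    "year": re.compile(r"^(1[5-9]|20)\d{2}$"),
}


def guess_role(values, threshold=0.8):
    """Return the first role whose pattern matches at least `threshold`
    of the non-empty cells, or None if no pattern fits well enough."""
    cells = [v.strip() for v in values if v and v.strip()]
    if not cells:
        return None
    for role, pattern in PATTERNS.items():
        hits = sum(1 for cell in cells if pattern.match(cell))
        if hits / len(cells) >= threshold:
            return role
    return None


def guess_roles(rows):
    """Guess a role for every column of a headerless table (list of rows)."""
    columns = list(zip(*rows))  # transpose rows into columns
    return [guess_role(column) for column in columns]


rows = [
    ["617-555-0100", "$12.50", "Moby-Dick"],
    ["(617) 555-0101", "$8", "Ulysses"],
    ["617.555.0102", "$100.00", "Emma"],
]
print(guess_roles(rows))
```

The third column comes back unlabeled, which is exactly where a human’s knowledge of book titles fills the gap that the patterns leave.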

So humans can understand data without (explicit) ontologies, but is it any use?  Sure!  Just to plug some of my own group’s tools, I can use Exhibit to throw the data into a rich visualization: a map, timeline, or list with faceted browsing and sorting.   Or I can combine it with another data set using Potluck, and throw the combined data into an Exhibit visualization.  I can make a post on ManyEyes or throw the data into DabbleDB for further processing.  These activities typically require me to match certain properties (columns) of the data set to roles in the UI (Exhibit, ManyEyes) or to properties in the other data set (Potluck, DabbleDB), which is a straightforward task.  They don’t require the machine to understand the data, because I’m the one taking these actions.  They do require that the data be structured, since otherwise there’s no way for me to say “which column” to the tools I’m trying to use.
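
Here is a rough Python sketch of what “I say which column, the tool does the rest” can look like.  It is not the implementation of Exhibit, Potluck, ManyEyes, or DabbleDB; the functions `facet` and `merge_on` and the sample data are hypothetical, invented only to show that the human supplies the meaning and the code needs nothing but structure.

```python
from collections import defaultdict


def facet(rows, column):
    """Group rows (dicts) by the value in the column the user picked."""
    groups = defaultdict(list)
    for row in rows:
        groups[row.get(column, "(missing)")].append(row)
    return dict(groups)


def merge_on(left, right, left_col, right_col):
    """Join two tables after the user declares that left_col and
    right_col hold the same kind of thing. No schema is consulted."""
    index = defaultdict(list)
    for r in right:
        key = r.get(right_col)
        if key is not None:
            index[key].append(r)
    merged = []
    for l in left:
        key = l.get(left_col)
        if key is None:
            continue
        for r in index.get(key, []):
            merged.append({**r, **l})  # left values win on name clashes
    return merged


books = [
    {"title": "Emma", "writer": "Austen"},
    {"title": "Ulysses", "writer": "Joyce"},
    {"title": "Persuasion", "writer": "Austen"},
]
authors = [
    {"name": "Austen", "born": "1775"},
    {"name": "Joyce", "born": "1882"},
]

by_writer = facet(books, "writer")
combined = merge_on(books, authors, "writer", "name")
```

The only “semantics” in play is my decision that `writer` in one table and `name` in the other refer to the same people.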

That’s the argument I wanted to make at the panel, but it’s a bit hard to squeeze into 20 seconds at the audience-feedback microphone.  So I’m afraid the panelists instead thought that I was arguing against ontologies, asserting that they should not be deployed at all.

On the contrary, I like ontologies.  But I’m convinced that ontologies are a luxury, not a necessity. They’re certainly nice to have, and there are some things you can only do if you have them. For example, they can help me understand column headings written in Russian or Spanish by connecting them to explanations in English.  But I remain captivated by all the opportunities that arise just by making data easily accessible in raw form.   Too often, what people want to do with information is perfectly easy to explain, but impossible to do without serious programming, for silly reasons.

And it’s that enthusiasm for open data that keeps me energetically arguing that we don’t need ontologies.  If we need ontologies, then work on freeing data needs to stop until we get them.  I think that’s a very dangerous perspective.  It’s the one that says “there’s no point to building tools for scientists to publish their data, until we’ve figured out the right huge ontology that we’ll force them all to publish in.”

Instead, I think we should go right ahead with our research on ontologies and tools for them, but in the meantime, let the data fly!

P.S. When someone rose to support me, arguing that we should forget ontologies and concentrate on Linked Open Data, I muddied things further by asserting that we don’t really need the “Linked” part, and Open Data is useful in its own right.  While it comes from the same place as my perspective on ontologies above, that’s the substance of my discussion with Stefano, and I won’t repeat it here.

9 Responses to “Does the Semantic Web Need Ontologies?”

  • If you’re in favor of lining data up into columns, and putting labels on the columns, then in an abstract sense you’re at least enough in favor of ontologies to dispel any reflexive incredulity. “Ontology” != “Structure”. You’re *for* structure.

    So what you’re against, I think, are a couple ideas that are not inherent in the idea of structured data, but tend to get swept up into capital-O “Ontology” and capital-SW “Semantic Web”, in the sense that those constitute a kind of information-science political party:

    1. logically-rigorous data-modeling must precede any other information task
    2. logically-rigorous *univesal* data-modeling must precede any *local* information task

    To be fair, I don’t think there are actually a lot of people taking *intentional* hard lines on either of these two purist principles. The banner “Linked Data” kind of stands, politically, for relaxing the prior-rigor and universality constraints in favor of getting some data out into the world.

    The problem, I think, is that this is a fairly superficial tolerance. Underneath, you still have RDF and OWL, which can begrudgingly put up with a certain amount of temporary human fuzziness, as long as you don’t ask them to sanction it morally or participate very deeply, but as soon as you want these information tools to actually pitch in and do more than keep your columns of data lined up straight for you, they get snippy and basically demand that you go back and do all the stuff you thought you’d got away without.

    So in an academic sense, I think maybe what you’re saying is actually even *more* contrarian and critical than it initially sounds, not less. Or maybe this is just my extrapolation, but to me it follows from what you’re saying: This whole semantic-web-ontology-linked-data project has kind of been done backwards. The issues of opening and structuring data, in isolated sets with only as much rigor and universality as those sets need for their own purposes, should have come first. Then, on that distributed foundation, accepting all the variation and logical messiness that comes with it as human design constraints, we should have built the layer for combining and generalizing. Attack the problems in that order and I’m pretty sure we’d come up with something a bit different from RDF, and quite a bit different from OWL.

    And thus the salient theoretical question, to me, is whether any/all of these existing SW constructs can be refined and redrafted into forms by which computers can support human truth-seeking, rather than vice versa, and which ones it would be more efficient to just replace with something new.

  • David Karger says:

    I think you’ve articulated my position quite well. But can you elaborate on what you mean about RDF and OWL demanding more rigor if you want them to “pitch in” and do more than line up columns?

  • The cool-sounding additional things OWL can supposedly do for you, like inferring relationships that are not explicitly present in your data, and asserting the equivalences that represent reconciliation of variant references, are modeled in logical terms, not practical ones, so they tend to be foiled by data-messiness rather than energized by it.

    Here, not entirely by coincidence, is my rant about “owl:sameAs” from a couple days ago:

    http://www.furia.com/page.cgi?type=log&id=333

    Also, although this is probably what you’d guess, I meant “*universal*” up there, not some other mysterious quality called “*univesal*”…

  • I’ve been advocating the “semantic wild west” approach in my own area of genomics and I’m mostly getting either pushback or blank looks, although a few people are starting to come around.

    Some of the people I work with are really into ontologies, and some people are pursuing RDF triple stores:

    http://gmod.org/wiki/August_2009_GMOD_Meeting#Linked_Data_for_GMOD_Databases

    but I’m wondering if it would be better to just stick everything in CouchDB or one of the other “NoSQL” stores. Right now I just want to store the data in something that will scale. If a piece of software needs more structure in its input, then we can just give it a query. Higher-level modeling can come later, and that modeling will be so much better and easier if it already has a body of data to refer to. I’m even hoping that we’ll be able to automatically learn the ontologies from the data; that process would be so much easier if it could use semistructured data as input rather than natural language text.

  • Igor says:

    I also support the semantic wild wild west approach. I think it is pretty much consistent with how the Web started in the first place: don’t put too much load on the average publisher to publish! I do, however, think that ontologies are only good for certain problems, e.g. integration problems, and then only in a closed setting where the interested parties rely heavily on the ontology to solve particular problems that cannot be solved otherwise, which then justifies the cost of developing them. At Web scale, ontologies won’t work, or at least won’t begin to work until we have open data on the web comparable in size to the actual Web. As you say, just moving from documents to data is a huge leap. For the near future we should support end-user interaction over data. It would be really cool to see every web site have a little icon, like the RSS/Atom button, that allows you to go from page mode to data mode and explore their data in some usable way.

  • CpILL says:

    There is of course a solution that satisfies both the “wild-west” and ontology approaches. It requires a slight bending of both, and you have to understand the deeper meaning of linking to the ontology term. In fact the terms have no meaning unless something links to them (or they to things). The terms in the ontology don’t have to follow a specific format; they are just symbols, things to link to. They could be a word, document, image, video, etc.

    People can link to whatever they like, and collectively this builds up meaning. The meaning is in the connections not in the nodes. But you need nodes in order to have links. This is the connectionist model of information management and how the brain works.

    So, let there be ontologies, but let them arise as people find meaning in referencing (linking) common resources. Perhaps these resources could be self-explanatory, or is this just adding metadata on top of metadata?

  • Queen says:

    There are a few things that are going to make the semantic web easier to understand. For one, there is an excellent book out by Dean Allemang and Jim Hendler named “Semantic Web for the Working Ontologist”. It has a funky title, but that did not stop Dean from attracting over 300 attendees at his JavaOne presentation a month ago, and it was on the last day, on Friday afternoon. The book makes the point very clearly that RDF is very well layered. So if you want to just work at the data layer, RDF is all you need. This is what you mean by the graph layer, I think. You can then help clarify your information one little step at a time. RDFS is very easy to understand, and by chapter 2 of the book I think that should be clear.

    So try that book. It is one of the first books on the subject that does not start by trying to explain RDF/XML.