CHI 2011’s RepliCHI Panel

This past week at CHI, our very own Michael Bernstein participated in a panel discussion about the role of replication and reproduction in the CHI community. Thanks to Max Wilson, the panel coordinator, I got the opportunity to log the event and live-tweet the whole thing; here are my notes.

Max started things off with these comments:

  • Replication is a cornerstone in some fields, in CS it’s often a benchmarking tool.
  • HCI often suffers from generalizability problems, but replication to fix that problem can be very time-consuming.
  • We also aren’t entirely a science community – would you try to replicate art?

Wendy Mackay was the first invited panelist:

  • CHI crosses disciplines, and so do attitudes about replication.
  • We often draw from experimental psychology (start with a model, revise the model, and replicate things in between), as well as from ethnography (observations and re-observations).
  • These approaches focus on developing theories or knowledge about the world, whereas design focuses on building artifacts.
  • We also draw from engineering and computer science – engineering has repetition, but much of CS does not.

Harold Thimbleby followed:

  • He promised to share a core science background (as opposed to Wendy’s psychology framing).
  • “The only reason you’re in this room today is because you’ve got hope [...] to live [...] and hope for the future of CHI.” (maybe paraphrased a bit!)
  • CHI hopes to change the world for the better. In order to do that with confidence, we often use statistics to measure our confidence.
  • We get excited at conferences by ideas, and we go home and try to use those ideas. That’s replication.
  • Those iterations cause evolution-like improvements of ideas and knowledge.
  • Deliberate reproducibility is good science, and it can train young scientists and fix issues.
  • “Non-reproducibility is cheating” – if we don’t make the process needed to reproduce work clear in papers, we fail as authors.
  • In reality, we need to get people to use our ideas. We write papers to spread our ideas.
  • “Sadly, most of what we publish isn’t reproducible.”
  • A third of papers published in a machine learning journal weren’t reproducible (this was determined by a survey of authors in that journal).
  • HT replicated this by asking three other journals and found the same thing – this is a problem in computer science, not just in HCI.
  • We can look at post-war cargo cult examples as a parallel to our work – they built planes and other war paraphernalia hoping that it would result in cargo drops, but missed the point. Similarly, we often neglect to reproduce things at a useful level.
  • We do have reasons for not being reproducible, including business ones. A study of different Casio calculator models saw different answers to arithmetic problems, which was obviously not something Casio wanted published.
  • Being reproducible on consumer devices can be really detrimental to a business.
  • “Go forth and reproduce [create new scientists] and be reproducible [with your work].”

Next up was Ed Chi, with the point of view of industry research:

  • “There is more to replication than simply duplication.”
  • Early contributions to the field came from computer scientists and cognitive psychologists.
  • In a memo establishing HCI research at PARC, it was evident that there was a need to establish HCI as a science.
  • The intellectual heritage of HCI comes from Vannevar Bush and JCR Licklider, augmenting cognition.
  • Our background comes from psychology, where replication is the norm (echoing Wendy).
  • Psychology teaches students early on to design good studies.
  • In CHI 97, there was a browse-off – the hyperbolic browser won, but replication attempts showed no clear winner.
  • Individual differences among subjects were overwhelming anything in the design of the browser, showing the value of replication as a tool to more fully understand what was happening.
  • This first experiment at CHI 97 was just the beginning of something bigger, and that’s why replication was needed, and is still needed.

Michael spoke next on behalf of grad students everywhere:

  • He couldn’t speak for everyone but used an “unassailable, extremely scientific data collection protocol” (this is facetious) and got responses from 93 students (his social network and student volunteers).
  • 83% of grads hadn’t ever replicated a study, 62% said “hell no” they never would replicate a study or a system.
  • One response said “I’m more creative than that”, another said “New studies confirming old studies have no chance of publication.”
  • There’s a general perception that reviewers don’t feel that work is necessary, and that it isn’t novel.
  • “The grad student must conform” – since no one is publishing replication work, no one submits any, and so none gets published.
  • He also solicited haikus – “Think analyzing / CMC is tough? Try it / reproducibly” and “Repeat to be sure / We stand on giant’s shoulders / But do so on faith.”

Dan Russell from Google, speaking with the experience of someone with access to large data sets:

  • What CHI insights can we replicate?
  • Replicating a measure should be straightforward, but it’s not in our very diverse community.
  • The knowledge needed for replication sometimes gets left out of papers.
  • Changing things slightly, such as wording or font, can dramatically change the ability to reproduce work – as can changes on the web.
  • DR was conducting a study about finding difficult-to-locate information online, and suddenly, everyone got WAY better… because someone had posted the answer online on a Q&A site! Changes that are out of our control online can dramatically affect reproduction.
  • Google is kind of a Large Hadron Collider. We can’t reproduce the LHC’s studies without a collider of our own, so we must take them on faith. Likewise, we don’t all have access to Google’s huge data sets or user bases, and so we must take some of that on faith as well.
  • “Ultimately, we are a faith-based community. And that’s the nature of science.”

NB that the panelists posted statements online beforehand; look there for more detailed summaries.

There were several questions and comments that prompted discussion. I’ve gotten them down here as best I could. Apologies if I’ve misquoted or misattributed anything!

  • Gary Olson, from UC Irvine – Wendy said we should replicate and extend. [...] Extension is critical.
    • Wendy – “I of course agree. But there’s a disciplinary issue.” Some things are relatively easy, depending on their intellectual heritage; some can’t be done.
    • Ed – We often place the responsibility of generalizability on the author. He or she must make that claim. In other fields, that burden falls on the reader.
  • Sharoda Paul from PARC – We must address the interdisciplinary nature of CHI. How can we manage the expertise and backgrounds between reviewers?
    • Ed – depending on the person, there can be a sense of “why should we waste our time on replication?” – but replication can heighten understanding.
    • Ed – part of the goal of this panel is to change the between-reviewers issue.
    • Harold – we should note that there are different types of reproducibility:
      1. Replication work done to acquire skills and to learn.
      2. Just redoing work (because of a failure to immerse oneself in literature), which is not publishable. (This is the bad kind of reproduction.)
      3. Writing papers honestly to be reproduced.
      4. Reproduction with an adaptation to a different area, or an extension on previous knowledge.
    • Wendy – part of it may also be finding ways to publish more philosophical things. PC meetings are a place where things like this are discussed as well.
  • Eric Baumer from Cornell – “Replication is not reproduction.” There are different kinds of replication; we should consider what replication means.
  • Lorrie Cranor from CMU – SOUPS gets around paper length issues by including appendices with information for reproduction.
    • Wendy – we should think about who will be reproducing the work as well – we should let people reproduce work in products, or in things that affect the real world.
    • Wendy – of course there are IP issues, but this could be part of our long term goal. We don’t pursue just science, but world-changing innovations.
    • Michael – Rebuilding systems is so, so hard. We often only have screenshots to go off of, and there might even be errors in the paper. Replication happening in Rob Miller’s HCI class led to a discovery of a constant being off by a factor of 10 in a noted paper.
    • Harold – Papers can also be about inspiring, rather than being about reproduction… or they can be entirely open-sourced.
    • Harold – we should be clear about how reproducible we intend things to be in our papers.
    • Ed – paper limits come from the publishing model, but in the digital world, we need to now change the community standard.
  • Question from an unknown person (sorry! let me know if it was you!) – When we replicate and find different results, what do we do? Some reviewers might be insulted. Do we reproduce things specifically to falsify others’ work?
    • Michael – that feeling echoes grad student opinions, and it’s worsened by the assumption that if you find errant results, you messed up, especially if it’s work by an important researcher.
    • Max – sometimes we reproduce things and it confirms surprising results though – the value of the content may change the value of reproduction.
    • Wendy – the hope is that there are multiple reviewers, and this hopefully means that any controversy is viewed very clearly.
    • Wendy – controversial findings like that are more interesting than others.
    • Michael – Unfortunately, we don’t always know why, and that causes increased skepticism.
    • Michael – It’s good when intro classes include replication of results. It can demystify things.
    • Wendy – I have more faith in program committees than to believe that good papers would disappear if they’re controversial.
  • Lora Oehlberg from Berkeley – Design research discusses failures as well as successes. Do we encourage people not to replicate pointless results, which could be considered failures?
    • Replication of results can improve the quality of data.
  • What’s the role of releasing code in systems work?
    • Ed – “Ownership of code [and data] has been a way research territory is protected. Monetization might be the root of all evil.”

Panelists shared their final thoughts:

  • Max – perhaps we need an alt.chi or similar session called repliCHI, a place for people to publish work like this.
  • Wendy – that might be possible! “I think we should encourage students to replicate in coursework” and then publish like that.
  • Harold – Think of how you can “build something that improves reproducibility” – we can change the models of publication this way.
  • Ed – We must change the HCI curriculum. It doesn’t always [though there are notable cases where it does] include stuff drawn from psychology. We can always experiment in conferences.
  • Michael – There are techniques to “replicate” systems quickly, like as part of a prototyping process, that can inform our design, and we shouldn’t neglect these.
  • Dan – I almost always ask interns to reproduce results. Perfect reproductions are boring, but they’re almost never perfect, and then we learn something.

7 Responses to “CHI 2011’s RepliCHI Panel”

  • David Karger says:

    I agree with the comment that “replication is not reproduction”. We may rarely replicate experiments exactly. But how often do we perform an experiment that informs regarding the same issue as another experiment? From the perspective of exploring a space of possibilities, I think one could argue it is better to perform a different but related experiment (that might reveal something entirely new) than to duplicate a previous one.

  • The opening remark about “would you try to replicate art” strikes me as curious—of course you would. For centuries, artists constantly replicated each other, either directly (think of all the Lichtensteins that are studies of impressionist and post-impressionist paintings) or indirectly (an example here is the collection of artists painting the same scenes of Montmartre). In both of these example cases, as was pointed out here, replication was not about duplication. This speaks to David Karger’s ‘perform a different but related experiment’ comment here; on this theme, Toulouse-Lautrec has been quoted as saying “novelty is seldom essential” for the same reasons. Perhaps the community is focused on novelty too much?

  • [...] or prior to making “incremental” advances vs. a focus on novelty and innovation. Nice summary here. I think the panelists and audience were generally leading toward increasing the use of replication [...]

  • [...] A blog post summarising the RepliCHI panel. [...]

  • The “unknown person” was Prof. Jan Borchers from the Media Computing Group at RWTH Aachen University in Germany.

  • Niels Henze says:

    I wondered about the statement “A third of papers published in a machine learning journal weren’t reproducible”. I guess it was the Journal of Machine Learning Research. Does anyone have a link that backs up the statement?
