There’s a lot of discussion about the right way to evaluate and support systems research in SIGCHI. Maybe too much. (I’m allowed to say that because I contributed to it, right?) But for this to be a productive conversation, we need to tackle the other half: what makes for a bad systems paper?
I say bad paper, rather than bad research, because often this is about framing and not the actual work. My conversations at CHI and throughout the alt.chi process helped draw out some of the common killer problems that HCI systems papers run into. These are legitimate problems with a paper, and we need to own up to them if we want our work to be taken seriously.
Issue 1. My Contribution is the System
Pete Pirolli hit this one on the nose at the alt.chi presentation. Systems authors often frame the technological artifact they built as the entire contribution of the paper. The fact that I built a system, say one called ACRONYM, is largely immaterial. In a way, it’s part of the evaluation: ACRONYM is proof that the ideas can be instantiated. But what are the ideas driving the system design? In order to learn something from the paper, we need to focus on the ideas rather than the system when describing our contribution.
Issue 2. My Study Proves That This Is Unquestionably The Best
Many social scientists who I talked to complained that systems papers often overclaim their results based on a small study. If you read a CHI paper by one of your favorite social scientists, they are very good at clearly scoping what can and can’t be concluded from a study. CS has a way of always claiming that my ACRONYM system absolutely buries all the competition. If we are a little more careful in our claims, I think it will help many systems papers on the bubble.
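To make the point concrete, here is a toy example of the kind of hedged claim a small study actually supports. All the numbers below are invented for illustration; with n = 12 per condition, even a real difference comes with a wide confidence interval:

```python
import math

# Hypothetical task-completion times (seconds) from a small study,
# n = 12 per condition. Every number here is made up for illustration.
baseline = [41, 38, 45, 52, 36, 48, 44, 39, 55, 42, 47, 40]
acronym  = [35, 40, 33, 46, 31, 42, 38, 36, 49, 34, 41, 37]

def mean(xs):
    return sum(xs) / len(xs)

def var(xs):
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

diff = mean(baseline) - mean(acronym)
# Standard error of the difference between two independent sample means.
se = math.sqrt(var(baseline) / len(baseline) + var(acronym) / len(acronym))
t_crit = 2.074  # two-tailed 95% critical value for df = 22
lo, hi = diff - t_crit * se, diff + t_crit * se
print(f"mean difference: {diff:.1f}s, 95% CI: [{lo:.1f}, {hi:.1f}]")
```

A claim like "ACRONYM was about 5 seconds faster on this task with this population, though the interval is wide" is defensible; "ACRONYM buries the competition" is not.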
This past week at CHI, our very own Michael Bernstein participated in a panel discussion about the role of replication and reproduction in the CHI community. Thanks to Max Wilson, the panel coordinator, I got the opportunity to log the event and live-tweet the whole thing; here are my notes.
Max started things off with these comments:
- Replication is a cornerstone in some fields, in CS it’s often a benchmarking tool.
- HCI often suffers from generalizability problems, but fixing that through replication can be very time consuming.
- We also aren’t entirely a science community – would you try to replicate art?
Wendy Mackay was the first invited panelist:
- CHI crosses disciplines, and so do attitudes about replication.
- We often draw from experimental psychology (start with a model, revise the model, and replicate things in between), as well as from ethnography (observations and re-observations).
- These approaches focus on developing theories or knowledge about the world, whereas design focuses on building artifacts.
- We also draw from engineering and computer science – engineering has repetition, but much of CS does not.
Harold Thimbleby followed:
- He promised to share a core science background (as opposed to Wendy’s psychology framing).
- “The only reason you’re in this room today is because you’ve got hope [...] to live [...] and hope for the future of CHI.” (maybe paraphrased a bit!)
- CHI hopes to change the world for the better. In order to do it with confidence, we often use statistics measuring our confidence.
- We get excited at conferences by ideas, and we go home and try to use those ideas. That’s replication.
- Those iterations cause evolution-like improvements of ideas and knowledge.
- Deliberate reproducibility is good science, and it can train young scientists and fix issues.
- “Non-reproducibility is cheating” – if we don’t make the process needed to reproduce work clear in papers, we fail as authors.
- In reality, we need to get people to use our ideas. We write papers to spread our ideas.
- “Sadly, most of what we publish isn’t reproducible.”
- A third of papers published in a machine learning journal weren’t reproducible (this was determined by a survey of authors in that journal).
- HT replicated this by asking three other journals and found the same thing – this is a problem in computer science, not just in HCI.
- We can look at post-war cargo cult examples as a parallel to our work – they built planes and other war paraphernalia hoping that it would result in cargo drops, but missed the point. Similarly, we often neglect to reproduce things at a useful level.
- We do have reasons for not being reproducible, including business ones. A study of different Casio calculator models saw different answers to arithmetic problems, which was obviously not something Casio wanted published.
- Being reproducible on consumer devices can be really detrimental to a business.
- “Go forth and reproduce [create new scientists] and be reproducible [with your work].”
Next up was Ed Chi, with the point of view of industry research:
- “There is more to replication than simply duplication.”
- Early contributions to the field came from computer scientists and cognitive psychologists.
- In a memo establishing HCI research at PARC, it was evident that there was a need to establish HCI as a science.
- The intellectual heritage of HCI comes from Vannevar Bush and JCR Licklider, augmenting cognition.
- Our background comes from psychology, where replication is the norm (echoing Wendy).
- Psychology teaches students early on to design good studies.
- In CHI 97, there was a browse-off – the hyperbolic browser won, but replication attempts showed no clear winner.
- Individual differences between subjects were overwhelming anything in the design of the browser, showing the value of replication as a tool to more fully understand what was happening.
- This first experiment at CHI 97 was just the beginning of something bigger, and that’s why replication was needed, and is still needed.
Michael spoke next on behalf of grad students everywhere:
- He couldn’t speak for everyone but used an “unassailable, extremely scientific data collection protocol” (this is facetious) and got responses from 93 students (his social network and student volunteers).
- 83% of grads hadn’t ever replicated a study, 62% said “hell no” they never would replicate a study or a system.
- One response said “I’m more creative than that”, another said “New studies confirming old studies have no chance of publication.”
- There’s a general perception that reviewers don’t feel that work is necessary, and that it isn’t novel.
- “The grad student must conform”, and so, since no one’s publishing replication work, there isn’t any more being published.
- He also solicited haikus – “Think analyzing / CMC is tough? Try it / reproducibly” and “Repeat to be sure / We stand on giant’s shoulders / But do so on faith.”
Dan Russell from Google, speaking with the experience of someone with access to large data sets:
- What CHI insights can we replicate?
- Replicating a measure should be straightforward, but it’s not in our very diverse community.
- The knowledge needed for replication sometimes gets left out of papers.
- Changing things slightly, such as in wording or font, can dramatically change the ability to reproduce work – as can a change on the web.
- DR was conducting a study about finding difficult-to-locate information online, and suddenly, everyone got WAY better… because someone had posted the answer online on a Q&A site! Changes that are out of our control online can dramatically affect reproduction.
- Google is kind of a Large Hadron Collider. We can’t reproduce LHC studies without a collider of our own, so we must take them on faith. Likewise, we don’t all have access to Google’s huge data sets or user bases, and so we must take some of that on faith as well.
- “Ultimately, we are a faith-based community. And that’s the nature of science.”
NB that the panelists posted statements beforehand on replichi.org; look there for more detailed summaries.
There were several questions and comments that prompted discussion. I’ve gotten them down here as best I could. Apologies if I’ve misquoted or misattributed anything!
- Gary Olson, from UC Irvine – Wendy said we should replicate and extend. [...] Extension is critical.
- Wendy – “I of course agree. But there’s a disciplinary issue.” Some things are relatively easy, depending on what their intellectual heritage is; some can’t be done.
- Ed – We often place the responsibility of generalizability on the author. He or she must make that claim. In other fields, that burden falls on the reader.
- Sharoda Paul from PARC – We must address the interdisciplinary nature of CHI. How can we manage the expertise and backgrounds between reviewers?
- Ed – depending on the person, there can be a sense of “why should we waste our time on replication?” – but replication can heighten understanding.
- Ed – part of the goal of this panel is to change the between-reviewers issue.
- Harold – we should note that there are different types of reproducibility:
- Replication work done to acquire skills and to learn.
- Just redoing work (because of a failure to immerse oneself in literature), which is not publishable. (This is the bad kind of reproduction.)
- Writing papers honestly to be reproduced.
- Reproduction with an adaptation to a different area, or an extension on previous knowledge.
- Wendy – part of it may also be finding ways to publish more philosophical things. PC meetings are a place where things like this are discussed as well.
- Eric Baumer from Cornell – “Replication is not reproduction.” There are different kinds of replication; we should consider what replication means.
- Lorrie Cranor from CMU – SOUPS gets around paper length issues by including appendices with information for reproduction.
- Wendy – we should think about who will be reproducing the work as well – we should let people reproduce work in products, or in things that affect the real world.
- Wendy – of course there are IP issues, but this could be part of our long term goal. We don’t pursue just science, but world-changing innovations.
- Michael – Rebuilding systems is so, so hard. We often only have screenshots to go off of, and there might even be errors in the paper. Replication happening in Rob Miller’s HCI class led to the discovery of a constant being off by a factor of 10 in a noted paper.
- Harold – Papers can also be about inspiring, rather than being about reproduction… or they can be entirely open-sourced.
- Harold – we should be clear about how reproducible we intend things to be in our papers.
- Ed – paper limits come from the publishing model, but in the digital world, we need to now change the community standard.
- Question from an unknown person (sorry! let me know if it was you!) – When you replicate and find different results, what do we do? Some reviewers might be insulted. Do we reproduce things specifically to falsify others’ work?
- Michael – that feeling echoes grad student opinions, and it’s worsened by the assumption that if you find errant results, you messed up, especially if it’s work by an important researcher.
- Max – sometimes we reproduce things and it confirms surprising results though – the value of the content may change the value of reproduction.
- Wendy – the hope is that there are multiple reviewers, and this hopefully means that any controversy is viewed very clearly.
- Wendy – controversial findings like that are more interesting than others.
- Michael – Unfortunately, we don’t always know why, and that causes increased skepticism.
- Michael – It’s good when intro classes include replication of results. It can demystify things.
- Wendy – I have more faith in program committees than to believe that good papers would disappear if they’re controversial.
- Lora Oehlberg from Berkeley – Design research discusses failures as well as successes. Do we encourage people not to replicate pointless results, which could be considered failures?
- Replication of results can improve the quality of data.
- What’s the role of releasing code in systems work?
- Ed – “Ownership of code [and data] has been a way research territory is protected. Monetization might be the root of all evil.”
Panelists shared their final thoughts:
- Max – perhaps we need an alt.chi or similar session called repliCHI, a place for people to publish work like this.
- Wendy – that might be possible! “I think we should encourage students to replicate in coursework” and then publish like that.
- Harold – Think of how you can “build something that improves reproducibility” – we can change the models of publication this way.
- Ed – We must change the HCI curriculum. It doesn’t always [though there are notable cases where it does] include stuff drawn from psychology. We can always experiment in conferences.
- Michael – There are techniques to “replicate” systems quickly, like as part of a prototyping process, that can inform our design, and we shouldn’t neglect these.
- Dan – I almost always ask interns to reproduce results. Perfect reproductions are boring, but they’re almost never perfect, and then we learn something.
Recently the CHI workshop on Crowdsourcing and Human Computation got some press courtesy of Jim Giles and New Scientist. Near the end of the workshop, the working group on Future Directions and Community had some interesting suggestions that I’ll echo here.
Can we take some of the crowdsourcing tools and techniques we have developed as a community and put them to use in our own publishing and review processes?
- Use online tools to disseminate research quickly. Arxiv.org plays a part of this role, but it’s more of a database than a venue.
- Significantly shorten review periods. What if research could come back with an initial review 48 hours after submission? We have early evidence that fewer reviews may be necessary in the early stages.
- Maintain living documents where the authors can publish errata and appendices.
- Cross traditional disciplinary boundaries so that authors don’t need to choose between publishing in a human computation venue and a “home” venue.
The bigger question put to the group was: should crowdsourcing and crowd computing develop into their own disciplines, or continue to jump around between existing conferences in the ACM, IEEE and AAAI?
I’ve been attending the CHI conference in Vancouver this week, presenting some of my work on database user interfaces. It was interesting to attend Tuesday’s “Re-Engineering Health Care with Information Technology” panel and hear about what appears to be one of the biggest application areas for database UIs on the planet: Electronic Medical Records (EMRs). Ben Shneiderman referred to the thousands of different systems that are currently used for communications between and within health care institutions as a giant “Medical Internet” that indirectly serves more Americans (94%) than the regular Internet. US health care spending is currently far higher as a share of GDP (and relative to performance metrics such as life expectancy and infant mortality) than that of any other country in the world, and it is clear that effective IT use must be at least a part of a solution to this problem.
I took note of several interesting anecdotes from the panelists:
- In many cases today, EMRs actually disrupt the workflows of health care workers. A physician may log onto their computer system in the morning, browse through several poorly adapted views of patient records in order to find the information she needs for the day, and then write it down on paper. At the end of the day, she (or her assistant) returns to the computer to type in handwritten changes to the various records involved.
- Thomas Payne, MD, talked about the Computerized Patient Record System (CPRS) of the Veterans Health Administration. The CPRS has been recognized as an example of a highly successful large-scale EMR system in the US. We got to see a screenshot, and it’s actually a good old text-based DOS interface (or at least it used to be in 1997—fair enough).
- There are between 300 and 600 vendors of EMR systems in the US, and they differentiate themselves by each having a separate architecture and user interface. Thus, a physician who might work at one hospital for three days a week and another for two will need training in two completely different systems.
Although I’ve been going to CHI for a few years, I still feel like something of a foreigner, not certain which talks to attend. Many of my friends and colleagues probably have a much better idea than I of which talks are given by speakers I would like and which offer insights I would find particularly valuable. So I try to ask around, but I often get the information too late.
So I convinced my students, Michael Bernstein and Adam Marcus, to build a system to help me out. We connected our FeedMe recommender system (presented in a paper at CHI last year) to the CHI program presented in Danny Soroker’s Eventmap. As you build your own personal program of talks to attend, you can also recommend any you think I (or any of your other friends) will be interested in. I hope you will.
Eventmap already lets you browse for talks you might attend (I suggest using the table view, which shows all the abstracts) and click on them to add them to your own schedule. Now you’ll also get a “recommend using FeedMe” button. If you click it, you’ll be able to specify email addresses of friends who’ll be interested. FeedMe will take care of notifying them of your recommendation and incorporating it into their personal eventmaps—if they log into FeedMe they’ll see a little green bubble over each talk that’s been recommended to them. A convenience of FeedMe is that after a little bit of practice, it will start to guess which of your friends you’re going to recommend a particular talk to, and let you do so with a single click instead of typing in email addresses.
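For the curious, that guessing behavior could be approximated with nothing fancier than per-friend word counts over past recommendations. This is a hypothetical sketch, not FeedMe’s actual implementation; all the names and data are invented:

```python
from collections import defaultdict

# Toy recommendation history: (talk title, friend it was sent to).
# Entirely made-up data for illustration.
history = [
    ("crowdsourcing quality control", "alice@example.com"),
    ("crowdsourcing with mechanical turk", "alice@example.com"),
    ("database query interfaces", "bob@example.com"),
]

# Count how often each word has appeared in talks sent to each friend.
word_counts = defaultdict(lambda: defaultdict(int))
for title, friend in history:
    for word in title.split():
        word_counts[friend][word] += 1

def rank_recipients(title):
    """Rank friends by overlap between the new title and their history."""
    scores = {
        friend: sum(counts[w] for w in title.split())
        for friend, counts in word_counts.items()
    }
    return sorted(scores, key=scores.get, reverse=True)

print(rank_recipients("crowdsourcing user studies"))
```

A real system would want smarter features than raw word overlap, but even a scheme this simple would let the interface surface a plausible one-click default after a handful of recommendations.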
FeedMe also works as a standalone system; you can use it to recommend any Google Reader story, or any arbitrary webpage, to any of your friends. You can find details on the FeedMe site.
FeedMe reflects our interest in friendsourcing—getting your friends to help you in crowdsourcing workflows that rely on their knowledge of you. While I wouldn’t expect random crowds to do a very good job recommending information to me, I can hope that people who know me and my interests well can do a better job than any pure-computer (e.g. machine learning) system. So please, while you’re looking over the CHI program to plan your attendance, if you see a paper you think I’ll like, fire off a recommendation to me using FeedMe, and I’ll thank you for it (every FeedMe notification includes a one-click thank-you button). And if you’d like to receive some recommendations, tell your knowledgeable friends about FeedMe!
I attended the VISSW 2011 workshop last Sunday. It was fun, but a few of the papers exhibited a painfully familiar pattern: they put together a plausible-seeming user interface but didn’t evaluate it with a user study. I left frustrated, with no sense of whether the ideas of the interfaces would be good or bad to incorporate in my own work. With the system already implemented, other researchers are disincentivized from implementing it themselves (it wouldn’t be novel), so they can’t evaluate it. Thus, if the original researchers don’t do the evaluation, nobody will. This is a not uncommon complaint in computer science—our field doesn’t seem committed to following through with evaluations of the ideas it invents and implements. Some faculty at Stanford have even created a course aimed at teaching students how to properly evaluate their research systems.
So here’s a proposal for improving the incentives a little bit. Change the submission requirements for conference papers: they have to contain the system description and the hypothesis to be tested, along with a detailed evaluation plan. Papers are then evaluated and accepted on the basis of a commitment to execute the evaluation plan (and update the paper with results) before the conference but after acceptance.
This approach would have several benefits.
- Researchers could defer the work of evaluation until their submission is accepted. Once it’s accepted, they have strong motivation to do the evaluation (else the paper cannot be presented). For work that turns out not to be publishable, the evaluation work is not wasted.
- The evaluations would take place after the submission deadline, meaning work on the system could continue right up to that deadline. This gives us something to do in the “dead space” between acceptance and presentation (which is forced upon us by the long lead time required for travel planning). The work presented at the conference would be “fresher”; the long lead time on conference submission would have less impact on the publication of timely results.
- This approach would also address the recently popularized problem of a bias towards positive-outcome evaluation that may lead to incorrect claims of statistical significance in outcomes. If reviewers consider a paper that contains only the system and evaluation procedure, they will be forced to assess the paper purely on the grounds of whether the proposed system is interesting enough to be worthy of evaluation. If it is, then the paper should be accepted regardless of whether the outcome of that evaluation is positive or negative. If it is not, then the inclusion of a positive evaluation should not change the rejection decision.
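That positive-outcome bias is easy to demonstrate with a toy simulation: if every effect under test is actually null, and only “significant” results pass the publication filter, then the published record consists entirely of false positives. The numbers here are purely illustrative:

```python
import random

random.seed(0)

# Toy simulation: every hypothesis tested is truly null, but only
# "significant" results (p < .05) get submitted and published.
n_studies = 10_000
published = 0
for _ in range(n_studies):
    p_value = random.random()  # under the null, p is uniform on [0, 1]
    if p_value < 0.05:         # the positive-outcome filter
        published += 1

print(f"{published} of {n_studies} null studies got published")
# Every one of those published "findings" is a false positive.
```

Reviewing the system and evaluation plan before results exist removes this filter, because acceptance can no longer depend on which way the results came out.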
Turning to logistical concerns, this approach means that the paper is not finalized until shortly before the conference (a couple weeks, to give reviewers a chance to confirm that the evaluation plan was followed). But as more conferences move towards electronic-only publication, this schedule becomes feasible. And this scheme wouldn’t cover e.g. multi-year longitudinal evaluations. But it would certainly cover a large number of the papers with short (inadequate?) user studies appearing in our HCI conferences.
Of course, there’s the simpler approach of requiring evaluations at submission. This meets the primary goal of having systems evaluated, but loses the three benefits I’ve outlined above: researchers invest energy evaluating systems that would be rejected independent of the evaluation; the evaluation work will be older/staler by the time of the conference, and the bias of reviewers to accept positive results would continue.
A few weeks ago, I finished writing a thought piece with Mark Ackerman, Ed Chi, and Rob Miller about the state of systems research in social computing. It grew out of conversations with a lot of researchers in the area, and examines questions of novelty, evaluation, and the industry/academia question in the field.
I submitted the paper to alt.chi, where it generated quite a bit of discussion in the alt.chi open reviewing process: twenty-two reviews (twenty-one “very high interest”s, and one “high interest”). To be honest, I was really blown away by the positive response. I chose alt.chi as a venue because I wanted to get a lot of feedback, and that worked out in spades.
In the spirit of alt.chi’s open process, I’m now going to open up some of those reviews back to the community so that I can make the paper even better. (While I wouldn’t do this kind of thing for a typical paper, I think that alt.chi reviews are written with a higher expectation of openness, so it’s OK in this case.) These are some of the most cogent points I took away from the feedback, and they are what I’m going to try and address before Friday’s final deadline. I’d love to see continued discussion here in the comments if you have thoughts. There are a lot, but I’ve tried to highlight main points.
Here’s the submitted PDF, and the original abstract:
Social computing has led to an explosion of research in understanding users, and has the potential to similarly revolutionize systems research. However, the number of papers designing and building new sociotechnical systems has not kept pace. In this paper we analyze the reasons for this disparity, ranging from misaligned methodological incentives and evaluation expectations to research relevance compared to industry. We suggest improvements for the community to consider and evolve so that we can chart the future of our field.
Here we go — these are my favorite comments, both from the reviewing process and out-of-band emails I got. Please share any thoughts or reactions! (I’ve stripped reviewer names and affiliations for privacy reasons.)
- Is it a rant?: “The paper felt a bit too much like a list of particular criticisms ACs have raised against your papers in the past. It was unclear how principled and complete of an exploration of the problems of social computing systems research it is. How pervasive are criticisms of exponential growth and snowball sampling, really? Aren’t they just easy stand-ins for ACs to sidestep underlying, thornier problems?”
- Discussion of industry vs. academia: “[It] was too simplistic. I think a third way that should be explored much more is to what extent academia can partner with either large industry or small startups. See Joel Brandt’s collaboration w/ Adobe, Niki Kittur’s with Wikimedia, etc.”
- Distinction between spread and steady state: “In a ‘living’ social computing system, there is no simple steady state. To maintain the appearance of continuity, the system itself has to be constantly updated, changed, tweaked to respond to the changing balance and makeup of the user community; to keep up the arms race against spammers, etc. Steady state is an illusion created by the never-ending work of the maintainers of social computing systems.”
- The 4:1 submission ratio: “There is an implicit claim that the number of papers submitted, or accepted, is roughly equivalent to the impact of a particular type of research. The ratio of “understanding users” to “systems” was 4:1 – so what? Is this a declining trend or steady state? Most papers end up being read (and cited) infrequently. This may be especially true about papers that study and describe populations in systems with half-lives of 2-3 years. How many study papers that are 10+ years old do you still consider worthwhile? How many systems papers? Is there a real imbalance at that scale?”
- Snowball sampling disagreement: “I think that, in most cases, this is undesirable, except in cases where the target user demographic is the same as our social networks (e.g., highly educated tech early adopters)”
- Field study difficulty: There is an unnecessary slam on lab research as being too easy. We need to be more balanced here.
- Arguments aren’t particularly “controversial”: we’re not taking a stand that’s horribly divisive. (That’s fine with me. I’m OK with just drawing out the issues.)
- Generalizability: Some reviewers felt that these results could generalize beyond social computing to other areas. Others felt that we should broaden even to traditional CSCW topics like small group collaboration and communication. Many people felt that these arguments resonated even outside of our direct community. I’m honestly less sure of my footing here; I don’t want to overclaim.
- Stronger argument why academia matters: “The argument could be made stronger for why should social computing systems should have a place at CHI or in academia if they can be done in industry with more access to data and better resources. The authors mention market incentives that can be avoided in academia. However, the majority of researchers have to find funding from the NSF or from industry so there are markets in both cases.”
- Why do social computing systems matter?: “This submission could be stronger, especially for young PhD researchers, if it clearly outlined what contributions social computing systems research brings to the table. Why is it important that it be done?” “More discussion on the goals and the assessment of quality of social computing research would be extremely helpful.”
- Qualitative studiers: Don’t forget about Studiers in anthropology, cultural and media studies. “These qualitative studiers often ask for research to a) engage in actual conversations with users and b) discuss the larger cultural and societal implications of one’s system.”
- Big Data vs. Industry: “It does not, however speak to the so-called Big Data movement we have seen in Social Computing (and that has been addressed in various forms by myself, Scott Golder, d boyd, and others). While this is a bit orthogonal, it does address the sampling questions also detailed in the article.”
- Builder/Studiers too simplistic?: “I think that there’s continually the problem in CHI that it’s a conference of minorities, and it’s a case of 20% builders, 20% studiers, 20% designers, 20% usability people studying Fitts Law until their socks fall off and so on. I’m not sure I agree with their characterization that ‘the prevalence of Studiers in social computing means that Studiers are often the most available reviewers for a systems paper on a social computing topic’. My experience is that whoever I want to review my paper – studiers for a study, builders for a technical system – I’ll end up with someone from the wrong place who can’t understand.“
- Replication: “If replication isn’t highly valued in our community, then one possible outcome is that the expectations for a social computing systems paper become quite high. The paper would have to not only introduce the system, but also provide a solid evaluation of it, because the bias against replication implies that future evaluations aren’t likely to be forthcoming.”
We’ve gotten the news that both our Knight News Challenge proposals have made it through the first round. Both projects propose to deploy systems that we think will be useful to bloggers of all sorts, but particularly of a journalistic bent. I’m continuing to seek users interested in alpha testing these tools. That means you get to run into all the problems nobody’s seen before, but also means that you get to impact our work by telling us which problems bug you the most.
The first project, Datapress, is a wordpress plugin that brings our Exhibit framework into your blog, so you can add rich data visualizations to your blog posts using the same old WYSIWYG editor you’ve been using so far. I blogged about it here. The second, Tipsy, is a tool for collecting voluntary micro-donations from the people who consume content from your blog. Blog post here. Please, follow the links to take a look at the proposals (and comment upon them favorably!). And if you’re interested in trying out either tool, send me an email. Please also spread the word to others who might be interested.
I just finished submitting my reviews for WWW and, as has happened several times before, not one of the papers I reviewed had even a remote chance of acceptance. Nonetheless, each such paper got three reviewers carefully writing down the (same) reasons why it wouldn’t get in.
I wrote in a post last year about why I think three reviewers per paper is overkill, but even if we do put three reviewers per paper on average, is it really best for our field that each paper get exactly three reviews? Wouldn’t it be better for us to direct more of our reviewing effort towards papers that will actually be published and read by the community? I propose that a sequential reviewing mechanism would work better. Let one reviewer take a preliminary read of the paper. If it is clearly out of scope, or clearly below the bar, they can say so. Then let the next reviewer take a turn. They can simply “concur” with the first review or choose to write another. Ditto for the third reviewer. Finally, once a paper has been accepted, sic a fourth reviewer on it who can be as harsh as they like, reviewing to enhance the paper, because they know it won’t affect acceptance.
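The sequential mechanism can be sketched as a simple loop. The stopping rule and reviewer behavior below are hypothetical choices of mine for illustration, not a worked-out policy:

```python
def sequential_review(paper, reviewers):
    """Toy model of sequential reviewing: reviewers take turns, each
    seeing the prior verdicts, and two concurring rejects end the
    process early. Reviewer behavior is entirely hypothetical."""
    reviews = []
    for reviewer in reviewers:
        verdict = reviewer(paper, reviews)  # later reviewers see earlier verdicts
        reviews.append(verdict)
        if len(reviews) >= 2 and all(v == "reject" for v in reviews):
            # A clear reject stops here, freeing reviewer effort
            # for borderline and accepted papers.
            return "reject", reviews
    decision = "accept" if reviews.count("accept") > reviews.count("reject") else "reject"
    return decision, reviews

# Hypothetical reviewers: one always rejects, one always accepts.
harsh = lambda paper, prior: "reject"
kind = lambda paper, prior: "accept"

print(sequential_review("paper A", [harsh, harsh, kind]))  # stops after two rejects
print(sequential_review("paper B", [kind, kind, kind]))
```

The interesting design question is the stopping rule: requiring two concurring rejects, as above, is one way to keep a single idiosyncratic first reviewer from sinking a paper alone.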
If I knew that I could cut my effort on weak papers, I’d have more cycles to spend on the boundary papers, where a close look actually matters to outcomes, and on accepted papers, where my time would produce a better paper for the community. In fact, I’d also be willing to spend more time on the weak papers where I was first reviewer, since I’d know that’s the only review they’re going to get. In general, knowing whether I was reading a weak, borderline, or strong paper would completely change the way I read it.
Obviously this proposal runs in the face of the tradition of “independent” reviews. But I just don’t believe that reviewers are so spineless. I’ll certainly have no problem expressing my opinion even when it differs from someone else’s. Keeping reviewer identities anonymous is also a good way to allow that kind of debate. I’ve also heard concerns about scheduling. But I think they’re overblown. We offer incredibly long reviewing timelines in order to give our reviewers flexibility. If that timeline is segmented, they still have flexibility: a reviewer can choose to review only in a particular segment if that’s all they’ve got free, or to spread their reviews over multiple segments if they’d rather.
I hope some PC takes a stab at experimenting with this. We’re scientists; we should be willing to test our own hypotheses about what makes a good program committee process.