Data by the people, for the people

Inspired by danah boyd, I decided to create a crib of my talk at HarambeeNet 2010. It discusses the boundaries between data and people, as well as our recent work on the Soylent project. Please let me know any feedback you have!

Data by the people, for the people: Powering interactions via the social web

When we’re talking about social networks in computer science education, we have two methodological traditions to fuse. One is computer science, which we can see here through the lens of network science. It puts the network primary. Here is the first figure in the upcoming Easley and Kleinberg textbook “Networks, Crowds, and Markets”, of a 34-person karate club:

This quote from the introduction lays out the focus and formalism that this approach uses:

“In the most basic sense, a network is any collection of objects in which some pairs of these objects are connected by links.” – p2, Easley and Kleinberg

This is an appealing definition and approach, because it provides a mathematical formalism that enables us to derive proofs, reason about groups at high levels, and write interactive systems like Facebook. It doesn’t matter that friendship is a fuzzy concept: so long as both parties have agreed that it’s an undirected edge, we can do friend recommendation, build a news feed, and compute tie strengths (or as Facebook calls it, EdgeRank). It’s a very top-down approach, because computer scientists are good at dealing with lots of data.
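To make that formalism concrete, here is a minimal sketch using the networkx library (my choice for illustration, not anything from the textbook). It loads Zachary’s 34-member karate club network, the same data as the figure above, and treats each friendship as nothing more than an undirected edge:

```python
import networkx as nx

# Zachary's karate club: 34 members, one undirected edge per observed social tie.
G = nx.karate_club_graph()
print(G.number_of_nodes(), "members,", G.number_of_edges(), "ties")  # 34 members, 78 ties

# Once friendship is reduced to an edge, standard graph machinery applies:
# e.g., rank members by degree centrality as a crude measure of connectedness.
centrality = nx.degree_centrality(G)
top = sorted(centrality, key=centrality.get, reverse=True)[:2]
print("Most connected members:", top)  # nodes 33 and 0, the two eventual faction leaders
```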

The other strong tradition in this space is characterized by social science: social psychology, sociology, cultural anthropology, and the broad spectrum of ideas and methodologies encompassed by conferences like CSCW. Where the computer science approach puts the network primary, social science puts the person primary. The goal of this approach is to understand why those links form, what they mean, and how they are utilized. This can be very bottom-up: social psychology, for instance, tends to take the individual as the unit of analysis. It asks questions like, “Why do groups form and split?”

If we look at the original paper from which Easley and Kleinberg got the data for this figure, it says this (emphasis added):

“The analysis of patterns of social relationship in the group is then conducted on the graph, which is merely a shorthand representation of the ethnographic data.”

The network is more an aid to the processes being investigated.

When cultures collide, if we naively follow our methodological training, expectations get mismanaged. In her keynote at WWW, danah boyd critiqued the approach that many computer scientists take when they consider network problems:

Many of you are sitting on terabytes of data about human interactions. The opportunities to scrape data – or more politely, leverage APIs – are also unprecedented. And folks are buzzing around wondering what they can do with all of the data they’ve got their hands on. But in our obsession with Big Data, we’ve forgotten to ask some of the hard critical questions about what all this data means and how we should be engaging with it.

[boyd, danah. 2010. "Privacy and Publicity in the Context of Big Data." WWW. Raleigh, North Carolina, April 29.]

danah is largely referencing ethical and privacy questions, but there is an even bigger implication for computer science in my mind: we cannot write crowd programs without really knowing what it is that the crowd is doing. Without this, computer scientists will ignore human aspects of our data, and social scientists won’t be able to take advantage of computer science’s toolset.

danah would talk about the de-anonymization of the Netflix dataset. I have another angle on the situation: understanding humans was what ultimately won the million dollars. Basic collaborative filtering techniques can only get you so far. But one of the techniques that BellKor’s Pragmatic Chaos used was temporality: it turns out that when people rate a bunch of movies at once, they tend to be movies that they saw a long time ago. And those kinds of movies exhibit a specific kind of rating pattern, with older movies rated higher.

The authors speculate about why, but I think this has to do with cognitive psychology: we are much more likely to remember events with high emotional arousal than those without, and more likely to remember positive events than negative ones.
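One rough way to see that pattern in any ratings log, sketched with pandas; the file name and column names here are assumptions for illustration, not the Netflix Prize schema:

```python
import pandas as pd

# One row per (user, movie, rating, date); schema is an assumption for illustration.
ratings = pd.read_csv("ratings.csv", parse_dates=["rating_date"])

# How many movies did this user rate on this particular day?
session_size = ratings.groupby(["user_id", "rating_date"])["rating"].transform("size")

# Large same-day sessions are likely movies seen long ago; small ones are recent viewings.
batch = ratings[session_size >= 10]
recent = ratings[session_size < 10]

print("mean rating in large batch sessions:", round(batch["rating"].mean(), 2))
print("mean rating otherwise:              ", round(recent["rating"].mean(), 2))
```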

So it is when we program systems involving networks and crowds. We have a lot of data, and even more interest in that data, as demonstrated by the number of influential and award-winning papers that have been written by the amazing people sitting in front of me right now. When we talk about data, we are fundamentally bridging the attractive networks abstraction and the equally attractive social science abstraction. When we’re successful like BellKor’s Pragmatic Chaos was, it takes us farther than either process in isolation.

I’m a social computing systems builder: I build interfaces that are powered by social data and interfaces that encourage social interaction. To do this well, I have to get this balance right. I want to share with you a few ways in which I’ve been using the social web to develop new tools, and the ways in which we have wrestled with humans and algorithms simultaneously to make them work.

Soylent: A Word Processor with a Crowd Inside

I want to start with a discussion of Mechanical Turk. For years, human-computer interaction researchers have used Wizard of Oz techniques to prototype interactive systems. This typically meant having a member of the design team behind a curtain, simulating parts of an artificial intelligence that hadn’t been built yet. But we now have artificial intelligence for hire via services like Amazon Mechanical Turk, where you can pay a few cents to workers, largely in the U.S. and India, to perform tasks for you. The Soylent project asks: what happens when you embed those workers inside of an interface — when you have a Wizard of Turk? Can we help end users when interfaces aren’t necessarily bound by AI-hard problems any more, but by humans?
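For a concrete sense of what “artificial intelligence for hire” looks like to a requester, here is a minimal sketch of posting a micro-task with boto3’s Mechanical Turk client. The sandbox endpoint, the question HTML, and the reward amount are illustrative assumptions, not Soylent’s actual requester code:

```python
import boto3

# Sandbox endpoint, so no real money changes hands while experimenting.
mturk = boto3.client(
    "mturk",
    region_name="us-east-1",
    endpoint_url="https://mturk-requester-sandbox.us-east-1.amazonaws.com",
)

# A bare-bones HTML question asking a worker to rewrite one paragraph.
question_xml = """
<HTMLQuestion xmlns="http://mechanicalturk.amazonaws.com/AWSMechanicalTurkDataSchemas/2011-11-11/HTMLQuestion.xsd">
  <HTMLContent><![CDATA[
    <html><body>
      <form action="https://www.mturk.com/mturk/externalSubmit" method="post">
        <input type="hidden" name="assignmentId" value="ASSIGNMENT_ID_NOT_AVAILABLE">
        <p>Shorten this paragraph without changing its meaning:</p>
        <textarea name="rewrite" rows="6" cols="60"></textarea>
        <p><input type="submit"></p>
      </form>
    </body></html>
  ]]></HTMLContent>
  <FrameHeight>450</FrameHeight>
</HTMLQuestion>
"""

hit = mturk.create_hit(
    Title="Shorten a paragraph",
    Description="Propose a shorter version of a paragraph of English text.",
    Keywords="writing, editing, proofreading",
    Reward="0.08",                        # a few cents per assignment
    MaxAssignments=5,                     # several independent workers
    LifetimeInSeconds=60 * 60,            # stay available for one hour
    AssignmentDurationInSeconds=10 * 60,  # ten minutes to complete
    Question=question_xml,
)
print("Posted HIT:", hit["HIT"]["HITId"])
```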

Here are a few preliminary thoughts, which will show up at the ACM UIST conference this year.

We are focused on writing. We’ve been learning to write since grade school; it’s the stock-in-trade of how most of us exchange ideas today. I think we can all agree that writing is hard. Even seasoned experts will make mistakes: non-parallel constructions, typos, or just plain being unclear. If we make a high-level decision like changing a story from past tense to present tense, or shifting references from ACM format to MLA format, we have to execute a daunting number of tasks. And of course, when we have that 10-page limit and our paper is 11 pages, we spend hours whittling our writing down to size.

Shortn

Let’s take the example of trying to shorten a document by a few lines, via Soylent’s Shortn component. Soylent gets a group of Mechanical Turkers to examine your paragraph, find sections that are wordy or verbose, and propose shorter alternatives. Then we can provide a single slider to let you shorten your paragraph to a desired length. In our evaluations we found that Shortn can cut a paragraph down to about 85% of its original length without making any major cuts to language or content.
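One way to think about how that slider could work, as a minimal sketch rather than Soylent’s actual implementation: each crowd-identified patch carries a set of proposed rewrites, and we swap in the shortest rewrite, patch by patch, until the paragraph fits the requested length.

```python
# Illustrative crowd output: each patch pairs the original wordy text with
# worker-proposed rewrites that survived quality control.
patches = [
    {"original": "any collection of objects in which some pairs of these objects",
     "options": ["a set of objects in which some pairs",
                 "a collection of objects where some pairs"]},
]

def shorten(paragraph, patches, target_len):
    """Sketch of a length slider: swap in the shortest accepted rewrite for
    each patch until the paragraph is at or under target_len."""
    text = paragraph
    for patch in patches:
        if len(text) <= target_len:
            break
        text = text.replace(patch["original"], min(patch["options"], key=len))
    return text

paragraph = ("In the most basic sense, a network is any collection of objects "
             "in which some pairs of these objects are connected by links.")
print(shorten(paragraph, patches, target_len=int(0.85 * len(paragraph))))
```

Dragging the slider just changes target_len; the crowd work happens once, up front.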


Crowdproof

Microsoft Word’s proofreading capabilities are still quite poor — they miss all sorts of problems. But we can get Mechanical Turk workers to skim behind you as you write your document and flag errors that Word misses. They’ll explain in plain English why each one is a problem, and suggest some ways to fix it. You can simply click on the underlined text in Word and replace your text with the suggestion.

The Human Macro

There are lots of tasks in word processing that would require complex macro programming to complete. Maybe you’re trying to flesh out citation sketches into a real references section, or transform a short story from past tense to present tense — you need to execute daunting numbers of edits over the whole document. Instead, we can simply get Turkers to map over the entire document, executing these tasks for you. In our studies we’ve recruited Turkers to find Creative Commons figures to illustrate a paragraph, find BibTeX for citation sketches like [Bernstein UIST 2010], and change the tense of a story. These user-written requests are often unclear, misspelled, or worse, but Turkers can still make sense of them.
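Here is a sketch of that “map” framing with hypothetical helper names (post_task stands in for however tasks get submitted to Mechanical Turk); it illustrates the idea rather than Soylent’s code:

```python
def human_macro(document, request, post_task):
    """Apply one free-form, user-written request to every paragraph of the
    document by posting one crowd task per paragraph."""
    paragraphs = [p for p in document.split("\n\n") if p.strip()]
    task_ids = []
    for index, paragraph in enumerate(paragraphs):
        task_ids.append(post_task(
            instructions=request,   # e.g. "change this paragraph to present tense"
            context=paragraph,
            metadata={"paragraph_index": index},
        ))
    return task_ids
```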

Find-Fix-Verify

I can tell you that these features work — but they wouldn’t if you naively asked Turkers to take on the tasks. Empirically, about 30% of the work on open-ended tasks where we ask Turkers to directly edit the user’s data produces a bad result. This is unacceptable; we can’t just push it onto the user. But since this is human data, we can try to get inside the humans’ heads to see why it’s happening. We’ll illustrate via two personas: the Lazy Turker and the Eager Beaver.

The Lazy Turker tries to get away with as little work as possible. When given an error-filled paragraph from a high school essay site and asked to edit it, the Lazy Turker submits just one minor change to a misspelled word. Why did they make this change? Because it was underlined in red in the user’s browser as a misspelling. The Lazy Turker wants to signal that they made a change, and to do so with as little work as possible.

The Eager Beaver also wants to signal that they made an effort, but they overcompensate. Given the same paragraph, in addition to other edits, this Eager Beaver added newlines between each sentence. This would be problematic to return to the user.

So we can’t just go ahead and give these results back to the user: we need a way to control and channel the efforts of the Turkers. These biases hold across the 9000 Turkers we’ve interacted with for this project. Turkers are looking for a way to signal that they made a contribution: for the Lazy Turkers, this means the smallest noticeable action, and for Eager Beavers, it means overcompensating. The situation seems comparable to the state of programming interactive systems before we had design patterns like Model-View-Controller to describe best practices.

We’ve developed one design pattern, Find-Fix-Verify, for tasks like Soylent’s. Find-Fix-Verify splits open-ended tasks into three pipelined stages where each Turker can make a clear contribution.

The first stage is called Find. In Find, ten Turkers are asked to identify portions of the text that can be improved, but not to do any improvement themselves. We can then use independent voting to find patches of the text that multiple Turkers called out.

In the second stage, Fix, three to five Turkers are shown one of the identified patches and asked to provide an alternative. If roughly 30% of open-ended work is poor, as I said earlier, then independent attempts from three to five workers are almost certain to include at least one good alternative (with five workers, the chance that every one is poor is about 0.3^5, well under 1%).

In the last stage, Verify, we do quality control. A third set of Turkers sees the alternatives generated in the Fix stage and flags poor submissions. Each task may have different requirements: for example, in Shortn we want to flag submissions that introduce style errors and those that change the meaning of the paragraph.
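Putting the three stages together, here is a minimal sketch of the pipeline. The ask_workers helper, the worker counts, and the agreement thresholds are illustrative assumptions; in the real system each stage is a batch of Mechanical Turk tasks:

```python
from collections import Counter

def find_fix_verify(paragraph, ask_workers):
    """Sketch of Find-Fix-Verify. ask_workers(stage, prompt, n) stands in for
    posting n independent crowd tasks and collecting their answers."""

    # Find: ten workers mark spans that could be improved, without fixing them.
    # Keep only spans that at least two workers independently agreed on.
    marked_spans = ask_workers("find", f"Mark wordy or unclear spans in: {paragraph}", n=10)
    agreement = Counter(span for answer in marked_spans for span in answer)
    patches = [span for span, votes in agreement.items() if votes >= 2]

    results = {}
    for span in patches:
        # Fix: a fresh set of workers each propose one rewrite of this span.
        rewrites = ask_workers("fix", f"Rewrite this more concisely: {span}", n=5)

        # Verify: another set of workers votes out rewrites that change the
        # meaning or introduce errors; keep only majority-approved rewrites.
        votes = ask_workers("verify", f"Which rewrites of '{span}' are acceptable? {rewrites}", n=5)
        approved = [r for r in rewrites if sum(r in vote for vote in votes) > len(votes) / 2]
        results[span] = approved or [span]   # fall back to the original text
    return results
```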

Why do we need to split Find from Fix? If we didn’t, then all the Lazy Turkers would fix the same problem, and we’d have wasted a lot of money. By running the Find stage first, we can utilize the Eager Beavers to find patches that the Lazy Turkers would ignore, then force the Lazy Turkers to work on those patches instead. A separate Fix stage also means that we know which edits are trying to fix the same underlying problem.

Why add Verify? Certainly, it adds lag to the process in return for removing poor work. But, we’ve found that the best way to do quality control is to use Turkers to vet the work of other Turkers — to put them in productive tension.

Conclusion

Soylent is just one example of an interface that is powered jointly by a human and an algorithm. We had to recognize the fundamentally human process underlying the technology and adjust our approach to match it.

Moving forward, when we research and teach social networks, we will benefit hugely if we remember:

1. Data is made of people: Soylent’s human-crowd algorithms depend entirely on humans;
2. Data is made by people: a crowd created the humongous Twitter datasets we use today;
3. Data is made for people: the people in those datasets created the data to communicate with each other, not with computer scientists.

Thanks! The Soylent paper will be made public in a week or two.
