Monday, January 09, 2006

Summing collective ignorance

A month ago, when I was talking about Yahoo Answers, I wrote:
A popularity contest isn't the best way of getting to the truth.

People don't know what they don't know. Majority vote doesn't work if people don't have the information they need to have an informed opinion.
Today, I saw Nathan Torkington's post on O'Reilly Radar, "Digging the Madness of the Crowds":
Steve Mallett, O'Reilly Network editor and blogger, was very publicly accused, via a Digg story, of stealing Digg's CSS pages. The story was voted up rapidly and made the homepage, acquiring thousands of diggs (thumbs-up) from the Digg community along the way. There was only one problem: Steve didn't steal Digg's CSS pages.
Take a majority vote from people who don't know the answer, and you're not going to get the right answer. Summing collective ignorance isn't going to create wisdom.

See also my previous post, "Digg, spam, and most popular lists".

On a side note, the O'Reilly Radar story mentions a site called Pligg. Like de.lirio.us for del.icio.us, Pligg is a free open source clone of Digg.

Update: There is a good discussion going on in the comments to this post.

Update: Yahoo Answers PM Yumio Saneyoshi and Yahoo My Web Community Manager Matt Stevens dropped by and left comments on this post. Well worth reading their thoughts.

12 comments:

Costas said...

Although I agree with you, Greg, you may find this Economist article interesting. Basically, given little information, people will come to the correct decision more often than you may think...

Tejaswi said...

Isn't the same problem plaguing Wikipedia? How do we know anything on Wikipedia is authentic?

Their claim is that popularity means authenticity. But if blatantly wrong information is published, and it is part of a rumor or a Chinese whisper, or is subtle enough not to be corrected, it will stay forever.

This is a social science problem that needs to be tackled on multiple fronts.

Pete Cashmore said...

Greg,

This isn't really true. In fact, ignorant people can make good collective decisions because their errors cancel out. The problem isn't that the wisdom of crowds doesn't work, it's that Digg doesn't reflect the wisdom of crowds as described by James Surowiecki. More here:

Digg and the So-Called Wisdom of Mobs

Greg Linden said...

Thanks for that link, Pete. Wisdom of the mob. Heh, heh.

I'm essentially trying to make the same point as you, though you may have made it more clearly than I. Majority vote may extract the wisdom of the crowd, or it may only extract the groupthink of the mob. Hiding the votes before people have voted might help a little, but I think there have to be better, more reliable ways to extract the wisdom from the crowd.

I think what you want to do is attempt to identify experts in the crowd, people with the necessary information to make the decision, and weight their opinions more heavily. Slashdot discussions, Amazon customer reviews, many sites have early attempts at this, but obviously a lot more work needs to be done.
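To make the weighting idea concrete, here is a minimal sketch in Python. The function name, the voter names, and the weights are all hypothetical and chosen only for illustration; the hard part, deciding who is informed and how much weight they deserve, is assumed to have been done already and is not shown.

```python
def weighted_vote(votes, weights=None):
    """Return True if the weighted 'yes' total exceeds the weighted 'no' total.

    votes:   dict mapping voter name -> bool (True = "the story is accurate")
    weights: dict mapping voter name -> float; voters not listed count as 1.0.
             How these weights are obtained is an assumption outside this sketch.
    """
    weights = weights or {}
    yes = sum(weights.get(voter, 1.0) for voter, choice in votes.items() if choice)
    no = sum(weights.get(voter, 1.0) for voter, choice in votes.items() if not choice)
    return yes > no


# Five voters who only read the headline vote yes; two who checked the facts vote no.
votes = {
    "a": True, "b": True, "c": True, "d": True, "e": True,
    "checker1": False, "checker2": False,
}

plain_result = weighted_vote(votes)  # every vote counts as 1.0
informed_result = weighted_vote(votes, {"checker1": 3.0, "checker2": 3.0})
```

With equal weights the five uninformed votes win (5 to 2). With a weight of 3 on each informed voter the result flips (5 to 6), which is the whole effect of the weighting and also why choosing the weights badly would be dangerous.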

Thanks, Costas, I did see that Economist article. It was good. If you liked that and want to dig in deeper, you might be interested in David Heckerman's excellent tutorial on Bayesian networks. Fun stuff.

Pete Cashmore said...

Greg,

You've missed the point. The *whole point* of the wisdom of crowds is that the collective wisdom of a group can be *as good or better* than an expert or team of experts. For instance, instead of hiring consultants ("experts" by another name), companies should set up internal prediction markets so their employees can vote on issues relevant to the company. Picking out "experts" is the last thing that Digg should do - this may actually lead to a less accurate/truthful result. Instead, they need to remove the groupthink from the system.

One last point: is it actually a good thing for Digg if the stories are accurate and truthful? If the community cannot see how others are voting (necessary to prevent groupthink), then the community aspect becomes weaker. So while these measures might limit groupthink, it seems to me that it's actually a shared sense of identity that holds the Digg community together. To put it another way, Digg is more likely to survive precisely because of the inherent groupthink.

More at Wikipedia.

Greg Linden said...

Pete, I think we're agreeing here.

I shouldn't have used the word "expert". That sounds like I'm referring to one person in the crowd. I meant making some effort to isolate the people in the crowd with an informed opinion.

Internal prediction markets work for exactly this reason. If you go to the Iowa Electronic Market and plop down cash, you likely have a good reason to be asserting that X is true. There is a cost to making a bet, so only people with informed opinions make bets.

Great point on how groupthink might help Digg succeed. There certainly are news outlets that have benefited from creating a strong community with groupthink.

b7j0c said...

digg corrected itself, and quite quickly. you can see posters in the comments section of the original story correctly remarking on pligg within minutes of the post.

there is a feedback mechanism. it works. please stop thinking democracies aren't informed, this is the elitist technocrat thinking that has proven again and again to be a deadend.

Pete Cashmore said...

Greg,

Understood. So perhaps "editor" is a better word than "expert". Better still: "trusted people". I'm not opposed to subscribing to "trusted people", rather than topics. In fact, if you subscribe to a member of del.icio.us you generally get better results than subscribing to a tag - what's more, it's spam free.

But there's a tendency (risk?) that we're just trying to reintroduce the old-media hierarchies into this new "democratic" system. If there are to be editors, it needs to be the case that anyone can be an editor, and users can subscribe to whichever editor they choose (which might be what you're suggesting). If there are a limited number of editors, then you reintroduce the problems that new media is trying to solve (i.e., news is dictated from above by a select number of people). In fact, the word "editor" is so strongly associated with hierarchies that I'd prefer to avoid it altogether.

Like I say, I'm not suggesting that Digg should necessarily change - it may indeed be the ideal system if the aim is to foster a close community. But if you wanted to create a new system that was less prone to groupthink AND avoid the introduction of traditional hierarchies, you might consider some of these ideas.

PS. As you probably know, monetized prediction markets are more accurate, but setting one up means tackling all kinds of legal issues - it's something I've been considering for a while. I spoke a bit about this in my post on Smarkets.

Yumio said...

Hello there Greg. I'm the Product Manager for Yahoo! Answers, so I've given this issue a lot of thought (and am still thinking...).

As you've seen on Yahoo! Answers, we've erred on the side of "trusting" the crowd to largely answer people's questions correctly.
First, the assumption with Yahoo! Answers is that many "ordinary" people know things (about cars, travel, pets, books, etc.) that would be useful to other people.
Second, the cost of providing this information to others is very low.
So we don't necessarily need to rely on an army of experts to make this site useful.
In terms of worrying about the accuracy of answers/posts, we share the same problems with Wikipedia, Digg, etc., but consider what incentives "most" people have to post something false "on purpose". Without money, fame, or a political agenda involved, there isn't much incentive for this, except the few malicious trolls and malcontents out there. So Yahoo! Answers is definitely a work in progress, but I think this is just the beginning...

Greg Linden said...

Hi, Yumio. Good to hear from you!

It's true that the cost of posting an answer is fairly low and the benefit of posting false answers is negligible, but there is also little benefit from posting correct answers, especially if figuring out the correct answer requires any work.

It gets back to the filter you want. Prediction markets work because they impose costs and benefits that filter for people with information.
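A rough sketch of that filter, in Python, under simplifying assumptions that are mine and not a description of any real market: a contract pays 1 if the event happens, it costs `price` to buy, and there is a small hypothetical `fee` per bet. Someone whose belief matches the market price expects to lose the fee and stays out; someone with better information expects a profit and bets.

```python
def expected_profit(belief, price, fee=0.0):
    """Expected profit from buying one contract that pays 1 if the event happens.

    belief: the bettor's own probability that the event happens (0..1)
    price:  what the contract costs, i.e. the market's implied probability
    fee:    hypothetical fixed cost of placing the bet
    """
    return belief * 1.0 - price - fee


def will_bet(belief, price, fee=0.02):
    """A bettor in this toy model only buys when the expected profit is positive."""
    return expected_profit(belief, price, fee) > 0


informed = will_bet(belief=0.9, price=0.6)    # has a reason to think the market is wrong
uninformed = will_bet(belief=0.6, price=0.6)  # no view beyond the market price
```

In this toy model only the informed bettor participates. A free vote has no equivalent of `price` or `fee`, so nothing separates the two.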

The current filter on Yahoo Answers will likely filter out experts (since their time is valuable) and well-written, correct answers (since those take more time to write). Yahoo Answers will likely favor bored people who quickly throw down an answer to a question without much thought or effort.

More on that in my previous post, "Yahoo Answers and the wisdom of the crowd".

Matt Stevens said...

Greg, I'm the co-Community Manager of Yahoo! Answers. Since Yumio isn't available to respond today, I'd like to continue his dialogue with you.

You are correct to focus on the importance of Answers' point system. We're well aware that our system of "incentives" plays a major role in shaping the behavior of our users. I'm putting "incentives" in quotes because, although we are planning certain "special thank-yous" for high-scorers, we're not offering anything of significant monetary value. We just give points.

Now, these points aren't negotiable for anything. They don't even have significance anywhere else on Yahoo! So, why do people care?

People value points because they greatly value the respect of others. Knowing this, we've structured Answers in such a way that points will both:

(1) reflect the respect of those who know about the poster from viewing his content firsthand, and
(2) create comparable respect in people who haven't yet been able to personally view his content.

This linkage is very important to us. To maintain a high-quality community, we want to reward people for good

(1) content (questions and answers),
(2) behavior, and
(3) other meaningful participation (voting, thumbs-up ratings, etc.)

As Answers grows beyond what one individual can read, the best way we can provide recognition is to highlight good users. For this purpose, points need to be an accurate proxy for the three items above.

During this first beta, we've been closely watching how our initial crack at a point system has played out. Overall, it has worked quite well--because our users have been more thoughtful and rational than you'd feared. Unless the initial question was smart-assed, jokey or (frankly) stupid to begin with, voters and askers alike have tended to deprecate sloppy and cursory responses. Complete and well-reasoned answers are preferred.

We reflect the importance of this by giving 10 points for having your answer chosen as the best. This dwarfs any other reward in our system. Also, best-answers get the chance to keep on earning points--50 points each, with one point for each favorable "thumbs-up" rating given by future readers. Crappy answers get zip.

So, there are major incentives to do good work. Even so, we have plans to enhance them. We publicly have sought the users' opinion about adjustments to our beta point system that will make asking a question point-neutral (as long as you return to pick a best answer; if you blow it off, you'll actually lose points for asking), so that there's no incentive to flood the system with duplicative or dumb questions just to scam some points.

Under that scenario, the incentive for people to ask questions will be--wait for it--their desire to know the answer! The main sources of points will be writing good answers, voting for good answers, and giving props to finalized good answers.

As Yumio says, Y! Answers is a work in progress. However, we think the users have made a hell of a good start.

zby said...

My understanding of the Wisdom of Crowds (I should add that I have not read the book; this is based on what I've read about it on the internet) is that it works for things where the errors cancel each other out (like guessing the number of stones in a bowl, where the underestimates cancel the overestimates). This is well grounded in probability theory. But does it work in other situations?
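The stones-in-a-bowl case can be sketched as a small simulation. This is only an illustration under assumed numbers: each guess is the true count plus independent Gaussian noise, and `shared_bias` is a hypothetical stand-in for groupthink, an error that everyone makes in the same direction.

```python
import random
import statistics


def crowd_estimate(true_value, n_people, noise_sd, shared_bias=0.0, seed=0):
    """Average n_people noisy guesses of true_value.

    Each guess = true_value + shared_bias + independent Gaussian noise.
    Independent noise tends to average out; shared_bias does not.
    """
    rng = random.Random(seed)
    guesses = [
        true_value + shared_bias + rng.gauss(0, noise_sd)
        for _ in range(n_people)
    ]
    return statistics.mean(guesses)


TRUE_COUNT = 1000  # assumed number of stones in the bowl

# Independent errors: individual guesses are off by about 300, the average by far less.
independent = crowd_estimate(TRUE_COUNT, n_people=10_000, noise_sd=300)

# Correlated errors: everyone also overshoots by 200, and averaging cannot remove that.
correlated = crowd_estimate(TRUE_COUNT, n_people=10_000, noise_sd=300, shared_bias=200)

print(f"independent crowd: {independent:.0f}")
print(f"correlated crowd:  {correlated:.0f}")
```

The average of the independent guesses lands close to the true count, while the average of the correlated guesses lands close to the true count plus the shared error. That is one way to read the Digg case: when voters copy each other, or all lack the same piece of information, the errors point the same way and more votes do not help.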