Collective knowledge systems

I think you could make a strong argument that the most important technologies developed over the last decade are a set of systems that are sometimes called “collective knowledge systems”.

The most successful collective knowledge system is the combination of Google plus the web. Of course Google was originally intended to be just a search engine, and the web just a collection of interlinked documents. But together they provide a very efficient system for surfacing the smartest thoughts on almost any topic from almost any person.

The second most successful collective knowledge system is Wikipedia. Back in 2001, most people thought Wikipedia was a wacky project that would at best end up being a quirky “toy” encyclopedia. Instead it has become a remarkably comprehensive and accurate resource that most internet users access every day.

Other well-known and mostly successful collective knowledge systems include “answer” sites like Yahoo Answers, review sites like Yelp, and link-sharing sites like Delicious. My own company, Hunch, is a collective knowledge system for recommendations, building on ideas originally developed by “collaborative filtering” pioneer Firefly and the recommendation systems built into Amazon and Netflix.

Dealing with information overload

It has been widely noted that the amount of information in the world and in digital form has been growing exponentially. One way to make sense of all this information is to try to structure it after it is created. This method has proven to be, at best, partially effective (for a state-of-the-art attempt at doing simple information classification, try Google Squared).

It turns out that imposing even minimal structure on information, especially as it is being created, goes a long way. This is what successful collective knowledge systems do. Google would be vastly less effective if the web didn’t have tags and links. Wikipedia is highly structured, with an extensive organizational hierarchy and set of rules and norms. Yahoo Answers has a reputation and voting system that allows good answers to bubble up. Flickr and Delicious encourage users to explicitly tag items instead of trying to infer tags later via image recognition and text classification.
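To make the “bubble up” idea concrete, here is a minimal sketch in Python of reputation-weighted voting. It is only an illustration, not the actual algorithm of Yahoo Answers or any other site mentioned here: the names `Answer`, `score`, and `rank`, the sample answers, and the reputation weights are all assumptions made up for the example.

```python
from dataclasses import dataclass


@dataclass
class Answer:
    text: str
    # Each vote is (voter_reputation, direction), where direction is +1 or -1.
    # Both the reputation scale and the vote format are hypothetical.
    votes: list[tuple[float, int]]


def score(answer: Answer) -> float:
    """Sum the votes, weighting each one by the voter's reputation."""
    return sum(reputation * direction for reputation, direction in answer.votes)


def rank(answers: list[Answer]) -> list[Answer]:
    """Return the answers ordered from highest to lowest score."""
    return sorted(answers, key=score, reverse=True)


answers = [
    Answer("Try restarting it.", votes=[(1.0, +1), (1.0, +1), (1.0, -1)]),
    Answer("Check the fuse first.", votes=[(5.0, +1), (1.0, +1)]),
]

for a in rank(answers):
    print(score(a), a.text)
```

The point of the sketch is that the structure is captured at the moment of contribution: every vote arrives already attached to an answer and a voter, so ranking is a simple sum rather than something to be inferred from raw text afterwards.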

Importance of collective knowledge systems

There are very practical, pressing needs for better collective knowledge systems. For example, noted security researcher Bruce Schneier argues that the United States’ biggest anti-terrorism intelligence challenge is to build a collective knowledge system across disconnected agencies:

What we need is an intelligence community that shares ideas and hunches and facts on their versions of Facebook, Twitter and wikis. We need the bottom-up organization that has made the Internet the greatest collection of human knowledge and ideas ever assembled.

The same could be said of every organization, large and small, formal and informal, that wants to get maximum value from the knowledge of its members.

Collective knowledge systems also have pure academic value. When Artificial Intelligence was first being seriously developed in the 1950s, experts optimistically predicted they’d create machines that were as intelligent as humans in the near future. In 1965, AI expert Herbert Simon predicted that “machines will be capable, within twenty years, of doing any work a man can do.”

While AI has had notable victories (e.g. chess), and produced an excellent set of tools that laid the groundwork for things like web search, it is nowhere close to achieving its goal of matching – let alone surpassing – human intelligence. If machines will ever be smart (and eventually try to destroy humanity?), collective knowledge systems are the best bet.

Design principles

Should the US government just try putting up a wiki or micro-messaging service and see what happens? How should such a system be structured? Should users be assigned reputations and tagged by expertise? What is the unit of a “contribution”? How much structure should those contributions be required to have? Should there be incentives to contribute? How can the system be structured to “learn” most efficiently? How do you balance requiring up-front structure with ease of use?

These are the kinds of questions you might think are being researched by academic computer scientists. Unfortunately, academic computer scientists still seem to model their field after the “hard sciences” instead of what they should be modeling it after: social sciences like economics or sociology. As a result, computer scientists spend a lot of time dreaming up new programming languages, operating system architectures, and encryption schemes that, for the most part, sadly, nobody will ever use.

Meanwhile the really important questions related to information and computer science are mostly being ignored (there are notable exceptions, such as MIT’s Center for Collective Intelligence). Instead most of the work is being done informally and unsystematically by startups, research groups at large companies like Google, and a small group of multi-disciplinary academics like Clay Shirky and Duncan Watts.


#1 Tweets that mention Collective knowledge systems cdixon.org – chris dixon's blog -- Topsy.com on 01.17.10 at 11:57 am

[...] This post was mentioned on Twitter by chris dixon, nigelwalsh. nigelwalsh said: RT @cdixon: Collective knowledge systems http://bit.ly/7ryPqQ (part 1 in a series…) [...]

#2 David Semeria on 01.17.10 at 5:49 pm

You can break the space into two main groups: ex-ante and ex-post.

Ex-post techniques try to create order from disorder (Google). Ex-ante systems (at a stretch, Wikipedia) try to provide order beforehand (in reality, Wikipedia pages are semantically messy, contain large blocks of untagged text, and lack consistency between themselves).

Ex-ante comes with a significant user burden – who wants to semantically tag everything they write/do? – but, like all human curation methods, is good at reducing ambiguity. Ex-post is generally painless for the user, but shifts all the semantic burden onto machines (something they're not very good at, and perhaps never will be).

In summary, I would bet on ex-ante systems showing the most future promise…

#3 jazzmann91 on 01.17.10 at 5:55 pm

Thanks for writing on this topic rather than business. :-)

I'm researching/designing one of these for opinion and debate; it's also going to be a social game.

#4 chris dixon on 01.17.10 at 6:01 pm

Excellent way to put it, and I agree that systems with at least some ex-ante structure are the most promising for the future.

#5 Mark Essel on 01.17.10 at 6:42 pm

Thanks for the clarification David.

Resources will funnel towards increased research of machine learning without always requiring supervision, as long as there's a stepping stone or ladder that gives intermediate results (payoffs/benefits). I just don't know if there are rungs between where we are now and creating self-learning systems.

#6 David Semeria on 01.17.10 at 7:01 pm

Thanks Mark, even if it wasn't really a clarification of Chris' post – it's just a simple way of categorizing the space.

Debates on machine learning always tend to veer off into philosophical abstractions – what does mean mean? etc – and so it's always important to stay practical (I'm not referring to you here, just in general).

I really liked Caterina Fake's post on a similar theme from a few days back. I think we both agree that, at least in the short term, progress will come from the innovative application of existing and not particularly ground-breaking techniques.

In other words, advances in UI & UX design which reduce the friction of adding semantic sugar on input. That kind of thing: simple tech, but still hard to get right.

#7 Benjamin on 01.17.10 at 7:23 pm

Great post, a few remarks.
Collective knowledge systems are (or should be) more about Intelligence Augmentation than about Artificial Intelligence, a point Pattie Maes has repeatedly made.
For a provocative minority view on collective knowledge systems and web 2.0 you should read the recent book by Jaron Lanier, “You Are Not A Gadget”.
Or read his Edge article entitled Digital Maoism: http://www.edge.org/3rd_culture/lanier06/lanier...
Btw I think his criticism especially applies to meta-sites that aggregate contributions by anonymous users, whereas Hunch is trying to build a truly smart collective by putting great value on and connecting the persons contributing.

#8 Knowtu » links for 2010-01-17 on 01.17.10 at 8:04 pm

[...] Collective knowledge systems cdixon.org – chris dixon's blog (tags: knowledge collaboration) [...]

#9 John Stepper on 01.17.10 at 8:30 pm

I agree with you that much more can and should be done to research the questions you posed. And I'd also like to see some of that rigor applied to implementation.

I'm struck by how many mid- to large-sized companies barely make use of what's already available. It seems like only a small fraction of companies I'm familiar with make effective use of search, forums, social equity systems, or expert location tools. There are notable exceptions – typically the same oft-cited ones in books on social media – but there remains a huge opportunity for us to greatly improve productivity implementing what we already have.

#10 chris dixon on 01.17.10 at 8:32 pm

Thanks, Benjamin. I think Lanier is talking about bad collective knowledge systems. Wikipedia for example doesn't “average” people's opinions. It works because people with expertise tend to self-select and edit things they know about. Other systems use explicit reputation systems and other mechanisms (pagerank).

#11 chris dixon on 01.17.10 at 8:33 pm

I totally agree. I guess it just takes organizations like that a long time to appreciate the positive benefits since they aren't as directly measurable as, say, installing a new accounting system.

#12 davidkpark on 01.18.10 at 12:52 am

There's some interesting overlap between what you call “collective knowledge systems” and what Yochai Benkler calls “new open source economics.” Benkler gives a nice overview on TED – http://bit.ly/JWPe6. He talks about how SETI@home far outpaces (almost doubles) IBM's Blue Gene and NEC's supercomputer, open source software has captured 70% of the web backend, etc.

As a response to David Semeria, I would frame the collective knowledge system, not as two separate spaces, but as an interaction between user generated data (collective) and a technical algorithm (knowledge) to structure that data in a meaningful way. For example, as Benkler notes, Google is so useful because it has a collective (millions of web designers creating pages with tags and links) and knowledge (a kick-ass ranking algorithm). Wikipedia can be thought of the same way – a collective (millions of people entering words) and knowledge (a kick-ass algorithm – the brain – that can structure words in a meaningful way).

That was an interesting comment about how computer scientists should model themselves after social scientists such as economists and sociologists. As a professor in the social sciences (and an entrepreneur who is trying to leverage the open source framework), I would replace economists with biologists. I think much of the work you're talking about is being done by a variety of disciplines (but not by economists), with the main intellectual anchor being networks (for example, you mention Duncan Watts).

As a side note, I was speaking with Robert Frank, the Cornell economist, and he has an interesting prediction about who economists in 100 years will say was the most influential. It won't be Smith, Keynes, Milton, etc. but Darwin. So maybe economists will embrace biologists but not soon.

Again, a fascinating post.

PS I'm not sure if you knew, but the CIA has been using an internal Wiki to share knowledge and foster collaboration – http://bit.ly/6nMe4z. From what I've heard, it's been pretty successful. I'm not sure if they're taking the next step and harnessing (or mining) that information via some algorithm to find patterns, connections, etc.

#13 Curtis Lassam » Blog Archive » Weekend Top 5 - January 18, 2010 on 01.18.10 at 4:54 am

[...] Dixon explores the popularity of Collective Knowledge Systems. This is something I’m interested in. So [...]

#14 Howard Lindzon » Blog Archive » What is Goldman Sachs? on 01.18.10 at 9:50 am

[...] my opinion, Goldman is a dangerous, closed ‘Collective Knowledge System‘ . Having a closed Collective Knowledge System is cool, but I have long not trusted what it [...]

#15 Pascal-Emmanuel Gobry on 01.18.10 at 10:40 am

While your general point about collective knowledge systems is absolutely right, I shudder at the thought of a collective Facebook/Yammer/Wikipedia (BlueKiwi?) for the US intelligence community.

There is a saying among intelligence operatives that the dissemination of information is the _square_ of the number of people who know that piece of information. The United States Intelligence Community includes 16 agencies, who together employ tens (hundreds?) of thousands of people. Let's guesstimate that out of these, 5 to 10 thousand have a high security clearance.

Even if this Intelligence Facebook is open only to them, even assuming total imperviousness to “hard” hacking attacks and zero malfeasant users, it will be a matter of weeks until a social engineering hack or just simple negligence ends up with most of its contents posted on Wikileaks or worse (the Al Qaeda Yammer?).

The worst part is, I am certain that such a collective intelligence system would, if it were feasible to keep it safe, reap tremendous benefits in fighting terrorism. Until it is compromised with disastrous consequences.

Sometimes — sometimes — closed is better than open.

#16 chris dixon on 01.18.10 at 12:59 pm

Doesn't it depend on the information? If the information is a bunch of clues that together suggest a terrorist plot, isn't it better to err on the side of sharing (and disclosing) than undersharing and keeping it secret? It seems like with counter-terrorism, where you are basically on defense all the time, sharing is better.

#17 Pascal-Emmanuel Gobry on 01.18.10 at 1:18 pm

Yes, except that when the information leaks, terrorists would know *what we're looking for*, which is like putting up big neon signs around the Death Star's secret weakness.

(Which is why, incidentally, racial profiling at airports is a bad idea.)

#18 glen_NIXTY on 01.18.10 at 5:40 pm

Love the post. A couple of quick thoughts. First, I think the emphasis on organizing the knowledge in the process of creating the knowledge is key. Second, you highlight the need for researchers to focus on the social science edge of this. I couldn't agree more and just want to highlight that the field of social psychology is ripe with research that I think is quite applicable to organizing and incentivizing this type of behavior. We (http://www.nixty.com) are actively working on an educational collective knowledge system.

#19 Mark Geller on 01.18.10 at 6:13 pm

Great post. Another example of a successful collective knowledge system is markets, in particular stock markets, where users pursue their own self-directed interests and in the process create useful shared data, i.e., company valuations.

#20 Frances on 01.19.10 at 4:48 am

I shall tag my post here as <great>, <informative> and <racist>.
Do you think I'm right? If you reply, it probably means you agree since this post has a reply.

To rephrase: I hope that CS steers as far and clear of today's social sciences as possible. The idea that my categorizations carry some universal meaning (or any meaning for that matter) is entirely false. You might say, we'll mod what you say by your reputation (determined by pagerank / number of tweets / whatever). But this doesn't solve anything apart from identifying whether my arbitrary categorization is shared by (or is amusing to) a wide variety of folks. Complex scenarios that will inevitably arise (cross-referencing, context, circularity, etc etc) shall strip it of any remaining meaning.

Ex-ante, in this classification, I believe, is the only promising path. Unless you have a full, reliable, and conclusive model of “me”, you will run into falsehoods that will make your analysis an easy target for the first budding statistician.

Disclosure 1: I'm a social scientist, with an interest and some background in CS.
Disclosure 2: <Partially-a-lie> Disclosure 1 </Partially-a-lie>

#21 Michael Shynar on 01.19.10 at 9:50 am

I believe we'll see a convergence of the ex-ante and ex-post methods as time goes on. One interesting way this can happen is by user-approved automatic structuring. This saves the user the time of manually tagging data and is, in a way, moving from an essay-based exam to a multiple-choice one.

For example, in my blog I use an extension called Zemanta, which offers various kinds of automatic linking and tagging. I accept less than 10% of what it offers me, but that 10% adds more structure to my blog posts.

#22 Collective knowledge systems | Igniting Startups - nPost on 01.19.10 at 11:11 am

[...] From cdixon.org [...]

#23 Entropy and the Future of the Web « Collective Web on 01.19.10 at 3:51 pm

[...] by this post of Chris Dixon, I summarized my thoughts on the future of the web in a single tweet like this: The [...]

#24 ronald on 01.19.10 at 3:58 pm

I think the problem has deeper roots. Our Information theory is rooted in physics. Try to TEACH any system the meaning of “all” and its usage (knowledge) after you have a system which can do that. Can it learn Trust?
We have no agreed-upon definition, for this context, of what Information is and how it can be tested. Same goes for Knowledge and …

What is learning? How do we test it? Just memorization? In/ex-cludes building of abstractions (see above)? How important is learning, or is just following some obscure rules, with no understanding (what's that), good enough?

Or for the CS challenged, build a massive parallel system without locks; you will need that for “all”. For a little more advancement let it do its own decomposition, which means it should be able to LEARN math. If you think we are born with some magical math algorithm, please explain one-two-many cultures.

After that we can talk about Information and Intelligence.

#25 David Semeria on 01.19.10 at 8:24 pm

Good points. I argued for similar “low tech” innovations in UI & UX design in a reply below.

The goal is to obtain (at least some) human curation with the lowest possible friction.

#26 idogreen on 01.26.10 at 11:44 pm

Thank you for the good post.
I think this type of innovation will start bottom-up by nature (e.g. in startups and informally). In big/lazy/heavy organizations, people/developers/and other PhDs like to work on problems that are 'pure' and not tied to the surface of the real world.

#28 Information Overload + Architecture | Architecture and Anthropology on 02.01.10 at 1:44 am

[...] Collective knowledge systems (cdixon.org) [...]

#29 Can Search Discover the Spark of Life? » Victus Spiritus on 02.06.10 at 4:16 pm

[...] Collective knowledge systems (cdixon.org) [...]

#30 Internet Strategy for News Organisations » Blog Archive » Course Syllabus on 06.03.10 at 2:20 am

[...] Collective Knowledge Systems, Chris Dixon [...]

#31 Internet Strategy for News Organisations » Session 4: Software for communities on 06.09.10 at 12:53 pm

[...] Collective Knowledge Systems, Chris Dixon [...]

#32 Notes from the RWW Real-Time Web Summit | outside.in blog on 06.16.10 at 10:00 am

[...] “crowdsourced” was pejorative. He also mentioned a blog post by Chris Dixon (definitely this one) that had posited that the most important startups in the past decade had been based on collective [...]
