# Chris Dixon

Graphs

It has become customary to use “graph” to refer to the underlying data structures at social networks like Facebook. (Computer scientists call the study of graphs “network theory,” but on the web the word “network” is used to refer to the websites themselves).

A graph consists of a set of nodes connected by edges. The original internet graph is the web itself, where webpages are nodes and links are edges. In social graphs, the nodes are people and the edges friendship. Edges are what mathematicians call relations. Two important properties that relations can either have or not have are symmetry (if A ~ B then B ~ A) and transitivity (if A ~ B and B ~ C then A ~ C).
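To make the two properties concrete, here is a minimal Python sketch that treats a graph as a set of ordered pairs and tests it for symmetry and transitivity. The function names and the toy data are illustrative assumptions, not anything taken from a real social network.

```python
from itertools import product


def is_symmetric(edges):
    """True if every edge (a, b) has a matching edge (b, a)."""
    return all((b, a) in edges for a, b in edges)


def is_transitive(edges):
    """True if (a, b) and (b, c) together always imply (a, c)."""
    return all(
        (a, d) in edges
        for (a, b), (c, d) in product(edges, repeat=2)
        if b == c
    )


# Hypothetical toy data: mutual friendships stored in both directions.
friends = {("ann", "bob"), ("bob", "ann"), ("bob", "cat"), ("cat", "bob")}

# Hypothetical toy data: one-way follows.
follows = {("ann", "bob"), ("bob", "cat")}

print(is_symmetric(friends), is_transitive(friends))
print(is_symmetric(follows), is_transitive(follows))
```

The friendship set passes the symmetry check but fails transitivity, because ann and cat share a friend without being friends themselves.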

Facebook’s social graph is symmetric (if I am friends with you then you are friends with me) but not transitive (I can be friends with you without being friends with your friend). You could say friendship is probabilistically transitive in the sense that I am more likely to like someone who is a friend’s friend than I am a user chosen at random. This is the basis of Facebook’s friend recommendations.
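One simple way to act on probabilistic transitivity is to rank friends of friends by how many mutual friends they share with you. The sketch below is only an illustration of that idea under assumed toy data; it is not a description of how Facebook actually computes recommendations, and `recommend_friends` is a hypothetical name.

```python
from collections import Counter


def recommend_friends(graph, user, limit=3):
    """Rank people who are not yet friends of `user` by mutual-friend count."""
    mutual = Counter()
    for friend in graph[user]:
        for candidate in graph[friend]:
            # Skip the user and anyone they are already friends with.
            if candidate != user and candidate not in graph[user]:
                mutual[candidate] += 1
    return [name for name, _ in mutual.most_common(limit)]


# Hypothetical symmetric friendship graph as an adjacency mapping.
graph = {
    "ann": {"bob", "cat"},
    "bob": {"ann", "cat", "dan"},
    "cat": {"ann", "bob", "dan", "eve"},
    "dan": {"bob", "cat"},
    "eve": {"cat"},
}

print(recommend_friends(graph, "ann"))
```

Here dan shares two friends with ann and eve shares one, so dan is suggested first.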

Twitter’s graph is probably best thought of as an interest graph. One of Twitter’s central innovations was to discard symmetry: you can follow someone without them following you. This allowed Twitter to evolve into an extremely useful publishing platform, replacing RSS for many people. The Twitter graph isn’t transitive but one of its most powerful uses is retweeting, which gives the Twitter graph what might be called curated transitivity.
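A rough sketch of what curated transitivity means in practice: in a directed follow graph, a tweet reaches the author's followers, and it reaches followers of followers only when someone in between chooses to retweet it. The data layout and the `audience` function are assumptions made for illustration, not Twitter's actual delivery model.

```python
def audience(followers, author, retweeters=()):
    """Users who see a tweet: the author's followers plus each retweeter's followers."""
    seen = set(followers.get(author, set()))
    for user in retweeters:
        seen |= followers.get(user, set())
    seen.discard(author)  # the author does not count as their own audience
    return seen


# Hypothetical data: followers[x] is the set of users who follow x.
# ann follows bob, and bob follows cat, but ann does not follow cat.
followers = {"cat": {"bob"}, "bob": {"ann"}}

print(audience(followers, "cat"))
print(audience(followers, "cat", retweeters=["bob"]))
```

Without a retweet only bob sees cat's tweet; once bob retweets it, ann sees it too, even though she never followed cat.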

Graphs can be implicitly or explicitly created by users. Facebook and Twitter’s graphs were explicitly created by users (although Twitter’s Suggested User List made much of the graph de facto implicit). Google Buzz attempted to create a social graph implicitly from users’ emailing patterns, which didn’t seem to work very well.

Over the next few years we’ll see the rising importance of other types of graphs. Some examples:

Taste: At Hunch we’ve created what we call the taste graph. We created this implicitly from questions answered by users and other data sources. Our thesis is that for many activities – for example deciding what movie to see or blouse to buy – it’s more useful to have the neighbors on your graph be people with similar tastes versus people who are your friends.
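As a hedged illustration of building a graph implicitly from answers, the sketch below links users by the fraction of shared questions they answered the same way and picks the closest neighbors. The similarity measure, function names, and data are assumptions chosen for simplicity; the post does not say how the actual taste graph is computed.

```python
def taste_similarity(a, b):
    """Fraction of shared questions on which two users gave the same answer."""
    shared = a.keys() & b.keys()
    if not shared:
        return 0.0
    return sum(a[q] == b[q] for q in shared) / len(shared)


def taste_neighbors(answers, user, k=2):
    """The k other users whose answers agree most with `user`."""
    others = [name for name in answers if name != user]
    others.sort(
        key=lambda name: taste_similarity(answers[user], answers[name]),
        reverse=True,
    )
    return others[:k]


# Hypothetical answers: user -> {question id: chosen option}.
answers = {
    "ann": {"q1": "a", "q2": "b", "q3": "a"},
    "bob": {"q1": "a", "q2": "b", "q3": "b"},
    "cat": {"q1": "b", "q2": "a"},
    "dan": {"q1": "a", "q2": "b", "q3": "a"},
}

print(taste_neighbors(answers, "ann"))
```

Note that nothing here requires ann to know dan or bob; the edges come entirely from similar answers.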

Financial Trust: Social payment startups like Square and Venmo are creating financial graphs – the nodes are people and institutions and the relations are financial trust. These graphs are useful for preventing fraud, streamlining transactions, and lowering the barrier to accepting non-cash payments.

Endorsement: An endorsement graph is one in which people endorse institutions, products, services or other people for a particular skill or activity. LinkedIn created a successful professional graph and a less successful endorsement graph. Facebook seems to be trying to layer an endorsement graph on its social graph with its Like feature. A general endorsement graph could be useful for purchasing decisions and hence highly monetizable.

Local: Location-based startups like Foursquare let users create social graphs (which might evolve into better social graphs than what Facebook has since users seem to be more selective friending people in local apps). But probably more interesting are the people and venue graphs created by the check-in patterns. These local graphs could be useful for, among other things, recommendations, coupons, and advertising.

Besides creating graphs, Facebook and Twitter (via Facebook Connect and OAuth) created identity systems that are extremely useful for the creation of 3rd party graphs. I expect we’ll look back on the next few years as the golden age of graph innovation.

Great post, Chris. There will be even more graphs, for sure. One that doesn't get much attention is the one organized around advertising. I call it the “brand social graph”, which gets assembled based on interests, influence and where natural brand affinities exist. I blogged about it here: http://dpakman.wordpress.com/2010/04/28/the-bra

• http://iamnotaprogrammer.com/ Colin Nederkoorn

I've been thinking recently about graphs as they apply to ChallengePost.com. A lot of sites that have sprung up in the past few years shoved in “Add your friends” even when it didn't make sense. I guess it makes it easy to be viral to invite people to mass add their friends. In many cases, my friends don't care about the same things I do. Twitter is a great example of mixing friends and interests. The mush between my ears is starting to churn again. Good food for thought!

• http://blog.nahurst.com/ Nathan Hurst

It will be interesting (maybe not useful) to see how reflexivity (A ~ A) starts to play into these graphs. Example: if a financial trust graph was symmetric, non transitive, and non reflexive (you don't trust yourself), those you trust would have to approve financial transactions for you (preventing you from buying too many gadgets maybe).

• CJ

Interesting post Chris… though this begs the question, are there going to be graphs catered across several verticals or is there going to be just one core graph with dimensions in each of those areas?

You could argue that Facebook's social graph is going to make serious headway into Taste, Financial Trust, Endorsement, and Local.

Time will tell though I guess.

• Warren

Great insight to all! At InfiniteGraph I try to think about where things are today, and where it's going. Some of the graph examples such as Twitter are so simple in terms of leveraging the edges (two vertices, one edge…yaaawn). But if you see the value of looking beyond a second vertex, what do you see? And if that's interesting, then what's beyond that? This type of thinking makes better use of graph database technology, and that's definitely interesting stuff. Thanks!

• Pingback: Links for 7-22-2010 | Sue Cline

• Lateefivy

I like the “highly monetizable” part. I'll build an endorsement graph.

• http://www.venturevoice.com gregory

Chris, Great post! We're building something like what you describe in endorsement graph at http://Shoutworthy.com

What do you call it when graphs mix? Hunch, as an example, lets me sign in with Twitter/Facebook but then offers its own follow graph and, as you point out, taste graph.

• http://hdemott.wordpress.com Harry DeMott

Interesting. I always viewed the “social graph” as a sort of digital residue – “I went here, I went there, I liked this movie, great gelato here, hey where are you drinking tonight”. Not all that unlike the WW II moniker “Kilroy was here” seen later by folks passing through. The graphs are just an organizational framework thrown over the residue: currently feeding on and off each other (let me check my share on twitter and facebook boxes below) – and generating some level of utility for the users and an enormous amount of success for those who succeed in organizing a successful graph. Most media is like this – magazines are interest graphs like twitter – completely unidirectional, unless you see an article you like and pass along your copy of the magazine, not highly efficient like a re-tweet – but the same principle. To me, the question is how to pare down your use of graphs so that you get everything you need – and some random interesting stuff without drowning in inputs. To me, twitter, facebook, foursquare etc. are all interesting and highly useful graphs – but they present the data in a less than efficient way. However, I was blown away by flipboard – which can present a lot of this data in a far more compelling and easier to consume manner.

• http://earnedmedia.wordpress.com/ Christian Brucculeri

I'm not sure who these people are:

“This allowed Twitter to evolve into an extremely useful publishing platform, replacing RSS for many people”

I know for me, Twitter moves too quickly to replace RSS. Also, and probably more importantly, people can't be contained in a single topic– so you end up with so much noise when all you want is topical content.

For example, I really enjoy this blog: it's got its own module on my iGoogle. But, I would never put your Twitter feed in that space. Not because I don't enjoy your tweets, I just wouldn't want that type of content to take up that much real estate in my brain.

• http://www.cdixon.org chris dixon

I still use RSS but it's mostly for “what did I miss yesterday” instead of my primary newsreading software which has become Twitter.

• http://earnedmedia.wordpress.com/ Christian Brucculeri

You've inspired me to optimize my Twitter feed.

• tomewing

As people's awareness of their graphs and the importance of them grows I suspect one of the next developments will be giving users more control over them – not just at the “who I follow” level but at a more macro level too.

For instance, privacy controls are a way of managing a social graph, but they only affect transmission of information. In a more graph-literate culture though it's easy to imagine tools that let you manage reception better: not just via filtering, groups, recommendations but by eg. letting you adjust the noise/serendipity level of a flow of info, or the ratio of information to personal updates, or even the mood of the updates you're getting – a kind of social thermostat.

• http://www.leftbraintorightbrain.com/ Scott Carleton

Thanks for clearing up where the term 'graph' came from in this context.

Very interesting post. How do you think the big 3 GAMe players (Google, Apple, Microsoft) are doing with Taste, Financial Trust, Endorsement and Local?

• http://www.cdixon.org chris dixon

They seem to be kind of out of the game for now…

• Marc Hedlund

Facebook has implicitly transitive friend sharing – my comments show up for you if we have a friend in common, even though I was talking to my friend.

Flickr did asymmetric friending before Twitter did; I'd thought Twitter learned it there.

I think you're putting theory before practice on “other types of graphs” (and my theory is practice always comes first – Facebook was a way to hook up long before it was called a social graph). But the ideas are interesting. I hadn't thought of Square as a trust system even though it's obvious once you read it.

• JMcEntire

I like that “Post as …” requires me to type something so that we defer the value proposition until later thus making me decide between the effort/security concerns of logging in versus the loss of the effort involved in writing this comment. Talk about a UX anti-pattern.

Anyway. Graphs aren't new or being innovated, imo. They're old and some people (Web 2.0 types, execs, CS nerds, et cetera) are currently in the process of understanding them and recognizing them. In fact, I've been waiting and hoping for a while now that someone would come along and do what Facebook/LinkedIn/Twitter/Foursquare/et al ought to have been all along. That is to say: be a proactive social networking website. These insights you've noted here aren't new, they're just unexplored because people aren't prepared culturally to recognize the possibilities of the underlying mathematical models from graph theory.

Similarly, NoSQL isn't new. Codd mentioned almost all of the solutions we see today in his white paper. But, no one has read it and so these “new, innovative” solutions get a lot of press. C'est la vie.

tl;dr – someone build a “social networking” website that doesn't tell me what I told it or what my friends told it. Build one that develops a profile of its users in n-space and analyzes that data to suggest new things. We didn't learn about Trogdor the Burninator by chance. People with similar tastes in those sorts of things told us about him. Why couldn't Facebook have done that? Imagine logging on and being told about a band playing at a pub near the office that's hosting your 4:30. You've not heard the band or been to the pub; but, based upon your tastes, the site can recommend it with an 82% certainty. Oh, and try their house-specialty drink, it's right up your alley.

• tetris

At the risk of being too self-promoting, the predictive stuff you talk about is exactly what we are trying to do at Hunch.

• http://www.cdixon.org chris dixon

at the risk of being self-promoting, the predictive stuff you describe is exactly what we are trying to do at Hunch.

Excellent post. A graph database is awesome if you want to work with graph-shaped data. Here's a short Scobleizer interview that introduces graph databases and shows how you can take a social graph (people <-> people) plus a geo graph (places <-> places) and evolve it into an LBS graph (people <-> places):

I agree that these are the years of graph innovation: there's lots of high-value use cases emerging right now where the value is in the *connection* between things, not the raw data itself. A graph database like http://neo4j.org can traverse 1-2 *MILLION* hops in a graph per second. You can discover a lot of interesting patterns and relationships in your data with that kind of backend.

-EE

Chris,

I've been pretty skeptical of “social graph” and related concepts, as per http://www.dbms2.com/2010/06/08/profile-of-reve… . In essence, I think the parts of the data that are most naturally represented as a graph are only a relatively small part of the whole, which in the ideal case I called a “Profile of Revealed Preferences.”

• http://www.3scalesolutions.net stevenwillmott

Great post. I'd add another graph – “Influence” or “social power” which affects how information/ideas propagate. In a way this is already present in the twitter, facebook, hunch graphs and there are different dimensions of influence (I might be influenced by someone's fashion taste, but not their political views).

Another one which we deal with poorly on the web right now is time: we almost always perceive the Web as in a state of “now” and there are some resources which talk about the past (wikipedia articles on Charles Darwin for example), but almost all information sources don't even publish the date of a story on the page. However, hopefully this will change and this will make it possible to construct graphs of the flow of ideas / events across time which is very hard to do now.

• http://www.victusspiritus.com/ Mark Essel

One of my driving interests is customized search agents. Hunch has endeavored to scale taste into massive cluster based decisions (abductive reasoning). Curious to see if there are social apps waiting to be powered by Hunch. Extreme echo chambers hehe.

• amolsarva

Feels like graph is “just” a new name for behavioral data? Reaching back: Netflix and Amazon collaboratively filtered recommendations, Doubleclick/Netgravity clicktrail-targeted ads, Nielsen TV habits, direct mailer algorithms…

I do agree that it keeps getting cheaper and easier to collect and compute data about people.

What's neat about Hunch is you are collecting completely new data

Here are some new kinds of data we can collect now
- where you are (with check in or connected gps)
- what you say (email, text, tweet)
- what you read (browser, ebooks, email text)
- what you consider buying (browser)

Must be some cool “graphs” to come from that too

• Pingback: It’s What You Node « Vodpod Blog

Great breakdown of some of the most relevant graphs. I think the current wave of activity around graphs is to understand them first as static structures to get at the meaning of relationships and suggest new, potentially value-adding relationships. Expect a later wave of activity to look at graph dynamics to better understand how graphs grow over time and in response to certain inputs and events. Which events have the greatest impact per unit time? What are the strategies for most quickly expanding a graph at low cost?? Exploring cascades, graph velocity, expansion patterns etc. will lead to valuable insight into behavior and influence.

• georgi

I don't agree with the lack of transitivity on Twitter. After all, that's what retweets are all about – propagation across the graph.

• Pingback: Facebook 想要你所有的社会化图谱 | SocialBeta

Your blog just came in time! We have included 3 out of the four graphs that you have mentioned in our product – Taste, Endorsement and Local. Feel free to try http://www.myBantu.com and give feedback. Will be glad to share what we are doing, if you are interested – Raman Suprajarama

• http://www.graduatetutor.com Senith @ MBA tutor

Excellent way to look at these services. Explains the value of many services I took for granted in a new way! Thank you

• http://www.graduatetutor.com Senith @ MBA tutor

The nature of the activities decides the types of graphs that are possible. Some activities lend themselves to graphing and some don't. I run an online tutoring service targeted at MBA students. Being competitive in nature, most want to get ahead in their class or impress others and so tend not to want to share the news of a service like http://www.graduatetutor.com. So unless I am missing something, I think it is very difficult to build a graph for a service like this. If anyone has ideas, we are all ears.

PS: Some segments of our customer base do not mind sharing our service for example we also tutor executives or professionals trying to build a specific skill like financial modeling, forecasting, etc and we notice that this segment does refer other friends or colleagues.

• http://homeloanninjas.com homeloan_ninja

wow. very nice post. i have read it twice without leaving the page.

i agree about twitter. it has become my primary source of news. and excellent point on people's associative behavior on local graphs. i never really looked at it that way, but it makes sense. i'd rather befriend a weirdo halfway around the world than one in my own back yard.

• David Frankel

Outstanding analysis Chris

• joepistell

The Great Wall of Social Media is upon us.

We all have a finite amount of time to participate with SM. “I'm the mayor of Bill's Deli” or “I'm in a taxi going west” works in A~B relationships, but the greater the “degrees of separation” the more Social media's output becomes spam to the viewer.

In A~B, “I'm at JFK” works. But if C is a road warrior, A gets SM fatigue with “I'm at MCO”, followed by “I'm at JFK” followed by “I've earned a new badge at the Marriott”. For A~C, 4sq is 88% SPAM.

Did I mention how little I look at tweets lately because of 4sq spam?

The Great Wall of Social Media is upon us.

• Justin Meltzer

Very interesting post. I took a class last year at UPenn with professor Michael Kearns called Networked Life that deeply explored Network Science. We learned about different graph creation models to study certain properties of these graphs, such as the average diameter (the average path length between every pair of nodes, a measure of how connected the graph is), the clustering coefficient (the average, across nodes, of the fraction of each node's neighbors that are connected to each other, a measure of how clustered the graph is), and the existence of a large connected component.

Real life network graphs, such as the western states power grid, the nervous system of the C. elegans worm, the Kevin Bacon graph, and even social graphs like the one found on Facebook seem to display a small average diameter and a high clustering coefficient. Even the six degrees of separation experiment shows us that social connections throughout the U.S. extend far, and that two seemingly disconnected people are really only a few social hops away.

A small average diameter and a high clustering coefficient, however, seem at odds with one another. A graph with a lot of random connections would tend to allow for a smaller diameter, while a graph with a lot of connections being formed between friends who already have common friends would lead to a high clustering coefficient, but would not necessarily decrease the average path length throughout the entire graph. However, a mathematical model called the Alpha model was created to show the simultaneous existence of these two properties that are seemingly at odds with one another, yet exist in many real-life networks. Other networks that network science can help explain are the internet (already mentioned by Chris), the router system extending throughout the U.S., and trading networks among countries. It was a very interesting class.
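For readers who want to try the two measures from the comment above, here is a small self-contained Python sketch. It assumes an undirected graph stored as a dictionary of neighbor sets, computes what the comment calls the average diameter as the mean shortest-path length over connected pairs, and scores nodes with fewer than two neighbors as zero for clustering. The function names and the toy graph are illustrative assumptions.

```python
from collections import deque
from itertools import combinations


def average_path_length(graph):
    """Mean shortest-path length over all connected pairs of distinct nodes."""
    total = pairs = 0
    for source in graph:
        # Breadth-first search gives hop counts from `source`.
        dist = {source: 0}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            for neighbor in graph[node]:
                if neighbor not in dist:
                    dist[neighbor] = dist[node] + 1
                    queue.append(neighbor)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs if pairs else 0.0


def clustering_coefficient(graph):
    """Average, over nodes, of the fraction of neighbor pairs that are linked."""
    scores = []
    for neighbors in graph.values():
        if len(neighbors) < 2:
            scores.append(0.0)  # assumption: too few neighbors counts as zero
            continue
        possible = len(neighbors) * (len(neighbors) - 1) / 2
        linked = sum(1 for u, v in combinations(neighbors, 2) if v in graph[u])
        scores.append(linked / possible)
    return sum(scores) / len(scores) if scores else 0.0


# Hypothetical toy graph: a triangle a-b-c with d hanging off c.
graph = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c"},
}

print(average_path_length(graph), clustering_coefficient(graph))
```

On this toy graph the average path length works out to 4/3 and the clustering coefficient to 7/12. A real social graph would be far too large for this all-pairs approach, so treat it as a way to check intuition on small examples.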

• http://www.perfectoled.com led billboard

Great post

• Blaine Cook

Nicely put. If you haven't already, take a look at what we're doing with Webfinger, PubSubHubbub, and related tech; basically, the idea is to move the graphs out of companies' websites and into the “internet”.

In my mind, this transfer would make sites like Hunch and Foursquare *more* monetizable. Network effects mean that if your service is competitive on its merits, rather than just on the extent to which you can extract a monopoly on the social graph in your field, then your ability to extract value from the part of the network that you control is far greater.

Take, for example, SMS in the US. Before the carriers allowed their customers to text the customers on other networks, SMS was essentially zero-value. After they “federated”, the value of SMS was immense, particularly compared to costs.

There are plenty of other examples from the past, but it's worthwhile to perform thought experiments with an eye to the future. What does Foursquare look like when its graph is shared with Gowalla?

• Pingback: Recent Bookmarks

Spoken like a true Valley VC.

Users don't create graphs, they interact and the software that tracks their interactions infers relationships that are by their very nature backward facing. Shaping the value of the interactions by calling them graphs or behaviours in reality provides value to the host or software vendor usually in order to justify fees to users or advertisers or both.

It's a simple trade, use of a platform for tracking you like a mouse in a maze until the data justifies the exit price.

The platforms have provided value though in Silicon Valley all software is a commodity including Hunch.

Software usually provides two forms of value, process acceleration and compliance and Hunch has some room to add more value in both of these areas. The open question however is whether Catherine's camera purchase works for me if I ask the same questions. I'd say probably not as my story, the processes that I need accelerating, my software, brand preferences and the way I measure must be different.

Here is a Semantic SEO map of the way people and Search Engine spiders view a few of Hunch's Twitter feed — Dog Food, Wine, Real Estate, Dating, Gay Marriage and Bread.