Chris Dixon

There are two ways to make large datasets useful

I’ve spent the majority of my career building technologies that try to do useful things with large datasets.*

One of the most important lessons I’ve learned is that there are only two ways to make useful products out of large datasets. Algorithms that deal with large datasets tend to be accurate 80-90% of the time at best (an old “joke” about machine learning is that it’s really good at partially solving any problem). Consequently, you either need to accept that you’ll have some errors and deploy the system in a fault-tolerant context, or you need to figure out how to get the remaining accuracy through manual labor.

What do I mean by fault-tolerant context? If a search engine shows the most relevant result as the 2nd or 3rd result, users are still pretty happy. The same goes for recommendation systems that show multiple results (e.g. Netflix). Trading systems that hedge funds use are also often fault tolerant: if you make money 80% of the time and lose it 20% of the time, you can still usually have a profitable system.
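
To make the trading arithmetic concrete, here is a minimal sketch. The function names and the dollar amounts are illustrative assumptions, not a description of any real trading system. It shows why the "usually" matters: an 80% win rate is profitable only as long as the average loss is less than roughly four times the average win.

```python
def expected_profit_per_trade(win_rate, avg_win, avg_loss):
    """Expected profit of one trade, given the win probability and the
    average size of a winning and a losing trade (hypothetical inputs)."""
    return win_rate * avg_win - (1 - win_rate) * avg_loss


def break_even_loss(win_rate, avg_win):
    """Largest average loss at which the strategy still breaks even."""
    return win_rate * avg_win / (1 - win_rate)


# Winning $1 on 80% of trades and losing $1 on 20% is comfortably profitable.
profit_small_losses = expected_profit_per_trade(0.8, 1.0, 1.0)

# The same win rate loses money if the average loss is 5x the average win.
profit_large_losses = expected_profit_per_trade(0.8, 1.0, 5.0)

# With an 80% win rate, the break-even average loss is about 4x the average win.
max_loss = break_even_loss(0.8, 1.0)
```

In other words, the context is fault tolerant only when the cost of each error is bounded relative to the gain from each correct call.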

For fault-intolerant contexts, you need to figure out how to scalably and cost-effectively produce the remaining accuracy through manual labor. When we were building SiteAdvisor, we knew that any inaccuracies would be a big problem: incorrectly rating a website as unsafe hurts the website, and incorrectly rating a website as safe hurts the user. Because we knew automation would only get us 80-90% accuracy, we built 1) systems to estimate confidence levels in our ratings so we would know what to manually review, and 2) a workflow system so that our staff, an offshore team we hired, and users could flag or fix inaccuracies.
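
The first of those two pieces, using confidence estimates to decide what gets manual review, can be sketched as follows. This is not SiteAdvisor's actual code; the `Rating` type, the `triage` function, the 0.9 threshold, and the example sites are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Rating:
    site: str
    label: str         # "safe" or "unsafe"
    confidence: float  # estimated probability that the label is correct


def triage(ratings, threshold=0.9):
    """Split ratings into those published automatically and a manual
    review queue, with the least confident ratings reviewed first."""
    auto = [r for r in ratings if r.confidence >= threshold]
    review = sorted(
        (r for r in ratings if r.confidence < threshold),
        key=lambda r: r.confidence,
    )
    return auto, review


ratings = [
    Rating("a.example", "safe", 0.99),
    Rating("b.example", "unsafe", 0.62),
    Rating("c.example", "safe", 0.85),
]
auto, review = triage(ratings)
```

The threshold is the cost lever: raising it sends more ratings to people and buys more accuracy, while lowering it saves labor and lets more errors through.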

* My first job was as a programmer at a hedge fund, where we built systems that analyzed large data sets to trade stock options. Later, I cofounded SiteAdvisor where the goal was to build a system to assign security safety ratings to tens of millions of websites. Then I cofounded Hunch, which was acquired by eBay – we are now working on new recommendation technologies for ebay.com and other eBay websites.

Increasing velocity

Two common discussions in the startup world right now are 1) the increasing speed at which new apps/websites can gain mass adoption (Instagram, Pinterest, OMGPOP’s Draw Something, etc.), and 2) the rise in seed stage valuations. These two trends are real and related. An investor with a broad portfolio of companies might rationally invest at an average valuation of, say, $10M (which is historically considered very high for that stage) if they have a chance for one of the investments to become the next Instagram or Pinterest. A billion dollar hit pays for a lot of misses.
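
A back-of-the-envelope sketch of that last sentence, under loudly simplified assumptions: every check is the same size, every miss is a total loss, and the `dilution` parameter (a hypothetical stand-in for ownership lost in later rounds) is a guess rather than data.

```python
def misses_covered_by_one_hit(entry_valuation, exit_valuation, dilution=0.0):
    """Number of equal-sized, total-loss investments that one hit pays back.

    The hit returns (exit / entry) times the check, reduced by dilution;
    one unit of that return repays the hit's own check.
    """
    multiple = (exit_valuation / entry_valuation) * (1 - dilution)
    return multiple - 1


# Investing at a $10M valuation in a company that reaches $1B, no dilution.
undiluted = misses_covered_by_one_hit(10e6, 1e9)

# The same outcome if later rounds dilute the seed stake by half.
diluted = misses_covered_by_one_hit(10e6, 1e9, dilution=0.5)
```

Under these assumptions a single billion dollar outcome repays dozens of failed investments, which is why a high average entry price can still be rational for a broad portfolio.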

The increasing velocity has implications for the valuations of incumbent tech companies. Users have limited time, and while web and app usage are growing, hit startups are growing much faster and therefore gaining adoption, at least in part, at the expense of incumbents. It’s not clear this risk is priced into the valuations of companies like Facebook (P/E expected to be ~100) and Zynga (P/E ~31). In other words, faster velocity should lead to a narrower distribution of valuations from seed to late stages. We’ve seen the seed stage adjust but not the late stage.

The current posture of big VCs seems to be to wait to see what takes off and then chase the winners. Tons of investors tried to invest in Instagram’s A and B rounds, and I’m sure VC interest in Pinterest is intense.

The problem with this model of Series A and B investing is that, in reality, many of the companies with big hits weren’t overnight successes. Pinterest, OMGPOP, Twitter, and Tumblr were around for years before taking off and all benefited greatly from having patient investors. In the current financing environment, a lot of good companies won’t live to get Series As and Bs and big VCs will pay valuations on hits that are priced to perfection.

Increasing velocity is great for users and for the winning companies and investors. But when good companies aren’t getting follow on rounds because they aren’t yet “hockeysticking”, the long term health of the startup ecosystem suffers.

Seriously, what’s up with old media not crediting bloggers?

From my March 16 blog post “The myth of the overnight success”:

Angry Birds was Rovio’s 52nd game. They spent eight years and almost went bankrupt before finally creating their massive hit. Pinterest is one of the fastest growing websites in history, but struggled for a long time. Pinterest’s CEO recently said that they had “catastrophically small numbers” in their first year after launch, and that if he had listened to popular startup advice he probably would have quit.

Fast Company on April 3, in the opening of “The dirty little secret of overnight success”:

Angry Birds, the incredibly popular game, was software maker Rovio’s 52nd attempt. They spent eight years and nearly went bankrupt before finally creating their massive hit.

Pinterest is one of the fastest-growing websites in history, but struggled for a long time. Pinterest’s CEO recently said that it had “catastrophically small numbers” in its first year after launch and that if he had listened to popular startup advice he probably would have quit.

No link or attribution.

Update: Thanks to Fast Company for a fast response and changing it to a quote with citation.

Facebook’s response to Yahoo’s patent lawsuit

Like many in tech, I believe all software patents should be abolished. That said, I think Facebook made the right move by filing a lawsuit against Yahoo’s patent attack.

As I see it, Facebook had 4 choices:

- Settle. Given their pending IPO, this would have been the easiest route. But, by rewarding Yahoo, settling would have encouraged more frivolous patent lawsuits.

- Defend without countersuing. On the surface this would have been the “principled” stance, but it would have severely weakened their legal position, and therefore would have made it more likely that Yahoo profited from the lawsuit.

- Countersue without signaling any aversion to patent lawsuits.

- Countersue and signal that they are averse to patent lawsuits, which in turn signals that they will drop the lawsuit if Yahoo does. This seems to be what Facebook has done:

“From the outset, we said we would defend ourselves vigorously against Yahoo’s lawsuit,” Ted Ullyot, Facebook’s general counsel, said in a statement. “While we are asserting patent claims of our own, we do so in response to Yahoo’s short-sighted decision to attack one of its partners and prioritize litigation over innovation.” [emphasis added] – NYTimes

Countersuing gives Facebook the best chance of fending off Yahoo’s lawsuit – and therefore not rewarding patent lawsuits. And signaling they are only doing so in response to Yahoo (hence might drop the suit if Yahoo does) keeps them on the right side of innovation.

Revisited: big VCs investing in seed rounds

A few years ago, the trend of companies raising smaller seed rounds combined with the emergence of new seed funds caused many big VCs to create seed investment programs. This triggered a debate among entrepreneurs and investors about whether it was risky for seed-stage companies to take small investments from large VCs. (I blogged about the issue here, here, here).

Since then, enough founders have directly experienced the downside of taking seed money from big VCs that I think it’s safe to say there is no more room for debate. I can think of about 15 founders I’ve spoken to recently who tried or are trying to raise Series As but are seriously hampered by the fact that a big VC invested in the seed round but isn’t participating in the Series A. (I’d love to mention specific companies and firms but it wouldn’t be appropriate for me to do so – I guess I’ll just have to cite Jay Rosen’s “I’m there, let me tell you what I see” principle of reporting).

There are two important nuances to point out here. First, there are big VCs who invest in seed rounds the right way – with the genuine expectation to follow on and the intention to help out during the seed stage (some that I’ve invested with include USV, True, and Spark). One important sign of this is how much they want to invest. If a $300M fund wants to invest $100K, they are buying an option. If they want to invest $500K, they are more likely making an investment.

The second nuance can be counterintuitive: the danger of taking seed money is positively correlated with the reputation of the firm. If a top VC invests in the seed round and then passes on the A, other VCs will have difficulty overlooking that the smartest money that knows the company the best isn’t following on. If the VC isn’t well respected, it is easier for other VCs to second guess them.

I’m not revisiting this issue to criticize big VCs. A healthy startup environment requires smart, ethical investors at all stages. But I don’t think these big VC seed programs benefit anyone. And there are enough angry entrepreneurs out there that I expect the message will get through.

Give away the diagnostic, sell the remedy

Companies that employ the “freemium” business model give away a product or service for free and then charge for additional features. The freemium model has gotten more popular as the cost to deliver free services has dropped but the cost of employing sales and marketing people hasn’t. One of the hardest questions around freemium models is deciding how to divide free from paid features.

One particularly effective version of freemium is: “give away the diagnostic, sell the remedy.” The best known example of this is anti-virus companies that give away free virus scans but charge for virus removers. In fact, this tactic works so well for anti-virus that it almost seems coercive (and has indeed been abused, for example, by “anti-spyware” software that deliberately conflates cookies and viruses). But, in general, giving away a diagnostic seems like a reasonable way to demonstrate the effectiveness of a product while still being able to sell valuable additional features.

Selling the remedy has become increasingly popular with B2B companies. For example, a friend recently wanted to ensure that his company’s (non-spam) e-mails weren’t getting blocked by spam filters, so he contacted an “email delivery optimization” company. They ran a free test and reported that his emails weren’t getting filtered. Two months later they called back and said “uh oh, your emails are getting filtered.” Sure enough his open rates had dropped and his anecdotal tests confirmed that his emails were being inaccurately labelled as spam. Because of the free diagnostic, he had confidence in the company’s technology, and was willing to pay them to fix his problem. And the email optimization company had spent almost nothing to acquire a new customer.

 

The myth of the overnight success

Angry Birds was Rovio’s 52nd game. They spent eight years and almost went bankrupt before finally creating their massive hit. Pinterest is one of the fastest growing websites in history, but struggled for a long time. Pinterest’s CEO recently said that they had “catastrophically small numbers” in their first year after launch, and that if he had listened to popular startup advice he probably would have quit.

You tend to hear about startups when they are successful but not when they are struggling. This creates a systematically distorted perception that companies succeed overnight. Almost always, when you learn the backstory, you find that behind every “overnight success” is a story of entrepreneurs toiling away for years, with very few people except themselves and perhaps a few friends, users, and investors supporting them.

Startups are hard, but they can also go from difficult to great incredibly quickly. You just need to survive long enough and keep going so you can create your 52nd game.

 

The problem with investing based on pattern recognition

A famous story in artificial intelligence is how the US military developed algorithms to determine whether an image had a tank in it. They used a standard machine learning method: feed the computer a “training set” of photos, some of which had tanks in them and some of which didn’t, and let algorithms identify which features in the photos correlated to tanks being shown.

This method worked for a while but then mysteriously stopped working. Since the features the computer identified were embedded in complicated mathematical equations, no one could figure out what it was really doing and therefore why it stopped working. Eventually someone realized that in the training set, all of the images with tanks were taken on a cloudy day, and all the images without tanks were taken on a sunny day. The algorithms had fixated on the most obvious pattern – the color of the sky. When the algorithm was tested on new photos where the weather varied, it was completely flummoxed.
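
The failure mode is easy to reproduce in miniature. The sketch below is a toy, not the military's system: the two features (`sky_brightness` and a noisy `tank_shape_score`), the numbers, and the one-rule "decision stump" learner are all invented for illustration. Because sky brightness separates the training photos perfectly, the learner picks it over the feature that actually describes tanks.

```python
def stump_accuracy(photos, feature, threshold, positive_above):
    """Accuracy of the rule 'tank if feature > threshold' (or its inverse)."""
    correct = 0
    for features, has_tank in photos:
        predicted = (features[feature] > threshold) == positive_above
        correct += predicted == has_tank
    return correct / len(photos)


def fit_stump(photos):
    """Pick the single (feature, threshold, direction) rule with the best
    training accuracy, keeping the first rule found on ties."""
    best = None
    for feature in photos[0][0]:
        for threshold in sorted({f[feature] for f, _ in photos}):
            for positive_above in (True, False):
                acc = stump_accuracy(photos, feature, threshold, positive_above)
                if best is None or acc > best[0]:
                    best = (acc, feature, threshold, positive_above)
    return best


def photo(brightness, shape_score, has_tank):
    return ({"sky_brightness": brightness, "tank_shape_score": shape_score}, has_tank)


# Training set: every tank photo is cloudy (dark sky), every other photo sunny.
train = [
    photo(0.20, 0.9, True), photo(0.30, 0.8, True),
    photo(0.25, 0.4, True), photo(0.35, 0.7, True),
    photo(0.80, 0.1, False), photo(0.90, 0.2, False),
    photo(0.85, 0.6, False), photo(0.75, 0.3, False),
]

# New photos where the weather no longer lines up with the tanks.
test = [
    photo(0.8, 0.9, True), photo(0.3, 0.8, True),
    photo(0.2, 0.1, False), photo(0.9, 0.2, False),
]

train_acc, feature, threshold, positive_above = fit_stump(train)
test_acc = stump_accuracy(test, feature, threshold, positive_above)
```

Here the learned rule keys on `sky_brightness`, scores perfectly on the training photos, and does no better than a coin flip once the weather varies.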

It is commonly said that good startup investors develop “pattern recognition” that allows them to identify great entrepreneurs and companies. If you look at the hugely successful startups of the last decade, the founders have many similarities that are easy to observe. When they started, many were male, young, unmarried, computer programmers, dropouts of elite universities, etc. As a result, a lot of investors look for founders with these characteristics. But without an understanding of the deeper reasons these founders succeeded, these observable characteristics could just as well be the color of the sky and not the tanks.

At the level of individual investors, pattern recognition can lead to bad investments and missed opportunities. In the context of markets, it can cause companies and sectors with the “right patterns” to be overvalued, and ones with the “wrong patterns” to be undervalued. In the broader cultural context, it can cause large groups of talented entrepreneurs to be denied access to capital.

The classic scientific method provides a better model for investing. Scientists observe data, notice patterns, develop hypotheses, and then test those hypotheses. Pattern recognition is only a step along the way to developing hypotheses about the underlying cause.

Perhaps dropping out of college shows a strong level of commitment. Knowing computer science was probably a necessary condition for starting a tech company in the past, but no longer is. Being young could mean you are inexperienced enough to pursue bold ideas that more experienced people would consider crazy. I am just speculating – I don’t know why these characteristics are common among past successful founders. But the mere repetition of patterns shouldn’t be satisfactory to anyone who wants to understand and predict the success of startups.

Some tips for interacting with the press

Here are a few things I’ve learned over the years about the best ways for entrepreneurs to interact with the press (by press I mean blogs as well as traditional media).

- Don’t be afraid to ask what the rules are. Is this on or off the record? If they are writing an article about your company, do they require exclusivity? What is the angle of the story?

- Don’t use a PR firm unless you are so successful that you need someone to help you manage inbound press interest. Most journalists, when talking candidly, will tell you they’d vastly prefer getting an email from the founder of a startup to getting one from a PR firm. If you’re Bill Gates, it is understandable that you have someone reaching out for you. If you are a small startup, having a PR rep contact a journalist says “I’m not competent enough to reach you” or “I don’t respect your time enough to reach out directly.”

- Treat journalists with respect. Tech/business journalists often interact with rich and powerful people, some of whom treat them disrespectfully. Like entrepreneurs, journalists are usually interesting people with diverse interests. You’ll probably like them if you talk to them and might even become friends.

- Unless you’re a super hot startup, the existence of your company is not a news story. Exclusives of launches, financings and acquisitions are usually news stories. Trend stories that you are part of could be a news story. Relating your startup or data your startup generates to something already newsworthy (journalists call this “pegging”) can dramatically increase your chances of getting covered.

- Whether you like it or not, the press will put your company into a category, and might run “horserace” stories comparing how the companies in your category are doing. The best you can do here is to try to choose which category you’ll be put into. Arguing that you have no competitors or are creating a new category is pretty much impossible.

- Try to put yourself in the mindset of the journalist. How will this story get them on Techmeme or featured by their editors? What were their most successful recent stories? Do background research on any reporter before talking and read a bunch of his/her articles.

- Don’t just contact reporters when you need them: try to be helpful even when you don’t. Sometimes, I get calls to talk about, say, the state of the venture market or asking for some background on a tech sector that is new to the journalist. My guess is they appreciate this and are more responsive when I contact them about a possible story.

The internet is reshaping our economy from one of huge corporations with lots of jobs to huge platforms with lots of income streams

From “Innovation and the Bell Labs Miracle” in today’s NYTimes:

Innovation is an important new product or process, deployed on a large scale and having a significant impact on society and the economy, that can do a job “better, or cheaper, or both.” Regrettably, we now use the term to describe almost anything. It can describe a smartphone app or a social media tool; or it can describe the transistor or the blueprint for a cellphone system. The differences are immense. One type of innovation creates a handful of jobs and modest revenues; another, the type Mr. Kelly and his colleagues at Bell Labs repeatedly sought, creates millions of jobs and a long-lasting platform for society’s wealth and well-being.

The conflation of these different kinds of innovations seems to be leading us toward a belief that small groups of profit-seeking entrepreneurs turning out innovative consumer products are as effective as our innovative forebears. History does not support this belief. The teams at Bell Labs that invented the laser, transistor and solar cell were not seeking profits. They were seeking understanding. Yet in the process they created not only new products but entirely new — and lucrative — industries.

Putting aside the obvious rebuttal that large companies like Intel, Microsoft, Apple and even AT&T were once startups, the author seems to confuse “jobs” with “income streams”. For example, it would be easy to dismiss a website like Craigslist as a “social media tool” that has only created a few dozen jobs for its employees. But in fact it has created billions of dollars of income streams for people buying and selling things on its platform. The internet is increasingly reshaping our economy from one of huge corporations with lots of jobs to huge platforms with lots of income streams.