The Babe Ruth Effect in Venture Capital

“How to hit home runs: I swing as hard as I can, and I try to swing right through the ball… The harder you grip the bat, the more you can swing it through the ball, and the farther the ball will go. I swing big, with everything I’ve got. I hit big or I miss big.”  – Babe Ruth

One of the hardest concepts to internalize for those new to VC is what is known as the “Babe Ruth effect”:

Building a portfolio that can deliver superior performance requires that you evaluate each investment using expected value analysis. What is striking is that the leading thinkers across varied fields — including horse betting, casino gambling, and investing — all emphasize the same point. We call it the Babe Ruth effect: even though Ruth struck out a lot, he was one of baseball’s greatest hitters. – “The Babe Ruth Effect: Frequency vs Magnitude”
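To make the frequency-versus-magnitude point concrete, here is a minimal Python sketch of expected value analysis. The two bets, their probabilities, and their payoffs are made up for illustration and are not drawn from any real data.

```python
def expected_value(outcomes):
    """Expected payoff of a bet given (probability, payoff_multiple) pairs."""
    total_probability = sum(p for p, _ in outcomes)
    if abs(total_probability - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    return sum(p * payoff for p, payoff in outcomes)


# Hypothetical bets; payoffs are multiples of the amount invested.
# The "safe" bet wins often but small. The "swing" bet usually strikes out
# but pays off big when it connects.
safe_bet = [(0.8, 1.5), (0.2, 0.0)]
swing_bet = [(0.1, 30.0), (0.9, 0.0)]

safe_ev = expected_value(safe_bet)    # about 1.2x per dollar invested
swing_ev = expected_value(swing_bet)  # about 3.0x per dollar invested

print(f"safe bet:  wins 80% of the time, expected multiple {safe_ev:.1f}x")
print(f"swing bet: wins 10% of the time, expected multiple {swing_ev:.1f}x")
```

Under these assumed numbers the bet that loses 90% of the time still has the higher expected value, which is the whole point: how often you are right matters less than how much you make when you are right.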

The Babe Ruth effect occurs in many categories of investing, but is especially pronounced in VC. As Peter Thiel observes:

Actual [venture capital] returns are incredibly skewed. The more a VC understands this skew pattern, the better the VC. Bad VCs tend to think the dashed line is flat, i.e. that all companies are created equal, and some just fail, spin wheels, or grow. In reality you get a power law distribution.

The Babe Ruth effect is hard to internalize because people are generally predisposed to avoid losses. Behavioral economists have famously demonstrated that people feel a lot worse about losses of a given size than they feel good about gains of the same size. Losing money feels bad, even if it is part of an investment strategy that succeeds in aggregate.

People usually cite anecdotal cases when discussing this topic, because it’s difficult to get access to comprehensive VC performance data. Horsley Bridge, a highly respected investor (Limited Partner) in many VC funds, was kind enough to share with me aggregated, anonymous historical data on the distribution of investment returns across the hundreds of VC funds they’ve invested in since 1985.

As expected, the returns are highly concentrated: about 6% of investments, representing 4.5% of dollars invested, generated about 60% of the total returns. Let’s dig into the data a little more to see what separates good VC funds from bad VC funds.
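For readers who want to run this kind of concentration check on their own numbers, here is a rough sketch. The portfolio below is hypothetical (it is not the Horsley Bridge data, which I can't share at the investment level), and the function name is just something I made up for this example.

```python
def concentration(investments, top_fraction):
    """Share of total proceeds, and of dollars invested, that comes from
    the best-performing `top_fraction` of investments.

    investments: list of (dollars_invested, return_multiple) pairs.
    """
    ranked = sorted(investments, key=lambda inv: inv[1], reverse=True)
    n_top = max(1, round(len(ranked) * top_fraction))
    top = ranked[:n_top]

    total_proceeds = sum(dollars * multiple for dollars, multiple in ranked)
    total_dollars = sum(dollars for dollars, _ in ranked)
    top_proceeds = sum(dollars * multiple for dollars, multiple in top)
    top_dollars = sum(dollars for dollars, _ in top)

    return top_proceeds / total_proceeds, top_dollars / total_dollars


# Hypothetical portfolio: 20 equal-sized investments of $1 each.
portfolio = (
    [(1, 30.0)]        # one big winner
    + [(1, 3.0)] * 3   # a few decent outcomes
    + [(1, 1.0)] * 6   # money back
    + [(1, 0.0)] * 10  # write-offs
)

returns_share, dollars_share = concentration(portfolio, top_fraction=0.05)
print(f"top 5% of investments: {dollars_share:.0%} of dollars, "
      f"{returns_share:.0%} of returns")
```

In this toy portfolio a single investment, 5% of the dollars, produces about two thirds of the proceeds.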

Home runs: As expected, successful funds have more “home run” investments (defined as investments that return >10x):

[Chart: “home run” (>10x) investments, by fund performance]

(For all the charts shown, the X-axis is the performance of the VC funds: great VC funds are on the right and bad funds are on the left.)

Great funds not only have more home runs, they have home runs of greater magnitude. Here’s a chart that looks at the average performance of the “home run” (>10x) investments:

[Chart: average performance of “home run” (>10x) investments, by fund performance]

The home runs for good funds are around 20x, but the home runs for great funds are almost 70x. As Bill Gurley says: “Venture capital is not even a home run business. It’s a grand slam business.”

Strikeouts: The Y-axis on this chart is the percentage of investments that lose money:

[Chart: percentage of investments that lose money, by fund performance]

This is the same chart with the Y-axis weighted by dollars invested per investment:

[Chart: percentage of investments that lose money, weighted by dollars invested, by fund performance]

As expected, lots of investments lose money. Venture capital is a risky business.

Notice that the curves are U-shaped. It isn’t surprising that the bad funds lose money a lot, or that the good funds lose money less often than the bad funds. What is interesting and perhaps surprising is that the great funds lose money more often than good funds do. The best VC funds truly do exemplify the Babe Ruth effect: they swing hard, and either hit big or miss big. You can’t have grand slams without a lot of strikeouts.
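Here is a small sketch of how a U-shaped strikeout curve can coexist with rising returns. The three funds are invented for illustration and assume ten equal-sized investments each. The thresholds (a loss is anything under 1x, a home run is anything over 10x) follow the definitions used above.

```python
def fund_stats(multiples):
    """Summary statistics for a fund, assuming equal-sized investments."""
    n = len(multiples)
    return {
        "loss_rate": sum(1 for m in multiples if m < 1) / n,
        "home_run_rate": sum(1 for m in multiples if m > 10) / n,
        "fund_multiple": sum(multiples) / n,
    }


# Hypothetical return multiples for three funds of ten investments each.
funds = {
    "bad": [0, 0, 0, 0, 0, 0, 1, 1, 1, 2],
    "good": [0, 0, 0, 1, 1, 2, 2, 3, 5, 20],
    "great": [0, 0, 0, 0, 0, 1, 2, 3, 15, 70],
}

stats = {name: fund_stats(multiples) for name, multiples in funds.items()}

for name, s in stats.items():
    print(f"{name:>5}: loses money on {s['loss_rate']:.0%} of investments, "
          f"home runs {s['home_run_rate']:.0%}, "
          f"overall multiple {s['fund_multiple']:.1f}x")
```

With these made-up numbers the loss rate goes 60%, 30%, 50% from bad to good to great, while the overall multiple keeps climbing. The great fund strikes out more than the good fund and still returns far more.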

Exponential curves feel gradual and then sudden

“How did you go bankrupt?”
“Two ways. Gradually, then suddenly.”
― Ernest Hemingway, The Sun Also Rises

The core growth process in the technology business is a mutually reinforcing, multi-step, positive feedback loop between platforms and applications. This leads to exponential growth curves (Peter Thiel calls them power law curves), which in idealized form look like:

[Chart: idealized exponential growth curve]

The most prominent recent example of this was the positive feedback loop between smartphones (iOS and Android phones) and smartphone apps (FB, WhatsApp, etc):

[Figure: positive feedback loop between smartphones and smartphone apps]

After the fact, exponential curves look relatively smooth. When you are in the midst of them, however, they feel like they are divided into two stages: gradual and sudden.

[Chart: exponential curve divided into “gradual” and “sudden” stages]

Singularity University calls this the “deception of linear vs exponential growth”:

[Figure: linear vs exponential growth]

Today, smartphone growth seems obviously exponential. But just a few years ago many people thought smartphones were growing linearly. Even Mark Zuckerberg underestimated the importance of mobile in the “feels gradual” phase. In 2011 or so, he realized what we were experiencing was actually an exponential curve, and consequently dramatically increased Facebook’s investment in mobile:

[Chart: Facebook and mobile]

Exponential growth curves in the “feels gradual” phase are deceptive. There are many things happening today in technology that feel gradual and disappointing but will soon feel sudden and amazing.
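A quick way to feel the “gradual, then sudden” effect is to put a linear curve and an exponential curve side by side. This is only a toy sketch: the starting values, the step size, and the doubling rate are arbitrary assumptions, chosen so that the linear curve starts out far ahead.

```python
def linear(t, start=100, step=100):
    """Grows by a fixed amount each period."""
    return start + step * t


def exponential(t, start=1, factor=2):
    """Grows by a fixed multiple each period (doubling by default)."""
    return start * factor ** t


def crossover_period(max_periods=100):
    """First period in which the exponential curve exceeds the linear one."""
    for t in range(max_periods + 1):
        if exponential(t) > linear(t):
            return t
    return None


crossover = crossover_period()

for t in (0, 5, 10, crossover, 20):
    print(f"period {t:>2}: linear {linear(t):>9,}  exponential {exponential(t):>9,}")
```

With these assumptions the doubling curve trails the linear one for ten straight periods and passes it in period 11. Nine periods after that, it is hundreds of times larger. Anyone extrapolating from the first ten periods would have bet on the wrong curve.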

“It all blossomed out of this tiny little seed”

Steve Jobs in 1985:

I felt it the first time when I visited a school. It was third and fourth graders, and they had a whole classroom full of Apple II’s. I spent a few hours there, and I saw these third and fourth graders growing up completely different than I grew up because of this machine.

What hit me about it was that here was this machine that very few people designed — about four in the case of the Apple II — who gave it to some other people who didn’t know how to design it but knew how to make it, to manufacture it. They could make a whole bunch of them. And then they gave it to some people that didn’t know how to design it or manufacture it, but they knew how to distribute it. And then they gave it to some people that didn’t know how to design or manufacture or distribute it, but knew how to write software for it.

Gradually this sort of inverse pyramid grew. It finally got into the hands of a lot of people — and it all blossomed out of this tiny little seed.

It seemed like an incredible amount of leverage. It all started with just an idea. Here was this idea, taken through all of these stages, resulting in a classroom full of kids growing up with some insights and fundamentally different experiences which, I thought, might be very beneficial to their lives. Because of this germ of an idea a few years ago.

That’s an incredible feeling to know that you had something to do with it, and to know it can be done, to know that you can plant something in the world and it will grow, and change the world, ever so slightly.

– Steve Jobs brainstorms with NeXT team 1985 (starting at minute 18:24)

The idea maze for AI startups

An “idea maze” is a map of all the key decisions and tradeoffs that startups in a given space need to make:

A good founder is capable of anticipating which turns lead to treasure and which lead to certain death. A bad founder is just running to the entrance of (say) the “movies/music/filesharing/P2P” maze or the “photosharing” maze without any sense for the history of the industry, the players in the maze, the casualties of the past, and the technologies that are likely to move walls and change assumptions.

– Balaji Srinivasan, “Market Research, Wireframing and Design”

I thought it would be interesting to show an example of an idea maze for an area that I’m interested in: AI startups. Here’s a sketch of the maze. I explain each step in detail below.

[Figure: sketch of the AI startup idea maze]

“MVP with 80–90% accuracy.” The old saying in the machine learning community is that “machine learning is really good at partially solving just about any problem.” For most problems, it’s relatively easy to build a model that is accurate 80–90% of the time. After that, the returns on time, money, brainpower, data etc. rapidly diminish. As a rule of thumb, you’ll spend a few months getting to 80% and something between a few years and eternity getting the last 20%. (Incidentally, this is why when you see partial demos like Watson and self-driving cars, the demo itself doesn’t tell you much — what you need to see is how they handle the 10–20% of “edge cases” — the dog jumping out in front of the car in unusual lighting conditions, etc).

At this point in the maze you have a choice. You can either 1) try to get the accuracy up to near 100%, or 2) build a product that is useful even though it is only partially accurate. You do this by building what I like to call a “fault tolerant UX.”

“Create a fault tolerant UX.” Good examples of fault-tolerant UXs are iOS autocorrect and Google search’s “did you mean X?” feature. You could also argue Google search itself is a fault tolerant UX: showing 10 links instead of going straight to the top result lets the human override the machine when the machine gets the ranking wrong. Building a fault tolerant UX isn’t capitulation, but it does mean a very different set of product requirements. (In particular, latency is very important when you want the human and machine to work together—this generally affects your technical architecture).
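As a rough illustration of the “did you mean X?” pattern, here is a minimal sketch using Python’s standard `difflib` module. The vocabulary and the `suggest` function are hypothetical, and plain string similarity is standing in for whatever model a real product would use. The point is the shape of the interface: return a short ranked list and let the human choose, rather than silently committing to one answer.

```python
import difflib

# Hypothetical vocabulary of things the user might be asking for.
VOCABULARY = ["calendar", "calculator", "camera", "contacts", "settings"]


def suggest(query, vocabulary=VOCABULARY, max_suggestions=3, cutoff=0.6):
    """Return candidate matches for the user to pick from.

    An exact match is returned on its own. Otherwise the closest entries
    (by difflib's similarity ratio) are offered as "did you mean" options.
    An empty list means the system has nothing reasonable to offer.
    """
    if query in vocabulary:
        return [query]
    return difflib.get_close_matches(
        query, vocabulary, n=max_suggestions, cutoff=cutoff
    )


for typed in ("settings", "calender", "zzzz"):
    options = suggest(typed)
    if options == [typed]:
        print(f"{typed!r}: exact match")
    elif options:
        print(f"{typed!r}: did you mean {', '.join(options)}?")
    else:
        print(f"{typed!r}: no suggestions")
```

Note that the lookup has to be fast for this to feel natural, which is the latency point made above.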

OK, so let’s suppose you decide to go for 100% accuracy. How do you get there? You won’t get the last 10–20% through algorithms. You’ll only get there with lots more data for training your models. Data is the key to AI because 1) it’s the missing ingredient: we have great algorithms and virtually endless computational resources now, and 2) it’s the proprietary ingredient: algorithms are mostly a shared resource created by the research community. Public data sets, on the other hand, are generally not very good. The good data sets either don’t exist or are privately owned.

“Narrow the domain.” The amount of data you need is relative to the breadth of the problem you are trying to solve. So before you start collecting data you might want to narrow your domain. Instead of trying to build a virtual bot that can do anything (which would basically mean passing the Turing Test—good luck with that), build a bot that can just help someone with scheduling meetings. Instead of building a cloud service that predicts anything, build one that can predict when a transaction is fraudulent. Etc.

“Narrow domain even more.” After you are done narrowing the domain, try narrowing it even more! Even if your goal is to build X, sometimes building an MVP that is part of X is the best way to eventually get to X. My advice would be to keep narrowing your domain until you can’t narrow it anymore without making the product so narrow that no one wants to use it. You can always expand the scope later.

“How do you get the data?” Broadly speaking, there are two ways: build it yourself or crowdsource it. A good analogy here is Google Maps vs Waze. Google employs thousands of people driving around to map out roads, buildings, and traffic. Waze figured out how to get millions of people to do that for them. To do what Google does, you need far more capital (hundreds of millions, if not billions of dollars) than is generally available to pre-launch startups.

Startups are left with two choices to get the data. 1) Try to mine it from publicly available sources. 2) Try to crowdsource it.

The most common example of 1) is crawling the web, or big websites like Wikipedia. You could argue this is what the original Google search did by using links as ranking signals. Many startups have tried mining Wikipedia, an approach that hasn’t led to much success, as far as I know.

The most viable approach for startups is crowdsourcing the data. This boils down to designing a service that provides the right incentives for users to give data back to the system to make it better. Building a crowdsourced product is its own topic (which is why that part of the idea maze points to another, nested idea maze), but I’ll give an example of one approach to doing this, which was tried by a company called Wit.ai that we invested in last year. Wit’s idea was to provide a service for developers for doing speech-to-text and natural language processing. The v1.0 system gave the right answer most but not all of the time. But it also provided a dashboard and API where developers could correct errors to improve their results. For developers using the free version of the service, the training they performed would get fed back to make the overall system smarter. Facebook acquired Wit, so their future will now unfold as part of a larger company. The approach they took was very clever and could apply to many other AI domains.
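To show the general shape of a correction feedback loop, here is a deliberately tiny sketch. It is not Wit.ai’s API or implementation: the class, the toy keyword “model,” and the intent labels are all hypothetical, and a lookup table of corrections stands in for the retraining a real system would do.

```python
class CorrectableModel:
    """Wraps a base predictor and remembers user-supplied corrections."""

    def __init__(self, base_predict):
        self.base_predict = base_predict
        self.corrections = {}  # normalized input -> label supplied by a user

    @staticmethod
    def _normalize(text):
        return text.strip().lower()

    def predict(self, text):
        # A stored correction wins over the base model's guess.
        key = self._normalize(text)
        if key in self.corrections:
            return self.corrections[key]
        return self.base_predict(text)

    def correct(self, text, right_label):
        # In a real service this is the data that would be fed back into training.
        self.corrections[self._normalize(text)] = right_label


def toy_intent(text):
    """Stand-in for a v1.0 model that is right only some of the time."""
    return "set_alarm" if "wake" in text.lower() else "unknown"


model = CorrectableModel(toy_intent)

before = model.predict("Remind me at 7")         # the toy model has no idea
model.correct("Remind me at 7", "set_reminder")  # a developer fixes the error
after = model.predict("remind me at 7")          # the fix is now remembered

print(before, "->", after)
```

The design choice that matters is that fixing an error and contributing training data are the same action, so users improve the system as a side effect of getting their own results right.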

This is a rough sketch of how I see the AI startup idea maze. A few caveats: 1) I could very well be mistaken or have overlooked other paths through the maze — idea mazes are meant to aid discussion, not serve as gospel, and 2) As Balaji says, new technological developments can “move walls and change assumptions.” Look out especially for new infrastructure technologies (internet, smartphones, cloud computing, bitcoin, etc) that can unlock new pathways in many different idea mazes, even ones that at first seem unrelated.