The only college major that matters

If you want to work in venture capital focusing on internet/software companies, or start one of those companies, or work as an employee in any role at one of those companies, there is only one undergraduate major you should consider:  computer science.*

I’m not saying you need a computer science degree, but I am saying it’s incredibly helpful to know computer science. Lots of great computer scientists are self-taught, but almost all of them started coding in their teens. If you are a coder already and want to spend your college years majoring in something else for the heck of it, great. I spent my whole childhood coding, and worked during college as a programmer, so I decided to major in Philosophy because I thought it was interesting.

Why is it so much better to learn computer science in college (or before)?  Because after college it’s very hard to find the time and discipline to teach yourself coding.  On the other hand, it’s pretty easy to pick up business, economics, and all sorts of other skills on the job or in grad school.

Why is a computer science degree so important to VC and startups?  I would estimate that in about half the conversations I have at my own startup, with tech founders, and with venture capitalists, there is a moment when we start getting technical.  Sometimes someone will even ask “Are you technical?” before starting down a topic.  The non-technical people in the room just sit there like we are speaking Greek.

It’s a shame that student enrollment in computer science is in decline.  The thinking, apparently, is that computer programming is increasingly moving overseas.  What these students fail to realize is that you don’t need to be a professional coder all your life to find computer science an incredibly valuable major.

* There is a whole separate world of VC and startups in energy and healthcare.  In those areas I’d recommend analogous technical undergraduate majors.

To make smarter systems, it’s all about the data

As this article by Alex Wright in the New York Times last week reminded me, when the mainstream press talks about artificial intelligence – machine learning, natural language processing, sentiment analysis, and so on – it talks as if it’s all about algorithmic breakthroughs.  The implication is that progress is primarily a matter of developing new equations or techniques in order to build systems that are significantly smarter than the status quo.

What I think this view misses (but I suspect the companies covered in the article understand) is that significant AI breakthroughs come from identifying or creating new sources of data, not inventing new algorithms.

Google’s PageRank was probably the greatest AI-related invention ever brought to market by a startup.  It was one of very few cases where a new system was really an order of magnitude smarter than existing ones.  The Google founders are widely recognized for their algorithmic work.  In my opinion, however, their most important insight was to identify a previously untapped and incredibly valuable data source – links – and then build a (brilliant) algorithm to optimally harness it.
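For readers who haven’t seen it, here is a minimal sketch of the power-iteration idea behind PageRank. The link graph below is invented purely for illustration, and a real implementation has to handle issues (dangling pages, enormous scale) that this toy version ignores.

    # Minimal power-iteration sketch of PageRank over a made-up toy link graph.
    damping = 0.85          # standard damping factor from the original paper
    links = {               # page -> pages it links to (hypothetical graph)
        "a": ["b", "c"],
        "b": ["c"],
        "c": ["a"],
        "d": ["c"],
    }

    pages = list(links)
    rank = {p: 1.0 / len(pages) for p in pages}

    for _ in range(50):     # iterate until scores stabilize
        new_rank = {p: (1 - damping) / len(pages) for p in pages}
        for page, outlinks in links.items():
            for target in outlinks:
                new_rank[target] += damping * rank[page] / len(outlinks)
        rank = new_rank

    print(sorted(rank.items(), key=lambda kv: -kv[1]))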

Modern AI algorithms are very powerful, but the reality is that there are thousands of programmers/researchers who can implement them with about the same level of success.  The Netflix Challenge demonstrated that a massive, worldwide effort only improves on an in-house algorithm by approximately 10%. Studies have shown that naive Bayes is as good as or better than fancy algorithms in a surprising number of real-world cases.  It’s relatively easy to build systems that are right 80% of the time, but very hard to go beyond that.
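To give a sense of what I mean, here is roughly what a workable naive Bayes text classifier looks like today – a minimal sketch using scikit-learn on an invented toy dataset, so the specifics are illustrative rather than real. The algorithm is the easy part; the data is where the leverage is.

    # A workable naive Bayes baseline in a handful of lines (toy data invented
    # for the example, not real training data).
    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.naive_bayes import MultinomialNB

    docs = ["win cash now", "cheap meds online",
            "lunch tomorrow?", "meeting notes attached"]
    labels = ["spam", "spam", "ham", "ham"]

    vectorizer = CountVectorizer()
    X = vectorizer.fit_transform(docs)        # bag-of-words counts

    model = MultinomialNB()
    model.fit(X, labels)

    print(model.predict(vectorizer.transform(["cheap cash meds"])))   # -> ['spam']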

Algorithms are, as they say in business school, “commoditized.”  The order of magnitude breakthroughs (and companies with real competitive advantages) are going to come from those who identify or create new data sources.

Machine learning is really good at partially solving just about any problem

There’s a saying in artificial intelligence circles that techniques like machine learning (and NLP) can very quickly get you, say, 80% of the way to solving just about any (real-world) problem, but going beyond 80% is extremely hard, maybe even impossible.  The Netflix Challenge is a case in point: hundreds of the best researchers in the world worked on the problem for two years and the (apparent) winning team got a 10% improvement over Netflix’s in-house algorithm.  This is consistent with my own experience, having spent many years and dollars on machine learning projects.

This doesn’t mean machine learning isn’t useful – it just means you need to apply it to contexts that are fault tolerant:  for example, online ad targeting, ranking search results, recommendations, and spam filtering.  Areas where people aren’t so fault tolerant and machine learning usually disappoints include machine translation, speech recognition, and image recognition.

That’s not to say you can’t use machine learning to attack these non-fault-tolerant problems, just that you need to recognize the limits of automation and build mechanisms to compensate for them.  One great thing about most machine learning algorithms is that you can infer confidence levels and then, say, ship low-confidence results to a manual process.
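As a sketch of what that can look like in practice: most scikit-learn-style classifiers expose predicted probabilities, so you can accept the automatic answer when the model is confident and escalate the rest to a person. The toy data and the 0.9 threshold below are invented for illustration.

    # Route low-confidence predictions to manual review (illustrative only).
    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.naive_bayes import MultinomialNB

    docs = ["great product, loved it", "terrible, want a refund",
            "best purchase ever", "awful customer service"]
    labels = ["positive", "negative", "positive", "negative"]

    vectorizer = CountVectorizer()
    model = MultinomialNB().fit(vectorizer.fit_transform(docs), labels)

    def classify_or_escalate(text, threshold=0.9):
        probs = model.predict_proba(vectorizer.transform([text]))[0]
        if probs.max() >= threshold:
            return model.classes_[probs.argmax()]   # confident: use the automatic answer
        return "NEEDS_MANUAL_REVIEW"                # uncertain: hand it to a person

    print(classify_or_escalate("loved it, best product ever"))  # confident -> 'positive'
    print(classify_or_escalate("it arrived on tuesday"))        # low confidence -> escalated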

A corollary of all of the above is that it is very rare for startups to have a competitive advantage because of their machine learning algorithms.  If a worldwide, concerted effort can only improve Netflix’s algorithm by 10%, how likely is it that 4 people in a startup’s R&D department will make a significant breakthrough?  Modern ML algorithms are the product of thousands of academics and billions of dollars of R&D, and are generally only improved upon at the margins by individual companies.

Wator – population simulation

I made this program using Flash about 10 years ago just for fun. It was based on an old “recreational computing” Scientific American article by A.K. Dewdney. You can find more elaborate versions of Wator elsewhere, for example here. I tried to make my version focus more on aesthetics than on customizability.

The basic idea behind Wator is that the brown dots represent “sharks” and the blue dots represent “fish”. Sharks survive by eating fish and fish survive by breeding and not getting eaten by sharks. From these very simple rules you can see what looks like the ebb and flow of real population changes.

It is quite easy for the sharks to eat all the fish and then all die out themselves since they have no more food (insert your own lesson about the future of the human race, etc., here). Whether and how long the shark and fish populations survive depends on the parameters you set, such as shark breeding time, fish breeding time, etc. In my program I “cheated” by adding an “I am Legend” rule: when there is only one shark left, that shark can’t die (OK – I can’t remember if that’s actually consistent with I am Legend). If you watch the Flash movie for a while you’ll see this happen occasionally.
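For the curious, here is a rough sketch of those rules in Python. The grid size, breed times, and starve time are arbitrary choices, it just prints population counts each step rather than drawing anything, and it is an approximation of the idea, not a port of the Flash version.

    # Rough sketch of the Wator rules: fish breed, sharks eat fish or starve,
    # and the "I am Legend" rule spares the last remaining shark.
    import random

    WIDTH, HEIGHT = 40, 20
    FISH_BREED = 3      # steps before a fish reproduces
    SHARK_BREED = 8     # steps before a shark reproduces
    SHARK_STARVE = 4    # steps a shark survives without eating

    def neighbors(x, y):
        # The world wraps around at the edges (a torus), as in Dewdney's original.
        return [((x + dx) % WIDTH, (y + dy) % HEIGHT)
                for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1))]

    # Each occupied cell maps to a dict describing a fish or a shark.
    world = {}
    for _ in range(200):
        world[(random.randrange(WIDTH), random.randrange(HEIGHT))] = {"kind": "fish", "age": 0}
    for _ in range(30):
        world[(random.randrange(WIDTH), random.randrange(HEIGHT))] = {"kind": "shark", "age": 0, "hunger": 0}

    def step():
        for pos, animal in list(world.items()):
            if world.get(pos) is not animal:    # already eaten or already moved this turn
                continue
            animal["age"] += 1
            if animal["kind"] == "fish":
                empty = [n for n in neighbors(*pos) if n not in world]
                if not empty:
                    continue                    # nowhere to swim
                dest = random.choice(empty)
            else:
                animal["hunger"] += 1
                sharks_left = sum(1 for a in world.values() if a["kind"] == "shark")
                if animal["hunger"] > SHARK_STARVE and sharks_left > 1:
                    del world[pos]              # starved; the last shark is spared
                    continue
                prey = [n for n in neighbors(*pos) if n in world and world[n]["kind"] == "fish"]
                if prey:
                    dest = random.choice(prey)
                    animal["hunger"] = 0        # sharks survive by eating fish
                else:
                    empty = [n for n in neighbors(*pos) if n not in world]
                    if not empty:
                        continue
                    dest = random.choice(empty)
            # Move, leaving an offspring behind if it is time to breed.
            world[dest] = animal
            breed_time = FISH_BREED if animal["kind"] == "fish" else SHARK_BREED
            if animal["age"] >= breed_time:
                animal["age"] = 0
                world[pos] = {"kind": animal["kind"], "age": 0, "hunger": 0}
            else:
                del world[pos]

    for tick in range(100):
        step()
        fish = sum(1 for a in world.values() if a["kind"] == "fish")
        print(f"tick {tick:3d}: {fish} fish, {len(world) - fish} sharks")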