As this article by Alex Wright in the New York Times last week reminded me, when the mainstream press talks about artificial intelligence – machine learning, natural language processing, sentiment analysis, and so on – it talks as if it’s all about algorithmic breakthroughs. The implication is that it’s primarily a matter of developing new equations or techniques in order to build systems that are significantly smarter than the status quo.
What I think this view misses (but I suspect the companies covered in the article understand) is that significant AI breakthroughs come from identifying or creating new sources of data, not inventing new algorithms.
Google’s PageRank was probably the greatest AI-related invention ever brought to market by a startup. It was one of very few cases where a new system was really an order of magnitude smarter than existing ones. The Google founders are widely recognized for their algorithmic work. In my opinion, though, their most important insight was to identify a previously untapped and incredibly valuable data source – links – and then build a (brilliant) algorithm to optimally harness it.
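To make that concrete, here is a minimal sketch of the PageRank idea in Python. It is just a toy power-iteration version of the concept, not Google’s actual implementation, and the function name, damping factor, and little link graph are all made up for the example:

```python
# A toy power-iteration version of the PageRank idea: every page splits its
# current score among the pages it links to, plus a small "teleport" term.
# The link graph below is made up for illustration.

def pagerank(links, damping=0.85, iterations=50):
    """links: dict mapping each page to the list of pages it links to."""
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page, outlinks in links.items():
            if not outlinks:
                # A page with no outlinks spreads its score evenly.
                for p in pages:
                    new_rank[p] += damping * rank[page] / n
            else:
                share = damping * rank[page] / len(outlinks)
                for target in outlinks:
                    new_rank[target] += share
        rank = new_rank
    return rank

graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
print(pagerank(graph))  # "c", the most linked-to page, gets the highest score
```

The algorithm itself fits in a page of code; the thing that made it valuable was the data source it ran on.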
Modern AI algorithms are very powerful, but the reality is that thousands of programmers and researchers can implement them with about the same level of success. The Netflix Prize demonstrated that a massive, worldwide effort improved on an in-house algorithm by only about 10%. Studies have shown that naive Bayes is as good as or better than fancier algorithms in a surprising number of real-world cases. It’s relatively easy to build systems that are right 80% of the time, but very hard to go beyond that.
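To illustrate how far a simple baseline goes, here is a quick naive Bayes text classifier built with scikit-learn (assuming the library is installed); the tiny training set is invented purely for demonstration:

```python
# A bag-of-words naive Bayes sentiment classifier: a few lines of standard
# tooling gets you a workable baseline. The training examples are made up.

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

train_texts = [
    "great movie, loved it",
    "terrible plot and bad acting",
    "wonderful performance, would watch again",
    "boring and way too long",
]
train_labels = ["positive", "negative", "positive", "negative"]

# Word counts feeding a multinomial naive Bayes classifier.
model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(train_texts, train_labels)

print(model.predict(["loved the acting, great plot"]))    # likely "positive"
print(model.predict(["bad movie, boring and terrible"]))  # likely "negative"
```

Getting from a baseline like this to something dramatically better is where the real difficulty starts, and more data usually helps more than a cleverer model.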
Algorithms are, as they say in business school, “commoditized.” The order-of-magnitude breakthroughs (and the companies with real competitive advantages) are going to come from those who identify or create new data sources.