Chris Dixon

Machine learning is really good at partially solving just about any problem

There’s a saying in artificial intelligence circles that techniques like machine learning (and NLP) can very quickly get you, say, 80% of the way to solving just about any (real world) problem, but going beyond 80% is extremely hard, maybe even impossible.  The Netflix Challenge is a case in point: hundreds of the best researchers in the world worked on the problem for 2 years and the (apparent) winning team got a 10% improvement over Netflix’s in-house algorithm.  This is consistent with my own experience, having spent many years and dollars on machine learning projects.

This doesn’t mean machine learning isn’t useful – it just means you need to apply it to contexts that are fault tolerant:  for example, online ad targeting, ranking search results, recommendations, and spam filtering.  Areas where people aren’t so fault tolerant and machine learning usually disappoints include machine translation, speech recognition, and image recognition.

That’s not to say you can’t use machine learning to attack these non-fault-tolerant problems, but just that you need to realize the limits of automation and build mechanisms to compensate for those limits.  One great thing about most machine learning algorithms is that you can infer confidence levels and then, say, ship low-confidence results to a manual process.
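As a rough illustration of that last idea, here is a minimal Python sketch of confidence-based routing. Everything in it is an assumption made for the example: the 0.9 threshold, the `route_by_confidence` helper, and the toy spam scorer standing in for a real trained model. A real system would pick the threshold by weighing the cost of a wrong automated answer against the cost of a manual review.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

# Hypothetical cutoff: predictions below this confidence go to a human.
CONFIDENCE_THRESHOLD = 0.9


@dataclass
class Routed:
    # (item, predicted label, confidence) for results handled automatically
    automated: List[Tuple[str, str, float]] = field(default_factory=list)
    # (item, full label->probability map) for results a person should check
    manual_review: List[Tuple[str, Dict[str, float]]] = field(default_factory=list)


def route_by_confidence(
    items: List[str],
    predict_proba: Callable[[str], Dict[str, float]],
    threshold: float = CONFIDENCE_THRESHOLD,
) -> Routed:
    """Accept confident predictions; queue the rest for manual processing."""
    routed = Routed()
    for item in items:
        probs = predict_proba(item)
        # The most probable label and its probability act as the confidence.
        label, confidence = max(probs.items(), key=lambda kv: kv[1])
        if confidence >= threshold:
            routed.automated.append((item, label, confidence))
        else:
            routed.manual_review.append((item, probs))
    return routed


def toy_spam_model(text: str) -> Dict[str, float]:
    """Stand-in for a trained classifier (purely illustrative scores)."""
    if "free money" in text:
        spam = 0.95
    elif "offer" in text:
        spam = 0.6
    else:
        spam = 0.05
    return {"spam": spam, "ham": 1.0 - spam}


result = route_by_confidence(
    ["free money now", "special offer inside", "lunch at noon?"],
    toy_spam_model,
)
print("automated:", [(item, label) for item, label, _ in result.automated])
print("manual review:", [item for item, _ in result.manual_review])
```

In this toy run the two clear-cut messages are labeled automatically and the ambiguous one lands in the manual queue, which is the compensating mechanism described above.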

A corollary of all of the above is that it is very rare for startup companies to ever have a competitive advantage because of their machine learning algorithms.  If a worldwide concerted effort can only improve Netflix’s algorithm by 10%, how likely are 4 people in a startup’s R&D department to make a significant breakthrough?  Modern ML algorithms are the product of thousands of academics and billions of dollars of R&D and are generally only improved upon at the margins by individual companies.

  • Andres Burgos

    It’s the equivalent of wanting cars to drive themselves. It’s not going to happen for a very very long time. For now let’s focus on building better roads and drivers.

  • Martin

    Quite interesting but I would argue it depends on how you use ML. If it is applied to a problem where nobody ever used ML before you can have quite an advantage.

    Especially in less mathematical areas, people tend to stay away from advanced methods due to a lack of understanding on both sides: the application side does not know about the power of ML, and the ML community has no clue what the application side needs.

  • chris

    Martin – good point. I guess I’m coming from the perspective of the tech startup world where people are generally familiar with ML techniques.

    If you have any examples of areas where ML was freshly applied to create an advantage I’d be really interested to hear about them.

  • http://thenoisychannel.com/ Daniel Tunkelang

    Chris, I’m with you. Machine learning is great, but one of the lessons I derive from the Netflix Challenge is that it quickly hits a point of diminishing returns. Rather than focus exclusively on automated methods, it might be a good idea to develop interfaces that draw the most useful information out of people.

    More here:

    http://thenoisychannel.com/2008/11/21/the-napoleon-dynamite-problem/

  • http://www.webinometry.com Rathan Haran

    What about approaching the problem from an entirely new perspective? It seems like the Netflix Challenge participants used a lot of data mining techniques rather than approaching it from a completely clean slate. I wonder how many teams started in this fashion instead of jumping right into the data.

  • http://machine-learning.eggsprout.com Ian Ma

    Hi Chris. I was just starting a conversation about that in my forum. I’m building a home for ML folks and I’d love to have you in conversation with us. Please take a look at http://machine-learning.eggsprout.com and think about joining. Thanks for the article — I hope you don’t mind me sharing it on our thread!

    –Ian

  • Ramaseshan

    I agree with you that the insight lies in the data itself. Most of the time we solve problems using sparse data, or the known data available to solve the problem is limited. Known contexts, social or otherwise, and ML algos may help us put the puzzle pieces together.

  • http://www.netrics.com/blog Stef Damianakis

    @ Chris…

    I humbly submit Netrics as an example of freshly applying Machine Learning to deliver value and create market advantage.

    Also, using the Netflix Challenge as a single data point to flog all of ML is not exactly fair.

    The impact and importance of ML will only grow – these are very exciting times!

  • http://noosphere.tumblr.com irene

    In collaboration with an art history PhD candidate, I used various ML suites to see if they could correctly classify ancient Mesopotamian ivory sculptures. They did well. A bit better than 80%.

    Interestingly enough, it was the mistakes the algorithms made that led to the most significant discovery. I took all the misclassifications and examined them myself in an Excel spreadsheet. In doing so, I found an intriguing pattern which ended up adding a lot of value to our study.

    • http://twitter.com/danielharan Daniel Haran

      edit: sorry, wasn’t meant as a response

  • stealth_reader

    Chris,

    WAY wrong directions.

    ‘Machine learning’ is a junk field because it has no solid rational foundation and no powerful methodology. About all the field amounts to is heuristics and bad applications of cookbook statistics 101.

    I worked in the field at the Watson lab in Yorktown Heights and each day had to hold my nose not to upchuck from the stench. Finally I took our central problem, found a solid solution, and published it.

    E.g., ‘machine learning’ keeps looking for an ‘algorithm’. They are already lost, digging in the wrong place. By analogy they would look for an ‘algorithm’ to say how to navigate a spacecraft from Earth to a selected spot on a selected moon of Jupiter. Laughable. Instead, start with Newton’s law of gravitation and laws of motion and some ordinary differential equations. The software and computing are just there to do the arithmetic. There’s no ‘algorithm’ (unless you want to count some numerical techniques for solving the differential equations). What the computer science people are missing in ‘machine learning’ is analogous to Newton’s laws.

    It’s possible to do much better on the problems being attacked by machine learning, but the computer science community doesn’t know how to proceed. The needed techniques are rock solid but they are quite advanced. Nearly no one, even at the top of research computer science, has the prerequisites because they didn’t take the right courses in grad school. The fields that understand the needed techniques believe that, as research, ‘machine learning’ problems are trivial and that too much is already known.

  • http://twitter.com/_girishrao Girish Rao

    One might argue that in the early days Google advanced their lead in the search engine market and continue to lead in search monetization by leveraging their machine learning algorithms to identify and rank the best links/ads to display.

    Or maybe their success is due more to smart engineering than research optimization.

  • http://twitter.com/danielharan Daniel Haran

    On the Netflix Prize: 15 movies out of 17,770 (<0.1%) account for more than 8% of the remaining error:
    http://www.netflixprize.com/community/viewtopic.php?id=1126

    Machine learning can solve 80% of the problem, but it's still important to pose it well. What Netflix needed wasn't just better predictions, it was better retention.

    Algorithms alone are a poor competitive edge. With data and a better problem definition, they are a huge competitive edge.

  • http://twitter.com/mgershoff Matt Gershoff

    There are two questions here: 1) ML or not ML, and 2) if ML, which approach?
    For many situations, an ML approach dominates attempting to explicitly program a solution (e.g. face recognition or robot controllers – check out Andrew Ng’s helicopters). The value is generated by finding situations where ML can be applied.
    It is true that swapping out one algorithm for another often only leads to a marginal improvement for a fixed set of data. As the data increases, the selection of the algorithm becomes even less important for accuracy (or whatever the loss function is).
    Given that for ML problems the accuracy of the optimization method is ultimately bounded by the generalization error, what becomes important is that the algorithm can scale. So for Netflix, it was going back to good old SGD to solve the SVD, and I have a ‘Hunch’ that might be the approach taken elsewhere ;) .
