Google should open source what actually matters: their search ranking algorithm

Websites live or die based on how a small group of programmers at Google decide their sites should rank in Google’s main search results.  As the “router” of the vast majority of traffic on the internet, Google’s secret ranking algorithm is probably the most powerful piece of software code on the planet.

Google talks a lot about openness and their commitment to open source software. What they are really doing is practicing a classic business strategy known as “commoditizing the complement”*.

Google makes 99% of their revenue by selling text ads for things like plane tickets, DVD players and malpractice lawyers. Many of these ads are syndicated to non-Google properties. But the anchor that gives Google their best “inventory” is the main search engine at Google.com.  And the secret sauce behind Google.com is the algorithm for ranking search results. If Google is really committed to openness, it is this algorithm that they need to open source.

The usual argument against doing so is that search spammers would learn from the algorithm and improve their spamming methods. This is an old argument in the security community, known as “security through obscurity.” Security through obscurity is a technique generally associated with companies like Microsoft, and security experts generally oppose it as ineffective and risky. When you open source something you give the bad guys more info, but you also enlist an army of good guys to help you fight them.

Until Google open sources what really matters – their search ranking algorithm – you should dismiss all their other open-source talk as empty posturing. And millions of websites will have to continue blindly relying on a small group of anonymous engineers in charge of the secret algorithm that determines their fate.

* You can understand a large portion of technology business strategy by understanding strategies around complements. One major point: companies generally try to reduce the price of their products' complements (Joel Spolsky has an excellent discussion of the topic here). If you think of the consumer as having a willingness to pay a fixed N for product A plus complementary product B, then each side is fighting for a bigger piece of the pie. This is why, for example, cable companies and content companies are constantly battling. It is also why Google wants open source operating systems to win, and for broadband to be cheap and ubiquitous. [link to full post]
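
To make the fixed-N arithmetic concrete, here is a minimal sketch in Python. Only the framing (a fixed willingness to pay N for A plus B) comes from the footnote; the function name and the dollar figures are made up for illustration.

```python
# Hypothetical numbers, chosen only to illustrate the footnote's argument.
def max_price_for_a(willingness_to_pay: float, price_of_b: float) -> float:
    """Most the seller of A can charge when the buyer will pay at most
    `willingness_to_pay` for A and B together."""
    return max(willingness_to_pay - price_of_b, 0.0)


N = 100.0  # assumed total willingness to pay for A plus B

# As the complement B gets cheaper, the room left for A's price grows.
for price_of_b in (60.0, 30.0, 0.0):
    print(f"B costs {price_of_b}: A can charge up to {max_price_for_a(N, price_of_b)}")
```

Under these assumptions, driving the price of B toward zero leaves the whole of N available to whoever sells A, which is the incentive the footnote describes.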

  • Jimbo
    They are already crowd-sourced (not open). One of the places Google has a huge advantage is in their huge base of advertisers that one by one populate the paid SERPs side of the engine. To be honest, when doing a lot of searches I find a better quality result via the paid links than in some instances of the organic.

    In reply to better overall results in the search marketplace I don't think it even matters at this point. People have already associated search with Google. If a magical no name engine appeared tomorrow with "Better" results than Google who would use it?

    IMO I think only one company could compete with Goog in search and they are more closed than them and that is Apple.

    I think at this time in the search space the masses use a comfortable brand. Bing or Yahoo don't have the same brand mind share that Google has in search and it shows in terms of use penetration. Take a popular brand like Apple with a growing % of hardware devices such as Mac, iPod Touch, iPhone, iSlate (soon) and load them all with Apple's search engine as the default and you will have a competitor to Google. Not Binghoo.
  • I commend Google for the openness they embrace--especially the "data liberation" movement Matt mentioned--where it benefits me, the consumer. And to the extent that the company's choice to keep search and advertising closed allows them to deliver services I find valuable, I commend that too. That said, I am turned off by the corporation-cum-savior image they often attempt to project. Call me a cynic. Rosenberg clearly downplays the gritty specifics of how Google's strategy is profit-generating (i.e. the critical role of closed-ness) but is quite generous with the "feel good" side of things. The angle shows up elsewhere with Google, I think you'd agree. I'm sure when that proposed "re-write the MBA curriculum" does happen, it will be put much more objectively.
  • Bigdog
    Hey Chris, Why don't you open source your copyrights and your bank account, too?
  • joey
    Google is the new railroad. It's already built. It cannot be changed, just maintained.

    It's Google's world - we're just living in it.
  • dg
    Google is the number one proponent of copyright violation. Search for a crack of any software application, and Google will merrily pop up dozens if not hundreds of sites. They want to scan books and make them freely available online. Yet they want to keep THEIR algorithms secret. Hypocrisy of the highest order.
  • COP
    why is there only one page in Closure "Cookbook"

    http://code.google.com/closure/library/docs/xhr...

    Why GOOG why double standards?
  • Mike Kew
    "... also enlist an army of good guys to help you fight them." Yeah, that's worked really well for SMTP, hasn't it? Spam e-mail is practically unheard of nowadays.

    Good guys have better things to do with their time. The only ones who will work 24/7 on it are the ones who are making money that way, i.e. the bad guys.

    I agree that Google's rant was - not as simple and above-board as it might have been. But Google's approach has worked fairly well for over ten years now. It seems rash to change it just to satisfy some ideological hangups about "openness".
  • ha, yeah, so if one company controlled a secret email protocol we'd all be better off. seriously?
  • Jack
    Security deals with how to prevent a 3rd party from learning about data shared between two trusted parties. The goal of search ranking is to identify the good from the bad. The two ideas do not mix because in security there is always some secret involved, but certainly that's not the case for search ranking. It makes sense that the less we know about the algorithm, the less the bad guys can take advantage of it.
  • hm, really? i think email spam filters are very analogous to search spam filtering.
  • I agree. And at least a naive open source spam filter would make it really easy for a spammer to test an email or spammy web page--or even simulate a campaign--before posting it. It seems to me you're undermining your own argument for open-sourcing ranking!
  • Link got scrambled last post. For a discussion of how Google could turn their search technology into profitable open infrastructure that would allow other players to develop better search algorithms:

    http://jonathanstray.com/why-we-need-open-search

    The trick is to become a cloud computing provider that sells not just compute cycles, but compute cycles with local map-reduce access to a full web index.
  • Google is probably correct in saying that open-sourcing their search algorithm would lead to bad spamming. Real competition in search algorithm development would doubtless solve this: we would find page ranking methods with greater spam resistance. To enable this innovation, what Google needs to open is not their algorithm, but their infrastructure.

    Google could allow other providers to run map-reduce on their web index, who could then implement their own search algorithms. These independent search providers would be charged for compute time and the amortized cost of infrastructure, like a utility.

    In essence, Google -- and all other search providers -- are intentionally maintaining very high barriers to entry. It takes a lot of money to assemble a fast web-indexing infrastructure.

    For a detailed discussion of this point, including the economics, see my post here.
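
The idea of outside providers running map-reduce over a shared web index can be sketched in a few lines of Python. This is a toy illustration only: the three-page index, the function names, and the inbound-link-count "ranking" are all assumptions made for the example, not Google's infrastructure or anyone's actual algorithm.

```python
from collections import defaultdict


def map_links(page):
    """Map step: emit (target_url, 1) for every outgoing link on a page."""
    _url, outlinks = page
    for target in outlinks:
        yield target, 1


def reduce_counts(pairs):
    """Reduce step: sum the emitted counts per target URL."""
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)


# Hypothetical stand-in for the shared web index: (url, outgoing links).
toy_index = [
    ("a.example", ["b.example", "c.example"]),
    ("b.example", ["c.example"]),
    ("c.example", []),
]

pairs = [pair for page in toy_index for pair in map_links(page)]
inbound = reduce_counts(pairs)

# A provider's own (deliberately naive) ranking: most inbound links first.
ranking = sorted(inbound, key=inbound.get, reverse=True)
print(ranking)
```

In the proposal, each independent provider would supply its own map and reduce functions and pay for the compute, while the crawl and the index stay with the infrastructure owner.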
  • Saw the train wreck of comments over at BI... hence, my comment here. Yet, it's the same thing apparently since it is Disqus ;-)

    I am very curious about your take on this line from the open manifesto:

    "we will effectively re-write the MBA curriculum for the next several decades"

    This was the single line that jumped out at me. I mean jumped in the sense of wonder when you realize someone took the time to type that into an internal memo.
  • I understand your point, but don't think Google should or will do it. And although they are not as open as they could be, I think Google is open enough. Releasing their search algorithm would be to their business what opening its decision engine algorithm would be to Hunch. Are you planning to do it?
  • markferrari
    The very term "Search Engine" flags the problem. You don't search for things that are not lost. We will look back in the future at companies like Google and remember when a company delivered millions (tens of millions) of results for a term typed into its "Search Engine" and charged companies billions of dollars so that their information was at the top of the pile. And best of all, if you don't pay, Google has a secret algorithm (even the word makes me smile); after all, what kind of business would dare argue that the "Algorithm" is fair? It's not about openness - it's about business! And Google is at this point in time the smartest business in the room. After Microsoft's dodgy product strategy of getting the customer to finish the product, along comes Google with the most ridiculous application - a "Search Engine" that delivers more useless crap than ever before in history.
    It's an advertising medium not a charity - it answers to shareholders.
  • Google's managers are primarily responsible for maximizing shareholder value.

    Can you explain how open sourcing their algorithm would help accomplish that objective?
  • Welcome robotic Friedman-ites. Out here in the real world we include more data than shareholder value to make decisions. Feel free to read this blog post plus thousands of previous ones written by other people and then come back to contribute an intelligent comment.
  • I'm sorry - I didn't mean to come off as flip or dismissive.

    What I mean to say is that Google's (and everybody else's) compensation schemes are set up to reward those who "hit the numbers" and punish those who don't. And the compensation schemes, collectively, add up to what those at the top think (hope) will max shareholder value.

    Since those not at the top (and even some at the top) are concerned with feeding their families, paying the mortgage, etc.; they are compelled to "hit the numbers."

    So, forgetting about the abstract *shareholder value* notion, how will open sourcing Google's algorithm help its employees feed their families and pay their mortgages?

    And, please don't dismiss an honest question as an unintelligent comment.
  • I don't think we're debating whether keeping the algo closed is best for GOOG's bottom line but rather the hypocrisy (in my opinion) of not owning up to the fact that indeed maximizing shareholder value is the very reason they continue to guard the "secret sauce" so closely.

    Yet, rather than own up to this choice as a deliberate revenue-maximizing strategy, GOOG patronizes us with sanctimonious posturing about being "open"... and offers up dubious excuses about protecting users' interests as to why said openness doesn't extend into (its only revenue-producing areas) search and ads.
  • jlfromhealogica
    Great post, Chris. I agree that Google needs to be called out on this issue. And I love your responses to the comments from the anonymous trolls. Keep the controversial posts coming!
  • A note aside, the easy one here if transparency/openness is the goal is to provide transparency on what Google's cut of the revenue is in AdWords/AdSense. Isn't that the biggest arbitrage of information asymmetry out there?

    For all of the Apple v. Google comparisons, where the former is closed, proprietary and evil, and the latter is open and benevolent, we at least know the rev split on App Store, don't we? Why would/should AdWords/AdSense be any different?
  • Google spends a lot of effort tuning their algorithm. It's probably better described as a system (which can act differently on different servers). Say a change a week, last time I checked.

    These guys hiked their ideas around to the search engines of the time and were told to take a running jump.


    This openness thing is not deeply sincere. It's Schmidt driven.

    The algorithm has become increasingly useless as Google, seemingly, tunes it to make them ad revenue rather than serve their audience. They do enough to keep the others at bay but their main interest is not searchers but advertising revenue. This has taken search in a less useful direction. It opens the possibility of other solutions moving in.

    It's a major part of their uniqueness. They'd be dumb to give it away.

    Consider:

    1) If you've thought even a little bit about data flow computations you can come up with similar computation networks to what they appear to have. It's not magic it's pretty simple. The hard bit, if you want to run your own, is setting up infrastructure, getting known, making money and not getting bored with it all. If you're serious do it, don't have a shot at stealing their ideas. I think a lot of people can easily formulate better algorithms, but the trouble is having automated inputs to that algorithm.

    2) Have you thought of the complexity of publishing something that changes regularly, is different on different servers, isn't easily expressible in notations that your average user can understand...

    3) If they published it are they going to keep improving it?

    Good thing to focus on but don't try to steal their life's work.
  • "If Google is really committed to openness..."

    I know they write papers, blog posts and pronounce such things, but the first thing that one learns when running/ranking websites, using Adsense, or even running an Adwords campaign as an advertiser is that:

    Google is NOT open about anything - at all.

    "...you should dismiss all their other open-source talk as empty posturing."

    Correct!

    That empty posturing promotes their product very very well (a lesson they learned early on). Some people still believe that Google does no evil, that they are open, and only have the best interests of the community in mind.

    Remember, this is a publicly traded for-profit company. Whose best interests are kept in mind with any for-profit publicly traded company?

    Wake up.
  • Maybe Google should patent the algorithm then open it ...
  • dbv
    Google is a behemoth and can afford to make sanctimonious statements. All large powerful companies do it every day, sometimes for no reason other than that the PR group had a slot open for some market messaging. The external community always has to be vigilant about what such companies are doing (and not doing). Google will likely create another Library of Congress worth of information around this subject. In a few weeks when an unsuspecting student wants to learn about "open source" guess which results will be at the top of the Google search list? Funny that.
  • Hi Chris. Nice piece. Kudos for guts too -- it's ridiculous how few folks are willing to point out how few clothes the emperors wear -- how the great gods Apple and Google not only don't practice what they preach, but use the preaching to give competitors enough rope to hang themselves.

    Which I think is the uber point -- openness is a great idea but a crummy practice. The winners in markets simply do not win by prancing around naked. They win by leveraging every competitive advantage they have.

    Everyone excoriated Microsoft for being the dark side closed-system proprietary Death Star (and for making crummy products!) but in truth, Microsoft crushed it and dominated the market for a generation specifically because of such practices (after all, they make crummy products!)

    Apple is the supreme anti-open example -- heck, they sue bloggers over new product info publication! And iHardware+iTunes etc is the ultimate proprietary closed system.

    And it rocks.

    So for me anyway the takeaway is, for the big winners like Oracle, Microsoft, Google, Apple, etc etc, the success formula seems to be: preach the gospel and sing hymns as loudly as anyone, but live in the dirt of the real earth.
  • from howard lindzon's blog
    "I like how Apple handles open. They could give a shit what you think. The products kick ass and when they stop kicking ass enough, they will lose." LOL
    http://howardlindzon.com/?p=4654
    I love Apple products, and as long as every one of their products is great I agree with Howard: who cares if they are open? (As a consumer, not as an internet observer etc.)
  • Right on
  • I'm pondering the premise. As a startup minded kinda guy of course I'd love to see Google open source their search specifics. Heck if the entire world open sourced everything it would make my job developing a lot easier. Don't suppose you could open source the Hunch database (oh wait you did with the API ;)

    First, you included so many incredible links that my plateful of reading is chock full of win tomorrow morning. Thanks Chris, you're a legendary linker.

    Second, Google is a business and it's a BIG one with a large revenue stream and by my estimates they've surpassed the hypocrisy of the large threshold. The curse goes as follows, leadership will dictate one message, one vision, and hundreds of interpretations (and misinterpretations) later we'll see the effect on product development, service, and user value. For an extreme example, consider our own bureaucracy.

    There's a South Park episode that captures the concept better than I'm describing here, where America is split by pacifists and warmongers.
  • Chris, my personal guess is that Google won't open-source its ranking algorithms, because that would make spammers' lives easier and would degrade Google's search quality--something that you just complained about in your last post. Other search engines (notably Wikia) have tried going fully open-source, and it hasn't seemed to help their resulting search quality. In an ideal world, I would love to get to a place where Google's ranking could be fully transparent, yet still resistant to spam. I don't know whether that's possible, but it certainly is worth having that as a goal.

    But just because some parts of our ranking algorithms can't be fully disclosed doesn't mean that Google as a company can't pursue more openness. One of our biggest goals has been not to trap users' data, and I would argue that with Gmail, Calendar, Docs, and even our search history, Google has succeeded in giving users the ability to leave Google. That's a type of openness too. Not to mention that many web developers have benefited from tools and open-source projects that have been funded by Google.

    Very few companies are or can be 100% transparent about every aspect of their business. But that doesn't mean that Google shouldn't try to encourage more openness across the web.
  • @Matt: There is a middle ground, you know. You can open up the actual results of the algorithm, without opening the source for the algorithm. Remember, Rosenberg in his blog post called for openness in two areas: Source (code) and Information (data). If you don't want to open your algorithm (code), then you still have the ability to open your information (data) without spammers being able to gain any additional advantage beyond what they already have.

    What do I mean by open-data-ing your web search? I go into heavy details in the following post, in the paragraph that starts "More importantly: Why is the information shown to the user not symmetric?" http://irgupf.com/2009/12/22/google-and-the-mea...

    My guess, however, is that you will also not open-data your search engine, even though spammers are not the problem in this latter scenario. Prove me wrong! :-)
  • Jeremy and I have been discussing this over at The Noisy Channel -- see the comment thread for http://thenoisychannel.com/2009/12/03/search-us... Indeed, one of my earliest blog posts was based on my raising this same security through obscurity point to Amit Singhal: http://thenoisychannel.com/2008/04/08/qa-with-a...

    I've been a strong advocate of transparency in information seeking, particularly in the context of interactive approaches like faceted search. But transparency doesn't play well with ranked retrieval in a market where sites compete for placement. In order for a search engine to make its ranking algorithm transparent, the ranking function has to create an incentive system that encourages sites to play fairly. It's an interesting problem from a game theory perspective, and I don't believe anyone has suggested a practical solution to it. There's certainly nothing stopping people from proposing and publishing ranking algorithms.

    Regardless, Chris, I think you're short-changing some of what Google has opened up. Publishing the GFS, MapReduce and BigTable papers was a big deal, enabling others to take advantage of some of Google's key infrastructure innovations. Even much of the theory behind Google's ranking has been published. So Google has shared much of what someone would need to build similar technology -- and projects like Hadoop (and companies like Facebook that use it) owe much to that openness.

    Finally, I think the argument for open-sourcing ranking would be far more compelling if it were the search consumers who were complaining, rather than site owners.
  • Daniel - You've become a shill for the Borg!! ;)
  • Hey, I have to pay a mortgage just like everyone else! :-)

    But seriously, I'm not trying to hide my cognitive dissonance--and I've included links to save people the effort of hunting down what I've said on record. But it was a lot easier for me to critique black box approaches in non-adversarial enterprise settings than in the adversarial world of the web. I suspect there is an incentive system that works, but we have yet to invent / discover it.

    I can also see de-emphasizing ranking and instead relying on interaction--an approach that I've advocated, particularly in domains where ranking breaks down. The problem with applying that approach more broadly to web search is that a lot of users would have to do more work for queries that are well addressed by ranking today. Not a fun trade-off.

    I recognize that the current state of adversarial search is painfully wasteful for site owners. My words on zero-sum SEO precede me: http://thenoisychannel.com/2008/11/24/life-the-... But I suspect that most users are cheerfully oblivious to this arms race except when spammers win--in which case they blame the search engine for not being smarter. I don't think those folks are clamoring for open-source ranking.
  • Ok, I wrote a more focused post on this subject. Most comments in this thread relate to the "won't somebody please think of the spammers" meme. I'd like to more concisely make the point that there are other, game-changing ways of being open with search that do not involve giving any more information to spammers than they already have, while simultaneously letting the users (and the third party software ecosystem around users) grow the pie!

    http://irgupf.com/2009/12/23/a-fragile-local-ma...
  • Remember, as I said above, there is a middle ground: Open the results themselves for reuse, remixing, refactoring, mashups, etc, without opening up the algorithms Google used to build those results.

    In this manner, the community can come up with better ways of using, reusing, and displaying results than Google is willing to do. And at the same time, Google gets to protect itself from spammers.

    So why isn't this happening? Why hasn't this happened?
  • Actually, wasn't / isn't this the Yahoo! BOSS strategy?

    http://developer.yahoo.com/search/boss/
  • Yup. But what good is a mashup, when I can only mashup one player? Google needs to also commit to the same openness.. and not because I want it to, but because Rosenberg/Google itself wants it to. And it sidesteps the whole spam issue, too, as Yahoo! has now proven. If anything, this move should be a slam dunk for Google: It's safe from spammers and shows complete commitment to.. not my ideals.. but its own ideals.

    What say, Daniel? Ready to make it happen?
  • As you might imagine, it isn't exactly my personal decision. :-) But my point was that a game-changing innovation built on top of BOSS would help build a case that Google isn't opening up enough. Or is BOSS not open in the ways you're thinking of?

    Also, there are mash-ups that combine Google and Yahoo! -- and more. Check out http://kosmix.com/ as an example.
  • Yup, jokin' about the personal decision. :-)

    My counterpoint is that a game-changing innovation built on top of BOSS isn't something that Google has to wait for in order to "open data" its search. Let me quote Rosenberg, from the original Google post:

    "If we can embody a consistent commitment to open — which I believe we can — then we have a big opportunity to lead by example and encourage other companies and industries to adopt the same commitment. If they do, the world will be a better place."

    Waiting until someone else proves that open search data is a game changer is exactly the opposite of leading by example. The "googly" thing to do would be to open early, open often. Eh? Because open wins in the long run.

    And Kosmix isn't exactly the kind of mashup I was talking about. I mean where you can remix the results lists themselves. Interleaving and that sort of thing. Right now, Kosmix shows Google results as an impenetrable silo.

    We should be able to do more than that, and we shouldn't have to sign any complicated licenses to do it, any more than we should have to sign a license to export our gmail data. That's open. And while I don't know all of BOSS's terms and conditions, I'm pretty sure that's more open than even BOSS right now.

    BTW, isn't Duck Duck Go built on BOSS?
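
The kind of result-list remixing described above ("interleaving and that sort of thing") can be sketched as follows. This is a minimal sketch, assuming each engine's results are already available as an ordered list of URLs; the function name and the example URLs are hypothetical, and no real search API is called.

```python
from itertools import zip_longest


def interleave(*ranked_lists):
    """Round-robin merge of ranked result lists, dropping duplicate URLs."""
    seen = set()
    merged = []
    # zip_longest pads the shorter lists with None once they run out.
    for tier in zip_longest(*ranked_lists):
        for url in tier:
            if url is not None and url not in seen:
                seen.add(url)
                merged.append(url)
    return merged


# Hypothetical exported results from two engines for the same query.
engine_a = ["a.example/1", "shared.example", "a.example/2"]
engine_b = ["shared.example", "b.example/1"]

print(interleave(engine_a, engine_b))
```

A merge like this needs nothing beyond the results a user can already see, which is the point being argued: it requires exportable data, not access to the ranking code.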
  • Let's see how far we can nest this without causing stack overflow!

    My point re: BOSS was that we can learn from seeing how the market has responded to a serious effort to encourage interface innovation on top of a major web search engine. Duck Duck Go does (or at least did) use BOSS, and I've found it interesting enough to blog about. But I'm not aware of any hits that have come out of the BOSS effort. To me, that suggests that either there's a key missing piece (e.g., the licensing is too complicated or the API isn't open enough) or there really isn't value here.

    Put aside the philosophical debate for a moment. Can you describe a specific use case for web search openness that would make the world better for a significant number of users? I believe that folks at Google are very receptive to such arguments. And I suspect folks at Yahoo and Bing are too.
  • One more thought:

    Put aside the philosophical debate for a moment. Can you describe a specific use case for web search openness that would make the world better for a significant number of users?

    I'd still like to get into this question with you at some point, like I said. Would actually make for a good topic at SSM. Hmm.. I've got ideas already..

    But again, given Rosenberg's/Google's strong stance in that open letter, I would almost argue that the question should be turned around. The proper question to ask is:

    Can you describe a specific case resulting from web search (results) openness that would make the world worse for a significant number of users?

    If you can't, then search data needs to be open.

    It's as simple as that.
  • Matt Cutts is better placed to answer that question than I am. What I do know is that I've made suggestions to promote openness that I didn't imagine would have any downside, only to receive a response explaining actual spammer strategies that would have been more effective with access to the information I suggested exposing.

    Anyway, I am looking forward to having this discussion in person at SSM, where we aren't constrained by character width!
  • FWIW, I'm not proposing exposing any more data than is currently exposed. I'm simply proposing going "open" with what is already there, by letting users "export" that data for full and unfettered reuse in any manner desired, the same way that users can export from gmail. All I'm talking about is letting users refactor, remix, reuse, share, etc. only that data that they already have full and complete access to through the Google home page. No more. But also no less.

    And without being "locked in" to Google's UI software (i.e. the home page). So if a user wanted to see his or her results with shade-of-blue #17 rather than #41, he or she is completely free to export the results into his or her own blue#17-based software.

    Maybe you could get Cutts to post, because I have a hard time seeing how spammers could get an upper hand using info that is already available to them.

    Yes, let's finish this discussion at SSM. Very relevant, I think.
  • I'll bet we can get this down to a single character in width! :-)

    Your question still comes, imho, from the wrong place. It would be like asking "what are the hits that have come out of the open gmail effort"? Has some radically fantastic, new email interface been created, because gmail lets you export your messages and contacts? No. Does that mean gmail should go back to being closed? Also no. The point is to be open, first, as Rosenberg called for in the letter.

    Honestly, I'm not trying to dodge your question about what specific use case for search result openness would make the world better for a significant number of users. Believe you me, I've got dozens of ideas, some that I can talk about, some that I can't. But even if we did get into one specific scenario right now that still wouldn't be the point. The point is that you don't need to know a specific use case before opening things up. It's not that your question isn't important; it is. It's just a separate question, and unrelated to the decision of whether or not to open up.

    The question that *is* related about whether or not to open up is the spammer question. And even there I have my doubts as to whether the web as a whole wouldn't compensate and defeat spammers, if Google/etc. were more open. Openness wins, right? But that is a question of algorithmic (source) openness, which is different than the question of data (results) openness. Yahoo BOSS has already shown that it is possible to be data-open, without spammers destroying your search engine. That is the sum total of information that Google needs to know before making the decision to go data-open. That was Rosenberg's only concern/excuse -- the spammers.
  • dannysullivan
    Matt, it's just perplexing to have Google do this big huge post about how important "open" is to Google, how much they want to do but oh by the way, we're not going to be open in the two areas where we are strongest: search and ads.

    I totally get there are security concerns. I also understand, as Chris points out, that there are different camps over whether being secure by not talking or being transparent is really that secure at all.

    Still, part of it just doesn't ring true. I can't get full backlink data for a particular page from Google to even figure out if you've got something screwy going on, because Google deliberately decides that this is just too sensitive to reveal. As an AdSense publisher, I can't tell what percentage of sales on my own site that Google keeps for itself.

    Google doesn't even confirm WHAT it uses in its ranking algorithm. Are you using toolbar traffic data? Is Google Checkout conversion data being used, as Sergey Brin hinted at? Do you tap into Google Analytics data? We've got the famed 200 factors -- but listing each and every one of them, that would be too sensitive? There's not enough obscurity still given that you wouldn't have to disclose the exact weight or things you use to also detect spam?

    Android is the classic example of openness, the post tells us. Open how? Open that every major Android phone out there is effectively partnered with Google? They run "Android," but it's an Android with a heavy Google flavor. How's that more open than Windows Mobile?

    Google deserves a lot of praise for pushing for user data to be liberated and transferable. But that's also got a healthy dose of self interest. By being open in this way, you help reassure people that it's fine to stay with Google, that to the world you can say see, there's easy switching. Again, it's admirable -- it's the right thing to do -- but Google also benefits from it overall.

    But more than anything else, when search and ads are set aside, it feels more like an excuse to protect what you have.

    The goal, remember, from the memo was this:

    "Our goal is to keep the Internet open, which promotes choice and competition and keeps users and developers from getting locked in."

    And search and ads can't be part of that, because we're told:

    "Opening up the code would not contribute to these goals and would actually hurt users. The search and advertising markets are already highly competitive with very low switching costs, so users and advertisers already have plenty of choice and are not locked in. Not to mention the fact that opening up these systems would allow people to "game" our algorithms to manipulate search and ads quality rankings, reducing our quality for everyone."

    How. How is reporting what a publisher actually earns off of AdSense hurting the goal? By NOT doing that, you stifle competition, because publishers can't know exactly how much they are worth and demand more from Google or rival ad networks.

    How, when few companies are able to understand the myriad of spam issues that impact web search, is anyone able to stand on the shoulders of Google (which itself stands on the shoulders of predecessors) and improve ranking functions?

    I think Google, in the spirit of the openness memo, shouldn't let the search and ad sides feel they have an opt-out. I think those areas, just like all of Google is being asked, should actively work harder to figure out what they can indeed open up. Otherwise, the memo just comes across more as PR than reality -- that Google will be open when convenient to itself.
  • wilner
    Matt Cutts is the mouthpiece of the Google Monster Borg.
  • "I think those areas [search and ads], just like all of Google is being asked, should actively work harder to figure out what they can indeed open up."

    I'd agree with that Danny, and I think people in search quality and webspam do think a lot about how to open up more.

    I believe the context of this memo is that Jonathan Rosenberg wanted to get down ideas about openness as a headstart for product managers in thinking about how to be more open at Google. But as Rosenberg himself mentions in the memo: "I encourage you to carefully read, review, and debate them." I take this not as an edict from on high, but as a good starting point for the conversation about how to encourage more openness across the web, but also at Google.
  • Thanks for the reply, Matt. Danny's response is better than anything I could say. I hope this doesn't hurt Hunch's rankings. ;)
  • Of course not, Chris--don't worry. We're big believers in the Voltaire quote (I may disagree, but "I will defend to the death your right to say it"). :)
  • I'm totally kidding, I hope you know.
  • It just seems so strange to put yourself out in the firing line when you know and are admitting that your own practices are questionable.
  • rikin, this is just my personal take, but the fact that Google should strive to be more transparent wherever we can (including search and ads where it's possible to open up more--hopefully without tipping our hand to spammers) shouldn't prevent Google from posting some thoughts about openness in general.
  • I can certainly believe Google would open-source its ranking algorithm if you got it to a place where it could be fully transparent and resistant to spam, but it doesn't sound like that's one of your top priorities, and it defeats a lot of the purpose of open source if you try to develop it to perfection behind closed doors and then release it.

    Why not start a parallel open source google algorithm with this as its main goal and host it under a different root URL? That way you'll get contributors with new ideas for both improving and exploiting it, and move more quickly toward converging on something that's transparent, resistant to spam, and of good quality. At that point you can swap it in or let users decide which one they like better.
  • this is an excellent suggestion Matt - wish I'd thought of it. Google would benefit as much as the search community - it would become a type of sandbox for new search ideas - I hope Matt is listening and suggests it internally to Google
  • My simplified net out on this one is that Google's real credo is “be open where commoditization is the goal, be closed when proprietary differentiation is the goal.”

    It's somewhat of a self-serving precept, something that I blogged about in:

    Open "ish": The meaning of open, according to Google:
    http://bit.ly/5ocoV3

    Check it out, if interested.

    Mark
  • Given the swift negative responses I can only conclude that you are onto something. Just trying to keep up.
  • i concur.
  • altrenda
    "Security through obscurity is a technique generally associated with companies like Microsoft"

    I thought Security through obscurity was more associated with Apple's OS X and Linux.
  • Not sure the analogy of open-source Security v. Ranking works here. Security is a defined problem, an operating system is either secure or it's vulnerable. We can open this problem to the community and measure the validity of proposed solutions based on whether it makes the system more or less secure.

    Ranking is a much more subjective problem, and one that current search engines just kludge together from a list of statistical correlations. How do you open-source relevancy? How do you objectively judge the merits of new relevancy algo contributions from the community? Google results aren't 'truth' or a solved problem - they're just a subjective lens focusing on a subset of the web's data.

    Philosophy guys out there - lend me a hand. I'm sure there's a logical issue here, but it's making my head hurt thinking about it. :/
  • Perhaps there are parallels to be drawn from the Netflix prize?
  • Although I'm an expert on neither security nor search, this point makes a lot of sense to me. It seems like search relevance is an ongoing arms race. Google's currently losing slowly, at least based on the dishwasher case study, but I'm not clear on how open-sourcing the algorithm would help. Wouldn't it just give the black hat SEO people more information to work with? Maybe security's a similar arms race, but at least there's much less economic incentive for the bad guys to keep up.

    Chris, I'd love to hear whether your experience with security gives you insight into how open-source search could work better than I'm envisioning it. Or do you mainly just want Google to shut up about openness? ;)

    BTW, condolences on the unlucky news link or whatever you seem to have been dealt for this post. I feel like I'm reading Slashdot comments.
  • I'm thinking the closest analogue in security might be Spam Assassin which last time I tried it worked well...

    Yeah, lot of trollish comments today. I think it's developers who actually buy all the google open talk.
  • Wow, Chris, you're really taking a beating in the comments. No love for you!

    Originally, i was skeptical when i saw your tweet about this, but now I get it. And you're right: under a guise of sanctimony, Google is telling everyone else to open themselves up and drive the profit out of their respective sectors, while they themselves do no such thing. And really, it wouldn't be a problem if it weren't for the moralizing tone and the "don't be evil" mantra they shroud themselves in.

    "Don't be evil" is an impossible yet convenient mantra. Having power means you gotta be willing to protect your interests, which they clearly are, and invariably there is some trade off between what is good for users, the world and Google. That said, it is very good for Google to have its users believe that it has their interests and their interests alone at heart. In fact, the whole sanctimony play is quite Machiavellian.

    Well played, GOOG. Well played.

    As these comments attest, it's gonna be a while before the rest of the world figures this all out.
  • My point is that Google is trying to say their interests totally align with users & the world at large. That is a lie--a convenient lie and one that makes people & employees feel good about themselves, but a lie nonetheless.
  • motiontweet
    really stupid post.
    thinking of awarding the crown for such stupid posts on the web to you. really.
  • Let me guess, you're a programmer who thinks all business people are stupid.
  • chris - great blog and a topic worthy of conversation! you're focusing on the "commitment to openness" and negative effects of what would happen if they opened.

    we need to think about the positive impact - if you were google, what would be the benefits to you and to the user base. some that i can think of -
    1. crowd-sourced algorithm
    2. users become your community, driving brand loyalty

    what else?
  • Uhhh, Google should open source PageRank plus all the anti-manipulation measures. For one, Google doesn't even own the rights to PageRank; Stanford University does, and as such the academic papers on it are open for viewing. However, Stanford has an exclusivity contract with Google for rights to use said method in handling the ranking structure of the web. Secondly, the various tweaking methods to prevent gaming and spam should be proprietary. So if you really want to see their search ranking algorithm, minus their customized modifications, it's here: http://infolab.stanford.edu/~backrub/google.html
  • dannysullivan
    Google has a license to the PageRank patent in addition to Stanford. Not that it matters much to how it ranks things now.
  • programmers leaving comments here:

    current google ranking algorithm != original pagerank paper

    not even close.
  • antrod
    Well put Chris, well put. That post reminded me of some of the posturing that Apple does around the Apple Public Source License. It's nice to think that a big successful company can work like this, the same way unicorns are nice.
  • The Riddler
    There is nothing Google cannot offer for free. The obvious easy stuff is anything electronic; however, they can even offer refrigerators, dentists, teachers, etc. for free in order to incorporate that information into their search results. Google will kill more jobs than it creates.
  • Agreed. If this keeps up, we're heading into a Google Economy where $70,000/year jobs are only worth $7k, and we're all scrounging for adsense revenue.
  • I agree with you, Chris, that the PageRank algorithm in its current implementation should be open. Maybe even more now that there is competition (Bing).
    But I also think that it should not be released without prior planning by Google on how to react to bad guy/SEO behaviour.
  • The days of Google's hegemony will come to an end soon enough. As you detailed in a prior post, searching Google is becoming increasingly less worthwhile as people approximate (through the scientific method) the solution to the Google formula.

    Whether Google open sources (I prefer the term "Frees") the algorithm or not, a time will come when freeing it results in little added value to the spammer. That said, I can understand why Google is in no hurry to hasten its own demise.
  • Sure, it would require a lot of planning, community building, etc.
  • Fair point... unfortunately, I wonder if now the algorithm isn't some kind of recursive thing, where if you apply it now, the rankings will not be even close to what they should be because it takes legacy into account.
  • i bet it's a big long list of hacks. they manually decide which sites are trustworthy and let juice propagate from there.
  • Agreed, but it's still recursive: if you don't know the previous state, you won't be able to successfully apply the algorithm.
  • agree - when I say open the "algorithm" I'm speaking loosely - i'd include any data/state info.
  • doh
    No sense here at all. Well, while we are at it, why not request that Coca Cola open source their soda recipes? Or better yet, require that every business who innovates tell the world exactly how they did it so everyone else can steal their work!

    Don't give me a BS line about me not understanding the point of this article either - I agree that Google should assist more with providing exact guidance on how to "legally" boost their rankings. Provide a better submission process / training / whatever.

    But there is no need to throw the baby out with the bath water.
  • Coca Cola doesn't write big letters about openness like this
    http://googleblog.blogspot.com/2009/12/meaning-...
  • ytspar
    The Coke secret recipe is a marketing tool, and largely a myth: http://www.snopes.com/cokelore/formula.asp and http://en.wikipedia.org/wiki/Coca-Cola_formula

    I can hunt down the references, but there have been plenty of studies debunking the 'special' Coke flavor. Put RC Cola in a Coke can and suddenly it's delicious. There was a New Yorker piece recently (http://www.newyorker.com/reporting/2009/11/23/0...) which made a passing reference to our weak understanding of what governs taste - that it's not mostly in our taste buds.

    So if Coke is a marketing company with relationships to bottlers, distributors and manufacturers who put in the 'hard' labor, and maintaining the illusion of a special recipe is vital to that function, can a similar analogy be made about Google?
  • There are a number of good comments on the Hacker News discussion on this post. For future readers/reference, check them out here: http://news.ycombinator.com/item?id=1009953
  • Well, you know that Google's never going to release the code to the algorithm, so wouldn't the next option be to commoditize the search engine?
  • yes exactly. they won't open it.
  • You might as well ask Coca Cola to release their secret recipe to the world. Why would Google open source their algorithm so Yahoo and Bing can copy them? Google supports open-source, but you have to understand they also have to make money somehow.
  • Google’s secret ranking algorithm is a "secret sauce"... hmmmmm
    Last time I checked the Google "secret sauce" results, they were full of spam and blinking, floating, useless, non-relevant "real-time" results. I think that if a startup took the Google "secret sauce" and the results it gives to a VC today, they would hear some harsh criticism and probably be told to return to the blackboard to find a better formula. I think the not-so-secret sauce is a company's continuing ability to keep users mesmerized believers in a sub-performing search technology. Just think of it: Google has billions for R&D to solve issues with finding relevant search results, and yet a common search for legitimate results when shopping for a dishwasher leads to spam and frustration. Is this the best that the "Secret Sauce" can provide? If this is the case, then maybe we don't want the Google "Secret Sauce" in open source. This could distract developers from creating an open source alternative that works.
  • simon
    I don't think the analog to security stands up: with security software you have a very specialized set of folks who have incentives to either exploit or protect you. In Google's case there are tons of people who have incentives to get ranked high, and there are lots of legitimate sites whose business it is not but who should be ranked high (because of good content and actual relevance). Should they have to become experts on this algorithm to be ranked correctly?
  • There is already a multibillion dollar industry of SEO consultants who spend their time trying to guess how Google works, e.g.
    http://www.seomoz.org/article/search-ranking-fa...
    Opening their code might allow a whole new industry of products/services that more intelligently build on the platform.
  • Shafi Ahmed
    Is this a sign that Hunch will open source its own algorithms?
  • Unlike google, I don't write long letters talking about how Hunch is
    all open sourced. That said, we have opened our most valuable asset,
    our database. Hunch.com/developers
  • So to the people who think it would increase spam, do you disagree with Chris' security analogy?... i.e do you think Microsoft products are more secure than their open source alternatives? If not, what are the differences between software security and search rankings? (I'm not taking sides here, just trying to further the discussion...)
  • So when shall we expect you to open source the hunch.com code, Good Samaritan?
  • It's pretty open hunch.com/developers
    And I'm very open to suggestions to make it more so.
  • MrP
    hunch.com/developer - this is not an example of openness, it is just a list of API calls, an invite to build on your platform. Applications that use it will not be able to migrate easily as the backend is closed. This is as open as Microsoft's documentation makes Windows open
  • Ben
    http://en.wikipedia.org/wiki/PageRank#Algorithm
    What, you want Sergey to write your code for you too? What a ridiculous blog post.
  • No serious search experts think that has any relation to their current
    algorithm.
  • First of all, if it's a secret, then how do you know it's not PageRank, or something very similar?

    My guess is that they pre-filter, according to a mass of heuristics in an ongoing arms-race with spammers, and then pump the filtered data into PageRank + add some other features, eg, clustering to make sure that all meanings of the search are reflected on the first page (eg, "jaguar" the animal and "jaguar" the car) + a thousand other tweaks. In other words, it's PageRank with pre-filtering and post-filtering, and therefore they did, effectively, opensource it over a decade ago.

    I might be wrong, of course, but there's nothing I'm getting from your article or this comment that's compelling me to question my understanding. And open-sourcing the SEO-arms-race-prefiltering-hacks seems basically useless and highly counter-productive.
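
For readers following the PageRank references in this thread, here is a minimal sketch of the published link-analysis idea only, in the commonly used normalized form with a damping factor. It is an illustration under stated assumptions, not Google's current ranking code: the function name `pagerank`, the toy link graph, and the iteration count are all hypothetical, and none of the pre-filtering or post-filtering guessed at above is modeled.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Power-iteration PageRank on a tiny link graph.

    links: dict mapping each page to the list of pages it links to.
    Returns a dict mapping each page to its rank; ranks sum to 1.
    Illustrative only: names and defaults here are assumptions.
    """
    # Collect every page that appears as a source or as a link target.
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}  # start from a uniform distribution

    for _ in range(iterations):
        # Every page gets the "random jump" share first.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its damped rank evenly among its out-links.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no out-links spreads its rank over all pages.
                share = damping * rank[page] / n
                for p in pages:
                    new_rank[p] += share
        rank = new_rank
    return rank


# Hypothetical three-page web: a links to b and c, b links to c, c links to a.
toy_links = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
ranks = pagerank(toy_links)
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(page, round(ranks[page], 3))
```

In this toy graph, page c collects links from both a and b, so it ends up ranked above b. Even if the sketch matched the 1998 paper exactly, the replies below argue that the production algorithm has moved far beyond it.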
  • dannysullivan
    Because Google itself has said on many occasions that the 1998 paper on PageRank isn't how they do things now, not to mention that PageRank itself then, as is now, is only one part of many ranking factors that are employed.
  • Because there are thousands of people who devote their lives to reverse engineering Google. Here is the best summary of the smartest ones' thoughts on how the algorithm works today:
    http://www.seomoz.org/article/search-ranking-fa...
  • AWESOME! I made that page.
  • Rand is awesome. Glad he is starting to cross over to the non search world.
  • Clay Christensen talks about this kind of thing.
    Value chains usually alternate between commoditized and de-commoditized products, and it would therefore be in Google's interests to aid the commoditization of all adjacent actors in their value chain.
  • Yeah, I get a lot of my thinking from Christensen on this. I talk specifically about him here: http://cdixon.org/2009/09/10/non-linearity-of-t...
  • Agreed, Chris. The Google Blog post re: openness smacked of self-righteous puffery.

    That said, I think you'll find a three-legged ballerina before Google open sources PageRank.
  • I agree they won't open it. But then they should stop talking sanctimoniously about openness.
  • Um, PageRank *is* open. The full paper is here:
    http://infolab.stanford.edu/pub/papers/google.pdf
  • Their current algorithm has almost no relationship to the original
    paper.
  • Quiark
    Nice Tuesday's demagogy ;)
  • Still have a lot more demagogy to go this week.
  • Sebastian
    opening this algorithm would lead to A LOT of new spammers and SEO-crap. Google, please keep it a secret, PLEASE!
  • Stupid analogy, stupid post. You're comparing a cloud service with a desktop app. With a desktop app you have the binary to analyze for security flaws. A cloud service you only have access to its external APIs. So yes, it is much more secure from spammers if you don't release the code. The code and its parameters also change continuously, and the data it relies on is every bit as important as the code. I guess Google should release all that data as open source too so you can come back with another inane post about how Google's commitments to privacy are empty posturing.
  • oh i didn't realize the principles of openness and security magically change when something is a "cloud service."
  • What exactly are the "principles of openness"?

    As for principles of security, why wouldn't they change? These are vastly different scenarios.
  • perhaps you didn't read the article that this blog post was responding to
    http://googleblog.blogspot.com/2009/12/meaning-...
  • So you think Windows is more secure than Linux?
  • Bill G.
    Yep.
  • [face palm]