Google should open source what actually matters: their search ranking algorithm

Websites live or die based on how a small group of programmers at Google decides those sites should rank in Google’s main search results. As the “router” of the vast majority of traffic on the internet, Google’s secret ranking algorithm is probably the most powerful piece of software on the planet.

Google talks a lot about openness and their commitment to open source software. What they are really doing is practicing a classic business strategy known as “commoditizing the complement”*.

Google makes 99% of their revenue by selling text ads for things like plane tickets, DVD players, and malpractice lawyers. Many of these ads are syndicated to non-Google properties. But the anchor that gives Google their best “inventory” is the main search engine at Google.com. And the secret sauce behind Google.com is the algorithm for ranking search results. If Google is really committed to openness, it is this algorithm that they need to open source.

The usual argument against doing so is that search spammers would be able to study the algorithm and improve their spamming methods. This is an old argument in the security community, known as “security through obscurity.” It is a technique generally associated with companies like Microsoft, and one that security experts generally regard as ineffective and risky. When you open source something you give the bad guys more information, but you also enlist an army of good guys to help you fight them.

Until Google open sources what really matters – their search ranking algorithm – you should dismiss all their other open-source talk as empty posturing. And millions of websites will have to continue blindly relying on a small group of anonymous engineers in charge of the secret algorithm that determines their fate.

* You can understand a large portion of technology business strategy by understanding strategies around complements. One major point: companies generally try to reduce the price of their products’ complements (Joel Spolsky has an excellent discussion of the topic here). If you think of the consumer as having a willingness to pay a fixed N for product A plus complementary product B, then each side is fighting for a bigger piece of the pie. This is why, for example, cable companies and content companies are constantly battling. It is also why Google wants open source operating systems to win, and broadband to be cheap and ubiquitous. [link to full post]
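The arithmetic behind complement strategy can be made concrete with a toy model. This is a minimal sketch with invented numbers, assuming the footnote’s premise of a fixed total willingness to pay N for the A + B bundle:

```python
# Toy model of "commoditizing the complement": a consumer will pay a
# fixed total N for product A plus its complement B, so every dollar
# shaved off B's price is a dollar A's seller can capture.
# All numbers below are hypothetical, for illustration only.

def capturable_price_for_a(total_willingness_to_pay, complement_price):
    """Maximum price A's seller can charge once B costs complement_price."""
    return total_willingness_to_pay - complement_price

N = 500  # fixed willingness to pay for the A + B bundle

# If the complement (say, the operating system) costs $100...
print(capturable_price_for_a(N, 100))  # 400

# ...but if open source drives the complement's price to $0,
# A's seller can capture the whole pie.
print(capturable_price_for_a(N, 0))    # 500
```

This is why driving a complement’s price toward zero is worth real money to the seller of the main product.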

Anatomy of a bad search result

In a post last week, Paul Kedrosky noted his frustration when looking for a new dishwasher using Google.  I thought it might be interesting to do some forensics to see which sites rank highly and why.

Paul started by querying Google with the phrase dishwasher reviews:

[Screenshot: Google search results for “dishwasher reviews”]

Pretty much every link on this page has an interesting story to tell about the state of the web.  I’ll just focus here on the top organic (non-sponsored) result:

http://www.consumersearch.com/dishwasher-reviews

Clicking through this link takes you here:

[Screenshot: the Consumersearch dishwasher reviews page]

Consumersearch is owned by About.com, which in turn is owned by the New York Times.

So how did consumersearch.com get the top organic spot? Most SEO experts I talk to (e.g. SEOmoz’s Rand Fishkin) think inbound links from a large number of domains still matter far more than other factors. One of the best tools for finding inbound links is Yahoo Site Explorer (which, sadly, is supposed to be killed soon). Using this tool, here’s one of the sites linking to the dishwasher section of Consumersearch:

http://www.whirlpooldishwasher.net/

[Screenshot: the front page of whirlpooldishwasher.net]

(Yes, this site’s CSS looks scarily like my own blog – that’s because we both use a generic WordPress template).

This site appears to have two goals: 1) fool Google into thinking it’s a blog about dishwashers, and 2) link to consumersearch.com.

Who owns this site?  The Whois records are private. (Supposedly the reason Google became a domain registrar a few years ago was to peer behind the domain name privacy veil and weed out sites like this.)

I spent a little time analyzing the “blog” text (it’s actually pretty funny – I encourage you to read it). It looks like the “blog posts” are fragments from places like Wikipedia run through some obfuscator (perhaps by machine-translating from English to another language and back?). The site was impressively assembled from various sources. For example, the “comments” on the “blog entries” were extracted from Yahoo Answers:

[Screenshot: a “comment” on the fake blog]

Here is the source of this text on Yahoo Answers:

[Screenshot: the same text on Yahoo Answers]

The key is to have enough dishwasher-related text to look like a blog about dishwashers, while also having enough textual diversity to avoid being flagged by Google as duplicative or automatically generated content.
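For illustration, one generic way a search engine might flag duplicative text is word shingling: break each document into overlapping runs of words and measure set overlap (Jaccard similarity). This is a standard textbook technique, not a claim about Google’s actual detection method, and the sample sentences are invented:

```python
# Minimal sketch of near-duplicate detection via word shingling and
# Jaccard similarity. Illustrates the general technique only -- not
# Google's actual algorithm.

def shingles(text, k=3):
    """Set of overlapping k-word sequences ("shingles") in the text."""
    words = text.lower().split()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}

def jaccard(a, b):
    """Overlap between two shingle sets: |A & B| / |A | B|."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

original   = "this dishwasher is quiet and cleans dishes very well"
copied     = "this dishwasher is quiet and cleans dishes very well indeed"
obfuscated = "the machine for washing dishes runs silent and scrubs plates nicely"

print(jaccard(shingles(original), shingles(copied)))      # high: flagged as a near-duplicate
print(jaccard(shingles(original), shingles(obfuscated)))  # near zero: evades this check
```

Round-trip machine translation is effective against exactly this kind of check: it preserves the topic vocabulary while destroying the word sequences the shingles are built from.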

So who created this fake blog? It could have been Consumersearch, or a “black hat” SEO consultant, or someone in an affiliate program that Consumersearch doesn’t even know about. I’m not trying to imply that Consumersearch did anything wrong. The problem is systemic. When you have a multibillion-dollar economy built around keywords and links, the ultimate “products” optimize for just that: keywords and links. The incentive to create quality content diminishes.

The problem with online “local” businesses

One of the most popular areas for startups today is “local.”  I probably see a couple of business plans a week that involve local search, local news, local online advertising, etc.

Here’s the biggest challenge with local. Let’s say you create a great service that users love and it gets popular. Yelp has done this. Maybe Foursquare, Loopt, etc. will do this. Now you want to make money. It’s very hard to charge users, so you want to charge local businesses instead.

The problem is that, for the most part, these local businesses either don’t think of the web as an important medium or don’t understand how to use it. Ask your nearest restaurant owner or dry cleaner about online advertising. They don’t see it as critical and/or are confused by it. Even Google has barely monetized local.

People who have been successful monetizing local have done it with outbound call centers. The problem with that approach is that it’s expensive. Even if you succeed in getting local businesses to pay you, it often costs more to acquire them than you earn over the lifetime of the relationship.

To add insult to injury, local businesses often have very high churn rates. I have heard that the average is as high as 40%.* Anyone who has done “lifetime customer value analysis” can tell you how that ruins the economics of recurring-revenue businesses.
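A back-of-the-envelope calculation shows why. With a constant churn rate c per period, the expected customer lifetime is 1/c periods, so lifetime value is roughly per-period profit divided by churn. The 40% churn figure comes from the post above; every other number here is a hypothetical:

```python
# Back-of-the-envelope lifetime value (LTV) under churn.
# With constant churn rate c per period, expected lifetime is 1/c periods,
# so LTV ~= revenue_per_period * gross_margin / c.
# The 40% churn is from the post; the other numbers are invented.

def lifetime_value(revenue_per_period, gross_margin, churn_rate):
    return revenue_per_period * gross_margin / churn_rate

annual_revenue = 1200   # hypothetical: $100/month from one local business
margin = 0.7            # hypothetical gross margin
churn = 0.40            # 40% of customers leave each year

ltv = lifetime_value(annual_revenue, margin, churn)
print(round(ltv))  # 2100

# If an outbound call center costs, say, $2500 to close one account,
# the business loses money on every customer it acquires.
cac = 2500
print(ltv > cac)  # False
```

At 40% churn the average relationship lasts only two and a half years, which is what makes expensive outbound sales so hard to recoup.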

Hopefully this will change in time as local businesses come to see the web as a critical advertising medium and understand how to make it work for them.  But for now, monetizing local is a really tough slog.

* This is what I hear from industry sources.  If readers have better numbers or sources I’d love to hear them.

Why content sites are getting ripped off

A commenter on my blog the other day (Tim Ogilvie) mentioned a distinction that I found really interesting: intent generation versus intent harvesting. This distinction is critical for understanding how internet advertising works and why it is broken. It also helps explain why sites like newspapers, blogs, and social networks are getting unfairly low advertising revenues.

Today’s link economy is built around purchasing-intent harvesting. (Worse still, it’s all based on last-click intent harvesting – but that is for another blog post.) Most of this happens on search engines or through affiliate programs. Almost no one decides which products to buy based on Google searches or affiliate referrers. They decide based on content sites – Gizmodo, the New York Times, Twitter, etc. Those sites generate intent, the most valuable step in creating a purchase and the one most directly correlated with high advertising revenues.

But content sites have no way to track their role in generating purchasing intent. Often intent generation doesn’t involve a single trackable click. Even if there were some direct way to measure intent generation, doing so would be seen by many today as blurring the advertising/editorial line. So content sites are left only with impression-based display ads, haggling over CPMs without a meaningful measurement of their impact on generating purchasing intent.

All of this has caused a massive shift in revenues from the top to the bottom of the purchasing funnel – from intent generators to intent harvesters.  Somehow this needs to get fixed.
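The mechanics of that shift can be sketched in a few lines. Under last-click attribution, whichever site the buyer touched last before purchasing gets all the credit, even if earlier content sites did the persuading. The sites and dollar figures here are invented for illustration, and the linear model is just one of several alternative attribution schemes:

```python
# Sketch of why last-click attribution shifts revenue to intent
# harvesters: a purchase path touches several sites, but only the
# final click before the sale gets credit. Sites and values invented.

from collections import defaultdict

path = ["gizmodo.com", "nytimes.com", "google.com"]  # last touch = search
sale_value = 100.0

def last_click(path, value):
    """All credit goes to the final referrer before the purchase."""
    credit = defaultdict(float)
    credit[path[-1]] = value
    return dict(credit)

def linear(path, value):
    """One alternative: split credit evenly across every touchpoint."""
    return {site: value / len(path) for site in path}

print(last_click(path, sale_value))  # the intent generators get $0
print(linear(path, sale_value))      # each touchpoint gets an equal share
```

Under the first model the content sites that actually generated the intent earn nothing; under the second, they capture most of the value.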

The new economy

According to the Business Insider, Facebook is “‘Beating The S— Out Of Its Numbers’ Thanks To Zynga’s Virtual Goods.”  I wanted to try to understand this new, emerging economy.

It all starts when a user sees an ad on Facebook:

[Screenshot: a FarmVille ad on Facebook]

After clicking and installing the app, she gets a little farm where she can grow tomatoes and such.

[Screenshot: the FarmVille farm]

The game seems pretty fun. But she runs out of seeds and wants more. So she goes shopping for virtual goods.

[Screenshot: the FarmVille virtual goods store]

Let’s say our protagonist is too young to have a credit card, so she decides instead to buy coins by signing up for a free offer.

[Screenshot: free offers for earning coins]

She decides to download a toolbar.  Free greeting cards seem like fun.

[Screenshot: the free greeting cards toolbar offer]

The download puts an Ask.com search toolbar in the user’s browser.  Ask.com makes money off search ads.  Ask probably paid $1 to $2 for the install.  Some portion of that goes to Zynga, and then back to Facebook when Zynga advertises.
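The chain of payments can be sketched with rough numbers. The $1–$2 install payout is from the paragraph above; the revenue splits are pure guesses for illustration:

```python
# Rough sketch of the money flow described above, with invented splits.
# Ask.com pays $1-$2 per toolbar install; the offer network, Zynga, and
# (via Zynga's ad buying) Facebook each take a cut along the way.

install_payout = 1.50        # hypothetical midpoint of the $1-$2 range
offer_network_share = 0.30   # invented revenue split for the offer network
zynga_share = 1 - offer_network_share

zynga_revenue = install_payout * zynga_share
# Suppose Zynga spends some fraction of its revenue on Facebook ads:
ad_spend_rate = 0.25
facebook_revenue = zynga_revenue * ad_spend_rate

print(round(zynga_revenue, 2))     # 1.05
print(round(facebook_revenue, 2))  # 0.26
```

Each link in the chain is funded by the one before it, which is what makes the loop at the end of this post so precarious.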

Farmville apparently does not advertise on Ask.com:

[Screenshot: Ask.com search results for FarmVille, showing no FarmVille ads]

Thereby preventing the entire new internet economy from imploding in an endless cycle of circularity.