Entries Tagged 'search'

News is a lousy business for Google too

There is a widespread myth that search engines have taken profits away from news websites. A few months ago, Rupert Murdoch said: “Google has devised a brilliant business model that avoids paying for news gathering yet profits off the search ads sold around that content.”

The reality is that news is a lousy business. Period. Even Google doesn’t make money on it. For example, here are Google’s search results for the phrase “afghanistan war”:

Notice there aren’t any ads on the page. This is because ads for “afghanistan war” generate such low revenues per query that Google doesn’t think it’s worth hurting the user experience with a cluttered page. Google can afford to do this on news queries (along with many other categories of queries) because their real business is selling ads on queries where the user likely has purchasing intent. Big money-making categories include travel, consumer electronics and malpractice lawyers. News queries are loss leaders.

It’s an historical accident that hard news categories like international and investigative reporting were part of profitable businesses. The internet upended this model by 1) providing a new delivery method for classified ads (mainly Craigslist), 2) increasing the supply of newspapers from 1-2 per location to thousands per location, thereby driving the willingness-to-pay for news dramatically down, and 3) unbundling news categories, making cross subsidization increasingly hard.

The internet exposed hard news for what it is: a lousy standalone business. Google arguably contributed to this in many indirect ways, including by helping users find substitute news sources. But the idea that Google takes profits directly from newspapers is simply misinformed.

Incumbents

Almost every startup has big companies (“incumbents”) that are at some point potential acquirers or competitors. For internet startups that primarily means Google and Microsoft, and to a far lesser extent Yahoo and AOL. (And, likely more and more, Apple, Facebook and even Twitter?)

The first thing to try to figure out is whether what you are building will eventually be on the incumbent’s product roadmap. The best way to predict this is to figure out whether what you are doing is strategic for the company. (I try to outline what I think is strategic for Google here). Note that asking people who work at the incumbents isn’t very useful – even they don’t know what will be important to them in, say, two years.

If what you are doing is strategic for the incumbents, be prepared for them to enter the market at some point. This could be good for you if you build a great product, recruit a great team, and are happy with a “product sale” or “trade sale” – usually sub $50M. If you are going for this size outcome, you should plan your financing strategy appropriately. Trade sales are generally great for bootstrapped or seed-funded companies but bad if you have raised lots of VC money.

If your product is strategic for the incumbent and you’re shooting for a bigger outcome, you probably need to either 1) be far enough ahead of the curve that by the time the big guys get there you’re already entrenched, or 2) be doing something the big guys aren’t good at. Google has been good at a surprising number of things. One important area they haven’t been good at (yet) is software with a social component (Google Video vs YouTube, Orkut vs Facebook, Knol vs Wikipedia, etc).

The final question to ask is whether your product is disruptive or sustaining (in the Christensen sense). If it’s disruptive, you most likely will go unnoticed by the incumbents for a long time (because it will look like a toy to them). If your technology is sustaining and you get noticed early you probably want to try to sell (and if you can’t, pivot). My last company, SiteAdvisor, was very much a sustaining technology, and the big guys literally told us if we didn’t sell they’d build it. In that case, the jig is up and you gotta sell.

Collective knowledge systems

I think you could make a strong argument that the most important technologies developed over the last decade are a set of systems that are sometimes called “collective knowledge systems”.

The most successful collective knowledge system is the combination of Google plus the web. Of course Google was originally intended to be just a search engine, and the web just a collection of interlinked documents. But together they provide a very efficient system for surfacing the smartest thoughts on almost any topic from almost any person.

The second most successful collective knowledge system is Wikipedia. Back in 2001, most people thought Wikipedia was a wacky project that would at best end up being a quirky “toy” encyclopedia. Instead it has become a remarkably comprehensive and accurate resource that most internet users access every day.

Other well-known and mostly successful collective knowledge systems include “answer” sites like Yahoo Answers, review sites like Yelp, and link sharing sites like Delicious.  My own company Hunch is a collective knowledge system for recommendations, building on ideas originally developed by “collaborative filtering” pioneer Firefly and the recommendation systems built into Amazon and Netflix.

Dealing with information overload

It has been widely noted that the amount of information in the world and in digital form has been growing exponentially. One way to make sense of all this information is to try to structure it after it is created. This method has proven to be, at best, partially effective (for a state-of-the-art attempt at doing simple information classification, try Google Squared).

It turns out that imposing even minimal structure on information, especially as it is being created, goes a long way. This is what successful collective knowledge systems do. Google would be vastly less effective if the web didn’t have tags and links. Wikipedia is highly structured, with an extensive organizational hierarchy and set of rules and norms. Yahoo Answers has a reputation and voting system that allows good answers to bubble up. Flickr and Delicious encourage users to explicitly tag items instead of trying to infer tags later via image recognition and text classification.

Importance of collective knowledge systems

There are very practical, pressing needs for better collective knowledge systems. For example, noted security researcher Bruce Schneier argues that the United States’ biggest anti-terrorism intelligence challenge is to build a collective knowledge system across disconnected agencies:

What we need is an intelligence community that shares ideas and hunches and facts on their versions of Facebook, Twitter and wikis. We need the bottom-up organization that has made the Internet the greatest collection of human knowledge and ideas ever assembled.

The same could be said of every organization, large and small, formal and informal, that wants to get maximum value from the knowledge of its members.

Collective knowledge systems also have pure academic value. When Artificial Intelligence was first being seriously developed in the 1950s, experts optimistically predicted they’d create machines that were as intelligent as humans in the near future. In 1965, AI expert Herbert Simon predicted that “machines will be capable, within twenty years, of doing any work a man can do.”

While AI has had notable victories (e.g. chess), and produced an excellent set of tools that laid the groundwork for things like web search, it is nowhere close to achieving its goal of matching – let alone surpassing – human intelligence. If machines will ever be smart (and eventually try to destroy humanity?), collective knowledge systems are the best bet.

Design principles

Should the US government just try putting up a wiki or micro-messaging service and see what happens? How should such a system be structured? Should users be assigned reputations and tagged by expertise? What is the unit of a “contribution”? How much structure should those contributions be required to have? Should there be incentives to contribute? How can the system be structured to “learn” most efficiently? How do you balance requiring up front structure with ease of use?

These are the kinds of questions you might think are being researched by academic computer scientists. Unfortunately, academic computer scientists still seem to model their field after the “hard sciences” instead of what they should be modeling it after: social sciences like economics or sociology. As a result, computer scientists spend a lot of time dreaming up new programming languages, operating system architectures, and encryption schemes that, for the most part, sadly, nobody will ever use.

Meanwhile the really important questions related to information and computer science are mostly being ignored (there are notable exceptions, such as MIT’s Center for Collective Intelligence). Instead most of the work is being done informally and unsystematically by startups, research groups at large companies like Google, and a small group of multi-disciplinary academics like Clay Shirky and Duncan Watts.

What’s strategic for Google?

Google seems to be releasing or acquiring new products almost daily. It’s one thing for a couple of programmers to hack together a side project. It’s another thing for Google to put gobs of time and money behind it. The best way to predict how committed Google will be to a given project is to figure out whether it is “strategic” or not.

Google makes 99% of their revenue selling text ads for things like airplane tickets, dvd players, and malpractice lawyers. A project is strategic for Google if it affects what sits between the person clicking on an ad and the company paying for the ad. Here is my rough breakdown of the “layers in the stack” between humans and the money:

Human – device – OS – browser – bandwidth – websites – ads – ad tech – relationship to advertiser – $$$

At each layer, Google either wants to dominate it or commoditize it. (For more on the strategic move known as commoditizing the complement, see here, here and here). Here’s my brief analysis of the more interesting layers:

Device: Desktop hardware already commoditized. Mobile hardware is not, hence Google Phone (Nexus One).

OS: Not commoditized, and dominated by archenemy (Microsoft)!!   Hence Android/Google Chrome OS is very strategic. Google also needs to remove main reasons people choose Windows. Main reasons (rational ones – ignoring sociological reasons, organizational momentum etc) are Office (hence Google Apps), Outlook (hence Gmail etc), gaming (look for Google to support cross-OS gaming frameworks), and the long tail of Windows-only apps (these are moving to the web anyways but Google is trying to accelerate the trend with programming tools).

Browser: Not commoditized, and dominated by archenemy! Hence Chrome is strategic, as is alliance with Mozilla, as are strong cross-browser standards that maintain low switching costs.

Bandwidth:  Dominated by wireless carriers, cable operators and telcos. Very hard for Google to dominate without massive infrastructure investment, hence Google is currently trying to commoditize/weaken via 1) more competition (WiMAX via Clearwire, free public Wi-Fi) 2) regulation (net neutrality).

Websites/search (“ad inventory”): Search is obviously dominated by Google. Google’s syndicated ads (AdSense) are dominant because Google has the highest payouts, since they have the most advertisers bidding. This in turn is due largely to their hugely valuable anchor property, Google.com. Google acquired YouTube to be their anchor property for video/display ads, and DoubleClick to increase their publisher display footprint. On the emerging but fast-growing mobile side, presumably they bought AdMob for its publisher relationships (versus advertiser relationships, where Google is already dominant). The key risks on this layer are 1) people skip the ads altogether and go straight to, say, Amazon to buy things, and 2) someone like Facebook or MS uses its anchor property to compete aggressively in the syndicated display market.

Relationships to advertisers: Google is dominant in non-local direct-response ads, both SMB self-serve and big-company serviced accounts. They are much weaker in display. Local advertising (which historically is half of the total ad market) is still a very underdeveloped channel – hence (I presume) the interest in acquiring Yelp.

This doesn’t mean Google will always act strategically. Obviously the company is run by humans who are fallible, emotional, subject to whims, etc. But smart business should be practiced like smart chess: you should make moves that assume your opponents will respond by optimizing their interests.

Google should open source what actually matters: their search ranking algorithm

Websites live or die based on how a small group of programmers at Google decide their sites should rank in Google’s main search results. As the “router” of the vast majority of traffic on the internet, Google’s secret ranking algorithm is probably the most powerful piece of software code on the planet.

Google talks a lot about openness and their commitment to open source software. What they are really doing is practicing a classic business strategy known as “commoditizing the complement”*.

Google makes 99% of their revenue by selling text ads for things like plane tickets, dvd players and malpractice lawyers. Many of these ads are syndicated to non-Google properties. But the anchor that gives Google their best “inventory” is the main search engine at Google.com.  And the secret sauce behind Google.com is the algorithm for ranking search results. If Google is really committed to openness, it is this algorithm that they need to open source.

The alleged argument against doing so is that search spammers would be able to learn from the algorithm to improve their spamming methods. This is an old argument in the security community, known as “security through obscurity.” Security through obscurity is a technique generally associated with companies like Microsoft and is generally opposed as ineffective and risky by security experts. When you open source something you give the bad guys more info, but you also enlist an army of good guys to help you fight them.

Until Google open sources what really matters – their search ranking algorithm – you should dismiss all their other open-source talk as empty posturing. And millions of websites will have to continue blindly relying on a small group of anonymous engineers in charge of the secret algorithm that determines their fate.

* You can understand a large portion of technology business strategy by understanding strategies around complements. One major point: companies generally try to reduce the price of their products’ complements (Joel Spolsky has an excellent discussion of the topic here). If you think of the consumer as having a willingness to pay a fixed N for product A plus complementary product B, then each side is fighting for a bigger piece of the pie. This is why, for example, cable companies and content companies are constantly battling. It is also why Google wants open source operating systems to win, and for broadband to be cheap and ubiquitous. [link to full post]
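To make the "fixed N" arithmetic concrete, here is a minimal sketch in Python. The function name and the numbers are made up for illustration. It only assumes what the footnote states: the consumer will pay at most N for A and B together, so every dollar B gives up is a dollar A can potentially capture.

```python
def room_left_for_a(total_willingness_to_pay: float, complement_price: float) -> float:
    """Return the most product A can charge when the consumer will pay
    at most `total_willingness_to_pay` for A plus its complement B."""
    if complement_price < 0:
        raise ValueError("complement_price must be non-negative")
    # If B alone already costs more than the whole budget, nothing is left for A.
    return max(total_willingness_to_pay - complement_price, 0.0)


# Hypothetical numbers, chosen only for illustration.
N = 100.0
for price_of_b in (60.0, 30.0, 0.0):
    left = room_left_for_a(N, price_of_b)
    print(f"B costs {price_of_b:5.1f} -> up to {left:5.1f} left for A")
```

As the complement's price falls toward zero (an open source OS, cheap broadband), the whole budget becomes available to the other side. That is the incentive behind commoditizing the complement.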

Anatomy of a bad search result

In a post last week, Paul Kedrosky noted his frustration when looking for a new dishwasher using Google.  I thought it might be interesting to do some forensics to see which sites rank highly and why.

Paul started by querying Google with the phrase dishwasher reviews:

[Screenshot: Google results for “dishwasher reviews”, 2009-12-18]

Pretty much every link on this page has an interesting story to tell about the state of the web.  I’ll just focus here on the top organic (non-sponsored) result:

http://www.consumersearch.com/dishwasher-reviews

Clicking through this link takes you here:

[Screenshot: the Consumersearch dishwasher reviews page, 2009-12-18]

Consumersearch is owned by About.com, which in turn is owned by the New York Times.

So how did consumersearch.com get the top organic spot? Most SEO experts I talk to (e.g. SEOMoz’s Rand Fishkin) think inbound links from a large number of domains still matter far more than other factors. One of the best tools for finding inbound links is Yahoo Site Explorer (which, sadly, is supposed to be killed soon). Using this tool, here’s one of the sites linking to the dishwasher section of Consumersearch:
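The "large number of domains" part is easy to illustrate. Below is a rough Python sketch that counts distinct linking hosts in a list of inbound URLs, such as a link-explorer tool might export. The sample URLs other than whirlpooldishwasher.net are placeholders. Treating a hostname with "www." stripped as "the domain" is a simplification. This is not how Google or Yahoo Site Explorer actually computes anything.

```python
from urllib.parse import urlparse


def linking_hosts(inbound_urls):
    """Return the set of distinct hostnames among pages that link to a target.

    Simplification: a leading "www." is stripped, but subdomains are otherwise
    kept distinct. Grouping by registrable domain would need a public-suffix list.
    """
    hosts = set()
    for url in inbound_urls:
        host = urlparse(url).hostname  # lowercased; None if the URL has no host
        if host is None:
            continue
        if host.startswith("www."):
            host = host[len("www."):]
        hosts.add(host)
    return hosts


# Placeholder data: many links from one site still count as one linking host.
sample = [
    "http://www.whirlpooldishwasher.net/",
    "http://whirlpooldishwasher.net/another-page",
    "http://example.org/links",
    "http://blog.example.com/post",
]
print(len(sample), "links from", len(linking_hosts(sample)), "distinct hosts")
```

If distinct domains matter more than raw link counts, a spammer has an incentive to stand up many separate fake sites rather than one site with many links.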

http://www.whirlpooldishwasher.net/

[Screenshot: whirlpooldishwasher.net, 2009-12-18]

(Yes, this site’s CSS looks scarily like my own blog – that’s because we both use a generic Wordpress template).

This site appears to have two goals: 1) fool Google into thinking it’s a blog about dishwashers and 2) link to consumersearch.com.

Who owns this site?  The Whois records are private. (Supposedly the reason Google became a domain registrar a few years ago was to peer behind the domain name privacy veil and weed out sites like this.)

I spent a little time analyzing the “blog” text (it’s actually pretty funny – I encourage you to read it).  It looks like the “blog posts” are fragments from places like Wikipedia run through some obfuscator (perhaps by machine translating from English to another language and back?).  The site was impressively assembled from various sources. For example, the “comments” to the “blog entries” were extracted from Yahoo Answers:

[Screenshot: “comments” on the fake blog, 2009-12-18]

Here is the source of this text on Yahoo Answers:

[Screenshot: the source text on Yahoo Answers, 2009-12-18]

The key is to have enough dishwasher-related text to look like it’s a blog about dishwashers, while also having enough text diversity to avoid being detected by Google as duplicative or automatically generated content.
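I don't know how Google actually detects duplicate content. One textbook technique is to compare sets of overlapping word sequences ("shingles") using Jaccard similarity. The Python sketch below assumes that technique purely for illustration, with made-up sentences, to show why lightly rewording copied text can be enough to slip past a naive check.

```python
def shingles(text, k=3):
    """Return the set of k-word sequences in `text` (lowercased, whitespace-split)."""
    words = text.lower().split()
    if len(words) < k:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b, k=3):
    """Jaccard similarity of the k-word shingle sets of two texts, from 0.0 to 1.0."""
    sa, sb = shingles(a, k), shingles(b, k)
    if not sa and not sb:
        return 1.0  # two empty texts are treated as identical
    return len(sa & sb) / len(sa | sb)


# Made-up example sentences.
original = "the dishwasher cleans plates quickly and quietly"
reworded = "the dishwasher washes plates fast and quietly"

print(jaccard(original, original))       # identical text
print(jaccard(original, reworded))       # 3-word shingles
print(jaccard(original, reworded, k=1))  # single words
```

Swapping two words out of seven leaves no 3-word sequence in common, while the texts still share most of their individual words. That is the balance the fake blog is after: on-topic vocabulary with little verbatim overlap.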

So who created this fake blog? It could have been Consumersearch, or a “black hat” SEO consultant, or someone in an affiliate program that Consumersearch doesn’t even know. I’m not trying to imply that Consumersearch did anything wrong. The problem is systemic. When you have a multibillion dollar economy built around keywords and links, the ultimate “products” optimize for just that: keywords and links. The incentive to create quality content diminishes.

Google’s feature creep

Microsoft used to be considered the king of feature creep.  Here was Microsoft Word when it was most cluttered:

[Image: Microsoft Word at its most cluttered]

I don’t use any of Microsoft’s software anymore, but from what I hear they’ve toned down the feature creep a lot in recent versions of Windows and Word.

Google has been adding so many new features to its results page, they are starting to feel like the new Microsoft.  Here’s an approximation of what Google used to look like (I couldn’t find an image of actual Google 1998 SRPs — anyone have one?)

[Image: approximation of an early Google results page]

And here is Google today:

[Screenshot: Google results page, 2009-12-17]

Options on the left, ads on top and on the right, news results up top, images, and buttons to vote results up/down and annotate them.  But worst of all are the new scrolling “real time” results.  The static image I’ve embedded doesn’t do justice to how annoying this is. Random, out-of-context, and mostly asinine fragments of conversations scrolling by.  I think it might be Google’s Clippy.

Search and the social graph

Google has created a multibillion-dollar economy based on keywords.  We use keywords to find things and advertisers use keywords to find customers.  As Michael Arrington points out, this is leading to increasing amounts of low quality, keyword-stuffed content. The end result is a very spammy internet. (It was depressing to see Tim Armstrong cite Demand Media, a giant domain-name owner and robotic content factory, as a model for the new AOL.)

Some people hope the social web — link sharing via Twitter, Facebook etc — will save us.  Fred Wilson argues that “social beats search” because it’s harder to game people’s social graph.  Cody Brown tweeted:

On Twitter you have to ‘game’ people, not algorithms. Look how many followers @demandmedia has. A lot less then you guys: @arrington @jason

These are both sound points. Lost amid this discussion, however, is that the links people tend to share on social networks – news, blog posts, videos – are in categories Google barely makes money on. (The same point also seems lost on Rupert Murdoch and news organizations who accuse Google of profiting off their misery).

Searches related to news, blog posts, funny videos, etc. are mostly loss leaders for Google. Google’s real business is selling ads for plane tickets, dvd players, and malpractice lawyers. (I realize this might be depressing to some internet idealists, but it’s a reality). Online advertising revenue is directly correlated with finding users who have purchasing intent. Google’s true primary competitive threats are product-related sites, especially Amazon. As it gets harder to find a washing machine on Google, people will skip search and go directly to Amazon and other product-related sites.

This is not to say that the links shared on social networks can’t be extremely valuable.  But most likely they will be valuable as critical inputs to better search-ranking algorithms. Cody’s point that it’s harder to game humans than machines is very true, but remember that Google’s algorithm was always meant to be based on human-created links. As the spammers have become more sophisticated, the good guys have come to need new mechanisms to determine which links are from trustworthy humans. Social networks might be those new mechanisms, but that doesn’t mean they’ll displace search as the primary method for navigating the web.
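As a toy illustration of social sharing as an input to ranking, here is a hypothetical Python sketch. Each share of a URL counts in proportion to how much the sharer is trusted, so a pile of unknown accounts adds nothing. The trust values, account names, and scoring rule are all invented. This is not a description of any real search engine's algorithm.

```python
from collections import defaultdict


def trust_weighted_scores(shares, trust):
    """Score URLs by summing the trust of the distinct users who shared them.

    shares: iterable of (user, url) pairs
    trust:  dict mapping user -> weight in [0, 1]; unknown users count as 0
    """
    scores = defaultdict(float)
    seen = set()
    for user, url in shares:
        if (user, url) in seen:  # a user counts at most once per URL
            continue
        seen.add((user, url))
        scores[url] += trust.get(user, 0.0)
    return dict(scores)


# Invented data: two trusted people share one link, two unknown bots share another.
shares = [
    ("alice", "good-post"), ("alice", "good-post"), ("bob", "good-post"),
    ("bot1", "spam-page"), ("bot2", "spam-page"),
]
trust = {"alice": 0.9, "bob": 0.8}
print(trust_weighted_scores(shares, trust))
```

The hard part, of course, is where the trust numbers come from. That is the "gaming people" problem the tweet above points at.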