In a post last week, Paul Kedrosky noted his frustration when looking for a new dishwasher using Google. I thought it might be interesting to do some forensics to see which sites rank highly and why.
Paul started by querying Google with the phrase “dishwasher reviews”:
Pretty much every link on this page has an interesting story to tell about the state of the web. I’ll just focus here on the top organic (non-sponsored) result:
http://www.consumersearch.com/dishwasher-reviews
Clicking through this link takes you here:
Consumersearch is owned by About.com, which in turn is owned by the New York Times.
So how did consumersearch.com get the top organic spot? Most SEO experts I talk to (e.g. SEOMoz’s Rand Fishkin) think inbound links from a large number of domains still matter far more than other factors. One of the best tools for finding inbound links is Yahoo Site Explorer (which, sadly, is supposed to be killed soon). Using this tool, here’s one of the sites linking to the dishwasher section of Consumersearch:
http://www.whirlpooldishwasher.net/
(Yes, this site’s CSS looks scarily like my own blog – that’s because we both use a generic WordPress template.)
This site appears to have two goals: 1) fool Google into thinking it’s a blog about dishwashers and 2) link to consumersearch.com.
Who owns this site? The Whois records are private. (Supposedly the reason Google became a domain registrar a few years ago was to peer behind the domain name privacy veil and weed out sites like this.)
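Why build a whole fake blog just to link out? If, as the SEO folks say, what counts is the number of distinct domains linking in, then many links from one site are worth little more than a single link, and each extra domain adds to the count. Here’s a minimal sketch of that counting, assuming we already have a list of inbound link URLs (say, exported from a tool like Site Explorer). The helper name referring_domains and the example.org URL are hypothetical, and treating the hostname minus a leading “www.” as the “domain” is a simplification. This illustrates the metric, not how Google actually ranks pages.

```python
from urllib.parse import urlparse


def referring_domains(inbound_urls):
    """Return the distinct hosts that link in (hypothetical helper).

    Simplification: a "domain" here is the hostname with a leading "www."
    stripped; real link tools group hosts by registrable domain instead.
    """
    domains = set()
    for url in inbound_urls:
        # urlparse(...).hostname is lowercased, or None if there is no host
        host = urlparse(url).hostname or ""
        if host.startswith("www."):
            host = host[len("www."):]
        if host:
            domains.add(host)
    return domains


# Hypothetical inbound links: two pages on one site, one on another
links = [
    "http://www.whirlpooldishwasher.net/",
    "http://whirlpooldishwasher.net/page/2/",
    "http://example.org/kitchen",
]
print(len(referring_domains(links)))  # 2
```

Both whirlpooldishwasher.net URLs collapse into one entry, so three links count as two referring domains. A more faithful version would group hosts using the Public Suffix List.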
I spent a little time analyzing the “blog” text (it’s actually pretty funny – I encourage you to read it). It looks like the “blog posts” are fragments from places like Wikipedia run through some obfuscator (perhaps by machine translating from English to another language and back?). The site was impressively assembled from various sources. For example, the “comments” to the “blog entries” were extracted from Yahoo Answers:

Here is the source of this text on Yahoo Answers:

The key is to have enough dishwasher-related text to look like it’s a blog about dishwashers, while also having enough text diversity to avoid being detected by Google as duplicative or automatically generated content.
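Google doesn’t publish how it detects duplicate content, but one classic technique is to break each document into overlapping word n-grams (“shingles”) and compare the resulting sets with Jaccard similarity. The sketch below assumes 3-word shingles and uses made-up example sentences; it only illustrates why lightly reworded “posts” might slip past a check like this.

```python
import re


def shingles(text, k=3):
    """Set of k-word shingles (overlapping word n-grams) from text."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity |A & B| / |A | B| of two shingle sets."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


# Made-up example texts: a verbatim copy and a clumsy rewording
original = "A dishwasher is a mechanical device for cleaning dishes and eating utensils."
copied = "A dishwasher is a mechanical device for cleaning dishes and eating utensils."
reworded = "A dish washer is one machine device for the cleaning of plates and eating utensils."

print(jaccard(shingles(original), shingles(copied)))  # 1.0
print(jaccard(shingles(original), shingles(reworded)))
```

The verbatim copy scores 1.0, while the reworded sentence shares only one shingle (“and eating utensils”) with the original, for a score of 1/22. Reshuffle enough words and a simple overlap threshold no longer fires, even though a human reader immediately sees word salad.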
So who created this fake blog? It could have been Consumersearch, or a “black hat” SEO consultant, or someone in an affiliate program that Consumersearch doesn’t even know about. I’m not trying to imply that Consumersearch did anything wrong. The problem is systemic. When you have a multibillion-dollar economy built around keywords and links, the ultimate “products” optimize for just that: keywords and links. The incentive to create quality content diminishes.
What's the solution? Is it using the social graph, or do you think there are other ways to combat this?
Re social graph: what I worry about is I don't think the social graph has a lot of dishwasher related links. Mostly news and funny videos. More likely to link to a dishwasher exploding on video than a great dishwasher review site.
The solution is “it depends”…but for an example like this, IMHO the real solution is sales data…if there was a way to track the information/data that led to actual sales, then that would be the way to rank the search results for 'products'…
As far as the social graph for these sorts of things goes, I think it's more about your social graph ranking/rating results…i.e., show me the 'dishwasher reviews' results that my social graph has ranked highest first (if there are any, of course)…
Adding a human element (much like Hunch does by the way) is probably key to ensuring some level of quality and authenticity…
again just my humble opinion though
Social bookmarking seems like it could be a good solution, although you might be able to game that too. Maybe a pseudo-technical solution like Mechanical Turk and Google Image Labeler would be good for this. I know Google already does some manual search quality stuff, but it appears they need to do just a tiny bit more.
[...] This post was mentioned on Twitter by chris dixon and PEG, DealHorizon.com. DealHorizon.com said: #Venture Blogs: Chris Dixon published Anatomy of a bad search result @ http://cdixon.org/2009/12/19/anatomy-of-a-bad-search-result/ [...]
great post and great conclusion – for as long as gaming Google is profitable, people are going to do it.
Nice bit of spade work and an eye-opener. Thanks for taking the trouble.
Great post, great example. I agree with an earlier comment: ultimately, having sales (or some other relevant performance) data would really benefit search results. Google is not that far from getting there: as they grow the footprint of Google Analytics, they can track a lot of user behavior after people search for certain terms and click. One way to put this: “behavior tracking” can really complement search results. But I'm not sure how far Google is willing to go down that path.
On a different note, I actually like consumersearch.com as a site. They provide a good summary of other sites' reviews, which saves me a lot of time researching and comparing as a consumer.
BTW – I'm not criticizing consumersearch.com's site, or even their SEO practices (I'll assume it was an affiliate or something who built that site). I am much more interested in this as an example of what's wrong with search today.
This is a perfect example of why the web is always an open playing field. New technologies that haven't been “figured out” by SEO specialists will rise when there is a paucity of good organic Google results on the first page. But it won't just be Google… Facebook and Twitter too. The latest and greatest need to give way to the new latest and the new greatest.
Great post.
Nice post Chris. Also, there was a good discussion over at Hacker News about this about a week ago: http://news.ycombinator.com/item?id=993271.
Chris, have you checked out Answeroil from prismastar? It's my ideal shopping experience, and I'd like it to cover the entire web of products. (I wanted our semantic search/ad boxes at victusmedia to lead folks to one as a high-functioning closing sales page.)
Basically it takes a list of the features you decide are most important to your shopping criteria. Then you slide some bars to set priorities and voilà, you have a list of products that fall into your “feature hot spot”.
Go look up customer reviews to make sure there are no glaring holes and pick one up.
I have reported them to Google.
hmm, don't think that's really the solution.
yeah, it is good news for startups I suppose.
I think Google ignores the problem because the vast majority of the optimized pages carry Google's AdSense ads. As I said on Paul's original post, this is how Google can make money on both paid results and natural results – all paths lead to a Google click.
It is and will be a game. History and recent events tell us that nothing we invent, create, or manufacture is perfect. There is no perfect PageRank algorithm. There is no perfect SEO strategy. Yes, the SEO guys have strategies and tactics, but Google counterattacks. And the loop begins again.
Since Google keeps its secret sauce recipe to itself, for all the usual reasons, we can only speculate how much brainpower is behind PageRank, and how long or short the intervals are between the successive waves of White and Black Hat SEO guys pushing sites higher.
Kind of like waves onto a beach: the SEO guys want to see as much crap as possible washing up on the shore.
This game will never end as long as Google lacks the computing power to detect these patterns very early and to see the big picture of traffic on the network (say, via Google DNS or Web History) and of our behavior.
We as humans are the most advanced AI, so to speak, for detecting fraud. You (Chris) realized very early, reading that blog about dishwashers, that it was wonky and fraudulent, and left the site right away. In the future, Google may be able to read that signal (say, via Google Chrome, Chrome OS, or Google Public DNS), feed it into its alert system, and eventually drop the blog's links from its PageRank calculation.
Learning from our behavior is the greatest resource and mission asset Google has in pursuing its goal of collecting all the world's information and making it searchable. That means ranking search results by our personal preferences, location, time, search history, education, age and so forth. All these levers make for a better search experience.
We depend on Google to sift through the abundance of information (what a great business Google has: it found a need, and created a need, for accuracy).
And Google depends on us (balancing privacy concerns) to learn from our behavior, feed it back into the algorithm, and teach its computers.
This comment could easily have been a short post.
I desperately want the ability to blacklist sites. I can hide results, but I want Google to learn that and stop showing me other results from the same sites.
The problem is that any system can be gamed. Google will change the rules and black hat SEOs will reverse engineer them and build new sites that exploit the new rules.
Chris, have you tried clicking “show options” above the search results and then selecting the “Fewer shopping sites” link?
I hadn't until you pointed it out.
But doing so highlights something I've been thinking about since I posted this and I'm somewhat surprised no one brought up: What would be a good search result for “dishwasher reviews”? I'm not sure I can find a site that should have come up. Which makes me wonder: is it a problem with search, or with the content out there on the web for certain topics like this?
I ended up buying a Bosch based on that blog post, but I bought it from Sears instead. I got a call that certain Bosch dishwashers were recalled, so I need to follow up on that. But all in all, I'm happy with the Bosch. I had a KitchenAid before, and the extra-clean slot broke so I couldn't use it. But the KitchenAid was good too for the price.
It's the online advertising money that makes people create such websites. And the worst part is that these people copy from someone else's blog without even giving credit!
You can thank Datapresser for most of the SEO spam wordpress blogs…
I'm a big fan of the Answeroil engine too. Well worth checking out.
demo: http://www.jessops.com/ – click camera selector from horizontal menu.
Information = Data in Context
Context = Organized Data
Learning = Self Organization of Data (builds Context)
Meaning = Augmentation of Data with [my]Data (from learned organization)
Any system built on these definitions would have just dropped that post. Read in the context of a review, which is itself an abstract pattern/context (description, comparison, features, price diffs … ), it makes no sense. The problem is that neither Google nor any other search engine is built on top of an engine that can build its own abstractions.
My test is: can the system learn and build an abstraction for something like “all” by itself, with no programming?
Even if one skipped the abstract pattern and just did augmentation, which is a form of decomposition in this context, it would have led to many concepts that make no sense.
The main problem I see here is Google's use of simple keywords instead of decomposing words in context, which requires learning if one does it right.
Perhaps, but you have to assume that long term that's a losing strategy. They want searchers to be happy.
There is a vast underbelly of the Web: huge e-marketing confabulations whose entire purpose is to build blogsites like this ad infinitum, ad absurdum, ad nauseam. They use the unemployed to write SEO-primed “articles” of 250-450 words for $2 to $4 apiece. (You can find crap jobs like this choking Craigslist, and websites like Textbroker.com trade on it.) Of course, at that rate of pay, you get cut-and-pasted word salad like the above. I'm certain that more efficient operations simply program 'bots to mine text from all over the web. It's just a logical advance from spam emails; indeed, a couple of years ago I got tons of spam that didn't advertise a single product or service except for a tiny link, and the rest was huge blocks of text scrambled up from well-known Victorian novels. I assume these were test runs to figure out what could evade spam filters. None of this, of course, sells product or provides information. But marketers can point to increased click-throughs or linkbacks that justify their fees. In short, it's a titanic racket.
Or is it a problem with the query? What (and where) would be a good search for dishwasher reviews? It's easy to pick on results for high-funnel terms, but the better (more specific) the query, the better the result set gets. “24″ dishwasher reviews” gets a pretty nice SERP (+ a nice ad from Sears). Also, going to Amazon and searching dishwashers by user rating seems to be a better option. I'm not calling “user error” – people search how they search – but often when we don't get helpful results, we have our query skills to blame.
To build a little on what David Schneider said, Amazon's Turk is a pretty big player in this field. I researched the Turk service for several months and couldn't believe the abundance of these “article writing” jobs available. I'm pretty sure Amazon is fully aware of the role they're playing, but just looking the other way.
Good sleuthing. You just knocked down my trust in consumersearch.com. Used to love that site.
Great analysis. I don't think it's a bad example. Isn't it the core of SEO: keywords and links?
What would be a good search result for “dishwasher reviews”?
- Jason Calacanis talked about it, why he started mahalo.com.
He showed MSFT and Google search results alongside curated search results in a blind A/B/C test. And all the investors said C (curated) was best.
Sure, this can be done only with the most popular searches, 3-5 million!? You can't scale curation like stacking up CPUs.
And on a side note, reading @jonathanmendez mentioning Amazon, yes, Amazon is looking at its user data too, thus coming up with recommendations.
[...] 27.12.2009 / 10pm CET (Link) Comment on ‘Anatomy of a bad search result’ by Chris Dixon. It is and will be a game. [...]
Another micro perspective: I have Google Alerts set for all our domain names. We registered 2 domains about 10 years ago, and in the past couple of years, the names have become popular key phrases. For about the past year, about 1/3 – 1/2 of the Alert results lead to a fake (gibberish) blog, affiliate spam blog, paid link post, scraper blog, or some combination of these. At best, the content is really poor.
Big hint: most of the blog posts are written by “admin”.
“We as humans are the most advanced AI, so to speak, which detects fraud.”
It just doesn't make sense to refer to humans as 'the most advanced AI'. That defies the very meaning of the term.
You're right that it's a game: this problem is entirely social, and because of that, I don't think it will end, however smart the tools get. You can think of it as the problem of building a model of what a 'good' dishwasher result is.
The modeling tools for forgery get smart just as quickly as the tools for detection – they're the same problems! If the answer to that is, “We should just let Google have all the good tools”… well, that's not going to happen.
Where this is going, of course, is that eventually fake dishwasher review sites will be composed of coherent English dishwasher reviews that are indistinguishable from real reviews in every possible way. Except that they're fake.
The problem is social; it's that there's someone out there who is willing to lie and deceive us to make a buck.
But that's a *really* hard problem to solve, so I guess we're left with technical band-aids that will, eventually, peel off.
I think that's the wrong question.
No one really wants a good dishwasher review. And they'll never get one, either, because a review is too easy to fake.
What we all really want is a good _dishwasher_, and word of mouth review is the tried and true method that people use to approximately measure how good a dishwasher is.
“What's a good dishwasher? I dunno… 'Hey Bob, how's your dishwasher treating you?'”
I think we'll either fall back on real word of mouth (which is *incredibly* difficult to fake, unless you can clone my friends) or we'll solve the problem at a deeper level by mashing up a service that queries the Maytag repair ticket database
It's a problem that really is begging us to cut out the middleman – it just needs a few companies who decide that they are going to be transparent and honest as part of their customer service strategy.