Please see update at bottom
Most websites spend massive amounts of time and money to get any of their pages indexed and ranked by Google’s search engine. Indeed, there is an entire billion-dollar industry (SEO) devoted to helping companies get their content indexed and ranked.
Twitter and Facebook have decided to disallow Google from indexing 99.9% of their content. Twitter won’t let Google index tweets, and Facebook won’t let Google index status updates and most other user- and brand-generated content. In Facebook’s case, this makes sense for content that users have designated as non-public. In Twitter’s case, the vast majority of the blocked content is designated by users as public. Furthermore, Twitter’s own search function rarely works for tweets older than a week (according to Twitter’s search documentation, it returns “6-9 days of Tweets”).
There is a debate going on today in the tech world: Facebook and Twitter are upset that Google won’t highly rank the 0.1% of their content they make indexable. Facebook and Twitter even created something they call the “Don’t be evil” toolbar that reranks Google search results the way they’d like them to be ranked. The clear implication is that Google is violating its famous credo and acting “evil”.
The vast majority of websites would dream of having the problem of being able to block Google from 99.9% of their content while having the remaining 0.1% rank at the top of results. What would be best for users – and least “evil” – would be to let all public content get indexed and have Google rank that content “fairly” without favoring its own content. Facebook and Twitter are right about Google’s rankings, but Google is right about Facebook and Twitter blocking public content from being indexed.
Update: after posting this, I got a bunch of emails, tweets and comments telling me that Twitter does in fact allow Google to index all of its tweets, and that any missing tweets are the fault of Google, not Twitter. A few people suggested that without firehose access Google can’t be expected to index all tweets. At any rate, I think the “Why aren’t all tweets indexed?” issue is more nuanced than I argued above.