By now you’ve no doubt smelled the shitstorm surrounding Google’s allegation that Bing are ‘cheating’ and copying Google’s search results. If not, in brief: Google spotted that Bing sometimes included results that seemed to be copied from Google, so Google set up a honeypot – they made some made-up words like [hiybbprqag] or [juegosdeben1ogrande] cause Google’s SERPs to link to random unrelated sites. A few weeks later, around 8% of those sites showed up on Bing for those queries.
In what is being dubbed by many as “BingGate”, Google then leaked a great story onto searchengineland.com who ran with the headline Google: Bing is Cheating, Copying Our Search Results. This on the very same day that the Farsight 2011 event was held – a discussion about the future of search between Matt Cutts (Head of Webspam at Google), Harry Shum (Bing Corporate Vice President) and Rich Skrenta (CEO of Blekko) – great timing Google. Classy.
When I read about the story on hackernews, I pretty much immediately saw what was going on – Bing are using click data from everywhere to improve their results. Now I’m no MS evangelist. I run OSX, had gmail when it was still 6 invites per user, yada yada. (Heck, I run adsense and analytics on my blog. I love Google, which is why I’m massively disappointed to see this kind of behaviour from them!)
Here’s a ridiculously simplified version of how search engines used to work: If Page A contains the word ‘candy’ then it’s about candy. This is very simple to game. Google came up with the following innovation, which they call PageRank: If Page A links to Page B with the phrase “candy”, that’s a good indicator that Page B is about candy. Especially if Page A is a trusted site (i.e. with a lot of links to it). Today, many ‘signals’ are used to rank pages. Google recently started using information about how fast a site is as a signal.
Bing have added click information as a signal. So if you’re on Page A and you click on the candy link to Page B, that’s more important than a link that says “automobile” which links to Page C. On Google’s PageRank model, Page B would be as much about candy as Page C is about automobiles. But with Bing’s click signal, they can tell that Page B is more relevant to candy than Page C is relevant to automobile. Same number of inbound links, but different number of user clicks. Taken in aggregate across all users of the Bing toolbar, I imagine this will turn out to be a very useful signal.
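To make the candy/automobile example concrete, here’s a toy sketch in Python. Everything in it is made up for illustration: the link list, the click log, the function names and the 0.5 weight are my assumptions, and nobody outside Microsoft knows how Bing actually weights its signals.

```python
from collections import Counter

# Hypothetical toy data: (source page, anchor text, target page) links.
links = [
    ("A", "candy", "B"),
    ("A", "automobile", "C"),
]

# Hypothetical toolbar click log: one (anchor text, target page) per click.
clicks = [("candy", "B")] * 9 + [("automobile", "C")] * 1
click_counts = Counter(clicks)


def link_score(term, page):
    """Count inbound links to `page` whose anchor text is `term`."""
    return sum(1 for _, anchor, target in links
               if anchor == term and target == page)


def click_score(term, page):
    """Count observed clicks on a `term` link that led to `page`."""
    return click_counts[(term, page)]


def combined_score(term, page, click_weight=0.5):
    """Blend both signals; the weight is arbitrary, purely illustrative."""
    return link_score(term, page) + click_weight * click_score(term, page)
```

On links alone the two pages tie (one inbound link each). Once the clicks are blended in, Page B pulls ahead for ‘candy’ because users actually follow that link far more often.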
Google is a big website. Lots of people click on it. So it makes some sense that Google’s influence on the click-data can be measured. And if Google were to construct an artificial test where they’re the only signal – like they did with the ‘synthetic queries’ – then it stands to reason that Bing would appear to ‘copy’ Google’s data. Because only Google have data on those queries. They’re made up.
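Here’s the same point as a toy sketch. The click log, the `.example` URLs and both function names are hypothetical, and this is only my guess at the general shape of a click-based signal, not Bing’s real pipeline. For a normal query the clicks come from lots of places; for a made-up query they can only come from Google.

```python
from collections import defaultdict

# Hypothetical click log: (site where the click happened, query, clicked URL).
click_log = [
    ("google.com", "candy", "candy-shop.example"),
    ("bing.com", "candy", "sweets.example"),
    ("some-forum.example", "candy", "sweets.example"),
    # The synthetic query was only ever searched for on Google.
    ("google.com", "hiybbprqag", "unrelated.example"),
]


def top_result(query):
    """Return the most-clicked URL for `query` across all sources, or None."""
    votes = defaultdict(int)
    for _source, q, url in click_log:
        if q == query:
            votes[url] += 1
    if not votes:
        return None
    return max(votes, key=votes.get)


def sources_for(query):
    """Return the set of sites that contributed click data for `query`."""
    return {source for source, q, _url in click_log if q == query}
```

For ‘candy’, three different sources vote and Google’s pick loses. For ‘hiybbprqag’, the only source is google.com, so the top result is whatever Google linked to. It looks like copying, but it’s just a signal with a single input.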
That’s the situation as I perceive it, as I perceived it within an hour of reading about it. Let’s revisit Google’s actions. They hit the red button big-time. Searchengineland blogpost, mattcutts’s twitter, conference hijack and now an official Google blogpost.
At first I thought there could be three possible explanations for Google’s handling of this situation:
1) They genuinely thought that Bing were maliciously copying their results, and didn’t stop to think of another (imho more obvious) explanation; that Bing just collect all kinds of click data, including *but not limited to* clicks on Google.
2) They thought it would be a massive PR win, and who cares about the truth.
3) They can’t control what their own people say about them in public.
Any one of those is bad. 1) is bad because I don’t even work in search and I could spot the real situation. So did many people on hackernews, techcrunch, pretty much anywhere you read about this story. Google are pretty dumb if they didn’t spot this – which seems unlikely. But if true, it’s worrying.
2) is bad because, damn that’s an evil marketing tactic.
3) is bad because it’s worrying if you can let one of your top public-facing engineers go to the media with the kind of story that promotes the headlines we’ve seen without any kind of internal oversight.
Option one I just refuse to believe. Google are not dumb. Perhaps they got carried away with their own epic tale of how they caught Microsoft red-handed. Perhaps. Option two is a shocking tactic and *really* doesn’t seem like the kind of thing we’ve come to expect from Google. Option three is looking less likely. Matt’s not a maverick, and Google just echoed the story on the official blog.
So option four. Matt did pump the Bing-gate story up without Google’s consent, and now they’ve no option but to take responsibility to avoid seeming like they’ve lost control. Sounds a bit Soviet to me. Indicative of insecurity? It’s no secret that Bing has been taking market share from Google at an astonishing pace. What’s going on, Google?
Update: great discussion about this post on hackernews. Thanks for the positive comments guys, much appreciated.
I’ve replied to some common criticisms in my followup post: What are Google thinking – part 2.