What are Google thinking: Part 2

Original blog post here: What are Google thinking

I’d like to respond to some of the comments made on hackernews and puremango about my post yesterday, and about Binggate / copygate in general.

Reply: “There was never a ‘link’ between [hiybbprqag] and the spiked sites, so Bing must have been directly copying Google”

Yes, I said “linking”, but I didn’t mean it in the sense of a hyperlink (my bad) but rather a conceptual link. What I meant was: “Google spotted that Bing sometimes includes results that seem to be copied from Google, so Google set up a honeypot – they made some made-up words like [hiybbprqag] cause Google’s SERPs to link to random unrelated sites. A few weeks later, around 8% of those sites showed up on Bing for those queries.” I’ve edited the original post with this new wording.

It’s still not valid to jump from that to “Bing are copying Google”. All we can tell from that is “Bing are using information about URLs and clicked pages”.

Here, in the clearest terms I can manage, are the two pictures of events:

1) Bing are copying Google

When you’re on Google, and Google alone, Bing kicks in some query string detector – based on URL, form fields, reading Google’s SERP HTML, whatever – and then waits for you to click a page. When you do, that query string is sent to Bing along with the URL of the page you clicked. That information creates a relationship between the query and the page which is then used to create Bing’s results.

2) Bing are just monitoring all clickthroughs

When you’re on any website, Bing kicks in some “What kind of page are we on” detector – based on URL, form fields, reading the HTML, whatever – and then waits for you to click a page. When you do, that data is sent to Bing along with the URL of the page you clicked. That information creates a relationship between the query and the page which is then used to create Bing’s results.

The first is reasonably a case of Bing copying Google; the second is not.
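
As a rough illustration of why the honeypot can't tell these two pictures apart, here is a minimal Python sketch. Everything in it is hypothetical: the function names, the list of query-parameter names and the URL-only detection are my own assumptions, not anything known about how the Bing toolbar actually works.

```python
from urllib.parse import urlparse, parse_qs

# Hypothetical list of query-string parameter names that look like a search box.
# Real sites vary; this list is an assumption made for illustration only.
QUERY_PARAMS = ("q", "query", "search", "p")


def extract_query(page_url):
    """Return the first search-like query-string value in page_url, or None."""
    params = parse_qs(urlparse(page_url).query)
    for name in QUERY_PARAMS:
        if name in params:
            return params[name][0]
    return None


def google_only_event(page_url, clicked_url):
    """Picture 1: report a (query, clicked URL) pair only for pages on a Google host."""
    host = urlparse(page_url).hostname or ""
    if host != "google.com" and not host.endswith(".google.com"):
        return None
    query = extract_query(page_url)
    return (query, clicked_url) if query else None


def generic_event(page_url, clicked_url):
    """Picture 2: report a pair for any page whose URL carries a search-like query."""
    query = extract_query(page_url)
    return (query, clicked_url) if query else None
```

For a click made on a Google results page, both functions return the same pair, so the honeypot result looks identical under either picture. They only differ for clicks made on other sites, which the honeypot did not test.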

Moreover, why would Bing do the first? If they wanted to copy Google, why wouldn’t they just scrape Google directly? And why would they want to copy Google at all? Because they’re not smart enough to create their own search engine? That’s what Google would like you to believe. I think it’s perfectly within MS’s budget to attract high-quality PhD students who know a thing or two about information retrieval. I mean, just search for [Microsoft Google defector] and you can see that a lot of Googlers used to work at MS. The talent is surely not so massively different that MS have to resort to copying. And if they did, why this convoluted strategy? Plausible deniability, perhaps. I favour the simpler explanation.

Reply: Whatever, the main issue is that Bing Toolbar is spyware.

Firstly: the communications between Bing toolbar and Bing are encrypted, which hints to me that MS actually do care about protecting your data – the obvious reason to encrypt it is to ensure that malicious third parties can’t sniff your search data. That’s circumstantial, I concede, and could alternatively be read as “They don’t want anyone to know what info they’re collecting”. Secondly: if it was the main issue, why isn’t it the main issue? Thirdly: users accepted the ‘spying’ in the EULA. Now I know that no-one reads these and that it’s not a nice tactic for companies to hide behind that kind of get-out clause. So, fourthly: everyone spies on users. I recorded your browser and referer via Google Analytics when you visited this blog. All your web-based email is being ‘spied’ on to create better adverts. This is the trade we make in exchange for free, amazing web services. If you want privacy, use DDG.

Reply: It should be “Google is” not “Google are”

I’m English, that’s how we roll.

Reply: The bigger issue is that if this continues, Bing will end up as a copy of Google

Sure, for obscure queries, and perhaps the larger trend that Matt Cutts & co noticed is down to this. So Bing will likely change it. Harry Shum actually said “this is a new kind of clickfraud”. I’d be massively surprised if Bing don’t change the way their “ClickRank” system works (my shorthand for “what I think Bing’s doing”). It’s just like Googlebombing: if that had continued, Google would have ended up full of spam. This is a non-issue; it won’t happen. As many have pointed out, “It would have been smart to exclude Google from the start”. Yes it would, but Bing didn’t anticipate that Google would have such an apparently huge effect on them, much less that Google would handle this so immaturely.
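
To show why a click signal would matter mostly for obscure queries, and what excluding a site from it might look like, here is a toy Python sketch. It is a guess at the general shape of such a system, not Bing's actual code: the scoring formula, the `click_weight` value and the `excluded_hosts` parameter are all assumptions made for illustration.

```python
from collections import Counter
from urllib.parse import urlparse


def click_counts(events, excluded_hosts=()):
    """Count (query, clicked_url) pairs from (source_url, query, clicked_url) events.

    Events whose source page is on an excluded host (or a subdomain of one)
    are skipped. The exclusion list is a hypothetical feature.
    """
    counts = Counter()
    for source_url, query, clicked_url in events:
        host = urlparse(source_url).hostname or ""
        if any(host == h or host.endswith("." + h) for h in excluded_hosts):
            continue
        counts[(query, clicked_url)] += 1
    return counts


def rank(query, content_scores, counts, click_weight=0.1):
    """Order candidate URLs by content score plus a small per-click bonus."""
    candidates = set(content_scores) | {url for (q, url) in counts if q == query}

    def score(url):
        return content_scores.get(url, 0.0) + click_weight * counts[(query, url)]

    # Highest score first; ties broken alphabetically so the order is stable.
    return sorted(candidates, key=lambda url: (-score(url), url))
```

With a made-up word, no page has any content score, so a handful of clicks is the only evidence available and decides the whole result. For a common query, the same handful of clicks barely moves a page that already scores well on content. Passing `excluded_hosts=("google.com",)` drops the Google-sourced clicks entirely.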

Also, Google ‘copy’ snippets of webpages, they ‘copy’ images, they ‘copy’ entire websites for cache and for instant previews. They ‘spy’ on users for autosuggest. No-one has a problem with any of this, because that’s how the web works. To get ahead, you have to start looking at more invasive signals. My own research on adaptive websites focused on tracking clicks, scrolling, word highlighting etc. while a user is on a page. This is useful data which can be used to produce a better product if it’s handled sensitively.

I’m not saying “Bing do not have code in there to copy from Google directly”. They may do; they may be absolutely as desperate as Google paint them to be. But for that to be true, a lot of non-obvious, unsupported things (such as “it only happens on google.com”) also have to be true. Occam’s razor suggests we prefer the simpler explanation until further evidence arises.

Update: Bing have made an official response to Google. Here’s my response to that: What Bing Should Have Said To Google.

  1. #1 by Mary Branscombe on February 2, 2011 - 4:52 am

    Just remember: all companies IS singular ;-)

    • #2 by Howard Yeend on February 2, 2011 - 6:08 am

      Not in Britain though. In Britain all companies are plural. Although the BBC coverage contains phrases like “Bing is…”, so perhaps that usage is soon to be deprecated.

      • #3 by Peter on February 3, 2011 - 10:46 am

        If “Bing are copying Google” then why “Bing kicks in some query string detector” and not “Bing kick in…”?

        (Not to turn this into the Language Log, and not to be a pedant, I’m genuinely curious.)

        • #4 by Howard Yeend on February 9, 2011 - 1:49 pm

          I was talking about the search engine software when I said “Bing kicks in….”, rather than the company ;)

  2. #5 by Luigi Montanez on February 2, 2011 - 10:06 am

    So isn’t Google right? Google wants Bing to stop taking their results from clickstream data, and that’s what you think Bing will likely do.

  3. #6 by Daniel van Soest on February 3, 2011 - 1:58 am

    As a MSFT IT Pro Evangelist I sincerely thank you for your excellent clarification in this matter without losing skepticism!
    Much better, from a technical point of view, than the BING answer: http://www.bing.com/community/site_blogs/b/search/archive/2011/02/02/setting-the-record-straight.aspx

    Keep on rolling,
    Daniel van Soest

  4. #7 by Jo on February 3, 2011 - 10:21 am

    “I think it’s perfectly within MS’s budget to attract high-quality PhD students who know a thing or two about information retrieval.”

    Microsoft is a PhD black hole. Name one innovative thing that company did in its history besides httprequest. Their engineers may be fantastic, but their entire business ethos is not to get into a market segment until it’s large. Then get into it by ripping off the current market leader and selling it for cheaper. Only thing is, now Google caught them in their own crappy game: how do you ‘sell for cheaper’ when the market leader is giving away their product?

  5. #8 by Rhys on February 3, 2011 - 10:33 am

    I agree with your points in the post. I don’t think it’s a black and white issue in any way, though. I think it’s a good idea to use clicks as a ranking signal, and I’m uncertain whether I think Google should be specifically excluded from the data Bing uses as a matter of “fairness” or “competition.”

    One thing a friend and I discussed that did make me unhappy was the fact that this experiment proves that Bing using the clickstream data from Google allows Bing to piggyback off of Google’s proprietary technology. In particular we thought of Google’s spell corrector. Bing doesn’t have one, or at least not one nearly so good.

    So while I don’t see it as stealing and I think the issue is being misrepresented, or inflated, somewhat, I do think there are concerns to be addressed here. Like with lots of new tech!

  6. #9 by Mark on February 3, 2011 - 12:09 pm

    MS advertises Bing as a better way to do search. MS’ definition of search quality is different from Google’s in that Google defines the problem as returning the best result based on information that has been published on the web. MS defines it as published information plus “mechanical turk” results that can enlist MS customers without their knowledge. It’s not unethical to define the problem the MS way, but they should have incorporated a filter to include mechanical turk results from Bing searches only. Including Google’s click results while advertising that their methodology is superior is cheating.

  7. #10 by sh0ck_wave on February 5, 2011 - 6:44 am

    MS has revealed that it uses Bing toolbar to create something they call the search signal, which is basically data on what users search for and which links they subsequently select.
    63% of searches are made on Google, so data from the search signal is basically made up of what people search for on Google and what they subsequently click.

    I agree that it is probably not the most important factor in page ranking, but for rare terms with few links, where Bing’s own results page is very weak, this factor can push the links users click most on other search engines up the ranking and make them appear on Bing’s page.

    I believe Bing should allow search engines the freedom to block data about user interaction with their sites from being sent by Bing toolbar.

