In Soviet Russia, Google Researches You!

Martin Feuz, Matthew Fuller and Felix Stadler have a very clever paper in a recent edition of First Monday, titled “Personal Web searching in the age of semantic capitalism: Diagnosing the mechanisms of personalization.” In their study, they create three artificial search profiles on Google based on the topics of interest to three different philosophers (Foucault, Nietzsche and Kant, using terms from the indices of their books) and compare the results these personalized profiles receive to the results an “anonymous” profile – i.e., one without Google’s Web History service turned on – receives.

They see a very high degree of personalization, both in frequency – personalized search results appear in 50% of search queries for some of their profiles – and in intensity – in some cases, 64% of results differ in content or rank from those an anonymous profile receives. While there’s apparently lots of personalization going on, and personalized results emerge early in the training process, the authors don’t see the search algorithms reaching deep into the “long tail” of content. When personalized results differ from anonymous search results, 37% of the novel results can be found on the second page of anonymous results, while only 7% of novel results are found between results 100 and 1000 and 13% beyond result 1000. Finally, they are able to demonstrate that personalization is probably not based solely on the content an individual has searched for in the past – they see ample evidence that content on social networking is being heavily personalized for Nietzsche based only on his searches for power, morality and will, for instance.
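To make those two measurements concrete, here is a minimal Python sketch of how one might compare a personalized top 10 against an anonymous ranking. It is not the authors’ code: the function names, the bucket labels, and the choice to count a result missing from the anonymous crawl as “beyond 1000” are all my assumptions for illustration.

```python
from collections import Counter


def origin_bucket(rank):
    """Name the region of the anonymous ranking a result came from.

    Assumption: a result missing from the anonymous crawl (rank is None)
    is counted as "beyond 1000".
    """
    if rank is None or rank > 1000:
        return "beyond 1000"
    if rank > 100:
        return "101-1000"
    if rank > 20:
        return "21-100"
    if rank > 10:
        return "second page"
    return "first page"


def compare_rankings(personalized, anonymous, top_n=10):
    """Compare the top_n personalized results with an anonymous ranking.

    Both arguments are lists of URLs ordered best-first.
    """
    anon_rank = {}
    for position, url in enumerate(anonymous, start=1):
        anon_rank.setdefault(url, position)  # keep the best rank if a URL repeats

    top = personalized[:top_n]
    # "Different in content or rank": the URL is absent from the anonymous
    # list or sits at another position there.
    changed = sum(
        1 for position, url in enumerate(top, start=1)
        if anon_rank.get(url) != position
    )
    # "Novel": the URL does not appear in the anonymous top_n at all.
    novel = [
        url for url in top
        if anon_rank.get(url) is None or anon_rank[url] > top_n
    ]
    origins = Counter(origin_bucket(anon_rank.get(url)) for url in novel)
    return {
        "share_changed": changed / len(top) if top else 0.0,
        "novel": novel,
        "origins": dict(origins),
    }
```

Run over every query in a test session, `share_changed` would correspond to the intensity figure, and the `origins` counts to the breakdown of where novel results sit in the anonymous ranking.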

That last example gives a bit of the flavor for the paper – it’s both a serious and methodologically defensible piece of research and a clever prank, demonstrating that Google will try to assign Immanuel Kant to a psycho/demographic group and target content based on those assumptions. This playful tone is accompanied by a willful naïvety that’s slightly frustrating – they start by taking Google’s descriptions of the effects of personalization at face value, then offer the surprise that the hypotheses they derive from Google’s PR are invalidated. It’s not especially surprising for the reader, however, to discover that Google’s personalization is at least as much about helping advertisers target audiences as it is about helping users find the best possible content. It’s not a surprise to the authors, either – the term “semantic capitalism”, credited to Cristophe Bruno, implies that we’ve entered a world where words have market prices, with potentially different values to advertisers as to audiences.

While I find the levels of personalization the authors detect to be fascinating, I wonder whether their experiment correctly isolates the factors involved with personalization. Eli Pariser, in his talk last year at PDF and, presumably, in his forthcoming book on the power and dangers of personalization, refers to 57 factors that allow Google to personalize results for users who are not using Web History (the “anonymous” users in this experiment). The authors control for a key variable, conducting all searches from IP addresses in Central London. It’s unclear, though, whether Google is making other extrapolations – perhaps users who execute lots of searches at 3pm are more likely to be middle-aged businessmen than teenage girls, and results are targeted accordingly? I’d be very interested to see the authors check whether their anonymous search results are identical or nearly so – if not, there may be a great deal more personalization going on outside of the experiment’s parameters than they are accounting for.

I was struck by the apparent discontinuities in how often personalized search results appeared for the three different profiles. For one profile, there’s a single sharp spike: a test session in which personalization appears three times as often as in the other sessions. For another, there are two smaller spikes, and for the third, a spike that lasts three sessions. With no easy way to explain what’s causing these spikes, it’s possible to speculate that Google’s algorithms for personalization are not just opaque and complex, but adaptive and changing. While the authors are experimenting with Google, it’s reasonable to assume that Google is experimenting with them, changing levels of personalization to see whether it can achieve its desired result: clicks on ads.

I found the authors’ findings about the long tail particularly fascinating, though I’d frame them slightly differently than they do. They see the fact that most personalized results (results that differ between a query from a profiled and an anonymous user) that appear in the top 10 come from the top 100 results delivered to anonymous users as evidence that Google’s personalization is pretty shallow. I see the finding that 13% of personalized results in the top 10 come from outside of the top 1000 as downright remarkable – I’d thought that Google’s algorithm, both in terms of page rank and term relevancy, would resist such large reshufflings of the deck, bringing up pages considered irrelevant for an “anonymous” user to prominence for a profiled user. I see that finding as quite encouraging – even buried deep in the slag heap of low pagerank and low relevancy, personalization might occasionally bring a long-tail web page to the surface.

Of course, there’s another explanation: again, Google’s testing the experimenters as they’re testing the system. Google’s long said that they present different results to users as a way of testing result relevance – if a long-tail page appears in results and is widely clicked, perhaps it’s time to weight it more heavily or to tweak the algorithms that buried it in the first place.

This is the core problem of studying a system like Google. As the authors acknowledge, “How can we study a distributed machinery that is both wilfully opaque and highly dynamic? One which reacts to being studied and takes active steps to prevent such studies from being conducted on the automated, large-scale level required?” That second question is a reference to a methodological challenge the authors had – it’s deeply atypical behavior to click on every possible results page for a search query, which the authors needed to do, and Google periodically blocked their IPs on suspicion that they were bots attempting to scrape or game the search engine.

Google is not willfully opaque just out of spite or a desire to protect its secrets from Microsoft or other search engine builders. The sort of work the authors are conducting is exactly the sort of work search engine “optimizers” do when attempting to help their clients achieve a higher ranking in Google’s results. Were Google’s methods of personalization easy to understand, we would expect SEO folk to take advantage of their newfound knowledge, as we’d expect them to use any knowledge about Google’s ranking algorithms. The more transparent those algorithms are, generally speaking, the more likely they are to be gamed, and the more gaming occurs, the less useful Google is for most users.

I wonder if there’s a provocative hypothesis the authors haven’t considered in analyzing the behaviors they saw – Google offers different results with a high frequency, in part because they’re trying to obfuscate their algorithms. The faster you poll the engine, the more variability you get, making it harder to profile the engine’s behavior. We can discard this hypothesis if the authors checked results of their anonymous searches against one another and got highly similar results – if not, then it’s possible that some of the hidden variables Eli Pariser talks about are in play… or that there’s an inherent amount of noise in the system, either for purposes of obfuscation or for allowing Google to try A/B tests with live users.
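One way to run that check, sketched in Python under my own assumptions: collect the anonymous result list for the same query several times, then measure how much the runs agree with one another. The function names and the two measures (set overlap and share of identical rank positions) are illustrative choices of mine, not anything the paper specifies.

```python
from itertools import combinations


def jaccard(a, b):
    """Overlap in content: shared URLs divided by all URLs seen in either list."""
    set_a, set_b = set(a), set(b)
    if not set_a and not set_b:
        return 1.0
    return len(set_a & set_b) / len(set_a | set_b)


def same_rank_share(a, b):
    """Overlap in rank: share of positions holding the same URL in both lists."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return sum(1 for x, y in zip(a, b) if x == y) / longest


def anonymous_consistency(runs):
    """Average both measures over every pair of anonymous runs of one query."""
    pairs = list(combinations(runs, 2))
    if not pairs:
        raise ValueError("need at least two runs to compare")
    return {
        "mean_jaccard": sum(jaccard(a, b) for a, b in pairs) / len(pairs),
        "mean_same_rank": sum(same_rank_share(a, b) for a, b in pairs) / len(pairs),
    }
```

Values near 1.0 on both measures would suggest the anonymous baseline is stable and the obfuscation hypothesis can be set aside. Markedly lower values would point to hidden variables or deliberate noise, though this check alone could not tell those apart.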

Researchers want to understand how Google works because it’s probably the most important node (at least at the moment) in our online information ecosystem. Whether we’re interested in driving attention or revenue, what Google points us towards becomes more powerful. But the better we understand Google, the more likely we are to break it. Security through obscurity is a dreadful strategy, but I’m hard-pressed to offer a better answer to Google for how they can prevent their engine from being gamed.

Deep in Feuz, Fuller and Stadler’s paper is the sense that there’s something unheimlich about an influencer as important as Google being as mercurial as it is. Personalization is disturbing to the extent that it separates us from the real, true, stable search results, the ur-results Google is withholding from us in the hopes of selling us ads more effectively… but even more disturbing is the idea that there’s no solid ground, no single set of best results Google could deliver, even if it wanted to.

3 thoughts on “In Soviet Russia, Google Researches You!”

Jennifer Cobb March 25, 2011 at 7:01 pm

Great summary. Thank you. I wonder if I actually click on an ad as a result of a Google search, will that data be baked more deeply into my profile? If so, and we all start to catch on, we will all avoid clicking on ads. Ever. Another excellent paradox. I generally do avoid ad clicking, even now. I always have this slightly creepy feeling of being watched when I click on something that will make Google happy.

mike russell March 25, 2011 at 11:48 pm

Well written, too bad there is no algorithm for that except the subjective opinion of human readers. Perhaps that is another reason for the ‘noise’: to find what we deem quality or relevant, and mine our collective consciousness yet again.

Martin Feuz (author of paper) March 26, 2011 at 11:45 am

Hi Ethan

many thanks for your very interesting, favourable and tightly argued review of our paper. We think you raise important and valuable points.

On your hypothesis whether Google was experimenting with us: Certainly an interesting and valid question.
However, I did check the consistency of the search results for the anonymous user over the course of the testing-session and to a very high degree, the search results remain consistent, both in terms of content and rank.
Of course, some of the test-search queries, such as “video, software, blog, travel” are situated within highly dynamic content domains. Thus some level of underlying shifts in the index can be expected. But variation was mostly restricted to the bottom end of search results for these queries.

I can only speculate, but given the idiosyncratic character of the philosopher profiles and the corresponding search queries we used to generate them, it wouldn’t surprise me if a more “natural” Google search usage profile were exposed to even more aggressive degrees of personalisation than those we saw for the philosopher profiles.

Best,
Martin
