Data Backup Digest


Are Companies Storing Too Much of Our Personal Data?

Perhaps oddly, it’s now considered standard for large businesses to store huge amounts of data about us. Turn back the clock twenty years and it would have sounded like fiction that we’d willingly hand over our names, addresses, phone numbers, search history, and more to large corporations. Now that’s just reality.

Many services are built on the fact that consumers hand over their data to use them. If you’ve ever wondered why a handy online service remains free, consider that you’re probably the product. The information you used to sign up is valuable. The assumption is that some of these products stay good, like the personalised results of search engines, because they learn from your personal information.

But is that assumption correct? A new working paper from Lesley Chiou of Occidental College and Catherine Tucker of MIT challenges it. The pair suggest that the trade of data for service isn’t always necessary. They studied the effects of EU privacy regulations and tried to measure whether the anonymisation and de-identification of user data hurts the quality of search engine results.

All the popular search engines capture data that allows them to identify users across a session. One such identifier is the IP address, and tracking is even easier if you have an account with the provider. The theory being tested here is whether this data allows these engines to improve their algorithms, return better search results, and get users to their destination quicker.

The study looked at how results from Yahoo and Bing were affected by the European Commission’s change in rules on data retention. In 2008 it recommended that search engines reduce the period for which they kept user data. Yahoo then began anonymising user data after 90 days, and Microsoft began deleting IP addresses associated with searches after six months. Yahoo reversed this in 2011, changing 90 days to 18 months.

UK search history was then studied before and after the changes. The researchers looked at the number of repeat searches as a measure of dissatisfaction with the results. They found no statistically significant difference at any stage of the data retention policy changes. Although they noted that other studies have reached different conclusions, they concluded that “the cost of privacy may be lower than currently perceived”.
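To make the idea of comparing repeat searches before and after a policy change more concrete, here is a minimal Python sketch. It is not the method used in the working paper. It assumes that a "repeat search" means the same query appearing more than once in a session, and it uses a standard two-proportion z-test as a stand-in for whatever analysis the researchers actually ran. The function names and any numbers you pass in are purely illustrative.

```python
import math


def repeat_search_rate(sessions):
    """Return the fraction of sessions that contain a repeated query.

    `sessions` is a list of sessions, each a list of query strings.
    Assumption: a "repeat" is the same query (ignoring case and
    surrounding whitespace) appearing more than once in a session.
    """
    if not sessions:
        return 0.0
    repeats = 0
    for queries in sessions:
        normalised = [q.strip().lower() for q in queries]
        if len(set(normalised)) < len(normalised):
            repeats += 1
    return repeats / len(sessions)


def two_proportion_z_test(x1, n1, x2, n2):
    """Two-sided z-test for a difference between two proportions.

    x1 of n1 sessions had a repeat search before the change,
    x2 of n2 sessions had one after it. Returns (z, p_value).
    """
    p1 = x1 / n1
    p2 = x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    if se == 0:
        # Both groups are all-repeat or no-repeat: no measurable difference.
        return 0.0, 1.0
    z = (p1 - p2) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value
```

A large p-value from a test like this would mean the repeat-search rate did not measurably change, which is the kind of result the researchers reported.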

Of course, Microsoft’s own research has found that user data can yield better search results. And that’s probably true: if we’re searching for similar things or topics, looking at previous data can probably help get to the most relevant result quicker. But the real question is how much data is collected and how long it’s stored.

Chiou and Tucker say that while collecting our data can benefit search engines, not least by giving them a competitive advantage, we should also be sceptical. Old data may be less useful for improving results than fresher data. Also, some searches are so uncommon that collecting suitable data on them isn’t worthwhile or feasible, even for larger companies.

