Study shows problems with anonymizing data to improve security

With an increasing number of organizations looking to leverage so-called big data comes increasing risk that the datasets created could be misused by internal staff or hackers.

To combat this, many companies try to “anonymize” the data by making it vaguer. In addition to deleting names, they apply “binning,” which creates discrete bins that each correspond to a range of values and assigns the records to those bins. That might coarsen the time of a retail purchase from a day to a week, and the location from a specific store to a general region.
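As a rough illustration of the binning described above, here is a minimal Python sketch. The record layout, the store-to-region table and the $50 price bins are assumptions made for the example; the article does not say how any particular company or the bank in the study implemented binning.

```python
from datetime import date, timedelta

# Hypothetical store-to-region lookup; names are illustrative only.
STORE_REGION = {
    "store_017": "north",
    "store_042": "north",
    "store_108": "south",
}


def bin_date_to_week(day: date) -> date:
    """Coarsen a day to its week, represented by that week's Monday."""
    return day - timedelta(days=day.weekday())


def bin_amount(amount: float, width: float = 50.0) -> tuple:
    """Coarsen an amount to a [low, high) range of the given width."""
    low = (amount // width) * width
    return (low, low + width)


def coarsen(record: dict) -> dict:
    """Replace exact values with bins; the person is already a code, not a name."""
    return {
        "person": record["person"],
        "week": bin_date_to_week(record["day"]),
        "region": STORE_REGION[record["store"]],
        "amount_range": bin_amount(record["amount"]),
    }


# Usage: an exact purchase becomes a week, a region and a price range.
purchase = {
    "person": "p_00731",
    "day": date(2015, 1, 29),
    "store": "store_042",
    "amount": 72.50,
}
binned = coarsen(purchase)
```

Wider bins hide more detail but also make the data less useful for analysis, which is the trade-off the researchers were probing.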

But according to a report from researchers at the Massachusetts Institute of Technology (MIT), who examined three months of credit card transactions from an unnamed source, “binning” may not be enough to hide the identity of people in the data.

The researchers, who published their results in the latest issue of Science magazine, found that four dates and locations of recent purchases are all that is needed to identify 90 per cent of people making the purchases. If price information is included, then only three transactions are necessary.

The study used anonymized data on 1.1 million people and transactions at 10,000 stores. The bank had stripped away names, credit card numbers, shop addresses, and even the exact times of the transactions, said the magazine’s synopsis. All that was left were the metadata: amounts spent, shop type (a restaurant, gym, or grocery store, for example) and a code representing each person. More than 40 per cent of the people could be identified with just two data points, the synopsis says, while five purchases identified nearly everyone.

How? By correlating the data with outside information. First, researchers pulled random observations about each individual in the data: information equivalent to a single time-stamped photo. These clues were simulated, the report says, but people generate the real-world equivalent of this information day in and day out, through geo-located tweets or mobile phone apps that log location, for example. A computer then used those clues to identify some of the anonymous spenders. The researchers then fed a different piece of outside information into the algorithm and tried again, until every person was de-anonymized.
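The matching step can be sketched in a few lines of Python. This is a toy illustration, not the researchers' code: the record format, the function names and the six-row dataset are all assumptions. Each outside clue is a (day, shop) pair, and a person counts as re-identified when theirs is the only code whose purchase history contains every clue.

```python
import random
from collections import defaultdict


def build_traces(records):
    """Group (person_code, day, shop) rows into a set of points per person."""
    traces = defaultdict(set)
    for person, day, shop in records:
        traces[person].add((day, shop))
    return traces


def candidates(traces, clues):
    """Return every person code whose trace contains all the outside clues."""
    clues = set(clues)
    return [person for person, points in traces.items() if clues <= points]


def unicity(traces, k, rng):
    """Fraction of people pinned down uniquely by k of their own points."""
    eligible = [p for p, points in traces.items() if len(points) >= k]
    if not eligible:
        return 0.0
    unique = 0
    for person in eligible:
        # Simulate k outside observations drawn from this person's trace.
        clues = rng.sample(sorted(traces[person]), k)
        if candidates(traces, clues) == [person]:
            unique += 1
    return unique / len(eligible)


# Toy data: three coded people, two purchases each (hypothetical).
records = [
    ("u1", 1, "gym"), ("u1", 2, "grocery"),
    ("u2", 1, "gym"), ("u2", 3, "restaurant"),
    ("u3", 2, "grocery"), ("u3", 3, "restaurant"),
]
traces = build_traces(records)

one_clue = candidates(traces, [(1, "gym")])
two_clues = candidates(traces, [(1, "gym"), (2, "grocery")])
```

In this toy data one clue leaves two candidates, while two clues leave only one. That is the same effect the study measured at scale: each additional known purchase shrinks the pool of matching people.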

The report is a caution not only to organizations that try to de-personalize data they hold themselves, but also to companies that collect data and resell it to other parties.

“In light of the results, data custodians should carefully limit access to data,” Science quotes Arvind Narayanan, a computer scientist at Princeton University. Narayanan was not involved with the study.



Howard Solomon
Currently a freelance writer. Former editor of ITWorldCanada.com and Computing Canada. An IT journalist since 1997, Howard has written for several of ITWC's sister publications, including ITBusiness.ca. Before arriving at ITWC he served as a staff reporter at the Calgary Herald and the Brampton (Ont.) Daily Times.
