
You Are What You Click: On Microtargeting

Why privacy and anonymity are being violated online by an unstoppable process of data profiling.

David Auerbach

February 13, 2013

Every few months there’s a headline story about privacy violations committed by a high-profile online company, and the violations usually span the spectrum. Google was recently slapped with two fines: $22.5 million for tracking users of Apple’s Safari browser, and $25,000 for impeding an FCC probe into a bizarre episode of alleged wireless data “sniffing.” For some years now, the privacy policies of Facebook have been under investigation in Germany and the United States.

What’s always missing from these stories is context. Accounts of privacy violations bubble through the news and stir public outrage, which is often followed by a backlash and occasionally a fine. But these stories rarely reveal the porous privacy lines of the digital realm, or whether other types of violations are being committed online, by companies other than the household names. The outrage is selective and the enforcements ad hoc. News stories about hacking, data sniffing and the like have become red herrings. They provide false assurances that, in the normal course of things, our privacy is not being invaded on the Internet, that our personal data is safe, and that we are anonymous in our online (and offline) activities.

But we aren’t. “Privacy” and “anonymity” are being defined down, and single violations of individual privacy like hacking and identity theft, while aggravating, are trivial compared with efforts toward the comprehensive accumulation of data on every single consumer. The marketing industry is attempting to profile and classify us all, so that advertising can be customized and targeted as precisely as possible. Google, Facebook, Apple and thousands of lesser-known companies are making it their policy and business to profile us in detail, all in the hopes of crafting better sales pitches. For these companies, your value is expressed most often when you click on an ad, signaling that you’re interested in the product being sold. But will you buy it? For those paying for the ads, your value depends on other factors: your socioeconomic class, your credit, your purchasing record.

The sort of consumer profiling that has become increasingly necessary for targeted marketing has yet to generate significant increases in revenue. However, even the modest success of “microtargeting” has been enough to encourage the collection of vast quantities of consumer data, which is cheap and easy to do with today’s technology. There is no incentive to stop this activity; no law prohibits it, and the growing electronic data hoard will be very difficult to expunge. Big Brother is watching you, but he’s no longer a dictator; instead, he’s a desperate and persistent door-to-door salesman. Call him Big Salesman.

Big Salesman is engineering a far grosser violation of our privacy than most people suspect—not a single incident, but a slow, unstoppable process of profiling who we are and what we do, to be sold to advertisers and marketing companies. Information that we reveal about ourselves constantly every day in our online and offline actions has become valuable to those who collect and amass it. Because the value does not lie in any one piece of data but in its unification and aggregation, the data in sum is worth far more than its individual parts. Ticketmaster may know which concerts I’ve attended and Amazon may know which albums I’ve bought, but each company would benefit if it had the other’s file on me. It’s a slow death by a thousand clicks: thousands of people see you on the street every day and it does not feel like an invasion of privacy, but if one person follows you everywhere as you work, read, watch movies and do myriad other things, it becomes stalking. And so we are stalked in the pursuit of marketing optimization.

Where the Money Is

The key data lesson of the Internet age is that the amount of data one possesses is just as important as the type. With enough data, it becomes possible to see patterns that one could never guess in isolation. Consider two contrasting examples: toothpaste and the flu.

Google Flu Trends is an example of apparently beneficent data aggregation. By tracking when and where people are searching Google for terms related to the flu, flu symptoms and flu treatments, Google has been able to predict outbreaks of influenza before government agencies do, and has made it easier to track the path of a flu virus. In many First World countries, Google’s predictions have correlated reasonably closely with subsequent government data. Only an entity with access to an enormous plurality of all Internet searches could achieve such accuracy in prediction.

The same goes for consumer data. If a marketer sees me buy toothpaste at a drugstore, that is not tremendously valuable information by itself. But if he knows my entire history of toothpaste purchases, including which brands I buy and how often, he can predict when I might need to buy toothpaste again, and whether I might be inclined to click on an ad directing me to a lower-cost brand, or to a store selling my usual brand at a cheaper price. If my dental insurer knew my toothpaste purchases, it could classify me as higher or lower risk and adjust my premiums and payments accordingly.
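The toothpaste prediction is simple arithmetic once the purchase history exists. As a minimal sketch, assuming a marketer holds nothing more than a list of purchase dates (the function name and the dates are invented for illustration, and a real system would presumably weigh brand, price and much else), the next purchase can be guessed from the average gap between past ones:

```python
from datetime import date, timedelta
from typing import List, Optional


def predict_next_purchase(purchases: List[date]) -> Optional[date]:
    """Guess the next purchase date from the average gap between past purchases."""
    ordered = sorted(purchases)
    if len(ordered) < 2:
        return None  # one purchase reveals no rhythm
    # Days elapsed between each consecutive pair of purchases.
    gaps = [(later - earlier).days for earlier, later in zip(ordered, ordered[1:])]
    average_gap = sum(gaps) // len(gaps)  # whole days, rounded down
    return ordered[-1] + timedelta(days=average_gap)


# Hypothetical history: three toothpaste purchases in early 2012.
history = [date(2012, 1, 1), date(2012, 1, 31), date(2012, 3, 3)]
print(predict_next_purchase(history))
```

A single purchase yields nothing; the prediction appears only once the history does, which is the point about aggregation.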

Google Flu Trends gauges collective tendencies among many people, but market research is oriented toward the individual. Targeting the right set of consumers has always been at the heart of advertising, but when web ads first appeared in the 1990s, their click-through rates quickly plummeted as users wised up; people stopped clicking on even the brightest banner ads. The revolution in Internet advertising did not come until 2000, when Google introduced its AdWords program, which allows anyone to bid for placement on ad spots that appear in response to searches for keywords. Google’s system collected little to no data about a user; the ads were displayed based merely on the search query itself. Searching for “watches” generates watch ads, searching for “asbestos” generates ads for tort lawyers. The advertising model fortuitously avoided many of the privacy concerns that are emerging today, because the very nature of Google’s business ensured that it would find out exactly what consumers wanted at the exact moment they wanted it: during the search.

Yet Google’s system was dependent on its having a search engine—the search engine, in fact. Click-through rates for online ads not generated by search engines are considerably lower, and Google’s success in search ads has not been replicated anywhere else. For comparison, consider that Google’s gross revenue was $37.9 billion in 2011 (96 percent of it from advertising), while Facebook’s was merely $3.7 billion. And search continues to make up nearly half of all Internet advertising revenues, with Google dominating its competitors. Thus the social revolution precipitated by Facebook has not yet amounted to a shift in advertising effectiveness—one possible reason for Facebook’s drastic decline in share price following its IPO in May 2012.

Internet marketing companies that aren’t Google can’t observe people at the moment of search, but knowing more about their lives might help refine targeted advertising. And so it has: while microtargeting hasn’t come close to matching Google’s success, it appears to have increased click-through rates sufficiently that several industries have sprung up around profiling and targeting consumers for advertising.

The Information Pipeline

Consumer data is collected, assimilated, processed, resold and exploited by a leviathan encompassing hundreds, if not thousands, of companies playing dozens of roles: ad brokers, data exchanges, ad exchanges, retargeters, delivery systems, trading desks. Some of these companies are well-known: Google, Facebook, Apple, Twitter, Yahoo. Beyond these, there are far more obscure entities trying to collect similar data, but their web presence is minimal. Companies like Acxiom, BlueKai, Next Jump and Turn build up demographic profiles so that advertising can be targeted as precisely as possible.

Ironically, the activities of the big names are more mysterious than those of smaller companies you haven’t heard of. The operations of Facebook, Google and Amazon encompass the end-to-end collection of data: aggregation, usage and targeting. With their own private data sets, they don’t need to engage in transactions with, say, data exchanges. The greatest use to which they can put their data is internal.

If you know who the smaller companies are, it’s somewhat easier to find out what they do than it is to know what Google or Facebook is doing with your data. That said, Facebook and Google may not be doing single-handedly what these other companies do collectively; for many reasons, both business- and image-related, their agendas are different. But any company with a sufficient amount of consumer data can replicate the working model of the advertising ecosystem, and that should be enough to raise alarms. The model, though complex, breaks down into four stages through which consumer information is obtained and put to use: observation, collection, aggregation and targeting.

Observation

Whenever you browse the web, you leave a permanent trace of your activity. Every machine and device connected to the Internet has an IP address. The IP address does not identify you, but neither is it wholly anonymous. Blocks of IP addresses are associated with particular Internet service providers (ISPs), and most are geographically specific. This information is public: by visiting a website, you enable the website owner to learn where you are located. An ISP may assign you a different IP address over time, but frequently the address remains the same, so repeat visits can also be tracked.

Cookies are little pieces of data that websites ask your web browser to store. They can contain almost any data, but they’re frequently used for remembering user preferences: the language you speak, for example, or your login and password. They also provide an easy way to tell when the same user is returning to a site. Browsers will send cookies back only to the site that sent them, but there are few limitations on what the site may do with that knowledge, including selling it to any number of other companies.

Companies like BlueCava have figured out a way to track online behavior without cookies. BlueCava’s device identification platform attempted to identify individual users based on which browser and device they were using, information that is sent along with every request to a web server. (BlueCava now describes the service it provides as “multi-screen identification capabilities,” presumably because it’s harder to decipher what that actually means.) This is one of the reasons privacy remedies focused on particular technical mechanisms, such as the (mostly ignored) “do not track” header or third-party cookies, cannot suffice on their own.
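How cookie-free identification might work can be sketched in a few lines. This is not BlueCava’s method, which is not public; it is a minimal illustration that assumes only that a browser sends the same handful of standard HTTP headers with every request, and that hashing them together yields a repeatable identifier. The choice of headers and the function name are assumptions made for the example.

```python
import hashlib
from typing import Dict

# Standard HTTP request headers; which ones a real tracker uses is an assumption.
FINGERPRINT_HEADERS = ("User-Agent", "Accept", "Accept-Language", "Accept-Encoding")


def device_fingerprint(headers: Dict[str, str]) -> str:
    """Hash browser-supplied headers into a short, repeatable identifier."""
    material = "|".join(
        "{}={}".format(name, headers.get(name, "")) for name in FINGERPRINT_HEADERS
    )
    return hashlib.sha256(material.encode("utf-8")).hexdigest()[:16]


# Two visits from the same (hypothetical) browser, with no cookie stored.
monday = {
    "User-Agent": "ExampleBrowser/1.0 (ExampleOS 10.8)",
    "Accept": "text/html",
    "Accept-Language": "en-US",
    "Accept-Encoding": "gzip",
}
friday = dict(monday)

print(device_fingerprint(monday) == device_fingerprint(friday))
```

Clearing cookies changes none of these headers, so the identifier survives. The fewer people who share a given combination, the closer the hash comes to naming a single device.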

The situation is different with sites like Facebook and Twitter, which require users to sign up for an account that they are encouraged to remain logged in to. Unless you micromanage your web privacy settings and browser activity, these sites have the ability to track you across the web. Every time you go to a site that has a Facebook “like” button or a Twitter “tweet” button or a Google “+1” button, or a site that lets you comment with your Facebook or Twitter or Google account, those companies know that you’ve visited the site, whether or not you click on the button. And every time you click the “like” button or authorize an application, Facebook eagerly hands over your data to the online gaming company Zynga, and to newspapers and publishing companies. Sharing your information with a third-party application on Facebook is akin to poking a hole in a water balloon: only one prick is needed for everything to leak out.

Collection

The lie of the web is that each page is a discrete entity. This was true a generation ago, when pages were merely formatted text, but now that they host all sorts of code and cookies, it’s more accurate to think of web pages as collages of content, advertisements, federated services and tracking mechanisms that can talk to one another to a lesser or greater degree depending on your browser’s privacy settings. The web is becoming a tightly connected mass of trackers and bugs, a single beast with a million eyes pointing in every direction.

If you’re logged in to Facebook, Twitter, Google or Amazon, it’s safe to say these sites are tracking and retaining everything you’re doing on their sites and on any other sites that host their scripts and widgets. It’s how they make recommendations: for friends, products, events. Advertising targeters like Acxiom, Turn and BlueKai are tracking users in a different though equally invasive way. A newspaper’s website may know all the visitors to its site, but it knows nothing about their activities elsewhere. Its advertisers might, however, and Google Analytics certainly does: it offers a wide array of services to websites, tracking where users are coming from and what they search for before arriving at a page, all behind a slick interface. In exchange, Google gets to see the entire history of a site’s access logs. Google Analytics and similar services like Quantcast and comScore are so ubiquitous that most of your web browsing is likely captured by one or more of these companies. They don’t have your name, but they have your IP address, rough physical location and a good chunk of your activity online. This “raw data” is considered quite sensitive, and many companies have policies against retaining it. Google, for example, boasts of anonymizing IP addresses after nine months and cookies after eighteen. But many more sites lack any such policy, and few websites will promise that your visit has not been permanently archived by some unaffiliated company.

Aggregation and Microtargeting

With so many entities collecting data and amassing consumer profiles, the profiles are often incomplete or even inaccurate. Acxiom may slot you, without your name, into a particular demographic: upscale young single male, suburban mom in a rich county, or one of many other categories. But for targeting coupons, a company needs far more than just a demographic. It needs to know which products you buy, the brands you like, when you buy and how.

This is the world of microtargeting. It has been used and refined over the last decade by political parties to determine where their voters are, so they don’t mistakenly encourage the wrong people to get out and vote on Election Day. In 2012, the Obama campaign took a huge leap forward in microtargeting; its technology was unmatched by that of the inept Romney campaign, giving Obama’s team a crucial edge in its ground game.

But identifying political affiliation is low-tech compared with advertising targeting, which needs to predict far more than one kind of behavior. Last June, The New York Times published a long article by Natasha Singer on Acxiom, which claims to have profiled 500 million consumers and offers its data in aggregated form to anyone who will pay for it, from websites to banks to insurance companies, and even to a US Army contractor. In the name of “multichannel marketing”—which is code for tracking a consumer in all her activities, from web browsing to television advertising to mail-order catalogs—Acxiom has been aggregating consumer data since its founding in 1969, and the explosion of data in the Internet age has been a big boon to it.

Acxiom is hardly an isolated actor, though. BlueKai claims to have profiled more than 160 million consumers, and in 2011 international advertising monolith WPP/GroupM started its own data aggregation division, Xaxis, which already claims to have amassed 500 million “unique consumer profiles.” A Xaxis executive told The Wall Street Journal that this data would be obtained partly by purchasing it from companies like BlueKai. After some negative press, Xaxis has ceased bragging about that number and now speaks only in vague terms of customer reach. In November 2011, when Yahoo bought Interclick, now called Genome (which says it “aggregates and organizes billions of data points from third-party providers—delivering actionable consumer insights, scalable audiences and the most effective campaign execution”), the company was not just buying technology. It was also buying data.

Acxiom offers a lengthy and confusing form to opt out of its database, as well as the ability to see some of the data it has collected on you, though not for free: “Access to information about you in our directory and our fraud detection and prevention products will be provided in the form of a Reference Report that is available for a processing fee of $5.”

To compensate for the limitations of their data sets, smaller sites are increasingly turning over their advertising operations to exchanges. An advertising exchange determines the user’s identity and targeted demographic and then offers that knowledge to advertisers, who bid in real time for the opportunity to show the user their ad. The consumer may be identified generally as a member of a particular microdemographic, but because these “segments” are very small, they can be linked to personally identifiable information with ease. The exchange knows exactly who you are.

In these exchanges, decisions must be made faster than humans can make them, in fractions of a second, and so complex algorithms are used to establish who “won” the bidding war to show a particular customer an ad. The real-time nature of the advertising ecosystem has reached levels of complexity reminiscent of automated derivatives trading and hedge funds, except that what is being traded are bets not on the future value of assets, but on the future value of consumer behavior: the likelihood that someone will click on an ad, use the coupon, buy the product.
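The core of such an auction is small, whatever complexity surrounds it. The sketch below assumes a sealed-bid, second-price rule, in which the highest bidder wins the impression but pays the runner-up’s bid. That rule is one common auction design and is an assumption here, not a description of how any particular exchange settles its bids; the advertiser names and amounts are invented.

```python
from typing import Dict, Tuple


def run_auction(bids: Dict[str, float]) -> Tuple[str, float]:
    """Pick the winner of one ad impression under a second-price rule."""
    if not bids:
        raise ValueError("an auction needs at least one bid")
    # Rank advertisers from highest bid to lowest.
    ranked = sorted(bids.items(), key=lambda item: item[1], reverse=True)
    winner, top_bid = ranked[0]
    # The winner pays the second-highest bid, or its own bid if it stood alone.
    price = ranked[1][1] if len(ranked) > 1 else top_bid
    return winner, price


# Hypothetical bids, in dollars, to show one ad to one profiled user.
bids = {"watch_retailer": 2.50, "tort_lawyer": 1.20, "toothpaste_brand": 0.80}
print(run_auction(bids))
```

What each advertiser is willing to bid depends on what it believes about the person on the other side of the screen, which is where the profiles come in.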

Google launched its exchange, AdX, in 2009. Facebook recently launched the Facebook Ad Exchange, partnering with a number of companies with vague, mysterious names: Turn, TellApart, DataXu, MediaMath, AppNexus. (Turn, which claims to have 700 million consumer profiles through partnerships with Acxiom, eXelate, Facebook and others, sells placements based on the number of users within a particular “segment.”) According to Bloomberg News, “Facebook Exchange will let advertisers reach specific types of users on the social network based on their browsing history.”

Google and Facebook can be more secretive about their exchanges because everyone has tacit knowledge of what they have: millions of profiles of distinct users, reliably tracked by login. It remains to be seen how the third-party exchanges will compare. The ensuing partnerships will likely merge data sets, thereby enlarging the enormous consumer database that much more. And because the value of data increases the more it is aggregated, there is every incentive for third-party exchanges to enlarge their data sets through partnerships and acquisitions, either to compete with Google and Facebook, or to join them.

Companies like Turn sell themselves on their ability to reach microdemographics, and they are considerably less forthcoming about how effective such real-time targeting is, apart from selling advertisers on the idea that everyone is doing it and they should be, too. It’s possible that consumers aren’t the only ones being manipulated: microtargeting seems to increase the effectiveness of advertising, but performing controlled experiments to gauge its impact is difficult. Sketchy evidence suggests that microtargeting raises click-through rates modestly, but falls short of doubling them. This is nothing compared to the advantage Google gained over its competitors when it launched AdWords. Nonetheless, the potential of microtargeting is indisputable, and for anyone who doubts the use of this data, there is the lesson of Google as a rejoinder: we just need more information, correlated with other data. The right combination of data, as Google found out, can be a gold mine.

Last summer, DataXu co-founder Michael Baker wrote in Forbes, “It is no longer difficult to imagine a time in the not so distant future when all media—TV, radio, outdoor—is digital and addressable and capable of being purchased in an auction on an individual impression-by-impression basis.” Baker is naturally speaking to advertisers, not consumers, who might find his vision rather alarming.

Microtargeting is not the only use being made of this data. Far more information has been collected than has been put to use, and the purposes to which it has been put have not always been visible to consumers. There is the danger of what law professor Frank Pasquale calls “runaway data,” where personal data collected under one privacy policy is sold, resold or otherwise distributed so that deleting it becomes practically impossible. If this data is your personal profile, more than mere knowledge of your consumer habits can be gleaned. Two great dangers exist with runaway data: first, that it will be put to more nefarious uses than coupon targeting; and second, that consumer data will be deanonymized, establishing a link between anonymous data and a person’s name. Both are enough to render any thought of personal privacy archaic.

Turn and other companies offer the ability to opt out of targeted advertising, and the “self-policing” group Network Advertising Initiative offers an opt-out promising that “the company or companies from which you opted out will no longer deliver ads tailored to your web preferences and usage patterns.” In other words, they can still collect and sell data about you; they just won’t use that data in one particular way.

The value of this consumer data is frequently thought to be in advertising, pure and simple, and viewed apart from the data-collecting methods undergirding it, advertising is comparatively innocuous: fine-grained targeting of online ads is creepy and invasive, but not dangerous.

Not so with other uses of that data. Consider credit ratings. Your FICO credit score is calculated by a secret formula. FICO has provided rough guidelines, saying it uses categories such as recent searches for your credit rating by third parties, but the company says nothing about what data is being used or where it comes from. There is nothing to prevent your Internet browsing activity from figuring into this calculation if credit bureaus are able to obtain it.

Things are worse with the less reputable “fourth bureau” credit agencies, which have been investigated by Ylan Mui in The Washington Post. These companies, such as L2C and LexisNexis, track people without a sufficient credit history to have files in the big three credit bureaus. Fourth bureaus do not follow the guidelines of the big three and have no disclosure regulations on what data they can use or any obligation to give it to you. Their information, as Mui discovered, is sometimes inaccurate, but you have no way of knowing that until their data results in your rejection for a job, an apartment or a loan.

Likewise, life and health insurance companies could have a field day with these files. A 2010 Wall Street Journal article described how Aviva’s, AIG’s and Prudential’s life insurance arms were “exploring whether data can reveal nearly as much about a person as a lab analysis of their bodily fluids.” Your profile can reveal all sorts of healthy or unhealthy tendencies: gym memberships, exercise equipment, bar tabs, poor diet, time spent on dating or hookup sites, an interest in recreational drugs. If you “like” the American Cancer Society or even Jack Daniel’s on Facebook, for example, it won’t just be advertisers who might take notice. And if there are any errors in your profile, you certainly won’t have the chance to correct them, much less even know about them—you won’t be told why your premiums are so high or your credit application was turned down.

Finally, there is the government. The National Security Agency, Department of Homeland Security and FBI already collect a tremendous amount of data on online activity, as Dana Priest and William Arkin revealed in Top Secret America: The Rise of the New American Security State. Some of the data, they found, was useful, but most of it was a vast stockpile of information on ordinary people, collected with a huge dragnet and permanently filed away because there was no pressing reason to delete it. Consumer profiles are ready-made grist for the national security mill. Companies like Google, Facebook and Twitter have sometimes resisted subpoenas for user information, but the chances that smaller, lower-profile marketing companies would stick their necks out by fighting a subpoena are slim. And you would not even know that your profile is being turned over to the government—you likely didn’t even know it was being created.

Once your profile is created and passed around among companies, the circle of entities that know your behavioral patterns and day-by-day online and offline activity can only grow. This data is powerful, and though you had little to no say in its collection or sale, it is raw capital that can be put to use for the sake of profit in both benign and dangerous ways.

False Promises of Privacy

Promises of anonymity are misleading and far from absolute. In a famous 2000 study, Latanya Sweeney determined that a voter list could be correlated with medical records at a rate of 87 percent based not on any personal information but on three pieces of demographic data: sex, ZIP code and birth date. This allowed the “anonymized” medical data to be linked to a particular name.
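The linkage itself requires no sophistication. As a rough sketch of the idea, and not of Sweeney’s actual procedure, the code below joins a named voter list to nameless medical records on sex, ZIP code and birth date, and keeps only the matches where that combination points to exactly one voter. Every record, field name and diagnosis is invented for illustration.

```python
from typing import Dict, List, Tuple


def link_records(voters: List[Dict[str, str]],
                 medical: List[Dict[str, str]]) -> List[Tuple[str, str]]:
    """Attach names to 'anonymous' medical records via sex, ZIP and birth date."""
    # Index voter names by the three demographic fields.
    by_demographics: Dict[Tuple[str, str, str], List[str]] = {}
    for voter in voters:
        key = (voter["sex"], voter["zip"], voter["dob"])
        by_demographics.setdefault(key, []).append(voter["name"])

    linked = []
    for record in medical:
        names = by_demographics.get((record["sex"], record["zip"], record["dob"]), [])
        if len(names) == 1:  # only an unambiguous match identifies someone
            linked.append((names[0], record["diagnosis"]))
    return linked


# Invented data: a public voter list and a "de-identified" medical file.
voters = [
    {"name": "A. Example", "sex": "F", "zip": "00001", "dob": "1970-01-02"},
    {"name": "B. Sample", "sex": "M", "zip": "00001", "dob": "1980-03-04"},
    {"name": "C. Sample", "sex": "M", "zip": "00001", "dob": "1980-03-04"},
]
medical = [
    {"sex": "F", "zip": "00001", "dob": "1970-01-02", "diagnosis": "asthma"},
    {"sex": "M", "zip": "00001", "dob": "1980-03-04", "diagnosis": "flu"},
]
print(link_records(voters, medical))
```

In this toy data the two men who share all three fields stay unidentified. Sweeney’s finding was that such collisions are the exception.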

But it is not just those three pieces of data: enough anonymous data of any form allows for a positive identification. In 2006, Netflix offered up a huge, seemingly anonymous data set of the complete video ratings of nearly half a million members. In their 2008 paper “Robust De-anonymization of Large Sparse Datasets,” computer scientists Arvind Narayanan and Vitaly Shmatikov showed that very little knowledge was required to correlate one of the anonymous lists with an Internet Movie Database account: an overlap of even a half-dozen films between Netflix’s list and an IMDb account could suffice to make a highly likely positive match. Because many IMDb accounts use people’s real names and other identifying information, they provide a foothold for obtaining a person’s entire viewing history.
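A much cruder version of that matching can be written in a dozen lines. The paper’s algorithm scores ratings and dates and tolerates errors; the sketch below is a deliberate simplification that only counts shared titles and accepts the best candidate if the overlap reaches a threshold, set here to six to echo the half-dozen films mentioned above. The account IDs and titles are invented.

```python
from typing import Dict, Iterable, Optional


def best_match(public_titles: Iterable[str],
               anonymous_histories: Dict[str, Iterable[str]],
               min_overlap: int = 6) -> Optional[str]:
    """Return the anonymous ID whose history overlaps most with a public profile."""
    public = set(public_titles)
    best_id, best_overlap = None, 0
    for anon_id, titles in anonymous_histories.items():
        overlap = len(public & set(titles))  # titles present in both lists
        if overlap > best_overlap:
            best_id, best_overlap = anon_id, overlap
    # Below the threshold, the resemblance is too weak to call a match.
    return best_id if best_overlap >= min_overlap else None


# Invented data: one public profile and two "anonymous" rental histories.
public_profile = ["Film A", "Film B", "Film C", "Film D", "Film E", "Film F"]
histories = {
    "user_1041": ["Film A", "Film B", "Film C", "Film D", "Film E", "Film F", "Film X"],
    "user_2210": ["Film A", "Film Q", "Film R"],
}
print(best_match(public_profile, histories))
```

Once the match is made, everything else in the matched history, here the hypothetical “Film X,” is attached to a name.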

A Netflix user history may seem like a fairly harmless example of “reidentification.” Other data sets that are released “anonymously,” including consumer purchases, website visits, health information and basic demographic information, appear more menacing. When AOL Research released a large data set of Internet searches for 650,000 users of AOL’s search engine in 2006, The New York Times and others were immediately able to identify some of the users by finding personal information in the search queries. AOL admitted its error, but the data remains out there for anyone to view. Notoriously, there was User 927, whose searches included “beauty and the beast disney porn,” “intersexed genitals,” and “oh i like that baby. i put on my robe and wizards hat.”

In his 2010 paper, “Broken Promises of Privacy: Responding to the Surprising Failure of Anonymization,” law professor Paul Ohm wrote:

Reidentification combines datasets that were meant to be kept apart, and in doing so, gains power through accretion: Every successful reidentification, even one that reveals seemingly nonsensitive data like movie ratings, abets future reidentification. Accretive reidentification makes all of our secrets fundamentally easier to discover and reveal. Our enemies will find it easier to connect us to facts that they can use to blackmail, harass, defame, frame, or discriminate against us. Powerful reidentification will draw every one of us closer to what I call our personal “databases of ruin.”

Ohm’s predictions are dire, yet his underlying point is irrefutable: the value of anonymized data is far higher, and its privacy far lower, than we intuitively think, because the data gains its value, and loses its anonymity, only at large scale. The bits and pieces of this data we contribute, free and without compensation, become parts of large, profit-making machines.

Reidentification is a key aspect of giving value to data. Data frequently has value even in the absence of total reidentification: marketers do not need to know your name or your Social Security number to show you ads. But if the value of the data goes up for credit bureaus and others that can determine your true identity, there is a strong incentive for them to do exactly that.

Hence, many privacy policies today give a false sense of security. Parsing the language of these policies is not easy, and because the information’s value changes depending on how much other data it is collated with, such guarantees are at best naïve and at worst disingenuous.

As for subscription services, such as those offered by Apple, Google and Facebook, anonymity doesn’t really exist. These companies already know who you are. Ex–Google CEO Eric Schmidt described the Google+ social network as fundamentally an “identity service” providing a verifiable “strong identity” for its users—one that requires you to use your real name. Needless to say, actions performed while logged in to this identity are far less anonymous than those performed under a pseudonym or when not signed in.

As long as you are signed in with an account from these respective services, there is nothing to prevent all actions taken on their sites from being associated with your account. By default, Google records your entire search history if you are logged in with a Google account. The organization Europe Versus Facebook, founded by law student Max Schrems, has publicized the extent of Facebook’s data collection. With the help of EU laws, he obtained Facebook’s internal record of him, a thousand-page dossier containing more or less everything he had ever done on Facebook: invites, pokes, chats, logins, friendings, unfriendings and so on. The accrual of all possible data—unabashedly personal data—is the industry standard. The restrictions, where they exist, are only on how that data is used.

Fighting the Future

Given the choice, most consumers would prefer that their information not be collected and aggregated. And so advertisers and data aggregators have treated them like the proverbial boiling frog: enticing them into an indispensable social or technological network, then slowly eliminating their choices. Regulations and advocacy have been consistently losing ground against the advertising behemoth.

By default, browser and mobile software provide users with little protection against the collection of their data. Simple but powerful browser extensions such as Disconnect and Ghostery prevent a great deal of tracking via cookies on PCs, but they are used only by a small fraction of consumers. And even such extensions can’t prevent the many other forms of tracking, and mobile platforms do not permit their use. Privacy advocate Brian Kennish, the creator of Disconnect, stresses the lack of transparency in data collection and use: “We’re trading information we don’t even understand for Internet products. If we don’t even know what’s happening, it’s hard to assess the risk.”

Cases like the one the government brought against Google are irrelevant to the central privacy issues of the day. There is no legal or regulatory infrastructure set up to monitor the collection, aggregation and trading of consumer information. Certain forms of information, such as medical records, are cordoned off by privacy legislation such as HIPAA, but even these laws are no guarantee of anonymity, as it is easy to determine much about a person’s health and medical history by looking at his everyday purchases and activities. In great enough quantities, collection and aggregation of nonconfidential information can violate privacy just as much as the disclosure of confidential information does.

Most resistance to this kind of aggregation has been purely reactive and not particularly effective. When the resistance has had any effect, it has played on momentary consumer outrage. Consider the case of Facebook Beacon, launched in 2007: the concept was that companies partnering with Facebook, which included eBay, Yelp, The New York Times and Blockbuster, would allow it to put an invisible “web bug” on their sites that would enable Facebook to see everything its users did on the partner sites and associate that activity with their Facebook accounts, whether or not they were logged in. If I purchased shoes from Zappos, for example, Facebook would post that information to my wall automatically, saying, “David just bought shoes from Zappos!” Facebook users were “opted” in to Beacon without being asked and had to manually turn it off.

There was a public outcry: Facebook users did not want their online activity automatically advertised to their friends. MoveOn started a petition, and a class-action suit was filed against Facebook and several partners. Facebook quickly made Beacon optional for users, requiring an explicit opt-in, and subsequently allowed people to turn it off completely. Two years later, in 2009, it shut down Beacon altogether because, when given a choice, very few people wanted to opt in to such a program.

But Facebook didn’t abandon the goals of Beacon. Rather, it learned from its mistake, grasping that what frightened people most about Beacon was seeing their online behavior publicized without their consent. Through the use of “like” buttons, comment registration and third-party cookies, Facebook still monitors a large percentage of the online activity that Beacon was supposed to capture. It just doesn’t publicize its actions.

This kind of two-step, where data is collected but the consumer is not notified, has become the norm in Internet commerce. The two-step works in other ways. Facebook has drastically weakened its privacy policies several times, most notably in 2009, 2010 and 2012, each time attempting to make more user information less private by default. (A brief timeline is available from the Electronic Frontier Foundation, which has worked diligently to raise consumer awareness.) Whenever there was a strong public protest, Facebook retreated, but not to its original position, thereby cooling critics’ ire while still managing to raise the flame under the frog.

Facebook’s case is an unusually visible one. Most companies have not had their data collection practices scrutinized so closely, if at all. Natasha Singer’s Times article about Acxiom raised eyebrows in Congress and at the FTC, but no action has been forthcoming: “self-policing” seems to be the order of the day, which is to say there’s no order at all. Because consumers remain mostly in the dark about the activities of companies like Acxiom, there is far less pressure on them than there has been on Facebook—and even there, the pressure hardly seems to have made a difference. The Obama administration’s Consumer Privacy Bill of Rights, issued in February 2012, sets out vague guidelines for control and transparency that are wholly out of touch with reality: corporations have so far yielded nothing to it, and the government has not pressed the point.

Legislatively, there are very few existing guidelines, partly owing to the difficulty in quantifying exactly what should be illegal: companies have been collecting this sort of data for years, so how would one justify criminalizing the collection of more of it? In Steinberg v. CVS, decided last year, CVS successfully fought off a Pennsylvania lawsuit over giving “anonymized” data to pharmacy companies and data brokers, because no legal protections were in place beyond the requirement of scrubbing people’s names from the data. The concept of reidentification has not yet entered the legal domain—nor has the inevitability that the data will be combined with other data.

There are many legal issues to resolve, and the only impetus for change appears to be consumer education and outrage. But given the complexity and obscurity of data aggregation today, outrage occurs only when a company makes a public relations gaffe that’s big, simple and visible enough for the media to latch on to. Even then, few people end up leaving Facebook. All of your friends are there, being watched and anonymized as they “friend” and watch you, all of them doing, in the words of Joseph Turow, “free labor in the interest of corporate profits.”

Caleb Crain writes about his experience of being cyber-stalked in this same issue of The Nation.

David Auerbach, a software engineer, has written for the Times Literary Supplement, Bookforum, n+1 and Triple Canopy. He blogs at Waggish.

