There are two major problems with the phone dragnet, as it currently exists.
First, the government has a database of all the phone-based relationships in the United States, one they currently (as far as we know) do not abuse, but one that is ripe for unbelievable abuse.
But there is current abuse going on. The dragnet takes completely innocent people who are three (now two) degrees of separation from someone subjected to a digital stop-and-frisk, a very low standard, and puts them (by dint of at least one communication with someone who communicated with someone who might be suspicious) into the NSA’s analytical maw. Permanently. Those people can have their multiple IDs connected, including any online searches NSA happened to injest, they can be subjected to data mining, by dint of those conversations, they apparently can even have the content of their communications accessed without a warrant, they might even be targeted to become informants using the data available to NSA.
This may well be the digital equivalent of J Edgar Hoover’s subversives list, a collection of people who will always be subject to heightened scrutiny, including unbelievably invasive digital analysis, because of a three degree association years in the past.
According to PCLOB’s estimate, as many as 120 million people may have been — may still be! — subjected for this treatment.
Discussions of whether the House Judiciary and Intelligence Committee bills “reforming” the dragnet really fix it have almost entirely ignored this second abuse, the innocent people who will be subjected to the “full range of NSA’s analytical tradecraft” merely because of a potentially completely innocent association.
There are things that should be done — whether in the current dragnet or the “reformed” one — to mitigate this abuse. Those data ought to age off, which they currently don’t (and won’t, under the new program, as currently described). That analysis ought to be subject to audits, which they’re not currently. The FISC ought to get some sense of what happens in this corporate store, which it’s not clear it currently has. Criminal defendants ought to have some visibility into whether their prosecutions stemmed from such analysis.
But there are also things — as Congress crafts a dragnet replacement — that can affect the sheer number of new people who will be thrown into the corporate store, into NSA’s analytical pool. And those things have a lot to do with how this new scheme deals with what is called “data integrity.”
As I have written repeatedly, the number of results NSA (or the telecoms, under the new system) will get under a particular query depends on how many noisy numbers — things like telemarketers, voice mail numbers, and pizza joints — remain in the collection. As Jonathan Mayer showed, even in his 300 person dataset that included just 2 people who had ever called each other, 17% were connected at the second hop through T-Mobile’s voice mail number.
In spite of the fact that just 2 of its participants had called each other, the fact that so many people had called T-Mobile’s voicemail number connected 17% of participants at two hops.
Already 17.5% of participants are linked. That makes intuitive sense—many Americans use T-Mobile for mobile phone service, and many call into voicemail. Now think through the magnitude of the privacy impact: T-Mobile has over 45 million subscribers in the United States. That’s potentially tens of millions of Americans connected by just two phone hops, solely because of how their carrier happens to configure voicemail.
And from this, the piece concludes that NSA could get access to a huge number of numbers with just one seed.
But our measurements are highly suggestive that many previous estimates of the NSA’s three-hop authority were conservative. Under current FISA Court orders, the NSA may be able to analyze the phone records of a sizable proportion of the United States population with just one seed number.
We know NSA currently does significant work to pull those noisy numbers via a “data integrity” process both before new data is used for contact chaining and as new numbers are identified as “high volume numbers.” While we don’t get to assess the efficacy of that process, it can make the difference between hundreds of millions of Americans getting thrown into the NSA’s analytical pool, or just tens of thousands. But as the contact-chaining process gets outsourced to the telecoms, the question becomes more pressing.
As I see it, there are three possible ways this function might be done going forward:
- The telecoms do an initial sort of high volume numbers, taking out voice mail box and telemarketer calls, then pass the data onto NSA, which does a secondary sort to pull out things like pizza joints (which NSA might want to keep in the data set, but suppress in contact chaining until they have evidence a pizza joint might be a key hub in a terrorist attack). This plays to existing telecom strengths (most likely do similar analysis on their own use of the data now), but doesn’t require they make what are analytical intelligence decisions. Even though this is likely the best solution, it still means many completely innocent Americans may be subject to NSA’s analysis because they ordered pizza.
- The telecom does all the data integrity analysis, identifying all the high volume numbers. This would result in the fewest number (but still intolerably too many) of innocent Americans being dumped into NSA’s pot. But it would also turn the telecoms into an arm of US intelligence (well, even more than they already are!), because they’d be in the position of making analytical judgments about what data is useful for NSA’s intelligence purposes. Which may be one of the reasons the telecoms seem to be demanding immunity, again.
- NSA does the data integrity analysis at the telecoms, as seems to be envisioned by the HPSCI bill. This might achieve the current status quo, borrowing on 8 years of experience to strike the right balance. But it would also present the intolerable condition of NSA employees or contractors accessing and analyzing the raw data of private communications providers at the providers’ locales.