[Photo: National Security Agency, Ft. Meade, MD via Wikimedia]

The NSA’s Purge Obfuscations

One thing the 2011 702 documents Charlie Savage liberated make clear is that the government is (and was) obtaining, and then purging, more domestic communications than it wants to let on (and the numbers have surely gotten worse since 2011).

In a hearing on September 7, 2011, the first question that John Bates asked (starting at PDF 35) about the sampling the NSA had done was how many communications had been purged before the agency started counting its sample, a sample that included both PRISM and upstream collection. As Bates noted, it would be one thing if the NSA were purging half its collection and then counting, and quite another if it only had to purge a small amount.

During this exchange, the government was careful to limit its discussion of purged communications to upstream MCT-related collection.

When the government responded (starting at PDF 117), it provided numbers for just what had gotten purged from upstream collection.

I’m not entirely sure their claim that none of this purged information was “upstream” collection (as opposed to MCT collection) is correct (as a post on the violations will explore). But they make it clear: the 18,446 purged communications were just Internet upstream. For every upstream record purged because the target had roamed into the US, there might be correlated telephony collection that would get purged, and those are some of the most commonly discussed purged communications. The total might also include PRISM production that would have to get purged (if, for example, the target continued to use GMail while in the US). In addition, there might be targets discovered to be Americans (perhaps by reading that PRISM production). So the 18,446 is just a portion of what got purged, but the government pointedly avoided telling Bates how much of the other kinds there was.

Of the upstream Internet collection in 2011, 0.1% was getting purged.

The purge numbers for telephony and PRISM would not be the same as for upstream. The telephony numbers might be far, far higher, given public reporting from the period. The NSA was working off some overcollection that was limited to upstream during this period, which would lead to more upstream communications being purged. But the rules on domestic collection of PRISM communications are different from those for upstream.

In any case, the government’s careful dodge of providing Bates the full purge number suggests the telephony and PRISM purge numbers might be substantial, too. But we don’t get that number.

Marcy Wheeler is an independent journalist writing about national security and civil liberties. She writes as emptywheel at her eponymous blog, publishes at outlets including Vice, Motherboard, the Nation, the Atlantic, Al Jazeera, and appears frequently on television and radio. She is the author of Anatomy of Deceit, a primer on the CIA leak investigation, and liveblogged the Scooter Libby trial.

Marcy has a PhD from the University of Michigan, where she researched the “feuilleton,” a short conversational newspaper form that has proven important in times of heightened censorship. Before and after her time in academics, Marcy provided documentation consulting for corporations in the auto, tech, and energy industries. She lives with her spouse in Grand Rapids, MI.

2 replies
  1. greengiant says:

    Imagine the NSA has data captured, NSA has data retained, NSA has data stored, NSA has data analysed, and NSA has data “collected,” one may surmise, only at that point at which it is identified with a person and was looked at by a human, and that is changing every minute. There are a number of reasons Clapper would not know the answers; Clapper does not want to know the answers so that he can’t tell anyone the answers. The NSA is not going to reveal this because they are saving all this for the really, really big stuff. The US is taking pictures of every piece of mail sorted by blue box. Your ISP, Google, and Facebook are tracking every link, and Facebook is tracking mouse movements. Google is reading your email to figure out what to sell you. Malware is buying ads and jamming down gigabytes an hour trying to force faults and reading back gigabytes a day looking for uncleansed memory/disk store. CA and others have correlated your IPs and phone numbers with all the above. Not even going to talk about phone apps and tracking. The next thing after Ashley Madison will be far-off actors buying stuff from the likes of CA just to blackmail people. Anyway, I would be pissed if the NSA does not capture every bit on the net and put it in Utah or bounce it off the moon or wherever to put it into store. Mercer and Zuckerberg and Thiel are making money off big data already; anyone trust them more? As long as NSA sharing is restricted to the “collected” part, the other agencies’ fishing expeditions are going to be limited. As is, I can guess DOD has access and is over everything outside the US.

    • SpaceLifeForm says:

      Thank you for mentioning ads.

      Ad servers have been the primary vector to load malware for two decades now. Via JavaScript (standardized as ECMAScript).

      It was originally named LiveScript, which should not be confused with the more modern LiveScript language.

      In the ‘old days’, the JavaScript malware attacks were easy because it was insecure HTTP, so the ad servers could inject whatever they wanted. Now it is tougher.

      But over a decade ago, I observed JavaScript making HTTPS requests to unknown domains.

      Until any TLA proves their worth, none can be trusted.
