[Photo: National Security Agency, Ft. Meade, MD via Wikimedia]

The NSA’s 5-Page Entirely Redacted Definition of Metadata

In my post on Rosemary Collyer’s shitty upstream 702 opinion, I noted that the only known (but entirely redacted) discussions of what constituted metadata were part of the 2004 and 2010 authorizations for the Internet dragnet.

The documents liberated by Charlie Savage (starting at PDF 184) reveal the topic was actually discussed during the resolution of the 2011 upstream fight. In response to a Bates question to “fully describe what constitutes ‘metadata’” that can be extracted from Internet transactions, the government defined the term in a footnote that is substantially redacted.

That discussion is followed by five entirely redacted pages describing the three (also entirely redacted) categories of metadata.

So I apologize to the government for suggesting they’ve never defined the difference between content and metadata in the context of upstream content collection (the discussion probably closely follows the Internet dragnet discussion, which Bates had had with the government roughly 18 months earlier; that discussion allowed some dialing, routing, addressing, or signaling information that counted as content but didn’t convey the message of the communication to be treated as metadata).

That said, what the fuck are you thinking?!?!?

I mean, first of all, Congress is about to reauthorize 702, possibly trying to codify the prohibition on about searches. But most of Congress won’t go to the trouble of reading this five-page definition, much less consult with technical experts to understand if the definition is meaningful and how any draft bill would interact with this language. So it’s unclear how closely tested this has been.

As noted, even by the 2010 discussion, it was clear Bates was creating a middle ground for stuff that was technically content but which served a DRAS function, probably something akin to Steve Bellovin et al.’s definition of architectural content. Given the way NSA asked to and did nuke the existing PRTT data at precisely this time (though without letting the Inspector General review their destruction of intake data), it’s highly likely they were violating those limits, at least through the processing stage. But legally, using this definition of metadata would all of a sudden be kosher, because the metadata would have been collected under a content standard, so the distinction of it being metadata would matter primarily for the privacy considerations (not least because Americans’ metadata collected off this upstream collection could and can be disseminated with a much lower standard than the one in place in the Internet dragnet, and can be disseminated for non-terrorism purposes), not legal ones. In other words, once NSA collected its domestic metadata using a content collection statute, the legal distinction between metadata and content would no longer matter, after 7 years of mattering.

Except now it does.

If the NSA’s five-page definition of metadata includes stuff that is legally content, then the promise to avoid “about” collection is probably bogus, because it’d incorporate these definitions of metadata and thereby permit using metadata that actually counts as content as a selector.

Which is probably also why the government is so keen to avoid a prohibition on about searches — because what they’re doing, even today, amounts legally to about collection.

I’ll have to give some thought to the privacy implications of this (I suspect this explains the utility of upstream collection for cybersecurity purposes).

But if I’m right, there’s no way this should be classified, at least not entirely classified, not if the government has claimed to have gotten out of the business of searching for selectors in content.

Marcy Wheeler is an independent journalist writing about national security and civil liberties. She writes as emptywheel at her eponymous blog, publishes at outlets including Vice, Motherboard, the Nation, the Atlantic, and Al Jazeera, and appears frequently on television and radio. She is the author of Anatomy of Deceit, a primer on the CIA leak investigation, and liveblogged the Scooter Libby trial.

Marcy has a PhD from the University of Michigan, where she researched the “feuilleton,” a short conversational newspaper form that has proven important in times of heightened censorship. Before and after her time in academics, Marcy provided documentation consulting for corporations in the auto, tech, and energy industries. She lives with her spouse in Grand Rapids, MI.

3 replies
  1. SpsaceLifeForm says:

    Consider: NSA collects the metadata, but dumps it onto another network, say SIPRNET.

    Now FVEY can see it.

    Did the metadata become content at that point?

    I believe that is what the government would argue. And how would FISC know they have been fooled?

  2. d says:

    I’d be curious about this. Metadata had a really obvious definition back when collection methods were designed around radio comms. I would have just assumed that the old meaning still held: data=body of communication, metadata=all the information associated with the delivery of the communication. But maybe things aren’t that simple any more.

  3. GKJames says:

    Is it a reasonable inference that the government’s redaction is intended to hide the fact that it is collecting (and using) information it shouldn’t be collecting? If so, why do FISC judges continue to play along in a game that appears to be an abdication by the judiciary of its separation-of-powers obligation? And with Congress either ok with this or simply AWOL on the issue, how does any of this translate into meaningful oversight of the national security apparatus?
