NSA, Destroying the Evidence

In my obsession with the poor oversight of the phone dragnet techs, I have pointed to this description several times.

As of 16 February 2012, NSA determined that approximately 3,032 files containing call detail records potentially collected pursuant to prior BR Orders were retained on a server and been collected more than five years ago in violation of the 5-year retention period established for BR collection. Specifically, these files were retained on a server used by technical personnel working with the Business Records metadata to maintain documentation of provider feed data formats and performed background analysis to document why certain contact chaining rules were created. In addition to the BR work, this server also contains information related to the STELLARWIND program and files which do not appear to be related to either of these programs. NSA bases its determination that these files may be in violation of BR 11-191 because of the type of information contained in the files (i.e., call detail records), the access to the server by technical personnel who worked with the BR metadata, and the listed “creation date” for the files. It is possible that these files contain STELLARWIND data, despite the creation date. The STELLARWIND data could have been copied to this server, and that process could have changed the creation date to a timeframe that appears to indicate that they may contain BR metadata.

The NSA just finds raw data mingling with data from the President’s illegal program. And that’s all the explanation we get for why!

Well, PCLOB provides more explanation for why we don’t know what happened with that data.

In one incident, NSA technical personnel discovered a technical server with nearly 3,000 files containing call detail records that were more than five years old, but that had not been destroyed in accordance with the applicable retention rules. These files were among those used in connection with a migration of call detail records to a new system. Because a single file may contain more than one call detail record, and because the files were promptly destroyed by agency technical personnel, the NSA could not provide an estimate regarding the volume of calling records that were retained beyond the five-year limit. The technical server in question was not available to intelligence analysts.

This is actually PCLOB being more solicitous than in other parts of the report. After all, it’s not just that there was a 5-year retention limit on this data; there was also a mandate that techs destroy data once they’re done fiddling with it. So this is a double violation.

And yet NSA’s response to finding raw data sitting around where it shouldn’t be is to destroy it, making it all the more difficult to understand what went on with it?

The Privacy Problems (?) of Outsourcing the Dragnet

Both Ed Felten

I am reminded of the scene in Austin Powers where Dr. Evil, in exchange for not destroying the world, demands the staggering sum of “… one MILLION dollars.” In the year 2014, billions of records is not a particularly large database, and searching through billions of records is not an onerous requirement. The metadata for a billion calls would fit on one of those souvenir thumb drives they give away at conferences; or if you want more secure, backed up storage, Amazon will rent you what you need for $3 a month. Searching through a billion records looking for a particular phone number seems to take a few minutes on my everyday laptop, but that is only because I didn’t bother to build a simple index, which would have made the search much faster. This is not rocket science.

And Tim Edgar have started thinking about how to solve the dragnet problem.
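Felten's remark about the missing index can be made concrete with a minimal sketch. Everything in it is an assumption for illustration: the record layout (caller, callee, timestamp), the sample numbers, and the function names are hypothetical, and this is not Felten's code. It only shows why a full scan gets slower as the database grows while a lookup keyed on phone number does not.

```python
from collections import defaultdict

# Hypothetical call detail records: (caller, callee, unix_timestamp).
records = [
    ("2025550101", "2025550199", 1388534400),
    ("2025550102", "2025550101", 1388538000),
    ("2025550103", "2025550104", 1388541600),
    ("2025550101", "2025550104", 1388545200),
]

def linear_search(records, number):
    """Scan every record; the cost grows with the size of the database."""
    return [r for r in records if number in (r[0], r[1])]

def build_index(records):
    """Map each phone number to the positions of the records that mention it."""
    index = defaultdict(list)
    for pos, (caller, callee, _ts) in enumerate(records):
        index[caller].append(pos)
        if callee != caller:
            index[callee].append(pos)
    return index

def indexed_search(records, index, number):
    """Look the number up in the index instead of scanning every record."""
    return [records[pos] for pos in index.get(number, [])]

index = build_index(records)
hits = indexed_search(records, index, "2025550101")
```

Building the index costs one pass over the data; after that, each query touches only the records that actually involve the number being searched.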

One helpful technique, private information retrieval, allows a client to query a server without the server learning what the query is.  This would allow the NSA to query large databases without revealing their subjects of interest to the database holder, and without collecting the entire database.  Recent advances should allow such private searches across multiple, very large databases, a key requirement for the program.  The use of these cryptographic techniques would make the need for a separate consortium that holds the data unnecessary.  I discussed this in more detail in my testimony before the Senate Select Committee on Intelligence last fall.  Seny Kamara of Microsoft Research points out these techniques were first outlined over fifteen years ago, while the state of the art is outlined in “Useable, Secure, Private Search” from IEEE Security and Privacy.
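To give a feel for what Edgar means by querying "without the server learning what the query is," here is a toy sketch of a classic two-server private information retrieval construction. It is an illustration under loud assumptions, not the scheme Edgar or the works he cites propose: it assumes two servers holding identical copies of the database that do not compare notes, records of equal length, and a client that already knows the position of the record it wants. All names are hypothetical.

```python
import secrets

def xor_bytes(a, b):
    """XOR two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))

def server_answer(database, selection):
    """Server side: XOR together every record whose selection bit is set."""
    answer = bytes(len(database[0]))  # all-zero bytes, one record long
    for record, bit in zip(database, selection):
        if bit:
            answer = xor_bytes(answer, record)
    return answer

def make_queries(n, wanted):
    """Client side: two selection vectors that differ only at `wanted`.

    Each vector on its own is uniformly random, so neither server
    learns which record the client is after.
    """
    q1 = [secrets.randbelow(2) for _ in range(n)]
    q2 = list(q1)
    q2[wanted] ^= 1
    return q1, q2

def private_fetch(db_copy_1, db_copy_2, wanted):
    """Fetch record `wanted` without revealing `wanted` to either server."""
    q1, q2 = make_queries(len(db_copy_1), wanted)
    a1 = server_answer(db_copy_1, q1)
    a2 = server_answer(db_copy_2, q2)
    # Every record selected by both servers cancels out; only `wanted` remains.
    return xor_bytes(a1, a2)

# Hypothetical fixed-length records held by both servers.
database = [b"rec-0000", b"rec-0001", b"rec-0002", b"rec-0003"]
result = private_fetch(database, database, 2)
```

The toy version sends a query as long as the database and breaks down if the two servers collude, which is part of why the practical schemes Edgar points to are more elaborate.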

But I want to consider something President Obama said in his speech, which both Felten and Edgar point to.

Relying solely on the records of multiple providers, for example, could require companies to alter their procedures in ways that raise new privacy concerns.

I’m admittedly obsessed by this, but one processing step the NSA currently uses on dragnet data seems to pose particularly significant privacy concerns: the data integrity role, in which high volume numbers — pizza joints, voice mail access numbers, and telemarketers, for example — are “defeated” before anyone starts querying the database.
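As a rough illustration of what "defeating" high volume numbers could involve, the sketch below counts how many distinct numbers each identifier is in contact with and drops any record touching an identifier over a cutoff. This is a guess at the general shape of such a step, not a description of NSA's actual process: the threshold, the record layout, and the function names are all hypothetical, and the public documents do not say how high volume identifiers are actually selected.

```python
from collections import defaultdict

# Hypothetical cutoff; the real criteria are not public.
HIGH_VOLUME_THRESHOLD = 3

def find_high_volume(records, threshold=HIGH_VOLUME_THRESHOLD):
    """Return identifiers in contact with at least `threshold` distinct numbers."""
    contacts = defaultdict(set)
    for caller, callee in records:
        contacts[caller].add(callee)
        contacts[callee].add(caller)
    return {number for number, peers in contacts.items() if len(peers) >= threshold}

def defeat(records, high_volume):
    """Drop every record that touches a high volume identifier."""
    return [
        (caller, callee)
        for caller, callee in records
        if caller not in high_volume and callee not in high_volume
    ]

# Hypothetical (caller, callee) pairs; "pizza" stands in for a pizza joint.
records = [
    ("alice", "pizza"), ("bob", "pizza"), ("carol", "pizza"),
    ("alice", "bob"), ("carol", "dave"),
]

high_volume = find_high_volume(records)
cleaned = defeat(records, high_volume)
```

In this toy data, alice and carol are two hops apart only because both called the pizza joint; once that number is defeated, the link disappears. Note that deciding which numbers to defeat requires looking across everyone's calling records, which is the privacy concern at issue here.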

This training module from 2011 (and therefore before some apparent additions to the data integrity role, as I’ll lay out in a future post) describes three general technical roles, the first of which would be partly eliminated if the telecoms kept the data.

  • Ensuring production meets the terms of the order and destroying that which exceeds it (5)
  • Ensuring the contact-chaining process works as promised to FISC (much of this description is redacted) (7)
  • Ensuring that all BR and PR/TT queries are tagged as such, as well as several other redacted tasks (this tagging feature was added after the 2009 problems) (9)

The first and third are described as “rarely coming into contact with human intelligible” metadata (the first function would likely see more intelligible data on intake of completed queries from the telecoms). But — assuming a parallel structure across these three descriptions — the redacted description on page 8 suggests that the middle function — what elsewhere is called the data integrity function — has “direct and continual access and interaction” with human intelligible metadata.

And indeed, the 2009 End-to-End Review and later primary orders describe the data integrity analysts querying the database with non-RAS approved identifiers to determine whether they’re high volume identifiers that should be taken out of the dragnet.

Those analysts are not just accessing data in raw form. They’re making analytic judgments about it, as this description from the E-2-E report explains.

As part of their Court-authorized function of ensuring BR FISA metadata is properly formatted for analysis, Data Integrity Analysts seek to identify numbers in the BR FISA metadata that are not associated with specific users, e.g., “high volume identifiers.” [Entire sentence redacted] NSA determined during the end-to-end review that the Data Integrity Analysts’ practice of populating non-user specific numbers in NSA databases had not been described to the Court.

(TS//SI//NT) For example, NSA maintains a database, [redacted] which is widely used by analysts and designed to hold identifiers, to include the types of non-user specific numbers referenced above, that, based on an analytic judgment, should not be tasked to the SIGINT system.