For the Purposes of Analytical Efficiency, Making Copies of the Dragnet
In 2008, NSA started copying the dragnet (or at least started telling the FISA Court it was doing so).
Starting with the January docket BR 08-01 (the date is illegible but it should be around January 4, 2008), the orders added a footnote saying,
5 The Court understands that for the purposes of analytical efficiency a copy of meta data obtained pursuant to the Court’s Orders in this matter will be stored in the same database with data obtained pursuant to other NSA authorities and data provided to NSA from other sources. Access to such records shall be strictly limited in accordance with the procedures set forth in paragraphs A – G.
The footnote would appear in four more orders that year: BR 08-04 (4/3/08); BR 08-07 (6/26/08); BR 08-08 (9/19/08?). Then it disappeared in the December 11, 2008 docket, BR 08-13. It did not appear in any other orders, though starting with the October 29, 2010 docket BR 10-70, a different footnote noted that “NSA will maintain the BR metadata in recovery back-up systems.”
The change almost certainly relates to the federated query system, in which all the data from EO 12333 collection (and, given the reference to “data provided to NSA from other sources,” probably GCHQ collection) was, and at least until 2011 remained, accessible from one interface.
The footnote almost certainly does reflect a change in the way NSA handled the data (that is, in this case NSA informed FISC in timely fashion), because by April of that year, 31 “newly trained” NSA analysts were caught querying domestic phone data using 2,373 identifiers without knowing they were doing so. That seems to indicate the “newly trained” analysts just kept querying metadata as they would have using EO 12333 collected data. NSA didn’t tell FISC about that until 6 months later, though. In the interim (in August 2008), NSA also told FISC about how it correlated numbers, a process we know works across data sources, not exclusively within the domestic data collection.
In other words, NSA was slowly integrating the phone dragnet into its larger metadata collection, and informing FISC (perhaps even more slowly) what that meant.
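To make the risk concrete, here is a minimal Python sketch of what "one interface over several repositories" can mean in practice. It is a toy under stated assumptions, not a description of NSA's actual system: the `Repository` class, the `federated_query` and `restricted_query` functions, and the sample identifiers are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Repository:
    """Hypothetical metadata repository tagged with its legal authority."""
    name: str
    authority: str
    records: tuple  # tuple of (identifier, contact) pairs


def federated_query(identifier, repositories):
    """Search every repository at once; the caller never picks a source."""
    hits = []
    for repo in repositories:
        for ident, contact in repo.records:
            if ident == identifier:
                hits.append((repo.name, repo.authority, contact))
    return hits


def restricted_query(identifier, repositories, allowed_authorities):
    """Same search, limited to repositories the analyst is cleared for."""
    permitted = [r for r in repositories if r.authority in allowed_authorities]
    return federated_query(identifier, permitted)


# Made-up sample data: the same identifier appears in both repositories.
repos = [
    Repository("sigint", "EO 12333", (("id-1", "id-7"),)),
    Repository("brfisa", "BR FISA", (("id-1", "id-9"),)),
]

all_hits = federated_query("id-1", repos)
eo_only_hits = restricted_query("id-1", repos, {"EO 12333"})
```

In this sketch, the unrestricted query returns hits from both repositories even though the analyst never asked for the domestic data by name. That is the kind of mistake the "newly trained" analysts appear to have made; only the restricted variant keeps the authorities apart.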
In spite of the disappearance of the footnote in the first orders dealing with the dragnet problems in 2009, the NSA did not segregate the data from the federated interface. That’s clear from a memorandum of understanding NSA issued sometime after March 18, 2009 indicating that access to one metadata repository had been shut down, but four were still accessible:
- SIGINT dating back to 1998
- [redacted — which could be STELLAR WIND data or could be foreign-supplied data]
- BRFISA dating back to May 2006
- PR/TT dating back to a redacted date that public records show to be July 2004
Given that 3,000 US persons had previously been included along with other queries, it’s possible the newly excluded collection consisted of GCHQ-collected data that included significant US person data.
I raise all this to point out one of the inherent dangers with the dragnet. A program that was billed as a simple collection designed to serve FBI needs got integrated with NSA’s larger metadata collection within 2 years of inception, creating a great many problems, without any reconsideration of whether the stated purpose of the dragnet still matched what was by then a clearly different intent. And this from a program that was supposed to be closely minimized.
Oh by the way, NSA told the FISC, we made an extra copy of the database of all phone-based relationships in the United States. Because it’s more efficient to have two databases.
The whole purpose of building a “Big Data” system, like the NSA did with Accumulo, is to be able to throw everything into it. Accumulo, which is built on top of Hadoop, is a schema-less data store. There are a number of reasons for a schema-less store, but the primary one is that you don’t need to know, in advance, what data elements (which would be enumerated in a schema) you’re going to store. So, when you come up with a new set of data, you build an appropriate ingestion procedure and store the stuff. I would speculate that the federated query system was developed before the NSA had sufficient big data capabilities and is slowly being transitioned out of use.
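To illustrate the schema-less idea, here is a minimal Python sketch. It is a toy and not Accumulo's actual API: the `SchemalessStore` class, its methods, and the sample records are all hypothetical. The point is only that each value is stored under a (row, column) key, so no list of columns has to be declared up front.

```python
class SchemalessStore:
    """Toy key-value store keyed by (row, column); no schema is declared."""

    def __init__(self):
        self._cells = {}

    def ingest(self, row_id, record):
        """Store whatever fields the record has, one cell per field."""
        for column, value in record.items():
            self._cells[(row_id, column)] = value

    def get_row(self, row_id):
        """Reassemble a row from its cells."""
        return {col: val for (row, col), val in self._cells.items()
                if row == row_id}

    def columns(self):
        """Every column name seen so far, discovered only from the data."""
        return sorted({col for _, col in self._cells})


store = SchemalessStore()
# A phone-style record and an email-style record with no fields in common.
store.ingest("rec-001", {"caller": "555-0100", "callee": "555-0199",
                         "duration_s": 42})
store.ingest("rec-002", {"sender": "a@example.com",
                         "recipient": "b@example.com"})
```

Adding a new kind of data here means writing a new ingestion call, not altering a table definition, which is the property that makes it easy to throw everything into one store.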
It’s worth noting, too, how terrible some of this stuff is in terms of process engineering. Make sure you copy the identifier correctly from the spreadsheet? Seriously? The NSA seems to have no problem putting resources on the real sexy stuff, big data, but can’t seem to be bothered with user interface design. Remember this, as well as the screenshots we’ve seen of the shitty-looking Windows interface, next time you see a movie or a TV show with a hacker working with a fancy advanced interface.
@Saul Tannenbaum: The 2012 report from WaPo suggested Marina may be in transition, which would make sense bc if it really includes all Internet metadata then the entire concept of what they take has been changing.