Posts

The August 20, 2008 Correlations Opinion

On August 18, 2008, the government described to the FISA Court how it used a particular tool to establish correlations between identifiers. (see page 12)

A description of how [name of correlations tool] is used to correlate [description of scope of metadata included] was included in the government’s 18 August 2008 filing to the FISA Court,

 

On August 20, 2008, the FISC issued a supplemental opinion approving the use of “a specific intelligence method in the conduct of queries (term “searches”) of telephony metadata or call detail records obtained pursuant to the FISC’s orders under the BR FISA program.” The government claims that it cannot release any part of that August 20, 2008 opinion, which, given the timing (which closely tracks the timing of other submissions and approvals before the FISC) and the reference to both telephony metadata and call detail records, almost certainly approves the use of the dragnet — and probably not just the phone dragnet — to establish correlations between a target’s multiple communications identifiers.

As ODNI’s Jennifer Hudson described in a declaration in the EFF suit, the government maintains that it cannot release this opinion, in spite of (or likely because of) ample description of the correlations function elsewhere in declassified documents.

The opinion is only six pages in length and the specific intelligence method is discussed at great length in every paragraph of this opinion, including the title. Upon review of this opinion, I have determined that there is no meaningful, segregable, non-exempt information that can be released to the plaintiff as the entire opinion focuses on this intelligence method. Even if the name of the intelligence method was redacted, the method itself could be deduced, given other information that the DNI has declassified pursuant to the President’s transparency initiative and the sophistication of our Nation’s adversaries [Ed: did she just call me an “adversary”?!?] and foreign intelligence services.

[snip]

The intelligence method is used to conduct queries of the bulk metadata, and if NSA were no longer able to use this method because it had been compromised, NSA’s ability to analyze bulk metadata would itself be compromised. A lost or reduced ability to detect communications chains that link to identifiers associated with known and suspected terrorist operatives, which can lead to the identification of previously unknown persons of interest in support of anti-terrorism efforts both within the United States and abroad, would greatly impact the effectiveness of this program as there is no way to know in advance which numbers will be responsive to the authorized queries.

ACLU’s snazzy new searchable database shows that this correlations function was discussed in at least three of the officially released documents thus far: in the June 25, 2009 End-to-End Review, in a June 29, 2009 Notice to the House Intelligence Committee, and in the August 19, 2009 filing submitting the End-to-End Review to the FISC.

In addition to making it clear this practice was explained to the FISC just before the Supplemental Opinion in question, these documents also describe a bit about the practice.

They define what a correlated address is (and note, this passage, as well as other passages, do not limit correlations to telephone metadata — indeed, the use of “address” suggests correlations include Internet identifiers).

The analysis of SIGINT relies on many techniques to more fully understand the data. One technique commonly used is correlated selectors. A communications address, or selector, is considered correlated with other communications addresses when each additional address is shown to identify the same communicant as the original address.

They describe how the NSA establishes correlations via many means, but primarily through one particular database.

NSA obtained [redacted] correlations from a variety of sources to include Intelligence Community reporting, but the tool that the analysts authorized to query the BR FISA metadata primarily used to make correlations is called [redacted].

[redacted] — a database that holds correlations [redacted] between identifiers of interest, to include results from [redacted] was the primary means by which [redacted] correlated identifiers were used to query the BR FISA metadata.

They make clear that NSA treated all correlated identifiers as RAS approved so long as one identifier from that user was RAS approved.

In other words, if there was a successful RAS determination made on any one of the selectors in the correlation, all were considered RAS-approved for purposes of the query because they were all associated with the same [redacted] account
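To make that logic concrete, here is a minimal sketch in Python of how a correlations table could propagate a RAS determination from one selector to every identifier attributed to the same user. All names and data are invented for illustration; nothing below comes from NSA systems.

# Hypothetical sketch: RAS approval propagating across correlated selectors.
# The correlations table maps an account to every identifier believed to
# belong to the same communicant (phone numbers, SIM/handset IDs, emails).
CORRELATIONS = {
    "account_123": {"+15555550100", "imsi:310150123456789", "user123@example.com"},
    "account_456": {"+15555550199", "cookie:abcd1234"},
}

# Selectors with an explicit Reasonable Articulable Suspicion determination.
RAS_APPROVED = {"+15555550100"}

def effectively_ras_approved(selector):
    """A selector counts as RAS-approved for querying if any selector in the
    same correlation set (i.e., attributed to the same user) is RAS-approved."""
    for identifiers in CORRELATIONS.values():
        if selector in identifiers:
            return bool(identifiers & RAS_APPROVED)
    return selector in RAS_APPROVED

print(effectively_ras_approved("user123@example.com"))  # True: correlated with an approved number
print(effectively_ras_approved("cookie:abcd1234"))      # False: no approved selector in its set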

And they reveal that until February 6, 2009, this tool provided “automated correlation results to BR FISA-authorized analysts.” While the practice was shut down in February 2009, the filings make clear NSA intended to get the automated correlation functions working again, and Hudson’s declaration protecting an ongoing intelligence method (assuming the August 20, 2008 opinion does treat correlations) suggests they have subsequently done so.

When this language about correlations first got released, it seemed it extended only so far as the practice  — also used in AT&T’s Hemisphere program — of  matching call circles and patterns across phones to identify new “burner” phones adopted by the same user. That is, it seemed to be limited to a known law enforcement approach to deal with the ability to switch phones quickly.

But both discussions of the things included among dragnet identifiers — including calling card numbers, handset and SIM card IDs — as well as slides released in stories on NSA and GCHQ’s hacking operations (see above) make it clear NSA maps correlations very broadly, including multiple online platforms and cookies. Remember, too, that NSA analysts access contact chaining for both phone and Internet metadata from the same interface, suggesting they may be able to contact chain across content type. Indeed, NSA presentations describe how the advent of smart phones completely breaks down the distinction between phone and Internet metadata.

In addition to mapping contact chains and identifying traffic patterns NSA can hack, this correlations process almost certainly serves as the glue in the dossiers of people NSA creates of individual targets (this likely only happens via contact-chaining after query records are dumped into the corporate store).

Now it’s unclear how much of this Internet correlation the phone dragnet immediately taps into. And my assertion that the August 20, 2008 opinion approved the use of correlations is based solely on … temporal correlation. Yet it seems that ODNI’s unwillingness to release this opinion serves to hide a scope not revealed in the discussions of correlations already released.

Which is sort of ridiculous, because far more detail on correlations has been released elsewhere.

How NSA Hunts Metadata “Content” in Search of Your Digital Tracks

Der Spiegel has posted a set of slides associated with their story on how NSA’s TAO hacks targets.

The slides explain how analysts can find identifiers (IPs, email addresses, or cookies) they can most easily use to run a Quantum attack.

Because NSA is most successful hacking Yahoo, Facebook, and static IPs, the presentation walks analysts through how to use Marina (or “QFDs,” which may be Quantum-specific databases) to find identifiers for their target on those platforms. If they can’t find one of them, it also notes, analysts can call on GCHQ to hack Gmail. Once they find other identifiers, they can see how often the identifier has been “heard,” and how recently, to assess whether it is a still-valid identifier.
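Here is a rough sketch, in Python with invented data and field names, of the analyst workflow those slides describe: start from a known identifier, pull the other identifiers linked to the same target, and keep only those “heard” recently enough to still be useful (using a 14-day window like the “seen in the last 14 days” check mentioned later in this post). It illustrates the described logic; it is not a reconstruction of Marina or the QFDs.

from datetime import datetime, timedelta

# Invented example data: identifiers linked to one seed selector, with how
# often and how recently each has been observed ("heard") in collection.
LINKED_IDENTIFIERS = {
    "target@example.com": [
        {"selector": "facebook:target.name", "last_heard": datetime(2013, 12, 28), "hits": 102},
        {"selector": "yahoo:target1970",     "last_heard": datetime(2013, 6, 1),   "hits": 3},
        {"selector": "cookie:9f8e7d",        "last_heard": datetime(2013, 12, 20), "hits": 41},
    ],
}

def recently_heard(seed, window_days=14, now=datetime(2013, 12, 30)):
    """Return linked selectors seen within the last `window_days` days,
    mirroring the recency check the slides describe."""
    cutoff = now - timedelta(days=window_days)
    return [r["selector"] for r in LINKED_IDENTIFIERS.get(seed, [])
            if r["last_heard"] >= cutoff]

print(recently_heard("target@example.com"))
# ['facebook:target.name', 'cookie:9f8e7d'] -- still-active identifiers worth targeting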

The slides are fascinating for what they say about NSA’s hacking (and GCHQ’s apparent ability to bypass Google’s encryption, perhaps by accessing their own fiber). But they’re equally interesting for what they reveal about how the NSA is using Internet metadata.

The slides direct analysts to enter a known identifier, to find all the other known identifiers for that user, which are:

determined by linking content (logins/email registrations/etc). It is worth verifying that these are indeed selectors associated to your target. [my emphasis]

This confirms something — about Internet metadata, if not yet phone metadata — that has long been hinted. In addition to using metadata to track relationships, they’re also using it to identify multiple identities across programs.

This makes plenty of sense, since terrorists and other targets are known to use multiple accounts to hide their identities. Indeed, doing more robust such matching is one of the recommendations William Webster made after his investigation of Nidal Hasan’s contacts with Anwar al-Awlaki, in part because Hasan contacted Awlaki via different email addresses.

But it does raise some issues. First, how accurate are such matches? The NSA slides implicitly acknowledge they might not be accurate, but they provide no clues about how analysts are supposed to “verify[] that these are indeed selectors associated to your target.” In phone metadata documents, there are hints that the FISC imposed additional minimization procedures for matches made with US person identifiers, but it’s not clear what kind of protection that provides.

Also, remember NSA was experiencing increased violation numbers in early 2012 in significant part because of database errors, and Marina errors made up 21% of those. If this matching process is not accurate, that may be one source of error.

Also, note that NSA itself calls this “content,” not metadata. It may be they’ve associated such content via other means, not just metadata collection. But given NSA’s “overcollection” of metadata under the Internet dragnet, which almost certainly swept up routing data that counts as content, the label suggests they themselves admit this goes beyond metadata. Moreover, this raises real challenges to NSA claims that they don’t know the “identity” of the people they track in metadata.

Now, none of this indicates US collection (though it does show that NSA continues to collect truly massive amounts of Internet traffic from some location). But the slide above does show NSA monitoring whether this particular user was “seen” at US-[redacted] in the last 14 days. US-[redacted] is presumably a US-associated SIGAD (collection point). (They’re looking for a SIGAD from which they can successfully launch Quantum attacks, so they check whether their target’s traffic commonly passes through that point.) While that SIGAD may be offshore, and therefore outside US legal jurisdiction, it does suggest this monitoring takes place within the American ambit.

At least within the Internet context, Marina functions not just as a collection of known relationships, but also as a collection of known data intercepts, covering at least a subset of traffic. They likely do similar things with international phone dragnet collection and probably the results of US phone dragnet in the “corporate store” (which stores query results).

In other words, this begins to show how much more the NSA is doing with metadata than they let on in their public claims.

Update: 1/1/14, I’m just now watching Jacob Appelbaum’s keynote at CCC in Berlin. He addresses the Marina features at 22:00 and following. He hits on some of the same issues I do here.

Federated Queries and EO 12333 FISC Workaround

Particularly given the evidence NSA started expanding its dragnet collection overseas as soon as the FISA Court discovered it had been breaking the law for years, I’ve been focusing closely on the relationship between the FISA Court-authorized dragnets (which NSA calls BR FISA — Business Records FISA — and PR/TT — Pen Register/Trap and Trace — after the authorities used to collect the data) and those authorized under Executive Order 12333.

This document — Module 4 of a training program storyboard that dates to late 2011 — provides some insight into how NSA trained its analysts to use international collections to be able to share data otherwise restricted by FISC.

The module lays out who has access to what data, then describes how analysts look up both the Reasonable Articulable Suspicion (RAS) determinations of identifiers they want to query on, as well as the BR and PR/TT credentials of those they might share query results with. It also describes how “EAR” prevents an analyst from querying BR or PR/TT data with any non-RAS approved identifier. So a chunk of the module shows how software checks should help to ensure the US-collected data is treated according to the controls imposed by FISC.
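As a hedged illustration of what such a software gate might look like, here is a minimal Python sketch of an EAR-style check: the query only runs if the analyst holds the right credentials and the seed identifier is RAS-approved. The function and variable names are hypothetical, not drawn from the training module.

# Hypothetical sketch of an EAR-style pre-query gate. Names are invented.
RAS_APPROVED_SEEDS = {"+15555550100"}
BR_CLEARED_ANALYSTS = {"analyst_a"}

class QueryRejected(Exception):
    pass

def contact_chain(seed):
    return []  # stand-in for the actual chaining query against the metadata

def run_br_query(analyst, seed):
    """Run a BR FISA query only if both software checks pass."""
    if analyst not in BR_CLEARED_ANALYSTS:
        raise QueryRejected("analyst lacks BR FISA credentials")
    if seed not in RAS_APPROVED_SEEDS:
        raise QueryRejected("seed identifier is not RAS-approved")
    return contact_chain(seed)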

But the module also describes how a software interface (almost certainly MARINA, the metadata database) manages all the metadata collected from all over the world.

All of it, in one database.

So if you do what’s called a “federated” query with full BR and/or PR/TT credentials — meaning it searches on all collections the analyst has credentials for, with BR and PR/TT being the most restrictive — you may pull metadata collected via a range of different programs. Alternately, you can choose just to search some of the collections.

When launching [redacted], analysts with the appropriate BR or PR/TT credentials have the option to check a box if they wish to include BR or PR/TT metadata in their queries. If an analyst checks the “FISABR Mode” or “PENREGISTRY Mode” box when logging in, [redacted] will perform a federated query. This means that in addition to either BR or PR/TT metadata, [redacted] will also query data collected under additional collection authorities, depending on the analyst’s credentials. Therefore, when performing a query of the BR or PR/TT metadata, analysts will potentially receive results from all of the above collection sources. Users of more recent versions of [redacted] do have the option, however, to “unfederate” the query, and pick and choose amongst the collection sources that they would like to query. (10)
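A minimal sketch of the federated-query behavior that passage describes might look like the following Python, where a query fans out to every collection the analyst holds credentials for unless the analyst “unfederates” it and picks sources explicitly. The collection names and structure are assumptions made for the sake of the example.

# Hypothetical sketch of a federated metadata query. Names are invented.
COLLECTIONS = {
    "BR_FISA": lambda seed: [],   # stand-in for FISC-authorized telephony metadata
    "PRTT":    lambda seed: [],   # stand-in for FISC-authorized Internet metadata
    "EO12333": lambda seed: [],   # stand-in for overseas (EO 12333) collection
}
FISC_SOURCES = {"BR_FISA", "PRTT"}

def federated_query(seed, credentials, include_fisa=True, only_sources=None):
    """Query every collection the analyst is credentialed for.
    `include_fisa` models the FISABR/PENREGISTRY checkbox; `only_sources`
    models the newer option to 'unfederate' and choose sources by hand."""
    sources = only_sources or [
        name for name in COLLECTIONS
        if name in credentials and (include_fisa or name not in FISC_SOURCES)
    ]
    return {name: COLLECTIONS[name](seed) for name in sources}

# Example: an analyst with full credentials, querying in "FISABR Mode".
print(federated_query("+15555550100", {"BR_FISA", "PRTT", "EO12333"}))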

Back in 2009, when NSA was still working through disclosures of dragnet problems to FISC, analysts apparently had to guess where the data they were querying came from (which of course is an implicit admission that BR data had been improperly treated with weaker EO 12333 protections for years). But by 2011 they had worked it out so queries showed both what SIGAD (collection point) the metadata came from, as well as (using a classification mark) its highest classification.

It is possible to determine the collection source or sources of each result within the chain by examining the Producer Designator Digraph (PDDG)/SIGINT Activity Designator (SIGAD) and collection source(s) at the end of the line.

If at least one source of a result is BR or PR/TT metadata, the classification at the beginning of the line will contain the phrases FISABR or PR/TT, respectively. In addition, in the source information at the end of the line, the SIGAD [redacted] BR data can be recognized by SIGADs beginning with [redacted] For PR/TT, data collected after October 2010 is found [redacted] For a comprehensive listing of all the BR and PR/TT SIGADs as well as information on PR/TT data collected prior to November of 2009, contact your organization’s management or subject matter expert.

Since it is possible that one communication event will be collected under multiple collection authorities (and multiple collection sources), not all of the results will be unique to one collection authority (or collection source). Keep in mind that the classification at the beginning of each result only indicates the highest level classification of that result, and does not necessarily reflect whether a result was unique to one collection authority (or collection source). If a result was obtained under multiple authorities (or sources), you will see more [redacted] (15-16)

In other words, analysts will be able to see from their results where the results come from. If a query result includes data only from BR or PR/TT sources, then the analyst can’t share the result with anyone not cleared into those programs without jumping some hoops. But if a query result showed other means to come up with the same results from a BR or PR/TT search (that is, if EO 12333 data would return the same result), then the result would not be considered a BR- or PR/TT-unique result, meaning the result could be shared far more widely. (Note, this passage also provides more details about the timing of the Internet metadata shutdown, suggesting it may have lasted from November 2009 to October 2010.)

Sharing restrictions in the FISC Orders only apply to unique BR or PR/TT query results. If query results are derived from multiple sources and are not unique to BR and PR/TT alone, the rules governing the other collection authority would apply. (17)

After noting this, the training storyboard spends 5 pages describing the restrictions on dissemination or further data analysis of BR and PR/TT results, even summaries of those results.

Then it returns to the point that such restrictions only hold for BR- or PR/TT-unique results and encourages analysts to run queries under EO 12333 so as to be able to get a result that can be shared and further exploited.

However, as we’ve discussed, not all BR or PR/TT results are unique. If a query result indicates it was derived from another collection source in addition to BR or PR/TT, the rules governing the other collection authority would apply to the handling and sharing of that query result. For example, this result came from both BR and E.O. 12333 collection; therefore, because it is not unique to BR information, it would be ok to inform non-BR cleared individuals of the fact of this communication, as well as task, query, and report this information according to standard E.O. 12333 guidelines.

In summary, if a query result has multiple collection authorities, analysts should source and/or report the non-BR or PR/TT version of that query result according to the rules governing the other authority. But if it is unique to either the BR or PR/TT authority then it is a unique query result with all of the applicable BR and PR/TT restrictions placed on it. In both cases, however, analysts should not share the actual chain containing BR or PR/TT results with analysts who do not have the credentials to receive or view BR or PR/TT information. In such an instance, if it is necessary to share the chain, analysts should re-run the query in the non-BR or non-PR/TT areas of [redacted] and share that .cml. (22)
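Putting the sharing rules together, here is a minimal Python sketch of the logic those passages describe: a result is restricted only when BR or PR/TT is its sole source; if the same result also shows up under EO 12333 collection, the looser EO 12333 rules govern it. The structures are invented for illustration only.

# Hypothetical sketch of the BR/PR/TT-unique sharing determination.
FISC_SOURCES = {"BR_FISA", "PRTT"}

def sharing_rule(result_sources):
    """Given the set of collection authorities that produced one query result,
    return which handling regime applies to it."""
    if set(result_sources) <= FISC_SOURCES:
        return "BR/PRTT-unique: FISC dissemination restrictions apply"
    return "not unique: handle under the other authority (e.g., EO 12333 rules)"

print(sharing_rule({"BR_FISA"}))               # restricted to BR-cleared analysts
print(sharing_rule({"BR_FISA", "EO12333"}))    # shareable under EO 12333 rules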

Let me be clear: none of this appears to be illegal (except insofar as it involves a recognition it is collecting US person data overseas, which may raise issues under a number of statutes). It’s just a kluge designed to use the US-based dragnet programs to pinpoint results, then use EO 12333 results to disseminate widely.

It does, obviously, raise big questions about whether the numbers reported to Congress on dragnet searches reflect the real number of searches and/or results, which will get more pressing if new information sharing laws get passed.

Mostly, though, it shows how NSA uses overseas collection to collect the same data on Americans without the restrictions on sharing it.

There are a lot of likely reasons to explain why the NSA stopped collecting Internet metadata in the US in 2011, seemingly weeks after this version of the storyboard, though they would still be able to access the PR/TT metadata for 5 years. [Update 11/20/14: they destroyed the PR/TT data in December 2011.] But it is clear the overseas collection serves, in part, to get around FISC restrictions on dissemination and further analysis.

Updated: Added explanation for BR FISA and PR/TT abbreviations.

The Stalker Outside Your Window: The NSA and a Belated Horror Story

[photo: Gwen’s River City Images via Flickr]

It’s a shame Halloween has already come and gone. The reaction to Monday’s Washington Post The Switch blog post reminds me of a particularly scary horror story, in which a young woman alone in a home receives vicious, threatening calls.

There’s a sense of security vested in the idea that the caller is outside the house and the woman is tucked safely in the bosom of her home. Phew, she’s safe; nothing to see here, move along…

In reality the caller is camped directly outside the woman’s window, watching every move she makes even as she assures herself that everything is fine.

After a tepid reaction to the initial reporting last week, most media and their audience took very little notice of the Washington Post’s followup piece — what a pity, as it was the singular voice confirming the threat sits immediately outside the window.

Your window, as it were, if you have an account with either Yahoo or Google and use their products. The National Security Agency has access to users’ content inside the corporate fenceline for each of these social media firms, greasy nose pressed to glass while peering in the users’ windows.

There’s more to the story, one might suspect, which has yet to be reported. The disclosure that the NSA’s slides reflected Remote Procedure Calls (RPCs) unique to Google and Yahoo internal systems is only part of the picture, though this should be quite frightening as it is.

Access to proprietary RPCs means — at a minimum — that the NSA has:

1) Access to content and commands moving in and out of Google’s and Yahoo’s servers, between their own servers — the closest thing to actually being inside these corporations’ servers.

2) With these RPCs, the NSA has the ability to construct remote login access to the servers without the businesses’ awareness. RPCs by their nature require remote access login permissions.

3) Construction through reverse engineering of proprietary RPCs could be performed without any other governmental bodies’ awareness, assuming the committees responsible for oversight did not explicitly authorize access to and use of RPCs during engineering of the MUSCULAR/SERENDIPITY/MARINA and other related tapping/monitoring/collection applications.

4) All users’ login requests are a form of RPC — every single account holder’s login may have been gathered. This includes government employees and elected officials as well as journalists who may have alternate accounts in either Gmail or Yahoo mail that they use as a backup in case their primary government/business account fails, or in the case of journalists, as a backchannel for handling news tips.

The CNET “Bombshell” and the Four Surveillance Programs

CNET is getting a lot of attention for its report that NSA “has acknowledged in a new classified briefing that it does not need court authorization to listen to domestic phone calls.”

In general, I’m just going to outsource my analysis of what the exchange means to Julian Sanchez (I hope he doesn’t charge me as much as Mike McConnell’s Booz Allen Hamilton for outsourced analysis).

What seems more likely is that Nadler is saying analysts sifting through metadata have the discretion to determine (on the basis of what they’re seeing in the metadata) that a particular phone number or e-mail account satisfies the conditions of one of the broad authorizations for electronic surveillance under §702 of the FISA Amendments Act.

[snip]

The analyst must believe that one end of the communication is outside the United States, and flag that account or phone line for collection. Note that even if the real target is the domestic phone number, an analyst working from the metadatabase wouldn’t have a name, just a number.  That means there’s no “particular, known US person,” which ensures that the §702 ban on “reverse targeting” is, pretty much by definition, not violated.

None of that would be too surprising in principle: That’s the whole point of §702!

That is, what Nadler may have learned is that the same analysts who have access to the phone metadata may also have authority to issue directives to companies for phone content collection. If so, it would be entirely feasible for the same analyst to learn, via the metadata database, that a suspect phone number is in contact with the US and for her to submit a request for actual content to the providers, without having to first get a FISA order covering the US person callers directly. Since she was still “targeting” the original overseas phone number, she would be able to get the US person content without a specific order.

I just want to point to a part of this exchange that everyone is ignoring (but that I pointed out while live tweeting this).

Mueller: I’m not certain it’s the same–I’m not certain it’s an answer to the same question.

Mueller didn’t deny the NSA can get access to US person phone content without a warrant. He just suggested that Nadler might be conflating two different programs or questions.

And that’s one of the things to remember about this discussion. Among many other methods of shielding parts of the programs, the government is thus far discussing primarily the two programs identified by the Guardian: the phone metadata collection (which the WaPo reports is called MAINWAY) and the Internet content access (PRISM).
