How the NSA Connection Chains without Calls
For a very long time, I’ve been trying to figure out what the government means when it says it “connection chains” call detail records under its Section 215 dragnet (and, possibly, once it passes, under USA F-ReDux).
The phone dragnet first started moving towards “connection chaining” in 2013, when Dianne Feinstein included the concept in her Fake FISA Fix.
Scope of permissible query return information:
For any query performed pursuant to paragraph (1)(D)(i), the query only may return information concerning communications—
(A) to or from the selector used to perform the query;
(B) to or from a selector in communication with the selector used to perform the query; or
(C) to or from any selector reasonably linked to the selector used to perform the query, in accordance with the court approved minimization procedures required under subsection (g). [my emphasis]
The February phone dragnet order that approved Obama’s modified approach also approved (though it may have approved earlier) chaining on “connections” in addition to “contacts” made.
The first “hop” from a seed returns results including all identifiers (and their associated metadata) with a contact and/or connection with the seed. The second “hop” returns results that include all identifiers (and their associated metadata) with a contact and/or connection with an identifier revealed by the first “hop.”
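To make the two-hop language concrete, here is a minimal sketch of how a hop query behaves once “connections” count alongside “contacts.” Everything in it is an assumption for illustration: the record layout (two identifiers plus a link kind), the names `RECORDS` and `two_hop`, and the sample data are hypothetical and are not drawn from the order or from any NSA system.

```python
from collections import defaultdict

# Hypothetical link records: (identifier_a, identifier_b, kind).
# "contact" stands for a call made; "connection" stands for any other link.
RECORDS = [
    ("seed", "A", "contact"),
    ("seed", "B", "connection"),
    ("A", "C", "contact"),
    ("B", "D", "connection"),
    ("E", "F", "contact"),  # unrelated to the seed
]

def two_hop(seed, records, kinds=("contact", "connection")):
    """Return (first_hop, second_hop) identifier sets for a seed."""
    links = defaultdict(set)
    for a, b, kind in records:
        if kind in kinds:
            # Links are treated as undirected for chaining purposes.
            links[a].add(b)
            links[b].add(a)
    first = set(links.get(seed, set()))
    second = set()
    for ident in first:
        second |= links.get(ident, set())
    # Drop the seed and the first-hop identifiers from the second hop.
    second -= first | {seed}
    return first, second

print(two_hop("seed", RECORDS))                     # contacts and connections
print(two_hop("seed", RECORDS, kinds={"contact"}))  # calls made only
```

In this toy data, restricting the query to calls made reaches only A and C; allowing connections also pulls in B and D, which is the expansion the order’s language permits.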
And all versions of the USA Freedom Act, once the Intelligence Community got their whack at them, chained on “connections” as well as calls.
(iii) provide that the Government may require the prompt production of call detail records—
(I) using the specific selection term that satisfies the standard required under subsection (b)(2)(C)(ii) as the basis for production; and
(II) using call detail records with a direct connection to such specific selection term as the basis for production of a second set of call detail records;
The latest version of USA F-ReDux takes a different approach, with two hops, neither of which requires that Call Detail Records — defined as a set of 5 things that may but are not required to be included, just one of which involves calls made — reflect calls made. And the second hop invokes “session identifying information” that is divorced from the definition of CDRs that excludes (for example) location data.
(iii) provide that the Government may require the prompt production of a first set of call detail records using the specific selection term that satisfies the standard required under subsection (b)(2)(C)(ii);
(iv) provide that the Government may require the prompt production of a second set of call detail records using session-identifying information or a telephone calling card number identified by the specific selection term used to produce call detail records under clause (iii)
Absent more limiting language, I read this as permitting the government to require (immunized and compensated) providers to use session-identifying information that the government itself is not permitted to receive to find a set of “CDRs” of interest (again, without requiring that those CDRs reflect calls made, because that’s not a required aspect of the definition).
I’ve been having a hard time explaining what that might involve.
But today’s Intercept story shows what chaining NSA does that does not involve calls made.
As the slide, above (from this deck), makes clear, with data collected from Pakistan, they start with selectors of people who have not left Af-Pak, and then match phone use not involving calls made. It does this by training the computer on what is normal and what is unique to identifiers previously IDed as couriers. It proves its data works, of course, by showing that Ahmed Muwafak Zaidan is the top match, even though Zaidan isn’t a terrorist at all! But it shows that the government will use location data to “chain” on people connected primarily by location habits.
The other deck describes the Automated Bulk Cloud Analytics, SKYNET. The slide to the left describes tracking things, all but one of which involves “session identifying information” that doesn’t involve any actual calls made (though this scheme also has access to phrases made, which any domestic program could not).
- Travel patterns, including repeated visits to particular locations (obtained using location data)
- Patterns of call usage (incoming only, “excessive” SIM or handset swapping or power-downs probably indicating counter-surveillance)
- Co-travelers (obtained using location data — and we know AT&T does this under Hemisphere)
- Similar travel patterns (again, obtained using location data)
- Common contacts
Only common contacts involve calls made (though that could even come from address books, which we know NSA collects).
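As a rough illustration of how co-travelers can fall out of location data alone, here is a minimal sketch. It is not the method in the slides, which do not spell one out. The ping layout (identifier, hour bucket, cell ID), the names `PINGS` and `co_travelers`, and the `min_shared` threshold are all hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical location pings: (identifier, hour_bucket, cell_id).
# No calls are involved; a powered-on phone checking in is enough.
PINGS = [
    ("seed", 1, "c1"), ("X", 1, "c1"), ("Y", 1, "c1"),
    ("seed", 2, "c2"), ("X", 2, "c2"), ("Y", 2, "c9"),
    ("seed", 3, "c3"), ("X", 3, "c3"),
]

def co_travelers(seed, pings, min_shared=3):
    """Return identifiers seen with the seed in the same cell and hour
    at least min_shared times, mapped to the number of shared sightings."""
    seen_at = defaultdict(set)  # (hour, cell) -> identifiers present
    for ident, hour, cell in pings:
        seen_at[(hour, cell)].add(ident)
    shared = Counter()
    for idents in seen_at.values():
        if seed in idents:
            for other in idents - {seed}:
                shared[other] += 1
    return {ident: n for ident, n in shared.items() if n >= min_shared}

print(co_travelers("seed", PINGS))
```

Here X is flagged because it shares three cell-and-hour sightings with the seed, while Y, which overlaps only once, is not. Neither X nor the seed ever has to place a call.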
And the outcome of this process is a set of identifiers — some tasked, the others not yet tasked — all of which (as either IMSIs or Handsets) would qualify as CDRs under USA F-ReDux.
None of this proves this is what the government wants to do with the hop process under USA F-ReDux.
But it does show that the NSA has a whole approach to analysis that has nothing to do with contact chaining, chaining on calls made, but instead chains on connections. The key input to that process is location data, which the government can’t obtain as a CDR under USA F-ReDux, but which telecoms need to provide service and therefore would have available to conduct analysis (and again, AT&T does some of this analysis now under Hemisphere).
These slides don’t prove that’s what the government intends under USA F-ReDux. But they do show it’s the kind of thing the NSA does, regularly, with its metadata analysis.
I’m a little unclear on this. If the data point “location data” does not involve a “call made”, then is it simply “where your phone is at any given time”? That is, the essential piece of information that Verizon needs to know, in order to send you a call? And if so, can that be defeated by keeping your phone turned off except when making a call? (as John Rain does, and I’m sure other incognitoes?) Or is this not defeatable at all, if one wants to make use of cellular phone technology?
Yes. Your phone checks into its network (and also into potential WiFi networks) all the time. It’s basically saying “are you there?” “Yes I’m here if you need me.”
In both the telephony and especially the Internet side, sessions can include just that kind of checking in.
Check this image and the blogpost it came from — this is the “attach” process for what I think is a cellphone in LTE/4G network.
Depending on the device, the applications with which it is loaded, and the network in which it is being used, there are more “pings” going on between the device and network than depicted in this graphic.
If I understand this correctly, if this cellphone is powered on and using services besides voice, all this back-and-forth is going on between mobile device and the network.
What I don’t know is whether 1) this same back-and-forth may be happening with a device a user believes to be turned off; or 2) a similar array of activity can be created from the network side by way of a ping, as used to check packet loss.
Is this attachment process what is meant by “match phone use not involving calls made”?
EDIT: I don’t know yet whether this attachment process is similar for a 3G network. There might be some exchange in this process identifying the type of device and compatibility with 3G or LTE/4G.
EDIT-2: Clearly the documents we’ve seen leaked are overviews of functionality, not detailed technical documents outlining what is happening with the same specificity we see in the graphic above. IMO, they’re still hiding in semantics, fuzzing view by avoiding serious technical examination.
We continue to see virtually zero HUMINT employed; content leaked and/or revealed during investigation shows near complete reliance on fallible, manipulable SIGINT-only to ID people. In the case of Zaidan, avoiding any HUMINT validation of his work could easily play into a disinfo/discrediting op.
EDIT-3: Another “what I want to know”: Is there a common application on most-often suspected phones, written in a common language, most often used on targeted user equipment (UE), which may also earmark the users in a unique way? My nose itches thinking about this; did Zaidan use an app common to individuals in AQ/Taliban, but less common to general public? Hypothetically an app checking for updates during attach process may identify a specific UE — but it’s not a call.
this is good stuff, rayne. the blog master is an interesting character and seems, for some reason, to want to clue the world in on the intricacies and details of cellphone connection – and it is intricate.
with respect to:
“..We continue to see virtually zero HUMINT employed; content leaked and/or revealed during investigation shows near complete reliance on fallible, manipulable SIGINT-only to ID people. In the case of Zaidan, avoiding any HUMINT validation of his work could easily play into a disinfo/discrediting op…”
i have wondered about the same thing. but i am sceptical that there is no human interaction ex post the machine id’ing a telephone usage trail/person. after all, it was (i assume) the purloined nsa documents that gave zaidan’s telephone data the name “zaidan”. this data then offers a wonderful killing opportunity behind closed state’s secret doors. just drone the guy and say it was a killing justified by telephone traffic (if admitting even that publicly), while appearing not to have known it was mr. x. – in other words, a “signature” strike that was not really just the usual signature strike shot in the dark. sweet!!
adding:
though divining, and i do mean “divining”, a person’s INTENT through other than calling usage of his cellphone is the key idea here, the possession of a cellphone and that phone’s unbidden check-calling for contact still means nsa depends on electromagnetic info for its analysis. mere possession of a phone that is “live”, i.e., connected to a network, is enough to betray a person’s activities to the government – something for a thoughtful judge to think about.
I think you’re absolutely right about this.
I’ve assumed that, like cryptography, machine learning is something that the NSA is investing in heavily. In that context, what seems surprising to me is how unsophisticated SKYNET/Cloud Analytics seems compared to the state of the art. Look, for example, at what one intern for one media company did in a summer – http://engineering.flipboard.com/2015/05/scaling-convnets/ – just to upscale images for display.
I think it’s also safe to assume that any data repository that the intelligence community has is being “machine learned” to predict whatever arbitrary behavior they’re interested in. Given how much of this has become commoditized (https://aws.amazon.com/machine-learning/), how much companies are investing in it, the NSA has to be doing more.
I’ve also assumed that, when it comes to signature drone strikes, what that really means is “our machine learning classifier says this is where the drone should attack.”
That’s quite scary, if true. Machine learning, in general, works. But “is this person a terrorist who deserves to die?” is, to put it coldly, such an edge case that relying on machine learning seems tenuous, at best.
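To put the edge-case worry in numbers, here is a back-of-the-envelope sketch. Every figure in it is made up for illustration (the population, the number of real targets, and both error rates); none comes from the slides. The function name `expected_flags` is likewise hypothetical.

```python
def expected_flags(population, true_targets, true_positive_rate, false_positive_rate):
    """Return (hits, false_alarms, precision) for a classifier
    applied to everyone in the population."""
    hits = true_targets * true_positive_rate
    false_alarms = (population - true_targets) * false_positive_rate
    precision = hits / (hits + false_alarms)
    return hits, false_alarms, precision

# Hypothetical inputs: one million phone users, 100 real targets,
# a classifier that catches 90% of targets and wrongly flags 0.1% of everyone else.
hits, false_alarms, precision = expected_flags(1_000_000, 100, 0.9, 0.001)
print(f"correctly flagged: {hits:.0f}")
print(f"wrongly flagged:   {false_alarms:.0f}")
print(f"share of flags that are real targets: {precision:.1%}")
```

Under these assumed rates, roughly a thousand innocent people get flagged alongside about 90 real targets, so fewer than one flag in ten points at someone who belongs on the list. That is what a rare target does to even a classifier that looks accurate on paper.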