How NSA Hunts Metadata “Content” in Search of Your Digital Tracks

Der Spiegel has posted a set of slides associated with its story on how NSA’s TAO hacks targets.

The slides explain how analysts can find identifiers (IPs, email addresses, or cookies) they can most easily use to run a Quantum attack.

Because NSA is most successful hacking Yahoo, Facebook, and static IPs, the slides walk analysts through how to use Marina (or “QFDs,” which may be Quantum-specific databases) to find identifiers for their target on those platforms. If analysts can’t find one of them, the slides also note, they can call on GCHQ to hack Gmail. Once they find other identifiers, they can see how often each identifier has been “heard,” and how recently, to assess whether it is still valid.
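To make the “heard” check concrete, here is a minimal sketch of that kind of frequency-and-recency test. Nothing in it comes from the slides: the function name, the record layout, and the sample identifiers are all hypothetical, and the 14-day window is just an assumed default.

```python
from datetime import datetime, timedelta


def summarize_sightings(sightings, now, window_days=14):
    """Count sightings per identifier and flag the recently seen ones.

    `sightings` is an iterable of (identifier, datetime) pairs. This layout
    is an assumption for illustration, not a documented record format.
    """
    summary = {}
    for identifier, seen_at in sightings:
        entry = summary.setdefault(
            identifier, {"count": 0, "last_seen": seen_at}
        )
        entry["count"] += 1
        if seen_at > entry["last_seen"]:
            entry["last_seen"] = seen_at

    # An identifier counts as "recent" if it was seen inside the window.
    cutoff = now - timedelta(days=window_days)
    for entry in summary.values():
        entry["recent"] = entry["last_seen"] >= cutoff
    return summary


# Hypothetical sample data.
now = datetime(2013, 12, 30)
sightings = [
    ("user@example.com", datetime(2013, 12, 28)),
    ("user@example.com", datetime(2013, 12, 1)),
    ("old-cookie-123", datetime(2013, 6, 1)),
]
summary = summarize_sightings(sightings, now)
```

In this toy data the email address has been heard twice, most recently two days before `now`, so it would look like a live identifier, while the cookie last seen in June would not.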

The slides are fascinating for what they say about NSA’s hacking (and GCHQ’s apparent ability to bypass Google’s encryption, perhaps by accessing their own fiber). But they’re equally interesting for what they reveal about how the NSA is using Internet metadata.

The slides direct analysts to enter a known identifier, to find all the other known identifiers for that user, which are:

determined by linking content (logins/email registrations/etc). It is worth verifying that these are indeed selectors associated to your target. [my emphasis]

This confirms something — about Internet metadata, if not yet phone metadata — that has long been hinted. In addition to using metadata to track relationships, they’re also using it to identify multiple identities across programs.
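For readers who want a feel for what linking identities through logins and registrations could look like, here is a rough sketch using a union-find structure. This is my own illustration, not NSA’s method: the event format, the function names, and the sample identifiers are all assumptions.

```python
def link_identifiers(events):
    """Group identifiers that co-occur in the same login/registration event.

    `events` is an iterable of (identifier_a, identifier_b) pairs, e.g. an
    email address and the cookie seen at its login. Hypothetical format.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in events:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    clusters = {}
    for identifier in list(parent):
        clusters.setdefault(find(identifier), set()).add(identifier)
    return list(clusters.values())


def selectors_for(known, events):
    """Return every identifier transitively linked to `known`."""
    for cluster in link_identifiers(events):
        if known in cluster:
            return cluster
    return {known}


# Hypothetical sample data.
events = [
    ("alice@yahoo.example", "cookie:abc"),
    ("cookie:abc", "fb:alice.1"),
    ("bob@mail.example", "cookie:xyz"),
]
linked = selectors_for("alice@yahoo.example", events)
```

Note that the linking is transitive: a single bad pair (say, a cookie from a shared computer) would merge two different people into one cluster, which is presumably why the slide tells analysts to verify the results.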

This makes plenty of sense, since terrorists and other targets are known to use multiple accounts to hide their identities. Indeed, more robust matching of this kind is one of the recommendations William Webster made after his investigation of Nidal Hasan’s contacts with Anwar al-Awlaki, in part because Hasan contacted Awlaki via different email addresses.

But it does raise some issues. First, how accurate are such matches? The NSA slides implicitly acknowledge the matches might not be accurate, but they provide no clues as to how analysts are supposed to “verify[] that these are indeed selectors associated to your target.” In phone metadata documents, there are hints that the FISC imposed additional minimization procedures for matches made with US person identifiers, but it’s not clear what kind of protection that provides.

Also, remember NSA was experiencing increased violation numbers in early 2012 in significant part because of database errors, and Marina errors made up 21% of those. If this matching process is not accurate, that may be one source of error.

Also, note that NSA itself calls this “content,” not metadata. It may be that they’ve associated such content via other means, not just metadata collection. But NSA’s “overcollection” of metadata under the Internet dragnet almost certainly swept up routing data that count as content, so the label may reflect NSA’s own admission that this goes beyond metadata. Moreover, this raises real challenges to NSA claims that they don’t know the “identity” of the people they track in metadata.

Now, none of this indicates US collection (though it does show that NSA continues to collect truly massive amounts of Internet traffic from some location). But the slide above does show NSA monitoring whether this particular user was “seen” at US-[redacted] in the last 14 days. US-[redacted] is presumably a US-associated SIGAD (collection point). (They’re looking for a SIGAD from which they can successfully launch Quantum attacks, so they’re checking whether their target’s traffic commonly uses that point.) While that SIGAD may be offshore, and therefore outside US legal jurisdiction, it does suggest this monitoring takes place within the American ambit.

At least within the Internet context, Marina functions not just as a collection of known relationships, but also as a collection of known data intercepts, covering at least a subset of traffic. They likely do similar things with international phone dragnet collection and probably the results of US phone dragnet in the “corporate store” (which stores query results).

In other words, this begins to show how much more the NSA is doing with metadata than they let on in their public claims.

Update: 1/1/14, I’m just now watching Jacob Appelbaum’s keynote at CCC in Berlin. He addresses the Marina features at 22:00 and following. He hits on some of the same issues I do here.

Marcy has been blogging full time since 2007. She’s known for her live-blogging of the Scooter Libby trial, her discovery of the number of times Khalid Sheikh Mohammed was waterboarded, and generally for her weedy analysis of document dumps.

Marcy Wheeler is an independent journalist writing about national security and civil liberties. She writes as emptywheel at her eponymous blog, publishes at outlets including the Guardian, Salon, and the Progressive, and appears frequently on television and radio. She is the author of Anatomy of Deceit, a primer on the CIA leak investigation, and liveblogged the Scooter Libby trial.

Marcy has a PhD from the University of Michigan, where she researched the “feuilleton,” a short conversational newspaper form that has proven important in times of heightened censorship. Before and after her time in academics, Marcy provided documentation consulting for corporations in the auto, tech, and energy industries. She lives with her spouse and dog in Grand Rapids, MI.

38 replies
  1. Saul Tannenbaum says:

    In my career, I spent a lot of time on “identity management”, that is, the technology of matching digital identities to actual people. This was in the context of universities where the problem was understood and well-bounded. Students, for example, are often university employees, and if you have separate student and employee tracking systems (like almost all universities) you need a way to understand that this is one person with multiple roles. Universities that did this well had teams of people who would meet regularly to resolve the hard identity matching cases. So, I think it’s fair to conclude that the NSA’s identity matching isn’t going to be very accurate. It wasn’t for us, and we didn’t have to deal with people trying to conceal their identities, except for enrolled celebrities and there we were complicit in the deception.

    Those Der Spiegel slides are a goldmine.

    [Previous stuff, um, redacted, because I was confusing identity management SSOs – single sign ons – with NSA SSO – their Special Source Operations. So much for my personal acronym identity management accuracy.]

  2. Saul Tannenbaum says:

    @orionATL: “Quantum” is the NSA codename for the suite of systems it uses to implant malware on vulnerable, targeted systems.

    The folks who make up these codenames seem to have a good time at it. So far, my favorite is COMMONDEER.

  3. Frank33 says:

    NSA never meta-data that they did not love.

    I am sorry forgive me. But I am sure many have noted the obvious. Meta-data contains the so-called data or content. And also obvious, anyone with sufficient resources can hijack our information. Almost every device has a MAC code, that can be used for bad things. How do we protect privacy? I will have to think about that problem :)

  4. orionATL says:

    @Saul Tannenbaum:

    thanks.

    i too have enjoyed both the double-entendres and trying to guess what the signal thieves were revealing about the intent of their various programs by the names they assigned to them.

    “marina”, for example – a place where you park your boat in between excursions and ??

  5. thatvisionthing says:

    Blarney was good. Egotistical Giraffe was good. I gaped at the program/agency seal that said “Nothing But Net”. And just now in the slide above, Mutant Broth? Check that box to search GCHQ databases.

  6. thatvisionthing says:

    Question, in the gchq slide linked above, what’s an R&T analyst and what’s yahoo_b and yahoo_l/y? Quantum_biscuit also?

    I know ew linked to it from another diary today, but this Alice in Wonderland passage feels so ping:

    ‘Where do you come from?’ said the Red Queen. ‘And where are you going? Look up, speak nicely, and don’t twiddle your fingers all the time.’

    Alice attended to all these directions, and explained, as well as she could, that she had lost her way.

    ‘I don’t know what you mean by YOUR way,’ said the Queen: ‘all the ways about here belong to ME—but why did you come out here at all?’ she added in a kinder tone. ‘Curtsey while you’re thinking what to say, it saves time.’

    – See more at: http://www.emptywheel.net/2013/05/15/putins-game/

  7. earlofhuntingdon says:

    Pity that the government can act so illegally when physically outside its own territorial limits. Since government is a creature of some foundational law, a constitution, Grundgesetz, whatever, it would seem a good idea that it be bound to its own laws wherever it acts. Otherwise, it grants itself a get out of jail free card, at least when it happens to be the biggest bull in the china shop. I fear the USG has become what Winston Churchill once said of John Foster Dulles, Eisenhower’s prudish, self-righteous, vengeful Secretary of State: the bull that carries its own china shop with it wherever it goes.

  8. Frank33 says:

    @bloodypitchfork:

    Thanks a lot!

    “Raise your hand if you are not surprised by continuous beaming of microwaves into your brain”

    I would raise my hand not because I am smart, but because I am paranoid.

  9. Snoopdido says:

    Emptywheel wrote:

    “Because NSA is most successful hacking Yahoo, Facebook, and static IPs, it walks analysts through how to use Marina (or “QFDs,” which may be Quantum specific databases) to find identifiers for their target on those platforms.”

    “QFDs” evidently are associated with MySQL and are called “Question-Focused Datasets” per this TS/SCI Raytheon job description: http://jobs.raytheon.com/jobs/senior-java-user-interface-developer-job-linthicum-maryland-1-3760291

  10. Snoopdido says:

    @Snoopdido: And what are “Question-Focused Datasets” you ask?

    Per this link (http://sciencewise.info/resource/Question-focused_dataset/Question-focused_dataset_by_Wikipedia):

    “A question-focused dataset (QFD) is a subset of data that is derived from one or more parent data sources and substantively transformed in order to answer a specific analytic question or small set of questions. Since by definition a QFD is designed with a specific question in mind, it should perform much better at answering the question than the parent repository.”

  11. bloodypitchfork says:

    @Snoopdido: I find these statements above and below your quote really odd…

    quote”This article is an orphan, as no other articles link to it.”

    (snip)

    This article does not cite any references or sources.”unquote

    Bored NSA analyst perhaps? :)

  12. lefty665 says:

    @thatvisionthing: “Mutant Broth? Check that box to search GCHQ databases.”

    A Macbeth allusion? It might reflect something of a ‘tude about having to ask the Brits. “Double double toil and trouble, fire burn and cauldron bubble”. Validation would be a sub inquiry “eye of newt”. “Owlet’s wing” would be a good one too.

  13. lefty665 says:

    Well thanks EW. Since your logins aren’t encrypted, they know that lefty665 here is linked to the same email as lefty667 at FDL, so we’re more than “just neighbors”. At least we’re across the street from you know who:)

  14. Frank33 says:

    @Tom in AZ:

    They are part of the new COINTELPRO. Spandan C catapults the propaganda of the most disgusting war pimp sycophants. Support the wars. Support the Homeland Fascist Police State. Support the Unconstitutional Torture, Indefinite Detention, and Drone Robot Murder. The people who comment are even worse.

    Spandan C got on the fighting side of me, when he catapulted falsehoods about an American hero, Dan Ellsberg. When they discuss National Security, it is full metal neocon garbage.

    If you do disagree there, your comment will be censored.

  15. bloodypitchfork says:

    @lefty665: quote” @Frank33:

    “Brains are the sexiest part of the body.” unquote

    Youbetcha, that’s why I get off on Physics porn. :)

    @john francis lee: quote:”The question is … what are we gonna do when the evil Supremes announce that the totalitarian state is constitutional? “unquote

    Indeed. I’ve been saying that one all along. Frankly, when it comes to National Security, I have no doubt they will not risk becoming the fall guys should another attack occur vis a vis 9/11. I believe that’s why Leon covered his ass with the word likely.

    In that regard, if you viewed the video I linked to above, even Appelbaum said he WENT to Congress with his proof shown in the video and not ONE SINGLE Congress asshole would talk to him, because..they DON’T have a political “solution”. That tells me something. Actually, notwithstanding the proclivity of many American citizens to not give a shit, given what he has shown, and the fact that every little one horse town in the entire country has become militarized, not to mention all the other Orwellian bullshit happening today, I’d submit we’re well on our way to TOTALITARIANISM-R-US on steroids. Especially after seeing what the stinking NSA has now. That’s why I’m of the opinion, there is only ONE thing that will shake this nation out of its stupor. A full on insurrection inducing revelation from Snowden proving 9/11 was perpetrated by USG traitors. However, I also believe before this is over, another attack will happen which will provide the raison d’être for the coming Police State’s existence.

    I pray I’m wrong.

  16. ess emm says:

    @bloodypitchfork: Watched Appelbaum’s talk. It is absolutely unreal what the NSA has done apparently with the full cooperation of the computer industry. As China Matters points out,

    a significant element of the Snowden story is the collusion between Big Tech and the NSA, fueled by the awareness that both sides want the same thing: a thoroughly backdoored Internet open to individual data profiling and surveillance penetration (and tolerate the resultant security breaches as cost of doing business/collateral damage).

    We’re pwned.

  17. bloodypitchfork says:

    @ess emm: Yes, it is absolutely unreal. And that is the point. Unfortunately, Appelbaum didn’t have time, nor was it the place to go into deeper analysis of each of the capabilities. This is why I go to sites like this…

    http://www.captainsjournal.com/2013/12/30/nsa-spying/

    which, as usual, I only found by my daily blog coverage of this site..

    http://sipseystreetirregulars.blogspot.com/

    Which when push comes to shove, is more relevant to the coming refusal of the SC to address the coming police state vs the 4th Amendment. That’s why we were given the 2nd.

  18. ess emm says:

    By the way, the TAO can’t be doing interdiction of computers sent by mail on US soil, it has to be the FBI, right? If TAO does it themselves, then like ew said, we have NSA statutory overreach into domestic affairs (as if “incidental” collection wasn’t already a violation of the 4A).

  19. bloodypitchfork says:

    @ess emm: quote:”By the way, the TAO can’t be doing interdiction of computers sent by mail on US soil, it has to be the FBI, right?”unquote

    If what I saw on DemocracyNow yesterday is real, then I believe you are wrong.
    http://www.democracynow.org/2013/12/30/glenn_greenwald_the_nsa_can_literally

    Furthermore, as Appelbaum did not have the time, nor was it the place to further analyze each of the program revelations, and for those techy types, I believe this may serve to reveal deeper technical information..

    http://www.captainsjournal.com/2013/12/30/nsa-spying/

    Moreover, in my estimation, as this latest assault on the collectivist Totalitarianism-R-Us wannabes of the USG sublimely illustrates the gravity of the situation, it deserves to be spread as far and wide as humanly possible. In that light, I posted the links on every non relative blog, forum and site that I visit on a daily basis, regardless of their “rules, conventions or sphere of interest”. After all, in my opinion, Appelbaum’s presentation succinctly illustrates, the war for a free internet has now begun in earnest. Torpedos Los!

  20. bloodypitchfork says:

    hmmmm, after being totally consumed yesterday by the seriousness of what we are witnessing, I completely missed the fact that tonight is New Years Eve. Considering the NSA’s internet collection and analysis and the recently passed 2014 NDAA (especially section 107), there exists the distinct possibility we may underestimate the potential liability each of us shares as members of a community that is dissident oriented. In that light, should I not make it back here before tonight..

    HAPPY FUCKING NEW YEAR!

    :)

    bartender..a round for the house..on me. I propose a toast..here’s to the fucking NSA analysts. May these wretched scumbag traitors experience a lifetime of personal misfortune, turmoil and bodily function failure. After all, that is what you have personally cast upon mankind. Oh, and btw…eat me.

  21. Frank33 says:

    @bloodypitchfork:
    Or, we could encourage NSA employees, including NSA babes, to follow Edward Snowden’s example. They can become true patriots by discreetly and covertly releasing the secrets of the National Security Agency’s Universal Dragnet against American citizens.

  22. Saul Tannenbaum says:

    To the question of what can be done, I’m going with Bruce Schneier’s analysis: We need to increase the cost of bulk surveillance by means of encryption and decentralization of services. Make it more expensive and they’ll be forced to do less of it (http://cctvcambridge.org/SchneierSnowden).

    And if you’re in a position to do so, donating to the Freedom of the Press Foundation (https://pressfreedomfoundation.org/) which is supporting a series of encryption tools, can’t hurt. (I have no relationship with them except as a donor.)

  23. bloodypitchfork says:

    @Frank33: Indeed. Ok, in that case, ..

    Dear all you NSA dudes and dudettes. As employees of the NSA, who’ve been bamboozled by General Schmuck into thinking the so called “war on terror” is worth selling out your neighbors, friends, and your fellow citizens, for a hefty paycheck worth many times more than the average worker in this country, I feel behooved to enlighten you to a few facts.

    First off, you are working for the biggest terrorist on the face of the planet, whose documented war crimes stretch clear back to the dawn of the Republic and continue to this day by virtue of the monstrous murder by Drone program that YOU support via NSA’s conspiracy with CIA/DOD war criminals. Under the Nuremberg principles, you have no defense should the time come massive changes occur in the supranational sovereign status of the United States government.

    Second, notwithstanding the financial deception of the century by virtue of the documented lies the Director of your employer has told Congress and the FISA court in order to gain the exponentially massive funding increases, you are also a co-conspirator in the largest civil rights fraud ever perpetrated on a people in history, of which whistle blowers such as Thomas Drake and others have been prosecuted with the full force of the USG for simply trying to expose those crimes which you help keep secret from the American public and keep YOU employed.

    Third, now that one of your former associates has risked his life and future to expose the very programs you use daily to spy on the entire planet, you are also guilty of misprision of felony, by virtue of withholding secondary evidence of those very programs that Edward Snowden handed to journalists, for which he is now accused of the absurd charge of Espionage.

    Fourth. By ignoring your moral duty to help improve the human condition, you risk bringing the entire planet to the brink of Totalitarianism, while you enjoy financial benefits and secular freedoms of stable employment, even though those very benefits come at the cost of burning the Magna Carta and US Constitution to ashes.

    And last but not least, the truth at the very heart of the reason why the War on Terror is a monumental distortion of the threat claimed by the NSA, notwithstanding the lies of General Alexander and James Clapper.

    It is statistically real that:

    You are 35,079 times more likely to die from heart disease than from a terrorist

    You are 33,842 times more likely to die from cancer than from a terrorist

    obesity is 5,882 to 23,528 times more likely to kill you than a terrorist

    you are 5,882 times more likely to die from medical error than terrorism

    you’re 4,706 times more likely to drink yourself to death than die from terrorism

    you are 1,904 times more likely to die from a car accident than from a terrorist

    your meds are thousands of times more likely to kill you than Al Qaeda

    you’re 2,059 times more likely to kill yourself than die at the hand of a terrorist

    you’re 452 times more likely to die from risky sexual behavior than terrorism

    you’re 353 times more likely to fall to your death … than die in a terrorist attack

    you are 271 times more likely to die from a workplace accident than terrorism

    you are 187 times more likely to starve to death in America than be killed by terrorism

    you’re about 22 times more likely to die from a brain-eating zombie parasite than a terrorist

    you are more than 9 times more likely to be killed by a law enforcement officer than by a terrorist

    you are 37 times more likely to be crushed to death by a TV or furniture than to be killed by a terrorist

    Americans are 110 times more likely to die from contaminated food than terrorism

    you are more likely to be killed by a toddler than a terrorist

    you are four times more likely to be struck by lightning than killed by a terrorist.

    In closing, I submit either you are incapable of understanding the nature of the Death State, or you are simply a traitor. Make no mistake. The choice is yours to rectify your position. In the meantime, the entire planet is awakening to your criminality, and soon, you will have no place to hide. In that light, it would behoove you to think wisely.

    ok Frank33..I’ve done my part.

    bartender..set em up. And make them doubles.

  24. tom says:

    QFDs — we’ve seen them discussed in previously released NSA documents, notably the 24 page review of C-Traveller analytics. As correctly noted above, they’re specialized databases derived from the multi-trillion record megabases that are too slow to query effectively. Not just a sub-set of records but things calculated from them, like velocity of travel between two cell tower touches, that are only implicit in the original records.

  25. Tom in AZ says:

    @Frank33: Thanks, Frank. That sounds about right, and I did have a couple of comments go down the rabbit hole before just avoiding the place. I will look in about once a month when rolling down my bookmarks list, checking places occasionally. Unlike the daily visits here.

  26. GKJames says:

    We could, of course, consider turning the equation on its head. If there’s going to be all this collecting, let it be for a useful purpose: bring the NSA under the Library of Congress, organize the information in a useful way, and let the public have access for search and retrieval. Am sure that there are thousands of chocolate chip cookie recipes that’ve been slung around the Internet, but which people have had a hard time finding again. Then, periodically, we can take a gigantic server, slap a “Time Capsule” label on it, and shoot it into space to show whatever creature comes upon it what we’ve been up to in the land of the free home of the brave on little old earth.
