The Google/Yahoo Problem: Fruit of the Poison MCT?
OK, this will be my last post (at least today) attempting to understand why some Internet providers incurred so many costs associated with the response to the FISA Court’s October 3, 2011 decision that the government had improperly collected US person data as part of Multiple Communication Transactions.
For the moment, I’m going to bracket the question of whether Google and Yahoo are among the upstream providers (though I think that more likely for Google than for Yahoo). Footnote 3 in the October 3 opinion seems to distinguish upstream collection from collection from Internet service providers. But note the entirely redacted sentence in that footnote, which may modify that easy distinction.
But let’s consider how the violative data might be used. We know from the conference call the I Cons had the other day (you can listen along here) that this is primarily about getting email inboxes.
An intelligence official who would not be identified publicly described the problem to reporters during a conference call on Wednesday.
“If you have a webmail email account, like Gmail or Hotmail, you know that if you open up your email program, you will get a screenshot of some number of emails that are sitting in your inbox,” the official said.
“Those are all transmitted across the internet as one communication. For technological reasons, the NSA was not capable of breaking those down, and still is not capable, of breaking those down into their individual [email] components.”
If one of those emails contained a reference to a foreign person believed to be outside the US – in the subject line, the sender or the recipient, for instance – then the NSA would collect the entire screenshot “that’s popping up on your screen at the time,” the official continued.
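To make the official’s description concrete, here is a minimal sketch, assuming a hypothetical inbox listing whose rows carry the date, sender, subject and size fields the ODNI official mentions. It contrasts transaction-level selection (one matching row pulls in the whole listing, as the official describes) with component-level selection (keep only the rows that actually match). The `InboxRow` type, the `matches` test, the addresses and the selector are all illustrative inventions, not anything drawn from NSA’s actual systems.

```python
from dataclasses import dataclass


@dataclass
class InboxRow:
    """One line of a webmail inbox view (fields per the ODNI description)."""
    date: str
    sender: str
    subject: str
    size_kb: int


def matches(row: InboxRow, selectors: set[str]) -> bool:
    """Hypothetical tasking test: a selector appears in the sender or subject."""
    text = f"{row.sender} {row.subject}".lower()
    return any(sel.lower() in text for sel in selectors)


def collect_transaction(rows: list[InboxRow], selectors: set[str]) -> list[InboxRow]:
    """Transaction-level rule: one matching row pulls in the entire listing."""
    return list(rows) if any(matches(r, selectors) for r in rows) else []


def collect_components(rows: list[InboxRow], selectors: set[str]) -> list[InboxRow]:
    """Component-level rule: keep only the rows that actually match."""
    return [r for r in rows if matches(r, selectors)]


# Illustrative inbox: one row involves the (hypothetical) target, two do not.
inbox = [
    InboxRow("2011-05-02", "target@example.net", "re: shipment", 4),
    InboxRow("2011-05-02", "mom@example.com", "dinner sunday?", 2),
    InboxRow("2011-05-01", "boss@example.org", "quarterly numbers", 120),
]
selectors = {"target@example.net"}

whole = collect_transaction(inbox, selectors)
discrete = collect_components(inbox, selectors)
print(len(whole), len(discrete))  # 3 1
```

Under the transaction-level rule, the two non-matching rows come along anyway; those incidental entries are the ones that may turn out to be wholly domestic.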
Now, whether this collection comes from the telecoms or from the Internet companies themselves, it effectively serves as an index of Internet communications deemed interesting, either because of the participants or because an email mentions an approved selector.
But it may be that this upstream collection serves primarily to identify which content the government wants to collect.
In his November 30, 2011 opinion, Bates emphasized (see page 10) the limits on what analysts could do with properly segregated upstream MCTs in the future.
An analyst seeking to use (e.g., in a FISA application, in an intelligence report, or in a Section 702 targeting decision) a discrete communication within an Internet transaction that contains multiple discrete communications must document each of the determinations. [my emphasis]
Then, the September 25, 2012 opinion describes how Judge John Bates, by threatening to declare the previous collection a crime under 1809(a)(2), which prohibits the “disclosure” of any information collected illegally, got the government to purge that previous collection and any reports generated from it.
The government informed the Court in October 2011 that although the amended NSA procedures do not by their terms apply to information acquired before October 31, NSA would apply portions of the procedures to the past upstream collection, including certain limitations on the use or disclosure of such information.
That effort, according to Bates, did not begin until “late in 2011.”
But here’s the thing: the government would have “disclosed” this information to email providers if it had used any of the violative MCTs to target emails in their custody — the Section 702 targeting decisions Bates was explicitly concerned about.
So presumably, once Bates made it clear he considered 1809 violations real problems in November 2011, the government would have had to modify any certifications authorizing collection on email addresses identified through the violative upstream collection (regardless of source).
I don’t yet understand why, in adjusting to a series of modified certifications, the providers would incur millions of dollars in costs. But I think expunging poison fruit targeting orders from the certifications would have taken some time and multiple changed certifications.
Update: Footnote 24 in the October 3, 2011 opinion provides more clarity on whether PRISM collection includes MCTs; it doesn’t.
In addition to its upstream collection, NSA acquires discrete Internet communications from Internet service providers such as [redacted] Aug. 16 Submission at 2; Aug. 30 Submission at 11; see also Sept. 7, 2011 Hearing Tr. at 75-77. NSA refers to this non-upstream collection as its “PRISM collection.” Aug. 30 Submission at 11. The Court understands that NSA does not acquire “Internet transactions” through its PRISM collection. See Aug Submission at 1.
I dunno about this idea. Unless I read it wrong, you seem to suggest that the cost could come from Google/Yahoo themselves having to follow branching chains of causality from given poisoned roots and find the fruit to delete, where the fruit potentially also includes new selectors that are only there because of what was learned from inappropriate application of the bad ones.
But I would be very surprised if the corporate role was that active. Surely the selector list gets modified and updated all the time, including removing some of them. It would be very odd if the system consisted of anything other than the government providing periodic full lists of the selectors, which the companies just plug in. I don’t see how such high costs could come from the need to change selector lists.
I guess you could extend the argument to say that Google and Yahoo may have had to incur costs to actually comb through data that was being held. I can think of a scenario where this could be the case, although it involves a lot of assumptions.
Say that the order the companies are under (and what pissed off Lavabit) is not, as I speculated yesterday, an order to hand over the SSL cert private keys. Rather, it’s an order to retain all of the session keys used for each connection to a client and store them next to the encrypted content of the communication whenever the communication matches a selector on the list.
In this model, much of the (especially domestic) traffic matching selectors is not automatically diverted to NSA databases but rather sits in provider-controlled storage after the provider plucks it from its own wire and stores the session keys somewhere. Then, when the NSA wants to search it or decide what it can legally pull into its own databases, it uses some kind of automated system that it has set up to retrieve the keys to the data and scan the plaintext.
If a lot of that data suddenly became poisoned fruit, it’s possible that the NSA was not considered legally allowed to scrub it itself, since that would require accessing the corporate database, reading it, and sorting the communications in a pretty invasive way. (This scenario assumes that the NSA’s reply to Wyden about not being able to know how many illegal records it controls actually has some basis in reality.)
In that case, the providers themselves would probably have had to build their own system to read the stored wire data and then pay a whole bunch of people some serious overtime to go through and scrub it until the database was sanitized enough for the NSA to be allowed back in.
First the nit and then the bone…
Email sender lines in the inbox are readily spoofed, both by scammers and by NSA contractors trying to phish your computer out from under you. Anyone using email would have spoofed both sender and subject for the NSA’s benefit. But the NSA is going to save the screenshot anyway.
The intelligence community keeps talking about archaic round-robin data collection and analysis problems, while the really intrusive and productive ones, which we know-nothings can only guess at, go undiscussed.
1. Slurp up photo images of all US mail.
2. Slurp up all internet searches and web sites used.
3. Slurp up all internet email contents.
4. Slurp up pen registers on all phone calls in the US, because they are business records and pen registers and their court rulings are so 1940s.
5. Slurp up all phone txt, phone content, Internet chat, VOIP, forum posts, comments and mails.
6. Slurp up all internet purchases, slurp up all card purchases.
Before I came to the conclusion that the internet companies actually incurred costs commensurate with their reimbursement, I’d have to see some evidence. They might just have seen a revenue opportunity. Or, maybe their hatred of these programs is real and they saw an opportunity to make it hard on the government.
And, just for the record, the notion that separating webmail MCTs into distinct messages is hard is pretty bogus. I mean, that’s what the webmail software running in your browser does. The publicly available tools for screen scraping (taking a web page and extracting the underlying data in machine-readable form) are already pretty sophisticated. Compared to the other technical challenges, this one is pretty trivial, at the give-it-to-a-summer-intern-because-it’s-boring-busywork level of difficulty.
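As a rough illustration of the commenter’s point, here is a minimal screen-scraping sketch that splits a hypothetical inbox page into per-message records using only Python’s standard-library `html.parser`. The page layout (table rows marked with a `msg` class, cells for date, sender, subject and size) is invented for the example; real webmail services each use their own, changing markup, and the sketch assumes the page has already been reassembled and decoded from the wire, which is a separate problem.

```python
from html.parser import HTMLParser


class InboxSplitter(HTMLParser):
    """Collects the text cells of each <tr class="msg"> row as one message.

    The "msg" class and the cell order are assumptions about a made-up page.
    """

    def __init__(self):
        super().__init__()
        self.messages = []   # one list of cell strings per message row
        self._row = None     # cells of the row currently being parsed
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if tag == "tr" and "msg" in classes:
            self._row = []
        elif tag == "td" and self._row is not None:
            self._in_cell = True
            self._row.append("")

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_cell = False
        elif tag == "tr" and self._row is not None:
            self.messages.append(self._row)
            self._row = None

    def handle_data(self, data):
        # Only text inside a cell of a message row is kept.
        if self._in_cell and self._row is not None:
            self._row[-1] += data.strip()


# A hypothetical inbox listing delivered as a single page (one "transaction").
page = """
<table id="inbox">
  <tr class="msg"><td>May 2</td><td>target@example.net</td><td>re: shipment</td><td>4 KB</td></tr>
  <tr class="msg"><td>May 2</td><td>mom@example.com</td><td>dinner sunday?</td><td>2 KB</td></tr>
</table>
"""

parser = InboxSplitter()
parser.feed(page)
parser.close()

FIELDS = ("date", "sender", "subject", "size")
messages = [dict(zip(FIELDS, cells)) for cells in parser.messages]
for m in messages:
    print(m["sender"], "|", m["subject"])
```

Once the listing is split this way, each row can be checked against selectors on its own rather than as one undifferentiated transaction.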
@Saul Tannenbaum: There’s a big redaction in Bates’ opinion that makes me believe he agrees with you: that NSA should be able to technologically separate out the MCTs.
Pat Leahy is going to have a hearing on this opinion next month. I hope he thinks to invite some tech people to challenge the why of this.
Think “value billing” when you consider the “costs” charged by telecoms providers for surreptitiously and perhaps illegally providing oceans of data on their customers and users to the USG.
You’re lost in the forest fire, counting the burning leaves.
Really bad banner ad now, your PC performance is bad, fix it now …
First the trolls, then the phishing adverts on the blog….
James Bamford also has this piece out today:
here’s an easier cite to use:
Ahoy Wheelhouse Lugnuts, at 5:00 pm EST today I will be hosting Firedoglake Book Salon at FDL, along with author Professor Thomas J. Healy, on his new and extremely fantastic book:
The Great Dissent: How Oliver Wendell Holmes Changed His Mind—and Changed the History of Free Speech in America
Come visit and participate; it will be a fun two-hour chat. Healy is VERY good and engaging, and the subject matter is the First Amendment, free speech and press freedom.
One point regarding the NSA’s collection of MCTs that seems to have escaped notice (including by Judge Bates) is that according to the AT&T whistleblower Mark Klein (https://www.eff.org/files/filenode/att/SER_klein_decl.pdf), the NSA upstream collection on major fiber hubs was occurring in 2003. Klein mentions touring AT&T’s Folsom Street facility in January 2003 and seeing the NSA collection room nearing completion then (page 3 of that PDF).
Surely it can’t be that the MCTs collected by the NSA and reported to the FISC in 2011 began ONLY after FISC-authorized collection began in May 2006. The NSA must have been collecting MCTs from the very beginning of its upstream collection.
And surely, as NSA analysts were examining internet traffic captured by their triggers at the very inception of the NSA upstream collection, they would have been examining content from within MCTs and not just single-communication transactions (SCTs).
So if the NSA upstream collection was capturing internet traffic at least as early as mid-2003, according to Mark Klein, NSA analysts must also have known about their MCT captures by 2003.
Which brings me to my final point/question:
Why did the NSA wait 8 years from 2003 until 2011 to inform anyone?
Surely the NSA knew by 2006, when the FISC finally gave legal cover to the NSA’s upstream collection, that MCT captures had been going on for years at that point.
General Keith Alexander became NSA Director in 2005 and there can be little doubt that he had knowledge of the NSA’s MCT collection prior to its admission to the FISC in May 2011.
Why did the US government wait until 2011 to acknowledge the MCT captures? Was it wholly or partly due to the changeover from the Bush DOJ to the Obama DOJ? Or was it due to something else?
One other thought. If the analysts’ PRISM interface sorts all the data, including the MCTs, into discrete message chunks, is this even more absurd than it sounds? Are the MCTs getting broken into SCTs whenever a search query makes its rounds through the system, while the NSA goes to Congress and says that it would be illegal to look at that data in situ, in MCT form, to determine, e.g., the number of US persons in the system?
If you, like me, have been trying to get a handle on the problematic nature of an “MCT”, this example provided by EFF may help:
“… but what, exactly, is an MCT?
Responding to a question from New York Times journalist Charlie Savage, ODNI gave the following example of one type of MCT:
‘… there’s a certain kind of communication that is referred to in the opinion as a “multi-communication transaction,” where there are several communications bundled together. I can give you one example of that, but I really don’t want to talk in great detail because it can get into operational sensitivities.
One example of this is if you have a webmail email account, like Gmail or Hotmail or something like that, you know that when you go and you open up your email program, you will get a screenshot of some number of emails that are sitting in your inbox. In the case of my server, what I get is the date of the email, the sender, the subject line, and the size of the email message. But I may get 15 of them at one time.
Those are all transmitted across the Internet as one communication, even though there are 15 separate emails mentioned in them. And for technological reasons, NSA was not capable of breaking those down into their — and still is not capable — of breaking those down into their individual components.
So if you had a situation where one of those emails may have referenced your targeted email in the subject line, for example, you’d nonetheless collect the whole inbox list together. It’s like a screenshot. You don’t get the whole email, you’d get what’s ever popping up on your screen at the time, that comes as one communication.
On occasion, some of those might prove to be wholly domestic. For example, if you are targeting a foreign person, and that foreign person is in communication with a U.S. person, you can get all of that U.S. person’s screenshot. So there may be other communications that are between U.S. persons, which are wholly domestic communications, which we’re not allowed to collect under section 702. So, that’s the brief executive summary of the problem, which NSA discovered and brought to the court’s attention.
I’ll go on to say that the court — a brief summary of the court’s ruling, they found that because it was technologically impossible to prevent this from happening, the collection of this communication was not problematic. But what was problematic was the fact that the court felt that NSA’s procedures for identifying and purging wholly domestic communications needed to be beefed up. And that’s what was done…’
This is likely not the only type of MCT involved in the NSA spying program …”