Stewart Baker’s IM-y Numbers

Stewart Baker accuses Bart Gellman and colleagues of inventing a phony statistic when they note that 89% of the communications collected under Section 702 were non-targets. He does some math to show why he thinks they’re wrong in their interpretation of the scope of this.

The story is built around the implied claim that 90% of NSA intercept data is about innocent people.  I think the statistic is a phony.  Especially in an article that later holds up US law enforcement practice as a superior model.

What’s wrong with the statistic?  Well, let’s take an example from law enforcement.  Suppose I become the target of a government investigation.  The government gets a warrant and seizes a year’s worth of my email.  Looking at my email patterns, that’s about 35,000 messages.  About twenty percent – say 7500 –are one-off messages that I can handle with a short reply (or by ignoring the message).  Either way, I’ll never hear from that person again.  And maybe a quarter are from about 500 people I hear from at least once a week.  The remainder are a mix — people I trade emails with for a while and then stop, or infrequent correspondents that can show up any time.  Conservatively, let’s say that about 25 people are responsible for the portion of my annual correspondence that falls into that category.  In sum, the total number of correspondents in my stored email is 7500+500+25 = 8000 or so.  So the criminal investigators who seized and stored my messages from me, their investigative target, and over 8000 people who aren’t targets.

Or, as the Washington Post might put it “7999 out of 8000 account holders found in a large cache of communications seized by law enforcement were not the intended surveillance target but were caught in a net the investigators had cast for somebody else.”

I agree that the numbers would be impressive — if they actually were what Baker claims they are.

But they aren’t.

First, remember that these are minimized communications. And while the NSA is keeping data that has no foreign intelligence value, it is almost certainly not keeping spam (we know this because other NSA documents talk about defeating spam). So eliminate that 20% or so, and likely more, right off.

Furthermore, the 9/10 ratio does not reflect all the communications WaPo examined. It doesn’t include the minimized US person ones. Almost half of the communications were ones NSA identified as US person communications. That’s somewhat clear from the graphics, but Gellman stated it on Twitter.

So the actual number is closer to 95% of communications not being targets, not 89%.
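To make that arithmetic explicit, here is a rough sketch of where a figure near 95% comes from. It rests on two assumptions of mine, not on anything WaPo published in this form: that "almost half" can be rounded to 50%, and that the 89% non-target rate applies only to the remaining, non-minimized communications. The function name and parameters are purely illustrative.

```python
def non_target_share(us_person_fraction: float, non_target_rate_in_rest: float) -> float:
    """Estimate the share of all examined communications that are not from targets.

    us_person_fraction: assumed share of communications NSA minimized as
        US person communications (all treated here as non-target).
    non_target_rate_in_rest: non-target rate among the remaining,
        non-minimized communications (the 89% figure).
    """
    rest = 1 - us_person_fraction
    return us_person_fraction + rest * non_target_rate_in_rest


# Assumption: "almost half" rounded to 0.5; 0.89 is the reported ratio.
share = non_target_share(0.5, 0.89)
print(f"Estimated non-target share: {share:.1%}")
```

With a slightly smaller US person fraction (say 45%), the same calculation still lands around 94%, so the conclusion doesn't hinge on the exact rounding.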

But Baker also doesn’t consider what he’s dealing with. For the most part it’s not email, it’s IMs.

76% of this sample is IMs. Just 14% are emails.

So while Baker’s email example is nifty, it’s largely off point, because he’d need to look at his IM patterns (or those of a 25-year-old, who is more likely to resemble a target), not his email patterns.

It would still be a low number, if you’re considering pre-processed communications. It makes more sense when you realize that’s not what you’re considering.

6 replies
  1. orionATL says:

    ben wittes, james dempsey, and now stewart baker, defenders of our government’s unnecessary and ineffective national security programs for spying on american citizens.

    how is it that the closer these possibly well-intentioned worthies get to great power and its applications in our society (dragnets, for example),

    the blinder they get?

    and the sloppier and more misleading their reasoning gets?

    • orionATL says:

      once, just once, i want to see one of the nsa’s fine-suited, very-serious-person defenders say:

      “you know, this is a close call, this usg program/action may be constitutional, but it’s just as likely that it’s not. it should be cancelled.”

      wouldn’t that be a refreshing change?

      but we are not going to hear that.

      right now, it’s all good ol’ boys goin’ all-out in support of nsa’s hard-court press to WIN.

  2. der says:

    Idiot math. Apples and Oranges. “I have about 27 oranges in my refrigerator, I know this because I opened the door and saw them, my 8 year old son or was it my 3 year old daughter counted them so the number’s about right. Now I’m likely not going to eat all 27 because about 3 or maybe 6 could be rotten, that’s about 11 or about 22%, some of them look smaller than the rest maybe they’re tangerines so let’s just say spam oranges, another let’s say 15%, about, and just because, because my lazy ass is too bored to really look at each one. Let’s also rule out the apples that are painted to look like oranges, so there you go, the Washington Post hates America and wants us all to die in our beds. Also too, support the troops and if this hasn’t confused you dear reader then Al Gore is fat!”

    Ruled by fools.

  3. What Constitution? says:

    Lies, damned lies — and statistics. Last bastion of a scoundrel, Mr. Baker.

    Now, about that unanimous Supreme Court reminder that the Founders didn’t fight a revolution to obtain “protocols”. A general warrant is a general warrant — this governing assumption that the way to keep Murika safe is to “collect it all” then go hunting is not permitted by the United States Constitution.

  4. Rayne says:

    There’s still something squirrelly with numbers. Who exchanges just one IM? We do so in batches, clusters, sometimes one-on-one, sometimes in exchanges that morph in scale. Were they counting individual messages or conversations?


    Might just be me, but the IM distinction versus text messages can be very fuzzy, too, depending on the applications and telco. Think WhatsApp’s use in text service-poor networks.
