The Inefficacy of Big Brother: Associations and the Terror Factory

The WSJ has a fascinating story, responding to (but not linking) this post, trying to address the question of whether the NSA programs we’ve learned about are efficient.

But some statisticians and security experts have raised another objection: As a terror-fighting tool, it is highly inefficient and has some serious downsides.

Their reasoning: Any automated approach to spotting something rare necessarily produces false positives. That means for every correctly identified target, many more alarms that go off will prove to be incorrect. So if there are vastly more innocent people than would-be terrorists whose communications are monitored, even an extremely accurate test would ensnare many non-terrorists.

[snip]

Even if the NSA’s algorithm “is terribly clever and has a very high sensitivity and specificity, it cannot avoid having an immense false-positive rate,” said Peter F. Thall, a biostatistician at the University of Texas’ M.D. Anderson Cancer Center. In his arena, false positives mean patients may get tests or treatment they don’t need. For the NSA, false positives could mean innocent people are monitored, detained, find themselves on no-fly lists or are otherwise inconvenienced, and that the agency spends resources inefficiently.

Others, though, noted a key difference between terrorism and, say, a needle in a haystack: Terrorists tend to talk to each other in a way that needles don’t. So by analyzing a network of communications, the NSA could be ferreting out clues from more than just the messages’ particulars.
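
To put rough numbers on the statisticians' argument, here is a minimal sketch of the base-rate arithmetic. Every input is an assumption made up for illustration: the population size, the number of real targets, and the 99% sensitivity and specificity do not come from the WSJ or from anything known about the NSA's programs.

```python
def screening_outcomes(population, true_targets, sensitivity, specificity):
    """Return (true_positives, false_positives, precision) for a screening test.

    sensitivity: share of real targets the test flags
    specificity: share of innocent people the test correctly leaves alone
    """
    innocents = population - true_targets
    true_positives = sensitivity * true_targets
    false_positives = (1.0 - specificity) * innocents
    precision = true_positives / (true_positives + false_positives)
    return true_positives, false_positives, precision


# Hypothetical inputs, chosen only for illustration.
tp, fp, precision = screening_outcomes(
    population=300_000_000,  # assumed number of people whose records are screened
    true_targets=3_000,      # assumed number of would-be terrorists among them
    sensitivity=0.99,
    specificity=0.99,
)

print(f"true positives:  {tp:,.0f}")
print(f"false positives: {fp:,.0f}")
print(f"share of alarms that are real: {precision:.4%}")
```

Under these assumed inputs, false alarms outnumber real hits by roughly a thousand to one, even though the test is right 99% of the time about any given individual. Change the inputs and the ratio moves, but as long as innocents vastly outnumber targets the pattern holds.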

This question is, obviously, one of the reasons I posted on the 3 apparent false positives presented as implicitly terrorist associates of Najibullah Zazi in 2009. Because — assuming I’m right that they were false positives — it provides a glimpse into precisely how the government understood a lot of these terms in 2009 (I assume, though could be wrong, that their approach continues to be fine-tuned). As a reminder, here’s what we know about these 3 people:

Evidence that “individuals associated with Zazi purchased unusual quantities of hydrogen and acetone products in July, August, and September 2009 from three different beauty supply stores in and around Aurora;” these purchases include:

Person one: a one-gallon container of a product containing 20% hydrogen peroxide and an 8-oz bottle of acetone

Person two: an acetone product

Person three: 32-oz bottles of Ion Sensitive Scalp Developer three different times

For a variety of reasons, I believe the 3 false positives consist of one person (probably person two) with a genuine relationship with Zazi who purchased relatively little acetone, and 2 people with false relationships with Zazi who bought an unusual amount of beauty supplies.

That says the FBI made two mistakes, IMO. The first was assuming that any purchase of a common product, acetone, by someone with a real tie to Zazi was criminal.

And assuming the relationships between the other two — the ones buying more beauty supplies — were meaningful. This could be, and I suspect it is, an assumption that anyone who belongs to the same mosque (and unlike the radical one he attended in NY, Zazi was reportedly not close to people at his mosque in CO).

Also note: this program (unlike ones I believe to exist at the National Counterterrorism Center) may not involve algorithms per se at all. Rather, it could just be associations: If tie to Zazi and if beauty supply purchaser = “positive.” In other words, for better and worse the FBI may not be asking the computers to “think” for it at all.
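
A pure association rule of that kind is simple enough to sketch in a few lines. This is only an illustration of the "if tie and if purchase" logic guessed at above, not a description of any real FBI system; the `Person` fields, the `flag` function, and the three sample records are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Person:
    name: str
    tied_to_suspect: bool         # hypothetical: e.g. same mosque, or one phone call
    bought_beauty_supplies: bool  # hypothetical: any peroxide or acetone purchase


def flag(person: Person) -> bool:
    """Association rule: both conditions true means a 'positive'."""
    return person.tied_to_suspect and person.bought_beauty_supplies


# Made-up records for illustration only.
people = [
    Person("A", tied_to_suspect=True, bought_beauty_supplies=True),
    Person("B", tied_to_suspect=True, bought_beauty_supplies=False),
    Person("C", tied_to_suspect=False, bought_beauty_supplies=True),
]

flagged = [p.name for p in people if flag(p)]
print(flagged)
```

Nothing in such a rule weighs how strong the tie is or how ordinary the purchase is, so the output can only be as good as the definition of "tie" that feeds it.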

Nevertheless, the assumption that membership in the same mosque (or, for that matter, a single communication with a suspected terrorist) necessarily equates to a meaningful relationship probably dooms the approach in any case.

Which brings me to my other point. The WSJ suggests the costs of false positives include wasted investigative resources and unfair persecution of the people wrongly flagged.

But it doesn’t consider the other possible uses of what may or may not be considered false positives.

First, there’s the possibility an FBI investigation into a true false positive — someone totally innocent of terrorism — may discover some other criminal exposure, which the FBI could and has been known to use to turn the false positive into an informant.

Then there’s the likelihood, especially if a potentially false positive is a young Muslim male, that the FBI will keep that person under heavy surveillance and recruitment for years and ultimately turn him into a terrorism statistic. The FBI started surveilling Mohamed Osman Mohamud before he turned 18, 3 years before they got him to attempt to bomb a public event. His parents even alerted the authorities to his increasing radicalism, but instead of intervening to reverse it, the FBI exacerbated it with several informants.

Would Mohamud have ever turned to terrorism without all that help from the FBI? Would he have developed the competence and acquired the resources to do harm? We can’t actually know, and I’m actually not aware that anyone has asked this question.

What we also can’t know is whether, had the FBI dedicated its efforts to something else, it could have prevented a crime developing without the FBI’s help.

That is, there are a whole slew of questions that have to be asked as we assess this program. Which is why we need real transparency.

Marcy has been blogging full time since 2007. She’s known for her live-blogging of the Scooter Libby trial, her discovery of the number of times Khalid Sheikh Mohammed was waterboarded, and generally for her weedy analysis of document dumps.

Marcy Wheeler is an independent journalist writing about national security and civil liberties. She writes as emptywheel at her eponymous blog, publishes at outlets including the Guardian, Salon, and the Progressive, and appears frequently on television and radio. She is the author of Anatomy of Deceit, a primer on the CIA leak investigation, and liveblogged the Scooter Libby trial.

Marcy has a PhD from the University of Michigan, where she researched the “feuilleton,” a short conversational newspaper form that has proven important in times of heightened censorship. Before and after her time in academics, Marcy provided documentation consulting for corporations in the auto, tech, and energy industries. She lives with her spouse and dog in Grand Rapids, MI.

23 replies
  1. Chris Kapilla says:

    Looks like a sentence fragment typo (horrors!), but I would like to hear the rest of your thought…’this could be an assumption that anyone who belongs to the same mosque…’

  2. SpanishInquisition says:

    “For the NSA, false positives could mean innocent people are monitored, detained, find themselves on no-fly lists or are otherwise inconvenienced, and that the agency spends resources inefficiently.”

    Also with this method it can kill innocent people with ‘signature strikes’ where it is alleged that people are terrorists merely because they supposedly act like terrorists and must be killed even though we don’t know who they are.

  3. guest says:

    Another way to game it: use its existence as a tool to isolate someone, by making them think that anyone around them will get monitored.

  4. lefty665 says:

    Nice wordsmithing, inefficiency vs inefficacy indeed.

    It is clear NSA has had many false positives, and that they have continued to evolve better methods to resolve them. In the early months/years after 9/11 NSA reportedly overwhelmed the FBI with thousands of leads. There were news reports of FBI agents pissed that they were getting the umpteenth lead to go investigate a carry out pizza parlor or something equally silly.

    NSA’s selectivity/specificity seems to have improved dramatically. That is nice. It undoubtedly reduces the number of grief responses they get from the consumers of their alerts, and it subjects fewer innocent citizens to causeless persecution. They are getting less inefficient, and likely faster than their surveillance is expanding. But, that does not make pointing their eyes, ears and info vacuums inside the US any less unconstitutional.

    This seems to be mostly a story about what the bozos at FBI do with leads once they get them. Their actions have routinely been profoundly wrong for much much longer than NSA’s. The FBI actions you describe transcend inefficacy, and you did not even get to the FBI shooting someone they were interrogating in the back of the head.

  5. orionATL says:

    It’s not just the feds:

    “… Instead, the government allowed the companies to release only broad numbers with no breakdowns. Over the last six months of 2012, Facebook said, it had received as many as 10,000 requests from local, state and federal agencies, which impacted as many as 19,000 accounts. Facebook has 1.1 billion accounts worldwide. Microsoft said that it received between 6,000 and 7,000 similar requests, affecting as many as 32,000 accounts…”

    http://m.washingtonpost.com/regional

    “Local, state, and federal”?

    What the hell is going on?

    Are these judicially sanctioned requests?

    Does existing federal law or judicial decisions mandate or approve that all levels of government can demand info from facebook?

  6. orionATL says:

    There really can be no excuse, other than trying to keep its spying programs hidden from public view, for gagging a company from publicly describing its obligations to share its customers’ data/info with a large number of different government agencies.

  7. emptywheel says:

    @Chris Kapilla: Ah, thanks. The site crashed just as I saved this and thought the whole thing had been saved. I’m sure whatever I said the first time was smarter!

  8. P J Evans says:

    @orionATL:
    FB apparently went into a little more detail: they said it could be local authorities looking for missing kids (where they’re looking at people the kid knew online), on up to the feds looking for criminals.

  9. Dredd says:

    Since the NSA is a military outfit, efficiency is measured by military ideology, not corporate civilian techniques for pleasing stockholders.

    Their dogma of overwhelming an opponent includes no limits on amount of materiel to get a job done, so I doubt that they consider getting all data in the context of efficiency, as a business person would.

    Not very much of what they do is efficient in the civilian context, because they do not labor in a civilian context.

  10. Duncan Hare says:

    Evidence of what?

    a one-gallon container of a product containing 20% hydrogen peroxide and an 8-oz bottle of acetone

    an acetone product

    Person three: 32-oz bottles of Ion Sensitive Scalp Developer three different times

    1. These are not “large quantities.” A 40 gallon drum is a large quantity.
    2. These are beauty supplies, Acetone for removing nail polish, hydrogen peroxide for hair bleaching, and scalp developer for washing hair.

    5 Minutes talking to one’s wife would clear these objects from suspicious activity. Only a male dominated institution would find these products suspicious.

  11. Rayne says:

    With all the concentrated attention on technology employed by NSA, we haven’t looked at human factors in the collection and analysis process.

    At least one human must decide the scope of collection; a human must decide the criteria for which collected content will be searched. And a human must determine the validity of the results.

    Virtually all of our networked communications are ultimately exposed to the evaluation of humans, no matter how much technology is used.

    Who are these people? Their loyalty isn’t to us if they are contractors and subcontractors; it’s to their employers. This explains why so many natsec folks employed in the industry by non-governmental/private entities defend PRISM and other NSA programs. It’s not because they are loyal to us, but questioning their current or potential corporate employers induces massive shockwaves in their employment tectonics.

    They don’t get rewarded by their employers for reducing false-positives to protect our civil rights, because firms aren’t compensated for that attribute. They’re paid to hunt for connections—the more, the better.

    The entire system is rigged against humans at the other end of its efforts.

  12. Nigel says:

    How many of those ‘foiled terror plots’ which are claimed as justification for shredding the 4th involve Mohamed Osman Mohamud style entrapments ?

  13. john francis lee says:

    The cyber-terrorism-industrial complex lives on false positives. The FBI creates its own via entrapment. The CIA does it just for fun. The NSA’s goal is more business. Terrorism is just the ruse. Just like the upcoming war in Syria. Attack someone somewhere for something.

    Turning and turning in the widening gyre
    The falcon cannot hear the falconer;
    Things fall apart; the centre cannot hold;
    Mere anarchy is loosed upon the world,
    The blood-dimmed tide is loosed, and everywhere
    The ceremony of innocence is drowned;

    … innocence. Yeah. Right. See Big ‘Bama

  14. Rayne says:

    @Duncan Hare: Yes, exactly.

    One gallon of peroxide might mean very little to women of Middle Eastern, Hispanic, or Asian origin. If one’s hair is very coarse and very dark, it can take many attempts to lift one’s hair color to a shade of light brown let alone blonde. If a user also has very long hair, they need more lightening product.

    A single box of permanent hair color contains approx. 3 ounces of peroxide; it’s not enough to treat a full head of shoulder-length dark blonde European hair. 6 ounces at a minimum might work. But darker, coarser hair might need twice that—and it has to be redone every 3-6 weeks if one regularly treats dark root regrowth.

    One gallon of peroxide might provide enough bleaching product for one year for a dark-haired woman.

    Might be much less if some of this peroxide is used for bleaching body hair, too.

    Surprises me they only tripped on one person buying this quantity, frankly.

  15. C says:

    Remember HBGary. The e-mails that LulzSec ripped from them discussed this in detail. They believed that they could build a case of conjunctives: basically, I see username x on site y from 10am to 2pm, then at 2:05pm username b logs into site c. If this happens more than once, x = b. On the face of it, it does sound seductively useful until you realize that there are more than a billion people in the world and at a certain point these seductive rules are swamped by the law of large numbers.

  16. Orestes Ippeau says:

    “there’s the possibility an FBI investigation into a true false positive — someone totally innocent of terrorism — may discover some other criminal exposure, which the FBI could and has been known to use to turn the false positive into an informant”

    That POTENTIAL is present in any particular instance; but there are SO MANY ‘instances’ that “possibility” turns into overwhelming probability.

    That PROBABILITY means there should be (I’d argue the 4th amendment requires must be) some consequential systemic procedure in place — one that specifically addresses the question of how incidental the serendipitous discovery was to the intended scope of the search, and

    a) requires the court inquire into the mechanism by which the serendipity occurred,
    b) includes a statutory assumption that the mechanism is insufficiently minimized, which the government has the onus to overcome — or else the mechanism will be excised from the terms of the order,
    c) if the government succeeds in b), nonetheless requires the court to impose additional conditions aimed at ensuring the order is adequately minimized, or
    d) prevents the government from continuing to pursue any discovered (suspected) offence within the context of order that incidentally uncovered it, but instead, should they elect to pursue the matter, to seek the FISA court’s permission to pass on any and all incidentally-discovered facts to the local, state or other federal agency with appropriate authority to pursue the matter — the sole exception being where the government is able to persuade the court that further investigation of the incidentally-discovered matter is inextricably bound up with the national security concern that gave rise to the order in the first place.

  17. Garrett says:

    Prism, itself, looks very much like a user-guided search tool, rather than being a pure algorithm hunter.

    At a certain level, though, it’s all algorithms.

    Network graphs can be seen as funky multidimensional geometry. Distance measures, for example, are involved.

  18. lefty665 says:

    @Rayne: “Their loyalty isn’t to us if they are contractors and subcontractors; it’s to their employers. This explains why so many natsec folks employed in the industry by non-governmental/private entities defend PRISM and other NSA programs.”

    Yep, Hayden turned it over to the contractors, and Gen. Keith has squared it. It’s big money to go with big data. See Bamford, “The Shadow Factory”

    Couple of side notes:

    Beef Hollow Rd is 1.5 million square feet, and from a recent report, 150-200 workers. It does not look like it is going to house many users on site.

    What if the bad guys quiet down when they’ve got a project in the works? Think those search algorithms go the other way too? Who has dropped common communications patterns? One more reason to need it all continuously, in real time.

    Might call that Total Information Awareness. If there’s a hell Poindexter deserves to be pushing a yottabyte up a mountain every day for eternity.
