The Corporate Store: Where NSA Goes to Shop Your Content and Your Lifestyle

I’m increasingly convinced that for seven months, we’ve been distracted by a shiny object, the phone dragnet, the database recording all or almost all of the phone-based relationships in the US over the last five years. We were never wrong to discuss the dangers of the dragnet. It is the equivalent of a nuclear bomb, just waiting to go off. But I’m quite certain the NatSec establishment decided in the days after Edward Snowden’s leaks to intensify focus on the actual construction of the dragnet — the collection of phone records and the limits on access to the initial database (what they call the collection store) of them — to distract us away from the true family jewels.

A shiny object.

All that time, I increasingly believe, we should have been talking about the corporate store, the database where queries from the collection store are kept for an undisclosed (and possibly indefinite) period of time. Once records get put in that database, I’ve noted repeatedly, they are subject to “the full range of [NSA’s] analytic tradecraft.”

We don’t know precisely when that tradecraft gets applied or to how many of the phone identifiers collected in any given query. But we know that tradecraft includes matching individuals’ various communication identifiers (which can include phone number, handset identifier, email address, IP address, cookies from various websites) — a process the NSA suggests may not be all that accurate, but whatever! Once NSA links all those identities, NSA can pull together both network maps and additional lifestyle information.

The agency was authorized to conduct “large-scale graph analysis on very large sets of communications metadata without having to check foreignness” of every e-mail address, phone number or other identifier, the document said.

[snip]

The agency can augment the communications data with material from public, commercial and other sources, including bank codes, insurance information, Facebook profiles, passenger manifests, voter registration rolls and GPS location information, as well as property records and unspecified tax data, according to the documents. They do not indicate any restrictions on the use of such “enrichment” data, and several former senior Obama administration officials said the agency drew on it for both Americans and foreigners.

That analysis might even include tracking a person’s online sex habits, if the government deems you a “radicalizer” for opposing unchecked US power, even if you’re a US person.

Such profiles are not the only thing included in NSA’s “full range of analytic tradecraft.”

We also know — because James Clapper told us this very early on in this process — the metadata helps the NSA pick and locate which content to read. The head of NSA’s Signals Intelligence Division, Theresa Shea, said this more plainly in court filings last year.

Section 215 bulk telephony metadata complements other counterterrorist-related collection sources by serving as a significant enabler for NSA intelligence analysis. It assists the NSA in applying limited linguistic resources available to the counterterrorism mission against links that have the highest probability of connection to terrorist targets. Put another way, while Section 215 does not contain content, analysis of the Section 215 metadata can help the NSA prioritize for content analysis communications of non-U.S. persons which it acquires under other authorities. Such persons are of heightened interest if they are in a communication network with persons located in the U.S. Thus, Section 215 metadata can provide the means for steering and applying content analysis so that the U.S. Government gains the best possible understanding of terrorist target actions and intentions. [my emphasis]

The NSA prioritizes reading the content that involves US persons. And the NSA finds it, and decides what to read, using the queries that get dumped into the corporate store (presumably, they do some analytical tradecraft to narrow down which particular conversations involving US persons they want to read).

And there are several different kinds of content this might involve: content (phone or Internet) of a specific targeted individual — perhaps the identifier NSA conducted the RAS query with in the first place — already sitting on some NSA server, Internet and in some cases phone content the NSA can go get from providers after having decided it might be interesting, or content the NSA collects in bulk from upstream collections that was never targeted at a particular user.

The NSA is not only permitted to access all of this to see what Americans are saying, but in all but the domestically collected upstream content, it can go access the content by searching on the US person identifier, not the foreign interlocutor, without establishing even Reasonable Articulable Suspicion that it pertains to terrorism (though the analyst does have to claim it serves a foreign intelligence purpose). That’s important because lots of this content-collection is not tied to a specific terrorist suspect (it can be tied to a geographical area, for example), so the NSA can hypothetically get to US person content without ever having reason to believe it has any tie to terrorism.

In other words, all the things NSA’s defenders have been insisting the dragnet doesn’t do (it doesn’t provide content, it doesn’t allow unaudited searches, NSA doesn’t know identities, NSA doesn’t data mine it, NSA doesn’t develop dossiers on it, even James Clapper’s claim that NSA doesn’t voyeuristically troll through people’s porn habits) are, every single one, potentially things it does do with the results of queries run three hops off an identifier with just Reasonable Articulable Suspicion of some tie to terrorism (or Iran). Everything the defenders say the phone dragnet is not, the corporate store is.

All the phone contacts of all the phone contacts of all the phone contacts of someone subjected to the equivalent of a digital stop-and-frisk are potentially subject to all the things NSA’s defenders assure us the dragnet is not subject to.

Don’t get me wrong: I’m not saying some of this analysis isn’t appropriate with actual terrorist suspects.

But that’s not what the corporate store is. It is — PCLOB estimates — up to 120 million phone users (the actual number of people would be smaller because of burner phones, and a significant number would be foreign numbers), the overwhelming majority of which are completely innocent of anything but being up to 3 degrees away from a guy who got digitally stop-and-frisked.

Yet those potentially millions of Americans get no effective protection once they’re in the corporate store. As the PCLOB elaborates,

Once contained in the corporate store, analysts may further examine these records without the need for any new reasonable articulable suspicion determination.

[snip]

Furthermore, under the rules approved by the FISA court, NSA personnel may then search any phone number, including the phone number of a U.S. person, against the corporate store — as long as the agency has a valid foreign intelligence purpose in doing so — without regard to whether there is “reasonable articulable suspicion” about that number. Unlike with respect to the initial RAS query, the FISA court’s orders specifically exempt the NSA from maintaining an audit trail when analysts access records in the corporate store.

There are just a few protections. The analysts accessing the corporate store need to have undergone training and must claim a foreign intelligence (but not exclusively counterterrorism) purpose. And normally, if NSA wants to circulate the US person data outside of the NSA, a high level official must certify that,

the information identifying the U.S. person is in fact related to counterterrorism information and that it is necessary to understand the counterterrorism information or assess its importance.

Again, that doesn’t require the US person have any tie to counterterrorism, just that it be “related to” counterterrorism, which FISC has already deemed even the larger collection store to be by default. (The Executive Branch can also search the corporate store for exculpatory or inculpatory information, which, given that no defendant has succeeded in getting a search for the former, probably means it is only used for the latter — and note, this is not, apparently, limited to counterterrorism purposes, and as of right now the Executive is also permitted to do back door searches of content for criminal evidence unrelated to terrorism, though Obama has vaguely promised to change that while stopping short of a warrant.)

And no one, aside from PCLOB’s estimate of up to 120 million (which may or may not have been reviewed when PCLOB let the IC review some of their process descriptions), is talking about how many Americans are in the corporate store. Geoffrey Stone has said NSA only “touched” 6,000 people in 2012, though that may mean only 6,000 of a much larger number of people who got placed in the corporate store were subjected to further NSA processing. We can assume the numbers were far higher until 2009, when there were over 17,000 on a RAS list. Furthermore, I’m very curious to see whether such numbers spike for 2013, given claims that NSA used the dragnet for “peace of mind” after the Boston Marathon attack, launched by young men who interacted via mobile phone with a huge number of totally innocent US person contacts. Will half of Cambridge, MA be subject to the full range of NSA’s tradecraft because we used the dragnet to get peace of mind after the Boston Marathon attack?

Moreover, as discussed last month, the NSA can alter the intake into the corporate store via choices made by data integrity analysts, the other part of the process largely exempted from oversight. With a few inclusions, those choices could cause the bulk of American call records to end up in the corporate store.

Obama said the dragnet “does not involve the NSA examining the phone records of ordinary Americans.” But in doing so, he was implying that the millions of Americans whose records may have made it into the corporate store are not ordinary, and therefore not entitled to the kind of due process enshrined in the Constitution.

56 replies
  1. OregonPrivacy says:

    This is what Snowden was referring to when he said, “I, sitting at my desk,” could “wiretap anyone, from you or your accountant, to a federal judge or even the president, if I had a personal email.”

    Snowden was referring to the “corporate store.”

  2. joanneleon says:

    I completely agree with you on the telephony metadata data base being a shiny object, or limited hangout, or whatever is the best name for it. And what makes it even more obvious is DiFi and Rogers, and Keith Alexander, and how carefully they ringfence every conversation to 215 and sometimes 702. And this eagerness for transparency by Obama and Clapper is the other tell. The only thing anyone in the IC wants to talk about are the programs under FISA and Congressional oversight. That’s almost the only thing the media talks about, except when a new, big story comes out, then they’re immediately back to talking only about the metadata program. We saw this kind of thing with Iran Contra too. Make a big fuss about the tip of the iceberg so nobody finds out what’s beneath it.

    What’s happening to all of that upstream collection data? What’s happening to all that data collected from the Google data center lines? And so on.

    And as you’ve noted many times, what happens to the results of the three hop searches which could end up being huge data sets?

    What I’m very confused about though, is that the corporate store can only be useful if it is getting refreshed all the time, right? If it’s full of results of very selected queries that are not done on a regular basis, you’d have a very fragmented and incomplete data base. It would be a sort of hit or miss kind of thing, which doesn’t sound like a very thorough way to check out a suspicious person’s activities. Unless you were getting that person’s communications on a regular basis and pouring it into the corporate store.

    So the big question for me is, what’s feeding the corporate store and is it being kept up to date all the time? And is that being done by collecting all the communications (and three hops out) of all the people on the RAS watch list? I find it very confusing understanding how the corporate store is fed.

    I downloaded the PCLOB report but have not read it yet. Maybe that will clear things up.

    P.S. Small typo: “Reasonable Articulable Suspicion of some time to terrorism” (or Iran).

  3. joanneleon says:

    Well, this is as good a time as any to dig into the PCLOB report. I’ve poked around it with searches but really need to read the whole thing. Wish I could print it out and read hard copy.

    p. 165 is helpful.

    What is really needed, and a few have mentioned this, is a team with serious tech and analytical skills who are given special clearance and authority to go in and actually poke around in these data bases as part of an investigation, and report on it.

  4. joanneleon says:

    One more thing and I apologize if the answer is obvious and I’ve missed it along the way:

    Why do they call this data base the “corporate” store?

  5. orionATL says:

    whoever the nsa collects data from

    whatever it collects

    however it collects it

    if and how and where it stores that personal data (whether machine metadata or human language communication data)

    however long it stores that data,

    because the nsa collects vast amounts of data to which it continuously adds in very short time intervals,

    any information the nsa collects and stores is useful only so long as it has been or can be connected to another person or social group.

    if the nsa’s central identifier for each record were to be assigned a random encrypted identifier by some outside institution,

    the nsa would be years in untangling this snarl on its own.

    under this circumstance, the nsa/fbi/cia might be required to apply for a “license” in order to review any individual’s records.

  6. OregonPrivacy says:

    @joanneleon: I think from the NSA itself. See this article: https://www.aclu.org/blog/national-security/raiding-corporate-store-nsas-unfettered-access-vast-pool-americans-phone-data

  7. thatvisionthing says:

    @joanneleon: p165 Sotomayor et al sound like #AskSnowden Edward Snowden.

    @ferenstein what’s the worst and most realistic harm from bulk collection of data? Why do you think it outweighs national security? #AskSnowden

    The worst and happening-right-now harm of bulk collection — which again, is a euphemism for mass surveillance — is two-fold.

    The first is the chilling effect, which is well-understood. Study after study has shown that human behavior changes when we know we’re being watched. Under observation, we act less free, which means we effectively *are* less free.

    The second, less understood but far more sinister effect of these classified programs, is that they effectively create “permanent records” of our daily activities, even in the absence of any wrongdoing on our part. This enables a capability called “retroactive investigation,” where once you come to the government’s attention, they’ve got a very complete record of your daily activity going back, under current law, often as far as five years. You might not remember where you went to dinner on June 12th 2009, but the government does.

    The power these records represent can’t be overstated. In fact, researchers have referred to this sort of data gathering as resulting in “databases of ruin,” where harmful and embarrassing details exist about even the most innocent individuals. The fact that these records are gathered without the government having any reasonable suspicion or probable cause justifying the seizure of data is so divorced from the domain of reason as to be incapable of ever being made lawful at all, and this view was endorsed as recently as today by the federal government’s Privacy and Civil Liberties Oversight board.

    Fundamentally, a society in which the pervasive monitoring of the sum of civil activity becomes routine is turning from the traditions of liberty toward what is an inherently illiberal infrastructure of preemptive investigation, a sort of quantified state where the least of actions are measured for propriety. I don’t seek to pass judgment in favor or against such a state in the short time I have here, only to declare that it is not the one we inherited, and should we as a society embrace it, it should be the result of public decision rather than closed conference.

    Love that guy.

  8. joanneleon says:

    @emptywheel: Yes, it’s the “corporate” part that I was curious about.

    And if they are pouring other kinds of records in there, “business records”, along with the metadata and other collection of communications, I just don’t see how Keith Alexander could deny that those are dossiers. The corporate store is the very definition of a giant file cabinet full of dossiers.

    At DEF CON 2012, Alexander was the keynote speaker; during the question and answers session, in response to the question “Does the NSA really keep a file on everyone, and if so, how can I see mine?” Alexander replied “Our job is foreign intelligence” and that “Those who would want to weave the story that we have millions or hundreds of millions of dossiers on people, is absolutely false…From my perspective, this is absolute nonsense.”

    http://en.wikipedia.org/wiki/Keith_B._Alexander

    From the PCLOB report:

    If the NSA queries around 300 seed numbers a year, as it did in 2012, then based on the estimates provided earlier about the number of records produced in response to a single query, the corporate store would contain records involving over 120 million telephone numbers.

    Why is PCLOB having to guess at this number? Shouldn’t they have access to that number? Are they talking about just the number of records produced in 2012 or did they extrapolate for the other years as well?

    I feel like I’m at a point where I’ve read a huge amount of information about these NSA programs for months and years and I still don’t have a good sense of anything. Maybe that’s an exaggeration but today it feels that way.
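The PCLOB arithmetic quoted in this comment can be roughly reproduced with a few lines of Python. This is only a sketch: the thread gives the 300 seeds per year and the three hops, but the figure of 75 new contacts per hop is an assumed illustrative value and is not stated anywhere in this thread. The function name is hypothetical, and the model assumes no overlap at all between contacts.

```python
def naive_store_size(seeds: int, contacts_per_hop: int, hops: int = 3) -> int:
    """Count numbers reached if every hop yields only brand-new contacts.

    Assumes zero overlap between contact lists, which overstates reach.
    """
    # One seed reaches k numbers at hop 1, k**2 at hop 2, k**3 at hop 3.
    per_seed = sum(contacts_per_hop ** h for h in range(1, hops + 1))
    return seeds * per_seed


if __name__ == "__main__":
    # 300 seeds comes from the PCLOB quote; 75 per hop is an assumption.
    print(naive_store_size(seeds=300, contacts_per_hop=75))
```

Under those assumptions the total lands above 120 million, which is consistent with the "over 120 million" wording in the quote. It covers a single year of queries only and says nothing about how the other years were treated.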

  9. thatvisionthing says:

    @thatvisionthing: But I think Snowden may be wrong about something: “…they’ve got a very complete record of your daily activity going back, under current law, often as far as five years”

    See PCLOB report:

    The rules of the FISA court for the 215 program impose no limits on how long data can be held in the corporate store, in contrast to the five-year retention limit on collection store data.

    page 165-166 = 169-170 of PDF numbering (I was referring to PDF numbering @11, which would be 161 page number in document)

    Though I actually don’t know how to make sense of the retention rules; Risen and Poitras added 10 years to 5:

    http://www.nytimes.com/2013/09/29/us/nsa-examines-social-networks-of-us-citizens.html?pagewanted=all

    If the N.S.A. does not immediately use the phone and e-mail logging data of an American, it can be stored for later use, at least under certain circumstances, according to several documents.

    One 2011 memo, for example, said that after a court ruling narrowed the scope of the agency’s collection, the data in question was “being buffered for possible ingest” later. A year earlier, an internal briefing paper from the N.S.A. Office of Legal Counsel showed that the agency was allowed to collect and retain raw traffic, which includes both metadata and content, about “U.S. persons” for up to five years online and for an additional 10 years offline for “historical searches.”

    I’d love clarification but I expect I’ll see a shiny object instead.

  10. OregonPrivacy says:

    Copied directly from the PCLOB Report:

    “The FISA court’s orders expressly state that the NSA may apply “the full range” of signals intelligence analytic tradecraft to the calling records that are responsive to a query, which includes every record in the corporate store.”

    I interpret that to mean that once the number goes into the "corporate store", those telephone numbers are sent to every kind of database that NSA has and matched with text messages, social media profiles, Internet searches, tax records, etc., etc., etc.

    I sponsored a Somali refugee family in 2006, and since then have supported and helped them, and their huge extended families, in many ways over the years. Lots of phone calls, emails, I've even used my web skills to help them buy discounted airline tickets. Considering the Portland bomb suspect case was a Somali man, I'm sure that one of the numbers I've called over the past 7+ years had to be within some hops of him or somebody who talked to him. If nothing else, the number of the Church Charity that ran the resettlement program in Portland. So that means I'm in the "corporate store" and the NSA can apply "the full range" of signals intelligence analytic tradecraft to me and my life.

    If I'm a "second hop" contact, then everybody in my life that I've been in contact with is also in the "corporate store" and also subject to the same "full range" of signals intelligence analytic tradecraft scrutiny.

    Quote from Page 30 of pdf copy

  11. joanneleon says:

    @thatvisionthing: Snowden speaks without any qualifiers there and with a very declarative voice.

    they effectively create “permanent records” of our daily activities, even in the absence of any wrongdoing on our part. This enables a capability called “retroactive investigation,” where once you come to the government’s attention, they’ve got a very complete record of your daily activity going back, under current law, often as far as five years. You might not remember where you went to dinner on June 12th 2009, but the government does.

    He doesn’t say “some of us” or “three hops of us”. He speaks as if this data is being stored about all of us. And telephone metadata doesn’t necessarily indicate where you went to dinner on June 12th, 2009. Though his mention of a five year limit does suggest he’s talking about metadata. But geolocation data or credit card data does indicate where you went to dinner. I realize he just used this as an example but it’s an interesting example. I think he carefully considers his answers. He also says it’s a record of our “daily activity”. Phone metadata doesn’t capture our daily activity. Geolocation and financial transactions and internet activity, added to it, do pretty well capture our daily activity. Is that what he’s talking about?

    And overall, is he being overly broad or engaging in hyperbole? Does he assume we know he means only a subset of us who are within three hops of the 300 “seed” identifiers (with the record of daily activity comment)? If he did mean that, it seems to me that he would have said so.

    A “very complete record of your daily activity” is a very strong statement. It sure as hell doesn’t sound like just phone metadata that he’s talking about there. And he doesn’t qualify it by saying “some of you” or a “subset of Americans”. He is talking about all of us because he distinguishes those who “come to the government’s attention” as a subset. I just don’t think he would have worded it so definitively unless he meant what he was saying, literally.

    And unless I’ve missed something big, no news organization has written definitively about surveillance of this scale yet. A lot of us assume it is being done. There have been big news stories about the various capabilities the NSA has, and the slides, etc. But has there been definitive reporting on anything other than phone metadata collected and stored on all Americans? Do text messages (recent report) count as telephony metadata? Is email metadata stored on everyone? Govt has specifically denied that geolocation data is being collected on all of us. Is credit/debit card data bundled together into the metadata data base?

  12. thatvisionthing says:

    EW: Don’t get me wrong: I’m not saying some of this analysis isn’t appropriate with actual terrorist suspects.

    I don’t think there’s a bigger shiny object than the word “terrorist.”

  13. joanneleon says:

    @OregonPrivacy: So basically, your assumption is that anyone whose phone number or other identifier has ever gotten into the corporate store, since its inception, is subject to collection and storage of all of their communications, collected on a regular basis or continuously, taken from the giant “collections data base” and placed into the corporate store? And once your number/name/email/whatever is in the corporate store, you never get out of it and you’re effectively under constant surveillance for the rest of your life?

    That would make Snowden’s statements more logical (in my comment #15 above).

    However, he mentions a five-year limit and I don’t think there is a five-year limit on the corporate store. He gives the distinct impression that everyone’s full daily activities are being collected.

    What I also don’t know is what is in the giant “collections data base” where they presumably throw everything they collect from everywhere they can, which is probably a group of data bases across all the Five Eyes countries. I also don’t know how long they keep the various things in the “collections data base” that the corporate store pulls from.

  14. emptywheel says:

    @joanneleon: This is the number NSA has been refusing to give Wyden for years. They’re not going to put a number on it bc if it’s in this neighborhood or even larger then there will be outrage.

  15. emptywheel says:

    @thatvisionthing: There are a bunch of different authorities and only the FISA ones have deadlines, I think. I’m not sure where the 10 year age off comes from, though, but it’s also not clear whether that’s upstream collection or something else.

  16. OregonPrivacy says:

    @joanneleon I’m not knowledgeable about this stuff like EW is.

    I am a tech person, and understand the web. I know a bit about how things work, Internet wise. How servers work and how data is transmitted. I know what is possible based on technology. I don’t have any more insight into what they are actually doing than anybody else. But if it’s possible to be done based on technology, then I imagine they are doing it.

    I think it would be simple for the NSA to be collecting and storing vast amounts of raw digital records. In addition to “records on individuals” there is additionally enormous amounts of raw digital data stored that can be inspected and analyzed as needed.

    Some of the information could be indexed and updated automatically depending on how connected the person is (i.e. on the computer frequently, uses a smart phone, twitter, social media), maybe they have a netflix or hulu account or just a “smart” tv. Maybe they pay for almost everything with credit/debit cards. Do they use an electronic pass for travel (car or other). Maybe have a recent model car with tracking electronics. Or use a gps service or app. How about a kindle or e-reader? Not only what books but what pages you have read – and what time you are reading them. Same for RSS readers and services like Instapaper/Pocket.

    Some of the major retailers are installing systems to collect IP addresses of all phones within range inside/close to store – even if you are not using their wifi. And now bluetooth systems to monitor and interact with phones are being created. And of course there are covert surveillance systems that can do the same thing that could be installed anywhere.

    They may not have it all neatly organized into files per person. But if it was important enough to recover, I think they could access various digital records and retrieve just about anything. Financial would be easy. Also travel (for an enlightening read), take a look at Edward Hasbrouck articles about the process for airline tickets and how there is a travel record kept for each person. That was a surprise to me. Not only for aircraft flights but also in some circumstances records of bus and train travel (based on people requesting their records under FOIA).

    Everything is digitized. Server logs of sites, google searches, ISP, Internet traffic, camera databases, shopping loyalty card records, etc.

    As a business person, I’ve purchased mailing lists. You can purchase detailed demographic information down to a street block selection. It’s incredible how much is out there. I’ve seen pages of old yearbooks, church and school directories (which often listed addresses and telephone numbers) digitized and searchable.

    Profiles on “dating sites”, of course Linked-In, Google and Facebook (Real Names everybody).

    I don’t know how long they do keep the data. 5, 10, 15 years or ?

    I know they can.

    Real Story – My family is pretty small, and not connected. My mother (deceased) had a step-sister whom she last saw in 1950, and had no idea where she lived/died or what happened to her after that. I had posted some family photos on flickr (not using my real name) and mentioned this stepsister’s name on one photo. My sister (first name only on flickr, but it was unusual and also listed the city she lived in, and no current Facebook or other social media account) was somehow tracked down and contacted by this woman’s 80-year-old son. Because her first name was unusual and he had an idea of area, I imagine he just did a phone record lookup search. If you have an unusual name, just put in the name and city and you will usually get a hit – the reports also show ages, so you can narrow it down that way. Or if you know a person’s relative, search for them in the phone lookups and you will often get a list of “associated people” to them.

    My sister was shocked and upset because she purposefully tried to avoid doing anything on the Internet under her real name, she only used email. Refused to have a Facebook or other account. She “thought” she was protecting her privacy that way.

  17. spongebrain says:

    The use of the qualifier, “ordinary,” in the context of not being subject to undue scrutiny promotes, in effect, a kinder, gentler form of enslavement. Those who would escape the eyes and ears of Morgoth’s spies, followed by outright thralldom in the belly of Angband itself, should lead very dull and boring lives.

    Moreover, hardly a senator, or representative, or judge, or general could be considered an “ordinary American,” so apparently all of them are fair game.

  18. emptywheel says:

    @OregonPrivacy: Yes. That’s precisely the problem.

    And Somalia is one of the countries I suspect may have mass collection (meaning the content is there too), in part bc they really don’t have translators who speak Somali and therefore can’t discriminate at selection. That’s just a guess, but it is my suspicion.

  19. OregonPrivacy says:

    @emptywheel: All the Somalis I know were in Kenyan refugee camps prior to USA resettlement. They call frequently to Kenya to talk with relatives in the camps. They live in mud huts in the camps, and most do not read or write, but they use phones to stay in touch. Sometimes it’s a communal phone that one richer person owns and sells time on.

  20. joanneleon says:

    @OregonPrivacy: Agree with your assessment that it would make most sense to keep all the records in raw (or massaged) form and query them as needed to construct a dossier or whatever. But I can think of a few reasons to construct an actual corporate store data base and perhaps other similar data bases like it:
    1. Efficiency of queries
    2. A way to limit what various organizations are allowed to see
    3. Technical limitations on storage of raw communications, as in, might only be able to store a year’s worth of communications on everyone whereas you could store more on a subset of people in corp store

    It’s easy to see how both the raw communications data bases and the corporate store would be growing at rapid rates, hence the need for the Utah center

  21. emptywheel says:

    @joanneleon: There’s a five-year limit on the dragnet–the collection store. So if, say, OregonP’s calls were caught up in a query in 2012, then all her (?) call records from that point back to 2007 would be included in the query. If that identifier was just queried that once, then that’s all that would get in the corporate store, but if it kept being queried then all her calls would continue to get moved over.
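
    The five-year window can be sketched roughly as below. This is a toy illustration only: the record layout, the `query` helper and the sample calls are hypothetical, not anything known about NSA's actual systems.

```python
from datetime import date

# Hypothetical call record: (caller, callee, call_date). Not a real schema.
CALLS = [
    ("A", "B", date(2006, 5, 1)),   # older than five years at the 2012 query
    ("A", "C", date(2009, 3, 15)),
    ("D", "A", date(2011, 11, 2)),
    ("A", "E", date(2013, 1, 10)),  # happens after the 2012 query
]

RETENTION_YEARS = 5  # the five-year limit on the collection store


def query(identifier, query_date, calls=CALLS):
    """Return records touching `identifier` inside the retention window.

    Note: date.replace() would fail for a 29 February query date; that
    edge case is ignored in this sketch.
    """
    cutoff = query_date.replace(year=query_date.year - RETENTION_YEARS)
    return [
        rec for rec in calls
        if identifier in rec[:2] and cutoff <= rec[2] <= query_date
    ]


corporate_store = set()

# A single query in 2012 moves only the 2007-2012 records across.
corporate_store.update(query("A", date(2012, 6, 1)))
first_pass = len(corporate_store)

# A repeat query in 2013 picks up the newer call as well.
corporate_store.update(query("A", date(2013, 6, 1)))
second_pass = len(corporate_store)
```

    In this toy data the store holds two records after the single 2012 query and three after the repeat query, which is the "kept being queried" case.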

  22. LeMoyne says:

    PCLOB estimates – up to 120 million phone users (the actual number of people would be smaller because of…

    The PCLOB guesstimates over 120 million people, not “up to”.

    And the PCLOB estimate is flawed in several ways:
    1) The use of constant values for new seeds per hop is a poor assumption in two ways: A) local duplication [I’m in yours, you’re in mine, and so are our mutual friends] and B) ultimate saturation [can’t get any new ones when all are gotten already]:
    — 1A) The sub-networks of contacts that each of us have contain considerable overlap in the first and second hops and are fully exploited in those hops. Second and third hops may enter and exploit other sub-networks (per hop production remains high), yet all later hop production is subject to reduction by duplication where part of this hop’s result is nullified because it is already in the previous result (per hop production is reduced).
    — 1B) As the database begins to fill – as the number of previous entries goes past 1/100, 1/10 … 1/2 of the total of all possible phones/people, the number of new entries per hop must eventually collapse towards zero. Once the database is near full, nearly every potential new contact is already in there. For example, if the corporate store contained half of all people, taking one hop off of the corporate store with a per hop production of 1 per person would capture all remaining people, then and there, in that first hop.

    2) The 120 million comes from just the 2012 production out of 300 seeds. There were over 130 times as many seeds in 2008+2009 (44,000 seeds). Estimation by the same method from the 2008+2009+2012 seeds produces values over 10 billion. So the method almost certainly creates values that are too high, but using < 1% of the seeds makes the estimate way too low at the same time.

    By pretending there is no local duplication,
    By seeing large results and not taking global saturation into account, and
    Especially, by not noticing that their estimate is both too large and too small at the same time,
    The PCLOB has produced a low-end estimate for the size of the ‘corporate store’ that is unreliable.
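
    A rough sketch of the saturation point, under loudly stated assumptions: `contacts_per_person=75` and a population of 320 million are illustrative values that are not in this comment, and contacts are drawn uniformly at random from the whole population. Uniform mixing ignores the local clustering in 1A, so this only shows the global saturation effect.

```python
def naive_growth(seeds, contacts_per_person, hops):
    """Constant-production model: every hop multiplies by the same factor."""
    total = frontier = seeds
    for _ in range(hops):
        frontier *= contacts_per_person
        total += frontier
    return total


def saturating_growth(seeds, contacts_per_person, hops, population):
    """Same hops, but contacts are drawn at random from a finite population,
    so draws that land on someone already captured add nothing."""
    total = frontier = float(seeds)
    for _ in range(hops):
        draws = frontier * contacts_per_person
        remaining = population - total
        # Expected number of not-yet-captured people hit at least once.
        new = remaining * (1.0 - (1.0 - 1.0 / population) ** draws)
        total += new
        frontier = new
    return total


POPULATION = 320_000_000  # assumed, for illustration only

small_naive = naive_growth(300, 75, 3)
large_naive = naive_growth(44_000, 75, 3)
large_capped = saturating_growth(44_000, 75, 3, POPULATION)
```

    With these assumed values the naive model gives 128,272,800 for 300 seeds and 18,813,344,000 for 44,000 seeds, while the saturating model can never exceed the assumed population. The exact numbers matter less than the shape: constant per-hop production overshoots once the store holds a large share of everyone.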

    P.S. — I started a reply on the 120 million post — http://www.emptywheel.net/2014/01/23/pclob-estimates-120-million-phone-numbers-in-corporate-store/ — but it grew into a post of its own. Developing a better counter estimate whilst trying to use assumptions that model reality, varying the assumptions made &/or at least being explicit about the assumptions – that has taken some time.

  23. thatvisionthing says:

    @emptywheel: Risen and Poitras actually name a source, kinda — do you know the document?

    A year earlier, an internal briefing paper from the N.S.A. Office of Legal Counsel showed that the agency was allowed to collect and retain raw traffic, which includes both metadata and content, about “U.S. persons” for up to five years online and for an additional 10 years offline for “historical searches.”

    Thanks

  24. OregonPrivacy says:

    @emptywheel: Empty Wheel ? Yes (keeping my biz tweets separate from my other interests.)

    By “queried”, do you mean show up on the hops? For example, in 2012, my phone number shows up on a 2nd hop from a Somali acquaintance. So 5 years’ worth of calls go into the “corporate store” – meaning phone numbers of everybody I called or who called me during the past 5 years go into the “corporate store” also?

    Then, I don’t talk to anybody for the next 3 years (till 2015). At that point, I again get called by a Somali acquaintance who’s 1 hop away from a RAS identifier and so all the phone numbers I called (and phone numbers of people who called me from 2010-2015) go into the corporate store.

    Am I understanding this correctly?

    Meanwhile my phone number and all phone numbers I call or am called by go into the corporate store, and NSA will be running analytics on to collect data, build patterns, or create social networks, etc.

    And “Corporate Store” queries are not audited and the information/databases generated from them are kept indefinitely?

  25. LeMoyne says:

    @emptywheel: Consider the corporate store as the analyst playground. It’s a playground because they need have no more concern for RAS, minimization, etc. If they are letting the corporate store go stale as you describe, then they are not ‘using all the tools’. Furthermore, if they don’t run background processes to keep the interaction info in the corporate store completely current and up-to-today’s-date then they are searching for terrorists in the past. Really they would be remiss – and I’m agin ’em, but I would fully expect they keep the ‘corporate store’ as contemporaneous as possible.

    Consider it this way: Nominally, the NSA doesn’t search in the bulk collection, they search in the corporate store. If that store isn’t current they have 1) missed a chance to get more and 2) they aren’t searching in the present, but in the past. The priority order is accurate per Binney, Drake, etc: 1) get more and 2) do a good job. Whatever the NSA is besides ‘hay hoarders’, they aren’t plonkers. It is not an assumption, but a strong inference that they keep the interaction data in the corporate store current moment-to-moment.

  26. OregonPrivacy says:

    @LeMoyne: LeMoyne, just trying to get a better understanding here. This is what I understand is happening. Let me know if it’s not correct.

    Once something is in the “Corporate Store” it doesn’t go stale because it’s continually updated with information collected from “other sources”. The only thing that might not be up to date is when a new number (and its associated connections) begins calling the identified number that’s in the “corporate store”. But the NSA gets call records daily, so at most they’re 24 hours behind. And in the meantime, they can be using alerts on “other sources” to show them any interaction with high priority numbers.

  27. LeMoyne says:

    @OregonPrivacy: IANEW (i am not emptywheel – lol) – neither am i a spook. I have studied computers off and on for decades. Done some graduate school in CS. So this is *my take*
    My take is you have the first part right. I would say your direct relationships are in there because you talked to a Somali who at some point sent money home through a hawala network which interacted with a Somali who meets RAS (all Somalis?). Because the hawala network is handling money for people who meet RAS, I bet they too meet RAS. One hop to US resident Somali, two hops to you and three to your friends. Can say with some confidence that any Somali hawala network meets RAS standard [Moalin], so, by the suspicious reasoning of the dragnet mentality, if you made a payment into such a network you are either in hop one or potentially under RAS yourself. Can probably get your whole city in there with three hops from you.. so… I’d ask how the water is but we are all in the same soup (I believe). Seems pretty cold to me.

    As to the second part and your question: I would be totally flabbergasted if they don’t keep the corporate store as current as possible. It may be too much traffic to keep it absolutely current, but if they are querying in the relationship tree that brought you in, they will bring at least that area up-to-date to avoid searching in the past. They would need timing of last contact, accurate rate+pattern of contact and type of all interactions to get a current picture of ‘reality’. They need it current to make inferences about the past AND current character and pattern of any relationship.

    My take: The corporate store contains the relationship analysis from multiple selectors: phone, email, snail mail, money transfers, group membership, observation, etcetera. A giant, detailed, multi-faceted relationship diagram of the world. The relationship info must be kept current to be useful for the intended purpose of figuring out what is going on now. The analysts then call up content of the exchanges (words, $$ value and direction, etc.) based on what they see in the relationships laid out in the corporate store. The content is not in the corporate store: they search in the index of relationships that is the corporate store.

    Just think: When the corporate store contains almost every interaction they track, anything they don’t have in there looks to be hiding from them. Once it is nearly full, they can claim Automatic RAS on anyone not in there. I think they got it full of everyone’s activity years ago (at least the activity of all interesting nonUS people). That’s an explanation for why the seed numbers have plummeted: they can’t find many seeds that aren’t already in there. They may only be re-seeding with a selector like group membership (e.g. made payment to any hawala under RAS) in order to tighten the network of RAS by reducing the hops required to get from solid RAS to everyone else in the world.

  28. Frank33 says:

    It appears there are a few ways to avoid the Universal NSA Dragnet. You can become a Bankster who launders drug cartel money. Or you can become a Hedge Fund Manager who invests NSA secret funds in businesses such as Fossil Fuel and For-Profit prisons. Corrupt military contractors will be protected from the NSA if they share the profits with the Secret Government Overlords.

    And the entrepreneurs of the NSA certainly get a piece of the action from all these profitable and secret arrangements.

  29. LeMoyne says:

    @LeMoyne: edited the above a couple times for clarity. Rework of conclusion:

    When the corporate store contains most instances (>50%) of the interactions that they track, I think you end up with an even greater percentage of the people in it. At some point, any action, person or group that they don’t have in the corporate store appears to be hiding from them. Trying to avoid the dragnet is almost certainly an RAS from the dragnet’s POV… so… once it begins to approach full: ZZZOOOOP!! it’s as full as they can make it.

    I think they got it full of most everyone’s activity years ago (at least the activity of all ‘interesting’ nonUS people). That’s an explanation for why the seed numbers have plummeted: they can’t find many seeds that aren’t already in there.

    However, they are likely re-seeding the corporate store with a compound selector. This would be some metagroup membership like made payment to any hawala [where all hawala are under RAS]. This re-seeding will shorten the distance from RAS to swathes of people i.e. it will reduce the hops necessary to reach US along one or more selectors.

    Perhaps for an even shorter path they tell themselves: all EFTs provide basis for RAS. Then they reseed w/selector = ever made an EFT without regard to time at all. Whatever they use, and remember they can use it all: Any node on any selector axis that has left over hops can then be used to rebuild the index from the bulk storage and expand the relationship diagram for that selector out to max hops.

    Of course, they are not keeping dossiers, but we have some hints about how they are constructing personas inferred by human and computer analysis. At a persona constructed from multi-selector analysis that has an extra hop for some selector, their maintenance daemons could pursue all selectors with that same extra hop. MOAR!!!

    Once the corporate store is full the only need for hopping is new surveillance capability or other discoveries of new networks. Essentially every new instance of any selector (new phone, new acct…) will interact with a pre-existing selector already in the full store, and thereby go in with the ongoing concurrency update. The only specific change proposed by the administration is to reduce max hops from three to two. Two is the minimum that could be called investigation. The proposed change from 3 to 2 could be a much more dignified equivalent to Charlie Sheen’s version of winning.
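
    To make the three-versus-two difference concrete, here is a minimal breadth-first sketch. The contact graph and the `within_hops` helper are hypothetical, chosen to mirror the chain described earlier in the thread (seed, then a US resident, then you, then your friends); it says nothing about how the real contact chaining is implemented.

```python
from collections import deque


def within_hops(graph, seed, max_hops):
    """Breadth-first search: every identifier reachable in <= max_hops,
    mapped to the hop count at which it was first reached."""
    seen = {seed: 0}
    queue = deque([seed])
    while queue:
        node = queue.popleft()
        if seen[node] == max_hops:
            continue  # do not expand past the hop limit
        for neighbour in graph.get(node, ()):
            if neighbour not in seen:
                seen[neighbour] = seen[node] + 1
                queue.append(neighbour)
    return seen


# Hypothetical contact graph (undirected, written out in both directions).
GRAPH = {
    "seed": ["resident"],
    "resident": ["seed", "you"],
    "you": ["resident", "friend1", "friend2"],
    "friend1": ["you"],
    "friend2": ["you"],
}

two_hops = within_hops(GRAPH, "seed", 2)
three_hops = within_hops(GRAPH, "seed", 3)
```

    In this toy graph two hops stop at "you", while three hops also pull in both friends. Note the sketch only covers a single query from an empty store; once a store is already nearly full, the hop limit matters far less, which is the point made above.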

  30. greengiant says:

    Corporate store, a google of “etymology “corporate store” NSA” reveals
    The Information Society: Evolving Landscapes 1990 contains the phrase “the contribute to it, hence creating a collective, and corporate, store of knowledge”

    Store probably refers to a “data store” or data storage as opposed to the NSA store of hacks at compromising computers, routers, servers, modems, etc, and as opposed to say Walmart. I don’t know the original usage, but store was an assembly language term for copying to memory.

    Corporate could be a form of corporal or physically existing in some fashion as opposed to an adjective denoting some form of ownership or source. Which leads one to think about why such an adjective is needed. What incorporate stores of data does the NSA have? Thus corporate could mean readily online, on a disk for example, as opposed to a raw or temporary or longer term or archaic data store.

  31. C says:

    @joanneleon:

    I just don’t see how Keith Alexander could deny that those are dossiers. The corporate store is the very definition of a giant file cabinet full of dossiers.

    He may be defining a “dossier” as a specific report compiled on a person, i.e. pulling their needle out of the haystack. This is of course an entirely disingenuous fiction but it would be consistent with his other claims. In his mind and the mind of his lapdogs the huge piles of data don’t exist or matter until they look at them they just exist to give them, not you, “peace of mind.”

  32. guest says:

    corporate, as in “embodied” or “embodiment”, something that is created in physical form or as a legal entity under statute, and continues because it is maintained as such.

  33. guest says:

    I keep reading these 5 year limits and 10 years limits, but are there really any meaningful limits regarding how long this info is kept? In Snowden’s quote, I took it to mean just a for example timeframe, or maybe the timeframe from the time they were first able to store this much data. Just like the Staypuft Marshmallow man from Ghostbusters, I think that imaginary “permanent record” paranoia we grew up with in the 60’s and 70’s and learned to joke about in the 80’s has come to life to destroy us.
    Anyway, I don’t think Snowden would call it a permanent record if it only had a 5 or 10 year shelf-life.

  34. C says:

    @guest: While I won’t speak for Snowden I think that the answer to the how long question is variable. Storing yottabytes of data is expensive and storing everything forever would mean storing an exponentially growing store of data most of which would (a) never be accessed again; and (b) be so old as to be useless for anything but blackmail when it finally is accessed.

    So I suspect that the term “permanent record” means something they want to keep indefinitely while the 5 to 10 year limits are just practical guidelines for when most of the stuff (i.e. anything that hasn’t been accessed by an investigator or is flagged for later) is supposed to be tossed.

    But as we have learned, the NSA doesn’t even obey laws let alone guidelines so there is no reason to believe that anything is being tossed and the size and scope of the Bluffdale center makes it clear that the hoarding complex knows no limits. So ultimately the word “permanent” may be a practical goal.

  35. dnaDatabasing says:

    In response to some twittered speculation, I looked into whether NSA would have your DNA profile in the ‘corporate store’. I think it is much more likely that this would be available only as a federated query of CODIS/NDIS.

    The US uses 13 loci in DNA profiling, all tandem repeats of 4-5 nucleotides of variable length which I’ve listed below. For example, the first occurs as 24 possible alleles of AGAT repeats. You can find the rest of them listed at http://www.cstl.nist.gov/strbase/str_fact.htm

    CSF1PO, D3S1358, D5S818, D7S820, D8S1179, D13S317, D16S539, D18S51, D21S11, FGA, TH01, TPOX, vWA

    They would also have a read on amelogenin for XY sex determination, plus the hypervariable mitochondrial control region and multiple chrY loci for your tribe/ethnicity/ancestry.

    It would be feasible to store this as an n-tuple and do pattern matching of an unknown by simple vector subtraction. Alternately they could concatenate and take a few microseconds to do a BLAST search (which would be insensitive to incomplete records and minor sequencing error). However they use common commercial software instead. Thus this does not fit in at all with NSA databasing methods for other selectors.
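
    The n-tuple idea could look something like the sketch below. Everything here is an assumption for illustration: the four-locus subset, the allele values and the helper names are made up, and this is not how CODIS or any commercial package actually does matching.

```python
# Hypothetical profile: one (allele_a, allele_b) pair of repeat counts per
# locus, in a fixed locus order. None marks a locus that was not typed.
LOCI = ("CSF1PO", "D3S1358", "D5S818", "FGA")  # small subset, for brevity


def flatten(profile):
    """Turn per-locus pairs into one flat tuple, sorting each pair so the
    order in which the two alleles were recorded does not matter."""
    flat = []
    for pair in profile:
        flat.extend(sorted(pair) if pair is not None else (None, None))
    return tuple(flat)


def difference(profile_a, profile_b):
    """Element-wise subtraction over positions typed in both profiles."""
    return [
        x - y
        for x, y in zip(flatten(profile_a), flatten(profile_b))
        if x is not None and y is not None
    ]


def matches(profile_a, profile_b):
    """A match is an all-zero difference vector with at least one locus
    compared; untyped loci are simply skipped."""
    diff = difference(profile_a, profile_b)
    return bool(diff) and all(d == 0 for d in diff)


reference = [(10, 12), (15, 17), (11, 11), (21, 24)]
partial_unknown = [(12, 10), (15, 17), None, (21, 24)]  # one locus missing
someone_else = [(10, 11), (15, 17), (11, 11), (21, 24)]

same_person = matches(reference, partial_unknown)
different_person = matches(reference, someone_else)
```

    Skipping untyped loci gives some tolerance for incomplete records, but unlike the sequence search mentioned above, exact subtraction has no tolerance for a single miscalled allele.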

    The UK also maintains a database with many of these same loci. In the case of Gareth Williams — the GCHQ cryptographer who died under bizarre circumstances — his DNA was supposedly neither on the pull-tag of the locked North Face bag nor on the edge of the tub. It was further claimed the SIMs and indeed the entire apartment were wiped clean by the perps — which is utterly impossible with DNA. We learn from this the UK had no interest whatsoever in identifying the responsible party, probably because they knew full well the trail would lead to MI5.

    Pretty decent exposition by the FBI of how the DNA seq is determined, handled, and searched:

    http://www.fbi.gov/about-us/lab/biometric-analysis/codis/codis-and-ndis-fact-sheet

  36. thatvisionthing says:

    What are all those scanners and photo guns and what have you in the camera/scope gauntlet you have to drive through at border patrol check stations, even if you’re not leaving or entering the country? I think the ACLU calls it the Constitution-free zone or something. I never voted for it and now I’m wondering what the hell are they getting and doing with it? I asked the officer once if I was being x-rayed and he said he couldn’t tell me. Yes, I live in DiFi state, and no, I didn’t vote for her either.

  37. thatvisionthing says:

    @thatvisionthing: Because now I’m thinking of Jacob Appelbaum at 30C3 going through doodads in the NSA spy catalog:

    http://www.nakedcapitalism.com/2014/01/jacob-appelbaum-30c3-protect-infect-militarization-internet-transcript.html

    Okay, who here is not surprised?

    I’m going to blow your fucking mind.

    [laughter]

    Okay. We all know about TEMPEST, right?

    Where the NSA pulls data out of your computer, irradiate stuff and then grab it, right? Everybody who raised their hand and said they’re not surprised, you already knew about TEMPEST, right? Right? Okay. Well, what if I told you that the NSA had a specialized technology for beaming energy into you and to the computer systems around you, would you believe that that was real or would that be paranoid speculation of a crazy person?

    [laughter]

    Anybody? You cynical guys holding up your hand saying that you’re not surprised by anything, raise your hand if you would be unsurprised by that.

    [laughter]

    Good. And it’s not the same number. It’s significantly lower. It’s one person. Great.

    Here’s what they do with those types of things. That exists, by the way. When I told Julian Assange about this, he said, “Hmm. I bet the people who were around Hugo Chavez are going to wonder what caused his cancer.” And I said, “You know, I hadn’t considered that. But you know, I haven’t found any data about human safety about these tools. Has the NSA performed tests where they actually show that radiating people with 1 kilowatt of RF energy at short range is safe?”

    [laughter]

    My God!

    No, you guys think I’m joking, right? Well, yeah, here it is.

    This is a continuous wave generator, a continuous wave radar unit. You can detect its use because its use is between 1 and 2 GHz and its bandwidth is up to 45 MHz, user adjustable, 2 watts using an internal amplifier, external amplifier makes it possible to go up to 1 kilowatt.

    Just going to let you take that in for a moment. [clears throat] Who’s crazy now?

    [laughter]

    Now, I’m being told I only have one minute, so I’m going to have to go a little bit quicker. I’m sorry.

    Here’s why they do it. This is an implant called RAGEMASTER.

    It’s part of the ANGRYNEIGHBOR family of tools,

    [laughter]

    where they have a small device that they put in line with the cable in your monitor and then they use this radar system to bounce a signal – this is not unlike the Great Seal bug that [Leon] Theremin designed for the KGB. So it’s good to know we’ve finally caught up with the KGB, but now with computers. They send the microwave transmission, the continuous wave, it reflects off of this chip and then they use this device to see your monitor.

    Yep. So there’s the full life cycle. First they radiate you, then you die from cancer, then you… win?

  38. thatvisionthing says:

    @thatvisionthing: And speaking of…

    I have another question. Like, NSA, FBI, CIA… GCHQ, Five Eyes… how many peas, how many walnut shells, how many secret authorizations and goals?

    I’ve been wondering, ever since I left an NSA comment in a drone thread: http://www.emptywheel.net/2014/01/09/drones-and-double-agents-hassan-ghul/#comment-664067 — Why are person-hunting and “hand held finishing tools” like TAWDRYYARD and WATERWITCH etc in the NSA spy catalog? They were the gizmos Jacob Appelbaum went on to talk about next and he seemed to think they could be tools of assassination. (Am I misunderstanding?)

    Like, the FBI had COINTELPRO… but LBJ started a CIA domestic spy program Operation CHAOS targeted at antiwar movement. You can look at and fix the FBI all you like, but that wouldn’t stop the CIA. Unless maybe they spring from the same executive well, but who can fix that, now that checks don’t check and balances don’t balance? This all seems to be in the Constitution-free zone.

    Marcy has wondered in the past if there’s some sort of OLC memo authorizing killing Fred Hampton, and I keep wondering who gave the orders for Kent State… and now I get to wonder wtf NSA and your beacon targeting devices and “hand held finishing tools”? Like, what? NSA? What? What?

    I know I’m supposed to be feeling all nationally secure and happy and proud, but I’m kinda missing it.

  39. Fixated says:

    Those TAO catalog items called Find/Fix/Finish are tactical devices to find a cell-phone signal of a selector, get a geolocational fix on it (last mile), and then finish (kill) the person holding it, either by drone or team assault.

    I recall the main use was in Iraq, promoted by Alexander himself. It seems that by the time of Afghanistan, the opposite side would have figured out to take out the batteries when not in use or (iPhone) put their cell phones in Faraday cages when not in use.

    However that catalog also describes 3-4 passive radar retro-reflector implants in USB sticks and cables and hand-held devices that illuminate them at a distance for geolocation. So there’s no way to turn these things off. It goes without saying that Afghanistan doesn’t manufacture any of its electronic devices — they’re all imported and subject to tampering and substitution.

  40. Pajarito says:

    @thatvisionthing: And what about vehicle license plate scanners? In wide use nationally: http://www.usatoday.com/story/money/cars/2013/07/17/license-plate-scanners-aclu-privacy/2524939/
    Data on where you were, when. Likely NSA just scoops it up in intercepted signals, say when the police car downloads to a server using wireless. Where I am, the city recently changed policy to store those data “only 2 years.” You can bet that is part of the Corporate Store.

  41. greengiant says:

    @dnaDatabasing: There is obviously a larger DNA program being executed by the government than what the FBI and local law enforcement are going on about. There was already a paper done on using more STRs, short tandem repeats, to predict the surname from an unknown y-DNA.
    This would be useful for unidentified terrorists such as the Boston bombers as well as other persons of interest.
    I’m not sure about what they would do for women.

  42. bmaz says:

    @Fixated: Hi there “Fixated”. My name is bmaz, and I see you relentlessly hitting our blog through a revolving set of sock puppet user names.

    STOP THIS ACTIVITY IMMEDIATELY. Pick one name and stick to it, or be banned. It is that simple.

    And, by the way, the ONLY reason I approved this comment was to have a forum to warn you.

  43. thatvisionthing says:

    @Fixated: NSA? “Promoted by Alexander himself?”

    I hadn’t thought of NSA as military, as assassinations being part of its functions, so much a part that it’s just another item in a 5-year-old catalog.

    Does he talk about this at defcon or black hat or anywhere? This is news to me. Do Wyden and Udall know and oversee?

    I’m sorry for the identity problem and appreciate your post.

  44. thatvisionthing says:

    @bmaz: bmaz, if you’re still monitoring this thread. Fixated’s comment made me think about a Jim Garrison quote that bloodypitchfork posted recently regarding Kennedy’s assassination and what this nation was becoming or had become. I read it late but greatly appreciated it.

    When I was looking for the Garrison comment, two possibilities came up. The correct link is in Marcy’s post on Obama’s annotated speech. But the other one was your diary on the 50th anniversary of JFK’s assassination, which I had missed at the time. (The post and comments are great, too, btw.) Here’s my question to you:

    – Garrison on national security: http://www.emptywheel.net/2014/01/17/first-impressions-obamas-speech/#comment-665261
    – You on Garrison: http://www.emptywheel.net/2013/11/22/50-years-that-day-jfk-and-today/#comment-656091

    What is your opinion of the Garrison quote, which seems remarkably similar to me to your conclusion in your JFK post?

    Also, re Fixated, I’m glad you let F’s comment post. I have sympathy with your efforts to maintain a trustworthy site, and I have sympathy with any commenter who may not trust premises. It’s all there.

    Thanks.
