Seeing Through the Blizzard to Utah: How Much Space Does Metadata Need

In the blizzard of half-truths, dissembling, and prevarications about the nature of the National Security Agency’s surveillance programs, it’s easy to lose sight of the obvious. In this case, the obvious is about one million square feet in size.

First, a few other large-scale objects for comparison:



Here’s Google’s data center in The Dalles, Oregon; note the size of the cars in proportion to the buildings on this campus. Cars are the best tool for estimating the approximate physical scale of this and the following examples.



This is Apple’s data center in Maiden, North Carolina. Again, compare the automobiles against the building in the photo for scale.



Microsoft has a data center in Dublin, Ireland. It’s a little harder to estimate physical size in this photo. A key difference is the height of the facility, as if the development’s footprint was constrained.



And now Facebook’s Prineville, Oregon data center, which may be somewhat larger than the Apple site.

Unfortunately a good photograph of the NSA’s Bluffdale, Utah data center is unavailable for reposting here without hassle. You can take a gander at this link (original story at this link).

Compare the size of the NSA’s facility with the preceding data centers; keep in mind that the NSA has at least a half dozen other large facilities with sizeable storage capacity, including Fort Meade MD; Oak Ridge TN; Buckley AFB CO; Fort Gordon, Augusta GA; Lackland AFB, San Antonio TX; and Oahu HI, where Edward Snowden likely worked.

There are more facilities, too, referred to only as “satellites” in this article by James Bamford at Wired, as well as domestic and overseas “listening posts.” All of these facilities have serviced existing surveillance programs to date. Data storage has also decreased enormously in size and cost over the last ten years. (MP3 players and digital cameras shrank correspondingly; think of servers as undergoing a similar downsizing in mass over the same period.) This size reduction likely allowed an increased volume of material to be handled at the same sites.

Of course the size of each of the data centers noted here is shaped by the layout of the facility’s infrastructure. But each data center has the same basic requirements: adequate floor space for server racks, environmental conditioning services, and auxiliary power sources. These scale along with the size and contents of the data center.

Nor are the data centers shown here the only ones each business operates. But as an internet user you’ve got a pretty good guess as to the amount of data handled by each of these social media businesses.

Once you’ve compared these photos, ask yourself just how big a facility hosting limited metadata really needs to be.

And remember the use of the word “collection” on the NSA slides the public has seen to date.

17 replies
  1. C says:

    The use of the term “collection” is, I think, quite deliberate. Note, for example, the publicly cited uses of it (Zazi and Mumbai). In both cases the purported involvement appears to have been after the fact. That is, after the events happened they were able to go back and find the related people. And in the 300 uses cited, they talk about the numbers as indexing, meaning that they use it for retrospective search and evaluation.

    This isn’t about advance warning, yet. It is about having a copy of the live stream later so that they can query it at will for anything they are interested in. In other words it really is about supporting future surveillance. As such this data center needs to be infinitely large.

  2. Rayne says:

    @C: The scale of the UT facility is massive, dwarfing these private sector sites. The location is also worth noting.

    I’ll have another post later in the evening about “satellite” sites. The UT site might only be a minor player, which makes the scale even more incredible.

  3. lefty665 says:

    Small correction, that’s Agency, not Administration in the first sentence.

    “Once you’ve compared these photos, ask yourself just how big a facility hosting limited metadata really needs to be.”

    Did I miss something? Was there ever a suggestion that Beef Hollow Rd was intended only for metadata?

    Is it not more likely the master repository for everything (think Hitchhiker’s “G2TG”)? That would allow their developing mining and predictive analysis tools to use it all.

  4. Dead Last says:

    Another big location for data centers is near Quincy, WA. Microsoft is located there.

    One requirement for data centers is available electricity. As the aluminum manufacturers have left the Columbia River area in the northwest for places like Iceland (geothermal), computer companies have moved in due to the low-cost electricity. I assume that the Bluffdale site is desirable because Los Angeles is soon to give up its rights to electricity from the Utah Power Project — a large plant burning Utah’s low-sulfur coal. Utah is also tech-savvy, having produced Novell and WordPerfect. And the workforce is hyper-patriotic, drawing on ex-missionaries who have spent two years abroad honing their language skills.

  5. Saul Tannenbaum says:

    I think it’s a conceptual mistake to think about this as a big repository. It’s certainly that, but it’s likely more.

    The right analogy here is Amazon. The way a modern tech startup works isn’t to start acquiring servers and building out infrastructure; it goes out and rents virtual servers/storage from Amazon. They’re building an internal-to-NSA Amazon, to make it easy to spin up whatever they’re thinking up today.

    You’ve got to believe that there are spook management consultants inside the NSA talking about being more entrepreneurial, decreasing the friction required for someone to test out a new idea, and having the capacity to test beyond-state-of-the-art algorithms.

    Or, to put it another way, the NSA has to be to cloud computing and big data what it once was to encryption: so far ahead it’s hard for the layperson to even understand what they’re doing.

    Or, to put it another way: big-data analytic social science. In the ’60s, pretty much everybody doing quantitative political science was CIA funded. The equivalent, these days, is taking monstrous data sets from, say, Facebook, and predicting whether you’ll buy a product or vote in an election. Or taking large-scale criminal justice datasets and predicting whether you’re going to be a danger if you’re let out on probation. That’s all happening in the open, in the big-data community. What’s the NSA equivalent of that?

  6. omphaloscepsis says:

    “NSA does provide some measure of the computing power at its new data farm in Utah. It requires 65 megawatts of power, enough for 65,000 homes. It also has its own power substation. In fact, Davis of the NSA says, the availability and relatively low cost of power put Utah at the top of the list for the center.

    That much power generates so much heat that the computers will fry without 1.5 million gallons of cooling water a day.”

    “The estimated power of those computing resources in Utah is so massive it requires use of a little-known unit of storage space: the zettabyte, the amount of data that would fill 250 billion DVDs.

    The NSA’s Utah Data Center will be able to handle and process five zettabytes of data, according to William Binney, a former NSA technical director. Binney’s calculation is an estimate. An NSA spokeswoman says the actual data capacity of the center is classified.”

    Caption on one of the photos here says the Utah floor space is 1.5 million square feet, which is over 34 acres.
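
    The back-of-the-envelope figures quoted above are easy to sanity-check. A minimal sketch in Python, assuming a 4.7 GB single-layer DVD and the decimal definition of a zettabyte (10^21 bytes):

    ```python
    # Rough sanity check of the figures quoted in this comment.
    DVD_BYTES = 4.7e9   # single-layer DVD, ~4.7 GB (assumption)
    ZETTABYTE = 1e21    # decimal zettabyte, 10^21 bytes

    dvds_per_zb = ZETTABYTE / DVD_BYTES
    print(f"DVDs per zettabyte: {dvds_per_zb:.2e}")  # ~2.13e11, i.e. ~213 billion
    # (The quoted 250 billion figure implies a smaller ~4 GB per DVD.)

    # Floor space: 1.5 million square feet in acres (1 acre = 43,560 sq ft)
    print(f"Acres: {1.5e6 / 43_560:.1f}")  # ~34.4 acres

    # Power: 65 MW at ~1 kW average draw per home yields the 65,000-homes figure
    print(f"Homes: {65e6 / 1_000:,.0f}")  # 65,000
    ```

    So the quoted numbers hang together, give or take the assumed DVD capacity.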

  7. Rayne says:

    @Bill Michtom: The article from which the photo came refers to “Dalles” not “The Dalles.”

    @lefty665: I’ll tweak for Agency, thanks.

    And that next-to-last line was snark. My point was that the UT facility is far more than required for metadata.

    @Dead Last: I think people miss how remote this facility is, in comparison to other commercial data centers. Power source is a major consideration as is physical stability (i.e. weather and earthquake resistance). But remoteness…well. Make of it what you will.

    @Saul Tannenbaum: This is not a development facility, as it is low-density in terms of personnel. It’s purely cloud services OR a massive redundancy (or both).

  8. Saul Tannenbaum says:

    @Rayne: I know it’s not a development facility. Developers don’t need to be there in order to use it, just like they don’t need to be in an Amazon data center to use Amazon services.

    Let me be clear: I think what you’re doing in terms of physical analysis of the facility is immensely valuable. But, in terms of a conceptual analysis of what the NSA is up to, we need to look beyond simple data storage (though that’s certainly a part of it) to what the cutting edge of data analysis is and then gaze out a bit.

  9. Rayne says:

    @Saul Tannenbaum: The intention of this post is to point out what is lost in the snowstorm of words. While the NSA, the rest of the gov’t, and the subject businesses obscure attempts at clarity, the fact of this monolithic structure, which cannot be obscured, remains; it serves a purpose beyond whatever existing structures have provided over the last decade-plus.

    We’re not done here by a long shot with regard to looking both forward and back. More on that later.

    Different topic: which of you folks was looking at the Boundless Informant heat map?

    After pondering the map more carefully and in light of the data center research I did, I think Brazil was warm on the map in comparison to the coolness of the rest of South America for two key reasons:

    — Location of the largest private sector commercial data centers on the continent (likely Sao Paulo);
    — End points of submarine telecom cables at multiple points in Brazil.

    In contrast, the submarine cables serving the northernmost tier of South American countries terminate in Florida; there’s no need to monitor in SA when it can be done from FL.

  10. lefty665 says:

    @Rayne: Duh! My bad; you caught me in literal mode. That’s a hazard after parsing so many official weasel words. Thanks for another good post, keep ’em coming.

    Wonder what the fiber route is from Meade to Utah and the other data centers? Did they dig their own trenches or do they cohabit with the rest of us? Could that cable in my front yard be part of it?

  11. greengiant says:

    Are those diesel generators for backup power? Fuel cell generators have become more popular for cell towers and are even being used to power big box stores, but they have not captured the whole market yet.

  12. TomVet says:

    @lefty665: Interesting question about the routes. Several years back, when fiber was first gaining momentum, there was a large spurt of fiber-cable-laying activity along the railroads. They go from major hubs in nice straight routes, have ample rights of way and easements alongside, and are generally less disruptive than highway installations. An added bonus is that they are less visible, hence fewer folks wondering “what’s going on over there?”

  13. lefty665 says:

    @TomVet: It was mostly just for fun. Maybe they’re using China Telecom; they’ve got a straight shot from LA to D.C. :)

    Expect you’ve got it right, and the answer is mostly right alongside everything else with someone like AT&T maintaining it.

  14. Morris Minor says:

    @TomVet: This was early-mid ’80s. Southern Pacific Railroad started a subdivision to run fiber through their rights-of-way. They called it Southern Pacific Railroad Isomething Nsomething Tsomething. Or… SPRINT.

  15. TomVet says:

    @Morris Minor: Learn something new every day. Thanks to Acronym Finder that becomes Southern Pacific Railroad Intercontinental Network of Telecommunications. I saw the activity along the old B&O/Chessie/CSX/whateverthehecktheyarenow RR at about that time frame. Thanks.

Comments are closed.