Knowledge Deficit: The Battle for Open Data

Data is more important than ever to businesses, citizens and policymakers, but the government's about to shut down a major data source.

In 2008, American fishermen caught 7.25 trillion pounds of fish. Of that haul, 493 million pounds consisted of flounder, which I can recommend prepared in a cream sauce.

Fish statistics are not data you need every day, but if you have a business interest in the fish world, are concerned about over-fishing, or simply want a peek at the machinations of commerce that surround us, access to that data is important. But now the government is set to stop publishing a massive amount of publicly collected data, and time is running out for you and the businesses and researchers who rely on the information to do anything about it.

Those numbers about fishing came from the National Marine Fisheries Service, but I found them in the Statistical Abstract, a publication of the Census Bureau that collects and indexes all the various data sets the government produces. Bureaucracy, after all, is largely an exercise in information management, and the government collects more information, on more topics, than you can imagine. Seriously, take a look.

But budget cuts are putting your access to that information in peril. The Obama administration and Congressional Republicans are moving to eliminate the Census Bureau’s Statistical Compendia Branch, which has collected data from around the government and organized it for public consumption since 1878.

Their justification is the need to save $2.9 million, equivalent to a federal budget rounding error, and the suggestion that some (though not all) of that information can be found through other sources. Perhaps the most astonishing argument for the decision: Sales of the hard-copy version of the abstract have been declining, a trend that may have something to do with the fact that you can download the whole book for free at the Bureau’s website.

The decision has prompted public outcry from the right (Washington Post economics columnist Robert Samuelson), the left (New York Times economics columnist Paul Krugman) and perhaps most actively, America’s librarians, particularly research librarians, who are lobbying against the cut.

“They don’t understand the folly of these small cuts that aren’t going to really accomplish anything as far as the budget goes, but will have a real-life negative impact in just those sectors the administration and the Congress say they care about: Education and small business,” Emily Sheketoff, the executive director of the American Library Association’s Washington office, says.

Small businesses might not be able to afford professional research, but they can use the Statistical Abstract to find data to aid their work. Knowing how many people travel to a national park each year, and when, could be useful to someone trying to start a restaurant, for example.

It’s also possible for businesses to use government data as a key part of their product. If you’ve used an app on your phone to track a public bus or train and time your arrival accordingly, you’ve benefited from publicly accessible government data and a programmer who knows how to leverage it.

But tools like that need data prepared to certain standards that allow software to manipulate and analyze it. In 2009, the Obama administration was bullish on offering data that way, hiring the government’s first-ever chief information officer and creating a suite of websites to present the information.

Developers made apps to track earthquakes, the quality of hospital and nursing care, the reliability of different airports and flights, even when to plant your garden depending on the frost patterns for your region. And that’s just the outward-facing government data—the Obama administration also released a slew of data on government activities as a transparency measure, especially on spending connected to the stimulus.

But the budget squeeze came down on these initiatives, too, and the e-government budget will be cut dramatically next year after it briefly appeared the entire project would be shut down. Those cutbacks and the elimination of the Statistical Abstract are a worrying trend: just as it's getting easier for government to share more and more useful information with citizens, policymakers are moving in the opposite direction.

That condition is fundamentally worrying in a democratic state—public information isn’t public unless you can actually access it, and collecting and publicizing data is one of the most basic functions of government.

Equally important, one key economic trend right now is using new technologies to leverage data. The fastest growing companies in the world are putting data at the center of their business models: Groupon uses real-time data about unused capacity at businesses to offer lower prices and increase demand; Google and Facebook use data about you and your passions to target advertising; the tech buzzword of the moment is “cloud” storage, having access to your data wherever you are.

In a world where that kind of business model raises hopes about new industries and concerns about privacy, taking the public sector out of the game would represent a major missed opportunity to provide huge amounts of new data while helping set appropriate standards for how it is collected and used.

Even if you don’t care to delve into public spending data, or you find fishery statistics a waste of your time, you’re paying for this information, and you should care about whether it’s available for the entrepreneurs and researchers in this country who need those numbers to make things better. As the song goes, you don’t know what you’ve got till it's gone, and when the Statistical Abstract is gone, you won’t know, period.

What Can You Do? The House of Representatives has already passed a budget that eliminates the abstract, and the Obama administration had proposed the cut in its annual budget. Reversing the decision before the final spending decisions are made by October 1, the start of the next fiscal year, will require action in the Senate.

All is not lost, as supporters of free information are doing their best to publicize the issue and pressure policymakers to reverse the decision.

The relevant senators are Barbara Mikulski and Kay Bailey Hutchison, the top Democrat and Republican, respectively, on the subcommittee that will decide on the funding for the Census Bureau. You can bother them on Twitter at @SenatorBarb and @kaybaileyhutch; ask them to explain why information shouldn’t be public, and what they intend to do about it. You can also tweet at @ciodotgov, the official Twitter account of IT managers in the federal government, or @uscensusbureau.

For more traditional political outreach, the American Library Association has set up a website to make it easier to tell your senator you care about this issue. And the Sunlight Foundation, a leader on transparency issues, has a broader campaign around information access, Save the Data.
