On Saturday, coders around the country joined a hackathon organized by DataRefuge and the Environmental Data and Governance Initiative, racing to archive NASA’s climate science data before the country wakes up one day to find it has mysteriously disappeared. This is a legitimate fear, as the Trump administration has begun removing certain information from public access.
The 200 coders met in Doe Library on the UC Berkeley campus, and similar hackathons with the same mission took place in over twenty cities. The groups developed an efficient system in which hackers were split into two roles: the taggers and the baggers. The taggers were responsible for finding and marking the specific sites and data sets that needed to be archived, while the baggers were in charge of writing the code to download all the data into the Internet Archive, a digital library. “The process involves developing web-crawler scripts to trawl the internet, finding federal data and patching it together into coherent data sets,” writes Wired. This task is more difficult than one might imagine because there has been essentially no consistency in the way government data has been presented on public sites over the last thirty years.
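To make the tagger and bagger split concrete, here is a minimal Python sketch of the two roles. It is not the code the hackathon teams wrote. The names `tag_data_links`, `bag_manifest` and `DATA_EXTENSIONS` are hypothetical, the list of file extensions is an assumption, and the page is passed in as a string rather than fetched from a live site.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin
import hashlib

# Assumption: file extensions a tagger might flag as data worth archiving.
DATA_EXTENSIONS = (".csv", ".json", ".nc", ".zip")


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def tag_data_links(page_url, html):
    """Tagger step: return sorted absolute URLs of links that look like data files."""
    collector = LinkCollector()
    collector.feed(html)
    # Resolve relative links against the page they were found on.
    absolute = (urljoin(page_url, href) for href in collector.hrefs)
    return sorted({u for u in absolute if u.lower().endswith(DATA_EXTENSIONS)})


def bag_manifest(files):
    """Bagger step: map each file name to the SHA-256 digest of its bytes."""
    return {name: hashlib.sha256(data).hexdigest() for name, data in files.items()}
```

A tagger would run `tag_data_links` over each page of interest to build a list of targets. A bagger would download those targets and keep the checksum manifest alongside the files, so that anyone can later verify an archived copy has not been altered.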
Nevertheless, when the hackathon ended, the coders had successfully downloaded 8,404 NASA and DOE webpages to the Internet Archive, essentially all of NASA's climate data. They also developed “backdoors” to download 25 gigabytes from 101 public datasets, and were expecting even more to come in as scripts on some of the larger datasets finished running, reports Wired.
But that’s not all the hackers accomplished. Expecting the disappearance of information to be an ongoing problem, the programmers are developing software to track changes to websites, so that the public will know what is being lost and when. Engineers call this version control. Removals have already begun: the Global Data Center's reports and one of NASA's atmospheric carbon dioxide (CO2) data sets are no longer on the web.
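The idea behind such change tracking can be sketched in a few lines of Python: keep a saved copy of a page, compare it with a later copy, and report what differs. This is only an illustration under stated assumptions, not the software the programmers are building. The function names `fingerprint` and `describe_change` are hypothetical, and the two snapshots are assumed to be already in hand as text.

```python
import difflib
import hashlib


def fingerprint(text):
    """Return a SHA-256 digest that changes whenever the text changes."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def describe_change(old_text, new_text):
    """Return None if two snapshots match, otherwise a line-by-line diff."""
    if fingerprint(old_text) == fingerprint(new_text):
        return None
    diff = difflib.unified_diff(
        old_text.splitlines(),
        new_text.splitlines(),
        fromfile="earlier",
        tofile="later",
        lineterm="",
    )
    return "\n".join(diff)
```

In the diff, lines that were removed between snapshots are prefixed with `-` and lines that were added with `+`, which gives a record of what was lost and, if each snapshot is dated, roughly when.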
"Climate change data is just the tip of the iceberg," Eric Kansa, an anthropologist who manages archaeological data archiving for the nonprofit group Open Context, told Wired. "There are a huge number of other data sets being threatened [that are rich] with cultural, historical, sociological information."