Finding the keys to government data: seminar report
https://voxpublica.no/2010/01/finding-the-keys-to-government-data-seminar-report/
Mon, 25 Jan 2010

The seminar with the optimistic headline “Give us our data” was organised by the Infomedia department at the University of Bergen. The department initiated and funded a fact-finding project on Norwegian government data this past autumn, hoping that the project report and the seminar can help move the topic higher up on the political and business agendas.

A whole catalogue of interesting facts and opinions about government data — that’s what I’m taking away from the seminar (of course, as I helped organise it, this is a completely subjective and biased view!).

Open data as a topic is unusual in that it brings together people with very different roles and backgrounds, from computer scientists via public sector specialists to journalists, business entrepreneurs and innovative civil servants. The presentations and debates at the seminar kept returning to the same questions, but from different angles: Why should more government data be made public? What obstacles are in the way, and how can they be overcome? What can we do with the data?

These are my notes from the seminar presentations, supplemented with slides from speakers. See also other reports and remarks (in Norwegian): Bente Kalsnes’ post on Origobloggen has sparked a lively debate, and a blog comment from Anders Waage Nilsen summarizes the day very efficiently.

Denmark: Demand-driven approach

Cathrine Lippert from Denmark’s National IT and Telecom Agency reported on the agency’s initiatives to improve access to government data. They include a project competition for innovative services (winners to be announced at a conference on February 4), an innovation programme directed towards the private sector, and a data source catalogue on the social platform digitaliser.dk. Also planned is an open data desk that can provide assistance, define guidelines and highlight good practices.

Download presentation (pdf).

The agency tries to advance its agenda by appealing to and bringing together interested groups in both the private and public sector. Lippert said the agency believes it can accomplish more through this demand-driven approach, mobilising the grassroots. A top-down approach is harder, as open data does not carry the same political weight in Denmark as it currently does in Britain and the US.

Britain: Data and innovative journalism at The Guardian

The British newspaper is in the vanguard of using data in journalism. Simon Rogers, editor of the Datablog, explained how The Guardian works toward the “mutualisation of data”. Data is shared with users by publishing the material behind stories on the Google Docs platform — simple and user-friendly. A Flickr group has been set up to collect users’ own visualizations of data.

Increasingly, the role of journalists will be to guide the public through the vast forest of data; to be curators of information, Rogers said.

The Guardian’s “crowdsourcing” of the review of members of parliament’s expenses files is already famous. More than 23,000 users took part in reviewing the files. The editors learned from the experiment that when you ask users for help, you need to define manageable tasks and you should give the users something back for their efforts. When a new batch of data was released, the editors gave more specific tasks and the job was done in one and a half days.

Last week, The Guardian launched its own gateway to public data portals. In the future, they also want to give people visualization tools, Rogers explained.

Hidden data and how to find them

Web developer Harald Groven at the Norwegian Centre for ICT in Education focused his presentation on how vast amounts of highly interesting public sector data are kept under lock and key. In the analogue era, publishing medium- or low-level aggregates of data was practically impossible — there wasn’t enough paper. That constraint no longer applies, but the same practices remain, Groven said. Legal constraints are part of the reason why Statistics Norway and other institutions do not release more fine-grained data.

A Norwegian government data portal should concentrate on making available anonymized, low-level aggregated statistics, data sources that are largely unknown today, Groven recommended. He illustrated the proposition with examples from his own work developing services aimed at giving young people a better basis for deciding what to study. A type of data needed for one of the services, salary levels in different occupations, was difficult to get access to at a sufficiently detailed level.

A news journalist’s perspective: TV 2

Journalists often find that public sector agencies want to control the presentation of data, Gaute Tjemsland of Norwegian TV 2’s news website said in his presentation. When TV 2 wanted the data from the national school tests, the ministry responded by sending PDF documents, before finally caving in and releasing the spreadsheets it had had all along. The reason was explicitly that the ministry didn’t want the media to produce school rankings — i.e. to present the data in their own way.

For journalists, the ideal situation is to get structured data, as detailed as possible, and as fast as possible, Tjemsland commented. He had encountered three main obstacles. Public sector agencies want to retain control over information; they are afraid of losing revenue; or in many cases they are not aware that their data can be valuable to others. The last obstacle is probably the most important, Tjemsland said.

The TV 2 editor proposed benchmarking the openness of government institutions. By defining variables to measure transparency, more pressure can be applied to have government data released. The media need to do their part by demanding information and should take a leading role in the debate about open data.

The need for a data.gov.no

In my own presentation, I emphasized four main findings from the project at the Infomedia department — based on a survey among state agencies, an evaluation of state agency websites and interviews with civil servants at the local and regional level.

First, we found that there is a scarcity of information about what data sources actually exist. Very few agencies provide substantial information about their own datasets. Second, a central datastore, a data.gov, doesn’t exist; therefore we created a simple “store” of our own using a Google spreadsheet. With the help of a small community, around 130 data sources have been registered there so far. Third, our survey and interviews convinced us of the great potential that exists in making more data available. Among other results, six out of ten agencies said they plan to make more data available during the next year. Finally, I highlighted how knowledge of open data issues varies widely across sectors and agencies. This probably reflects the low profile that the topic still has politically and in the public sphere.

In our project report we make ten proposals for making more data available in Norway. In the presentation I emphasized four of them: Create datastores at the state, regional and local levels; define principles and guidelines; give special attention to privacy issues; and define and fund pilot projects to kick-start the process.

Open government data in Norway: project report summary
https://voxpublica.no/2010/01/open-government-data-in-norway-project-report-summary/
Mon, 18 Jan 2010

A project group at the University of Bergen’s Department of Information Science and Media Studies has over the past few months surveyed Norwegian state agencies and interviewed civil servants in different state and local government agencies about their views and policies regarding the release of data sources for re-use. The findings have been published in Norwegian in the project report “Fakta først” (pdf, 14 MB). Here you can read the project report summary in English:

The public sector collects and generates vast amounts of data. In recent years the interest in re-using public, non-personal data has been increasing among citizens, groups and companies outside the public sector. The media, civil society groups, businesses and private citizens can use public data as “raw material” to create new services, new insight and economic value. Efficient re-use of public data requires that public sector agencies provide information about their data sources and make the data available in relevant formats.

This fact-finding project, carried out from August to December 2009, has revealed that practice varies widely between Norwegian public sector agencies in different subject areas and across administrative levels (state/regional/local). Some agencies offer detailed information about their data sources and have made data available for download. However, a large share of the agencies assessed offer insufficient or no information about data sources on the homepages of their websites, so a fundamental requirement for the re-use of data is missing. The impression of varying interest and unused potential is reinforced by the results of a survey among state agencies:

  • Two thirds of respondents say their agency possesses data with potential for re-use that is not utilized today.
  • On the other hand, the survey suggests that the subject of open data is on the agenda in many agencies: more than six out of ten say they plan to make more data available for re-use during the coming year.

The survey shows that increased costs, and the concern that external groups will misunderstand the data and misinform the public, are cited as the two greatest obstacles to making more data available. In addition, interviews with public sector agency employees suggest that the topic of making data available is new to some agencies.

A comparison with initiatives and debates about open public data in a selection of other countries (Britain, Denmark, the Netherlands, the USA) shows that the topic receives the most attention when it is placed on the agenda at the highest political level. The report recommends a number of concrete measures that would be expected to quickly increase the range of datasets made available for re-use. A website that collects public data sources, inspired by the US government’s data.gov, would be an obvious and efficient initiative, especially when accompanied by a set of clear principles and rules and an “instruction manual” that describes how to make data available in a secure and user-friendly way. The report also points out the need for a parallel, ongoing debate about criteria for the constructive re-use of data. The media should, in cooperation with the public, play a leading role by producing examples of best practice in re-using open data.

From Civic Data to Civic Insight
https://voxpublica.no/2009/10/from-civic-data-to-civic-insight/
Thu, 08 Oct 2009

Earlier this year the floodgates opened on government data in the United States with President Obama’s announcement of an unprecedented push toward further transparency in the federal government. But with the rush of new data comes the challenge of making sense of it all — something admittedly still in its formative stages.

By June of 2009 the nation’s Chief Information Officer, Vivek Kundra, had overseen the launch of data.gov with the goal of increasing public access to machine-readable datasets produced by the federal government. Other government sites such as usaspending.gov and recovery.gov have since been launched to provide even more focused data on how the U.S. spends its taxpayers’ dollars.

A promise of increased participation

The promise of data.gov and of many of these other civic data collections is in allowing citizens to participate in the scrutiny of their government and of society at large, opening vast stores of data for examination by anyone with the interest and patience to do so. Open source civic data analysis, if you will.

The array of data available on data.gov is still sparse in some areas but has steadily grown to include everything from residential energy consumption to patent applications to national water quality data.

And the data transparency movement isn’t just federal anymore: states and cities such as California and New York City are following suit with pledges to make more civic data available.

The benefits of the open data movement are also starting to be recognized throughout Europe. The U.K. has called on Sir Tim Berners-Lee, the inventor of the world wide web, to lead a similar government data transparency effort there, which should soon result in a data.gov analogue. And indications of movements are beginning to stir in Germany (link in German).

Beyond Data Scraping

In the U.S., various government data resources have been available online in some form or another for years now. Programmers could scrape these online data sources by writing custom parsers to scan webpages and create their own databases. And many journalist-programmers working in today’s newsrooms still do. But it’s messy, it doesn’t scale or extend well, it’s brittle, and ultimately the data that results may not interoperate well with other data.
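To make that brittleness concrete, here is a minimal sketch of the kind of ad hoc scraper described above. The URL and the assumption that the page carries its figures in a plain HTML table are hypothetical, for illustration only:

```python
# A hypothetical scraping sketch: fetch an agency web page, pull the rows out
# of an HTML table, and dump them to a local CSV file. The URL and the table
# layout are assumptions for illustration; this is not a real agency site.
import csv

import requests
from bs4 import BeautifulSoup

URL = "https://example-agency.gov/statistics/annual-report.html"  # placeholder

html = requests.get(URL, timeout=30).text
soup = BeautifulSoup(html, "html.parser")

rows = []
for tr in soup.select("table tr"):
    cells = [cell.get_text(strip=True) for cell in tr.find_all(["th", "td"])]
    if cells:
        rows.append(cells)

with open("agency_data.csv", "w", newline="", encoding="utf-8") as f:
    csv.writer(f).writerows(rows)
```

Every page layout needs its own parser of this kind, and the parser breaks the moment the page design changes, which is exactly the brittleness referred to above.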

Having government buy-in to the publication of organized and structured data lowers the barriers substantially for developers and others to get involved with analyzing that data. It also means that structured formats, such as those that conform to semantic web standards, can interoperate more easily and be used to build ever more complex applications on top of the data.
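By contrast, consuming a dataset that is published in a structured, machine-readable form typically takes only a few generic lines. A minimal sketch, assuming a hypothetical JSON endpoint and made-up field names (not a real data.gov dataset):

```python
# Consuming a published, machine-readable dataset (JSON here; CSV or RDF work
# similarly). The URL and the field names are hypothetical placeholders.
import requests

DATASET_URL = "https://example.data.gov/api/water-quality.json"  # placeholder

records = requests.get(DATASET_URL, timeout=30).json()

# Because the fields are published with the data, records can be filtered and
# joined with other datasets without writing a custom parser for each source.
exceedances = [
    r for r in records
    if r.get("contaminant_level", 0) > r.get("legal_limit", 0)
]
print(f"{len(exceedances)} of {len(records)} samples exceed the legal limit")
```

The same generic pattern works for any dataset published this way, which is what lowers the barrier for developers.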

Data.gov ≠ Insight.gov

So now that the U.S. government is publishing all kinds of data online, society will be better, right? Well – maybe. Let’s not forget that data has a long way to go before it becomes the information and knowledge that can ultimately feed back into policy.

Some non-governmental organizations are pushing data to become information by running contests with big prizes. For instance, the Apps for America 2 contest, coordinated by Sunlight Labs, awarded a total of $25,000 to the top application submissions that made data.gov data more transparent and accessible for citizens.

These efforts at coordinating developers and stimulating application development around government data are vital, no doubt. The applications that result typically involve polished interfaces and visuals that make it much easier for people to search, browse, and mash up the data.

Take for example the Apps for America 2 winner, DataMasher, which lets users create national heat maps by crossing two datasets (either adding, subtracting, dividing, or multiplying values). These operations, however, can’t show correlation, and at best they can only show outliers. As one anonymous commenter put it:

I don’t get it. It shows violent crime times poverty. So these are either poor, or violent, or both? I don’t think multiplying the two factors is very enlightening.

The upshot is that many of the possible combinations of datasets lead to downright pointless maps that add little, if any, information to a discourse about those datasets.
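To make the commenter’s point concrete, here is a small sketch with made-up numbers. A DataMasher-style arithmetic combination only rescales the inputs, whereas a correlation coefficient is the kind of derived quantity that actually says whether two indicators move together:

```python
# Made-up state-level figures, for illustration only.
violent_crime = [350.0, 420.0, 510.0, 290.0, 610.0]  # incidents per 100,000
poverty_rate = [11.0, 14.5, 17.0, 9.5, 19.0]          # percent of population

# DataMasher-style combination: multiply the two values for each state.
# A high product can mean high crime, high poverty, or both; it says nothing
# about how the two variables relate to each other.
products = [c * p for c, p in zip(violent_crime, poverty_rate)]

# A Pearson correlation across states does address that relationship.
def pearson(xs, ys):
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

print("products:", products)
print("correlation:", round(pearson(violent_crime, poverty_rate), 3))
```

Surfacing derived quantities like this, rather than raw arithmetic mashups, is closer to the kind of sensemaking the rest of this post argues for.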

Data.gov, and indeed many of the applications built around it, somehow fall short of the mark in terms of helping people share and build on the insights of others – to produce information. It’s not simply that we need interfaces to data; we also need ways to collaboratively make sense of that data.

The Minnesota Employment Explorer was an early foray into helping people collaboratively make sense of government data. It not only visualizes employment information but also allows people to ask questions and build on the insights of others looking at the visuals in order to make sense of the data. In the long run it’s these kinds of sensemaking tools that will really unlock the potential of the datasets published by the government.

What’s Next?

With its long tradition of making sense of the complex, the institution of journalism has a unique opportunity to play a leadership role here. Journalists can leverage their experience and expertise with storytelling to provide structured and comprehensive explorations of datasets as well as context for the interpretation of data via these applications. Moreover, journalists can focus the efforts and attention of interested citizens to channel the sensemaking process.

I’ll suggest four explicit ways forward here:

(1) that data-based applications be built with the aim of promoting information and insight rather than simply being database widgets,
(2) that journalists should be leaders (but still collaborators with the public) in this sensemaking enterprise,
(3) that these applications incorporate the ability to aggregate insights around whatever visual interface is being presented, and
(4) that data.gov or other governmental data portals should collect and show trackback links to all applications pulling from their datasets.

And finally, after we all figure out how to make sense of all this great new data, there remains the question of whether government is even “listening” to these applications. Is the federal government prepared to accept or adopt the insights of its constituents’ data analysis into policy?
