By Philippe Duchesne on Feb 03, 2015
Now that large amounts of open data are becoming available, along with efficient visualization tools for their respective types, one of the next challenges is to make sense of these data in the scope of particular domains and use cases. Be it enriching a breaking news video with relevant graphs, contextualizing a budget report with related public policy excerpts, or bringing city statistics to life with localized pictures, it’s all about finding the right datasets that bring sense to each other. A fair part of making that sense lies in the ability to discover the right data, deconstruct it and tie the fragments together in mosaics that carry more information than the sum of their elements.
On the path to data valorization, the first step is discoverability of data. While cataloguing tools and open formats are now becoming mainstream (cf. CKAN and its numerous public deployments), usage of open metadata standards is still lagging behind. This is sometimes due to proprietary metadata structures that prevent cross-domain discoverability, but more often because datasets lack proper metadata altogether. While the former is being solved by the emerging use of standardized vocabularies (DCAT, INSPIRE and Schema.org, to name a few), the latter is mostly a matter of raising awareness, among all data publishing bodies, that metadata is just as important as data.
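To make the idea concrete, here is a minimal sketch of what a standardized, machine-readable dataset description could look like, using DCAT terms expressed as a JSON-LD structure. The dataset name, publisher and URLs are invented for illustration; a real catalogue would carry many more properties.

```python
import json

# A minimal DCAT-style dataset description as JSON-LD.
# All titles and URLs below are hypothetical examples.
dataset = {
    "@context": {
        "dcat": "http://www.w3.org/ns/dcat#",
        "dct": "http://purl.org/dc/terms/",
    },
    "@type": "dcat:Dataset",
    "dct:title": "City budget 2015",
    "dct:publisher": "Example City Council",
    "dcat:distribution": {
        "@type": "dcat:Distribution",
        "dcat:mediaType": "text/csv",
        "dcat:downloadURL": "http://data.example.org/budget-2015.csv",
    },
}

print(json.dumps(dataset, indent=2))
```

Even this small record is enough for a harvester to answer cross-domain questions such as "which publishers offer CSV distributions?", which is exactly what proprietary metadata structures make impossible.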
The next step in data reuse is the ability to transform data to match the tools and frameworks where data is to be used. Having data in an open format is good, but there often exist multiple potential open formats for the same dataset, and each context of use comes with a set of tools that may support only some of them. CSVs may need to be turned into KML, or XML into JSON. This is where on-the-fly data transformation tools such as The Data Tank come into play, easing data processing by removing format friction.
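The kind of format bridging such tools automate can be sketched in a few lines: the toy function below turns a CSV payload into JSON, the sample data being invented for illustration.

```python
import csv
import io
import json

# Hypothetical CSV payload, as it might be downloaded from a catalogue.
CITIES_CSV = """city,population
Namur,110000
Ghent,250000
"""

def csv_to_json(text):
    """Parse CSV text and return it as a JSON array of objects,
    one object per row, keyed by the header fields."""
    rows = list(csv.DictReader(io.StringIO(text)))
    return json.dumps(rows)

print(csv_to_json(CITIES_CSV))
```

Doing this on the fly, server-side, means every consumer can request the representation its own toolchain understands, instead of each of them re-implementing the conversion.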
Lastly, real added value can be created by going below the surface of the datasets, i.e. by no longer consuming datasets as unsplittable entities, but rather chunking them, taking the relevant parts for the subject at stake, and stitching the fragments into meaningful data mosaics. Some standards exist or are emerging to tackle that problem, like URI Fragments, Open Annotations, and the whole Linked Data toolbox, but a complete stack for the authoring and publication of such mosaics is still to be produced. Once achieved, such an environment would allow anyone to easily deconstruct datasets, build contextualized data mashups and exchange them as documents on their own, while relying directly on the original, remote data sources.
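As a rough sketch of such chunking, the function below applies a simplified row-range selector, loosely inspired by the `row=` syntax of URI fragments for CSV, to extract only the relevant slice of a dataset. The data, the selector convention (1-based rows, header excluded) and the function itself are illustrative assumptions, not an existing API.

```python
import csv
import io

# Hypothetical dataset: a small budget time series.
BUDGET_CSV = """year,budget
2013,100
2014,120
2015,150
"""

def select_rows(text, fragment):
    """Return the header plus the body rows named by a 'row=M-N'
    selector (1-based, counted after the header, inclusive).
    A simplified take on CSV fragment identifiers."""
    rows = list(csv.reader(io.StringIO(text)))
    header, body = rows[0], rows[1:]
    start, end = (int(x) for x in fragment.split("=")[1].split("-"))
    return [header] + body[start - 1:end]

# Keep only the 2014-2015 rows for a contextualized mosaic.
print(select_rows(BUDGET_CSV, "row=2-3"))
```

A mosaic document would then record only the source URL and the fragment selector, resolving them against the original, remote dataset at display time.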
Curious to find out more? Come to the Open Data Tools and Standards session at 13.30 in the Auditorium Félicien Rops, where we will discuss this further.