The main problem with these formats is that they are designed to be human-readable, which makes them harder for computers to parse. Documents in .pdf and .docx format often carry a great deal of formatting. Margins, logos, letterheads, headers and footers may be great on paper, but they get in the way of data analysis.
The bottom line is: whatever data is in the catalogue, it needs to be consumable by a computer. Any format that wraps the data in presentation (think of a PDF with a letterhead on top) is going to be less useful. While a PDF is fine for someone just trying to learn one or two facts, it’s very inconvenient for developers trying to mash datasets together.
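To make "mashing datasets together" concrete, here is a minimal sketch of what a developer can do once data arrives as plain CSV. The two datasets, their column names, and all the figures are invented for illustration; they are not taken from any real catalogue.

```python
import csv
import io

# Hypothetical sample data standing in for two published datasets.
PARKS_CSV = """neighbourhood,park_count
Eastside,4
Westside,6
"""

POPULATION_CSV = """neighbourhood,population
Eastside,2500
Westside,6200
"""


def load_by_key(csv_text, key):
    """Parse CSV text and index each row by the value in the `key` column."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row[key]: row for row in reader}


def parks_per_thousand(parks_text, population_text):
    """Join the two datasets on neighbourhood; return parks per 1,000 residents."""
    parks = load_by_key(parks_text, "neighbourhood")
    population = load_by_key(population_text, "neighbourhood")
    result = {}
    for name, park_row in parks.items():
        if name in population:  # keep only neighbourhoods present in both files
            residents = int(population[name]["population"])
            result[name] = round(int(park_row["park_count"]) * 1000 / residents, 2)
    return result


RATES = parks_per_thousand(PARKS_CSV, POPULATION_CSV)
print(RATES)
```

The join takes a few lines because every value sits in a named column. Getting the same numbers out of a PDF would first mean stripping away the letterhead, page breaks and layout by hand or with a fragile extraction tool.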
How do we create a solid Open Data Catalogue?
Start small. Release a handful of datasets, and work with the development community to choose which ones. Developers are the ones who will connect your data with the public at large, by building databases, apps, and websites that take advantage of what you’re releasing.
Use analytics to measure which datasets are popular, but don’t make the mistake of thinking that low-usage datasets are low-usefulness. You can’t be assured that every person who accesses the data will be getting it directly from you. A developer might download your map of neighbourhood boundary data, then write an API that allows others to access it. Your data catalogue might be the original source of the data, but it’s not necessarily going to be the only one, or even the most popular one. Accept the fact that once the data is out there, it’s going to be used in ways you might not expect.
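As a rough illustration of the simplest kind of analytics, the sketch below tallies downloads per dataset from a list of requested paths. The log contents, the `/datasets/` prefix, and the file names are assumptions made up for this example; a real catalogue would read these from its own web server logs or analytics tool.

```python
from collections import Counter

# Hypothetical download log: one requested path per entry.
SAMPLE_LOG = [
    "/datasets/neighbourhood-boundaries.csv",
    "/datasets/bus-routes.csv",
    "/datasets/neighbourhood-boundaries.csv",
    "/about",
]


def count_downloads(paths, prefix="/datasets/"):
    """Count requests per dataset file, ignoring paths outside `prefix`."""
    counts = Counter()
    for path in paths:
        if path.startswith(prefix):
            counts[path[len(prefix):]] += 1
    return counts


DOWNLOADS = count_downloads(SAMPLE_LOG)
for name, hits in DOWNLOADS.most_common():
    print(name, hits)
```

Keep in mind that a count like this only sees direct downloads. A dataset that is fetched once and then re-served through someone else's API will look far less popular here than it really is.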
To get a better picture of which data is useful, keep engaging with the community. Read their blogs, retweet their tweets, and host hackathons. Feel free to copy liberally from other Open Data Catalogues. If other cities have a good mix of datasets, those datasets might be good candidates for the next phase of your data release.
Be Flexible, Be Bold.
Governments and developers are heading into uncharted territory with Open Data. Nobody knows exactly what the possibilities are, nor where the pitfalls lie. So communication is going to be essential.
To learn more, feel free to attend the Open Discussion About Open Data on May 23.
If you can’t attend, be sure to follow the conversation on Twitter using the hashtag #skdev.
Francis Chary, Mobile Software Engineer
Francis has been working in software development for almost 10 years. In three cities and two continents, he has worked for a multinational engineering company, a small local software shop, and everything in between. His guiding professional principle can be summed up with the words: "It's never the user's fault."