This guide is for IT managers, GIS coordinators, asset managers and database administrators.
The technical process for publishing open data is:
- Establish an open data site
- Identify datasets to publish
- For each dataset:
  - Identify a standard to follow, if possible
  - Export the data, conforming to the standard
  - Create a dataset on data.gov.au, providing metadata and choosing an appropriate licence
  - Upload the file as a resource
- Schedule regular updates
1. Establish an open data site
For simplicity, we recommend that Victorian councils publish open data to data.gov.au, the Australian Government’s open data repository.
All data-publishing Victorian councils other than City of Melbourne use this portal. It is free and provides storage for the data itself. It includes a number of useful services, such as automatically converting between geospatial data formats, and providing web previews of geospatial data.
The Victorian Government’s open data portal, data.vic.gov.au, does not currently (as of late 2015) provide storage, so organisations that use it must provide their own data access infrastructure.
Follow the instructions in the data.gov.au toolkit to gain an account and publish a dataset.
South Australian councils generally publish to data.sa.gov.au, which is also powered by CKAN.
Two alternatives from commercial providers are:
- Socrata, currently used by City of Melbourne and the ACT.
- ArcGIS Open Data, currently used by City of Launceston and City of Hobart. You can combine this with a data.gov.au presence by enabling automatic harvesting (see data.gov.au toolkit).
2. Identify datasets to publish
Start with datasets that are known to be of good quality, are free from privacy or confidentiality issues, are likely to be immediately useful, are updated infrequently, and have a defined OpenCouncilData standard.
Good starter datasets that don’t change frequently are: drain pipes, waste collection zones, dog walking zones, and customer service centre locations.
→ Consider establishing a dataset register to identify and prioritise candidate datasets.
3.1 Identify an open data standard
It is strongly recommended that you follow existing standards as much as possible. This greatly improves the value of the data.
The Open Council Data standards (standards.opencouncildata.org) are lightweight, open standards developed specifically for public data sharing by Australian local governments. They focus on transforming existing data by appropriate naming of attributes, and continue to evolve in response to the needs of councils using them.
For budget data, consider the evolving Fiscal Data Package.
In some cases, “heavyweight” standards also exist, such as Aspec’s stormwater drainage specification, or ATDIS for planning development applications. Publishing the same data in both a lightweight and a heavyweight standard maximises its usefulness to different audiences.
Tabular (spreadsheet) data should be published as CSV (comma-separated values) in “clean sheet” format. Specifically: one header row, commas between values, double quotes surrounding values containing commas or double quotes (which must be doubled), and no metadata of any kind embedded in the file. For instance:
1,"13, ""Quoted"" Street"
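As a minimal sketch of producing this format, the standard Python `csv` module applies the quoting rules described above by default. The column names `id` and `address` and the helper name `to_clean_csv` are illustrative assumptions, not part of any standard.

```python
import csv
import io


def to_clean_csv(header, rows):
    """Return CSV text with exactly one header row and minimal quoting."""
    buffer = io.StringIO()
    # QUOTE_MINIMAL quotes only values that contain commas, double quotes
    # or line breaks, and doubles any embedded double quotes.
    writer = csv.writer(buffer, quoting=csv.QUOTE_MINIMAL, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical column names, used only for illustration.
text = to_clean_csv(["id", "address"], [[1, '13, "Quoted" Street']])
print(text)
```

The second line printed matches the example row above. No titles, notes or other metadata are written into the file.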
Geospatial data should be published as:
- CSV, for point datasets, following the csv-geo-au standard.
- GeoJSON, for all other spatial datasets, and optionally for point datasets.
- Possibly ESRI Shapefiles in addition.
Images and video should probably be hosted on a dedicated site and linked to from your data portal.
Documents such as reports or council minutes can be provided as PDF files, but links to online HTML versions are better.
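For point datasets, the same records can be offered as both CSV and GeoJSON. The sketch below converts a point CSV into a GeoJSON FeatureCollection. It assumes the coordinate columns are named `lat` and `lon`; check the csv-geo-au standard for the column names it actually accepts. The function name and sample row are hypothetical.

```python
import csv
import io
import json


def points_csv_to_geojson(csv_text, lat_field="lat", lon_field="lon"):
    """Convert CSV text with one point per row into a GeoJSON dict."""
    features = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Remove the coordinate columns so they are not repeated as properties.
        lat = float(row.pop(lat_field))
        lon = float(row.pop(lon_field))
        features.append({
            "type": "Feature",
            # GeoJSON orders coordinates as [longitude, latitude].
            "geometry": {"type": "Point", "coordinates": [lon, lat]},
            "properties": dict(row),
        })
    return {"type": "FeatureCollection", "features": features}


# Hypothetical sample data, used only for illustration.
sample = "name,lat,lon\nTown Hall,-37.81,144.96\n"
collection = points_csv_to_geojson(sample)
print(json.dumps(collection, indent=2))
```

Lines and polygons need a proper GIS export rather than a row-by-row conversion like this one.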
3.2 Preparing datasets
Preparing a dataset means taking steps to make it as useful as possible:
- Cleaning data:
  - Replacing values such as “Unknown” or “No name” with blanks or nulls, especially for numeric or date fields.
  - Standardising attribute values.
- Transforming: exporting to a chosen standard. Tools such as ogr2ogr or FME may help automate this process.
It typically takes an experienced GIS specialist around 1-2 hours to set up each repeatable dataset transformation, although of course there are exceptions.
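The cleaning steps listed above can often be scripted. The following is a minimal sketch, assuming each record is a dictionary of strings. The `material` field and its alias table are hypothetical examples of standardising an attribute; substitute the values that occur in your own data.

```python
# Placeholder strings to blank out, compared case-insensitively.
PLACEHOLDERS = {"unknown", "no name"}

# Hypothetical mapping from inconsistent spellings to one standard value.
MATERIAL_ALIASES = {"conc": "Concrete", "concrete": "Concrete", "pvc": "PVC"}


def clean_row(row):
    """Return a copy of row with placeholders blanked and values standardised."""
    cleaned = {}
    for key, value in row.items():
        text = value.strip()
        if text.lower() in PLACEHOLDERS:
            text = ""
        cleaned[key] = text
    if "material" in cleaned:
        material = cleaned["material"]
        # Values with no known alias are left unchanged.
        cleaned["material"] = MATERIAL_ALIASES.get(material.lower(), material)
    return cleaned


example = clean_row({"name": "No name", "material": " conc ", "length_m": "Unknown"})
print(example)
```

Blanking placeholders matters most in numeric and date columns, where a stray “Unknown” prevents the column from being read as numbers or dates.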
3.3 Create a dataset
A single dataset in data.gov.au can contain several “resources” (files or remote URLs). See the data.gov.au toolkit for specific instructions.
A data.gov.au record supports a rich set of metadata, giving the potential user of the data context such as: who published this? how often is it updated? what is in it? who is the ultimate source? how do I get in contact with them? what is the licence?
See the data.gov.au toolkit on metadata for the information you will need to provide.
For datasets that follow an Open Council Data standard, add the tags recommended by the relevant standard, for instance “opencouncildata” and “ocd-dogzones-0.1”.
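If you create datasets through a script rather than the web form, the metadata and tags can be assembled as a dictionary. The sketch below assumes the portal exposes the standard CKAN action API, where a payload of this shape is posted to the `package_create` action. The dataset name, title, description and the licence identifier `cc-by` are illustrative assumptions; confirm the identifiers your portal uses before relying on them.

```python
import json


def build_dataset_payload(name, title, notes, tags, license_id):
    """Assemble dataset metadata in the shape CKAN's package_create expects."""
    return {
        "name": name,            # URL-friendly identifier
        "title": title,          # human-readable title
        "notes": notes,          # free-text description
        "license_id": license_id,
        # CKAN represents each tag as a dictionary with a "name" key.
        "tags": [{"name": tag} for tag in tags],
    }


# All values below are hypothetical and used only for illustration.
payload = build_dataset_payload(
    name="example-council-dog-walking-zones",
    title="Example Council dog walking zones",
    notes="On-leash and off-leash areas, updated quarterly.",
    tags=["opencouncildata", "ocd-dogzones-0.1"],
    license_id="cc-by",  # assumed identifier; check your portal's licence list
)
print(json.dumps(payload, indent=2))
```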
The recommended licence for almost all open datasets is Creative Commons Attribution 4.0 International, abbreviated CC-BY. This licence gives anyone confidence that they have a legal right to use the data for any purpose.
It allows the user to:
- Share: copy and redistribute the material in any medium or format
- Adapt: remix, transform, and build upon the material for any purpose, even commercially.
It requires the user to:
- Attribute: give appropriate credit, provide a link to the licence, and indicate if changes were made.
The CC-BY licence is the standard recommended by the Australian Government Open Access and Licensing Framework, and adopted as the default by the Australian Government and all state governments.
It may be tempting to consider a more restrictive licence, such as requiring permission to be sought before use, or not allowing commercial use. This is strongly discouraged. Restrictive licences significantly inhibit a wide range of creative, exploratory, or otherwise innovative uses. In most cases it is better to use social encouragement rather than a legal term: “Please sign up to our email list for updates about this data” rather than “You must register your email address in order to access the data.”
If the idea of someone making money from your data seems threatening, consider that they may well be using the revenue to fund a service or product that is desirable but low priority for your council.
One rare exception where a non-commercial licence may be justified is if it is critical to impose additional obligations on commercial users.
For instance, the City of Greater Geelong relies on architecture firms to submit 3D models of new buildings in order to maintain its CBD 3D model. It considers it unviable to release the data under an open licence (which would allow firms to neglect this obligation), so it may release the data under a non-commercial licence as a compromise.
3.4 Upload the file(s)
4. Schedule regular updates
Many datasets will need to be regularly updated in order to remain useful. (Exceptions include those which refer to a specific time period, such as the budget for a given year.) To minimise the effort involved, you will probably want to automate the update process.
→ See Automatic publishing tools for several tools to enable you to make this a completely hands-off process.
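One small piece of such automation is deciding whether a fresh export differs from the last one published, so that a scheduled job uploads only when something has changed. This is a minimal sketch of that check, not a description of any particular publishing tool; the function names and the idea of storing a digest file beside the export are assumptions for illustration.

```python
import hashlib
from pathlib import Path


def file_digest(path):
    """Return the SHA-256 hex digest of a file's contents."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def needs_upload(export_path, digest_path):
    """Return True, and record the new digest, when the export has changed."""
    new_digest = file_digest(export_path)
    digest_file = Path(digest_path)
    # On the first run there is no stored digest, so the file counts as changed.
    old_digest = digest_file.read_text().strip() if digest_file.exists() else None
    if new_digest == old_digest:
        return False
    digest_file.write_text(new_digest)
    return True
```

A scheduled task could call `needs_upload` after each export and skip the upload step when it returns `False`.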
Tell people within your organisation: web communications, engineering, planning, people who interact with researchers, community engagement, and so on.
→ Reach out to open data networks in Australia.