10 Principles for Managing Metadata
We have identified some simple principles for better metadata management. These apply almost irrespective of the type of metadata or its purpose.
Metadata management relies on making metadata explicit and recording it. If it's not “written down”, it can't be used to manage data.
1. Develop and communicate your metadata framework
It is useful to have a clear model of how metadata supports your data. Communicating this model within your organisation (and externally where appropriate) promotes understanding and consistency.
Don't make people guess about your metadata. Tell them what you use so it can inform both human interpretation and computer-based processes. For example, datasets must have structural metadata associated with them to enable use. It is important that this metadata is in a structured, consistent format that can be understood by anyone who may need it.
2. Support the lifecycle
Metadata has a lifecycle, from creation through active use to eventual retirement, that needs to be kept in mind. Develop methods and processes for dealing with retired or superseded metadata as well as metadata in current use.
This allows easier differentiation of current metadata, making inadvertent use of outdated metadata less likely. For example, conceptual metadata for a data item would be superseded if a new classification were introduced.
3. Make metadata active
Some metadata is passive: it is just documentation for data. Active metadata is used to control automated processes. Make metadata an integral part of the business process.
Making metadata active whenever possible will deliver more consistent processes as well as ensuring that the metadata itself is current and consistent. For example, using logic tables to define relationships between data items allows those relationships to be interpreted by people as well as machines.
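As an illustration of the logic-table example, here is a minimal sketch in Python. The table contents, the data item names (employment_status, hours_worked) and the function name check_record are all hypothetical, chosen only to show how one table can be read by people and also drive an automated check.

```python
# Hypothetical logic table: each row states which values of one data item
# are allowed when another data item has a given value.
LOGIC_TABLE = [
    {"if_item": "employment_status", "equals": "unemployed",
     "then_item": "hours_worked", "allowed": [0]},
    {"if_item": "employment_status", "equals": "employed",
     "then_item": "hours_worked", "allowed": range(1, 169)},
]


def check_record(record, table=LOGIC_TABLE):
    """Return a list of human-readable rule violations for one record."""
    problems = []
    for rule in table:
        # A rule only applies when its condition matches the record.
        if record.get(rule["if_item"]) == rule["equals"]:
            value = record.get(rule["then_item"])
            if value not in rule["allowed"]:
                problems.append(
                    f"{rule['then_item']}={value!r} not allowed when "
                    f"{rule['if_item']}={rule['equals']!r}"
                )
    return problems
```

Because the rules live in the table rather than in the checking code, changing a relationship means editing metadata, not rewriting the process.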
4. Re-use metadata
Re-use existing metadata if it is fit for purpose. Don't create new metadata items until you've checked that an appropriate metadata item doesn't already exist.
Many copies of similar metadata create confusion and complexity. Re-use, however, promotes efficiency and consistency. For example, conceptual metadata for a data item that appears in more than one collection does not have to be respecified for each collection.
5. Preserve history
When you look at data you need to know about the metadata that went with that data at the time it was created, not just the latest version. Systems and processes that deal with metadata should preserve the history of versions of your metadata.
This allows the comparison of data over time even though methods may have changed. A simple example is to maintain structural metadata for historical datasets to enable comparison over time.
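One way to picture this is a lookup that returns the metadata version in force on a given date. The sketch below is illustrative only: the version labels, the dates and the function name metadata_as_at are assumptions, not part of any particular system.

```python
from bisect import bisect_right
from datetime import date

# Hypothetical version history for one data item, sorted by the date
# each version came into effect.
VERSIONS = [
    (date(2006, 1, 1), "classification v1"),
    (date(2013, 7, 1), "classification v2"),
]


def metadata_as_at(when, versions=VERSIONS):
    """Return the metadata version in force on the given date, or None."""
    effective_dates = [start for start, _ in versions]
    # bisect_right counts how many versions started on or before `when`.
    index = bisect_right(effective_dates, when)
    return versions[index - 1][1] if index else None
```

Data collected in 2010 would then be interpreted against the first version, even after the second version has replaced it for current use.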
6. Register metadata
Developing processes to register metadata lends authority to metadata and promotes use of standard metadata. You need to be able to determine who created a given metadata element and what authority they had, when it was created, its current status (whether it's in use, retired, etc.) and who is currently responsible for it.
This gives authority to the metadata, promotes re-use, and saves confusion. Registration of metadata is a fundamental enabler for application of many of the other principles.
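The registration details listed above could be held in a record like the following. This is a minimal sketch, not a reference to any particular registry standard; the class names, field names and status values are hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from enum import Enum


class Status(Enum):
    CURRENT = "current"
    SUPERSEDED = "superseded"
    RETIRED = "retired"


@dataclass
class RegistryEntry:
    name: str
    definition: str
    created_by: str    # who created the metadata element
    authority: str     # the authority they had to create it
    created_on: date   # when it was created
    steward: str       # who is currently responsible for it
    status: Status = Status.CURRENT


class Registry:
    """Hypothetical in-memory register of metadata elements."""

    def __init__(self):
        self._entries = {}

    def register(self, entry):
        """Add an entry, refusing duplicates so existing items get re-used."""
        if entry.name in self._entries:
            raise ValueError(f"{entry.name!r} is already registered")
        self._entries[entry.name] = entry

    def current(self):
        """Return only the entries that are in current use."""
        return [e for e in self._entries.values() if e.status is Status.CURRENT]
```

Refusing a duplicate name at registration is one simple way to prompt people to check for an existing item before creating a new one.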
7. Use standards
Standards are agreed ways of doing things. Data-related standards are of two main types:
- structure and content of data, such as vocabularies and classifications like the Australia and New Zealand Standard Classification of Occupations (ANZSCO), and
- structure and content of metadata, such as standards for geographic information, metadata registries, web resources and statistical data exchange.
These standards are generally agreed across jurisdictions. You may also have preferred-use or local standards, which allow consistent description of data within a jurisdiction. Which standards you use will depend on the context and type of metadata. Use agreed authoritative standards where possible, but if these do not exist or are unsuitable, then stick to local standards. Using authoritative standards encourages real interoperability and comparability between jurisdictions and agencies.
8. Capture metadata once
Capture or develop metadata as close as possible to the process which creates it. This enables re-use of metadata throughout the life of the data rather than needing to develop slightly different metadata for reporting or output later.
Re-purposing metadata avoids inconsistencies in metadata as well as inefficiencies in its creation. We can use, for example, conceptual metadata such as data item labels for both collection and dissemination purposes.
9. Derive metadata
Metadata can often be generated as a byproduct of automated systems. Creating metadata automatically from systems where possible minimises the human effort needed.
This promotes completeness (metadata is always created when a process occurs) and accuracy (it limits human error). For example, explicit structural metadata for datasets can be extracted from the process which creates them, rather than being created independently.
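A small sketch of deriving structural metadata directly from a dataset, assuming the data is available as CSV text. The function names infer_type and derive_structure and the three type labels are hypothetical, and the type inference is deliberately crude.

```python
import csv
import io


def infer_type(values):
    """Return 'integer', 'number' or 'text' for a list of cell values."""
    for type_name, cast in (("integer", int), ("number", float)):
        try:
            for value in values:
                cast(value)
            return type_name
        except (TypeError, ValueError):
            # At least one value does not fit this type; try the next one.
            continue
    return "text"


def derive_structure(csv_text):
    """Derive column names and inferred types from the data itself."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    return [
        {"name": name, "type": infer_type([row[name] for row in rows])}
        for name in (reader.fieldnames or [])
    ]
```

Running this as part of the process that writes the dataset means the structural description is produced every time, without anyone having to type it in separately.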
10. Metadata needs to be fit for purpose
Data should be supported by metadata that is accessible and usable in the context of the information needs of your clients. To be useful, the metadata must be readily available in a form that is meaningful to those using it. In many cases, data users get the benefits of good metadata, while data producers bear the costs of creating and maintaining it. Make sure the cost of production is justified for the producers, and that the metadata created is useful for users.
Data is not always used immediately. If metadata is not captured when data is produced, the data may be of little value to those trying to use it later, when no one remembers what it is or how it is structured. The cost, however, of developing and maintaining perfect metadata for all data held within an agency is likely to outweigh the benefits for both users and producers. Targeting metadata resources in these cases is imperative to ensure a cost-benefit balance.