Metadata tool creates sustainable business processes
The Questionnaire Development Tool (QDT) was designed to integrate metadata across household surveys. The ABS Household Survey Program includes surveys such as the National Health Survey, Survey of Education and Training, Household Expenditure Survey and many more. The program produces data on health, education, personal and household income, household expenditure, time use and disability. As such, the program is an important tool for measuring Australia’s social wellbeing and provides information to inform debate on these issues.
Prior to QDT development, the metadata processes for these surveys were cumbersome and sometimes non-existent. When responsibility for household surveys was decentralised, the haphazard way that metadata had been constructed was highlighted. A combination of manual and automated systems had been used, creating inconsistencies - in part because of a perception at the operational level that there was little tangible value in creating metadata.
In 2005, the production of a metadata-driven system for the development of household surveys became a focus for the Integrated System for Household Surveys (ISHS) project. A proof-of-concept working model of the QDT was released in mid-2008. This demonstrated the benefits of systematised metadata and a more complete metadata framework for household surveys, saving the organisation an estimated $200,000 per year. The technology of the framework and design is highly extensible, making use in collections other than household surveys a possibility.
New business program highlights inconsistencies in practice
A restructure of the ABS Household Survey Program in 2004 realigned the responsibility for developing and processing household surveys from specialist sections to the relevant survey management areas. This change provided the impetus to examine the household survey process and streamline it to enable non-specialists to undertake the work.
The Integrated System for Household Surveys (ISHS) was a transformational project instigated to undertake an end-to-end analysis of the household survey cycle in order to identify where improvements could be made.
Integrated approach identifies increased benefits
The specialist sections that undertook survey development and processing prior to decentralisation were used to the fragmented metadata frameworks in place. The decentralisation highlighted the difficulties in a fragmented approach to metadata practice and the need for an integrated process to be developed. Issues identified included:
• gaps in both the metadata model and its practical application,
• manual processes for transferring data and metadata between platforms (for example, inconsistencies resulted from the metadata driving the collection of data being developed independently of the metadata driving the processing), and
• a lack of obvious benefit from producing early and well-defined metadata.
These findings led to the creation of the Questionnaire Development Tool (QDT).
Fragmented practice unsustainable in new environment
Initially the approach to applying metadata across survey points was inconsistent.
Specifications for the programming of data capture instruments were developed manually. These related to the question text/sequencing as well as edits, and existed in a purely text format. They were not usable by technical systems.
There was system support for detailed metadata for data transformation processes, but the system design limited its actual use. For example, maximum and minimum data item values could be defined, but there was no demonstrable benefit in populating them because they weren’t used for anything.
The existing framework went some way towards developing active and useful metadata. It was designed for use by technical systems and across other metadata frameworks, but was complex and highly fragmented.
Gaps existed in data item metadata, both in the framework and its practical application. For example, items being output were not linked to the questions asked to collect the information. In addition, edits applied at the time of collection were specified to the programmers as text, and were subject to individual interpretation and implementation.
Improved accuracy creates greater value
The introduction of the QDT has resulted in some changes to business practice. Greater processing consistency now exists across survey management areas and data quality has improved, particularly as human error has been minimised through the introduction of automated processing code generation and metadata-based validation tools.
There is increased understanding of metadata and its value amongst internal staff and external clients. This has led to improved communication with users and researchers about how data is compiled.
The re-use of metadata elements is increasing as repetition across platforms is minimised.
Metadata able to be recycled
The QDT provides a new and more complete framework for the metadata surrounding surveys, reducing the need for redevelopment of metadata. The QDT is designed to ensure that the metadata it stores is easily used by other systems.
The proof of concept work has established that metadata-driven specifications can be used to automatically generate data collection instruments.
• The link between questions and data items is now explicit and well-defined in human-readable form.
• Question modules and their associated data items can be identified as ‘Standard’ and ‘Preferred Standard’, thereby encouraging their re-use.
• Edits can now be defined in logic tables enabling them to be implemented automatically by technical systems.
• Registration status is applied to each metadata element reflecting the entire metadata lifecycle.
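As an illustration of the point about edits, a logic table can be held as data and applied by a generic routine rather than being hand-coded from a text specification. The following is a minimal sketch only; the `EditRule` structure, the rule identifiers and the data item names (`age`, `employed`, `hours_worked`) are hypothetical and are not taken from the QDT.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List

Record = Dict[str, Any]


@dataclass(frozen=True)
class EditRule:
    """One row of a hypothetical edit logic table."""
    rule_id: str
    condition: Callable[[Record], bool]  # True means the edit fires
    message: str


# Hypothetical logic table; rule IDs and data item names are illustrative only.
EDIT_TABLE: List[EditRule] = [
    EditRule("E01", lambda r: r["age"] < 15 and r["employed"],
             "Employed respondent under 15"),
    EditRule("E02", lambda r: r["hours_worked"] > 0 and not r["employed"],
             "Hours reported but respondent not employed"),
]


def apply_edits(record: Record, table: List[EditRule]) -> List[str]:
    """Return the IDs of every edit rule in the table that the record triggers."""
    return [rule.rule_id for rule in table if rule.condition(record)]
```

Because every rule lives in the table, each system that reads the table applies the same edit in the same way, which removes the scope for individual interpretation that a text specification leaves open.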
Activating metadata adds recognised value which, in turn, encourages early and accurate definition. By developing value-adding functionality, the stronger use of the metadata elements is encouraged. For example, a facility was developed to automatically compare data to the metadata describing it, encouraging the use of fields such as minimum and maximum values, as well as full definition of classification metadata.
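A comparison facility of this kind could be sketched as follows. This is an assumption-laden illustration, not the ABS implementation: `DataItemMetadata`, `validate` and the example data items are hypothetical names chosen for the sketch.

```python
from dataclasses import dataclass
from typing import Any, Dict, FrozenSet, List, Optional


@dataclass(frozen=True)
class DataItemMetadata:
    """Hypothetical metadata for one data item."""
    name: str
    minimum: Optional[float] = None
    maximum: Optional[float] = None
    classification: Optional[FrozenSet[str]] = None  # allowed category codes


def validate(record: Dict[str, Any], items: List[DataItemMetadata]) -> List[str]:
    """Compare a record to its metadata and return a list of discrepancies."""
    problems: List[str] = []
    for item in items:
        if item.name not in record:
            problems.append(f"{item.name}: missing")
            continue
        value = record[item.name]
        if item.classification is not None:
            # Categorical item: the value must be a code in the classification.
            if value not in item.classification:
                problems.append(f"{item.name}: {value!r} not in classification")
            continue
        # Numeric item: check whichever bounds the metadata defines.
        if item.minimum is not None and value < item.minimum:
            problems.append(f"{item.name}: {value} below minimum {item.minimum}")
        if item.maximum is not None and value > item.maximum:
            problems.append(f"{item.name}: {value} above maximum {item.maximum}")
    return problems
```

A check like this only reports something useful when the minimum, maximum and classification fields are populated, which is the incentive for early and complete definition described above.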
Benefits extend beyond cost savings
The overall cost of developing the QDT was approximately $2 million. It has been estimated that this will create significant savings by automating manual processes and increasing the re-use of metadata elements.
The creation of the QDT as an active metadata framework for household surveys has clearly demonstrated benefits, not only in potential cost savings, but in increasing the ability of the ABS to add value to survey data. The QDT itself can be extended to apply to other data sources, such as business surveys and administrative collections, thus better positioning the ABS to respond to emerging demands and to take advantage of future technical developments.
What lessons were learnt?
The QDT development applied most of the 10 metadata management principles identified in this paper. In particular, the benefits derived from registering metadata include:
• re-use of metadata elements,
• identification and use of agreed standards,
• identification and use of current (as opposed to superseded) metadata, and
• knowing that the metadata is authoritative.
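The benefits listed above depend on each element carrying a registration status that systems can filter on. A minimal sketch follows, assuming a simple registry of elements; the `Status` values other than 'Standard' and 'Preferred Standard' are assumptions, as are the `RegisteredElement` and `reusable_elements` names.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class Status(Enum):
    DRAFT = "Draft"                            # assumed status
    STANDARD = "Standard"
    PREFERRED_STANDARD = "Preferred Standard"
    SUPERSEDED = "Superseded"                  # assumed status


@dataclass(frozen=True)
class RegisteredElement:
    """Hypothetical registered metadata element."""
    element_id: str
    status: Status


REUSABLE = {Status.STANDARD, Status.PREFERRED_STANDARD}


def reusable_elements(registry: List[RegisteredElement]) -> List[RegisteredElement]:
    """Return agreed standards only, with preferred standards listed first."""
    candidates = [e for e in registry if e.status in REUSABLE]
    # sorted() is stable, so registry order is kept within each group.
    return sorted(candidates, key=lambda e: e.status is not Status.PREFERRED_STANDARD)
```

Filtering on status in this way keeps draft and superseded elements out of new survey designs without removing them from the registry.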
Make metadata active
The processing benefits from the QDT mainly derive from the active use of logic tables to ensure that rules (for edits, sequencing and derivations) are applied consistently. Active metadata is also used to drive the dataset creation process. This results in the automatic production of authoritative structural metadata (for example, data item lists) for all datasets. Please read the case study for further information.
Want to find out more?
For more information about the ABS, visit www.abs.gov.au
For more information about the NSS, visit www.nss.gov.au