Concept Paper: Australian Longitudinal Learning Database (ALLD)
High quality education and training helps people to develop knowledge and skills that may be used to enhance their own living standards and those of the broader community. For an individual, educational attainment is widely seen as a key factor in obtaining a rewarding career. For Australia, having a skilled workforce is vital to supporting ongoing economic development and improvements in living conditions.
The Australian Government's Productivity Agenda recognises the need to support investment in skills and human capital, including measures to enhance teacher quality, improve the quality of early childhood education, and achieve ambitious targets for higher educational attainment rates. Good quality data are required to progress towards these goals and to provide the evidence base for ongoing policy development.
This paper introduces the concept of the Australian Longitudinal Learning Database (ALLD) as a core, enduring database of education and socio-demographic statistical information. An ALLD would link data on the pathways and outcomes of Australian students from early childhood education to schooling, post-school education and potentially labour force outcomes. The ALLD would be constructed from existing data sources and, subject to community support, could bring data drawn from the Australian Bureau of Statistics (ABS) Census of Population and Housing into a central statistical and research base.
Information derived from the ALLD would allow governments and researchers to develop a better understanding of the drivers and underlying factors affecting student progress and outcomes. It would facilitate improved measurement of participation in early childhood education, school performance and social inclusion, and inform national agreement reporting through the Council of Australian Governments (COAG) and other monitoring processes.
IMPROVING EDUCATION DATA
Currently, data collected in the early childhood, education and training sectors are fragmented and sector specific. While there is a good deal of information available on participation in education and training, there is only limited information on the educational pathways and outcomes for students. This is largely because the data are collected from a variety of sources.
There are some initiatives underway to bring data together, such as the My School website administered by the Australian Curriculum, Assessment and Reporting Authority (ACARA), which combines enrolment data and information about socio-economic status with National Assessment Program - Literacy and Numeracy (NAPLAN) results. However, there are currently no accessible databases that integrate data sources across the different education sectors for statistical and research purposes.
The ABS has developed a model of how existing information could be structured into a student-centred longitudinal database. The model would bring together information on early childhood education, schooling, Vocational Education and Training (VET) and higher education. Other information could also be incorporated, such as childhood development information from the Australian Early Development Index (AEDI), or the results of literacy and numeracy testing and academic results. All of this information could be stored in an enduring, linked statistical and research database.
Linking to population data sources, particularly to the ABS Census of Population and Housing, would combine a comprehensive and coherent picture of education and training from administrative sources with the contextual factors that influence learning. The Census would provide nationally consistent information covering the characteristics of students (including their socio-economic status, Indigenous status and disability information), together with information about their family and community that may influence learning. Importantly, the Census could provide the potential to explore the association between an individual's education and training experience, and his or her employment outcomes. Feasibility studies to assess the quality of data integration with the Census are planned following the 2011 Census.
There is also potential for integration with datasets in other domains, such as health and community services, for the production of multidimensional statistical outputs. However, this is currently outside the immediate scope of the project.
AUSTRALIAN LONGITUDINAL LEARNING DATABASE
The model being proposed by the ABS is known as the Australian Longitudinal Learning Database, or ALLD, and is represented by the diagram below.
The major arc in the diagram represents how enrolment information from the different sectors would be linked to provide student pathways from early childhood education and school to VET and higher education. Enrolment information could be linked to statistical collections such as the Census (block below the arc) providing a foundation of core socio-demographic characteristics and eventual labour force outcomes. The first thin band above the enrolment arc shows a variety of supplementary and education performance information (such as AEDI, NAPLAN and Year 12 results) which could be integrated into the database. The other band above the arc represents the multiplicity of pathways among education, the labour force and other activities.
Data in the ALLD would potentially be available for dissemination at national, jurisdiction and small area levels.
Constructing the ALLD
As leader of the National Statistical Service, the ABS would take a leading role in facilitating and developing the ALLD in collaboration with data custodians from the Commonwealth and state and territory governments, and across different sectors of education and training.
The ALLD project is consistent with the legislated function of the ABS to maximise the use, for statistical purposes, of information available to official bodies. It is also consistent with the High level principles for data integration involving Commonwealth data for statistical and research purposes endorsed by Commonwealth Portfolio Secretaries in February 2010.
The ALLD would be constructed using probabilistic linkage techniques. These techniques make use of variables such as age, sex, geographic location and other socio-demographic characteristics to match records from one dataset to those in another. The linking variables are chosen to provide a high probability that the matched records belong to the same person, but there is some chance that they do not. Nevertheless, since matched records in the integrated dataset share key characteristics, it is assumed that they contribute to a database of enriched information for statistical and research purposes.
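The general idea of probabilistic linkage can be illustrated with a minimal sketch. It assumes a Fellegi-Sunter style scoring scheme, which this paper does not itself specify: each linking variable contributes a positive weight when two records agree and a negative weight when they disagree, and a pair is accepted as a link only if its total weight clears a threshold. The variable names, the m and u probabilities, the threshold and the sample records are all hypothetical and chosen only for illustration.

```python
import math

# Hypothetical linking variables with illustrative (m, u) probabilities:
#   m = probability the field agrees when two records are the same person
#   u = probability the field agrees when they are different people
FIELDS = {
    "birth_year": (0.95, 0.05),
    "sex": (0.98, 0.50),
    "postcode": (0.85, 0.01),
}


def match_weight(rec_a, rec_b, fields=FIELDS):
    """Sum log2 agreement/disagreement weights over the linking variables."""
    total = 0.0
    for name, (m, u) in fields.items():
        if rec_a[name] == rec_b[name]:
            total += math.log2(m / u)              # agreement adds evidence
        else:
            total += math.log2((1 - m) / (1 - u))  # disagreement removes it
    return total


def link(dataset_a, dataset_b, threshold=5.0):
    """For each record in A, keep its best-scoring candidate in B if the
    weight reaches the threshold. This simple one-directional pass does not
    prevent two A records from linking to the same B record."""
    if not dataset_b:
        return []
    links = []
    for a in dataset_a:
        best = max(dataset_b, key=lambda b: match_weight(a, b))
        weight = match_weight(a, best)
        if weight >= threshold:
            links.append((a["id"], best["id"], weight))
    return links


# Hypothetical enrolment and Census records.
school = [
    {"id": "S1", "birth_year": 2004, "sex": "F", "postcode": "2600"},
    {"id": "S2", "birth_year": 2005, "sex": "M", "postcode": "3000"},
    {"id": "S3", "birth_year": 2003, "sex": "F", "postcode": "4000"},
]
census = [
    {"id": "C1", "birth_year": 2004, "sex": "F", "postcode": "2600"},
    {"id": "C2", "birth_year": 2005, "sex": "M", "postcode": "3001"},
    {"id": "C3", "birth_year": 1990, "sex": "M", "postcode": "4000"},
]

links = link(school, census)
for a_id, b_id, weight in links:
    print(a_id, b_id, round(weight, 2))
```

In this sketch only the pair that agrees on all three variables is linked. The pair that differs on postcode falls below the threshold, which illustrates the trade-off between missed links and false links that a threshold controls.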
Statistical linkage keys may also be available to inform the linkage process. A statistical linkage key is a derived variable used to link data for statistical and research purposes that is generated from elements of an individual’s personal demographic data and attached to de-identified data relating to the services received by that individual (National Community Services Information Management Group 2004:12). Where statistical linkage keys or other mechanisms for direct matching are available (e.g. for a subset of records), benchmarking studies can be undertaken to provide information on the quality of probabilistic linking.
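As an illustration only, the sketch below derives a linkage key from name, date of birth and sex, and then uses key-based matches as a benchmark for a set of probabilistic links. The key layout is loosely modelled on the SLK-581 style key used in Australian community services collections; the exact construction rules, the sex codes and the precision and recall framing of "benchmarking" are assumptions made here, not details given in this paper.

```python
from datetime import date


def linkage_key(family_name, given_name, dob, sex_code):
    """Build an illustrative key: letters 2, 3 and 5 of the family name,
    letters 2 and 3 of the given name, date of birth as ddmmyyyy, and a
    sex code. Missing letters are filled with '2' (an assumed convention)."""
    def letters(name, positions):
        clean = [c for c in name.upper() if c.isalpha()]
        return "".join(
            clean[p - 1] if p <= len(clean) else "2" for p in positions
        )

    return (
        letters(family_name, (2, 3, 5))
        + letters(given_name, (2, 3))
        + dob.strftime("%d%m%Y")
        + str(sex_code)
    )


def benchmark(probabilistic_links, direct_links):
    """Compare probabilistic links with key-based links on the same records.
    Returns (precision, recall), treating the key-based links as the truth."""
    prob, direct = set(probabilistic_links), set(direct_links)
    agreed = prob & direct
    precision = len(agreed) / len(prob) if prob else 0.0
    recall = len(agreed) / len(direct) if direct else 0.0
    return precision, recall


# Hypothetical person and hypothetical link sets.
key = linkage_key("Smith", "Jane", date(2001, 3, 9), 2)
print(key)

precision, recall = benchmark(
    probabilistic_links=[(1, "a"), (2, "b"), (3, "d")],
    direct_links=[(1, "a"), (2, "b"), (3, "c"), (4, "e")],
)
print(precision, recall)
```

Because the key is derived from demographic elements rather than stored names, it can be attached to de-identified records. Two records carrying the same key can then be matched directly, and those direct matches give a reference set against which probabilistic links for the same subset of records can be checked.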
Improved administrative systems to follow students throughout their education/training, such as a unique student identifier, could assist in linking records over time and between sectors. Nevertheless, since the ALLD would be primarily based on probabilistic linkage, its construction could commence immediately without the implementation of a unique student identifier for each Australian student. There are currently, however, local identifiers (for example school student IDs at the jurisdiction level) that might be used in the data linkage process.
Options for data integration are being explored by a wide range of agencies, and different models will emerge over time. Strict ABS confidentiality methods would apply to integrated datasets held within the ABS, including the ALLD. Secure data analysis arrangements would be put in place so that data would be held in a safe and secure environment, legislative requirements under which the ABS operates would be met, and no information likely to enable the identification of an individual would be released.
A Multifaceted Approach
In order to create the ALLD, a number of components, or elements, could be progressively achieved. Some elements will build the ALLD as an enduring longitudinal dataset. Others may be more dynamic, with different datasets integrated for particular statistical or research purposes. There is a degree of overlap among the elements described below, and they are not necessarily sequential.
1: Linking early childhood education and schools data
The ABS is already engaging with state and territory governments and the Commonwealth to develop unit record collections for Early Childhood Education and Care (ECEC) and the National Schools Statistics Collection (NSSC). The ABS is researching and testing options for data linkage from the preschool collection to the school collection, and within the school collection over potentially 13 years of school education. As the ALLD would be student-centred (rather than institution-centred), students could be followed throughout their preschool and school education, even if they move from one state to another, or between the government and non-government school systems.
Benefits of this stage of the ALLD include the direct, rather than indirect (or apparent), measurement of student transitions, both from early childhood education to school and throughout the student's school career. The ALLD would provide direct measures of retention, greater flexibility of output from unit record data, and the use of uniform standards to provide nationally comparable statistics.
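The difference between apparent and direct measurement can be shown with a small worked sketch. It assumes that linkage has already given each student a consistent identifier across years; the function names and the cohort figures are hypothetical. An apparent rate divides one aggregate enrolment count by another, so students who arrive later inflate it, whereas a direct rate follows the individuals in the starting cohort.

```python
def apparent_retention(start_ids, end_ids):
    """Ratio of aggregate enrolment counts; individuals are not tracked."""
    if not start_ids:
        raise ValueError("starting cohort is empty")
    return len(end_ids) / len(start_ids)


def direct_retention(start_ids, end_ids):
    """Share of the starting cohort actually observed in the later year."""
    start = set(start_ids)
    if not start:
        raise ValueError("starting cohort is empty")
    return len(start & set(end_ids)) / len(start)


# Hypothetical cohort: 10 students start secondary school; by Year 12,
# 7 of them remain and 2 new students have arrived from elsewhere.
year8_ids = list(range(1, 11))
year12_ids = [1, 2, 3, 4, 5, 6, 7, 101, 102]

apparent = apparent_retention(year8_ids, year12_ids)
direct = direct_retention(year8_ids, year12_ids)
print(apparent, direct)
```

In this illustration the apparent rate is 0.9 while the direct rate is 0.7, because the two arrivals are counted in the aggregate Year 12 enrolment but were never part of the starting cohort.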
2: Linking to Census and surveys
Although linkage between enrolment and Census data would not be based on exact matching, it would provide a consistent base of demographic information about those participating in education, such as their socio-economic status, Indigenous status and disability information. Since the Census covers the whole population, an additional benefit of linkage (and non-linkage) will be identification of the characteristics of those who are not engaged in education. Thus, enrolment data, in conjunction with the Census, will form a valuable research tool, particularly for analysing the effect of early education on a child's future education outcomes.
3: Linking to education performance measures
Integrating ECEC, schools and Census data with education performance measures means greater potential to examine how students with different backgrounds perform and develop knowledge and skills over time.
4: Linking to education and training beyond school
5: Linking to post-education outcomes
Post-education measures from the Census, with a consistent set of labour force outcome measures including occupation, industry, hours worked and income, would complement the more specialised information available from graduate destination surveys. A longitudinal learning database incorporating Census data would also provide information on socio-demographic characteristics and enable comparisons between groups.
6: Other possibilities
CONFIDENTIALITY, PRIVACY, SECURITY AND ACCESS
The ALLD would comprise high quality, confidentialised and integrated data. It will not comprise a complete set of exactly matched records from different sources, but rather a coherent statistical and research base constructed through probabilistic linkage that draws together data from the different sources. Maintaining confidentiality and the security of data will be paramount to the success of the ALLD.
The use and release of data from the ALLD project will be governed by the provisions of the Census and Statistics Act 1905 and the Privacy Act 1988. In addition, governance, storage and analysis of the ALLD data will be informed by national guidelines such as the High level principles for data integration involving Commonwealth data for statistical and research purposes, as endorsed by Commonwealth Portfolio Secretaries.
The ABS is currently improving the flexibility of secure analysis facilities for unit record data. Analysis of the ALLD data for approved statistical and research projects would be through the proposed Remote Execution Environment for Microdata (REEM). While the current focus of the REEM development is the analysis of household survey data, it is envisaged that later stages of REEM development will explore the analysis of linked datasets.
The key components of the REEM are the Survey Table Builder (similar to the Census Table Builder) and an Analysis Service (for statistical analysis). These services will enable researchers to analyse detailed microdata in a way that ensures no information likely to enable the identification of an individual is viewed or released. Confidentiality routines will ensure that statistical outputs are confidentialised in line with ABS legislative requirements and can be released as public use statistical outputs (that is, they can be published and shared with others without restrictions). The REEM will use internationally recognised standards for the exchange of data and metadata including the use of the Data Documentation Initiative (DDI), Statistical Data and Metadata Exchange (SDMX) and machine-to-machine interfaces (APIs).
Data integration using data sources from across Australia promises significant and cost-effective improvements in official statistics, in support of statistical and research purposes and evidence-based policy. The ALLD would use data already collected by education authorities for administrative purposes, thereby avoiding expensive and potentially burdensome new collections from individuals. The ALLD promises to use current data more effectively for statistical and research purposes.
Since it is based on probabilistic linkage, construction of the ALLD could commence immediately without the implementation of a unique student identifier for each Australian student. Current investigations into linkage of school enrolment records over time and the proposed 2011 Census data linkage studies (Census Data Enhancement Project: An Update October 2010, ABS cat. no. 2062.0) will provide evidence about the quality of matching achieved without a statistical linkage key.
The legislative framework of the ABS provides both the motivation for undertaking projects to maximise the use of existing data for statistical purposes and the safeguards on its confidentiality and security. As the ALLD project develops, the ABS would continue to engage with governments and the community to ensure that there is broad acceptance of this project, that data are held within a safe and secure environment, and that there are suitable processes for researchers to analyse the data.
The ALLD project is sponsored by the Strategic Cross-sectoral Data Committee for early childhood, education and training which reports to the ministerial councils within the education and training sectors.