National Statistical Service


Concept Paper: Australian Longitudinal Learning Database (ALLD)

INTRODUCTION

High quality education and training helps people to develop knowledge and skills that may be used to enhance their own living standards and those of the broader community. For an individual, educational attainment is widely seen as a key factor in obtaining a rewarding career. For Australia, having a skilled workforce is vital to supporting ongoing economic development and improvements in living conditions.

The Australian Government's Productivity Agenda recognises the need to support investment in skills and human capital, including measures to enhance teacher quality, improve the quality of early childhood education, and achieve ambitious targets for higher educational attainment rates. Good quality data are required to progress towards these goals and to provide the evidence base for ongoing policy development.

This paper introduces the concept of the Australian Longitudinal Learning Database (ALLD) as a core, enduring database of education and socio-demographic statistical information. An ALLD would link data on the pathways and outcomes of Australian students from early childhood education to schooling, post-school education and potentially labour force outcomes. The ALLD would be constructed from existing data sources and, subject to community support, could bring data drawn from the Australian Bureau of Statistics (ABS) Census of Population and Housing into a central statistical and research base.

Information derived from the ALLD would allow governments and researchers to develop a better understanding of the drivers and underlying factors affecting student progress and outcomes. It would facilitate improved measurement of participation in early childhood education, school performance and social inclusion, and inform national agreement reporting through the Council of Australian Governments (COAG) and other monitoring processes.

IMPROVING EDUCATION DATA

Currently, data collected in the early childhood, education and training sectors are fragmented and sector-specific. While a good deal of information is available on participation in education and training, there is only limited information on the educational pathways and outcomes of students. This is largely because the data are collected from a variety of sources.

There are some initiatives underway to bring data together, such as the My School website administered by the Australian Curriculum, Assessment and Reporting Authority (ACARA), which combines enrolment data and information about socio-economic status with National Assessment Program - Literacy and Numeracy (NAPLAN) results. However, there are currently no accessible databases that integrate data sources across the different education sectors for statistical and research purposes.

The ABS has developed a model of how existing information could be structured into a student-centred longitudinal database. The model would bring together information on early childhood education, schooling, Vocational Education and Training (VET) and higher education. Other information could also be incorporated, such as childhood development information from the Australian Early Development Index (AEDI), or the results of literacy and numeracy testing and academic results. All of this information could be stored in an enduring, linked statistical and research database.

Linking to population data sources, particularly to the ABS Census of Population and Housing, would combine a comprehensive and coherent picture of education and training from administrative sources with the contextual factors that influence learning. The Census would provide nationally consistent information covering the characteristics of students (including their socio-economic status, Indigenous status and disability information), together with information about their family and community that may influence learning. Importantly, the Census could provide the potential to explore the association between an individual's education and training experience, and his or her employment outcomes. Feasibility studies to assess the quality of data integration with the Census are planned following the 2011 Census.

There is also potential for integration with datasets in other domains, such as health and community services, for the production of multidimensional statistical outputs. However, this is currently outside the immediate scope of the project.

AUSTRALIAN LONGITUDINAL LEARNING DATABASE

The model being proposed by the ABS is known as the Australian Longitudinal Learning Database, or ALLD, and is represented by the diagram below.

The major arc in the diagram represents how enrolment information from the different sectors would be linked to provide student pathways from early childhood education and school to VET and higher education. Enrolment information could be linked to statistical collections such as the Census (block below the arc) providing a foundation of core socio-demographic characteristics and eventual labour force outcomes. The first thin band above the enrolment arc shows a variety of supplementary and education performance information (such as AEDI, NAPLAN and Year 12 results) which could be integrated into the database. The other band above the arc represents the multiplicity of pathways among education, the labour force and other activities.

Data in the ALLD would potentially be available for dissemination at national, jurisdiction and small area levels.

ALLD Model

Key Benefits

  • Effect of early education on school performance and beyond
    Through linking preschool and school data it will be possible to explore how environmental factors, and the experiences of early education, affect later school performance.
  • Characteristics of children not in early childhood education
    The characteristics of non-participants in the education system are not available from administrative records. Integrating education data with the Census would make it possible to compare the characteristics of people within the education system with those outside the system.
  • School retention measures
    Using the ALLD, it would be possible to obtain direct and accurate school retention measures which allow for students who move between school sectors or interstate.
  • Aboriginal and Torres Strait Islander students
    The ALLD would provide more accurate information on the educational participation, attainment and pathways of Indigenous students, with potential to reduce the reporting burden.
  • Low socio-economic status students
    The ALLD would provide a means of significantly improving the reporting of national education indicators by socio-economic status and following students from low socio-economic status backgrounds throughout their education.
  • Social inclusion
    The ALLD would provide insight into what particular characteristics are likely to increase barriers to education, and therefore, the issues that need to be tackled to overcome them. In addition to students from low socio-economic status and Indigenous backgrounds, other population groups of interest that could be identified using the Census include students with disabilities, students from culturally and linguistically diverse backgrounds and students in remote areas.
  • Education outcomes and productivity
    Through linkage to successive cycles of the Census, the ALLD may also provide a comprehensive picture of labour force, income, occupation and housing outcomes. This set of measures would complement information available from graduate destination surveys.

Constructing the ALLD

In its role of leading the National Statistical Service, the ABS would take a leadership role in facilitating and developing the ALLD in collaboration with data custodians from the Commonwealth and state and territory governments, and across different sectors of education and training.

The ALLD project is consistent with the legislated function of the ABS to maximise the use, for statistical purposes, of information available to official bodies. It is also consistent with the High level principles for data integration involving Commonwealth data for statistical and research purposes endorsed by Commonwealth Portfolio Secretaries in February 2010.

The ALLD would be constructed using probabilistic linkage techniques. These techniques make use of variables such as age, sex, geographic location and other socio-demographic characteristics to match records from one dataset to those in another. The linking variables are chosen to provide a high probability that the matched records belong to the same person, but there is some chance that they do not. Nevertheless, since matched records in the integrated dataset share key characteristics, it is assumed that they contribute to a database of enriched information for statistical and research purposes.
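As an illustration only, probabilistic linkage of this kind is often described in terms of agreement weights for each linking variable, summed into a score and compared against a threshold. The sketch below follows that general idea; the field names, the m and u probabilities, the threshold and the sample records are all assumptions made for the example and are not taken from any ABS method.

```python
import math

# Hypothetical parameters for each linking variable (illustrative values only).
# m = P(field agrees | records belong to the same person)
# u = P(field agrees | records belong to different people)
FIELD_PARAMS = {
    "age": (0.95, 0.05),
    "sex": (0.98, 0.50),
    "postcode": (0.90, 0.01),
}


def match_weight(rec_a, rec_b, params=FIELD_PARAMS):
    """Sum log2 agreement or disagreement weights over the linking variables."""
    total = 0.0
    for field, (m, u) in params.items():
        if rec_a[field] == rec_b[field]:
            total += math.log2(m / u)              # agreement raises the score
        else:
            total += math.log2((1 - m) / (1 - u))  # disagreement lowers it
    return total


def link(file_a, file_b, threshold=5.0):
    """Keep the best-scoring candidate for each record if it reaches the threshold."""
    links = []
    for a in file_a:
        best = max(file_b, key=lambda b: match_weight(a, b))
        score = match_weight(a, best)
        if score >= threshold:
            links.append((a["id"], best["id"], round(score, 2)))
    return links


# Made-up records standing in for an enrolment file and a Census file.
enrolments = [
    {"id": "E1", "age": 6, "sex": "F", "postcode": "2617"},
    {"id": "E2", "age": 7, "sex": "M", "postcode": "2600"},
]
census = [
    {"id": "C1", "age": 6, "sex": "F", "postcode": "2617"},
    {"id": "C2", "age": 7, "sex": "M", "postcode": "2913"},
    {"id": "C3", "age": 9, "sex": "F", "postcode": "2600"},
]

result = link(enrolments, census)
print(result)
```

In this toy data, E1 agrees with C1 on every variable and is linked, while E2 has no candidate that scores above the threshold and is left unlinked. This reflects the point made above: a link indicates a high probability of a true match, not certainty.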

Statistical linkage keys may also be available to inform the linkage process. A statistical linkage key is a derived variable used to link data for statistical and research purposes that is generated from elements of an individual’s personal demographic data and attached to de-identified data relating to the services received by that individual (National Community Services Information Management Group 2004:12). Where statistical linkage keys or other mechanisms for direct matching are available (e.g. for a subset of records), benchmarking studies can be undertaken to provide information on the quality of probabilistic linking.
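To make the idea concrete, the following sketch derives a key from name letters, date of birth and sex, and then uses key-based matches on a subset of records to benchmark a set of probabilistic links. The key format, the padding character, the function names and the sample data are hypothetical; they are not the specification of any key used by the ABS or education data custodians.

```python
def linkage_key(family_name, given_name, dob, sex):
    """Build a hypothetical key: selected name letters + date of birth + sex code."""
    def pick(name, positions):
        letters = [c for c in name.upper() if c.isalpha()]
        # '2' pads positions beyond the end of a short name (assumed convention)
        return "".join(
            letters[p - 1] if p <= len(letters) else "2" for p in positions
        )
    return pick(family_name, (2, 3, 5)) + pick(given_name, (2, 3)) + dob + sex


def benchmark(prob_links, keys_a, keys_b):
    """Compare probabilistic links with key-based links on a keyed subset."""
    true_links = {
        (a, b)
        for a, ka in keys_a.items()
        for b, kb in keys_b.items()
        if ka == kb
    }
    prob = set(prob_links)
    correct = prob & true_links
    precision = len(correct) / len(prob) if prob else 0.0
    recall = len(correct) / len(true_links) if true_links else 0.0
    return precision, recall


# Made-up subset of records for which a key can be derived in both files.
keys_enrol = {
    "E1": linkage_key("Nguyen", "Li", "01022004", "2"),
    "E2": linkage_key("Smith", "Jack", "15072003", "1"),
    "E3": linkage_key("Brown", "Mia", "30112004", "2"),
}
keys_census = {
    "C1": linkage_key("Nguyen", "Li", "01022004", "2"),
    "C2": linkage_key("Smith", "Jack", "15072003", "1"),
    "C3": linkage_key("O'Brien", "Tom", "09092004", "1"),
}

# Links produced by some probabilistic method (also made up).
prob_links = [("E1", "C1"), ("E2", "C3")]

precision, recall = benchmark(prob_links, keys_enrol, keys_census)
print(keys_enrol["E1"], precision, recall)
```

Here one of the two probabilistic links agrees with the key-based links, and one of the two key-based links was found, so both precision and recall are 0.5. A real benchmarking study would use far larger samples and would also need to allow for errors in the key itself.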

Improved administrative systems to follow students throughout their education/training, such as a unique student identifier, could assist in linking records over time and between sectors. Nevertheless, since the ALLD would be primarily based on probabilistic linkage, its construction could commence immediately without the implementation of a unique student identifier for each Australian student. There are currently, however, local identifiers (for example school student IDs at the jurisdiction level) that might be used in the data linkage process.

Options for data integration are being explored by a wide range of agencies, and different models will emerge over time. Strict ABS confidentiality methods would apply to integrated datasets held within the ABS, including the ALLD. Secure data analysis arrangements would be put in place so that data would be held in a safe and secure environment, legislative requirements under which the ABS operates would be met, and no information likely to enable the identification of an individual would be released.

A Multifaceted Approach

In order to create the ALLD, a number of components, or elements, could be progressively achieved. Some elements will build the ALLD as an enduring longitudinal dataset. Others may be more dynamic, with different datasets integrated for particular statistical or research purposes. There is a degree of overlap among the elements described below, and they are not necessarily sequential.

1: Linking early childhood education and schools data
The first element in the construction of the ALLD is to develop unit record collections of early childhood and school enrolment data, and establish links between them. Some aspects of this stage are well advanced already, and, with continued support from Commonwealth agencies, jurisdictions and the non-government sector, comprehensive unit record collections could be established over the next two to three years.

The ABS is already engaging with state and territory governments and the Commonwealth to develop unit record collections for Early Childhood Education and Care (ECEC) and the National Schools Statistics Collection (NSSC). We are in the process of researching and testing options for data linkage from the preschool collection to the school collection, and within the school collection over potentially 13 years of school education. As the ALLD would be student-centred (rather than institution-centred), students could be followed throughout their preschool and school education, even if they move from one state to another, or between the government and non-government school systems.

Benefits of this stage of the ALLD include the direct, rather than indirect (or apparent), measurement of student transitions, both from early childhood education to school and throughout the student's school career. The ALLD would provide direct measures of retention, greater flexibility of output from unit record data, and the use of uniform standards to provide nationally comparable statistics.
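The difference between an apparent and a direct retention measure can be shown with a small worked example. The sketch below assumes the common definition of an apparent rate (Year 12 enrolments divided by enrolments of the same cohort in an earlier base year) and contrasts it with a direct rate computed by following individual students. The function names, student identifiers and counts are invented for illustration.

```python
def apparent_retention(base_year_count, year12_count):
    """Ratio of aggregate counts; cannot tell who arrived, left or moved."""
    return 100.0 * year12_count / base_year_count


def direct_retention(base_cohort_ids, year12_ids_all_jurisdictions):
    """Share of the base cohort found in Year 12 anywhere in the linked data."""
    retained = base_cohort_ids & year12_ids_all_jurisdictions
    return 100.0 * len(retained) / len(base_cohort_ids)


# Hypothetical State A cohort: s3 later moves to State B, s4 leaves school,
# and x8 and x9 move into State A from elsewhere.
state_a_base = {"s1", "s2", "s3", "s4"}
state_a_year12 = {"s1", "s2", "x8", "x9"}
state_b_year12 = {"s3"}

apparent = apparent_retention(len(state_a_base), len(state_a_year12))
direct = direct_retention(state_a_base, state_a_year12 | state_b_year12)
print(apparent, direct)
```

In this example the apparent rate for State A is 100 per cent because arrivals offset departures, while the direct rate shows that 75 per cent of the original cohort reached Year 12 somewhere in Australia.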

2: Linking to Census and surveys
Linking enrolment data to the ABS Census and potentially survey data is a second element in the construction of the ALLD. The ABS will use probabilistic linkage methods, based on information such as geographic location, age and sex, to match records between the Census/survey collections and education datasets. Trial studies in association with the 2011 Census have been proposed (see Census Data Enhancement Project: An Update October 2010, ABS cat. no. 2062.0). These will test the viability of different linkage methodologies and provide evidence about the expected quality of linked datasets. We propose to test this element of the project during 2012 in conjunction with processing of the 2011 Census.

While not based on exact matching, the benefit of linkage between enrolment and Census data would be the provision of a consistent base of demographic information about those participating in education, such as their socio-economic status, Indigenous status and disability information. Since the Census covers the whole population, an additional benefit of linkage (and non-linkage) will be identification of the characteristics of those who are not engaged in education. Thus, enrolment data, in conjunction with the Census, will form a valuable research tool, particularly for analysing the effect of early education on a child's future education outcomes.
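The use of non-linkage described above can be sketched as follows: Census records for children in the relevant age group that have no link to an enrolment record are profiled by a characteristic of interest. The record layout, the "remoteness" categories and the function name are assumptions for the example. An unlinked record may also reflect a missed link rather than genuine non-participation, so results of this kind would need to be read alongside linkage quality measures.

```python
from collections import Counter


def unlinked_profile(census_records, linked_census_ids, characteristic):
    """Count a characteristic among Census records with no education link."""
    unlinked = [r for r in census_records if r["id"] not in linked_census_ids]
    return Counter(r[characteristic] for r in unlinked)


# Made-up Census records for four-year-olds.
census_children = [
    {"id": "C1", "age": 4, "remoteness": "Major city"},
    {"id": "C2", "age": 4, "remoteness": "Remote"},
    {"id": "C3", "age": 4, "remoteness": "Remote"},
    {"id": "C4", "age": 4, "remoteness": "Major city"},
]

# Census records that were linked to a preschool enrolment record (made up).
linked_to_preschool = {"C1", "C4"}

profile = unlinked_profile(census_children, linked_to_preschool, "remoteness")
print(profile)
```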

3: Linking to education performance measures
A third element in the construction of the ALLD would be to integrate the results of educational assessments, such as the AEDI or NAPLAN tests. This element of the project would be undertaken in collaboration with the agencies responsible for these data and could be implemented once the viability of integration has been assessed and data protocols have been established.

Integrating ECEC, schools and Census data with education performance measures means greater potential to examine how students with different backgrounds perform and develop knowledge and skills over time.

4: Linking to education and training beyond school
A fourth element in the construction of the ALLD is the integration of VET and Higher Education data. Linking to these data sources would provide users with a more detailed picture of the educational pathways that students can take throughout their lifetime, and provide insight into the association between school and post-school participation and education performance. However, it is recognised that such pathways may not be entirely continuous, as some people may enrol in VET or Higher Education as mature-aged students rather than progress immediately from school. The feasibility of linkage, including the resultant quality of integrated data, would need to be fully explored. There are a number of options for incorporating these data into the ALLD, and the ABS will continue to explore these with the relevant data custodians. It may be feasible to incorporate data from the school and VET/Higher Education sectors by the time of the 2016 Census, or to pilot test linkage with the 2011 Census.

5: Linking to post-education outcomes
A fifth element of the ALLD project is linking to post-education outcomes, which may be examined using linkage to successive cycles of the Census, or to national surveys. This stage of the ALLD is noted separately for completeness, but is an aspect of the second element (linking to the Census and surveys), discussed above in relation to early childhood and schooling. It could be incorporated as soon as effective linkage to the Census has been established for earlier stages and VET/Higher Education datasets are part of the ALLD system. The timing of this element could coincide with the linkage to VET and higher education datasets described in element four.

Post-education measures from the Census, with a consistent set of labour force outcome measures including occupation, industry, hours worked and income, would complement the more specialised information available from graduate destination surveys. A longitudinal learning database incorporating Census data would also provide information on socio-demographic characteristics and enable comparisons between groups.

6: Other possibilities
Other possibilities for the ALLD may include the integration of datasets from other domains such as health and community services. As the ALLD develops, the ABS will respond to user requirements for statistical outputs that relate education to other areas of social concern, and continue to consider public views on appropriate design and scope.

CONFIDENTIALITY, PRIVACY, SECURITY AND ACCESS

The ALLD would comprise high quality, confidentialised and integrated data. It would not comprise a complete set of exactly matched records from different sources, but rather a coherent statistical and research base constructed through probabilistic linkage that draws together data from the different sources. Maintaining confidentiality and the security of data will be paramount to the success of the ALLD.

The use and release of data from the ALLD project will be governed by the provisions of the Census and Statistics Act 1905 and the Privacy Act 1988. In addition, governance, storage and analysis of the ALLD data will be informed by national guidelines such as the High level principles for statistical data integration across Australian Government as endorsed by Portfolio Secretaries.

The ABS is currently improving the flexibility of secure analysis facilities for unit record data. Analysis of the ALLD data for approved statistical and research projects would be through the proposed Remote Execution Environment for Microdata (REEM). While the current focus of the REEM development is the analysis of household survey data, it is envisaged that later stages of REEM development will explore the analysis of linked datasets.

The key components of the REEM are the Survey Table Builder (similar to the Census Table Builder) and an Analysis Service (for statistical analysis). These services will enable researchers to analyse detailed microdata in a way that ensures no information likely to enable the identification of an individual is viewed or released. Confidentiality routines will ensure that statistical outputs are confidentialised in line with ABS legislative requirements and can be released as public use statistical outputs (that is, they can be published and shared with others without restrictions). The REEM will use internationally recognised standards for the exchange of data and metadata including the use of the Data Documentation Initiative (DDI), Statistical Data and Metadata Exchange (SDMX) and machine-to-machine interfaces (APIs).
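As a rough illustration of what a confidentiality routine does, the sketch below applies a simple small-cell suppression rule to a table of counts before release. Small-cell suppression is one common disclosure-control technique; it is used here only as an example and is not necessarily the routine the REEM would apply. The threshold, function name and table contents are assumptions.

```python
def confidentialise(table, min_cell=5):
    """Suppress counts between 1 and min_cell - 1 by replacing them with None."""
    return {
        cell: (None if 0 < count < min_cell else count)
        for cell, count in table.items()
    }


# Made-up cross-tabulation of remoteness by preschool enrolment status.
raw_table = {
    ("Remote", "Enrolled"): 3,
    ("Remote", "Not enrolled"): 12,
    ("Major city", "Enrolled"): 250,
    ("Major city", "Not enrolled"): 0,
}

safe_table = confidentialise(raw_table)
print(safe_table)
```

A production routine would need to go further, for example by protecting suppressed cells from being recovered through row and column totals.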

SUMMARY

Data integration using data sources from across Australia promises significant and cost-effective improvements in official statistics for statistical and research purposes, and for evidence-based policy. The ALLD would use data already collected by education authorities for administrative purposes, thereby avoiding expensive and potentially burdensome new collections from individuals. The ALLD promises to make more effective use of current data for statistical and research purposes.

Since it is based on probabilistic linkage, construction of the ALLD could commence immediately without the implementation of a unique student identifier for each Australian student. Current investigations into linkage of school enrolment records over time and the proposed 2011 Census data linkage studies (Census Data Enhancement Project: An Update October 2010, ABS cat. no. 2062.0) will provide evidence about the quality of matching achieved without a statistical linkage key.

The legislative framework of the ABS provides both the motivation for undertaking projects to maximise the use of existing data for statistical purposes and the safeguards on its confidentiality and security. As the ALLD project develops, the ABS would continue to engage with governments and the community to ensure that there is broad acceptance of this project, that data are held within a safe and secure environment, and that there are suitable processes for researchers to analyse the data.

The ALLD project is sponsored by the Strategic Cross-sectoral Data Committee for early childhood, education and training which reports to the ministerial councils within the education and training sectors.

We are interested in receiving feedback on this proposal. Please email your comments on this paper to: SCDCSecretariat@abs.gov.au or mail to:

Director
National Centre for Education and Training Statistics
Australian Bureau of Statistics, Locked Bag 10
BELCONNEN A.C.T. 2616

  • This website is managed and maintained by the Australian Bureau of Statistics.