GLOSSARY

Includes any dataset containing information collected by, or on behalf of, the Australian government or any dataset containing information collected by another Australian jurisdiction and provided to the Australian government for the common good.

Confidentialise
The protection of data confidentiality through the removal or altering of information, collapsing detail within a dataset, or aggregation and related techniques. For more information about confidentiality and how to confidentialise data see the Confidentiality Information Series.

Confidentiality
The legal and ethical obligation to the provider of information to maintain and protect the privacy and secrecy of that information. Also see Confidentialise. For more information about confidentiality and how to confidentialise data see the Confidentiality Information Series.

Content data
Refers to service or clinical information contained in a health record. It does not include demographic information.

Cross Portfolio Data Integration Oversight Board
This Board was established in 2011 to oversee the development of a cross government environment that is safe and effective for data integration involving Commonwealth data for statistical and research purposes. The Board is chaired by the Australian Statistician and membership includes the Secretaries of the Department of Families, Housing, Community Services and Indigenous Affairs; the Department of Health and Ageing; and the Department of Human Services. For more information see Cross Portfolio Data Integration Oversight Board Terms of Reference.

Data custodians
Agencies responsible for managing the use, disclosure and protection of source data used in a statistical data integration project. Data custodians collect and hold information on behalf of a data provider. The role of data custodians may also extend to producing source data, in addition to their role as a holder of datasets.

Data integration
See Statistical Data Integration.

Data linking
An element in the process of data integration. Data linking creates links between data from different sources based on common features present in those sources. Also known as 'data linkage'.

Data provider
An individual, household, business or other organisation which supplies data either for statistical or administrative purposes.

Data user
A person involved in accessing and investigating integrated datasets for statistical and research purposes. Data users include academics working in research institutions and employees undertaking research in Commonwealth and State/Territory agencies.

De-identified data
De-identified data is data that has had any identifiers (i.e. information that directly establishes the identity of an individual or organisation, such as name, address, Australian Business Number) removed.

Deterministic (exact) linking
Linking records belonging to the same unit by using a unique identifier such as Australian Business Number.

End users
People who examine research findings rather than produce outputs. Examples include employees undertaking research in public and private sector organisations, representatives from media outlets and consumer advocacy groups, and members of the wider community.

Ethics approval
A judgement made by an approved Human Research Ethics Committee that a human research proposal meets the requirements of the National Statement on Ethical Conduct in Human Research and is ethically acceptable before the commencement of such research.

Ethics committee
Shortened form of Human Research Ethics Committee (HREC). HRECs protect the welfare and rights of participants involved in research. HRECs review proposals for research that involves humans, monitor the conduct of research and deal with complaints that arise from research. In the context of data integration involving Commonwealth data, some data custodians require that an ethics committee must approve a data integration project prior to the release of data.
Ethics approval does not however guarantee that approval for data release will be given. More information on HRECs, including a list of registered HRECs, is available from the National Health and Medical Research Council.

Exact linking
See Deterministic linking.

Governance arrangements
The way that decisions are made, how they are communicated, how they are monitored and the extent to which sanctions are imposed for non-compliance.

Identifiable data
Identifiable data enables a person to establish the identity of a person or organisation to which some data relate. The identity of a person or organisation could be established directly if the dataset contains identifiers such as name and address, or indirectly if there is a combination of information in the dataset from which their identity can be deduced.

Identifier
Information that directly establishes the identity of an individual or organisation. Examples of identifiers are: name, address, driver's licence number, Medicare number and Australian Business Number. Also known as direct identifier.

Institutional arrangements
The organisation of activities associated with data integration, along with the characteristics and roles of institutions involved in such activities.

Integrated dataset
A dataset created by bringing together two or more datasets, generally at the unit level (i.e. for an individual person or business) or micro level (e.g. information for a small geographic area), for statistical and research purposes.

Integrating Authority
An Integrating Authority (IA) is the single agency ultimately accountable for the sound conduct of the statistical data integration project, leading it through its approval and implementation. For more information, see the paper on ‘Rights, roles and responsibilities of Integrating Authorities’.

Microdata
Microdata are unit record data, i.e. data for an individual person or organisation.

Personal information
Information or an opinion (including information or an opinion forming part of a database), whether true or not, and whether recorded in a material form or not, about an individual whose identity is apparent, or can reasonably be ascertained, from the information or opinion.
Source: Privacy Act 1988

Privacy
An individual’s right to have their personal information managed so that it is kept confidential except where informed consent has been given, or a legal authority exists, in accordance with the requirements of the Privacy Act 1988.

Privacy Act
The Privacy Act 1988 regulates the collection, storage, use and disclosure of personal information by Commonwealth and ACT government agencies and certain private sector organisations. Section 14 of the Privacy Act sets out 11 Information Privacy Principles that govern the conduct of Commonwealth agencies in their collection, management and use of data containing personal information. The Information Privacy Principles do not permit agencies to use or disclose, in identifiable form, records of personal information for research and statistical purposes, unless specifically authorised or required by another law, or the individual has consented to the use or disclosure. The states and territories have their own regulations governing privacy of personal information.

Privacy Impact Assessment
An assessment tool that describes the personal information flows in a project, and analyses the possible privacy impacts that those flows, and the project as a whole, may have on the privacy of individuals. The aim of a Privacy Impact Assessment is to identify and recommend options for managing, minimising or eradicating privacy impacts. For more information on Privacy Impact Assessments see Privacy Impact Assessment Guide, August 2006, Office of the Australian Information Commissioner, www.privacy.gov.au.

Probabilistic linking
Data linking based on the relative likelihood that two records belong to the same unit given a set of similarities/differences between the values of the linking variables (e.g. name, date of birth, sex) on the two records.

Providers
See Data provider.

Re-identifiable data
Data from which identifiers have been removed and replaced by a code, but it remains possible to re-identify a specific individual by, for example, using the code or linking different datasets.

Research purposes
Activities to investigate or explain phenomena, which result in statistical outputs or conclusions drawn in relation to population groups and not in relation to specific individuals, households, businesses or organisations.

Separation principle
The separation principle is one mechanism to protect the identities of individuals and organisations in datasets. The separation principle means that no-one can see the identifying or demographic information, used to identify which records relate to the same person or organisation (e.g. name, address, date of birth), in conjunction with the content data (e.g. clinical information, benefit information, company profits). Instead, staff can see only the information they need to do the linking or analysis. So, rather than someone being able to see that John Smith has a rare medical condition, or the profits earned by Company X, the person doing the linking sees only the information needed to do the linking (e.g. John Smith’s name and address) and the analyst just sees a record, with no identifying information, showing that a person has a rare medical condition together with any other variables needed for analysis (e.g. broad age group, sex).
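
As an illustration only, the separation principle can be sketched in a few lines of Python. This is a minimal sketch, not a description of how any Integrating Authority actually implements it. The record layout, the field names, the sample values and the `separate` function are all hypothetical. The idea shown is that a source file is split into a linking file (identifying fields only) and an analysis file (analysis variables only), and the two share nothing except a random record key.

```python
import secrets

# Hypothetical source records, holding identifying and analysis fields together
# as a data custodian might. All names and values are made up for illustration.
source_records = [
    {"name": "John Smith", "address": "1 Example St", "dob": "1970-01-01",
     "age_group": "50-59", "sex": "M", "condition": "rare condition A"},
    {"name": "Jane Doe", "address": "2 Sample Rd", "dob": "1985-06-15",
     "age_group": "30-39", "sex": "F", "condition": "condition B"},
]

# Assumed split of fields; a real project would define this per dataset.
IDENTIFYING_FIELDS = ("name", "address", "dob")
ANALYSIS_FIELDS = ("age_group", "sex", "condition")


def separate(records):
    """Split records into a linking file and an analysis file.

    The two files share only a random record key, so neither file on its own
    shows identifying information alongside content data.
    """
    linking_file, analysis_file = [], []
    for record in records:
        key = secrets.token_hex(8)  # random key that carries no meaning
        linking_file.append(
            {"key": key, **{f: record[f] for f in IDENTIFYING_FIELDS}}
        )
        analysis_file.append(
            {"key": key, **{f: record[f] for f in ANALYSIS_FIELDS}}
        )
    return linking_file, analysis_file


linking_file, analysis_file = separate(source_records)
```

In this sketch the person doing the linking would be given only `linking_file` and the analyst only `analysis_file`. Because a key connects the two files, the analysis file on its own corresponds to re-identifiable data in the sense defined above, so access to the key mapping would itself need to be controlled.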

Using data for statistical and research purposes means using it to describe characteristics of groups within the population, and relationships that might exist between variables such as social and economic conditions, behaviours and outcomes. It precludes use of a dataset for administrative or client management purposes (e.g. it cannot be used for detecting fraud nor for ensuring compliance), where there is an impact on specified individuals.

Statistical disclosure control
Involves managing the risks of an individual or organisation being identified, either directly or indirectly through released data. This risk is managed by confidentialising the data to minimise the risk of identification.

Statistical disclosure control techniques
Techniques for confidentialising a dataset to minimise the risk that the identity of a particular individual or organisation may be disclosed. Two broad statistical disclosure control techniques are data reduction methods which aim to control or limit the amount of detail available without compromising the usefulness of the information available for research, and data modification methods (perturbation) which involve changing the data slightly to reduce the risk of disclosure.

Statistical outputs
The result of any collection, storage, analysis and transformation of data where the individual statistical unit is of no interest in itself, and the results are presented in a form that does not reveal information about identifiable individuals.

Statistical purposes
Purposes which support the collection, storage, compilation, analysis and transformation of data for the production of statistical outputs, and the dissemination of those outputs and information describing them. Statistical purposes include the collection or use of information to provide for the drawing of a sample of statistical units for data collection.
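
The two broad families named under Statistical disclosure control techniques (data reduction and data modification) can be illustrated with a short Python sketch. It is illustrative only. The function names, the ten-year band width, the minimum cell size of 3 and the noise range of plus or minus 2 are assumptions chosen for the example, not values prescribed by this glossary or by any agency.

```python
import random
from collections import Counter


def age_band(age, width=10):
    """Collapse an exact age into a broad band such as '20-29'."""
    lower = (age // width) * width
    return f"{lower}-{lower + width - 1}"


def reduce_detail(ages, width=10, min_cell=3):
    """Data reduction: count people per age band and suppress small cells.

    Cells with fewer than `min_cell` people are replaced by None so that
    they are not released. The threshold is an assumed example value.
    """
    counts = Counter(age_band(age, width) for age in ages)
    return {band: (n if n >= min_cell else None) for band, n in counts.items()}


def perturb(counts, max_shift=2, seed=None):
    """Data modification: add a small random shift to each count.

    Counts never go below zero. The shift range is an assumed example value.
    """
    rng = random.Random(seed)
    return {
        band: max(0, n + rng.randint(-max_shift, max_shift))
        for band, n in counts.items()
    }


# Hypothetical unit record ages.
ages = [23, 27, 29, 31, 45, 46, 48, 49, 71]

reduced = reduce_detail(ages)
released = {band: n for band, n in reduced.items() if n is not None}
perturbed = perturb(released, seed=1)
```

Here `reduce_detail` limits the detail available (exact ages become bands and sparse bands are withheld), while `perturb` changes the released counts slightly. In practice the choice of band width, threshold and noise would depend on the dataset and the assessed disclosure risk.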