Information (including personal information) collected by agencies for the administration of programs, policies or services and with the potential to be used for statistical purposes. Administrative data is one type of microdata.
Aggregate data are produced by grouping information into categories and aggregating values within these categories. For example, a count of the number of people of a particular age (obtained from the question ‘In what year were you born?’).
Aggregate data is typically presented in tables. Aggregate data is also referred to as tabular data or macrodata.
This term is most commonly used to refer to data from which direct identifiers have been removed (de-identified data), but it is sometimes also used to refer to confidentialised data. To avoid confusion the more specific terms de-identified data and confidentialised data are used in the Confidentiality Information Series.
Cell concentration rule
See Cell dominance rule.
A rule commonly applied to cells in a table to assess whether a cell may enable identification. The cell dominance rule (also called the cell concentration rule) is used to identify cells where a small number of data providers contribute a large percentage to the cell. If a cell fails this rule further investigation or action is needed to ensure that identification is unlikely.
For more information see Confidentiality Information Sheet 4: ‘How to confidentialise data: the basic principles’.
Data that will allow identification of an individual or organisation, either directly or indirectly.
To remove or alter information, or collapse detail within a dataset, to ensure that no person or organisation is likely to be identified in the data (directly or indirectly).
For more information see Confidentiality Information Sheet 1: 'Confidentiality: what is it and why is it important?’ and Confidentiality Information Sheet 4: ‘How to confidentialise data: the basic principles'.
Confidentialised Unit Record Files (CURFs)
Confidentialised Unit Record Files (CURFs) are files containing microdata that have been de-identified and modified to protect individuals or organisations from either direct or indirect identification.
‘An obligation to the provider of information to maintain the secrecy of that information.’
Source: UN Economic Commission for Europe, 2009
For more information see Confidentiality Information Sheet 1: ‘Confidentiality: what is it and why is it important?’
Rules that are applied to each cell in a table to identify table cells that pose a risk of identification (disclosure). Two common rules are the frequency rule and the cell dominance rule.
The organisation or agency which is responsible for the collection, use and disclosure of information in a dataset. Data custodians have an obligation to keep the confidential information they are entrusted with secret.
On-site data laboratories provide access to detailed microdata at a secure site controlled by the data custodian.
An individual, household, business or other organisation which supplies data either for statistical or administrative purposes.
A technique used to confidentialise data. Data reduction methods aim to control or limit the amount of detail available to avoid identification of a particular individual or business. Data reduction methods include combining categories of information or suppressing information for unsafe cells.
For more information about techniques to confidentialise data see: Confidentiality Information Sheet 4: ‘How to confidentialise data: the basic principles'.
Involves slightly altering small cells in a table to ensure results from analysis based on the data are not significantly affected, but the original values cannot be known with certainty. Data rounding may be random or controlled.
For more information about data rounding techniques see: Confidentiality Information Sheet 4: ‘How to confidentialise data: the basic principles'.
Data that have had any identifiers removed. May also be referred to as unidentified data.
Occurs when a direct identifier is included with the data that can be used to establish the identity of an individual or organisation.
Disclosure occurs when a person or an organisation recognises or learns something that they did not already know about another person or organisation through released data.
Managing the risks of an individual or organisation being identified either directly or indirectly.
A rule commonly applied to cells in a table to assess whether a cell may enable identification. The frequency rule (also called the threshold rule) sets a threshold value for the minimum number of individuals or businesses in any cell. Common threshold values are 3, 5 and 10. If a cell fails this rule further investigation or action is needed to ensure that identification is unlikely.
Able to be identified either directly or indirectly.
See Direct identification, Identified data and Indirect identification.
Data that include an identifier.
An identifier (direct identifier) is information that directly establishes the identity of an individual or organisation. Examples of identifiers are: name, address, driver's licence number, Medicare number and Australian Business Number.
Occurs when the identity of an individual or organisation is disclosed, not through the use of direct identifiers, but through a combination of unique characteristics.
See Aggregate data.
Unit record data where each record represents observations for an individual or organisation. Unit record data may contain individual responses to questions on a survey questionnaire or administrative forms. For example, answers given to the question ‘In what year were you born?’.
An unusual value that is correctly reported but is not typical of the rest of the population.
'Information or an opinion (including information or an opinion forming part of a database), whether true or not, and whether recorded in a material form or not, about an individual whose identity is apparent, or can reasonably be ascertained, from the information or opinion.’ (Privacy Act 1988)
Personal information is information that identifies, or could identify a person. There are some obvious examples of personal information, such as name or address. Personal information can also include medical records, bank account details, photos, videos, and even information about what a person likes, their opinions and where they work - basically, any information through which person is reasonably identifiable.
Information does not have to include a name to be personal information. For example, in some cases, date of birth and post code may be enough to identify someone.
A technique used to confidentialise data. Perturbation is a data modification method that involves changing the data slightly to reduce the risk of disclosure while retaining as much content and structure as possible.
Perturbation techniques include data rounding or data swapping.
For further information about techniques to confidentialise data see: Confidentiality Information Sheet 4: ‘How to confidentialise data: the basic principles'.
The individual’s right to have their personal information managed so that it is kept confidential except where informed consent has been given to release the information, or a legal authority exists, in accordance with the requirements of the Privacy Act 1988.
An individual, household, business or other organisation which provides data either to statistical collections or administrative collections. May also be referred to as a respondent.
Occurs when a user tries to find a particular person or organisation with a set of characteristics known to the user.
Remote access facility
Remote access facilities are used by agencies around the world to enable approved researchers to submit data queries through a secure internet-based interface from their desktop. The request is run against a confidentialised unit record file, which is securely stored within the data custodian's computing environment.
The presence of a rare characteristic in the data can pose an identification risk, depending on how remarkable (or extraordinary or noticeable) the characteristic is. This might include people in unusual jobs, very large families or young people with very high educational qualifications.
In the context of confidentiality, risk management involves identification and management of the risk of disclosure in accordance with the impact and likelihood of a disclosure occurring and within the constraints provided by legislation and policies.
Further information about the process of managing microdata disclosure risks is in Confidentiality Information Sheet 5: 'Confidentiality: managing the risks of disclosure in microdata’.
Safe storage and access to held data, including physical security of buildings and IT security.
An identification made without any deliberate attempt.
Purposes which support the collection, storage, compilation, analysis and transformation of data for the production of statistical output, and the dissemination of those outputs and the information describing them.
This means that information cannot be used for administrative, regulatory, law enforcement or other purpose that affects the rights, privileges or benefits of particular individuals or organisations.
Data suppression involves not releasing information that is considered unsafe because it fails confidentiality rules being applied.
For further information about techniques to suppress data see: Confidentiality Information Sheet 4: ‘How to confidentialise data: the basic principles'.
See Aggregate data.
See Frequency rule.
See De-identified data.
Used to characterise the situation where an individual can be distinguished from all other members in a population or sample in terms of information available on microdata records. The existence of uniqueness is determined by the size of the population or sample, the degree to which it is segmented by geographic information and the number and detail of characteristics provided for each unit in the dataset.
Unit record data
For more information about confidentiality, or to provide feedback on this series, please email: firstname.lastname@example.org