This chapter outlines the key issues to be considered when designing a survey or an administrative collection. It covers topics relating to survey design, data collection, sample methodology, questionnaire development and testing, and respondent load.
3.1 SURVEY DESIGN
The design of the statistical collection, whether it is a survey or an extraction from an administrative system, should consider all aspects from collection of the data through to dissemination and end use of the statistics. These aspects are highly interrelated and the design should ensure that all stages link together effectively to satisfy the objectives of the statistical collection.
3.1.1 Data Requirements
Detailed data requirements should be specified early in the collection process to ensure sufficient time to properly test new items and resolve potential data quality issues.
It is useful to consider outputs rather than the questions to be asked in the collection, as this encourages consideration of how the data will be used. Data requirements should be expressed in terms of the concepts being measured and the outputs to be produced. Specifications should be as precise as possible to meet the specific data requirements of users. Clear definition of data items will assist in achieving a well designed questionnaire.
Data specifications are an essential part of the process because they will largely determine whether existing data sources (e.g. administrative data or an existing collection) can be used or if a new collection is required. Further, data specifications will largely drive the choices of collection methodology, sample selection, data processing and analysis.
Design should also be based on requirement definitions. For example, a sample design developed to support estimation at state level may not allow meaningful analysis at lower geographical levels.
3.1.2 Units of Analysis
Units are the entities about which data is collected. They are the objects of observation. These can be physical units (e.g. persons, families, households, business units), events (e.g. births, school enrolments) or transactions (e.g. sales). The unit of analysis is an important characteristic of statistical output and should be defined during survey design.
3.1.3 Survey Population
In statistical collections the population refers to a set of units about which data is required. Statistical populations have two attributes:
· Scope or target population refers to the boundaries of the units of data collection or analysis. The target population should be defined in terms of content (e.g. all persons), units (e.g. households), extent (e.g. Australia) and timeframe (e.g. quarterly). In identifying the target population for a collection where more than one analysis unit is used, the lowest level of unit should be used to identify the target population (e.g. persons in Australia on 1/1/2009, retail establishments in NSW with turnover greater than $50,000 in 2008-09).
· The coverage of the survey is the subset of the target population that is actually surveyed. With limited resources, certain "difficult" parts of the population may not be covered by the survey sample, e.g. homeless people, people with poor English language proficiency. When producing estimates, the sampled units should therefore also represent some of the characteristics of the subpopulations left unsampled because of coverage limitations.
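As a minimal sketch of how a scope definition can be written down precisely, the retail example above can be expressed as a rule applied to each unit on a frame. The `Establishment` record, its field names and the four sample units are hypothetical and serve only to illustrate the idea.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Establishment:
    """One unit on a hypothetical business frame."""
    name: str
    industry: str        # content, e.g. "retail"
    state: str           # extent, e.g. "NSW"
    turnover: float      # annual turnover in dollars
    financial_year: str  # timeframe, e.g. "2008-09"


def in_scope(unit: Establishment) -> bool:
    """Scope rule: retail establishments in NSW with turnover above $50,000 in 2008-09."""
    return (
        unit.industry == "retail"
        and unit.state == "NSW"
        and unit.financial_year == "2008-09"
        and unit.turnover > 50_000
    )


# Made-up frame entries used only to exercise the rule.
frame = [
    Establishment("Corner Books", "retail", "NSW", 120_000, "2008-09"),
    Establishment("Harbour Cafe", "hospitality", "NSW", 300_000, "2008-09"),
    Establishment("Tiny Kiosk", "retail", "NSW", 40_000, "2008-09"),
    Establishment("Yarra Shoes", "retail", "VIC", 90_000, "2008-09"),
]

target_population = [unit for unit in frame if in_scope(unit)]
print([unit.name for unit in target_population])
```

Writing the scope as an explicit rule makes it easier to check that content, extent and timeframe have all been specified.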
3.1.4 Data Items
Data items should be identified and defined early in the design phase to ensure that the collection meets the statistical needs of users. Output data items are the building blocks for data analysis, results, outputs and outcomes in a statistical collection. Data items form the basis for the survey form design and question wording. Data items need to be carefully converted into questions, and any new questions should be thoroughly tested before being included in the survey. The data items, along with their metadata, should be linked with the data management strategy.
Output items which involve combining other data items are referred to as derived items. For example, the collection unit of a survey of the labour force would be persons, and a data item collected would be hours worked. The labour force standards and classification framework classifies the population aged 15 and over into three categories: employed, unemployed and not in the labour force. The labour force is a derived item, obtained by adding the employed and unemployed categories. See Chapter 9 – Data Management for further information on handling of data from acquisition through processing, output and storage.
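The labour force example can be sketched in a few lines of code. The person records and field names below are hypothetical; the only rule taken from the text is that the labour force is the sum of the employed and unemployed categories.

```python
from collections import Counter

# Hypothetical person-level records: each person aged 15 and over is
# classified into one of the three labour force status categories.
persons = [
    {"id": 1, "status": "employed", "hours_worked": 38},
    {"id": 2, "status": "unemployed", "hours_worked": 0},
    {"id": 3, "status": "not in the labour force", "hours_worked": 0},
    {"id": 4, "status": "employed", "hours_worked": 20},
    {"id": 5, "status": "employed", "hours_worked": 45},
]

status_counts = Counter(person["status"] for person in persons)

# Derived item: labour force = employed + unemployed.
labour_force = status_counts["employed"] + status_counts["unemployed"]
print(labour_force)
```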
3.1.5 Standards and Classifications
Standards and classifications refer to a standard set of concepts and rules which define the way data is collected, grouped, analysed and reported. The use of standards and classifications helps to ensure consistency in the use of concepts and definitions in statistical collections. Standards and classifications allow comparison of statistics over time and between collections, and assist in integrating data from different organisations. The ABS has developed a wide range of standards and classifications covering topics such as industry, education, occupation, countries and languages. See Chapter 10 – Statistical Infrastructure for additional information on ABS standards and classification frameworks.
3.2 DATA COLLECTION METHODS
The success of the survey will depend to a large extent on the suitability of the collection method chosen. A wide range of data collection methods are available. The balance between the objectives and resourcing may constrain the choice of methods for a survey.
Some of the factors to consider in choosing the collection method are:
· Complexity of topic and nature of questions: issues include the need for respondents to access records (e.g. financial statements), the need to provide detailed explanations to respondents and the inclusion of questions on sensitive topics.
· Response rates: the choice of collection method can significantly influence the response rate. For example, personal interviews, although relatively expensive, usually achieve a better response rate than mail surveys, telephone surveys or internet surveys.
· Respondent preference: a collection method which fits in with the respondents’ lifestyle may provide better response. For example, some respondents may have a strong preference for completing survey forms over the internet.
· Sample frame and target population: certain collection methods will not be suitable for certain target populations. For example, a mail or phone survey would not be suitable for a survey of homeless people.
Some of the most common collection methods are described below. A mix of different methods may be required to achieve the best results. For example, a face to face interview could be supplemented with a self enumeration form to enable respondents to provide sensitive details they may not be comfortable telling an interviewer.
3.2.1 Face to Face Interviews
Face to face interviews are used mainly for household surveys. Personal interviews involve a trained interviewer visiting a respondent, asking questions and recording responses.
Advantages:
· enables the interviewer to explain the purpose of the collection
· easier to gain the trust of respondents which may increase response rate and data quality
· enables interviewers to use aids such as prompt cards if required to help respondents
· allows longer interviews than telephone interviews
· suits some categories of respondents particularly (e.g. recent immigrants from non-English background who would find it difficult to complete a self-enumeration survey form)
Disadvantages:
· expensive method, training and travel costs are substantial
· data can be subject to interviewer bias (caused by interviewer's appearance, attitude etc)
· respondents may be reluctant to disclose sensitive or private information to interviewers
· interviewers with appropriate language and communication skills may not be available
3.2.2 Telephone Interviews
Telephone surveys are used for both household and business surveys. Telephone data collection methods are also widely used for follow-up and post-enumeration work in conjunction with other collection methods.
Advantages:
· lower costs, for example, it is possible to cover all Australia from one call centre
· enables interviewer to build trust and to provide opportunity for further explanations, if required
· provides data more quickly
· lower bias from interviewer appearance, attitude etc
Disadvantages:
· may be limited by time and by the number and complexity of questions that can be asked
· respondents have some control over the interview process and can terminate the interview
· not all respondents may have access to a phone or be contactable by phone
· sometimes difficult to convince respondents of the authenticity and authority of the collecting organisation and interviewers, and of the confidentiality of information
· decreasing number of fixed land lines
3.2.3 Self-Enumeration Methods
In a mail (postal) survey, copies of the questionnaire are mailed out to respondents with a reply-paid envelope for the respondent to mail back the completed form. Follow up procedures are also conducted by mail.
Advantages:
· cheaper than personal interviews
· respondent is able to complete questionnaire in their own time
· respondent can take time to check records (e.g. financial documents)
· detailed written instructions and explanations can be included
· allows access to 'difficult-to-contact' respondents or respondents who need to consult others (e.g. accountants acting on behalf of small business clients)
Disadvantages:
· usually gets lower level of response because respondents have full control over whether to respond
· long time lag between when questionnaire is mailed out and the time it is returned
· respondents may misinterpret instructions or questions
· not appropriate for respondents with limited ability to read or write English unless survey forms can be provided in preferred language
Drop off Mailback and Drop off Pick-up
The questionnaire or survey form is dropped off at the respondent's address by an interviewer with instructions on how to complete the form. The form is either mailed back by the respondent or collected in person on or after a specified date.
Advantages:
· generally provide higher response rates than postal surveys
· usually less expensive than personal interviews
Disadvantages:
· using interviewers may cost more than postal surveys
· respondents may be away when interviewer visits
Whichever collection method is chosen, a range of data collection tools is available to meet the collection needs. The most popular are paper based tools, electronic tools, or a combination of the two. An electronic survey can involve either computer assisted interviewing or electronic forms.
3.2.4 Computer Assisted Interviewing (CAI)
In this approach, an interviewer records responses on a portable computer rather than on a paper form. This can be used for both face to face interviewing (Computer Assisted Personal Interviewing or CAPI) and telephone interviewing (Computer Assisted Telephone Interviewing or CATI).
Advantages:
· sequencing of questions is controlled by the computer, allowing complex sequencing
· allows some editing and querying of responses to be carried out at the time of interview, improving data quality
· data entry and some coding and processing can occur at the same time as the interview, which reduces collection cost and increases collection efficiency
· 'call scheduling' for telephone interviewing allows calls to be rescheduled if the phone is engaged or respondents are not available
· allows performance of interviewers to be monitored
Disadvantages:
· higher set-up and maintenance cost of computer equipment and training of interviewers
See the paper Review of CATI Procedures in Overseas Statistical Agencies for some information on overseas experiences with CATI systems.
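Computer-controlled sequencing and editing at the time of interview can be illustrated with a short sketch. The question identifiers, the routing rule and the 0 to 168 hours edit are all hypothetical; a real CAI instrument would be built in specialised software.

```python
def next_question(current, answers):
    """Return the id of the next question, or None when the interview ends."""
    if current == "Q1_employed":
        # Only employed respondents are asked about hours worked.
        return "Q2_hours" if answers["Q1_employed"] == "yes" else "Q3_looking"
    if current in ("Q2_hours", "Q3_looking"):
        return None
    raise ValueError(f"unknown question: {current}")


def edit_check(question, value):
    """Return a query message for an implausible answer, else None."""
    if question == "Q2_hours" and not 0 <= value <= 168:
        return "Hours worked in a week must be between 0 and 168."
    return None


def run_interview(responses):
    """Walk the questionnaire using scripted responses; return (answers, queries)."""
    answers, queries = {}, []
    question = "Q1_employed"
    while question is not None:
        value = responses[question]
        message = edit_check(question, value)
        if message:
            # In a live interview the interviewer would query this answer on the spot.
            queries.append((question, message))
        answers[question] = value
        question = next_question(question, answers)
    return answers, queries


answers, queries = run_interview({"Q1_employed": "yes", "Q2_hours": 200})
print(list(answers), queries)
```

Because routing is decided by the program, the interviewer never sees questions that do not apply, and implausible answers are flagged while the respondent is still available.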
3.2.5 Electronic Form
An electronic form (e-form) is a questionnaire or survey form sent to the respondent's computer via e-mail or accessed via the Internet.
Advantages:
· data entered in e-form can be directly captured and edited in the host computer, saving manual data entry and processing costs and improving data quality
· use of electronic returns produces faster response than other self-enumeration methods
· automatic sequencing can be built into the form so that only questions relevant to the respondent are visible
· once created the forms can be easily modified for future use with relatively little effort or expense
· the form can be sent to anyone with internet access, even in geographic locations where other methods (e.g. mail forms, face-to-face interviews) would be difficult or impossible
Disadvantages:
· the costs involved in developing the forms, maintaining the systems and ensuring the security of data can be high
· completing e-forms may require respondents to have compatible software and help desk staff may be necessary to support users
· relies on high levels of internet access in the target population
· respondents may be reluctant to use e-forms due to concerns about privacy and security issues, compromising response rates
3.2.6 Internet Panel
Also known as an online panel, an internet panel is a pre-recruited group of individuals who have agreed to participate in online surveys. Internet panels share some of the advantages of e-form surveys, namely more efficient data capture and reduced costs. Internet panels can also facilitate longitudinal surveys, using the same set of respondents to study changes over time.
3.3 SAMPLE METHODOLOGY
Sample design refers to what a sample consists of and how it is to be obtained. It is concerned with defining the population and frame, sample size, and sampling techniques.
The combination of sample design and estimation methodology should attain the best possible precision under the given budget, or the lowest possible cost for a fixed precision. The choice of sample design should take into account the availability of auxiliary information, as this can be used in the selection and the estimation process to obtain more accurate estimates. The sampling method for a survey can range from simple random sampling to a complex multistage design.
See Basic Survey Design Manual (Chapter 7) in www.nss.gov.au for more detailed information on sample design and methodology issues.
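To illustrate the trade-off between precision and sample size (and hence cost), the sketch below uses the standard textbook formula for estimating a proportion under simple random sampling, with an optional finite population correction. It is an illustration only and not a method prescribed by this chapter; the function name, the 95% confidence multiplier of 1.96 and the example values are assumptions, and complex multistage designs need more elaborate calculations.

```python
import math


def srs_sample_size(margin, p=0.5, z=1.96, population=None):
    """Sample size needed to estimate a proportion p within +/- margin.

    Assumes simple random sampling. z is the normal multiplier for the chosen
    confidence level (1.96 for about 95%). If a population size is given,
    the finite population correction is applied.
    """
    n0 = z ** 2 * p * (1 - p) / margin ** 2
    if population is not None:
        n0 = n0 / (1 + (n0 - 1) / population)
    return math.ceil(n0)


n_large = srs_sample_size(0.05)                        # very large population
n_finite = srs_sample_size(0.05, population=10_000)    # population of 10,000 units
print(n_large, n_finite)
```

Tightening the margin increases the required sample size roughly with the inverse square of the margin, which is why precision targets need to be weighed against the budget.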
Administrative collections in general collect data from all individuals in a target population. Where information is to be extracted from a very large administrative dataset, it is often effective to use a sample of the dataset. Using a sample from an administrative dataset can help to reduce cost and time required for 'cleansing', processing and analysing large datasets. Reduction in the volume of data may often lead to only marginal reductions in accuracy, while substantially reducing the cost and time taken to produce results. In these cases good sample design principles must be followed in extracting the data. For longitudinal databases, methods that ensure a continued representative sample over time should be used.
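A minimal sketch of extracting a sample from a large administrative dataset is shown below. The records, the `benefit` field and the sample size are made up for illustration, and a simple random sample without replacement is only one of several designs that could be used.

```python
import random
import statistics

rng = random.Random(2009)  # fixed seed so the extract can be reproduced

# Hypothetical administrative dataset: one record per client.
records = [{"client_id": i, "benefit": 100 + (i % 50)} for i in range(20_000)]

# Simple random sample without replacement (5% of the records).
sample = rng.sample(records, k=1_000)

full_mean = statistics.mean(r["benefit"] for r in records)
sample_mean = statistics.mean(r["benefit"] for r in sample)
print(full_mean, round(sample_mean, 2))
```

Here the estimate from 1,000 records can be compared with the value from all 20,000, which shows how much accuracy is given up in exchange for a much smaller processing task.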
3.4 QUESTIONNAIRE OR SURVEY FORM DEVELOPMENT
The main function of a questionnaire or survey form is to collect accurate and relevant information from respondents. To achieve this, a questionnaire should:
· clearly state what information is to be collected
· include only an appropriate number of questions for respondents to complete in a reasonable time
· contain questions which appear in a logical sequence
· use a language that is easily understood by the respondents
· avoid bias and ambiguity in question wording or instructions
· have a user friendly, clutter-free and well laid out design
· provide adequate space for responses
· allow easy processing either manually or by automated means
See Basic Survey Design Manual (Chapter 8) in www.nss.gov.au for more detailed information on questionnaire development.
See the paper Towards Best Practice for Design of Electronic Data Capture Instruments in www.nss.gov.au for standards and guidelines for developing electronic statistical data capture instruments.
3.5 SURVEY FORM TESTING
Survey form testing allows problems to be identified and corrected prior to conducting the full survey. It also provides an opportunity to trial a number of aspects such as form design, question wording, provider burden, respondents’ reaction and the ideal timing for the actual survey. In some cases, data collected in the tests may be useful preliminary indicators of the actual survey results. The test can also be used to estimate likely response rates as well as sampling error and population variability. However, since the preliminary results are based on a smaller sample and therefore have higher standard errors, caution needs to be exercised in interpreting or using results from tests.
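The effect of the smaller test sample on standard errors can be illustrated with the usual formula for the standard error of a proportion. The sketch assumes simple random sampling and ignores design effects and the finite population correction; the sample sizes and the proportion of 0.3 are hypothetical.

```python
import math


def standard_error(p, n):
    """Standard error of an estimated proportion p from a simple random sample of size n."""
    return math.sqrt(p * (1 - p) / n)


pilot_se = standard_error(0.3, 100)     # hypothetical pilot test of 100 respondents
survey_se = standard_error(0.3, 2_500)  # hypothetical full survey of 2,500 respondents
print(round(pilot_se, 4), round(survey_se, 4), round(pilot_se / survey_se, 1))
```

With 25 times fewer respondents, the pilot estimate has a standard error five times larger, which is why test results should be treated as indicative only.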
Survey testing provides guidance on a number of aspects of the survey development including
· variability of the target population
· expected response rates
· most suitable data collection method
· appropriate form design, question and instruction wording
· suitability of tick box style questions
· effectiveness of interviewer training
· approximate cost, time and resource requirements
· organisational requirement for the survey or administrative collection
3.5.1 Types of Testing
The following types of testing are the most commonly used. All or some can be used depending on the resource availability, difficulty of the subject matter and the precision of responses required. Pilot tests and dress rehearsals are quantitative tests while the other types of tests are qualitative.
See ABS Statistical Clearing House’s Basic Survey Design Manual for more detailed information on survey testing procedures.
Focus groups involve informal discussions of issues or topics with small groups of people drawn from the survey population. They can be used early in the development of a survey to test the language, concepts and understanding of terminologies used in the questionnaire or survey form. Focus group sessions should be conducted before a questionnaire is drafted, although focus groups may also be used to test different form designs.
Pretesting is the process of informally testing a questionnaire or form design with potential respondents. The unstructured questionnaire is tested with a group of people who can provide feedback on issues such as their understanding of concepts, ability to answer questions, thought process of respondents, range of responses, flaws in question wording etc. Responses from pretesting should be used to improve form design and question wording.
Pilot testing is formally testing a questionnaire or survey form with a small sample of respondents. Open ended or semi-closed questions can be used to gather a range of likely responses which are used to develop a more highly structured questionnaire with closed questions. Pilot testing is used to identify any problems with the form or instructions to interviewers. It also allows the comparison of two or more versions of a questionnaire.
A dress rehearsal is a final trial run of the survey where the chosen sampling methodology is used to select a small sample from the target population. Dress rehearsals are used to detect any problems that may arise in the nearly final survey form design, question wording and data collection and processing systems. They may also provide an opportunity to obtain an estimate of survey costs and likely variation in responses in the population.
Post Enumeration Survey (PES)
PES is a survey of a sample of respondents and non-respondents after a test or survey has been conducted, with the aim of evaluating the quality of the responses. It is usually undertaken through structured interviews and utilises probing questions about how the respondent completed the form and their understanding of the concepts used in the survey. Since 1966, the Population Census in Australia has been followed by a PES to measure the extent of undercount or overcount.
The objectives of a PES are to:
· evaluate the accuracy of data collected in a survey
· confirm respondents’ understanding of concepts and definitions
· obtain information on source data used by respondents to answer survey questions
· evaluate the effectiveness of any changes to form design and question wording
· test the relevance of the data items collected
· gauge respondent load due to changes in form design and question wording
3.6 ADMINISTRATIVE DATA
The collection of statistical data by an organisation’s administrative systems is generally governed by the organisation’s program or policy requirements. Administrative data is collected from clients as part of program administration. Such information is generally stored in the organisation’s electronic database.
Fields, coding and edits can be applied to administrative collections to allow extraction of data for statistical purposes. Organisations collecting administrative data may be requested to collect other data items for statistical purposes. However, any increased reporting burden imposed on respondents due to extra data items should also be considered.
3.6.1 Extraction of Statistical Data from Administrative Systems
There are several ways through which statistical data can be extracted from an administrative system. These include:
Taking a snapshot
A snapshot is a picture of an administrative database at a point in time. In this method statistical users extract relevant data from an administrative database and load the data into a separate database for their use. The extracted data may be processed further through editing, aggregation and derivation before being used for analytical purposes.
The NSW Police, for example, extracts quarterly criminal activity data from its criminal records database and sends it to the NSW Bureau of Crime Statistics and Research. The Bureau loads the data into its own database for processing and analysis. The Bureau publishes crime statistics annually on behalf of the NSW Police.
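The snapshot method can be sketched as copying the records for a reference period into a separate store and processing the copy. The records, field names and reference quarter below are hypothetical and do not describe any agency's actual system.

```python
import copy
from collections import Counter
from datetime import date

# Hypothetical live administrative database (its records keep changing).
admin_db = [
    {"incident_id": 1, "offence": "theft", "recorded": date(2009, 1, 15)},
    {"incident_id": 2, "offence": "assault", "recorded": date(2009, 2, 3)},
    {"incident_id": 3, "offence": "theft", "recorded": date(2009, 3, 28)},
    {"incident_id": 4, "offence": "theft", "recorded": date(2009, 4, 2)},
]


def take_snapshot(db, start, end):
    """Copy the records recorded between start and end (inclusive) into a separate store."""
    return [copy.deepcopy(r) for r in db if start <= r["recorded"] <= end]


snapshot = take_snapshot(admin_db, date(2009, 1, 1), date(2009, 3, 31))

# A later change in the live system does not alter the snapshot.
admin_db[0]["offence"] = "fraud"

# Aggregation is carried out on the extracted copy.
offence_counts = Counter(r["offence"] for r in snapshot)
print(dict(offence_counts))
```

Working on a copy means statistical processing such as editing and aggregation cannot interfere with the operational system, and results can be reproduced for the same reference period.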
Generating a parallel record
In this method administrative agencies generate a separate statistical record for each administrative transaction.
Using a computer network, these records can be automatically added to the statistical and administrative system databases. This method allows statistical data users to define and create specific information they require. This reduces the reprocessing required to transform the information into statistical data after extraction from the administrative system.
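A minimal sketch of the parallel record idea follows. The two in-memory lists stand in for the administrative and statistical databases, and the choice of which items go into the statistical record (here, no client name or street address) is an assumption made for illustration.

```python
administrative_db = []  # stands in for the operational system
statistical_db = []     # stands in for the separate statistical database


def record_transaction(client_name, address, postcode, service, amount):
    """Store the administrative record and, in parallel, a statistical record."""
    administrative_db.append({
        "client_name": client_name,
        "address": address,
        "postcode": postcode,
        "service": service,
        "amount": amount,
    })
    # The statistical record holds only the items defined by statistical users.
    statistical_db.append({
        "postcode": postcode,
        "service": service,
        "amount": amount,
    })


record_transaction("A. Citizen", "1 Example St", "2000", "rent assistance", 120.0)
record_transaction("B. Resident", "2 Sample Rd", "2600", "rent assistance", 95.5)
print(len(administrative_db), len(statistical_db))
```

Because the statistical record is created at the time of the transaction, little reprocessing is needed later to turn administrative entries into statistical data.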
A further approach is for statistical users to extract statistical data from administrative systems only when the need arises. Although this may be practical in the short term or for one-off purposes, it may prove to be an inefficient long term approach.
3.6.2 Compiling Data from Several Administrative Systems
The collection of administrative data may also involve compiling data from a number of different administrative sources. For example, data relating to people involved in criminal activities may be collected by different government agencies (e.g. police, legal aid, a variety of courts/tribunals, prisons). Each of these agencies will have administrative processes in place to collect relevant data for their specific purpose. Data from these diverse administrative sources can be combined to meet the requirements of statistical collections.
An agreement between agencies to standardise their administrative collection systems can facilitate sharing of information (e.g. the use of consistent identifiers such as Australian Business Number). However, agencies should check relevant confidentiality and privacy legislation before embarking on any data sharing exercise.
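The value of a consistent identifier can be shown with a small sketch that combines records from two sources keyed on the same identifier. The agencies, data items and 11-digit identifier values are made up, and any real linkage would first need to satisfy the confidentiality and privacy requirements noted above.

```python
# Hypothetical extracts from two agencies, both keyed on the same business identifier.
agency_a = {
    "11111111111": {"employees": 12},
    "22222222222": {"employees": 40},
}
agency_b = {
    "11111111111": {"turnover": 850_000},
    "33333333333": {"turnover": 90_000},
}


def combine(*sources):
    """Merge the data items held by each source for the same identifier."""
    combined = {}
    for source in sources:
        for identifier, items in source.items():
            combined.setdefault(identifier, {}).update(items)
    return combined


combined = combine(agency_a, agency_b)
print(combined["11111111111"])
```

Units that appear in only one source remain in the combined file with the items that source holds, which makes gaps in coverage between systems easy to see.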
3.6.3 Policy Changes and Administrative Systems
In an administrative collection, policy and program considerations have a higher priority than statistical considerations in determining concepts, definitions, coverage, frequency, timeliness and other attributes.
As public policy changes over time and new legislation is introduced, program requirements and procedures will also change. These changes can impact on the usefulness of the administrative data for statistical purposes including the ability to compare data over time.
These impacts can, however, be minimised to some extent by documenting changes to concepts, definitions, coverage etc. Statistical users of administrative data can also partially fund collection activities of administrative agencies if this will help to continue collection of data affected by policy changes.
3.7 RESPONDENT LOAD
Respondent load is a measure of the effort, in terms of time and cost, for respondents to participate in a survey. Pilot tests, dress rehearsals or other forms of testing can give an estimate of the time taken by respondents to complete the survey form.
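As a rough sketch, completion times from a pilot test can be scaled up to an estimate of total respondent load. All figures below, including the dollar value placed on an hour of respondent time, are hypothetical assumptions.

```python
import statistics

pilot_minutes = [18, 25, 22, 30, 20]  # completion times observed in a hypothetical pilot test
sample_size = 2_000                   # planned number of respondents (assumed)
hourly_cost = 30.0                    # assumed value of respondent time, dollars per hour

mean_minutes = statistics.mean(pilot_minutes)
total_hours = mean_minutes * sample_size / 60
total_cost = total_hours * hourly_cost
print(mean_minutes, round(total_hours, 1), round(total_cost, 2))
```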
Respondent load should always be considered when planning a statistical collection, and there should be policies and practices in place to manage relationships with respondents. The aim should always be to keep reporting load to a minimum while maintaining the high quality of collections.
The following are some suggested ways to reduce respondent load:
- consult with users to ensure the data collected is relevant
- avoid duplication with other collections
- use sound collection practices
- use existing data collections including administrative collections where possible
- design more user friendly questionnaires and survey forms (e.g. use of tick boxes)
- select a collection method which suits respondents (e.g. electronic data collection if respondents hold data in that form)
- educate respondents on what’s expected from them in advance
- rotate respondents at regular intervals
- accept critical data or best estimates etc when respondents have difficulties
- allow additional time where possible to complete forms
- minimise follow-up contacts unless absolutely necessary
- provide assistance to respondents with special needs (e.g. disabled, new immigrants)
3.8 STATISTICAL CLEARING HOUSE (SCH)
Recognising the excessive burden placed on businesses by an increasing number of statistical surveys, the Commonwealth government has been working on strategies to reduce respondent burden to a reasonable level.
The establishment of the SCH is one such strategy, whereby statistical surveys run by Commonwealth government agencies which require the participation of 50 or more businesses are subject to a central clearance process.
The purpose of the SCH is to ensure that all such surveys are necessary and well designed to minimise respondent load and maximise benefit to all stakeholders.
The SCH website www.sch.abs.gov.au provides an overview of statistical clearance activities undertaken by the Statistical Clearing House.