Dataset publication and risk assessment

The Information Privacy Act 2009 (Qld) (IP Act) regulates the publication¹ and overseas transfer² of personal information. Once information is appropriately de-identified, it is no longer personal information, but the risk of re-identification should be considered as part of the decision to publish a dataset.

What is personal information?

Personal information is defined in the IP Act³ as ‘information or an opinion, including information or an opinion forming part of a database, whether true or not, and whether recorded in a material form or not, about an individual whose identity is apparent, or can reasonably be ascertained, from the information or opinion’.

It includes information which directly identifies an individual and information that can be compared or cross-referenced with other information to identify an individual.

The risk of re-identification

Generally, individuals whose personal information is included in an appropriately de-identified dataset will have little chance of being re-identified. However, large amounts of personal information are now publicly available online, including personal information published by individuals themselves, so the possibility of re-identification cannot be entirely eliminated.

It is unlikely that a dataset containing information about a single individual would be published; datasets generally contain information about many individuals.⁴ In practice, re-identifying every individual in a de-identified dataset would be difficult, because the additional information needed varies considerably in its availability, ease of access and physical location. The resources required to re-identify everyone in a dataset would likely make the effort not worthwhile.

Assessing the risk of personal information being disclosed

The IP Act does not require agencies to consider the risks associated with publishing datasets; it does, however, make agencies responsible for privacy breaches. To avoid or mitigate the risk of a privacy breach, agencies should consider conducting a risk assessment as part of their decision-making process for publishing datasets.

An assessment of privacy risks in publishing data would consider:

  • the likelihood of re-identification of individuals; and
  • the consequences for the individuals of re-identification.
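The two factors above are often combined into a single rating, with likelihood and consequences each scored and then read off a matrix. The sketch below is a minimal illustration only: the three-point scale and the score thresholds are hypothetical assumptions, not the categories used in the Privacy risk assessment matrix in Appendix 1.

```python
from enum import IntEnum


class Level(IntEnum):
    # Hypothetical three-point scale; Appendix 1 may use different levels.
    LOW = 1
    MEDIUM = 2
    HIGH = 3


def privacy_risk(likelihood: Level, consequence: Level) -> Level:
    """Combine likelihood and consequence into an overall rating.

    The multiply-and-threshold rule is an assumed convention for
    illustration: scores of 1-2 are LOW, 3-4 MEDIUM, 6-9 HIGH.
    """
    score = likelihood * consequence  # IntEnum values multiply as ints
    if score >= 6:
        return Level.HIGH
    if score >= 3:
        return Level.MEDIUM
    return Level.LOW


# A dataset that is hard to re-identify but highly sensitive still rates MEDIUM.
print(privacy_risk(Level.LOW, Level.HIGH).name)
```

The multiplicative rule means a high rating on either factor alone cannot be offset entirely by a low rating on the other, which reflects why both likelihood and consequences need to be assessed.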

Likelihood of re-identification

For information to be personal information, the identity of the individual must be at least ‘reasonably ascertainable’. Factors to consider when deciding whether identity is reasonably ascertainable include:

  • the amount of alternative information about the individuals contained in the dataset that is publicly available – for example, census data and public registers;
  • the ease of access to the alternative information – for example, digital versus paper records;
  • the level of detail provided – for example, date of birth is more specific than month of birth or year of birth;
  • the number of steps and the associated amount of time, resources and effort required to identify an individual;
  • how up-to-date the information is – more current information can be more identifying; and
  • intimate knowledge – the extent to which only people with personal knowledge of individuals, such as family or close friends, would be able to identify an individual.
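The ‘level of detail’ factor can be checked mechanically by counting how many records share each combination of potentially identifying attributes: a record that is unique on, say, date of birth, postcode and sex is easier to match against a public register. The sketch below assumes hypothetical records and field names, and measures the size of the smallest group of matching records (the ‘k’ of k-anonymity, a technique not named in this guideline).

```python
from collections import Counter
from datetime import date

# Hypothetical de-identified records: names removed, attributes retained.
records = [
    {"dob": date(1980, 3, 14), "postcode": "4000", "sex": "F"},
    {"dob": date(1980, 7, 2), "postcode": "4000", "sex": "F"},
    {"dob": date(1980, 11, 30), "postcode": "4000", "sex": "F"},
    {"dob": date(1975, 1, 9), "postcode": "4101", "sex": "M"},
    {"dob": date(1975, 5, 21), "postcode": "4101", "sex": "M"},
]


def generalise(record, dob_detail):
    """Return the attribute combination with date of birth at the chosen detail."""
    dob = record["dob"]
    if dob_detail == "day":
        dob_value = dob.isoformat()
    elif dob_detail == "month":
        dob_value = f"{dob.year}-{dob.month:02d}"
    elif dob_detail == "year":
        dob_value = str(dob.year)
    else:
        raise ValueError(f"unknown level of detail: {dob_detail}")
    return (dob_value, record["postcode"], record["sex"])


def smallest_group(records, dob_detail):
    """Size of the smallest group of records sharing the same attributes."""
    counts = Counter(generalise(r, dob_detail) for r in records)
    return min(counts.values())


def unique_records(records, dob_detail):
    """Number of records that no other record matches on those attributes."""
    counts = Counter(generalise(r, dob_detail) for r in records)
    return sum(1 for c in counts.values() if c == 1)


for detail in ("day", "month", "year"):
    print(detail, smallest_group(records, detail), unique_records(records, detail))
```

In this toy data every record is unique at day or month of birth, while reducing date of birth to year places each record in a group of at least two, illustrating why year of birth is less identifying than date of birth.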

The likelihood of re-identification can be reduced by applying de-identification techniques.⁵ However, simply removing or substituting clearly identifying information may not reduce the likelihood of re-identification to a reasonable degree.
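Why removing names alone may not be enough can be shown with a simple linkage: the remaining attributes of a published row are looked up in another source that does carry names. In this sketch both datasets, their field names and the people in them are hypothetical; a row counts as re-identified only when exactly one person in the register matches it.

```python
# Hypothetical published dataset: names removed, other attributes kept.
published = [
    {"dob": "1980-03-14", "postcode": "4000", "sex": "F", "condition": "asthma"},
    {"dob": "1975-01-09", "postcode": "4101", "sex": "M", "condition": "diabetes"},
]

# Hypothetical public register listing names alongside similar attributes.
register = [
    {"name": "Person A", "dob": "1980-03-14", "postcode": "4000", "sex": "F"},
    {"name": "Person B", "dob": "1962-08-30", "postcode": "4000", "sex": "M"},
]

KEYS = ("dob", "postcode", "sex")


def link(published, register, keys=KEYS):
    """Return (name, condition) pairs for rows matching exactly one register entry."""
    index = {}
    for person in register:
        index.setdefault(tuple(person[k] for k in keys), []).append(person["name"])
    matches = []
    for row in published:
        names = index.get(tuple(row[k] for k in keys), [])
        if len(names) == 1:  # a single candidate means the row is re-identified
            matches.append((names[0], row["condition"]))
    return matches


print(link(published, register))                     # matches on all three attributes
print(link(published, register, keys=("postcode",)))  # coarser attributes
```

Matching on all three attributes links the first row to a single named person, while matching on postcode alone leaves two candidates and no confident match, which is why reducing the detail of the remaining attributes matters as much as removing names.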

The availability of other information and improvements in cross-referencing and correlation techniques are factors which can change over time.  For this reason, agencies should:

  • reassess the risk of published datasets being re-identified on a regular basis; and
  • as required, re-evaluate the form or amount of the published data.

In extreme cases, the dataset may have to be withdrawn from publication.

The consequences of re-identification

How serious the consequences of re-identification are depends on how significant the data is to the individual.  For example, re-identification of individuals participating in a clinical trial for treatment of HIV could have serious consequences for the privacy of the individuals involved. The consequences of re-identification of individuals who made submissions to their local Council on a proposal for a playground redevelopment will be less serious.

Sensitive information as a starting point

The IP Act requires special protections to be given to sensitive information.⁶ Where datasets contain sensitive or other personally significant information, agencies may wish to take this into account in their privacy risk assessment.

Sensitive information is information concerning an individual’s:

  • racial or ethnic origin
  • political opinions
  • membership of a political association
  • religious beliefs or affiliations
  • philosophical beliefs
  • membership of a professional or trade association
  • membership of a trade union
  • sexual preferences or practices
  • criminal record
  • health information.

Privacy risk assessment matrix - Appendix 1

The Privacy risk assessment matrix in Appendix 1 will assist agencies in assessing the risk of publishing de-identified datasets.

1 See Information Privacy Principle 11 and National Privacy Principle 2.
2 See section 33 of the IP Act.
3 Section 12 of the IP Act.
4 The ‘Driver licence details for all vehicle and motorcycle drivers licensed in Queensland’ dataset, released by the Department of Transport and Main Roads under the Open Data scheme, has over 6 million entries: http://data.qld.gov.au/dataset/driver-licences
5 For information concerning de-identification techniques, refer to OIC Guideline: Publishing datasets and de-identification techniques.
6 These requirements apply only to health agencies; see National Privacy Principle 9.

Current as at: February 18, 2013