Table 3 Indirect identifiers detected in publications and changes to minimize risk of reidentification of study participants

From: Data privacy protection in scientific publications: process implementation at a pharmaceutical company

| Category of indirect identifier (incidence) | Cause/trigger | Implemented change |
| --- | --- | --- |
| Personal participant characteristics (19 publications) | Age, gender, race, age at diagnosis and BMI listed in demographic information | Anonymized demographic listing by modifying specific age information |
| | Report of early gene therapy data on only two study participants | Removed individual data for two cases from an ongoing study |
| | Age, gender, race, ethnicity, weight, and BMI cross-tabulated by treatment groups resulting in multiple n = 1 and n = 3 cell frequencies* | Split demographic table to show identifier only in an overall frequency and list only disease characteristics across treatment groups |
| | Age, gender, race, and country of origin in one single participant listing | Removed race and country of origin and left only age and gender |
| | Age in description of individual participants | Removed exact age from individual participant description |
| | Cell sizes for gender, age and race were small owing to a detailed age group cross-tabulation | Removed race from the demographics table to reduce indirect identifiers to sex, age, and US origin |
| | Age, sex, and race included in the comparison and descriptions of different treatment conditions and outcome findings | Removed race to avoid exposure of specific ethnicity |
| | Nationality of two participants in country with low enrolment | Removed nationality in text for the description of protocol deviations |
| | Age, gender, race, ethnicity, weight, and BMI cross-tabulated in a multiple-treatment group causing very small cell sizes | Presented demographic identifiers only for the overall group and made cross-tabulations for the remaining disease parameters (table split) |
| | Age, gender, race, and BMI cross-tabulated across multiple treatment groups causing n = 1 frequency | Presented demographic identifiers for the overall group and made cross-tabulation from the remaining disease parameters (table split) |
| | Detailed information on family relationships and ethnic background of participants and original participant identifiers (such as age, gender) used | Removed details regarding family relationships and ethnic background (showed only indicator 'yes' for existing family history), participant identifiers replaced with generic identifiers (e.g., participant numbers) |
| | Individual participant descriptions: age and sex in study with small group sizes | Removed age from the individual participant descriptions and left only gender information |
| | Information related to time, location and investigator contained in computed tomography scans and liver biopsy images | Removed all indicators related to time, location and investigators from scans and images |
| | Age, gender, race, and ethnicity listed in demographics table causing multiple small cell sizes in the table | Removed gender as identifier from the table |
| | Age, gender, and country list for each participant | Age was recoded to age groups |
| | Age, gender, and date of death listed in one table with therapy received and cause of death | Removed exact date of death |
| | Exact dates (treatment, SAEs, and diagnosis) in several listings | Removed all dates, but kept study day information |
| | Age and gender listed with each arterial occlusive event per participant | Removed gender information from the listing of arterial occlusive events |
| | Three indirect identifiers used to describe a single participant | Removed age or gender |
| Location/geographical characteristics (seven publications) | Site numbers (containing part of the participant IDs) reported | Removed site numbers from manuscript as they are usually part of the participant identification number |
| | Investigator names, sites, country, and frequency of participants enrolled | Removed number of enrolled participants per site and left only list of countries |
| | Investigator names (four publications) | Investigator names removed |
| | Site number as part of the participant number | Removed site numbers by transforming to consecutive numbers |

  1. BMI: body mass index; SAE: serious adverse event
  2. *The smaller the subgroup size (cell size), the higher the risk of re-identification (exposure risk)
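
Several of the changes listed in the table (flagging small cell sizes, recoding age to age groups, keeping study day instead of exact dates, and transforming site-based participant numbers to consecutive numbers) can be illustrated in a few lines of Python. The following is only a minimal sketch and not the process used in the publication: all function names, the threshold of 3 (chosen because the table mentions n = 1 and n = 3 cells), the 10-year age band, and the convention that the first dose is study day 1 with no day 0 are assumptions made for illustration.

```python
from collections import Counter
from datetime import date

# Assumed threshold: the table flags n = 1 and n = 3 cells, so cells of
# size 3 or fewer are treated as small here. A real policy may differ.
SMALL_CELL_MAX = 3


def small_cells(records, keys, max_size=SMALL_CELL_MAX):
    """Return {combination: count} for identifier combinations whose
    frequency is at or below max_size."""
    counts = Counter(tuple(r[k] for k in keys) for r in records)
    return {combo: n for combo, n in counts.items() if n <= max_size}


def age_band(age, width=10):
    """Recode an exact age to an age group such as '40-49'."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


def study_day(event, first_dose):
    """Replace a calendar date with a study day.
    Assumed convention: first dose is day 1, the day before is day -1."""
    delta = (event - first_dose).days
    return delta + 1 if delta >= 0 else delta


def consecutive_ids(original_ids):
    """Map participant IDs that embed a site number to consecutive
    numbers, in order of first appearance."""
    mapping = {}
    for pid in original_ids:
        mapping.setdefault(pid, len(mapping) + 1)
    return mapping


# Hypothetical demo data, not taken from any study.
demo = [
    {"arm": "A", "sex": "F"},
    {"arm": "A", "sex": "M"},
    {"arm": "A", "sex": "M"},
    {"arm": "A", "sex": "M"},
    {"arm": "A", "sex": "M"},
]
print(small_cells(demo, ["arm", "sex"]))
print(age_band(47))
print(study_day(date(2020, 1, 10), date(2020, 1, 1)))
print(consecutive_ids(["101-004", "102-001", "101-004"]))
```

A check like `small_cells` would typically be run on the planned cross-tabulation before publication. Any combination it returns is a candidate for one of the remedies in the table, such as splitting the table or dropping an identifier. Note that numbering in order of first appearance can still hint at site grouping if the input is sorted by site, so shuffling the input first may be preferable.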