Table 3 Indirect identifiers detected in publications and changes to minimize risk of reidentification of study participants

From: Data privacy protection in scientific publications: process implementation at a pharmaceutical company

| Category of indirect identifier (incidence) | Cause/trigger | Implemented change |
| --- | --- | --- |
| Personal participant characteristics (19 publications) | Age, gender, race, age at diagnosis and BMI listed in demographic information | Anonymized demographic listing by modifying specific age information |
| | Report of early gene therapy data on only two study participants | Removed individual data for two cases from an ongoing study |
| | Age, gender, race, ethnicity, weight, and BMI cross-tabulated by treatment groups resulting in multiple n = 1 and n = 3 cell frequencies* | Split demographic table to show identifier only in an overall frequency and list only disease characteristics across treatment groups |
| | Age, gender, race, and country of origin in one single participant listing | Removed race and country of origin and left only age and gender |
| | Age in description of individual participants | Removed exact age from individual participant description |
| | Cell sizes for gender, age and race were small owing to a detailed age group cross-tabulation | Removed race from the demographics table to reduce indirect identifiers to sex, age, and US origin |
| | Age, sex, and race included in the comparison and descriptions of different treatment conditions and outcome findings | Removed race to avoid exposure of specific ethnicity |
| | Nationality of two participants in country with low enrolment | Removed nationality in text for the description of protocol deviations |
| | Age, gender, race, ethnicity, weight, and BMI cross-tabulated in a multiple-treatment group causing very small cell sizes | Presented demographic identifiers only for the overall group and made cross-tabulations for the remaining disease parameters (table split) |
| | Age, gender, race, and BMI cross-tabulated across multiple treatment groups causing n = 1 frequency | Presented demographic identifiers for the overall group and made cross-tabulation from the remaining disease parameters (table split) |
| | Detailed information on family relationships and ethnic background of participants and original participant identifiers (such as age, gender) used | Removed details regarding family relationships and ethnic background (showed only indicator ‘yes’ for existing family history), participant identifiers replaced with generic identifiers (e.g., participant numbers) |
| | Individual participant descriptions: age and sex in study with small group sizes | Removed age from the individual participant descriptions and left only gender information |
| | Information related to time, location and investigator contained in computed tomography scans and liver biopsy images | Removed all indicators related to time, location and investigators from scans and images |
| | Age, gender, race, and ethnicity listed in demographics table causing multiple small cell sizes in the table | Removed gender as identifier from the table |
| | Age, gender, and country list for each participant | Age was recoded to age groups |
| | Age, gender, and date of death listed in one table with therapy received and cause of death | Removed exact date of death |
| | Exact dates (treatment, SAEs, and diagnosis) in several listings | Removed all dates, but kept study day information |
| | Age and gender listed with each arterial occlusive event per participant | Removed gender information from the listing of arterial occlusive events |
| | Three indirect identifiers used to describe a single participant | Removed age or gender |
| Location/geographical characteristics (seven publications) | Site numbers (containing part of the participant IDs) reported | Removed site numbers from manuscript as they are usually part of the participant identification number |
| | Investigator names, sites, country, and frequency of participants enrolled | Removed number of enrolled participants per site and left only list of countries |
| | Investigator names (four publications) | Investigator names removed |
| | Site number as part of the participant number | Removed site numbers by transforming to consecutive numbers |
BMI body mass index, SAE serious adverse event
*The smaller the subgroup size (cell size), the higher the risk of re-identification (exposure risk)
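
Several of the changes listed in the table (flagging small cell sizes, recoding age to age groups, keeping study days instead of exact dates, and transforming site-bearing participant numbers to consecutive numbers) can be illustrated with a minimal Python sketch. This is not the process described in the source article. All function names, the cell-size threshold of 5, the 10-year age bands, and the "day 1 = first dose" convention are assumptions made for illustration only.

```python
from collections import Counter
from datetime import date


def small_cells(records, keys, threshold=5):
    """Cross-tabulate records by `keys` and return cells with fewer than
    `threshold` participants. The threshold is an assumed value; the table
    only notes that cells such as n = 1 or n = 3 raise exposure risk."""
    counts = Counter(tuple(r[k] for k in keys) for r in records)
    return {cell: n for cell, n in counts.items() if n < threshold}


def age_group(age, width=10):
    """Recode an exact age into a band such as '30-39' (assumed band width)."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


def study_day(event, first_dose):
    """Replace a calendar date with a study day.
    Assumed convention: the day of first dose is study day 1."""
    return (event - first_dose).days + 1


def consecutive_ids(participant_ids):
    """Map participant IDs that embed a site number to consecutive numbers,
    assigned in order of first appearance (hypothetical scheme)."""
    mapping = {}
    for pid in participant_ids:
        mapping.setdefault(pid, len(mapping) + 1)
    return mapping


# Hypothetical example data, not taken from any study.
records = [
    {"age": 34, "gender": "F", "arm": "A"},
    {"age": 37, "gender": "F", "arm": "A"},
    {"age": 61, "gender": "M", "arm": "B"},
]
flagged = small_cells(records, ["gender", "arm"], threshold=2)
bands = [age_group(r["age"]) for r in records]
day = study_day(date(2020, 1, 10), date(2020, 1, 1))
ids = consecutive_ids(["1001-07", "1002-03", "1001-07"])
```

In this sketch, a flagged cell would prompt one of the mitigations from the table, such as splitting the demographic table or dropping an identifier. Note that numbering in order of first appearance may still leak ordering information if the input is sorted by site, so shuffling the input first may be preferable.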