Questionnaire development
The development and reporting of the questionnaire followed published recommendations [22]. As described elsewhere [21], this included a literature search; content and construct validation by non-author experts (n = 4) using a clinical sensibility tool and tables of specifications; and face and content validation by pilot testing with a clinical sensibility tool and informal semi-structured interviews (n = 14 individuals), to ensure clarity, realism, validity, and ease of completion. Not all of the authors, expert validators, or public validators were in favor of AR. The study was approved by the health research ethics board of the University of Alberta (Pro00039590), and return of the survey was considered consent to participate.
Questionnaire administration
We surveyed the public (4 groups), adults with biomedical science training (2 groups), and animal researchers (2 groups), the details of which are as follows:
1. Public: we chose 4 groups that represented a range of situations conveniently accessible for survey distribution: i) A convenience sample of adults approached while in various line-ups at the Heritage Festival in Edmonton, Alberta, Canada in August 2013 (n = 195). These adults were asked if they would fill out a paper survey about AR in return for a $5 food ticket as an incentive. We did not track the number of people asked to participate, but our impression was that most adults approached completed the survey. ii) A random sample of Canadian adults accessed through the marketing firm Survey Sampling International (SSI) in November 2014 (n = 586). The sample “was selected to be reflective of the Canadian population over age 18 years with at least a high school education, by age, gender, race/ethnicity, income, and geographic region.” Of those invited to participate, 1 terminated, 85 submitted partial responses, and 501 submitted complete responses. iii) A sample of US adults recruited using Amazon Mechanical Turk (AMT) in February 2015 (n = 439). For this sample, we limited potential responders to those living in the US with a Human Intelligence Task approval rate >90 %, and we paid $1.60 on completion. The survey was on the AMT site for <4.5 h, and the average time spent per survey was 31 min. Crowdsourcing on AMT has been found to yield psychometrically valid results, with high test-retest reliability, attentiveness, and truthfulness, even on complex cognitive and decision-making tasks [23–27]. We included two attention checks with excellent results (one-third of the way in: “please tell us whether you agree with this equation: 2 + 2 = 4”; 407/18 (97 %) agreed; and toward the end of the survey: “to show you have read the instructions, please check ‘none of these’ as your answer to this question”; 413/415 (99.5 %) chose ‘none of these’); a sketch of how such pass rates can be tabulated from the downloaded data follows this list. iv) A convenience sample of adult visitors on the pediatric wards of the Stollery Children’s Hospital, Edmonton, Alberta, Canada in May 2015 (n = 107). We did not track the number of people asked to participate, but our impression was that most of those approached completed the paper survey.
2. Adults with biomedical science training: i) The second-year class at the University of Alberta Medical School in September 2013, and ii) the next second-year class in November 2014. Non-respondents were sent three reminders at about 2–3 week intervals.
3. Animal researchers: i) The corresponding authors of AR papers published during the 6 months from October 2012 to March 2013 in the high-impact journals Nature, Science, and Critical Care Medicine (i.e., representing leaders in their fields of AR); and ii) all academic faculty pediatricians at The Hospital for Sick Children in Toronto, Ontario, Canada, whose email addresses were listed on the University of Toronto website. We assumed these pediatricians would have many ties to research activities, including knowledge of, if not participation in, AR. Non-respondents were sent three reminders at about 2–3 week intervals.
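As noted above for the AMT sample, the attention-check pass rates can be tabulated directly from the downloaded responses. The sketch below is illustrative only: the file name, the sample identifier, the column names (attention_equation, attention_none), and the response codings are assumptions, not the actual REDCap field names.

```python
# Illustrative tabulation of attention-check pass rates in the AMT sample.
# Assumptions (not from the study): one row per respondent, a 'sample' column
# identifying the AMT group, and two columns holding the check responses.
import pandas as pd

df = pd.read_csv("survey_export.csv")            # hypothetical export file
amt = df[df["sample"] == "AMT"]

checks = [("attention_equation", "agree"),       # "2 + 2 = 4" check
          ("attention_none", "none of these")]   # "none of these" check

for column, expected in checks:
    answered = amt[column].dropna()
    passed = (answered == expected).sum()
    print(f"{column}: {passed}/{len(answered)} passed "
          f"({100 * passed / len(answered):.1f} %)")
```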
The surveys were administered using the web-based tool REDCap, which allows anonymous survey responses to be collected or entered and later downloaded into statistical software for analysis [28]. The paper surveys done at the local festival and the children’s hospital were entered by hand, whereas all other groups entered data directly into the REDCap surveys, after invitation by email or via the SSI and AMT platforms.
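As a rough illustration of this data flow, the sketch below combines hand-entered paper responses with a REDCap web export before analysis; the file names, the group column, and the value codings are hypothetical, and REDCap’s actual export format may differ.

```python
# Illustrative data flow: hand-entered paper surveys (festival, hospital) are
# combined with the REDCap web export into one table for analysis.
# File names, column names, and codings are assumptions, not study fields.
import pandas as pd

web = pd.read_csv("redcap_web_export.csv")        # responses entered online
paper = pd.read_csv("paper_surveys_entered.csv")  # paper surveys entered by hand

combined = pd.concat([web, paper], ignore_index=True)

# Responses are later described as proportions (percentages) by group.
proportions = (combined.groupby("group")["supports_AR"]
               .value_counts(normalize=True) * 100)
print(proportions.round(1))
```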
Questionnaire content
The background section stated “In this survey, ‘animals’ means: mammals, such as mice, rats, dogs, and cats. It has been estimated that over 100 million animals are used in the world for research each year. There are many good reasons to justify animal research, which is the topic of this survey. Nevertheless, some people argue that these animals are harmed in experimentation, because their welfare is worsened. In this survey, ‘harmful’ means such things as: pain, suffering (disease/injury, boredom, fear, confinement), and early death. We value your opinion on the very important issue of the ethical dimension of animal research.” We chose mammals to represent a group of sentient animals that are thus capable of being harmed.
We presented demographic questions, 3 questions about support for AR, and 12 arguments with their counterarguments to consider. The survey stated: “a) First, we give you an argument to justify harmful animal research, and we ask if you agree with that argument; b) Then we give some responses to the argument, and we ask if you think each response would make it harder for someone to justify harmful animal research using the initial argument (i.e., would make the initial argument less convincing). All the arguments and responses in this survey are those commonly made in the literature on animal research.” When each argument was presented, it was followed with the question “Is this a good enough reason to justify using animals in medical research?” When the responses to an argument were presented, they were followed with the question: “Do any of the following responses make it harder for someone to justify animal research using Argument x (i.e., make Argument x much less convincing)?”
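To make the two-step structure of each item block explicit, the sketch below encodes one argument and its responses, with the two question stems taken from the survey wording quoted above; the class name and the placeholder texts are illustrative, not part of the instrument.

```python
# Illustrative encoding of one item block: an argument is rated first, then
# the responses (counterarguments) are presented and rated for whether they
# make the argument less convincing. Placeholder texts stand in for the
# actual survey wording.
from dataclasses import dataclass, field
from typing import Iterator, List

@dataclass
class ArgumentBlock:
    label: str                       # e.g., "Argument 1"
    argument: str                    # justification for harmful animal research
    responses: List[str] = field(default_factory=list)

    def question_stems(self) -> Iterator[str]:
        yield (f"{self.argument} Is this a good enough reason to justify "
               f"using animals in medical research?")
        yield (f"Do any of the following responses make it harder for someone "
               f"to justify animal research using {self.label} "
               f"(i.e., make {self.label} much less convincing)?")
        for response in self.responses:
            yield f"- {response}"

block = ArgumentBlock("Argument 1", "<argument text>", ["<response A>", "<response B>"])
for stem in block.question_stems():
    print(stem)
```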
Statistics
As noted above, anonymous survey responses collected in REDCap were downloaded into an SPSS database for analysis [28]. Responses are described using proportions (percentages). Pre-specified subgroup analyses included exploratory comparisons on all responses for the 4 public groups, the two medical school classes, and the two animal researcher groups. If results were not statistically and clinically significantly different within each of these sets of groups, we planned to compare the groups to each other. These comparisons were done using the Chi-square statistic, with P < 0.05 after Bonferroni correction for multiple comparisons considered statistically significant. Prior to statistical analysis, we defined a clinically significant difference between groups as one that was statistically significant and in which clear majorities (at least 60 %) fell on different sides of the yes/no response option. We chose this definition because we were interested in whether the groups hold opinions that could have different practical consequences, both for the animals and for the researchers involved. For example, if many respondents have very different opinions about the moral permissibility of AR, this might lead to very different levels of support for AR, and thus have different implications for the actual practice of AR.
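As a sketch of the stated decision rules (a chi-square comparison with a Bonferroni-corrected significance threshold, plus the pre-specified requirement of at least 60 % majorities on different sides of the yes/no response), the code below illustrates a single pairwise comparison; the counts, group labels, and number of comparisons are placeholders, not study data.

```python
# Sketch of one between-group comparison under the stated decision rules:
# chi-square test with a Bonferroni-corrected alpha, plus a "clinically
# significant" difference requiring >=60 % majorities on opposite sides
# of the yes/no response. All numbers below are placeholders.
from scipy.stats import chi2_contingency

yes_no = {"group_A": (150, 50),     # (yes, no) counts - placeholder data
          "group_B": (60, 140)}

n_comparisons = 12                  # placeholder number of planned comparisons
alpha = 0.05 / n_comparisons        # Bonferroni-corrected threshold

chi2, p, dof, expected = chi2_contingency([list(yes_no["group_A"]),
                                           list(yes_no["group_B"])])

def majority_side(yes, no, threshold=0.60):
    """Return 'yes' or 'no' if that option has at least `threshold` support, else None."""
    frac_yes = yes / (yes + no)
    if frac_yes >= threshold:
        return "yes"
    if 1 - frac_yes >= threshold:
        return "no"
    return None

sides = [majority_side(*yes_no[g]) for g in ("group_A", "group_B")]
statistically_significant = p < alpha
clinically_significant = (statistically_significant
                          and None not in sides
                          and sides[0] != sides[1])
print(f"p = {p:.4g}; statistically significant: {statistically_significant}; "
      f"clinically significant: {clinically_significant}")
```

With the placeholder counts above, group_A shows a 75 % “yes” majority and group_B a 70 % “no” majority, so a statistically significant result would also count as clinically significant under this rule.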