Regarding the use of public social media data for population health monitoring, our analysis revealed a range of opinions, from enthusiasm through acceptance to opposition. Users accepted a sense of personal responsibility for what they posted and viewed the use of the data they generated as a price of participation on Twitter. In this section we examine participants’ responses to our semi-structured focus group interviews.
Twitter use – different ways that participants use Twitter
Patterns of use: Four broad patterns of use emerged from the data. Some users reported engaging in professional promotion, either for their own independent business ventures or as the social media representative for a larger business. Several users reported using Twitter for social engagement, generally interacting with peers or with other Twitter users based on common interests, sharing thoughts, or participating in particular events. A third type of Twitter use was venting: although fewer people reported venting as a distinct category, those who did described using Twitter to interact with businesses as empowered consumers, raising public awareness of poor-quality goods or services. Finally, respondents reported using Twitter to follow content generators, staying up to date with news, events, content, and promotions. Professional promotion, social engagement, and venting are classified as active/content-generating uses, while following is classified as passive/content-receiving.
Privacy expectations – do participants have different understandings of their level of privacy on Twitter?
“You are the product”: Many users disavow the expectation of privacy. Twitter is a public forum, they report, and as such there is no assumption of privacy. According to one participant, the fact that Twitter is free is important,
I don’t pay to use Twitter. I sort of signed up with the expectations that it’s a free site and you just kind of throw things out publicly, [so] I don’t really have an expectation that anything that I post is going to remain private [Control group, 29, M].
Another respondent in focus group three echoed a similar sentiment with a more negative tone. In response to another participant’s comment that Twitter needs to turn a profit somehow, Phillip says,
Exactly, like that’s what their product is. Their product is you. Because it’s free, you are the product [Depression group, 29, M].
Despite this commonly held understanding, our focus group data revealed that some privacy is expected. In fact, while some users state outright that they assume no privacy, the expectation of privacy may still remain intact given users’ (1) failure to understand data permanence, (2) failure to understand data reach, and (3) failure to understand the big data computational tools that can be used to analyze posts (discussed below).
Perception that data is ephemeral: One common misconception about Twitter data was that it is ephemeral. The Twitter users interviewed were under the impression that accounts could be manicured, or that information generated before a certain date could not be retrieved (i.e. users conflated the Twitter user interface with the computational and data infrastructure that supports it). In response to whether there was any potentially “incriminating” information on her Twitter account, one participant said,
I would say definitely. <chuckles> Maybe it’s because I’m young, so I started into social media when I was younger, like really young. So every once in a while, I’ll go through [and delete] [Control group, 21, F].
Deleting posts suggests a possible misconception regarding what data remains after deletion [Footnote 2]. Another participant reflects similar ideas regarding the permanence of Twitter data, saying,
I would say most of the time I’m not afraid to rock the boat. But I mean, Twitter won’t let you scroll back that far, so I’m not super concerned [Depression group, 20, M].
A further participant did not believe that older Twitter data was inaccessible, but felt that the sheer volume of data and text he generated made individual posts more difficult to find. In reference to a sub-tweet – i.e. a critical tweet that refers to an individual without explicitly naming them – made in response to a relationship breakup, he says,
I had to scroll through probably 200 to 300 tweets until I could find that sub-tweet. And I think especially in the last year as I’ve been getting more followers, I’ve been more aware of what I’ve been tweeting [Depression group, 22, M].
These statements suggest that, despite users’ understanding that Twitter is public, they may not be aware of the extent to which Twitter data is permanent and available to anyone via the (free) Twitter Application Programming Interface (API) or via data reselling services [Footnote 3].
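To make this concrete, the sketch below shows how any holder of API credentials could retrieve public tweets programmatically, long after their authors have scrolled past them. It is illustrative only: the query, token, and result handling are assumptions, endpoint names and access tiers have changed over time, and full-archive search or reseller data extends reach much further back than the recent-search endpoint shown here.

```python
# Illustrative sketch only: retrieving public tweets via the Twitter API v2
# recent-search endpoint. The bearer token and query are placeholders, and
# access tiers/endpoints have changed over time.
import requests

BEARER_TOKEN = "YOUR_BEARER_TOKEN"  # hypothetical credential
SEARCH_URL = "https://api.twitter.com/2/tweets/search/recent"

def fetch_public_tweets(query: str, max_results: int = 10):
    """Return public tweets matching `query`, whether or not their authors
    remember posting them."""
    headers = {"Authorization": f"Bearer {BEARER_TOKEN}"}
    params = {
        "query": query,
        "max_results": max_results,
        "tweet.fields": "created_at,author_id",
    }
    response = requests.get(SEARCH_URL, headers=headers, params=params)
    response.raise_for_status()
    return response.json().get("data", [])

if __name__ == "__main__":
    for tweet in fetch_public_tweets("flu lang:en -is:retweet"):
        print(tweet["created_at"], tweet["text"][:80])
```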
Data Reach: Another area that pointed to some misconception with regard to privacy was Twitter users’ conceptualizations of data reach. In response to another user’s privacy concerns, one participant retorted,
Are you naïve enough to think that your public tweet is going to be seen by like a million people? I mean sure, it’s public. Anyone could go and find it, or search for it, or whatever. I mean, but it’s not like Beyoncé tweeting is the same as me tweeting [Depression group, 54, M].
Nevertheless, many users demonstrated a lack of understanding regarding the potential reach of their own data. Users in both focus group two and focus group five justified their lack of care with Twitter data by saying that they had only a handful of followers. However, one participant describes the problem with tweeting to a select group of followers,
You don’t really think about the far-reaching amount of people that can actually use what you say [Depression group, 29, M].
And several users discuss humorous tweets they made that were favorited by friends, and thus reached individuals they may not have chosen to share jokes with otherwise.
The Choice to Personalize Privacy Settings: Many users felt that methods were available to them to limit their online presence. For some this meant setting their account to “private”; for others it meant deleting accounts and disappearing from social media altogether. Failure to personalize one’s online presence and settings constituted implied consent to having one’s data collected and analyzed. According to one participant,
You’re voluntarily using Twitter. So it goes back to that whole: the Internet’s public domain. If you want to have your data combed through, then please continue to post things on the Internet [Control group, 21, F].
The notion that interacting online, in a public forum, implied consent to have one’s data amassed and analyzed was echoed in each of the five focus group interviews. However, some participants’ views were more tempered. For example, some participants felt that it was the choice of website that implied agreement to have one’s data used in datasets (i.e. Twitter is presumed to be a public platform by default, in contrast to Facebook, which has explicit privacy controls). According to one respondent,
It all comes down to the fact that we know that we’re using Twitter and it’s public. I think I might honestly feel differently about that if it were Facebook, because I do feel like there is some degree of privacy in Facebook [Control group, 21, F].
For this participant, the auspices under which information is shared, and the knowledge that the data is public, permit users to exercise control and to manage and edit self-disclosure.
Personal Responsibility: For many, Twitter use came down to a question of personal responsibility. For these respondents, Twitter presence, and online presence in general, is a matter of personal choice. Two participants in separate focus groups referred to social networking, and the data it generates, as the “Wild West”, existing outside of formal laws and regulation. As a result, many participants felt that users had a personal responsibility to ensure their own comfort with the data that was generated. According to one participant,
I think our generation is gravitating towards [the idea that] privacy is not to be expected anymore. You have to create it yourself. You have to enable it yourself, because it just doesn’t exist anymore [Control group, 27, M].
Even the most privacy-conscious users acknowledged that lingering evidence of their online activity was a matter of personal choice. According to another participant,
I just acknowledged to myself a long time ago that whatever I put on the Internet - whatever I put into my search engine, anything that I click on – is not private. [Depression group, 21, F].
These statements did not imply that no oversight or regulation was necessary, only that, in an environment devoid of such regulation, users needed to be careful about the evidence they left behind online.
Population health monitoring (particularly depression) – participants’ views on using Twitter to monitor disease at the population level
Population level data: Respondents expressed optimism regarding the use of Twitter data for public health at the aggregate level. While some users expressed concerns regarding privacy, others felt as though service to the greater social good was more important than individual privacy concerns. When asked to discuss the issue of Twitter use for aggregate public health monitoring, one participant states,
I kind of think it’s cool when it’s stuff that’s like the flu, because then that’s how they know to get the vaccines to a place [Depression group, 24, F].
When in the service of public health, other respondents were also willing to put aside privacy concerns. One participant articulates a particularly open viewpoint that was echoed by other members of that focus group interview:
I can’t be in a position to know all the possible things that someone could come up with, all the beneficial things, all the harmful things. I think [it represents one-percent of the issues], the whole array of things that are possible shouldn’t be stopped because we’re so overly worried about [privacy] [Depression group, 54, M].
While this attitude is somewhat more strongly worded than the attitudes of other participants, users generally took a utilitarian stance towards open access, provided that studies were in service to the greater good:
It’s like fluoride in the water to me. They put fluoride in our water. We don’t really have a choice if we want to drink water, we’re going to get fluoride. But the benefits outweigh the risk [Control group, 26, F].
Privacy concerns for these participants were rendered less significant by the potential of Twitter to provide current, accurate information in service of the greater public good.
When asked about the use of Twitter data in public health monitoring, most members echoed the sentiments of the two participants who replied, “I have no problem with that.” Yet, even at the aggregate level, two users from separate focus groups characterized the use of Twitter data to monitor depression as “creepy”. One participant, who is otherwise in support of the use of publicly available social media data for population health applications, conveys a sense of unease,
You’re screaming into the void, and someone is listening. It’s a little bit creepy, but it’s taking the words from your own mouth [Control group, 21, F].
When probing questions were used to unpack the concept of “creepy”, users’ responses indicated difficulty distinguishing between aggregated and disaggregated data; they cited concerns about privacy, or about how being identified as having a high likelihood of depression might impact an individual. According to one participant,
The fact that if it was an algorithm, and they were looking like, ‘Hey, we think you’re feeling low right now.’ I feel like it might make me feel even more low [Depression group, 24, F].
Other users commented on the potential for words to be taken out of context, for confidentiality to be compromised, or for individuals suffering from mental health issues to face stigma. However, these concerns generally stemmed from the ability to target particular individuals rather than from aggregate-level mental health monitoring.
Yet even for the most enthusiastic supporters of public health monitoring, permissions were not without qualification. While several participants were comfortable providing complete access to their Twitter data, many stipulated that permission could only be implied where it pertained to aggregated, anonymized data:
I think I would be more comfortable being identified just in the group. So having somebody not be able to be like, ‘Oh, this specific Twitter name has the flu virus.’ Instead, ‘Just this many people have it.’ And there’s not like specific data that could be identified out of that group [Depression group, 24, F].
Another participant expresses a similar viewpoint in response to a question regarding mental health monitoring, in particular,
I’m OK as long as we can, you know, figure out ways to keep the data anonymous and completely, highly aggregated [Depression group, 42, M].
This general aggregated monitoring of public health outcomes using Twitter, including aggregate population-level rates of depression, met with qualified support from participants. The concerns of participants who remained reluctant to support the use of even aggregated data could be categorized under two themes: accuracy and unintended consequences. These issues are discussed in more detail below.
Accuracy: While many users reported that their own experiences with depression could be observed from their past social networking behavior, a major theme that emerged from the focus group findings was that Twitter data may not be an accurate proxy for underlying mood – and may produce aggregate depression rates that are unreliable. Users were principally concerned that the ways in which depression was likely to manifest may not be captured by simple keyword matching algorithms. Users were also concerned that the ways in which they used Twitter, and the content they generated, would not produce reliable data.
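The sketch below is a deliberately naive, hypothetical example of the kind of simple keyword matching participants doubted; the lexicon and example tweets are invented, and real systems use richer features, but it illustrates why a term list alone can misread political commentary, performative positivity, or silence.

```python
# Hypothetical illustration of the "simple keyword matching" participants
# were skeptical of: a tweet is flagged if it contains any term from a small
# lexicon. The lexicon and example tweets are invented for illustration only.
DEPRESSION_KEYWORDS = {"depressed", "hopeless", "worthless", "can't sleep", "exhausted"}

def flag_tweet(text: str) -> bool:
    """Return True if any lexicon term appears in the tweet (case-insensitive)."""
    lowered = text.lower()
    return any(term in lowered for term in DEPRESSION_KEYWORDS)

tweets = [
    "Feeling hopeless about this election result",  # likely false positive
    "Everything is fine, great day!",                # possible false negative (false positivity)
    "haven't tweeted in weeks",                      # withdrawal is invisible to keyword matching
]
print([flag_tweet(t) for t in tweets])  # -> [True, False, False]
```

In this toy example only the first tweet is flagged, mirroring the false-positivity and withdrawal concerns participants raised below.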
Each of the three focus groups with individuals who had been diagnosed with depression was asked, “Do you feel like your depression, or your experiences with depression would be evident from your online interactions?” One participant responds that his social networking behavior would be indicative of his mental state. He tells the story of a bout of depression during his senior year of high school (i.e. around 18 years of age) saying,
During my senior year, I would just tweet just because I wanted my friends to see it and to know that I didn’t feel good, or that I was upset or mad at someone. I think it would be very obvious, actually [Depression group, 20, M].
According to another participant, this may be true of people in general. He suggests that looking at students’ social networking data during finals might provide some insights into the lived experiences of students,
If you look at a student’s Facebook or Twitter, especially during finals time, you see how stressed people are. You see people aren’t sleeping. They aren’t eating. All they’re doing is studying, and their moods are getting worse and worse on social media [Depression group, 31, M].
Despite this feeling, some participants remained skeptical. “You can’t even get targeted advertising right,” quipped one participant, “what makes you think public health accuracy is going to be any better?”
Nevertheless, participants suggested that public health monitoring could be improved by looking at other ways in which users manifest their depression. Consistent with the known relationship between depression and social isolation [41], several participants were concerned that automatic monitoring may miss cues such as decreased activity:
It’s just the opposite for me. If I’m feeling down or anything, I just kind of retreat back. There’d be a huge gap there [Depression group, 29, M].
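One way a monitoring pipeline might account for the withdrawal cue this participant describes is to look for unusually long gaps between posts rather than for words alone. The sketch below is a hypothetical illustration with invented timestamps and an arbitrary threshold, not a method used in this study.

```python
# Sketch of flagging the "huge gap" the participant describes: find the
# longest interval between consecutive posts. Timestamps and the 14-day
# threshold are invented for illustration.
from datetime import datetime, timedelta

def longest_gap(timestamps):
    """Return the longest interval between consecutive posts."""
    ordered = sorted(timestamps)
    return max((b - a for a, b in zip(ordered, ordered[1:])), default=timedelta(0))

posts = [datetime(2016, 3, 1), datetime(2016, 3, 2), datetime(2016, 3, 30)]
if longest_gap(posts) > timedelta(days=14):
    print("Unusually long posting gap detected")
```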
Also commonly cited as an accuracy concern was the issue of falsehood, which could take many forms. Users’ concerns related less to outright lying on social media than to issues such as maintaining multiple accounts, false positivity, and beliefs about what content is appropriate for social media. According to one participant,
I’ve never once posted anything negative. So if you took that data, it would not be accurate, because of course I have had bad days or sad days [Control group, 40, F].
Diagnostic versus aggregate health monitoring – differences between population level monitoring and individual diagnosis
Concerns: The potential for disaggregation of data to identify individuals who may be suffering from depression was met with a mixed response. Users were concerned that the tools used to predict aggregate rates of depression at the population level could also be used to pinpoint individuals suffering from depression, which could lead to identification and further stigmatization. According to one participant,
Once you’ve got the taint of depression – mental illness at all in our society, it’s an uphill battle. Even now, people in my family are like, ‘Oh, you sound cranky. Have you taken your meds?’ [Depression group, 33, M].
Nevertheless, several respondents felt that pinpointing individuals could help them access much-needed mental health services, since automated tools could pay attention to cues that friends may ignore. The following exchange took place during focus group two:
[Control group, 21, F]: People say things on the Internet they would never say in real life.
[Control group, 21, F]: That’s very true.
[Control group, 29, M]: I was just going to say, this probably makes me a bad person, but whenever I get the vague like “My life is terrible” Facebook posts, I just unfollow that person.
[Control group, 21, F]: Seriously, they just want the attention.
[Control group, 21, F]: I just wish there was an eye-roll button.
Respondents were suspicious of potential indicators of depression that appear on social media, and so may simply ignore them or unfollow the person. Because computational methods do not ignore or unfollow, they may be particularly useful in identifying and responding to indicators of danger.
On a related topic, users expressed support for the use of social media-based automated mental health technologies to augment treatment in the context of traditional mental health care (e.g. a psychiatrist, with explicit patient consent, using automated tools to monitor a patient’s mood between appointments). The idea emerged from focus group two and was presented in the three subsequent focus groups, where it met with a largely positive response. When the idea was presented to members of focus group three, one participant replied,
I’m all for that. I know when I’ve gone to therapists or my doctor or whatever, I’m not the best at reporting how I’ve been doing when I’m actually at my appointment. Especially when I go see them for the first time. That would be fantastic to have something else to either support what I think, just because I’m not reliable about accurately assessing how I’m doing [Depression group, 29, M].
Similarly, focus group members appreciated being able to accurately assess the duration of moods. One participant suggested that responding to his therapist’s questions may become easier with the help of social media history,
[Depression group, 29, M]: I think that sounds great! Especially, I think one common question is like, how long have you felt this way? I don’t know. I don’t know.
[Depression group, 20, M]: Right, exactly. Forever.
[Depression group, 29, M]: But if you could look at Twitter and just immediately a graph that shows mood swings over time. Absolutely!
While users emphasized that individual consent would be required, many felt that automated social media tracking could allow a wider window of observation for the mental health practitioner, and could provide objective evidence of mood swings and their duration, which would be invaluable for predicting, diagnosing, and treating depression.
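The “graph that shows mood swings over time” the participant imagines could, in principle, be approximated by aggregating per-tweet sentiment scores into a time series. The sketch below assumes such scores already exist (produced by some upstream sentiment model) and uses invented values purely for illustration.

```python
# Minimal sketch of a mood-over-time view: aggregate per-tweet sentiment
# scores (assumed to come from an upstream sentiment model; the values below
# are invented) into a weekly series and render a crude text plot.
from collections import defaultdict
from statistics import mean

scored_tweets = [  # (ISO week number, sentiment score in [-1, 1])
    (10, -0.4), (10, -0.6), (11, -0.2), (12, 0.3), (12, 0.5),
]

weekly = defaultdict(list)
for week, score in scored_tweets:
    weekly[week].append(score)

for week in sorted(weekly):
    avg = mean(weekly[week])
    bar = "#" * int((avg + 1) * 10)  # longer bar = better average mood
    print(f"week {week:2d} {avg:+.2f} {bar}")
```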
Participant views on regulating Twitter mining – participant views on safeguarding privacy
Safeguarding privacy: Respondents differed in their views on the extent to which Twitter monitoring should be regulated. While some participants felt government oversight would help to ensure the ethical use of public data, others suggested that governmental oversight could lead to Orwellian monitoring. Nevertheless, even those who expressed concern with respect to governmental monitoring could not agree on the appropriate role of government. For some, it was government access to public health data that laid the foundation for abusive governmental monitoring. According to one participant,
For me it’s like, researchers – free access. I don’t care if they have all of it. Advertisers, they should have to pay for the access. And the government should have absolutely no access [Control group, 26, F].
While some participants felt that government oversight was necessary to protect the rights of users, others felt that oversight was unnecessary, or should come from the social networking sites themselves. However, consistent themes emerged from the focus groups regarding ethical access to and use of social networking data. First, users felt that the collection, access, and use of social networking data should be transparent. Respondents did not feel that blanket language in the “terms and conditions” constituted transparency; such language was confusing and buried in what one participant termed “a wall of text that no one ever reads”. The knowledge that using Twitter (or other social media sites) constitutes consent to have one’s data collected, analyzed, and commoditized should be made plain when creating a Twitter account.