Exploring Controversy on Twitter

General

*New* Our demo has been accepted and will be presented at CSCW 2016. Please have a look at this paper for more details.
We provide a system to explore controversy on Twitter. In previous work [1], we found that one can accurately quantify the controversy surrounding a topic of discussion using the structure of the interactions related to the topic. In this demo, we employ this approach in the wild, aiming to explore various topics of discussion on Twitter and detect the ones that are controversial.

To be more specific, the system processes the daily trending hashtags discussed on the platform and treats each hashtag as defining a single topic. It then assigns a controversy score to each topic. The controversy score is computed in three steps:

First, we build the retweet graph for the topic. Each vertex in that graph corresponds to one user who posted a tweet with the hashtag of the topic in a given day. Moreover, an edge between two vertices signifies that there was a retweet with that hashtag between the corresponding users.
Subsequently, we apply a graph partitioning algorithm (METIS) to the retweet graph to partition its vertices into two clusters.
Finally, given the retweet graph and the two clusters computed in the previous steps, we calculate the value of a measure that captures how separated the two clusters are. This value is the controversy score. Intuitively, the more separated the two clusters, the higher the score and the more controversial the topic.
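The first and last steps can be illustrated on toy data. The following is a minimal sketch, not the code behind the demo: the function names, the input format (pairs of retweeting user and retweeted user) and the hand-written cluster assignment are assumptions made for illustration. In particular, the fraction of within-cluster edges is only a simple stand-in for a separation measure, not the controversy measure from [1], and the two clusters are assumed to come from an external partitioner such as METIS.

```python
from collections import defaultdict


def build_retweet_graph(retweets):
    """Build an undirected retweet graph as {user: set of neighbours}."""
    graph = defaultdict(set)
    for retweeter, author in retweets:
        if retweeter == author:
            continue  # ignore self-retweets
        graph[retweeter].add(author)
        graph[author].add(retweeter)
    return dict(graph)


def within_cluster_edge_fraction(graph, side):
    """Stand-in separation indicator: share of edges that stay inside a cluster."""
    edges = {frozenset((u, v)) for u, nbrs in graph.items() for v in nbrs}
    if not edges:
        return 0.0
    same = sum(1 for edge in edges if len({side[u] for u in edge}) == 1)
    return same / len(edges)


# Toy data (hypothetical): two tight groups joined by a single retweet.
retweets = [("a", "b"), ("b", "c"), ("a", "c"),
            ("x", "y"), ("y", "z"), ("c", "x")]
graph = build_retweet_graph(retweets)

# Assumed output of the partitioning step (cluster 0 and cluster 1).
side = {"a": 0, "b": 0, "c": 0, "x": 1, "y": 1, "z": 1}

fraction = within_cluster_edge_fraction(graph, side)
print(fraction)
```

Here five of the six edges stay inside a cluster, so the toy graph looks strongly separated under this stand-in indicator.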

The image shown for each hashtag depicts the corresponding retweet graph and its two clusters. The clusters are (arbitrarily) colored red and blue. The graph is rendered using a force-directed layout algorithm.

For further details, we refer the interested reader to the research paper [1].

Functionality

The demo has three main tabs: (i) Examples, (ii) Trending hashtags, and (iii) Tabular View. Each offers a different way to explore controversy on Twitter.

Examples

This tab provides hand-picked examples of controversial and non-controversial hashtags that were also used in our original research paper [1].

An important observation we can immediately make from this tab is the difference between the retweet graphs of controversial and non-controversial hashtags. For controversial hashtags there is a clear separation between the two clusters of nodes (blue and red), whereas for non-controversial hashtags the clusters appear to be mixed. This clear separation is prevalent across a wide spectrum of controversial events [1]. It indicates a lack of 'conversation' between the two opposing sides and adds evidence for the existence of echo chambers.

One can click on a hashtag to be directed to the Twitter search for the specific date on which the hashtag was observed. Moreover, clicking on "Example Tweets" provides representative tweets from each of the two clusters (blue and red). The example tweets are drawn at random each time the "Example Tweets" link is clicked, so one can click on the link multiple times to see more example tweets, which can help summarize the debate.
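The random selection of example tweets can be sketched as follows. This is only an illustration under assumptions: the function name, the dictionary layout and the tweet texts are hypothetical, and the demo may select tweets differently.

```python
import random


def sample_example_tweets(tweets_by_cluster, k=3, rng=None):
    """Draw up to k tweets at random from each cluster, without repeats."""
    rng = rng or random.Random()
    return {
        cluster: rng.sample(tweets, min(k, len(tweets)))
        for cluster, tweets in tweets_by_cluster.items()
    }


# Hypothetical tweets, grouped by the cluster of the user who posted them.
tweets_by_cluster = {
    "blue": ["blue tweet 1", "blue tweet 2", "blue tweet 3", "blue tweet 4"],
    "red": ["red tweet 1", "red tweet 2"],
}

# Each call corresponds to one click and may return a different selection.
examples = sample_example_tweets(tweets_by_cluster, k=3)
print(examples)
```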

For example, clicking on the example tweets for "#netanyahuspeech" shows the two sides of the debate, with one side opposing Netanyahu and the other side supporting him.

Each hashtag is also associated with a controversy score that indicates the degree to which the related topic is controversial. A score above 0.3 is generally indicative of a controversial topic, and a score above 0.5 indicates a highly controversial hashtag.
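As a reading aid, the two thresholds can be written as a small helper. The function name and the label strings are assumptions made for this sketch, and the thresholds are rules of thumb rather than hard boundaries.

```python
def describe_score(score):
    """Map a controversy score to a rough label using the 0.3 and 0.5 thresholds."""
    if score > 0.5:
        return "highly controversial"
    if score > 0.3:
        return "controversial"
    return "likely non-controversial"


for score in (0.54, 0.332, 0.2):
    print(score, describe_score(score))
```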

Trending hashtags

This tab provides a way to explore controversy on Twitter 'in the wild'. To do this we first collected the trending hashtags in the US (from http://trends24.in/united-states/) for almost 3 months (25 June 2015 to 19 Sept 2015). We obtained all tweets mentioning these trending hashtags, and constructed the retweet graphs (there is an edge from @user1 to @user2 if @user1 retweeted @user2).
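The grouping of retweets into one directed graph per day and hashtag can be sketched as below. The record fields ("date", "user", "retweeted_user", "hashtags") are a hypothetical format chosen for illustration and do not reflect the actual data schema; lowercasing the hashtags is likewise an assumption.

```python
from collections import defaultdict


def directed_retweet_edges(records):
    """Group retweets into directed edge sets keyed by (date, hashtag).

    An edge (user1, user2) means that user1 retweeted user2.
    """
    edges = defaultdict(set)
    for rec in records:
        if rec.get("retweeted_user") is None:
            continue  # an original tweet, not a retweet
        for tag in rec["hashtags"]:
            key = (rec["date"], tag.lower())  # assumption: case-insensitive tags
            edges[key].add((rec["user"], rec["retweeted_user"]))
    return dict(edges)


# Hypothetical records.
records = [
    {"date": "2015-07-04", "user": "user1", "retweeted_user": "user2",
     "hashtags": ["#Example"]},
    {"date": "2015-07-04", "user": "user3", "retweeted_user": None,
     "hashtags": ["#example"]},
    {"date": "2015-07-05", "user": "user2", "retweeted_user": "user1",
     "hashtags": ["#example", "#other"]},
]

edges_by_topic = directed_retweet_edges(records)
print(edges_by_topic[("2015-07-04", "#example")])
```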

A user can either explore the hashtags day by day using the "Previous/Next day" links or browse specific days using the calendar. On their own, Twitter trending hashtags are not a great way to explore controversy, as most trending hashtags are not news related. Manual inspection suggests that the hashtags to which our measure assigns a high score are indeed controversial, so the demo helps separate the genuinely controversial hashtags from a lot of noise.

A few examples:

#whosiburningblackchurches (Score: 0.332): A controversial hashtag about the burning of predominantly black churches.
#communityshield (Score: 0.314): Discussion between the fans of two sides of a soccer game.
#nationalfriedchickenday (Score: 0.393): A debate between meat lovers and vegetarians about the ethics of eating meat.

We show example tweets only for hashtags with a controversy score of at least 0.3. For hashtags with a score below 0.3 (mostly non-controversial), we instead show similar hashtags, that is, hashtags that frequently co-occur with the given one, ranked using the score from [2]. This gives the user a better sense of the topic the hashtag is related to.
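The idea of finding similar hashtags through co-occurrence can be sketched with a plain count. Note that this is a simplification: it ranks by raw co-occurrence counts and is not the score from [2]. The function name and the toy hashtags are hypothetical.

```python
from collections import Counter


def similar_hashtags(tweet_hashtags, target, top_n=3):
    """Rank hashtags by how often they appear in the same tweet as `target`."""
    counts = Counter()
    for tags in tweet_hashtags:
        tags = {t.lower() for t in tags}
        if target in tags:
            counts.update(tags - {target})
    return [tag for tag, _ in counts.most_common(top_n)]


# Toy data (hypothetical): the hashtags found in each of five tweets.
tweet_hashtags = [
    ["#topic", "#first"],
    ["#topic", "#first", "#second"],
    ["#Topic", "#first", "#second"],
    ["#topic", "#third"],
    ["#second"],
]

related = similar_hashtags(tweet_hashtags, "#topic", top_n=2)
print(related)
```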

Tabular View

This tab gives a global picture of which hashtags are controversial. For each hashtag that we have processed, we apply the random walk controversy measure proposed in [1]. One can sort the hashtags by controversy score and explore the most controversial ones.
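The intuition behind a random-walk score can be illustrated with a small Monte Carlo sketch. It is a simplification written under assumptions and not the implementation used in the demo: the graph is treated as undirected, a walk ends when it reaches one of the k highest-degree users of either cluster, and the estimate combines the probabilities of having started in one cluster given the cluster where the walk ended. All names and default parameters are hypothetical; the exact definition is given in [1].

```python
import random


def random_walk_score(graph, side, k=1, walks=2000, max_steps=1000, seed=0):
    """Monte Carlo estimate of P_XX * P_YY - P_XY * P_YX on an undirected graph."""
    rng = random.Random(seed)
    x, y = sorted(set(side.values()))  # expects exactly two clusters
    members = {c: sorted(v for v in graph if side[v] == c) for c in (x, y)}
    neighbors = {v: sorted(graph[v]) for v in graph}
    # Assumption: the k highest-degree users of each cluster end a walk.
    hubs = set()
    for c in (x, y):
        hubs.update(sorted(members[c], key=lambda v: -len(neighbors[v]))[:k])
    # ends[(a, b)] counts walks that started in cluster a and ended in cluster b.
    ends = {(a, b): 0 for a in (x, y) for b in (x, y)}
    for i in range(walks):
        start = (x, y)[i % 2]  # both clusters start equally often
        node = rng.choice(members[start])
        steps = 0
        while node not in hubs and steps < max_steps:
            node = rng.choice(neighbors[node])
            steps += 1
        if node in hubs:
            ends[(start, side[node])] += 1

    def prob(a, b):
        # Estimate of Pr[walk started in a | walk ended in b].
        total = ends[(x, b)] + ends[(y, b)]
        return ends[(a, b)] / total if total else 0.0

    return prob(x, x) * prob(y, y) - prob(x, y) * prob(y, x)


def clique(nodes):
    """Fully connected toy graph as {node: set of neighbours}."""
    return {u: {v for v in nodes if v != u} for u in nodes}


side = {"a": 0, "b": 0, "c": 0, "x": 1, "y": 1, "z": 1}
separated = {**clique("abc"), **clique("xyz")}  # two disconnected triangles
mixed = clique("abcxyz")                        # everyone connected to everyone

separated_score = random_walk_score(separated, side)
mixed_score = random_walk_score(mixed, side)
print(separated_score, mixed_score)
```

On the two disconnected triangles every walk ends in the cluster where it started, so the estimate is 1.0, while the fully mixed graph receives a lower score.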

False positives

#independenceday (Score: 0.54): The topic of discussion related to this hashtag does not seem to be controversial. It was identified as controversial because the retweet graph splits into two 'sides': a group that posts patriotic messages (such as 'respect to the troops') and a general group wishing others 'a happy #independenceday'. These groups do not oppose each other, at least not in the context of this topic.

This is a known drawback of our approach to measuring controversy, also addressed in [1]. Nevertheless, the system lets us filter out a lot of non-controversial hashtags with confidence, leaving only a handful of hashtags for manual inspection.

Glitches

Data for some days is missing (e.g. 2015-07-08, 2015-07-12) because of problems in our data collection pipeline. For some hashtags, the 'Similar hashtags' functionality does not work because of a bug in our data processing pipeline.

Credits

For any questions, contact Kiran Garimella (kiran.garimella@aalto.fi). Based on research by Kiran Garimella, Gianmarco De Francisci Morales, Aristides Gionis and Michael Mathioudakis (Aalto University).

References

[1] Kiran Garimella et al. Quantifying Controversy in Social Media. WSDM 2016 (arXiv pre-print).
[2] Wei Feng et al. STREAMCUBE: Hierarchical spatio-temporal hashtag clustering for event exploration over the Twitter stream. ICDE 2015.