
Examining the Predictive Power of Different User Groups in Predicting 2012 U.S. Republican Presidential Primaries

Lu Chen, Wenbo Wang, and Amit P. Sheth


Existing studies <ref>Metaxas, P.T., Mustafaraj, E. and Gayo-Avello, D.: How (Not) to predict elections. In: Proceedings of the IEEE 3rd International Conference on Social Computing, pp. 165--171 (2011)</ref><ref>O’Connor, B. and Balasubramanyan, R. and Routledge, B.R. and Smith, N.A.: From tweets to polls: Linking text sentiment to public opinion time series. In: Proceedings of the 4th International AAAI Conference on Weblogs and Social Media, pp. 122--129 (2011)</ref><ref>Gayo-Avello, D.: I Wanted to Predict Elections with Twitter and all I got was this Lousy Paper. Arxiv preprint arXiv:1204.6441. (2012)</ref><ref>Sang, E.T.K. and Bos, J.: Predicting the 2011 Dutch Senate Election Results with Twitter. In: Proceedings of SASN 2012, the EACL 2012 Workshop on Semantic Analysis in Social Networks, pp. 53--60 (2012)</ref><ref>Tumasjan, A. and Sprenger, T.O. and Sandner, P.G. and Welpe, I.M.: Predicting elections with twitter: What 140 characters reveal about political sentiment. In: Proceedings of the 4th International AAAI Conference on Weblogs and Social Media, pp. 178--185 (2010)</ref><ref>Bermingham, A. and Smeaton, A.F.: On using Twitter to monitor political sentiment and predict election results. In: Proceedings of the Sentiment Analysis where AI meets Psychology Workshop at IJCNLP. (2011)</ref><ref>Gayo-Avello, D.: Don't turn social media into another 'Literary Digest' poll. Communications of the ACM, 54(10) pp. 121--128 (2011)</ref> using social data to predict election results have focused on obtaining the measures/indicators (e.g., mention counts or sentiment of a party or candidate) from social data to perform the prediction. They treat all the users equally, and ignore the fact that social media users engage in the elections in different ways and with different levels of involvement. A recent study <ref>Mustafaraj, E. and Finn, S. and Whitlock, C. and Metaxas, P.T.: Vocal minority versus silent majority: Discovering the opinions of the long tail. 
In: Proceedings of the IEEE 3rd International Conference on Social Computing, pp. 103--110 (2011)</ref> has shown that social media users from different groups (e.g., "silent majority" vs. "vocal minority") differ significantly in the content they generate and in their tweeting behavior. However, the effect of these differences on predicting election results has not yet been explored. For example, in our study, 56.07% of the Twitter users who participated in the discussion of the 2012 U.S. Republican Primaries posted only one tweet. Identifying the voting intent of these users could be more challenging than identifying that of users who post more tweets. Will such differences lead to different prediction performance? Furthermore, the users participating in the discussion may have different political preferences. Is a prediction based on right-leaning users more accurate than one based on left-leaning users, given that these are Republican Primaries? Exploring these questions can expand our understanding of social media based prediction, and shed light on using user sampling to further improve prediction performance.

Here, we study different groups of social media users who engage in the discussions of elections, and compare the predictive power of these user groups. Specifically, we chose the 2012 U.S. Republican Presidential Primaries on Super Tuesday [1] among four candidates: Newt Gingrich, Ron Paul, Mitt Romney and Rick Santorum. We collected 6,008,062 tweets from 933,343 users talking about these four candidates in an eight-week period before the elections. All the users are characterized across four dimensions: engagement degree, tweet mode, content type, and political preference. We first investigated the user categorization on each dimension, and then compared different groups of users on the task of predicting the results of the Super Tuesday races in 10 states. Instead of using tweet volume or the overall sentiment of the tweet corpus as the predictor, we estimated the "vote" of each user by analyzing his/her tweets, and predicted the results based on "vote-counting". The results were evaluated in two ways: (1) the accuracy of predicting winners, and (2) the error rate between the predicted votes and the actual votes for each candidate.

User Categorization

Using the Twitter Streaming API, we collected tweets that contain the words "gingrich", "romney", "ron paul", or "santorum" from January 10th to March 5th (Super Tuesday was March 6th). In total, the dataset comprises 6,008,062 tweets from 933,343 users. In this section, we discuss user categorization on four dimensions, and study the participation behaviors of different user groups.

Categorizing Users by Engagement Degree

We use the number of tweets posted by a user to measure his/her engagement degree. The fewer tweets a user posts, the harder it is to predict the user's voting intent. An extreme example is predicting the voting intent of a user who posted only one tweet. Thus, we want to examine the predictive power of user groups with different engagement degrees. Specifically, we divided users into the following five groups: the users who post only one tweet (very low), 2-10 tweets (low), 11-50 tweets (medium), 51-300 tweets (high), and more than 300 tweets (very high). Table I shows the distribution of users and tweets over the five engagement categories. We found that more than half of the users in the dataset belong to the very low group, which contributes only 8.71% of the tweet volume, while the very highly engaged group contributes 23.73% of the tweet volume with only 0.23% of all the users. This raises the question of whether tweet volume is a proper predictor, given that a small group of users can produce a large number of tweets.
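The five engagement groups defined above can be expressed as a small bucketing function. This is a minimal sketch rather than the code used in the study; the function name `engagement_group` and the sample counts are illustrative assumptions, while the thresholds are the ones given in the text.

```python
from collections import Counter


def engagement_group(tweet_count: int) -> str:
    """Map a user's tweet count to one of the five engagement groups."""
    if tweet_count < 1:
        raise ValueError("a user in the dataset has at least one tweet")
    if tweet_count == 1:
        return "very low"
    if tweet_count <= 10:
        return "low"
    if tweet_count <= 50:
        return "medium"
    if tweet_count <= 300:
        return "high"
    return "very high"


# Hypothetical per-user tweet counts, for illustration only.
sample_counts = [1, 1, 4, 27, 120, 950]
distribution = Counter(engagement_group(n) for n in sample_counts)
```

Counting the users per group in this way, and summing their tweets, yields the kind of user and tweet distribution reported in Table I.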

To further study the behaviors of the users on different engagement levels, we examined the usage of hashtags and URLs in different user groups (see Table II). We found that the users who are more engaged in the discussion use more hashtags and URLs in their tweets. Since hashtags and URLs are frequently used in Twitter as ways of promotion, e.g., hashtags can be used to create trending topics, the usage of hashtags and URLs reflects the users' intent to attract people's attention to the topic they discuss. The more engaged users show a stronger intent of this kind and are more involved in the election event. Specifically, only 22.95% of all tweets created by very lowly engaged users contain hashtags, whereas this proportion increases to 39.4% in the very high engagement group. In addition, the average number of hashtags per tweet (among the tweets that contain hashtags) is 1.43 in the very low engagement group, and 2.68 for the very highly engaged users. The users who are more engaged also use more URLs, and generate fewer text-only tweets (tweets that contain no hashtag or URL). We will see later whether and how such differences among user engagement groups lead to varied results in predicting the elections.

ElectionPrediction Table1&2.PNG

Categorizing Users by Tweet Mode

There are two main ways of producing a tweet, i.e., creating the tweet by the user himself/herself (original tweet) or forwarding another user's tweet (retweet). Original tweets are considered to reflect the users' attitude. However, the reasons for retweeting vary, e.g., to inform or entertain the users' followers, or to be friendly to the one who created the tweet, so retweets do not necessarily reflect the users' thoughts. This may lead to different prediction performance between the users who post more original tweets and the users who post more retweets, since the voting intent of the latter is more difficult to recognize.

According to users' preference in generating their tweets, i.e., tweet mode, we classified the users as original tweet-dominant, original tweet-prone, balanced, retweet-prone and retweet-dominant. A user is classified as original tweet-dominant if less than 20% of all his/her tweets are retweets, and as retweet-dominant if more than 80% of all his/her tweets are retweets. In Table III, we illustrate the categorization, the user distribution over the five categories, and the tweet mode of users in different engagement groups. It is interesting to find that the original tweet-dominant group accounts for the biggest proportion of users in every user engagement group, and this proportion declines with the increasing degree of user engagement (e.g., 55.32% of very lowly engaged users are original tweet-dominant, while only 31.89% of very highly engaged users are original tweet-dominant). It is also worth noting that a significant number of users (34.71% of all the users) belong to the retweet-dominant group, whose voting intent might be difficult to detect.
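The tweet mode categorization can be sketched as a function of a user's retweet share. The text states only the 20% and 80% cutoffs for the two dominant groups (the full categorization is in Table III), so the 40% and 60% boundaries used below for the three middle groups are assumptions made purely for illustration, as is the function name `tweet_mode`.

```python
def tweet_mode(n_original: int, n_retweets: int,
               prone_low: float = 0.4, prone_high: float = 0.6) -> str:
    """Classify a user by the share of retweets among his/her tweets.

    The 0.2 and 0.8 cutoffs come from the text. prone_low and prone_high
    are ASSUMED boundaries for the middle groups, not values from the study.
    """
    total = n_original + n_retweets
    if total == 0:
        raise ValueError("user has no tweets")
    retweet_share = n_retweets / total
    if retweet_share < 0.2:
        return "original tweet-dominant"
    if retweet_share < prone_low:
        return "original tweet-prone"
    if retweet_share <= prone_high:
        return "balanced"
    if retweet_share <= 0.8:
        return "retweet-prone"
    return "retweet-dominant"
```

Note that a user with a single tweet always falls into one of the two dominant groups, because the retweet share is then either 0 or 1.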

ElectionPrediction Table3.PNG

Categorizing Users by Content Type

Based on content, tweets can be classified into two classes -- opinion and information (i.e., subjective and objective). Studying the difference between the users who post more information and the users who are keen to express their opinions could provide us with another perspective in understanding the effect of using these two types of content in electoral prediction.

We first identified whether a tweet represents positive or negative opinion about an election candidate using the method proposed in a recent study. <ref>Chen, L. and Wang, W. and Nagarajan, M. and Wang, S. and Sheth, A.P.: Extracting Diverse Sentiment Expressions with Target-dependent Polarity from Twitter. In: Proceedings of the 6th International AAAI Conference on Weblogs and Social Media. (2012)</ref> The tweets that are positive or negative about any candidate are considered opinion tweets, and the tweets that are neutral about all the candidates are considered information tweets. We also used a five-point scale to classify the users based on whether they post more opinion or information with their tweets: opinion-dominant, opinion-prone, balanced, information-prone and information-dominant. Table IV shows the user distribution among all the users, and the users in different engagement groups categorized by content type.

The users from the very low engagement group have only one tweet, so they belong either to the opinion-dominant (39%) or to the information-dominant (61%) group. As users' engagement degree increases from low to very high, the proportions of opinion-dominant, opinion-prone and information-dominant users decrease dramatically, from 11.09% to 0.05%, 11.75% to 0.42%, and 27.40% to 0.66%, respectively. In contrast, the proportions of balanced and information-prone users grow. In the high and very high engagement groups, the balanced and information-prone users together account for more than 95% of all users. This shows a tendency for more engaged users to post a mixture of the two types of content, with either similar proportions of opinion and information or a larger proportion of information.

ElectionPrediction Table4.PNG

Identifying Users' Political Preference

Since we focused on the Republican Presidential Primaries, it is interesting to compare two groups of users with different political preferences -- left-leaning and right-leaning. To perform such a comparison, we needed to assess users' political preference. Some efforts <ref>Conover, M.D. and Ratkiewicz, J. and Francisco, M. and Gonçalves, B. and Flammini, A. and Menczer, F.: Political polarization on twitter. In: Proceedings of the 5th International AAAI Conference on Weblogs and Social Media, pp. 89--96 (2011)</ref><ref>Golbeck, J. and Hansen, D.: Computing political preference among twitter followers. In: Proceedings of the annual conference on Human factors in computing systems, pp. 1105--1108 (2011)</ref><ref>Conover, M.D. and Gonçalves, B. and Ratkiewicz, J. and Flammini, A. and Menczer, F.: Predicting the political alignment of twitter users. In: Proceedings of the IEEE 3rd International Conference on Social Computing, pp. 192--199 (2011)</ref> have been made in recent years to address the problem of predicting the political preference/orientation of Twitter users. In our study, we use a simple but effective method to identify the left-leaning and right-leaning users.

We collected a set of Twitter users with known political preference from Twellow[2]. Specifically, we acquired 10,324 users who are labeled as Republican, conservative, Libertarian or Tea Party as right-leaning users, and 9,545 users who are labeled as Democrat, liberal or progressive as left-leaning users. Using the Twitter Search API, we then obtained the follower ids for each of the 1,000 left-leaning users (denoted as L) and 1,000 right-leaning users (denoted as R) who have the most followers. Among the remaining labeled users that are not contained in L or R, 1,169 left-leaning users and 2,172 right-leaning users appear in our dataset of the 2012 Republican Primaries. We denote the set of these 3,341 users as T.

The intuition is that a user tends to follow others who share his/her political preference. The more right-leaning users one follows, the more likely it is that he/she belongs to the right-leaning group. Among all the users that a user is following, let N_l be the number of left-leaning users from L and N_r be the number of right-leaning users from R. We estimated the probability that a user is left-leaning as N_l/(N_l+N_r), and the probability that a user is right-leaning as N_r/(N_l+N_r). A user is labeled as left-leaning (right-leaning) if the probability that he/she is left-leaning (right-leaning) exceeds a threshold. Empirically, we set the threshold to 0.6 in our study. We tested this method on the labeled dataset T, and it correctly identified the political preferences of 3,088 out of all 3,341 users (an accuracy of 0.9243).
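The labeling rule above can be sketched in a few lines. This is an illustration under stated assumptions, not the study's implementation: the function name `political_preference`, the use of Python sets of account ids, and the choice to return `None` for unidentified users are all hypothetical. The probability estimates and the 0.6 threshold follow the text.

```python
from typing import Optional, Set


def political_preference(following: Set[str],
                         left_seed: Set[str],
                         right_seed: Set[str],
                         threshold: float = 0.6) -> Optional[str]:
    """Label a user from the seed accounts (sets L and R) that he/she follows."""
    n_l = len(following & left_seed)   # N_l: followed accounts that are in L
    n_r = len(following & right_seed)  # N_r: followed accounts that are in R
    if n_l + n_r == 0:
        return None  # follows no seed account, so no estimate is possible
    p_left = n_l / (n_l + n_r)
    p_right = n_r / (n_l + n_r)
    if p_left > threshold:
        return "left-leaning"
    if p_right > threshold:
        return "right-leaning"
    return None  # follows similar numbers of accounts from both sides


# Hypothetical seed sets and followee list, for illustration only.
L = {"left_a", "left_b", "left_c"}
R = {"right_x", "right_y"}
label = political_preference({"left_a", "left_b", "left_c", "right_x"}, L, R)
```

Because the comparison is strict, a user with a probability of exactly 0.6 stays unlabeled in this sketch; the text says only that the probability must be more than the threshold.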

In total, this method identified the political preferences of 83,934 users. The other users may not follow any of the users in L or R, or may follow similar numbers of left-leaning and right-leaning users, so their political preferences could not be identified. Table V shows the comparison of left-leaning and right-leaning users in our dataset. We found that right-leaning users were more involved in this election event in several ways. Specifically, the number of right-leaning users was two times more than that of left-leaning users, and the right-leaning users generated 2.65 times as many tweets as the left-leaning users. Compared with the left-leaning users, the right-leaning users tended to create more original tweets and used more hashtags and URLs in their tweets. This result is quite reasonable since it was the Republican election, with which the right-leaning users are expected to be more concerned than the left-leaning users.

ElectionPrediction Table5.PNG

Electoral Prediction with Different User Groups

We first identified the users from each state. Twitter provides two types of location information -- the geographic location of a tweet, and the user location in the profile. We utilized the background knowledge from LinkedGeoData[3] to identify the states from user location information. If the user's state could not be inferred from his/her profile location, we utilized the geographic locations of his/her tweets, and assigned the user to a state if his/her tweets were from that state. Table VI illustrates the distribution of users and tweets among the 10 Super Tuesday states. We also compared the number of users and tweets in each state to its population. Pearson's r for the correlation between the number of users/tweets and the population is 0.9459/0.9667 (p<.0001).
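For readers who want to reproduce the correlation check, Pearson's r between per-state user counts and state populations can be computed as in the sketch below. The function `pearson_r` is a plain implementation of the standard formula, and the numbers in the example are made up for illustration; they are not the values behind Table VI.

```python
from math import sqrt
from typing import Sequence


def pearson_r(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equally long sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Hypothetical per-state figures (users in the dataset, population in millions).
users_per_state = [1200, 5400, 800, 9100]
population = [0.7, 6.5, 1.6, 11.5]
r = pearson_r(users_per_state, population)
```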

ElectionPrediction Table6.PNG

Estimating a User's Vote

If a user votes, whom will he/she vote for? To answer this question, we need to find the candidate for whom the user shows the most support. How can we measure a user's support for one candidate? We think there are two indicators that can be extracted from a user's tweets about a candidate -- mention and sentiment. Intuitively, people show their support for celebrities by frequently talking about them and expressing positive sentiments about them. As described previously, we analyzed each user's tweets, and identified which candidate is mentioned and whether a positive or negative opinion is expressed toward that candidate in a tweet. For each user, let N be the number of all his/her tweets, N_m(c) be the number of tweets in which he/she mentioned a candidate c, N_pos(c) be the number of positive tweets about c from the user, and N_neg(c) be the number of negative tweets about c from the user. We define the user's support score for c as:

ElectionPrediction Formula.PNG

where β (0 < β < 1) is a smoothing parameter, and γ (0 < γ < 1) is used to discount the score when the user does not express any opinion toward the candidate (N_pos(c)=N_neg(c)=0). We used β=γ=0.5 in our study. According to this definition, the more positive tweets (and the fewer negative tweets) a user posts about the candidate, and the more often the candidate is mentioned, the higher the user's support score for the candidate is. After calculating a user's support score for every candidate, we selected the candidate who received the highest score as the one that the user will vote for.

Prediction Results

To predict the election results in a state, we used only the collection of users who are identified as being from that state. We then further divided the user collection of each state along four dimensions -- engagement degree, tweet mode, content type, and political preference. In order to get enough users in each group, we used a coarser-grained classification instead of the five-point scales described in the User Categorization section. To be specific, we classified users into three groups according to their engagement degree: very low, low, and high*. The very low and low engagement groups are the same as defined previously. The high* engagement group comprises the users who post more than 10 tweets (i.e., the aggregation of the medium, high and very high groups defined previously). Based on the tweet mode, the users were divided into two groups: original tweet-prone* and retweet-prone*, depending on whether they post more original tweets or more retweets. Similarly, the users were classified as opinion-prone* or information-prone* according to whether they post more opinions or more information. The right-leaning users and left-leaning users were also identified from the user collection of each state. In all, for each state, there were nine user groups over four different dimensions.

We also considered users in different time windows. Our dataset contains the users and their tweets discussing the election in the 8 weeks prior to the election day. We wanted to see whether using the data in different time windows would make any difference. Here we examined four time windows -- 7 days, 14 days, 28 days or 56 days prior to the election day. For example, the 7 day window is from February 28th to March 5th. In a specific time window, we assessed a user's vote using only the set of tweets he/she created during this time. Note that a user's vote might vary across time windows, since we used different sets of tweets for the assessment.
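The time windows can be made concrete with a short date calculation. The sketch below assumes tweets are available as `(date, text)` pairs, which is a hypothetical layout chosen for illustration; the election day and the window lengths are taken from the text.

```python
from datetime import date, timedelta
from typing import List, Tuple

ELECTION_DAY = date(2012, 3, 6)  # Super Tuesday
WINDOW_DAYS = (7, 14, 28, 56)


def window_start(days: int) -> date:
    """First day of a window covering `days` days before the election day."""
    return ELECTION_DAY - timedelta(days=days)


def tweets_in_window(tweets: List[Tuple[date, str]],
                     days: int) -> List[Tuple[date, str]]:
    """Keep tweets posted inside the window, excluding the election day itself."""
    start = window_start(days)
    return [t for t in tweets if start <= t[0] < ELECTION_DAY]


# Hypothetical tweets, for illustration only.
sample = [
    (date(2012, 1, 15), "early tweet"),
    (date(2012, 2, 27), "just outside the 7 day window"),
    (date(2012, 3, 5), "day before the election"),
]
last_week = tweets_in_window(sample, 7)
```

With these definitions the 7 day window starts on February 28th and the 56 day window starts on January 10th, which matches the collection period of the dataset.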

With each group of users in a specific state and a specific time window, we counted the users' votes for each candidate, and the one who received the most votes was predicted as the winner of the election in that state. The performance of a prediction was evaluated in two ways: (1) whether the predicted winner is the actual winner, and (2) comparing the predicted percentage of votes for each candidate with his actual percentage of votes, and getting the mean absolute error (MAE) of the four candidates.
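The vote-counting prediction and its evaluation can be sketched as follows. The support scores are treated as given inputs here, since the support formula itself is not reproduced in this sketch. The function names, the dictionary layout, the tie-breaking by candidate order and all numbers in the example are assumptions made for illustration only.

```python
from collections import Counter
from typing import Dict, List, Tuple

CANDIDATES = ("Gingrich", "Paul", "Romney", "Santorum")


def user_vote(support: Dict[str, float]) -> str:
    """Candidate with the highest support score (ties go to the first listed)."""
    return max(CANDIDATES, key=lambda c: support.get(c, 0.0))


def predict_state(user_supports: List[Dict[str, float]]) -> Tuple[str, Dict[str, float]]:
    """Count one vote per user; return the predicted winner and vote shares."""
    if not user_supports:
        raise ValueError("no users in this group")
    votes = Counter(user_vote(s) for s in user_supports)
    total = sum(votes.values())
    shares = {c: votes[c] / total for c in CANDIDATES}
    winner = max(CANDIDATES, key=lambda c: shares[c])
    return winner, shares


def mean_absolute_error(predicted: Dict[str, float],
                        actual: Dict[str, float]) -> float:
    """MAE between predicted and actual vote shares over the four candidates."""
    return sum(abs(predicted[c] - actual[c]) for c in CANDIDATES) / len(CANDIDATES)


# Hypothetical support scores for four users and hypothetical actual shares.
users = [
    {"Romney": 0.6, "Santorum": 0.2},
    {"Romney": 0.5},
    {"Paul": 0.9},
    {"Santorum": 0.7, "Romney": 0.1},
]
winner, shares = predict_state(users)
actual = {"Gingrich": 0.1, "Paul": 0.2, "Romney": 0.4, "Santorum": 0.3}
mae = mean_absolute_error(shares, actual)
```

The winner-prediction accuracy of a user group is then the fraction of states in which `winner` matches the actual winner.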

Table VII shows the accuracy of winner prediction by different user groups in different time windows. The accuracy was calculated as N_state_true/N_state, in which N_state_true was the number of states where the winner was correctly predicted, and N_state (=10) was the number of all Super Tuesday states. Figure 1 illustrates the average MAE of the predictions in 10 states by different user groups in different time windows. From Table VII and Figure 1, we do see that different user groups on each dimension show varied prediction performance.

ElectionPrediction Table7.PNG
ElectionPrediction Figure1.PNG

As shown in Table VII, the high* engagement group correctly predicted the winners of 5, 8, 5 and 6 states in the 7 day, 14 day, 28 day and 56 day time windows, respectively, which is slightly better than the average performance of the very low and low engagement groups. In addition, the average prediction error of the high* engagement group is smaller than that of the very low and low engagement groups in three out of the four time windows (see Figure 1(a)). Comparing the two user groups over the tweet mode dimension, the original tweet-prone* group beat the retweet-prone* group on both the winner and the vote predictions, achieving better accuracy on winner prediction and smaller prediction error in almost all the time windows (see Figure 1(b)). The two user groups categorized by content type also show differences in predicting the elections, but the difference is not as clear as that of user groups on other dimensions. On winner prediction, the opinion-prone* group achieved better accuracy in the 14 day and 28 day time windows, and the information-prone* group achieved better accuracy in the 7 day and 56 day time windows. Although the prediction error of the opinion-prone* group was smaller than that of the information-prone* group in three time windows, the gap was quite small (see Figure 1(c)).

It is interesting to find that, among all the user groups, the right-leaning group achieved the best prediction results. As we can see in Table VII and Figure 1(d), the right-leaning group correctly predicted the winners of 5, 7, 7 and 8 states (out of 10 states) in the 7 day, 14 day, 28 day and 56 day time windows, respectively. Furthermore, it also showed the smallest prediction error (<0.1) in three out of four time windows among all the user groups. In contrast, the prediction by the left-leaning group was the least accurate. In the worst case, it correctly predicted the winners in only 2 states (in the 14 day time window), and its prediction error was over 0.15.

To further verify our observation, we looked at the average prediction error over the four time windows for each state, and applied a paired t-test to determine whether the difference of the average prediction errors in the 10 states between a pair of user groups was statistically significant. The test showed that the difference between the right-leaning and left-leaning user groups was statistically highly significant (p<.001). The difference between the low and high* engagement user groups was also statistically significant (p<.01). However, the difference between original tweet-prone* and retweet-prone*, or between opinion-prone* and information-prone*, was not significant.
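The paired t-test compares two user groups state by state. The sketch below computes only the test statistic and the degrees of freedom from two lists of per-state average errors; it is a plain implementation of the standard paired t statistic, and the function name and the example numbers are hypothetical. Obtaining a p-value additionally requires the t distribution (for 10 states, i.e., 9 degrees of freedom, the two-sided critical value at the 0.01 level is about 3.250).

```python
from math import sqrt
from statistics import mean, stdev
from typing import Sequence, Tuple


def paired_t_statistic(errors_a: Sequence[float],
                       errors_b: Sequence[float]) -> Tuple[float, int]:
    """Return (t, degrees of freedom) for paired samples, e.g. one pair per state."""
    if len(errors_a) != len(errors_b) or len(errors_a) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    diffs = [a - b for a, b in zip(errors_a, errors_b)]
    sd = stdev(diffs)  # sample standard deviation of the paired differences
    if sd == 0:
        raise ValueError("all paired differences are identical")
    n = len(diffs)
    return mean(diffs) / (sd / sqrt(n)), n - 1


# Hypothetical per-state errors for two user groups, for illustration only.
group_a = [0.16, 0.18, 0.15, 0.20]
group_b = [0.09, 0.10, 0.11, 0.08]
t_value, dof = paired_t_statistic(group_a, group_b)
```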


There are at least two factors that could affect the accuracy of electoral prediction with a user group. The first is whether the prediction of users' votes is accurate. The second is whether the users' opinion is representative of the actual voters' opinion. We interpret the varied prediction results with different user groups based on these two factors.

In our study, the high* engagement user group achieved better prediction results than the very low and low engagement groups. This may be due to two reasons. Firstly, high* engagement users posted more tweets. Since our prediction of a user's vote is based on the analysis of his/her tweets, a prediction made from more tweets should be more reliable. Secondly, according to our analysis, more engaged users showed a stronger intent to attract attention and were more involved in the election event. This might suggest that users in the high* engagement group were more likely to vote than the users in the very low and low engagement groups.

However, the low engagement group did not show better performance compared with the very low engagement group. One possible explanation might be that the users from these two groups are not that different. A more fine-grained classification of users with different engagement degrees might provide more insight. Since the prediction is state-based, we could not get enough users in each group (especially the groups of highly engaged users) if we divided users into more groups. It is worth noting that more than 90% of all the users in our dataset belonged to very low and low engagement groups. Accurately predicting the votes of these users is one of the biggest challenges in electoral prediction.

The results also show that the prediction based on users who post more original tweets is slightly more accurate than that based on users who retweet more, although the difference is not significant. It may be due to the difficulty of identifying users' voting intent from retweets. In most of the current prediction studies, original tweets and retweets are treated equally with the same method. Further studies are needed to compare these two types of tweets in prediction, and a different method might be needed for identifying users' intent from retweets. In addition, a more fine-grained classification of users according to their tweet mode could provide more insight.

No significant difference is found between the opinion-prone* and the information-prone* user groups in prediction. It suggests that the likely voters cannot be identified based on whether users post more opinions or more information. It also reveals that the prediction of users' votes based on more opinion tweets is not necessarily more accurate than the prediction using more information tweets.

The right-leaning user group provides the most accurate prediction result, which is significantly better than that of the left-leaning user group. In the best case (56 day time window), the right-leaning user group correctly predicted the winners in 8 out of 10 states (Alaska, Georgia, Idaho, Massachusetts, Ohio, Oklahoma, Vermont, and Virginia), with an average prediction error of 0.1. It is worth noting that this result is significantly better than the prediction result for the same elections based on Twitter analysis reported in the news article [4], in which the winners in only 5 out of 10 states were correctly predicted (Georgia, Idaho, Massachusetts, Ohio, Virginia). Since the elections being predicted were Republican primaries, the attitude of right-leaning users could be more representative of the voters' attitude. To some extent, this demonstrates the importance of identifying likely voters in electoral prediction.


<references />