Forecasting Twitter Movements

Key insights for social media data collection

Social Media Trends Demo

To get started, click any of the buttons above to view trends for a hashtag related to the #MeToo movement. Each data visualization is explained below, in order from left to right.

  • forecasts: shows the log of the 35 most recent weekly counts for a given hashtag and projects the log counts 15 weeks into the future using ARIMA. The ARIMA parameters p, d, and q were chosen by grid search, minimizing validation loss over a subset of hashtag counts.
  • counts: shows the log of the previous 80 observed weekly counts for a given hashtag ("true") alongside the ARIMA-predicted values ("pred"). Each prediction for timestep t comes from an ARIMA model trained on data up to timestep t-1.
  • closest neighbors: shows a t-SNE (t-distributed Stochastic Neighbor Embedding) plot of the 20 closest neighbors to the hashtag in a month of Twitter data. Distances between words were determined using a GloVe embedding representation of the social media corpus. The size of each bubble corresponds to the log of the word's raw count (bigger bubbles mean more popular words).
  • table: lists the closest neighbors to a given hashtag, sorted by increasing distance.
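The forecasts and counts views described above can be sketched in a few lines. This is a minimal illustration, not the demo's code: it stands in a random walk with drift, which is ARIMA(0,1,0) with a constant, for the grid-searched ARIMA model, and the weekly counts are made-up numbers. The function names `log_counts`, `drift_forecast`, and `walk_forward` are hypothetical, and counts are assumed to be strictly positive so that the log is defined.

```python
import math


def log_counts(counts):
    """Log-transform weekly counts (assumes every count is > 0)."""
    return [math.log(c) for c in counts]


def drift_forecast(history, steps):
    """Forecast `steps` values ahead with a random walk with drift.

    This is ARIMA(0,1,0) with a constant: each step adds the mean
    first difference of the history to the last observed value.
    It is a stand-in for the demo's grid-searched ARIMA(p, d, q).
    """
    if len(history) < 2:
        raise ValueError("need at least two observations")
    # The mean of the first differences telescopes to (last - first) / (n - 1).
    drift = (history[-1] - history[0]) / (len(history) - 1)
    return [history[-1] + drift * h for h in range(1, steps + 1)]


def walk_forward(series, min_train):
    """One-step-ahead predictions: fit on series[:t], predict series[t]."""
    return [drift_forecast(series[:t], 1)[0]
            for t in range(min_train, len(series))]


# Made-up weekly counts for illustration only.
weekly_counts = [120, 150, 170, 210, 260, 300, 380, 450, 560, 700, 850, 1000]
series = log_counts(weekly_counts)

# "forecasts" view: project 15 weeks ahead from (up to) the last 35 log counts.
future = drift_forecast(series[-35:], 15)

# "counts" view: compare observed log counts with one-step-ahead predictions.
true = series[4:]
pred = walk_forward(series, min_train=4)
```

Swapping `drift_forecast` for a fitted ARIMA model would leave the walk-forward loop unchanged, since the loop only needs a function that maps a history to a forecast.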
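The closest-neighbors table described above amounts to ranking words by distance to the query hashtag in embedding space. The sketch below assumes cosine distance, which the demo does not specify, and uses toy 2-dimensional vectors and made-up counts in place of real GloVe embeddings. The names `cosine_distance` and `closest_neighbors` are hypothetical, and the t-SNE projection used for the plot is not covered.

```python
import math


def cosine_distance(u, v):
    """Return 1 minus the cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def closest_neighbors(query, embeddings, counts, k=20):
    """Return up to k (word, distance, log raw count) rows, nearest first."""
    q = embeddings[query]
    rows = [
        (word, cosine_distance(q, vec), math.log(counts[word]))
        for word, vec in embeddings.items()
        if word != query  # the query itself is not its own neighbor
    ]
    rows.sort(key=lambda row: row[1])  # increasing distance
    return rows[:k]


# Toy vectors and counts for illustration only (not real GloVe output).
embeddings = {
    "#metoo": [1.0, 0.0],
    "tag_a": [0.9, 0.1],
    "tag_b": [0.7, 0.7],
    "tag_c": [0.0, 1.0],
}
counts = {"#metoo": 5000, "tag_a": 1200, "tag_b": 300, "tag_c": 40}

table = closest_neighbors("#metoo", embeddings, counts, k=20)
for word, distance, log_count in table:
    print(f"{word}\t{distance:.4f}\t{log_count:.2f}")
```

The third column is the log of the raw count, the same quantity that sets bubble size in the plot.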

  • This demo is in progress; full code and documentation will be released for public use within the next 3 months. Here are some relevant links:
  • ARIMA models
  • Why log counts rather than counts for time series forecasting?
  • Stanford GloVe Embeddings
  • More coming soon!



    Contributor: Maya Srikanth