To get started with this demo, click any of the buttons above to view trends related to a #MeToo movement hashtag. Each data visualization is explained below, in order from left to right.
forecasts : shows the log of the 35 most recent weekly counts for a given hashtag and projects log counts 15 weeks into the future using ARIMA. The ARIMA parameters p, d, and q were chosen by grid search, minimizing validation loss over a subset of hashtag counts.
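As a rough illustration of the grid search mentioned for the forecasts panel, the sketch below tries every (p, d, q) combination and keeps the one with the lowest validation loss. It is not the demo's code. The names `grid_search_orders` and `drift_forecast`, the mean-squared-error loss, and the toy series are all assumptions. To stay dependency-free, `drift_forecast` is a simplified stand-in that only honors d (difference, forecast the mean, integrate back); in practice a real ARIMA fit would be passed in its place.

```python
import itertools
import math


def drift_forecast(train, order, steps):
    """Stand-in forecaster (NOT a full ARIMA): uses only d from (p, d, q)."""
    _, d, _ = order
    levels = [list(train)]
    for _ in range(d):  # difference the series d times
        prev = levels[-1]
        levels.append([b - a for a, b in zip(prev, prev[1:])])
    diffed = levels[-1]
    forecast = [sum(diffed) / len(diffed)] * steps  # mean of differenced data
    for level in reversed(levels[:-1]):  # undo each differencing step
        last = level[-1]
        integrated = []
        for step in forecast:
            last += step
            integrated.append(last)
        forecast = integrated
    return forecast


def grid_search_orders(train, valid, fit_forecast, p_values, d_values, q_values):
    """Return the (p, d, q) order with the lowest validation MSE, and that MSE."""
    best_order, best_loss = None, math.inf
    for order in itertools.product(p_values, d_values, q_values):
        preds = fit_forecast(train, order, len(valid))
        loss = sum((p - v) ** 2 for p, v in zip(preds, valid)) / len(valid)
        if loss < best_loss:  # ties keep the earlier (simpler) order
            best_order, best_loss = order, loss
    return best_order, best_loss


# Hypothetical log counts with a linear trend, split into train and validation.
log_counts = [1.0 + 0.5 * t for t in range(20)]
train, valid = log_counts[:15], log_counts[15:]

best_order, best_loss = grid_search_orders(
    train, valid, drift_forecast, p_values=[0], d_values=[0, 1, 2], q_values=[0]
)
print(best_order, best_loss)
```

Passing the forecaster as an argument keeps the search loop independent of the model, so the same loop works with any function that takes a training series, an order, and a horizon.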
counts : shows the log of the previous 80 observed weekly counts for a given hashtag ("true"), along with an ARIMA-projected count ("pred"). Here, each prediction is one step ahead: the ARIMA model was trained on data up to timestep t-1 and used to predict timestep t.
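The one-step-ahead scheme described for the counts panel can be sketched as a walk-forward loop: at each timestep t, the model sees only the values before t and predicts the value at t. This is a minimal sketch, not the demo's implementation. The function names, the `min_train` parameter, and the weekly counts are hypothetical, and a random walk with drift stands in for ARIMA so the example runs without extra libraries.

```python
import math


def one_step_forecast(history):
    """Stand-in for ARIMA: last value plus the average past change."""
    if len(history) < 2:
        return history[-1]
    drift = (history[-1] - history[0]) / (len(history) - 1)
    return history[-1] + drift


def walk_forward(series, min_train=3):
    """For each t >= min_train, predict series[t] from series[:t] only."""
    true, pred = [], []
    for t in range(min_train, len(series)):
        pred.append(one_step_forecast(series[:t]))  # model sees up to t-1
        true.append(series[t])                      # observed value at t
    return true, pred


# Hypothetical weekly counts, converted to log counts as in the panel.
weekly_counts = [120, 150, 90, 300, 280, 410, 390]
log_counts = [math.log(c) for c in weekly_counts]

true, pred = walk_forward(log_counts)
for observed, predicted in zip(true, pred):
    print(f"true={observed:.3f} pred={predicted:.3f}")
```

Because every prediction uses only earlier observations, the "pred" curve reflects how the model would have performed week by week rather than how well it fits data it has already seen.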
closest neighbors : shows a t-SNE (t-distributed Stochastic Neighbor Embedding) plot of the 20 closest neighbors to the hashtag in a month of Twitter data. Distances between words were determined using a GloVe embedding representation of the social media corpus. The size of a word corresponds to the log of its raw count (bigger bubbles mean more popular words).
table : shows the closest neighbors to a given hashtag, sorted in order of increasing distance.
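The neighbor ranking behind the closest neighbors plot and the table can be illustrated with a small sketch: compute a distance from the query hashtag to every other word, sort in increasing order, and size each bubble by the log of its raw count. The distance metric used by the demo is not stated, so cosine distance is an assumption here. The 3-dimensional vectors, word names, and counts are hypothetical toy values, not real GloVe output.

```python
import math


def cosine_distance(u, v):
    """1 - cosine similarity (assumed metric; the demo's metric is not stated)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def closest_neighbors(query, embeddings, k):
    """Return the k (word, distance) pairs nearest to query, closest first."""
    q = embeddings[query]
    distances = [
        (word, cosine_distance(q, vec))
        for word, vec in embeddings.items()
        if word != query
    ]
    return sorted(distances, key=lambda pair: pair[1])[:k]


# Hypothetical toy embeddings and raw counts.
embeddings = {
    "#query": [1.0, 0.0, 0.0],
    "word_a": [0.9, 0.1, 0.0],
    "word_b": [0.6, 0.8, 0.0],
    "word_c": [0.0, 0.0, 1.0],
}
raw_counts = {"word_a": 5000, "word_b": 800, "word_c": 12000}

neighbors = closest_neighbors("#query", embeddings, k=2)
# Bubble size follows the log of the raw count, as in the plot.
bubble_sizes = {word: math.log(raw_counts[word]) for word, _ in neighbors}

for word, dist in neighbors:
    print(f"{word}: distance={dist:.3f} size={bubble_sizes[word]:.2f}")
```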
This demo is a work in progress; full code and documentation will be released for public use within the next 3 months. Here are some relevant links:
ARIMA models
Why log counts rather than counts for time series forecasting?
Stanford GloVe Embeddings
More coming soon!