Analyzing patterns of public attention in Ukraine with data from Google. Patterns, charts, statistics and some puzzling results.
The armed conflict and institutional reforms are the core issues of Ukrainian politics and dominate public discourse about the country’s future. Public attention sways back and forth between the two issues; this pattern has been clearly observable over the past months. Let’s treat this as a thesis and test it systematically against empirical data.
An interesting method for measuring trends in public attention is to analyze people’s search queries on Google. If people google ‘Manchester United’ three times as often this week as last week, this indicates that public attention to Manchester United has increased. The method does not allow conclusions about the personal interests of individuals, but it is suitable for drawing conclusions about trends in society. The low precision of the operationalization is compensated by the sheer quantity of data points. And the data is available almost in real time.
[Bear in mind that social science is not an exact science. Methods (especially this one) cannot fully capture real events, so conclusions drawn from them are always wrong to some degree. It is therefore not about searching for the truth; it is rather about formulating recurring patterns that are less wrong than others.]
Google Trends provides data on such trends: a ‘search volume index’ quantifies the ‘search interest’ of people who use the search engine. It allows comparisons of the popularity of frequently used keywords and of how their popularity changes over time.
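Google describes its index roughly as follows: each data point is divided by the total number of searches in the same region and time period, and the resulting shares are rescaled so that the peak equals 100. The sketch below illustrates that idea; the function name and the weekly counts are hypothetical, since Google does not publish absolute numbers.

```python
def search_volume_index(term_counts, total_counts):
    """Hypothetical sketch of a 0-100 search volume index.

    term_counts: weekly searches for one keyword (made-up numbers)
    total_counts: all searches in the same region and week (made-up numbers)
    """
    # Step 1: relative popularity = share of all searches in each week.
    shares = [t / total for t, total in zip(term_counts, total_counts)]
    # Step 2: rescale so that the week with the highest share equals 100.
    peak = max(shares)
    if peak == 0:
        return [0] * len(shares)
    return [round(100 * s / peak) for s in shares]


# Week 3 has twice as many raw searches as week 2, but total search
# activity also doubled, so both weeks get the same index value.
print(search_volume_index([1200, 3000, 6000, 4500],
                          [100000, 100000, 200000, 100000]))  # [27, 67, 67, 100]
```

Because of the division by total searches, a rising index means a term's share of all searches grew, not necessarily that more people searched for it in absolute terms.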
The time series below visualizes the popularity of search terms related to reforms over the past two years. The data is limited to users in Ukraine (including Crimea and the separatist-controlled territories) and aggregated weekly; the terms are in Russian and Ukrainian.
corruption (Ukrainian: Корупція, Russian: Коррупция), reforms (Ukrainian: Реформи, Russian: Реформы)
Commonly used keywords related to the armed conflict are ‘war’ and ‘ATO’. Their trends over time are visualized below. [Bear in mind that search interest data are relative: Google does not provide absolute numbers of searches. This allows comparisons of trends, but not of absolute relevance. A direct comparison of the conflict-related and reform-related terms shows a clear dominance of the conflict terms. A chart of such a comparison would show spikes for the conflict terms and an almost flat line for the reform terms. That’s why they are shown in two separate time series.]
ATO (Ukrainian and Russian: ато, Anti-Terrorist Operation), war (Ukrainian: війна, Russian: война)
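Why a joint chart flattens the reform terms can be shown with a small sketch, assuming that terms compared together are scaled against their common peak while a single term is scaled against its own peak. The series below are invented for illustration and are not the data behind the charts.

```python
def scale_to_peak(series, peak):
    """Express each value as a percentage of the given peak, rounded."""
    return [round(100 * v / peak) for v in series]


# Invented relative search shares: conflict terms dwarf reform terms.
conflict = [400, 800, 200, 600]
reforms = [16, 8, 32, 24]

# Joint comparison: both series share one scale, set by the overall peak.
joint_peak = max(max(conflict), max(reforms))
print(scale_to_peak(conflict, joint_peak))  # [50, 100, 25, 75]
print(scale_to_peak(reforms, joint_peak))   # [2, 1, 4, 3]  (almost a flat line)

# Separate charts: each series is scaled against its own peak.
print(scale_to_peak(reforms, max(reforms)))  # [50, 25, 100, 75]
```

The shape of the reform series is the same in both cases; only its visibility changes, which is why separate charts make its ups and downs readable.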
To visualize how public attention sways back and forth between conflict and reforms, the data are aggregated: the search volume indices of the Ukrainian and Russian keywords for ATO and war are added up into a conflict series, and those for reforms and corruption into a reforms series.
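The aggregation step can be sketched as a week-by-week sum of the keyword indices in each group. The dictionary keys follow the keywords listed above, but the index values are made up; the sketch also assumes that all indices share a common scale, for example because they were retrieved in the same comparison.

```python
def aggregate(indices, keywords):
    """Sum the weekly search volume indices of the given keywords."""
    return [sum(week) for week in zip(*(indices[k] for k in keywords))]


# Made-up weekly index values for three weeks (not real Google Trends data).
indices = {
    "ато": [30, 50, 20],
    "війна": [60, 80, 40],
    "война": [70, 90, 50],
    "реформи": [5, 3, 8],
    "реформы": [6, 4, 9],
    "корупція": [4, 2, 6],
    "коррупция": [5, 3, 7],
}

conflict = aggregate(indices, ["ато", "війна", "война"])
reforms = aggregate(indices, ["реформи", "реформы", "корупція", "коррупция"])
print(conflict)  # [160, 220, 110]
print(reforms)   # [20, 12, 30]
```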
The time series enables a visual test of the thesis that public attention shifts between conflict and reforms. At first glance, it seems to support the assumption: peaks in conflict coincide with lows in reforms, and vice versa. Visualizations like this can be misleading, however; statistical tests provide more clarity.
In very simple terms, testing for correlation tells us how closely two variables move together. The correlation between the two variables (conflict, reforms), using data from 2014 until now, is negative, as expected (r = -0.363): the more attention to conflict, the less to reforms, and vice versa. Next, we test whether this dependency is statistically significant, i.e. whether it could plausibly be the product of chance. A single-factor ANOVA indicates high significance: the probability of observing a dependency this strong by chance alone, if none actually existed, is far below 1% (p = 1.180e-07).
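The correlation step can be sketched in Python as follows. Note that this sketch swaps in a different significance test than the single-factor ANOVA used above: a permutation test, which shuffles one series many times and counts how often a correlation at least as strong as the observed one arises by chance. The weekly values are invented for illustration.

```python
import random
import statistics


def pearson_r(x, y):
    """Pearson correlation coefficient of two equally long series."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def permutation_p_value(x, y, n_perm=10_000, seed=0):
    """Two-sided permutation test for a correlation.

    Returns the share of random shufflings of y whose |r| with x is at
    least as large as the observed |r| (with a +1 correction, so p > 0).
    """
    rng = random.Random(seed)
    observed = abs(pearson_r(x, y))
    shuffled = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs(pearson_r(x, shuffled)) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


# Invented weekly aggregates (not the data from this post).
conflict = [160, 220, 110, 250, 90, 180, 130, 240]
reforms = [20, 12, 30, 10, 34, 18, 26, 11]

r = pearson_r(conflict, reforms)
p = permutation_p_value(conflict, reforms)
print(f"r = {r:.3f}, p = {p:.4f}")  # strongly negative r, small p
```

Shuffling destroys any week-to-week link between the two series, so the shuffled correlations show what "random" looks like. Like the other tests here, it ignores autocorrelation within each time series.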
So far, the results confirm the thesis about public attention shifting between conflict and reforms. To be precise, the tests only allow us to reject the assumption that the thesis is wrong.
Let’s go one step further. Maybe one of the variables can predict the other. Regression analysis is useful in this respect. Imagine the data points of both variables plotted against each other, with a straight line fitted through them: the farther the points lie from the line, the weaker the dependency; the closer they lie, the better the model. The regression line is the one that minimizes the sum of the squared (vertical) distances between the data points and the line.
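A minimal Python sketch of ordinary least squares with one predictor illustrates the idea: it fits the line that minimizes the sum of squared vertical distances, then reports R² and adjusted R², the goodness-of-fit measures used for the models below. The function name and example numbers are hypothetical.

```python
import statistics


def ols_fit(x, y):
    """Fit y = intercept + slope * x by ordinary least squares.

    Returns (intercept, slope, r_squared, adjusted_r_squared).
    """
    n = len(x)
    mx, my = statistics.fmean(x), statistics.fmean(y)
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    intercept = my - slope * mx
    # Residuals: vertical distances between data points and the fitted line.
    ss_res = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    ss_tot = sum((b - my) ** 2 for b in y)
    r_squared = 1 - ss_res / ss_tot
    # Adjusted R² corrects for the number of predictors (here k = 1).
    adjusted = 1 - (1 - r_squared) * (n - 1) / (n - 2)
    return intercept, slope, r_squared, adjusted


print(ols_fit([1, 2, 3, 4, 5], [2, 4, 5, 4, 5]))
```

With a single predictor, R² equals the squared correlation coefficient, and "Multiple R" in spreadsheet regression output is the square root of R².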
I estimated three models. The first is a simple (single-predictor) regression between the two variables. The other two test for time dependencies: they check whether this week’s searches related to reforms can be predicted by last week’s searches related to conflict (model II), and vice versa (model III). The question is which of the models fits best; the higher R², the better the model. Results are shown in the table below. [This approach is rather old-fashioned; time series analysis would be more appropriate for this kind of data, I agree. If you are a statistics nerd, you are very welcome to conduct such an analysis and send me the results for publication on this blog. You will find the spreadsheet for replication at the end of the post.]
Model | I | II | III |
Predictor, response | c, r | c(t-1), r | r(t-1), c |
Multiple R | 0.419 | 0.359 | 0.413 |
R² | 0.175 | 0.129 | 0.171 |
Adjusted R² | 0.166 | 0.119 | 0.161 |
r = search interest related to reforms, c = search interest related to conflict, t-1 = previous week
Model I fits best, which speaks against a one-week time dependency. In other words, this week’s searches related to conflict predict this week’s searches on reforms best. It’s a bit surprising that model III fits better than model II: last week’s searches related to reforms predict this week’s searches on conflict better than the other way round. The differences in R² between the models are small, though.
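The lagged models II and III can be set up by shifting one series by a week before fitting. The sketch below, with invented data and hypothetical helper names, builds the three predictor/response pairs and computes R² for each; with a single predictor, R² equals the squared Pearson correlation.

```python
import statistics


def r_squared(x, y):
    """R² of a one-predictor least-squares fit (= squared Pearson r)."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)


def lagged_pairs(predictor, response, lag=1):
    """Pair the predictor in week t - lag with the response in week t."""
    return predictor[:len(predictor) - lag], response[lag:]


# Invented weekly aggregates (not the data from this post).
c = [160, 220, 110, 250, 90, 180, 130, 240]
r = [20, 12, 30, 10, 34, 18, 26, 11]

models = {
    "I:   c -> r": (c, r),
    "II:  c(t-1) -> r": lagged_pairs(c, r),
    "III: r(t-1) -> c": lagged_pairs(r, c),
}
for name, (x, y) in models.items():
    print(f"{name}: R^2 = {r_squared(x, y):.3f}, n = {len(x)}")
```

Each lagged model loses one week of observations, so the samples differ slightly in size; adjusted R² takes the sample size into account.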
Since you have read this far, you may want to form your own interpretation of these results. Keep in mind that correlation is no proof of causation and that we are dealing only with data from Google searches. From a methodological point of view, we can only reject the assumption that a thesis is wrong; this does not prove that it is right. Nevertheless, further research in this field seems worthwhile.
Various commentators have repeatedly expressed concerns that political elites use the war as an excuse, or even a justification, for the lack of reforms. Some speculate that elites push the war topic whenever public attention to reforms rises to levels that would threaten their privileges. Given the results, there is no basis for rejecting this assumption either. As usual, further (and more professional) research is needed.
(wf)