Google provides a significant share of the incoming traffic that many news sites receive, but the algorithms that determine which sites get that traffic are proprietary and not widely understood. Academics at Northwestern University set out to quantify how those algorithms behave by looking closely at search results for news-related queries during one specific month, November 2017. The researchers identified 200 search terms related to hard news events and then ran those terms once for each minute of each day. What they found after collecting all of this data is that Google’s Top Stories results are concentrated in a small handful of major outlets. From the Columbia Journalism Review:
In total, we collected 6,302 unique links to news articles shown in the Top Stories box. For each of those links we count an article impression each time one of those links appears.
The data shows that just 20 news sources account for more than half of article impressions. The top 20 percent of sources (136 of 678) accounted for 86 percent of article impressions. And the top three accounted for 23 percent: CNN, The New York Times, and The Washington Post. These statistics underscore the degree of concentration of attention to a relatively narrow slice of news sources.
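To make the "share of article impressions" idea concrete, here is a minimal sketch of how such a concentration figure could be computed from a log of appearances. It is not the researchers' code. The function name `top_share` and the toy log are hypothetical, invented only for illustration.

```python
from collections import Counter


def top_share(impressions, k):
    """Return the fraction of all impressions that go to the k most-shown sources.

    `impressions` is a list with one source name per appearance in the box.
    """
    counts = Counter(impressions)
    total = sum(counts.values())
    top = sum(n for _, n in counts.most_common(k))
    return top / total


# Hypothetical toy log (not study data): one entry per time a link appeared.
toy_log = ["A"] * 5 + ["B"] * 3 + ["C"] * 1 + ["D"] * 1

# Share of impressions captured by the two most-shown sources.
print(top_share(toy_log, 2))
```

On the real data, the same kind of calculation with k = 20 is what would yield the "more than half" figure quoted above.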
Here are the results of the study in graph form. These are the top 20 sources, which together receive more than half of the impressions. Notice that CNN is getting 5 to 10 times as many impressions in search results as most of these sites, including ABC, CBS, and NBC.
The researchers also wanted to see how the results skewed politically. To do that they used the results of another study which “identifies the ideological alignment of the top 500 most-shared news sites on Facebook” based on the self-reported political views of the people sharing the news. Here’s what they found:
Our data shows that 62.4 percent of article impressions were from sources rated by that research as left-leaning, whereas 11.3 percent were from sources rated as right-leaning. 26.3 percent of impressions were from news sources that didn’t have ratings. But even if that last set of unknown impressions happened to be right-leaning, the trend would still be clear: A higher proportion of left-leaning sources appear in Top Stories.
If you exclude the 26.3% of impressions that aren’t rated, the news that appears in Google search results leans left by more than 5:1 (62.4% to 11.3%). But CJR points out that this result may be skewed by the fact that left-leaning sources publish more articles than right-leaning ones overall. However, even after accounting for this, it appears Google’s algorithm is still tilting the results to the left:
In GDELT there were 2.2 times as many articles from left-leaning sources as right-leaning sources. But in Google Top Stories that ratio was 3.2, indicating that the curation algorithm was slightly magnifying the left-leaning skew in comparison to the GDELT baseline.
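The arithmetic behind these figures is easy to check. The sketch below uses only the numbers quoted above, and the variable names are my own. Note that the 5:1 figure comes from impression percentages, while the 2.2 and 3.2 figures are the ratios CJR reports against the GDELT article baseline, so the two are not measuring quite the same thing.

```python
# Percent of article impressions, as quoted from the CJR write-up.
left = 62.4
right = 11.3
unrated = 26.3

# Left-to-right ratio among rated sources only.
rated_ratio = left / right

# Worst case for the "leans left" claim: count every unrated
# impression as right-leaning.
worst_case_right = right + unrated
worst_case_ratio = left / worst_case_right

# Left-to-right ratios quoted for the GDELT baseline and for Top Stories.
gdelt_ratio = 2.2
top_stories_ratio = 3.2
magnification = top_stories_ratio / gdelt_ratio

print(f"Rated sources only: {rated_ratio:.1f} to 1")
print(f"Worst case (unrated counted as right): {worst_case_ratio:.2f} to 1")
print(f"Top Stories skew relative to GDELT: {magnification:.2f}x")
```

Among rated sources the ratio works out to roughly 5.5 to 1. Even in the worst case the left-leaning share (62.4) still exceeds the combined right-leaning and unrated share (37.6), and the Top Stories ratio is about 1.45 times the GDELT baseline.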
Looking over all of this, I wonder how much of this skew is the result of CNN, the Washington Post, and the NY Times receiving such an outsized percentage of the impressions. I’d be interested to see how the left-right balance of the results would change if those three sources were cut back to about 3% of results each (comparable with Fox). Still, looking at the top 20 sites, there aren’t many that don’t lean pretty obviously to the left.
Why does this matter? Because Google presents itself as a kind of internet utility which doesn’t take sides. But the results of this audit suggest it’s restricting news search results to a narrow band of left-leaning sources. This isn’t happening by accident. They have tweaked the code to make it work out this way. This probably had a far more serious impact on the 2016 election than anything the Russians did or attempted to do.