Shine a light on tech’s hidden impacts. Triple your donation.
Skip navigation

Show Your WorkCitizen Browser

How We Investigated Facebook’s Most Popular Content

A scowling "smiley face" surrounded by blue and red circles of numbers.
Gabriel Hongsdusit

Have you read this article yet? You may want to start here.

↩︎ link

Introduction

In early November, Facebook released its second Widely Viewed Content Report

The report uses two metrics to quantify the diffusion of content in the Facebook news feed: content viewers and content views—as evidenced by phrases like “Posts with the most News Feed content viewers in the US during Q3 2021 represented only a small fraction of all News Feed content views in the US.”

However, the top 20 rankings provided for domains, links, pages, and posts use only the viewers metric, counting every person who has seen a piece of content once and only once. While this metric is useful for understanding the reach of a piece of content, it does not reflect the difference between a domain that a certain user sees in the news feed just once in a quarter and a domain that is seen in the news feed every day.  

The digital advertising world—which is built around the art of persuasive messaging—does distinguish between these concepts, which are described as reach and impressions. While “reach” is the number of unique users who have seen an advertising message, “impressions” are the total number of times the ad has been seen, counting multiple views by the same individuals. In combination, these give a third metric, “frequency”: the number of impressions per user, i.e. how many times a person sees a given message.

General consensus is that an ad campaign of limited budget must seek an optimal value for frequency: A message can be shown either to a large number of people a few times, or to a smaller number of people many times, with a different message impact depending on which value is prioritized.

The rankings released by Facebook are calculated with a metric comparable to reach but not impressions. However, we can estimate impressions and frequency by using data from our Citizen Browser panel to measure how often content was displayed to our 1,315 panelists over the same third-quarter period (Q3 2021, July 1 to Sept. 30) as the Facebook report. We do this by first counting impressions (the total number of times that links from a domain were shown to any of our panelists). Using our impressions metric, we can show that during this period, many sensationalist news and opinion sites appeared with greater frequency in panelists’ news feeds than mainstream news content.

↩︎ link

Methodology

In the Widely Viewed Content Report, Facebook presented its findings with a series of tables showing the top 20 domains, links, pages, and posts, along with the number of content viewers.

Although our sample is much smaller than Facebook’s 220 million U.S. users, we found that our panelist data revealed a similar set of top 20 rankings using the same methodology as Facebook: counting the number of unique users who had links to a given domain appear on their news feeds.

To create this ranking, we queried the Citizen Browser database for a list of URL base domains that appeared in any of our panelists’ Facebook timeline posts, ordered by the number of unique user accounts that were shown at least one post with a link from a given domain. This means that regardless of the number of different articles from, for example, nytimes.com that were shown to a user account over the Q3 time period, the domain is counted once per user.

To count impressions, we queried the database for the total number of timeline posts that contained a given URL domain. For each data capture from Citizen Browser—which simulates a Facebook user opening the Facebook website—the top 40 or so posts in a user’s news feed are saved to our database. To calculate the rankings, we counted the number of these snapshots that contained a link or links to a given domain.

This means that if the same post remained in a user’s news feed on multiple days, it was counted multiple times. Similarly, if two different users saw the same post in their news feeds, the post was counted twice. If multiple links from the same domain appeared in the same post, we counted the post just once. 

We believe that our method of counting impressions approximates Facebook’s own methodology for assessing view counts from content in posts. However, it is difficult to be certain, given the brevity of the companion methodology published by Facebook. (Facebook’s report acknowledges impressions—which Facebook refers to as “views”—as a quantifiable metric but does not provide rankings based on it.)

Our reporting highlights the change in rankings under these two differently calculated metrics. A representation of the degree of change is shown below in a slope chart. For this analysis, we found the set of domains that are in the top 20 when ranked by unique users and calculated their rank by impressions, and vice versa. This yielded a combined set of 28 domains that are in the top 20 under one or both of the two metrics. We then plotted the ranking change as a line joining the first rank to the second, with one line for each domain.

↩︎ link

Top 20 domains, ranked by unique users (Citizen Browser)

Rank Domain Unique users
1youtube.com1,186
2instagram.com1,142
3amazon.com815
4twitter.com784
5npr.org694
6media1.tenor.co679
7gofundme.com668
8nbcnews.com661
9cnn.com658
10nytimes.com628
11washingtonpost.com616
12unicef.org570
13usatoday.com561
14abcnews.go.com527
15playeralumniresources.com509
16m.tiktok.com508
17bbc.com499
18ranker.com490
19open.spotify.com490
20dailymail.co.uk481
Source: The Markup Citizen Browser project, July 1–Sept. 30, 2021. See our data here.
↩︎ link

Top 20 domains ranked by impressions, and the ranking change from unique users (Citizen Browser)

Rank Domain Impressions Change
1youtube.com56,5540
2instagram.com30,2010
3npr.org21,9232
4nytimes.com17,865–14
5washingtonpost.com12,0106
6amazon.com11,665–3
7cnn.com9,1212
8nbcnews.com7,6820
9dailywire.com7,674135
10ranker.com7,1308
11huffpost.com6,13911
12buzzfeed.com6,04639
13westernjournal.com5,205196
14msnbc.com5,11052
15dailymail.co.uk4,8025
16foxnews.com4,79634
17news.yahoo.com4,73731
18twitter.com4,251–14
19thehill.com4,08616
20abcnews.go.com3,856–6
Source: The Markup Citizen Browser project, July 1–Sept. 30, 2021. See our data here.
↩︎ link

Correlation

The Facebook report also provides us with an opportunity to determine how representative our panel is by comparing the list of top domains provided in Facebook’s report with our panel’s top domains calculated in the same way.

To do this, we generated a list of top domains by number of unique users who have seen a link to that domain within the same period as the Facebook Q3 report. All of the 20 domains listed in Facebook’s report appear within the top 189 domains of our Citizen Browser dataset. However, when we remove the anomalous domain of “paypal.com,” we see the remaining 19 domains listed in Facebook’s report within our top 39 domains. (We believe paypal.com is anomalous because it was shared via private posts, which are not captured by Citizen Browser.)

Figure 2 shows the distribution of the number of panelists who were shown the domains from our list in their Facebook news feeds.

We can then use the Kendall Tau-c rank correlation coefficient to get a sense of how comparable our list is with the Facebook report’s list. Kendall Tau-c is used because of its specificity in correlating the similarity between ordered lists of categorical values. (Categorical variables are difficult here because they are not orderable. That is to say, we can say two domains are the same or not, but we can’t say how similar they are.) 

Kendall Tau-c specifically looks at the relative ordering of two lists of data. However, because the Facebook report only provides a list of 20 domains, while our list contains 189 domains, we must calculate the correlation coefficient by first taking the intersection of the two lists (i.e., deleting domains in our list that don’t appear in the Facebook list, while maintaining their relative ordering). The resulting data can be seen in Figure 5. While this may have a biasing effect in our correlation coefficient, we attempt to resolve this by explicitly calculating our p-value through simulation. In addition, we computed the rank-biased overlap coefficient for the two lists and achieved similar results of rbo = 0.698 using the extrapolated RBO calculation.

We find that the domain orderings have a Kendall Tau-c ranking correlation of τ=0.411 with a p-value of p = 0.0111. A correlation value of τ = 0.411 represents a very strong correlation between the two rankings of domains (a correlation value of 1 means a perfect match, and 0 means no correlation whatsoever). Anything above 0.3 for Kendall Tau-c is considered a strong correlation.

In addition, the p-value gives us high confidence that this value is not the result of random data. This p-value was calculated explicitly by randomly shuffling our list of domains and re-running our analysis (see Figure 3). In this test we found that 98.89 percent of randomly generated lists had a worse correlation coefficient than our list did, giving us good statistical confidence that our results are representative.

It is important to note that this correlation only shows that our rankings are ordered similarly to the Facebook rankings. Because of the requirement to find the intersection of the two lists, and the fact that Kendall Tau-c doesn’t take absolute ranking into account, the correlation is unaffected by how many “intermediate” domains there are between two matches to the Facebook list. However, the observation that we only need a list of 189 domains to have coverage over the Facebook list gives us confidence that this result is significant.

To further address the caveats to the Kendall Tau-c analysis, we computed the correlation between the unique user counts for our domains and the domains from the Facebook report. This should tell us whether the number of users being shown the domains in the Facebook list and our own list scale linearly with each other.

To do this we again find the intersection of our domain list with the Facebook report domain list so we are comparing like quantities, but this time we use the Spearman correlation coefficient to show similarity between the number of unique users seeing that domain. The data in this calculation can be seen in Figures 4 and 5. We are able to use the Spearman coefficient because we are now operating with numerical variables and not categorical variables. This method looks specifically at whether the numbers trend in a similar way. We found a correlation of ρ = 0.568 and a p-value of p = 0.009, which again is a strong correlation. This shows that even though Facebook and The Markup are calculating rankings from very different audience sizes, the numbers scale in the same way and can be compared to each other.

From this analysis we can conclude that our data is a good match to the released Facebook data and that our sample is sufficient for statistical relevance. This means we can be confident in our conclusions based on our sample size.

↩︎ link

Unique users and rankings recorded by Facebook and Citizen Browser for domains in Facebook’s Q3 report

Domain Unique Citizen Browser Users Ranking Markup Ranking Facebook Unique Facebook Users
youtube.com1,18611183,800,000
gofundme.com66962132,000,000
amazon.com81523131,700,000
media1.tenor.co67954124,700,000
twitter.com78335118,800,000
unicef.org570116111,100,000
linktr.ee388397102,900,000
tiktok.com44530899,400,000
abcnews.go.com52713998,200,000
cdc.gov392371097,800,000
playeralumniresources.com510141197,200,000
open.spotify.com490181291,900,000
dailymail.co.uk482191389,600,000
nypost.com415331486,100,000
ranker.com490171585,600,000
nbcnews.com66071683,600,000
eventbrite.com447261783,400,000
cnn.com65881882,800,000
paypal.com1301891981,300,000
etsy.com463252080,900,000

Figure 5

Sources: The Markup Citizen Browser project/Facebook Widely Viewed Content Report, July 1–Sept. 30, 2021. See our data here.

↩︎ link

Twitter Bot

The Markup’s Twitter bot, “Trending on Facebook,” posts two lists of top performing content per day:

  1. Sites shown most frequently to panelists
  2. Sites with greatest increase when ranked by views

Our methodology for the two lists is as follows.

↩︎ link

Top Domains by Frequency

Frequency is the number of impressions per user, i.e., how many times a person sees a given message.

To find the list of top domains by frequency, we first calculate impressions per domain with the same methodology described above, but only for a 24-hour period from midnight on the day before the tweet is posted until midnight of the same day. For each domain in the list that has been seen by at least 10 unique users, we then divide the total number of impressions by the number of unique users who were shown it, giving the mean number of impressions per user (“frequency”). Domains are then ranked from the highest to lowest frequency value, and the top 5 are posted by the bot.

↩︎ link

Top Domains by Views Rank Change

Impressions are the total number of times a piece of content has been seen, counting multiple views by the same individuals. This metric calculates the change in rankings of domains, as ranked by impressions.

To find the top domains by rank change on a given day, we first calculate the number of impressions for each domain that were seen by any of our panelists in the last 24 hours and produce a list of the top 250 domains ranked in order of the number of impressions, highest first.

We then perform the same calculation for the 24 hours before that, i.e. the top 250 domains seen in the previous day. We compare the two lists, and for each domain that appears in the top 250 domains on both days, we calculate the rank change by subtracting the current day’s rank from the previous day’s rank. This means results will fall in a value range from +249 to –249. The bot then posts the top 5 domains with the greatest positive rank change.

↩︎ link

Limitations

An impression as captured by The Markup’s Citizen Browser software is not directly equivalent to a human Facebook user reading or watching a piece of content on the site. 

The Citizen Browser app is generally able to perform these captures between two and three times a day, with an average of 2.6 captures per user per day during Q3. According to a recent Pew Research poll, seven in 10 Facebook users say they visit the site daily, and just under half (49 percent) visit the site multiple times per day. Therefore, though the Citizen Browser capture rate is likely to be higher than the mean number of logins per day for a U.S. Facebook user, it is well within the realm of normal use.

Each capture from Citizen Browser records roughly 40 pieces of content placed at the top of a user’s news feed. It is unclear how many pieces of news feed content a human viewer would see in a standard Facebook browsing session. The average U.S. Facebook user spent around 33 minutes per day on the platform as of January 2021.

Insofar as news links are concerned, having a news article appear in a user’s Facebook feed also provides no guarantee that the reader will read and assimilate its message. However, previous studies have observed that high numbers of social media users—perhaps upward of half—share and forward links without ever reading them, meaning an analysis based purely on how frequently links appear may be a good proxy for the impact of news absorbed through social media. (Facebook implicitly acknowledged this problem when it introduced a prompt for users sharing unopened links in May 2021.)

Our calculation of correlation between our sample and Facebook’s results could be improved if we had a better understanding of what Facebook itself did in the calculation of its Q3 and prior Q2  transparency reports. The report methodology does not go into detail as to what Facebook considers a link and how it counts links overall. We assumed that all links on Facebook posts were considered and that the counts were unique per user. In addition, Citizen Browser only considers public posts, which may be the root cause of the differences between our list and Facebook’s. However, in its transparency report, Facebook does not provide us with enough insight to understand how these features affect its results.

↩︎ link

Facebook Response

Facebook did not take issue with our analysis and declined to answer specific questions.

In a short emailed statement, Meta spokesperson Ariana Anthony said, “The focus of the Widely Viewed Content Report is to show the content that is seen by the most people on Facebook, not the content that is posted most frequently. That said, we will continue to refine and improve these reports as we engage with academics, civil society groups, and researchers to identify the parts of these reports they find most valuable, which metrics need more context, and how we can best support greater understanding of content distribution on Facebook moving forward.”

↩︎ link

Conclusion

Though Facebook’s Widely Viewed Content Report series was presented as a transparency measure, we found that Facebook’s method for assessing top performing content on the platform ignores information about the total number of impressions that domains receive over time, which becomes particularly pertinent for domains that publish new content regularly. Moreover, when impressions are factored into the calculations, sensationalist news and opinion sites outperform much mainstream news content on the platform—with sites that do little original reporting benefiting from some of the most pronounced ranking increases.

Our investigation attempted to replicate Facebook’s methodology as closely as possible when calculating rankings by the same metric as the Widely Viewed Content Report, but this was made more difficult by the brevity of the documentation provided by the platform on how its findings were calculated. Given the fact that changes in methodology can produce significantly different results, better documentation of future reports would aid independent verification and preempt potential claims that the platform was being misleading under the guise of transparency.

↩︎ link

Acknowledgements

We thank Michael Lavine of the University of Massachusetts Amherst for reviewing this methodology.

We don't only investigate technology. We instigate change.

Your donations power our award-winning reporting and our tools. Together we can do more. Give now.

Donate Now