How We Investigated Facebook’s Most Popular Content

Introduction

In early November, Facebook released its second Widely Viewed Content Report.

The report uses two metrics to quantify the diffusion of content in the Facebook news feed: content viewers and content views—as evidenced by phrases like “Posts with the most News Feed content viewers in the US during Q3 2021 represented only a small fraction of all News Feed content views in the US.”

However, the top 20 rankings provided for domains, links, pages, and posts use only the viewers metric, counting every person who has seen a piece of content once and only once. While this metric is useful for understanding the reach of a piece of content, it does not reflect the difference between a domain that a certain user sees in the news feed just once in a quarter and a domain that is seen in the news feed every day.

The digital advertising world—which is built around the art of persuasive messaging—does distinguish between these concepts, which are described as reach and impressions. While “reach” is the number of unique users who have seen an advertising message, “impressions” are the total number of times the ad has been seen, counting multiple views by the same individuals. In combination, these give a third metric, “frequency”: the number of impressions per user, i.e. how many times a person sees a given message.

General consensus is that an ad campaign of limited budget must seek an optimal value for frequency: A message can be shown either to a large number of people a few times, or to a smaller number of people many times, with a different message impact depending on which value is prioritized.

The rankings released by Facebook are calculated with a metric comparable to reach but not impressions. However, we can estimate impressions and frequency by using data from our Citizen Browser panel to measure how often content was displayed to our 1,315 panelists over the same third-quarter period (Q3 2021, July 1 to Sept. 30) as the Facebook report. We do this by first counting impressions (the total number of times that links from a domain were shown to any of our panelists). Using our impressions metric, we can show that during this period, many sensationalist news and opinion sites appeared with greater frequency in panelists’ news feeds than mainstream news content.

↩︎ link

Methodology

In the Widely Viewed Content Report, Facebook presented its findings with a series of tables showing the top 20 domains, links, pages, and posts, along with the number of content viewers.

Although our sample is much smaller than Facebook’s 220 million U.S. users, we found that our panelist data revealed a similar set of top 20 rankings using the same methodology as Facebook: counting the number of unique users who had links to a given domain appear on their news feeds.

To create this ranking, we queried the Citizen Browser database for a list of URL base domains that appeared in any of our panelists’ Facebook timeline posts, ordered by the number of unique user accounts that were shown at least one post with a link from a given domain. This means that regardless of the number of different articles from, for example, nytimes.com that were shown to a user account over the Q3 time period, the domain is counted once per user.

Report Deeply and Fix Things

Because it turns out moving fast and breaking things broke some super important things.

Give Now

To count impressions, we queried the database for the total number of timeline posts that contained a given URL domain. For each data capture from Citizen Browser—which simulates a Facebook user opening the Facebook website—the top 40 or so posts in a user’s news feed are saved to our database. To calculate the rankings, we counted the number of these snapshots that contained a link or links to a given domain.

This means that if the same post remained in a user’s news feed on multiple days, it was counted multiple times. Similarly, if two different users saw the same post in their news feeds, the post was counted twice. If multiple links from the same domain appeared in the same post, we counted the post just once.

We believe that our method of counting impressions approximates Facebook’s own methodology for assessing view counts from content in posts. However, it is difficult to be certain, given the brevity of the companion methodology published by Facebook. (Facebook’s report acknowledges impressions—which Facebook refers to as “views”—as a quantifiable metric but does not provide rankings based on it.)

Our reporting highlights the change in rankings under these two differently calculated metrics. A representation of the degree of change is shown below in a slope chart. For this analysis, we found the set of domains that are in the top 20 when ranked by unique users and calculated their rank by impressions, and vice versa. This yielded a combined set of 28 domains that are in the top 20 under one or both of the two metrics. We then plotted the ranking change as a line joining the first rank to the second, with one line for each domain.

↩︎ link

Ranking Change by Measurement Metric

Rank

YouTube

Twitter

Tenor

CNN

#20

UNICEF

BBC

The Daily Mail

HuffPost

The Hill

#40

Yahoo! News

Fox News

Buzzfeed

#60

MSNBC

#80

#100

#120

#140

The Daily Wire

#160

#180

#200

Western Journal

Unique users

Impressions

Rank

YouTube

Twitter

Tenor

CNN

#20

UNICEF

BBC

The Daily Mail

HuffPost

The Hill

#40

Yahoo! News

Fox News

Buzzfeed

#60

MSNBC

#80

#100

#120

#140

The Daily Wire

#160

#180

#200

Western Journal

Unique users

Impressions

Rank

YouTube

Twitter

Tenor

CNN

#20

UNICEF

BBC

The Daily Mail

HuffPost

The Hill

#40

Yahoo! News

Fox News

Buzzfeed

#60

MSNBC

#80

#100

#120

#140

The Daily Wire

#160

#180

#200

Western Journal

Unique users

Impressions

Figure 1

Slope chart showing the change in domain ranking for a subset of domains in our sample, depending on which metric (unique users or impressions) is used. Source: The Markup Citizen Browser project, July 1–Sept. 30, 2021. See our data here.

↩︎ link

Top 20 domains, ranked by unique users (Citizen Browser)

Rank	Domain	Unique users
1	youtube.com	1,186
2	instagram.com	1,142
3	amazon.com	815
4	twitter.com	784
5	npr.org	694
6	media1.tenor.co	679
7	gofundme.com	668
8	nbcnews.com	661
9	cnn.com	658
10	nytimes.com	628
11	washingtonpost.com	616
12	unicef.org	570
13	usatoday.com	561
14	abcnews.go.com	527
15	playeralumniresources.com	509
16	m.tiktok.com	508
17	bbc.com	499
18	ranker.com	490
19	open.spotify.com	490
20	dailymail.co.uk	481

Source: The Markup Citizen Browser project, July 1–Sept. 30, 2021. See our data here.

↩︎ link

Top 20 domains ranked by impressions, and the ranking change from unique users (Citizen Browser)

Rank	Domain	Impressions	Change
1	youtube.com	56,554	0
2	instagram.com	30,201	0
3	npr.org	21,923	2
4	nytimes.com	17,865	–14
5	washingtonpost.com	12,010	6
6	amazon.com	11,665	–3
7	cnn.com	9,121	2
8	nbcnews.com	7,682	0
9	dailywire.com	7,674	135
10	ranker.com	7,130	8
11	huffpost.com	6,139	11
12	buzzfeed.com	6,046	39
13	westernjournal.com	5,205	196
14	msnbc.com	5,110	52
15	dailymail.co.uk	4,802	5
16	foxnews.com	4,796	34
17	news.yahoo.com	4,737	31
18	twitter.com	4,251	–14
19	thehill.com	4,086	16
20	abcnews.go.com	3,856	–6

Source: The Markup Citizen Browser project, July 1–Sept. 30, 2021. See our data here.

↩︎ link

Correlation

The Facebook report also provides us with an opportunity to determine how representative our panel is by comparing the list of top domains provided in Facebook’s report with our panel’s top domains calculated in the same way.

To do this, we generated a list of top domains by number of unique users who have seen a link to that domain within the same period as the Facebook Q3 report. All of the 20 domains listed in Facebook’s report appear within the top 189 domains of our Citizen Browser dataset. However, when we remove the anomalous domain of “paypal.com,” we see the remaining 19 domains listed in Facebook’s report within our top 39 domains. (We believe paypal.com is anomalous because it was shared via private posts, which are not captured by Citizen Browser.)

Figure 2 shows the distribution of the number of panelists who were shown the domains from our list in their Facebook news feeds.

↩︎ link

Unique User Counts for Top Domains Across Citizen Browser and Facebook (U.S.)

Facebook unique user counts for top 20 domains

Citizen Browser unique user counts for top 100 domains

200

1,200

175

1,000

150

800

Unique users (in millions)

125

Unique users

100

600

400

200

Domain rank

Facebook unique user counts for top 20 domains

Citizen Browser unique user counts for top 100 domains

200

1,200

175

1,000

150

800

Unique users (in millions)

125

Unique users

100

600

400

200

Domain rank

Citizen Browser unique user counts for top 87 domains

1,200

1,000

800

Unique users

600

400

200

Domain rank

Facebook unique user counts for top 20 domains

200

175

150

Unique users (in millions)

125

100

Domain rank

Citizen Browser unique user counts for top 87 domains

1,200

1,000

800

Unique users

600

400

200

Domain rank

Facebook unique user counts for top 20 domains

200

175

Unique users (in millions)

150

125

100

Domain rank

Citizen Browser unique user counts for top 87 domains

1,200

Unique users

800

400

Domain rank

Facebook unique user counts for top 20 domains

200

Unique users (in millions)

150

100

Domain rank

Figure 2

Sources: The Markup Citizen Browser project/Facebook Widely Viewed Content Report, July 1–Sept. 30, 2021. See our data here.

We can then use the Kendall Tau-c rank correlation coefficient to get a sense of how comparable our list is with the Facebook report’s list. Kendall Tau-c is used because of its specificity in correlating the similarity between ordered lists of categorical values. (Categorical variables are difficult here because they are not orderable. That is to say, we can say two domains are the same or not, but we can’t say how similar they are.)

Kendall Tau-c specifically looks at the relative ordering of two lists of data. However, because the Facebook report only provides a list of 20 domains, while our list contains 189 domains, we must calculate the correlation coefficient by first taking the intersection of the two lists (i.e., deleting domains in our list that don’t appear in the Facebook list, while maintaining their relative ordering). The resulting data can be seen in Figure 5. While this may have a biasing effect in our correlation coefficient, we attempt to resolve this by explicitly calculating our p-value through simulation. In addition, we computed the rank-biased overlap coefficient for the two lists and achieved similar results of rbo = 0.698 using the extrapolated RBO calculation.

We find that the domain orderings have a Kendall Tau-c ranking correlation of τ=0.411 with a p-value of p = 0.0111. A correlation value of τ = 0.411 represents a very strong correlation between the two rankings of domains (a correlation value of 1 means a perfect match, and 0 means no correlation whatsoever). Anything above 0.3 for Kendall Tau-c is considered a strong correlation.

In addition, the p-value gives us high confidence that this value is not the result of random data. This p-value was calculated explicitly by randomly shuffling our list of domains and re-running our analysis (see Figure 3). In this test we found that 98.89 percent of randomly generated lists had a worse correlation coefficient than our list did, giving us good statistical confidence that our results are representative.

It is important to note that this correlation only shows that our rankings are ordered similarly to the Facebook rankings. Because of the requirement to find the intersection of the two lists, and the fact that Kendall Tau-c doesn’t take absolute ranking into account, the correlation is unaffected by how many “intermediate” domains there are between two matches to the Facebook list. However, the observation that we only need a list of 189 domains to have coverage over the Facebook list gives us confidence that this result is significant.

↩︎ link

Correlation value of top Citizen Browser domains and top Facebook domains compared against randomly ordered lists

0.030

Correlation

value of 0.411

0.025

0.020

Probability

0.015

0.010

0.005

–0.6

–0.4

–0.2

0.2

0.4

0.6

0.8

Kendall Tau-c

0.030

Correlation

value of 0.411

0.025

0.020

Probability

0.015

0.010

0.005

–0.6

–0.2

0.2

0.4

0.8

–0.4

0.6

Kendall Tau-c

0.030

Correlation

value of 0.411

0.025

0.020

Probability

0.015

0.010

0.005

–0.6

–0.4

–0.2

0.2

0.4

0.6

0.8

Kendall Tau-c

Figure 3

Sources: The Markup Citizen Browser project and Facebook Widely Viewed Content Report, July 1–Sept. 30, 2021/The Markup simulated data. See our data here.

To further address the caveats to the Kendall Tau-c analysis, we computed the correlation between the unique user counts for our domains and the domains from the Facebook report. This should tell us whether the number of users being shown the domains in the Facebook list and our own list scale linearly with each other.

To do this we again find the intersection of our domain list with the Facebook report domain list so we are comparing like quantities, but this time we use the Spearman correlation coefficient to show similarity between the number of unique users seeing that domain. The data in this calculation can be seen in Figures 4 and 5. We are able to use the Spearman coefficient because we are now operating with numerical variables and not categorical variables. This method looks specifically at whether the numbers trend in a similar way. We found a correlation of ρ = 0.568 and a p-value of p = 0.009, which again is a strong correlation. This shows that even though Facebook and The Markup are calculating rankings from very different audience sizes, the numbers scale in the same way and can be compared to each other.

↩︎ link

Unique user count of top 20 domains as reported by Facebook against same domains’ user count in Citizen Browser data

200

180

Facebook unique users (millions)

160

140

120

100

200

400

600

800

1,000

1,200

1,400

Citizen Browser unique users

200

180

160

Facebook unique users (millions)

140

120

100

200

400

600

800

1,000

1,200

1,400

Citizen Browser unique users

200

Facebook unique users (millions)

160

120

400

800

1,200

Citizen Browser unique users

Figure 4

Source: The Markup Citizen Browser project/Facebook Widely Viewed Content Report, July 1–Sept. 30, 2021. See our data here.

From this analysis we can conclude that our data is a good match to the released Facebook data and that our sample is sufficient for statistical relevance. This means we can be confident in our conclusions based on our sample size.

↩︎ link

Unique users and rankings recorded by Facebook and Citizen Browser for domains in Facebook’s Q3 report

Domain	Unique Citizen Browser Users	Ranking Markup	Ranking Facebook	Unique Facebook Users
youtube.com	1,186	1	1	183,800,000
gofundme.com	669	6	2	132,000,000
amazon.com	815	2	3	131,700,000
media1.tenor.co	679	5	4	124,700,000
twitter.com	783	3	5	118,800,000
unicef.org	570	11	6	111,100,000
linktr.ee	388	39	7	102,900,000
tiktok.com	445	30	8	99,400,000
abcnews.go.com	527	13	9	98,200,000
cdc.gov	392	37	10	97,800,000
playeralumniresources.com	510	14	11	97,200,000
open.spotify.com	490	18	12	91,900,000
dailymail.co.uk	482	19	13	89,600,000
nypost.com	415	33	14	86,100,000
ranker.com	490	17	15	85,600,000
nbcnews.com	660	7	16	83,600,000
eventbrite.com	447	26	17	83,400,000
cnn.com	658	8	18	82,800,000
paypal.com	130	189	19	81,300,000
etsy.com	463	25	20	80,900,000

Figure 5

Sources: The Markup Citizen Browser project/Facebook Widely Viewed Content Report, July 1–Sept. 30, 2021. See our data here.

↩︎ link

Twitter Bot

The Markup’s Twitter bot, “Trending on Facebook,” posts two lists of top performing content per day:

Sites shown most frequently to panelists
Sites with greatest increase when ranked by views

↩︎ link

Our methodology for the two lists is as follows.

↩︎ link

Top Domains by Frequency

Frequency is the number of impressions per user, i.e., how many times a person sees a given message.

To find the list of top domains by frequency, we first calculate impressions per domain with the same methodology described above, but only for a 24-hour period from midnight on the day before the tweet is posted until midnight of the same day. For each domain in the list that has been seen by at least 10 unique users, we then divide the total number of impressions by the number of unique users who were shown it, giving the mean number of impressions per user (“frequency”). Domains are then ranked from the highest to lowest frequency value, and the top 5 are posted by the bot.

↩︎ link

Top Domains by Views Rank Change

Impressions are the total number of times a piece of content has been seen, counting multiple views by the same individuals. This metric calculates the change in rankings of domains, as ranked by impressions.

Report Deeply and Fix Things

Because it turns out moving fast and breaking things broke some super important things.

Give Now

To find the top domains by rank change on a given day, we first calculate the number of impressions for each domain that were seen by any of our panelists in the last 24 hours and produce a list of the top 250 domains ranked in order of the number of impressions, highest first.

We then perform the same calculation for the 24 hours before that, i.e. the top 250 domains seen in the previous day. We compare the two lists, and for each domain that appears in the top 250 domains on both days, we calculate the rank change by subtracting the current day’s rank from the previous day’s rank. This means results will fall in a value range from +249 to –249. The bot then posts the top 5 domains with the greatest positive rank change.

↩︎ link

Limitations

An impression as captured by The Markup’s Citizen Browser software is not directly equivalent to a human Facebook user reading or watching a piece of content on the site.

The Citizen Browser app is generally able to perform these captures between two and three times a day, with an average of 2.6 captures per user per day during Q3. According to a recent Pew Research poll, seven in 10 Facebook users say they visit the site daily, and just under half (49 percent) visit the site multiple times per day. Therefore, though the Citizen Browser capture rate is likely to be higher than the mean number of logins per day for a U.S. Facebook user, it is well within the realm of normal use.

Each capture from Citizen Browser records roughly 40 pieces of content placed at the top of a user’s news feed. It is unclear how many pieces of news feed content a human viewer would see in a standard Facebook browsing session. The average U.S. Facebook user spent around 33 minutes per day on the platform as of January 2021.

Insofar as news links are concerned, having a news article appear in a user’s Facebook feed also provides no guarantee that the reader will read and assimilate its message. However, previous studies have observed that high numbers of social media users—perhaps upward of half—share and forward links without ever reading them, meaning an analysis based purely on how frequently links appear may be a good proxy for the impact of news absorbed through social media. (Facebook implicitly acknowledged this problem when it introduced a prompt for users sharing unopened links in May 2021.)

Report Deeply and Fix Things

Because it turns out moving fast and breaking things broke some super important things.

Give Now

Our calculation of correlation between our sample and Facebook’s results could be improved if we had a better understanding of what Facebook itself did in the calculation of its Q3 and prior Q2 transparency reports. The report methodology does not go into detail as to what Facebook considers a link and how it counts links overall. We assumed that all links on Facebook posts were considered and that the counts were unique per user. In addition, Citizen Browser only considers public posts, which may be the root cause of the differences between our list and Facebook’s. However, in its transparency report, Facebook does not provide us with enough insight to understand how these features affect its results.

↩︎ link

Facebook Response

Facebook did not take issue with our analysis and declined to answer specific questions.

In a short emailed statement, Meta spokesperson Ariana Anthony said, “The focus of the Widely Viewed Content Report is to show the content that is seen by the most people on Facebook, not the content that is posted most frequently. That said, we will continue to refine and improve these reports as we engage with academics, civil society groups, and researchers to identify the parts of these reports they find most valuable, which metrics need more context, and how we can best support greater understanding of content distribution on Facebook moving forward.”

↩︎ link

Conclusion

Though Facebook’s Widely Viewed Content Report series was presented as a transparency measure, we found that Facebook’s method for assessing top performing content on the platform ignores information about the total number of impressions that domains receive over time, which becomes particularly pertinent for domains that publish new content regularly. Moreover, when impressions are factored into the calculations, sensationalist news and opinion sites outperform much mainstream news content on the platform—with sites that do little original reporting benefiting from some of the most pronounced ranking increases.

Our investigation attempted to replicate Facebook’s methodology as closely as possible when calculating rankings by the same metric as the Widely Viewed Content Report, but this was made more difficult by the brevity of the documentation provided by the platform on how its findings were calculated. Given the fact that changes in methodology can produce significantly different results, better documentation of future reports would aid independent verification and preempt potential claims that the platform was being misleading under the guise of transparency.

↩︎ link

Acknowledgements

We thank Michael Lavine of the University of Massachusetts Amherst for reviewing this methodology.

How We Investigated Facebook’s Most Popular Content

How We Investigated Facebook’s Most Popular Content

Facebook Isn’t Telling You How Popular Right-Wing Content Is on the Platform

Share This Article

Introduction

See our data here.

Methodology

Report Deeply and Fix Things

Ranking Change by Measurement Metric

Top 20 domains, ranked by unique users (Citizen Browser)

Top 20 domains ranked by impressions, and the ranking change from unique users (Citizen Browser)

Correlation

Unique User Counts for Top Domains Across Citizen Browser and Facebook (U.S.)

Correlation value of top Citizen Browser domains and top Facebook domains compared against randomly ordered lists

Unique user count of top 20 domains as reported by Facebook against same domains’ user count in Citizen Browser data

Unique users and rankings recorded by Facebook and Citizen Browser for domains in Facebook’s Q3 report

Twitter Bot

Follow our Twitter bot: Trending on Facebook

Top Domains by Frequency

Top Domains by Views Rank Change

Report Deeply and Fix Things

Limitations

Report Deeply and Fix Things

Facebook Response

Conclusion

Acknowledgements

How We Investigated Facebook’s Most Popular Content

The Latest

We caught companies making it harder to delete your personal data online

Prison inmates can take college classes, but often with no internet and limited tech

Should Lyft and Uber charge more if your battery is low? California may soon ban that