Abstract

About 40 percent of online purchases in the United States take place on Amazon.com. The next nearest competitor, Walmart, only garnered 5 percent of online sales. J.P. Morgan expects that Amazon will surpass Walmart’s total U.S. online and offline sales next year, knocking it off its pedestal as the nation’s largest retailer.

Small businesses and individuals say that in order to sell their products online in the U.S., they have to be on Amazon and—given the millions of products on its virtual shelves at any moment—they have to get a high ranking from Amazon’s product search engine or buy sponsored listings.

Amazon transitioned from digital retailer to sales platform in 2000, when it took a page from eBay and started allowing individuals and companies to sell through its website. This led to explosive sales growth (though the company reported only small profits overall, choosing to reinvest its profits for most of its existence). Amazon encouraged these “third-party sellers” with add-on services like storage, shipping, and advertising. Third-party sellers now account for 58 percent of sales on Amazon.

Even as sellers saw their revenues grow, they started to suspect that Amazon was using their nonpublic sales information to stock and sell similar products, often for less money.

Indeed, Amazon has been investing in creating products sold under its own brand names since at least 2007. Since 2017, it has dramatically expanded its catalog of private-label brands (which are trademarked by Amazon and its partners) and its list of exclusive products (developed by third-party companies who agree to sell them only on Amazon). The company refers to both as “our brands” in various parts of its website.

In 2019, Amazon told Congress it had 45 in-house brands selling approximately 158,000 products.

We found that Amazon has now registered trademarks for more than 150 private-label brands, and market research firm TJI Research estimated the number of brands developed by others but sold exclusively on Amazon.com at 598 in 2019. Some of its house brand names signal to buyers that they are part of the company—such as Amazon Basics, Amazon Essentials, and Amazon Commercial.

But hundreds of others carry labels that do not clearly indicate that they belong to the online retail giant—including Goodthreads, Lark & Ro, Austin Mill, Whole Paws, Afterthought, Truity, find., Fetch, Mr. Beams, Happy Belly, Mama Bear, Wag, Solimo, and The Portland Plaid Co.

Amazon says it sold $3 billion in private-label goods in 2019, representing just one percent of sales on the platform, but does not specify which brands are included in that estimate. Analysts with SunTrust Robinson Humphrey estimated that Amazon sold five times as much, $15.6 billion, in private-label goods in 2019, including brands owned by Whole Foods, and that the figure will reach $31 billion by 2022.

The result is that sellers now not only compete against each other for placement in Amazon search results but also increasingly against Amazon’s own in-house brands and exclusives. According to a 2021 report by JungleScout, 50 percent of sellers say Amazon’s products directly compete with theirs.

We sought to investigate how Amazon treats its own products in search results. These are proprietary devices, private labels, and exclusive-to-Amazon brands it considers “our brands.”

To do so, we started by developing a list of 3,492 popular product searches, ran those searches on desktop (without logging in), and analyzed the first page of results.

We found that in searches that contained Amazon brand and exclusive products, the company routinely put them first, above those from competing brands with better ratings and more reviews on Amazon.

Furthermore, we trained supervised machine learning classifiers and found that being an Amazon brand or exclusive was a significantly more important factor in being selected by Amazon for the number one spot than star ratings (a proxy for quality), review quantity (a proxy for sales volume), and any of the other four factors we tested. We did not analyze the potential effect of price on ranking because unit sizes were not standard, affecting price. In addition, similar products can vary by factors that affect price, such as materials and workmanship, for which we also could not control.

Importantly, we found that knowing only whether a product was an Amazon brand or not could predict whether the product got the top spot 70 percent of the time.

In a nationally representative survey we commissioned, only 17 percent of respondents said they expect the determining factor behind whether Amazon places a product first is whether it owns the brand. About half (49 percent) said they thought the products Amazon placed in the number one spot were the best-selling, best-rated, or had the lowest price. The remaining 33 percent said they didn’t know how Amazon ranked products.

We found that Amazon disproportionately placed its own products in the top search result. Despite making up only 5.8 percent of products in our sample, Amazon gave its own products and exclusives the number one spot 19.5 percent of the time overall. By comparison, competing brands (those that are not Amazon brands or exclusive products) were given the number one spot at a nearly identical rate but comprised more than 13 times as many products at 76.9 percent.

Most, but not all, of the Amazon brand and exclusive products that the company put in the number one spot (83.9 percent) were labeled “featured from our brands” and carried the phrase “sponsored result” in the source code (as well as being part of a grid labeled “search results” in the source code). They were not marked “sponsored” to shoppers.

In a short, written statement, Amazon spokesperson Nell Rona said that the company does not favor its brands in search results and that it considers “featured from our brands” listings as “merchandising placements” and not “search results,” despite their presence in the search results grid. Rona said these listings are not advertisements, and declined to answer dozens of other questions.

Overall, 37.4 percent of Amazon brand or exclusive products in search results in our sample were neither labeled as “our brands” nor carried a name widely associated with the company, such as AmazonBasics or Whole Foods. That left buyers unaware that they were buying an Amazon brand or exclusive-to-Amazon product.

Nearly nine-in-10 U.S. adults who responded to our survey were unable to identify Amazon’s highest-selling private label brands (Pinzon, Solimo, and Goodthreads), and only 51 percent were aware that Whole Foods is an Amazon-owned brand.

Rona said Amazon identifies its products by including the words “Amazon brand” on the products page, among a list of the item’s features, and sometimes in the listing title. We only found this to be the case in 23 percent of products in our sample that were Amazon-owned brands.

Comparing product pages three months apart, we found that they were less dynamic than they used to be. The default seller among products with multiple merchants changed in only 23.5 percent of products in our data. This was significantly less often than in a comparable study from five years ago.

Background

Amazon and third-party sellers have a tense symbiosis. Amazon founder and chairman Jeff Bezos has acknowledged the importance of sellers to the company’s bottom line but also calls them competitors. Amazon provides shipping, inventory management, and other services, he wrote, that “helped independent sellers compete against our first-party business” to begin with. Sellers say Amazon’s fees cut deep into their margins but they can’t get the same volume of sales anywhere else.

Antitrust regulators in Europe, Asia, and North America have been examining Amazon’s treatment of third-party sellers.

The European Commission announced an antitrust investigation in 2019, alleging Amazon used third-party seller data to inform its own sales decisions. The commission also announced a separate investigation in 2020 into whether Amazon gives preference to its own listings and to third-party sellers that use its shipping services over other sellers. Last year, India’s antitrust regulator announced an investigation into alleged anti-competitive practices by Amazon, including preferential treatment for some sellers. And in June 2021, U.S. lawmakers introduced the American Choice and Innovation Online Act, which prohibits large platforms from advantaging themselves in their own marketplaces or using nonpublic data generated by business conducted on their platform. Authorities in Germany and Canada are investigating Amazon’s selling conditions for third-party sellers, and the attorney general for Washington, D.C., filed a lawsuit in May 2021 that accuses Amazon of overly restrictive requirements for third-party sellers.

Also last year, U.S. lawmakers pressed Bezos on his treatment of third-party sellers during a congressional hearing that was part of an antitrust investigation into the four major tech companies. Rep. Lucy McBath, a Democrat from Georgia, told Bezos, “We’ve interviewed many small businesses, and they use the words like ‘bullying,’ ‘fear,’ and ‘panic’ to describe their relationship with Amazon.” The resulting report produced by the subcommittee indicated Amazon was well aware of its power over third-party sellers, citing an internal Amazon document that “suggests the company can increase fees to third-party sellers without concern for them switching to another marketplace.”

Journalists and researchers have documented instances of Amazon promoting its house brands over competitors’. In 2016, Capitol Forum, a subscription news service focused on antitrust issues, examined hundreds of listings and found that Amazon “prioritizes its own clothing brands on the promotional carousel labeled ‘Customers Who Bought This Item Also Bought’ ” on product pages. Capitol Forum said Amazon did not respond to its request for comment.

A study titled “When the Umpire is also a Player: Bias in Private Label Product Recommendations on E-commerce Marketplaces,” presented at the Association for Computing Machinery’s Conference on Fairness, Accountability, and Transparency in March 2021, examined how Amazon’s private-label brands performed in “related products” recommendations on product pages for backpacks and batteries. The researchers said they found that “sponsored recommendations are significantly more biased toward Amazon private label products compared to organic recommendations.”

In June 2020, ProPublica reported that Amazon was reserving the top spot in search results for its own brands across dozens of search terms, labeling it “featured from our brands” and shutting others out. An Amazon spokesperson told ProPublica at the time that the move was a “normal part of retail that’s happened for decades.”

Our investigation is the first study to use thousands of search queries to test how Amazon’s house brands rank in search results—and to use machine learning classifiers to determine whether sales or quality appeared to be predictive of which products Amazon placed first in search results.

In addition, we used a multipronged approach to identify Amazon house brands and exclusives, building a data set of 137,428 unique products on Amazon, which is available in our GitHub. We were unable to find any such publicly accessible dataset when we began our investigation.

Methodology: Data Collection

Sourcing Product Search Queries

To measure how Amazon’s search engine ranked Amazon’s own products relative to competing brands, we needed a list of common queries that reflect what real people search. We built the dataset from top searches from U.S. e-commerce retailers, using two sources.

The first was autocomplete queries on Amazon.com’s and Walmart.com’s product search bars. We cycled through each letter of the alphabet (A–Z) as well as numbers ranging from 0 to 19 and saved the suggested search queries presented by the autocomplete algorithm. This process yielded 7,696 queries from Amazon.com and 3,806 queries from Walmart.com.
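The seed prefixes described above can be sketched in a few lines of Python. This is an illustration only: the function name is hypothetical, lowercase letters are an assumption, and the step that sends each prefix to a retailer's autocomplete service is omitted because the article does not describe it.

```python
import string

def autocomplete_seeds():
    """Return the prefixes typed into the search bar: a-z, then 0-19."""
    letters = list(string.ascii_lowercase)   # assumption: case does not matter
    numbers = [str(n) for n in range(20)]    # 0 through 19 inclusive
    return letters + numbers

seeds = autocomplete_seeds()
print(len(seeds), seeds[:3], seeds[-3:])
```

Each seed would then be entered into the search bar and the suggestions saved.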

We then gathered the most popular searches reported by Amazon via its Seller Central hub. We collected the top 300 searches between Q1 and Q3 2020 for the Amazon categories “Softlines,” “Grocery,” “Automotive,” “Toys,” “Office Products,” “Beauty,” “Baby,” “Electronics,” and “Amazon.com.” This provided 2,700 unique searches.

Combining the autocomplete queries and Seller Central queries resulted in 11,342 unique “top search” queries.
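A minimal sketch of how query lists from several sources might be merged into one deduplicated list. The normalization rule (lowercasing and trimming whitespace) and the sample queries are assumptions for illustration; the article does not say how duplicates were matched.

```python
def combine_queries(*sources):
    """Merge query lists, dropping duplicates while keeping first-seen order."""
    seen = set()
    combined = []
    for source in sources:
        for query in source:
            key = query.strip().lower()  # assumption: case/whitespace-insensitive match
            if key and key not in seen:
                seen.add(key)
                combined.append(key)
    return combined

# Hypothetical sample inputs, one list per source.
amazon_autocomplete = ["whiskey glasses", "tissues"]
walmart_autocomplete = ["Tissues", "garlic pasta sauce"]
seller_central = ["tissues ", "air fryer"]

top_searches = combine_queries(amazon_autocomplete, walmart_autocomplete, seller_central)
print(top_searches)
```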

Collecting Search Results

We created a Firefox desktop emulator using Selenium. The emulator visited Amazon.com and made each of the 11,342 searches on Jan. 21, 2021. The search emulator was forwarded through IP addresses in a single location, Washington, D.C., in order to reduce variation in search results (which typically vary by location).

We saved a screenshot of the first page of search results as well as the HTML source code. (Examples of screenshots and source code for search results are available on GitHub.)

In the source code of product search result pages, Amazon titles some listings with the data field “s-search-result.” This is what we are calling search results in our data. Amazon does serve other products on the search results page in advertising and other promotional carousels, including “editorial picks” and “top rated from our brands,” but those do not appear in every result (at most a third of our sample), and they are not part of the grid that Amazon labels search results.
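As a rough illustration of how listings tagged “s-search-result” could be pulled out of saved source code, the sketch below uses Python's standard-library HTML parser. The data-asin attribute, the “s-carousel” value in the sample markup, and the class and function names are assumptions for illustration; this is not the code used in the investigation.

```python
from html.parser import HTMLParser

class SearchResultParser(HTMLParser):
    """Collect elements whose data-component-type is "s-search-result"."""

    def __init__(self):
        super().__init__()
        self.results = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if attributes.get("data-component-type") == "s-search-result":
            self.results.append({
                "position": len(self.results) + 1,   # order on the page, 1-based
                "asin": attributes.get("data-asin"), # assumed attribute name
            })

def extract_search_results(html):
    parser = SearchResultParser()
    parser.feed(html)
    return parser.results

# Hypothetical, heavily simplified markup.
sample_html = """
<div data-component-type="s-search-result" data-asin="B000000001"></div>
<div data-component-type="s-carousel" data-asin="B000000002"></div>
<div data-component-type="s-search-result" data-asin="B000000003"></div>
"""
results = extract_search_results(sample_html)
print(results)
```

Carousel items are skipped because they carry a different component type, mirroring the distinction drawn above.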

On desktop, the majority of Amazon-labeled “search results” in our data were delivered in uniform 60-product positions (four per row for 15 rows, though Amazon narrows the width to three columns on smaller screens). Some searches returned fewer than 60 products, but none returned more. A minority (about one in 10) of searches in our data returned 22 products or fewer, delivered in a single column, one item per row. This happened for some electronics searches but never in other search categories.

Left: Amazon search for “Whiskey glasses.” Right: Products Amazon labeled search results. Brand banners, carousels, and other promotional modules have been removed. Screenshots taken Jan. 21, 2021.

Amazon.com, Amazon.com

Because we were seeking to analyze how Amazon ranks its own products relative to competing brands’ products, we further limited our analysis to search results that contained Amazon brands and exclusives on the first page. Of the 11,342 top searches, slightly more than three in 10 (30.8 percent) contained this type of product on the first page. We used the resulting 3,492 top searches for our analysis.
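The filtering step can be sketched as follows, assuming each search is stored as a list of first-page ASINs and the Amazon products as a set of ASINs. The function name and sample ASINs are hypothetical; the final line simply re-derives the 30.8 percent figure from the two counts reported above.

```python
def searches_with_amazon_products(first_pages, amazon_asins):
    """Keep searches whose first page lists at least one known Amazon ASIN."""
    return {
        query: asins
        for query, asins in first_pages.items()
        if any(asin in amazon_asins for asin in asins)
    }

# Hypothetical sample data.
first_pages = {
    "tissues": ["B0AMAZON01", "B0OTHER001"],
    "whiskey glasses": ["B0OTHER002", "B0OTHER003"],
}
amazon_asins = {"B0AMAZON01"}

kept = searches_with_amazon_products(first_pages, amazon_asins)

# Share of top searches retained, using the counts reported in the text.
share_of_top_searches = round(100 * 3492 / 11342, 1)
print(list(kept), share_of_top_searches)
```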

Identifying Amazon’s Brands and Exclusives

We were unable to find a public database of Amazon brand and exclusive products, so we had to build one.

We started with the search pages themselves. On many (but not all), Amazon provides a filter on the left-hand side, allowing shoppers to limit the search to “our brands,” which Amazon says lists only its private label products and “a curated selection of brands exclusively sold on Amazon.”

We collected each of those “our brand” results for each query, saving a screenshot and the source code, also on Jan. 21, 2021.

Above: An Amazon search for “garlic pasta sauce” highlighting the “our brands” filter button. Below: the same search with the “our brands” filter applied. Screenshots taken Jan. 21, 2021.

Amazon.com, Amazon.com

We then discovered an undocumented API that yields all Amazon “our brands” products for any given search. We ran all 11,342 search terms through this API and saved those responses as well. (API responses are available on GitHub.)

Both the search emulator and API requests were forwarded through IP addresses in Washington, D.C.

Strangely, Amazon does not identify proprietary electronics, including Kindle readers and Ring doorbells, when a shopper filters a search result to list only Amazon’s “our brands.” To identify those, we also gathered products Amazon listed as best sellers in the category “Amazon Devices & Accessories.”

Together, all three sources yielded a dataset of 137,428 unique products, identified by their 10-character ASIN (Amazon Standard Identification Number). This dataset of Amazon’s proprietary devices, private label, and exclusive products is available on GitHub.
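A minimal sketch of merging the three sources into one set of unique ASINs. The pattern below assumes an ASIN is ten uppercase letters or digits; the text only specifies that it has 10 characters. The function name and sample values are hypothetical.

```python
import re

# Assumption: ASINs consist of uppercase letters and digits.
ASIN_PATTERN = re.compile(r"^[A-Z0-9]{10}$")

def build_amazon_asin_set(filter_asins, api_asins, device_asins):
    """Union the three sources and drop anything that is not ASIN-shaped."""
    combined = set(filter_asins) | set(api_asins) | set(device_asins)
    return {asin for asin in combined if ASIN_PATTERN.match(asin)}

# Hypothetical sample inputs: "our brands" filter, API responses, device best sellers.
amazon_asins = build_amazon_asin_set(
    ["B0AMAZON01", "B0AMAZON02"],
    ["B0AMAZON02", "not-an-asin"],
    ["B0AMAZON03"],
)
print(sorted(amazon_asins))
```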

It is the largest and most comprehensive open access dataset of Amazon brand and Amazon-exclusive products we’ve seen, and yet we know it is not complete. Amazon told Congress in July 2019 that at that time it sold approximately 158,000 products from its own brands.

Collecting Product Pages

In addition to the above, we collected the individual product pages for the 125,769 products that appeared on the first page of our 3,492 top searches in order to analyze the buy box information. The buy box displays the price, return policy, default seller, and default shipper for a product.

To gather the product pages, we used Amazon Web Services and the same Selenium emulator we made for collecting the search result pages. The emulator visited the hyperlink for each product and saved a screenshot and the source code.

We collected these pages on Feb. 3–6 and Feb. 17–18, a few weeks after we scraped the search result pages. To determine the effects of the delay, we analyzed how often a subsample of buy boxes’ default sellers and shippers flipped between Amazon and third parties after a similar lag and found they remained largely unchanged (see more in Limitations).

Product Characteristics

We asked up to four questions of every product listing in order to identify certain characteristics and used this to produce the categories we used in our analysis.

is_sponsored: Is the listing a paid placement?
is_amazon: Is the listing for an Amazon brand or exclusive?
is_amazon_shipped: Does the default seller of the product (the "buy box") use Amazon to ship the listed product?
is_amazon_sold: Is the default seller of the product Amazon?

Sponsored products (is_sponsored) are the most straightforward: Amazon labels them “sponsored.” If a product in the Amazon-labeled search results is not sponsored, we consider it “organic.” We assigned the remaining features only to organic products.

Screenshot of a buy box on the product page for “Heinz Tomato Ketchup (14 oz Bottle),” where Amazon is both the default shipper and seller. The delivery address has been redacted. Taken Oct. 12, 2021.

We identified an organic product as an Amazon brand or exclusive (is_amazon) when it matched one of the 137,428 Amazon ASINs we collected. If it didn’t match, we considered it a “competing brand.”

We identified a product as is_amazon_sold if the “sold by” text in the buy box contained “Amazon,” “Whole Foods,” or “Zappos” (which is owned by Amazon). If it didn’t, we identified the product as “Third-Party Sold.”

We identified a product as is_amazon_shipped if the buy box shipper information contained “Amazon” (including “Amazon Prime,” “Amazon Fresh,” and “Fulfilled by Amazon”), “Whole Foods,” or “Zappos” (which is owned by Amazon). If it didn’t contain Amazon, we identified products as “Third-Party Shipped.”
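The two buy box rules above amount to a substring check, sketched below. Case-sensitive matching, the function names, and the “Acme Goods” seller are assumptions for illustration.

```python
AMAZON_NAMES = ("Amazon", "Whole Foods", "Zappos")

def mentions_amazon(text):
    """True if the buy box text names Amazon or a company it owns."""
    return any(name in text for name in AMAZON_NAMES)  # assumption: case-sensitive

def buy_box_features(sold_by, shipped_by):
    return {
        "is_amazon_sold": mentions_amazon(sold_by),
        "is_amazon_shipped": mentions_amazon(shipped_by),
    }

# Hypothetical buy boxes.
first_party = buy_box_features("Amazon.com", "Amazon.com")
fulfilled = buy_box_features("Acme Goods", "Fulfilled by Amazon")
third_party = buy_box_features("Acme Goods", "Acme Goods")
print(first_party, fulfilled, third_party)
```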

We use these features to train and evaluate predictive classifiers (see Random Forest Analysis) as well as produce product categories in our ranking analysis (see the following section).

Most of the categories have a direct relationship with the features they are named after.

We categorized products as “Sponsored” if we identified them as is_sponsored. Similarly, we categorized products as “Amazon Brands” and exclusives if they are organic and is_amazon, and “Competing Brands” if the products are organic and not is_amazon.

We categorized organic products as entirely “Unaffiliated” if they met none of the criteria for is_amazon, is_amazon_sold, or is_amazon_shipped. In other words, these are competing brands that are sold and shipped by third-party sellers.
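The category rules can be written as a small decision function. This is a sketch under the assumption that each product is a dictionary of the four boolean features; the function name is hypothetical. It returns a list because “Unaffiliated” is a subset of “Competing Brands.”

```python
def categorize(product):
    """Map a product's boolean features to the categories used in the analysis."""
    if product["is_sponsored"]:
        return ["Sponsored"]
    if product["is_amazon"]:
        return ["Amazon Brands"]
    categories = ["Competing Brands"]
    # Competing brands that are neither sold nor shipped by Amazon.
    if not product["is_amazon_sold"] and not product["is_amazon_shipped"]:
        categories.append("Unaffiliated")
    return categories

# Hypothetical product: a competing brand sold and shipped by a third party.
example = {
    "is_sponsored": False,
    "is_amazon": False,
    "is_amazon_sold": False,
    "is_amazon_shipped": False,
}
print(categorize(example))
```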

The features and categories we identified are hierarchical and overlap. Their relationships are summarized in the diagram below.

Relationship between features and categories

The overlap of categories and features of products from popular Amazon searches in January 2021. Our analysis focuses mostly on organic (nonsponsored) products. A tiny proportion of products Amazon considers “our brands” are not sold by Amazon but rather by a third party. These include licensed technology (like Fire TVs) and apparent “exclusives.” Source: The Markup/Amazon.com

Data Analysis

Ranking Analysis: Who Comes Out on Top?

We compared how often products in each category received the top search result with that category’s share of all products in our sample. We found that Amazon brands and exclusives were disproportionately given the number one search result relative to their small proportion among all products.

We used two straightforward measures for our analysis. First, we calculated a population metric: the percentage of products belonging to each category among products from all the search pages. To do this, we divided the number of search result slots occupied by products in each category by the number of all product slots in our sample. This included duplicates.

We then calculated an incidence rate for how frequently Amazon gave products in each category the coveted first spot in search results. We did this by dividing the number of searches in which a product from each category held the top spot by the total number of searches in our sample (with at least one product). (A table of each of these metrics by category appears in our GitHub and in “Supplementary datasets.”)

$$\text{products}_{Amazon} = \frac{N_{Amazon\ product\ slots}}{N_{product\ slots}} \times 100$$

$$\text{\#1 spot}_{Amazon} = \frac{N_{Amazon\ \#1\ slots}}{N_{\#1\ slots}} \times 100$$

Where

products_Amazon is the percentage of all products that are Amazon brands
N_{Amazon product slots} is the number of product slots occupied by Amazon brands
N_{product slots} is the number of all product slots
#1 spot_Amazon is the percentage of search pages with an Amazon brand in the top search result
N_{Amazon #1 slots} is the number of searches with an Amazon brand in the top spot
N_{#1 slots} is the number of all searches that returned at least one product
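The two measures defined above can be computed together, as in this sketch. It assumes each search is stored as a list of category labels ordered by position; the function name and toy data are hypothetical and do not reproduce the reported figures.

```python
from collections import Counter

def category_metrics(searches):
    """Return each category's share of product slots and share of #1 spots."""
    searches = [slots for slots in searches if slots]  # at least one product
    slot_counts = Counter()
    top_counts = Counter()
    for slots in searches:
        slot_counts.update(slots)   # duplicates count as separate slots
        top_counts[slots[0]] += 1   # category holding the first position
    total_slots = sum(slot_counts.values())
    return {
        category: {
            "products_pct": 100 * slot_counts[category] / total_slots,
            "top_spot_pct": 100 * top_counts[category] / len(searches),
        }
        for category in slot_counts
    }

# Toy data: two searches with products, one empty search that is ignored.
toy_searches = [
    ["Amazon", "Competing", "Competing", "Sponsored"],
    ["Sponsored", "Competing", "Competing", "Competing"],
    [],
]
metrics = category_metrics(toy_searches)
print(metrics)
```

A category is overrepresented in the top spot when its top_spot_pct exceeds its products_pct.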

We chose to focus on that top left spot because Amazon changes the number of items across the first row based on screen size, and some searches return only a single item per row, so the top left spot is the only one to remain the same across all search results in our data.

In a majority of the searches in our data, 59.7 percent, Amazon sold the top spot to a sponsored product (17.3 percent of all product slots). The bulk of our analysis concerns the remaining 40.3 percent.

When we looked at all searches, Amazon gave its own products the number one spot 19.5 percent of the time even though this category made up only 5.8 percent of products in our sample.

Amazon gave competing brands the number one spot at a nearly identical rate (20.8 percent of the time), but these cover more than 13 times the proportion of products in our sample (76.9 percent).

Amazon gave entirely unaffiliated products (competing brands that were sold and shipped by third-party sellers) the top spot 4.2 percent of the time, but these products made up 5.8 percent of all products in our sample.

The only organic (nonsponsored) category that Amazon placed in the number one spot at a rate that was greater than the proportion of its products in the sample was its own brands and exclusives.

Amazon brands disproportionately got the top spot in search results

Fig. 2. Scatter plot comparing the percentage of top spots and the percentage of products for sponsored, Amazon brands, competing brands, and entirely unaffiliated products (meaning competing brands that are sold and shipped by third-party sellers). Note that sponsored, Amazon brands, and competing brands make up 100 percent of products, and that unaffiliated products are a subsection of competing brands. Source: The Markup/Amazon.com

About eight in 10 (83.9 percent) of the Amazon brands or exclusives that Amazon placed in the top spot were labeled “featured from our brands.” These are identified as part of Amazon’s “search results” and were not labeled “sponsored” for shoppers. However, the source code for those labeled results contained the same attribute as sponsored product listings (data-component-type="sp-sponsored-result").

Where Are Products Placed?

In addition to the top spot, we calculated how often Amazon placed each type of product in each search result position down the page (1–60). All searches have a number one spot but do not always return 60 results, so we always calculated this rate using the number of searches with that product spot as the denominator. Sponsored results that are part of search results are counted in the denominator of the rates.
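The per-position rate, with its changing denominator, can be sketched as follows. Each search is assumed to be a list of category labels in page order (sponsored results included); the function name and toy data are hypothetical.

```python
def placement_rate_by_position(searches, category):
    """Percentage of searches showing `category` at each position (1-based).

    The denominator for a position is the number of searches that
    returned a product in that position at all.
    """
    longest = max((len(slots) for slots in searches), default=0)
    rates = {}
    for index in range(longest):
        with_slot = [slots for slots in searches if len(slots) > index]
        hits = sum(1 for slots in with_slot if slots[index] == category)
        rates[index + 1] = 100 * hits / len(with_slot)
    return rates

# Toy data: searches of different lengths.
toy_searches = [
    ["Amazon", "Competing", "Sponsored"],
    ["Sponsored", "Amazon"],
    ["Amazon"],
]
rates = placement_rate_by_position(toy_searches, "Amazon")
print(rates)
```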

(As mentioned earlier, we did not include promotional and advertising carousels and modules because these are not part of the grid labeled “search results” in the metadata and none appeared in the same place in a majority of search results.)

Amazon placed its own products and exclusives in the number one spot 3.5 times more frequently than in any other position on the search page.

It placed competing brands (including those it sells itself) everywhere except the top (1) and bottom (15) rows of the search page. Competing brands appeared only sparsely where sponsored products were common in search results (rows 4–5 and 8–9). The company placed entirely unaffiliated products—meaning a competitor’s brand that was both sold and shipped by a third party—primarily in the lower rows (9–13).

Placement of Amazon brands vs. unaffiliated products on the search page (N = 3,492)

Fig. 3a and 3b. Amazon brands and exclusives appeared most often in the top spot. Unaffiliated products (competing brands that were sold and shipped by third parties) appeared mostly in the lower rows (9–13). Source: The Markup

In 59.7 percent of searches in our sample, Amazon gave the number one spot to sponsored products. When Amazon returned a 15th row, it always listed sponsored products there, too.

Placement of sponsored products (N = 3,492)

Fig 3c. Sponsored products were often the first and last thing a shopper sees. Research shows the middle of a list is the least memorable. Source: The Markup

Not Always Labeled

Amazon only identified 42 percent of its brands and exclusives to the shopper with a disclosure label (e.g., “featured from our brands,” “Amazon brand,” or “Amazon exclusive”). Of the Amazon brand and exclusive products in our sample, 28.8 percent were from a brand many people (but not all) would understand to be a private Amazon label, such as “Whole Foods,” “Amazon Basics,” or “Amazon Essentials.” Some were both labeled and from a better-known Amazon brand. For the remaining 37.4 percent, we found that buyers were not informed that they would be purchasing an Amazon brand or exclusive.

How often are products from Amazon brands and exclusives labeled?

Labeled with a disclaimer	42%
Well-known Amazon private label	28.8%
Total labeled	62.6%
Total unlabeled	37.4%

Table 1: Breakdown of percent of products from top searches that are clearly labeled or a well-known Amazon brand. Source: The Markup / Amazon

When the same product that is an Amazon brand or exclusive appeared more than once in the same search, we considered it labeled if any of the listings were labeled. This gives Amazon the benefit of the doubt by assuming that a customer will understand that the disclaimer applies to duplicate listings. Therefore, our estimate of the share of unlabeled products is a lower bound.
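The “labeled if any listing is labeled” rule can be sketched like this, assuming each Amazon brand listing is a (query, ASIN, is_labeled) tuple. The function name and sample data are hypothetical.

```python
def labeled_share(listings):
    """Percentage of (query, ASIN) pairs with at least one labeled listing."""
    labeled = {}
    for query, asin, is_labeled in listings:
        key = (query, asin)
        # A single labeled listing marks every duplicate in that search as labeled.
        labeled[key] = labeled.get(key, False) or is_labeled
    return 100 * sum(labeled.values()) / len(labeled)

# Toy data: the first product appears twice in one search, labeled only once.
toy_listings = [
    ("tissues", "B0AMAZON01", True),
    ("tissues", "B0AMAZON01", False),
    ("tissues", "B0AMAZON02", False),
    ("batteries", "B0AMAZON01", False),
    ("batteries", "B0AMAZON03", True),
]
share = labeled_share(toy_listings)
print(share)
```

The same ASIN in a different search is judged separately, since the rule applies within a single search.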

Duplicates

Amazon gave its own products more than one spot in search results in roughly one in 10 (9.2 percent of) searches, not including other potential duplicates in promotional carousels. It did not give competing brands’ products more than one spot for organic search results.
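One way to count searches in which an Amazon product holds more than one spot is sketched below. It assumes each search is a list of ASINs from the search results grid; the function names and sample data are hypothetical.

```python
from collections import Counter

def has_duplicate_amazon_product(result_asins, amazon_asins):
    """True if any Amazon ASIN occupies more than one spot in a search."""
    counts = Counter(asin for asin in result_asins if asin in amazon_asins)
    return any(count > 1 for count in counts.values())

def duplicate_search_share(searches, amazon_asins):
    """Percentage of searches with at least one duplicated Amazon product."""
    flagged = sum(
        has_duplicate_amazon_product(asins, amazon_asins)
        for asins in searches.values()
    )
    return 100 * flagged / len(searches)

amazon_asins = {"B0AMAZON01", "B0AMAZON02"}
toy_searches = {
    "tissues": ["B0AMAZON01", "B0OTHER001", "B0AMAZON01"],
    "batteries": ["B0AMAZON02", "B0OTHER002"],
    "paper towels": ["B0OTHER003", "B0OTHER003"],  # duplicate, but not Amazon's
    "socks": ["B0AMAZON01", "B0AMAZON02"],
}
duplicate_share = duplicate_search_share(toy_searches, amazon_asins)
print(duplicate_share)
```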

Survey Results

We commissioned the market research group YouGov to conduct a nationally representative survey of 1,000 U.S. adults on the internet, to contextualize our findings. It revealed that 76 percent of respondents correctly identified Amazon Basics as being owned by Amazon and 51 percent correctly identified Whole Foods.

The vast majority of respondents, however, could not identify the company’s top-selling house brands that did not contain the words “Amazon” or “Whole Foods” in their name. Ninety percent did not recognize Solimo as an Amazon brand, and 89 percent did not know Goodthreads is owned by Amazon. Other top-selling brands, like Daily Ritual, Lark & Ro, and Pinzon were not recognized by 94 percent of respondents as Amazon brands.

Most Americans can’t identify top-selling Amazon brands

“Which, if any, of these brands do you recognize as owned by Amazon?”

Fig. 4. We commissioned a survey of a national panel of 1,000 adults. We included competitors Champion and Brooklinen as a control. Source: The Markup/YouGov

We also asked respondents what trait defines the top-ranked products in Amazon search results. Few expected it to be based solely on being an Amazon brand. More than 21 percent of respondents thought the top-ranked product would be “the best seller,” 17 percent thought it was “the best rated,” 11 percent thought it was “the lowest price,” and 33 percent of respondents were “not sure.” Only 17 percent thought the number one listed item was “a product from one of Amazon’s brands.”

Few Americans think Amazon puts its products first

“When you search for a product on Amazon, do you expect that the first result you see that is not a sponsored product is ... ?”

Fig. 5. We commissioned a survey of a national panel of 1,000 adults. Note values do not add up to 100 percent due to decimal rounding. Source: The Markup/YouGov

Quality and Sales Factors

We compared the star ratings (a rough proxy for quality) and number of reviews (a rough proxy for sales volume) of the Amazon Brands that the company placed in the number one spot on the product search results page with other products on the same page.

We found that in two-thirds (65.3 percent) of the instances where Amazon placed its own products before competitor brands, the products that were Amazon brands and exclusives had lower star ratings than competing brands placed lower in the search results. Half of the time (51.7 percent) that the company placed its own products first, these items had fewer reviews than competing products the company chose to place lower on the search results page.

More than one in four (28.0 percent of) top-placed Amazon brands had both lower star ratings and fewer reviews than products from competing brands on the same page.
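The comparison between a top-placed Amazon product and the competing products below it can be sketched as follows. The “both” flag assumes that a single competing product must beat the Amazon product on stars and reviews at once; the text does not specify this, and the function name and sample values are hypothetical.

```python
def outranked_better_competitors(top_amazon, lower_competitors):
    """Flag whether any lower-placed competitor has more stars or more reviews."""
    return {
        "lower_stars": any(
            c["stars"] > top_amazon["stars"] for c in lower_competitors
        ),
        "fewer_reviews": any(
            c["reviews"] > top_amazon["reviews"] for c in lower_competitors
        ),
        # Assumption: one competitor must win on both measures at once.
        "both": any(
            c["stars"] > top_amazon["stars"] and c["reviews"] > top_amazon["reviews"]
            for c in lower_competitors
        ),
    }

# Toy data: one competitor has more stars, another has more reviews.
top_amazon = {"stars": 4.2, "reviews": 500}
lower_competitors = [
    {"stars": 4.6, "reviews": 300},
    {"stars": 4.0, "reviews": 9000},
]
flags = outranked_better_competitors(top_amazon, lower_competitors)
print(flags)
```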

When we evaluated several predictive models, we found that features like star ratings and the number of reviews were not the most predictive features among products Amazon placed in the number one spot.

Random Forest Analysis

We tried to determine which features differentiate the first organic product on search results from the second organic product on the same page.

To do this, we created a categorical dataset of product comparisons and used it to train and evaluate several random forest models.

The product comparisons looked at differences in features that we had access to, and that seemed relevant to product rankings (like stars and reviews). We found that being an Amazon brand or exclusive was by far the most important feature, of the seven we tested, in Amazon’s decision to place a product in the number one versus number two spot in product search results.

How We Created Product Comparisons

We took our original dataset of 3,492 search results with at least one Amazon brand or exclusive, filtered out sponsored products, and generated a dataset of product comparisons. Each product comparison is between the number one product and number two product on the same search page. The random forest used these attributes to predict a yes or no (boolean) category: which product among the pair was given the top search result (placed_higher).

The product comparisons encode the differences in star ratings (stars_delta) and number of reviews (reviews_delta); whether the product appeared among the top three clicked products from one million popular searches in 2020 from Amazon Seller Central (is_top_clicked); and whether the product was sold by Amazon (is_sold_by_amazon), shipped by Amazon (is_shipped_by_amazon), or was an Amazon brand or exclusive (is_amazon). We also used a randomly generated number as a control (random_noise). Distributions of each of these features are available on GitHub.

While we had access to price information, we did not analyze its potential effect on ranking because price was not standardized per unit. We also had access to each product’s “best sellers rank” for the time period we collected product pages, but the same product could have various different rankings in different Amazon categories (e.g., #214 in Beauty & Personal Care and #3 in Bath Salts), making consistent comparisons impossible.

How the model sees two products competing for the top spot of “tissues”

How two products for “tissues” are encoded into one row of data in our training set. The image displays three tables in matrix notation. The first two represent seven values that are subtracted between two products: product_order, stars, reviews, is_amazon, is_sold_by_amazon, is_shipped_by_amazon, and is_top_clicked. The third table is the difference between these two products with an additional value for random_noise.

Example of how a product comparison is calculated. For boolean columns like is_amazon, –1 is False and 1 is True. Source: The Markup

This produced a dataset of 1,415 product comparisons. (To see exactly how we created our training and validation dataset, see our GitHub.)

By creating this dataset of product comparisons, we were able to compare two products with one model and control for which features led to higher placement.
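One way to picture the encoding is the sketch below. It is not the code from our GitHub: the Product class, the encode_pair helper, the random choice of which product comes first in the subtraction, and the use of differences of 0/1 flags for the boolean columns are all assumptions made for illustration.

```python
import random
from dataclasses import dataclass

BOOLEAN_FEATURES = (
    "is_amazon",
    "is_sold_by_amazon",
    "is_shipped_by_amazon",
    "is_top_clicked",
)


@dataclass
class Product:
    stars: float
    reviews: int
    is_amazon: bool
    is_sold_by_amazon: bool
    is_shipped_by_amazon: bool
    is_top_clicked: bool


def encode_pair(first, second, rng):
    """Turn the #1 and #2 organic products on one page into a single row."""
    # Assumption: randomly decide which product is "a", so that the label
    # placed_higher is not True for every row.
    a_is_first = rng.random() < 0.5
    a, b = (first, second) if a_is_first else (second, first)
    row = {
        "stars_delta": a.stars - b.stars,
        "reviews_delta": a.reviews - b.reviews,
        "random_noise": rng.random(),  # control feature with no real signal
        "placed_higher": a_is_first,   # label: was "a" the top result?
    }
    for name in BOOLEAN_FEATURES:
        # Difference of 0/1 flags: 1 = only a, -1 = only b, 0 = same for both.
        row[name] = int(getattr(a, name)) - int(getattr(b, name))
    return row
```

Calling encode_pair once per search page would yield one row per comparison, with placed_higher as the category the model learns to predict.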

Why Random Forest?

A random forest combines many decision tree models, a technique we used in a previous Markup investigation into Allstate’s price increases. Decision trees work well at predicting categories with mixed data types, like those from our product comparisons.

Decision trees can, however, memorize or “overfit” the training data. When this happens, models can’t make good predictions on new data. Random forests are robust against overfitting and work by training a forest full of decision trees with random subsets of the data. The forest makes predictions by having each tree vote.

We used grid search with five-fold cross-validation to determine optimal hyperparameters (parameters we control versus those that arise from learning cycles): 500 decision trees in each forest, and a maximum of three questions each decision tree can ask the data. By asking more questions, each tree becomes deeper. But that also means that the trees are more likely to memorize the data. The more trees we train, the more resources it takes to run our experiment. Grid search trains and evaluates models with an exhaustive list of combinations of these hyperparameters to determine the best configuration.
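A grid search of this kind can be sketched with Scikit-Learn as shown below. The tune_forest helper and the candidate values in EXAMPLE_GRID are hypothetical, and mapping "a maximum of three questions" to the max_depth parameter is our reading for this example rather than a confirmed detail of the original code.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV


def tune_forest(X, y, param_grid, folds=5, seed=0):
    """Try every hyperparameter combination, scoring each with k-fold
    cross-validation, and return the best configuration and its accuracy."""
    search = GridSearchCV(
        estimator=RandomForestClassifier(random_state=seed),
        param_grid=param_grid,
        cv=folds,
        scoring="accuracy",
    )
    search.fit(X, y)
    return search.best_params_, search.best_score_


# Hypothetical grid: illustrative candidates, not the values actually searched.
EXAMPLE_GRID = {"n_estimators": [100, 500], "max_depth": [2, 3, 5]}
```

With this grid, the search trains six configurations five times each, which is why larger grids and larger forests quickly become expensive.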

Evaluating the Models

Our model correctly picked Amazon’s number-one-ranked product 73.2 percent of the time when all seven features were considered.

Feature	Accuracy	Change of Accuracy
without is_amazon	63.5%	–9.7
without reviews_delta	69.9%	–3.3
without stars_delta	71.9%	–1.3
without is_shipped_by_amazon	71.9%	–1.3
without is_top_clicked	72.2%	–1.0
without random_noise	72.3%	–0.9
without is_sold_by_amazon	72.4%	–0.8
All Features	73.2%	0.0

Table 2: The random forest models were evaluated using the average accuracy of 10-fold cross-validation. Standard deviations ranged from 2.6 to 4.7. Random forest models were implemented using the Scikit-Learn Python library.

We systematically removed each feature and retrained and reevaluated the model (called an ablation study) in order to isolate the importance of each individual feature. We used the accuracy of the model trained on all seven features as a baseline to compare each newly evaluated model (see results in Change of Accuracy in table above).

When we did this, we saw that removing information about whether a product was an Amazon brand or exclusive (is_amazon) reduced the model’s ability to pick the right product by 9.7 percentage points (to 63.5 percent). This drop in performance was far greater than any other individual feature, suggesting that being an Amazon brand or exclusive was the most predictive feature among those we tested in determining which products Amazon placed in the first organic spot of search results.
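The ablation procedure can be sketched as follows. This is a simplified illustration, not the code in our GitHub: the ablation_study helper and its signature are hypothetical, and it assumes the features are held in a numeric array with one column per feature.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score


def ablation_study(X, y, feature_names, n_estimators=500, folds=10, seed=0):
    """Retrain with each feature removed in turn and report the mean
    cross-validated accuracy and its change from the all-features baseline."""
    X = np.asarray(X)

    def mean_accuracy(columns):
        model = RandomForestClassifier(
            n_estimators=n_estimators, max_depth=3, random_state=seed
        )
        scores = cross_val_score(model, X[:, columns], y, cv=folds, scoring="accuracy")
        return scores.mean()

    all_columns = list(range(len(feature_names)))
    baseline = mean_accuracy(all_columns)
    results = {"All Features": (baseline, 0.0)}
    for i, name in enumerate(feature_names):
        kept = [c for c in all_columns if c != i]  # drop exactly one feature
        accuracy = mean_accuracy(kept)
        results[f"without {name}"] = (accuracy, accuracy - baseline)
    return results
```

A feature whose removal produces a large negative change carries information the remaining features cannot replace; a change close to that of the random_noise control suggests the feature adds little.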

To demonstrate the influence of Amazon brands and exclusives in another way, we trained a model with only is_amazon, and it correctly predicted the number one product 70.7 percent of the time. Every other standalone feature performed significantly worse, only picking the correct product between 49.3 (random_noise) and 61.5 (is_sold_by_amazon) percent of the time.

To a lesser extent, the number of reviews (reviews_delta) was also predictive of a product getting the number one spot. Removing this feature reduced the model’s performance by 3.3 percentage points.

The remaining five features were less informative when it came to getting the number one spot versus the number two spot. Performance of the random forest for every possible combination of features is available in our GitHub.

These findings were consistent with ranking the feature importance from the random forest model trained on all features. This third approach also suggests that is_amazon is the most predictive feature for the random forest.

Being an Amazon brand was the most predictive feature for the random forest

Fig. 6. Feature importance is a measure of how predictive a feature is among all the decision trees we trained in the random forest model. Feature importance is measured between 0 (not used at all) and 1 (perfectly predicts which product gets placed higher). Source: The Markup
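For readers who want to reproduce a ranking like this, the sketch below shows one common way to obtain it. It assumes the figure reflects Scikit-Learn's impurity-based feature_importances_ attribute, which is not confirmed here, and the rank_feature_importance helper is hypothetical.

```python
from sklearn.ensemble import RandomForestClassifier


def rank_feature_importance(X, y, feature_names, n_estimators=500, seed=0):
    """Fit one forest on all features and rank them by importance."""
    model = RandomForestClassifier(
        n_estimators=n_estimators, max_depth=3, random_state=seed
    )
    model.fit(X, y)
    # Impurity-based importances are averaged over the trees and
    # normalized so that they sum to 1 across features.
    pairs = zip(feature_names, model.feature_importances_)
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)
```

Because these importances are relative shares, a value near 1 for one feature means the trees relied on it almost exclusively.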

When we compared the number one product with lower-ranked products beyond just the number two spot, is_amazon remained the most predictive feature out of those we tested (results in our GitHub).

Model performance comparing the first product through the nth product

Fig. 7. We repeated the ablation study (training and evaluating the random forest with select features removed) comparing the first organic product with each product up to the nth (i.e., 1v2, 1v3, …, 1vn) and continued to find that being an Amazon brand or exclusive (is_amazon) was the most predictive feature for determining which product got placed higher. Search results have up to 48 organic products on the page. "All other models" refers to models including all features, excluding the stars_delta feature, excluding the is_shipped_by_amazon feature, excluding the is_sold_by_amazon feature, excluding the is_top_clicked feature, and excluding the random_noise feature. Source: The Markup

We used predictive models to show that being an Amazon brand or exclusive was the most influential feature among those we tested in determining which products Amazon chose to place at the top of search results.

Limitations

Search Data Limitations

The two datasets we created are small in comparison to the full catalog of products for sale on Amazon.com, for which there are no reliable estimates. However, we sought to examine searches and products that generate significant sales, not every product or every search.

We collected search data on desktop, so our analysis only applies to desktop searches. Amazon’s search results may differ on mobile, desktop, and the Amazon app.

Amazon’s search results can also vary by location. One example is the distance of the closest Whole Foods store and its inventory, which would affect any given person’s search for certain items. We collected the data using I.P. addresses in Washington, D.C., so our results are specific to that city.

And, according to an Amazon-authored report for IEEE Internet Computing, a journal published by a division of the Institute of Electrical and Electronics Engineers, Amazon personalizes offerings to buyers according to similar items they have already purchased or rated (called item-to-item collaborative filtering). Our searches were not made in the same session nor were we logged into an Amazon account with user history, so our results were not personalized. In the absence of personalization, Amazon defaults to “generally popular items.” This also means that we did not capture search results or product pages for Amazon Prime subscribers.

Product Page Data Limitations

Some products that compete with Amazon brand and exclusive products are sold by numerous sellers, including Amazon itself. A 2016 ProPublica investigation revealed that of a sample of 250 products, Amazon took the buy box for itself or gave it to vendors that paid for the “Fulfilled by Amazon” program in 75 percent of cases. The same year, researchers at Northeastern University tracked 1,000 best-selling products over six weeks and found that buy box winners changed for seven out of 10 products in their study.

For our main analysis, we did not seek to analyze which specific seller won the buy box but rather whether the seller or shipper during our snapshot was Amazon or a third party.

We captured product pages and their subsequent buy boxes in a snapshot of time between Feb. 3–6 and 17–18. Due to a technical problem, there was a two- to four-week delay between when we collected the searches and when we collected the product pages. This means that the seller and shipper of those products are only representative of searches made during that time and could have changed from the time we collected the searches to when we collected the product pages.

When we collected product pages in February, about 3.9 percent of them were no longer available or the product had been removed from the Amazon Marketplace altogether since we gathered the search pages in January. We removed these products from any calculations involving the seller or shipper.

To test the reliability of our product page data, we took a random sample, on May 13, 2021, of 2,500 of the 125,769 products we had collected in February 2021 and reran the product page scraper.

Some of the product pages were missing data: 6.1 percent were sold out, 1.6 percent were removed from Amazon’s marketplace, and another 3.4 percent no longer displayed a default seller who won the buy box. In these latter cases, Amazon provided a button to “See All Buying Options.” The missing data did not overall favor or disfavor Amazon but rather was consistent with the proportion of Amazon-sold products (30.2 compared to 27.1 percent) from the sample of products we recollected.

The remaining 2,103 products that had legible buy boxes (the vast majority) were largely unchanged. Only 16.1 percent of products changed default sellers. This included changes between Amazon and third-party sellers.

Product sellers changed from a third party to Amazon in 1.6 ± 0.5 percent of products, and from Amazon to a third party in 3.1 ± 0.7 percent of products (margins of error calculated with 95 percent confidence).

When it came to who shipped the product, the shipper went from a third party to Amazon in 2.9 ± 0.7 percent of products, and from Amazon to a third party in 6.6 ± 1.1 percent of products.
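The text does not say how these margins of error were computed. As an illustration only, the standard normal-approximation formula for a proportion, applied to the 2,103 products with legible buy boxes, gives margins in line with those reported; the margin_of_error helper below is hypothetical.

```python
import math


def margin_of_error(p, n, z=1.96):
    """Half-width of a normal-approximation confidence interval for a
    proportion p observed in a sample of size n (z = 1.96 for 95 percent)."""
    return z * math.sqrt(p * (1 - p) / n)


# Seller changes from a third party to Amazon: 1.6 percent of 2,103 products.
seller_margin = 100 * margin_of_error(0.016, 2103)   # about 0.5 points
# Shipper changes from Amazon to a third party: 6.6 percent of 2,103 products.
shipper_margin = 100 * margin_of_error(0.066, 2103)  # about 1.1 points
```

The margin shrinks with the square root of the sample size, so the smaller multiple-seller subsample yields somewhat wider intervals for similar rates.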

Because the buy box remained largely unchanged during a 12-week gap in this representative subsample of our data, we find that our buy box findings are reliable, despite the three- to four-week gap between when we gathered search results and product pages.

This seemed to signal a change from previous research. So we went further to determine whether the buy box had become more stable since the 2016 Northeastern University study. That study was limited to products with multiple sellers. When we did the same, it brought the sample size down to 1,209. Looking only at products with multiple sellers, we found Amazon changed the buy box seller for only 23.5 percent of products. In addition, among products with multiple sellers, Amazon gave itself the buy box for 40.0 percent of them.

For products with multiple sellers, the winning sellers changed from Amazon to a third party in 2.1 ± 0.8 percent of products and from a third party to Amazon in 4.4 ± 1.1 percent of products. Third-party sellers changed among themselves in 31.4 percent of products sold by third-party sellers. No individual third-party seller won more than 0.06 percent of the products with more than one seller.

Shippers changed from Amazon to a third-party in 2.3 ± 0.8 percent of products and from a third party to Amazon in 7.8 ± 1.5 percent of products.

Reviewing the product pages three months apart, we found that the default seller Amazon chose for the buy box when multiple merchants were available has become significantly less likely to change than it was five years ago.

Limitations Identifying Amazon Brands and Exclusive Products

Amazon’s “our brands” filter is incomplete. For instance, it listed only 70.3 percent of products that were tagged “featured from our brands” on the search page. In addition, Amazon did not include its proprietary electronics in the “our brands” filtered results when we gathered the data. The company declined to answer questions about why these were not included.

Because of this, we had to use three methods to collect our product database of Amazon brands and exclusives, and it’s possible we missed some products, particularly proprietary electronics.

Black Box Audit

Our investigation is a black box audit. We do not have access to Amazon’s source code or the data that powers Amazon’s search engine. There are likely factors Amazon uses in its ranking algorithm to which we do not have access, including return rates, click-through rates, and sales. We have some data from Amazon’s Seller Central hub about popular products and clicks, but this data is itself limited and did not cover all of the products in our searches.

For these reasons, our investigation focuses on available and clear metrics: how high categories of products are placed compared to their proportion of results, how well users review highly ranked products relative to other products, and how many reviews a product has garnered, which is a crude indication of sales.

Amazon’s Response

Amazon did not take issue with our analysis or data collection and declined to answer dozens of specific questions.

In a short, prepared statement sent via email, spokesperson Nell Rona said that the company considers “featured from our brands” listings as “merchandising placements,” and as such, the company does not consider them “search results.” Rona said these listings are not advertisements, which by law would need to be disclosed to shoppers. We found these listings were identified as “sponsored” in the source code and also part of a grid marked “search results” in the source code.

“We do not favor our store brand products through search,” Rona wrote.

“These merchandising placements are optimized for a customer’s experience and are shown based on a variety of signals,” Rona said. None of these were explained beyond “relevance to the customer’s shopping query.”

Regarding disclosing to customers about Amazon brands, Rona said they are identified as “Amazon brand” on the products page, and some carry that wording in the listing. We found this to be the case in only 23 percent of products that were Amazon-owned brands.

She said brands that are exclusive to Amazon would not carry that wording since they are not owned by Amazon.

Rona supplied a link to an Amazon blog post that mentions that its branded products made up about one percent of sales volume for physical goods and $3 billion of sales revenue in 2019. It is unclear whether brands exclusive to Amazon are included in those figures.

Conclusion

Our investigation revealed that Amazon gives its own products preference in the number one spot in search results even when competitors have more reviews and better star ratings. We also found that reviews and ratings were significantly less predictive of whether a product would get the number one spot than being an Amazon brand or exclusive.

In addition, we found that Amazon placed its own products and exclusives in the top spot in higher proportion than they appeared in the sample, a preference that did not exist for any other category. In fact, it placed its own brands and exclusives in the top spot about as often as competing brands (about 20 percent of the time each), although the former made up only six percent of the sample and the latter 77 percent.

Almost four in 10 products that we identified as Amazon brands and exclusives in our sample were neither clearly labeled as an Amazon brand nor carried a name that most people recognize as an Amazon-owned brand, such as Whole Foods. In our survey, almost nine in 10 U.S. adults did not recognize five of Amazon’s largest brands.

We also found that the default seller among products with multiple merchants changed for just three in 10 products over three months, a significantly lower rate of change than a similar study found five years ago.

Amazon’s dominance in online sales—40 percent in the United States—means the effect of giving its own products preference on the search results page is potentially massive, both for its own business as well as the small businesses that seek to earn a living on its platform.

Appendix

Supplementary Search Dataset and Analysis

When first exploring this topic and before hitting on our top searches dataset, we had created a generic dataset that returned similar findings. We replaced it as the main dataset because our top searches dataset was closer to real searches made by users. We include it here as a secondary dataset.

Generic Searches

We created a search dataset from products listed in each of the 18 departments found on Amazon’s “Explore Our Brands” page.

Three annotators looked through 1,626 products listed on those pages and generated between one and three search queries a person might use if searching for that product. These were meant to represent generic searches for which we know Amazon brands are competing against others.

We generated 2,558 search terms. We randomly sampled 1,600 and collected these searches using the same method and during the same time period we used to collect top searches. A quarter of the search results (24 percent) did not contain Amazon Brands, so we discarded them, leaving 1,217 generic searches, our supplementary dataset.

Generic Search Findings

In the generic searches, Amazon Brands constituted a slightly larger percentage of the overall product sample (8.2 percent) than in our top searches dataset (5.8 percent). The percentage of the time Amazon gave its own products the number one spot also increased, to roughly one in four of our generic searches from one in five for our top searches.

	Top Searches		Generic Searches	
	% Products	% #1 spot	% Products	% #1 spot
Amazon Brands	5.8	19.5	8.2	24.9
Competing Brands	76.9	20.8	73.5	10.8
Unaffiliated	5.8	4.2	3.0	1.5
Sponsored	17.3	59.7	18.2	64.3

Table 3: Comparison of each category of products’ proportion (percent products) and top spot rate (percent #1 spot) between our main dataset (Top Searches) and our supplementary dataset (Generic Searches). See GitHub for calculations and data.

Competing brands constituted a similar proportion of products in both of our datasets. However, Amazon placed competing brands in the number one spot even less often (10.8 percent of the time) in these generic searches than it had for top searches (20.8 percent).

Entirely unaffiliated products made up even less of the pool of products in our generic searches (3.0 percent) than in top searches (5.8 percent), and Amazon also gave them the top spot even less frequently, 1.5 percent of the time compared to 4.2 percent for top searches.

The results from this additional dataset show a similar pattern to our main dataset, whereby Amazon prioritizes its own products at the top of search results.
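One way to read Table 3 is to divide each category's top spot rate by its share of products. The short sketch below does that arithmetic with the table's own values; the representation_ratio name is ours, introduced only for this illustration.

```python
# (% of products, % of #1 spots) taken from Table 3.
TABLE_3 = {
    "Top Searches": {
        "Amazon Brands": (5.8, 19.5),
        "Competing Brands": (76.9, 20.8),
    },
    "Generic Searches": {
        "Amazon Brands": (8.2, 24.9),
        "Competing Brands": (73.5, 10.8),
    },
}


def representation_ratio(pct_products, pct_top_spot):
    """Values above 1 mean a category wins the top spot more often than
    its share of products alone would suggest."""
    return pct_top_spot / pct_products


ratios = {
    dataset: {
        category: representation_ratio(*values)
        for category, values in categories.items()
    }
    for dataset, categories in TABLE_3.items()
}
```

In both datasets the ratio for Amazon Brands comes out above 3, while the ratio for competing brands stays well below 1.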

Counting Carousels

As mentioned earlier, we did not include sponsored or promotional carousels in our analysis.

If we were to consider sponsored or promotional carousels, the percentage of organic products from top searches would drop from 83 to 68 percent. This also means that sponsored products would increase from 17 percent to 32 percent. There were a total of 49,686 products in these carousels.

Acknowledgements

We thank Christo Wilson of Northeastern University, Juozas “Joe” Kaziukėnas of Marketplace Pulse, Rebecca Goldin of Sense About Science and George Mason University, Kyunghyun Cho of New York University, and Michael Ekstrand of Boise State University for reviewing all or parts of our methodology. We also thank Brendan Nyhan of Dartmouth College for reviewing our survey design.
