“No one man should have all that power”
— Kanye West
We’ve all heard that the media business is cutthroat. In books, movies, or music, a few dominant artists tower over the rest. These superstar directors, writers, and musicians sell millions of units while their peers languish in obscurity.
I wondered, are all forms of media equally competitive?
To find out, I scraped the internet for as much media data as I could find¹. To my delight and surprise, whenever I ordered things by the biggest winners, the same pattern emerged. It’s called a power law, and you can see it below.
What is Power Law?
In technical terms, power law is just a mathematical relationship. Here’s what it looks like.
The part in green dominates, hoarding most of the distribution. This section is followed by a long tail in yellow. Together, these two parts form the power law pattern.
Basically, power law is like a forest². There are tall trees which soak up the sun and grow to be enormous. Then there are all the shrubs on the forest floor.
Media Data and Plots
The New York Times web API³ provides a list of fiction best sellers from the last 7 years. If we measure success by weeks on the list, we can compare the most and least successful books.
Below are plots organized by novel, author, and publisher. I also show the top-20 results in each category, both for fun and to prove that the data is correct.
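To make the "weeks on the list" measure concrete, here is a minimal Python sketch of the counting step. It is not the code from the project repo: the weekly lists are made-up placeholders rather than NYT data, and `weeks_on_list` is a hypothetical helper name.

```python
from collections import Counter

def weeks_on_list(weekly_lists):
    """Count how many weekly lists each title appears on, biggest first."""
    counts = Counter()
    for week in weekly_lists:
        # A set guards against a title being counted twice in the same week.
        counts.update(set(week))
    return counts.most_common()

# Hypothetical weekly best-seller lists (placeholders, not real NYT data).
weekly_lists = [
    ["Book A", "Book B", "Book C"],
    ["Book A", "Book B", "Book D"],
    ["Book A", "Book E", "Book F"],
]

ranking = weeks_on_list(weekly_lists)
print(ranking[0])  # ('Book A', 3)
```

The same counting works for authors or publishers if each weekly list holds those names instead of titles.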
By scraping data from Billboard.com⁴, we can get all the songs that made it onto the Hot-100 charts and find out how long they stayed there.
Wikipedia contains box office information on just about every movie ever released⁶. Writing some scripts to grab this data, I was able to find the gross revenue of all major films released from 1970–2018.
Note: there is some difficulty in parsing Wikipedia infobox characters. This resulted in certain films being dropped from the list. Still, the overall trend seems correct.
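The parsing problem is easy to picture, since box office figures show up in several textual shapes. Below is a hedged sketch of one way to normalize them. The sample strings and the `parse_gross` helper are hypothetical, and this is not the script from the repo.

```python
import re

# Word suffixes that scale the number; anything else is taken at face value.
_MULTIPLIERS = {"million": 1e6, "billion": 1e9}

def parse_gross(text):
    """Return a dollar amount parsed from an infobox-style string, or None."""
    # Drop footnote markers such as "[2]" and normalize non-breaking spaces.
    cleaned = re.sub(r"\[[^\]]*\]", "", text).replace("\xa0", " ")
    match = re.search(
        r"\$\s*([\d,]+(?:\.\d+)?)\s*(million|billion)?", cleaned, re.IGNORECASE
    )
    if match is None:
        return None  # unparseable entries are dropped rather than guessed
    amount = float(match.group(1).replace(",", ""))
    unit = (match.group(2) or "").lower()
    return amount * _MULTIPLIERS.get(unit, 1)

# Hypothetical examples of the shapes a gross figure might take.
for raw in ["$1.5 billion[2]", "US$25\xa0million", "$312,000", "N/A"]:
    print(repr(raw), "->", parse_gross(raw))
```

Returning `None` for anything unrecognized mirrors the trade-off described in the note: some films get dropped, but no revenue figure is invented.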
By now, video games are an established art form with over $100 billion in global sales in 2017⁷. Thanks to Julien F at data.world⁸, I was spared from having to scrape game data myself.
Below are all video game sales plotted by units (not dollars) from 1980 to 2017.
On this list, “Wii Sports” skews the scale. If we remove it, we can get a clearer picture of the curve.
Newspapers were one of the first industries to be impacted by the internet. As their local monopolies were disrupted, all newspapers began to compete online where power law reigns supreme.
I contacted The Alliance For Audited Media⁹ and requested data on all US newspapers with circulation above 25,000 from 2018. Below are the results (Sunday circulation only).
Bonus — Podcasts
Podcasting is the new kid on the block when it comes to popular media. You may wonder how it compares to more established forms of entertainment. Below is a chart of the top podcast networks ordered by unique streams. It’s too early to tell, but we may be witnessing a new power law curve in the making.
Why do Power Law Distributions Form?
Looking at these graphs, the same question jumps out with each one: why a power law? What is it about media that results in this concentration of success?
The short answer: network effects and positive feedback loops. Both concepts are described well by David Easley and Jon Kleinberg in their book “Networks, Crowds, and Markets: Reasoning about a Highly Connected World”¹⁰. In particular, they posit that popularity is a network phenomenon.
It’s easy to see how this might play out in our examples. In our networked world, people can recommend books, movies and games to each other. These titles will get more reviews, more shelf space, and ultimately, more attention.
In this way, success breeds success. It’s a virtuous cycle, a positive feedback loop. The popularity of one work takes attention away from others. It crowds out other media just as giant trees crowd out smaller plants. This process is called preferential attachment and it is at the heart of power law.
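A toy simulation makes the feedback loop visible. The sketch below is a simple rich-get-richer process of my own choosing, not a model taken from the data in this post. At each step a new title occasionally enters; otherwise a consumer copies a past choice, so a title is picked with probability proportional to its current popularity. The function name and the `p_new` parameter are hypothetical.

```python
import random

def simulate_rich_get_richer(num_steps, p_new=0.05, seed=0):
    """Simulate attention flowing to titles under preferential attachment.

    Returns the per-title popularity counts, sorted from biggest to smallest.
    """
    rng = random.Random(seed)
    picks = [0]      # one entry per unit of attention; the value is a title id
    num_titles = 1
    for _ in range(num_steps):
        if rng.random() < p_new:
            picks.append(num_titles)         # a new title enters with one fan
            num_titles += 1
        else:
            # Copying a uniformly random past pick selects each title with
            # probability proportional to its popularity so far.
            picks.append(rng.choice(picks))
    counts = [0] * num_titles
    for title in picks:
        counts[title] += 1
    return sorted(counts, reverse=True)

counts = simulate_rich_get_richer(num_steps=20000)
print("titles:", len(counts))
print("five biggest:", counts[:5])
print("five smallest:", counts[-5:])
```

Even though every title follows the same rule, the early entrants end up far ahead of the late ones, which is the tall-trees-and-shrubs picture again.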
Which Industries are the Most Concentrated?
Knowing that popularity is a network phenomenon we might wonder, which networks are the strongest? The industries that are the most networked — and likely the least regulated by gatekeepers — are the ones where we would expect to see the steepest curves.
One way to measure concentration is to test the Pareto principle and see what percentage of gains are held by those at the top. The table below shows the percentage of success — revenue, weeks on charts, units sold etc — that is held by the top 20%.
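For reference, the top-20% share can be computed along these lines. This is a minimal sketch with made-up numbers; `top_share` is a hypothetical helper, and it assumes the values are non-negative with a positive total.

```python
def top_share(values, top_fraction=0.2):
    """Fraction of the total held by the top `top_fraction` of entries."""
    ranked = sorted(values, reverse=True)
    # Always include at least one entry, even for very short lists.
    k = max(1, round(len(ranked) * top_fraction))
    return sum(ranked[:k]) / sum(ranked)

# Hypothetical "weeks on chart" figures for ten titles.
weeks = [50, 20, 10, 5, 5, 4, 3, 1, 1, 1]
print(top_share(weeks))  # 0.7
```

A result of 0.8 would match the classic 80/20 version of the Pareto principle; higher values mean an even more concentrated field.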
Implications and Conclusions
We’ve all noticed that culture is splintering. Still, it seems like the winners in this new world of media are bigger than ever before. We know that popularity is a network effect. As our lives become more and more connected, we should expect that power law curves will become even more common. Moreover, winners in this new world will become even more dominant, entrenched by network effects. The rewards for scoring a hit are as high as ever; it’s just that the chances of it happening to you are slim.
It’s not all bad news though. The long tail of media is lengthening, making room for more creators of all stripes. More and more people will have a real shot at making a viral movie, song, or novel. Hell, those who can hack the network effect might even find new ways to make it to the top (as this Vice reporter did).
Ultimately, equality of opportunity will be greater than ever, and so will inequality of outcomes.
Bonus II— Are These Actual, Mathematical Power Law Curves (and can we compare them?)
All of these curves certainly resemble power law distributions. They all have big winners and exhibit long-tail behavior. Still, can we mathematically say that they are true power law curves?
In its simplest form, a power law curve is defined by the following relationship:
y = ax^-k
That is, power law curves are defined by a constant negative exponent.
Now that we’ve focussed on each form of media individually, let’s go back to the beginning and try to compare them on one plot. Using R’s nls function¹¹ to fit the various media curves to the power law formula, we once again see the following media curves (fitted lines in green).
It is clear that the video game publisher curve (~-0.93 fitted exponent) is especially steep, which makes it the most concentrated business. The movie curve is much flatter (~0.18 fitted exponent), indicating that many movies do well instead of just a few.
Note: though the head of the songs curve looks familiar, its tail does not follow a power law pattern, so fitting it is impossible.
Other Mathematical Analysis
One feature of power law distributions is that they appear linear when plotted on a log-log graph. This is easily derived from the power law formula by taking the log of both sides:
- y = ax^-k
- log(y) = log(a) + log(x^-k)
- log(y) = log(a) - klog(x) -> which is a line with slope=-k and intercept=log(a)
If we plot the media curves in log-log form, we should see straight lines, or at least straight line segments. Log-log plots below.
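The log-log relationship also suggests a quick numerical check: fit a straight line to (log rank, log value) and read the exponent off the slope. For y = ax^-k the slope is -k and the intercept is log(a). The sketch below uses ordinary least squares on the logs, which is a different technique from the nonlinear nls fit used earlier. The function name is hypothetical, the data is synthetic, and all values must be positive.

```python
import math

def fit_power_law_loglog(values):
    """Least-squares line through (log rank, log value).

    Returns (a, k) for the model y = a * x**(-k), where x is the rank.
    """
    ranked = sorted(values, reverse=True)
    xs = [math.log(rank) for rank in range(1, len(ranked) + 1)]
    ys = [math.log(v) for v in ranked]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # slope = -k and intercept = log(a), so undo both transformations.
    return math.exp(intercept), -slope

# Synthetic check: data generated from y = 1000 * x^-0.9 should be recovered.
synthetic = [1000 * rank ** -0.9 for rank in range(1, 101)]
a, k = fit_power_law_loglog(synthetic)
print(round(a, 3), round(k, 3))
```

On real data the points will not sit exactly on a line, so the size of the residuals is a rough guide to how power-law-like a curve is.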
From the above, we can say that the books, authors, book publishers, and newspapers look very linear. Video game publishers do as well. Movies, directors, musicians, and games appear more exponential in nature.
Using R Libraries to investigate
To get more certainty about these distributions, we can use existing R libraries that are designed for this sort of analysis. Both the igraph¹² and the poweRlaw¹³ libraries try to fit the data to the normalized form of the power law distribution shown below.
Below are the values generated by fitting the media data with the igraph lib.
Using the poweRlaw library, we obtain similar results.
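For readers curious what such a fit involves, here is a hedged sketch of the textbook maximum-likelihood estimate for a continuous power law. I am assuming the standard normalized form p(x) = ((alpha - 1) / xmin) * (x / xmin)^-alpha for x >= xmin. As far as I understand, both libraries build on maximum likelihood, but they also handle details such as choosing xmin, which this sketch leaves to the caller. The function name is hypothetical and the data is synthetic.

```python
import math
import random

def mle_exponent(values, xmin):
    """Continuous maximum-likelihood estimate of alpha, with standard error.

    Assumes p(x) = ((alpha - 1) / xmin) * (x / xmin) ** (-alpha) for x >= xmin.
    """
    tail = [v for v in values if v >= xmin]
    if not tail:
        raise ValueError("no values at or above xmin")
    log_sum = sum(math.log(v / xmin) for v in tail)
    if log_sum == 0:
        raise ValueError("all tail values equal xmin; alpha is undefined")
    alpha = 1 + len(tail) / log_sum
    stderr = (alpha - 1) / math.sqrt(len(tail))
    return alpha, stderr

# Synthetic check: draw from a power law with a known exponent using
# inverse transform sampling, then see whether the estimate lands nearby.
rng = random.Random(42)
true_alpha = 2.5
sample = [(1 - rng.random()) ** (-1 / (true_alpha - 1)) for _ in range(20000)]
alpha, stderr = mle_exponent(sample, xmin=1.0)
print(round(alpha, 2), round(stderr, 3))
```

Note that this alpha describes the distribution of values, so it is not directly comparable to the rank-based exponents from the curve fits above.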
1 — All code and data used for this project are on GitHub at https://github.com/taubergm/Powerlaw
2 — This paper shows that yes, African tree canopies form a power law distribution https://www.nature.com/articles/nature06060
3 — NYT developer API — https://developer.nytimes.com
5 — For a comparison of music in the age of streaming, see my previous post on Spotify — https://medium.com/@michaeltauberg/spotify-is-killing-song-titles-5f48b7827653
6 — Example of Wikipedia movie list https://en.wikipedia.org/wiki/2010_in_film . The associated links contain box office info.
7 — The gaming market is massive and growing — https://newzoo.com/insights/articles/the-global-games-market-will-reach-108-9-billion-in-2017-with-mobile-taking-42/
8 — Source of video game data — https://data.world/julienf/video-games-global-sales-in-volume-1983-2017
9 — Audited Media collects information on all U.S. publications https://auditedmedia.com
10 — Chapter 18 excerpt from the book — Power Laws and Rich-Get-Richer Phenomena
11 — Nonlinear (weighted) least-squares estimate doc — http://stat.ethz.ch/R-manual/R-devel/library/stats/html/nls.html
12 — R igraph lib doc — http://igraph.org/r/doc/fit_power_law.html
13 — R poweRlaw lib doc — https://cran.r-project.org/web/packages/poweRlaw/poweRlaw.pdf