TESTING ZIPF'S LAW

George Kingsley Zipf observed that in natural-language text, the most frequently used word occurs about twice as often as the second most frequent word, three times as often as the third most frequent word, four times as often as the fourth most frequent word, and so on. This unexpectedly elegant pattern is called Zipf's law, a rank-size power law in which frequency is inversely proportional to rank, so the slope on a log-log scale is close to -1.

The same pattern has since been found in other kinds of data, including corpora in other languages, city populations (Auerbach 1913), corporation sizes, and income rankings. We want to find more instances of Zipf's law.

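To test for Zipf's law, we rank the values from largest to smallest and fit the regression log(rank) = a + b log(value). A perfect Zipfian distribution has a slope of b = -1, so the closer the estimated b is to -1, the closer the data are to Zipf's law. We also plot the data on a log-log graph of value (frequency) against rank, which shows visually how close they are to a perfect Zipfian distribution.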
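
The regression described above can be sketched in a few lines of Python. This is a minimal illustration, not the code this site runs: the function name zipf_slope and the synthetic data are assumptions made for the example, the fit is ordinary least squares via NumPy, and ties between equal values are not given any special treatment.

```python
import numpy as np

def zipf_slope(values):
    """Fit log(rank) = a + b * log(value) by least squares and return (a, b).

    Hypothetical helper for illustration; values must all be positive.
    """
    v = np.sort(np.asarray(values, dtype=float))[::-1]  # largest value gets rank 1
    if v.size < 2 or np.any(v <= 0):
        raise ValueError("need at least two strictly positive values")
    ranks = np.arange(1, v.size + 1)
    # polyfit returns the highest-degree coefficient first: slope, then intercept
    b, a = np.polyfit(np.log(v), np.log(ranks), 1)
    return a, b

# Synthetic, perfectly Zipfian data: the value at rank r is 1000 / r
ideal = [1000.0 / r for r in range(1, 51)]
a, b = zipf_slope(ideal)
print(f"intercept a = {a:.4f}, slope b = {b:.4f}")
```

On the synthetic data the slope comes out at -1 up to floating-point error, because the values were built to follow Zipf's law exactly. Real datasets will give a slope that only approximates -1. The choice of logarithm base changes the intercept a but not the slope b.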

This site is available on GitHub here.

Select a dataset to begin