Zipf's law

Many natural languages appear to follow this law. The second most common word is used about 1/2 as often as the most common word, the third most common word about 1/3 as often, and so on down the ranking.




Zipf's law (/zɪf/) is an empirical law, formulated using mathematical statistics, which observes that many types of data studied in the physical and social sciences can be approximated with a Zipfian distribution, one of a family of related discrete power law probability distributions. The Zipf distribution is related to the zeta distribution, but is not identical to it.
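To make the idea of a discrete power law concrete, here is a minimal sketch of the Zipf probability mass function in its usual textbook form, where rank k out of n ranks gets probability proportional to 1/k^s. The function name zipf_pmf and the parameter names s and n are illustrative choices, not something taken from the text above. The zeta distribution is the limiting case with infinitely many ranks (which needs s > 1), which is one way the two are related without being identical.

```python
def zipf_pmf(k, s, n):
    """Probability of rank k under a Zipf distribution.

    k: rank (1 = most frequent), s: exponent, n: number of ranks.
    These names are assumptions made for this sketch.
    """
    if not 1 <= k <= n:
        raise ValueError("rank k must satisfy 1 <= k <= n")
    # Normalizing constant: the generalized harmonic number H(n, s).
    normalizer = sum(1.0 / (r ** s) for r in range(1, n + 1))
    return (1.0 / (k ** s)) / normalizer


# Probabilities for a hypothetical vocabulary of 1000 ranks with s = 1.
probs = [zipf_pmf(k, s=1.0, n=1000) for k in range(1, 1001)]
```

With s = 1 the probability of rank 1 is exactly twice that of rank 2 and three times that of rank 3, which is the classic statement of the law.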

For example, Zipf's law states that given some corpus of natural language utterances, the frequency of any word is inversely proportional to its rank in the frequency table. Thus the most frequent word will occur approximately twice as often as the second most frequent word, three times as often as the third most frequent word, etc.: the rank-frequency distribution is an inverse relation. For example, in the Brown Corpus of American English text, the word "the" is the most frequently occurring word, and by itself accounts for nearly 7% of all word occurrences (69,971 out of slightly over 1 million). True to Zipf's law, the second-place word "of" accounts for slightly over 3.5% of words (36,411 occurrences), followed by "and" (28,852). Only 135 vocabulary items are needed to account for half the Brown Corpus.[1]
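The Brown Corpus figures quoted above can be checked against the simplest form of the law, where the predicted count at rank r is the top count divided by r. The sketch below uses only the three counts given in the text; the function name zipf_predictions is an illustrative choice, and the exponent is assumed to be exactly 1.

```python
# Word counts quoted in the text for the Brown Corpus.
brown_counts = {"the": 69971, "of": 36411, "and": 28852}


def zipf_predictions(counts):
    """Compare observed counts with the prediction top_count / rank.

    Returns a list of (rank, word, observed, predicted, observed/predicted).
    """
    ranked = sorted(counts.items(), key=lambda item: item[1], reverse=True)
    top_count = ranked[0][1]
    rows = []
    for rank, (word, observed) in enumerate(ranked, start=1):
        predicted = top_count / rank  # assumes an exponent of exactly 1
        rows.append((rank, word, observed, predicted, observed / predicted))
    return rows


for rank, word, observed, predicted, ratio in zipf_predictions(brown_counts):
    print(f"{rank} {word:>4} observed={observed} "
          f"predicted={predicted:.0f} ratio={ratio:.2f}")
```

On these three words the fit is close but not exact: "of" comes out roughly 4% above the prediction and "and" roughly 24% above it, which is a reminder that the law is an approximation.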
