Zipf’s Law: Why You Should Care About Uncommon Long Tail Search Terms

January 20, 2016

6:00 pm

There’s a very short and simple answer: because there are a ton of them. Google itself has said that never-before-seen queries account for about 15 percent of all searches. These and other rare phrases make up what is called the ‘long tail’.

In late October, Google announced its latest ranking tool, a machine learning algorithm known as RankBrain. Google also stated publicly that RankBrain is the third most important ranking factor and has been for a few months (it won’t confirm what numbers 1 and 2 are, but the search science community has opinions: namely, links and content quality).

What RankBrain does

RankBrain’s central job seems to be figuring out what searchers want when they use an unusual turn of phrase. We have one confirmed example of a query where RankBrain is the star: “what’s the title of the consumer at the highest level of a food chain”. The answer is ‘predator’, but Google has to work out relevant results without knowing the word the searcher actually has in mind. Hence the machine learning (in other words, artificial intelligence).

Bill Slawski of Go Fish Digital went back to Google’s patents to find out more. He suggests ‘New York Times puzzle’ (looking for the NYT crossword) as another example: a less common way of asking for the same thing.

Understanding the Long Tail

RankBrain is not just about never-before-seen queries, although some rough math shows that those make up a substantial slice of a very large cake. The cake is about 40,000 search queries per second, or roughly 3.5 billion per day. And that’s just Google, which has about 67% of the global market. Dividing by that share gives roughly 5.2 billion searches per day across all engines worldwide, and at 15 percent, about 780 million of those are new and unique.
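The back-of-envelope arithmetic above can be checked in a few lines of Python. This is a minimal sketch using only the figures quoted in the paragraph; the function names are hypothetical, and applying Google’s 15 percent “never seen before” share to every other engine is an assumption the estimate relies on, not a measured fact.

```python
# Back-of-envelope sizing of daily search volume, using the figures quoted above.
# Assumption: the ~15% "never seen before" share measured on Google
# applies equally to searches on every other engine.

SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def daily_volume(queries_per_second: float) -> float:
    """Convert a per-second query rate into a per-day total."""
    return queries_per_second * SECONDS_PER_DAY


def all_engine_volume(google_daily: float, google_share: float) -> float:
    """Scale Google's daily volume up to the whole market by its market share."""
    return google_daily / google_share


def novel_volume(total_daily: float, novel_fraction: float) -> float:
    """Estimate how many of those daily searches have never been seen before."""
    return total_daily * novel_fraction


google_daily = daily_volume(40_000)          # 3.456 billion, i.e. ~3.5 billion
worldwide = all_engine_volume(3.5e9, 0.67)   # ~5.2 billion, from the rounded 3.5 billion
novel = novel_volume(worldwide, 0.15)        # roughly 780 million

print(f"Google per day:      {google_daily / 1e9:.2f} billion")
print(f"All engines per day: {worldwide / 1e9:.1f} billion")
print(f"Never seen before:   {novel / 1e6:.0f} million")
```

Working from the rounded 3.5 billion rather than the exact 3.456 billion nudges the result up slightly, but both land near 5.2 billion.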

With RankBrain it’s clear that Google is thinking hard about how to serve better results for rarer search terms, but to understand how important they really are we need a little more math.

It’s very tempting to think that, for your business, a few huge search queries (or keywords) eat up most of the volume, but that’s not the case. Nor are query frequencies random and messy. The distribution of keywords searchers use conforms very nicely to what is called a power law. In fact, it conforms to a very special but very common case of power laws, called Zipf’s Law.

Zipf’s Law

First, let’s rank our search phrases from most to least popular. Zipf’s Law says that the second most common phrase, the one with rank #2, will have ½ the frequency of the most popular. The third most popular, rank #3, will have 1/3, and so on: the phrase at rank k is searched about 1/k as often as the top phrase.
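The rule is simple enough to express as a one-line function. This is a minimal sketch of the idealised law; the function name `zipf_frequency` and the figure of 600 searches for the top phrase are hypothetical, and real keyword data only approximates this curve.

```python
def zipf_frequency(top_frequency: float, rank: int) -> float:
    """Idealised Zipf's Law: the phrase at `rank` is searched 1/rank as often as rank #1."""
    if rank < 1:
        raise ValueError("ranks start at 1")
    return top_frequency / rank


# Hypothetical example: suppose the #1 phrase was searched 600 times last month.
for rank in (1, 2, 3, 100, 101):
    print(rank, round(zipf_frequency(600, rank), 2))
```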

There’s a big difference between 1 and 1/2, but 1/100 and 1/101 are very similar (0.0100 and 0.0099). The 101st most popular search term is only slightly less popular than the 100th. That makes sense if there are a lot of less popular terms, each with a roughly similar share. The plot below shows a snippet of keyword rank and frequency data from a small pest control business. It shows the characteristic smooth distribution, with many less frequent keywords.

[Figure: keyword rank versus frequency]

In this case, unique, never-before-seen keywords (those with frequency = 1) make up well over 50 percent of the sample. In fact, for almost all clients, we find that every 1,000 organic keyword-driven sessions come from roughly 500 distinct keywords!
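To check this pattern against your own data, you can count distinct keywords per 1,000 sessions and the share of keywords seen only once. This is a sketch assuming a hypothetical export with one keyword per organic session; the function name and the sample list are invented for illustration and are not client data. Because “well over 50 percent of the sample” could refer to distinct keywords or to sessions, it reports the singleton share both ways.

```python
from collections import Counter


def tail_stats(session_keywords: list[str]) -> dict[str, float]:
    """Summarise how long-tailed a list of organic-session keywords is.

    session_keywords: one entry per organic session, holding the keyword that
    drove it (a hypothetical export format; adapt to your analytics tool).
    """
    # Normalise case and whitespace so trivial variants count as one keyword.
    counts = Counter(k.strip().lower() for k in session_keywords)
    if not counts:
        raise ValueError("no sessions to summarise")
    sessions = sum(counts.values())
    distinct = len(counts)
    singletons = sum(1 for c in counts.values() if c == 1)
    return {
        "sessions": sessions,
        "distinct_keywords": distinct,
        "distinct_per_1000_sessions": 1000 * distinct / sessions,
        "singleton_share_of_keywords": singletons / distinct,
        "singleton_share_of_sessions": singletons / sessions,
    }


# Toy data, made up for illustration only.
log = [
    "carpenter ants", "carpenter ants", "carpenter ants",
    "pest control near me", "pest control near me",
    "how to get rid of carpenter ants in the kitchen",
    "black ants in bathroom sink",
    "do carpenter ants bite",
]
print(tail_stats(log))
```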

Technically, Zipf’s law can start to wobble a little for really big data sets (and 5.2 billion searches per day definitely qualifies), but it’s a very useful tool for understanding the importance of the long tail in search marketing. And it should hold true for businesses with up to tens of millions of dollars in revenue. (Thanks to our friend and statistician-for-hire John Cook for helping us think about this.)
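One way to see why the tail matters in aggregate is to ask what share of all searches the top few phrases would capture if the law held exactly. Under ideal Zipf with N distinct phrases, the top k get H_k / H_N of the volume, where H_n = 1 + 1/2 + … + 1/n. This sketch assumes an exponent of exactly 1 and a hypothetical vocabulary of 10,000 phrases; the function names are illustrative.

```python
def harmonic(n: int) -> float:
    """H_n = 1 + 1/2 + ... + 1/n, the total Zipf weight of the first n ranks."""
    return sum(1.0 / k for k in range(1, n + 1))


def head_share(top_k: int, vocabulary_size: int) -> float:
    """Fraction of all searches that go to the top_k phrases under ideal Zipf."""
    if not 1 <= top_k <= vocabulary_size:
        raise ValueError("need 1 <= top_k <= vocabulary_size")
    return harmonic(top_k) / harmonic(vocabulary_size)


# Hypothetical vocabulary of 10,000 distinct phrases.
for k in (1, 10, 100, 1000):
    print(f"top {k:>5} phrases: {head_share(k, 10_000):.0%} of searches")
```

With these assumptions the top 10 phrases get only about 30 percent of searches, leaving roughly 70 percent spread across the other 9,990. The vocabulary has to be finite because the harmonic series keeps growing, which is one reason the idealised law can only ever approximate real data.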

Per-Visit Value in the Long Tail

We find power laws in all kinds of other places in digital marketing. You might have heard of one special case, the Pareto Principle: 20 percent of the customers provide 80 percent of the revenue.
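Checking whether your own revenue follows the 80/20 pattern takes only a sort and a sum. This is a minimal sketch; `top_share` is a hypothetical helper and the revenue figures are invented for illustration.

```python
def top_share(values: list[float], fraction: float = 0.2) -> float:
    """Share of the total contributed by the top `fraction` of items.

    With fraction=0.2 this tests the Pareto (80/20) pattern.
    """
    if not values:
        raise ValueError("need at least one value")
    ranked = sorted(values, reverse=True)
    k = max(1, round(len(ranked) * fraction))  # always keep at least one item
    return sum(ranked[:k]) / sum(ranked)


# Hypothetical revenue per customer, invented for illustration.
revenue = [5000, 2200, 900, 450, 300, 200, 150, 120, 100, 80]
print(f"Top 20% of customers: {top_share(revenue):.0%} of revenue")
```

The same function works on revenue per session, which is how you would test the sessions-to-revenue version of the rule mentioned later in this article.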

Traffic gained from different positions on search engine results pages is not linearly or randomly distributed either. But let’s stick to the context of search keywords.

In addition to sending half (or more) of the traffic, long-tail, once-in-a-site’s-lifetime keywords can be more valuable than average keywords. Take ‘carpenter ants’. This is a common keyword, but it’s hard to know what the searcher is really looking for. Is it pest control services, or information for a school project? It seems likely that some searchers fall into each of those categories.

Now consider ‘how to get rid of carpenter ants in the kitchen’. That searcher has an obvious problem, and Debug can provide a clear, tightly tailored solution. The phrase might be less common, but visit for visit, it is much more likely to convert.
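Value per visit is simply the conversion rate times the value of a conversion, so comparing keywords on that basis is straightforward. This is a sketch with hypothetical names (`KeywordStats`) and invented figures; they are not Debug’s numbers and only illustrate the comparison.

```python
from dataclasses import dataclass


@dataclass
class KeywordStats:
    keyword: str
    sessions: int
    conversions: int
    value_per_conversion: float  # e.g. average job value, in dollars

    @property
    def conversion_rate(self) -> float:
        return self.conversions / self.sessions if self.sessions else 0.0

    @property
    def value_per_visit(self) -> float:
        """Expected revenue per session = conversion rate x value per conversion."""
        return self.conversion_rate * self.value_per_conversion


# Hypothetical figures, invented for illustration only.
head = KeywordStats("carpenter ants", sessions=400, conversions=4,
                    value_per_conversion=250.0)
tail = KeywordStats("how to get rid of carpenter ants in the kitchen",
                    sessions=12, conversions=2, value_per_conversion=250.0)

for kw in (head, tail):
    print(f"{kw.keyword!r}: {kw.conversion_rate:.1%} conversion, "
          f"${kw.value_per_visit:.2f} per visit")
```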

Long-tail keywords often represent searchers looking for something very specific: people who have decided what they want and are close to making a final purchase decision. As such, long-tail traffic can deliver strong marginal value per visit. Remember, 20 percent of sessions can deliver 80 percent of revenue, and those sessions don’t necessarily come from the most popular search terms!

How to Get That Long-Tail Traffic

Long-tail traffic can also be much less competitive. A small company like Debug might struggle to capture the #1 spot on a search engine results page for ‘carpenter ants’, but it may well be able to own the space for the longer phrase, and similar ones around it, with relative ease.

Attracting long-tail traffic is all about content. Talk about what you do. In ecommerce, make sure all products have a unique, well-written description.  If your site sells B2B, create detailed case studies. Know what your unique selling points are and discuss them even if your niche is a small one.

A lot of what will help you attract long-tail searchers is also what will help you sell to them, and to users coming in from other sources. Use the obvious keywords, but write for the user first and foremost. Don’t get hung up on old-school concepts like keyword frequency (boring); instead, celebrate what makes your offering stand out from the crowd. Higher conversion rates and more long-tail traffic are the reward.


Timothy Carter is Director of Business Development for the Seattle-based content marketing & social media agency AudienceBloom. When he's not working, he's writing for sites like,,, and