First of all, here is a short explanation that should already give you a clue about how reliable this analysis is for content publishing and SEO. WDF*IDF has been a buzzword in on-page SEO since 2012. Rather than looking at a single keyword only, it also takes related terms and their relevance on other sites into account. This provides more accurate data about related terms, semantic context and the popularity of terms.
What does WDF mean?
WDF (within-document frequency) describes how often a keyword occurs inside a document.
What does IDF mean?
IDF (inverse document frequency) describes how frequently the keywords are used in other documents around the web: the fewer documents contain a term, the higher its IDF.
How WDF*IDF works
The WDF*IDF formula multiplies the frequency of a word $i$ in a text $j$ by the inverse document frequency of the same word in a relevant document corpus. This gives the weight $w$ of term $i$ in document $j$:
$$w_{i,j} = \mathrm{WDF}_{i,j} \cdot \mathrm{IDF}_i$$
The factor WDF is calculated as follows:
$$\mathrm{WDF}_{i,j} = \frac{\log_2(\mathrm{Freq}_{i,j} + 1)}{\log_2(L)}$$
Here $\mathrm{Freq}_{i,j}$ is the number of times term $i$ occurs in document $j$, and $L$ is the total number of words in document $j$. The formula determines how frequently a term (i.e. a word or a word combination) occurs relative to the length of the document.
The logarithm dampens repetition: using the main keyword many more times produces only a small increase in the value. While keyword density simply divides the number of occurrences of a single word by the total number of words in a text, WDF relates the logarithm of the term frequency to the logarithm of the text length.
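To make the damping effect tangible, here is a minimal Python sketch of the WDF formula next to classic keyword density. The 500-word text length and the occurrence counts are made-up example values, and `wdf` and `keyword_density` are illustrative helper names, not functions of any SEO tool.

```python
import math


def wdf(freq: int, doc_length: int) -> float:
    """Within-document frequency: log2(freq + 1) / log2(L)."""
    # doc_length (L) must be at least 2, otherwise log2(L) would be 0.
    return math.log2(freq + 1) / math.log2(doc_length)


def keyword_density(freq: int, doc_length: int) -> float:
    """Classic keyword density as a fraction of all words."""
    return freq / doc_length


if __name__ == "__main__":
    text_length = 500  # hypothetical text length in words
    for freq in (1, 5, 10, 20, 40):
        density = keyword_density(freq, text_length)
        print(f"{freq:>2} occurrences: density={density:.3f}, WDF={wdf(freq, text_length):.3f}")
```

From 10 to 20 to 40 occurrences the keyword density doubles each time, while the WDF value only grows by about 0.1 per step.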
The multiplier IDF is the inverse document frequency. Here $N_D$ is the total number of documents in the corpus and $f_t$ is the number of documents that contain the term $t$; the rarer a term is across the corpus, the higher its IDF. The calculation is as follows:
$$\mathrm{IDF}_t = \log\left(1 + \frac{N_D}{f_t}\right)$$
The inverse document frequency adds a corrective to the WDF factor by taking into account how many documents contain a particular term: IDF sets the number of known documents in relation to the number of documents containing the term. Here, too, the logarithm serves to “compress” the results.
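The IDF part can be sketched the same way. The formula above does not name the logarithm base, so this sketch assumes base 10; the corpus size of 1,000 documents is a made-up example and `idf` is an illustrative helper name.

```python
import math


def idf(num_docs: int, docs_with_term: int) -> float:
    """Inverse document frequency: log(1 + N_D / f_t).

    Assumption: base-10 logarithm, since the formula leaves the base open.
    """
    if docs_with_term < 1:
        raise ValueError("f_t must be at least 1 (the term must occur somewhere)")
    return math.log10(1 + num_docs / docs_with_term)


if __name__ == "__main__":
    corpus_size = 1_000  # hypothetical N_D
    for docs_with_term in (1, 10, 100, 1_000):
        value = idf(corpus_size, docs_with_term)
        print(f"term in {docs_with_term:>5} of {corpus_size} documents: IDF={value:.3f}")
```

Because of the “1 +” inside the logarithm, even a term that appears in every document keeps a small positive IDF (log10(2), about 0.3).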
Multiplying both factors gives the relative weight of a term in a document compared with all documents in the corpus that contain the same keyword. To get a useful result, the calculation must be repeated for every meaningful word in a text.
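Putting both factors together, the following sketch computes the weight of every meaningful word of one page against a tiny corpus. The stop-word list, the tokenizer, the base-10 logarithm for IDF and the example texts are all assumptions for illustration; real tools use far larger corpora and their own preprocessing. The page itself is counted as part of the corpus so that $f_t$ is never zero.

```python
import math
import re
from collections import Counter

# Hypothetical stop words standing in for "non-meaningful" words.
STOP_WORDS = {"a", "and", "for", "in", "is", "of", "the", "to", "with"}


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def wdf_idf(document: str, others: list[str]) -> dict[str, float]:
    """Return WDF * IDF for each meaningful term of `document`.

    The corpus is `document` plus `others`, so every term has f_t >= 1.
    """
    tokens = tokenize(document)
    length = len(tokens)  # L (counts all words), must be at least 2
    corpus = [set(tokenize(doc)) for doc in [document, *others]]
    num_docs = len(corpus)  # N_D
    weights = {}
    for term, freq in Counter(tokens).items():
        if term in STOP_WORDS:
            continue
        docs_with_term = sum(term in doc for doc in corpus)  # f_t
        wdf = math.log2(freq + 1) / math.log2(length)
        idf = math.log10(1 + num_docs / docs_with_term)  # base 10 assumed
        weights[term] = wdf * idf
    return weights


if __name__ == "__main__":
    page = "Running shoes for trail running: the best trail running shoes with grip."
    competitors = [
        "The best running shoes for road running.",
        "Trail shoes with grip for wet terrain.",
        "How to choose hiking boots.",
    ]
    ranked = sorted(wdf_idf(page, competitors).items(), key=lambda kv: kv[1], reverse=True)
    for term, weight in ranked:
        print(f"{term:<8} {weight:.3f}")
```

With these example texts, “running” gets the highest weight: it is the most frequent term on the page and appears in only half of the corpus.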
The larger the database used to calculate the WDF*IDF, the more accurate the results.
Tools you can rely on
Amongst a few other tools out there, I’d recommend using:
You can find more about the tools I use in the Tools section of my blog.
The most important question: Should you as an SEO rely on those numbers?
The answer is Yes and No.
Advantages of WDF*IDF for SEO
In search engine optimization, people using common WDF*IDF tools aim to make the texts of a website or subpage as unique as possible. To rank as high as possible in the SERPs (Search Engine Result Pages) for a particular search term, a page needs to offer a unique text for that term. Keyword density used to be the benchmark for search-engine-optimized texts; WDF*IDF is a much more precise way of optimizing content.
Semantic Context and more
Since search engines try to interpret semantic context, it can pay off to optimize a website’s content semantically. This approach is called latent semantic optimization.
A WDF*IDF tool can help determine which keywords should ideally appear in the website content. Beyond optimizing a single keyword, such tools also give clues as to which other terms a document should contain to be as unique as possible.
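The kind of hint such tools give can be approximated like this: compute the WDF*IDF weights of competing pages and list the terms with the highest average weight that your own page never uses. This is a hypothetical sketch of the general idea, not how any particular tool works; the stop-word list, the helper names (`term_weights`, `missing_terms`), the base-10 logarithm and the example texts are made up.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical stop words that should never be suggested.
STOP_WORDS = {"a", "and", "for", "of", "the", "to", "with"}


def tokenize(text: str) -> list[str]:
    return re.findall(r"[a-z0-9]+", text.lower())


def term_weights(doc: str, corpus: list[str]) -> dict[str, float]:
    """WDF*IDF weight of every term in `doc`; `corpus` must contain `doc`."""
    tokens = tokenize(doc)
    doc_sets = [set(tokenize(d)) for d in corpus]
    weights = {}
    for term, freq in Counter(tokens).items():
        f_t = sum(term in s for s in doc_sets)
        wdf = math.log2(freq + 1) / math.log2(len(tokens))
        idf = math.log10(1 + len(doc_sets) / f_t)  # base 10 assumed
        weights[term] = wdf * idf
    return weights


def missing_terms(page: str, competitors: list[str], top_n: int = 5) -> list[tuple[str, float]]:
    """Terms with the highest average weight on competitor pages that `page` never uses."""
    corpus = [page, *competitors]
    page_terms = set(tokenize(page))
    totals: dict[str, float] = defaultdict(float)
    for comp in competitors:
        for term, weight in term_weights(comp, corpus).items():
            if term not in page_terms and term not in STOP_WORDS:
                totals[term] += weight
    # Pages that lack a term contribute 0 to its average.
    averages = {t: w / len(competitors) for t, w in totals.items()}
    return sorted(averages.items(), key=lambda kv: kv[1], reverse=True)[:top_n]


if __name__ == "__main__":
    page = "Trail running shoes with good grip."
    competitors = [
        "Trail running shoes with a waterproof membrane and good grip.",
        "Waterproof trail shoes: membrane, grip and cushioning.",
        "Cushioning and grip matter for trail running shoes.",
    ]
    for term, score in missing_terms(page, competitors):
        print(f"{term:<11} {score:.3f}")
```

For these example texts the suggestions are “cushioning”, “waterproof”, “membrane” and “matter”, which also shows the limits of the approach: “matter” is a filler word that only looks relevant to the formula.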
Disadvantages of WDF*IDF for SEO
WDF*IDF is not a panacea for content optimization. It is a math-based keyword optimization tool that helps you make content as unique as possible. Many real content optimization factors are not reflected in the WDF*IDF value, including significant neighboring terms and signal words that indicate the user’s search intent. Relying purely on WDF*IDF scores can lead to “optimized” content that makes no sense, and the tools cannot resolve ambiguous terms.
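One reason for nonsense “optimization” is that the formula only counts words. The sketch below (plain Python; the example sentence and the `wdf_profile` helper are made up for illustration) shows that turning a sentence into word salad by reversing it leaves every WDF value unchanged.

```python
import math
import re
from collections import Counter


def wdf_profile(text: str) -> dict[str, float]:
    """WDF value of every word in `text`, computed from counts only."""
    tokens = re.findall(r"[a-z]+", text.lower())
    length = len(tokens)  # L, must be at least 2
    return {term: math.log2(freq + 1) / math.log2(length) for term, freq in Counter(tokens).items()}


if __name__ == "__main__":
    sentence = "our trail shoes give you grip on wet rocks and muddy paths"
    word_salad = " ".join(reversed(sentence.split()))
    print(word_salad)  # paths muddy and rocks wet on grip you give shoes trail our
    print(wdf_profile(sentence) == wdf_profile(word_salad))  # True
```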
More disadvantages of WDF*IDF
In addition, the WDF*IDF formula alone does not account for a search term being concentrated in a single paragraph, for stemming rules, or for a text that relies heavily on synonyms. Anyone optimizing texts based on term weighting must also be aware that all text elements of the website are included in the analysis, not just the main copy.
Text agencies, copywriters and webmasters should not rely solely on the WDF*IDF curve when writing. Ultimately, the tools’ results are merely calculations based on logarithms, and other aspects play no role in term weighting. Yet tonality, CTAs, structure, style, jargon and reading fluency are important for the user-friendliness and readability of a text.
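The stemming and synonym problem is easy to see with a plain word count, which is what the WDF formula is built on. In this sketch the `CONCEPTS` mapping is a hypothetical, hand-made stand-in for a stemmer and a thesaurus; neither is part of the WDF*IDF formula, and the example text is made up.

```python
import re
from collections import Counter

# Hypothetical, hand-made mapping of word forms and synonyms to one concept.
# It stands in for a stemmer plus a thesaurus, which plain WDF*IDF does not use.
CONCEPTS = {"shoes": "shoe", "sneaker": "shoe", "sneakers": "shoe"}


def term_counts(text: str) -> Counter:
    """Count raw word forms, as the plain formula does."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def concept_counts(text: str) -> Counter:
    """Count words after mapping them to a concept via CONCEPTS."""
    return Counter(CONCEPTS.get(word, word) for word in re.findall(r"[a-z]+", text.lower()))


if __name__ == "__main__":
    text = "This shoe fits well. Our shoes are light, and the sneaker version is waterproof."
    raw = term_counts(text)
    print(raw["shoe"], raw["shoes"], raw["sneaker"])  # 1 1 1
    print(concept_counts(text)["shoe"])  # 3
```

The plain count splits one topic into three separate terms with a frequency of 1 each, so none of them gets the weight the topic actually has in the text.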
What’s the reason for that?
There are several reasons why the WDF*IDF formula, long regarded as SEO’s secret weapon, is showing its weaknesses: the continuous improvement of search algorithms, advances in AI (machine learning), and the growing customer orientation in content optimization.
How to determine well-written content
Interaction rates, bounce rates and time on page have become significantly more important for Google’s search algorithms than term calculations. For content to be accepted by users and for a text to become really good, writers should pay more attention to these aspects.
Last but not least, text optimization is just one of many aspects of on-page optimization. Even the best text written according to WDF*IDF will not outweigh ranking disadvantages caused by inferior content, bad backlinks or a page that is not optimized for mobile.
Have a Shop? Don’t rely on WDF*IDF!
In online stores in particular, category headings and product names are included in the weighting calculation. If a page describes only a single product, WDF*IDF is not suitable and will not help improve the content: product descriptions usually contain too little text, while the formula calculates a value for every single term.