The development of transformer models such as BERT (Devlin et al., 2018) and GPT-3 (Floridi & Chiriatti, 2020) has revolutionized the world of Natural Language Processing (NLP). These models now achieve near-human performance on tasks such as text generation and paraphrasing. Earlier research has shown that GPT-3 can generate product texts and even incorporate subjective information, such as clothing style, into the style of writing. The open question, however, is how well-written and informative these generated texts are.
This article explains how product texts, or descriptions, can be optimized to achieve a high(er) Search Engine Optimization (SEO) score for common search terms. This score is built from factors such as keyword density, readability, and word count. An interesting complication is that Google has changed its search engine to rely on Transformer models (Vaswani et al., 2017): BERT is now used in roughly one out of ten searches to help understand what a webpage is about, which is said to have affected about 10% of worldwide rankings. If the search engine finds a text difficult to understand, the people who read it will most likely find it difficult as well. In this blog, we elaborate on the key components of the SEO score and show how a subset of these components can be optimized using a so-called SEO model.
Building the SEO-score
The SEO-score defined here is built up of seven different sub-scores, each with a different weight; the final score is the weighted average of the individual sub-scores. Several of these sub-scores are also used by SEO tools such as Yoast and SEMrush. The sub-scores are described below, and sketches of how several of them can be computed and combined are given after the list.
- Keyword density
The keyword density is the ratio between the number of occurrences of a certain keyword and the total number of words in a text. An ideal density is around 1-2% (https://blog.alexa.com/keyword-density/), which ensures that a keyword is not overused. The keywords to check for can be the category of a product, its brand, or keywords generated using the Google Ads API (https://developers.google.com/google-ads/api/docs/keyword-planning/overview). The score is defined as a ratio that gives a high score for a low density and a low score for a high density, so overuse of a keyword is penalized.
- Query-Text score
To get a better understanding of how well a certain text matches Google’s ranking, the top 10 keywords for a specific category or brand are generated using the Google Keyword Planner and then passed to Sentence-BERT (Reimers & Gurevych, 2019), which computes the cosine-similarity score between the queries and the text (a sketch is given after this list).
- Word count
The ideal number of words in a text depends on the industry and on preference. Blog posts generally contain more words, whereas product descriptions are usually shorter. What matters is that Google can understand what a certain page is about, so very short texts are allowed as long as they contain everything Google needs to know. The word count is included as an additional signal and should not be considered the most important feature.
- Sentence length
Sentences shorter than three words cannot be considered valid sentences, while sentences that are too long hurt readability. Of all sentences in a text, at most 25% should contain more than 25 words (https://medium.com/illumination/paragraph-length-and-structure-for-seo-d8d503f2a1a6). A greater ratio of long sentences therefore gives a lower score.
- Passive vs Active score
To keep readability as high as possible, passive voice should be avoided where possible; active voice makes a text more understandable for people. Although Google is capable of deriving the meaning of a sentence regardless of whether it is in passive or active voice (Warstadt & Bowman, 2019), writing actively improves readability and therefore the SEO score (https://developers.google.com/tech-writing/one/active-voice). The ratio of passive sentences in a text should stay below 10%; a higher ratio leads to a lower score. To classify Dutch sentences as active or passive, we finetuned BERTje (de Vries et al., 2019) on 780 active and 700 passive sentences (https://huggingface.co/Gerwin/bert-for-pac); a sketch of how this classifier can be used is given after this list. The code for detecting passive voice in English sentences can be found here.
- Use of transition words
Using transition words between sentences improves readability and the flow of a text. This score is calculated as the ratio of sentences that contain a transition word; ideally, at least 30% of the sentences should contain one.
- Readability score
To statistically estimate the readability of a text, the Flesch Reading Ease score can be used; it ranges from 0 to 100, and an ideal score for a product description lies between 60 and 80. This sub-score is therefore handled differently from the others: it penalizes the distance from this range, so that scores such as 50 and 90 receive the same penalization.
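Several of the sub-scores above are simple heuristics. The sketch below shows one possible implementation of the keyword density, sentence length, transition word, and readability sub-scores, and of the weighted combination into the final SEO score. The penalty shapes, the (tiny) transition-word list, and the use of the textstat package are assumptions; only the targets (a 1-2% keyword density, at most 25% long sentences, at least 30% of sentences with a transition word, a Flesch score between 60 and 80) and the weights from the sub-score table further below come from this article.

```python
# A minimal sketch of the heuristic sub-scores and their weighted combination.
# Penalty shapes and the transition-word list are illustrative assumptions.
import re

import textstat  # pip install textstat; provides flesch_reading_ease()

TRANSITION_WORDS = {"however", "therefore", "moreover", "furthermore", "additionally"}  # small illustrative subset


def split_sentences(text: str) -> list[str]:
    # Naive sentence splitter; a library such as spaCy or NLTK could be used instead.
    return [s.strip() for s in re.split(r"[.!?]+", text) if s.strip()]


def keyword_density_score(text: str, keywords: list[str]) -> float:
    words = re.findall(r"\w+", text.lower())
    keyword_set = {k.lower() for k in keywords}
    density = sum(w in keyword_set for w in words) / max(len(words), 1)
    # Full score up to the ~2% target, then a linear penalty for overuse (assumed shape).
    return 1.0 if density <= 0.02 else max(0.0, 1.0 - (density - 0.02) / 0.08)


def sentence_length_score(text: str) -> float:
    sentences = split_sentences(text)
    long_ratio = sum(len(s.split()) > 25 for s in sentences) / max(len(sentences), 1)
    # At most 25% of the sentences may contain more than 25 words.
    return 1.0 if long_ratio <= 0.25 else max(0.0, 1.0 - (long_ratio - 0.25))


def transition_word_score(text: str) -> float:
    sentences = split_sentences(text)
    ratio = sum(any(w in TRANSITION_WORDS for w in s.lower().split()) for s in sentences) / max(len(sentences), 1)
    # Ideally at least 30% of the sentences contain a transition word.
    return min(1.0, ratio / 0.30)


def readability_score(text: str) -> float:
    flesch = textstat.flesch_reading_ease(text)
    # Penalize the distance to the ideal 60-80 band symmetrically, so 50 and 90 score the same.
    distance = max(0.0, 60 - flesch, flesch - 80)
    return max(0.0, 1.0 - distance / 60)


# Weights as used in the sub-score table further below (word count omitted, as in that table).
WEIGHTS = {"keyword_density": 2, "query_text": 3, "sentence_length": 1,
           "passive_active": 2, "transition_words": 2, "readability": 3}


def seo_score(sub_scores: dict[str, float], weights: dict[str, float] = WEIGHTS) -> float:
    # Weighted average of the individual sub-scores.
    return sum(weights[name] * sub_scores[name] for name in weights) / sum(weights.values())
```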
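The Query-Text sub-score relies on Sentence-BERT. Below is a minimal sketch using the sentence-transformers library; the specific pretrained model and the averaging of the similarity over the ten queries are assumptions, since the article only specifies that a cosine-similarity score is computed between the queries and the text.

```python
# Sketch of the Query-Text score: cosine similarity between keyword queries and a text.
from sentence_transformers import SentenceTransformer, util

# Assumed model choice (the texts in this article are Dutch and English).
model = SentenceTransformer("paraphrase-multilingual-MiniLM-L12-v2")

def query_text_score(queries: list[str], text: str) -> float:
    query_embeddings = model.encode(queries, convert_to_tensor=True)
    text_embedding = model.encode(text, convert_to_tensor=True)
    similarities = util.cos_sim(query_embeddings, text_embedding)  # shape: (len(queries), 1)
    return float(similarities.mean())
```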
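The passive-vs-active sub-score builds on the finetuned BERTje classifier linked above. A minimal sketch using the transformers pipeline is shown below; the label string checked for and the penalty above the 10% target are assumptions, so consult the model card for the exact labels.

```python
# Sketch of the passive-voice ratio and sub-score for Dutch sentences,
# using the finetuned BERTje classifier referenced above.
from transformers import pipeline

classifier = pipeline("text-classification", model="Gerwin/bert-for-pac")

def passive_active_score(sentences: list[str]) -> float:
    predictions = classifier(sentences)
    # Assumed label naming; check the model card for the exact label strings.
    passive_ratio = sum("passive" in p["label"].lower() for p in predictions) / max(len(sentences), 1)
    # Full score while at most 10% of the sentences are passive, then a linear penalty (assumed shape).
    return 1.0 if passive_ratio <= 0.10 else max(0.0, 1.0 - (passive_ratio - 0.10))
```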
Data
Three datasets have been used to calculate SEO-scores for a set of texts and to optimize a subset of these texts using the SEO model. Two of these sample datasets are used by Squadra Machine Learning Company and are direct outputs of their service Powertext.ai. The third dataset was collected via Promptcloud and contains English product descriptions from Victoria’s Secret. In total, the data consists of 10,500 English texts and 718 Dutch texts.
Shoes
This dataset contains 500 English and 500 Dutch product descriptions about shoes. These descriptions are generated using Powertext.ai.
Washing machines
This dataset contains 218 Dutch product descriptions about washing machines. These descriptions are generated using Powertext.ai.
Victoria’s Secret
This dataset contains 535,600 English product descriptions in the underwear and swimwear categories, collected from 9 websites. From this total, 10,000 English texts were randomly selected for SEO scoring.
SEO-scores
For each dataset, the SEO-score was calculated for every text, and from these scores the minimum, mean, and maximum were derived. Besides the overall SEO score, the mean of each individual sub-score is also provided. To make the scoring deterministic, the program was given the predefined keywords listed below.
| Dataset | Keywords |
| --- | --- |
| Shoes (English) | shoe, shoes, walking |
| Shoes (Dutch) | schoen, schoenen, lopen |
| Washing machines | wasmachine, wassen, kleding |
| Victoria’s Secret | bra, thong, body, panty, sexy |
The following SEO scores were calculated for the datasets:
| Dataset | Min | Mean | Max |
| --- | --- | --- | --- |
| Shoes (English) | 0.520 | 0.692 | 0.820 |
| Shoes (Dutch) | 0.600 | 0.776 | 0.870 |
| Washing machines | 0.630 | 0.790 | 0.910 |
| Victoria’s Secret | 0.270 | 0.591 | 0.820 |
The means of the individual sub-scores are given below. The ‘word count’ score is omitted because the overall length of the product descriptions differs between the datasets, so scoring them against one general minimum and maximum number of words would be illogical. The table columns are labeled as ‘sub-score (weight)’.
| Dataset | Keyword density (2) | Query-Text (3) | Sentence length (1) | Passive vs Active (2) | Transition words (2) | Readability (3) |
| --- | --- | --- | --- | --- | --- | --- |
| Shoes (English) | 0.694 | 0.251 | 1.000 | 0.723 | 0.820 | 0.821 |
| Shoes (Dutch) | 0.693 | 0.443 | 0.994 | 0.637 | 0.968 | 0.980 |
| Washing machines | 0.772 | 0.509 | 0.732 | 0.922 | 0.952 | 0.838 |
| Victoria’s Secret | 0.927 | 0.082 | 0.628 | 0.976 | 0.287 | 0.709 |
SEO model
Fully optimizing a text to achieve a higher SEO-score is a difficult task. One could implement a GAN that generates new texts using the SEO-score as a loss, or implement a paraphrasing model and use the SEO-score to validate whether the new text is better. In our implementation, we focused on the concepts related to readability: passive vs active voice, the use of transition words, and the readability score. To optimize these scores, we finetuned GPT-3 (https://beta.openai.com/docs/guides/fine-tuning) on 100 input-output pairs to improve the readability of the texts. A ‘better’ text is accepted when the sum of these three scores and the overall SEO score both improve, and when the new text does not differ too much from the old text; this similarity between old and new is calculated using Sentence-BERT. A sketch of this acceptance check and some examples from the Victoria’s Secret and washing machines datasets are given below.
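The sketch below assumes hypothetical helpers readability_sum and overall_seo_score, standing in for combinations of the sub-score sketches shown earlier, and an illustrative 0.8 similarity threshold that is not specified in this article.

```python
# Sketch of the acceptance check for a GPT-3 rewrite. readability_sum and
# overall_seo_score are assumed helpers built from the sub-score sketches above;
# the 0.8 similarity threshold is illustrative.
from sentence_transformers import SentenceTransformer, util

similarity_model = SentenceTransformer("paraphrase-multilingual-MiniLM-L12-v2")  # assumed model choice

def accept_rewrite(old_text: str, new_text: str, keywords: list[str],
                   min_similarity: float = 0.8) -> bool:
    # The sum of the three readability-related sub-scores must improve ...
    readability_improved = readability_sum(new_text) > readability_sum(old_text)
    # ... as well as the overall SEO score ...
    seo_improved = overall_seo_score(new_text, keywords) > overall_seo_score(old_text, keywords)
    # ... and the rewrite must stay semantically close to the original.
    embeddings = similarity_model.encode([old_text, new_text], convert_to_tensor=True)
    similar_enough = float(util.cos_sim(embeddings[0], embeddings[1])) >= min_similarity
    return readability_improved and seo_improved and similar_enough
```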
Old text 1, score: 0.629
A naturally sexy unlined shape, with a lower cut that’s perfect for revealing necklines. Designed to disappear under curve-hugging styles. Lift & Lining Unlined; lace choices are unlined with a foam sling for light lift Underwire cups Straps & Hooks Adjustable straps can convert to crossback and snap into place for a secure hold Back closure Double row of hook and eye closures; Sizes 34DDD, 36DD-36DDD & 38D-38DDD have triple row of closures for a secure, comfortable fit 4 settings to ensure a perfect fit Details & Fabric Double-lined sides for the smoothest shape Imported nylon/spandex
New text 1, score: 0.875
A naturally sexy unlined shape, with a lower cut that’s perfect for revealing necklines. The bra is designed to disappear under curve-hugging styles, with lift and lining that’s unlined and has a foam sling for light lift. The bra has an underwire cup, adjustable straps and hooks that can convert to crossback and snap into place for a secure hold. The back closure has double rows of hook and eye closures, while the sizes 34DDD, 36DD-36DDD & 38D-38DDD have triple rows of closures for a perfect fit.
Old text 2, score: 0.371
A true wardrobe essential with soft, no-show cups and a comfortable fit! Includes removable straps. Most push Structured, underwire cups Straps included Hook-and-eye closure Imported polyamide/spandex
New text 2, score: 0.917
This true wardrobe essential has soft, no-show cups and a comfortable fit. It includes removable straps so you can wear it anywhere. The bra is made of breathable material such as polyamide and spandex, so you can wear it anywhere!
Old text 3, score: 0.667
De digital inverter motor zorgt voor een geluidsniveau van slechts 51 db db, wat echt stil is. Ook het droogresultaat van de aeg wasmachine l7wb86gw is van topniveau dankzij de centrifuge die een maximum toerental van 1600 rpm rotaties per minuut kan bereiken. Met het wassen verkrijg je 51 db db, bij het centrifugeren hoor je maximaal 76 db Zijn laadvermogen van 8 kg is ruim te noemen en wat je ook wilt wassen, met zijn 16 voorgeprogrammeerde wascycli ben jij in staat van wassen echt maatwerk te maken.
New text 3, score: 0.875
De digital inverter motor zorgt voor een geluidsniveau van slechts 51 db, wat echt stil is. Ook het droogresultaat van de aeg wasmachine l7wb86gw is van topniveau dankzij de centrifuge die een maximum toerental van 1600 rpm kan bereiken. Met het wassen verkrijg je 51 db, bij het centrifugeren hoor je maximaal 76 db. Het laadvermogen van 8 kg is ruim te noemen en wat je ook wilt wassen, met zijn 16 voorgeprogrammeerde wascycli ben jij in staat van wassen echt maatwerk te maken.
Conclusion
In this blog, we explained the key concepts of SEO and how to evaluate texts using these concepts. We showed the power of the SEO model and its ability to rewrite sentences to achieve higher readability scores and, along with those, a higher SEO-score. Still, the SEO model cannot improve readability in all cases, because GPT-3 introduces randomness that cannot be fully tracked or regulated. In the future, the model can be improved by adding an encoder-decoder model that transforms passive sentences into active ones; this encoder-decoder model is not yet complete due to a lack of data. Overall, the SEO scores give real insight into what can be improved, while the SEO model already shows very good performance in optimizing texts.
References
Devlin, J., Chang, M. W., Lee, K., & Toutanova, K. (2018). BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805.
Floridi, L., & Chiriatti, M. (2020). GPT-3: Its nature, scope, limits, and consequences. Minds and Machines, 30(4), 681-694.
Reimers, N., & Gurevych, I. (2019). Sentence-BERT: Sentence embeddings using Siamese BERT-networks. arXiv preprint arXiv:1908.10084.
Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., … & Polosukhin, I. (2017). Attention is all you need. In Advances in neural information processing systems (pp. 5998-6008).
de Vries, W., van Cranenburgh, A., Bisazza, A., Caselli, T., van Noord, G., & Nissim, M. (2019). BERTje: A Dutch BERT model. arXiv preprint arXiv:1912.09582.
Warstadt, A., & Bowman, S. R. (2019). Linguistic analysis of pretrained sentence encoders with acceptability judgments. arXiv preprint arXiv:1901.03438.