Over the last few years, the number of consumers shopping online has been increasing. For e-commerce companies, it is very important to rank as high as possible in search engines in order to attract potential customers to their website. Search Engine Optimization (SEO) is one way to increase the visibility of a website to search engine users. Similar or duplicate product descriptions lead to a lower rank in the search engines. In addition, a description should be informative, realistic, and written in plain language to increase the visibility of the product.
Natural Language Processing (NLP) techniques can be used to speed up the process of writing product descriptions. In this article, we discuss the possibility of automating the writing of product descriptions. We use the Transformer, first introduced by Vaswani et al. (2017) and explained in more detail later in this article, and we trained it for the Dutch language. The goal is to translate product attributes into a correct, unique Dutch description. The input is a product attribute, for example, a washing capacity of 8 kg for a washing machine. The output is a correct and unique generated Dutch sentence.
Most studies so far have used the Chinese language; in this article, we explain how to generate Dutch descriptions. We were inspired by Chen et al. (2019), who developed the KnOwledge-Based pErsonalized (KOBE) product description generator. The authors also used the transformer architecture.
Transformer
In 2017, Vaswani et al. changed the field of natural language generation (NLG) by proposing a new, simple network architecture called the Transformer in their paper ‘Attention Is All You Need’. Already in that paper, the model outperformed the state-of-the-art BLEU scores on machine translation tasks. Moreover, the Transformer required less training time than the other models. Until then, the dominant architectures for sequence modeling were based on recurrent or convolutional neural networks and connected the encoder and decoder through an attention mechanism. In contrast, the Transformer relies only on attention mechanisms: it is an encoder-decoder architecture built on self-attention. Several studies have since built the transformer architecture into their models with promising results. The Transformer architecture is visualized in the figure.
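To make the attention mechanism a bit more concrete, here is a minimal sketch of single-head scaled dot-product attention as described by Vaswani et al. (2017), written in plain Python. It is for illustration only and is not the code we trained with: there is no masking, no multi-head projection, and no learned weights, and the toy token vectors are made up.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(queries, keys, values):
    """Single-head attention without masking.

    queries: list of n vectors of size d_k
    keys:    list of m vectors of size d_k
    values:  list of m vectors of size d_v
    Returns (outputs, weights): n output vectors and n rows of m weights.
    """
    d_k = len(keys[0])
    d_v = len(values[0])
    outputs, all_weights = [], []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(d_k).
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in keys
        ]
        weights = softmax(scores)
        # Each output is a weighted average of the value vectors.
        out = [sum(w * v[j] for w, v in zip(weights, values)) for j in range(d_v)]
        outputs.append(out)
        all_weights.append(weights)
    return outputs, all_weights


# Toy self-attention: queries, keys, and values all come from the same
# three (made-up) token vectors.
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
outputs, weights = scaled_dot_product_attention(tokens, tokens, tokens)
```

In self-attention, every token attends to every other token of the same sequence, which is why the same list is passed three times. Each row of `weights` sums to 1 and tells how much each token contributes to the new representation of the token for that row.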
The dataset we used consists of around 230k sentences divided over 52 different product groups, all in the electronics sector. Each product has its own set of product attributes. Using this list of attributes, we were able to extract the attributes from the descriptions. The descriptions are written in Dutch, and this has a major disadvantage: the dataset size is limited. 230k sentences may seem like a lot, but machine learning models require much more data. In addition, the descriptions are spread over around 50 product categories, and not all sentences contain a product attribute (75k sentences contain at least one attribute). Because we are interested in translating attributes into a Dutch sentence, only a limited number of sentences can be used directly. In order to still use the other sentences, we used the Term Frequency-Inverse Document Frequency (TF-IDF) method to extract the two most unique words from each sentence. A word counts as unique if it occurs rarely across the entire corpus.
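The TF-IDF step can be sketched as follows. This is a minimal version under a few assumptions of our own: every sentence is treated as one document, tokens are split on whitespace, the weighting is plain `tf * log(N / df)`, and ties are broken alphabetically. The function name `top_tfidf_words` and the three Dutch toy sentences are made up for illustration and do not come from the dataset.

```python
import math
from collections import Counter


def top_tfidf_words(sentences, k=2):
    """For each sentence, return the k words with the highest TF-IDF score."""
    tokenized = [s.lower().split() for s in sentences]
    n_docs = len(tokenized)
    # Document frequency: in how many sentences does each word occur?
    doc_freq = Counter(word for tokens in tokenized for word in set(tokens))
    results = []
    for tokens in tokenized:
        counts = Counter(tokens)
        scores = {
            word: (count / len(tokens)) * math.log(n_docs / doc_freq[word])
            for word, count in counts.items()
        }
        # Highest score first; ties are broken alphabetically so the
        # result is deterministic.
        ranked = sorted(scores, key=lambda w: (-scores[w], w))
        results.append(ranked[:k])
    return results


# Made-up toy corpus, only to show the mechanics.
corpus = [
    "de wasmachine heeft een vulgewicht van 8 kg",
    "de wasmachine is erg stil",
    "de koelkast heeft een vriesvak",
]
keywords = top_tfidf_words(corpus)
```

A word such as "de", which occurs in every sentence, gets a score of zero and is never selected, while words that occur in only one sentence rank highest.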
The words in the input have a major influence on the output. In this blog, we explain different input variations and indicate which ones work and which do not. The output is always a single sentence. Generating longer descriptions has not (yet) been successful. During training, multi-bleu was used as the evaluation metric; BLEU is often used in text generation tasks. We mainly evaluated the generated sentences for uniqueness. We found that many sentences were copied literally from the training set, which is of course not the intention, as unique sentences have to be formed. The uniqueness percentage is the percentage of generated sentences that were not literally copied from the training set.
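The uniqueness percentage is simple to compute. The sketch below is one possible way to do it; the normalization (lowercasing and collapsing whitespace) is an assumption on our side about what counts as "literally copied", and the example sentences are placeholders.

```python
def normalize(sentence):
    """Lowercase and collapse whitespace (an assumed notion of 'literal copy')."""
    return " ".join(sentence.lower().split())


def uniqueness_percentage(generated, training):
    """Percentage of generated sentences that do not occur in the training set."""
    if not generated:
        return 0.0
    seen = {normalize(s) for s in training}
    unique = sum(1 for s in generated if normalize(s) not in seen)
    return 100.0 * unique / len(generated)


# Placeholder sentences, only to show the calculation.
training_sentences = ["a b c", "d e f"]
generated_sentences = ["a b c", "x y z", "D E  F", "g h"]
score = uniqueness_percentage(generated_sentences, training_sentences)
```

Here two of the four generated sentences also occur in the training set after normalization, so the score is 50%.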
The experiment that scored highest on uniqueness used all data, with the product category together with the two TF-IDF words (i.e., the two most unique words of a sentence) as input. After training, the model has a uniqueness score of approximately 70%. The idea behind this method was that mainly product attributes would be extracted. This turned out not to be the case: product attributes were not seen as unique words. Since we are interested in translating attributes into a description, this configuration is not optimal. It does show, however, that unique sentences can be generated with this model.
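The input variants in this blog differ only in which words accompany the product category: the two TF-IDF words (all sentences) or the product attributes (only sentences that describe an attribute). The sketch below shows how such source/target pairs could be built. The dictionary keys, the `mode` names, and the space-separated source format are hypothetical choices for illustration, not a description of our exact preprocessing.

```python
def build_pairs(examples, mode):
    """Build (source, target) training pairs.

    examples: list of dicts with the (hypothetical) keys
              "category", "sentence", "attributes", "tfidf_words".
    mode "tfidf":      use every sentence; source = category + two TF-IDF words.
    mode "attributes": use only sentences with at least one attribute;
                       source = category + attributes.
    """
    pairs = []
    for ex in examples:
        if mode == "tfidf":
            terms = ex["tfidf_words"][:2]
        elif mode == "attributes":
            if not ex["attributes"]:
                continue  # this sentence does not describe an attribute
            terms = ex["attributes"]
        else:
            raise ValueError(f"unknown mode: {mode}")
        source = " ".join([ex["category"], *terms])
        pairs.append((source, ex["sentence"]))
    return pairs


# Made-up examples, only to show the two variants.
examples = [
    {
        "category": "wasmachine",
        "sentence": "de wasmachine heeft een vulgewicht van 8 kg",
        "attributes": ["vulgewicht 8 kg"],
        "tfidf_words": ["8", "kg"],
    },
    {
        "category": "wasmachine",
        "sentence": "de wasmachine is erg stil",
        "attributes": [],
        "tfidf_words": ["erg", "is"],
    },
]
tfidf_pairs = build_pairs(examples, "tfidf")
attribute_pairs = build_pairs(examples, "attributes")
```

The second example has no attribute, so it is only used in the TF-IDF variant. This is why the attribute-based variant has less training data available.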
The second experiment uses the product category together with the product attributes as input. With this experiment, we have less data at our disposal, as we can only include the sentences that describe an attribute. In about 10% of the cases, the model generated unique sentences. The model generated the following sentences (in Dutch):
In this blog, we discussed whether it is possible to generate product descriptions. This turns out to be possible in Dutch, but further research is needed to optimize the models. The disadvantage of the Dutch language is that little textual data is available, whereas machine learning models require a lot of it. We also often see that the input (i.e., the attributes) is used literally in the output sentence. This is not a problem for most attributes, but the two attributes “height adjustable” and “height 24 cm” both contain the word “height”, so they overlap and can be processed incorrectly. In 25% of the cases, the product category name (which is in the input) appears literally in the generated sentence. This is comparable to the sentences in the dataset.
To conclude, it is possible to generate Dutch product descriptions with a Transformer architecture. Although not all generated descriptions were unique, the model did generate unique descriptions. We are mainly interested in the translation of product attributes into a description. Here, around 10% of the descriptions were unique. Moreover, these descriptions captured the presented attributes and were written in proper Dutch.