How a Cutting-Edge Machine Learning Architecture Can Increase Conversions From Product Recommendations by 71%

Have you ever wondered how exactly ecommerce stores manage product recommendations? All those familiar “people also bought” blocks aren’t filled manually. They are created in real time by complex machine learning algorithms.

For years, marketers have relied on these algorithms to increase sales and improve the customer experience. And while they constantly evolve, one recent breakthrough in machine learning can achieve results that seemed impossible only a decade ago.

How traditional methods work

Let’s dive deeper into how the older generation of machine learning algorithms for product recommendations works.

Content-based recommendation systems

These systems suggest items similar to the ones the user interacted with before. They analyze the attributes of each item, such as size, color, brand, type, and category, and make suggestions based on the user’s previous actions. For example, if a person browsed five different pairs of sneakers, the system uses this data to recommend more sneaker options.

Another example would be a person who bought joggers, a tracksuit, and sneakers from the same brand. The system can’t rely on the category attribute, since all three items belong to different categories. But because they share a brand, the likely recommendation will be a trucker hat from that manufacturer.
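To make this concrete, here is a minimal content-based sketch in Python. The items, attribute vectors, and cosine-similarity scoring are illustrative assumptions, not a description of any specific store’s system:

```python
import numpy as np

# Toy catalog: each item described by a one-hot attribute vector
# (hypothetical attributes: [sneaker, jogger, hat, brand_A, brand_B]).
catalog = {
    "sneaker_a": np.array([1, 0, 0, 1, 0]),
    "sneaker_b": np.array([1, 0, 0, 0, 1]),
    "jogger_a":  np.array([0, 1, 0, 1, 0]),
    "hat_a":     np.array([0, 0, 1, 1, 0]),
}

def cosine(u, v):
    return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))

# The user profile is the average of the attribute vectors of viewed items.
viewed = ["sneaker_a", "jogger_a"]
profile = np.mean([catalog[i] for i in viewed], axis=0)

# Recommend the unseen item whose attributes are closest to the profile.
scores = {i: cosine(profile, v) for i, v in catalog.items() if i not in viewed}
print(max(scores, key=scores.get))  # -> "hat_a": same brand as the viewed items
```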

Content-based systems are rather basic compared to other approaches, as they can’t recommend anything outside the user’s current interests. However, they have their uses.

Pros:

- They don’t need data about other users, so they work even with a small customer base.
- New items can be recommended as soon as their attributes are known.

Cons:

- Recommendations stay within the user’s existing interests, leaving little room for discovery.
- Quality depends heavily on how well item attributes are described.

Collaborative filtering recommendation systems

Another popular approach to creating recommendations is collaborative filtering. This method suggests relevant items based on the preferences of other users. There are two main branches of collaborative filtering: user-based and item-based.

User-based collaborative filtering models find users that are similar to our target customer and look at their preferences and purchase history. The system then recommends items those similar users preferred.

Item-based collaborative filtering shifts the focus to items instead of users. It examines huge volumes of interaction data to find connections between items, based on what other users viewed or purchased together before. Using these connections, it then recommends similar items to our target user.

Item-based filtering is somewhat close to content-based recommendations, as both approaches look for similarities between products. However, content-based filtering relies strictly on the internal attributes of each item, while item-based collaborative filtering also looks at how users interact with items.
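As a rough illustration, here is a tiny item-based filtering sketch in Python. The interaction matrix is made-up data, and the cosine scoring over matrix columns is a simplification of what production systems do:

```python
import numpy as np

# Toy interaction matrix: rows = users, columns = items,
# 1 = viewed/purchased (hypothetical data for illustration).
interactions = np.array([
    [1, 1, 0, 0],  # user 0
    [1, 1, 1, 0],  # user 1
    [0, 0, 1, 1],  # user 2
])

# Item-item similarity: cosine between the interaction columns.
norms = np.linalg.norm(interactions, axis=0)
sim = (interactions.T @ interactions) / np.outer(norms, norms)

# Score unseen items for user 0 by their similarity to the items they have.
user = interactions[0].astype(float)
scores = sim @ user
scores[user > 0] = -np.inf  # mask items the user already interacted with
print(np.argmax(scores))    # -> item 2, often bought together with items 0 and 1
```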

Collaborative filtering is a powerful method that is widely used today, and for good reason.

Pros:

- It can surface items outside the user’s current interests, enabling discovery.
- It needs no item attribute data; interactions alone are enough.

Cons:

- It struggles with new users and new items that have no interaction history yet (the cold-start problem).
- It requires a large volume of interaction data to work well.

Hybrid recommendation systems

Hybrid systems use multiple approaches at once to get better results while mitigating the weaknesses of each individual method. The most common hybrids combine content-based and collaborative filtering.

There are a few different ways to create hybrid systems. For example, a system can generate recommendations through different approaches and then merge them together. Another way is to switch methods based on the situation.
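Here is a minimal sketch of both ideas in Python. The function names, the 0.4 weight, and the fallback rule are purely illustrative assumptions:

```python
# Weighted hybrid: blend the scores of two recommenders into one ranking.
# `content_scores` and `collab_scores` are assumed to be dicts of item -> score
# produced by the two methods described above.
def hybrid_scores(content_scores, collab_scores, alpha=0.4):
    items = content_scores.keys() | collab_scores.keys()
    return {
        item: alpha * content_scores.get(item, 0.0)
        + (1 - alpha) * collab_scores.get(item, 0.0)
        for item in items
    }

# Switching hybrid: pick one method per situation, e.g. fall back to
# content-based scores for brand-new users with no interaction history.
def pick_scores(user_history, content_scores, collab_scores):
    return content_scores if not user_history else collab_scores

print(hybrid_scores({"hat_a": 0.6}, {"hat_a": 0.1, "tracksuit_b": 0.9}))
```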

Pros:

- They combine the strengths of several methods and offset their individual weaknesses.
- They are generally more accurate than any single method on its own.

Cons:

- They are more complex to build, tune, and maintain than a single-method system.

While all of these approaches can bring significant results, one method is unrivaled when it comes to accuracy.


Transformers: the new way of handling recommendations

Transformer models (which are entirely unrelated to the popular Hasbro action figures) have changed the world of AI. Transformers were introduced in 2017 by Ashish Vaswani and his team in their groundbreaking paper “Attention Is All You Need.”

Over the following years, transformers became a cornerstone of modern AI technology. In 2018, Google introduced the BERT (Bidirectional Encoder Representations from Transformers) model, which soon became part of Google’s search engine. Later, OpenAI presented its own take on transformer models with GPT-2.

How do transformers work?

Transformers excel at handling sequential data, like item views, clicks, or purchases. They can capture complex dependencies and relationships within these sequences, allowing the model to understand the context and importance of each interaction.

The whole process can be roughly outlined in five steps:

1. Input representation: user interactions and item features are converted into embeddings.

Embedding: data represented as a vector of numbers that captures information about products, customer interactions, and anything else needed to create a recommendation. Embeddings are somewhat similar to data arrays. However, data arrays contain information that humans can interpret (e.g., [shirt, M, white]), while the numbers in an embedding only make sense to machine learning systems (e.g., [0.12, -0.08, 0.45, ..., -0.34]).
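As a small illustration, here is how such a lookup might be written in PyTorch. The catalog size, embedding dimension, and item IDs are made up for the example:

```python
import torch
import torch.nn as nn

# Hypothetical catalog of 1,000 items; each ID maps to a learned 64-dim vector.
item_emb = nn.Embedding(num_embeddings=1000, embedding_dim=64)

# A user's interaction history as a sequence of item IDs (assumed IDs).
history = torch.tensor([[12, 857, 3, 412]])  # shape: (batch=1, sequence=4)
vectors = item_emb(history)                  # shape: (1, 4, 64)
```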

2. Positional embeddings: these are added to the embeddings to preserve the order of interactions. This way, the model knows not just which interactions happened but also the order in which they occurred.
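Continuing the sketch above (sizes and IDs are again assumptions), the position index changes the final vector of each interaction, so the same item looks different depending on where it appears in the history:

```python
import torch
import torch.nn as nn

d_model, max_len = 64, 50
item_emb = nn.Embedding(1000, d_model)
pos_emb = nn.Embedding(max_len, d_model)  # one learned vector per position

history = torch.tensor([[12, 857, 3, 412]])             # item IDs, oldest first
positions = torch.arange(history.size(1)).unsqueeze(0)  # [[0, 1, 2, 3]]

# Item identity plus position: an order-aware input for the transformer.
x = item_emb(history) + pos_emb(positions)              # shape: (1, 4, 64)
```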

3. Self-attention mechanism: this is a key part of how transformers work. It helps the model focus on the crucial parts of the sequence.

Self-attention: a technique used in machine learning models to help them understand the relationships between different parts of a sequence. It analyzes the whole input and determines which parts of it are more important. Let’s say we have a product: iPhone 15 Pink 256 GB. The model will assign more weight to “iPhone” and “15,” as these are more important for representing the item than “Pink” or “256 GB.”
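For the curious, here is a toy version of the attention computation in PyTorch. A real transformer applies learned query/key/value projections and uses multiple heads; this sketch deliberately omits both:

```python
import torch
import torch.nn.functional as F

def self_attention(x):
    # x: (sequence_length, d), one vector per interaction.
    scores = x @ x.T / x.size(-1) ** 0.5  # relevance of every item to every other
    weights = F.softmax(scores, dim=-1)   # attention weights sum to 1 per row
    return weights @ x                    # each output mixes in relevant context

x = torch.randn(4, 64)   # 4 interactions, 64-dim embeddings
out = self_attention(x)  # same shape, but now context-aware
```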

4. Transformer layers: item representations are refined by considering the entire sequence and context, effectively capturing dependencies between items that aren’t close to each other.
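In PyTorch, stacking such layers can be as simple as the following; the dimensions and layer count are illustrative, not a recipe:

```python
import torch
import torch.nn as nn

d_model = 64
layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=4, batch_first=True)
encoder = nn.TransformerEncoder(layer, num_layers=2)

# One user, a sequence of 10 interaction embeddings (random stand-ins here).
sequence = torch.randn(1, 10, d_model)
contextualized = encoder(sequence)  # same shape; every position now "sees"
                                    # every other interaction in the history
```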

5. Prediction: after processing the input through transformer layers, the model uses the improved understanding to predict the next item in the user’s interaction sequence or to score items for recommendation.

This prediction can be adapted for various tasks, such as ranking items or generating personalized recommendation lists.
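A minimal sketch of that scoring step, assuming the transformer’s output for the last position summarizes the user’s history (all sizes and values are placeholders):

```python
import torch
import torch.nn as nn

num_items, d_model = 1000, 64
item_emb = nn.Embedding(num_items, d_model)

# `hidden` stands in for the transformer's output at the last position:
# its summary of everything the user has done so far.
hidden = torch.randn(d_model)

# Score every catalog item against that summary, then take the top 5.
scores = item_emb.weight @ hidden            # shape: (num_items,)
top_items = torch.topk(scores, k=5).indices  # IDs of the recommended items
print(top_items.tolist())
```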

The ability to handle sequential data and capture intricate patterns in user behavior makes transformer-based models particularly effective for tasks that involve predicting the next interaction in a sequence.

The benefits of transformer-based recommendations

Transformer-based recommendations have a number of benefits, making them especially suitable for ecommerce businesses. Let’s examine them more closely. 

Transformers create more accurate predictions

The most obvious benefit of these recommendations is their accuracy: users see recommendations that are genuinely relevant to them. This leads to several positive outcomes, such as increased sales, higher click-through rates, and a better customer experience.

Transformers can view the entire purchase history as a sequence

A major advantage of using transformers for recommendations is their ability to view the user’s entire history as a sequence. Previous methods lack this ability and create recommendations based on each separate data point; merging those signals requires additional techniques that aren’t always accurate.

For example, say a customer purchased three different books about Ancient Rome: a novel set in the Roman Empire, a history book, and a biography of a Roman emperor, all by different authors.

Previous methods won’t see this sequence of topics and might suggest an unrelated book, like a biography of a medieval king or a history of Ancient Greece. The transformer model, however, sees the whole sequence and can accurately predict that the customer is interested in books specifically about Rome.

Transformers are better at handling incomplete or inconsistent data

This one is a huge boon for any company that doesn’t invest enough effort in cleaning and maintaining its data (e.g., complete and standardized product feeds).

Transformers can handle messy or incomplete data by using the self-attention mechanism mentioned earlier. This allows the model to look at all the parts of the data it has and figure out which pieces are the most important, even if some information is missing.

It’s like putting together a puzzle where some pieces are missing—the transformer can still see the big picture by focusing on the parts that matter.

Another way transformers do this is by understanding the context of the data. For example, if some details are missing in a product description, the model can use the other information it has to make a good guess about what’s missing. This makes transformers flexible and able to work with different types of data, even when it’s not complete or perfectly organized.

Although transformers are a potent solution, they have a number of inherent weaknesses compared to other approaches.

They are resource-intensive. Running a transformer-based recommendation setup requires significant computational power and memory. If you decide to build your own system, be prepared for significant investments in GPUs and other server infrastructure, or in cloud computing services.

They need lots of data. While transformers can work with inconsistent data, they still require large datasets to unlock their full potential. For smaller businesses that don’t have enough customer data, simpler methods may be more efficient.

They are difficult to develop. Creating a transformer-based solution requires specialized technical expertise, and as with any cutting-edge technology, there are nowhere near enough specialists who can work with transformers properly. For most companies, building an in-house solution simply isn’t feasible.


How transformer-based recommendations work in Yespo

As you’ve seen, transformers are indeed a powerful way to create product recommendations. Previously, we used hybrid systems based on collaborative filtering and other approaches.

However, here at Yespo, we saw the potential of transformers, and that’s why we decided to roll out our own solution using this architecture.

During May and June 2024, we tested the improved product recommendations for several clients of Yespo CDP. We then compared the results to those from January and February of the same year.

We identified three main metrics to evaluate: click-through rate (CTR), conversions, and order share.

Dnipro-M

Dnipro-M is a manufacturer and seller of construction tools. Its share of the Ukrainian market reaches 30%. The company is also represented in several Eastern European countries, including the Czech Republic, Poland, and Moldova.

During the test period, the following results were achieved:

CTR: +105%
Conversions: +43%
Order share: +116%

MasterZoo

MasterZoo is a leading pet store chain in Ukraine with over 200 physical locations and a large online store.

The following results were achieved with improved recommendations:

CTR: +46%
Conversions: +71%
Order share: +76%

Yakaboo

Yakaboo is one of the largest Ukrainian online bookstores. Founded in 2004, it offers books in 71 languages.

After the test, the results were as follows:

Conversions: +10%
Order share: +126%

Conclusion

Transformers are a powerful tool that, when implemented properly, can deliver terrific results for personalized product recommendations. Building such a system on your own is a daunting task, but fortunately, we at Yespo have already done all the heavy lifting for you.

If you’re interested in trying transformer-powered recommendations in your business, fill in the form below, and we’ll reach out to you to discuss the details!

Get professional expertise
