How algorithms decide what to recommend to us and learn what we like

We are constantly surrounded by digital suggestions: every time we open a streaming platform or visit an e-commerce site, some algorithm based on artificial intelligence is at work curating the selection in front of us, whether that is digital content, products to purchase, or something else. This system, technically called a recommendation engine, relies on big data analysis and complex machine learning algorithms designed to interpret our past behavior and anticipate our future desires. The purpose of these technologies is twofold. On the one hand, they help us as users find our way through endless catalogues, letting us discover films, songs or products that we would struggle to find on our own. On the other, they are essential for companies that want to keep our engagement high and stimulate sales. It is no coincidence that the market for these systems is now worth almost 7 billion dollars, a figure expected to triple within the next few years.

But how do algorithms decide what to recommend to us? In short, the process begins with the widespread collection of our data, both the data we provide voluntarily and the data inferred from our online activity. This information is stored in huge databases, analyzed to identify recurring patterns and finally filtered through one of three main approaches: collaborative filtering, which compares us to other similar users; content-based filtering, which analyzes the intrinsic characteristics of what we have already liked; and hybrid systems, which combine the two. While these algorithms dramatically improve our user experience, they bring significant challenges with them, from the need to protect our privacy and comply with regulations, to the risk of biases learned from the data itself, to the technical complexity of delivering suggestions in real time. Let's look at the topic more closely, bearing in mind that individual recommendation systems can differ considerably in their details and in the way they operate.

How recommendation algorithms work and what benefits they bring

To fully understand how recommendation algorithms decipher our tastes, we must look at the five operational phases that turn our interactions into accurate predictions. The first phase, data collection, is the main fuel of the entire process. Recommendation engines feed on two categories of traces that we leave online: explicit data, that is, our direct and conscious actions such as a "Like", a written review or a star rating, and implicit data, far more abundant and subtle, which includes our browsing history, clicks, past purchases or even the time we spend looking at a product. Demographic and psychographic data, such as our age or our lifestyle, are often added to these. Once collected, all this data passes to the second phase, storage, where it is kept in large-scale repositories known as data warehouses or data lakes. From there the data moves on to the third phase, analysis, where machine learning algorithms look for mathematical correlations in order to build predictive models.
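To make the distinction between explicit and implicit data more concrete, here is a minimal sketch of how the two kinds of traces might be merged into a single interest score per user and item. The event names, the weights and the `build_interactions` function are all hypothetical, chosen only for illustration; real platforms tune such values on their own data.

```python
from collections import defaultdict

# Hypothetical weights: how much each implicit event counts as evidence of interest.
IMPLICIT_WEIGHTS = {"view": 1.0, "click": 2.0, "purchase": 5.0}
# Hypothetical multiplier applied to an explicit 1-5 star rating.
STAR_WEIGHT = 2.0


def build_interactions(events, ratings):
    """Merge implicit events and explicit ratings into one score per (user, item)."""
    scores = defaultdict(float)
    # Implicit data: every observed event adds its weight (unknown events add nothing).
    for user, item, kind in events:
        scores[(user, item)] += IMPLICIT_WEIGHTS.get(kind, 0.0)
    # Explicit data: a star rating is a direct statement, so it is weighted more heavily.
    for user, item, stars in ratings:
        scores[(user, item)] += STAR_WEIGHT * stars
    return dict(scores)


events = [
    ("ana", "film_a", "view"),
    ("ana", "film_a", "click"),
    ("ben", "film_b", "purchase"),
]
ratings = [("ana", "film_a", 4)]

interactions = build_interactions(events, ratings)
print(interactions)
```

The resulting table of scores is the kind of input that the later phases (storage, analysis and filtering) would work on.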

The fourth phase, one of the most important, is filtering, which determines the logic behind each suggestion. In collaborative filtering, used massively by giants such as Amazon and Spotify, the system rests on the assumption that if we and another user have had similar preferences in the past, we are likely to keep having them: if we liked the same films as another user, the algorithm will also recommend the ones that they have seen and we have not. This method can be memory-based, calculating the proximity between users, or model-based, using deep-learning neural networks to fill the gaps in our preferences. The main limitation here is the so-called "cold start": if we are new users without any history, the system has nothing to compare and struggles to work out our tastes.
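The memory-based variant described above can be sketched in a few lines. This is only an illustrative toy, assuming a tiny dictionary of star ratings and cosine similarity as the measure of "proximity" between users; the user names, film names and the `recommend` function are invented for the example and do not reflect how any specific platform works.

```python
import math


def cosine(a, b):
    """Cosine similarity between two sparse rating dicts (item -> rating)."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    dot = sum(a[i] * b[i] for i in shared)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b)


def recommend(ratings, target, k=2):
    """Rank items the target has not rated, using the k most similar users."""
    neighbours = sorted(
        (
            (cosine(ratings[target], other), user)
            for user, other in ratings.items()
            if user != target
        ),
        reverse=True,
    )[:k]
    scores = {}
    for sim, user in neighbours:
        if sim <= 0.0:
            continue  # a user with nothing in common tells us nothing
        for item, rating in ratings[user].items():
            if item not in ratings[target]:
                # Each neighbour's rating counts in proportion to their similarity.
                scores[item] = scores.get(item, 0.0) + sim * rating
    return sorted(scores, key=scores.get, reverse=True)


ratings = {
    "ana": {"film_a": 5, "film_b": 4},
    "ben": {"film_a": 5, "film_b": 4, "film_c": 5},
    "cy": {"film_d": 5, "film_b": 1},
    "dana": {},  # a brand-new user with no history
}

print(recommend(ratings, "ana"))   # film_c ranks first: ben's tastes match ana's closely
print(recommend(ratings, "dana"))  # empty list: the cold start problem in action
```

Note how the new user `dana` receives no suggestions at all: with no history there is no similarity to compute, which is exactly the cold start limitation.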

The alternative is content-based filtering, which, instead of observing other users, focuses on the characteristics of the items we liked. If we have listened to a song with certain tags, genre and rhythm, the algorithm represents songs (and our tastes) as vectors in a vector space and offers us other songs that lie "close" to the ones we already know. Because it needs no data from other users, this approach greatly eases the "cold start" problem just mentioned, but it risks locking us into a bubble where we are always offered things too similar to those we already know, limiting the discovery of anything new.
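The idea of songs that are "close" in a vector space can be illustrated with a small sketch. It assumes a hypothetical catalogue in which every song is described by three made-up features (how "rock" it is, how "electronic" it is, and its tempo, each scaled between 0 and 1) and uses cosine similarity as the notion of closeness; real systems use far richer descriptions.

```python
import math

# Hypothetical catalogue: song -> [rock, electronic, tempo], each between 0 and 1.
SONGS = {
    "song_a": [0.9, 0.1, 0.8],
    "song_b": [0.8, 0.2, 0.7],
    "song_c": [0.1, 0.9, 0.6],
    "song_d": [0.0, 0.2, 0.1],
}


def cosine(a, b):
    """Cosine similarity between two dense feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def user_profile(liked):
    """Describe a user's taste as the average vector of the songs they liked."""
    vectors = [SONGS[song] for song in liked]
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def recommend(liked, n=2):
    """Return the n unheard songs whose vectors are closest to the user's profile."""
    profile = user_profile(liked)
    candidates = [song for song in SONGS if song not in liked]
    candidates.sort(key=lambda song: cosine(profile, SONGS[song]), reverse=True)
    return candidates[:n]


print(recommend(["song_a"]))  # song_b comes first: it is almost identical to song_a
```

A single liked song is enough to produce a ranking, which is why this method copes well with new users. The same example also shows the bubble effect: the top suggestion is simply the song most similar to the one already known.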

To overcome the shortcomings of both filtering methods, platforms such as Netflix adopt hybrid systems, which are very powerful but computationally expensive. The benefits for our experience are tangible: we save time by avoiding endless scrolling and we discover relevant content, so much so that 80% of views on Netflix come precisely from these suggestions.
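One simple way to picture a hybrid system is as a weighted blend of the scores produced by the two methods. The sketch below assumes that each method has already returned a score between 0 and 1 for some items; the `alpha` weight, the function names and the sample scores are hypothetical, and this blending strategy is only one of several possible ways to combine recommenders, not a description of what Netflix actually does.

```python
def hybrid_scores(collab, content, alpha=0.5):
    """Blend two score dicts: alpha weights collaborative, (1 - alpha) content-based."""
    items = set(collab) | set(content)
    return {
        item: alpha * collab.get(item, 0.0) + (1 - alpha) * content.get(item, 0.0)
        for item in items
    }


def rank(scores):
    """Order items from the highest blended score to the lowest."""
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical scores already produced by the two filtering methods.
collab = {"film_a": 0.9, "film_b": 0.2}
content = {"film_b": 0.8, "film_c": 0.6}

print(rank(hybrid_scores(collab, content)))             # equal weight to both methods
print(rank(hybrid_scores(collab, content, alpha=0.0)))  # new user: content-based only
```

Setting `alpha` close to 0 for new users and raising it as their history grows is one plausible way to soften the cold start while still benefiting from what similar users enjoyed.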

Critical issues related to recommendation systems

Regardless of the recommendation system in use, some critical issues are inherent in this technology. Beyond the complexity of serving millions of recommendations simultaneously, there is the risk that algorithms learn and amplify social biases present in the training data and therefore produce skewed recommendations. Then there is the delicate matter of privacy, tied to the massive collection of our personal information. The critical issues of these systems are such a vast topic that they deserve a dedicated study of their own.