Cláudio de Sá used rankings to predict people's preferences. He adapted ‘classical’ machine learning approaches to make them suitable for this task. His work can be applied to predicting election results. PhD defence on 16 December.
Preferences as ranking
Preferences can be seen as rankings: the things we prefer end up high on the list. ‘Preferences are complex structures for artificial intelligence programmes,’ De Sá says. Traditional machine learning methods are not suitable for Label Ranking, De Sá's field of interest. He therefore adapted several methods so that they can deal with the Label Ranking problem.
One of these methods is the ‘association rules’ approach, which looks for relations between items. An example is market basket analysis, where stores try to find patterns in shopping behaviour: a person who buys bread often buys milk too. In his research, however, De Sá is interested in Label Ranking, where a ranking is made for different types of the same product. ‘Take for instance the “sushi data set”: a group of Japanese people, with information about the date of birth, sex, residence and ten types of sushi ranked according to their preference. We successfully manipulated association rule methods to predict people’s sushi preference based on things such as age and sex.’
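The bread-and-milk pattern can be made concrete with a minimal sketch of the two standard measures behind association rules, support and confidence. The baskets below are invented for illustration, and the function names are hypothetical; this is classical market basket analysis, not De Sá's adapted method for rankings.

```python
def count(baskets, items):
    """Number of baskets that contain every item in `items`."""
    items = set(items)
    return sum(1 for basket in baskets if items <= basket)


def support(baskets, items):
    """Fraction of all baskets that contain every item in `items`."""
    return count(baskets, items) / len(baskets)


def confidence(baskets, antecedent, consequent):
    """Of the baskets containing `antecedent`, the fraction that also contain `consequent`."""
    both = set(antecedent) | set(consequent)
    return count(baskets, both) / count(baskets, antecedent)


# Invented toy data: each set is one customer's basket.
baskets = [
    {"bread", "milk"},
    {"bread", "milk", "eggs"},
    {"bread", "butter"},
    {"milk", "eggs"},
    {"bread", "milk", "butter"},
]

rule_support = support(baskets, {"bread", "milk"})          # 3 of 5 baskets
rule_confidence = confidence(baskets, {"bread"}, {"milk"})  # 3 of the 4 bread baskets
```

A rule such as ‘bread, therefore milk’ is usually kept only if both values exceed thresholds chosen by the analyst.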
Trees in the forest
De Sá also made use of a famous approach in machine learning: the random forest. The random forest consists of a large number of decision trees. ‘Each decision tree gets a different bit of information, altogether determining the decision. Again, we adjusted this approach for rankings. For instance, the ranking “1,2,3,4,5” is very different from the ranking “5,4,3,2,1”, but very similar to the ranking “1,2,3,5,4”. In this way, we can teach the decision tree to group similar rankings,’ says De Sá. This enables the system to predict preferences.
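One common way to quantify how similar two rankings are is the Kendall tau distance, which counts the pairs of items that the two rankings put in opposite order. The sketch below applies it to the three rankings from the quote; the article does not say which measure De Sá used, so the choice of Kendall tau here is an assumption.

```python
from itertools import combinations


def kendall_tau_distance(a, b):
    """Count the item pairs that rankings `a` and `b` order differently."""
    pos_a = {item: i for i, item in enumerate(a)}
    pos_b = {item: i for i, item in enumerate(b)}
    return sum(
        1
        for x, y in combinations(a, 2)
        # A negative product means the pair is ordered oppositely in a and b.
        if (pos_a[x] - pos_a[y]) * (pos_b[x] - pos_b[y]) < 0
    )


base = [1, 2, 3, 4, 5]
reverse = [5, 4, 3, 2, 1]
near = [1, 2, 3, 5, 4]

far_apart = kendall_tau_distance(base, reverse)  # all 10 pairs disagree
close = kendall_tau_distance(base, near)         # only the pair (4, 5) disagrees
```

With five items there are ten pairs, so the reversed ranking is as far away as possible, while swapping the last two items changes only one pair.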
Finding the deviation
In the ‘subgroup discovery’ approach, researchers try to find a subgroup that deviates from the norm. ‘For example, if I looked at the rent per square metre in Leiden, I would expect it to be higher in the city centre. But maybe there is a small area in the city centre where the price per square metre is actually much lower than expected.’ De Sá adjusted this approach to use it for rankings. ‘When we looked at the sushi data set with this approach, we found that a specific group had an unusual preference for a specific type of sushi, namely the sea urchin. This sushi type was normally ranked ninth or tenth, but was especially liked by this group of males older than thirty years from a specific region of Japan.’
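The idea of a deviating subgroup can be illustrated with a small sketch that compares the average rank one item receives inside a subgroup with its average rank across everyone. The records are invented and do not come from the sushi data set, and the helper names are hypothetical. A real subgroup discovery method would search over many candidate conditions; this sketch only scores a single one.

```python
from statistics import mean

# Invented toy records: (age, sex, rank given to one item; 1 = most preferred).
people = [
    (25, "F", 9), (41, "M", 2), (33, "F", 10), (52, "M", 1),
    (19, "M", 9), (38, "M", 3), (61, "F", 10), (27, "M", 10),
]


def mean_rank(records):
    """Average rank the item receives in the given records."""
    return mean(rank for _, _, rank in records)


def deviation(records, condition):
    """Return (subgroup mean rank minus overall mean rank, subgroup size)."""
    subgroup = [r for r in records if condition(r)]
    return mean_rank(subgroup) - mean_rank(records), len(subgroup)


# Candidate subgroup: men older than thirty.
gap, size = deviation(people, lambda r: r[1] == "M" and r[0] > 30)
```

A strongly negative gap means the subgroup ranks the item much higher than the population as a whole, which is the kind of deviation the approach looks for.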
Finding the right data
De Sá points out that the hardest part is obtaining the right data. ‘My research shows that these different approaches work well and can predict certain preferences. However, the most difficult thing is to get our hands on real datasets, like data about elections.’