Arno Knobbe is a senior researcher at LIACS and head of the Data Mining group. His current research revolves around the following topics:
Subgroup Discovery
Subgroup Discovery (SD) is the art of finding specific areas in the data that show a significant difference in behaviour compared to the overall dataset, and that are easily described in terms of the available attributes. By ‘behaviour’, we can mean many things, depending on the specific application. In the simplest setting, one designates one of the attributes as the target and then tries to identify parts of the data, so-called subgroups, where the average value of the target is notably different from that of the entire population. Over the past few years, I have been pioneering a variation of Subgroup Discovery that considers multiple target attributes. Here, a subgroup is deemed interesting if a model fitted to its target attributes differs, in an application-dependent way, from the same model fitted to the targets of the entire dataset. This approach, called Exceptional Model Mining (EMM), comes in many flavours, depending on the type of the target attributes and the nature of the models fitted to the data. See this paper for an extensive survey of the work on EMM. My efforts on SD are bundled in the Cortana and Safarii packages.
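The simplest setting described above can be sketched in a few lines of Python. This is only an illustrative sketch and not the Cortana or Safarii implementation: the toy records, the attribute names, the function `find_subgroups`, and the quality measure (square root of the subgroup's coverage times the absolute difference in mean) are all assumptions made for the example.

```python
from statistics import mean

# Hypothetical toy records; attribute names and values are invented.
ROWS = [
    {"region": "north", "smoker": "yes", "cost": 9.0},
    {"region": "north", "smoker": "no", "cost": 4.0},
    {"region": "south", "smoker": "yes", "cost": 10.0},
    {"region": "south", "smoker": "no", "cost": 5.0},
    {"region": "north", "smoker": "no", "cost": 3.0},
    {"region": "south", "smoker": "no", "cost": 5.0},
]


def find_subgroups(rows, target, min_size=2):
    """Rank single-condition subgroups (attribute = value) by how far
    their mean target value lies from the mean of the whole dataset."""
    overall = mean(r[target] for r in rows)
    descriptors = [a for a in rows[0] if a != target]
    results = []
    for attr in descriptors:
        for value in sorted({r[attr] for r in rows}):
            members = [r[target] for r in rows if r[attr] == value]
            if len(members) < min_size:
                continue  # skip subgroups too small to be meaningful
            sub_mean = mean(members)
            # Assumed quality measure: trade off subgroup size
            # against the deviation of its mean.
            coverage = len(members) / len(rows)
            quality = coverage ** 0.5 * abs(sub_mean - overall)
            results.append((quality, f"{attr} = {value}", len(members), sub_mean))
    results.sort(key=lambda item: item[0], reverse=True)
    return overall, results


overall, ranked = find_subgroups(ROWS, "cost")
for quality, description, size, sub_mean in ranked:
    print(f"{description:15s} size={size} mean={sub_mean:.2f} quality={quality:.2f}")
```

Each subgroup here is described by a single condition on one attribute, which keeps the descriptions easy to read. Practical tools typically search over conjunctions of conditions and offer a choice of quality measures.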
Sensor Data Analysis
My second line of research deals with time series and the modelling of complex, dynamic systems, specifically where such time series are produced by sensor systems, which are capable of producing data at an unprecedented scale. One of the key applications in this area has been the InfraWatch project, of which I have been the project manager over the last five years. The project revolved around a highway bridge on the A6 between Amsterdam and Almere, which was fitted with 145 sensors of various types that have been producing data ever since their installation in 2008. The sensor data we model tends to combine continuous elements (e.g. how the apparent strain on a bridge depends on the outside temperature and the amount of sunlight absorbed) and discrete elements (e.g. a heavy truck crossing the bridge, or traffic jams). Additionally, one tends to recognize multiple intertwined effects in the data, delays and integration over time, and effects at different time scales (ranging from seconds to years). My group has been pioneering the use of Minimum Description Length techniques to model complex sensor data, for example to separate sensor data that operates at different time scales into temporal components.
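To make the idea of temporal components concrete, the sketch below splits a synthetic signal into a slow and a fast component. Note that it uses a plain centered moving average as a stand-in, not the Minimum Description Length approach mentioned above; the signal, the window size, and the function names `moving_average` and `split_time_scales` are all invented for illustration.

```python
import math


def moving_average(signal, window):
    """Centered moving average; the window shrinks near the edges."""
    half = window // 2
    smoothed = []
    for i in range(len(signal)):
        lo, hi = max(0, i - half), min(len(signal), i + half + 1)
        smoothed.append(sum(signal[lo:hi]) / (hi - lo))
    return smoothed


def split_time_scales(signal, window):
    """Split a signal into a slow component (the smoothed signal)
    and a fast component (whatever the smoothing leaves behind)."""
    slow = moving_average(signal, window)
    fast = [s - m for s, m in zip(signal, slow)]
    return slow, fast


# Synthetic strain-like signal (invented): a slow sinusoidal drift,
# loosely mimicking a temperature effect, plus one short burst,
# loosely mimicking a heavy vehicle crossing.
n = 200
drift = [5.0 * math.sin(2 * math.pi * t / n) for t in range(n)]
signal = list(drift)
for t in range(120, 123):
    signal[t] += 3.0

slow, fast = split_time_scales(signal, window=21)
peak = max(range(n), key=lambda t: fast[t])
print("largest fast-scale deviation at sample", peak)
```

With these settings the slow component follows the drift, while the short burst should stand out in the fast component. A fixed window is a crude choice: it has to be set by hand, which is one of the things a principled model selection criterion such as MDL is meant to avoid.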