Posters
Best Poster Award Winners
Gemma Canet Tarres
Thinking Outside the BBox: Unconstrained Generative Object Compositing
Florin Cuconasu
The Power of Noise: Redefining Retrieval for RAG Systems
Alessio Borgi
A Multi-Reference Style and Multi-Modal Context-Awareness Zero-Shot Style Alignment in Image Generation
Tuesday, September 10th
Assimilation of Diurnal Satellite Retrievals of Sea Surface Temperature for Ocean Reanalysis with Convolutional Neural Network
Matteo Broccoli
A variety of Sea Surface Temperature (SST) datasets exists, each with nearly unique characteristics that differ from the SST variable of ocean general circulation models. Optimally assimilating such datasets therefore requires a mapping to the first model level. However, this projection is non-trivial and depends on the specific characteristics of the dataset. In this work, we consider different ML models to construct the projection operator, i.e., U-Net, pix2pix, and random forest, trained on satellite subskin SST to reproduce the ESACCI SST. Employing pix2pix in global-ocean reanalysis-like experiments improves the assimilation of SST by up to 10% in RMSE with respect to direct assimilation. This approach allows different satellite products to be assimilated by only re-training the network.
The Power of Noise: Redefining Retrieval for RAG Systems
Florin Cuconasu
Retrieval-Augmented Generation (RAG) enhances Large Language Models (LLMs) by using an Information Retrieval (IR) system to supplement prompts with relevant documents. This approach is fundamental in fields where knowledge updates are frequent and cannot be fully memorized by LLMs. Our study provides the first detailed analysis of RAG's retrieval strategies, focusing on the types of passages IR should select. We assess various factors like relevance, positioning, and quantity of passages. Surprisingly, we found that high-scoring but irrelevant documents reduce LLM effectiveness, whereas including random documents can improve accuracy by up to 35%. These findings underscore the importance of refining retrieval methods in LLM integration and pave the way for further research.
Meta-Reinforcement Learning in Game Theoretical Scenarios
Imre Gergely Mali
Interactions between agents in multi-agent scenarios can be modeled as games, but in reality the actual parameters of the game being played are quite often unknown or change over time. It is crucial for agents to quickly identify the scenario they find themselves in and adapt to it. Meta-reinforcement learning has been successfully applied to rapidly adapt to previously unknown problems such as multi-armed bandits, MDPs, visual navigation tasks, etc. We intend to do the same with games. Further, we distinguish and train on different classes of games such as zero-sum, cooperative, bargaining, coordination, and auction games in order to find algorithms that are quick to adapt, data-efficient, strategically diverse, and resilient to changes.
Multi-property Steering of Large Language Models with Dynamic Activation Composition
Daniel Scalena
Activation steering methods have been shown to be effective in conditioning language model generation by additively intervening on models' intermediate representations. However, the evaluation of these techniques has so far been limited to single conditioning properties and synthetic settings. In this work, we conduct a comprehensive evaluation of various activation steering strategies, highlighting the property-dependent nature of the optimal parameters needed to ensure a robust effect throughout generation. To address this issue, we propose Dynamic Activation Composition, an information-theoretic approach to modulate the steering intensity of one or more properties throughout generation.
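As a rough illustration of the additive intervention such methods build on (a minimal sketch, not the authors' implementation; the steering vector, layer, and intensity are placeholder assumptions):

```python
import torch

def steer_hidden(hidden, steering_vec, alpha):
    """Additively shift an intermediate representation with a scaled steering vector.

    hidden:       (batch, seq, d_model) hidden states of a transformer layer
    steering_vec: (d_model,) direction encoding the target property
    alpha:        scalar intensity; Dynamic Activation Composition would modulate
                  this value per property and per generation step (placeholder here)
    """
    return hidden + alpha * steering_vec

# Toy usage with random tensors (illustrative only)
h = torch.randn(1, 5, 768)
v = torch.randn(768)
h_steered = steer_hidden(h, v, alpha=4.0)
```

In practice a hook of this kind would be attached to one or more transformer layers; the contribution described above lies in choosing the intensity per property and per generation step.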
Deep Spatial Context: when attention-based models meet spatial regression
Paulina Tomaszewska
We propose the ‘Deep spatial context’ (DSCon) method, which serves to investigate attention-based MIL (Multiple Instance Learning) vision models using the concept of spatial context. DSCon provides a quantitative measure of the spatial context's role through three Spatial Context Measures, SCM_features, SCM_targets, and SCM_residuals, which distinguish whether the spatial context is observable within the features of neighboring regions, their target values (attention scores), or the residuals, respectively. This is achieved by integrating spatial regression into the pipeline. DSCon helps to verify research questions. For instance, in the histopathological use case, it was observed that spatial relationships are much stronger in the classification of tumor lesions than of normal tissue.
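As a loose illustration of the spatial-regression idea (a simplified sketch; the neighborhood definition, regression model, and score are assumptions, not the exact SCM definitions):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.neighbors import NearestNeighbors

def spatial_context_score(coords, values, k=8):
    """Crude spatial-context indicator: R^2 of regressing each patch value
    on the mean value of its k nearest spatial neighbours."""
    nn = NearestNeighbors(n_neighbors=k + 1).fit(coords)
    _, idx = nn.kneighbors(coords)                 # idx[:, 0] is the patch itself
    neigh_mean = values[idx[:, 1:]].mean(axis=1)
    reg = LinearRegression().fit(neigh_mean.reshape(-1, 1), values)
    return reg.score(neigh_mean.reshape(-1, 1), values)

# Toy usage: random patch coordinates and attention scores (illustrative only)
coords = np.random.rand(200, 2)
attention = np.random.rand(200)
print(spatial_context_score(coords, attention))
```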
Learning What to Monitor: Using Machine Learning to Improve past STL Monitoring
Nicola Saccomanno
Monitoring is a runtime verification technique that can be used to check whether an execution of a system (trace) satisfies a given set of properties. First, we introduce the pure past fragment of Signal Temporal Logic and use it to define the monitorable safety and cosafety fragments of STL. Then, we devise a multi-objective genetic programming algorithm to automatically extend the set of properties to monitor on the basis of the history of failure traces collected over time. The framework resulting from the integration of the monitor and the learning algorithm is experimentally validated on various public datasets. The outcomes of the experimentation confirm the effectiveness of the proposed solution (work presented at IJCAI 2024).
Domain Randomization for Robust, Affordable and Effective Closed-loop Control of Soft Robots
Andrea Protopapa
Reinforcement Learning for soft robots requires a massive amount of experience. Training from scratch in simulation is impractical, given the huge complexity of accurate models (days or even months for a single training session) and the reality gap problem. We investigate whether Domain Randomization (DR) can address these two challenges: using DR properly, we can train on largely simplified models and transfer without loss to accurate ones or to the real world (8x training time reduction); our studies show that DR can make policies robust to domain gaps through our novel adaptive DR method (RF-DROPO), tailored for partially observable systems like soft robots. We provide results on four different tasks and two soft robot designs, opening interesting perspectives on RL for closed-loop soft robot control.
Privacy-preserving datasets by capturing feature distributions with Conditional VAEs
Francesco Di Salvo
Large and well-annotated datasets are essential for advancing deep learning applications, yet they are often costly or impossible for a single entity to obtain. In many areas, including the medical domain, approaches relying on data sharing have become critical to address those challenges. This work introduces a novel approach using Conditional Variational Autoencoders (CVAEs) trained on feature vectors extracted from large pre-trained vision foundation models. Foundation models effectively detect and represent complex patterns across diverse domains, allowing the CVAE to faithfully capture the embedding space of a given data distribution and to generate (sample) a diverse, privacy-respecting, and potentially unbounded set of synthetic feature vectors.
Discovering interpretable physical models using Symbolic Regression and Discrete Exterior Calculus
Simone Manti
Machine Learning has significantly improved mathematical modeling and numerical simulations for Physics. However, the most common data-driven models require large datasets and produce complex, black-box models that are hard to interpret. Symbolic Regression (SR) has been proposed to discover equation-based models in a small-data regime. Still, most of the past SR works have focused mainly on classical Physics equations governed by algebraic or ordinary differential equations. To further advance the state-of-the-art, we propose a combination of SR and Discrete Exterior Calculus to discover new field theories from limited data. We validate our approach by re-deriving three Continuum Physics models: the Poisson, Euler's Elastica, and the Linear Elasticity equations.
Deep learning‐based optimization of field geometry for total marrow irradiation
Nicola Lambri
Total marrow irradiation requires ten radiation fields along the patient's body and, for large anatomies, two specific fields on the arms. The field geometry is designed by specialized medical physicists (MPs). We developed convolutional neural networks (CNNs) to automate this process using a dataset of 117 patients. The CNNs' input was a projected frontal view of the patient's CT. Two CNNs were trained to predict geometries with (CNN-1) and without (CNN-2) fields on the arms. Local optimization methods refined the models' output. Evaluated on 15 test patients, CNN-1 and CNN-2 achieved RMSEs of 13±3 mm and 18±4 mm, respectively. No significant differences from manual designs were observed after blind assessments by three MPs. The CNNs have been clinically implemented for prospective patients.
FFT-based Selection and Optimization of Statistics for Robust Recognition of Severely Corrupted Images
Elena Camuffo
Improving model robustness to corrupted images is among the key challenges in enabling robust vision systems on smart devices, such as robotic agents. In particular, robust test-time performance is imperative for most applications. This paper presents a novel approach to improve the robustness of any classification model, especially on severely corrupted images. Our method (FROST) employs high-frequency features to detect the input image's corruption type and selects layer-wise feature normalization statistics accordingly. FROST provides state-of-the-art results for different models and datasets, outperforming competitors on ImageNet-C by up to 37.1% relative gain and improving on a baseline of 40.9% mCE on severe corruptions.
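The kind of high-frequency statistic that can act as a corruption fingerprint can be sketched as follows (an illustrative simplification; the radius threshold and feature definition are assumptions, not FROST's actual features):

```python
import numpy as np

def high_freq_energy_ratio(img, radius_frac=0.25):
    """Fraction of spectral energy outside a central low-frequency disc.
    img: 2D greyscale array; different corruption types tend to leave
    different signatures in frequency statistics of this kind."""
    spec = np.fft.fftshift(np.fft.fft2(img))
    power = np.abs(spec) ** 2
    h, w = img.shape
    yy, xx = np.ogrid[:h, :w]
    dist = np.hypot(yy - h / 2, xx - w / 2)
    low = power[dist <= radius_frac * min(h, w) / 2].sum()
    return 1.0 - low / power.sum()

# Toy usage on a random image (illustrative only)
print(high_freq_energy_ratio(np.random.rand(224, 224)))
```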
Wednesday, September 11th
Graph Neural Networks Applied to Electroencephalography (EEG) Data for Dementia Classification
Thomas Barbera
Electroencephalography (EEG) is an easy and non-invasive procedure used to measure electrical potentials generated by brain activity by placing electrodes on the scalp. Recent developments in ML allow these signals to be analyzed without the need for a physician, enabling the development of smart healthcare applications. Notably, EEG recordings can reveal a slowdown in brain rhythm and a reduction in signal complexity, often symptoms of cognitive decline. In this work, we propose a novel approach employing a lightweight Graph Neural Network to classify dementia stages from EEG recordings by exploiting slow brain rhythms. Experimental results demonstrate the effectiveness of our model, achieving competitive performance compared to existing approaches while being considerably lighter.
The Importance of Integral Time Length Windows for the Classification of Activities of Daily Living Based on Machine Learning Techniques
Ainhoa Ruiz Vitte
Pathological tremor, common in essential tremor (ET) and Parkinson’s disease (PD) patients, impacts their quality of life. This study proposes a method to classify daily activities using a single wrist-worn IMU. The dataset consists of IMU recordings from the dominant arm during 11 tasks performed by ET and PD patients. Features were extracted from different window sizes to train Random Forest (RF) and Support Vector Machine (SVM) models. Results show that larger windows, particularly 10 seconds, provided the highest average F1-score, though some activities were better classified with shorter windows. This method improves classification outcomes and suggests combining window lengths for further accuracy.
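A minimal sketch of the windowing-plus-classification setup described above (sampling rate, window length, features, and labels are placeholder assumptions):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

FS = 100  # assumed IMU sampling rate in Hz

def window_features(signal, win_s, fs=FS):
    """Split a (n_samples, n_channels) IMU recording into non-overlapping
    windows of win_s seconds and compute simple per-channel statistics."""
    step = int(win_s * fs)
    feats = []
    for start in range(0, len(signal) - step + 1, step):
        w = signal[start:start + step]
        feats.append(np.concatenate([w.mean(0), w.std(0),
                                     np.abs(np.diff(w, axis=0)).mean(0)]))
    return np.array(feats)

# Toy usage: 60 s of 6-channel IMU data, 10 s windows, random task labels (illustrative)
X = window_features(np.random.randn(60 * FS, 6), win_s=10)
y = np.random.randint(0, 11, len(X))
clf = RandomForestClassifier(n_estimators=100).fit(X, y)
```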
Predicting the conformational flexibility of antibody CDRs
Fabian Spoendlin
Proteins are highly flexible macromolecules and the ability to adapt their shape is fundamental to many functional properties. While a single, 'static' protein structure can be predicted at high accuracy, current methods are severely limited at predicting structural flexibility. A major factor limiting such predictions is the scarcity of suitable training data. Here, we focus on the functionally important antibody CDRs and related loop motifs. We implement a strategy to create a large dataset of evidence for conformational flexibility and develop AbFlex, a method able to predict CDR flexibility with high accuracy.
Causal Concept Embedding Models: Beyond Causal Opacity in Deep Learning
Gabriele Dominici
Causal opacity refers to the challenge of understanding the hidden causal structures in deep neural networks (DNNs), making it difficult to trust and verify these systems in high-stakes scenarios. This work addresses the issue by introducing Causal Concept Embedding Models (Causal CEMs), a new class of interpretable models designed for causal transparency. Our experiments show that Causal CEMs can match the performance of causally opaque models while offering enhanced causal interpretability. They support interventional and counterfactual analyses, improve reliability and fairness verification, and allow human-in-the-loop corrections, boosting both accuracy and explanation quality.
Sparks of Superhuman Persuasion: Large Language Models Beat Humans in Online Debates
Francesco Salvi
Can LLMs craft convincing arguments to change minds on polarizing political issues? In a pre-registered study, we examined AI-driven persuasion in a controlled setting, using a web platform where participants engaged in debates with either human or LLM opponents. Participants were randomly assigned to conditions varying whether the opponent was human or LLM, and whether the opponent had access to their sociodemographic information. Participants debating GPT-4 with access to personal information had 81.2% higher odds of increased agreement with their opponent (p < 0.01; N=900) compared to those debating humans. Our findings indicate that LLM-based persuasion has meaningful implications for social media governance and online environments.
Thinking Outside the BBox: Unconstrained Generative Object Compositing
Gemma Canet Tarres
Recent generative image compositing methods face limitations due to their reliance on masking the original object during training, which constrains their generation to the input mask. Furthermore, obtaining an accurate input mask specifying the location and scale of the object in a new image can be highly challenging. To overcome such limitations, we define a novel problem of unconstrained generative object compositing, i.e., the generation is not bounded by the mask, and train a diffusion-based model on a synthesized paired dataset. Our model is able to generate object effects such as shadows and reflections that go beyond the mask, enhancing image realism. Additionally, if an empty mask is provided, our model automatically places the object in diverse natural locations and scales.
Distinguishing Drivers via Wearable Sensor Data and Machine Learning
Natalia Piaseczna
The study focuses on using machine learning analysis of data collected from wearable sensors to identify patterns that differentiate skilled drivers from inexperienced ones. On a predetermined driving route, participants experienced a variety of driving conditions, including parking, navigating cities, driving on highways and driving through residential neighborhoods. The results highlight important differences in sensor data between inexperienced and experienced drivers, providing insight into possible uses for improving safety protocols, driver education, and personalized feedback systems.
State-of-the-Art Fails in the Art of Damage Detection
Daniela Ivanova
Accurately detecting and classifying damage in analogue media such as paintings, photographs, textiles, mosaics, and frescoes is essential for cultural heritage preservation. While machine learning models excel in correcting global degradation if the damage operator is known a priori, we show that they fail to predict where the damage is even after supervised training; thus, reliable damage detection remains a challenge. We introduce DamBench, a dataset for damage detection in diverse analogue media, with over 11,000 annotations covering 15 damage types across various subjects and media. We evaluate CNN, Transformer, and text-guided diffusion segmentation models, revealing their limitations in generalising across media types.
Reinforcement Learning for Heart Failure Treatment Optimization in the Intensive Care Unit
Cristian Drudi
Despite improvements in treatment, mortality rates among heart failure (HF) patients remain high, especially for those in the intensive care unit (ICU), who experience the highest in-hospital mortality rates. Clinical guidelines for the treatment of HF provide general recommendations that, however, often lack strong evidence derived from RCTs and fail to determine personalized strategies. Previous literature has shown that reinforcement learning (RL) is effective in determining optimal treatment recommendations in critical care settings. In this study, we used RL to address uncertainty in the administration of vasopressors and diuretics while considering individual patient characteristics. The study indicates that RL achieved a significant mortality reduction of ≈ 20%.
morphOT: Morphological Alignment Of Point Clouds From Confocal Imaging Using Optimal Transport
Manuel Neumann
The creation of morphological 4D gene expression atlases will allow us to obtain a causal understanding of developmental processes, that is, of how a few undifferentiated cells gradually form complex organisms over the course of many cell divisions. Creating those 4D atlases requires computational tools that can project digital representations of biological specimens from different conditions (e.g., time points, replicates, etc.) into the same underlying space. This will enable us to reason about how cells are connected in morphology and time. To this end, we developed a pipeline that extracts morphological features of 3D cells from confocal imaging data and predicts how cells from different biological objects relate to one another using Optimal Transport.
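A minimal sketch of matching cells between two imaged objects via entropic optimal transport on per-cell feature vectors, using the POT library (the feature dimensionality and regularization are placeholder assumptions):

```python
import numpy as np
import ot  # POT: Python Optimal Transport

def match_cells(feats_a, feats_b, reg=0.05):
    """Entropic OT coupling between two sets of per-cell morphological features.
    Each row of the returned plan indicates how much mass of a cell in A is
    transported to each cell in B, i.e. a soft cell-to-cell correspondence."""
    a = np.full(len(feats_a), 1.0 / len(feats_a))   # uniform cell weights
    b = np.full(len(feats_b), 1.0 / len(feats_b))
    M = ot.dist(feats_a, feats_b)                   # squared Euclidean cost
    return ot.sinkhorn(a, b, M / M.max(), reg)

# Toy usage: 30 vs. 35 cells with 16 morphological features each (illustrative only)
plan = match_cells(np.random.rand(30, 16), np.random.rand(35, 16))
print(plan.shape)  # (30, 35)
```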
Thursday, September 12th
Improving Reasoning and Planning of Language Models using Reinforcement Learning
Varun Dhanraj
This project explores a novel approach to enhance reasoning and planning capabilities in Large Language Models (LLMs) by integrating Reinforcement Learning (RL). The method interprets the LLM as both an agent and an environment, where the environment consists of the LLM's hidden vector formation process for predicting the next token, and the agent is a Deep Q-Network (DQN) that selects the next token based on that hidden vector. The DQN is trained to optimize token selection for specific tasks like the Game of 24, receiving positive rewards for correct answers and negative penalties for errors and logical inconsistencies. This hybrid model aims to continually improve the LLM's task-specific performance by performing those tasks and receiving rewards.
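A minimal sketch of the agent side of such a setup, i.e., a Q-network scoring candidate next tokens from an LLM hidden vector (the dimensions, candidate set, and exploration scheme are placeholder assumptions, not the project's implementation):

```python
import torch
import torch.nn as nn

class TokenDQN(nn.Module):
    """Q-network mapping an LLM hidden vector to Q-values over candidate tokens."""
    def __init__(self, hidden_dim, n_candidates):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(hidden_dim, 512), nn.ReLU(),
            nn.Linear(512, n_candidates),
        )

    def forward(self, h):
        return self.net(h)

def select_token(dqn, h, epsilon=0.1):
    """Epsilon-greedy token choice from the Q-values (training-time exploration)."""
    if torch.rand(1).item() < epsilon:
        return torch.randint(dqn.net[-1].out_features, (1,)).item()
    return dqn(h).argmax(dim=-1).item()

# Toy usage: hidden vector of an assumed 4096-d LLM, 50 candidate tokens (illustrative)
dqn = TokenDQN(hidden_dim=4096, n_candidates=50)
tok = select_token(dqn, torch.randn(4096))
```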
Fine-tuning Protein Language Models (PLMs) to predict the effect of missense mutations on kinase function
Moritz Glaser
Current approaches are computationally expensive and use training data with an unrepresentative class distribution. Here we fine-tune the 35M-parameter ESM2 PLM with different prediction heads for a 3-class classification problem on 3 Deep Mutational Scans (DMSs) covering all possible mutations. In our experiments so far, we achieve an F1 score of 0.61 for loss-of-function prediction on a fourth kinase DMS, but our models struggle to discern rare gain-of-function mutations. Performance is improving, yet progress is constrained by available compute. Moreover, the pretraining objective of PLMs aligns only partly with function prediction, corroborating the need for more holistic pretraining objectives beyond protein sequence and structure, and for “world models” of biology that fully capture function.
Federated Behavioural Planes: Explaining the Evolution of Client Behaviour in Federated Learning
Dario Fenoglio
Federated Learning (FL) allows clients to collaboratively train a model without sharing sensitive data, reducing privacy risks in distributed deep learning. However, ensuring trust and control in FL systems requires understanding clients’ evolving behaviour, a key challenge in current research. To address this, we introduce Federated Behavioural Planes (FBPs), an innovative method to analyse, visualise, and explain the dynamics of FL systems, showing how clients behave under two lenses: predictive performance and decision-making processes. FBPs provide informative trajectories describing the evolving states of clients and their contributions. Leveraging FBPs, we present a novel robust aggregation method that detects malicious clients, enhancing security beyond existing SOTA defenses.
A Multi-Reference Style and Multi-Modal Context-Awareness Zero-Shot Style Alignment in Image Generation
Alessio Borgi
In this work, we present a novel framework (pipeline) for zero-shot style alignment in image generation, enhancing traditional methods with multi-modal context-awareness and multi-reference style alignment. Our approach integrates diverse content types—images, audio, weather data, and music—leveraging models like BLIP, Whisper, and CLAP to generate richly contextualized text embeddings. Additionally, we introduce blending techniques such as linear weighted blending and spherical interpolation to combine multiple reference styles effectively. Using minimal attention sharing during the diffusion process, our method ensures style consistency without the need for fine-tuning, offering a robust solution for generating high-quality, style-aligned images across diverse inputs.
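The two blending operations mentioned above can be sketched as follows (an illustrative simplification; the embedding dimensionality and weights are assumptions):

```python
import numpy as np

def linear_blend(embs, weights):
    """Weighted average of several reference style embeddings."""
    w = np.asarray(weights, dtype=float)
    return (w[:, None] * np.asarray(embs)).sum(0) / w.sum()

def slerp(e0, e1, t):
    """Spherical interpolation between two embeddings (t in [0, 1])."""
    e0n, e1n = e0 / np.linalg.norm(e0), e1 / np.linalg.norm(e1)
    omega = np.arccos(np.clip(np.dot(e0n, e1n), -1.0, 1.0))
    if np.isclose(omega, 0.0):
        return (1 - t) * e0 + t * e1          # nearly parallel: fall back to lerp
    return (np.sin((1 - t) * omega) * e0 + np.sin(t * omega) * e1) / np.sin(omega)

# Toy usage on two random "style" embeddings (illustrative only)
s0, s1 = np.random.randn(768), np.random.randn(768)
mixed = slerp(s0, s1, t=0.3)
```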
Deep Learning in the SPD cone with structure
Can Pouliquen
Estimating matrices in the symmetric positive-definite (SPD) cone is of interest for many applications. While there exist various convex optimization-based estimators, they remain limited in expressivity due to their model-based approach. The success of deep learning has led many to use neural networks instead. However, designing correct architectures is difficult: they either do not guarantee that their output has all the desired properties, rely on heavy computations, or are overly restricted to specific matrices. In this work, we propose a novel and generic learning module with guaranteed SPD outputs, which also enables learning a larger class of functions than existing approaches. Notably, it solves the challenging task of jointly learning SPD and sparse matrices.
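One standard way to guarantee SPD outputs, shown here only as a sketch and not as the module proposed in this work, is to predict a lower-triangular factor with a strictly positive diagonal and form L Lᵀ + εI:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SPDHead(nn.Module):
    """Map an unconstrained feature vector to an n x n SPD matrix via a
    Cholesky-style parameterisation: output = L L^T + eps * I."""
    def __init__(self, in_dim, n, eps=1e-4):
        super().__init__()
        self.n, self.eps = n, eps
        self.fc = nn.Linear(in_dim, n * (n + 1) // 2)
        self.tril_idx = torch.tril_indices(n, n)

    def forward(self, x):
        L = x.new_zeros(x.shape[0], self.n, self.n)
        L[:, self.tril_idx[0], self.tril_idx[1]] = self.fc(x)
        # enforce a strictly positive diagonal, then keep the strict lower triangle
        diag = torch.diag_embed(F.softplus(torch.diagonal(L, dim1=-2, dim2=-1)))
        L = torch.tril(L, diagonal=-1) + diag
        eye = torch.eye(self.n, device=x.device, dtype=x.dtype)
        return L @ L.transpose(-1, -2) + self.eps * eye

# Toy usage (illustrative only)
head = SPDHead(in_dim=32, n=5)
S = head(torch.randn(4, 32))   # (4, 5, 5) batch of SPD matrices
```

Such a parameterisation guarantees the SPD property but not sparsity, which is precisely the kind of additional structure the proposed module targets.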
How To Explain Reinforcement Learning with Shapley Values
Daniel Beechey
Reinforcement learning is a rich framework for creating intelligent agents that adapt and improve through continuous interaction with the world. However, uninterpretable agents hinder the deployment of reinforcement learning at scale. We use first principles to propose Shapley Values for Explaining Reinforcement Learning (SVERL), a mathematical framework for explaining agent-environment interactions in reinforcement learning. Paralleling Lloyd Shapley's work on attributing a game's outcome between its players, we show that SVERL is the unique method satisfying mathematical axioms for fairly attributing the influence of state feature values. In simple domains, SVERL produces meaningful explanations that match human intuition. In complex domains, the explanations reveal novel insight.
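The attribution principle SVERL builds on can be written down directly; below is a generic exact Shapley computation over state features with a placeholder value function (not SVERL's specific characteristic functions):

```python
from itertools import combinations
from math import factorial

def shapley_values(n, value_fn):
    """Exact Shapley values over n players (here, state features):
    phi_i = sum over coalitions S not containing i of
            |S|!(n-|S|-1)!/n! * (v(S ∪ {i}) - v(S))."""
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for k in range(n):
            for S in combinations(others, k):
                w = factorial(len(S)) * factorial(n - len(S) - 1) / factorial(n)
                phi[i] += w * (value_fn(set(S) | {i}) - value_fn(set(S)))
    return phi

# Toy usage with a placeholder value function over feature indices (illustrative only)
v = lambda S: float(len(S))          # in SVERL, v would depend on the agent and state
print(shapley_values(3, v))          # each feature contributes exactly 1.0
```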
Multimodal deep learning approaches for fusion neuroimaging data
Maria Boyko
The World Health Organization (WHO) reported in 2001 that about 450 million people worldwide have some form of mental disorder or brain condition and that one in four people meet the criteria at some point in their lives. However, at the moment there are no clear biomarkers of psychiatric diseases, and many psychiatric diseases are not treatable. Many recent studies diagnose psychiatric diseases based on a single modality (for example, only MRI or only EEG). At the same time, each modality has its own range of limitations and cannot reflect the whole picture of the disease. Therefore, multimodal fusion can address these problems and hopefully provide a key to finding the missing link(s) in complex mental illness.
One Robot to Grasp Them All: The Fellowship of the Grippers
Stephany Ortuno Chanelo
As industries face increasing demands for efficiency, the need to handle a wide variety of daily items has become crucial. This project addresses the challenges associated with automatic bin picking by proposing a novel gripper combination strategy. By equipping a single robotic system with both a vacuum and a parallel gripper, the robot can efficiently handle diverse objects of varying shapes, sizes, and materials. This approach not only reduces the need for multiple specialized robots, thereby lowering costs, but also increases the system's adaptability and efficiency in dynamic environments. Key contributions include the development of a multi-gripper robotic system and a grasping pose generator based on semantic segmentation masks.
Learning Long Sequences in Spiking Neural Networks
Matei Ioan Stan
A recent renewed interest in efficient alternatives to Transformers has given rise to state-of-the-art recurrent architectures named state space models (SSMs). This work systematically investigates, for the first time, the intersection of state-of-the-art SSMs with SNNs for long-range sequence modelling. Results suggest that SSM-based SNNs can outperform the Transformer on all tasks of a well-established long-range sequence modelling benchmark. A novel feature mixing layer is introduced, improving SNN accuracy while challenging assumptions about the role of binary activations in SNNs. This work paves the way for deploying powerful SSM-based architectures, such as large language models, to neuromorphic hardware for energy-efficient long-range sequence modelling.
Torsion in Persistent Homology and Neural Networks
Maria Walch
Synergy between topological data analysis (TDA) and deep learning is increasingly leveraged in hybrid methods, particularly in dimensionality reduction techniques like the 'topological autoencoder.' These approaches often rely on predetermined assumptions for both computational and mathematical reasons. Typically, TDA’s key invariants, the persistence diagrams, are computed over a coefficient field. When these vectorized invariants are used in deep learning, especially in loss function design, it's assumed that torsion effects in the data are negligible. However, based on Obayashi and Yoshiwaki's work, 'Field Choice Problem in Persistent Homology,' we demonstrate that these assumptions are incorrect.
Eye-Tracking as an Intelligent Human-Computer Interface
Andrei Paul Bejan
Technology has evolved rapidly in recent decades, and there is now the opportunity to move towards new, intelligent, more intuitive, and faster human-computer interfaces. Eye-Tracking represents a promising step in this direction. Recent developments in appearance-based gaze estimation using deep learning have come a long way, and they will soon reach consumer-ready performance. This work introduces FastSightNet, an efficient model based on the MobileNet architecture, evaluated and compared with existing models. Trained on the MPIIFaceGaze dataset, FastSightNet achieves a 5.1° mean angular error at 77 frames per second inference speed. Additionally, GazeTrack is presented, a system that allows real-time evaluation of the trained models on a webcam feed in both 3D and 2D scenarios.
MARLYC: multi-agent reinforcement learning yaw control
Elie Kadoche
Inside wind farms, turbines are subject to physical interactions such as wake effects. In this work, a new method called MARLYC is proposed to control the yaw of each turbine in order to improve the total energy production of the farm. It consists of the centralized training and decentralized execution of multiple reinforcement learning agents, each agent controlling the setting of one turbine's yaw. Agents are trained together so that collective control strategies can emerge. During execution, agents are completely independent, making their usage simpler. MARLYC increases the total energy production by controlling the yaws of the turbines judiciously, with a negligible increase in computation time.
BioNAS: Incorporating Bio-inspired Learning Rules to Neural Architecture Search
Imane Hamzaoui
We propose BioNAS, a framework for neural architecture search that explores different bio-inspired neural network architectures and learning rules. The novelty of BioNAS lies in exploring the use of different bio-inspired learning rules for the different layers of the model. Using BioNAS, we obtain state-of-the-art bio-inspired neural network performance, achieving an accuracy of 94.81% on CIFAR-10, 76.48% on CIFAR-100, and 45.38% on ImageNet16-120, surpassing state-of-the-art bio-inspired neural networks. We show that part of this improvement comes from using different learning rules rather than a single algorithm for all the layers.