Djallel Bouneffouf
We present Exponentiated Gradient LinUCB, an algorithm for contextual multi-armed bandits. This algorithm uses Exponentiated Gradient to find the optimal exploration parameter of LinUCB. Within a deliberately designed offline simulation framework, we conduct evaluations with real online event-log data. The experimental results demonstrate that our algorithm outperforms the surveyed algorithms.
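The core idea above, maintaining Exponentiated Gradient weights over candidate exploration parameters for LinUCB, can be sketched as follows. This is a minimal illustration on synthetic linear rewards, not the paper's implementation; the candidate `alphas`, learning rate `eta`, and reward clipping are all assumptions made for the sketch.

```python
import numpy as np

rng = np.random.default_rng(0)
d, K, T = 5, 4, 2000
theta = rng.normal(size=(K, d))       # hidden linear reward parameters (synthetic)
alphas = [0.0, 0.5, 1.0, 2.0]         # candidate LinUCB exploration levels (assumed)
w = np.ones(len(alphas))              # EG weights over the candidates
eta = 0.2                             # EG learning rate (assumed)
A = [np.eye(d) for _ in range(K)]     # per-arm Gram matrices
b = [np.zeros(d) for _ in range(K)]   # per-arm reward-weighted context sums

for t in range(T):
    x = rng.normal(size=d)
    p = w / w.sum()
    j = rng.choice(len(alphas), p=p)  # sample an exploration level from EG weights
    scores = []
    for k in range(K):
        Ainv = np.linalg.inv(A[k])
        mu = Ainv @ b[k]
        scores.append(x @ mu + alphas[j] * np.sqrt(x @ Ainv @ x))  # LinUCB score
    k = int(np.argmax(scores))
    r = float(x @ theta[k] + 0.1 * rng.normal())
    A[k] += np.outer(x, x)            # standard LinUCB statistics update
    b[k] += r * x
    w[j] *= np.exp(eta * np.clip(r, -1.0, 1.0))  # EG update, clipped for stability
    w /= w.sum()                      # keep the weights a probability vector
```

Over time the EG weights concentrate on whichever exploration level earns the most reward, which is the effect the abstract describes.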
Djallel Bouneffouf
The wide development of mobile applications provides a considerable amount of data of all types (images, texts, sounds, videos, etc.). Two main issues must therefore be considered: assisting users in finding information, and reducing search and navigation time. To this end, context-based recommender systems (CBRS) present the user with information appropriate to his/her situation. Our work consists in applying machine learning techniques and a reasoning process to address some of the problems concerning the acceptance of recommender systems by users, namely avoiding the intervention of experts, reducing the cold-start problem, speeding up the learning process and adapting to the user's interests. To achieve this goal, we propose a fundamental modification in how we model the learning of the CBRS. Inspired by models of human reasoning developed in robotics, we combine reinforcement learning and case-based reasoning to define a contextual recommendation process based on different context dimensions (cognitive, social, temporal, geographic). This paper describes ongoing work on the implementation of a CBRS based on a hybrid Q-learning (HyQL) algorithm, which combines Q-learning, collaborative filtering and case-based reasoning techniques. It also presents preliminary results comparing HyQL and standard Q-learning w.r.t. solving the cold-start problem.
Djallel Bouneffouf
This project is part of the development of mobile CRM. It aims to develop a client-management application named NOMALYS, which allows sales staff and business leaders to access their CRM data on mobile devices. In this project we focused on project-management techniques: the study classified different techniques for managing software projects and proposed the technique that most closely matches the needs of the studied company.
Djallel Bouneffouf, Sohini Upadhyay, Yasaman Khazaeni
We consider a novel variant of the contextual bandit problem (i.e., the multi-armed bandit with side-information, or context, available to a decision-maker) where the reward associated with each context-based decision may not always be observed ("missing rewards"). This new problem is motivated by certain online settings, including clinical trial and ad recommendation applications. In order to address the missing-rewards setting, we propose to combine the standard contextual bandit approach with an unsupervised learning mechanism such as clustering. Unlike standard contextual bandit methods, by leveraging clustering to estimate missing rewards, we are able to learn from each incoming event, even those with missing rewards. Promising empirical results are obtained on several real-life datasets.
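The clustering-based imputation described above can be sketched in a few lines: when a reward goes missing, the context is assigned to its nearest cluster and the cluster's running mean of observed rewards is used in the bandit update instead. The fixed random centroids, the 30% missing rate, and the LinUCB arm selection below are illustrative assumptions, not the paper's exact method.

```python
import numpy as np

rng = np.random.default_rng(1)
d, K, T, n_clusters = 4, 3, 1000, 5
theta = rng.normal(size=(K, d))              # hidden linear rewards (synthetic)
centroids = rng.normal(size=(n_clusters, d)) # fixed cluster centres (assumption)
cl_sum = np.zeros(n_clusters)                # per-cluster sum of observed rewards
cl_cnt = np.zeros(n_clusters)                # per-cluster count of observed rewards
A = np.stack([np.eye(d)] * K)                # per-arm Gram matrices
b = np.zeros((K, d))                         # per-arm reward-weighted context sums

for t in range(T):
    x = rng.normal(size=d)
    c = int(np.argmin(((centroids - x) ** 2).sum(axis=1)))  # nearest cluster
    scores = []
    for k in range(K):                       # LinUCB arm choice
        Ainv = np.linalg.inv(A[k])
        mu = Ainv @ b[k]
        scores.append(x @ mu + 0.5 * np.sqrt(x @ Ainv @ x))
    k = int(np.argmax(scores))
    observed = rng.random() > 0.3            # 30% of rewards go missing (assumed)
    if observed:
        r = float(x @ theta[k] + 0.1 * rng.normal())
        cl_sum[c] += r
        cl_cnt[c] += 1
    else:                                    # impute with the cluster's mean reward
        r = cl_sum[c] / cl_cnt[c] if cl_cnt[c] else 0.0
    A[k] += np.outer(x, x)                   # learn from every event, imputed or not
    b[k] += r * x
```

The key point matches the abstract: the bandit statistics are updated on every round, so events with missing rewards still contribute to learning.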
Djallel Bouneffouf
Active learning strategies respond to the costly labelling task in supervised classification by selecting the most useful unlabelled examples for training a predictive model. Many conventional active learning algorithms focus on refining the decision boundary rather than exploring new regions that can be more informative. In this setting, we propose a sequential algorithm named EG-Active that can improve any active learning algorithm through optimal random exploration. Experimental results show a statistically significant and appreciable improvement in the performance of our new approach over existing active feedback methods.
Djallel Bouneffouf
In this paper, we develop a dynamic exploration/exploitation (exr/exp) strategy for contextual recommender systems (CRS). Specifically, our methods can adaptively balance the two aspects of exr/exp by automatically learning the optimal tradeoff. This consists of optimizing a utility function represented by a linearized form of the probability distributions of the rewards of the clicked and non-clicked documents already recommended. Within an offline simulation framework, we apply our algorithms to a CRS and conduct an evaluation with real event-log data. The experimental results and detailed analysis demonstrate that our algorithms outperform existing algorithms in terms of click-through rate (CTR).
Djallel Bouneffouf
The information that mobile devices can access has become very wide nowadays, and the user faces a dilemma: an unlimited pool of information is available, yet the exact information being sought is hard to find. This is why current research aims to design Recommender Systems (RS) able to continually send information that matches the user's interests, in order to reduce navigation time. In this paper, we survey the different recommendation approaches.
Djallel Bouneffouf
The evolution of the user's content still remains a problem for accurate recommendation. This is why current research aims to design Recommender Systems (RS) able to continually adapt the information to match the user's interests. This paper aims to explain this problematic point by outlining the proposals that have been made in the research literature, with their advantages and disadvantages.
Djallel Bouneffouf, Raphaël Feraud
We consider a variant of the multi-armed bandit model, which we call the multi-armed bandit problem with known trend, where the gambler knows the shape of the reward function of each arm but not its distribution. This new problem is motivated by different online problems like active learning, music and interface recommendation applications, where, when an arm is sampled by the model, the received reward changes according to a known trend. By adapting the standard multi-armed bandit algorithm UCB1 to take advantage of this setting, we propose a new algorithm named A-UCB that assumes a stochastic model. We provide upper bounds on the regret which compare favourably with those of UCB1. We also confirm this experimentally with different simulations.
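For reference, the UCB1 baseline that A-UCB adapts can be sketched as below; the trend-aware adaptation itself, which is the paper's contribution, is not reproduced here. The `pull` callback and Bernoulli arms in the usage note are illustrative assumptions.

```python
import math
import random

def ucb1(pull, K, T, seed=0):
    """Standard UCB1 (Auer et al. 2002): pull each arm once, then always
    pull the arm maximizing mean + sqrt(2 ln t / n_k). `pull(k)` is a
    hypothetical callback returning the reward of arm k."""
    random.seed(seed)
    counts = [0] * K      # number of pulls per arm
    means = [0.0] * K     # empirical mean reward per arm
    total = 0.0
    for k in range(K):    # initialization: pull each arm once
        r = pull(k)
        counts[k], means[k] = 1, r
        total += r
    for t in range(K, T):
        ucb = [means[k] + math.sqrt(2 * math.log(t) / counts[k]) for k in range(K)]
        k = max(range(K), key=lambda i: ucb[i])
        r = pull(k)
        counts[k] += 1
        means[k] += (r - means[k]) / counts[k]  # incremental mean update
        total += r
    return total, counts
```

On three Bernoulli arms with success probabilities 0.2, 0.5 and 0.8, UCB1 concentrates its pulls on the last arm after a few hundred rounds.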
Djallel Bouneffouf
Ubiquitous information access becomes more and more important nowadays, and research is aimed at adapting it to users. Our work consists in applying machine learning techniques in order to bring a solution to some of the problems concerning the acceptance of the system by users. To achieve this, we propose a fundamental shift in how we model the learning of the recommender system: inspired by models of human reasoning developed in robotics, we combine reinforcement learning and case-based reasoning to define a recommendation process that uses these two approaches for generating recommendations on different context dimensions (social, temporal, geographic). We describe an implementation of the recommender system based on this framework. We also present preliminary results from experiments with the system and show how our approach increases recommendation quality.
Charu Aggarwal, Djallel Bouneffouf, Horst Samulowitz, Beat Buesser, Thanh Hoang, Udayan Khurana, Sijia Liu, Tejaswini Pedapati, Parikshit Ram, Ambrish Rawat, Martin Wistuba, Alexander Gray
Data science is labor-intensive, and human experts are scarce yet heavily involved in every aspect of it. This makes data science time-consuming and restricted to experts, with the resulting quality heavily dependent on their experience and skills. To make data science more accessible and scalable, we need its democratization. Automated Data Science (AutoDS) is aimed towards that goal and is emerging as an important research and business topic. We introduce and define the AutoDS challenge, followed by a proposal of a general AutoDS framework that covers existing approaches and also provides guidance for the development of new methods. We categorize and review the existing literature from multiple aspects of the problem setup and employed techniques. We then provide several views on how AI could succeed in automating end-to-end AutoDS. We hope this survey can serve as an insightful guideline for the AutoDS field and provide inspiration for future research.
Djallel Bouneffouf, Srinivasan Parthasarathy, Horst Samulowitz, Martin Wistuba
We consider the stochastic multi-armed bandit problem and the contextual bandit problem with historical observations and pre-clustered arms. The historical observations can contain any number of instances for each arm, and the pre-clustering information is a fixed clustering of arms provided as part of the input. We develop a variety of algorithms which incorporate this offline information effectively during the online exploration phase and derive their regret bounds. In particular, we develop the META algorithm, which effectively hedges between two other algorithms: one which uses both historical observations and clustering, and another which uses only the historical observations. The former outperforms the latter when the clustering quality is good, and vice versa. Extensive experiments on synthetic and real-world datasets on Warfarin drug dosage and web server selection for latency minimization validate our theoretical insights and demonstrate that META is a robust strategy for optimally exploiting the pre-clustering information.
Djallel Bouneffouf, Raphaël Féraud, Sohini Upadhyay, Yasaman Khazaeni, Irina Rish
In this paper, we analyze and extend an online learning framework known as Context-Attentive Bandit, motivated by various practical applications, from medical diagnosis to dialog systems, where due to observation costs only a small subset of a potentially large number of context variables can be observed at each iteration; however, the agent has the freedom to choose which variables to observe. We derive a novel algorithm, called Context-Attentive Thompson Sampling (CATS), which builds upon the Linear Thompson Sampling approach, adapting it to the Context-Attentive Bandit setting. We provide a theoretical regret analysis and an extensive empirical evaluation demonstrating the advantages of the proposed approach over several baseline methods on a variety of real-life datasets.
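CATS builds upon Linear Thompson Sampling; a minimal sketch of the underlying LinTS loop on synthetic data is given below. The context-attentive variable selection, which is the paper's actual contribution, is not reproduced; the prior scale `v` and the synthetic reward model are assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)
d, K, T, v = 4, 3, 1500, 0.3
theta = rng.normal(size=(K, d))   # hidden linear reward parameters (synthetic)
B = np.stack([np.eye(d)] * K)     # per-arm precision matrices
f = np.zeros((K, d))              # per-arm reward-weighted context sums
picks = np.zeros(K, dtype=int)

for t in range(T):
    x = rng.normal(size=d)
    samples = []
    for k in range(K):
        Binv = np.linalg.inv(B[k])
        mu = Binv @ f[k]
        # Draw a posterior sample theta~ ~ N(mu, v^2 * Binv) via Cholesky.
        L = np.linalg.cholesky(Binv)
        tilde = mu + v * (L @ rng.normal(size=d))
        samples.append(x @ tilde)
    k = int(np.argmax(samples))   # act greedily on the sampled parameters
    r = float(x @ theta[k] + 0.1 * rng.normal())
    B[k] += np.outer(x, x)        # Bayesian linear-regression update
    f[k] += r * x
    picks[k] += 1
```

Exploration here comes entirely from the posterior sampling step: arms with uncertain estimates occasionally produce high sampled scores and get tried.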
Djallel Bouneffouf
To follow the dynamicity of the user's content, researchers have recently started to model interactions between users and Context-Aware Recommender Systems (CARS) as a bandit problem, where the system needs to deal with the exploration/exploitation dilemma. In this sense, we propose to study the freshness of the user's content in CARS through the bandit problem. We introduce in this paper an algorithm named Freshness-Aware Thompson Sampling (FA-TS) that manages the recommendation of fresh documents according to the risk level of the user's situation. The intensive evaluation and detailed analysis of the experimental results reveal several important discoveries in the exploration/exploitation (exr/exp) behaviour.
Djallel Bouneffouf, Raphael Feraud
Bandit algorithms and Large Language Models (LLMs) have emerged as powerful tools in artificial intelligence, each addressing distinct yet complementary challenges in decision-making and natural language processing. This survey explores the synergistic potential between these two fields, highlighting how bandit algorithms can enhance the performance of LLMs and how LLMs, in turn, can provide novel insights for improving bandit-based decision-making. We first examine the role of bandit algorithms in optimizing LLM fine-tuning, prompt engineering, and adaptive response generation, focusing on their ability to balance exploration and exploitation in large-scale learning tasks. Subsequently, we explore how LLMs can augment bandit algorithms through advanced contextual understanding, dynamic adaptation, and improved policy selection using natural language reasoning. By providing a comprehensive review of existing research and identifying key challenges and opportunities, this survey aims to bridge the gap between bandit algorithms and LLMs, paving the way for innovative applications and interdisciplinary research in AI.
Djallel Bouneffouf, Charu C. Aggarwal
In recent years, the Neurosymbolic framework has attracted a lot of attention in various applications, from recommender systems and information retrieval to healthcare and finance. This success is due to its stellar performance combined with attractive properties, such as learning and reasoning. The new emerging Neurosymbolic field is currently experiencing a renaissance, as novel frameworks and algorithms motivated by various practical applications are being introduced, building on top of the classical neural and reasoning problem setting. This article aims to provide a comprehensive review of significant recent developments in real-world applications of Neurosymbolic Artificial Intelligence. Specifically, we introduce a taxonomy of common Neurosymbolic applications and summarize the state-of-the-art for each of those domains. Furthermore, we identify important current trends and provide new perspectives pertaining to the future of this burgeoning field.
Djallel Bouneffouf, Irina Rish
In recent years, the multi-armed bandit (MAB) framework has attracted a lot of attention in various applications, from recommender systems and information retrieval to healthcare and finance, due to its stellar performance combined with certain attractive properties, such as learning from less feedback. The multi-armed bandit field is currently flourishing, as novel problem settings and algorithms motivated by various practical applications are being introduced, building on top of the classical bandit problem. This article aims to provide a comprehensive review of top recent developments in multiple real-life applications of the multi-armed bandit. Specifically, we introduce a taxonomy of common MAB-based applications and summarize the state of the art for each of those domains. Furthermore, we identify important current trends and provide new perspectives pertaining to the future of this exciting and fast-growing field.
Djallel Bouneffouf, Emmanuelle Claeys
We study here the problem of learning the exploration/exploitation trade-off in the contextual bandit problem with a linear reward function. In traditional algorithms that solve the contextual bandit problem, the exploration is a parameter tuned by the user. Our proposed algorithms, however, learn to choose the right exploration parameter in an online manner, based on the observed context and the immediate reward received for the chosen action. We present two algorithms that use a bandit to find the optimal exploration parameter of the contextual bandit algorithm, which we hope is a first step toward the automation of multi-armed bandit algorithms.
Djallel Bouneffouf
The notion of profile appeared in the 1970s, mainly due to the need to create custom applications that could be adapted to the user. In this paper, we treat the different aspects of the user's profile: we define the profile, its features and its indicators of interest, and then describe the different approaches to modelling and acquiring the user's interests.
Djallel Bouneffouf
We introduce in this paper an algorithm named Contextual-E-Greedy that tackles the dynamicity of the user's content. It is based on a dynamic exploration/exploitation tradeoff and can adaptively balance the two aspects by deciding which situation is most relevant for exploration or exploitation. The experimental results demonstrate that our algorithm outperforms the surveyed algorithms.
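A contextual epsilon-greedy loop with a situation-dependent exploration rate can be sketched as follows. The specific rule used here, scaling epsilon by the novelty of the current context relative to recently seen ones, is an illustrative assumption standing in for the paper's situation-relevance criterion, as are the synthetic rewards and the `0.1` scaling constant.

```python
import numpy as np

rng = np.random.default_rng(3)
d, K, T = 4, 3, 1000
theta = rng.normal(size=(K, d))    # hidden linear reward parameters (synthetic)
A = np.stack([np.eye(d)] * K)      # per-arm Gram matrices
b = np.zeros((K, d))               # per-arm reward-weighted context sums
seen = []                          # recently observed contexts ("situations")

for t in range(T):
    x = rng.normal(size=d)
    # Novel situations get a high epsilon (explore); familiar ones a low one.
    if seen:
        dist = min(np.linalg.norm(x - s) for s in seen[-50:])
        eps = min(1.0, 0.1 * dist)
    else:
        eps = 1.0
    seen.append(x)
    if rng.random() < eps:
        k = int(rng.integers(K))   # explore: random arm
    else:                          # exploit: best linear estimate for this context
        est = [x @ (np.linalg.inv(A[k]) @ b[k]) for k in range(K)]
        k = int(np.argmax(est))
    r = float(x @ theta[k] + 0.1 * rng.normal())
    A[k] += np.outer(x, x)
    b[k] += r * x
```

Compared with a fixed epsilon, this kind of rule spends exploration where the system knows least, which is the balancing behaviour the abstract describes.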