DeLTA 2024 Abstracts


Area 1 - Computer Vision Applications

Full Papers
Paper Nr: 22
Title:

Development and Applications of Gesture-Controlled Drones: Advances in Hand Gesture Recognition for Aerial Navigation

Authors:

Diego García Ricaño and Lorna Verónica Rosas Téllez

Abstract: This paper explores the feasibility of using hand gestures to control drones instead of traditional joystick control methods. As drones become more popular and accessible, the need for a more intuitive and simple control method increases. The paper analyses existing technologies and evaluates different methods of gesture control, using open-source computer vision libraries to track the user's hand and map hand gestures to the controls of the drone. The paper also identifies limitations and challenges in implementing gesture control for drones and proposes solutions to improve its effectiveness. The system allows users to control the drone's movement in three axes, as well as yaw movement, landing, and an emergency stop gesture. The paper concludes that gesture-based drone control can provide an alternative and more accessible control method for drones.
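
A gesture-to-command mapping of the kind described above can be sketched in plain Python. This is a hypothetical illustration: the paper does not publish its landmark format, gesture set, or command names, so the `fingers_extended` heuristic, the labels, and the mapping below are assumptions rather than the authors' implementation.

```python
# Hypothetical sketch: mapping recognized hand poses to drone commands.
# Landmark format, gesture labels, and command names are illustrative.

def fingers_extended(landmarks):
    """Count extended fingers from (tip_y, pip_y) pairs per finger.

    A finger is considered extended when its tip is above its middle joint
    (smaller y in image coordinates).
    """
    return sum(1 for tip_y, pip_y in landmarks.values() if tip_y < pip_y)

def gesture_to_command(landmarks):
    """Map a hand pose to a drone command (illustrative mapping)."""
    n = fingers_extended(landmarks)
    if n == 0:
        return "emergency_stop"   # closed fist: kill motors
    if n == 5:
        return "land"             # open palm: land
    if n == 1:
        return "ascend"
    if n == 2:
        return "descend"
    return "hover"

# Example: closed fist (every fingertip below its middle joint)
fist = {f: (0.8, 0.6) for f in ("thumb", "index", "middle", "ring", "pinky")}
print(gesture_to_command(fist))  # -> emergency_stop
```

In a real system the `(tip_y, pip_y)` pairs would come from a hand-tracking library's per-frame landmarks, and the returned command would be translated into velocity setpoints for the drone's flight controller.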

Paper Nr: 30
Title:

Closing the Sim-to-Real Gap: Enhancing Autonomous Precision Landing of UAVs with Detection-Informed Deep Reinforcement Learning

Authors:

Harry Soteriou, Christos Kyrkou and Panayiotis S. Kolios

Abstract: Autonomous precision landing of a UAV is a challenging task relying on simultaneous target localization and control. It cannot be performed using GPS coordinates alone, due to limitations in the accuracy of the technology. Control policies are fine-tuned in simulation before deployment, but most simulators offer low-quality graphics, posing a challenge for vision-based algorithms intended to be deployed in the real world. In this paper we showcase a joint computer vision and reinforcement learning approach, in the photo-realistic simulator AirSim, to reduce the sim-to-real gap. Localization is performed using the YOLOv8 object detector and control using the PPO and LSTM-PPO algorithms. We achieve 94.25% and 94.86% success rates, over 1391 and 817 landings in three different simulated environments, respectively.

Paper Nr: 68
Title:

Version 8 of YOLO for Wildfire Detection

Authors:

César Abreu, Creison Nunes and Adriano Lisboa

Abstract: Large-scale forest fires can cause irreversible damage to the ecosystem, prompting research into different ways to prevent and monitor these events. One possible prevention method is monitoring the areas most susceptible to fires and using computer vision techniques to detect these events as quickly as possible while they are still small-scale, accelerating the response of the responsible authorities and hence reducing environmental damage and restoration costs. Convolutional neural networks (CNNs) currently achieve the best accuracy among competing methods (e.g. feature modeling) for wildfire detection from images. This paper applies version 8 of YOLO to reduce computational costs while maintaining high detection capability.

Paper Nr: 77
Title:

ME-ODAL: Mixture-of-Experts Ensemble of CNN Models for 3D Object Detection from Automotive LiDAR Point Clouds

Authors:

Dhvani Katkoria, Jaya Sreevalsan-Nair, Mayank Sati and Sunil Karunakaran

Abstract: 3D object detection from automotive/vehicle LiDAR point clouds is used for environment perception, which is essential for autonomous driving. While this task is achieved using different deep learning models, not all models work well for all classes and their instances. There is limited work on ensemble methods for LiDAR point cloud analysis that find the optimal model from a set of existing models. We propose a workflow for an ensemble method for Object Detection from Automotive LiDAR point clouds (ODAL), built using convolutional neural networks (CNNs). Our proposed method is a Mixture-of-Experts method (ME-ODAL) for Object Detection from Automotive LiDAR. For ME-ODAL, we propose a regression model learned on shape descriptor features as the meta-learning model, i.e., the gating function. We tested the workflow on the nuScenes dataset using two ensembles, i.e., different sets of CNNs, to compare their performance. Our experimental results show that the ensemble of models delivers the expected overall improvement in accuracy.

Short Papers
Paper Nr: 29
Title:

Evolving Deep Architectures: A New Blend of CNNs and Transformers Without Pre-Training Dependencies

Authors:

Manu Kiiskilä and Padmasheela Kiiskilä

Abstract: Modeling in computer vision is slowly moving from Convolutional Neural Networks (CNNs) to Vision Transformers due to the high performance of self-attention mechanisms in capturing global dependencies within the data. Although vision transformers have proved to surpass CNNs in performance and require less computational power, their need for pre-training on large-scale datasets can become burdensome. Using pre-trained models has critical limitations, including limited flexibility to adjust network structures and domain mismatches between source and target domains. To address this, a new architecture with a blend of CNNs and Transformers is proposed. SegFormer with four transformer blocks is used as an example, replacing the first two transformer blocks with two CNN modules and training from scratch. Experiments with the MS COCO dataset show a clear improvement in accuracy with the C-C-T-T architecture compared to the T-T-T-T architecture when trained from scratch on limited data. This project proposes an architecture modifying the SegFormer Transformer with two convolutional modules, achieving a pixel accuracy of 0.6956 on MS COCO.

Paper Nr: 40
Title:

Refining Weights for Enhanced Object Similarity in Multi-Perspective 6DoF Pose Estimation and 3D Object Detection

Authors:

Budiarianto S. Kusumo and Ulrike Thomas

Abstract: At the moment, there are increasing trends in using deep learning for 6-DoF pose estimation and 3D object detection. Recognizing objects and determining their 3D position and orientation in the scene has numerous uses in robotic applications. However, due to the variety of objects in the real world, the challenge is complicated. Objects have various 3D shapes, and their appearance in images can be affected by lighting, clutter in the scene, and occlusions between objects. Matching feature points between 3D models and images is used to solve the challenge of 6-DoF object pose estimation. We propose a hybrid method for 6-DoF pose estimation and 3D object detection using a modified ResNet18 pre-trained model, combined with an object matching and object symmetry method. We conduct experimental tests with video files and a live camera view. Visualization tests are carried out on a single object in a single perspective view, a single object in multiple perspectives, multiple objects in a single perspective view, and multiple objects in multiple perspective views. We evaluated the performance of this hybrid method on 15 types of objects from the Linemod (LM) and Linemod Occluded (LM-O) datasets, resulting in a total training loss of 0.0533049 and a total validation loss of 0.0481146.

Paper Nr: 58
Title:

Automatic Emotion Analysis in Movies: Matteo Garrone’s Dogman as a Case Study

Authors:

Claudiu D. Hromei, Alessia Forciniti, Daniele Margiotta and Stefano Locati

Abstract: This paper is the first step of an expansive ongoing initiative centered on automated film analysis through an ecocritical lens. Ecocriticism, an interdisciplinary field, delves into environmental themes within cultural works, broadening the scope of humanities' focus on representation issues. Our objective is to pioneer a method for automated, dependable analysis of audiovisual narratives within fictional feature films, exploring the interplay between human emotions exhibited by characters and their surrounding environments. Using the acclaimed Italian crime/noir film, Dogman (2018), as a case study, we have constructed a modular pipeline integrating Facial Recognition and Emotion Detection technologies to scrutinize the emotional dynamics of the film's two main characters. Our approach facilitates a comprehensive comparison over the film's duration, enabling human analysts to gain insights into the nuanced relationship between characters' emotional states and the environmental contexts in which they unfold. Preliminary findings indicate promising outcomes from our pipeline, laying a solid foundation for subsequent film analyses. These results not only underscore the viability of automated methods in film studies but also offer a substantive starting point for deeper explorations into the complex interconnections between human emotions and cinematic environments.

Paper Nr: 65
Title:

Computer Vision Based Monitoring System for Flotation in Mining Industry 4.0

Authors:

Ahmed Bendaouia, El Hassan Abdelwahed, Sara Qassimi, Abdelmalek Boussetta, Intissar Benzakour, Mustapha Ahricha, Oumkeltoum Amar and François Bouzeix

Abstract: In the mineral processing industry, specifically in froth flotation, the extraction of detailed information from bubble images is imperative for effective monitoring of the flotation process and its associated production indicators. This study delves into a range of semantic segmentation methods and algorithms, notably including YOLO, Watershed, and Thresholding, to accurately process these images. Our investigation leads to the proposal of an innovative cloud-based segmentation architecture, seamlessly integrated with the Internet of Things (IoT). This integration not only enhances the segmentation process but also supports a comprehensive monitoring application, offering a significant advancement in the real-time analysis and optimization of the flotation process. The article presents an empirical comparison of the segmentation methods and demonstrates the efficacy of the proposed cloud-based system in a practical industrial setting.

Paper Nr: 66
Title:

Self-Supervised Learning for Robust Surface Defect Detection

Authors:

Muhammad Aqeel, Shakiba Sharifi, Marco Cristani and Francesco Setti

Abstract: In this study, we discuss the use of Self-Supervised Learning to improve the robustness of Surface Defect Detection (SDD) models. We show how different state-of-the-art SDD methods already implement some form of self-supervision in their learning procedure, and we discuss how more advanced techniques inspired by Confident Learning can be used in a generic pipeline. We also propose the One-Shot Removal strategy, a baseline approach that can be applied to any SDD model to improve its robustness. Our method employs a three-step training pipeline: initial training on the entire dataset, followed by removal of anomalous samples, and fine-tuning on the refined dataset. Experiments conducted on the challenging Kolektor SDD2 dataset show how this process enhances the representation of `normal' data and mitigates overfitting risks.
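
The three-step pipeline above (train on everything, remove anomalous samples, fine-tune on the rest) can be sketched as follows. The anomaly scorer here is a deliberately trivial stand-in (distance from the training mean) for a full SDD model, and the 10% removal fraction is a hypothetical choice.

```python
# Minimal sketch of a three-step "train, remove, fine-tune" pipeline.
# The anomaly-scoring model is stubbed out; in the paper a full SDD model
# fills this role, and the removal fraction is a hypothetical choice.
import numpy as np

def reconstruction_error(model_mean, samples):
    # Stand-in anomaly score: distance from the mean of the training data.
    return np.linalg.norm(samples - model_mean, axis=1)

def one_shot_removal(samples, removal_fraction=0.1):
    # Step 1: "train" on the full dataset (here: fit the mean).
    model_mean = samples.mean(axis=0)
    # Step 2: score all samples and drop the most anomalous fraction.
    scores = reconstruction_error(model_mean, samples)
    keep = scores.argsort()[: int(len(samples) * (1 - removal_fraction))]
    refined = samples[keep]
    # Step 3: fine-tune on the refined dataset (here: refit the mean).
    return refined, refined.mean(axis=0)

rng = np.random.default_rng(0)
normal = rng.normal(0.0, 1.0, size=(90, 8))
outliers = rng.normal(6.0, 1.0, size=(10, 8))
data = np.vstack([normal, outliers])
refined, refit = one_shot_removal(data, removal_fraction=0.1)
print(len(refined))  # 90 samples kept after removing the top 10% scorers
```

The point of the sketch is the control flow, not the scorer: replacing the mean-distance stub with a trained defect-detection model yields the strategy the abstract describes.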

Paper Nr: 79
Title:

Deep Learning for Cattle Face Identification

Authors:

Sinan Dede, Eleni Vrochidou, Venetis Kanakaris and George A. Papakostas

Abstract: Cattle farming plays a crucial role in meeting the increasing nutritional demands of growing populations. Therefore, it is important to have efficient methods of identifying individual cattle for effective livestock management, maintaining the herd’s health, and ensuring farms’ financial sustainability. Traditional identification methods, such as ear tags and branding, are not ideal due to theft issues and discomfort for the animals, while microchips, although accurate, present logistical and ethical challenges. Therefore, novel identification methods are required. This work aims to review all current innovative cattle identification systems, specifically utilizing facial recognition technology based on deep learning. The conducted research identifies and summarizes state-of-the-art approaches for data preprocessing, feature extraction, model training, and testing. Furthermore, a comparative study on the performance metrics of the identified works takes place. Challenges associated with lighting conditions, dataset quality, processing speed, and practical implementation in dynamic farm environments, such as the motion of cattle, are also reported.

Area 2 - Models and Algorithms

Full Papers
Paper Nr: 36
Title:

Action Conditioned Attention Encoder-Decoder and Discriminator for Human Motion Generation

Authors:

Chaitanya Bandi and Ulrike Thomas

Abstract: We present a CVAE-GAN-based architecture for human motion generation with an action-conditioned variational autoencoder and a generative discriminator. In this work, we focus on generating more accurate actions performed by a single person using a conditioned generative model. The primary motivation for this work comes from human-robot collaboration scenarios where a person interacts with the robot using human actions. Our approach consists of a self-attention-based conditional variational autoencoder for reconstructions and a graph network-based discriminator for realistic human motion quality. We evaluate our network on three open-source datasets known as HumanAct12, NTU 120 RGB+D, and the Actions in Supermarket Dataset. The extensive experiments show that the presented approach works for various human motions and input representations, such as the SMPL pose parameters, trajectory data, and skeleton joints. We achieve higher accuracy compared to the state-of-the-art methods on all three datasets.

Paper Nr: 46
Title:

Few-Shot Learning with Novelty Detection

Authors:

Kim Bjerge, Paul Bodesheim and Henrik Karstoft

Abstract: Machine learning has achieved considerable success in data-intensive applications, yet encounters challenges when confronted with small datasets. Recently, few-shot learning (FSL) has emerged as a promising solution to address this limitation. By leveraging prior knowledge, FSL exhibits the ability to swiftly generalize to new tasks, even when presented with only a handful of samples in an accompanying support set. This paper extends the scope of few-shot learning by incorporating novelty detection for samples of categories not present in the support set of FSL. This extension holds substantial promise for real-life applications where the availability of samples for each class is either sparse or absent. Our approach involves adapting existing FSL methods with a cosine similarity function, complemented by the learning of a probabilistic threshold to distinguish between known and outlier classes. During episodic training with domain generalization, we introduce a scatter loss function designed to disentangle the distribution of similarities between known and outlier classes, thereby enhancing the separation of novel and known classes. The efficacy of the proposed method is evaluated on commonly used FSL datasets and the EU Moths dataset characterized by few samples. Our experimental results showcase accuracies ranging from 95.4% to 96.7%, as demonstrated on the Omniglot dataset through few-shot-novelty learning (FSNL). This high accuracy is observed across scenarios with 5 to 30 classes and the introduction of novel classes in each query set, underscoring the robustness and versatility of our proposed approach.
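
The known-vs-novel decision rule described above can be illustrated with a toy prototype classifier: a query is assigned to its most similar class prototype unless the maximum cosine similarity falls below a threshold. The embeddings, prototypes, and threshold value below are illustrative stand-ins for the learned quantities in the paper.

```python
# Toy illustration of few-shot classification with novelty detection via a
# cosine-similarity threshold. Embeddings, prototypes, and the threshold are
# stand-ins for the learned quantities described in the abstract.
import numpy as np

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def classify_with_novelty(query, prototypes, threshold=0.5):
    """Return the best-matching class, or 'novel' when no prototype is
    similar enough (max cosine similarity below the learned threshold)."""
    sims = {label: cosine(query, p) for label, p in prototypes.items()}
    best = max(sims, key=sims.get)
    return best if sims[best] >= threshold else "novel"

prototypes = {
    "class_a": np.array([1.0, 0.0, 0.0]),
    "class_b": np.array([0.0, 1.0, 0.0]),
}
print(classify_with_novelty(np.array([0.9, 0.1, 0.0]), prototypes))  # class_a
print(classify_with_novelty(np.array([0.0, 0.0, 1.0]), prototypes))  # novel
```

In the paper the threshold is learned probabilistically and the embeddings come from an episodically trained backbone; the thresholded argmax above is only the final decision step.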

Paper Nr: 47
Title:

Geometrical Realization for Time Series Forecasting

Authors:

Ali Bayeh, Malek Mouhoub and Samira Sadaoui

Abstract: This research primarily focuses on tackling the challenge of forecasting univariate time series. Existing methods for training forecasting models range from traditional statistical models to domain-specific algorithms and, more recently, deep neural network models. These typically rely on raw time series observations, with alternative representations, such as higher-dimensional embeddings, used mainly for auxiliary analyses such as Topological Data Analysis (TDA). In contrast to conventional time series analysis methods, this study explores the impact of higher-dimensional embedding as the primary data representation. Leveraging this higher-dimensional embedding, we introduce a geometrical realization model that captures crucial data points of the embedding representation. Subsequently, we propose a deep neural network model inspired by N-BEATS, incorporating a TDA model, an attention model, and a convolutional neural network (CNN) model in parallel as sub-modules alongside the geometrical realization model. To assess the efficacy of the proposed model, we conduct evaluations on diverse time series datasets spanning various domains, including electricity load demands and the M4 competition datasets. Furthermore, we conducted an ablation study to analyze the specific contributions of each sub-module towards the final predictions.

Paper Nr: 59
Title:

Empowering Cybersecurity: CyberShield AI Advanced Integration of Machine Learning and Deep Learning for Dynamic Ransomware Detection

Authors:

Sijjad Ali, Asad Ali, Muhammad Uzair, Hamza Amir, Rana Zaki A. Bari, Hamid Sharif, Maryam Jamil, M. Hunza, Nabel Akram and Sharofiddin Allaberdiev

Abstract: Cybersecurity is an ever-changing field, and the onset of powerful ransomware attacks makes it important to develop high-technology detection techniques. In this article, CyberShield AI is presented as an innovative integration of Artificial Neural Networks (ANN), Light Gradient Boosting Machine (LightGBM), and eXtreme Gradient Boosting (XGBoost). The methodology utilizes the specific strengths of its individual components to provide unprecedented precision, speed, and functionality in the process of ransomware detection. By applying rigorous data processing, feature selection, and accurate evaluation, CyberShield AI achieves accurate discrimination between benign and ransomware-affected samples. We conclude by providing a detailed account of the integration of advanced machine learning and deep learning technologies, setting high standards for real-time threat detection. CyberShield AI addresses shortcomings in existing cybersecurity methodologies by offering robust defenses against dynamic cyber threats. Hence, the research not only describes the remarkable detection capabilities of the model but also sets a standard for future cybersecurity measures by providing more robust barriers against ransomware and other malicious actors. CyberShield AI thus plays an exemplary role, marking the establishment of artificial intelligence-powered cybersecurity tools.
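
One plausible way to fuse the three detectors named above is soft voting over their predicted ransomware probabilities. The paper does not specify its fusion rule, so the equal weights and 0.5 decision threshold below are assumptions, and the per-model probabilities are synthetic stand-ins.

```python
# Hypothetical sketch: fusing ANN, LightGBM, and XGBoost ransomware
# probabilities by soft voting. The fusion rule, weights, and threshold are
# assumptions; the abstract does not specify them.
import numpy as np

def soft_vote(prob_ann, prob_lgbm, prob_xgb, weights=(1/3, 1/3, 1/3)):
    """Weighted average of per-sample ransomware probabilities."""
    probs = np.stack([prob_ann, prob_lgbm, prob_xgb])
    return np.average(probs, axis=0, weights=weights)

def predict(prob_ann, prob_lgbm, prob_xgb, threshold=0.5):
    """Flag a sample as ransomware when the fused probability is high."""
    return soft_vote(prob_ann, prob_lgbm, prob_xgb) >= threshold

ann  = np.array([0.9, 0.2, 0.6])  # synthetic per-sample probabilities
lgbm = np.array([0.8, 0.1, 0.4])
xgb  = np.array([0.7, 0.3, 0.5])
print(predict(ann, lgbm, xgb))  # [ True False  True]
```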

Paper Nr: 78
Title:

BitNet b1.58 Reloaded: State-of-the-Art Performance Also on Smaller Networks

Authors:

Jacob Nielsen and Peter Schneider-Kamp

Abstract: Recently proposed methods for 1-bit and 1.58-bit quantization-aware training investigate the performance and behavior of these methods in the context of large language models, finding state-of-the-art performance for models with more than 3B parameters. In this work, we investigate 1.58-bit quantization for small language and vision models ranging from 100K to 48M parameters. We introduce a variant of BitNet b1.58 that relies on the median rather than the mean in the quantization process. Through extensive experiments we investigate the performance of 1.58-bit models obtained through quantization-aware training. We further investigate the robustness of 1.58-bit quantization-aware training to changes in the learning rate and regularization through weight decay, finding different patterns for small language and vision models than previously reported for large language models. Our results showcase that 1.58-bit and 16-bit quantization-aware training provides state-of-the-art performance for small language models when doubling hidden layer sizes, and reaches or even surpasses state-of-the-art performance for small vision models of identical size. Ultimately, we demonstrate that 1.58-bit quantization-aware training is a viable and promising approach also for training smaller deep learning networks.
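
Ternary (1.58-bit) quantization with a median-based scale, as in the variant described above, can be sketched as follows: weights are divided by a per-tensor statistic of their absolute values, rounded, and clipped to {-1, 0, +1}. The exact details of the paper's quantizer may differ; this is a hedged illustration of the general scheme.

```python
# Hedged sketch of ternary (1.58-bit) weight quantization in the style of
# BitNet b1.58, with a median-based scaling statistic as the variant above
# describes. The paper's exact quantizer may differ.
import numpy as np

def ternary_quantize(w, use_median=True, eps=1e-8):
    """Quantize weights to {-1, 0, +1} with a per-tensor scale."""
    stat = np.median(np.abs(w)) if use_median else np.mean(np.abs(w))
    scale = max(stat, eps)            # guard against an all-zero tensor
    q = np.clip(np.round(w / scale), -1, 1)
    return q, scale

w = np.array([0.9, -0.05, 0.4, -1.2, 0.02, -0.4])
q, scale = ternary_quantize(w)
print(q)       # ternary codes in {-1, 0, 1}
print(scale)   # median of |w|
```

The intuition for the median variant is robustness: a few large-magnitude weights shift the mean of |w| and hence the scale, while the median is unaffected by such outliers.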

Short Papers
Paper Nr: 23
Title:

Towards Natural-Sounding Speech to Text in English

Authors:

Evalds Urtans, Kriss Saulitis and Vairis Caune

Abstract: This study focuses on a systematic review of the literature and a comparison of 20 notable English speech synthesis methods. Using a defined dataset and quality and precision criteria, 9 models are objectively analyzed. The research methodology includes configuring the speech synthesis models to generate audio samples, which are then used to compare the models against the established criteria. The NISQA model is used to evaluate speech quality through machine learning, mimicking the subjective MOS metric. The CoMoSpeech model showed the best quality indicators (MOS - 3.85), while the VITS model demonstrated the highest precision (CER - 1.48%) and the best overall metric average. The study includes a detailed assessment of the strengths and weaknesses of the various models.

Paper Nr: 27
Title:

Detecting Flow via a Machine Learning Model in a MOOC Context

Authors:

Sergio Iván Ramírez Luelmo, Nour El Mawas, Rémi Bachelet and Jean Heutte

Abstract: Flow is a human psychological state positively correlated with self-efficacy, motivation, engagement, and academic achievement, all of which positively affect learning. However, automatic, real-time flow detection is extremely difficult, a challenge particularly exacerbated in a Massive Open Online Course (MOOC) context, where the distant and asynchronous components join the educational and online context. We approach this issue by training a Machine Learning (ML) model to detect flow transparently and automatically in a MOOC. We pair the results of the EduFlow2 and Flow-Q questionnaires (n = 1,553, two years of data collection) and their MOOC log data (French MOOC "Gestion de Projet" [Project Management]) in a ML pipeline to create a ML model that detects flow (ROC = 0.68 and PRC = 0.87) in a MOOC context. This ML model detects flow (0.85) with greater Precision than its absence (0.34).

Paper Nr: 32
Title:

Mitigating Class Imbalance in Healthcare AI Image Classification: Evaluating the Efficacy of Existing Generative Adversarial Network

Authors:

Dennis Chia Yin Lim, Brian Chung Shiong Loh, Wan Tze Vong and Patrick Then

Abstract: This study tackles the challenge of class imbalance in medical datasets, crucial for accurate and reliable classification models in healthcare. Acknowledging the paramount importance of precise diagnostic tools in the medical field, the research aims to understand and mitigate the adverse effects of class imbalance on AI-driven diagnostic systems. To address this, it employs state-of-the-art techniques including data augmentation, DCGAN, Pix2Pix, and diffusion methods. Data augmentation is selected for its simplicity and widespread adoption among researchers, providing a straightforward means to expand dataset sizes. DCGAN is chosen as an advanced method capable of generating high-quality images from noise, representing one of the earliest and most sophisticated GANs. Pix2Pix is incorporated due to the relevance of image translation GAN methods in healthcare, particularly in tasks like translating MRI images to CT scans. Additionally, the diffusion method is included to explore recent advancements in image generation, especially its ability to produce realistic images controlled by text prompts, with a focus on its effectiveness in medical applications. Using a subset of the HAM10000 dataset comprising 600 normal skin images and 600 melanoma images, the study gradually reduces the number of melanoma images to demonstrate the detrimental effects of class imbalance on classification accuracy. Through experimentation with techniques such as data augmentation and synthetic image generation, the research aims to provide insights into enhancing the performance of classification models in medical AI applications and the challenges faced by each technique. Future research should prioritize surmounting the challenges linked with Stable Diffusion, particularly its demand for extensive training datasets. Moreover, exploring its potential for generating multi-modality medical images could unlock new avenues for enhancing classification model performance on imbalanced datasets.

Paper Nr: 34
Title:

Vector Analysis of Deep Neural Network Training Process

Authors:

Alexey Podoprosvetov, Vladimir Smolin and Sergey Sokolov

Abstract: Optimization of parameters in complex systems based on gradient descent has become widely used. Its application for training deep neural networks is commonly referred to as the "backpropagation method". The simplicity of the ideas behind gradient descent and the success of neural network algorithms in solving intelligent tasks on one hand create the impression of the correctness of the widespread use of the backpropagation method, but on the other hand, they hinder understanding of the mechanisms by which neural networks implement complex transformations from input signals to output. Advancing understanding of neural network algorithms can be facilitated by vector analysis of the deep learning process. This article provides several examples of using vector-matrix analysis for various aspects of training neural networks based on the backpropagation method. They not only help to better understand the essence of the transformations performed but also offer recommendations for speeding up and "harmonizing" the process of training neural networks. However, the main goal of the article is not so much to improve individual aspects of the operation of neural network algorithms as to demonstrate the effectiveness of applying vector-matrix analysis to studying various properties of neural network data processing.

Paper Nr: 37
Title:

Citation Polarity Identification in Scientific Research Articles Using Deep Learning Methods

Authors:

Souvik Kundu and Robert E. Mercer

Abstract: The way in which scientific research articles are cited reflects how previous works are utilized by other researchers and stakeholders and can indicate the impact of that work on subsequent experiments. Citations can be perceived as positive, negative, or neutral. While current citation indexing systems provide information on the author and publication name of the cited article, as well as the citation count, they do not indicate the polarity of the citation. This study aims to identify the polarity of citations in scientific research articles. The method uses pre-trained language models, BERT, Bio-BERT, RoBERTa, Bio-RoBERTa, ELECTRA, ALBERT, and SPECTER, as the word embeddings in a deep-learning classifier. Most citations have a neutral polarity, resulting in imbalanced datasets for training deep-learning models. To address this issue, a class balancing technique is proposed and applied to all datasets to improve consistency and results. Ensemble techniques are utilized to combine all of the model predictions to produce the highest F1-scores for all three labels.

Paper Nr: 48
Title:

Brains over Brawn: Small AI Labs in the Age of Datacenter-Scale Compute

Authors:

Jeroen Put, Nick Michiels, Bram Vanherle and Brent Zoomers

Abstract: The prevailing trend towards large models that demand extensive computational resources threatens to marginalize smaller research labs, constraining innovation and diversity in the field. This position paper advocates for a strategic pivot of small institutions to research directions that are computationally economical, specifically through a modular approach inspired by neurobiological mechanisms. We argue for a balanced approach that draws inspiration from the brain's energy-efficient processing and specialized structures, yet is liberated from the evolutionary constraints of biological growth. By focusing on modular architectures that mimic the brain's specialization and adaptability, we can strive to keep energy consumption within reasonable bounds. Recent research into forward-only training algorithms has opened up concrete avenues to include such modules into existing networks. This approach not only aligns with the imperative to make AI research more sustainable and inclusive but also leverages the brain's proven strategies for efficient computation. We posit that there exists a middle ground between the brain and datacenter-scale models that eschews the need for excessive computational power, fostering an environment where innovation is driven by ingenuity rather than computational capacity.

Paper Nr: 50
Title:

Time Series Prediction for Anomalies Detection in Concentrating Solar Power Plants Using Long Short-Term Memory Networks

Authors:

Sylwia Olbrych, Robert Jungnickel, Michael Zeng, Cher Dao Tan, Marco Kemmerling, Anas Abdelrazeq and Robert H. Schmitt

Abstract: Concentrating Solar Power (CSP) plants that use a parabolic trough system rely on Heat Transfer Fluid (HTF) to absorb thermal energy from sunlight. The heated HTF is then used in thermal power blocks to produce electricity in conventional steam generators. Unexpectedly high HTF temperatures may lead to degradation of the system components and reduced efficiency. Therefore, closely monitoring and maintaining the HTF's operational temperatures is crucial to ensure the system's efficiency and longevity. This paper focuses on the detection of over-temperature anomalies of HTF in CSP plants. Encoder-decoder Long Short-Term Memory (LSTM) networks are applied to predict HTF temperature in a time series, and subsequently, anomalies are detected based on a mean absolute error threshold. The study concludes by analysing the effectiveness of the encoder-decoder LSTM-based method in detecting over-temperature anomalies in historical plant data. The proposed approach allows operators to take preventive measures before any potential alarms by providing a 300-second forecast window.
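
The anomaly-flagging step described above (forecast, compare, threshold) can be illustrated as follows. The encoder-decoder LSTM forecaster is stubbed out with a naive persistence forecast, and the temperature values and 5 °C error threshold are illustrative, not the paper's.

```python
# Sketch of forecast-error thresholding for anomaly detection. The LSTM
# forecaster is replaced by a naive persistence forecast; the data and the
# threshold are illustrative stand-ins.
import numpy as np

def naive_forecast(series):
    # Stand-in for the LSTM: predict each step as the previous observation.
    return np.concatenate([[series[0]], series[:-1]])

def flag_anomalies(series, threshold):
    """Flag time steps where the absolute forecast error exceeds threshold."""
    errors = np.abs(series - naive_forecast(series))
    return np.where(errors > threshold)[0]

temps = np.array([390.0, 391.0, 390.5, 405.0, 391.0, 390.0])  # HTF temps (°C)
print(flag_anomalies(temps, threshold=5.0))  # flags steps 3 and 4: the
# over-temperature spike and the equally abrupt recovery both exceed 5 °C
```

Swapping `naive_forecast` for a trained encoder-decoder LSTM that emits a 300-second forecast window turns this post-hoc check into the preventive early warning the abstract describes.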

Paper Nr: 80
Title:

OBBabyFace: Oriented Bounding Box for Infant Face Detection

Authors:

José Carlos Reyes-Hernández, Antonia Alomar, Ricardo Rubio, Gemma Piella and Federico Sukno

Abstract: This study presents an infant-specific face detection approach that addresses the existing gap in facial detection for non-adults, where the typical bias is toward adult faces. A new infant faces dataset was created to enhance Deep Learning (DL) models' ability to accurately detect infant faces, comprising over 8,862 images with diverse orientations. We introduce Oriented Bounding Boxes (OBB) to account for greater variability in face orientations observed in infants, offering precise alignment to their orientation, a significant improvement over traditional Axis-Aligned Bounding Boxes (AABB). Employing the YOLOv8-OBB architecture, our model is trained and compared against state-of-the-art models such as RetinaFace and MogFace. The results show that our approach outperforms state-of-the-art methods in precision and recall, particularly in non-frontal facial orientations. The proposed infant face detector marks a major advancement in pediatric face detection technology, offering a robust foundation for future advancements in medical monitoring and developmental diagnosis.

Paper Nr: 21
Title:

CNN-N-BEATS: Novel Hybrid Model for Time-Series Forecasting

Authors:

Konstandinos Aiwansedo, Jérôme Bosche, Wafa Badreddine, M. H. Kermia and Oussama Djadane

Abstract: Time-series forecasting (TS) is a vital tool for scientific study and has many applications in a wide range of disciplines, including engineering, economics, finance, environmental science, weather, energy, etc. Unquestionably, the ability to accurately forecast trends and patterns from historical load data in particular is a benefit that is especially helpful in decision-making scenarios and aids in understanding the underlying mechanisms connected to observable events. This paper aims to tackle this problem by proposing a novel hybrid forecasting model for time-series forecasting. The proposed model consists of a Convolutional Neural Network (CNN) coupled with a Neural Basis Expansion Analysis for interpretable Time-Series forecasting model (N-BEATS), dubbed CNN-N-BEATS. The CNN-N-BEATS model outperforms popular forecasting techniques, as it proposes a mechanism for dealing with a persistent problem in time-series forecasting: the distribution shift that tends to occur during forecasting. In addition, we also implement a technique called Weighted Linear Stacking (WLS) for improving time-series accuracy when a large amount of historical load data is available. The WLS technique combines the output of numerous forecasting models in order to achieve better accuracy. This research also shows that load forecasting using only historical data can produce results that are as accurate as those achieved by leveraging additional exogenous variables during the forecasting process.
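
Weighted Linear Stacking, as described above, combines several models' forecasts with learned weights. A minimal sketch, assuming the weights are fitted by least squares on historical data and using synthetic stand-in forecasts (the paper's exact fitting procedure may differ):

```python
# Minimal sketch of Weighted Linear Stacking (WLS): fit weights by least
# squares on historical data, then combine model forecasts. The forecasts
# here are synthetic stand-ins for real model outputs.
import numpy as np

def fit_stacking_weights(model_forecasts, targets):
    """Least-squares weights for combining model forecasts (one per column)."""
    w, *_ = np.linalg.lstsq(model_forecasts, targets, rcond=None)
    return w

def stack(model_forecasts, weights):
    """Apply the fitted weights to a matrix of per-model forecasts."""
    return model_forecasts @ weights

targets = np.array([10.0, 12.0, 11.0, 13.0])     # historical load values
forecasts = np.column_stack([
    targets + 1.0,   # model A: constant positive bias
    targets - 1.0,   # model B: constant negative bias
])
w = fit_stacking_weights(forecasts, targets)
combined = stack(forecasts, w)
print(np.round(combined, 2))  # the opposing biases cancel under the weights
```

Here the two stand-in models have opposite biases, so the fitted weights split evenly and the combination recovers the targets exactly; with real forecasters the weights instead reflect each model's historical accuracy.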

Paper Nr: 62
Title:

Empirical Performance of Deep Learning Models with Class Imbalance for Crop Disease Classification

Authors:

Sèton Calmette Ariane Houetohossou, Castro Gbêmêmali Hounmenou, Vinasetan Ratheil Houndji and Romain Glele Kakaï

Abstract: Class imbalance refers to a situation where the number of observations in the different classes of a dataset is not equally distributed. This situation is frequently encountered in agriculture when classifying crop diseases. It can lead to challenges in training deep learning models, as they may become biased toward the majority class and perform poorly in predicting the minority class. One common approach to address class imbalance is resampling, such as oversampling the minority class or undersampling the majority class. This study examined the performance of deep learning architectures (GoogleNet, VGG16, and ResNet50) for disease classification of tomatoes, peppers, and peaches under class imbalance. Data were collected online from different websites (PlantVillage and PlantDisease). Each model was run in transfer learning and evaluated in three situations: without balancing, with Random Over Sampling (ROS), and with Random Under Sampling (RUS). The batch size and the number of epochs were set at 32 and 10, respectively. Recall, F1 score, Area Under the Receiver Operating Characteristic Curve, and computing time were recorded. Results indicated that RUS significantly improves the precision, recall, and F1 score for GoogleNet despite a longer processing time than ROS. For VGG16, ROS proves superior in terms of learning time and performance. ROS and RUS enable ResNet50 to maintain high performance in the face of increasing class imbalance. Moreover, GoogleNet demonstrated greater stability in its results than VGG16 and ResNet50, especially under various levels of imbalance. This study highlights the importance of data balancing while acknowledging certain limitations, such as the size of the datasets and the model parameters used, paving the way for future research to optimize these methods.
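The two resampling schemes compared in the study can be sketched as follows; the helper names and the toy class sizes are illustrative, not taken from the paper.

```python
import random

def random_over_sample(dataset, rng):
    """ROS: duplicate minority-class samples until every class matches the largest."""
    by_class = {}
    for x, label in dataset:
        by_class.setdefault(label, []).append((x, label))
    target = max(len(v) for v in by_class.values())
    balanced = []
    for samples in by_class.values():
        balanced.extend(samples)
        balanced.extend(rng.choices(samples, k=target - len(samples)))
    return balanced

def random_under_sample(dataset, rng):
    """RUS: discard majority-class samples until every class matches the smallest."""
    by_class = {}
    for x, label in dataset:
        by_class.setdefault(label, []).append((x, label))
    target = min(len(v) for v in by_class.values())
    return [s for samples in by_class.values()
            for s in rng.sample(samples, target)]

rng = random.Random(0)
# Toy imbalanced dataset: 90 "healthy" images vs 10 "diseased" images.
data = [(i, "healthy") for i in range(90)] + [(i, "diseased") for i in range(10)]
ros = random_over_sample(data, rng)   # 90 + 90 samples
rus = random_under_sample(data, rng)  # 10 + 10 samples
```

The trade-off observed in the study follows directly from the mechanics: ROS trains on more (partly duplicated) data, which costs time, while RUS trains on less data, which risks discarding information.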

Area 3 - Natural Language Understanding

Full Papers
Paper Nr: 70
Title:

Deep Learning-Based Preprocessing Tools for Turkish Natural Language Processing

Authors:

Buse Ak and Tunga Güngör

Abstract: As the demand for effective natural language processing applications in Turkish continues to rise, the need for text preprocessing tools tailored to the Turkish language increases. These tools form the initial step of any natural language application and improve the efficiency of complex tasks such as text summarization, question answering, and machine translation. We propose a novel deep learning-based framework focusing on Turkish preprocessing tasks, including tokenization, sentence splitting, deasciification, part-of-speech tagging, vowelization, spell correction, and morphological analysis. The proposed framework is suitable for independent use of each preprocessing tool as well as use in an all-in-one scheme. We use the CANINE model to train the character-level tools, and the BERT and mT5 models for the token-based tools. We evaluate the framework for each task on the BOUN Treebank in the UD project and make both the tools and the code publicly available.

Short Papers
Paper Nr: 75
Title:

Multilingual Detection of Cyberbullying on Social Networks Using a Fine-Tuned GPT-3.5 Model

Authors:

Elizabeth Nina, Jesús Pacheco and Juan Carlos Morales

Abstract: Cyberbullying on social networks has emerged as a global problem with serious consequences for the mental health of victims, mainly children and adolescents. Although there are AI-based solutions to address this issue, they face limitations such as a lack of multilingual datasets and difficulty detecting sarcasm and idioms. In this research, we present an innovative approach to effective cyberbullying detection using a fine-tuned GPT-3.5 model. Our main contribution is the creation of an extensive multi-label dataset of approximately 60,000 examples in English and Spanish, spanning diverse dialects. This dataset was obtained by combining and processing multiple datasets from reliable sources. In addition, we developed a fine-tuned model based on GPT-3.5, capable of identifying hate speech and offensive language in textual content on social networks. We conducted a thorough evaluation comparing our model to specialized solutions such as Perspective API, Moderation, Content Safety, Toxic Bert, and Gemini. The results demonstrate that our approach outperforms existing models in metrics such as precision, F1-score, and accuracy, making it the most suitable choice for effective cyberbullying detection. This research lays the groundwork for a future app where users can be alerted to harmful content online.

Paper Nr: 28
Title:

Online Job Posting Authenticity Prediction with Machine and Deep Learning: Performance Comparison Between N-Gram and TF-IDF

Authors:

Gayathri Malaichamy, Cristina H. Muntean and Anderson A. Simiscuka

Abstract: Fraudulent job postings are a widespread scam. Applicants hand over their personal details as well as processing fees to scammers when they submit an application to fake job postings, and are then defrauded of their funds. This research paper aims to help address this concern by proposing a methodology that utilizes Machine Learning, Deep Learning, and Natural Language Processing (NLP) algorithms for the detection of fake job ads online. For feature extraction, the N-Gram model (Unigram, Bigram, and Trigram) and TF-IDF (Term Frequency-Inverse Document Frequency) techniques were investigated. The results have shown that the TF-IDF feature extraction technique performed better than the N-Gram technique. The analysis considered five classifier algorithms: Naive Bayes, Random Forest, LightGBM, XGBoost, and Multi-Layer Perceptron (MLP). It was observed that the MLP classifier with the ADAM optimizer outperformed all other classifiers, with an accuracy of 95.68% and a prediction time of 13s. The second highest performer was the Naive Bayes classifier, which attained an accuracy of 95.38% and a prediction time of 0.2s.
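As a rough illustration of the TF-IDF weighting the study found most effective, here is a minimal sketch; the toy job-ad snippets and the plain whitespace tokenizer are simplifying assumptions, far cruder than a real preprocessing pipeline.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Return one {term: weight} dict per document, with
    weight = term frequency * log(n_docs / document frequency)."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))  # count each term once per document
    weighted = []
    for tokens in tokenized:
        tf = Counter(tokens)
        weighted.append({
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in tf.items()
        })
    return weighted

# Invented snippets: one scam-like ad, two ordinary ads.
ads = [
    "urgent hiring send processing fee today",
    "software engineer role with benefits",
    "software engineer role urgent",
]
vectors = tf_idf(ads)
```

Terms concentrated in few documents ("fee") receive high weights, while terms spread across documents are discounted, which is why TF-IDF can separate scam vocabulary better than raw n-gram counts.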

Paper Nr: 63
Title:

Automating the Conducting of Surveys Using Large Language Models

Authors:

Trevon Tewari and Patrick Hosein

Abstract: Conducting surveys is a popular data collection method used primarily in the social sciences. Conducting these surveys electronically in remote areas can be challenging due to the lack of the required Internet infrastructure. Traditional phone surveys are typically used in such cases since mobile phones have become pervasive. However, this process can be time-consuming since an individual must record, collate, and update the collected data. We propose using Large Language Models (LLMs) to process an audio recording of the session, extract the responses, and store them in a database. In our proof of concept the survey questions were asked by a human, but even this part can be automated by having a robot make the call, take the survey, and then process the responses. A NoSQL database was used to store the survey. The recorded survey was processed using OpenAI's Whisper for the speech-to-text step. The resulting text was passed to a Large Language Model (GPT-4), which extracted the responses to each question using both the survey information and the extracted text from the recording. The result is uploaded to a database through which questions can be asked about the responses. The concept was tested and achieved 97% accuracy for multiple choice questions.

Area 4 - Machine Learning

Full Papers
Paper Nr: 24
Title:

Scoping Review of Active Learning Strategies and Their Evaluation Environments for Entity Recognition Tasks

Authors:

Philipp Kohl, Yoka Krämer, Claudia Fohry and Bodo Kraft

Abstract: We conducted a scoping review for active learning in the domain of natural language processing (NLP), which we summarize in accordance with the PRISMA-ScR guidelines as follows: Objective: Identify active learning strategies that were proposed for entity recognition and their evaluation environments (datasets, metrics, hardware, execution time). Design: We used Scopus and ACM as our search engines. We compared the results with two literature surveys to assess the search quality. We included peer-reviewed English publications introducing or comparing active learning strategies for entity recognition. Results: We analyzed 62 relevant papers and identified 106 active learning strategies. We grouped them into three categories: exploitation-based (60x), exploration-based (14x), and hybrid strategies (32x). We found that all studies used the F1-score as an evaluation metric. Information about hardware (6x) and execution time (13x) was only occasionally included. The 62 papers used 57 different datasets to evaluate their respective strategies. Most datasets contained newspaper articles or biomedical/medical data. Our analysis revealed that 26 out of 57 datasets are publicly accessible. Conclusion: Numerous active learning strategies have been identified, along with significant open questions that still need to be addressed. Researchers and practitioners face difficulties when making data-driven decisions about which active learning strategy to adopt. Conducting comprehensive empirical comparisons using the evaluation environment proposed in this study could help establish best practices in the domain.

Paper Nr: 35
Title:

Secure Coalition Formation for Federated Machine Learning

Authors:

Subhasis Thakur

Abstract: Federated machine learning allows multiple collaborative parties to build a machine learning model in a distributed fashion where the data is distributed among the parties. There can be restrictions on the participation of such parties in a federated machine learning process. For example, parties may have spatial restrictions such that only a limited number of parties from one locality can participate in one federated machine learning process, or only a fixed number of parties may participate in one federated machine learning process. To include such restrictions in federated machine learning we need to consider the optimality of such group formation. This optimality condition will ensure that each group's aggregated DNN model reaches sufficient accuracy. In this paper, we investigate such constraints on the federated machine learning process. Specifically, we explore party-affiliation game-based partitioning of the parties for a federated machine learning model. Partition function games find optimal and stable partitions (where no party is better off by changing groups to improve the accuracy of the aggregated DNN model) among the parties, and one federated machine learning model is formed in each partition. Further, we investigate methods for secure and privacy-preserving federated machine learning processes where parties are partitioned into a set of groups and the machine learning model of one party is not shared with parties in different groups. 
Our main contributions are: (a) we propose a federated machine learning protocol for a group of participants where the number of parties in each group may not exceed a predefined threshold; (b) we propose a federated machine learning protocol that forms a stable partition among the parties such that no party can leave a group and join another group to improve the accuracy of aggregated DNN models; (c) our proposed protocol is secure as it prevents a party from reporting a wrong DNN model to the aggregator; (d) it is secure as it ensures that the aggregator correctly executes the coalition formation protocol; and (e) it is secure and privacy-preserving as it ensures that valid DNN model parameters are exchanged among the parties.

Paper Nr: 52
Title:

Bayes Classification Using an Approximation to the Joint Probability Distribution of the Attributes

Authors:

Patrick Hosein and Kevin Baboolal

Abstract: The Naive Bayes classifier is widely used due to its simplicity, speed, and accuracy. However, this approach fails when, for at least one attribute value in a test sample, there are no corresponding training samples with that attribute value. This is known as the zero frequency problem and is typically addressed using Laplace Smoothing. However, Laplace Smoothing does not take into account the statistical characteristics of the neighbourhood of the attribute values of the test sample. Gaussian Naive Bayes addresses this, but the resulting Gaussian model is formed from global information. We instead propose an approach that estimates conditional probabilities using information in the neighbourhood of the test sample. In this case we no longer need to make the assumption of independence of attribute values and hence consider the joint probability distribution conditioned on the given class, which means our approach (unlike the Gaussian and Laplace approaches) takes into consideration dependencies among the attribute values. We illustrate the performance of the proposed approach on a wide range of datasets taken from the University of California at Irvine (UCI) Machine Learning Repository. We also include results for the k-NN classifier and demonstrate that the proposed approach is simple, robust, and outperforms standard approaches.
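One way to read the proposed neighbourhood estimate is sketched below, under our own assumptions: a fixed Euclidean radius and toy two-dimensional data, neither of which is specified in the abstract. The point of the sketch is only that the joint likelihood is estimated from nearby samples of each class rather than from per-attribute counts.

```python
import math

def neighbourhood_bayes(train, x, radius=1.5):
    """Pick the class maximizing prior * fraction of that class's
    training samples falling within `radius` of the test point."""
    classes = {label for _, label in train}
    best, best_score = None, -1.0
    for c in classes:
        members = [xi for xi, label in train if label == c]
        prior = len(members) / len(train)
        # Joint P(x | c) estimated locally, with no independence assumption.
        in_ball = sum(1 for xi in members if math.dist(xi, x) <= radius)
        score = prior * (in_ball / len(members))
        if score > best_score:
            best, best_score = c, score
    return best

# Toy data: two well-separated classes in the plane.
train = [((0.0, 0.0), "A"), ((0.5, 0.2), "A"), ((0.2, 0.6), "A"),
         ((3.0, 3.0), "B"), ((3.4, 2.8), "B"), ((2.9, 3.5), "B")]
label = neighbourhood_bayes(train, (0.3, 0.3))
```

Because the count is taken over full attribute vectors in the neighbourhood, attribute dependencies are captured implicitly, which is the contrast with the per-attribute Laplace and Gaussian treatments.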

Paper Nr: 54
Title:

Pollutant Source Localization Based on Siamese Neural Network Similarity Measure

Authors:

Sidi Mohammed Alaoui, Khalifa Djemal, Ehsan Sedgh Gooya, Amir Ali Feiz, Ayman Alfalou and Pierre Ngae

Abstract: In this paper, we present an optimization methodology for reducing the number of sensors in an existing monitoring network. These sensors measure the concentration of pollutant gas in the air in order to estimate the position and intensity of a pollutant source. In a Hierarchical Agglomerative Clustering (HAC) framework we aim to group sensors with the same behavior based on a similarity measure, and then keep only one sensor from each cluster. Unlike previous studies that used the Pearson correlation coefficient and Euclidean distance, our work uses a similarity measure based on Siamese Neural Networks (SNN). The methodology was tested on simulated measurements based on real atmospheric conditions, and Markov Chain Monte Carlo (MCMC) in a Bayesian inference framework was used to identify the source position and intensity.

Paper Nr: 67
Title:

Efficient Deep Neural Network Verification with QAP-Based zkSNARK

Authors:

Subhasis Thakur and John Breslin

Abstract: In MLaaS, DNN models are kept on a server operated by the service provider and inputs to the DNN models are provided by the clients. Such inputs are used to execute the DNN models and classification results are sent back to the client. In MLaaS, the DNN model owner does not reveal the DNN model parameters to the client. MLaaS has a few trust problems: (a) the server may not be secure and an attacker may send manipulated classification results to the client; in the case of safety-critical systems using such classifications in their decision-making process, an attacker may specifically manipulate the classification result to disrupt the operations of the safety-critical system; (b) the server may intentionally send wrong or random classification results without executing the DNN model in order to respond to a massive number of classification requests from the clients. In this paper, we investigate the problem of verifying DNN model execution by the service provider in an MLaaS paradigm. A proof of DNN model execution will prove that, given an input, the DNN model was executed to generate the classification result by providing the sequences of outputs of all functions used in the DNN model. As the service provider in MLaaS does not share the DNN model with the client, we need to verify DNN function outcomes without knowledge of the DNN function parameters. Hence zero-knowledge proofs can be used for verifying DNN model execution. In this paper, we use Zero-Knowledge Succinct Non-interactive Arguments of Knowledge (zk-SNARKs), which reduce the size of the proof and the complexity of proof verification considerably. In particular, we use a quadratic arithmetic program (QAP)-based zkSNARK for DNN model verification. Our main results are as follows: (a) we developed a DNN model execution verification method using a QAP-based zkSNARK; (b) we prove that the verification protocol is correct and privacy-preserving; (c) we analyzed the cost of using such a verification protocol.

Paper Nr: 69
Title:

Investigating a Semantic Similarity Loss Function for the Parallel Training of Abstractive and Extractive Scientific Document Summarizers

Authors:

Sudipta Singha Roy and Robert E. Mercer

Abstract: Scientific document summarization focuses on condensing scientific literature, research papers, and technical documents into concise summaries while preserving crucial scientific concepts, findings, and conclusions. In this work, we present a novel loss function that incorporates semantic similarity, and use it in the parallel training of extractive and abstractive summarizers, thereby improving the performance of the individual summarizer units. The new loss function is a union of the summarizer cross-entropy losses and the semantic similarity losses among the generated and reference summaries. To validate the effectiveness of the proposed loss function in conjunction with the parallel training, the experiments use a combination of four recent state-of-the-art extractive summarizers and four recent state-of-the-art abstractive summarizers. Results indicate that for all combinations, the extractive and abstractive summarizers both gain significant performance boosts. It is conjectured that the new semantic similarity-induced cross-entropy loss combined with the parallel training will improve any combination of quality extractive and abstractive summarizers.

Short Papers
Paper Nr: 33
Title:

More than Noise: Assessing the Viscosity of Food Products Based on Sound Emission

Authors:

Dominik Schiller, Silvan Mertes, Marcel Achzet, Fabio Hellmann, Ruben Schlagowski and Elisabeth André

Abstract: In the era of Industry 4.0, manufacturing is rapidly shifting towards automation, particularly in processes such as quality control, production lines, and logistics. However, the food industry poses distinctive challenges to automation due to the variability in raw materials and stringent hygiene standards. Sensory analysis is crucial in maintaining consistent quality and safety while manufacturing food products. This paper focuses on the automatic estimation of viscosity, a key quality parameter for many food products. An often overlooked aspect is the potential correlation between viscosity and sound emissions. While conventional methods for determining viscosity require expensive equipment, this research investigates the possibility of analyzing the acoustic emission when a liquid is sucked through a vacuum pump to determine its viscosity. By simulating industry-like food products of varying viscosity through different flour and water mixtures, we aim to investigate the feasibility of developing an automatic, deep-learning-based system for real-time viscosity estimation in manufacturing processes. Our results indicate that our proposed methodology can automatically determine differences in viscosity, showing the feasibility of using sound emission analysis as a tool for viscosity estimation.

Paper Nr: 39
Title:

Exploring Physiology-Based Classification of Flow During Musical Improvisation in Mixed Reality

Authors:

Ruben Schlagowski, Silvan Mertes, Dominik Schiller, Yekta S. Can and Elisabeth André

Abstract: The flow state is desirable in many activities, e.g., while making music or being active in Virtual, Augmented, and Mixed Realities. With the long-term goal of creating affective systems that can consider the user's flow state in real-time, we evaluated an approach for real-time flow classification during networked music performance using a deep neural network. We trained our classifier based on physiological signals (PPG and GSR) that we recorded and annotated in a laboratory study, including jamming musicians. The results that we present in this paper confirm the technical validity of this approach while also facing challenges that stem from inter-rater reliability and a heavily unbalanced dataset.

Paper Nr: 74
Title:

Skin Cancer Classification: A Comparison of CNN-Backbones for Feature-Extraction

Authors:

Anna-Lena Vischer, Jiayu Liu, Sinclair Rockwell-Kollmann, Stefan Günther and Klemens Schnattinger

Abstract: In order to classify histopathology images into 11 classes of cancerous and non-cancerous skin diseases, the pharmaceutical bioinformatics research group at the University of Freiburg implemented a pipeline based on the TransMIL model [1]. To improve the overall performance, we compared four different CNNs for feature extraction in the TransMIL preprocessing pipeline CLAM [2]. Comprehensive evaluations, including detailed analyses of loss, accuracy, F1 score, and attention focus, indicate that ConvNextV2 [3] falls short in comparison to ResNet50 [4], DenseNet201 [5], and EfficientNet [6], which demonstrate nearly equal performance across classes and achieve higher average accuracy. Surprisingly, despite its size, EfficientNet showed slightly better results than DenseNet201 and ResNet50. Moreover, the present pipeline exhibits uneven performance across the various disease classes, with particular difficulty in distinguishing between three of the classes, likely due to the inherent complexities associated with these categories. Despite these challenges, the EfficientNet model remains the most balanced among all those evaluated.

Paper Nr: 82
Title:

EEG-Based Patient Independent Epileptic Seizure Detection Using GCN-BRF

Authors:

Raghad Alqirshi and Samir Brahim Belhaouari

Abstract: Epilepsy affects an estimated 50–70 million people worldwide, making it one of the most prevalent neurological disorders. Detecting seizures, a primary symptom of epilepsy, poses significant challenges due to the limitations of patient-specific models and the complex nature of EEG signal analysis. Traditional methods, which focus on isolated EEG channels, often fail to capture the dynamic interconnections within the brain's network that are essential for accurate seizure detection. This study introduces a novel, patient-independent approach using Graph Convolutional Networks (GCNs) to process graph-structured data representing the brain's network. Our methodology employs a comprehensive feature set derived from Random Forest selection, encompassing 37 node-specific features and two global features: the GCN's classification output and an eigenvector from the correlation matrix. This rich feature representation allows for an in-depth analysis of the structural properties of EEG data. The proposed model was evaluated on unseen data from four patients and demonstrated exceptional generalizability and performance, achieving notable metrics such as 91.70% accuracy, 91.32% precision, 88.71% sensitivity, and a 91.57% F1-Score in patient-independent settings. Further patient-specific evaluations reinforced the model's efficacy, with near-perfect scores across all key metrics. Our findings highlight the potential of GCNs to overcome existing challenges in seizure detection, offering a promising direction for epilepsy care and contributing valuable insights into the analysis of neurological disorders.

Paper Nr: 18
Title:

A Deep Learning-Based Plant Disease Detection and Classification for Arabica Coffee Leaves

Authors:

Harshitha P. Somanna, Paul Stynes and Cristina H. Muntean

Abstract: Coffee leaf disease is a growing concern in coffee agroforestry, predominantly caused by pathogenic fungi and, to a lesser extent, bacteria and viruses, reducing the yield and adversely affecting the quality of the coffee. Detecting and controlling these diseases in their early stages represents a formidable challenge, since traditional methods rely on visual observation by experts and often fail to give an accurate diagnosis. Machine learning (ML) techniques are alternative solutions for automating the classification of plant diseases, and with the rapid advancements in deep learning methods, there is potential to identify and recognize coffee leaf diseases at early stages, thereby supporting efforts to enhance crop yield. However, there is a notable gap in research, particularly regarding the detection of coffee leaf diseases on larger datasets. This research study aims to provide a comprehensive understanding of the strengths and weaknesses of various deep learning models and transfer learning approaches, namely EfficientNetB0, MobileNetV2, CNN, and VGG16, shedding light on their effectiveness in addressing the complexities of the multi-class label problem in the context of leaf disease detection in Arabica coffee leaves. Utilizing the "JMuBEN" dataset with 58,405 images across five classes (Phoma, Cercospora, Leaf Rust, Miner, and healthy leaves), the research comprehensively assesses each model's efficacy. The test accuracy of the models ranged from 32.49% to 99.72%, with the best-performing model, EfficientNetB0, outperforming all other models in this study with a test accuracy of 99.72% and an overall F1 score, recall, and precision of 99.7%. Beyond Arabica, the findings may extend to Robusta and have broader applications in crop disease detection.

Paper Nr: 26
Title:

Solar Activity Impact on Firefighter Interventions: Factors Analysis

Authors:

Naoufal Sirri and Christophe Guyeux

Abstract: This research is a natural continuation of studies exploring various categories of variables and their implications for firefighters. The motivation behind this study stems from the acknowledgment that solar activity, although it may affect health and the environment, is typically not an immediate priority for firefighting interventions. Firefighters are primarily summoned for emergency situations, prompting contemplation of how solar activity, despite its environmental impact, might exert influence on their operations. In this context, our investigation aims to comprehensively evaluate how solar activity influences firefighting interventions, with a specific focus on intervention frequency. The study's relevance is underscored by the significant impacts of solar risks on both health and the environment. Notably, the summer 2023 heatwave assessment in France, which recorded nearly 1,500 heat-related deaths during the four heatwave episodes and more than 5,000 over the entire summer period, as documented by the Ministry of Ecology, emphasizes the crucial nature of our research. Over an eight-year period, from 2015 to 2023, our methodology encompasses data preparation, in-depth analysis, and the application of the XGBoost predictive model, known for its resilience to outliers. The iterative training pipeline selects features that enhance the RMSE score over a 24-hour horizon, highlighting the crucial importance of variables related to solar activity, especially over extended periods. The key conclusion drawn from this study is that these variables exert a progressive impact on interventions, suggesting increased relevance in predicting outcomes over prolonged durations. 
This precision in understanding the patterns associated with the presence or absence of solar risk offers a practical approach to anticipating resource needs, improving firefighter response times, and contributing to saving lives by addressing intervention failures during major incidents. This study initiates a comprehensive exploration of variable families to understand the factors influencing firefighting activities.

Paper Nr: 42
Title:

End-to-End Video Surveillance Framework for Anomaly Detection and Person Re-Identification

Authors:

Rohan Nandan, Rohan Lingeri, Rohan Mehta, Preet Kanwal and Rishita Atluri

Abstract: Unmanned surveillance systems represent a cutting-edge frontier in security technology, offering enhanced monitoring capabilities while minimizing reliance on constant human oversight. However, traditional approaches suffer from limitations due to manual monitoring, leading to potential lapses in critical event detection. In response, our project introduces a unique semi-supervised approach using the UCF-Crime dataset [1] to automate surveillance by integrating the state-of-the-art MGFN [2] model with our custom LSTM multi-class classifier and re-identification model. While MGFN [2] provides binary classification with an AUC of 86.98%, our implementation uses this along with our custom multi-class classifier, which has an AUC of 83%, to predict specific categories of anomalies such as burglary, abuse, and fighting. The detected offenders are noted and searched for in every other video feed as they appear, using the DeepSORT [3] re-identification model. Moreover, our system notifies authorities about anomalies and identifies individuals through a comprehensive dashboard interface. This fusion of models allows for more nuanced event detection, contributing to the advancement of surveillance technology and public safety.

Paper Nr: 76
Title:

Detecting Big-5 Personality Dimensions from Text Based on Large Language Models

Authors:

Joseph Killian Jr and Ron Sun

Abstract: Detecting personality from text has been a challenging problem to tackle for a variety of reasons. One is the over-reliance on small datasets. Another is the significant variation in the personality tests used. This work utilizes a large (thus far underutilized) dataset of comments from Reddit, labeled with the Big-5 dimensions, the most widely validated and well-accepted measure of personality. This work combines large language models with additional prediction layers to produce personality predictions. For model evaluation, new metrics are adopted to assess the accuracy of the model at various levels of error tolerance. Additionally, a comparison of the Mean Squared Error (MSE) with the previous best results is provided.

Paper Nr: 84
Title:

Predicting Components of a Target Value Versus Predicting the Target Value Directly

Authors:

Shellyann Sooklal and Patrick Hosein

Abstract: In many regression problems one can predict components of a target value and then combine those components to determine the target value prediction. The alternative is to predict the target value directly. A simple example is automobile insurance claims. The traditional approach is to compute Severity (the average value of claims made) and Frequency (the number of claims made per year). The product of these then provides the average money paid annually to the customer (henceforth called the claim rate). On the other hand, one can derive the claim rate for each customer and use it as the target value. Intuitively one would think that the latter approach (predicting the target value directly) should provide better results but, in fact, the former approach is better. We investigate the difference in performance of these two approaches (called component and composite predictors, respectively) and demonstrate the difference using three Machine Learning algorithms.
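The contrast between the two predictors can be made concrete with a toy computation; the per-customer numbers below are invented for illustration. Note that the two estimates differ whenever frequency and severity are correlated across customers.

```python
def mean(xs):
    return sum(xs) / len(xs)

# Invented per-customer history: (claims per year, average claim value).
history = [(0.2, 1000.0), (0.1, 3000.0), (0.4, 500.0)]

# Component approach: estimate Frequency and Severity separately, then combine.
frequency = mean([f for f, _ in history])
severity  = mean([s for _, s in history])
component_claim_rate = frequency * severity

# Composite approach: target the observed per-customer claim rate directly.
composite_claim_rate = mean([f * s for f, s in history])
```

With these numbers the component estimate is 350 while the composite estimate is about 233, so the choice of target genuinely changes what the model learns.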