DeLTA 2025 Abstracts


Area 1 - Big Data Analytics

Short Papers
Paper Nr: 28
Title:

Trojan Vulnerabilities in Host-Based Intrusion Detection Systems

Authors:

Mark Cheung, Sridhar Venkatesan and Rauf Izmailov

Abstract: Host-based intrusion detection systems (HIDS) play a critical role in cybersecurity, yet they remain vulnerable to stealthy adversarial attacks. In particular, Trojan attacks, where hidden triggers are embedded into training data, can manipulate models to misclassify malicious activity while remaining undetected. In this paper, we investigate these vulnerabilities using the DARPA OpTC dataset, a large-scale benchmark simulating real-world cyber operations. We present a trigger identification framework leveraging random forests and n-gram feature importance analysis, followed by targeted poisoning strategies that embed highly effective triggers at low injection rates. Through extensive experiments on DeepLog and LogBERT models, we demonstrate that carefully crafted trigger injections can significantly reduce detection confidence on malicious sequences without degrading performance on benign data. Additionally, we provide discernibility analysis showing that poisoned models are nearly indistinguishable from clean models based on weight inspection alone. Our findings underscore the urgent need for proactive defenses and contribute practical methodologies for trigger discovery, poisoning assessment, and model vulnerability evaluation. These results offer actionable insights for security practitioners and lay the foundation for more robust Trojan detection in operational environments.

Area 2 - Computer Vision Applications

Full Papers
Paper Nr: 21
Title:

A Fast Fourier Transform-Aided Diffusion-Based U-Net Architecture for Microscopic Medical Image Segmentation

Authors:

Saptarshi Pani, Gouranga Maity, Dmitrii Kaplun, Alexander Voznesensky and Ram Sarkar

Abstract: The rapid development of deep learning techniques has led to major advancements in medical image segmentation. The majority of segmentation models now in use are discriminative, i.e., they mostly aim to learn a mapping between the input image and its segmentation mask. These discriminative techniques, however, suffer from an unstable feature space and ignore the underlying data distribution of input samples. This issue is highly pertinent to the segmentation of microscopic medical images, which often have low contrast and intricate patterns. This paper proposes using a generative model’s understanding of the underlying data distribution to supplement discriminative segmentation techniques. Hence, a diffusion-based segmentation model is proposed in this study in combination with the Fast Fourier Transform (FFT). The proposed model integrates diffusion principles in the frequency domain. A U-Net architecture with FFT-based feature extraction aided by an attention mechanism is then designed to enhance the segmentation performance. By combining frequency-domain processing, an attention module and iterative noise reduction, the model effectively captures both global and local features, enabling precise segmentation of complex structures in microscopic medical images. The effectiveness of the model has been evaluated on three publicly available, standard and complex microscopic medical image datasets. The proposed model has obtained Dice scores of 88.13%, 88.52% and 98.57% on the TNBC, CPM17 and GlaS datasets, respectively, which are better than those of many recently proposed models found in the literature. The code implementation of the methodology is available at: FCAM-Diffusion.
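
For readers unfamiliar with the metric quoted above, the Dice score of a predicted mask against a ground-truth mask is commonly defined as twice the overlap divided by the total number of foreground pixels in both masks. The following is a minimal sketch under stated assumptions: masks are binary and flattened to 0/1 lists, the function name `dice_score` and the toy masks are illustrative, and the empty-mask convention is a choice made here, not something taken from the paper.

```python
def dice_score(pred, target):
    """Dice coefficient between two binary masks given as flat 0/1 sequences."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of pixels")
    # Overlap: pixels that are foreground in both masks.
    intersection = sum(p * t for p, t in zip(pred, target))
    total = sum(pred) + sum(target)
    # Assumed convention: two empty masks count as a perfect match.
    return 1.0 if total == 0 else 2.0 * intersection / total


# Toy example (hypothetical data): 2 overlapping pixels, 3 foreground pixels each.
pred = [1, 1, 0, 0, 1, 0]
target = [1, 0, 0, 0, 1, 1]
print(f"Dice = {dice_score(pred, target):.4f}")
```

How the per-image scores are aggregated into the dataset-level percentages is not stated in the abstract, so the sketch stops at a single mask pair.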

Paper Nr: 42
Title:

Application of Neural Networks to Ultrasonic Data for Discrimination of Fat Types in Muscle Tissue Models

Authors:

Jegors Lukjanovs, Aleksandrs Sisojevs, Alexey Tatarinov and Tamara Laimiņa

Abstract: Differential assessment of subcutaneous adipose tissue (SAT) and intermuscular adipose tissue (IMAT), two forms of muscle fat, is necessary for studying manifestations of ageing, muscle atrophy, sarcopenia, obesity and metabolic diseases such as diabetes. The discrimination of SAT and IMAT by ultrasonic measurements is difficult due to their complex influence. In the present study, machine-learning algorithms applied to key parameters extracted from ultrasound propagation signals obtained in simplified tissue models (phantoms) were investigated. The acoustical phantoms of muscle tissue were made of gelatin with oil simulating fat layers (SAT) and inner inclusions (IMAT). SAT and IMAT contents varied from zero to 50% in steps of 12.5%. A specialised recurrent neural network (RNN) architecture, the long short-term memory (LSTM) network, was used as the main method in the experiments. As a result, the SAT and IMAT contents of the objects were evaluated with an error of no more than 3% in 95% of cases.

Paper Nr: 55
Title:

RevCD: Reversed Conditional Diffusion for Generalized Zero-Shot Learning

Authors:

William Heyden, Habib Ullah, Muhammad Salman Siddiqui and Fadi Al Machot

Abstract: In Generalized Zero-Shot Learning (GZSL), we aim to recognize both seen and unseen categories using a model trained only on seen categories. In computer vision, this translates into a classification problem, where knowledge from seen categories is transferred to unseen ones by exploiting the relationships between visual features and available semantic information. However, learning this joint distribution is costly and requires one-to-one alignment with corresponding semantic information. We present a reversed conditional diffusion-based model (RevCD) that mitigates this issue by estimating the semantic density conditioned on visual inputs. Our RevCD model consists of a cross Hadamard-addition embedding of a sinusoidal time schedule, and a multi-headed visual transformer for attention-guided embeddings. The proposed approach introduces two key innovations. First, we apply diffusion models to zero-shot learning, a novel approach that exploits their strengths in capturing data complexity. Second, we reverse the process by approximating the semantic densities based on visual data, made possible through the classifier-free guidance of diffusion models. Empirical results demonstrate that RevCD achieves competitive performance compared to state-of-the-art generative methods on standard GZSL benchmarks. The complete code will be available on GitHub.

Short Papers
Paper Nr: 47
Title:

Investigating Zero-Shot Diagnostic Pathology in Vision-Language Models with Efficient Prompt Design

Authors:

Vasudev Sharma, Ahmed Alagha, Abdelhakim Khellaf, Vincent Quoc-Huy Trinh and Mahdi S. Hosseini

Abstract: Vision-language models (VLMs) have gained significant attention in computational pathology due to their multimodal learning capabilities that enhance big-data analytics of giga-pixel whole slide images (WSIs). However, their sensitivity to large-scale clinical data, task formulations, and prompt design remains an open question, particularly in terms of diagnostic accuracy. In this paper, we present a systematic investigation and analysis of three state-of-the-art VLMs for histopathology, namely Quilt-Net, Quilt-LLAVA, and CONCH, on an in-house digestive pathology dataset comprising 3,507 WSIs, each in giga-pixel form, across distinct tissue types. Through a structured ablative study on cancer invasiveness and dysplasia status, we develop a comprehensive prompt engineering framework that systematically varies domain specificity, anatomical precision, instructional framing, and output constraints. Our findings demonstrate that prompt engineering significantly impacts model performance, with the CONCH model achieving the highest accuracy when provided with precise anatomical references. Additionally, we identify the critical importance of anatomical context in histopathological image analysis, as performance consistently degraded when anatomical precision was reduced. We also show that model complexity alone does not guarantee superior performance, as effective domain alignment and domain-specific training are critical. These results establish foundational guidelines for prompt engineering in computational pathology and highlight the potential of VLMs to enhance diagnostic accuracy when properly instructed with domain-appropriate prompts.

Paper Nr: 22
Title:

Non-Cooperative Game Theory-Aided Learning of CNN Models for Skin Lesion Classification

Authors:

Diptarka Mandal, Sujan Sarkar, Siddhant Majumder, Dmitrii Kaplun, Daria Sidorina and Ram Sarkar

Abstract: Millions of people worldwide are afflicted with skin cancer every year. It occurs when DNA damage from UV radiation from the sun or tanning beds causes skin cells to proliferate out of control. Skin lesion analysis is one of the many medical imaging tasks where computer vision models are widely used. Convolutional Neural Networks (CNNs), in particular, are deep learning models that have demonstrated proficiency in extracting pertinent information from images and offer high accuracy in classification tasks. To train several CNN models simultaneously on a dataset, this work proposes a novel non-cooperative game theory-based method that forces the models to compete with one another. The interaction between the models not only improves their overall performance, but also underscores the efficiency of our approach in optimizing training time. This approach is evaluated on two publicly available skin lesion datasets, namely the HAM10000 dataset for multi-class classification, yielding 92.93% accuracy, and the PH2 dataset for binary classification, achieving 99.88% accuracy. The code of the proposed methodology can be found at: https://github.com/Cmatermedicalimageanalysis/Noncoperative-Game-Theory-based-skin-lesion-classification (GitHub repository).

Paper Nr: 26
Title:

Leveraging Synthetic Data for Deep-Learning-Based Road Crack Segmentation from UAV Imagery

Authors:

Christos Kyrkou and Andriani Panagi

Abstract: One of the critical tasks in monitoring road infrastructure is the identification of road cracks. Recent efforts have been made to utilise Unmanned Aerial Vehicles (UAVs) to automate this task without interfering with the road network traffic and infrastructure. However, high-quality annotated datasets that allow the development of reliable deep learning models for this purpose are scarce. Synthetic data generation offers a promising alternative to mitigate this issue by reducing annotation costs and enhancing the dataset’s variability. This paper presents a comparative study of two state-of-the-art deep learning models (UNet with EfficientNet and UNet with MobileNet) for road crack segmentation, trained with three loss functions (Dice Loss, Focal Loss, and Weighted Binary Cross-Entropy Loss (WBCEL)) on synthetic datasets generated in three different ways. Performance evaluation indicates that the best results were achieved using UNet with a MobileNet encoder, trained with WBCEL and synthetic images, yielding an mIoU of 63.52% when tested on real crack imagery. A more granular analysis underlines how both synthetic data realism and the choice of the loss function impact segmentation accuracy. Our preliminary study concludes that well-designed synthetic data and appropriate loss functions have the potential to allow better generalization of the model to real-world scenarios.

Area 3 - Models and Algorithms

Full Papers
Paper Nr: 20
Title:

Diagnostic Trouble Codes Prediction with DTC-GOAT and Ensembles

Authors:

Abdul Basit Hafeez, Atif Riaz and Eduardo Alonso

Abstract: Diagnostic Trouble Codes (DTCs) produced by On-Board Diagnostic Systems (OBDs), and the research focused on their use for predictive maintenance, have been around for a while now. In the last few years, we have witnessed advances in how these DTCs are utilised to perform self-supervised end-to-end prediction with the introduction of sequential prediction models, where the goal is to use past fault events to predict the next DTC fault event. These models mainly use neural embeddings to encode the DTCs, along with their features, before applying neural networks capable, in turn, of processing sequential data. For instance, DTC-TranGru, which uses a GRU layer on top of a Transformer, has reported better results than LSTM and Attention-based models. In this paper, we first put forward an enhanced version of the DTC-TranGru model called DTC-GOAT (GRU's Optimized Alignment with Transformer), proposing optimizations including a better alignment of the Transformer with the GRU's output, end-of-sequence (EOS) tokens, and strategically placed 1D spatial-dropout layers, to boost the accuracy of DTC prediction. Secondly, we introduce a so-called Ensemble approach that uses multiple models for next-DTC prediction and show that it gives slightly higher top-5 accuracy results than the individual models.

Paper Nr: 39
Title:

Variational Mode Decomposition (VMD) Parameter Selection Using Sine-Cosine Algorithm (SCA): Application on Vibration Signals for Rotating Machinery Monitoring

Authors:

Ikram Bagri, Achraf Touil, Ahmed Mousrij, Aziz Hraiba and Karim Tahiry

Abstract: The role of rotating machinery in industrial operations is fundamentally important, necessitating proficient maintenance strategies that are significantly dependent on accurate fault diagnosis methodologies. The present study introduces an optimized Variational Mode Decomposition (VMD) strategy intended for the analysis of vibration signals for the monitoring of this machinery. The proposed approach employs the Sine-Cosine Algorithm (SCA) to refine VMD parameters, namely, the number of modes (K), the penalty factor (α), and the convergence tolerance (τ), using an energy difference metric as a performance criterion. Using the Case Western Reserve University bearing dataset, an optimal configuration has been identified that displays a significant reduction in energy discrepancies between original signals and their decomposed components. Furthermore, an examination of the frequency content, coupled with statistical and correlation analyses, has validated the quality of the decomposition and elucidated the impact of VMD parameters on energy conservation. The synthesis of these analyses demonstrated the efficacy of the proposed methodology in the precise selection of VMD parameters for the analysis of vibration signals of faulty machinery components.

Paper Nr: 49
Title:

Achieving Zero False Negatives: Optimizing Anomaly Detection with Genetic Neural Architecture Search

Authors:

Rabie Najem and Mohammed Benjelloun

Abstract: Neural Architecture Search (NAS) methods, which aim to identify the best architecture for a given problem, have demonstrated their effectiveness across various domains, from computer vision to natural language processing. These approaches have significantly contributed to optimizing performance while addressing constraints such as computational efficiency and resource management. The Genetic Neural Architecture Search (GeNAS) proposed in this work illustrates the potential of NAS to go beyond its traditional objective of finding optimal architectures. While NAS is often employed to address constraints such as memory management and latency reduction, our study focuses on the critical challenge of minimizing False Negatives, with the ultimate goal of achieving Zero False Negatives (ZFN). To address this challenge, we integrate a methodology based on the Augmented Lagrangian Method (ALM), allowing for a better consideration of specific problem constraints. By adopting this targeted strategy, GeNAS demonstrates its effectiveness in tackling critical problems that require both high performance and enhanced sensitivity.

Short Papers
Paper Nr: 15
Title:

End-to-End ASR Model with Iterative Attention Mechanism Enhanced RNN Model for Phoneme Recognition

Authors:

Ke Fang and Yancong Deng

Abstract: Automatic Speech Recognition (ASR) systems have significantly advanced with the integration of deep learning techniques, particularly neural networks and attention mechanisms; however, challenges remain in accurately modeling variable-length sequences and capturing complex temporal dependencies inherent in speech data. In this paper, we propose an enhanced end-to-end ASR model that incorporates a novel iterative attention mechanism, which loops through the attention layer multiple times with independent Dense and Dropout layers in each iteration. This design enables the model to focus on different aspects of the input sequence, effectively enhancing its ability to capture nuanced temporal patterns. Evaluated on the TIMIT dataset, a benchmark known for its comprehensive phonetic coverage, our model achieves a Phoneme Error Rate (PER) of 15.9%, outperforming existing neural network-based models. The results demonstrate that our iterative attention mechanism offers a more flexible and accurate solution for speech recognition tasks, addressing challenges associated with variable-length sequences and noisy environments, and contributing to the development of more robust ASR systems.

Paper Nr: 46
Title:

Context-Aware Deep Learning for Longitudinal Data Imputation in Parkinson's Disease

Authors:

Moad Hani, Nacim Betrouni, Fatima Zahra Ouardirhi, Saïd Mahmoudi and Mohammed Benjelloun

Abstract: Missing data in longitudinal Parkinson’s Disease (PD) studies present significant challenges, particularly when missingness correlates with disease severity, introducing systematic biases that compromise predictive validity. We present the first comprehensive benchmark of 14 imputation methods (6 cross-sectional, 5 longitudinal, 3 generative) on the Parkinson’s Progression Markers Initiative dataset (N = 1,483) across different missingness mechanisms. Our evaluation reveals that generative methods significantly outperform traditional approaches, with Variational Autoencoder-based Multiple Imputation (VAEM) achieving the best performance (MAE = 3.87, R² = 0.449) compared to MICE (MAE = 4.15, R² = 0.401) and Linear Mixed Models (MAE = 5.42, R² = 0.232). Importantly, while traditional methods degrade by 36.6% under Missing Not At Random conditions, generative approaches maintain robustness with only a 17.6% performance reduction. Subgroup analysis reveals persistent demographic disparities, with 23% higher imputation errors for patients over 70 compared to those under 60, despite VAEM maintaining consistent performance (<5% variance) across education levels. Based on these findings, we propose a novel context-aware architecture that integrates demographic, clinical, and temporal information through attention mechanisms to improve imputation accuracy while mitigating demographic biases inherent in PD progression modeling. All code, models, and evaluation frameworks will be publicly released to advance equitable healthcare AI.
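
The two figures of merit quoted above, mean absolute error (MAE) and the coefficient of determination (R²), can be computed as in the sketch below. It assumes, as is usual for imputation benchmarks but not stated in the abstract, that both are evaluated on entries that were artificially masked and then imputed; the function names and the toy values are illustrative only.

```python
def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between held-out and imputed values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical held-out values and two imputations of them.
held_out = [1.0, 2.0, 3.0, 4.0]
mean_imputation = [2.5, 2.5, 2.5, 2.5]   # always predicts the mean
perfect_imputation = [1.0, 2.0, 3.0, 4.0]

print(mean_absolute_error(held_out, mean_imputation), r_squared(held_out, mean_imputation))
print(mean_absolute_error(held_out, perfect_imputation), r_squared(held_out, perfect_imputation))
```

Imputing the mean everywhere gives R² = 0 by construction, which is a useful baseline when reading R² values such as those reported above.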

Paper Nr: 57
Title:

Toward an Explainable Heatmap-Based Deep Neural Network for Product Defect Classification and Machine Failure Prediction in Industry 4.0

Authors:

Tojo Valisoa Andrianandrianina Johanesa, Lucas Equeter, Sidi Ahmed Mahmoudi and Pierre Dehombreux

Abstract: In the context of Industry 4.0, machine maintenance and product quality control are crucial for manufacturing efficiency and reliability. This paper introduces a novel approach based on heatmap transformation and deep neural networks for product defect classification and machine failure prediction from tabular data, including static numerical and time-series data. Unlike existing approaches that analyze numerical values using either a single record or a sequence of records as input, our method converts these inputs into heatmaps. This allows for visualizing multivariate process parameters and detecting signs of defects or failures through color variations using image-based classification models. The method also incorporates an explainability approach that leverages existing image-based explainability techniques to identify specific parameters and values associated with defects or failures. This provides operators with valuable insights to help identify the root causes of problems. The approach has shown promising results when applied to two public datasets from real industrial use cases.

Paper Nr: 43
Title:

SwiNight: Class Imbalanced Night-Time Accident Detection with Swin Transformer

Authors:

Shrusti Porwal, Preety Singh, Anukriti Bansal, Saumilya Gupta, Kartikay Goel and Palakurthy Guneeth

Abstract: Night-time accident detection is a challenging task due to the scarcity of anomalous frames in the dataset. In this paper, we present a dataset of night-time accidents. We propose a Swin Transformer-based model for detecting accidents, specifically addressing the issue of dataset imbalance. The performance of the model is evaluated using different loss functions. Our experiments demonstrate that the focal loss function outperforms the others, achieving an F1-Score of 0.710 and an accuracy of 79.77%. Experiments also reveal that the Swin Transformer delivers superior performance compared to a Vision Transformer.
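
The focal loss mentioned above addresses class imbalance by down-weighting examples the model already classifies confidently. In its standard binary form it is FL(p_t) = -α_t (1 - p_t)^γ log(p_t), where p_t is the predicted probability of the true class. The sketch below illustrates that formula for a single prediction; the defaults γ = 2 and α = 0.25 are the commonly used values and are an assumption here, since the abstract does not give the settings used in the paper.

```python
import math


def binary_focal_loss(p, y, gamma=2.0, alpha=0.25):
    """Focal loss for one binary prediction.

    p: predicted probability of the positive class (e.g. "accident").
    y: true label, 1 for positive and 0 for negative.
    gamma, alpha: assumed defaults, not values from the paper.
    """
    p_t = p if y == 1 else 1.0 - p              # probability assigned to the true class
    alpha_t = alpha if y == 1 else 1.0 - alpha  # class-balancing weight
    p_t = min(max(p_t, 1e-12), 1.0)             # guard against log(0)
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


# A confidently correct positive contributes far less than a badly missed one.
easy = binary_focal_loss(0.9, 1)
hard = binary_focal_loss(0.1, 1)
print(easy < hard)
```

With γ = 0 the modulating factor disappears and the expression reduces to α-weighted cross-entropy, which makes the effect of γ easy to check in isolation.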

Area 4 - Natural Language Understanding

Full Papers
Paper Nr: 23
Title:

LoRA-Based Summarization of Data Privacy Clauses in Terms and Conditions Documents Aligned with India’s 2023 Digital Personal Data Protection Act

Authors:

Preet Kanwal, Amish Gupta, Sai Mohananshu J and Prasad B Honnavalli

Abstract: The increasing complexity and legal jargon found in the terms and conditions documents of various organizations pose significant challenges for users attempting to understand potential privacy risks. To address this issue, we developed a method to automatically summarize these documents, particularly focusing on clauses that could lead to privacy breaches. Furthermore, our work aligns with the principles outlined in India's Digital Personal Data Protection Bill, passed in 2023, ensuring that our approach is not only effective but also compliant with emerging privacy regulations. Our approach involves fine-tuning a large language model, Mistral 7B, using a custom dataset derived from the TOS;DR dataset. We employed the Low-Rank Adaptation technique to optimize the model's performance while ensuring computational efficiency. On inference, our model achieved an average BERTScore of 88.414%. The results of our experiments demonstrate that our method can produce concise and semantically accurate summaries that effectively highlight potential privacy concerns, offering users a clearer understanding of the terms they agree to.

Paper Nr: 24
Title:

Comparison of AI Speech-to-Text Systems and Their Application in Artillery Command and Fire Control Systems

Authors:

Martin Blaha, Jaroslav Varecha, Jan Drábek and Jiří Novák

Abstract: This paper presents a comparative analysis of three leading AI speech-to-text (STT) systems: Descript.com, Google Vertex AI Studio (Chirp), and OpenAI Whisper. The objective of the study is to evaluate the accuracy, functionality, and potential applications of these technologies, with a particular focus on their integration into artillery command and fire control systems. The analysis outlines the evolution of speech recognition technologies, from traditional methods based on Hidden Markov Models (HMMs) to modern deep neural networks, including Recurrent Neural Networks (RNNs), Convolutional Neural Networks (CNNs), and Transformer-based architectures. Practical testing was conducted on a dataset of English and Czech recordings with varying audio quality. The results indicate that Google Chirp achieves the highest accuracy in English transcriptions, while OpenAI Whisper demonstrates superior performance for the Czech language. Additionally, the paper explores the optimization of STT systems for combat environments, including the use of Ant Colony Optimization (ACO) algorithms to minimize errors and enhance the relevance of transcriptions. The study also highlights security risks associated with deploying cloud-based STT services in military applications and emphasizes the advantages of on-premise solutions to ensure data protection. Finally, the paper discusses strategies for modernizing defense capabilities through AI technologies. It advocates for increased investment in automated command and control systems, fire control, and situational awareness, emphasizing their crucial role in improving response times and the accuracy of artillery fire support.

Short Papers
Paper Nr: 54
Title:

Whisper-Conformer: A Modified Automatic Speech Recognition for Thai Speech Recognition

Authors:

Thanakron Noppanamas and Suronapee Phoomvuthisarn

Abstract: Speech is a fundamental aspect of human communication, driving advancements in Automatic Speech Recognition (ASR) to bridge the gap between humans and machines. ASR has evolved significantly with the introduction of Deep Neural Networks (DNNs), which enable models to learn speech patterns directly from raw audio. The transition to end-to-end DNN architectures, including models like Whisper and Conformer, dramatically improved ASR performance. In the context of Thai ASR, researchers have adopted DNN-based models, including fine-tuned versions of Whisper, to improve recognition accuracy. However, prior studies have shown that Thai ASR still faces challenges due to the lack of spaces in Thai sentences and regional dialectal variations. To address these challenges, this study proposes an alternative approach that modifies the existing Whisper model by integrating the Conformer architecture. We refer to the resulting model as Whisper-Conformer. The model is trained on 373.5 hours of Thai speech data. Our results indicate that Whisper-Conformer learns significantly faster than baseline models and outperforms the fine-tuned Whisper model, achieving 0.64 Word Error Rate (WER) and 0.42 Character Error Rate (CER) on the Common Voice Corpus (v18) and 83.27 WER and 39.96 CER on the Thai Dialect Corpus, without using a language model for spelling correction. These findings suggest that integrating the Conformer architecture enhances ASR performance and enables the model to handle challenges in Thai ASR more effectively. The pretrained models are available at https://huggingface.co/Thanakron/whisperConformer-medium-th.
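
WER and CER are both edit-distance ratios: the number of substitutions, insertions and deletions needed to turn the hypothesis into the reference, divided by the reference length in words or characters. The sketch below illustrates that definition with a plain Levenshtein distance. It assumes whitespace-separated words, which does not hold for unsegmented Thai text, so a word segmenter (not shown, and not specified in the abstract) would be needed before computing WER there; all function names are illustrative.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings or token lists)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def error_rate(ref_tokens, hyp_tokens):
    """Edit distance normalised by the reference length."""
    if not ref_tokens:
        raise ValueError("reference must not be empty")
    return edit_distance(ref_tokens, hyp_tokens) / len(ref_tokens)


def wer(reference, hypothesis):
    # Assumes words are separated by whitespace.
    return error_rate(reference.split(), hypothesis.split())


def cer(reference, hypothesis):
    return error_rate(list(reference), list(hypothesis))


print(wer("the cat sat", "the bat sat"))
print(cer("abc", "abd"))
```

Because insertions are counted against the reference length, both rates can exceed 1 (or 100%) when the hypothesis is much longer than the reference.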

Paper Nr: 58
Title:

Question Answering in a Low-Resource Language: Dataset and Deep Learning Adaptations for Sinhala

Authors:

Janani Ranasinghe and Ruvan Weerasinghe

Abstract: Significant advancements have been made in Natural Language Processing (NLP) in recent years, particularly in Question-Answering (QA). The availability of pre-trained Large Language Models (LLMs) and annotated datasets has driven these improvements. However, most resources are designed for high-resource languages like English, while low-resource languages face challenges due to data limitations. Sinhala, the most widely spoken language in Sri Lanka, with over 20 million speakers, still lacks sufficient annotated datasets and monolingual models for downstream tasks like QA. To address this gap, this study presents a Sinhala QA dataset, SiQuAD, translated from SQuAD v1.1, containing 16,000 unique question-answer pairs. Experiments covering monolingual, cross-lingual, and multilingual approaches are conducted, with the best-performing model achieving an F1 score of 73%, indicating promising capabilities while highlighting room for improvement and future research. The dataset will be made publicly available.

Area 5 - Machine Learning

Full Papers
Paper Nr: 44
Title:

Enhancing Off-Policy Method SAC with KAN for Continuous Reinforcement Learning

Authors:

Ali Bayeh, Malek Mouhoub and Samira Sadaoui

Abstract: This paper is the first to explore the integration of Kolmogorov-Arnold networks (KANs) into off-policy methods for continuous reinforcement learning (CRL) tasks. We introduce KAN-SAC, a method that integrates the KAN model and its variants, namely MultKAN and SineKAN, with the Soft Actor-Critic (SAC) algorithm. The integration is based on embedding the KAN architecture in both the actor and critic networks. Using the MuJoCo Half-Cheetah environment as a case study, we evaluate the performance of these KAN-based SAC algorithms against traditional MLP-based SAC. Our results show that KAN models have great potential, even outperforming MLP models in certain scenarios. However, further refinement of these methods is needed before they can be used as a robust alternative in complex CRL applications.

Short Papers
Paper Nr: 29
Title:

Identification of Key Feature Interactions via PDP Decomposition

Authors:

Selim Eren Eryilmaz and Ron Triepels

Abstract: Understanding feature interactions is essential for interpreting complex machine learning models. Global interpretation methods, such as Partial Dependence Plots (PDPs), are commonly used to visualize the marginal effects of features on model predictions. However, PDPs average feature effects across all other features, which can obscure critical interaction patterns and fail to identify important features influenced by these interactions. While Individual Conditional Expectation plots reveal variations in a feature's effects across individual data points, they do not provide insights into the specific interactions causing these differences. To address these limitations, we propose a method that combines functional decomposition with PDP analysis, enabling the isolation and interpretation of feature interactions. High variance indicates significant interaction effects, while low variance suggests a constant contribution to the prediction. We evaluate this approach on synthetic and real-world datasets, showing that it effectively identifies and interprets feature interactions, offering deeper insights into model behavior.
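
The limitation of PDPs described above can be reproduced with a toy model. The sketch below computes a partial dependence curve by fixing one feature at each grid value and averaging predictions over the data, alongside the per-row ICE curves. The model `f(x) = x0 * x1`, the two data rows and the function names are hypothetical, and the code illustrates only standard PDP and ICE computation, not the decomposition method proposed in the paper.

```python
def ice_curves(model, rows, feature, grid):
    """One prediction curve per data row, varying only the chosen feature."""
    curves = []
    for row in rows:
        curve = []
        for value in grid:
            modified = list(row)
            modified[feature] = value
            curve.append(model(modified))
        curves.append(curve)
    return curves


def partial_dependence(model, rows, feature, grid):
    """PDP: the pointwise average of the ICE curves."""
    curves = ice_curves(model, rows, feature, grid)
    return [sum(curve[k] for curve in curves) / len(curves)
            for k in range(len(grid))]


def toy_model(x):
    # Pure interaction: the effect of x[0] depends entirely on the sign of x[1].
    return x[0] * x[1]


rows = [[0, -1], [0, 1]]   # hypothetical data: x1 is -1 for one row, +1 for the other
grid = [0, 1, 2]

print(partial_dependence(toy_model, rows, 0, grid))  # flat curve
print(ice_curves(toy_model, rows, 0, grid))          # opposite slopes per row
```

The averaged curve is flat even though the first feature drives every prediction, while the ICE curves diverge without indicating which other feature is responsible; this is the gap the abstract says the proposed method targets.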

Paper Nr: 25
Title:

Rhythm Fusion: Synchronizing Audio and Motion Features for Music-Driven Dance Generation

Authors:

Nuha Aldausari, Gelareh Mohammadi and David Cooper

Abstract: Dance and music, as universal languages of emotion and expression, have been integral practices throughout human history, often associated with social and religious ceremonies. Manually animating a dancing person based on music is a challenging task that requires skill, time, and effort. However, with the help of an artificial intelligence (AI) model, dances can be generated automatically in response to music. Despite significant advancements in motion generation using AI techniques such as transformers, diffusion models, and GANs, challenges remain because existing frameworks primarily aim to produce movements that appear plausible in a general sense, rather than fully realistic. We developed our model with a specific goal: to understand the rhythmic link between music and motion and build specific components to learn the relationship between these features and then utilize that in the generative model. To achieve this, we employ a state-of-the-art diffusion-style model to create dance sequences. We then introduce two sub-models: the Fusion Sync Classifier and Fusion Sync Enhancer. These sub-models, when integrated into the main model, Rhythm Fusion, ensure audio-video synchronization and facilitate the alignment and correlation between motion and music. Through the use of quantitative metrics, we show that our model outperforms other state-of-the-art models.

Paper Nr: 41
Title:

Forecasting Ethereum Prices with Machine Learning, Deep Learning, and Explainable Artificial Intelligence Using Multi-Source Market Articles and Hybrid Sentiment Analysis

Authors:

Naresh Kumar Satish, Mathieu Mercadier, Cristina Hava Muntean and Anderson Augusto Simiscuka

Abstract: The cryptocurrency market is widely regarded as one of the most volatile financial markets due to inconsistencies in its pricing factors. Despite this volatility, it continues to attract a large population of investors, many of whom incur significant losses. To address this challenge and support risk assessment for investors, users, and other stakeholders, this paper focuses on forecasting Ethereum prices by analyzing social media sentiment. The study gathers data from sources such as global news headlines and Reddit discussion forums, enhancing it with hybrid sentiment features derived from the VADER, BERT and TextBlob models. These sentiment insights are then correlated with Ethereum’s financial parameters to establish meaningful relationships within the data, which are used to train machine learning models. The study evaluates the predictive performance of Random Forest, Extreme Gradient Boosting, and Long Short-Term Memory models. Among these, Extreme Gradient Boosting demonstrated superior performance, effectively capturing complex relationships within the data and achieving an R-squared value of 0.982115. To further enhance the study’s risk assessment capabilities, the concept of Explainable Artificial Intelligence (XAI) is employed to improve transparency and accountability in the model outcomes. Specifically, Shapley Additive Explanations (SHAP) are used to interpret the feature interactions within the Extreme Gradient Boosting model, thereby increasing its reliability and providing deeper insights into its decision-making process.