DeLTA 2020 Abstracts


Area 1 - Computer Vision Applications

Full Papers
Paper Nr: 9
Title:

Nematode Identification using Artificial Neural Networks

Authors:

Jason Uhlemann, Oisin Cawley and Thomais Kakouli-Duarte

Abstract: Nematodes are microscopic, worm-like organisms with applications in monitoring the environment for potential ecosystem damage or recovery. Nematodes are extremely abundant and diverse organisms, with millions of different species estimated to exist. This diversity makes identifying nematodes at the species level complicated and time-consuming. Their morphological identification process is fundamentally one of pattern matching, comparing sketches in a standard taxonomic key against the nematode image under a microscope. As Deep Learning has shown vast improvements, in particular for image classification, we explore the effectiveness of nematode identification using Convolutional Neural Networks. We also seek to discover the optimal training process and hyper-parameters for our specific context.

Paper Nr: 28
Title:

Attention-based Text Recognition in the Wild

Authors:

Zhi-Chen Yan and Stephanie A. Yu

Abstract: Recognizing text in real-world scenes is an important research topic in computer vision, and many deep-learning-based techniques have been proposed for it. Such techniques typically follow an encoder-decoder architecture and use a sequence of feature vectors as the intermediate representation. In this approach, useful 2D spatial information in the input image may be lost due to the vector-based encoding. In this paper, we formulate scene text recognition as a spatiotemporal sequence translation problem, and introduce a novel attention-based spatiotemporal decoding framework. We first encode an image as a spatiotemporal sequence, which is then translated into a sequence of output characters using the aforementioned decoder. Our encoding and decoding stages are integrated to form an end-to-end trainable deep network. Experimental results on multiple benchmarks, including IIIT5k, SVT, ICDAR, and RCTW-17, indicate that our method can significantly outperform conventional attention frameworks.
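
The core of such an attention-based decoder is a weighting over encoder positions at each output step. Below is a minimal NumPy sketch of one decoding step using generic scaled dot-product attention (this is illustrative, not necessarily the exact mechanism of this paper; all names and dimensions are hypothetical):

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def attend(query, keys, values):
    """One decoding step: score every encoder position against the
    current decoder query, then return the attention-weighted context."""
    scores = keys @ query / np.sqrt(query.shape[0])  # (T,) similarity scores
    weights = softmax(scores)                        # attention distribution
    return weights @ values, weights                 # context (d,), weights (T,)

rng = np.random.default_rng(0)
T, d = 12, 16                              # 12 encoder positions, 16-dim features
keys = values = rng.normal(size=(T, d))    # encoded spatiotemporal sequence
query = rng.normal(size=d)                 # decoder state for one output character
context, weights = attend(query, keys, values)
assert context.shape == (d,) and np.isclose(weights.sum(), 1.0)
```

At each step the decoder would fold `context` into its recurrent state and emit the next character; repeating this per output position yields the character sequence.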

Short Papers
Paper Nr: 16
Title:

Real-time On-board Detection of Components and Faults in an Autonomous UAV System for Power Line Inspection

Authors:

Naeem Ayoub and Peter Schneider-Kamp

Abstract: The inspection of power line components is periodically conducted by specialized companies to identify possible faults and assess the state of the critical infrastructure. UAV systems represent an emerging technological alternative in this field, with the promise of safer, more efficient, and less costly inspections. In the Drones4Energy project, we work toward a vision-based beyond-visual-line-of-sight (BVLOS) power line inspection architecture for automatically and autonomously detecting components and faults in real-time on board the UAV. In this paper, we present the first step towards the vision system of this architecture. We train Deep Neural Networks (DNNs) and tune them for reliability under different conditions, such as variations in the camera used, lighting, angles, and background. For the purpose of real-time on-board implementation of the architecture, experimental evaluations and comparisons are performed on different hardware, such as the Raspberry Pi 4, Nvidia Jetson Nano, Nvidia Jetson TX2, and Nvidia Jetson AGX Xavier. The use of such Single Board Devices (SBDs) is an integral part of the design of the proposed power line inspection architecture. Our experimental results demonstrate that the proposed approach can be effective and efficient for fully-automatic real-time on-board visual power line inspection.

Paper Nr: 18
Title:

Visual Inspection of Collective Protection Equipment Conditions with Mobile Deep Learning Models

Authors:

Bruno G. Ferreira, Bruno C. Lima and Tiago F. Vieira

Abstract: Even though Deep Learning models are gaining popularity in a variety of scenarios, there are many demands to which they can be specifically tuned. We present a real-time, embedded system capable of performing the visual inspection of Collective Protection Equipment conditions, such as fire extinguishers (presence of rust or a disconnected hose), emergency lamps (disconnected energy cable), and horizontal and vertical signalization, among others. This demand was raised by a glass-manufacturing company that provides devices for optical-fiber solutions. To tackle this specific necessity, we collected and annotated a database of hundreds of in-factory images and assessed three different Deep Learning models, aiming to evaluate the trade-off between performance and processing time. A real-world application was developed with the potential to reduce the time and costs of periodic inspections of the company’s security installations.

Paper Nr: 7
Title:

Retinal Vessel Segmentation by Inception-like Convolutional Neural Networks

Authors:

Hadi N. Shirvan, Reza A. Moghadam and Kurosh Madani

Abstract: Deep learning architectures have been proposed for several families of neural networks, such as convolutional neural networks (CNN), recurrent neural networks, and deep belief networks. Among them, CNNs have frequently been applied to image processing tasks. An important branch of intelligent image processing is medical image processing, which provides intelligent tools and software for medical applications. Analysis of blood vessels in retinal images can help physicians detect retinal diseases such as glaucoma, or even diabetes. In this paper, a new neural network structure is proposed that processes retinal images and separates vessels from the retinal background. This neural network consists of convolutional layers, concatenation layers, and transpose convolutional layers. The results on the DRIVE dataset show acceptable performance with respect to the accuracy, recall, and F-measure criteria.

Area 2 - Models and Algorithms

Full Papers
Paper Nr: 17
Title:

Generation of Human Images with Clothing using Advanced Conditional Generative Adversarial Networks

Authors:

Sheela R. Kurupathi, Pramod Murthy and Didier Stricker

Abstract: One of the main challenges of human-image generation is generating a person along with pose and clothing details. It remains a difficult task due to challenging backgrounds and appearance variance. Recently, various deep learning models such as Stacked Hourglass networks, Variational Auto-Encoders (VAE), and Generative Adversarial Networks (GANs) have been used to solve this problem. However, they still do not generalize well qualitatively to the real-world human-image generation task. Our main goal is to use the Spectral Normalization (SN) technique when training a GAN to synthesize human images with accurate pose and appearance details of the person. In this paper, we investigate how Conditional GANs with Spectral Normalization (SN) can synthesize a new image of a target person, given an image of the person and the desired target (novel) pose. The model uses 2D keypoints to represent human poses. We also use an adversarial hinge loss and present an ablation study. The proposed model variants generate promising results on both the Market-1501 and DeepFashion datasets. We support our claims by benchmarking the proposed model against recent state-of-the-art models. Finally, we show how the Spectral Normalization (SN) technique influences the process of human-image synthesis.
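
Spectral Normalization itself is simple to state: each weight matrix is divided by an estimate of its largest singular value, obtained by power iteration, which constrains the layer's Lipschitz constant. A minimal NumPy sketch of the normalization step (illustrative only, not the authors' implementation; sizes are hypothetical):

```python
import numpy as np

def spectral_normalize(W, n_iter=100, seed=0):
    """Divide W by an estimate of its largest singular value (power
    iteration), so the normalized matrix has spectral norm close to 1."""
    u = np.random.default_rng(seed).normal(size=W.shape[0])
    for _ in range(n_iter):
        v = W.T @ u
        v /= np.linalg.norm(v)
        u = W @ v
        u /= np.linalg.norm(u)
    sigma = u @ W @ v              # estimated spectral norm of W
    return W / sigma

W = np.random.default_rng(1).normal(size=(64, 32))   # a dense layer's weights
W_sn = spectral_normalize(W)
# The largest singular value of the normalized matrix is close to 1.
assert np.isclose(np.linalg.svd(W_sn, compute_uv=False)[0], 1.0, atol=1e-2)
```

In GAN training this is typically applied to every discriminator layer at each forward pass, usually with a single power-iteration step whose `u` vector is carried over between updates.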

Short Papers
Paper Nr: 14
Title:

Data Augmentation for Semantic Segmentation in the Context of Carbon Fiber Defect Detection using Adversarial Learning

Authors:

Silvan Mertes, Andreas Margraf, Christoph Kommer, Steffen Geinitz and Elisabeth André

Abstract: Computer vision systems are popular tools for monitoring tasks in highly specialized production environments. Their training and configuration, however, still represent a time-consuming task in process automation. Convolutional neural networks have helped to improve the ability to detect even complex anomalies, without exactly modeling image filters and segmentation strategies, for a wide range of application scenarios. In recent publications, image-to-image translation using generative adversarial networks was introduced as a promising strategy to apply patterns to other domains without prior explicit mapping. We propose a new approach for generating augmented data to enable the training of convolutional neural networks for semantic segmentation with a minimum of real labeled data. We present qualitative results and demonstrate the application of our system on textile images of carbon fibers with structural anomalies. This paper compares the potential of image-to-image translation networks with common data augmentation strategies such as image scaling, rotation, or mirroring. We train and test on image data acquired from a high-resolution camera within an industrial monitoring use case. The experiments show that our system is comparable to common data augmentation approaches. Our approach extends the toolbox of semantic segmentation, since it allows for generating more problem-specific training data from sparse input.
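
The common augmentation strategies the paper compares against share one key property for segmentation: every geometric transform must be applied to the image and its label mask in lockstep. A minimal NumPy sketch of such a rotation-and-mirroring pipeline (illustrative; the paper's actual pipeline is not specified here):

```python
import numpy as np

def augment(image, mask):
    """Yield rotated and mirrored copies of an image together with its
    segmentation mask, keeping labels aligned with pixels."""
    for k in range(4):                               # 0/90/180/270 degree rotations
        img_r, msk_r = np.rot90(image, k), np.rot90(mask, k)
        yield img_r, msk_r
        yield np.fliplr(img_r), np.fliplr(msk_r)     # horizontal mirror

image = np.arange(16, dtype=float).reshape(4, 4)     # toy 4x4 "fiber image"
mask = (image > 7).astype(int)                       # toy anomaly mask
pairs = list(augment(image, mask))
assert len(pairs) == 8                               # 4 rotations x 2 mirrorings
```

An image-to-image translation network replaces these fixed geometric transforms with learned, domain-specific variation, which is exactly the trade-off the paper evaluates.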

Paper Nr: 21
Title:

Technical Sound Event Classification Applying Recurrent and Convolutional Neural Networks

Authors:

Constantin Rieder, Markus Germann, Samuel Mezger and Klaus P. Scherer

Abstract: In many intelligent technical assistance systems (especially for diagnostics), sound classification is a significant and useful input. A high-performance classification of the heterogeneous sounds of mechanical components can supply diagnostic experts with a great deal of information. Classical pattern recognition methods fail because of the complex features and the heterogeneous state noise. Since there is no explicit human knowledge about the characteristic representation of the classes, classical feature engineering is impossible. A new approach, which develops a concept for neural networks and realizes it with convolutional networks in particular, shows the power of technical sound classification methods. After the concept finding, a parametrized network model is devised and realized. First results show the power of the RNNs and CNNs. Depending on the parametrized configuration of the net architecture and the training sets, an enhancement of the sound event classification is possible.

Paper Nr: 26
Title:

Using Conditional Generative Adversarial Networks to Boost the Performance of Machine Learning in Microbiome Datasets

Authors:

Derek Reiman and Yang Dai

Abstract: The microbiome of the human body has been shown to have profound effects on physiological regulation and disease pathogenesis. However, association analysis based on statistical modeling of microbiome data has continued to be a challenge due to inherent noise, the complexity of the data, and the high cost of collecting large numbers of samples. To address this challenge, we employed a deep learning framework to construct a data-driven simulation of microbiome data using a conditional generative adversarial network. Conditional generative adversarial networks train two models against each other while leveraging side information learned from a given dataset to compute larger simulated datasets that are representative of the original dataset. In our study, we used a cohort of patients with inflammatory bowel disease to show not only that the generative adversarial network can generate samples representative of the original data based on multiple diversity metrics, but also that training machine learning models on the synthetic samples can improve disease prediction through data augmentation. In addition, we show that the synthetic samples generated from this cohort can boost disease prediction on a different external cohort.
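
The "conditional" part of a conditional GAN amounts to feeding side information (here, for example, disease status) to both the generator and the discriminator, commonly by concatenating an encoded label onto their inputs. A minimal NumPy sketch of the generator's conditioned input (illustrative only; the paper's architecture and label encoding are not specified here, and the class coding below is hypothetical):

```python
import numpy as np

def generator_input(noise_dim, label, n_classes, rng):
    """Concatenate a noise vector with a one-hot label so the generator
    can produce samples conditioned on that label."""
    z = rng.normal(size=noise_dim)     # latent noise
    onehot = np.zeros(n_classes)
    onehot[label] = 1.0                # side information (the condition)
    return np.concatenate([z, onehot])

rng = np.random.default_rng(0)
# Hypothetical coding: condition on disease status, 0 = control, 1 = IBD.
x = generator_input(noise_dim=64, label=1, n_classes=2, rng=rng)
assert x.shape == (66,) and x[-2:].tolist() == [0.0, 1.0]
```

Sampling many such inputs per class then yields a class-balanced synthetic dataset for augmenting the downstream disease-prediction models.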

Area 3 - Natural Language Understanding

Short Papers
Paper Nr: 29
Title:

Detection of Depression in Thai Social Media Messages using Deep Learning

Authors:

Boriharn Kumnunt and Ohm Sornil

Abstract: Depression problems can severely affect not only personal health, but also society. There is evidence that people who suffer from depression tend to express their feelings and seek help via posts on online platforms. This study applies Natural Language Processing (NLP) to messages associated with depression problems. Feature extraction, machine learning, and neural network models are applied to carry out the detection. The CNN-LSTM model, a unified model combining Convolutional Neural Networks (CNN) and Long Short-Term Memory networks (LSTM), is used both sequentially and in parallel as branches, and the outcomes are compared with baseline models. In addition, different types of activation functions are applied in the CNN layer to compare the results. In this study, the CNN-LSTM models show improvement over the classical machine learning method, although the differences among the CNN-LSTM models themselves are slight. The three-branch CNN-LSTM model with the Rectified Linear Unit (ReLU) activation function achieves an F1-score of 83.1%.

Area 4 - Machine Learning

Full Papers
Paper Nr: 11
Title:

Deep Learning Residual-like Convolutional Neural Networks for Optic Disc Segmentation in Medical Retinal Images

Authors:

Amir H. Panahi, Reza A. Moghadam and Kurosh Madani

Abstract: Eye diseases such as glaucoma, if not diagnosed in time, can have irreversible detrimental effects that may lead to blindness. Early detection of this disease through screening programs and subsequent treatment can prevent blindness. Deep learning architectures have many applications in medicine, especially in medical image processing, providing intelligent tools for the prevention and treatment of diseases. Optic disc segmentation is one of the ways to diagnose eye disease. This paper presents a new deep-learning-based approach that is both accurate and fast for optic disc segmentation. Comparing the proposed method with the best-known methods on the publicly available DRIONS-DB and RIM-ONE v.3 databases, the proposed algorithm is much faster: it can segment the optic disc in 0.008 seconds with outstanding performance in terms of IOU and DICE scores. Therefore, this method can be used in ophthalmology clinics to segment the optic disc in retinal images and videos as an online medical assistive tool.

Short Papers
Paper Nr: 22
Title:

Accelerating Matrix Factorization by Overparameterization

Authors:

Pu Chen and Hung-Hsuan Chen

Abstract: This paper studies overparameterization of the matrix factorization (MF) model. We confirm that overparameterization can significantly accelerate the optimization of MF with no change in the expressiveness of the learning model. Consequently, modern recommendation applications based on MF or its variants can largely benefit from our discovery. Specifically, we theoretically derive that applying vanilla stochastic gradient descent (SGD) to the overparameterized MF model is equivalent to employing gradient descent with momentum and an adaptive learning rate on the standard MF model. We empirically compare the overparameterized MF model with the standard MF model under various optimizers, including vanilla SGD, AdaGrad, Adadelta, RMSprop, and Adam, using several public datasets. The experimental results comply with our analysis: overparameterization converges faster. The overparameterization technique can be applied to various learning-based recommendation models, including SVD++, nonnegative matrix factorization (NMF), factorization machine (FM), and deep-learning-based models such as NeuralCF, Wide&Deep, and DeepFM. Therefore, we suggest utilizing the overparameterization technique to accelerate the training of learning-based recommendation models whenever possible, especially when the training dataset is large.
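
The construction can be sketched concretely: a standard rank-k factorization R ≈ PQ gains an extra square factor, R ≈ PMQ, which leaves the expressiveness unchanged (the product still has rank at most k) but changes the gradient dynamics. A toy full-batch NumPy sketch under illustrative sizes and learning rates (not the authors' code; both variants simply fit a synthetic low-rank matrix here):

```python
import numpy as np
from functools import reduce

rng = np.random.default_rng(0)
R = rng.normal(size=(20, 5)) @ rng.normal(size=(5, 15))   # synthetic rank-5 ratings

def prod(mats, n):
    """Matrix product of mats, or the n x n identity if the list is empty."""
    return reduce(np.matmul, mats) if mats else np.eye(n)

def train(factors, lr=0.005, steps=1000):
    """Plain (full-batch) gradient descent on 0.5 * ||F1 @ ... @ Fk - R||^2.
    With two factors this is standard MF; a third square factor in the
    middle is the overparameterized variant."""
    losses = []
    for _ in range(steps):
        err = prod(factors, R.shape[0]) - R
        losses.append((err ** 2).mean())
        grads = []
        for i in range(len(factors)):
            left = prod(factors[:i], R.shape[0])
            right = prod(factors[i + 1:], R.shape[1])
            grads.append(left.T @ err @ right.T)      # gradient w.r.t. factor i
        for f, g in zip(factors, grads):              # simultaneous update
            f -= lr * g
    return losses

k = 5
std = [rng.normal(scale=0.1, size=(20, k)), rng.normal(scale=0.1, size=(k, 15))]
over = [rng.normal(scale=0.1, size=(20, k)), rng.normal(size=(k, k)),
        rng.normal(scale=0.1, size=(k, 15))]
loss_std, loss_over = train(std), train(over)
assert loss_std[-1] < loss_std[0] and loss_over[-1] < loss_over[0]
```

The paper's claim is that vanilla SGD on the three-factor form behaves like momentum plus an adaptive learning rate on the two-factor form, which is what speeds up convergence.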

Paper Nr: 25
Title:

Multi-channel ConvNet Approach to Predict the Risk of in-Hospital Mortality for ICU Patients

Authors:

Fabien Viton, Mahmoud Elbattah, Jean-Luc Guérin and Gilles Dequen

Abstract: The healthcare arena has been undergoing impressive transformations thanks to advances in the capacity to capture, store, process, and learn from data. This paper revisits the problem of predicting the risk of in-hospital mortality based on Time Series (TS) records emanating from ICU monitoring devices. The problem basically represents an application of multi-variate TS classification. Our approach is based on utilizing multiple channels of Convolutional Neural Networks (ConvNets) in parallel. The key idea is to disaggregate the multi-variate TS into separate channels, where a ConvNet is used to extract features from each univariate TS individually. Subsequently, the extracted features are concatenated into a single vector that can be fed into a standard MLP classification module. The approach was experimented with using a dataset extracted from the MIMIC-III database, which included about 13K ICU-related records. Our experimental results show a promising classification accuracy that is competitive with the state-of-the-art.
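
The disaggregation idea is easy to picture: one small convolutional feature extractor per univariate series, then a single concatenated feature vector. A NumPy sketch with hypothetical sizes, using fixed random filters in place of the learned ones (the real model learns its filters end-to-end and ends in an MLP classifier):

```python
import numpy as np

def channel_features(ts, filters):
    """Features for ONE univariate time series: valid 1D convolutions
    followed by global max pooling (one value per filter)."""
    return np.array([np.convolve(ts, f, mode="valid").max() for f in filters])

rng = np.random.default_rng(0)
n_channels, T = 4, 100                  # e.g. 4 ICU vital signs, 100 time steps
record = rng.normal(size=(n_channels, T))
filters = [rng.normal(size=5) for _ in range(8)]    # 8 filters of width 5

# One ConvNet "channel" per univariate series, concatenated into the single
# vector a downstream MLP classifier would consume.
vector = np.concatenate([channel_features(record[c], filters)
                         for c in range(n_channels)])
assert vector.shape == (n_channels * 8,)            # 4 channels x 8 filters
```

Keeping the extractors separate means each channel's filters specialize to one vital sign, while the concatenation step is where cross-channel interactions are first combined.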