Abstracts of 2nd SPLab Workshop 2012

Dr. Peter Balazs, Acoustics Research Institute, Austrian Academy of Sciences, Austria

Frame Theory and its Acoustical Applications
Frames, as a generalization of bases, allow redundant representations that still permit perfect reconstruction. We will briefly review the basic notions of frame theory and, in particular, explain why they are especially interesting for signal-processing applications. As a particularly useful case, we will briefly review the basic notions of non-stationary Gabor frames.
Certain kinds of operators appear in many scientific fields, such as mathematics, physics, signal processing and acoustics. These operators consist of analysis, multiplication and re-synthesis. In the frame theory context, we will review these operators as frame multipliers, and their applications as time-variant filters. We will show recent (mathematical) results about the invertibility of multipliers. As a possible application we will introduce the notion of irrelevance filters, as well as a denoising approach in the time-frequency plane.
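
The analysis, multiplication and re-synthesis chain can be sketched in a few lines. The following is only a toy illustration, not the speaker's implementation: the three-vector frame in R^2 and the names `FRAME`, `analysis`, `synthesis` and `frame_multiplier` are assumptions chosen for this example. The frame is redundant (three vectors in two dimensions) yet Parseval, so a symbol of all ones reproduces the input exactly.

```python
import math

# Illustrative Parseval frame for R^2 (an assumption, not taken from the talk):
# three vectors 120 degrees apart, each scaled by sqrt(2/3).
ANGLES = [math.pi / 2 + k * 2 * math.pi / 3 for k in range(3)]
FRAME = [(math.sqrt(2 / 3) * math.cos(a), math.sqrt(2 / 3) * math.sin(a))
         for a in ANGLES]


def analysis(f):
    """Frame coefficients <f, g_k> for every frame vector g_k."""
    return [g[0] * f[0] + g[1] * f[1] for g in FRAME]


def synthesis(coeffs):
    """Re-synthesis: sum over k of c_k * g_k."""
    x = sum(c * g[0] for c, g in zip(coeffs, FRAME))
    y = sum(c * g[1] for c, g in zip(coeffs, FRAME))
    return (x, y)


def frame_multiplier(f, symbol):
    """Analysis, pointwise multiplication by the symbol, then re-synthesis."""
    weighted = [m * c for m, c in zip(symbol, analysis(f))]
    return synthesis(weighted)
```

With the symbol `[1, 1, 1]` the multiplier acts as the identity; any other symbol weights the individual coefficients before re-synthesis, which is the mechanism behind time-variant filtering when the frame is a Gabor frame.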

Prof. Jesús Bernardino Alonso Hernández, Departamento de Señales y Comunicaciones, Universidad de Las Palmas de Gran Canaria (ULPGC), Las Palmas de Gran Canaria, Spain

Acoustic Analysis for the Clinical Evaluation of the Phonatory System
The literature offers a wide range of references to measures based on the speech signal for the acoustic study of voice quality, but most of these measures are difficult to interpret in a medical setting. In this work, we have identified four abnormalities of laryngeal function (problems of stability in the voice, problems in the rhythm of hit, glottal closure problems and problems of irregularities in the mass) that produce phonation of abnormal quality. In addition, using data mining techniques, we have selected the measures that best quantify each anomaly. Based on these results, we have developed a software tool implementing the proposed protocol, which is being used in a medical environment.

Prof. Peter Brezany, Research Group Scientific Computing, University of Vienna, Austria

Introduction of the Research Group for Scientific Computing
The research group was created from the former Institute for Scientific Computing on January 1, 2011. It consists of two cooperating teams addressing: 1) High Performance Computing, a team led by Prof. Siegfried Benkner, and 2) Data-Intensive Research, a team led by Prof. Peter Brezany. In the past, the institute and research group have been involved in more than 20 EU projects. The presentation will introduce the teaching and research activities of the group, with a particular focus on the Data-Intensive Research team; its activities involve large-scale data integration and mining workflows in the context of grid and cloud technologies, parallel and distributed data mining and OLAP, data provenance, data space management, data stream management and mining, etc.

Large-scale Data Analytics: Applications and Technology
Every day, we create 2.5 quintillion (10^18) bytes of data, so much that 90% of the data in the world today has been created in the last two years alone. This data comes from everywhere: business transactions, sensor networks, experiments supported by high-fidelity scientific instruments, social networks, computer simulations, cell phone GPS signals, to name a few. This data is called BIG DATA. It is not only massive, but also distributed on a large scale. Efficient management of big data (secure storage, discovering relevant data sets, searching the data for answers to specific questions, discovering interesting new data elements and patterns, etc.) is one of the most challenging research tasks. Fortunately, the advent of modern cyber-infrastructures such as grids and clouds offers foundational platforms on which to scale massive data analytics. In this talk, several example big data applications and various technologies that support the handling of big data are discussed in the context of data-intensive research. The focus is on the EU project ADMIRE and the Austrian project CloudMiner. ADMIRE has developed a data-intensive language, DISPEL, for data integration and analysis, data analytics services, and a platform for optimal execution of large-scale analytics workflows. ADMIRE technology and its example applications are also described in a book to be published by Wiley this year. CloudMiner is a follow-up to the former GridMiner project. Its main research focus so far has been on cloud-based data stream management and mining. A new EU project, SPES (Support Patients through E-Service Solutions), whose solutions are based on the CloudMiner technology, is briefly discussed.

Dr. Vincenzo Capuano, Second University of Naples, Caserta and IIASS, Italy

Assessing natural emotional facial expressions: an evaluation of the I.Vi.T.E. database
The use of expressions portrayed by actors is a popular method for research on the perception of emotion; however, some recent experiments suggest that actors do not feel the acted emotion and that the expressions produced are far from genuine and realistic. This work presents experimental results on the assessment of the 479 selected frames derived from I.Vi.T.E. (Italian Visible Thermal Emotional database), a naturalistic visible and thermal database developed by video capturing 49 facial expressions naturally produced by Italian students in four induced emotional contexts (happiness, fear, sadness, disgust). These facial emotional expressions have been evaluated by 120 subjects (balanced by gender), who were asked to assign to each the emotional label they considered appropriate. The intensity of the proposed emotional facial expressions has also been considered. The results will be used to create a database of spontaneous facial expressions of emotions, which will be of great utility in helping researchers to develop models describing and reproducing human interaction.

Dr. Monika Dörfler, NuHAG, University of Vienna, Austria

Structured Sparsity in Audio Processing
Meaningful audio signals are known to be highly structured. Incorporating knowledge about the inherent structures helps to improve algorithms in various applications such as denoising, source separation or inpainting. In this talk, we are going to address the principal ideas of structured sparse atomic decomposition and their relation to some well-known thresholding operators, which may be applied in fast iterative shrinkage algorithms. The general framework allows for the exploitation of structural properties, in particular the persistence inherent to most natural audio signals. We are going to show the application of our algorithms to real-life audio signals.
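
As a rough illustration of the thresholding operators mentioned above, the sketch below contrasts plain soft thresholding, which treats every coefficient independently, with a group variant that shrinks a whole neighbourhood of coefficients by its joint energy. Both are standard textbook operators and are not necessarily the ones used in the talk; the function names and the idea of passing one group at a time are assumptions made for this example.

```python
import math


def soft_threshold(x, lam):
    """Shrink a single coefficient towards zero by lam."""
    return math.copysign(max(abs(x) - lam, 0.0), x)


def group_soft_threshold(group, lam):
    """Shrink a group of coefficients jointly, based on its Euclidean norm.

    A group with little joint energy is removed entirely; otherwise all
    members are scaled by the same factor, so structure within the group
    (for example persistence over time) is preserved.
    """
    norm = math.sqrt(sum(x * x for x in group))
    if norm <= lam:
        return [0.0] * len(group)
    scale = 1.0 - lam / norm
    return [scale * x for x in group]
```

In an iterative shrinkage scheme, one of these operators would be applied to the coefficients after each gradient step; the group version is one simple way to encode that neighbouring time-frequency coefficients tend to be active together.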

Prof. Anna Esposito, Second University of Naples, Caserta and IIASS, Italy

The effects of mild developmental disorders in decoding visual and auditory emotional information
This work reports on children’s ability to recognize emotional information from face, voice, and from dynamic stimuli such as videos.
Experimental perceptual data show that there seem to be no differences between typical and dyslexic children in decoding facial and vocal emotional expressions. Differences do appear, however, for children with mild developmental learning disorders (which include dyslexia) when the stimuli are dynamic, in particular for fear. This result is particularly surprising since fear is considered a basic emotion and is therefore associated with evolutionary needs. Considerations are made for typical and disordered children’s encoding of emotional information.

Prof. Marcos Faundez-Zanuy, EUP Mataró, TecnoCampus Mataró-Maresme, Spain

Online drawings for dementia diagnosis: in-air and pressure information analysis
In this paper we present experimental results comparing on-line drawings from a control population (left and right hand) and from Alzheimer’s disease patients. The drawings were acquired by means of a digitizing tablet, which records time information, angles and pressures. Experimental measures based on pressure and in-air movements appear to be significantly different between the two groups, even when the control population performs the tasks with the nondominant hand.

Prof. Hans G. Feichtinger, NuHAG, University of Vienna, Austria

How can we approximate continuous problems by periodic discrete ones?
Gabor Analysis can be realized naturally over any LCA (locally compact Abelian) group. Usually it is described in the context of Euclidean spaces (with natural notions of time and frequency shifts) using the continuous Fourier transform, or in the setting of finite groups, i.e. in the context of discrete and periodic signals, using the corresponding FFT in order to realize the corresponding signal expansions.
In this talk we will report on the challenge of approximating a given continuous problem in different ways, either by trying to perform the necessary computations with the necessary precision in a continuous context, or by finding a “similar” discrete context where the standard algorithms (by now available) for regular Gabor families can be applied.
In this connection we consider Cauchy conditions on such families, i.e. the question of when two approximations to the same continuous situation, with signals of different finite length, can be considered to be (more or less closely) related, not only in a practical sense (based on visual inspection), but also in terms of strict mathematical terminology that allows quantitative measures of similarity to be computed.

Prof. Pedro Gómez Vilda, Departamento de Arquitectura y Tecnología de Sistemas Informáticos (DATSI), Facultad de Informática, Universidad Politécnica de Madrid (UPM), Madrid, Spain

Center for Biomedical Technology, Universidad Politécnica de Madrid
An introduction to the Center for Biomedical Technology, Universidad Politécnica de Madrid.

Organic and Neurological Disease Detection and Monitoring from Voice: Where we are and how far may we go?
Voice and voiced speech are biological signals produced by humans for communication purposes. Current signal processing methods allow the inversion of the voice-producing apparatus to gain deeper insight into biomechanical systems driven by neuromotor signals. Complex inverse models may be designed and set up to reconstruct semantically rich correlates which are related to organic as well as neurologic disease etiology. New detection, tracking and monitoring tools and protocols may help improve current protocols for the care of illnesses such as Parkinson’s, Alzheimer’s or Amyotrophic Lateral Sclerosis, in Voice Rehabilitation of the Laryngectomized, or in evaluating function restoration after Larynx Conservative Surgery, as well as in Singing Education and many other fields. The talk concentrates on discussing the basic methodologies and algorithms, with a focus on protocols, tools and results in the state of the art.

Dr. Martin Holters, Department of Signal Processing and Communications, Helmut Schmidt University, University of the Federal Armed Forces Hamburg, Germany

Department of Signal Processing and Communications, Helmut Schmidt University – University of the Federal Armed Forces, Hamburg
The expression “University of the Federal Armed Forces” usually raises a couple of questions, so the talk will start with what is special (or not) about a university belonging to the armed forces. The main focus will then be on the research done in the Department of Signal Processing and Communications, comprising audio signal and image processing as well as communications.

Virtual Analog Modeling using Non-linear State-Space Models
In recent years, non-linear state-space models have gained popularity for modeling analog guitar effect circuits and amplifiers. Together with circuit analysis methods like the nodal DK method, they provide a systematic way from circuit schematic to digital simulation. The talk will start with an introduction to the nodal DK method and how to systematically derive a non-linear state-space model. After the basis has thus been laid, shortcomings and problems will be discussed, possible extensions pointed out and open questions for future research posed.

Prof. Amir Hussain, COSIPRA LAB, University of Stirling, United Kingdom


Dr. Christian H. Kasess, Acoustics Research Institute, Austrian Academy of Sciences, Austria

Modelling speech using pole-zero models
All-pole filter models have a long tradition in the modeling of speech. These models are particularly well suited to describe oral vowels and have a direct link to one-dimensional single-tube models. For other types of phonemes, however, all-pole models are not the most efficient description. For nasalized vowels and nasal stops such as /n/ and /m/ for instance, the nasal cavity is coupled to the pharyngeal and oral tract and thus regions of lower energy appear in the speech spectrum. As a consequence, pole-zero models are more efficient in describing such data. Here, different pole-zero estimation methods will be overviewed and the link to a simple speech production model for nasal stops will be described.
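
The difference between the two model classes can be made concrete with a small numerical sketch. It is not one of the estimation methods from the talk; it only evaluates the magnitude response of a rational filter B(z)/A(z) and shows that a zero pair produces the kind of spectral dip that an all-pole model cannot place directly. The function names and the particular pole and zero positions are assumptions chosen for illustration.

```python
import cmath
import math


def conjugate_pair(radius, angle):
    """Coefficients of 1 - 2 r cos(angle) z^-1 + r^2 z^-2,
    the polynomial with a complex-conjugate root pair at r * exp(+/- j angle)."""
    return [1.0, -2.0 * radius * math.cos(angle), radius * radius]


def magnitude_response(b, a, omega):
    """|B(z) / A(z)| evaluated on the unit circle at z = exp(j omega)."""
    z_inv = cmath.exp(-1j * omega)
    num = sum(bk * z_inv ** k for k, bk in enumerate(b))
    den = sum(ak * z_inv ** k for k, ak in enumerate(a))
    return abs(num / den)


# Hypothetical example: one resonance (pole pair) and one anti-resonance
# (zero pair), with angles given in radians per sample.
POLES = conjugate_pair(0.9, 0.6)
ZEROS = conjugate_pair(1.0, 1.2)
```

Evaluating `magnitude_response([1.0], POLES, 1.2)` gives the all-pole response at the anti-resonance frequency, while `magnitude_response(ZEROS, POLES, 1.2)` is essentially zero there, which mirrors the regions of lower energy described for nasal sounds.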

M.Sc. Mikko-Ville Laitinen, Department of Signal Processing and Acoustics, Aalto University, Finland

Department of Signal Processing and Acoustics at the Aalto University
The talk introduces the research areas in the Department of Signal Processing and Acoustics at the Aalto University. Particular attention is paid to spatial sound, audio signal processing and speech applications. Furthermore, facilities used in the research are presented.

Directional Audio Coding and Spatial Impulse Response Rendering
Directional audio coding (DirAC) is a perceptually motivated method to reproduce spatial sound. It analyzes the directional properties of the sound field in the time-frequency domain with the temporal and the frequency resolution of the human hearing. In the synthesis phase, the sound is divided into nondiffuse and diffuse parts according to the analysis. The nondiffuse part is reproduced as point sources at the analyzed directions-of-arrival, and the diffuse part as surrounding.
In this talk, an overview of the DirAC approach is presented. Furthermore, a few application areas of DirAC are discussed, such as teleconferencing, high-quality spatial sound reproduction, spatial impulse response rendering, and spatial sound reproduction in virtual worlds.

Prof. Karmele López de Ipiña Peña, Department of System Engineering and Automation, University of the Basque Country, San Sebastián, Spain

Biosystem and Biomedical Engineering: standpoint from a research group in the Basque Country
Biosystems Engineering is a field of engineering which deals with engineering science and design oriented to biological, environmental and agricultural sciences. This field is also the branch of engineering that tries to solve problems involving biological systems. On the other hand Biomedical Engineering is the application of engineering principles to medicine and biology.
This work presents both disciplines from the point of view of a research group in the Basque Country. We will explain the features of the environment with regard to the region, society, economy, industrial sector and national and international renown. The group is developing projects in: Biomedicine, Biodiversity Analysis, Biosystem Modelling, Biosignal Processing, Data Mining and Pattern Recognition.

Emotion recognition oriented to early diagnosis of dementias
Emotions arise from the need to face a changing and partially unpredictable world, which makes the development of emotions necessary for any intelligent system (natural or artificial) to survive. Emotions are closely linked to learning and understanding processes. Emotions are cognitive processes related to the architecture of the human mind (decision making, memory, attention, etc.). In Alzheimer’s patients, the emotional response becomes impaired and seems to go through different states. In the early stages, social and even sexual disinhibition and behavioural changes appear (anger, inability to perform common tasks, inability to express oneself or to remember). However, the emotional memory remains… Some responses are likely to be magnified due to an alteration in perception. Other research suggests, moreover, that patients with this progressive brain disorder may, in advanced stages, also have a reduced ability to feel emotions due to memory loss. Apathy and sometimes depression then appear. New approaches for the early diagnosis of dementia based on Automatic Spontaneous Speech Analysis and Emotional Temperature will be presented.

Prof. Carlos Manuel Travieso González, Departamento de Señales y Comunicaciones, Universidad de Las Palmas de Gran Canaria (ULPGC), Las Palmas de Gran Canaria, Spain

IDeTIC – ULPGC: Research and Innovation
In this presentation, we will introduce our university (ULPGC) and in particular our institute (IDeTIC). We will describe the research lines of our institute, in particular the main research lines of our Division (biometrics, biomedicine and biodiversity), our projects and, finally, our publications. The goal is to show our expertise in order to find synergies.

Opportunities given by biometrics in the field of neurodegenerative diseases: Face and Writing
We have been researching biometric identification using different modalities, including facial recognition and handwriting identification (signature and writing). We have developed a new soft-biometric approach for identifying people’s sex from facial images, and we are now developing an approach for the detection of facial emotion. This instrument opens the door to assessing people’s emotions and their expression in the early detection of neurodegenerative diseases. So far we have worked with still images; as a next step, we will work with video in real time. A second element used by neurologists is handwriting strokes. We have developed online and offline systems for biometric identification, and certain parameters can be used to provide information to medical doctors on the skills of prospective patients. In this talk we will show a system for identifying emotions based on the face and a handwriting recognition system.

Dr. Thomas Mazzocco, COSIPRA LAB, University of Stirling, United Kingdom

Information processing for new generation of clinical decision support systems
A major discrepancy between the clinical care actually delivered and optimal patient care has been ascertained. A key challenge is how to mine the huge amount of information now collected in the healthcare domain in order to build intelligent models that can effectively help clinicians make optimal decisions. Alternative models for traditional primary care have been actively explored, with many pilot predictive models successfully developed over the last decades. Artificial intelligence techniques, many of which can be considered black-box models, can improve accuracy when compared to traditional models generally based on linear or logistic regression. Our proposed framework tries to combine the benefits of these two approaches, maximizing the accuracy of the model while keeping it as transparent as possible. The design process of enhanced clinical decision support systems, including further developments of the commonly used framework to reduce the misclassification rates, will be presented and exemplified.

Dr. Ingo Mierswa, CEO, Rapid-I, Germany

From Science to Enterprise: Data Analysis for the Masses
RapidMiner and its server RapidAnalytics are today among the most widely used solutions worldwide for complex data analysis tasks. The company Rapid-I maintains the open source software solutions and drives innovative development together with an amazing community, creating leading-edge scientific results as well as practical solutions for business problems. During his presentation, Rapid-I co-founder Dr. Ingo Mierswa will demonstrate how to create a sophisticated predictive analytics application from scratch with no programming using RapidMiner. He will also discuss how the different worlds of scientists and business users, as well as data mining professionals and first-day analysts, are connected by the open yet professional environment Rapid-I offers to all users and customers alike.

ViSTA-TV: Live-stream data mining analysis of TV data
Live video content is increasingly consumed over IP networks in addition to traditional broadcasting. The move to IP provides a huge opportunity to discover what people are watching in much greater breadth and depth than currently possible through interviews or set-top box based data gathering by rating organizations, because it allows direct analysis of consumer behavior via the logs they produce. The ViSTA-TV project proposes to gather consumers’ anonymized viewing behavior and the actual video streams from broadcasters/IPTV-transmitters, to combine them with enhanced electronic program guide information as the input for a holistic live-stream data mining analysis.
ViSTA-TV will employ the gathered information via a stream-analytics process to generate a high-quality linked open dataset (LOD) describing live TV programming. Combining the LOD with the behavioral information gathered, ViSTA-TV will be in the position to provide highly accurate market research information about viewing behavior that can be used for a variety of analyses of high interest to all participants in the TV-industry. ViSTA-TV will employ the information gathered to build a recommendation service that exploits both usage information and personalized feature extraction in conjunction with existing metadata to provide real-time viewing recommendations. These results will be made possible by scientific progress in data-stream mining consisting of advances in data mining for tagging, recommendations, and behavioral analyses and temporal/probabilistic RDF-triple stream processing.

Dr. Thibaud Necciari, Acoustics Research Institute, Austrian Academy of Sciences, Austria

The ERBlet transform, time-frequency masking and perceptual sparsity
Time-frequency (TF) representations are widely used in audio applications involving sound analysis-synthesis. For such applications, obtaining an invertible TF transform that accounts for some aspects of human auditory perception is of high interest. To that end, we combine results of non-stationary signal processing and psychoacoustics. First, we exploit the theory of non-stationary Gabor frames to obtain a linear and perfectly invertible non-stationary Gabor transform (NSGT) whose TF resolution best matches the TF analysis properties of the ear. The peripheral auditory system can be modeled in a first approximation as a bank of bandpass filters whose bandwidth increases with increasing center frequency. These so-called “auditory filters” are characterized by their equivalent rectangular bandwidths (ERB) that follow the ERB scale. Here, we use an NSGT with resolution evolving across frequency to mimic the ERB scale, thereby naming the resulting paradigm “ERBlet transform”.
Second, we exploit recent psychoacoustical data on auditory TF masking to find an approximation of the ERBlet that keeps only the audible components (perceptual sparsity criterion). Our long-term goal is to obtain a perceptually relevant signal representation, i.e., as close as possible to “what we see is what we hear”. Auditory masking occurs when the detection of a sound (referred to as the “target” in psychoacoustics) is degraded by the presence of another sound (the “masker”). To accurately predict auditory masking in the TF plane, TF masking data for masker and target signals with a good localization in the TF plane are required. To our knowledge, these data are not available in the literature. Therefore, we conducted psychoacoustical experiments to obtain a measure of the TF spread of masking produced by a Gaussian TF atom.
The ERBlet transform and the psychoacoustical data on TF masking will be presented. The implementation of the perceptual sparsity criterion in the ERBlet will be discussed.
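
To give a feeling for how the ERB scale described above distributes frequency resolution, the sketch below places filter centre frequencies at equal steps on the ERB-rate scale. The abstract does not state a formula, so the commonly cited Glasberg and Moore approximation is assumed here; the function names are hypothetical and the code is not the ERBlet implementation itself.

```python
import math


def erb_bandwidth(f_hz):
    """Equivalent rectangular bandwidth in Hz at centre frequency f_hz
    (Glasberg and Moore approximation, assumed for this sketch)."""
    return 24.7 * (4.37 * f_hz / 1000.0 + 1.0)


def hz_to_erb_rate(f_hz):
    """Map a frequency in Hz to the ERB-rate scale (number of ERBs)."""
    return 21.4 * math.log10(1.0 + 0.00437 * f_hz)


def erb_rate_to_hz(erb_rate):
    """Inverse of hz_to_erb_rate."""
    return (10.0 ** (erb_rate / 21.4) - 1.0) / 0.00437


def erb_spaced_frequencies(f_min, f_max, n):
    """n >= 2 centre frequencies equally spaced on the ERB-rate scale."""
    e_min = hz_to_erb_rate(f_min)
    e_max = hz_to_erb_rate(f_max)
    step = (e_max - e_min) / (n - 1)
    return [erb_rate_to_hz(e_min + k * step) for k in range(n)]
```

Calling `erb_spaced_frequencies(100.0, 8000.0, 10)` yields centre frequencies that are densely packed at low frequencies and increasingly far apart at high frequencies, matching the statement that auditory filter bandwidth grows with centre frequency.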

Dr. Darian Onchis-Moaca, NuHAG, University of Vienna, Austria

Approximate dual Gabor atoms via the adjoint lattice method
This talk promotes a numerical approach for the calculation of the dual Gabor atom for general Gabor frames, which are obtained by applying all time-frequency shifts from a given lattice to a Gabor atom. The theoretical foundation for the approach is the well-known Wexler-Raz biorthogonality relation and the more recent theory of localized frames. The combination of these principles guarantees that the dual Gabor atom can be approximated by a linear combination of a few time-frequency shifted atoms from the adjoint lattice. The effectiveness of this approach is demonstrated by numerical examples and justified by a new theoretical argument.

Dr. Jiří Sedlář, Department of Image Processing, Institute of Information Theory and Automation (UTIA), Academy of Sciences, Czech Republic

Automatic measurement of vocal fold vibration parameters in videokymographic images
Videokymography is a novel video recording technique used for examination of vocal fold vibrations. A videokymographic camera scans a single line repeatedly at a high-speed rate (7200 lines/s); the resulting videokymogram consists of successive images of the scanned line. Visual evaluation of videokymograms is difficult and time-consuming, so the possibility of computer-aided diagnostics is of great interest. The objective of this project was to develop methods for automatic detection of reflections, rima glottidis, and mucosal waves. Applicability of the proposed methods was demonstrated on a set of videokymograms with a wide range of voice disorders; the results were comparable with visual measurements by clinicians.
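
The construction of a videokymogram from repeated scans of one line can be mimicked in a few lines of code. This is a simplified sketch only: a real videokymographic camera scans the line directly, whereas here the same row is taken from a sequence of ordinary frames, and the helper names are hypothetical. The second function shows the simple arithmetic linking the 7200 lines/s scan rate to the number of kymogram lines covering one vibration cycle.

```python
def build_kymogram(frames, row):
    """Stack the same image row from successive frames.

    frames: list of images, each a list of rows (lists of pixel values).
    Returns one output line per frame, so time runs down the result.
    """
    return [list(frame[row]) for frame in frames]


def lines_per_cycle(f0_hz, line_rate_hz=7200.0):
    """Number of scanned lines covering one vocal fold vibration cycle."""
    return line_rate_hz / f0_hz
```

For a voice with a fundamental frequency of 120 Hz, `lines_per_cycle(120.0)` gives 60 lines per cycle, which indicates how finely each opening and closing phase is sampled in the resulting image.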

Dr. Peter Sondergaard, Department of Mathematics, Technical University of Denmark, Denmark

Frames in LTFAT
The Linear Time-Frequency Analysis Toolbox (LTFAT) was originally constructed around the Discrete Gabor Transform. In the future, LTFAT will contain a large variety of frames. In this talk we present the object-oriented framework that makes this possible, the basic and advanced methods that the framework provides, and an introduction to the frames in the framework.

Dr. Filip Šroubek, Department of Image Processing, Institute of Information Theory and Automation (UTIA), Academy of Sciences, Czech Republic

Research topics in the Department of Image Processing, UTIA (Seeing the Unseen)

Superresolution imaging: from equations to mobile applications
In the last five years we have witnessed a rapid improvement of methods that perform image restoration, such as denoising, deconvolution and superresolution. We will provide a brief mathematical background to superresolution as an optimization problem and summarize our contribution. Specifically, we will talk about robustness to misregistration, an extension to space-variant cases and a fast converging augmented Lagrangian method suitable for constrained optimization problems. We will also give an overview of our past and ongoing commercial applications in which superresolution plays a key role.

Prof. Alessandro Vinciarelli, School of Computing Science, University of Glasgow, Glasgow, Scotland

The School of Computing Science at the University of Glasgow
The talk introduces the main activity areas in the School of Computing Science at the University of Glasgow. Particular attention is paid to domains that involve signal processing and machine learning as the main technological components. Furthermore, emphasis will be put on international activities that range from exchange programs for undergraduate and graduate students to large European scientific collaborations.

An Introduction to Social Signal Processing
Social Signal Processing is the interdisciplinary domain aimed at the modelling, analysis and synthesis of nonverbal communication in social interactions. The talk will introduce the main principles and methodologies of the domain as well as two major research directions currently explored at the University of Glasgow, namely the automatic detection of conflict and the automatic inference of personality traits from speech. The conclusion will focus on the most important challenges still open in the domain and will highlight the opportunities for researchers potentially interested in the field.