Ing. Zoltán Galáž, Ing. Jiří Mekyska, prof. Ing. Zdeněk Smékal, CSc.
Date of creation: 3. 11. 2016
The software can be downloaded here.
This tool provides a method for the de-identification of speech. For this purpose, the tool (SDT) uses a non-parametric approach based on so-called “frequency warping”, known as vocal tract length normalization (VTLN), applied to the spectrum of the speech signal. During VTLN, the frequency axis of the signal is remapped by a non-linear, monotonic function.
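To make the idea of a non-linear, monotonic frequency mapping concrete, here is a minimal sketch of one commonly used VTLN warping function, the bilinear warp. The text above does not say which warping function SDT applies, so the choice of the bilinear form, the name `bilinear_warp`, and the example value of `alpha` are assumptions made for illustration only.

```python
import numpy as np

def bilinear_warp(omega, alpha):
    """Map normalized frequencies omega (radians, 0..pi) onto a warped axis.

    Assumed warping function (not confirmed by the tool's description):
        omega' = omega + 2 * arctan(alpha * sin(omega) / (1 - alpha * cos(omega)))
    For |alpha| < 1 the mapping is strictly increasing and keeps 0 and pi fixed.
    """
    omega = np.asarray(omega, dtype=float)
    return omega + 2.0 * np.arctan2(alpha * np.sin(omega),
                                    1.0 - alpha * np.cos(omega))

# A coarse frequency grid from 0 to pi and its warped counterpart.
omega = np.linspace(0.0, np.pi, 9)
warped = bilinear_warp(omega, alpha=0.1)
```

With `alpha = 0` the mapping reduces to the identity. In this form, positive values of `alpha` map each frequency to a higher one and negative values to a lower one, while the end points of the axis stay where they are.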
VTLN proceeds in six steps:
- The speech signal is segmented synchronously with the pitch periods, so that each segment contains one pitch period of the voiced part of the speech signal. The “pitch marks” algorithm (W. Goncharoff) is used for pitch extraction.
- Each segment is weighted by a Hamming window function.
- The forward Discrete Fourier Transform is computed.
- The frequency axis of each segment is warped using the specified warping function and the associated warping factor.
- The inverse Discrete Fourier Transform is computed.
- The segments are concatenated using the PSOLA (Pitch-Synchronous Overlap and Add) method.
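The steps above can be sketched in a few lines of NumPy. This is an illustrative sketch under stated assumptions, not the SDT implementation: pitch marks are taken as given (the pitch-marking algorithm is not reproduced), the warping function is assumed to be bilinear, each segment is assumed to span two pitch periods centred on a pitch mark so that neighbouring windows overlap, and the overlap-add result is normalized by the summed window weights. The names `warp_axis`, `warp_segment`, and `deidentify` are hypothetical.

```python
import numpy as np

def warp_axis(omega, alpha):
    # Assumed bilinear warping function; monotonic for |alpha| < 1.
    return omega + 2.0 * np.arctan2(alpha * np.sin(omega),
                                    1.0 - alpha * np.cos(omega))

def warp_segment(segment, alpha):
    """Window one segment, warp its spectrum, and transform it back."""
    n = len(segment)
    window = np.hamming(n)                      # Hamming weighting
    spectrum = np.fft.rfft(segment * window)    # forward DFT
    omega = 2.0 * np.pi * np.fft.rfftfreq(n)    # bin frequencies in radians
    # Each output bin at omega takes its value from the input spectrum at
    # warp_axis(omega); real and imaginary parts are interpolated separately.
    source = warp_axis(omega, alpha)
    warped = (np.interp(source, omega, spectrum.real)
              + 1j * np.interp(source, omega, spectrum.imag))
    return np.fft.irfft(warped, n=n), window    # inverse DFT

def deidentify(signal, pitch_marks, alpha):
    """Pitch-synchronous overlap-add of frequency-warped segments."""
    signal = np.asarray(signal, dtype=float)
    output = np.zeros_like(signal)
    weight = np.zeros_like(signal)
    # Assumption: a segment runs from the previous to the next pitch mark.
    for start, stop in zip(pitch_marks[:-2], pitch_marks[2:]):
        warped, window = warp_segment(signal[start:stop], alpha)
        output[start:stop] += warped
        weight[start:stop] += window
    covered = weight > 1e-8
    output[covered] /= weight[covered]          # undo the summed window gain
    return output

# Synthetic example: 0.25 s at 8 kHz with a 100 Hz fundamental, so one
# pitch period is 80 samples and the pitch marks are evenly spaced.
fs = 8000
t = np.arange(fs // 4) / fs
signal = np.sin(2 * np.pi * 100 * t) + 0.5 * np.sin(2 * np.pi * 700 * t)
pitch_marks = np.arange(0, len(signal) + 1, 80)
warped_signal = deidentify(signal, pitch_marks, alpha=0.1)
```

Setting `alpha` to 0 leaves the spectrum untouched, so the sketch then reconstructs the input signal, which is a convenient sanity check before trying other warping factors.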
This work was supported by project COST IC1206. The described research was performed in laboratories supported by the SIX project; the registration number CZ.1.05/2.1.00/03.0072, the operational program Research and Development for Innovation.