RA4: Automatic error prediction and dedicated signal modification

In concatenation-based speech synthesis there is always a danger that an artifact occurs at a concatenation point, even when the phonetically motivated optimizations described in RA3 are applied. This is caused by the limited size of the speech unit database relative to the natural variability of speech. Although it is widely accepted that unit selection achieves its best quality when no signal modification is carried out at all, we believe that selective signal modification, targeted at the specific component of unit selection that causes the artifact, can suppress it. Based on the analysis of artifacts in synthetic speech carried out in RA1, an error prediction module will be designed to predict potential artifacts (e.g. F0 discontinuity) in to-be-synthesized speech during the unit-selection runtime (RA4a) [LU10], [VIT13], [LEG13]. Depending on the type of the predicted artifact, a dedicated signal modification (e.g. F0 smoothing) will be carried out (RA4b). Since combinations of unit selection and HMM-based speech synthesis have been reported to be helpful in the literature (e.g. [BLA07], [SIL10]), hybrid approaches will be examined as well (RA4c). The possibility of generating speech from HMMs when the unit-selection scheme would result in an artifact will also be researched, and a compromise will be sought between using the selected (i.e. natural) speech segments, which can however result in discontinuities and disruptive artifacts, and generated segments (produced either by a dedicated signal modification technique or by HMM-based synthesis). The compromise should balance the mixing of selected and smoothed/generated speech, possibly with a scheme configurable according to listeners' preferences (RA4d).
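To make the RA4a/RA4b idea concrete, the following is a minimal sketch of predicting an F0 discontinuity at a single join and smoothing it only when it is flagged. It is an illustration under stated assumptions, not the error prediction module or the modification technique planned in the project: the function names, the 2-semitone threshold, the 3-frame span and the log-domain linear ramp are all hypothetical choices, and each unit is assumed to be given as a list of F0 values in Hz for voiced frames only.

```python
import math


def f0_jump_semitones(left_f0_hz, right_f0_hz):
    """Absolute F0 difference between two frames, in semitones."""
    return abs(12.0 * math.log2(right_f0_hz / left_f0_hz))


def predict_f0_artifact(left_unit_f0, right_unit_f0, threshold_st=2.0):
    """Flag a join whose boundary F0 jump exceeds the threshold.

    The threshold is an assumed value for illustration only.
    """
    jump = f0_jump_semitones(left_unit_f0[-1], right_unit_f0[0])
    return jump > threshold_st


def smooth_join(left_unit_f0, right_unit_f0, span=3):
    """Spread the boundary F0 jump over `span` frames on each side.

    Works in the log-F0 domain: the left unit is ramped towards the
    geometric mean of the two boundary values, and so is the right
    unit, so that both units meet at the join. Returns new lists and
    leaves the inputs untouched.
    """
    left, right = list(left_unit_f0), list(right_unit_f0)
    log_jump = math.log(right[0] / left[-1])

    n_left = min(span, len(left))
    for i in range(n_left):
        weight = (i + 1) / n_left          # reaches 1.0 at the join
        idx = len(left) - n_left + i
        left[idx] *= math.exp(0.5 * log_jump * weight)

    n_right = min(span, len(right))
    for j in range(n_right):
        weight = (n_right - j) / n_right   # equals 1.0 at the join
        right[j] *= math.exp(-0.5 * log_jump * weight)

    return left, right


def process_join(left_unit_f0, right_unit_f0):
    """Modify the F0 contour only when an artifact is predicted."""
    if predict_f0_artifact(left_unit_f0, right_unit_f0):
        return smooth_join(left_unit_f0, right_unit_f0)
    return list(left_unit_f0), list(right_unit_f0)
```

Calling `process_join([120.0, 120.0, 120.0], [150.0, 150.0, 150.0])` would flag the join and return contours that meet at the boundary, while a join with a small jump is returned unmodified, which keeps the natural segments intact wherever no artifact is predicted.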

Activity	Objective	Workplace	2016	2017	2018	Dissemination
RA4a	Automatic error prediction	UWB	x	x		Jimp: 1, D: 6
RA4b	Dedicated signal modification	UWB		x	x
RA4c	Hybrid approaches	UWB		x	x
RA4d	Compromise between selected and generated speech	UWB			x

Task #3680: RA4a - Automatic error prediction
Task #3681: RA4b - Dedicated signal modification
Task #3682: RA4c - Hybrid approaches
Task #3683: RA4d - Compromise between selected and generated speech
Task #3698: Experiment with one-class classification for join cost enhancements
Task #3699: Compute features (MFCC, LPC, LPCenv and FFTpow)
Task #3770: Experiment with gender and age classification for synthetic speech error predictions and evaluation
Task #3771: Create a paper for TSD 2016
Task #3772: Submit a paper for TSP 2016
Task #3773: Submit a paper to journal Measurement Science Review
Task #3809: Submit a paper for INTERSPEECH 2016
Task #3811: Experiment with statistical outlier detection and removal
Task #3855: More data for artefacts collection
Task #4150: Statistical parametric speech synthesis (SPS)
Task #4151: Neural network based statistical parametric speech synthesis
Task #4152: Vocoder
Task #4153: Hybrid neural network / unit selection approach
Task #4154: Hybrid HMM / unit selection approach
Task #4205: DNN-based outlier duration detection and penalization
Task #4206: HMM-based outlier duration detection and penalization
Task #4462: Automatic evaluation of synthetic speech quality by a system based on statistical analysis
Task #4463: First Steps Towards Hybrid Speech Synthesis in Czech TTS system ARTIC
