RA4: Automatic error prediction and signal modification
Open, over 6 years late (due 31.12.2018)
Progress: 95%
RA4: Automatic error prediction and dedicated signal modification
Concatenation-based speech synthesis always carries the risk of an artifact at a concatenation point, even when the phonetically motivated optimizations described in RA3 are applied. The cause is the limited size of the speech unit database relative to the natural variability of speech. Although it is widely accepted that unit selection achieves its best quality when no signal modification is carried out at all, we believe that selective signal modification, targeted at the specific component of unit selection that causes the artifact, can suppress it. Based on the analysis of artifacts in synthetic speech carried out in RA1, an error prediction module will be designed to predict potential artifacts (e.g. F0 discontinuity) in to-be-synthesized speech during the unit-selection runtime (RA4a) [LU10], [VIT13], [LEG13]. Depending on the type of the predicted artifact, a dedicated signal modification (e.g. F0 smoothing) will be carried out (RA4b). Since a combination of unit selection and HMM-based speech synthesis has been reported in the literature to be helpful (e.g. [BLA07], [SIL10]), hybrid approaches will be examined as well (RA4c). The possibility of generating speech from HMMs wherever the unit-selection scheme would result in an artifact will also be researched. A compromise will be sought between using the selected (i.e. natural) speech segments, which can result in discontinuities and disruptive artifacts, and generated segments (produced either by a dedicated signal modification technique or by HMM-based synthesis). The compromise should balance the mix of selected and smoothed/generated speech, possibly through a scheme configurable according to listeners' preference (RA4d).
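To make the RA4a/RA4b idea concrete, the following is a minimal sketch, not the project's actual method. It assumes each selected unit carries a per-frame F0 contour in Hz, flags a join as a potential artifact when the F0 jump across it exceeds a fixed threshold, and repairs flagged joins by spreading the jump linearly over a few frames on each side. The names `Unit`, `predict_f0_joins`, `smooth_join`, and `repair`, as well as the 15 Hz threshold and the 3-frame span, are hypothetical choices made only for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Unit:
    """A selected speech unit with a non-empty per-frame F0 contour in Hz (assumed)."""
    name: str
    f0: List[float]


def predict_f0_joins(units: List[Unit], threshold_hz: float = 15.0) -> List[int]:
    """Return every index i where the join units[i] -> units[i+1] has an
    F0 jump larger than threshold_hz (a stand-in for a learned predictor)."""
    return [
        i for i in range(len(units) - 1)
        if abs(units[i + 1].f0[0] - units[i].f0[-1]) > threshold_hz
    ]


def smooth_join(left: Unit, right: Unit, span: int = 3) -> None:
    """Modify both contours in place so they meet halfway at the join.
    The correction ramps linearly from zero to half the gap over `span` frames."""
    half_gap = (right.f0[0] - left.f0[-1]) / 2.0
    n_left = min(span, len(left.f0))
    for k in range(n_left):
        # Weight grows towards the join; the last frame gets the full half gap.
        left.f0[len(left.f0) - n_left + k] += half_gap * (k + 1) / n_left
    n_right = min(span, len(right.f0))
    for k in range(n_right):
        # Weight shrinks away from the join; the first frame gets the full half gap.
        right.f0[k] -= half_gap * (n_right - k) / n_right


def repair(units: List[Unit], threshold_hz: float = 15.0, span: int = 3) -> List[Unit]:
    """Copy the units, then smooth only the joins predicted to be artifacts."""
    fixed = [Unit(u.name, list(u.f0)) for u in units]
    for i in predict_f0_joins(fixed, threshold_hz):
        smooth_join(fixed[i], fixed[i + 1], span)
    return fixed


units = [
    Unit("a", [120.0, 121.0, 122.0]),
    Unit("b", [150.0, 149.0, 148.0]),
    Unit("c", [150.0, 151.0]),
]
print(predict_f0_joins(units))          # [0]: only the a -> b join is flagged
print(predict_f0_joins(repair(units)))  # []: the flagged join has been smoothed
```

Only the flagged join is touched; the b -> c join, whose jump stays below the threshold, keeps its natural contour, which mirrors the stated goal of modifying the signal selectively rather than everywhere.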
Activity | Objective | Workplace | 2016 | 2017 | 2018 | Dissemination |
---|---|---|---|---|---|---|
RA4a | Automatic error prediction | UWB | x | x | | Jimp: 1, D: 6 |
RA4b | Dedicated signal modification | UWB | x | x | | |
RA4c | Hybrid approaches | UWB | x | x | | |
RA4d | Compromise between selected and generated speech | UWB | x | | | |
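
The configurable compromise of RA4d can be pictured with a small sketch. It assumes that the error prediction module yields an artifact score between 0 and 1 for each join, and that two thresholds decide whether the natural segment is kept, modified, or replaced by a generated one. The `Preference` class, the `choose_strategy` and `plan` functions, and the threshold values are hypothetical and are not taken from the project.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Preference:
    """Hypothetical listener preference: higher thresholds keep more natural speech."""
    modify_above: float = 0.3    # score from which signal modification is applied
    generate_above: float = 0.7  # score from which a generated segment is used


def choose_strategy(score: float, pref: Preference = Preference()) -> str:
    """Map a predicted artifact score in [0, 1] to a synthesis strategy."""
    if score >= pref.generate_above:
        return "generated"
    if score >= pref.modify_above:
        return "modified"
    return "natural"


def plan(scores: List[float], pref: Preference = Preference()) -> List[str]:
    """Choose a strategy for every join in an utterance."""
    return [choose_strategy(s, pref) for s in scores]


scores = [0.1, 0.5, 0.9]
print(plan(scores))                           # ['natural', 'modified', 'generated']
print(plan(scores, Preference(0.6, 0.95)))    # ['natural', 'natural', 'modified']
```

Raising both thresholds shifts the balance towards the selected natural segments, while lowering them favors smoothed or generated speech.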