Task #3709 (closed)
Task #3672: RA1d - Automatic cleaning of speech corpora
Task #3704: Detection and correction of prosodic structures
Merge ASF files (segmentations) and SNT files (annotations)
100%
Description
- Find differences between words in ASF and SNT and update SNT in SVN repository.
- Add new columns into ASF: punctuation and pronunciation.
Updated by Hanzlíček Zdeněk about 9 years ago
Add missing non-speech events from SNT into ASF with zero duration.
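A minimal sketch of how such zero-duration events could be inserted. The actual ASF and SNT file layouts are not described in this task, so the segment tuple `(label, start, end)`, the bracket notation for events (e.g. `[breath]`), and the function names are all assumptions made for illustration. The sketch also assumes that the ASF holds only words and that its word sequence already matches the SNT.

```python
from typing import List, Tuple

# Hypothetical in-memory form of one ASF segment: (label, start time, end time).
Segment = Tuple[str, float, float]


def is_event(token: str) -> bool:
    # Assumption: non-speech events are written in square brackets in SNT.
    return token.startswith("[") and token.endswith("]")


def add_events(asf: List[Segment], snt: List[str]) -> List[Segment]:
    """Return ASF segments with SNT non-speech events added at zero duration."""
    merged: List[Segment] = []
    i = 0  # index of the next unused ASF segment
    for token in snt:
        if is_event(token):
            # Place the event at the boundary reached so far.
            if merged:
                t = merged[-1][2]
            elif asf:
                t = asf[0][1]
            else:
                t = 0.0
            merged.append((token, t, t))
        else:
            if i >= len(asf):
                raise ValueError("SNT has more words than ASF")
            merged.append(asf[i])
            i += 1
    if i != len(asf):
        raise ValueError("ASF has more words than SNT")
    return merged
```

Each event takes the end time of the preceding segment as both its start and its end, so the existing segment boundaries stay untouched.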
Updated by Hanzlíček Zdeněk about 9 years ago
Use upper-case chars in words in ASF (copy from SNT).
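Copying the casing could look roughly like the sketch below. It assumes both files have already been reduced to parallel word lists of equal length; the function name `copy_case` is hypothetical and not taken from the scripts in the repository.

```python
from typing import List


def copy_case(asf_words: List[str], snt_words: List[str]) -> List[str]:
    """Return the SNT spelling for each ASF word, checking that they match."""
    if len(asf_words) != len(snt_words):
        raise ValueError("word counts differ")
    result = []
    for asf_word, snt_word in zip(asf_words, snt_words):
        # Only the casing may differ; any other difference is an inconsistency.
        if asf_word.lower() != snt_word.lower():
            raise ValueError(f"mismatch: {asf_word!r} vs {snt_word!r}")
        result.append(snt_word)
    return result
```

Raising on any mismatch other than casing keeps real word differences from being silently overwritten.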
Updated by Hanzlíček Zdeněk about 9 years ago
- % Done changed from 0 to 70
Scripts for this task are placed in SVN repository ARTIC_UTILS/trunk/hmm_synth/LabLight.
- Script diff_asf_snt.py performs a simple comparison between ASF and SNT files and prints out suspicious, inconsistent utterances (pauses are ignored). This comparison reveals only some basic types of inconsistency.
  - Manual correction of the SNT file (when needed).
- Script merge_asf_snt.py merges ASF and SNT files; a new ASF file is created.
  - Join verbs and the enclitic "li" into one word (e.g. bude-li).
  - Use words from SNT (with capital letters).
  - Add punctuation and pronunciation columns.
Two ASF files were processed (voices MR and TJ). New ASF and SNT files are placed in ARTIC directory /artic/Experiments/asf.snt.merge.
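The comparison and the "li" joining described above can be sketched as follows. This is not the code of diff_asf_snt.py or merge_asf_snt.py. The pause symbol `$`, the helper names, and the rule of attaching every "li" to the preceding word (without checking that it is a verb) are assumptions for illustration only.

```python
import difflib
from typing import List, Tuple

PAUSE = "$"  # hypothetical pause label; the real ASF symbol may differ


def join_li(words: List[str]) -> List[str]:
    """Attach an enclitic "li" to the preceding word, e.g. bude + li -> bude-li."""
    joined: List[str] = []
    for word in words:
        if word.lower() == "li" and joined and joined[-1] != PAUSE:
            joined[-1] = joined[-1] + "-" + word
        else:
            joined.append(word)
    return joined


def normalize(words: List[str]) -> List[str]:
    # Pauses are ignored and casing is disregarded for the comparison.
    return [w.lower() for w in words if w != PAUSE]


def diff_words(asf: List[str], snt: List[str]) -> List[Tuple[str, List[str], List[str]]]:
    """Return (operation, ASF words, SNT words) for every differing span."""
    a = normalize(join_li(asf))
    b = normalize(snt)
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    return [
        (tag, a[i1:i2], b[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]
```

An utterance would be reported as suspicious whenever `diff_words` returns a non-empty list. Like the original comparison, a word-level diff of this kind reveals only basic inconsistencies.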
Updated by Hanzlíček Zdeněk about 9 years ago
- % Done changed from 70 to 80
Two more voices were processed: KI and AJ.
NOTE: Several specific corrections had to be done manually. Thus, the new ASFs should replace the default ASFs in the SVN repository. Otherwise, sooner or later, we could have two parallel inconsistent ASF versions.
Updated by Hanzlíček Zdeněk about 9 years ago
- Related to Task #3761: Create script for conversion ASF to SNT added
Updated by Hanzlíček Zdeněk over 8 years ago
- Status changed from Assigned to Resolved
Summary
Voices with merged ASFs:
- Czech voices: AJ, JS, KI, MR, SK, TJ
- Slovak voice: MM
The ASF file for the Czech female voice PP was not merged because its annotation file is not available.
Updated by Hanzlíček Zdeněk over 8 years ago
- Status changed from Resolved to Closed