Task #3709
closed
Task #3672: RA1d - Automatic cleaning of speech corpora
Task #3704: Detection and correction of prosodic structures
Merge ASF files (segmentations) and SNT files (annotations)
Added by Hanzlíček Zdeněk about 9 years ago.
Updated over 8 years ago.
Estimated time:
(Total: 0.00 h)
Description
Merge ASF files (segmentations) and SNT files (annotations):
- Find differences between words in ASF and SNT and update SNT in SVN repository.
- Add new columns into ASF: punctuation and pronunciation.
- Add missing non-speech events from SNT into ASF with zero duration.
- Use upper-case characters in words in ASF (copy from SNT).
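The last two points can be illustrated with a minimal sketch. It does not reproduce the real ASF or SNT file formats: the `Segment` class, the square-bracket marker for non-speech events, and the function names are all hypothetical, and the sketch assumes that the ASF words match the SNT speech tokens one-to-one (case-insensitively) and that the ASF does not yet contain any non-speech events. The punctuation and pronunciation columns are not covered.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """One word-level entry of a segmentation (hypothetical structure)."""
    word: str
    start: float
    end: float


def is_nonspeech(token):
    # Assumption: non-speech events are written in square brackets,
    # e.g. "[breath]". The real SNT convention may differ.
    return token.startswith("[") and token.endswith("]")


def merge_words(asf, snt_tokens):
    """Return new segments that take the spelling from SNT, keep the timing
    from ASF, and contain SNT non-speech events as zero-duration segments."""
    merged = []
    i = 0  # index of the next unused ASF segment
    last_end = asf[0].start if asf else 0.0
    for token in snt_tokens:
        if is_nonspeech(token):
            # Zero duration: start and end are both the end of the previous word.
            merged.append(Segment(token, last_end, last_end))
            continue
        if i >= len(asf) or asf[i].word.lower() != token.lower():
            raise ValueError(f"ASF and SNT disagree at SNT token {token!r}")
        seg = asf[i]
        # Copy the word (with its capital letters) from SNT, timing from ASF.
        merged.append(Segment(token, seg.start, seg.end))
        last_end = seg.end
        i += 1
    if i != len(asf):
        raise ValueError("ASF contains words that are missing in SNT")
    return merged


asf = [Segment("dobrý", 0.0, 0.4), Segment("den", 0.4, 0.7)]
snt = ["Dobrý", "[breath]", "den"]
merged = merge_words(asf, snt)
```

In this example `merged` holds three segments: `Dobrý` (0.0 to 0.4), the zero-duration `[breath]` at 0.4, and `den` (0.4 to 0.7). A mismatch raises an error instead of being resolved automatically, on the assumption that such cases are corrected by hand.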
- % Done changed from 0 to 70
Scripts for this task are placed in SVN repository ARTIC_UTILS/trunk/hmm_synth/LabLight.
A simple description of the merging procedure:
- Script diff_asf_snt.py performs a simple comparison between ASF and SNT files and prints out utterances that look inconsistent (pauses are ignored). This comparison reveals only some basic types of inconsistency.
- Manual correction of SNT file (when needed).
- Script merge_asf_snt.py merges ASF and SNT files and creates a new ASF file:
- Join verbs and enclitic "li" into one word (e.g. bude-li).
- Use words from SNT (with capital letters).
- Add punctuation and pronunciation columns.
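A rough sketch of two of the steps above: the pause-insensitive comparison and the joining of the enclitic "li". This is not the code of diff_asf_snt.py or merge_asf_snt.py. The function names, the `<pause>` symbol, and the representation of each file as a dictionary from utterance ID to a list of words are assumptions made for illustration only.

```python
def normalize(words, pause_symbols):
    """Lower-case the words and drop pauses so that only the text is compared."""
    return [w.lower() for w in words if w not in pause_symbols]


def find_inconsistent(asf_utts, snt_utts, pause_symbols=("<pause>",)):
    """Return the sorted IDs of utterances whose word sequences differ
    between ASF and SNT, or which are present in only one of them."""
    suspicious = []
    for utt_id in sorted(asf_utts.keys() | snt_utts.keys()):
        asf_words = asf_utts.get(utt_id)
        snt_words = snt_utts.get(utt_id)
        if asf_words is None or snt_words is None:
            suspicious.append(utt_id)
        elif normalize(asf_words, pause_symbols) != normalize(snt_words, pause_symbols):
            suspicious.append(utt_id)
    return suspicious


def join_enclitic_li(words):
    """Attach a standalone "li" to the preceding word with a hyphen.

    Simplification: the preceding word is not checked to be a verb."""
    joined = []
    for w in words:
        if w.lower() == "li" and joined:
            joined[-1] = joined[-1] + "-li"
        else:
            joined.append(w)
    return joined


asf_utts = {
    "u1": ["dobrý", "<pause>", "den"],
    "u2": ["dobrý", "den"],
    "u3": ["ano"],
}
snt_utts = {
    "u1": ["Dobrý", "den"],
    "u2": ["Dobrý", "večer"],
}

print(find_inconsistent(asf_utts, snt_utts))        # ['u2', 'u3']
print(join_enclitic_li(["bude", "li", "pršet"]))    # ['bude-li', 'pršet']
```

Comparing whole word lists per utterance only catches basic inconsistencies, and flagged utterances would still need manual inspection of the SNT file.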
Two ASF files were processed (voices MR and TJ). New ASF and SNT files are placed in ARTIC directory /artic/Experiments/asf.snt.merge.
- % Done changed from 70 to 80
Two more voices were processed: KI and AJ.
NOTE: Several specific corrections had to be done manually. Thus, the new ASFs should replace the default ASFs in the SVN repository. Otherwise, sooner or later, we could have two parallel inconsistent ASF versions.
- Related to Task #3761: Create script for conversion ASF to SNT added
- Status changed from Assigned to Resolved
Summary
Voices with merged ASFs:
- Czech voices: AJ, JS, KI, MR, SK, TJ
- Slovak voice: MM
ASF file for the Czech female voice PP was not merged, since the annotation file is not available.
- Status changed from Resolved to Closed