Task #3709
closed
  
    
    
  
Task #3672: RA1d - Automatic cleaning of speech corpora
Task #3704: Detection and correction of prosodic structures
Merge ASF files (segmentations) and SNT files (annotations)
 
        
        Added by Hanzlíček Zdeněk almost 10 years ago.
        Updated over 9 years ago.
        
Estimated time:
 (Total: 0.00 h)
 
  
  
  
  Description
  
  Merge ASF files (segmentations) and SNT files (annotations):
	
	- Find differences between the words in ASF and SNT and update the SNT files in the SVN repository.
	- Add new columns to ASF: punctuation and pronunciation.
 
 
  
  
    
    
    
    Add missing non-speech events from SNT into ASF with zero duration.
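
A minimal sketch of how missing non-speech events could be inserted with zero duration. The ASF and SNT formats are not described in this task, so the data layout here is an assumption: ASF segments are modelled as `(label, start, end)` tuples, SNT as a flat token list, and non-speech events are assumed to be written in square brackets. The names `add_nonspeech_events` and `is_nonspeech` are hypothetical and are not taken from the LabLight scripts.

```python
from typing import List, Tuple

# Hypothetical in-memory form of one ASF segment: (label, start time, end time).
Segment = Tuple[str, float, float]


def is_nonspeech(token: str) -> bool:
    """Assumed convention: non-speech events are bracketed, e.g. "[breath]"."""
    return token.startswith("[") and token.endswith("]")


def add_nonspeech_events(asf_segments: List[Segment],
                         snt_tokens: List[str]) -> List[Segment]:
    """Return ASF segments with missing SNT non-speech events added.

    Every added event gets zero duration: its start and end both equal
    the end time of the preceding ASF segment.
    """
    merged: List[Segment] = []
    pos = 0  # index of the next unread ASF segment
    last_end = asf_segments[0][1] if asf_segments else 0.0

    for token in snt_tokens:
        if is_nonspeech(token):
            # Add the event only if ASF does not already have it here.
            if pos < len(asf_segments) and asf_segments[pos][0] == token:
                merged.append(asf_segments[pos])
                last_end = asf_segments[pos][2]
                pos += 1
            else:
                merged.append((token, last_end, last_end))
            continue

        # Copy ASF segments (including pauses) up to the matching word.
        while pos < len(asf_segments):
            segment = asf_segments[pos]
            pos += 1
            merged.append(segment)
            last_end = segment[2]
            if segment[0].lower() == token.lower():
                break
        else:
            raise ValueError(f"SNT word {token!r} not found in ASF")

    merged.extend(asf_segments[pos:])  # trailing pauses, if any
    return merged
```

The sketch assumes the word sequences of both files already agree; a word present in SNT but absent from ASF raises an error rather than being guessed.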
 
   
  
  
    
    
    
    Use upper-case chars in words in ASF (copy from SNT).
 
   
  
  
    
    
    
    
       - % Done changed from 0 to 70
Scripts for this task are placed in SVN repository ARTIC_UTILS/trunk/hmm_synth/LabLight.
A simple description of the merging procedure:
	
	- Script diff_asf_snt.py performs a simple comparison between ASF and SNT files and prints out suspicious, inconsistent utterances (pauses are ignored). This comparison reveals only some basic types of inconsistency.
	- Manual correction of the SNT file (when needed).
	- Script merge_asf_snt.py merges the ASF and SNT files; a new ASF file is created:
		- Join verbs and the enclitic "li" into one word (e.g. bude-li).
		- Use words from SNT (with capital letters).
		- Add punctuation and pronunciation columns.
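
The comparison and word-merging steps above can be sketched as follows. This is not the code of diff_asf_snt.py or merge_asf_snt.py; it is an illustration under several assumptions: utterances are plain lists of word tokens, pauses are marked with `$`, and "li" is a separate token wherever it has not yet been joined. All function names are hypothetical. The sketch joins "li" to any preceding word, since it has no way to tell whether that word is a verb.

```python
from typing import List

# Hypothetical pause marker; the real ASF pause symbol is not given in this task.
PAUSES = frozenset({"$"})


def join_enclitic_li(words: List[str]) -> List[str]:
    """Join a token and a following enclitic "li" into one hyphenated word."""
    joined: List[str] = []
    for word in words:
        if word.lower() == "li" and joined and joined[-1] not in PAUSES:
            joined[-1] = f"{joined[-1]}-{word}"
        else:
            joined.append(word)
    return joined


def comparable(words: List[str]) -> List[str]:
    """Word sequence used for comparison: no pauses, lower case, "li" joined."""
    return [w.lower() for w in join_enclitic_li(words) if w not in PAUSES]


def is_consistent(asf_words: List[str], snt_words: List[str]) -> bool:
    """True when both files contain the same words (pauses and case ignored)."""
    return comparable(asf_words) == comparable(snt_words)


def use_snt_words(asf_words: List[str], snt_words: List[str]) -> List[str]:
    """Keep ASF pauses, but take each word (with its capitals) from SNT."""
    if not is_consistent(asf_words, snt_words):
        raise ValueError("inconsistent utterance; correct the SNT file first")
    snt_iter = iter(w for w in join_enclitic_li(snt_words) if w not in PAUSES)
    return [w if w in PAUSES else next(snt_iter)
            for w in join_enclitic_li(asf_words)]
```

For example, `use_snt_words(["$", "bude", "li", "zima", "$"], ["Bude-li", "zima"])` keeps both pauses and yields the single capitalised word `Bude-li` in place of the two ASF tokens.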
 
Two ASF files were processed (voices MR and TJ). New ASF and SNT files are placed in ARTIC directory /artic/Experiments/asf.snt.merge.
 
   
  
  
    
    
    
    
       - % Done changed from 70 to 80
Two more voices were processed: KI and AJ.
NOTE: Several specific corrections had to be done manually. Thus, the new ASFs should replace the default ASFs in the SVN repository. Otherwise, sooner or later, we could have two parallel inconsistent ASF versions.
 
   
  
  
    
    
    
    
       - Related to Task #3761: Create script for conversion ASF to SNT added
 
   
  
  
    
    
    
    
       - Status changed from Assigned to Resolved
Summary
Voices with merged ASFs:
	
	- Czech voices: AJ, JS, KI, MR, SK, TJ
- Slovak voice: MM
ASF file for the Czech female voice PP was not merged, since the annotation file is not available.
 
   
  
  
    
    
    
    
       - Status changed from Resolved to Closed
 
   
  
 
  
  
 