STDfusion

From HLT@INESC-ID

Revision as of 22:01, 16 October 2013

Introduction

  • This wiki provides the code and basic instructions needed to carry out discriminative calibration and fusion of heterogeneous spoken term detection (STD) systems, as described in:
          A. Abad, L. J. Rodriguez Fuentes, M. Penagarikano, A. Varona, M. Diez, and G. Bordel. 
          On the Calibration and Fusion of Heterogeneous Spoken Term Detection Systems. 
          In Interspeech 2013, August 25-29 2013
  • The proposed method is the result of a collaboration between the L2F and GTTS groups in the context of their activities in the MediaEval Spoken Web Search (SWS) task. The method has been successfully tested in the SWS2012 and SWS2013 tasks.
  • Unfortunately, we did not have time to consolidate and clean up the code, so what we are making available is essentially the same set of scripts that we developed during the first experiments. As a result, the code is split into several pieces written in different languages (bash, perl, matlab, etc.) and it has some external dependencies. Even so, we expect it to be useful for researchers who want to fuse their own STD systems.
  • The package can be downloaded from here.

Package contents

  • The package contains the following files and directories:
        README.txt - Contains essentially the same information as this wiki page
        ./bin/ - This directory contains the scripts needed for calibration and fusion:
        ./bin/PrepareForFusion.sh - Main script that normalizes, aligns, hypothesizes missing scores and creates the
                                    ground truth for the input systems
        ./bin/align_score_files.pl - This script is used by ./bin/PrepareForFusion.sh
        ./bin/create_groundtruth_centerdistance.pl - This script is used by ./bin/PrepareForFusion.sh
        ./bin/heuristicScoring.pl - This script is used by ./bin/PrepareForFusion.sh
        ./bin/fusion.sh - This script takes the output of ./bin/PrepareForFusion.sh to learn the fusion parameters and apply them to the evaluation set
        ./bin/raw2stdslist.sh - Converts scores from a raw (internal) format to the stdlist xml format of the evaluation
        ./scoring_atwv_sws2013/ - This directory contains the MediaEval 2013 official scoring package (only for ATWV)
        ./scores/ - Contains sample dev (./scores/dev/akws_br-devterms.stdlist.xml, ./scores/dev/dtw_br-devterms.stdlist.xml) and eval scores (./scores/eval/akws_br-evalterms.stdlist.xml, ./scores/eval/dtw_br-evalterms.stdlist.xml)
        ./sample_results/ - Contains the sample results that should be obtained if you run the scripts following the instructions below
 STEP BY STEP SAMPLE INSTRUCTIONS
 STEP1 - Prepare the dev scores for training the fusion by typing the following command:
 ./bin/PrepareForFusion.sh -q ./scoring_atwv_sws2013/sws2013_dev/sws2013_dev.tlist.xml -r ./scoring_atwv_sws2013/sws2013_dev/sws2013_dev.rttm -o DEV_SCORES_4FUSION.txt -t queries_in_data.ref -g -z qnorm -m 1 -n 1 ./scores/dev/akws_br-devterms.stdlist.xml ./scores/dev/dtw_br-devterms.stdlist.xml
 Before running the script, change the MATLAB_BIN variable to the path of your Matlab binary.
 This call generates 2 output files:
 The first one is DEV_SCORES_4FUSION.txt, which contains the scores ready to be used in the following stage, one candidate detection per line. The format is:
 <query_id> <file_id> <start_time> <duration> <score_system1> <score_system2> ... <score_systemN> <groundtruth_label>
 ...
 ...
 <query_id> <file_id> <start_time> <duration> <score_system1> <score_system2> ... <score_systemN> <groundtruth_label>
 Notice that every system has a score for every candidate detection.
 If the -g option is selected (as in this case), the last column contains 0s and 1s for the false and true trials respectively, derived from the rttm. If the -g option is not selected, the last column simply contains 1s.
 The second (optional) output file is queries_in_data.ref. This file simply contains the number of times each query appears in the collection data, and it is used later in the fusion stage.
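 As a quick sanity check of the prepared file, the column layout described above can be inspected with a few lines of awk. This is only a sketch: the function name count_trials is hypothetical (it is not part of the package), and it assumes whitespace-separated columns with the ground-truth label in the last one.

```shell
# Summarize a score file written by PrepareForFusion.sh (hypothetical helper).
# Assumes whitespace-separated columns and a 0/1 label in the last column.
count_trials() {
    awk 'NF > 0 {
             if (cols == 0) cols = NF     # column count of the first line
             if (NF != cols) bad++        # lines with a different column count
             n++
             if ($NF == 1) t++            # true trials
         }
         END {
             printf "trials=%d targets=%d nontargets=%d bad_lines=%d\n",
                    n, t, n - t, bad
         }' "$1"
}

# Usage (after STEP1):
#   count_trials DEV_SCORES_4FUSION.txt
```

 A non-zero bad_lines count would suggest that some system is missing a score for some detection, which should not happen after the preparation step.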
 You can see the general usage of this script by running it without arguments:
 
   Usage: PrepareForFusion.sh -q <tlistxml> -r <rttm> -o <outputfile> [opts] <stdlistfile1> [stdlistfile2] [stdlistfile3]  ... [stdlistfileN] 
    <stdlistfile*> input score stdlist file in the SWS2012 format (*.stdlist.xml)        | - Required argument (at least 1)
    -q <tlistxml> termlist file in the SWS2012 format (*.tlist.xml)                      | - Required parameter
    -r <rttm> rttm file in the SWS2012 format (*.rttm)                                   | - Required parameter
    -o <outputfile> output file name                                                     | - Required parameter
    -z <value> score z-norm type (none, qnorm, fnorm, qfnorm, fqnorm)                    | - Default: none
    -g add ground-truth information to the outputfile
    -t <filename> saves the number of true terms in the reference per query (implies -g)
    -m <value> apply majority voting fusion with <value> minimum number of votes         | - Default: 1
    -n <value> method for creating default scores (0: average of the other detections (MV approach); 1; min per query, 2: global min, 3: histogram based)  | - Default: 0
    -d debug mode, done remove auxiliar files stored in /tmp/tmpdir.$$
    -h help 
    NOTE: Requires Matlab (or octave) and perl. It also depends on the perl scripts align_score_files.pl, heuristicScoring.pl and create_groundtruth_centerdistance.pl 
          that should be located in the same folder of the main script
 STEP2 - Prepare the eval scores for fusion by typing the following command:
   ./bin/PrepareForFusion.sh -q ./scoring_atwv_sws2013/sws2013_eval/sws2013_eval.tlist.xml -r ./scoring_atwv_sws2013/sws2013_eval/sws2013_eval.rttm -o EVAL_SCORES_4FUSION.txt -z qnorm -m 1 -n 1 ./scores/eval/akws_br-evalterms.stdlist.xml  ./scores/eval/dtw_br-evalterms.stdlist.xml
   Notice that, in contrast to the development scores, we did not use the -g option, since we do not need the ground truth in this case.
   
   It is also FUNDAMENTAL to provide the stdlist.xml score files in the same order used in the preparation of the development scores, since each system is identified only by its column position in the prepared score files.
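   One way to avoid ordering mistakes is to list the systems once and derive both file lists from that single list. The sketch below assumes the naming pattern of the sample files in ./scores/; the SYSTEMS variable and the score_files function are hypothetical helpers, not part of the package.

```shell
# Systems to fuse, listed once so that dev and eval share the same order.
SYSTEMS="akws_br dtw_br"

# Print the stdlist files of one set ("dev" or "eval") in SYSTEMS order.
# Assumes the sample naming pattern ./scores/<set>/<system>-<set>terms.stdlist.xml
score_files() {
    for sys in $SYSTEMS; do
        printf '%s ' "./scores/${1}/${sys}-${1}terms.stdlist.xml"
    done
}

# Usage: append the generated list (unquoted) to the commands of STEP1 and STEP2:
#   ./bin/PrepareForFusion.sh <options as in STEP1> $(score_files dev)
#   ./bin/PrepareForFusion.sh <options as in STEP2> $(score_files eval)
```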
   
 STEP3 - Train the fusion on the development scores and apply it to the eval scores by typing the following command:
 ./bin/fusion.sh DEV_SCORES_4FUSION.txt EVAL_SCORES_4FUSION.txt  queries_in_data.ref
 Before running this script you also need to change the value of the MATLAB_BIN variable, download the Bosaris toolkit, and set the BOSARIS variable to the path that contains this toolkit.
 Notice also that the script contains 4 hard-coded variables that are specific to the MediaEval SWS2013 evaluation: the cost parameters P_TARGET, C_MISS and C_FA, and the total duration in seconds of the collection data, TSPEECH.
 This fusion script uses DEV_SCORES_4FUSION.txt (with ground-truth information) to learn the calibration and fusion parameters, which are then applied to both the development set and the evaluation set.
 As mentioned previously, queries_in_data.ref contains the true number of instances of each query in the data; it is used for hypothesizing missing scores and for computing ATWV.
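 To illustrate how the three cost parameters interact: in the standard NIST STD 2006 definition of term-weighted value, they enter only through a single weight applied to the false alarm probability, beta = C_FA * (1 - P_TARGET) / (C_MISS * P_TARGET). The sketch below computes that weight. It assumes that fusion.sh follows this definition, which is not verified here, and the function name twv_beta is hypothetical.

```shell
# Weight of the false alarm term in TWV = 1 - (P_miss + beta * P_fa).
# Arguments: P_TARGET C_MISS C_FA (assumed to match the variables in fusion.sh).
twv_beta() {
    awk -v p="$1" -v cm="$2" -v cfa="$3" \
        'BEGIN { printf "%g\n", cfa * (1 - p) / (cm * p) }'
}

# Example with the NIST STD 2006 settings (prior 1e-4, miss cost 1,
# false alarm cost 0.1), not the SWS2013 ones:
#   twv_beta 0.0001 1 0.1    # prints 999.9
```

 When adapting the script to another evaluation, all four hard-coded values should be revised together, since the weight depends on the three cost parameters jointly.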
 This call (which can take several minutes) produces 1 output file for the dev scores (dev_fusion.scores) and 1 output file for the eval scores (eval_fusion.scores). The format of these output files is as follows:
 <query_id> <file_id> <start_time> <duration> <fusion_score> <decision>
 ...
 ...
 <query_id> <file_id> <start_time> <duration> <fusion_score> <decision>


 The <decision> field is 0 if the score is lower than the minimum-cost Bayes optimum threshold and 1 if it is greater (see the Interspeech paper for details).
 Additionally, the fusion parameters are stored in fuse_params.txt.
 The final step consists of converting these result files to the format used in the SWS2013 challenge:
 ./bin/raw2stdslist.sh dev_fusion.scores ./scoring_atwv_sws2013/sws2013_dev/sws2013_dev.tlist.xml > fusion-devterms.stdlist.xml
 ./bin/raw2stdslist.sh eval_fusion.scores ./scoring_atwv_sws2013/sws2013_eval/sws2013_eval.tlist.xml > fusion-evalterms.stdlist.xml
 
 Notice that this script can take a third parameter: a threshold value that you can use to apply a decision threshold other than the Bayes-optimal one.
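 The effect of a different threshold can also be inspected directly on the raw score files before converting them. The sketch below rewrites the decision column of a file in the fusion output format shown above (fused score in column 5, decision in column 6). The function name rethreshold is hypothetical, and treating a score equal to the threshold as a rejection is an assumption.

```shell
# Recompute the <decision> column of a fusion score file for a given threshold.
# Arguments: score file, threshold. Writes the modified lines to stdout.
rethreshold() {
    awk -v thr="$2" 'NF >= 6 {
        $6 = ($5 + 0 > thr + 0) ? 1 : 0   # 1 only if the score exceeds the threshold
        print
    }' "$1"
}

# Usage: count how many dev detections a candidate threshold would accept.
#   rethreshold dev_fusion.scores 0.5 | awk '$6 == 1' | wc -l
```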
 Reference TWV results in the MediaEval SWS2013 task:

                   dev                 eval
               mtwv     atwv       mtwv     atwv
  akws_br      0.1571   0.1408
  dtw_br
  fusion