<?xml version="1.0"?>
<feed xmlns="http://www.w3.org/2005/Atom" xml:lang="en">
	<id>https://www.hlt.inesc-id.pt/wiki/api.php?action=feedcontributions&amp;feedformat=atom&amp;user=Alberto</id>
	<title>HLT@INESC-ID - User contributions [en]</title>
	<link rel="self" type="application/atom+xml" href="https://www.hlt.inesc-id.pt/wiki/api.php?action=feedcontributions&amp;feedformat=atom&amp;user=Alberto"/>
	<link rel="alternate" type="text/html" href="https://www.hlt.inesc-id.pt/w/Special:Contributions/Alberto"/>
	<updated>2026-04-09T13:25:56Z</updated>
	<subtitle>User contributions</subtitle>
	<generator>MediaWiki 1.41.0</generator>
	<entry>
		<id>https://www.hlt.inesc-id.pt/wiki/index.php?title=Alberto_Abad_Gareta&amp;diff=8608</id>
		<title>Alberto Abad Gareta</title>
		<link rel="alternate" type="text/html" href="https://www.hlt.inesc-id.pt/wiki/index.php?title=Alberto_Abad_Gareta&amp;diff=8608"/>
		<updated>2024-02-16T09:43:16Z</updated>

		<summary type="html">&lt;p&gt;Alberto: /* Research Interests */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&amp;lt;Inesc-id what='card' id='5b544c6b-f1bb-4007-b82a-414f2d1a24e2'&amp;gt;&amp;lt;/Inesc-id&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Alberto Abad obtained the Telecommunication Engineering degree and the PhD degree in 2002 and 2007, respectively, both from the [http://www.upc.edu Technical University of Catalonia] (UPC). Currently, he is an Associate Professor at the [https://dei.tecnico.ulisboa.pt Department of Computer Science and Engineering (DEI)] of [http://www.ist.utl.pt Instituto Superior Técnico (IST)] and a researcher at INESC-ID. He is the coordinator of the [http://www.hlt.inesc-id.pt Human Language Technologies laboratory (HLT)] at INESC-ID and the deputy coordinator of the [https://fenix.tecnico.ulisboa.pt/cursos/meic-a Master in Computer Science and Engineering at IST]. He is also an IEEE Senior Member, an ACM member, and an ISCA member.&lt;br /&gt;
&lt;br /&gt;
Alberto Abad has developed his research career in the area of human language technologies for more than 15 years. His research interests include robust speech recognition, speaker and language characterization, applied machine learning, health-care applications, and privacy-preserving speech processing. During these years, Alberto Abad has co-authored over 100 peer-reviewed publications in top-tier conferences and scientific journals of the field. He has been Principal Investigator of the FCT-funded project VITHEA and INESC-ID representative in the DIRHA European project, the Biovisualspeech CMU-Portugal project, and the TAPAS ITN. He is currently the INESC-ID/IST PI of the [https://www.inesc-id.pt/accelerat-ai/ Accelerat.ai] project. He has also been a team member in various Spanish and Portuguese national and international projects, and a team leader in international technological evaluation challenges, achieving excellent results. He has supervised and co-supervised more than 25 M.Sc. theses and 1 Ph.D. thesis (plus 5 ongoing) and has lectured in courses at both UPC and IST on a variety of topics, including signal processing, machine learning, programming principles, object-oriented programming, and compilers. Alberto Abad has been actively involved in research community activities, including conference organization, and has closely collaborated with local industry and start-up companies.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;!--&lt;br /&gt;
Alberto Abad obtained the Telecommunication Engineering degree in 2002 from the [http://www.upc.edu Technical University of Catalonia] (UPC). Between 2002 and 2006 he was a PhD student at the [http://www.talp.upc.edu TALP Research Centre] of the [http://www.tsc.upc.edu/ Department of Signal Theory and Communications] of the Technical University of Catalonia, where he was supported by a Catalan Government grant. His thesis addressed multi-microphone approaches to speech processing in smart-room environments. At UPC he participated in several European and Spanish national projects. He was also a lecturer at the [http://www.etsetb.upc.edu/ Telecommunication Engineering School of Barcelona] and a collaborating lecturer in the doctoral program of the Department of Signal Theory and Communications.&lt;br /&gt;
&lt;br /&gt;
In February 2007, Alberto Abad joined the [http://www.l2f.inesc-id.pt Spoken Language Systems Lab (L2F)] of [http://www.inesc-id.pt/ INESC-ID], and in July 2008 he obtained a five-year ''Programa Ciência'' contract. During that period at INESC-ID, Alberto Abad contributed to consolidating the group's research lines and to opening new strategic ones. He has been Principal Investigator of the FCT-funded project [http://www.vithea.org VITHEA] and INESC-ID representative in the [http://dirha.fbk.eu DIRHA] European project, as well as a team member in several national and international projects and a team leader in many international technological evaluation challenges, achieving excellent results. He also collaborated in several courses at [http://www.ist.utl.pt Instituto Superior Técnico (IST)] of the University of Lisbon. In August 2013, Alberto Abad was hired as an Invited Researcher and Professor at IST, where he was a lecturer in the [http://fenix.tecnico.ulisboa.pt/cursos/meec Mestrado Integrado em Engenharia Electrotécnica e de Computadores] (MEEC) during the academic years 2013-2014 and 2014-2015. In February 2015, Alberto Abad won a position as Assistant Professor at the [https://fenix.tecnico.ulisboa.pt/departamentos/dei Department of Computer Science and Engineering (DEI)] of IST.&lt;br /&gt;
&lt;br /&gt;
His current research interests include robust speech recognition, speaker and language identification, computational acoustic scene analysis, multimedia, and health-care applications.&lt;br /&gt;
--&amp;gt;&lt;br /&gt;
== Research Interests ==&lt;br /&gt;
&lt;br /&gt;
* Multi-microphone processing&lt;br /&gt;
* Robust Speech Recognition&lt;br /&gt;
* Speaker and language characterization&lt;br /&gt;
* Speech and language technology for health-care applications&lt;br /&gt;
* Privacy-preserving speech processing&lt;br /&gt;
&lt;br /&gt;
== Resume ==&lt;br /&gt;
PDF Version [http://www.l2f.inesc-id.pt/~alberto/CV2021.pdf]&lt;br /&gt;
&lt;br /&gt;
&amp;lt;Inesc-id what='publications' id='5b544c6b-f1bb-4007-b82a-414f2d1a24e2'&amp;gt;&amp;lt;/Inesc-id&amp;gt;&lt;br /&gt;
&amp;lt;Inesc-id what='advising' id='5b544c6b-f1bb-4007-b82a-414f2d1a24e2'&amp;gt;&amp;lt;/Inesc-id&amp;gt;&lt;br /&gt;
&amp;lt;Inesc-id what='projects' id='5b544c6b-f1bb-4007-b82a-414f2d1a24e2'&amp;gt;&amp;lt;/Inesc-id&amp;gt;&lt;br /&gt;
&lt;br /&gt;
[[category:People]]&lt;br /&gt;
[[category:Researchers]]&lt;/div&gt;</summary>
		<author><name>Alberto</name></author>
	</entry>
	<entry>
		<id>https://www.hlt.inesc-id.pt/wiki/index.php?title=Alberto_Abad_Gareta&amp;diff=8607</id>
		<title>Alberto Abad Gareta</title>
		<link rel="alternate" type="text/html" href="https://www.hlt.inesc-id.pt/wiki/index.php?title=Alberto_Abad_Gareta&amp;diff=8607"/>
		<updated>2024-02-16T09:42:12Z</updated>

		<summary type="html">&lt;p&gt;Alberto: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&amp;lt;Inesc-id what='card' id='5b544c6b-f1bb-4007-b82a-414f2d1a24e2'&amp;gt;&amp;lt;/Inesc-id&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Alberto Abad obtained the Telecommunication Engineering degree and the PhD degree in 2002 and 2007, respectively, both from the [http://www.upc.edu Technical University of Catalonia] (UPC). Currently, he is an Associate Professor at the [https://dei.tecnico.ulisboa.pt Department of Computer Science and Engineering (DEI)] of [http://www.ist.utl.pt Instituto Superior Técnico (IST)] and a researcher at INESC-ID. He is the coordinator of the [http://www.hlt.inesc-id.pt Human Language Technologies laboratory (HLT)] at INESC-ID and the deputy coordinator of the [https://fenix.tecnico.ulisboa.pt/cursos/meic-a Master in Computer Science and Engineering at IST]. He is also an IEEE Senior Member, an ACM member, and an ISCA member.&lt;br /&gt;
&lt;br /&gt;
Alberto Abad has developed his research career in the area of human language technologies for more than 15 years. His research interests include robust speech recognition, speaker and language characterization, applied machine learning, health-care applications, and privacy-preserving speech processing. During these years, Alberto Abad has co-authored over 100 peer-reviewed publications in top-tier conferences and scientific journals of the field. He has been Principal Investigator of the FCT-funded project VITHEA and INESC-ID representative in the DIRHA European project, the Biovisualspeech CMU-Portugal project, and the TAPAS ITN. He is currently the INESC-ID/IST PI of the [https://www.inesc-id.pt/accelerat-ai/ Accelerat.ai] project. He has also been a team member in various Spanish and Portuguese national and international projects, and a team leader in international technological evaluation challenges, achieving excellent results. He has supervised and co-supervised more than 25 M.Sc. theses and 1 Ph.D. thesis (plus 5 ongoing) and has lectured in courses at both UPC and IST on a variety of topics, including signal processing, machine learning, programming principles, object-oriented programming, and compilers. Alberto Abad has been actively involved in research community activities, including conference organization, and has closely collaborated with local industry and start-up companies.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;!--&lt;br /&gt;
Alberto Abad obtained the Telecommunication Engineering degree in 2002 from the [http://www.upc.edu Technical University of Catalonia] (UPC). Between 2002 and 2006 he was a PhD student at the [http://www.talp.upc.edu TALP Research Centre] of the [http://www.tsc.upc.edu/ Department of Signal Theory and Communications] of the Technical University of Catalonia, where he was supported by a Catalan Government grant. His thesis addressed multi-microphone approaches to speech processing in smart-room environments. At UPC he participated in several European and Spanish national projects. He was also a lecturer at the [http://www.etsetb.upc.edu/ Telecommunication Engineering School of Barcelona] and a collaborating lecturer in the doctoral program of the Department of Signal Theory and Communications.&lt;br /&gt;
&lt;br /&gt;
In February 2007, Alberto Abad joined the [http://www.l2f.inesc-id.pt Spoken Language Systems Lab (L2F)] of [http://www.inesc-id.pt/ INESC-ID], and in July 2008 he obtained a five-year ''Programa Ciência'' contract. During that period at INESC-ID, Alberto Abad contributed to consolidating the group's research lines and to opening new strategic ones. He has been Principal Investigator of the FCT-funded project [http://www.vithea.org VITHEA] and INESC-ID representative in the [http://dirha.fbk.eu DIRHA] European project, as well as a team member in several national and international projects and a team leader in many international technological evaluation challenges, achieving excellent results. He also collaborated in several courses at [http://www.ist.utl.pt Instituto Superior Técnico (IST)] of the University of Lisbon. In August 2013, Alberto Abad was hired as an Invited Researcher and Professor at IST, where he was a lecturer in the [http://fenix.tecnico.ulisboa.pt/cursos/meec Mestrado Integrado em Engenharia Electrotécnica e de Computadores] (MEEC) during the academic years 2013-2014 and 2014-2015. In February 2015, Alberto Abad won a position as Assistant Professor at the [https://fenix.tecnico.ulisboa.pt/departamentos/dei Department of Computer Science and Engineering (DEI)] of IST.&lt;br /&gt;
&lt;br /&gt;
His current research interests include robust speech recognition, speaker and language identification, computational acoustic scene analysis, multimedia, and health-care applications.&lt;br /&gt;
--&amp;gt;&lt;br /&gt;
== Research Interests ==&lt;br /&gt;
&lt;br /&gt;
* Multi-microphone processing&lt;br /&gt;
* Robust Speech Recognition&lt;br /&gt;
* Speaker and language recognition&lt;br /&gt;
* Speech and language technology for health-care applications&lt;br /&gt;
&lt;br /&gt;
== Resume ==&lt;br /&gt;
PDF Version [http://www.l2f.inesc-id.pt/~alberto/CV2021.pdf]&lt;br /&gt;
&lt;br /&gt;
&amp;lt;Inesc-id what='publications' id='5b544c6b-f1bb-4007-b82a-414f2d1a24e2'&amp;gt;&amp;lt;/Inesc-id&amp;gt;&lt;br /&gt;
&amp;lt;Inesc-id what='advising' id='5b544c6b-f1bb-4007-b82a-414f2d1a24e2'&amp;gt;&amp;lt;/Inesc-id&amp;gt;&lt;br /&gt;
&amp;lt;Inesc-id what='projects' id='5b544c6b-f1bb-4007-b82a-414f2d1a24e2'&amp;gt;&amp;lt;/Inesc-id&amp;gt;&lt;br /&gt;
&lt;br /&gt;
[[category:People]]&lt;br /&gt;
[[category:Researchers]]&lt;/div&gt;</summary>
		<author><name>Alberto</name></author>
	</entry>
	<entry>
		<id>https://www.hlt.inesc-id.pt/wiki/index.php?title=Alberto_Abad_Gareta&amp;diff=8414</id>
		<title>Alberto Abad Gareta</title>
		<link rel="alternate" type="text/html" href="https://www.hlt.inesc-id.pt/wiki/index.php?title=Alberto_Abad_Gareta&amp;diff=8414"/>
		<updated>2022-09-09T08:23:02Z</updated>

		<summary type="html">&lt;p&gt;Alberto: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;{{infobox|name=Alberto Abad&lt;br /&gt;
|username=alberto&lt;br /&gt;
|contact=alberto.abad&lt;br /&gt;
|phone=+351-213-100-217&lt;br /&gt;
|fax=+351-213-145-843&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
Alberto Abad obtained the Telecommunication Engineering degree and the PhD degree in 2002 and 2007, respectively, both from the [http://www.upc.edu Technical University of Catalonia] (UPC). Currently, he is an Associate Professor at the [https://dei.tecnico.ulisboa.pt Department of Computer Science and Engineering (DEI)] of [http://www.ist.utl.pt Instituto Superior Técnico (IST)] and a researcher at INESC-ID. He is the coordinator of the [http://www.hlt.inesc-id.pt Human Language Technologies laboratory (HLT)] at INESC-ID and the executive coordinator of the scientific area of [https://dei.tecnico.ulisboa.pt/sobre-o-departamento/areas-cientificas/ac-metodologia-e-tecnologias-de-programacao Programming Methodology and Technology (MTP)] at DEI. He is also an IEEE Senior Member, an ACM member, an ISCA member, and an Associate Editor of the IEEE/ACM Transactions on Audio, Speech and Language Processing.&lt;br /&gt;
&lt;br /&gt;
Alberto Abad has developed his research career in the area of human language technologies for more than 15 years. His research interests include robust speech recognition, speaker and language characterization, applied machine learning, health-care applications, and privacy-preserving speech processing. During these years, Alberto Abad has co-authored over 100 peer-reviewed publications in top-tier conferences and scientific journals of the field. He has been Principal Investigator of the FCT-funded project VITHEA and INESC-ID representative in the DIRHA European project, the Biovisualspeech CMU-Portugal project, and the TAPAS ITN. He has also been a team member in various Spanish and Portuguese national and international projects, and a team leader in international technological evaluation challenges, achieving excellent results. He has supervised and co-supervised more than 25 M.Sc. theses and 1 Ph.D. thesis (plus 5 ongoing) and has lectured in courses at both UPC and IST on a variety of topics, including signal processing, machine learning, programming principles, object-oriented programming, and compilers. Alberto Abad has been actively involved in research community activities, including conference organization, and has closely collaborated with local industry and start-up companies.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;!--&lt;br /&gt;
Alberto Abad obtained the Telecommunication Engineering degree in 2002 from the [http://www.upc.edu Technical University of Catalonia] (UPC). Between 2002 and 2006 he was a PhD student at the [http://www.talp.upc.edu TALP Research Centre] of the [http://www.tsc.upc.edu/ Department of Signal Theory and Communications] of the Technical University of Catalonia, where he was supported by a Catalan Government grant. His thesis addressed multi-microphone approaches to speech processing in smart-room environments. At UPC he participated in several European and Spanish national projects. He was also a lecturer at the [http://www.etsetb.upc.edu/ Telecommunication Engineering School of Barcelona] and a collaborating lecturer in the doctoral program of the Department of Signal Theory and Communications.&lt;br /&gt;
&lt;br /&gt;
In February 2007, Alberto Abad joined the [http://www.l2f.inesc-id.pt Spoken Language Systems Lab (L2F)] of [http://www.inesc-id.pt/ INESC-ID], and in July 2008 he obtained a five-year ''Programa Ciência'' contract. During that period at INESC-ID, Alberto Abad contributed to consolidating the group's research lines and to opening new strategic ones. He has been Principal Investigator of the FCT-funded project [http://www.vithea.org VITHEA] and INESC-ID representative in the [http://dirha.fbk.eu DIRHA] European project, as well as a team member in several national and international projects and a team leader in many international technological evaluation challenges, achieving excellent results. He also collaborated in several courses at [http://www.ist.utl.pt Instituto Superior Técnico (IST)] of the University of Lisbon. In August 2013, Alberto Abad was hired as an Invited Researcher and Professor at IST, where he was a lecturer in the [http://fenix.tecnico.ulisboa.pt/cursos/meec Mestrado Integrado em Engenharia Electrotécnica e de Computadores] (MEEC) during the academic years 2013-2014 and 2014-2015. In February 2015, Alberto Abad won a position as Assistant Professor at the [https://fenix.tecnico.ulisboa.pt/departamentos/dei Department of Computer Science and Engineering (DEI)] of IST.&lt;br /&gt;
&lt;br /&gt;
His current research interests include robust speech recognition, speaker and language identification, computational acoustic scene analysis, multimedia, and health-care applications.&lt;br /&gt;
--&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;inesc-id what='person' id='905'&amp;gt;&amp;lt;/inesc-id&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Research Interests ==&lt;br /&gt;
&lt;br /&gt;
* Multi-microphone processing&lt;br /&gt;
* Robust Speech Recognition&lt;br /&gt;
* Speaker and language recognition&lt;br /&gt;
* Speech and language technology for health-care applications&lt;br /&gt;
&lt;br /&gt;
== Resume ==&lt;br /&gt;
PDF Version [http://www.l2f.inesc-id.pt/~alberto/CV2021.pdf]&lt;br /&gt;
&lt;br /&gt;
[[category:People]]&lt;br /&gt;
[[category:Researchers]]&lt;/div&gt;</summary>
		<author><name>Alberto</name></author>
	</entry>
	<entry>
		<id>https://www.hlt.inesc-id.pt/wiki/index.php?title=Alberto_Abad_Gareta&amp;diff=8413</id>
		<title>Alberto Abad Gareta</title>
		<link rel="alternate" type="text/html" href="https://www.hlt.inesc-id.pt/wiki/index.php?title=Alberto_Abad_Gareta&amp;diff=8413"/>
		<updated>2022-09-09T08:12:29Z</updated>

		<summary type="html">&lt;p&gt;Alberto: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;{{infobox|name=Alberto Abad&lt;br /&gt;
|username=alberto&lt;br /&gt;
|contact=alberto.abad&lt;br /&gt;
|phone=+351-213-100-217&lt;br /&gt;
|fax=+351-213-145-843&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
Alberto Abad is an Associate Professor at the [https://dei.tecnico.ulisboa.pt Department of Computer Science and Engineering (DEI)] of [http://www.ist.utl.pt Instituto Superior Técnico (IST)] and a researcher at INESC-ID. He is the coordinator of the [http://www.hlt.inesc-id.pt Human Language Technologies laboratory (HLT)] at INESC-ID and the executive coordinator of the scientific area of [https://dei.tecnico.ulisboa.pt/sobre-o-departamento/areas-cientificas/ac-metodologia-e-tecnologias-de-programacao Programming Methodology and Technology (MTP)] at DEI. He is also an IEEE Senior Member, an ACM member, an ISCA member, and an Associate Editor of the IEEE/ACM Transactions on Audio, Speech and Language Processing.&lt;br /&gt;
&lt;br /&gt;
Alberto Abad has developed his research career in the area of human language technologies for more than 15 years. His research interests include robust speech recognition, speaker and language characterization, applied machine learning, health-care applications, and privacy-preserving speech processing. During these years, Alberto Abad has co-authored over 100 peer-reviewed publications in top-tier conferences and scientific journals of the field. He has been Principal Investigator of the FCT-funded project VITHEA and INESC-ID representative in the DIRHA European project, the Biovisualspeech CMU-Portugal project, and the TAPAS ITN. He has also been a team member in various Spanish and Portuguese national and international projects, and a team leader in international technological evaluation challenges, achieving excellent results. He has supervised and co-supervised more than 25 M.Sc. theses and 1 Ph.D. thesis (plus 5 ongoing) and has lectured in courses at both the [http://www.upc.edu Technical University of Catalonia] (UPC) and IST on a variety of topics, including signal processing, machine learning, programming principles, object-oriented programming, and compilers. Alberto Abad has been actively involved in research community activities, including conference organization, and has closely collaborated with local industry and start-up companies.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;!--&lt;br /&gt;
Alberto Abad obtained the Telecommunication Engineering degree in 2002 from the [http://www.upc.edu Technical University of Catalonia] (UPC). Between 2002 and 2006 he was a PhD student at the [http://www.talp.upc.edu TALP Research Centre] of the [http://www.tsc.upc.edu/ Department of Signal Theory and Communications] of the Technical University of Catalonia, where he was supported by a Catalan Government grant. His thesis addressed multi-microphone approaches to speech processing in smart-room environments. At UPC he participated in several European and Spanish national projects. He was also a lecturer at the [http://www.etsetb.upc.edu/ Telecommunication Engineering School of Barcelona] and a collaborating lecturer in the doctoral program of the Department of Signal Theory and Communications.&lt;br /&gt;
&lt;br /&gt;
In February 2007, Alberto Abad joined the [http://www.l2f.inesc-id.pt Spoken Language Systems Lab (L2F)] of [http://www.inesc-id.pt/ INESC-ID], and in July 2008 he obtained a five-year ''Programa Ciência'' contract. During that period at INESC-ID, Alberto Abad contributed to consolidating the group's research lines and to opening new strategic ones. He has been Principal Investigator of the FCT-funded project [http://www.vithea.org VITHEA] and INESC-ID representative in the [http://dirha.fbk.eu DIRHA] European project, as well as a team member in several national and international projects and a team leader in many international technological evaluation challenges, achieving excellent results. He also collaborated in several courses at [http://www.ist.utl.pt Instituto Superior Técnico (IST)] of the University of Lisbon. In August 2013, Alberto Abad was hired as an Invited Researcher and Professor at IST, where he was a lecturer in the [http://fenix.tecnico.ulisboa.pt/cursos/meec Mestrado Integrado em Engenharia Electrotécnica e de Computadores] (MEEC) during the academic years 2013-2014 and 2014-2015. In February 2015, Alberto Abad won a position as Assistant Professor at the [https://fenix.tecnico.ulisboa.pt/departamentos/dei Department of Computer Science and Engineering (DEI)] of IST.&lt;br /&gt;
&lt;br /&gt;
His current research interests include robust speech recognition, speaker and language identification, computational acoustic scene analysis, multimedia, and health-care applications.&lt;br /&gt;
--&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;inesc-id what='person' id='905'&amp;gt;&amp;lt;/inesc-id&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Research Interests ==&lt;br /&gt;
&lt;br /&gt;
* Multi-microphone processing&lt;br /&gt;
* Robust Speech Recognition&lt;br /&gt;
* Speaker and language recognition&lt;br /&gt;
* Speech and language technology for health-care applications&lt;br /&gt;
&lt;br /&gt;
== Resume ==&lt;br /&gt;
PDF Version [http://www.l2f.inesc-id.pt/~alberto/CV_new_latex.pdf]&lt;br /&gt;
&lt;br /&gt;
[[category:People]]&lt;br /&gt;
[[category:Researchers]]&lt;/div&gt;</summary>
		<author><name>Alberto</name></author>
	</entry>
	<entry>
		<id>https://www.hlt.inesc-id.pt/wiki/index.php?title=Alberto_Abad_Gareta&amp;diff=8134</id>
		<title>Alberto Abad Gareta</title>
		<link rel="alternate" type="text/html" href="https://www.hlt.inesc-id.pt/wiki/index.php?title=Alberto_Abad_Gareta&amp;diff=8134"/>
		<updated>2020-03-10T23:31:37Z</updated>

		<summary type="html">&lt;p&gt;Alberto: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;{{infobox|name=Alberto Abad&lt;br /&gt;
|username=alberto&lt;br /&gt;
|contact=alberto.abad&lt;br /&gt;
|phone=+351-213-100-217&lt;br /&gt;
|fax=+351-213-145-843&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
Alberto Abad obtained the Telecommunication Engineering degree in 2002 from the [http://www.upc.edu Technical University of Catalonia] (UPC). Between 2002 and 2006 he was a PhD student at the [http://www.talp.upc.edu TALP Research Centre] of the [http://www.tsc.upc.edu/ Department of Signal Theory and Communications] of the Technical University of Catalonia, where he was supported by a Catalan Government grant. His thesis addressed multi-microphone approaches to speech processing in smart-room environments. At UPC he participated in several European and Spanish national projects. He was also a lecturer at the [http://www.etsetb.upc.edu/ Telecommunication Engineering School of Barcelona] and a collaborating lecturer in the doctoral program of the Department of Signal Theory and Communications.&lt;br /&gt;
&lt;br /&gt;
In February 2007, Alberto Abad joined the [http://www.l2f.inesc-id.pt Spoken Language Systems Lab (L2F)] of [http://www.inesc-id.pt/ INESC-ID], and in July 2008 he obtained a five-year ''Programa Ciência'' contract. During that period at INESC-ID, Alberto Abad contributed to consolidating the group's research lines and to opening new strategic ones. He has been Principal Investigator of the FCT-funded project [http://www.vithea.org VITHEA] and INESC-ID representative in the [http://dirha.fbk.eu DIRHA] European project, as well as a team member in several national and international projects and a team leader in many international technological evaluation challenges, achieving excellent results. He also collaborated in several courses at [http://www.ist.utl.pt Instituto Superior Técnico (IST)] of the University of Lisbon. In August 2013, Alberto Abad was hired as an Invited Researcher and Professor at IST, where he was a lecturer in the [http://fenix.tecnico.ulisboa.pt/cursos/meec Mestrado Integrado em Engenharia Electrotécnica e de Computadores] (MEEC) during the academic years 2013-2014 and 2014-2015. In February 2015, Alberto Abad won a position as Assistant Professor at the [https://fenix.tecnico.ulisboa.pt/departamentos/dei Department of Computer Science and Engineering (DEI)] of IST.&lt;br /&gt;
&lt;br /&gt;
His current research interests include robust speech recognition, speaker and language identification, computational acoustic scene analysis, multimedia, and health-care applications.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;inesc-id what='person' id='905'&amp;gt;&amp;lt;/inesc-id&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Research Interests ==&lt;br /&gt;
&lt;br /&gt;
* Multi-microphone processing&lt;br /&gt;
* Robust Speech Recognition&lt;br /&gt;
* Speaker and language recognition&lt;br /&gt;
* Speech and language technology for health-care applications&lt;br /&gt;
&lt;br /&gt;
== Resume ==&lt;br /&gt;
PDF Version [http://www.l2f.inesc-id.pt/~alberto/CV_new_latex.pdf]&lt;br /&gt;
&lt;br /&gt;
[[category:People]]&lt;br /&gt;
[[category:Researchers]]&lt;/div&gt;</summary>
		<author><name>Alberto</name></author>
	</entry>
	<entry>
		<id>https://www.hlt.inesc-id.pt/wiki/index.php?title=File:Tipo-passe-alberto.png&amp;diff=7907</id>
		<title>File:Tipo-passe-alberto.png</title>
		<link rel="alternate" type="text/html" href="https://www.hlt.inesc-id.pt/wiki/index.php?title=File:Tipo-passe-alberto.png&amp;diff=7907"/>
		<updated>2017-02-01T11:24:57Z</updated>

		<summary type="html">&lt;p&gt;Alberto: Alberto uploaded a new version of File:Tipo-passe-alberto.png&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&lt;/div&gt;</summary>
		<author><name>Alberto</name></author>
	</entry>
	<entry>
		<id>https://www.hlt.inesc-id.pt/wiki/index.php?title=File:Tipo-passe-alberto.png&amp;diff=7906</id>
		<title>File:Tipo-passe-alberto.png</title>
		<link rel="alternate" type="text/html" href="https://www.hlt.inesc-id.pt/wiki/index.php?title=File:Tipo-passe-alberto.png&amp;diff=7906"/>
		<updated>2017-02-01T11:18:10Z</updated>

		<summary type="html">&lt;p&gt;Alberto: Alberto uploaded a new version of File:Tipo-passe-alberto.png&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&lt;/div&gt;</summary>
		<author><name>Alberto</name></author>
	</entry>
	<entry>
		<id>https://www.hlt.inesc-id.pt/wiki/index.php?title=Alberto_Abad_Gareta&amp;diff=7773</id>
		<title>Alberto Abad Gareta</title>
		<link rel="alternate" type="text/html" href="https://www.hlt.inesc-id.pt/wiki/index.php?title=Alberto_Abad_Gareta&amp;diff=7773"/>
		<updated>2015-11-03T11:57:06Z</updated>

		<summary type="html">&lt;p&gt;Alberto: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;{{infobox|name=Alberto Abad Gareta&lt;br /&gt;
|username=alberto&lt;br /&gt;
|contact=alberto.abad&lt;br /&gt;
|phone=+351-213-100-217&lt;br /&gt;
|fax=+351-213-145-843&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
Alberto Abad obtained the Telecommunication Engineering degree in 2002 from the [http://www.upc.edu Technical University of Catalonia] (UPC). Between 2002 and 2006 he was a PhD student at the [http://www.talp.upc.edu TALP Research Centre] of the [http://www.tsc.upc.edu/ Department of Signal Theory and Communications] of the Technical University of Catalonia, where he was supported by a Catalan Government grant. His thesis addressed multi-microphone approaches to speech processing in smart-room environments. At UPC he participated in several European and Spanish national projects. He was also a lecturer at the [http://www.etsetb.upc.edu/ Telecommunication Engineering School of Barcelona] and a collaborating lecturer in the doctoral program of the Department of Signal Theory and Communications.&lt;br /&gt;
&lt;br /&gt;
In February 2007, Alberto Abad joined the [http://www.l2f.inesc-id.pt Spoken Language Systems Lab (L2F)] of [http://www.inesc-id.pt/ INESC-ID], and in July 2008 he obtained a five-year ''Programa Ciência'' contract. During that period at INESC-ID, Alberto Abad contributed to consolidating the group's research lines and to opening new strategic ones. He has been Principal Investigator of the FCT-funded project [http://www.vithea.org VITHEA] and INESC-ID representative in the [http://dirha.fbk.eu DIRHA] European project, as well as a team member in several national and international projects and a team leader in many international technological evaluation challenges, achieving excellent results. He also collaborated in several courses at [http://www.ist.utl.pt Instituto Superior Técnico (IST)] of the University of Lisbon. In August 2013, Alberto Abad was hired as an Invited Researcher and Professor at IST, where he was a lecturer in the [http://fenix.tecnico.ulisboa.pt/cursos/meec Mestrado Integrado em Engenharia Electrotécnica e de Computadores] (MEEC) during the academic years 2013-2014 and 2014-2015. In February 2015, Alberto Abad won a position as Assistant Professor at the [https://fenix.tecnico.ulisboa.pt/departamentos/dei Department of Computer Science and Engineering (DEI)] of IST.&lt;br /&gt;
&lt;br /&gt;
His current research interests include robust speech recognition, speaker and language identification, computational acoustic scene analysis, multimedia, and health-care applications.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;inesc-id what='person' id='905'&amp;gt;&amp;lt;/inesc-id&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Research Interests ==&lt;br /&gt;
&lt;br /&gt;
* Multi-microphone processing&lt;br /&gt;
* Robust speech recognition&lt;br /&gt;
* Speaker and language recognition&lt;br /&gt;
* Speech and language technology for health-care applications&lt;br /&gt;
&lt;br /&gt;
== Resume ==&lt;br /&gt;
PDF Version [http://www.l2f.inesc-id.pt/~alberto/CV_new_latex.pdf]&lt;br /&gt;
&lt;br /&gt;
[[category:People]]&lt;br /&gt;
[[category:Researchers]]&lt;/div&gt;</summary>
		<author><name>Alberto</name></author>
	</entry>
	<entry>
		<id>https://www.hlt.inesc-id.pt/wiki/index.php?title=Alberto_Abad_Gareta&amp;diff=7374</id>
		<title>Alberto Abad Gareta</title>
		<link rel="alternate" type="text/html" href="https://www.hlt.inesc-id.pt/wiki/index.php?title=Alberto_Abad_Gareta&amp;diff=7374"/>
		<updated>2014-09-30T10:36:34Z</updated>

		<summary type="html">&lt;p&gt;Alberto: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;{{infobox|name=Alberto Abad Gareta&lt;br /&gt;
|username=alberto&lt;br /&gt;
|contact=alberto.abad&lt;br /&gt;
|phone=+351-213-100-217&lt;br /&gt;
|fax=+351-213-145-843&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
Alberto Abad obtained the Telecommunication Engineering degree in 2002 from the [http://www.upc.edu Technical University of Catalonia] (UPC). Between 2002 and 2006 he was a PhD student at the [http://www.talp.upc.edu TALP Research Centre] of the [http://www.tsc.upc.edu/ Department of Signal Theory and Communications] of the Technical University of Catalonia, supported by a Catalan Government grant. His thesis addressed multi-microphone approaches to speech processing in smart-room environments. At UPC he participated in several European and Spanish national projects. He was also a lecturer at the [http://www.etsetb.upc.edu/ Telecommunication Engineering School of Barcelona] and a collaborating lecturer in the doctorate program of the Department of Signal Theory and Communications.&lt;br /&gt;
&lt;br /&gt;
In February 2007, Alberto Abad joined the [http://www.l2f.inesc-id.pt Spoken Language Systems Lab (L2F)] of [http://www.inesc-id.pt/ INESC-ID], and in July 2008 he obtained a ''Programa Ciência'' contract for 5 years. During that period at INESC-ID, Alberto Abad helped consolidate the group's research lines and open new strategic ones. He has been Principal Investigator of the FCT-funded project [http://www.vithea.org VITHEA] and INESC-ID representative in the [http://dirha.fbk.eu DIRHA] European project, besides being a team member in several national and international projects and a team leader in many international technological evaluation challenges, achieving excellent results. He also collaborated in several courses of [http://www.ist.utl.pt Instituto Superior Técnico (IST)] of the University of Lisbon. In August 2013, Alberto Abad was hired as an Invited Researcher and Professor at IST, where he has been the lecturer of the course ''Arquitectura de Computadores'' of the [http://fenix.tecnico.ulisboa.pt/cursos/meec Mestrado Integrado em Engenharia Electrotécnica e de Computadores] (MEEC) since the academic year 2013-2014.&lt;br /&gt;
&lt;br /&gt;
His current research interests include robust speech recognition, speaker and language identification, computational acoustic scene analysis, multimedia, and health-care applications.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;inesc-id what='person' id='905'&amp;gt;&amp;lt;/inesc-id&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Research Interests ==&lt;br /&gt;
&lt;br /&gt;
* Multi-microphone processing&lt;br /&gt;
* Robust speech recognition&lt;br /&gt;
* Speaker and language recognition&lt;br /&gt;
* Speech and language technology for health-care applications&lt;br /&gt;
&lt;br /&gt;
== Resume ==&lt;br /&gt;
PDF Version [http://www.l2f.inesc-id.pt/~alberto/CV_new_latex.pdf]&lt;br /&gt;
&lt;br /&gt;
[[category:People]]&lt;br /&gt;
[[category:Researchers]]&lt;/div&gt;</summary>
		<author><name>Alberto</name></author>
	</entry>
	<entry>
		<id>https://www.hlt.inesc-id.pt/wiki/index.php?title=Alberto_Abad_Gareta&amp;diff=7373</id>
		<title>Alberto Abad Gareta</title>
		<link rel="alternate" type="text/html" href="https://www.hlt.inesc-id.pt/wiki/index.php?title=Alberto_Abad_Gareta&amp;diff=7373"/>
		<updated>2014-09-30T10:36:16Z</updated>

		<summary type="html">&lt;p&gt;Alberto: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;{{infobox|name=Alberto Abad Gareta&lt;br /&gt;
|username=alberto&lt;br /&gt;
|contact=alberto.abad&lt;br /&gt;
|phone=+351-213-100-217&lt;br /&gt;
|fax=+351-213-145-843&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
Alberto Abad obtained the Telecommunication Engineering degree in 2002 from the [http://www.upc.edu Technical University of Catalonia] (UPC). Between 2002 and 2006 he was a PhD student at the [http://www.talp.upc.edu TALP Research Centre] of the [http://www.tsc.upc.edu/ Department of Signal Theory and Communications] of the Technical University of Catalonia, supported by a Catalan Government grant. His thesis addressed multi-microphone approaches to speech processing in smart-room environments. At UPC he participated in several European and Spanish national projects. He was also a lecturer at the [http://www.etsetb.upc.edu/ Telecommunication Engineering School of Barcelona] and a collaborating lecturer in the doctorate program of the Department of Signal Theory and Communications.&lt;br /&gt;
&lt;br /&gt;
In February 2007, Alberto Abad joined the [http://www.l2f.inesc-id.pt Spoken Language Systems Lab (L2F)] of [http://www.inesc-id.pt/ INESC-ID], and in July 2008 he obtained a ''Programa Ciência'' contract for 5 years. During that period at INESC-ID, Alberto Abad helped consolidate the group's research lines and open new strategic ones. He has been Principal Investigator of the FCT-funded project [http://www.vithea.org VITHEA] and INESC-ID representative in the [http://dirha.fbk.eu DIRHA] European project, besides being a team member in several national and international projects and a team leader in many international technological evaluation challenges, achieving excellent results. He also collaborated in several courses of [http://www.ist.utl.pt Instituto Superior Técnico (IST)] of the University of Lisbon.&lt;br /&gt;
&lt;br /&gt;
In August 2013, Alberto Abad was hired as an Invited Researcher and Professor at IST, where he has been the lecturer of the course ''Arquitectura de Computadores'' of the [http://fenix.tecnico.ulisboa.pt/cursos/meec Mestrado Integrado em Engenharia Electrotécnica e de Computadores] (MEEC) since the academic year 2013-2014.&lt;br /&gt;
&lt;br /&gt;
His current research interests include robust speech recognition, speaker and language identification, computational acoustic scene analysis, multimedia, and health-care applications.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;inesc-id what='person' id='905'&amp;gt;&amp;lt;/inesc-id&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Research Interests ==&lt;br /&gt;
&lt;br /&gt;
* Multi-microphone processing&lt;br /&gt;
* Robust speech recognition&lt;br /&gt;
* Speaker and language recognition&lt;br /&gt;
* Speech and language technology for health-care applications&lt;br /&gt;
&lt;br /&gt;
== Resume ==&lt;br /&gt;
PDF Version [http://www.l2f.inesc-id.pt/~alberto/CV_new_latex.pdf]&lt;br /&gt;
&lt;br /&gt;
[[category:People]]&lt;br /&gt;
[[category:Researchers]]&lt;/div&gt;</summary>
		<author><name>Alberto</name></author>
	</entry>
	<entry>
		<id>https://www.hlt.inesc-id.pt/wiki/index.php?title=Alberto_Abad_Gareta&amp;diff=7372</id>
		<title>Alberto Abad Gareta</title>
		<link rel="alternate" type="text/html" href="https://www.hlt.inesc-id.pt/wiki/index.php?title=Alberto_Abad_Gareta&amp;diff=7372"/>
		<updated>2014-09-30T10:34:01Z</updated>

		<summary type="html">&lt;p&gt;Alberto: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;{{infobox|name=Alberto Abad Gareta&lt;br /&gt;
|username=alberto&lt;br /&gt;
|contact=alberto.abad&lt;br /&gt;
|phone=+351-213-100-217&lt;br /&gt;
|fax=+351-213-145-843&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
Alberto Abad obtained the Telecommunication Engineering degree in 2002 from the [http://www.upc.edu Technical University of Catalonia] (UPC). Between 2002 and 2006 he was a PhD student at the [http://www.talp.upc.edu TALP Research Centre] of the [http://www.tsc.upc.edu/ Department of Signal Theory and Communications] of the Technical University of Catalonia, supported by a Catalan Government grant. His thesis addressed multi-microphone approaches to speech processing in smart-room environments. At UPC he participated in several European and Spanish national projects. He was also a lecturer at the [http://www.etsetb.upc.edu/ Telecommunication Engineering School of Barcelona] and a collaborating lecturer in the doctorate program of the Department of Signal Theory and Communications.&lt;br /&gt;
&lt;br /&gt;
In February 2007, Alberto Abad joined the [http://www.l2f.inesc-id.pt Spoken Language Systems Lab (L2F)] of [http://www.inesc-id.pt/ INESC-ID], and in July 2008 he obtained a ''Programa Ciência'' contract for 5 years. During that period at INESC-ID, Alberto Abad helped consolidate the group's research lines and open new strategic ones. He has been Principal Investigator of the FCT-funded project [http://www.vithea.org VITHEA] and INESC-ID representative in the [http://dirha.fbk.eu DIRHA] European project, besides being a team member in several national and international projects and a team leader in several international technological evaluation challenges, achieving excellent results. He also collaborated in courses of the Electrical and Computer Engineering degree of [http://www.ist.utl.pt Instituto Superior Técnico (IST)] of the University of Lisbon.&lt;br /&gt;
&lt;br /&gt;
In August 2013, Alberto Abad was hired as an Invited Researcher and Professor at IST, where he has been the lecturer of the course ''Arquitectura de Computadores'' of the [http://fenix.tecnico.ulisboa.pt/cursos/meec Mestrado Integrado em Engenharia Electrotécnica e de Computadores] (MEEC) since the academic year 2013-2014.&lt;br /&gt;
&lt;br /&gt;
His current research interests include robust speech recognition, speaker and language identification, computational acoustic scene analysis, multimedia, and health-care applications.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;inesc-id what='person' id='905'&amp;gt;&amp;lt;/inesc-id&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Research Interests ==&lt;br /&gt;
&lt;br /&gt;
* Multi-microphone processing&lt;br /&gt;
* Robust speech recognition&lt;br /&gt;
* Speaker and language recognition&lt;br /&gt;
* Speech and language technology for health-care applications&lt;br /&gt;
&lt;br /&gt;
== Resume ==&lt;br /&gt;
PDF Version [http://www.l2f.inesc-id.pt/~alberto/CV_new_latex.pdf]&lt;br /&gt;
&lt;br /&gt;
[[category:People]]&lt;br /&gt;
[[category:Researchers]]&lt;/div&gt;</summary>
		<author><name>Alberto</name></author>
	</entry>
	<entry>
		<id>https://www.hlt.inesc-id.pt/wiki/index.php?title=Alberto_Abad_Gareta&amp;diff=7371</id>
		<title>Alberto Abad Gareta</title>
		<link rel="alternate" type="text/html" href="https://www.hlt.inesc-id.pt/wiki/index.php?title=Alberto_Abad_Gareta&amp;diff=7371"/>
		<updated>2014-09-30T10:32:18Z</updated>

		<summary type="html">&lt;p&gt;Alberto: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;{{infobox|name=Alberto Abad Gareta&lt;br /&gt;
|username=alberto&lt;br /&gt;
|contact=alberto.abad&lt;br /&gt;
|phone=+351-213-100-217&lt;br /&gt;
|fax=+351-213-145-843&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
Alberto Abad obtained the Telecommunication Engineering degree in 2002 from the [http://www.upc.edu Technical University of Catalonia] (UPC). Between 2002 and 2006 he was a PhD student at the [http://www.talp.upc.edu TALP Research Centre] of the [http://www.tsc.upc.edu/ Department of Signal Theory and Communications] of the Technical University of Catalonia, supported by a Catalan Government grant. His thesis addressed multi-microphone approaches to speech processing in smart-room environments. At UPC he participated in several European and Spanish national projects. He was also a lecturer at the [http://www.etsetb.upc.edu/ Telecommunication Engineering School of Barcelona] and a collaborating lecturer in the doctorate program of the Department of Signal Theory and Communications.&lt;br /&gt;
&lt;br /&gt;
In February 2007, Alberto Abad joined the [https://www.l2f.inesc-id.pt Spoken Language Systems Lab (L2F)] of [https://www.inesc-id.pt/ INESC-ID], and in July 2008 he obtained a &amp;quot;Programa Ciência&amp;quot; contract for 5 years. During that period at INESC-ID, Alberto Abad carried out both basic research and managerial activities that helped consolidate the group's research lines and open new strategic ones. Alberto Abad has been Principal Investigator of the FCT-funded project [http://www.vithea.org VITHEA] and INESC-ID representative in the [http://dirha.fbk.eu DIRHA] European project, besides being a team member in several national and international projects and a team leader in several international technological evaluation challenges, achieving excellent results. He also collaborated in courses of the Electrical and Computer Engineering degree of [http://www.ist.utl.pt Instituto Superior Técnico (IST)] of the University of Lisbon.&lt;br /&gt;
&lt;br /&gt;
In August 2013, Alberto Abad was hired as an Invited Researcher and Professor at IST, where he has been the lecturer of the course &amp;quot;Arquitectura de Computadores&amp;quot; of the [https://fenix.tecnico.ulisboa.pt/cursos/meec Mestrado Integrado em Engenharia Electrotécnica e de Computadores] (MEEC) since the academic year 2013-2014.&lt;br /&gt;
&lt;br /&gt;
His current research interests include robust speech recognition, speaker and language identification, computational acoustic scene analysis, multimedia, and health-care applications.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;inesc-id what='person' id='905'&amp;gt;&amp;lt;/inesc-id&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Research Interests ==&lt;br /&gt;
&lt;br /&gt;
* Multi-microphone processing&lt;br /&gt;
* Robust speech recognition&lt;br /&gt;
* Speaker and language recognition&lt;br /&gt;
* Speech and language technology for health-care applications&lt;br /&gt;
&lt;br /&gt;
== Resume ==&lt;br /&gt;
PDF Version [http://www.l2f.inesc-id.pt/~alberto/CV_new_latex.pdf]&lt;br /&gt;
&lt;br /&gt;
[[category:People]]&lt;br /&gt;
[[category:Researchers]]&lt;/div&gt;</summary>
		<author><name>Alberto</name></author>
	</entry>
	<entry>
		<id>https://www.hlt.inesc-id.pt/wiki/index.php?title=Alberto_Abad_Gareta&amp;diff=7370</id>
		<title>Alberto Abad Gareta</title>
		<link rel="alternate" type="text/html" href="https://www.hlt.inesc-id.pt/wiki/index.php?title=Alberto_Abad_Gareta&amp;diff=7370"/>
		<updated>2014-09-30T10:31:01Z</updated>

		<summary type="html">&lt;p&gt;Alberto: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;{{infobox|name=Alberto Abad Gareta&lt;br /&gt;
|username=alberto&lt;br /&gt;
|contact=alberto.abad&lt;br /&gt;
|phone=+351-213-100-217&lt;br /&gt;
|fax=+351-213-145-843&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
Alberto Abad obtained the Telecommunication Engineering degree in 2002 from the [http://www.upc.edu Technical University of Catalonia] (UPC). Between 2002 and 2006 he was a PhD student at the [http://www.talp.upc.edu TALP Research Centre] of the [http://www.tsc.upc.edu/ Department of Signal Theory and Communications] of the Technical University of Catalonia, supported by a Catalan Government grant. His thesis addressed multi-microphone approaches to speech processing in smart-room environments. At UPC he participated in several European and Spanish national projects. He was also a lecturer at the Telecommunication Engineering School of Barcelona and a collaborating lecturer in the doctorate program of the Department of Signal Theory and Communications.&lt;br /&gt;
&lt;br /&gt;
In February 2007, Alberto Abad joined the [https://www.l2f.inesc-id.pt Spoken Language Systems Lab (L2F)] of [https://www.inesc-id.pt/ INESC-ID], and in July 2008 he obtained a &amp;quot;Programa Ciência&amp;quot; contract for 5 years. During that period at INESC-ID, Alberto Abad carried out both basic research and managerial activities that helped consolidate the group's research lines and open new strategic ones. Alberto Abad has been Principal Investigator of the FCT-funded project [http://www.vithea.org VITHEA] and INESC-ID representative in the [http://dirha.fbk.eu DIRHA] European project, besides being a team member in several national and international projects and a team leader in several international technological evaluation challenges, achieving excellent results. He also collaborated in courses of the Electrical and Computer Engineering degree of [http://www.ist.utl.pt Instituto Superior Técnico (IST)] of the University of Lisbon.&lt;br /&gt;
&lt;br /&gt;
In August 2013, Alberto Abad was hired as an Invited Researcher and Professor at IST, where he has been the lecturer of the course &amp;quot;Arquitectura de Computadores&amp;quot; of the [https://fenix.tecnico.ulisboa.pt/cursos/meec Mestrado Integrado em Engenharia Electrotécnica e de Computadores] (MEEC) since the academic year 2013-2014.&lt;br /&gt;
&lt;br /&gt;
His current research interests include robust speech recognition, speaker and language identification, computational acoustic scene analysis, multimedia, and health-care applications.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;inesc-id what='person' id='905'&amp;gt;&amp;lt;/inesc-id&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Research Interests ==&lt;br /&gt;
&lt;br /&gt;
* Multi-microphone processing&lt;br /&gt;
* Robust speech recognition&lt;br /&gt;
* Speaker and language recognition&lt;br /&gt;
* Speech and language technology for health-care applications&lt;br /&gt;
&lt;br /&gt;
== Resume ==&lt;br /&gt;
PDF Version [http://www.l2f.inesc-id.pt/~alberto/CV_new_latex.pdf]&lt;br /&gt;
&lt;br /&gt;
[[category:People]]&lt;br /&gt;
[[category:Researchers]]&lt;/div&gt;</summary>
		<author><name>Alberto</name></author>
	</entry>
	<entry>
		<id>https://www.hlt.inesc-id.pt/wiki/index.php?title=Alberto_Abad_Gareta&amp;diff=7369</id>
		<title>Alberto Abad Gareta</title>
		<link rel="alternate" type="text/html" href="https://www.hlt.inesc-id.pt/wiki/index.php?title=Alberto_Abad_Gareta&amp;diff=7369"/>
		<updated>2014-09-30T10:26:06Z</updated>

		<summary type="html">&lt;p&gt;Alberto: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;{{infobox|name=Alberto Abad Gareta&lt;br /&gt;
|username=alberto&lt;br /&gt;
|contact=alberto.abad&lt;br /&gt;
|phone=+351-213-100-217&lt;br /&gt;
|fax=+351-213-145-843&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
Alberto Abad obtained the Telecommunication Engineering degree in 2002 from the [http://www.upc.edu Technical University of Catalonia] (UPC). Between 2002 and 2006 he was a PhD student at the [http://www.talp.upc.edu TALP Research Centre] of the [http://www.tsc.upc.edu/ Department of Signal Theory and Communications] of the Technical University of Catalonia, supported by a Catalan Government grant. His thesis addressed multi-microphone approaches to speech processing in smart-room environments. At UPC he participated in several European and Spanish national projects. He was also a lecturer at the Telecommunication Engineering School of Barcelona and a collaborating lecturer in the doctorate program of the Department of Signal Theory and Communications.&lt;br /&gt;
&lt;br /&gt;
In February 2007, Alberto Abad joined the Spoken Language Systems Lab (L2F) of INESC-ID, and in July 2008 he obtained a &amp;quot;Programa Ciência&amp;quot; contract for 5 years. During that period at INESC-ID, Alberto Abad carried out both basic research and managerial activities that helped consolidate the group's research lines and open new strategic ones. Alberto Abad has been Principal Investigator of the FCT-funded project VITHEA and INESC-ID representative in the DIRHA European project, besides being a team member in several national and international projects and a team leader in several international technological evaluation challenges, achieving excellent results. He also collaborated in courses of the Electrical and Computer Engineering degree of Instituto Superior Técnico (IST) of the University of Lisbon.&lt;br /&gt;
&lt;br /&gt;
In August 2013, Alberto Abad was hired as an Invited Researcher and Professor at IST, where he has been the lecturer of the course &amp;quot;Arquitectura de Computadores&amp;quot; of the &amp;quot;Mestrado Integrado em Engenharia Electrotécnica e de Computadores&amp;quot; (MEEC) since the academic year 2013-2014.&lt;br /&gt;
&lt;br /&gt;
His current research interests include robust speech recognition, speaker and language identification, computational acoustic scene analysis, multimedia, and health-care applications.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;inesc-id what='person' id='905'&amp;gt;&amp;lt;/inesc-id&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Research Interests ==&lt;br /&gt;
&lt;br /&gt;
* Multi-microphone processing&lt;br /&gt;
* Robust speech recognition&lt;br /&gt;
* Speaker and language recognition&lt;br /&gt;
* Speech and language technology for health-care applications&lt;br /&gt;
&lt;br /&gt;
== Resume ==&lt;br /&gt;
PDF Version [http://www.l2f.inesc-id.pt/~alberto/CV_new_latex.pdf]&lt;br /&gt;
&lt;br /&gt;
[[category:People]]&lt;br /&gt;
[[category:Researchers]]&lt;/div&gt;</summary>
		<author><name>Alberto</name></author>
	</entry>
	<entry>
		<id>https://www.hlt.inesc-id.pt/wiki/index.php?title=Alberto_Abad_Gareta&amp;diff=7198</id>
		<title>Alberto Abad Gareta</title>
		<link rel="alternate" type="text/html" href="https://www.hlt.inesc-id.pt/wiki/index.php?title=Alberto_Abad_Gareta&amp;diff=7198"/>
		<updated>2013-12-19T10:33:36Z</updated>

		<summary type="html">&lt;p&gt;Alberto: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;{{infobox|name=Alberto Abad Gareta&lt;br /&gt;
|username=alberto&lt;br /&gt;
|contact=alberto.abad&lt;br /&gt;
|phone=+351-213-100-217&lt;br /&gt;
|fax=+351-213-145-843&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
Alberto Abad obtained the Telecommunication Engineering degree in 2002 from the [http://www.upc.edu Technical University of Catalonia] (UPC). Between 2002 and 2006 he was a PhD student at the [http://www.talp.upc.edu TALP Research Centre] of the [http://www.tsc.upc.edu/ Department of Signal Theory and Communications] of the Technical University of Catalonia, supported by a Catalan Government grant. His thesis addressed multi-microphone approaches to speech processing in smart-room environments.&lt;br /&gt;
&lt;br /&gt;
At UPC he participated in several European and Spanish national projects. He was also the lecturer of a laboratory course on signal processing applications at the Telecommunication Engineering School of Barcelona during the academic years 2004-2005 and 2005-2006, and a collaborating lecturer in the doctorate course of the Department of Signal Theory and Communications, &amp;quot;Speech and Language Technology and Applications&amp;quot;.&lt;br /&gt;
&lt;br /&gt;
He is currently a researcher at the Spoken Language Systems Lab (L2F) of INESC-ID. He has participated in the TECNOVOZ project, developing robust automatic speech recognition for telephone-based applications, and he is currently involved in the ongoing Vidivideo and PostPor projects. His current research interests include robust speech recognition, speaker and language identification, computational acoustic scene analysis and multimodal audio-visual applications.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;inesc-id what='person' id='905'&amp;gt;&amp;lt;/inesc-id&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Research Interests ==&lt;br /&gt;
&lt;br /&gt;
* Multi-microphone processing&lt;br /&gt;
* Automatic speech recognition&lt;br /&gt;
&lt;br /&gt;
== Resume ==&lt;br /&gt;
PDF Version [http://www.l2f.inesc-id.pt/~alberto/cv07.pdf]&lt;br /&gt;
&lt;br /&gt;
[[category:People]]&lt;br /&gt;
[[category:Researchers]]&lt;/div&gt;</summary>
		<author><name>Alberto</name></author>
	</entry>
	<entry>
		<id>https://www.hlt.inesc-id.pt/wiki/index.php?title=STDfusion&amp;diff=7164</id>
		<title>STDfusion</title>
		<link rel="alternate" type="text/html" href="https://www.hlt.inesc-id.pt/wiki/index.php?title=STDfusion&amp;diff=7164"/>
		<updated>2013-10-17T00:18:58Z</updated>

		<summary type="html">&lt;p&gt;Alberto: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== Introduction ==&lt;br /&gt;
&lt;br /&gt;
* In this wiki we make available the necessary code and the basic instructions to carry out discriminative calibration and fusion of heterogeneous spoken term detection (STD) systems as described in:&lt;br /&gt;
&lt;br /&gt;
           A. Abad, L. J. Rodriguez Fuentes, M. Penagarikano, A. Varona, M. Diez, and G. Bordel. &lt;br /&gt;
           On the Calibration and Fusion of Heterogeneous Spoken Term Detection Systems. &lt;br /&gt;
           In Interspeech 2013, August 25-29 2013&lt;br /&gt;
&lt;br /&gt;
* The proposed method is the result of a collaboration between the [https://www.l2f.inesc-id.pt L2F] and [http://gtts.ehu.es/gtts/ GTTS] groups in the context of their research activities on query-by-example STD. The method has been successfully tested in the SWS2012 and SWS2013 tasks of the [http://www.multimediaeval.org/mediaeval2013/sws2013/ Mediaeval Evaluation]. &lt;br /&gt;
&lt;br /&gt;
* Unfortunately, we did not have time to consolidate (and clean up) the code, so what we are making available is essentially the set of scripts we developed during the first experiments. The code is thus written in different pieces in different languages (bash, perl, matlab, etc.) and has some external dependencies. We nevertheless expect it to be useful for researchers who want to fuse their own STD systems.&lt;br /&gt;
&lt;br /&gt;
* The package can be downloaded from [http://www.l2f.inesc-id.pt/~alberto/STDfusion/STDfusion.v1.tgz here].&lt;br /&gt;
&lt;br /&gt;
== Package contents ==&lt;br /&gt;
&lt;br /&gt;
* The package contains the following files and directories:&lt;br /&gt;
&lt;br /&gt;
         README.txt - Contains essentially the same information as this wiki page&lt;br /&gt;
         ./bin/ - Contains the different scripts necessary for calibration and fusion&lt;br /&gt;
         ./bin/PrepareForFusion.sh  - Main script that normalizes, aligns, hypothesizes missing scores and creates the ground truth for the input systems&lt;br /&gt;
         ./bin/align_score_files.pl - Used by ./bin/PrepareForFusion.sh&lt;br /&gt;
         ./bin/create_groundtruth_centerdistance.pl - Used by ./bin/PrepareForFusion.sh&lt;br /&gt;
         ./bin/heuristicScoring.pl - Used by ./bin/PrepareForFusion.sh &lt;br /&gt;
         ./bin/fusion.sh  - This script takes the output of ./bin/PrepareForFusion.sh to learn the fusion parameters and apply them to the evaluation scores&lt;br /&gt;
         ./bin/raw2stdslist.sh - Converts scores from a raw (internal) format to the stdlist xml format of the SWS evaluation&lt;br /&gt;
         ./scoring_atwv_sws2013/ - Contains the Mediaeval 2013 atwv scoring package&lt;br /&gt;
         ./scores/ - Contains sample dev (./scores/dev/akws_br-devterms.stdlist.xml, ./scores/dev/dtw_br-devterms.stdlist.xml) and eval scores (./scores/eval/akws_br-evalterms.stdlist.xml, ./scores/eval/dtw_br-evalterms.stdlist.xml)&lt;br /&gt;
         ./sample_results/ - Contains the sample (intermediate) results obtained if you run the instructions below&lt;br /&gt;
&lt;br /&gt;
== Example of use ==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;big&amp;gt;'''STEP1'''&amp;lt;/big&amp;gt; Prepare the development scores for learning the calibration/fusion&lt;br /&gt;
&lt;br /&gt;
* Type the following command:&lt;br /&gt;
&lt;br /&gt;
          ./bin/PrepareForFusion.sh \&lt;br /&gt;
          -q ./scoring_atwv_sws2013/sws2013_dev/sws2013_dev.tlist.xml \&lt;br /&gt;
          -r ./scoring_atwv_sws2013/sws2013_dev/sws2013_dev.rttm \&lt;br /&gt;
          -o dev_scores_4fusion.txt \&lt;br /&gt;
          -t num_queries_in_data.ref \&lt;br /&gt;
          -g -z qnorm -m 1 -n 1 \&lt;br /&gt;
          ./scores/dev/akws_br-devterms.stdlist.xml \&lt;br /&gt;
          ./scores/dev/dtw_br-devterms.stdlist.xml&lt;br /&gt;
&lt;br /&gt;
:* '''Warning''' Before running the script, you will have to change the $MATLAB_BIN variable to the path where your Matlab binary is installed.&lt;br /&gt;
&lt;br /&gt;
:* This process will generate 2 output files:&lt;br /&gt;
&lt;br /&gt;
::'''1.''' dev_scores_4fusion.txt - Contains the scores ready for the fusion stage, in the following general format (one row per detection candidate):&lt;br /&gt;
&lt;br /&gt;
                  &amp;lt;query_id&amp;gt; &amp;lt;file_id&amp;gt; &amp;lt;start_time&amp;gt; &amp;lt;duration&amp;gt; &amp;lt;sc1&amp;gt; &amp;lt;sc2&amp;gt; ... &amp;lt;scN&amp;gt; &amp;lt;label&amp;gt;&lt;br /&gt;
::*Notice that now all systems produce a score for all candidate detections (the score matrix is full). &lt;br /&gt;
::* If the -g option is selected (as in this case), the &amp;lt;label&amp;gt; column contains 0s and 1s for the false and true trials respectively (derived from the rttm file). If the -g option is not selected, the last column simply contains 1s.&lt;br /&gt;
&lt;br /&gt;
::'''2.''' num_queries_in_data.ref - This second (optional) output file contains the number of times each query appears in the collection data; it is used later in the fusion stage.&lt;br /&gt;
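As a quick sanity check of this format, the aligned score file can be read with a short script. This is an illustrative sketch, not part of the package; the column layout is the one documented above, and the helper name read_fusion_scores is hypothetical:&lt;br /&gt;

```python
# Illustrative reader for the aligned score file produced by PrepareForFusion.sh.
# Assumed column layout (documented above):
#   <query_id> <file_id> <start_time> <duration> <sc1> ... <scN> <label>
def read_fusion_scores(path, n_systems):
    """Return (query_id, file_id, start, duration, scores, label) tuples."""
    rows = []
    with open(path) as f:
        for line in f:
            parts = line.split()
            if not parts:
                continue
            scores = [float(s) for s in parts[4:4 + n_systems]]
            # With -g the last column is 1 for true trials and 0 for false ones
            label = int(parts[4 + n_systems])
            rows.append((parts[0], parts[1], float(parts[2]), float(parts[3]),
                         scores, label))
    return rows
```

With such a reader it is easy, for instance, to count true versus false trials per query before fusing.&lt;br /&gt;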
&lt;br /&gt;
*You can have a look at the general usage of this script by calling it without arguments:&lt;br /&gt;
  &lt;br /&gt;
       Usage: PrepareForFusion.sh -q &amp;lt;tlistxml&amp;gt; -r &amp;lt;rttm&amp;gt; -o &amp;lt;outputfile&amp;gt; [opts] &amp;lt;stdlistfile1&amp;gt; [stdlistfile2] [stdlistfile3]  ... [stdlistfileN] &lt;br /&gt;
&lt;br /&gt;
        &amp;lt;stdlistfile*&amp;gt; input score stdlist file in the SWS2012 format (*.stdlist.xml)        | - Required argument (at least 1)&lt;br /&gt;
        -q &amp;lt;tlistxml&amp;gt; termlist file in the SWS2012 format (*.tlist.xml)                      | - Required parameter&lt;br /&gt;
        -r &amp;lt;rttm&amp;gt; rttm file in the SWS2012 format (*.rttm)                                   | - Required parameter&lt;br /&gt;
        -o &amp;lt;outputfile&amp;gt; output file name                                                     | - Required parameter&lt;br /&gt;
        -z &amp;lt;value&amp;gt; score z-norm type (none, qnorm, fnorm, qfnorm, fqnorm)                    | - Default: none&lt;br /&gt;
        -g add ground-truth information to the outputfile&lt;br /&gt;
        -t &amp;lt;filename&amp;gt; saves the number of true terms in the reference per query (implies -g)&lt;br /&gt;
        -m &amp;lt;value&amp;gt; apply majority voting fusion with &amp;lt;value&amp;gt; minimum number of votes         | - Default: 1&lt;br /&gt;
        -n &amp;lt;value&amp;gt; method for creating default scores (0: average of the other detections (MV approach); 1: min per query; 2: global min; 3: histogram based)  | - Default: 0&lt;br /&gt;
        -d debug mode, do not remove auxiliary files stored in /tmp/tmpdir.$$&lt;br /&gt;
        -h help &lt;br /&gt;
&lt;br /&gt;
        NOTE: Requires Matlab (or Octave) and perl. It also depends on the perl scripts align_score_files.pl, heuristicScoring.pl and create_groundtruth_centerdistance.pl, &lt;br /&gt;
           which should be located in the same folder as the main script&lt;br /&gt;
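For intuition about the -z score normalization options, a per-query z-norm can be sketched as follows. This is an assumption about what qnorm denotes (zero mean, unit variance per query; fnorm would be the per-file analogue), not the package's actual Matlab code:&lt;br /&gt;

```python
import statistics

def qnorm(detections):
    """Per-query z-norm (assumed meaning of -z qnorm): for each query,
    subtract the mean and divide by the stdev of that query's scores."""
    by_query = {}
    for query_id, score in detections:
        by_query.setdefault(query_id, []).append(score)
    # Guard against a zero stdev when a query has a single detection
    stats = {q: (statistics.mean(v), statistics.pstdev(v) or 1.0)
             for q, v in by_query.items()}
    return [(q, (s - stats[q][0]) / stats[q][1]) for q, s in detections]
```

Normalizing per query makes the scores of different queries (and, applied per system, of different systems) comparable before fusion.&lt;br /&gt;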
&lt;br /&gt;
&amp;lt;big&amp;gt;'''STEP2'''&amp;lt;/big&amp;gt; Prepare the evaluation scores for fusion&lt;br /&gt;
&lt;br /&gt;
* Type the following command:&lt;br /&gt;
&lt;br /&gt;
          ./bin/PrepareForFusion.sh \&lt;br /&gt;
          -q ./scoring_atwv_sws2013/sws2013_eval/sws2013_eval.tlist.xml \&lt;br /&gt;
          -r ./scoring_atwv_sws2013/sws2013_eval/sws2013_eval.rttm \&lt;br /&gt;
          -o eval_scores_4fusion.txt \&lt;br /&gt;
          -z qnorm -m 1 -n 1 \&lt;br /&gt;
          ./scores/eval/akws_br-evalterms.stdlist.xml  \&lt;br /&gt;
          ./scores/eval/dtw_br-evalterms.stdlist.xml&lt;br /&gt;
&lt;br /&gt;
:*'''Warning''' It is fundamental to provide the stdlist.xml input files in the same order used in the previous step.&lt;br /&gt;
:*Notice that, in contrast to the case of the development scores, we did not use the -g option, since we do not need the ground truth in this case. &lt;br /&gt;
     &lt;br /&gt;
&amp;lt;big&amp;gt;'''STEP3'''&amp;lt;/big&amp;gt; Train the fusion parameters using the development scores and apply them to the evaluation scores &lt;br /&gt;
&lt;br /&gt;
* Type the following command:&lt;br /&gt;
&lt;br /&gt;
          ./bin/fusion.sh dev_scores_4fusion.txt eval_scores_4fusion.txt num_queries_in_data.ref&lt;br /&gt;
&lt;br /&gt;
:* This call learns from dev_scores_4fusion.txt (which includes the ground-truth information) the calibration and fusion parameters, which are then applied both to the development and to the evaluation scores. It produces 3 output files:&lt;br /&gt;
&lt;br /&gt;
::'''1.''' dev_fusion.scores - file containing the development well-calibrated fusion scores&lt;br /&gt;
::'''2.''' eval_fusion.scores - file containing the evaluation well-calibrated fusion scores&lt;br /&gt;
::'''3.''' fuse_params.txt - file containing the fusion parameters &lt;br /&gt;
&lt;br /&gt;
::The format of the score output files is as follows (one row per candidate detection):&lt;br /&gt;
&lt;br /&gt;
              &amp;lt;query_id&amp;gt; &amp;lt;file_id&amp;gt; &amp;lt;start_time&amp;gt; &amp;lt;duration&amp;gt; &amp;lt;score&amp;gt; &amp;lt;decision&amp;gt;&lt;br /&gt;
::The &amp;lt;decision&amp;gt; field is 0 or 1 depending on whether the score is below or above the minimum-cost Bayes optimal threshold (see the Interspeech paper for details).&lt;br /&gt;
&lt;br /&gt;
:* '''Warning''' Before running the script, you will have to change the $MATLAB_BIN variable to be the path where your Matlab binary is actually installed.&lt;br /&gt;
:* '''Warning''' The [https://sites.google.com/site/bosaristoolkit/ Bosaris toolkit] needs to be installed. Change the variable $BOSARIS to the path where the Bosaris toolkit is installed.&lt;br /&gt;
:* '''Warning''' The script contains 4 internal variables that are specific to the Mediaeval SWS2013 evaluation: &lt;br /&gt;
::* The cost parameters $P_TARGET, $C_MISS, $C_FA &lt;br /&gt;
::*  $TSPEECH - The total duration (in seconds) of the data collection &lt;br /&gt;
:* '''Warning''' The call to the fusion script can take several minutes. &lt;br /&gt;
* To convert the score files to the SWS2013 stdlist.xml format, type the following commands:&lt;br /&gt;
          ./bin/raw2stdslist.sh dev_fusion.scores \&lt;br /&gt;
          ./scoring_atwv_sws2013/sws2013_dev/sws2013_dev.tlist.xml &amp;gt; fusion-devterms.stdlist.xml&lt;br /&gt;
&lt;br /&gt;
          ./bin/raw2stdslist.sh eval_fusion.scores \&lt;br /&gt;
          ./scoring_atwv_sws2013/sws2013_eval/sws2013_eval.tlist.xml &amp;gt; fusion-evalterms.stdlist.xml&lt;br /&gt;
: This script can take a third parameter, a threshold value, to apply a decision threshold different from the optimal Bayes one.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;big&amp;gt;'''Reference results'''&amp;lt;/big&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* In the folder ./sample_results/ you can find each of the intermediate files produced in this example&lt;br /&gt;
* The TWV results obtained on the Mediaeval SWS2013 task with the provided sample score files, for each of the individual systems and for the fusion system built following this example, are:&lt;br /&gt;
&lt;br /&gt;
{| border=1 cellspacing=0 align=center cellpadding=5px width=50%&lt;br /&gt;
  |+ '''Reference results for Mediaeval SWS 2013''' &lt;br /&gt;
|-&lt;br /&gt;
  ! System !! dev (MTWV/ATWV) !! eval (MTWV/ATWV) &lt;br /&gt;
|-&lt;br /&gt;
  |   akws-br   ||    0.1571 / 0.1408 || 0.1441  / 0.1271 &lt;br /&gt;
|-&lt;br /&gt;
  |   dtw-br   ||   0.2066  / 0.2012 ||  0.1654  /  0.1581 &lt;br /&gt;
|-&lt;br /&gt;
  | fusion || 0.2731  / 0.2713 ||  0.2355  / 0.2320&lt;br /&gt;
|}&lt;/div&gt;</summary>
		<author><name>Alberto</name></author>
	</entry>
	<entry>
		<id>https://www.hlt.inesc-id.pt/wiki/index.php?title=STDfusion&amp;diff=7163</id>
		<title>STDfusion</title>
		<link rel="alternate" type="text/html" href="https://www.hlt.inesc-id.pt/wiki/index.php?title=STDfusion&amp;diff=7163"/>
		<updated>2013-10-17T00:04:34Z</updated>

		<summary type="html">&lt;p&gt;Alberto: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== Introduction ==&lt;br /&gt;
&lt;br /&gt;
* In this wiki, we make available the code and the basic instructions needed to carry out discriminative calibration and fusion of heterogeneous spoken term detection (STD) systems, as described in:&lt;br /&gt;
&lt;br /&gt;
           A. Abad, L. J. Rodriguez Fuentes, M. Penagarikano, A. Varona, M. Diez, and G. Bordel. &lt;br /&gt;
           On the Calibration and Fusion of Heterogeneous Spoken Term Detection Systems. &lt;br /&gt;
           In Interspeech 2013, August 25-29 2013&lt;br /&gt;
&lt;br /&gt;
* The proposed method is the result of a collaboration between the [https://www.l2f.inesc-id.pt L2F] and [http://gtts.ehu.es/gtts/ GTTS] groups in the context of their research activities on query-by-example STD. The method has been successfully tested in the SWS2012 and SWS2013 tasks of the [http://www.multimediaeval.org/mediaeval2013/sws2013/ Mediaeval evaluation]. &lt;br /&gt;
&lt;br /&gt;
* Unfortunately, we did not have time to consolidate (and clean up) the code, so what we are making available is essentially the same set of scripts that we developed during the first experiments. The code is therefore split into pieces written in different languages (bash, Perl, Matlab, etc.) and has some external dependencies. Nevertheless, we expect it can still be useful for researchers who want to fuse their STD systems.&lt;br /&gt;
&lt;br /&gt;
* The package can be downloaded from [[Media:STDfusion.tgz | here]].&lt;br /&gt;
&lt;br /&gt;
== Package contents ==&lt;br /&gt;
&lt;br /&gt;
* The package contains the following files and directories:&lt;br /&gt;
&lt;br /&gt;
         README.txt - Contains pretty much the same information as this wiki page&lt;br /&gt;
         ./bin/ - Contains the different scripts necessary for calibration and fusion&lt;br /&gt;
         ./bin/PrepareForFusion.sh  - Main script that normalizes, aligns, hypothesizes missing scores and creates the ground truth for the input systems&lt;br /&gt;
         ./bin/align_score_files.pl - Used by ./bin/PrepareForFusion.sh&lt;br /&gt;
         ./bin/create_groundtruth_centerdistance.pl - Used by ./bin/PrepareForFusion.sh&lt;br /&gt;
         ./bin/heuristicScoring.pl - Used by ./bin/PrepareForFusion.sh &lt;br /&gt;
         ./bin/fusion.sh  - This script takes the output of ./bin/PrepareForFusion.sh to learn the fusion parameters and apply them to the evaluation scores&lt;br /&gt;
         ./bin/raw2stdslist.sh - Converts scores from a raw (internal) format to the stdlist xml format of the SWS evaluation&lt;br /&gt;
         ./scoring_atwv_sws2013/ - Contains the Mediaeval 2013 atwv scoring package&lt;br /&gt;
         ./scores/ - Contains sample dev (./scores/dev/akws_br-devterms.stdlist.xml, ./scores/dev/dtw_br-devterms.stdlist.xml) and eval scores (./scores/eval/akws_br-evalterms.stdlist.xml, ./scores/eval/dtw_br-evalterms.stdlist.xml)&lt;br /&gt;
         ./sample_results/ - Contains the sample (intermediate) results that are obtained if you run the instructions below&lt;br /&gt;
&lt;br /&gt;
== Example of use ==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;big&amp;gt;'''STEP1'''&amp;lt;/big&amp;gt; Prepare the development scores for learning the calibration/fusion&lt;br /&gt;
&lt;br /&gt;
* Type the following command:&lt;br /&gt;
&lt;br /&gt;
          ./bin/PrepareForFusion.sh \&lt;br /&gt;
          -q ./scoring_atwv_sws2013/sws2013_dev/sws2013_dev.tlist.xml \&lt;br /&gt;
          -r ./scoring_atwv_sws2013/sws2013_dev/sws2013_dev.rttm \&lt;br /&gt;
          -o dev_scores_4fusion.txt \&lt;br /&gt;
          -t num_queries_in_data.ref \&lt;br /&gt;
          -g -z qnorm -m 1 -n 1 \&lt;br /&gt;
          ./scores/dev/akws_br-devterms.stdlist.xml \&lt;br /&gt;
          ./scores/dev/dtw_br-devterms.stdlist.xml&lt;br /&gt;
&lt;br /&gt;
:* '''Warning''' Before running the script, you will have to change the $MATLAB_BIN variable to be the path where your Matlab binary is actually installed.&lt;br /&gt;
&lt;br /&gt;
:* This process  will generate 2 output files:&lt;br /&gt;
&lt;br /&gt;
::'''1.''' dev_scores_4fusion.txt - Contains the scores ready to be used in the subsequent fusion stage, in the following general format (one row per detection candidate):&lt;br /&gt;
&lt;br /&gt;
                  &amp;lt;query_id&amp;gt; &amp;lt;file_id&amp;gt; &amp;lt;start_time&amp;gt; &amp;lt;duration&amp;gt; &amp;lt;sc1&amp;gt; &amp;lt;sc2&amp;gt; ... &amp;lt;scN&amp;gt; &amp;lt;label&amp;gt;&lt;br /&gt;
::*Notice that now all systems produce a score for all candidate detections (the score matrix is full). &lt;br /&gt;
::*  If the -g option is selected (as in this case), the &amp;lt;label&amp;gt; column contains 0s and 1s for the false and true trials respectively (derived from the rttm file). If the -g option is not selected, the last column simply contains 1s.&lt;br /&gt;
&lt;br /&gt;
::'''2.''' num_queries_in_data.ref - This second (optional) output file contains the number of times each query appears in the collection data; it is used later in the fusion stage.&lt;br /&gt;
&lt;br /&gt;
*You can have a look at the general usage of this script by calling it without arguments:&lt;br /&gt;
  &lt;br /&gt;
       Usage: PrepareForFusion.sh -q &amp;lt;tlistxml&amp;gt; -r &amp;lt;rttm&amp;gt; -o &amp;lt;outputfile&amp;gt; [opts] &amp;lt;stdlistfile1&amp;gt; [stdlistfile2] [stdlistfile3]  ... [stdlistfileN] &lt;br /&gt;
&lt;br /&gt;
        &amp;lt;stdlistfile*&amp;gt; input score stdlist file in the SWS2012 format (*.stdlist.xml)        | - Required argument (at least 1)&lt;br /&gt;
        -q &amp;lt;tlistxml&amp;gt; termlist file in the SWS2012 format (*.tlist.xml)                      | - Required parameter&lt;br /&gt;
        -r &amp;lt;rttm&amp;gt; rttm file in the SWS2012 format (*.rttm)                                   | - Required parameter&lt;br /&gt;
        -o &amp;lt;outputfile&amp;gt; output file name                                                     | - Required parameter&lt;br /&gt;
        -z &amp;lt;value&amp;gt; score z-norm type (none, qnorm, fnorm, qfnorm, fqnorm)                    | - Default: none&lt;br /&gt;
        -g add ground-truth information to the outputfile&lt;br /&gt;
        -t &amp;lt;filename&amp;gt; saves the number of true terms in the reference per query (implies -g)&lt;br /&gt;
        -m &amp;lt;value&amp;gt; apply majority voting fusion with &amp;lt;value&amp;gt; minimum number of votes         | - Default: 1&lt;br /&gt;
        -n &amp;lt;value&amp;gt; method for creating default scores (0: average of the other detections (MV approach); 1: min per query; 2: global min; 3: histogram based)  | - Default: 0&lt;br /&gt;
        -d debug mode, do not remove the auxiliary files stored in /tmp/tmpdir.$$&lt;br /&gt;
        -h help &lt;br /&gt;
&lt;br /&gt;
        NOTE: Requires Matlab (or Octave) and Perl. It also depends on the Perl scripts align_score_files.pl, heuristicScoring.pl and create_groundtruth_centerdistance.pl, &lt;br /&gt;
           which should be located in the same folder as the main script&lt;br /&gt;
&lt;br /&gt;
&amp;lt;big&amp;gt;'''STEP2'''&amp;lt;/big&amp;gt; Prepare the evaluation scores for fusion&lt;br /&gt;
&lt;br /&gt;
* Type  the following command:&lt;br /&gt;
&lt;br /&gt;
          ./bin/PrepareForFusion.sh \&lt;br /&gt;
          -q ./scoring_atwv_sws2013/sws2013_eval/sws2013_eval.tlist.xml \&lt;br /&gt;
          -r ./scoring_atwv_sws2013/sws2013_eval/sws2013_eval.rttm \&lt;br /&gt;
          -o eval_scores_4fusion.txt \&lt;br /&gt;
          -z qnorm -m 1 -n 1 \&lt;br /&gt;
          ./scores/eval/akws_br-evalterms.stdlist.xml  \&lt;br /&gt;
          ./scores/eval/dtw_br-evalterms.stdlist.xml&lt;br /&gt;
&lt;br /&gt;
:*'''Warning''' It is fundamental to provide the stdlist.xml input files in the same order used in the previous step.&lt;br /&gt;
:*Notice that, in contrast to the case of the development scores, we did not use the -g option, since we do not need the ground truth in this case. &lt;br /&gt;
     &lt;br /&gt;
&amp;lt;big&amp;gt;'''STEP3'''&amp;lt;/big&amp;gt; Train the fusion parameters using the development scores and apply them to the evaluation scores &lt;br /&gt;
&lt;br /&gt;
* Type the following command:&lt;br /&gt;
&lt;br /&gt;
          ./bin/fusion.sh dev_scores_4fusion.txt eval_scores_4fusion.txt num_queries_in_data.ref&lt;br /&gt;
&lt;br /&gt;
:* This call learns from dev_scores_4fusion.txt (which includes the ground-truth information) the calibration and fusion parameters, which are then applied both to the development and to the evaluation scores. It produces 3 output files:&lt;br /&gt;
&lt;br /&gt;
::'''1.''' dev_fusion.scores - file containing the development well-calibrated fusion scores&lt;br /&gt;
::'''2.''' eval_fusion.scores - file containing the evaluation well-calibrated fusion scores&lt;br /&gt;
::'''3.''' fuse_params.txt - file containing the fusion parameters &lt;br /&gt;
&lt;br /&gt;
::The format of the score output files is as follows (one row per candidate detection):&lt;br /&gt;
&lt;br /&gt;
              &amp;lt;query_id&amp;gt; &amp;lt;file_id&amp;gt; &amp;lt;start_time&amp;gt; &amp;lt;duration&amp;gt; &amp;lt;score&amp;gt; &amp;lt;decision&amp;gt;&lt;br /&gt;
::The &amp;lt;decision&amp;gt; field is 0 or 1 depending on whether the score is below or above the minimum-cost Bayes optimal threshold (see the Interspeech paper for details).&lt;br /&gt;
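For well-calibrated log-likelihood-ratio scores, the minimum-cost Bayes threshold follows directly from the cost parameters (named $P_TARGET, $C_MISS and $C_FA in the scripts). This is a sketch of the standard detection-theory computation; the scripts' internals may differ in detail:

```python
from math import log

def bayes_threshold(p_target, c_miss, c_fa):
    """Minimum-expected-cost threshold for calibrated LLR scores:
    log(c_fa * (1 - p_target) / (c_miss * p_target))."""
    return log((c_fa * (1.0 - p_target)) / (c_miss * p_target))

def decide(llr_score, p_target, c_miss, c_fa):
    # decision field: 1 if the calibrated score exceeds the threshold, else 0
    return 1 if llr_score > bayes_threshold(p_target, c_miss, c_fa) else 0
```

With equal costs and p_target = 0.5 the threshold is 0; rarer targets or costlier false alarms push it upwards.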
&lt;br /&gt;
:* '''Warning''' Before running the script, you will have to change the $MATLAB_BIN variable to be the path where your Matlab binary is actually installed.&lt;br /&gt;
:* '''Warning''' The [https://sites.google.com/site/bosaristoolkit/ Bosaris toolkit] needs to be installed. Change the variable $BOSARIS to the path where the Bosaris toolkit is installed.&lt;br /&gt;
:* '''Warning''' The script contains 4 internal variables that are specific to the Mediaeval SWS2013 evaluation: &lt;br /&gt;
::* The cost parameters $P_TARGET, $C_MISS, $C_FA &lt;br /&gt;
::*  $TSPEECH - The total duration (in seconds) of the data collection &lt;br /&gt;
:* '''Warning''' The call to the fusion script can take several minutes. &lt;br /&gt;
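The role of these SWS2013-specific variables can be illustrated with the standard term-weighted value (TWV) metric, where the collection duration $TSPEECH approximates the number of non-target trials per query. This is a sketch of the NIST-style computation, not the scoring package itself:

```python
def twv(n_hit, n_true, n_fa, tspeech, p_target, c_miss, c_fa):
    """Term-weighted value at one operating point (NIST STD style).
    n_hit / n_true / n_fa map each query id to its counts of correct
    detections, reference occurrences and false alarms; tspeech is the
    collection duration in seconds (one trial per second assumed)."""
    beta = (c_fa / c_miss) * (1.0 / p_target - 1.0)
    costs = []
    for q in n_true:
        p_miss = 1.0 - n_hit.get(q, 0) / n_true[q]
        p_fa = n_fa.get(q, 0) / (tspeech - n_true[q])
        costs.append(p_miss + beta * p_fa)
    return 1.0 - sum(costs) / len(costs)
```

ATWV evaluates this at the system's actual decisions, while MTWV reports the maximum over all possible thresholds.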
* To convert the score files to the SWS2013 stdlist.xml format, type the following commands:&lt;br /&gt;
          ./bin/raw2stdslist.sh dev_fusion.scores \&lt;br /&gt;
          ./scoring_atwv_sws2013/sws2013_dev/sws2013_dev.tlist.xml &amp;gt; fusion-devterms.stdlist.xml&lt;br /&gt;
&lt;br /&gt;
          ./bin/raw2stdslist.sh eval_fusion.scores \&lt;br /&gt;
          ./scoring_atwv_sws2013/sws2013_eval/sws2013_eval.tlist.xml &amp;gt; fusion-evalterms.stdlist.xml&lt;br /&gt;
: This script can take a third parameter, a threshold value, to apply a decision threshold different from the optimal Bayes one.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;big&amp;gt;'''Reference results'''&amp;lt;/big&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* In the folder ./sample_results/ you can find each of the intermediate files produced in this example&lt;br /&gt;
* The TWV results obtained on the Mediaeval SWS2013 task with the provided sample score files, for each of the individual systems and for the fusion system built following this example, are:&lt;br /&gt;
&lt;br /&gt;
{| border=1 cellspacing=0 align=center cellpadding=5px width=50%&lt;br /&gt;
  |+ '''Reference results for Mediaeval SWS 2013''' &lt;br /&gt;
|-&lt;br /&gt;
  ! System !! dev (MTWV/ATWV) !! eval (MTWV/ATWV) &lt;br /&gt;
|-&lt;br /&gt;
  |   akws-br   ||    0.1571 / 0.1408 || 0.1441  / 0.1271 &lt;br /&gt;
|-&lt;br /&gt;
  |   dtw-br   ||   0.2066  / 0.2012 ||  0.1654  /  0.1581 &lt;br /&gt;
|-&lt;br /&gt;
  | fusion || 0.2731  / 0.2713 ||  0.2355  / 0.2320&lt;br /&gt;
|}&lt;/div&gt;</summary>
		<author><name>Alberto</name></author>
	</entry>
	<entry>
		<id>https://www.hlt.inesc-id.pt/wiki/index.php?title=STDfusion&amp;diff=7162</id>
		<title>STDfusion</title>
		<link rel="alternate" type="text/html" href="https://www.hlt.inesc-id.pt/wiki/index.php?title=STDfusion&amp;diff=7162"/>
		<updated>2013-10-17T00:03:30Z</updated>

		<summary type="html">&lt;p&gt;Alberto: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== Introduction ==&lt;br /&gt;
&lt;br /&gt;
* In this wiki, we make available the code and the basic instructions needed to carry out discriminative calibration and fusion of heterogeneous spoken term detection (STD) systems, as described in:&lt;br /&gt;
&lt;br /&gt;
           A. Abad, L. J. Rodriguez Fuentes, M. Penagarikano, A. Varona, M. Diez, and G. Bordel. &lt;br /&gt;
           On the Calibration and Fusion of Heterogeneous Spoken Term Detection Systems. &lt;br /&gt;
           In Interspeech 2013, August 25-29 2013&lt;br /&gt;
&lt;br /&gt;
* The proposed method is the result of a collaboration between the [https://www.l2f.inesc-id.pt L2F] and [http://gtts.ehu.es/gtts/ GTTS] groups in the context of their research activities on query-by-example STD. The method has been successfully tested in the SWS2012 and SWS2013 tasks of the [http://www.multimediaeval.org/mediaeval2013/sws2013/ Mediaeval evaluation]. &lt;br /&gt;
&lt;br /&gt;
* Unfortunately, we did not have time to consolidate (and clean up) the code, so what we are making available is essentially the same set of scripts that we developed during the first experiments. The code is therefore split into pieces written in different languages (bash, Perl, Matlab, etc.) and has some external dependencies. Nevertheless, we expect it can still be useful for researchers who want to fuse their STD systems.&lt;br /&gt;
&lt;br /&gt;
* The package can be downloaded from [[Media:STDfusion.tgz | here]].&lt;br /&gt;
&lt;br /&gt;
== Package contents ==&lt;br /&gt;
&lt;br /&gt;
* The package contains the following files and directories:&lt;br /&gt;
&lt;br /&gt;
         README.txt - Contains pretty much the same information as this wiki page&lt;br /&gt;
         ./bin/ - Contains the different scripts necessary for calibration and fusion&lt;br /&gt;
         ./bin/PrepareForFusion.sh  - Main script that normalizes, aligns, hypothesizes missing scores and creates the ground truth for the input systems&lt;br /&gt;
         ./bin/align_score_files.pl - Used by ./bin/PrepareForFusion.sh&lt;br /&gt;
         ./bin/create_groundtruth_centerdistance.pl - Used by ./bin/PrepareForFusion.sh&lt;br /&gt;
         ./bin/heuristicScoring.pl - Used by ./bin/PrepareForFusion.sh &lt;br /&gt;
         ./bin/fusion.sh  - This script takes the output of ./bin/PrepareForFusion.sh to learn the fusion parameters and apply them to the evaluation scores&lt;br /&gt;
         ./bin/raw2stdslist.sh - Converts scores from a raw (internal) format to the stdlist xml format of the SWS evaluation&lt;br /&gt;
         ./scoring_atwv_sws2013/ - Contains the Mediaeval 2013 atwv scoring package&lt;br /&gt;
         ./scores/ - Contains sample dev (./scores/dev/akws_br-devterms.stdlist.xml, ./scores/dev/dtw_br-devterms.stdlist.xml) and eval scores (./scores/eval/akws_br-evalterms.stdlist.xml, ./scores/eval/dtw_br-evalterms.stdlist.xml)&lt;br /&gt;
         ./sample_results/ - Contains the sample (intermediate) results that are obtained if you run the instructions below&lt;br /&gt;
&lt;br /&gt;
== Example of use ==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;big&amp;gt;'''STEP1'''&amp;lt;/big&amp;gt; Prepare the development scores for learning the calibration/fusion&lt;br /&gt;
&lt;br /&gt;
* Type the following command:&lt;br /&gt;
&lt;br /&gt;
          ./bin/PrepareForFusion.sh \&lt;br /&gt;
          -q ./scoring_atwv_sws2013/sws2013_dev/sws2013_dev.tlist.xml \&lt;br /&gt;
          -r ./scoring_atwv_sws2013/sws2013_dev/sws2013_dev.rttm \&lt;br /&gt;
          -o dev_scores_4fusion.txt \&lt;br /&gt;
          -t num_queries_in_data.ref \&lt;br /&gt;
          -g -z qnorm -m 1 -n 1 \&lt;br /&gt;
          ./scores/dev/akws_br-devterms.stdlist.xml \&lt;br /&gt;
          ./scores/dev/dtw_br-devterms.stdlist.xml&lt;br /&gt;
&lt;br /&gt;
:* '''Warning''' Before running the script, you will have to change the $MATLAB_BIN variable to be the path where your Matlab binary is actually installed.&lt;br /&gt;
&lt;br /&gt;
:* This process  will generate 2 output files:&lt;br /&gt;
&lt;br /&gt;
::'''1.''' dev_scores_4fusion.txt - Contains the scores ready to be used in the subsequent fusion stage, in the following general format (one row per detection candidate):&lt;br /&gt;
&lt;br /&gt;
                  &amp;lt;query_id&amp;gt; &amp;lt;file_id&amp;gt; &amp;lt;start_time&amp;gt; &amp;lt;duration&amp;gt; &amp;lt;sc1&amp;gt; &amp;lt;sc2&amp;gt; ... &amp;lt;scN&amp;gt; &amp;lt;label&amp;gt;&lt;br /&gt;
::*Notice that now all systems produce a score for all candidate detections (the score matrix is full). &lt;br /&gt;
::*  If the -g option is selected (as in this case), the &amp;lt;label&amp;gt; column contains 0s and 1s for the false and true trials respectively (derived from the rttm file). If the -g option is not selected, the last column simply contains 1s.&lt;br /&gt;
&lt;br /&gt;
::'''2.''' num_queries_in_data.ref - This second (optional) output file contains the number of times each query appears in the collection data; it is used later in the fusion stage.&lt;br /&gt;
&lt;br /&gt;
*You can have a look at the general usage of this script by calling it without arguments:&lt;br /&gt;
  &lt;br /&gt;
       Usage: PrepareForFusion.sh -q &amp;lt;tlistxml&amp;gt; -r &amp;lt;rttm&amp;gt; -o &amp;lt;outputfile&amp;gt; [opts] &amp;lt;stdlistfile1&amp;gt; [stdlistfile2] [stdlistfile3]  ... [stdlistfileN] &lt;br /&gt;
&lt;br /&gt;
        &amp;lt;stdlistfile*&amp;gt; input score stdlist file in the SWS2012 format (*.stdlist.xml)        | - Required argument (at least 1)&lt;br /&gt;
        -q &amp;lt;tlistxml&amp;gt; termlist file in the SWS2012 format (*.tlist.xml)                      | - Required parameter&lt;br /&gt;
        -r &amp;lt;rttm&amp;gt; rttm file in the SWS2012 format (*.rttm)                                   | - Required parameter&lt;br /&gt;
        -o &amp;lt;outputfile&amp;gt; output file name                                                     | - Required parameter&lt;br /&gt;
        -z &amp;lt;value&amp;gt; score z-norm type (none, qnorm, fnorm, qfnorm, fqnorm)                    | - Default: none&lt;br /&gt;
        -g add ground-truth information to the outputfile&lt;br /&gt;
        -t &amp;lt;filename&amp;gt; saves the number of true terms in the reference per query (implies -g)&lt;br /&gt;
        -m &amp;lt;value&amp;gt; apply majority voting fusion with &amp;lt;value&amp;gt; minimum number of votes         | - Default: 1&lt;br /&gt;
        -n &amp;lt;value&amp;gt; method for creating default scores (0: average of the other detections (MV approach); 1: min per query; 2: global min; 3: histogram based)  | - Default: 0&lt;br /&gt;
        -d debug mode, do not remove the auxiliary files stored in /tmp/tmpdir.$$&lt;br /&gt;
        -h help &lt;br /&gt;
&lt;br /&gt;
        NOTE: Requires Matlab (or Octave) and Perl. It also depends on the Perl scripts align_score_files.pl, heuristicScoring.pl and create_groundtruth_centerdistance.pl, &lt;br /&gt;
           which should be located in the same folder as the main script&lt;br /&gt;
&lt;br /&gt;
&amp;lt;big&amp;gt;'''STEP2'''&amp;lt;/big&amp;gt; Prepare the evaluation scores for fusion&lt;br /&gt;
&lt;br /&gt;
* Type  the following command:&lt;br /&gt;
&lt;br /&gt;
          ./bin/PrepareForFusion.sh \&lt;br /&gt;
          -q ./scoring_atwv_sws2013/sws2013_eval/sws2013_eval.tlist.xml \&lt;br /&gt;
          -r ./scoring_atwv_sws2013/sws2013_eval/sws2013_eval.rttm \&lt;br /&gt;
          -o eval_scores_4fusion.txt \&lt;br /&gt;
          -z qnorm -m 1 -n 1 \&lt;br /&gt;
          ./scores/eval/akws_br-evalterms.stdlist.xml  \&lt;br /&gt;
          ./scores/eval/dtw_br-evalterms.stdlist.xml&lt;br /&gt;
&lt;br /&gt;
:*'''Warning''' It is fundamental to provide the stdlist.xml input files in the same order used in the previous step.&lt;br /&gt;
:*Notice that, in contrast to the case of the development scores, we did not use the -g option, since we do not need the ground truth in this case. &lt;br /&gt;
     &lt;br /&gt;
&amp;lt;big&amp;gt;'''STEP3'''&amp;lt;/big&amp;gt; Train the fusion parameters using the development scores and apply them to the evaluation scores &lt;br /&gt;
&lt;br /&gt;
* Type the following command:&lt;br /&gt;
&lt;br /&gt;
          ./bin/fusion.sh dev_scores_4fusion.txt eval_scores_4fusion.txt num_queries_in_data.ref&lt;br /&gt;
&lt;br /&gt;
:* This call learns from dev_scores_4fusion.txt (which includes the ground-truth information) the calibration and fusion parameters, which are then applied both to the development and to the evaluation scores. It produces 3 output files:&lt;br /&gt;
&lt;br /&gt;
::'''1.''' dev_fusion.scores - file containing the development well-calibrated fusion scores&lt;br /&gt;
::'''2.''' eval_fusion.scores - file containing the evaluation well-calibrated fusion scores&lt;br /&gt;
::'''3.''' fuse_params.txt - file containing the fusion parameters &lt;br /&gt;
&lt;br /&gt;
::The format of the score output files is as follows (one row per candidate detection):&lt;br /&gt;
&lt;br /&gt;
              &amp;lt;query_id&amp;gt; &amp;lt;file_id&amp;gt; &amp;lt;start_time&amp;gt; &amp;lt;duration&amp;gt; &amp;lt;score&amp;gt; &amp;lt;decision&amp;gt;&lt;br /&gt;
::The &amp;lt;decision&amp;gt; field is 0 or 1 depending on whether the score is below or above the minimum-cost Bayes optimal threshold (see the Interspeech paper for details).&lt;br /&gt;
&lt;br /&gt;
:* '''Warning''' Before running the script, you will have to change the $MATLAB_BIN variable to be the path where your Matlab binary is actually installed.&lt;br /&gt;
:* '''Warning''' The [https://sites.google.com/site/bosaristoolkit/ Bosaris toolkit] needs to be installed. Change the variable $BOSARIS to the path where the Bosaris toolkit is installed.&lt;br /&gt;
:* '''Warning''' The script contains 4 internal variables that are specific to the Mediaeval SWS2013 evaluation: &lt;br /&gt;
::* The cost parameters $P_TARGET, $C_MISS, $C_FA &lt;br /&gt;
::*  $TSPEECH - The total duration (in seconds) of the data collection &lt;br /&gt;
:* '''Warning''' The call to the fusion script can take several minutes. &lt;br /&gt;
* To convert the score files to the SWS2013 stdlist.xml format, type the following commands:&lt;br /&gt;
          ./bin/raw2stdslist.sh dev_fusion.scores \&lt;br /&gt;
          ./scoring_atwv_sws2013/sws2013_dev/sws2013_dev.tlist.xml &amp;gt; fusion-devterms.stdlist.xml&lt;br /&gt;
&lt;br /&gt;
          ./bin/raw2stdslist.sh eval_fusion.scores \&lt;br /&gt;
          ./scoring_atwv_sws2013/sws2013_eval/sws2013_eval.tlist.xml &amp;gt; fusion-evalterms.stdlist.xml&lt;br /&gt;
:: This script can take a third parameter, a threshold value, to apply a decision threshold different from the optimal Bayes one.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;big&amp;gt;'''Reference results'''&amp;lt;/big&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* In the folder ./sample_results/ you can find each of the intermediate files produced in this example&lt;br /&gt;
* The TWV results obtained on the Mediaeval SWS2013 task with the provided sample score files, for each of the individual systems and for the fusion system built following this example, are:&lt;br /&gt;
&lt;br /&gt;
{| border=1 cellspacing=0 align=center cellpadding=5px width=50%&lt;br /&gt;
  |+ '''Reference results for Mediaeval SWS 2013''' &lt;br /&gt;
|-&lt;br /&gt;
  ! System !! dev (MTWV/ATWV) !! eval (MTWV/ATWV) &lt;br /&gt;
|-&lt;br /&gt;
  |   akws-br   ||    0.1571 / 0.1408 || 0.1441  / 0.1271 &lt;br /&gt;
|-&lt;br /&gt;
  |   dtw-br   ||   0.2066  / 0.2012 ||  0.1654  /  0.1581 &lt;br /&gt;
|-&lt;br /&gt;
  | fusion || 0.2731  / 0.2713 || &lt;br /&gt;
|}&lt;/div&gt;</summary>
		<author><name>Alberto</name></author>
	</entry>
	<entry>
		<id>https://www.hlt.inesc-id.pt/wiki/index.php?title=STDfusion&amp;diff=7161</id>
		<title>STDfusion</title>
		<link rel="alternate" type="text/html" href="https://www.hlt.inesc-id.pt/wiki/index.php?title=STDfusion&amp;diff=7161"/>
		<updated>2013-10-17T00:00:14Z</updated>

		<summary type="html">&lt;p&gt;Alberto: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== Introduction ==&lt;br /&gt;
&lt;br /&gt;
* In this wiki, we make available the code and the basic instructions needed to carry out discriminative calibration and fusion of heterogeneous spoken term detection (STD) systems, as described in:&lt;br /&gt;
&lt;br /&gt;
           A. Abad, L. J. Rodriguez Fuentes, M. Penagarikano, A. Varona, M. Diez, and G. Bordel. &lt;br /&gt;
           On the Calibration and Fusion of Heterogeneous Spoken Term Detection Systems. &lt;br /&gt;
           In Interspeech 2013, August 25-29 2013&lt;br /&gt;
&lt;br /&gt;
* The proposed method is the result of a collaboration between the [https://www.l2f.inesc-id.pt L2F] and [http://gtts.ehu.es/gtts/ GTTS] groups in the context of their research activities on query-by-example STD. The method has been successfully tested in the SWS2012 and SWS2013 tasks of the [http://www.multimediaeval.org/mediaeval2013/sws2013/ Mediaeval evaluation]. &lt;br /&gt;
&lt;br /&gt;
* Unfortunately, we did not have time to consolidate (and clean up) the code, so what we are making available is essentially the same set of scripts that we developed during the first experiments. The code is therefore split into pieces written in different languages (bash, Perl, Matlab, etc.) and has some external dependencies. Nevertheless, we expect it can still be useful for researchers who want to fuse their STD systems.&lt;br /&gt;
&lt;br /&gt;
* The package can be downloaded from [[Media:STDfusion.tgz | here]].&lt;br /&gt;
&lt;br /&gt;
== Package contents ==&lt;br /&gt;
&lt;br /&gt;
* The package contains the following files and directories:&lt;br /&gt;
&lt;br /&gt;
         README.txt - Contains pretty much the same information as this wiki page&lt;br /&gt;
         ./bin/ - Contains the different scripts necessary for calibration and fusion&lt;br /&gt;
         ./bin/PrepareForFusion.sh  - Main script that normalizes, aligns, hypothesizes missing scores and creates the ground truth for the input systems&lt;br /&gt;
         ./bin/align_score_files.pl - Used by ./bin/PrepareForFusion.sh&lt;br /&gt;
         ./bin/create_groundtruth_centerdistance.pl - Used by ./bin/PrepareForFusion.sh&lt;br /&gt;
         ./bin/heuristicScoring.pl - Used by ./bin/PrepareForFusion.sh &lt;br /&gt;
         ./bin/fusion.sh  - This script takes the output of ./bin/PrepareForFusion.sh to learn the fusion parameters and apply them to the evaluation scores&lt;br /&gt;
         ./bin/raw2stdslist.sh - Converts scores from a raw (internal) format to the stdlist xml format of the SWS evaluation&lt;br /&gt;
         ./scoring_atwv_sws2013/ - Contains the Mediaeval 2013 atwv scoring package&lt;br /&gt;
         ./scores/ - Contains sample dev (./scores/dev/akws_br-devterms.stdlist.xml, ./scores/dev/dtw_br-devterms.stdlist.xml) and eval scores (./scores/eval/akws_br-evalterms.stdlist.xml, ./scores/eval/dtw_br-evalterms.stdlist.xml)&lt;br /&gt;
         ./sample_results/ - Contains the sample (intermediate) results that are obtained if you run the instructions below&lt;br /&gt;
&lt;br /&gt;
== Example of use ==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;big&amp;gt;'''STEP1'''&amp;lt;/big&amp;gt; Prepare the development scores for learning the calibration/fusion&lt;br /&gt;
&lt;br /&gt;
* Type the following command:&lt;br /&gt;
&lt;br /&gt;
          ./bin/PrepareForFusion.sh \&lt;br /&gt;
          -q ./scoring_atwv_sws2013/sws2013_dev/sws2013_dev.tlist.xml \&lt;br /&gt;
          -r ./scoring_atwv_sws2013/sws2013_dev/sws2013_dev.rttm \&lt;br /&gt;
          -o dev_scores_4fusion.txt \&lt;br /&gt;
          -t num_queries_in_data.ref \&lt;br /&gt;
          -g -z qnorm -m 1 -n 1 \&lt;br /&gt;
          ./scores/dev/akws_br-devterms.stdlist.xml \&lt;br /&gt;
          ./scores/dev/dtw_br-devterms.stdlist.xml&lt;br /&gt;
&lt;br /&gt;
:* '''Warning''' Before running the script, change the $MATLAB_BIN variable to point to the path of your Matlab binary.&lt;br /&gt;
&lt;br /&gt;
:* This process will generate two output files:&lt;br /&gt;
&lt;br /&gt;
::'''1.''' dev_scores_4fusion.txt - Contains the scores ready for the fusion stage, in the following general format (one row per detection candidate):&lt;br /&gt;
&lt;br /&gt;
                  &amp;lt;query_id&amp;gt; &amp;lt;file_id&amp;gt; &amp;lt;start_time&amp;gt; &amp;lt;duration&amp;gt; &amp;lt;sc1&amp;gt; &amp;lt;sc2&amp;gt; ... &amp;lt;scN&amp;gt; &amp;lt;label&amp;gt;&lt;br /&gt;
::*Notice that now all systems produce a score for all candidate detections (the score matrix is full). &lt;br /&gt;
::* If the -g option is selected (as in this case), the &amp;lt;label&amp;gt; column contains 0s and 1s for the false and true trials respectively (derived from the rttm file). If the -g option is not selected, the last column simply contains 1s.&lt;br /&gt;
&lt;br /&gt;
::'''2.''' num_queries_in_data.ref - This second (optional) output file contains the number of times each query appears in the collection data; it is used later in the fusion stage.&lt;br /&gt;
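::A quick sanity check of the prepared file is that every row has the same number of columns, i.e. 4 id fields + N system scores + 1 label (the sample rows below are made up for illustration of a two-system case; this check is ours, not part of the package):&lt;br /&gt;

```shell
# Hypothetical two-system example: 4 id fields + 2 scores + 1 label = 7 columns.
printf 'q1 f1 0.5 0.4 0.12 0.98 1\nq2 f1 2.0 0.3 0.44 0.01 0\n' > dev_scores_4fusion.txt

# A single value in the output means every row has the same width,
# i.e. the score matrix is full.
awk '{ print NF }' dev_scores_4fusion.txt | sort -u
```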
&lt;br /&gt;
* You can check the general usage of this script by calling it without arguments:&lt;br /&gt;
  &lt;br /&gt;
       Usage: PrepareForFusion.sh -q &amp;lt;tlistxml&amp;gt; -r &amp;lt;rttm&amp;gt; -o &amp;lt;outputfile&amp;gt; [opts] &amp;lt;stdlistfile1&amp;gt; [stdlistfile2] [stdlistfile3]  ... [stdlistfileN] &lt;br /&gt;
&lt;br /&gt;
        &amp;lt;stdlistfile*&amp;gt; input score stdlist file in the SWS2012 format (*.stdlist.xml)        | - Required argument (at least 1)&lt;br /&gt;
        -q &amp;lt;tlistxml&amp;gt; termlist file in the SWS2012 format (*.tlist.xml)                      | - Required parameter&lt;br /&gt;
        -r &amp;lt;rttm&amp;gt; rttm file in the SWS2012 format (*.rttm)                                   | - Required parameter&lt;br /&gt;
        -o &amp;lt;outputfile&amp;gt; output file name                                                     | - Required parameter&lt;br /&gt;
        -z &amp;lt;value&amp;gt; score z-norm type (none, qnorm, fnorm, qfnorm, fqnorm)                    | - Default: none&lt;br /&gt;
        -g add ground-truth information to the outputfile&lt;br /&gt;
        -t &amp;lt;filename&amp;gt; saves the number of true terms in the reference per query (implies -g)&lt;br /&gt;
        -m &amp;lt;value&amp;gt; apply majority voting fusion with &amp;lt;value&amp;gt; minimum number of votes         | - Default: 1&lt;br /&gt;
        -n &amp;lt;value&amp;gt; method for creating default scores (0: average of the other detections (MV approach); 1: min per query; 2: global min; 3: histogram based)  | - Default: 0&lt;br /&gt;
        -d debug mode, does not remove auxiliary files stored in /tmp/tmpdir.$$&lt;br /&gt;
        -h help &lt;br /&gt;
&lt;br /&gt;
        NOTE: Requires Matlab (or Octave) and Perl. It also depends on the Perl scripts align_score_files.pl, heuristicScoring.pl and create_groundtruth_centerdistance.pl, &lt;br /&gt;
           which should be located in the same folder as the main script&lt;br /&gt;
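* The qnorm value passed to -z above z-normalizes the scores of each query. A minimal awk sketch of this idea, with made-up scores (a sketch of the concept only, not the package's actual Matlab implementation):&lt;br /&gt;

```shell
# Assumed input columns: <query_id> <file_id> <start_time> <duration> <score>
printf 'q1 f1 0.5 0.4 2.0\nq1 f2 1.0 0.4 4.0\nq2 f1 2.0 0.3 1.0\n' > raw_scores.txt

# Two passes over the same file: pass 1 accumulates per-query statistics,
# pass 2 replaces each score by (score - mean_q) / stdev_q.
awk '
  NR==FNR { n[$1]++; s[$1]+=$5; ss[$1]+=$5*$5; next }
  { m = s[$1]/n[$1]; v = ss[$1]/n[$1] - m*m; sd = (v > 0) ? sqrt(v) : 1
    printf "%s %s %s %s %.4f\n", $1, $2, $3, $4, ($5-m)/sd }
' raw_scores.txt raw_scores.txt > qnorm_scores.txt
cat qnorm_scores.txt
```

Queries with a single detection (or zero variance) keep their centered score, since the standard deviation falls back to 1 in that case.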
&lt;br /&gt;
&amp;lt;big&amp;gt;'''STEP2'''&amp;lt;/big&amp;gt; Prepare the evaluation scores for fusion&lt;br /&gt;
&lt;br /&gt;
* Type the following command:&lt;br /&gt;
&lt;br /&gt;
          ./bin/PrepareForFusion.sh \&lt;br /&gt;
          -q ./scoring_atwv_sws2013/sws2013_eval/sws2013_eval.tlist.xml \&lt;br /&gt;
          -r ./scoring_atwv_sws2013/sws2013_eval/sws2013_eval.rttm \&lt;br /&gt;
          -o eval_scores_4fusion.txt \&lt;br /&gt;
          -z qnorm -m 1 -n 1 \&lt;br /&gt;
          ./scores/eval/akws_br-evalterms.stdlist.xml  \&lt;br /&gt;
          ./scores/eval/dtw_br-evalterms.stdlist.xml&lt;br /&gt;
&lt;br /&gt;
:*'''Warning''' It is essential to provide the stdlist.xml input files in the same order used in the previous step.&lt;br /&gt;
:*Notice that, in contrast to the development scores, we did not use the -g option, since the ground truth is not needed in this case.&lt;br /&gt;
     &lt;br /&gt;
&amp;lt;big&amp;gt;'''STEP3'''&amp;lt;/big&amp;gt; Train the fusion parameters on the development scores and apply them to the evaluation scores&lt;br /&gt;
&lt;br /&gt;
* Type the following command:&lt;br /&gt;
&lt;br /&gt;
          ./bin/fusion.sh dev_scores_4fusion.txt eval_scores_4fusion.txt num_queries_in_data.ref&lt;br /&gt;
&lt;br /&gt;
:* '''Warning''' Before running the script, change the $MATLAB_BIN variable to point to the path of your Matlab binary.&lt;br /&gt;
&lt;br /&gt;
:* '''Warning''' The [https://sites.google.com/site/bosaristoolkit/ Bosaris toolkit] needs to be installed. Change the $BOSARIS variable to the path where the Bosaris toolkit is installed.&lt;br /&gt;
&lt;br /&gt;
:* '''Warning''' The script contains 4 internal variables that are specific to the Mediaeval SWS2013 evaluation: &lt;br /&gt;
::* The cost parameters $P_TARGET, $C_MISS, $C_FA &lt;br /&gt;
&lt;br /&gt;
::*  $TSPEECH - The total duration (in seconds) of the data collection &lt;br /&gt;
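::These cost variables define the minimum-cost Bayes decision threshold on log-likelihood-ratio scores, theta = log( C_FA * (1 - P_TARGET) / (C_MISS * P_TARGET) ). A sketch using the classical NIST STD06 values, which are an assumption here purely for illustration (check ./bin/fusion.sh for the actual SWS2013 settings):&lt;br /&gt;

```shell
# Illustration only: these cost values are the classical NIST STD06 ones,
# not necessarily those set inside ./bin/fusion.sh for SWS2013.
P_TARGET=0.0001; C_MISS=1; C_FA=0.1

# theta = log( C_FA * (1 - P_TARGET) / (C_MISS * P_TARGET) )
THETA=$(awk -v p="$P_TARGET" -v cm="$C_MISS" -v cf="$C_FA" \
        'BEGIN { printf "%.4f", log(cf * (1 - p) / (cm * p)) }')
echo "Bayes threshold (LLR domain): $THETA"
```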
&lt;br /&gt;
:* '''Warning''' The call to the fusion script can take several minutes. &lt;br /&gt;
&lt;br /&gt;
:* This call learns the calibration and fusion parameters from dev_scores_4fusion.txt (which includes the ground-truth information) and applies them both to the development and to the evaluation scores. It produces 3 output files:&lt;br /&gt;
&lt;br /&gt;
::'''1.''' dev_fusion.scores - file containing the development well-calibrated fusion scores&lt;br /&gt;
::'''2.''' eval_fusion.scores - file containing the evaluation well-calibrated fusion scores&lt;br /&gt;
::'''3.''' fuse_params.txt - file containing the fusion parameters&lt;br /&gt;
&lt;br /&gt;
::The format of the score output files is as follows (one row per candidate detection):&lt;br /&gt;
&lt;br /&gt;
              &amp;lt;query_id&amp;gt; &amp;lt;file_id&amp;gt; &amp;lt;start_time&amp;gt; &amp;lt;duration&amp;gt; &amp;lt;score&amp;gt; &amp;lt;decision&amp;gt;&lt;br /&gt;
::The &amp;lt;decision&amp;gt; field is 0 or 1 depending on whether the score is below or above the minimum-cost Bayes optimal threshold (see the Interspeech paper for details).&lt;br /&gt;
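::For example, decisions can be re-derived from the score column with a custom threshold; in the sketch below both the 0.0 threshold and the sample rows are assumptions for illustration, not values produced by the package:&lt;br /&gt;

```shell
# Assumed row format (from above): <query_id> <file_id> <start_time> <duration> <score> <decision>
printf 'q1 f1 0.5 0.4 2.31 1\nq1 f2 7.2 0.5 -0.80 0\n' > fusion_demo.scores

# Recompute the decision column against a hypothetical threshold of 0.0.
awk '{ print $1, $2, $3, $4, $5, (($5 > 0.0) ? 1 : 0) }' fusion_demo.scores
```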
&lt;br /&gt;
* To convert the score files to the SWS2013 stdslist.xml format type the following commands:&lt;br /&gt;
&lt;br /&gt;
          ./bin/raw2stdslist.sh dev_fusion.scores \&lt;br /&gt;
          ./scoring_atwv_sws2013/sws2013_dev/sws2013_dev.tlist.xml &amp;gt; fusion-devterms.stdlist.xml&lt;br /&gt;
&lt;br /&gt;
          ./bin/raw2stdslist.sh eval_fusion.scores \&lt;br /&gt;
          ./scoring_atwv_sws2013/sws2013_eval/sws2013_eval.tlist.xml &amp;gt; fusion-evalterms.stdlist.xml&lt;br /&gt;
  &lt;br /&gt;
:* This script can take a third parameter, a threshold value, to apply a decision threshold different from the optimal Bayes one.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;big&amp;gt;'''Reference results'''&amp;lt;/big&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* In the folder ./sample_results/ you can find each of the intermediate files produced in this example.&lt;br /&gt;
&lt;br /&gt;
* The TWV results obtained with the sample score files provided for the Mediaeval SWS2013 task, for each of the individual systems and for the fusion system built following this example, are:&lt;br /&gt;
&lt;br /&gt;
{| border=1 cellspacing=0 align=center cellpadding=5px width=50%&lt;br /&gt;
  |+ '''Reference results for Mediaeval SWS 2013''' &lt;br /&gt;
|-&lt;br /&gt;
  ! !! dev (mtwv/atwv) !! eval  (mtwv/atwv) &lt;br /&gt;
|-&lt;br /&gt;
  |   akws-br   ||    0.1571 / 0.1408 || 0.1441  / 0.1271 &lt;br /&gt;
|-&lt;br /&gt;
  |   dtw-br   ||   0.2066  / 0.2012 ||  0.1654  /  0.1581 &lt;br /&gt;
|-&lt;br /&gt;
  | fusion || 0.2731  / 0.2713 || &lt;br /&gt;
|}&lt;/div&gt;</summary>
		<author><name>Alberto</name></author>
	</entry>
	<entry>
		<id>https://www.hlt.inesc-id.pt/wiki/index.php?title=STDfusion&amp;diff=7159</id>
		<title>STDfusion</title>
		<link rel="alternate" type="text/html" href="https://www.hlt.inesc-id.pt/wiki/index.php?title=STDfusion&amp;diff=7159"/>
		<updated>2013-10-16T23:53:57Z</updated>

		<summary type="html">&lt;p&gt;Alberto: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== Introduction ==&lt;br /&gt;
&lt;br /&gt;
* In this wiki we make available the necessary code and the basic instructions to carry out discriminative calibration and fusion of heterogeneous spoken term detection (STD) systems as described in:&lt;br /&gt;
&lt;br /&gt;
           A. Abad, L. J. Rodriguez Fuentes, M. Penagarikano, A. Varona, M. Diez, and G. Bordel. &lt;br /&gt;
           On the Calibration and Fusion of Heterogeneous Spoken Term Detection Systems. &lt;br /&gt;
           In Interspeech 2013, August 25-29 2013&lt;br /&gt;
&lt;br /&gt;
* The proposed method is the result of the collaboration between the [https://www.l2f.inesc-id.pt L2F] and the [http://gtts.ehu.es/gtts/ GTTS] groups in the context of their research activities in the topic of query-by-example STD. The method has been succesfully tested in the SWS2012 and SWS2013 tasks of  [http://www.multimediaeval.org/mediaeval2013/sws2013/ Mediaeval Evaluation]. &lt;br /&gt;
&lt;br /&gt;
* Unfortunately, we did not have time to consolidate (and clean up) the code and what we are making available is very much the same set of scripts that we developed during the first experiments. Thus, the code is written in different pieces in different coding languages (bash, perl, matlab, etc.) and it has some external dependencies. Anyway, we expect that it can be still useful for researchers that want to try to fuse their STD systems.&lt;br /&gt;
&lt;br /&gt;
* The package can be downloaded from [[Media:STDfusion.tgz | here]].&lt;br /&gt;
&lt;br /&gt;
== Package contents ==&lt;br /&gt;
&lt;br /&gt;
* The package contains the following files and directories:&lt;br /&gt;
&lt;br /&gt;
         README.txt - Contains pretty much the same information of this wiki site&lt;br /&gt;
         ./bin/ - Contains the different scripts necessary for calibration and fusion&lt;br /&gt;
         ./bin/PrepareForFusion.sh  - Main script than normalizes, aligns, hypotesizes missing scores and creates the groundtruh for the input systems&lt;br /&gt;
         ./bin/align_score_files.pl - Used by ./bin/PrepareForFusion.sh&lt;br /&gt;
         ./bin/create_groundtruth_centerdistance.pl - Used by ./bin/PrepareForFusion.sh&lt;br /&gt;
         ./bin/heuristicScoring.pl - Used by ./bin/PrepareForFusion.sh &lt;br /&gt;
         ./bin/fusion.sh  - This script takes the output of ./bin/PrepareForFusion.sh to learn the fusion parameters and apply them to the evaluation scores&lt;br /&gt;
         ./bin/raw2stdslist.sh - Converts scores from a raw (internal) format to the stdlist xml format of the SWS evaluation&lt;br /&gt;
         ./scoring_atwv_sws2013/ - Contains the Mediaeval 2013 atwv scoring package&lt;br /&gt;
         ./scores/ - Contains sample dev (./scores/dev/akws_br-devterms.stdlist.xml, ./scores/dev/dtw_br-devterms.stdlist.xml) and eval scores (./scores/eval/akws_br-evalterms.stdlist.xml, ./scores/eval/dtw_br-evalterms.stdlist.xml)&lt;br /&gt;
         ./sample_results/ - Contains the sample (intermediate) results that are obtained if you run the  instructions bellow&lt;br /&gt;
&lt;br /&gt;
== Example of use ==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;big&amp;gt;'''STEP1'''&amp;lt;/big&amp;gt; Prepare the development scores for learning the calibration/fusion&lt;br /&gt;
&lt;br /&gt;
* Type the following command:&lt;br /&gt;
&lt;br /&gt;
          ./bin/PrepareForFusion.sh \&lt;br /&gt;
          -q ./scoring_atwv_sws2013/sws2013_dev/sws2013_dev.tlist.xml \&lt;br /&gt;
          -r ./scoring_atwv_sws2013/sws2013_dev/sws2013_dev.rttm \&lt;br /&gt;
          -o dev_scores_4fusion.txt \&lt;br /&gt;
          -t num_queries_in_data.ref \&lt;br /&gt;
          -g -z qnorm -m 1 -n 1 \&lt;br /&gt;
          ./scores/dev/akws_br-devterms.stdlist.xml \&lt;br /&gt;
          ./scores/dev/dtw_br-devterms.stdlist.xml&lt;br /&gt;
&lt;br /&gt;
:* '''Warning''' Before running the script, you will have to change the $MATLAB_BIN variable to be the path where your Matlab binary is actually installed.&lt;br /&gt;
&lt;br /&gt;
:* This process  will generate 2 output files:&lt;br /&gt;
&lt;br /&gt;
::'''1.''' dev_scores_4fusion.txt - It contains the scores ready to be used for the following fusion stage in the following general format (one row per detection candidate):&lt;br /&gt;
&lt;br /&gt;
                  &amp;lt;query_id&amp;gt; &amp;lt;file_id&amp;gt; &amp;lt;start_time&amp;gt; &amp;lt;duration&amp;gt; &amp;lt;sc1&amp;gt; &amp;lt;sc2&amp;gt; ... &amp;lt;scN&amp;gt; &amp;lt;label&amp;gt;&lt;br /&gt;
::*Notice that now all systems produce a score for all candidate detections (the score matrix is full). &lt;br /&gt;
&lt;br /&gt;
::*  If the -g option is selected (like in this case) the &amp;lt;label&amp;gt; column contains 0s and 1s for the false and true trials respectively (derived from de rttm file). If the -g option is not selected, the last column simply contains a column with 1s.&lt;br /&gt;
&lt;br /&gt;
::'''2.'''num_queries_in_data.ref - This second (optional) output file contains the number of times each query appears in the collection data and it is used later in the fusion stage.&lt;br /&gt;
&lt;br /&gt;
*You can have a look to the general usage of this script calling it without arguments:&lt;br /&gt;
  &lt;br /&gt;
       Usage: PrepareForFusion.sh -q &amp;lt;tlistxml&amp;gt; -r &amp;lt;rttm&amp;gt; -o &amp;lt;outputfile&amp;gt; [opts] &amp;lt;stdlistfile1&amp;gt; [stdlistfile2] [stdlistfile3]  ... [stdlistfileN] &lt;br /&gt;
&lt;br /&gt;
        &amp;lt;stdlistfile*&amp;gt; input score stdlist file in the SWS2012 format (*.stdlist.xml)        | - Required argument (at least 1)&lt;br /&gt;
        -q &amp;lt;tlistxml&amp;gt; termlist file in the SWS2012 format (*.tlist.xml)                      | - Required parameter&lt;br /&gt;
        -r &amp;lt;rttm&amp;gt; rttm file in the SWS2012 format (*.rttm)                                   | - Required parameter&lt;br /&gt;
        -o &amp;lt;outputfile&amp;gt; output file name                                                     | - Required parameter&lt;br /&gt;
        -z &amp;lt;value&amp;gt; score z-norm type (none, qnorm, fnorm, qfnorm, fqnorm)                    | - Default: none&lt;br /&gt;
        -g add ground-truth information to the outputfile&lt;br /&gt;
        -t &amp;lt;filename&amp;gt; saves the number of true terms in the reference per query (implies -g)&lt;br /&gt;
        -m &amp;lt;value&amp;gt; apply majority voting fusion with &amp;lt;value&amp;gt; minimum number of votes         | - Default: 1&lt;br /&gt;
        -n &amp;lt;value&amp;gt; method for creating default scores (0: average of the other detections (MV approach); 1; min per query, 2: global min, 3: histogram based)  | - Default: 0&lt;br /&gt;
        -d debug mode, done remove auxiliar files stored in /tmp/tmpdir.$$&lt;br /&gt;
        -h help &lt;br /&gt;
&lt;br /&gt;
        NOTE: Requires Matlab (or octave) and perl. It also depends on the perl scripts align_score_files.pl, heuristicScoring.pl and create_groundtruth_centerdistance.pl &lt;br /&gt;
           that should be located in the same folder of the main script&lt;br /&gt;
&lt;br /&gt;
&amp;lt;big&amp;gt;'''STEP2'''&amp;lt;/big&amp;gt; Prepare the evaluations scores for fusion&lt;br /&gt;
&lt;br /&gt;
* Type  the following command:&lt;br /&gt;
&lt;br /&gt;
          ./bin/PrepareForFusion.sh \&lt;br /&gt;
          -q ./scoring_atwv_sws2013/sws2013_eval/sws2013_eval.tlist.xml \&lt;br /&gt;
          -r ./scoring_atwv_sws2013/sws2013_eval/sws2013_eval.rttm \&lt;br /&gt;
          -o eval_scores_4fusion.txt \&lt;br /&gt;
          -z qnorm -m 1 -n 1 \&lt;br /&gt;
          ./scores/eval/akws_br-evalterms.stdlist.xml  \&lt;br /&gt;
          ./scores/eval/dtw_br-evalterms.stdlist.xml&lt;br /&gt;
&lt;br /&gt;
:*'''Warning''' It is fundamental to provide the stdlist.xml input  files in the same order used in theprevious step.&lt;br /&gt;
&lt;br /&gt;
:*Notice that in contrast to case of the development scores, we did not use the -g option since we do not need the groudtruth in this case. &lt;br /&gt;
     &lt;br /&gt;
&amp;lt;big&amp;gt;'''STEP3'''&amp;lt;/big&amp;gt; Train fusion parameters using thedevelopment scores and apply it to the evaluation scores &lt;br /&gt;
&lt;br /&gt;
* Type the following command:&lt;br /&gt;
&lt;br /&gt;
          ./bin/fusion.sh dev_scores_4fusion.txt eval_scores_4fusion.txt  queries_in_data.ref&lt;br /&gt;
&lt;br /&gt;
:* '''Warning''' Before running the script, you will have to change the $MATLAB_BIN variable to be the path where your Matlab binary is actually installed.&lt;br /&gt;
&lt;br /&gt;
:* '''Warning''' The [https://sites.google.com/site/bosaristoolkit/ Bosaris toolkit] needs to be installed. Change the variable $BOSARIS to the path where this the Bosaris toolkit is installed.&lt;br /&gt;
&lt;br /&gt;
:* '''Warning''' The script contains 4 internal variables that are specific to the Mediaeval SWS2013 evaluation: &lt;br /&gt;
::* The cost parameters $P_TARGET, $C_MISS, $C_FA &lt;br /&gt;
&lt;br /&gt;
::*  $TSPEECH - The total duration (in seconds) of the data collection &lt;br /&gt;
&lt;br /&gt;
:* '''Warning''' The call to the fusion script can take several minutes. &lt;br /&gt;
&lt;br /&gt;
:* This call permits learning from the dev_scores_4fusion.txt (with groundtruh information) the calibration and fusion parameters that are applied both to the development and to the evaluation scores. It produces 3 output files:&lt;br /&gt;
&lt;br /&gt;
::'''1.''' dev_fusion.scores - file containing the development well-calibrated fusion scores&lt;br /&gt;
::'''2.''' eval_fusion.scores - file containing the evaluation well-calibrated fusion scores&lt;br /&gt;
::'''3.''' fuse_params.txt - file containig thefusion parameters &lt;br /&gt;
&lt;br /&gt;
::The format of the score output files is as follows (one row per candidate detection):&lt;br /&gt;
&lt;br /&gt;
              &amp;lt;query_id&amp;gt; &amp;lt;file_id&amp;gt; &amp;lt;start_time&amp;gt; &amp;lt;duration&amp;gt; &amp;lt;score&amp;gt; &amp;lt;decision&amp;gt;&lt;br /&gt;
::The &amp;lt;decision&amp;gt; field is 0 or 1 depending if the score is lower or greater than the minimum cost Bayes optimum threshold (see the Interspeech paper for details).&lt;br /&gt;
&lt;br /&gt;
* To convert the score files to the SWS2013 stdslist.xml format type the following commands:&lt;br /&gt;
&lt;br /&gt;
          ./bin/raw2stdslist.sh dev_fusion.scores \&lt;br /&gt;
          ./scoring_atwv_sws2013/sws2013_dev/sws2013_dev.tlist.xml &amp;gt; fusion-devterms.stdlist.xml&lt;br /&gt;
&lt;br /&gt;
          ./bin/raw2stdslist.sh eval_fusion.scores \&lt;br /&gt;
          ./scoring_atwv_sws2013/sws2013_eval/sws2013_eval.tlist.xml &amp;gt; fusion-evalterms.stdlist.xml&lt;br /&gt;
  &lt;br /&gt;
:* This script can take a third parameter, a threshold value  to apply a different decision threshold to the optimal Bayes one.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;big&amp;gt;'''Reference results'''&amp;lt;/big&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* In the folder ./sample_results/ you can find each of the intermediate files produced in this example.&lt;br /&gt;
&lt;br /&gt;
* The TWV results obtained with the sample score files provided for the Mediaeval SWS2013 task, for each of the individual systems and for the fusion system obtained following this example, are:&lt;br /&gt;
&lt;br /&gt;
{| border=1 cellspacing=0 align=center cellpadding=5px width=50%&lt;br /&gt;
  |+ '''Reference results for Mediaeval SWS 2013''' &lt;br /&gt;
|-&lt;br /&gt;
  ! System !! dev (mtwv/atwv) !! eval (mtwv/atwv) &lt;br /&gt;
|-&lt;br /&gt;
  |   akws-br   ||    0.1571 / 0.1408 || 0.1441  / 0.1271 &lt;br /&gt;
|-&lt;br /&gt;
  |   dtw-br   ||   0.2066  / 0.2012 ||  0.1654  /  0.1581 &lt;br /&gt;
|-&lt;br /&gt;
  | fusion || 0.2731  / 0.2713 || &lt;br /&gt;
|}&lt;/div&gt;</summary>
		<author><name>Alberto</name></author>
	</entry>
	<entry>
		<id>https://www.hlt.inesc-id.pt/wiki/index.php?title=STDfusion&amp;diff=7158</id>
		<title>STDfusion</title>
		<link rel="alternate" type="text/html" href="https://www.hlt.inesc-id.pt/wiki/index.php?title=STDfusion&amp;diff=7158"/>
		<updated>2013-10-16T23:52:25Z</updated>

		<summary type="html">&lt;p&gt;Alberto: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== Introduction ==&lt;br /&gt;
&lt;br /&gt;
* In this wiki we make available the necessary code and the basic instructions to carry out discriminative calibration and fusion of heterogeneous spoken term detection (STD) systems as described in:&lt;br /&gt;
&lt;br /&gt;
           A. Abad, L. J. Rodriguez Fuentes, M. Penagarikano, A. Varona, M. Diez, and G. Bordel. &lt;br /&gt;
           On the Calibration and Fusion of Heterogeneous Spoken Term Detection Systems. &lt;br /&gt;
           In Interspeech 2013, August 25-29 2013&lt;br /&gt;
&lt;br /&gt;
* The proposed method is the result of a collaboration between the [https://www.l2f.inesc-id.pt L2F] and [http://gtts.ehu.es/gtts/ GTTS] groups in the context of their research on query-by-example STD. The method has been successfully tested in the SWS2012 and SWS2013 tasks of the [http://www.multimediaeval.org/mediaeval2013/sws2013/ Mediaeval Evaluation].&lt;br /&gt;
&lt;br /&gt;
* Unfortunately, we did not have time to consolidate (and clean up) the code, so what we are making available is very much the same set of scripts that we developed during the first experiments. Thus, the code is written in different pieces in different languages (bash, perl, matlab, etc.) and has some external dependencies. Even so, we expect it can still be useful for researchers who want to fuse their STD systems.&lt;br /&gt;
&lt;br /&gt;
* The package can be downloaded from [[Media:STDfusion.tgz | here]].&lt;br /&gt;
&lt;br /&gt;
== Package contents ==&lt;br /&gt;
&lt;br /&gt;
* The package contains the following files and directories:&lt;br /&gt;
&lt;br /&gt;
         README.txt - Contains pretty much the same information as this wiki page&lt;br /&gt;
         ./bin/ - Contains the different scripts necessary for calibration and fusion&lt;br /&gt;
         ./bin/PrepareForFusion.sh  - Main script that normalizes, aligns, hypothesizes missing scores and creates the ground-truth for the input systems&lt;br /&gt;
         ./bin/align_score_files.pl - Used by ./bin/PrepareForFusion.sh&lt;br /&gt;
         ./bin/create_groundtruth_centerdistance.pl - Used by ./bin/PrepareForFusion.sh&lt;br /&gt;
         ./bin/heuristicScoring.pl - Used by ./bin/PrepareForFusion.sh &lt;br /&gt;
         ./bin/fusion.sh  - This script takes the output of ./bin/PrepareForFusion.sh to learn the fusion parameters and apply them to the evaluation scores&lt;br /&gt;
         ./bin/raw2stdslist.sh - Converts scores from a raw (internal) format to the stdlist xml format of the SWS evaluation&lt;br /&gt;
         ./scoring_atwv_sws2013/ - Contains the Mediaeval 2013 atwv scoring package&lt;br /&gt;
         ./scores/ - Contains sample dev (./scores/dev/akws_br-devterms.stdlist.xml, ./scores/dev/dtw_br-devterms.stdlist.xml) and eval scores (./scores/eval/akws_br-evalterms.stdlist.xml, ./scores/eval/dtw_br-evalterms.stdlist.xml)&lt;br /&gt;
         ./sample_results/ - Contains the sample (intermediate) results that are obtained if you run the instructions below&lt;br /&gt;
&lt;br /&gt;
== Example of use ==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;big&amp;gt;'''STEP1'''&amp;lt;/big&amp;gt; Prepare the development scores for learning the calibration/fusion&lt;br /&gt;
&lt;br /&gt;
* Type the following command:&lt;br /&gt;
&lt;br /&gt;
          ./bin/PrepareForFusion.sh \&lt;br /&gt;
          -q ./scoring_atwv_sws2013/sws2013_dev/sws2013_dev.tlist.xml \&lt;br /&gt;
          -r ./scoring_atwv_sws2013/sws2013_dev/sws2013_dev.rttm \&lt;br /&gt;
          -o dev_scores_4fusion.txt \&lt;br /&gt;
          -t num_queries_in_data.ref \&lt;br /&gt;
          -g -z qnorm -m 1 -n 1 \&lt;br /&gt;
          ./scores/dev/akws_br-devterms.stdlist.xml \&lt;br /&gt;
          ./scores/dev/dtw_br-devterms.stdlist.xml&lt;br /&gt;
&lt;br /&gt;
:* '''Warning''' Before running the script, you will have to change the $MATLAB_BIN variable to be the path where your Matlab binary is actually installed.&lt;br /&gt;
&lt;br /&gt;
:* This process  will generate 2 output files:&lt;br /&gt;
&lt;br /&gt;
::'''1.''' dev_scores_4fusion.txt - Contains the scores ready to be used in the subsequent fusion stage, in the following general format (one row per detection candidate):&lt;br /&gt;
&lt;br /&gt;
                  &amp;lt;query_id&amp;gt; &amp;lt;file_id&amp;gt; &amp;lt;start_time&amp;gt; &amp;lt;duration&amp;gt; &amp;lt;sc1&amp;gt; &amp;lt;sc2&amp;gt; ... &amp;lt;scN&amp;gt; &amp;lt;label&amp;gt;&lt;br /&gt;
::*Notice that now all systems produce a score for all candidate detections (the score matrix is full). &lt;br /&gt;
&lt;br /&gt;
::*  If the -g option is selected (as in this case), the &amp;lt;label&amp;gt; column contains 0s and 1s for the false and true trials respectively (derived from the rttm file). If the -g option is not selected, the last column simply contains 1s.&lt;br /&gt;
&lt;br /&gt;
::'''2.''' num_queries_in_data.ref - This second (optional) output file contains the number of times each query appears in the collection data; it is used later in the fusion stage.&lt;br /&gt;
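The column layout above is all the downstream scripts rely on. As a minimal parsing sketch (the function name and the field types — float times and scores, integer label — are assumptions for illustration, not part of the package):

```python
def parse_fusion_rows(lines, num_systems):
    """Parse rows of the form:
       <query_id> <file_id> <start_time> <duration> <sc1> ... <scN> <label>
    Field types (float times/scores, integer label) are assumed."""
    rows = []
    for line in lines:
        parts = line.split()
        if len(parts) != 4 + num_systems + 1:
            continue  # skip malformed rows
        rows.append({
            "query_id": parts[0],
            "file_id": parts[1],
            "start_time": float(parts[2]),
            "duration": float(parts[3]),
            "scores": [float(s) for s in parts[4:4 + num_systems]],
            "label": int(parts[-1]),
        })
    return rows
```

With two input systems, `num_systems=2` gives rows of 7 whitespace-separated fields each.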
&lt;br /&gt;
*You can see the general usage of this script by calling it without arguments:&lt;br /&gt;
  &lt;br /&gt;
       Usage: PrepareForFusion.sh -q &amp;lt;tlistxml&amp;gt; -r &amp;lt;rttm&amp;gt; -o &amp;lt;outputfile&amp;gt; [opts] &amp;lt;stdlistfile1&amp;gt; [stdlistfile2] [stdlistfile3]  ... [stdlistfileN] &lt;br /&gt;
&lt;br /&gt;
        &amp;lt;stdlistfile*&amp;gt; input score stdlist file in the SWS2012 format (*.stdlist.xml)        | - Required argument (at least 1)&lt;br /&gt;
        -q &amp;lt;tlistxml&amp;gt; termlist file in the SWS2012 format (*.tlist.xml)                      | - Required parameter&lt;br /&gt;
        -r &amp;lt;rttm&amp;gt; rttm file in the SWS2012 format (*.rttm)                                   | - Required parameter&lt;br /&gt;
        -o &amp;lt;outputfile&amp;gt; output file name                                                     | - Required parameter&lt;br /&gt;
        -z &amp;lt;value&amp;gt; score z-norm type (none, qnorm, fnorm, qfnorm, fqnorm)                    | - Default: none&lt;br /&gt;
        -g add ground-truth information to the outputfile&lt;br /&gt;
        -t &amp;lt;filename&amp;gt; saves the number of true terms in the reference per query (implies -g)&lt;br /&gt;
        -m &amp;lt;value&amp;gt; apply majority voting fusion with &amp;lt;value&amp;gt; minimum number of votes         | - Default: 1&lt;br /&gt;
        -n &amp;lt;value&amp;gt; method for creating default scores (0: average of the other detections (MV approach); 1: min per query; 2: global min; 3: histogram based)  | - Default: 0&lt;br /&gt;
        -d debug mode, do not remove auxiliary files stored in /tmp/tmpdir.$$&lt;br /&gt;
        -h help &lt;br /&gt;
&lt;br /&gt;
        NOTE: Requires Matlab (or Octave) and perl. It also depends on the perl scripts align_score_files.pl, heuristicScoring.pl and create_groundtruth_centerdistance.pl, &lt;br /&gt;
           which should be located in the same folder as the main script&lt;br /&gt;
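The -z qnorm option used in the example applies per-query score normalization. A minimal sketch of the idea, assuming qnorm denotes zero-mean, unit-variance normalization of each query's scores (the exact definition is given in the Interspeech paper, not here):

```python
from collections import defaultdict

def qnorm(rows):
    """Per-query z-normalization of detection scores.
    rows: list of (query_id, score) pairs."""
    by_query = defaultdict(list)
    for query_id, score in rows:
        by_query[query_id].append(score)
    stats = {}
    for query_id, scores in by_query.items():
        mean = sum(scores) / len(scores)
        var = sum((s - mean) ** 2 for s in scores) / len(scores)
        std = var ** 0.5
        stats[query_id] = (mean, std if std > 0 else 1.0)  # guard degenerate queries
    return [(q, (s - stats[q][0]) / stats[q][1]) for q, s in rows]
```

The fnorm, qfnorm and fqnorm variants would normalize per file and per query/file combinations analogously.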
&lt;br /&gt;
&amp;lt;big&amp;gt;'''STEP2'''&amp;lt;/big&amp;gt; Prepare the evaluation scores for fusion&lt;br /&gt;
&lt;br /&gt;
* Type  the following command:&lt;br /&gt;
&lt;br /&gt;
          ./bin/PrepareForFusion.sh \&lt;br /&gt;
          -q ./scoring_atwv_sws2013/sws2013_eval/sws2013_eval.tlist.xml \&lt;br /&gt;
          -r ./scoring_atwv_sws2013/sws2013_eval/sws2013_eval.rttm \&lt;br /&gt;
          -o eval_scores_4fusion.txt \&lt;br /&gt;
          -z qnorm -m 1 -n 1 \&lt;br /&gt;
          ./scores/eval/akws_br-evalterms.stdlist.xml  \&lt;br /&gt;
          ./scores/eval/dtw_br-evalterms.stdlist.xml&lt;br /&gt;
&lt;br /&gt;
:*'''Warning''' It is fundamental to provide the stdlist.xml input files in the same order used in the previous step.&lt;br /&gt;
&lt;br /&gt;
:*Notice that, in contrast to the development scores, we did not use the -g option, since we do not need the ground-truth in this case.&lt;br /&gt;
     &lt;br /&gt;
&amp;lt;big&amp;gt;'''STEP3'''&amp;lt;/big&amp;gt; Train the fusion parameters on the development scores and apply them to the evaluation scores&lt;br /&gt;
&lt;br /&gt;
* Type the following command:&lt;br /&gt;
&lt;br /&gt;
          ./bin/fusion.sh dev_scores_4fusion.txt eval_scores_4fusion.txt num_queries_in_data.ref&lt;br /&gt;
&lt;br /&gt;
:* '''Warning''' Before running the script, you will have to change the $MATLAB_BIN variable to be the path where your Matlab binary is actually installed.&lt;br /&gt;
&lt;br /&gt;
:* '''Warning''' The [https://sites.google.com/site/bosaristoolkit/ Bosaris toolkit] needs to be installed. Change the variable $BOSARIS to the path where the Bosaris toolkit is installed.&lt;br /&gt;
&lt;br /&gt;
:* '''Warning''' The script contains 4 internal variables that are specific to the Mediaeval SWS2013 evaluation: &lt;br /&gt;
::* The cost parameters $P_TARGET, $C_MISS, $C_FA &lt;br /&gt;
&lt;br /&gt;
::*  $TSPEECH - The total duration (in seconds) of the data collection &lt;br /&gt;
&lt;br /&gt;
:* '''Warning''' The call to the fusion script can take several minutes. &lt;br /&gt;
&lt;br /&gt;
:* This call learns the calibration and fusion parameters from dev_scores_4fusion.txt (which includes the ground-truth information) and applies them to both the development and the evaluation scores. It produces 3 output files:&lt;br /&gt;
&lt;br /&gt;
::'''1.''' dev_fusion.scores - file containing the well-calibrated fusion scores for the development data&lt;br /&gt;
::'''2.''' eval_fusion.scores - file containing the well-calibrated fusion scores for the evaluation data&lt;br /&gt;
::'''3.''' fuse_params.txt - file containing the fusion parameters&lt;br /&gt;
&lt;br /&gt;
::The format of the score output files is as follows (one row per candidate detection):&lt;br /&gt;
&lt;br /&gt;
              &amp;lt;query_id&amp;gt; &amp;lt;file_id&amp;gt; &amp;lt;start_time&amp;gt; &amp;lt;duration&amp;gt; &amp;lt;score&amp;gt; &amp;lt;decision&amp;gt;&lt;br /&gt;
::The &amp;lt;decision&amp;gt; field is 0 or 1 depending on whether the score is below or above the minimum-cost Bayes optimal threshold (see the Interspeech paper for details).&lt;br /&gt;
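For scores that behave like log-likelihood ratios (as logistic-regression calibration with the Bosaris toolkit aims to produce), the minimum-cost Bayes threshold follows directly from the cost parameters $P_TARGET, $C_MISS and $C_FA mentioned above. A sketch of the decision rule (standard detection theory; the exact setup used here is in the Interspeech paper):

```python
import math

def bayes_threshold(p_target, c_miss, c_fa):
    """Minimum-expected-cost threshold for log-likelihood-ratio scores:
    accept when llr > log(C_fa * (1 - P_target) / (C_miss * P_target))."""
    return math.log((c_fa * (1.0 - p_target)) / (c_miss * p_target))

def decide(llr, p_target, c_miss, c_fa):
    """0/1 decision, as in the <decision> field of the fused score files."""
    return 1 if llr > bayes_threshold(p_target, c_miss, c_fa) else 0
```

With equal costs and a prior of 0.5 the threshold is 0; the small target prior used in SWS-style evaluations pushes it well above 0.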
&lt;br /&gt;
* To convert the score files to the SWS2013 stdlist.xml format, type the following commands:&lt;br /&gt;
&lt;br /&gt;
          ./bin/raw2stdslist.sh dev_fusion.scores \&lt;br /&gt;
          ./scoring_atwv_sws2013/sws2013_dev/sws2013_dev.tlist.xml &amp;gt; fusion-devterms.stdlist.xml&lt;br /&gt;
&lt;br /&gt;
          ./bin/raw2stdslist.sh eval_fusion.scores \&lt;br /&gt;
          ./scoring_atwv_sws2013/sws2013_eval/sws2013_eval.tlist.xml &amp;gt; fusion-evalterms.stdlist.xml&lt;br /&gt;
  &lt;br /&gt;
:* This script can take a third parameter, a threshold value to be applied instead of the optimal Bayes one.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;big&amp;gt;'''Reference results'''&amp;lt;/big&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* In the folder ./sample_results/ you can find each of the intermediate files produced in this example.&lt;br /&gt;
&lt;br /&gt;
* The TWV results obtained with the sample score files provided for the Mediaeval SWS2013 task, for each of the individual systems and for the fusion system obtained following this example, are:&lt;br /&gt;
&lt;br /&gt;
{| border=1 cellspacing=0 align=center cellpadding=5px width=50%&lt;br /&gt;
  |+ '''Reference results for Mediaeval SWS 2013''' &lt;br /&gt;
|-&lt;br /&gt;
  ! System !! dev (mtwv/atwv) !! eval (mtwv/atwv) &lt;br /&gt;
|-&lt;br /&gt;
  |   akws-br   ||    0.1571 / 0.1408 || 0.1441  / 0.1271 &lt;br /&gt;
|-&lt;br /&gt;
  |   dtw-br   ||   0.2066  / 0.2012 ||  0.1654  /  0.1581 &lt;br /&gt;
|-&lt;br /&gt;
  | fusion || 0.2731  / 0.2713 || &lt;br /&gt;
|}&lt;/div&gt;</summary>
		<author><name>Alberto</name></author>
	</entry>
	<entry>
		<id>https://www.hlt.inesc-id.pt/wiki/index.php?title=STDfusion&amp;diff=7157</id>
		<title>STDfusion</title>
		<link rel="alternate" type="text/html" href="https://www.hlt.inesc-id.pt/wiki/index.php?title=STDfusion&amp;diff=7157"/>
		<updated>2013-10-16T23:51:22Z</updated>

		<summary type="html">&lt;p&gt;Alberto: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== Introduction ==&lt;br /&gt;
&lt;br /&gt;
* In this wiki we make available the necessary code and the basic instructions to carry out discriminative calibration and fusion of heterogeneous spoken term detection (STD) systems as described in:&lt;br /&gt;
&lt;br /&gt;
           A. Abad, L. J. Rodriguez Fuentes, M. Penagarikano, A. Varona, M. Diez, and G. Bordel. &lt;br /&gt;
           On the Calibration and Fusion of Heterogeneous Spoken Term Detection Systems. &lt;br /&gt;
           In Interspeech 2013, August 25-29 2013&lt;br /&gt;
&lt;br /&gt;
* The proposed method is the result of a collaboration between the [https://www.l2f.inesc-id.pt L2F] and [http://gtts.ehu.es/gtts/ GTTS] groups in the context of their research on query-by-example STD. The method has been successfully tested in the SWS2012 and SWS2013 tasks of the [http://www.multimediaeval.org/mediaeval2013/sws2013/ Mediaeval Evaluation].&lt;br /&gt;
&lt;br /&gt;
* Unfortunately, we did not have time to consolidate (and clean up) the code, so what we are making available is very much the same set of scripts that we developed during the first experiments. Thus, the code is written in different pieces in different languages (bash, perl, matlab, etc.) and has some external dependencies. Even so, we expect it can still be useful for researchers who want to fuse their STD systems.&lt;br /&gt;
&lt;br /&gt;
* The package can be downloaded from [[Media:STDfusion.tgz | here]].&lt;br /&gt;
&lt;br /&gt;
== Package contents ==&lt;br /&gt;
&lt;br /&gt;
* The package contains the following files and directories:&lt;br /&gt;
&lt;br /&gt;
         README.txt - Contains pretty much the same information as this wiki page&lt;br /&gt;
         ./bin/ - Contains the different scripts necessary for calibration and fusion&lt;br /&gt;
         ./bin/PrepareForFusion.sh  - Main script that normalizes, aligns, hypothesizes missing scores and creates the ground-truth for the input systems&lt;br /&gt;
         ./bin/align_score_files.pl - Used by ./bin/PrepareForFusion.sh&lt;br /&gt;
         ./bin/create_groundtruth_centerdistance.pl - Used by ./bin/PrepareForFusion.sh&lt;br /&gt;
         ./bin/heuristicScoring.pl - Used by ./bin/PrepareForFusion.sh &lt;br /&gt;
         ./bin/fusion.sh  - This script takes the output of ./bin/PrepareForFusion.sh to learn the fusion parameters and apply them to the evaluation scores&lt;br /&gt;
         ./bin/raw2stdslist.sh - Converts scores from a raw (internal) format to the stdlist xml format of the SWS evaluation&lt;br /&gt;
         ./scoring_atwv_sws2013/ - Contains the Mediaeval 2013 atwv scoring package&lt;br /&gt;
         ./scores/ - Contains sample dev (./scores/dev/akws_br-devterms.stdlist.xml, ./scores/dev/dtw_br-devterms.stdlist.xml) and eval scores (./scores/eval/akws_br-evalterms.stdlist.xml, ./scores/eval/dtw_br-evalterms.stdlist.xml)&lt;br /&gt;
         ./sample_results/ - Contains the sample (intermediate) results that are obtained if you run the instructions below&lt;br /&gt;
&lt;br /&gt;
== Example of use ==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;big&amp;gt;'''STEP1'''&amp;lt;/big&amp;gt; Prepare the development scores for learning the calibration/fusion&lt;br /&gt;
&lt;br /&gt;
* Type the following command:&lt;br /&gt;
&lt;br /&gt;
          ./bin/PrepareForFusion.sh \&lt;br /&gt;
          -q ./scoring_atwv_sws2013/sws2013_dev/sws2013_dev.tlist.xml \&lt;br /&gt;
          -r ./scoring_atwv_sws2013/sws2013_dev/sws2013_dev.rttm \&lt;br /&gt;
          -o dev_scores_4fusion.txt \&lt;br /&gt;
          -t num_queries_in_data.ref \&lt;br /&gt;
          -g -z qnorm -m 1 -n 1 \&lt;br /&gt;
          ./scores/dev/akws_br-devterms.stdlist.xml \&lt;br /&gt;
          ./scores/dev/dtw_br-devterms.stdlist.xml&lt;br /&gt;
&lt;br /&gt;
:* '''Warning''' Before running the script, you will have to change the $MATLAB_BIN variable to be the path where your Matlab binary is actually installed.&lt;br /&gt;
&lt;br /&gt;
:* This process  will generate 2 output files:&lt;br /&gt;
&lt;br /&gt;
::'''1.''' dev_scores_4fusion.txt - Contains the scores ready to be used in the subsequent fusion stage, in the following general format (one row per detection candidate):&lt;br /&gt;
&lt;br /&gt;
                  &amp;lt;query_id&amp;gt; &amp;lt;file_id&amp;gt; &amp;lt;start_time&amp;gt; &amp;lt;duration&amp;gt; &amp;lt;sc1&amp;gt; &amp;lt;sc2&amp;gt; ... &amp;lt;scN&amp;gt; &amp;lt;label&amp;gt;&lt;br /&gt;
::*Notice that now all systems produce a score for all candidate detections (the score matrix is full). &lt;br /&gt;
&lt;br /&gt;
::*  If the -g option is selected (as in this case), the &amp;lt;label&amp;gt; column contains 0s and 1s for the false and true trials respectively (derived from the rttm file). If the -g option is not selected, the last column simply contains 1s.&lt;br /&gt;
&lt;br /&gt;
::'''2.''' num_queries_in_data.ref - This second (optional) output file contains the number of times each query appears in the collection data; it is used later in the fusion stage.&lt;br /&gt;
&lt;br /&gt;
*You can see the general usage of this script by calling it without arguments:&lt;br /&gt;
  &lt;br /&gt;
       Usage: PrepareForFusion.sh -q &amp;lt;tlistxml&amp;gt; -r &amp;lt;rttm&amp;gt; -o &amp;lt;outputfile&amp;gt; [opts] &amp;lt;stdlistfile1&amp;gt; [stdlistfile2] [stdlistfile3]  ... [stdlistfileN] &lt;br /&gt;
&lt;br /&gt;
        &amp;lt;stdlistfile*&amp;gt; input score stdlist file in the SWS2012 format (*.stdlist.xml)        | - Required argument (at least 1)&lt;br /&gt;
        -q &amp;lt;tlistxml&amp;gt; termlist file in the SWS2012 format (*.tlist.xml)                      | - Required parameter&lt;br /&gt;
        -r &amp;lt;rttm&amp;gt; rttm file in the SWS2012 format (*.rttm)                                   | - Required parameter&lt;br /&gt;
        -o &amp;lt;outputfile&amp;gt; output file name                                                     | - Required parameter&lt;br /&gt;
        -z &amp;lt;value&amp;gt; score z-norm type (none, qnorm, fnorm, qfnorm, fqnorm)                    | - Default: none&lt;br /&gt;
        -g add ground-truth information to the outputfile&lt;br /&gt;
        -t &amp;lt;filename&amp;gt; saves the number of true terms in the reference per query (implies -g)&lt;br /&gt;
        -m &amp;lt;value&amp;gt; apply majority voting fusion with &amp;lt;value&amp;gt; minimum number of votes         | - Default: 1&lt;br /&gt;
        -n &amp;lt;value&amp;gt; method for creating default scores (0: average of the other detections (MV approach); 1: min per query; 2: global min; 3: histogram based)  | - Default: 0&lt;br /&gt;
        -d debug mode, do not remove auxiliary files stored in /tmp/tmpdir.$$&lt;br /&gt;
        -h help &lt;br /&gt;
&lt;br /&gt;
        NOTE: Requires Matlab (or Octave) and perl. It also depends on the perl scripts align_score_files.pl, heuristicScoring.pl and create_groundtruth_centerdistance.pl, &lt;br /&gt;
           which should be located in the same folder as the main script&lt;br /&gt;
&lt;br /&gt;
&amp;lt;big&amp;gt;'''STEP2'''&amp;lt;/big&amp;gt; Prepare the evaluation scores for fusion&lt;br /&gt;
&lt;br /&gt;
* Type  the following command:&lt;br /&gt;
&lt;br /&gt;
          ./bin/PrepareForFusion.sh \&lt;br /&gt;
          -q ./scoring_atwv_sws2013/sws2013_eval/sws2013_eval.tlist.xml \&lt;br /&gt;
          -r ./scoring_atwv_sws2013/sws2013_eval/sws2013_eval.rttm \&lt;br /&gt;
          -o eval_scores_4fusion.txt \&lt;br /&gt;
          -z qnorm -m 1 -n 1 \&lt;br /&gt;
          ./scores/eval/akws_br-evalterms.stdlist.xml  \&lt;br /&gt;
          ./scores/eval/dtw_br-evalterms.stdlist.xml&lt;br /&gt;
&lt;br /&gt;
:*'''Warning''' It is fundamental to provide the stdlist.xml input files in the same order used in the previous step.&lt;br /&gt;
&lt;br /&gt;
:*Notice that, in contrast to the development scores, we did not use the -g option, since we do not need the ground-truth in this case.&lt;br /&gt;
     &lt;br /&gt;
&amp;lt;big&amp;gt;'''STEP3'''&amp;lt;/big&amp;gt; Train the fusion parameters on the development scores and apply them to the evaluation scores&lt;br /&gt;
&lt;br /&gt;
* Type the following command:&lt;br /&gt;
&lt;br /&gt;
          ./bin/fusion.sh dev_scores_4fusion.txt eval_scores_4fusion.txt num_queries_in_data.ref&lt;br /&gt;
&lt;br /&gt;
:* '''Warning''' Before running the script, you will have to change the $MATLAB_BIN variable to be the path where your Matlab binary is actually installed.&lt;br /&gt;
&lt;br /&gt;
:* '''Warning''' The Bosaris toolkit needs to be installed. Change the variable $BOSARIS to the path where the Bosaris toolkit is installed.&lt;br /&gt;
&lt;br /&gt;
:* '''Warning''' The script contains 4 internal variables that are specific to the Mediaeval SWS2013 evaluation: &lt;br /&gt;
::* The cost parameters $P_TARGET, $C_MISS, $C_FA &lt;br /&gt;
&lt;br /&gt;
::*  $TSPEECH - The total duration (in seconds) of the data collection &lt;br /&gt;
&lt;br /&gt;
:* '''Warning''' The call to the fusion script can take several minutes. &lt;br /&gt;
&lt;br /&gt;
:* This call learns the calibration and fusion parameters from dev_scores_4fusion.txt (which includes the ground-truth information) and applies them to both the development and the evaluation scores. It produces 3 output files:&lt;br /&gt;
&lt;br /&gt;
::'''1.''' dev_fusion.scores - file containing the well-calibrated fusion scores for the development data&lt;br /&gt;
::'''2.''' eval_fusion.scores - file containing the well-calibrated fusion scores for the evaluation data&lt;br /&gt;
::'''3.''' fuse_params.txt - file containing the fusion parameters&lt;br /&gt;
&lt;br /&gt;
::The format of the score output files is as follows (one row per candidate detection):&lt;br /&gt;
&lt;br /&gt;
              &amp;lt;query_id&amp;gt; &amp;lt;file_id&amp;gt; &amp;lt;start_time&amp;gt; &amp;lt;duration&amp;gt; &amp;lt;score&amp;gt; &amp;lt;decision&amp;gt;&lt;br /&gt;
::The &amp;lt;decision&amp;gt; field is 0 or 1 depending on whether the score is below or above the minimum-cost Bayes optimal threshold (see the Interspeech paper for details).&lt;br /&gt;
&lt;br /&gt;
* To convert the score files to the SWS2013 stdlist.xml format, type the following commands:&lt;br /&gt;
&lt;br /&gt;
          ./bin/raw2stdslist.sh dev_fusion.scores \&lt;br /&gt;
          ./scoring_atwv_sws2013/sws2013_dev/sws2013_dev.tlist.xml &amp;gt; fusion-devterms.stdlist.xml&lt;br /&gt;
&lt;br /&gt;
          ./bin/raw2stdslist.sh eval_fusion.scores \&lt;br /&gt;
          ./scoring_atwv_sws2013/sws2013_eval/sws2013_eval.tlist.xml &amp;gt; fusion-evalterms.stdlist.xml&lt;br /&gt;
  &lt;br /&gt;
:* This script can take a third parameter, a threshold value to be applied instead of the optimal Bayes one.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;big&amp;gt;'''Reference results'''&amp;lt;/big&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* In the folder ./sample_results/ you can find each of the intermediate files produced in this example.&lt;br /&gt;
&lt;br /&gt;
* The TWV results obtained with the sample score files provided for the Mediaeval SWS2013 task, for each of the individual systems and for the fusion system obtained following this example, are:&lt;br /&gt;
&lt;br /&gt;
{| border=1 cellspacing=0 align=center cellpadding=5px width=50%&lt;br /&gt;
  |+ '''Reference results for Mediaeval SWS 2013''' &lt;br /&gt;
|-&lt;br /&gt;
  ! System !! dev (mtwv/atwv) !! eval (mtwv/atwv) &lt;br /&gt;
|-&lt;br /&gt;
  |   akws-br   ||    0.1571 / 0.1408 || 0.1441  / 0.1271 &lt;br /&gt;
|-&lt;br /&gt;
  |   dtw-br   ||   0.2066  / 0.2012 ||  0.1654  /  0.1581 &lt;br /&gt;
|-&lt;br /&gt;
  | fusion || 0.2731  / 0.2713 || &lt;br /&gt;
|}&lt;/div&gt;</summary>
		<author><name>Alberto</name></author>
	</entry>
	<entry>
		<id>https://www.hlt.inesc-id.pt/wiki/index.php?title=STDfusion&amp;diff=7156</id>
		<title>STDfusion</title>
		<link rel="alternate" type="text/html" href="https://www.hlt.inesc-id.pt/wiki/index.php?title=STDfusion&amp;diff=7156"/>
		<updated>2013-10-16T23:49:01Z</updated>

		<summary type="html">&lt;p&gt;Alberto: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== Introduction ==&lt;br /&gt;
&lt;br /&gt;
* In this wiki we make available the necessary code and the basic instructions to carry out discriminative calibration and fusion of heterogeneous spoken term detection (STD) systems as described in:&lt;br /&gt;
&lt;br /&gt;
           A. Abad, L. J. Rodriguez Fuentes, M. Penagarikano, A. Varona, M. Diez, and G. Bordel. &lt;br /&gt;
           On the Calibration and Fusion of Heterogeneous Spoken Term Detection Systems. &lt;br /&gt;
           In Interspeech 2013, August 25-29 2013&lt;br /&gt;
&lt;br /&gt;
* The proposed method is the result of a collaboration between the [https://www.l2f.inesc-id.pt L2F] and [http://gtts.ehu.es/gtts/ GTTS] groups in the context of their research on query-by-example STD. The method has been successfully tested in the SWS2012 and SWS2013 tasks of the [http://www.multimediaeval.org/mediaeval2013/sws2013/ Mediaeval Evaluation].&lt;br /&gt;
&lt;br /&gt;
* Unfortunately, we did not have time to consolidate (and clean up) the code, so what we are making available is very much the same set of scripts that we developed during the first experiments. Thus, the code is written in different pieces in different languages (bash, perl, matlab, etc.) and has some external dependencies. Even so, we expect it can still be useful for researchers who want to fuse their STD systems.&lt;br /&gt;
&lt;br /&gt;
* The package can be downloaded from here.&lt;br /&gt;
&lt;br /&gt;
== Package contents ==&lt;br /&gt;
&lt;br /&gt;
* The package contains the following files and directories:&lt;br /&gt;
&lt;br /&gt;
         README.txt - Contains pretty much the same information as this wiki page&lt;br /&gt;
         ./bin/ - Contains the different scripts necessary for calibration and fusion&lt;br /&gt;
         ./bin/PrepareForFusion.sh  - Main script that normalizes, aligns, hypothesizes missing scores and creates the ground-truth for the input systems&lt;br /&gt;
         ./bin/align_score_files.pl - Used by ./bin/PrepareForFusion.sh&lt;br /&gt;
         ./bin/create_groundtruth_centerdistance.pl - Used by ./bin/PrepareForFusion.sh&lt;br /&gt;
         ./bin/heuristicScoring.pl - Used by ./bin/PrepareForFusion.sh &lt;br /&gt;
         ./bin/fusion.sh  - This script takes the output of ./bin/PrepareForFusion.sh to learn the fusion parameters and apply them to the evaluation scores&lt;br /&gt;
         ./bin/raw2stdslist.sh - Converts scores from a raw (internal) format to the stdlist xml format of the SWS evaluation&lt;br /&gt;
         ./scoring_atwv_sws2013/ - Contains the Mediaeval 2013 atwv scoring package&lt;br /&gt;
         ./scores/ - Contains sample dev (./scores/dev/akws_br-devterms.stdlist.xml, ./scores/dev/dtw_br-devterms.stdlist.xml) and eval scores (./scores/eval/akws_br-evalterms.stdlist.xml, ./scores/eval/dtw_br-evalterms.stdlist.xml)&lt;br /&gt;
         ./sample_results/ - Contains the sample (intermediate) results that are obtained if you run the instructions below&lt;br /&gt;
&lt;br /&gt;
== Example of use ==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;big&amp;gt;'''STEP1'''&amp;lt;/big&amp;gt; Prepare the development scores for learning the calibration/fusion&lt;br /&gt;
&lt;br /&gt;
* Type the following command:&lt;br /&gt;
&lt;br /&gt;
          ./bin/PrepareForFusion.sh \&lt;br /&gt;
          -q ./scoring_atwv_sws2013/sws2013_dev/sws2013_dev.tlist.xml \&lt;br /&gt;
          -r ./scoring_atwv_sws2013/sws2013_dev/sws2013_dev.rttm \&lt;br /&gt;
          -o dev_scores_4fusion.txt \&lt;br /&gt;
          -t num_queries_in_data.ref \&lt;br /&gt;
          -g -z qnorm -m 1 -n 1 \&lt;br /&gt;
          ./scores/dev/akws_br-devterms.stdlist.xml \&lt;br /&gt;
          ./scores/dev/dtw_br-devterms.stdlist.xml&lt;br /&gt;
&lt;br /&gt;
:* '''Warning''' Before running the script, you will have to change the MATLAB_BIN variable to be the path where your Matlab binary is actually installed.&lt;br /&gt;
&lt;br /&gt;
:* This process  will generate 2 output files:&lt;br /&gt;
&lt;br /&gt;
::'''1.''' dev_scores_4fusion.txt - Contains the scores ready to be used in the subsequent fusion stage, in the following general format (one row per detection candidate):&lt;br /&gt;
&lt;br /&gt;
                  &amp;lt;query_id&amp;gt; &amp;lt;file_id&amp;gt; &amp;lt;start_time&amp;gt; &amp;lt;duration&amp;gt; &amp;lt;sc1&amp;gt; &amp;lt;sc2&amp;gt; ... &amp;lt;scN&amp;gt; &amp;lt;label&amp;gt;&lt;br /&gt;
::*Notice that now all systems produce a score for all candidate detections (the score matrix is full). &lt;br /&gt;
&lt;br /&gt;
::*  If the -g option is selected (as in this case), the &amp;lt;label&amp;gt; column contains 0s and 1s for the false and true trials respectively (derived from the rttm file). If the -g option is not selected, the last column simply contains 1s.&lt;br /&gt;
&lt;br /&gt;
::'''2.''' num_queries_in_data.ref - This second (optional) output file contains the number of times each query appears in the collection data; it is used later in the fusion stage.&lt;br /&gt;
&lt;br /&gt;
*You can see the general usage of this script by calling it without arguments:&lt;br /&gt;
  &lt;br /&gt;
       Usage: PrepareForFusion.sh -q &amp;lt;tlistxml&amp;gt; -r &amp;lt;rttm&amp;gt; -o &amp;lt;outputfile&amp;gt; [opts] &amp;lt;stdlistfile1&amp;gt; [stdlistfile2] [stdlistfile3]  ... [stdlistfileN] &lt;br /&gt;
&lt;br /&gt;
        &amp;lt;stdlistfile*&amp;gt; input score stdlist file in the SWS2012 format (*.stdlist.xml)        | - Required argument (at least 1)&lt;br /&gt;
        -q &amp;lt;tlistxml&amp;gt; termlist file in the SWS2012 format (*.tlist.xml)                      | - Required parameter&lt;br /&gt;
        -r &amp;lt;rttm&amp;gt; rttm file in the SWS2012 format (*.rttm)                                   | - Required parameter&lt;br /&gt;
        -o &amp;lt;outputfile&amp;gt; output file name                                                     | - Required parameter&lt;br /&gt;
        -z &amp;lt;value&amp;gt; score z-norm type (none, qnorm, fnorm, qfnorm, fqnorm)                    | - Default: none&lt;br /&gt;
        -g add ground-truth information to the outputfile&lt;br /&gt;
        -t &amp;lt;filename&amp;gt; saves the number of true terms in the reference per query (implies -g)&lt;br /&gt;
        -m &amp;lt;value&amp;gt; apply majority voting fusion with &amp;lt;value&amp;gt; minimum number of votes         | - Default: 1&lt;br /&gt;
        -n &amp;lt;value&amp;gt; method for creating default scores (0: average of the other detections (MV approach); 1: min per query; 2: global min; 3: histogram based)  | - Default: 0&lt;br /&gt;
        -d debug mode, do not remove auxiliary files stored in /tmp/tmpdir.$$&lt;br /&gt;
        -h help &lt;br /&gt;
&lt;br /&gt;
        NOTE: Requires Matlab (or Octave) and Perl. It also depends on the Perl scripts align_score_files.pl, heuristicScoring.pl and create_groundtruth_centerdistance.pl, &lt;br /&gt;
           which should be located in the same folder as the main script&lt;br /&gt;
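The -z option controls score normalization before fusion. As a rough sketch of what we understand qnorm to do (zero-mean, unit-variance normalization of the scores of each query; the script's exact formula may differ), with invented rows where column 1 is the query id and column 5 the raw score:

```shell
# Per-query zero-mean, unit-variance normalization (our assumption of what
# the qnorm option does); the two input rows are invented examples.
printf '%s\n' \
  'q001 file017 12.34 0.52 2.0' \
  'q001 file042 80.10 0.49 4.0' |
awk '{id[NR]=$1; s[NR]=$5; sum[$1]+=$5; sq[$1]+=$5*$5; n[$1]++}
     END {for (i=1; i!=NR+1; i++) {
            m  = sum[id[i]]/n[id[i]]
            sd = sqrt(sq[id[i]]/n[id[i]] - m*m)
            printf "%s %.2f\n", id[i], (s[i]-m)/sd }}'
```

For the two scores 2.0 and 4.0 of query q001 this yields -1.00 and 1.00.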
&lt;br /&gt;
&amp;lt;big&amp;gt;'''STEP2'''&amp;lt;/big&amp;gt; Prepare the evaluation scores for fusion&lt;br /&gt;
&lt;br /&gt;
* Type  the following command:&lt;br /&gt;
&lt;br /&gt;
          ./bin/PrepareForFusion.sh \&lt;br /&gt;
          -q ./scoring_atwv_sws2013/sws2013_eval/sws2013_eval.tlist.xml \&lt;br /&gt;
          -r ./scoring_atwv_sws2013/sws2013_eval/sws2013_eval.rttm \&lt;br /&gt;
          -o eval_scores_4fusion.txt \&lt;br /&gt;
          -z qnorm -m 1 -n 1 \&lt;br /&gt;
          ./scores/eval/akws_br-evalterms.stdlist.xml  \&lt;br /&gt;
          ./scores/eval/dtw_br-evalterms.stdlist.xml&lt;br /&gt;
&lt;br /&gt;
:*'''Warning''' It is fundamental to provide the stdlist.xml input files in the same order used in the previous step.&lt;br /&gt;
&lt;br /&gt;
:*Notice that, in contrast to the case of the development scores, we did not use the -g option, since we do not need the ground truth in this case. &lt;br /&gt;
     &lt;br /&gt;
&amp;lt;big&amp;gt;'''STEP3'''&amp;lt;/big&amp;gt; Train the fusion parameters using the development scores and apply them to the evaluation scores &lt;br /&gt;
&lt;br /&gt;
* Type the following command:&lt;br /&gt;
&lt;br /&gt;
          ./bin/fusion.sh dev_scores_4fusion.txt eval_scores_4fusion.txt num_queries_in_data.ref&lt;br /&gt;
&lt;br /&gt;
:* '''Warning''' Before running the script, you will have to change the MATLAB_BIN variable to be the path where your Matlab binary is actually installed.&lt;br /&gt;
&lt;br /&gt;
:* '''Warning''' The Bosaris toolkit needs to be installed. Change the BOSARIS variable to the path where the Bosaris toolkit is installed.&lt;br /&gt;
&lt;br /&gt;
:* '''Warning''' The script contains 4 internal variables that are specific to the Mediaeval SWS2013 evaluation: &lt;br /&gt;
::* The cost parameters P_TARGET, C_MISS, C_FA &lt;br /&gt;
&lt;br /&gt;
::*  TSPEECH - The total duration (in seconds) of the data collection &lt;br /&gt;
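For reference, the scorer is based on the NIST-style Term-Weighted Value. The sketch below shows the standard formula with placeholder parameter values (not the actual SWS2013 settings, which are set inside the script); TSPEECH enters through P_fa, whose denominator is the number of non-target trials derived from the collection duration:

```shell
# Standard TWV definition (our reading; parameter values are placeholders):
#   beta       = (C_FA / C_MISS) * (1/P_TARGET - 1)
#   TWV(theta) = 1 - P_miss(theta) - beta * P_fa(theta)
awk 'BEGIN {
  P_TARGET = 0.0001; C_MISS = 1; C_FA = 0.1     # placeholder costs
  beta = (C_FA / C_MISS) * (1 / P_TARGET - 1)
  P_miss = 0.5; P_fa = 0.00001                  # invented operating point
  printf "beta=%.1f TWV=%.4f\n", beta, 1 - P_miss - beta * P_fa
}'
```

With these placeholder values the example prints beta=999.9 and TWV=0.4900, which illustrates why even tiny false-alarm rates are heavily penalized.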
&lt;br /&gt;
:* '''Warning''' The call to the fusion script can take several minutes. &lt;br /&gt;
&lt;br /&gt;
:* This call learns the calibration and fusion parameters from dev_scores_4fusion.txt (which includes the ground-truth information) and applies them to both the development and the evaluation scores. It produces 3 output files:&lt;br /&gt;
&lt;br /&gt;
::'''1.''' dev_fusion.scores - file containing the development well-calibrated fusion scores&lt;br /&gt;
::'''2.''' eval_fusion.scores - file containing the evaluation well-calibrated fusion scores&lt;br /&gt;
::'''3.''' fuse_params.txt - file containing the fusion parameters &lt;br /&gt;
&lt;br /&gt;
::The format of the score output files is as follows (one row per candidate detection):&lt;br /&gt;
&lt;br /&gt;
              &amp;lt;query_id&amp;gt; &amp;lt;file_id&amp;gt; &amp;lt;start_time&amp;gt; &amp;lt;duration&amp;gt; &amp;lt;score&amp;gt; &amp;lt;decision&amp;gt;&lt;br /&gt;
::The &amp;lt;decision&amp;gt; field is 0 or 1 depending on whether the score is below or above the minimum-cost Bayes optimal threshold (see the Interspeech paper for details).&lt;br /&gt;
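A hypothetical output row (ids, score and decision are invented for illustration); the decision field is 1 here because the calibrated score exceeded the Bayes threshold:

```shell
# Format: query_id file_id start_time duration score decision (invented values)
printf '%s\n' 'q001 file017 12.34 0.52 3.17 1'
```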
&lt;br /&gt;
* To convert the score files to the SWS2013 stdslist.xml format type the following commands:&lt;br /&gt;
&lt;br /&gt;
          ./bin/raw2stdslist.sh dev_fusion.scores \&lt;br /&gt;
          ./scoring_atwv_sws2013/sws2013_dev/sws2013_dev.tlist.xml &amp;gt; fusion-devterms.stdlist.xml&lt;br /&gt;
&lt;br /&gt;
          ./bin/raw2stdslist.sh eval_fusion.scores \&lt;br /&gt;
          ./scoring_atwv_sws2013/sws2013_eval/sws2013_eval.tlist.xml &amp;gt; fusion-evalterms.stdlist.xml&lt;br /&gt;
  &lt;br /&gt;
:* This script can take a third parameter, a threshold value, to apply a decision threshold different from the optimal Bayes one.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;big&amp;gt;'''Reference results'''&amp;lt;/big&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* In the folder ./sample_results/ you can find each of the intermediate files produced in this example&lt;br /&gt;
&lt;br /&gt;
* The TWV results obtained with the sample score files provided for the Mediaeval SWS2013 task, for each of the individual systems and for the fusion system obtained following this example, are:&lt;br /&gt;
&lt;br /&gt;
{| border=1 cellspacing=0 align=center cellpadding=5px width=50%&lt;br /&gt;
  |+ '''Reference results for Mediaeval SWS 2013''' &lt;br /&gt;
|-&lt;br /&gt;
  ! !! dev (mtwv/atwv) !! eval  (mtwv/atwv) &lt;br /&gt;
|-&lt;br /&gt;
  |   akws-br   ||    0.1571 / 0.1408 || 0.1441  / 0.1271 &lt;br /&gt;
|-&lt;br /&gt;
  |   dtw-br   ||   0.2066  / 0.2012 ||  0.1654  /  0.1581 &lt;br /&gt;
|-&lt;br /&gt;
  | fusion || 0.2731  / 0.2713 || &lt;br /&gt;
|}&lt;/div&gt;</summary>
		<author><name>Alberto</name></author>
	</entry>
	<entry>
		<id>https://www.hlt.inesc-id.pt/wiki/index.php?title=STDfusion&amp;diff=7155</id>
		<title>STDfusion</title>
		<link rel="alternate" type="text/html" href="https://www.hlt.inesc-id.pt/wiki/index.php?title=STDfusion&amp;diff=7155"/>
		<updated>2013-10-16T23:24:47Z</updated>

		<summary type="html">&lt;p&gt;Alberto: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== Introduction ==&lt;br /&gt;
&lt;br /&gt;
* In this wiki we make available the necessary code and the basic instructions to carry out discriminative calibration and fusion of heterogeneous spoken term detection (STD) systems as described in:&lt;br /&gt;
&lt;br /&gt;
           A. Abad, L. J. Rodriguez Fuentes, M. Penagarikano, A. Varona, M. Diez, and G. Bordel. &lt;br /&gt;
           On the Calibration and Fusion of Heterogeneous Spoken Term Detection Systems. &lt;br /&gt;
           In Interspeech 2013, August 25-29 2013&lt;br /&gt;
&lt;br /&gt;
* The proposed method is the result of a collaboration between the L2F and the GTTS groups in the context of their activities in the Mediaeval Spoken Web Search (SWS) task. The method has been successfully tested in the SWS2012 and SWS2013 tasks. &lt;br /&gt;
&lt;br /&gt;
* Unfortunately, we did not have time to consolidate (and clean up) the code, and what we are making available is very much the same set of scripts that we developed during the first experiments. Thus, the code is written in different pieces in different languages (bash, Perl, Matlab, etc.) and it has some external dependencies. Nevertheless, we expect that it can still be useful for researchers who want to try to fuse their STD systems.&lt;br /&gt;
&lt;br /&gt;
* The package can be downloaded from here.&lt;br /&gt;
&lt;br /&gt;
== Package contents ==&lt;br /&gt;
&lt;br /&gt;
* The package contains the following files and directories:&lt;br /&gt;
&lt;br /&gt;
         README.txt - Contains pretty much the same information as this wiki page&lt;br /&gt;
         ./bin/ - Contains the different scripts necessary for calibration and fusion&lt;br /&gt;
         ./bin/PrepareForFusion.sh  - Main script that normalizes, aligns, hypothesizes missing scores and creates the ground truth for the input systems&lt;br /&gt;
         ./bin/align_score_files.pl - Used by ./bin/PrepareForFusion.sh&lt;br /&gt;
         ./bin/create_groundtruth_centerdistance.pl - Used by ./bin/PrepareForFusion.sh&lt;br /&gt;
         ./bin/heuristicScoring.pl - Used by ./bin/PrepareForFusion.sh &lt;br /&gt;
         ./bin/fusion.sh  - This script takes the output of ./bin/PrepareForFusion.sh to learn the fusion parameters and apply them to the evaluation scores&lt;br /&gt;
         ./bin/raw2stdslist.sh - Converts scores from a raw (internal) format to the stdlist xml format of the SWS evaluation&lt;br /&gt;
         ./scoring_atwv_sws2013/ - Contains the Mediaeval 2013 atwv scoring package&lt;br /&gt;
         ./scores/ - Contains sample dev (./scores/dev/akws_br-devterms.stdlist.xml, ./scores/dev/dtw_br-devterms.stdlist.xml) and eval scores (./scores/eval/akws_br-evalterms.stdlist.xml, ./scores/eval/dtw_br-evalterms.stdlist.xml)&lt;br /&gt;
         ./sample_results/ - Contains the sample (intermediate) results that are obtained if you run the instructions below&lt;br /&gt;
&lt;br /&gt;
== Example of use ==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;big&amp;gt;'''STEP1'''&amp;lt;/big&amp;gt; Prepare the development scores for learning the calibration/fusion&lt;br /&gt;
&lt;br /&gt;
* Type the following command:&lt;br /&gt;
&lt;br /&gt;
          ./bin/PrepareForFusion.sh \&lt;br /&gt;
          -q ./scoring_atwv_sws2013/sws2013_dev/sws2013_dev.tlist.xml \&lt;br /&gt;
          -r ./scoring_atwv_sws2013/sws2013_dev/sws2013_dev.rttm \&lt;br /&gt;
          -o dev_scores_4fusion.txt \&lt;br /&gt;
          -t num_queries_in_data.ref \&lt;br /&gt;
          -g -z qnorm -m 1 -n 1 \&lt;br /&gt;
          ./scores/dev/akws_br-devterms.stdlist.xml \&lt;br /&gt;
          ./scores/dev/dtw_br-devterms.stdlist.xml&lt;br /&gt;
&lt;br /&gt;
:* '''Warning''' Before running the script, you will have to change the MATLAB_BIN variable to be the path where your Matlab binary is actually installed.&lt;br /&gt;
&lt;br /&gt;
:* This process  will generate 2 output files:&lt;br /&gt;
&lt;br /&gt;
::'''1.''' dev_scores_4fusion.txt - Contains the scores, ready to be used in the subsequent fusion stage, in the following general format (one row per candidate detection):&lt;br /&gt;
&lt;br /&gt;
                  &amp;lt;query_id&amp;gt; &amp;lt;file_id&amp;gt; &amp;lt;start_time&amp;gt; &amp;lt;duration&amp;gt; &amp;lt;sc1&amp;gt; &amp;lt;sc2&amp;gt; ... &amp;lt;scN&amp;gt; &amp;lt;label&amp;gt;&lt;br /&gt;
::*Notice that now all systems produce a score for all candidate detections (the score matrix is full). &lt;br /&gt;
&lt;br /&gt;
::*  If the -g option is selected (as in this case), the &amp;lt;label&amp;gt; column contains 0s and 1s for the false and true trials respectively (derived from the rttm file). If the -g option is not selected, the last column simply contains 1s.&lt;br /&gt;
&lt;br /&gt;
::'''2.''' num_queries_in_data.ref - This second (optional) output file contains the number of times each query appears in the collection data; it is used later in the fusion stage.&lt;br /&gt;
&lt;br /&gt;
*You can check the general usage of this script by calling it without arguments:&lt;br /&gt;
  &lt;br /&gt;
       Usage: PrepareForFusion.sh -q &amp;lt;tlistxml&amp;gt; -r &amp;lt;rttm&amp;gt; -o &amp;lt;outputfile&amp;gt; [opts] &amp;lt;stdlistfile1&amp;gt; [stdlistfile2] [stdlistfile3]  ... [stdlistfileN] &lt;br /&gt;
&lt;br /&gt;
        &amp;lt;stdlistfile*&amp;gt; input score stdlist file in the SWS2012 format (*.stdlist.xml)        | - Required argument (at least 1)&lt;br /&gt;
        -q &amp;lt;tlistxml&amp;gt; termlist file in the SWS2012 format (*.tlist.xml)                      | - Required parameter&lt;br /&gt;
        -r &amp;lt;rttm&amp;gt; rttm file in the SWS2012 format (*.rttm)                                   | - Required parameter&lt;br /&gt;
        -o &amp;lt;outputfile&amp;gt; output file name                                                     | - Required parameter&lt;br /&gt;
        -z &amp;lt;value&amp;gt; score z-norm type (none, qnorm, fnorm, qfnorm, fqnorm)                    | - Default: none&lt;br /&gt;
        -g add ground-truth information to the outputfile&lt;br /&gt;
        -t &amp;lt;filename&amp;gt; saves the number of true terms in the reference per query (implies -g)&lt;br /&gt;
        -m &amp;lt;value&amp;gt; apply majority voting fusion with &amp;lt;value&amp;gt; minimum number of votes         | - Default: 1&lt;br /&gt;
        -n &amp;lt;value&amp;gt; method for creating default scores (0: average of the other detections (MV approach); 1: min per query; 2: global min; 3: histogram based)  | - Default: 0&lt;br /&gt;
        -d debug mode, do not remove auxiliary files stored in /tmp/tmpdir.$$&lt;br /&gt;
        -h help &lt;br /&gt;
&lt;br /&gt;
        NOTE: Requires Matlab (or Octave) and Perl. It also depends on the Perl scripts align_score_files.pl, heuristicScoring.pl and create_groundtruth_centerdistance.pl, &lt;br /&gt;
           which should be located in the same folder as the main script&lt;br /&gt;
&lt;br /&gt;
&amp;lt;big&amp;gt;'''STEP2'''&amp;lt;/big&amp;gt; Prepare the evaluation scores for fusion&lt;br /&gt;
&lt;br /&gt;
* Type  the following command:&lt;br /&gt;
&lt;br /&gt;
          ./bin/PrepareForFusion.sh \&lt;br /&gt;
          -q ./scoring_atwv_sws2013/sws2013_eval/sws2013_eval.tlist.xml \&lt;br /&gt;
          -r ./scoring_atwv_sws2013/sws2013_eval/sws2013_eval.rttm \&lt;br /&gt;
          -o eval_scores_4fusion.txt \&lt;br /&gt;
          -z qnorm -m 1 -n 1 \&lt;br /&gt;
          ./scores/eval/akws_br-evalterms.stdlist.xml  \&lt;br /&gt;
          ./scores/eval/dtw_br-evalterms.stdlist.xml&lt;br /&gt;
&lt;br /&gt;
:*'''Warning''' It is fundamental to provide the stdlist.xml input files in the same order used in the previous step.&lt;br /&gt;
&lt;br /&gt;
:*Notice that, in contrast to the case of the development scores, we did not use the -g option, since we do not need the ground truth in this case. &lt;br /&gt;
     &lt;br /&gt;
&amp;lt;big&amp;gt;'''STEP3'''&amp;lt;/big&amp;gt; Train the fusion parameters using the development scores and apply them to the evaluation scores &lt;br /&gt;
&lt;br /&gt;
* Type the following command:&lt;br /&gt;
&lt;br /&gt;
          ./bin/fusion.sh dev_scores_4fusion.txt eval_scores_4fusion.txt num_queries_in_data.ref&lt;br /&gt;
&lt;br /&gt;
:* '''Warning''' Before running the script, you will have to change the MATLAB_BIN variable to be the path where your Matlab binary is actually installed.&lt;br /&gt;
&lt;br /&gt;
:* '''Warning''' The Bosaris toolkit needs to be installed. Change the BOSARIS variable to the path where the Bosaris toolkit is installed.&lt;br /&gt;
&lt;br /&gt;
:* '''Warning''' The script contains 4 internal variables that are specific to the Mediaeval SWS2013 evaluation: &lt;br /&gt;
::* The cost parameters P_TARGET, C_MISS, C_FA &lt;br /&gt;
&lt;br /&gt;
::*  TSPEECH - The total duration (in seconds) of the data collection &lt;br /&gt;
&lt;br /&gt;
:* '''Warning''' The call to the fusion script can take several minutes. &lt;br /&gt;
&lt;br /&gt;
:* This call learns the calibration and fusion parameters from dev_scores_4fusion.txt (which includes the ground-truth information) and applies them to both the development and the evaluation scores. It produces 3 output files:&lt;br /&gt;
&lt;br /&gt;
::'''1.''' dev_fusion.scores - file containing the development well-calibrated fusion scores&lt;br /&gt;
::'''2.''' eval_fusion.scores - file containing the evaluation well-calibrated fusion scores&lt;br /&gt;
::'''3.''' fuse_params.txt - file containing the fusion parameters &lt;br /&gt;
&lt;br /&gt;
::The format of the score output files is as follows (one row per candidate detection):&lt;br /&gt;
&lt;br /&gt;
              &amp;lt;query_id&amp;gt; &amp;lt;file_id&amp;gt; &amp;lt;start_time&amp;gt; &amp;lt;duration&amp;gt; &amp;lt;score&amp;gt; &amp;lt;decision&amp;gt;&lt;br /&gt;
::The &amp;lt;decision&amp;gt; field is 0 or 1 depending on whether the score is below or above the minimum-cost Bayes optimal threshold (see the Interspeech paper for details).&lt;br /&gt;
&lt;br /&gt;
* To convert the score files to the SWS2013 stdslist.xml format type the following commands:&lt;br /&gt;
&lt;br /&gt;
          ./bin/raw2stdslist.sh dev_fusion.scores \&lt;br /&gt;
          ./scoring_atwv_sws2013/sws2013_dev/sws2013_dev.tlist.xml &amp;gt; fusion-devterms.stdlist.xml&lt;br /&gt;
&lt;br /&gt;
          ./bin/raw2stdslist.sh eval_fusion.scores \&lt;br /&gt;
          ./scoring_atwv_sws2013/sws2013_eval/sws2013_eval.tlist.xml &amp;gt; fusion-evalterms.stdlist.xml&lt;br /&gt;
  &lt;br /&gt;
:* This script can take a third parameter, a threshold value, to apply a decision threshold different from the optimal Bayes one.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;big&amp;gt;'''Reference results'''&amp;lt;/big&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* In the folder ./sample_results/ you can find each of the intermediate files produced in this example&lt;br /&gt;
&lt;br /&gt;
* The TWV results obtained with the sample score files provided for the Mediaeval SWS2013 task, for each of the individual systems and for the fusion system obtained following this example, are:&lt;br /&gt;
&lt;br /&gt;
{| border=1 cellspacing=0 align=center cellpadding=5px width=50%&lt;br /&gt;
  |+ '''Reference results for Mediaeval SWS 2013''' &lt;br /&gt;
|-&lt;br /&gt;
  ! !! dev (mtwv/atwv) !! eval  (mtwv/atwv) &lt;br /&gt;
|-&lt;br /&gt;
  |   akws-br   ||    0.1571 / 0.1408 || 0.1441  / 0.1271 &lt;br /&gt;
|-&lt;br /&gt;
  |   dtw-br   ||   0.2066  / 0.2012 ||  0.1654  /  0.1581 &lt;br /&gt;
|-&lt;br /&gt;
  | fusion || 0.2731  / 0.2713 || &lt;br /&gt;
|}&lt;/div&gt;</summary>
		<author><name>Alberto</name></author>
	</entry>
	<entry>
		<id>https://www.hlt.inesc-id.pt/wiki/index.php?title=STDfusion&amp;diff=7154</id>
		<title>STDfusion</title>
		<link rel="alternate" type="text/html" href="https://www.hlt.inesc-id.pt/wiki/index.php?title=STDfusion&amp;diff=7154"/>
		<updated>2013-10-16T23:16:24Z</updated>

		<summary type="html">&lt;p&gt;Alberto: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== Introduction ==&lt;br /&gt;
&lt;br /&gt;
* In this wiki we make available the necessary code and the basic instructions to carry out discriminative calibration and fusion of heterogeneous spoken term detection (STD) systems as described in:&lt;br /&gt;
&lt;br /&gt;
           A. Abad, L. J. Rodriguez Fuentes, M. Penagarikano, A. Varona, M. Diez, and G. Bordel. &lt;br /&gt;
           On the Calibration and Fusion of Heterogeneous Spoken Term Detection Systems. &lt;br /&gt;
           In Interspeech 2013, August 25-29 2013&lt;br /&gt;
&lt;br /&gt;
* The proposed method is the result of a collaboration between the L2F and the GTTS groups in the context of their activities in the Mediaeval Spoken Web Search (SWS) task. The method has been successfully tested in the SWS2012 and SWS2013 tasks. &lt;br /&gt;
&lt;br /&gt;
* Unfortunately, we did not have time to consolidate (and clean up) the code, and what we are making available is very much the same set of scripts that we developed during the first experiments. Thus, the code is written in different pieces in different languages (bash, Perl, Matlab, etc.) and it has some external dependencies. Nevertheless, we expect that it can still be useful for researchers who want to try to fuse their STD systems.&lt;br /&gt;
&lt;br /&gt;
* The package can be downloaded from here.&lt;br /&gt;
&lt;br /&gt;
== Package contents ==&lt;br /&gt;
&lt;br /&gt;
* The package contains the following files and directories:&lt;br /&gt;
&lt;br /&gt;
         README.txt - Contains pretty much the same information as this wiki page&lt;br /&gt;
         ./bin/ - Contains the different scripts necessary for calibration and fusion&lt;br /&gt;
         ./bin/PrepareForFusion.sh  - Main script that normalizes, aligns, hypothesizes missing scores and creates the ground truth for the input systems&lt;br /&gt;
         ./bin/align_score_files.pl - Used by ./bin/PrepareForFusion.sh&lt;br /&gt;
         ./bin/create_groundtruth_centerdistance.pl - Used by ./bin/PrepareForFusion.sh&lt;br /&gt;
         ./bin/heuristicScoring.pl - Used by ./bin/PrepareForFusion.sh &lt;br /&gt;
         ./bin/fusion.sh  - This script takes the output of ./bin/PrepareForFusion.sh to learn the fusion parameters and apply them to the evaluation scores&lt;br /&gt;
         ./bin/raw2stdslist.sh - Converts scores from a raw (internal) format to the stdlist xml format of the SWS evaluation&lt;br /&gt;
         ./scoring_atwv_sws2013/ - Contains the Mediaeval 2013 atwv scoring package&lt;br /&gt;
         ./scores/ - Contains sample dev (./scores/dev/akws_br-devterms.stdlist.xml, ./scores/dev/dtw_br-devterms.stdlist.xml) and eval scores (./scores/eval/akws_br-evalterms.stdlist.xml, ./scores/eval/dtw_br-evalterms.stdlist.xml)&lt;br /&gt;
         ./sample_results/ - Contains the sample (intermediate) results that are obtained if you run the instructions below&lt;br /&gt;
&lt;br /&gt;
== Example of use ==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;big&amp;gt;'''STEP1'''&amp;lt;/big&amp;gt; Prepare the development scores for learning the calibration/fusion&lt;br /&gt;
&lt;br /&gt;
* Type the following command:&lt;br /&gt;
&lt;br /&gt;
          ./bin/PrepareForFusion.sh \&lt;br /&gt;
          -q ./scoring_atwv_sws2013/sws2013_dev/sws2013_dev.tlist.xml \&lt;br /&gt;
          -r ./scoring_atwv_sws2013/sws2013_dev/sws2013_dev.rttm \&lt;br /&gt;
          -o DEV_SCORES_4FUSION.txt \&lt;br /&gt;
          -t queries_in_data.ref \&lt;br /&gt;
          -g -z qnorm -m 1 -n 1 \&lt;br /&gt;
          ./scores/dev/akws_br-devterms.stdlist.xml \&lt;br /&gt;
          ./scores/dev/dtw_br-devterms.stdlist.xml&lt;br /&gt;
&lt;br /&gt;
:* '''Warning''' Before running the script, you will have to change the MATLAB_BIN variable to be the path where your Matlab binary is actually installed.&lt;br /&gt;
&lt;br /&gt;
:* This process  will generate 2 output files:&lt;br /&gt;
&lt;br /&gt;
::'''1.''' DEV_SCORES_4FUSION.txt - Contains the scores, ready to be used in the subsequent fusion stage, in the following general format (one row per candidate detection):&lt;br /&gt;
&lt;br /&gt;
                  &amp;lt;query_id&amp;gt; &amp;lt;file_id&amp;gt; &amp;lt;start_time&amp;gt; &amp;lt;duration&amp;gt; &amp;lt;sc1&amp;gt; &amp;lt;sc2&amp;gt; ... &amp;lt;scN&amp;gt; &amp;lt;label&amp;gt;&lt;br /&gt;
::*Notice that now all systems produce a score for all candidate detections (the score matrix is full). &lt;br /&gt;
&lt;br /&gt;
::*  If the -g option is selected (as in this case), the &amp;lt;label&amp;gt; column contains 0s and 1s for the false and true trials respectively (derived from the rttm file). If the -g option is not selected, the last column simply contains 1s.&lt;br /&gt;
&lt;br /&gt;
::'''2.''' queries_in_data.ref - This second (optional) output file contains the number of times each query appears in the collection data; it is used later in the fusion stage.&lt;br /&gt;
&lt;br /&gt;
*You can check the general usage of this script by calling it without arguments:&lt;br /&gt;
  &lt;br /&gt;
       Usage: PrepareForFusion.sh -q &amp;lt;tlistxml&amp;gt; -r &amp;lt;rttm&amp;gt; -o &amp;lt;outputfile&amp;gt; [opts] &amp;lt;stdlistfile1&amp;gt; [stdlistfile2] [stdlistfile3]  ... [stdlistfileN] &lt;br /&gt;
&lt;br /&gt;
        &amp;lt;stdlistfile*&amp;gt; input score stdlist file in the SWS2012 format (*.stdlist.xml)        | - Required argument (at least 1)&lt;br /&gt;
        -q &amp;lt;tlistxml&amp;gt; termlist file in the SWS2012 format (*.tlist.xml)                      | - Required parameter&lt;br /&gt;
        -r &amp;lt;rttm&amp;gt; rttm file in the SWS2012 format (*.rttm)                                   | - Required parameter&lt;br /&gt;
        -o &amp;lt;outputfile&amp;gt; output file name                                                     | - Required parameter&lt;br /&gt;
        -z &amp;lt;value&amp;gt; score z-norm type (none, qnorm, fnorm, qfnorm, fqnorm)                    | - Default: none&lt;br /&gt;
        -g add ground-truth information to the outputfile&lt;br /&gt;
        -t &amp;lt;filename&amp;gt; saves the number of true terms in the reference per query (implies -g)&lt;br /&gt;
        -m &amp;lt;value&amp;gt; apply majority voting fusion with &amp;lt;value&amp;gt; minimum number of votes         | - Default: 1&lt;br /&gt;
        -n &amp;lt;value&amp;gt; method for creating default scores (0: average of the other detections (MV approach); 1: min per query; 2: global min; 3: histogram based)  | - Default: 0&lt;br /&gt;
        -d debug mode, do not remove auxiliary files stored in /tmp/tmpdir.$$&lt;br /&gt;
        -h help &lt;br /&gt;
&lt;br /&gt;
        NOTE: Requires Matlab (or Octave) and Perl. It also depends on the Perl scripts align_score_files.pl, heuristicScoring.pl and create_groundtruth_centerdistance.pl, &lt;br /&gt;
           which should be located in the same folder as the main script&lt;br /&gt;
&lt;br /&gt;
&amp;lt;big&amp;gt;'''STEP2'''&amp;lt;/big&amp;gt; Prepare the evaluation scores for fusion&lt;br /&gt;
&lt;br /&gt;
* Type  the following command:&lt;br /&gt;
&lt;br /&gt;
          ./bin/PrepareForFusion.sh \&lt;br /&gt;
          -q ./scoring_atwv_sws2013/sws2013_eval/sws2013_eval.tlist.xml \&lt;br /&gt;
          -r ./scoring_atwv_sws2013/sws2013_eval/sws2013_eval.rttm \&lt;br /&gt;
          -o EVAL_SCORES_4FUSION.txt \&lt;br /&gt;
          -z qnorm -m 1 -n 1 \&lt;br /&gt;
          ./scores/eval/akws_br-evalterms.stdlist.xml  \&lt;br /&gt;
          ./scores/eval/dtw_br-evalterms.stdlist.xml&lt;br /&gt;
&lt;br /&gt;
:*'''Warning''' It is fundamental to provide the stdlist.xml input files in the same order used in the previous step.&lt;br /&gt;
&lt;br /&gt;
:*Notice that, in contrast to the case of the development scores, we did not use the -g option, since we do not need the ground truth in this case. &lt;br /&gt;
     &lt;br /&gt;
&amp;lt;big&amp;gt;'''STEP3'''&amp;lt;/big&amp;gt; Train the fusion parameters using the development scores and apply them to the evaluation scores &lt;br /&gt;
&lt;br /&gt;
* Type the following command:&lt;br /&gt;
&lt;br /&gt;
          ./bin/fusion.sh DEV_SCORES_4FUSION.txt EVAL_SCORES_4FUSION.txt  queries_in_data.ref&lt;br /&gt;
&lt;br /&gt;
:* '''Warning''' Before running the script, you will have to change the MATLAB_BIN variable to be the path where your Matlab binary is actually installed.&lt;br /&gt;
&lt;br /&gt;
:* '''Warning''' The Bosaris toolkit needs to be installed. Change the BOSARIS variable to the path where the Bosaris toolkit is installed.&lt;br /&gt;
&lt;br /&gt;
:* '''Warning''' The script contains 4 internal variables that are specific to the Mediaeval SWS2013 evaluation: &lt;br /&gt;
::* The cost parameters P_TARGET, C_MISS, C_FA &lt;br /&gt;
&lt;br /&gt;
::*  TSPEECH - The total duration (in seconds) of the data collection &lt;br /&gt;
&lt;br /&gt;
:* '''Warning''' The call to the fusion script can take several minutes. &lt;br /&gt;
&lt;br /&gt;
:* This call learns the calibration and fusion parameters from DEV_SCORES_4FUSION.txt (which includes the ground-truth information) and applies them to both the development and the evaluation scores. It produces 3 output files:&lt;br /&gt;
&lt;br /&gt;
::'''1.''' dev_fusion.scores - file containing the development well-calibrated fusion scores&lt;br /&gt;
::'''2.''' eval_fusion.scores - file containing the evaluation well-calibrated fusion scores&lt;br /&gt;
::'''3.''' fuse_params.txt - file containing the fusion parameters &lt;br /&gt;
&lt;br /&gt;
::The format of the score output files is as follows (one row per candidate detection):&lt;br /&gt;
&lt;br /&gt;
              &amp;lt;query_id&amp;gt; &amp;lt;file_id&amp;gt; &amp;lt;start_time&amp;gt; &amp;lt;duration&amp;gt; &amp;lt;score&amp;gt; &amp;lt;decision&amp;gt;&lt;br /&gt;
::The &amp;lt;decision&amp;gt; field is 0 or 1 depending on whether the score is below or above the minimum-cost Bayes optimal threshold (see the Interspeech paper for details).&lt;br /&gt;
&lt;br /&gt;
* To convert the score files to the SWS2013 stdslist.xml format type the following commands:&lt;br /&gt;
&lt;br /&gt;
          ./bin/raw2stdslist.sh dev_fusion.scores \&lt;br /&gt;
          ./scoring_atwv_sws2013/sws2013_dev/sws2013_dev.tlist.xml &amp;gt; fusion-devterms.stdlist.xml&lt;br /&gt;
&lt;br /&gt;
          ./bin/raw2stdslist.sh eval_fusion.scores \&lt;br /&gt;
          ./scoring_atwv_sws2013/sws2013_eval/sws2013_eval.tlist.xml &amp;gt; fusion-evalterms.stdlist.xml&lt;br /&gt;
  &lt;br /&gt;
:* This script can take a third parameter, a threshold value, to apply a decision threshold different from the optimal Bayes one.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;big&amp;gt;'''Reference results'''&amp;lt;/big&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* In the folder ./sample_results/ you can find each of the intermediate files produced in this example&lt;br /&gt;
&lt;br /&gt;
* The TWV results obtained with the sample score files provided for the Mediaeval SWS2013 task, for each of the individual systems and for the fusion system obtained following this example, are:&lt;br /&gt;
&lt;br /&gt;
{| border=1 cellspacing=0 align=center cellpadding=5px width=50%&lt;br /&gt;
  |+ '''Reference results for Mediaeval SWS 2013''' &lt;br /&gt;
|-&lt;br /&gt;
  ! !! dev (mtwv/atwv) !! eval  (mtwv/atwv) &lt;br /&gt;
|-&lt;br /&gt;
  |   akws-br   ||    0.1571 / 0.1408 || 0.1441  / 0.1271 &lt;br /&gt;
|-&lt;br /&gt;
  |   dtw-br   ||   0.2066  / 0.2012 ||  0.1654  /  0.1581 &lt;br /&gt;
|-&lt;br /&gt;
  | fusion || 0.2731  / 0.2713 || &lt;br /&gt;
|}&lt;/div&gt;</summary>
		<author><name>Alberto</name></author>
	</entry>
	<entry>
		<id>https://www.hlt.inesc-id.pt/wiki/index.php?title=STDfusion&amp;diff=7153</id>
		<title>STDfusion</title>
		<link rel="alternate" type="text/html" href="https://www.hlt.inesc-id.pt/wiki/index.php?title=STDfusion&amp;diff=7153"/>
		<updated>2013-10-16T23:13:46Z</updated>

		<summary type="html">&lt;p&gt;Alberto: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== Introduction ==&lt;br /&gt;
&lt;br /&gt;
* In this wiki we make available the necessary code and the basic instructions to carry out discriminative calibration and fusion of heterogeneous spoken term detection (STD) systems as described in:&lt;br /&gt;
&lt;br /&gt;
           A. Abad, L. J. Rodriguez Fuentes, M. Penagarikano, A. Varona, M. Diez, and G. Bordel. &lt;br /&gt;
           On the Calibration and Fusion of Heterogeneous Spoken Term Detection Systems. &lt;br /&gt;
           In Interspeech 2013, August 25-29 2013&lt;br /&gt;
&lt;br /&gt;
* The proposed method is the result of a collaboration between the L2F and the GTTS groups in the context of their activities in the Mediaeval Spoken Web Search (SWS) task. The method has been successfully tested in the SWS2012 and SWS2013 tasks. &lt;br /&gt;
&lt;br /&gt;
* Unfortunately, we did not have time to consolidate (and clean up) the code, and what we are making available is very much the same set of scripts that we developed during the first experiments. Thus, the code is written in different pieces in different languages (bash, Perl, Matlab, etc.) and it has some external dependencies. Nevertheless, we expect that it can still be useful for researchers who want to try to fuse their STD systems.&lt;br /&gt;
&lt;br /&gt;
* The package can be downloaded from here.&lt;br /&gt;
&lt;br /&gt;
== Package contents ==&lt;br /&gt;
&lt;br /&gt;
* The package contains the following files and directories:&lt;br /&gt;
&lt;br /&gt;
         README.txt - Contains pretty much the same information as this wiki page&lt;br /&gt;
         ./bin/ - Contains the different scripts necessary for calibration and fusion&lt;br /&gt;
         ./bin/PrepareForFusion.sh  - Main script that normalizes, aligns, hypothesizes missing scores and creates the ground truth for the input systems&lt;br /&gt;
         ./bin/align_score_files.pl - Used by ./bin/PrepareForFusion.sh&lt;br /&gt;
         ./bin/create_groundtruth_centerdistance.pl - Used by ./bin/PrepareForFusion.sh&lt;br /&gt;
         ./bin/heuristicScoring.pl - Used by ./bin/PrepareForFusion.sh &lt;br /&gt;
         ./bin/fusion.sh  - This script takes the output of ./bin/PrepareForFusion.sh to learn the fusion parameters and apply them to the evaluation scores&lt;br /&gt;
         ./bin/raw2stdslist.sh - Converts scores from a raw (internal) format to the stdlist xml format of the SWS evaluation&lt;br /&gt;
         ./scoring_atwv_sws2013/ - Contains the Mediaeval 2013 atwv scoring package&lt;br /&gt;
         ./scores/ - Contains sample dev (./scores/dev/akws_br-devterms.stdlist.xml, ./scores/dev/dtw_br-devterms.stdlist.xml) and eval scores (./scores/eval/akws_br-evalterms.stdlist.xml, ./scores/eval/dtw_br-evalterms.stdlist.xml)&lt;br /&gt;
         ./sample_results/ - Contains the sample (intermediate) results that are obtained if you run the  instructions bellow&lt;br /&gt;
&lt;br /&gt;
== Example of use ==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;big&amp;gt;'''STEP1'''&amp;lt;/big&amp;gt; Prepare the development scores for learning the calibration/fusion&lt;br /&gt;
&lt;br /&gt;
* Type the following command:&lt;br /&gt;
&lt;br /&gt;
          ./bin/PrepareForFusion.sh \&lt;br /&gt;
          -q ./scoring_atwv_sws2013/sws2013_dev/sws2013_dev.tlist.xml \&lt;br /&gt;
          -r ./scoring_atwv_sws2013/sws2013_dev/sws2013_dev.rttm \&lt;br /&gt;
          -o DEV_SCORES_4FUSION.txt \&lt;br /&gt;
          -t queries_in_data.ref \&lt;br /&gt;
          -g -z qnorm -m 1 -n 1 \&lt;br /&gt;
          ./scores/dev/akws_br-devterms.stdlist.xml \&lt;br /&gt;
          ./scores/dev/dtw_br-devterms.stdlist.xml&lt;br /&gt;
&lt;br /&gt;
:* '''Warning''' Before running the script, set the MATLAB_BIN variable to the path where your Matlab binary is installed.&lt;br /&gt;
&lt;br /&gt;
:* This process will generate two output files:&lt;br /&gt;
&lt;br /&gt;
::'''1.''' DEV_SCORES_4FUSION.txt - Contains the scores ready to be used in the subsequent fusion stage, in the following general format (one row per detection candidate):&lt;br /&gt;
&lt;br /&gt;
                  &amp;lt;query_id&amp;gt; &amp;lt;file_id&amp;gt; &amp;lt;start_time&amp;gt; &amp;lt;duration&amp;gt; &amp;lt;sc1&amp;gt; &amp;lt;sc2&amp;gt; ... &amp;lt;scN&amp;gt; &amp;lt;label&amp;gt;&lt;br /&gt;
      &lt;br /&gt;
::* Notice that now all systems produce a score for all candidate detections (the score matrix is full). &lt;br /&gt;
&lt;br /&gt;
::* If the -g option is selected (as in this case), the &amp;lt;label&amp;gt; column contains 0s and 1s for the false and true trials respectively (derived from the rttm file). If the -g option is not selected, the last column simply contains 1s.&lt;br /&gt;
&lt;br /&gt;
::'''2.''' queries_in_data.ref - This second (optional) output file contains the number of times each query appears in the collection data; it is used later in the fusion stage.&lt;br /&gt;
&lt;br /&gt;
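::* The per-row layout above can be checked with a quick tally. The following is a hypothetical helper, not part of the package, assuming the whitespace-separated format shown above (query_id first, label last); the sample rows are made up and stand in for DEV_SCORES_4FUSION.txt:&lt;br /&gt;

```shell
# Tally candidate detections and true trials per query from a fused-score
# file (columns: query_id file_id start_time duration sc1 .. scN label).
# The sample rows below are invented, standing in for DEV_SCORES_4FUSION.txt.
awk '{ total[$1]++; if ($NF == 1) hits[$1]++ }
     END { for (q in total) print q, total[q], hits[q] + 0 }' <<'EOF'
query_0001 file_12 12.34 0.52 1.20 0.80 1
query_0001 file_07 45.10 0.48 -0.30 0.10 0
query_0002 file_03 3.21 0.60 0.75 0.40 1
EOF
```

::* For files written without the -g option the last column is always 1, so the two counts coincide.&lt;br /&gt;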
* You can see the general usage of this script by calling it without arguments:&lt;br /&gt;
  &lt;br /&gt;
       Usage: PrepareForFusion.sh -q &amp;lt;tlistxml&amp;gt; -r &amp;lt;rttm&amp;gt; -o &amp;lt;outputfile&amp;gt; [opts] &amp;lt;stdlistfile1&amp;gt; [stdlistfile2] [stdlistfile3]  ... [stdlistfileN] &lt;br /&gt;
&lt;br /&gt;
        &amp;lt;stdlistfile*&amp;gt; input score stdlist file in the SWS2012 format (*.stdlist.xml)        | - Required argument (at least 1)&lt;br /&gt;
        -q &amp;lt;tlistxml&amp;gt; termlist file in the SWS2012 format (*.tlist.xml)                      | - Required parameter&lt;br /&gt;
        -r &amp;lt;rttm&amp;gt; rttm file in the SWS2012 format (*.rttm)                                   | - Required parameter&lt;br /&gt;
        -o &amp;lt;outputfile&amp;gt; output file name                                                     | - Required parameter&lt;br /&gt;
        -z &amp;lt;value&amp;gt; score z-norm type (none, qnorm, fnorm, qfnorm, fqnorm)                    | - Default: none&lt;br /&gt;
        -g add ground-truth information to the outputfile&lt;br /&gt;
        -t &amp;lt;filename&amp;gt; saves the number of true terms in the reference per query (implies -g)&lt;br /&gt;
        -m &amp;lt;value&amp;gt; apply majority voting fusion with &amp;lt;value&amp;gt; minimum number of votes         | - Default: 1&lt;br /&gt;
        -n &amp;lt;value&amp;gt; method for creating default scores (0: average of the other detections (MV approach); 1: min per query; 2: global min; 3: histogram based)  | - Default: 0&lt;br /&gt;
        -d debug mode, do not remove auxiliary files stored in /tmp/tmpdir.$$&lt;br /&gt;
        -h help &lt;br /&gt;
&lt;br /&gt;
        NOTE: Requires Matlab (or Octave) and Perl. It also depends on the Perl scripts align_score_files.pl, heuristicScoring.pl and create_groundtruth_centerdistance.pl, &lt;br /&gt;
           which should be located in the same folder as the main script&lt;br /&gt;
&lt;br /&gt;
&amp;lt;big&amp;gt;'''STEP2'''&amp;lt;/big&amp;gt; Prepare the evaluation scores for fusion&lt;br /&gt;
&lt;br /&gt;
* Type  the following command:&lt;br /&gt;
&lt;br /&gt;
          ./bin/PrepareForFusion.sh \&lt;br /&gt;
          -q ./scoring_atwv_sws2013/sws2013_eval/sws2013_eval.tlist.xml \&lt;br /&gt;
          -r ./scoring_atwv_sws2013/sws2013_eval/sws2013_eval.rttm \&lt;br /&gt;
          -o EVAL_SCORES_4FUSION.txt \&lt;br /&gt;
          -z qnorm -m 1 -n 1 \&lt;br /&gt;
          ./scores/eval/akws_br-evalterms.stdlist.xml  \&lt;br /&gt;
          ./scores/eval/dtw_br-evalterms.stdlist.xml&lt;br /&gt;
&lt;br /&gt;
:* '''Warning''' It is essential to provide the stdlist.xml input files in the same order used in the previous step.&lt;br /&gt;
&lt;br /&gt;
:* Notice that, in contrast to the case of the development scores, we did not use the -g option, since we do not need the ground truth in this case. &lt;br /&gt;
     &lt;br /&gt;
&amp;lt;big&amp;gt;'''STEP3'''&amp;lt;/big&amp;gt; Train the fusion parameters using the development scores and apply them to the evaluation scores &lt;br /&gt;
&lt;br /&gt;
* Type the following command:&lt;br /&gt;
&lt;br /&gt;
          ./bin/fusion.sh DEV_SCORES_4FUSION.txt EVAL_SCORES_4FUSION.txt  queries_in_data.ref&lt;br /&gt;
&lt;br /&gt;
:* '''Warning''' Before running the script, set the MATLAB_BIN variable to the path where your Matlab binary is installed.&lt;br /&gt;
&lt;br /&gt;
:* '''Warning''' The Bosaris toolkit needs to be installed. Change the variable BOSARIS to the path where the Bosaris toolkit is installed.&lt;br /&gt;
&lt;br /&gt;
:* '''Warning''' The script contains 4 internal variables that are specific to the Mediaeval SWS2013 evaluation: &lt;br /&gt;
::* The cost parameters P_TARGET, C_MISS, C_FA &lt;br /&gt;
&lt;br /&gt;
::*  TSPEECH - The total duration (in seconds) of the data collection &lt;br /&gt;
&lt;br /&gt;
:* '''Warning''' The call to the fusion script can take several minutes. &lt;br /&gt;
&lt;br /&gt;
:* This call learns from DEV_SCORES_4FUSION.txt (with ground-truth information) the calibration and fusion parameters, which are applied both to the development and to the evaluation scores. It produces three output files:&lt;br /&gt;
&lt;br /&gt;
::'''1.''' dev_fusion.scores - file containing the well-calibrated development fusion scores&lt;br /&gt;
::'''2.''' eval_fusion.scores - file containing the well-calibrated evaluation fusion scores&lt;br /&gt;
::'''3.''' fuse_params.txt - file containing the fusion parameters &lt;br /&gt;
&lt;br /&gt;
::The format of the score output files is as follows (one row per candidate detection):&lt;br /&gt;
&lt;br /&gt;
              &amp;lt;query_id&amp;gt; &amp;lt;file_id&amp;gt; &amp;lt;start_time&amp;gt; &amp;lt;duration&amp;gt; &amp;lt;score&amp;gt; &amp;lt;decision&amp;gt;&lt;br /&gt;
::The &amp;lt;decision&amp;gt; field is 0 or 1 depending on whether the score is below or above the minimum-cost Bayes optimal threshold (see the Interspeech paper for details).&lt;br /&gt;
&lt;br /&gt;
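::* For reference, on well-calibrated log-likelihood-ratio scores the minimum-cost Bayes threshold follows directly from the cost parameters mentioned above. A hypothetical sketch (the numeric values here are illustrative only, not the ones hard-coded in ./bin/fusion.sh):&lt;br /&gt;

```shell
# theta = log( C_FA * (1 - P_TARGET) / (C_MISS * P_TARGET) )
# Scores above theta receive decision 1, scores below it decision 0.
# Illustrative parameter values; use the ones actually set inside fusion.sh.
awk 'BEGIN { p = 0.0001; c_miss = 1; c_fa = 0.1;
             printf "theta = %.4f\n", log(c_fa * (1 - p) / (c_miss * p)) }'
```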
* To convert the score files to the SWS2013 stdslist.xml format type the following commands:&lt;br /&gt;
&lt;br /&gt;
          ./bin/raw2stdslist.sh dev_fusion.scores \&lt;br /&gt;
          ./scoring_atwv_sws2013/sws2013_dev/sws2013_dev.tlist.xml &amp;gt; fusion-devterms.stdlist.xml&lt;br /&gt;
&lt;br /&gt;
          ./bin/raw2stdslist.sh eval_fusion.scores \&lt;br /&gt;
          ./scoring_atwv_sws2013/sws2013_eval/sws2013_eval.tlist.xml &amp;gt; fusion-evalterms.stdlist.xml&lt;br /&gt;
  &lt;br /&gt;
:* This script accepts an optional third parameter, a threshold value, in order to apply a decision threshold different from the optimal Bayes one.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;big&amp;gt;'''Reference results'''&amp;lt;/big&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* In the folder ./sample_results/ you can find each of the intermediate files produced in this example.&lt;br /&gt;
&lt;br /&gt;
* The TWV results obtained with the provided sample score files for the Mediaeval SWS2013 task, for each of the individual systems and for the fusion system obtained by following this example, are:&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
{| border=1&lt;br /&gt;
  |+ Reference results for Mediaeval SWS 2013 &lt;br /&gt;
|-&lt;br /&gt;
  ! !! dev (mtwv/atwv) !! eval  (mtwv/atwv) &lt;br /&gt;
|-&lt;br /&gt;
  |   akws-br   ||    0.1571 / 0.1408 || 0.1441  / 0.1271 &lt;br /&gt;
|-&lt;br /&gt;
  |   dtw-br   ||   0.2066  / 0.2012 ||  0.1654  /  0.1581 &lt;br /&gt;
|-&lt;br /&gt;
  | fusion || 0.2731  / 0.2713 || &lt;br /&gt;
|}&lt;/div&gt;</summary>
		<author><name>Alberto</name></author>
	</entry>
	<entry>
		<id>https://www.hlt.inesc-id.pt/wiki/index.php?title=STDfusion&amp;diff=7152</id>
		<title>STDfusion</title>
		<link rel="alternate" type="text/html" href="https://www.hlt.inesc-id.pt/wiki/index.php?title=STDfusion&amp;diff=7152"/>
		<updated>2013-10-16T23:12:12Z</updated>

		<summary type="html">&lt;p&gt;Alberto: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== Introduction ==&lt;br /&gt;
&lt;br /&gt;
* In this wiki we make available the necessary code and the basic instructions to carry out discriminative calibration and fusion of heterogeneous spoken term detection (STD) systems as described in:&lt;br /&gt;
&lt;br /&gt;
           A. Abad, L. J. Rodriguez Fuentes, M. Penagarikano, A. Varona, M. Diez, and G. Bordel. &lt;br /&gt;
           On the Calibration and Fusion of Heterogeneous Spoken Term Detection Systems. &lt;br /&gt;
           In Interspeech 2013, August 25-29 2013&lt;br /&gt;
&lt;br /&gt;
* The proposed method is the result of the collaboration between the L2F and the GTTS groups in the context of their activities in the Mediaeval Spoken Web Search (SWS) task. The method has been successfully tested in the SWS2012 and SWS2013 tasks. &lt;br /&gt;
&lt;br /&gt;
* Unfortunately, we did not have time to consolidate (and clean up) the code; what we are making available is essentially the set of scripts that we developed during the first experiments. The code is therefore split into different pieces in different languages (bash, Perl, Matlab, etc.) and has some external dependencies. Nevertheless, we expect it to still be useful for researchers who want to fuse their own STD systems.&lt;br /&gt;
&lt;br /&gt;
* The package can be downloaded from here.&lt;br /&gt;
&lt;br /&gt;
== Package contents ==&lt;br /&gt;
&lt;br /&gt;
* The package contains the following files and directories:&lt;br /&gt;
&lt;br /&gt;
         README.txt - Contains essentially the same information as this wiki page&lt;br /&gt;
         ./bin/ - Contains the different scripts necessary for calibration and fusion&lt;br /&gt;
         ./bin/PrepareForFusion.sh  - Main script that normalizes, aligns, hypothesizes missing scores and creates the ground truth for the input systems&lt;br /&gt;
         ./bin/align_score_files.pl - Used by ./bin/PrepareForFusion.sh&lt;br /&gt;
         ./bin/create_groundtruth_centerdistance.pl - Used by ./bin/PrepareForFusion.sh&lt;br /&gt;
         ./bin/heuristicScoring.pl - Used by ./bin/PrepareForFusion.sh &lt;br /&gt;
         ./bin/fusion.sh  - This script takes the output of ./bin/PrepareForFusion.sh to learn the fusion parameters and apply them to the evaluation scores&lt;br /&gt;
         ./bin/raw2stdslist.sh - Converts scores from a raw (internal) format to the stdlist xml format of the SWS evaluation&lt;br /&gt;
         ./scoring_atwv_sws2013/ - Contains the Mediaeval 2013 atwv scoring package&lt;br /&gt;
         ./scores/ - Contains sample dev (./scores/dev/akws_br-devterms.stdlist.xml, ./scores/dev/dtw_br-devterms.stdlist.xml) and eval scores (./scores/eval/akws_br-evalterms.stdlist.xml, ./scores/eval/dtw_br-evalterms.stdlist.xml)&lt;br /&gt;
         ./sample_results/ - Contains the sample (intermediate) results that are obtained if you run the instructions below&lt;br /&gt;
&lt;br /&gt;
== Example of use ==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;big&amp;gt;'''STEP1'''&amp;lt;/big&amp;gt; Prepare the development scores for learning the calibration/fusion&lt;br /&gt;
&lt;br /&gt;
* Type the following command:&lt;br /&gt;
&lt;br /&gt;
          ./bin/PrepareForFusion.sh \&lt;br /&gt;
          -q ./scoring_atwv_sws2013/sws2013_dev/sws2013_dev.tlist.xml \&lt;br /&gt;
          -r ./scoring_atwv_sws2013/sws2013_dev/sws2013_dev.rttm \&lt;br /&gt;
          -o DEV_SCORES_4FUSION.txt \&lt;br /&gt;
          -t queries_in_data.ref \&lt;br /&gt;
          -g -z qnorm -m 1 -n 1 \&lt;br /&gt;
          ./scores/dev/akws_br-devterms.stdlist.xml \&lt;br /&gt;
          ./scores/dev/dtw_br-devterms.stdlist.xml&lt;br /&gt;
&lt;br /&gt;
:* '''Warning''' Before running the script, set the MATLAB_BIN variable to the path where your Matlab binary is installed.&lt;br /&gt;
&lt;br /&gt;
:* This process will generate two output files:&lt;br /&gt;
&lt;br /&gt;
::'''1.''' DEV_SCORES_4FUSION.txt - Contains the scores ready to be used in the subsequent fusion stage, in the following general format (one row per detection candidate):&lt;br /&gt;
&lt;br /&gt;
                  &amp;lt;query_id&amp;gt; &amp;lt;file_id&amp;gt; &amp;lt;start_time&amp;gt; &amp;lt;duration&amp;gt; &amp;lt;sc1&amp;gt; &amp;lt;sc2&amp;gt; ... &amp;lt;scN&amp;gt; &amp;lt;label&amp;gt;&lt;br /&gt;
      &lt;br /&gt;
::* Notice that now all systems produce a score for all candidate detections (the score matrix is full). &lt;br /&gt;
&lt;br /&gt;
::* If the -g option is selected (as in this case), the &amp;lt;label&amp;gt; column contains 0s and 1s for the false and true trials respectively (derived from the rttm file). If the -g option is not selected, the last column simply contains 1s.&lt;br /&gt;
&lt;br /&gt;
::'''2.''' queries_in_data.ref - This second (optional) output file contains the number of times each query appears in the collection data; it is used later in the fusion stage.&lt;br /&gt;
&lt;br /&gt;
* You can see the general usage of this script by calling it without arguments:&lt;br /&gt;
  &lt;br /&gt;
       Usage: PrepareForFusion.sh -q &amp;lt;tlistxml&amp;gt; -r &amp;lt;rttm&amp;gt; -o &amp;lt;outputfile&amp;gt; [opts] &amp;lt;stdlistfile1&amp;gt; [stdlistfile2] [stdlistfile3]  ... [stdlistfileN] &lt;br /&gt;
&lt;br /&gt;
        &amp;lt;stdlistfile*&amp;gt; input score stdlist file in the SWS2012 format (*.stdlist.xml)        | - Required argument (at least 1)&lt;br /&gt;
        -q &amp;lt;tlistxml&amp;gt; termlist file in the SWS2012 format (*.tlist.xml)                      | - Required parameter&lt;br /&gt;
        -r &amp;lt;rttm&amp;gt; rttm file in the SWS2012 format (*.rttm)                                   | - Required parameter&lt;br /&gt;
        -o &amp;lt;outputfile&amp;gt; output file name                                                     | - Required parameter&lt;br /&gt;
        -z &amp;lt;value&amp;gt; score z-norm type (none, qnorm, fnorm, qfnorm, fqnorm)                    | - Default: none&lt;br /&gt;
        -g add ground-truth information to the outputfile&lt;br /&gt;
        -t &amp;lt;filename&amp;gt; saves the number of true terms in the reference per query (implies -g)&lt;br /&gt;
        -m &amp;lt;value&amp;gt; apply majority voting fusion with &amp;lt;value&amp;gt; minimum number of votes         | - Default: 1&lt;br /&gt;
        -n &amp;lt;value&amp;gt; method for creating default scores (0: average of the other detections (MV approach); 1: min per query; 2: global min; 3: histogram based)  | - Default: 0&lt;br /&gt;
        -d debug mode, do not remove auxiliary files stored in /tmp/tmpdir.$$&lt;br /&gt;
        -h help &lt;br /&gt;
&lt;br /&gt;
        NOTE: Requires Matlab (or Octave) and Perl. It also depends on the Perl scripts align_score_files.pl, heuristicScoring.pl and create_groundtruth_centerdistance.pl, &lt;br /&gt;
           which should be located in the same folder as the main script&lt;br /&gt;
&lt;br /&gt;
&amp;lt;big&amp;gt;'''STEP2'''&amp;lt;/big&amp;gt; Prepare the evaluation scores for fusion&lt;br /&gt;
&lt;br /&gt;
* Type  the following command:&lt;br /&gt;
&lt;br /&gt;
          ./bin/PrepareForFusion.sh \&lt;br /&gt;
          -q ./scoring_atwv_sws2013/sws2013_eval/sws2013_eval.tlist.xml \&lt;br /&gt;
          -r ./scoring_atwv_sws2013/sws2013_eval/sws2013_eval.rttm \&lt;br /&gt;
          -o EVAL_SCORES_4FUSION.txt \&lt;br /&gt;
          -z qnorm -m 1 -n 1 \&lt;br /&gt;
          ./scores/eval/akws_br-evalterms.stdlist.xml  \&lt;br /&gt;
          ./scores/eval/dtw_br-evalterms.stdlist.xml&lt;br /&gt;
&lt;br /&gt;
:* '''Warning''' It is essential to provide the stdlist.xml input files in the same order used in the previous step.&lt;br /&gt;
&lt;br /&gt;
:* Notice that, in contrast to the case of the development scores, we did not use the -g option, since we do not need the ground truth in this case. &lt;br /&gt;
     &lt;br /&gt;
&amp;lt;big&amp;gt;'''STEP3'''&amp;lt;/big&amp;gt; Train the fusion parameters using the development scores and apply them to the evaluation scores &lt;br /&gt;
&lt;br /&gt;
* Type the following command:&lt;br /&gt;
&lt;br /&gt;
          ./bin/fusion.sh DEV_SCORES_4FUSION.txt EVAL_SCORES_4FUSION.txt  queries_in_data.ref&lt;br /&gt;
&lt;br /&gt;
:* '''Warning''' Before running the script, set the MATLAB_BIN variable to the path where your Matlab binary is installed.&lt;br /&gt;
&lt;br /&gt;
:* '''Warning''' The Bosaris toolkit needs to be installed. Change the variable BOSARIS to the path where the Bosaris toolkit is installed.&lt;br /&gt;
&lt;br /&gt;
:* '''Warning''' The script contains 4 internal variables that are specific to the Mediaeval SWS2013 evaluation: &lt;br /&gt;
::* The cost parameters P_TARGET, C_MISS, C_FA &lt;br /&gt;
&lt;br /&gt;
::*  TSPEECH - The total duration (in seconds) of the data collection &lt;br /&gt;
&lt;br /&gt;
:* '''Warning''' The call to the fusion script can take several minutes. &lt;br /&gt;
&lt;br /&gt;
:* This call learns from DEV_SCORES_4FUSION.txt (with ground-truth information) the calibration and fusion parameters, which are applied both to the development and to the evaluation scores. It produces three output files:&lt;br /&gt;
&lt;br /&gt;
::'''1.''' dev_fusion.scores - file containing the well-calibrated development fusion scores&lt;br /&gt;
::'''2.''' eval_fusion.scores - file containing the well-calibrated evaluation fusion scores&lt;br /&gt;
::'''3.''' fuse_params.txt - file containing the fusion parameters &lt;br /&gt;
&lt;br /&gt;
::The format of the score output files is as follows (one row per candidate detection):&lt;br /&gt;
&lt;br /&gt;
              &amp;lt;query_id&amp;gt; &amp;lt;file_id&amp;gt; &amp;lt;start_time&amp;gt; &amp;lt;duration&amp;gt; &amp;lt;score&amp;gt; &amp;lt;decision&amp;gt;&lt;br /&gt;
::The &amp;lt;decision&amp;gt; field is 0 or 1 depending on whether the score is below or above the minimum-cost Bayes optimal threshold (see the Interspeech paper for details).&lt;br /&gt;
&lt;br /&gt;
* To convert the score files to the SWS2013 stdslist.xml format type the following commands:&lt;br /&gt;
&lt;br /&gt;
          ./bin/raw2stdslist.sh dev_fusion.scores \&lt;br /&gt;
          ./scoring_atwv_sws2013/sws2013_dev/sws2013_dev.tlist.xml &amp;gt; fusion-devterms.stdlist.xml&lt;br /&gt;
&lt;br /&gt;
          ./bin/raw2stdslist.sh eval_fusion.scores \&lt;br /&gt;
          ./scoring_atwv_sws2013/sws2013_eval/sws2013_eval.tlist.xml &amp;gt; fusion-evalterms.stdlist.xml&lt;br /&gt;
  &lt;br /&gt;
:* This script accepts an optional third parameter, a threshold value, in order to apply a decision threshold different from the optimal Bayes one.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;big&amp;gt;'''Reference results'''&amp;lt;/big&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* In the folder ./sample_results/ you can find each of the intermediate files produced in this example.&lt;br /&gt;
&lt;br /&gt;
* The TWV results obtained with the provided sample score files for the Mediaeval SWS2013 task, for each of the individual systems and for the fusion system obtained by following this example, are:&lt;br /&gt;
&lt;br /&gt;
{| border=1&lt;br /&gt;
  |+ Reference sample results at Mediaeval 2013 SWS &lt;br /&gt;
|-&lt;br /&gt;
  ! !! dev (mtwv/atwv) !! eval  (mtwv/atwv) &lt;br /&gt;
|-&lt;br /&gt;
  |   akws-br   ||    0.1571 / 0.1408 || 0.1441  / 0.1271 &lt;br /&gt;
|-&lt;br /&gt;
  |   dtw-br   ||   0.2066  / 0.2012 ||  0.1654  /  0.1581 &lt;br /&gt;
|-&lt;br /&gt;
  | fusion || 0.2731  / 0.2713 || &lt;br /&gt;
|}&lt;/div&gt;</summary>
		<author><name>Alberto</name></author>
	</entry>
	<entry>
		<id>https://www.hlt.inesc-id.pt/wiki/index.php?title=STDfusion&amp;diff=7151</id>
		<title>STDfusion</title>
		<link rel="alternate" type="text/html" href="https://www.hlt.inesc-id.pt/wiki/index.php?title=STDfusion&amp;diff=7151"/>
		<updated>2013-10-16T23:11:37Z</updated>

		<summary type="html">&lt;p&gt;Alberto: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== Introduction ==&lt;br /&gt;
&lt;br /&gt;
* In this wiki we make available the necessary code and the basic instructions to carry out discriminative calibration and fusion of heterogeneous spoken term detection (STD) systems as described in:&lt;br /&gt;
&lt;br /&gt;
           A. Abad, L. J. Rodriguez Fuentes, M. Penagarikano, A. Varona, M. Diez, and G. Bordel. &lt;br /&gt;
           On the Calibration and Fusion of Heterogeneous Spoken Term Detection Systems. &lt;br /&gt;
           In Interspeech 2013, August 25-29 2013&lt;br /&gt;
&lt;br /&gt;
* The proposed method is the result of the collaboration between the L2F and the GTTS groups in the context of their activities in the Mediaeval Spoken Web Search (SWS) task. The method has been successfully tested in the SWS2012 and SWS2013 tasks. &lt;br /&gt;
&lt;br /&gt;
* Unfortunately, we did not have time to consolidate (and clean up) the code; what we are making available is essentially the set of scripts that we developed during the first experiments. The code is therefore split into different pieces in different languages (bash, Perl, Matlab, etc.) and has some external dependencies. Nevertheless, we expect it to still be useful for researchers who want to fuse their own STD systems.&lt;br /&gt;
&lt;br /&gt;
* The package can be downloaded from here.&lt;br /&gt;
&lt;br /&gt;
== Package contents ==&lt;br /&gt;
&lt;br /&gt;
* The package contains the following files and directories:&lt;br /&gt;
&lt;br /&gt;
         README.txt - Contains essentially the same information as this wiki page&lt;br /&gt;
         ./bin/ - Contains the different scripts necessary for calibration and fusion&lt;br /&gt;
         ./bin/PrepareForFusion.sh  - Main script that normalizes, aligns, hypothesizes missing scores and creates the ground truth for the input systems&lt;br /&gt;
         ./bin/align_score_files.pl - Used by ./bin/PrepareForFusion.sh&lt;br /&gt;
         ./bin/create_groundtruth_centerdistance.pl - Used by ./bin/PrepareForFusion.sh&lt;br /&gt;
         ./bin/heuristicScoring.pl - Used by ./bin/PrepareForFusion.sh &lt;br /&gt;
         ./bin/fusion.sh  - This script takes the output of ./bin/PrepareForFusion.sh to learn the fusion parameters and apply them to the evaluation scores&lt;br /&gt;
         ./bin/raw2stdslist.sh - Converts scores from a raw (internal) format to the stdlist xml format of the SWS evaluation&lt;br /&gt;
         ./scoring_atwv_sws2013/ - Contains the Mediaeval 2013 atwv scoring package&lt;br /&gt;
         ./scores/ - Contains sample dev (./scores/dev/akws_br-devterms.stdlist.xml, ./scores/dev/dtw_br-devterms.stdlist.xml) and eval scores (./scores/eval/akws_br-evalterms.stdlist.xml, ./scores/eval/dtw_br-evalterms.stdlist.xml)&lt;br /&gt;
         ./sample_results/ - Contains the sample (intermediate) results that are obtained if you run the instructions below&lt;br /&gt;
&lt;br /&gt;
== Example of use ==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;big&amp;gt;'''STEP1'''&amp;lt;/big&amp;gt; Prepare the development scores for learning the calibration/fusion&lt;br /&gt;
&lt;br /&gt;
* Type the following command:&lt;br /&gt;
&lt;br /&gt;
          ./bin/PrepareForFusion.sh \&lt;br /&gt;
          -q ./scoring_atwv_sws2013/sws2013_dev/sws2013_dev.tlist.xml \&lt;br /&gt;
          -r ./scoring_atwv_sws2013/sws2013_dev/sws2013_dev.rttm \&lt;br /&gt;
          -o DEV_SCORES_4FUSION.txt \&lt;br /&gt;
          -t queries_in_data.ref \&lt;br /&gt;
          -g -z qnorm -m 1 -n 1 \&lt;br /&gt;
          ./scores/dev/akws_br-devterms.stdlist.xml \&lt;br /&gt;
          ./scores/dev/dtw_br-devterms.stdlist.xml&lt;br /&gt;
&lt;br /&gt;
:* '''Warning''' Before running the script, set the MATLAB_BIN variable to the path where your Matlab binary is installed.&lt;br /&gt;
&lt;br /&gt;
:* This process will generate two output files:&lt;br /&gt;
&lt;br /&gt;
::'''1.''' DEV_SCORES_4FUSION.txt - Contains the scores ready to be used in the subsequent fusion stage, in the following general format (one row per detection candidate):&lt;br /&gt;
&lt;br /&gt;
                  &amp;lt;query_id&amp;gt; &amp;lt;file_id&amp;gt; &amp;lt;start_time&amp;gt; &amp;lt;duration&amp;gt; &amp;lt;sc1&amp;gt; &amp;lt;sc2&amp;gt; ... &amp;lt;scN&amp;gt; &amp;lt;label&amp;gt;&lt;br /&gt;
      &lt;br /&gt;
::* Notice that now all systems produce a score for all candidate detections (the score matrix is full). &lt;br /&gt;
&lt;br /&gt;
::* If the -g option is selected (as in this case), the &amp;lt;label&amp;gt; column contains 0s and 1s for the false and true trials respectively (derived from the rttm file). If the -g option is not selected, the last column simply contains 1s.&lt;br /&gt;
&lt;br /&gt;
::'''2.''' queries_in_data.ref - This second (optional) output file contains the number of times each query appears in the collection data; it is used later in the fusion stage.&lt;br /&gt;
&lt;br /&gt;
* You can see the general usage of this script by calling it without arguments:&lt;br /&gt;
  &lt;br /&gt;
       Usage: PrepareForFusion.sh -q &amp;lt;tlistxml&amp;gt; -r &amp;lt;rttm&amp;gt; -o &amp;lt;outputfile&amp;gt; [opts] &amp;lt;stdlistfile1&amp;gt; [stdlistfile2] [stdlistfile3]  ... [stdlistfileN] &lt;br /&gt;
&lt;br /&gt;
        &amp;lt;stdlistfile*&amp;gt; input score stdlist file in the SWS2012 format (*.stdlist.xml)        | - Required argument (at least 1)&lt;br /&gt;
        -q &amp;lt;tlistxml&amp;gt; termlist file in the SWS2012 format (*.tlist.xml)                      | - Required parameter&lt;br /&gt;
        -r &amp;lt;rttm&amp;gt; rttm file in the SWS2012 format (*.rttm)                                   | - Required parameter&lt;br /&gt;
        -o &amp;lt;outputfile&amp;gt; output file name                                                     | - Required parameter&lt;br /&gt;
        -z &amp;lt;value&amp;gt; score z-norm type (none, qnorm, fnorm, qfnorm, fqnorm)                    | - Default: none&lt;br /&gt;
        -g add ground-truth information to the outputfile&lt;br /&gt;
        -t &amp;lt;filename&amp;gt; saves the number of true terms in the reference per query (implies -g)&lt;br /&gt;
        -m &amp;lt;value&amp;gt; apply majority voting fusion with &amp;lt;value&amp;gt; minimum number of votes         | - Default: 1&lt;br /&gt;
        -n &amp;lt;value&amp;gt; method for creating default scores (0: average of the other detections (MV approach); 1: min per query; 2: global min; 3: histogram based)  | - Default: 0&lt;br /&gt;
        -d debug mode, do not remove auxiliary files stored in /tmp/tmpdir.$$&lt;br /&gt;
        -h help &lt;br /&gt;
&lt;br /&gt;
        NOTE: Requires Matlab (or Octave) and Perl. It also depends on the Perl scripts align_score_files.pl, heuristicScoring.pl and create_groundtruth_centerdistance.pl, &lt;br /&gt;
           which should be located in the same folder as the main script&lt;br /&gt;
&lt;br /&gt;
&amp;lt;big&amp;gt;'''STEP2'''&amp;lt;/big&amp;gt; Prepare the evaluation scores for fusion&lt;br /&gt;
&lt;br /&gt;
* Type  the following command:&lt;br /&gt;
&lt;br /&gt;
          ./bin/PrepareForFusion.sh \&lt;br /&gt;
          -q ./scoring_atwv_sws2013/sws2013_eval/sws2013_eval.tlist.xml \&lt;br /&gt;
          -r ./scoring_atwv_sws2013/sws2013_eval/sws2013_eval.rttm \&lt;br /&gt;
          -o EVAL_SCORES_4FUSION.txt \&lt;br /&gt;
          -z qnorm -m 1 -n 1 \&lt;br /&gt;
          ./scores/eval/akws_br-evalterms.stdlist.xml  \&lt;br /&gt;
          ./scores/eval/dtw_br-evalterms.stdlist.xml&lt;br /&gt;
&lt;br /&gt;
:* '''Warning''' It is essential to provide the stdlist.xml input files in the same order used in the previous step.&lt;br /&gt;
&lt;br /&gt;
:* Notice that, in contrast to the case of the development scores, we did not use the -g option, since we do not need the ground truth in this case. &lt;br /&gt;
     &lt;br /&gt;
&amp;lt;big&amp;gt;'''STEP3'''&amp;lt;/big&amp;gt; Train the fusion parameters using the development scores and apply them to the evaluation scores &lt;br /&gt;
&lt;br /&gt;
* Type the following command:&lt;br /&gt;
&lt;br /&gt;
          ./bin/fusion.sh DEV_SCORES_4FUSION.txt EVAL_SCORES_4FUSION.txt  queries_in_data.ref&lt;br /&gt;
&lt;br /&gt;
:* '''Warning''' Before running the script, set the MATLAB_BIN variable to the path where your Matlab binary is installed.&lt;br /&gt;
&lt;br /&gt;
:* '''Warning''' The Bosaris toolkit needs to be installed. Change the variable BOSARIS to the path where the Bosaris toolkit is installed.&lt;br /&gt;
&lt;br /&gt;
:* '''Warning''' The script contains 4 internal variables that are specific to the Mediaeval SWS2013 evaluation: &lt;br /&gt;
::* The cost parameters P_TARGET, C_MISS, C_FA &lt;br /&gt;
&lt;br /&gt;
::*  TSPEECH - The total duration (in seconds) of the data collection &lt;br /&gt;
&lt;br /&gt;
:* '''Warning''' The call to the fusion script can take several minutes. &lt;br /&gt;
&lt;br /&gt;
:* This call learns from DEV_SCORES_4FUSION.txt (with ground-truth information) the calibration and fusion parameters, which are applied both to the development and to the evaluation scores. It produces three output files:&lt;br /&gt;
&lt;br /&gt;
::'''1.''' dev_fusion.scores - file containing the well-calibrated development fusion scores&lt;br /&gt;
::'''2.''' eval_fusion.scores - file containing the well-calibrated evaluation fusion scores&lt;br /&gt;
::'''3.''' fuse_params.txt - file containing the fusion parameters &lt;br /&gt;
&lt;br /&gt;
::The format of the score output files is as follows (one row per candidate detection):&lt;br /&gt;
&lt;br /&gt;
              &amp;lt;query_id&amp;gt; &amp;lt;file_id&amp;gt; &amp;lt;start_time&amp;gt; &amp;lt;duration&amp;gt; &amp;lt;score&amp;gt; &amp;lt;decision&amp;gt;&lt;br /&gt;
::The &amp;lt;decision&amp;gt; field is 0 or 1 depending on whether the score is below or above the minimum-cost Bayes optimal threshold (see the Interspeech paper for details).&lt;br /&gt;
&lt;br /&gt;
* To convert the score files to the SWS2013 stdslist.xml format type the following commands:&lt;br /&gt;
&lt;br /&gt;
          ./bin/raw2stdslist.sh dev_fusion.scores \&lt;br /&gt;
          ./scoring_atwv_sws2013/sws2013_dev/sws2013_dev.tlist.xml &amp;gt; fusion-devterms.stdlist.xml&lt;br /&gt;
&lt;br /&gt;
          ./bin/raw2stdslist.sh eval_fusion.scores \&lt;br /&gt;
          ./scoring_atwv_sws2013/sws2013_eval/sws2013_eval.tlist.xml &amp;gt; fusion-evalterms.stdlist.xml&lt;br /&gt;
  &lt;br /&gt;
:* This script can take a third parameter, a threshold value, to apply a decision threshold different from the Bayes optimal one.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;big&amp;gt;'''Reference results'''&amp;lt;/big&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* In the folder ./sample_results/ you can find each of the intermediate files produced in this example&lt;br /&gt;
&lt;br /&gt;
* The TWV results obtained with the sample score files provided for the Mediaeval SWS2013 task, for each of the individual systems and for the fusion system obtained following this example, are:&lt;br /&gt;
&lt;br /&gt;
{| border=1&lt;br /&gt;
  |+ TWV &lt;br /&gt;
|-&lt;br /&gt;
  ! !! dev (mtwv/atwv) !! eval  (mtwv/atwv) &lt;br /&gt;
|-&lt;br /&gt;
  |   akws-br   ||    0.1571 / 0.1408 || 0.1441  / 0.1271 &lt;br /&gt;
|-&lt;br /&gt;
  |   dtw-br   ||   0.2066  / 0.2012 ||  0.1654  /  0.1581 &lt;br /&gt;
|-&lt;br /&gt;
  | fusion || 0.2731  / 0.2713 || &lt;br /&gt;
|}&lt;/div&gt;</summary>
		<author><name>Alberto</name></author>
	</entry>
	<entry>
		<id>https://www.hlt.inesc-id.pt/wiki/index.php?title=STDfusion&amp;diff=7150</id>
		<title>STDfusion</title>
		<link rel="alternate" type="text/html" href="https://www.hlt.inesc-id.pt/wiki/index.php?title=STDfusion&amp;diff=7150"/>
		<updated>2013-10-16T22:56:22Z</updated>

		<summary type="html">&lt;p&gt;Alberto: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== Introduction ==&lt;br /&gt;
&lt;br /&gt;
* On this wiki page we make available the code and the basic instructions needed to carry out discriminative calibration and fusion of heterogeneous spoken term detection (STD) systems as described in:&lt;br /&gt;
&lt;br /&gt;
           A. Abad, L. J. Rodriguez Fuentes, M. Penagarikano, A. Varona, M. Diez, and G. Bordel. &lt;br /&gt;
           On the Calibration and Fusion of Heterogeneous Spoken Term Detection Systems. &lt;br /&gt;
           In Interspeech 2013, August 25-29 2013&lt;br /&gt;
&lt;br /&gt;
* The proposed method is the result of the collaboration between the L2F and the GTTS groups in the context of their activities in the Mediaeval Spoken Web Search (SWS) task. The method has been successfully tested in the SWS2012 and SWS2013 tasks. &lt;br /&gt;
&lt;br /&gt;
* Unfortunately, we did not have time to consolidate (and clean up) the code; what we are making available is essentially the same set of scripts that we developed during the first experiments. Thus, the code is split into different pieces in different languages (bash, perl, matlab, etc.) and has some external dependencies. Even so, we expect it can still be useful for researchers who want to fuse their own STD systems.&lt;br /&gt;
&lt;br /&gt;
* The package can be downloaded from here.&lt;br /&gt;
&lt;br /&gt;
== Package contents ==&lt;br /&gt;
&lt;br /&gt;
* The package contains the following files and directories:&lt;br /&gt;
&lt;br /&gt;
         README.txt - Contains pretty much the same information as this wiki page&lt;br /&gt;
         ./bin/ - Contains the different scripts necessary for calibration and fusion&lt;br /&gt;
         ./bin/PrepareForFusion.sh  - Main script that normalizes, aligns, hypothesizes missing scores and creates the ground truth for the input systems&lt;br /&gt;
         ./bin/align_score_files.pl - Used by ./bin/PrepareForFusion.sh&lt;br /&gt;
         ./bin/create_groundtruth_centerdistance.pl - Used by ./bin/PrepareForFusion.sh&lt;br /&gt;
         ./bin/heuristicScoring.pl - Used by ./bin/PrepareForFusion.sh &lt;br /&gt;
         ./bin/fusion.sh  - This script takes the output of ./bin/PrepareForFusion.sh to learn the fusion parameters and apply them to the evaluation scores&lt;br /&gt;
         ./bin/raw2stdslist.sh - Converts scores from a raw (internal) format to the stdlist xml format of the SWS evaluation&lt;br /&gt;
         ./scoring_atwv_sws2013/ - Contains the Mediaeval 2013 atwv scoring package&lt;br /&gt;
         ./scores/ - Contains sample dev (./scores/dev/akws_br-devterms.stdlist.xml, ./scores/dev/dtw_br-devterms.stdlist.xml) and eval scores (./scores/eval/akws_br-evalterms.stdlist.xml, ./scores/eval/dtw_br-evalterms.stdlist.xml)&lt;br /&gt;
         ./sample_results/ - Contains the sample (intermediate) results that are obtained if you run the instructions below&lt;br /&gt;
&lt;br /&gt;
== Example of use ==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;big&amp;gt;'''STEP1'''&amp;lt;/big&amp;gt; Prepare the development scores for learning the calibration/fusion&lt;br /&gt;
&lt;br /&gt;
* Type the following command:&lt;br /&gt;
&lt;br /&gt;
          ./bin/PrepareForFusion.sh \&lt;br /&gt;
          -q ./scoring_atwv_sws2013/sws2013_dev/sws2013_dev.tlist.xml \&lt;br /&gt;
          -r ./scoring_atwv_sws2013/sws2013_dev/sws2013_dev.rttm \&lt;br /&gt;
          -o DEV_SCORES_4FUSION.txt \&lt;br /&gt;
          -t queries_in_data.ref \&lt;br /&gt;
          -g -z qnorm -m 1 -n 1 \&lt;br /&gt;
          ./scores/dev/akws_br-devterms.stdlist.xml \&lt;br /&gt;
          ./scores/dev/dtw_br-devterms.stdlist.xml&lt;br /&gt;
&lt;br /&gt;
:* '''Warning''' Before running the script, you will have to change the MATLAB_BIN variable to be the path where your Matlab binary is actually installed.&lt;br /&gt;
&lt;br /&gt;
:* This process  will generate 2 output files:&lt;br /&gt;
&lt;br /&gt;
::'''1.''' DEV_SCORES_4FUSION.txt - Contains the scores ready for the fusion stage, in the following general format (one row per detection candidate):&lt;br /&gt;
&lt;br /&gt;
                  &amp;lt;query_id&amp;gt; &amp;lt;file_id&amp;gt; &amp;lt;start_time&amp;gt; &amp;lt;duration&amp;gt; &amp;lt;sc1&amp;gt; &amp;lt;sc2&amp;gt; ... &amp;lt;scN&amp;gt; &amp;lt;label&amp;gt;&lt;br /&gt;
      &lt;br /&gt;
::*Notice that now all systems produce a score for all candidate detections (the score matrix is full). &lt;br /&gt;
&lt;br /&gt;
::*  If the -g option is selected (as in this case) the &amp;lt;label&amp;gt; column contains 0s and 1s for the false and true trials respectively (derived from the rttm file). If the -g option is not selected, the last column simply contains 1s.&lt;br /&gt;
&lt;br /&gt;
::'''2.''' queries_in_data.ref - This second (optional) output file contains the number of times each query appears in the collection data; it is used later in the fusion stage.&lt;br /&gt;
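::For illustration, a per-query z-norm is one plausible reading of the qnorm option used above (the actual definition lives in the package's Matlab code and may differ). A minimal Python sketch with made-up scores:&lt;br /&gt;

```python
from collections import defaultdict
import statistics

# Illustrative rows: (query_id, file_id, start_time, duration, score)
rows = [
    ("q001", "f01", 1.2, 0.4, 3.1),
    ("q001", "f02", 7.8, 0.5, 2.9),
    ("q002", "f01", 0.3, 0.4, 0.7),
    ("q002", "f03", 5.5, 0.6, 1.3),
]

# Collect all scores produced for each query
by_query = defaultdict(list)
for q, _, _, _, score in rows:
    by_query[q].append(score)

# Normalize each score to zero mean and unit variance within its query
normed = []
for q, f, start, dur, score in rows:
    mu = statistics.mean(by_query[q])
    sigma = statistics.pstdev(by_query[q]) or 1.0
    normed.append((q, f, start, dur, (score - mu) / sigma))
```

::Per-query normalization of this kind makes scores from heterogeneous systems comparable before they are stacked into the fusion input file.&lt;br /&gt;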
&lt;br /&gt;
*You can check the general usage of this script by calling it without arguments:&lt;br /&gt;
  &lt;br /&gt;
       Usage: PrepareForFusion.sh -q &amp;lt;tlistxml&amp;gt; -r &amp;lt;rttm&amp;gt; -o &amp;lt;outputfile&amp;gt; [opts] &amp;lt;stdlistfile1&amp;gt; [stdlistfile2] [stdlistfile3]  ... [stdlistfileN] &lt;br /&gt;
&lt;br /&gt;
        &amp;lt;stdlistfile*&amp;gt; input score stdlist file in the SWS2012 format (*.stdlist.xml)        | - Required argument (at least 1)&lt;br /&gt;
        -q &amp;lt;tlistxml&amp;gt; termlist file in the SWS2012 format (*.tlist.xml)                      | - Required parameter&lt;br /&gt;
        -r &amp;lt;rttm&amp;gt; rttm file in the SWS2012 format (*.rttm)                                   | - Required parameter&lt;br /&gt;
        -o &amp;lt;outputfile&amp;gt; output file name                                                     | - Required parameter&lt;br /&gt;
        -z &amp;lt;value&amp;gt; score z-norm type (none, qnorm, fnorm, qfnorm, fqnorm)                    | - Default: none&lt;br /&gt;
        -g add ground-truth information to the outputfile&lt;br /&gt;
        -t &amp;lt;filename&amp;gt; saves the number of true terms in the reference per query (implies -g)&lt;br /&gt;
        -m &amp;lt;value&amp;gt; apply majority voting fusion with &amp;lt;value&amp;gt; minimum number of votes         | - Default: 1&lt;br /&gt;
        -n &amp;lt;value&amp;gt; method for creating default scores (0: average of the other detections (MV approach); 1: min per query; 2: global min; 3: histogram based)  | - Default: 0&lt;br /&gt;
        -d debug mode, do not remove auxiliary files stored in /tmp/tmpdir.$$&lt;br /&gt;
        -h help &lt;br /&gt;
&lt;br /&gt;
        NOTE: Requires Matlab (or Octave) and perl. It also depends on the perl scripts align_score_files.pl, heuristicScoring.pl and create_groundtruth_centerdistance.pl, &lt;br /&gt;
           which should be located in the same folder as the main script&lt;br /&gt;
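As an illustration of the -n 1 default-score option described above, the sketch below fills each system's missing scores with that system's per-query minimum, so the fused score matrix becomes full. The data and layout are made up for illustration, not the package's internal format.&lt;br /&gt;

```python
from collections import defaultdict

# (query_id, file_id): one score per system; None marks a missing score
detections = {
    ("q001", "f01"): [2.5, None],
    ("q001", "f02"): [1.1, 0.9],
    ("q002", "f01"): [None, 0.4],
    ("q002", "f03"): [0.8, 0.6],
}

# Gather the scores each system produced for each query
per_query = defaultdict(list)
for (q, _), scores in detections.items():
    for i, s in enumerate(scores):
        if s is not None:
            per_query[(q, i)].append(s)

# Per-query minimum for each system, used as the default score ("-n 1")
minima = {key: min(vals) for key, vals in per_query.items()}

# Replace every missing score with the corresponding per-query minimum
filled = {
    (q, f): [s if s is not None else minima[(q, i)]
             for i, s in enumerate(scores)]
    for (q, f), scores in detections.items()
}
```

Hypothesizing a pessimistic default for a system that missed a detection is what lets a discriminative fusion still weigh the evidence of the systems that did fire.&lt;br /&gt;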
&lt;br /&gt;
&amp;lt;big&amp;gt;'''STEP2'''&amp;lt;/big&amp;gt; Prepare the evaluation scores for fusion&lt;br /&gt;
&lt;br /&gt;
* Type  the following command:&lt;br /&gt;
&lt;br /&gt;
          ./bin/PrepareForFusion.sh \&lt;br /&gt;
          -q ./scoring_atwv_sws2013/sws2013_eval/sws2013_eval.tlist.xml \&lt;br /&gt;
          -r ./scoring_atwv_sws2013/sws2013_eval/sws2013_eval.rttm \&lt;br /&gt;
          -o EVAL_SCORES_4FUSION.txt \&lt;br /&gt;
          -z qnorm -m 1 -n 1 \&lt;br /&gt;
          ./scores/eval/akws_br-evalterms.stdlist.xml  \&lt;br /&gt;
          ./scores/eval/dtw_br-evalterms.stdlist.xml&lt;br /&gt;
&lt;br /&gt;
:*'''Warning''' It is fundamental to provide the stdlist.xml input files in the same order used in the previous step.&lt;br /&gt;
&lt;br /&gt;
:*Notice that, in contrast to the case of the development scores, we did not use the -g option, since we do not need the ground truth here. &lt;br /&gt;
     &lt;br /&gt;
&amp;lt;big&amp;gt;'''STEP3'''&amp;lt;/big&amp;gt; Train the fusion parameters using the development scores and apply them to the evaluation scores &lt;br /&gt;
&lt;br /&gt;
* Type the following command:&lt;br /&gt;
&lt;br /&gt;
          ./bin/fusion.sh DEV_SCORES_4FUSION.txt EVAL_SCORES_4FUSION.txt  queries_in_data.ref&lt;br /&gt;
&lt;br /&gt;
:* '''Warning''' Before running the script, you will have to change the MATLAB_BIN variable to be the path where your Matlab binary is actually installed.&lt;br /&gt;
&lt;br /&gt;
:* '''Warning''' The Bosaris toolkit needs to be installed. Change the BOSARIS variable to the path where the Bosaris toolkit is installed.&lt;br /&gt;
&lt;br /&gt;
:* '''Warning''' The script contains 4 internal variables that are specific to the Mediaeval SWS2013 evaluation: &lt;br /&gt;
::* The cost parameters P_TARGET, C_MISS, C_FA &lt;br /&gt;
&lt;br /&gt;
::*  TSPEECH - The total duration (in seconds) of the data collection &lt;br /&gt;
&lt;br /&gt;
:* '''Warning''' The call to the fusion script can take several minutes. &lt;br /&gt;
&lt;br /&gt;
:* This call learns the calibration and fusion parameters from DEV_SCORES_4FUSION.txt (which includes ground-truth information) and applies them both to the development and to the evaluation scores. It produces 3 output files:&lt;br /&gt;
&lt;br /&gt;
::'''1.''' dev_fusion.scores - file containing the development well-calibrated fusion scores&lt;br /&gt;
::'''2.''' eval_fusion.scores - file containing the evaluation well-calibrated fusion scores&lt;br /&gt;
::'''3.''' fuse_params.txt - file containing the fusion parameters &lt;br /&gt;
&lt;br /&gt;
::The format of the score output files is as follows (one row per candidate detection):&lt;br /&gt;
&lt;br /&gt;
              &amp;lt;query_id&amp;gt; &amp;lt;file_id&amp;gt; &amp;lt;start_time&amp;gt; &amp;lt;duration&amp;gt; &amp;lt;score&amp;gt; &amp;lt;decision&amp;gt;&lt;br /&gt;
::The &amp;lt;decision&amp;gt; field is 0 or 1 depending on whether the score is below or above the minimum-cost Bayes optimal threshold (see the Interspeech paper for details).&lt;br /&gt;
&lt;br /&gt;
* To convert the score files to the SWS2013 stdslist.xml format type the following commands:&lt;br /&gt;
&lt;br /&gt;
          ./bin/raw2stdslist.sh dev_fusion.scores \&lt;br /&gt;
          ./scoring_atwv_sws2013/sws2013_dev/sws2013_dev.tlist.xml &amp;gt; fusion-devterms.stdlist.xml&lt;br /&gt;
&lt;br /&gt;
          ./bin/raw2stdslist.sh eval_fusion.scores \&lt;br /&gt;
          ./scoring_atwv_sws2013/sws2013_eval/sws2013_eval.tlist.xml &amp;gt; fusion-evalterms.stdlist.xml&lt;br /&gt;
  &lt;br /&gt;
:* This script can take a third parameter, a threshold value, to apply a decision threshold different from the Bayes optimal one.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;big&amp;gt;'''Reference results'''&amp;lt;/big&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* In the folder ./sample_results/ you can find each of the intermediate files produced in this example&lt;br /&gt;
&lt;br /&gt;
* The TWV results obtained with the sample score files provided for the Mediaeval SWS2013 task, for each of the individual systems and for the fusion system obtained following this example, are:&lt;br /&gt;
&lt;br /&gt;
{| border=1&lt;br /&gt;
  |+ TWV &lt;br /&gt;
|-&lt;br /&gt;
  ! !! dev (mtwv/atwv) !! eval (mtwv/atwv) &lt;br /&gt;
|-&lt;br /&gt;
  | akws_br || 0.1571 / 0.1408 || &lt;br /&gt;
|-&lt;br /&gt;
  | dtw_br || || &lt;br /&gt;
|-&lt;br /&gt;
  | fusion || || &lt;br /&gt;
|}&lt;/div&gt;</summary>
		<author><name>Alberto</name></author>
	</entry>
	<entry>
		<id>https://www.hlt.inesc-id.pt/wiki/index.php?title=STDfusion&amp;diff=7149</id>
		<title>STDfusion</title>
		<link rel="alternate" type="text/html" href="https://www.hlt.inesc-id.pt/wiki/index.php?title=STDfusion&amp;diff=7149"/>
		<updated>2013-10-16T22:55:46Z</updated>

		<summary type="html">&lt;p&gt;Alberto: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== Introduction ==&lt;br /&gt;
&lt;br /&gt;
* On this wiki page we make available the code and the basic instructions needed to carry out discriminative calibration and fusion of heterogeneous spoken term detection (STD) systems as described in:&lt;br /&gt;
&lt;br /&gt;
           A. Abad, L. J. Rodriguez Fuentes, M. Penagarikano, A. Varona, M. Diez, and G. Bordel. &lt;br /&gt;
           On the Calibration and Fusion of Heterogeneous Spoken Term Detection Systems. &lt;br /&gt;
           In Interspeech 2013, August 25-29 2013&lt;br /&gt;
&lt;br /&gt;
* The proposed method is the result of the collaboration between the L2F and the GTTS groups in the context of their activities in the Mediaeval Spoken Web Search (SWS) task. The method has been successfully tested in the SWS2012 and SWS2013 tasks. &lt;br /&gt;
&lt;br /&gt;
* Unfortunately, we did not have time to consolidate (and clean up) the code; what we are making available is essentially the same set of scripts that we developed during the first experiments. Thus, the code is split into different pieces in different languages (bash, perl, matlab, etc.) and has some external dependencies. Even so, we expect it can still be useful for researchers who want to fuse their own STD systems.&lt;br /&gt;
&lt;br /&gt;
* The package can be downloaded from here.&lt;br /&gt;
&lt;br /&gt;
== Package contents ==&lt;br /&gt;
&lt;br /&gt;
* The package contains the following files and directories:&lt;br /&gt;
&lt;br /&gt;
         README.txt - Contains pretty much the same information as this wiki page&lt;br /&gt;
         ./bin/ - Contains the different scripts necessary for calibration and fusion&lt;br /&gt;
         ./bin/PrepareForFusion.sh  - Main script that normalizes, aligns, hypothesizes missing scores and creates the ground truth for the input systems&lt;br /&gt;
         ./bin/align_score_files.pl - Used by ./bin/PrepareForFusion.sh&lt;br /&gt;
         ./bin/create_groundtruth_centerdistance.pl - Used by ./bin/PrepareForFusion.sh&lt;br /&gt;
         ./bin/heuristicScoring.pl - Used by ./bin/PrepareForFusion.sh &lt;br /&gt;
         ./bin/fusion.sh  - This script takes the output of ./bin/PrepareForFusion.sh to learn the fusion parameters and apply them to the evaluation scores&lt;br /&gt;
         ./bin/raw2stdslist.sh - Converts scores from a raw (internal) format to the stdlist xml format of the SWS evaluation&lt;br /&gt;
         ./scoring_atwv_sws2013/ - Contains the Mediaeval 2013 atwv scoring package&lt;br /&gt;
         ./scores/ - Contains sample dev (./scores/dev/akws_br-devterms.stdlist.xml, ./scores/dev/dtw_br-devterms.stdlist.xml) and eval scores (./scores/eval/akws_br-evalterms.stdlist.xml, ./scores/eval/dtw_br-evalterms.stdlist.xml)&lt;br /&gt;
         ./sample_results/ - Contains the sample (intermediate) results that are obtained if you run the instructions below&lt;br /&gt;
&lt;br /&gt;
== Example of use ==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;big&amp;gt;'''STEP1'''&amp;lt;/big&amp;gt; Prepare the development scores for learning the calibration/fusion&lt;br /&gt;
&lt;br /&gt;
* Type the following command:&lt;br /&gt;
&lt;br /&gt;
          ./bin/PrepareForFusion.sh \&lt;br /&gt;
          -q ./scoring_atwv_sws2013/sws2013_dev/sws2013_dev.tlist.xml \&lt;br /&gt;
          -r ./scoring_atwv_sws2013/sws2013_dev/sws2013_dev.rttm \&lt;br /&gt;
          -o DEV_SCORES_4FUSION.txt \&lt;br /&gt;
          -t queries_in_data.ref \&lt;br /&gt;
          -g -z qnorm -m 1 -n 1 \&lt;br /&gt;
          ./scores/dev/akws_br-devterms.stdlist.xml \&lt;br /&gt;
          ./scores/dev/dtw_br-devterms.stdlist.xml&lt;br /&gt;
&lt;br /&gt;
:* '''Warning''' Before running the script, you will have to change the MATLAB_BIN variable to be the path where your Matlab binary is actually installed.&lt;br /&gt;
&lt;br /&gt;
:* This process  will generate 2 output files:&lt;br /&gt;
&lt;br /&gt;
::'''1.''' DEV_SCORES_4FUSION.txt - Contains the scores ready for the fusion stage, in the following general format (one row per detection candidate):&lt;br /&gt;
&lt;br /&gt;
                  &amp;lt;query_id&amp;gt; &amp;lt;file_id&amp;gt; &amp;lt;start_time&amp;gt; &amp;lt;duration&amp;gt; &amp;lt;sc1&amp;gt; &amp;lt;sc2&amp;gt; ... &amp;lt;scN&amp;gt; &amp;lt;label&amp;gt;&lt;br /&gt;
      &lt;br /&gt;
::*Notice that now all systems produce a score for all candidate detections (the score matrix is full). &lt;br /&gt;
&lt;br /&gt;
::*  If the -g option is selected (as in this case) the &amp;lt;label&amp;gt; column contains 0s and 1s for the false and true trials respectively (derived from the rttm file). If the -g option is not selected, the last column simply contains 1s.&lt;br /&gt;
&lt;br /&gt;
::'''2.''' queries_in_data.ref - This second (optional) output file contains the number of times each query appears in the collection data; it is used later in the fusion stage.&lt;br /&gt;
&lt;br /&gt;
*You can check the general usage of this script by calling it without arguments:&lt;br /&gt;
  &lt;br /&gt;
       Usage: PrepareForFusion.sh -q &amp;lt;tlistxml&amp;gt; -r &amp;lt;rttm&amp;gt; -o &amp;lt;outputfile&amp;gt; [opts] &amp;lt;stdlistfile1&amp;gt; [stdlistfile2] [stdlistfile3]  ... [stdlistfileN] &lt;br /&gt;
&lt;br /&gt;
        &amp;lt;stdlistfile*&amp;gt; input score stdlist file in the SWS2012 format (*.stdlist.xml)        | - Required argument (at least 1)&lt;br /&gt;
        -q &amp;lt;tlistxml&amp;gt; termlist file in the SWS2012 format (*.tlist.xml)                      | - Required parameter&lt;br /&gt;
        -r &amp;lt;rttm&amp;gt; rttm file in the SWS2012 format (*.rttm)                                   | - Required parameter&lt;br /&gt;
        -o &amp;lt;outputfile&amp;gt; output file name                                                     | - Required parameter&lt;br /&gt;
        -z &amp;lt;value&amp;gt; score z-norm type (none, qnorm, fnorm, qfnorm, fqnorm)                    | - Default: none&lt;br /&gt;
        -g add ground-truth information to the outputfile&lt;br /&gt;
        -t &amp;lt;filename&amp;gt; saves the number of true terms in the reference per query (implies -g)&lt;br /&gt;
        -m &amp;lt;value&amp;gt; apply majority voting fusion with &amp;lt;value&amp;gt; minimum number of votes         | - Default: 1&lt;br /&gt;
        -n &amp;lt;value&amp;gt; method for creating default scores (0: average of the other detections (MV approach); 1: min per query; 2: global min; 3: histogram based)  | - Default: 0&lt;br /&gt;
        -d debug mode, do not remove auxiliary files stored in /tmp/tmpdir.$$&lt;br /&gt;
        -h help &lt;br /&gt;
&lt;br /&gt;
        NOTE: Requires Matlab (or Octave) and perl. It also depends on the perl scripts align_score_files.pl, heuristicScoring.pl and create_groundtruth_centerdistance.pl, &lt;br /&gt;
           which should be located in the same folder as the main script&lt;br /&gt;
&lt;br /&gt;
&amp;lt;big&amp;gt;'''STEP2'''&amp;lt;/big&amp;gt; Prepare the evaluation scores for fusion&lt;br /&gt;
&lt;br /&gt;
* Type  the following command:&lt;br /&gt;
&lt;br /&gt;
          ./bin/PrepareForFusion.sh \&lt;br /&gt;
          -q ./scoring_atwv_sws2013/sws2013_eval/sws2013_eval.tlist.xml \&lt;br /&gt;
          -r ./scoring_atwv_sws2013/sws2013_eval/sws2013_eval.rttm \&lt;br /&gt;
          -o EVAL_SCORES_4FUSION.txt \&lt;br /&gt;
          -z qnorm -m 1 -n 1 \&lt;br /&gt;
          ./scores/eval/akws_br-evalterms.stdlist.xml  \&lt;br /&gt;
          ./scores/eval/dtw_br-evalterms.stdlist.xml&lt;br /&gt;
&lt;br /&gt;
:*'''Warning''' It is fundamental to provide the stdlist.xml input files in the same order used in the previous step.&lt;br /&gt;
&lt;br /&gt;
:*Notice that, in contrast to the case of the development scores, we did not use the -g option, since we do not need the ground truth here. &lt;br /&gt;
     &lt;br /&gt;
&amp;lt;big&amp;gt;'''STEP3'''&amp;lt;/big&amp;gt; Train the fusion parameters using the development scores and apply them to the evaluation scores &lt;br /&gt;
&lt;br /&gt;
* Type the following command:&lt;br /&gt;
&lt;br /&gt;
          ./bin/fusion.sh DEV_SCORES_4FUSION.txt EVAL_SCORES_4FUSION.txt  queries_in_data.ref&lt;br /&gt;
&lt;br /&gt;
:* '''Warning''' Before running the script, you will have to change the MATLAB_BIN variable to be the path where your Matlab binary is actually installed.&lt;br /&gt;
&lt;br /&gt;
:* '''Warning''' The Bosaris toolkit needs to be installed. Change the BOSARIS variable to the path where the Bosaris toolkit is installed.&lt;br /&gt;
&lt;br /&gt;
:* '''Warning''' The script contains 4 internal variables that are specific to the Mediaeval SWS2013 evaluation: &lt;br /&gt;
::* The cost parameters P_TARGET, C_MISS, C_FA &lt;br /&gt;
&lt;br /&gt;
::*  TSPEECH - The total duration (in seconds) of the data collection &lt;br /&gt;
&lt;br /&gt;
:* '''Warning''' The call to the fusion script can take several minutes. &lt;br /&gt;
&lt;br /&gt;
:* This call learns the calibration and fusion parameters from DEV_SCORES_4FUSION.txt (which includes ground-truth information) and applies them both to the development and to the evaluation scores. It produces 3 output files:&lt;br /&gt;
&lt;br /&gt;
::'''1.''' dev_fusion.scores - file containing the development well-calibrated fusion scores&lt;br /&gt;
::'''2.''' eval_fusion.scores - file containing the evaluation well-calibrated fusion scores&lt;br /&gt;
::'''3.''' fuse_params.txt - file containing the fusion parameters &lt;br /&gt;
&lt;br /&gt;
::The format of the score output files is as follows (one row per candidate detection):&lt;br /&gt;
&lt;br /&gt;
              &amp;lt;query_id&amp;gt; &amp;lt;file_id&amp;gt; &amp;lt;start_time&amp;gt; &amp;lt;duration&amp;gt; &amp;lt;score&amp;gt; &amp;lt;decision&amp;gt;&lt;br /&gt;
::The &amp;lt;decision&amp;gt; field is 0 or 1 depending on whether the score is below or above the minimum-cost Bayes optimal threshold (see the Interspeech paper for details).&lt;br /&gt;
&lt;br /&gt;
* To convert the score files to the SWS2013 stdslist.xml format type the following commands:&lt;br /&gt;
&lt;br /&gt;
          ./bin/raw2stdslist.sh dev_fusion.scores \&lt;br /&gt;
          ./scoring_atwv_sws2013/sws2013_dev/sws2013_dev.tlist.xml &amp;gt; fusion-devterms.stdlist.xml&lt;br /&gt;
&lt;br /&gt;
          ./bin/raw2stdslist.sh eval_fusion.scores \&lt;br /&gt;
          ./scoring_atwv_sws2013/sws2013_eval/sws2013_eval.tlist.xml &amp;gt; fusion-evalterms.stdlist.xml&lt;br /&gt;
  &lt;br /&gt;
:* This script can take a third parameter, a threshold value, to apply a decision threshold different from the Bayes optimal one.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;big&amp;gt;'''Reference results'''&amp;lt;/big&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* In the folder ./sample_results/ you can find each of the intermediate files produced in this example&lt;br /&gt;
&lt;br /&gt;
* The TWV results obtained with the sample score files provided for the Mediaeval SWS2013 task, for each of the individual systems and for the fusion system obtained following this example, are:&lt;br /&gt;
&lt;br /&gt;
{| border=1&lt;br /&gt;
  |+ TWV &lt;br /&gt;
|-&lt;br /&gt;
  ! !! dev (mtwv/atwv) !! eval (mtwv/atwv) &lt;br /&gt;
|-&lt;br /&gt;
  | akws_br || 0.1571 / 0.1408 || &lt;br /&gt;
|-&lt;br /&gt;
  | dtw_br || || &lt;br /&gt;
|-&lt;br /&gt;
  | fusion || || &lt;br /&gt;
|}&lt;/div&gt;</summary>
		<author><name>Alberto</name></author>
	</entry>
	<entry>
		<id>https://www.hlt.inesc-id.pt/wiki/index.php?title=STDfusion&amp;diff=7148</id>
		<title>STDfusion</title>
		<link rel="alternate" type="text/html" href="https://www.hlt.inesc-id.pt/wiki/index.php?title=STDfusion&amp;diff=7148"/>
		<updated>2013-10-16T22:31:56Z</updated>

		<summary type="html">&lt;p&gt;Alberto: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== Introduction ==&lt;br /&gt;
&lt;br /&gt;
* On this wiki page we make available the code and the basic instructions needed to carry out discriminative calibration and fusion of heterogeneous spoken term detection (STD) systems as described in:&lt;br /&gt;
&lt;br /&gt;
           A. Abad, L. J. Rodriguez Fuentes, M. Penagarikano, A. Varona, M. Diez, and G. Bordel. &lt;br /&gt;
           On the Calibration and Fusion of Heterogeneous Spoken Term Detection Systems. &lt;br /&gt;
           In Interspeech 2013, August 25-29 2013&lt;br /&gt;
&lt;br /&gt;
* The proposed method is the result of the collaboration between the L2F and the GTTS groups in the context of their activities in the Mediaeval Spoken Web Search (SWS) task. The method has been successfully tested in the SWS2012 and SWS2013 tasks. &lt;br /&gt;
&lt;br /&gt;
* Unfortunately, we did not have time to consolidate (and clean up) the code; what we are making available is essentially the same set of scripts that we developed during the first experiments. Thus, the code is split into different pieces in different languages (bash, perl, matlab, etc.) and has some external dependencies. Even so, we expect it can still be useful for researchers who want to fuse their own STD systems.&lt;br /&gt;
&lt;br /&gt;
* The package can be downloaded from here.&lt;br /&gt;
&lt;br /&gt;
== Package contents ==&lt;br /&gt;
&lt;br /&gt;
* The package contains the following files and directories:&lt;br /&gt;
&lt;br /&gt;
         README.txt - Contains pretty much the same information as this wiki page&lt;br /&gt;
         ./bin/ - Contains the different scripts necessary for calibration and fusion&lt;br /&gt;
         ./bin/PrepareForFusion.sh  - Main script that normalizes, aligns, hypothesizes missing scores and creates the ground truth for the input systems&lt;br /&gt;
         ./bin/align_score_files.pl - Used by ./bin/PrepareForFusion.sh&lt;br /&gt;
         ./bin/create_groundtruth_centerdistance.pl - Used by ./bin/PrepareForFusion.sh&lt;br /&gt;
         ./bin/heuristicScoring.pl - Used by ./bin/PrepareForFusion.sh &lt;br /&gt;
         ./bin/fusion.sh  - This script takes the output of ./bin/PrepareForFusion.sh to learn the fusion parameters and apply them to the evaluation scores&lt;br /&gt;
         ./bin/raw2stdslist.sh - Converts scores from a raw (internal) format to the stdlist xml format of the SWS evaluation&lt;br /&gt;
         ./scoring_atwv_sws2013/ - Contains the Mediaeval 2013 atwv scoring package&lt;br /&gt;
         ./scores/ - Contains sample dev (./scores/dev/akws_br-devterms.stdlist.xml, ./scores/dev/dtw_br-devterms.stdlist.xml) and eval scores (./scores/eval/akws_br-evalterms.stdlist.xml, ./scores/eval/dtw_br-evalterms.stdlist.xml)&lt;br /&gt;
         ./sample_results/ - Contains the sample (intermediate) results that are obtained if you run the instructions below&lt;br /&gt;
&lt;br /&gt;
== Example of use ==&lt;br /&gt;
&lt;br /&gt;
'''STEP1''' Prepare the development scores for learning the calibration/fusion&lt;br /&gt;
&lt;br /&gt;
* Type the following command:&lt;br /&gt;
&lt;br /&gt;
          ./bin/PrepareForFusion.sh \&lt;br /&gt;
          -q ./scoring_atwv_sws2013/sws2013_dev/sws2013_dev.tlist.xml \&lt;br /&gt;
          -r ./scoring_atwv_sws2013/sws2013_dev/sws2013_dev.rttm \&lt;br /&gt;
          -o DEV_SCORES_4FUSION.txt \&lt;br /&gt;
          -t queries_in_data.ref \&lt;br /&gt;
          -g -z qnorm -m 1 -n 1 \&lt;br /&gt;
          ./scores/dev/akws_br-devterms.stdlist.xml \&lt;br /&gt;
          ./scores/dev/dtw_br-devterms.stdlist.xml&lt;br /&gt;
&lt;br /&gt;
:* '''Warning''' Before running the script, you will have to change the MATLAB_BIN variable to be the path where your Matlab binary is actually installed.&lt;br /&gt;
&lt;br /&gt;
:* This process  will generate 2 output files:&lt;br /&gt;
&lt;br /&gt;
::'''1.''' DEV_SCORES_4FUSION.txt - Contains the scores ready for the fusion stage, in the following general format (one row per detection candidate):&lt;br /&gt;
&lt;br /&gt;
                  &amp;lt;query_id&amp;gt; &amp;lt;file_id&amp;gt; &amp;lt;start_time&amp;gt; &amp;lt;duration&amp;gt; &amp;lt;sc1&amp;gt; &amp;lt;sc2&amp;gt; ... &amp;lt;scN&amp;gt; &amp;lt;label&amp;gt;&lt;br /&gt;
            &lt;br /&gt;
::*Notice that now all systems produce a score for all candidate detections (the score matrix is full). &lt;br /&gt;
&lt;br /&gt;
::*  If the -g option is selected (as in this case) the &amp;lt;label&amp;gt; column contains 0s and 1s for the false and true trials respectively (derived from the rttm file). If the -g option is not selected, the last column simply contains 1s.&lt;br /&gt;
&lt;br /&gt;
::'''2.''' queries_in_data.ref - This second (optional) output file contains the number of times each query appears in the collection data; it is used later in the fusion stage.&lt;br /&gt;
&lt;br /&gt;
*You can check the general usage of this script by calling it without arguments:&lt;br /&gt;
  &lt;br /&gt;
       Usage: PrepareForFusion.sh -q &amp;lt;tlistxml&amp;gt; -r &amp;lt;rttm&amp;gt; -o &amp;lt;outputfile&amp;gt; [opts] &amp;lt;stdlistfile1&amp;gt; [stdlistfile2] [stdlistfile3]  ... [stdlistfileN] &lt;br /&gt;
&lt;br /&gt;
        &amp;lt;stdlistfile*&amp;gt; input score stdlist file in the SWS2012 format (*.stdlist.xml)        | - Required argument (at least 1)&lt;br /&gt;
        -q &amp;lt;tlistxml&amp;gt; termlist file in the SWS2012 format (*.tlist.xml)                      | - Required parameter&lt;br /&gt;
        -r &amp;lt;rttm&amp;gt; rttm file in the SWS2012 format (*.rttm)                                   | - Required parameter&lt;br /&gt;
        -o &amp;lt;outputfile&amp;gt; output file name                                                     | - Required parameter&lt;br /&gt;
        -z &amp;lt;value&amp;gt; score z-norm type (none, qnorm, fnorm, qfnorm, fqnorm)                    | - Default: none&lt;br /&gt;
        -g add ground-truth information to the outputfile&lt;br /&gt;
        -t &amp;lt;filename&amp;gt; saves the number of true terms in the reference per query (implies -g)&lt;br /&gt;
        -m &amp;lt;value&amp;gt; apply majority voting fusion with &amp;lt;value&amp;gt; minimum number of votes         | - Default: 1&lt;br /&gt;
        -n &amp;lt;value&amp;gt; method for creating default scores (0: average of the other detections (MV approach); 1: min per query; 2: global min; 3: histogram based)  | - Default: 0&lt;br /&gt;
        -d debug mode, do not remove auxiliary files stored in /tmp/tmpdir.$$&lt;br /&gt;
        -h help &lt;br /&gt;
&lt;br /&gt;
        NOTE: Requires Matlab (or Octave) and Perl. It also depends on the Perl scripts align_score_files.pl, heuristicScoring.pl and create_groundtruth_centerdistance.pl, &lt;br /&gt;
           which should be located in the same folder as the main script&lt;br /&gt;
&lt;br /&gt;
'''STEP2''' Prepare the evaluation scores for fusion&lt;br /&gt;
&lt;br /&gt;
* Type the following command:&lt;br /&gt;
&lt;br /&gt;
          ./bin/PrepareForFusion.sh \&lt;br /&gt;
          -q ./scoring_atwv_sws2013/sws2013_eval/sws2013_eval.tlist.xml \&lt;br /&gt;
          -r ./scoring_atwv_sws2013/sws2013_eval/sws2013_eval.rttm \&lt;br /&gt;
          -o EVAL_SCORES_4FUSION.txt \&lt;br /&gt;
          -z qnorm -m 1 -n 1 \&lt;br /&gt;
          ./scores/eval/akws_br-evalterms.stdlist.xml  \&lt;br /&gt;
          ./scores/eval/dtw_br-evalterms.stdlist.xml&lt;br /&gt;
&lt;br /&gt;
:*'''Warning''' It is fundamental to provide the stdlist.xml input files in the same order used in the previous step.&lt;br /&gt;
&lt;br /&gt;
:*Notice that, in contrast to the case of the development scores, we did not use the -g option, since we do not need the ground truth in this case. &lt;br /&gt;
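Because the column order of the prepared score files follows the order of the stdlist inputs, a quick sanity check is to verify that both files ended up with the same number of columns. The snippet below is only a sketch: it assumes the whitespace-separated output format described above and the file names used in this tutorial.

```shell
# check_cols DEVFILE EVALFILE: verify that both prepared score files have
# the same number of whitespace-separated columns (4 id fields + N system
# scores + 1 final column). A mismatch suggests the stdlist inputs were
# given in a different number or order.
check_cols() {
    d=$(awk 'NR==1 { print NF; exit }' "$1")
    e=$(awk 'NR==1 { print NF; exit }' "$2")
    if [ "$d" = "$e" ]; then
        echo "OK: $d columns in both files"
    else
        echo "MISMATCH: dev=$d eval=$e columns"
        return 1
    fi
}

# Usage, once STEP1 and STEP2 have produced the files:
#   check_cols DEV_SCORES_4FUSION.txt EVAL_SCORES_4FUSION.txt
```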
    &lt;br /&gt;
&lt;br /&gt;
    &lt;br /&gt;
&lt;br /&gt;
'''STEP3''' Train the fusion on the development scores and apply it to the evaluation scores&lt;br /&gt;
&lt;br /&gt;
* Type the following command:&lt;br /&gt;
&lt;br /&gt;
  ./bin/fusion.sh DEV_SCORES_4FUSION.txt EVAL_SCORES_4FUSION.txt  queries_in_data.ref&lt;br /&gt;
&lt;br /&gt;
  Before running this script you will also need to change the value of the MATLAB_BIN variable, download the Bosaris toolkit, and set the BOSARIS variable to the path that contains this toolkit.&lt;br /&gt;
&lt;br /&gt;
  Notice also that the script contains 4 hard-coded variables that are specific to the Mediaeval SWS2013 evaluation: the cost parameters P_TARGET, C_MISS and C_FA, and the total duration in seconds of the collection data, TSPEECH.&lt;br /&gt;
&lt;br /&gt;
  This fusion script uses DEV_SCORES_4FUSION.txt (with ground-truth information) to learn the calibration and fusion parameters, which are applied both to the development set and to the evaluation set. &lt;br /&gt;
&lt;br /&gt;
  As mentioned previously, queries_in_data.ref contains the statistics about the true number of instances of each query in the data; it is used for hypothesizing missing scores and for computing ATWV.&lt;br /&gt;
&lt;br /&gt;
  This call (which can take several minutes) will produce one output file for the dev scores (dev_fusion.scores) and one output file for the eval scores (eval_fusion.scores). The format of these output files is as follows:&lt;br /&gt;
&lt;br /&gt;
  &amp;lt;query_id&amp;gt; &amp;lt;file_id&amp;gt; &amp;lt;start_time&amp;gt; &amp;lt;duration&amp;gt; &amp;lt;fusion_score&amp;gt; &amp;lt;decision&amp;gt;&lt;br /&gt;
  ...&lt;br /&gt;
  ...&lt;br /&gt;
  &amp;lt;query_id&amp;gt; &amp;lt;file_id&amp;gt; &amp;lt;start_time&amp;gt; &amp;lt;duration&amp;gt; &amp;lt;fusion_score&amp;gt; &amp;lt;decision&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
  The &amp;lt;decision&amp;gt; field is 0 if the score is lower than the minimum-cost Bayes optimal threshold, and 1 if it is greater (see the Interspeech paper for details).&lt;br /&gt;
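For example, the accepted detections can be extracted from such a file and ranked by score with a one-liner. This is just a sketch: it assumes the whitespace-separated six-column format above and the eval_fusion.scores file name used in this tutorial.

```shell
# Keep only detections whose decision field (6th column) is 1, sorted by
# fusion score (5th column, numeric, descending); writes eval_fusion.hits.
if [ -f eval_fusion.scores ]; then
    awk '$6 == 1' eval_fusion.scores | sort -k5,5nr > eval_fusion.hits
fi
```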
&lt;br /&gt;
  Additionally, the fusion parameters are stored in fuse_params.txt.&lt;br /&gt;
&lt;br /&gt;
  The final step consists of converting these result files to the format used in the SWS2013 challenge:&lt;br /&gt;
&lt;br /&gt;
  ./bin/raw2stdslist.sh dev_fusion.scores ./scoring_atwv_sws2013/sws2013_dev/sws2013_dev.tlist.xml &amp;gt; fusion-devterms.stdlist.xml&lt;br /&gt;
  ./bin/raw2stdslist.sh eval_fusion.scores ./scoring_atwv_sws2013/sws2013_eval/sws2013_eval.tlist.xml &amp;gt; fusion-evalterms.stdlist.xml&lt;br /&gt;
  &lt;br /&gt;
  Notice that this script can take a third parameter: a threshold value that you can use to apply a decision threshold different from the optimal Bayes one.&lt;br /&gt;
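The same effect can also be obtained after the fact by re-deriving the decision field from the raw fusion scores. A sketch only, assuming the six-column .scores format above; the threshold value 2.0 is purely illustrative.

```shell
# Recompute the decision field with a custom threshold THR: decision
# becomes 1 when the fusion score (5th column) is >= THR, 0 otherwise.
THR=2.0
if [ -f eval_fusion.scores ]; then
    awk -v thr="$THR" '{ $6 = ($5 >= thr) ? 1 : 0; print }' \
        eval_fusion.scores > eval_fusion.rescored
fi
```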
&lt;br /&gt;
  Reference TWV results in Mediaeval SWS2013 task:&lt;br /&gt;
&lt;br /&gt;
			      dev		   eval		&lt;br /&gt;
			mtwv	   atwv		mtwv	atwv&lt;br /&gt;
	  akws_br	0.1571     0.1408 &lt;br /&gt;
	   dtw_br&lt;br /&gt;
	   fusion&lt;/div&gt;</summary>
		<author><name>Alberto</name></author>
	</entry>
	<entry>
		<id>https://www.hlt.inesc-id.pt/wiki/index.php?title=STDfusion&amp;diff=7147</id>
		<title>STDfusion</title>
		<link rel="alternate" type="text/html" href="https://www.hlt.inesc-id.pt/wiki/index.php?title=STDfusion&amp;diff=7147"/>
		<updated>2013-10-16T22:26:23Z</updated>

		<summary type="html">&lt;p&gt;Alberto: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== Introduction ==&lt;br /&gt;
&lt;br /&gt;
* On this wiki page we make available the necessary code and basic instructions to carry out discriminative calibration and fusion of heterogeneous spoken term detection (STD) systems, as described in:&lt;br /&gt;
&lt;br /&gt;
           A. Abad, L. J. Rodriguez Fuentes, M. Penagarikano, A. Varona, M. Diez, and G. Bordel. &lt;br /&gt;
           On the Calibration and Fusion of Heterogeneous Spoken Term Detection Systems. &lt;br /&gt;
           In Interspeech 2013, August 25-29 2013&lt;br /&gt;
&lt;br /&gt;
* The proposed method is the result of a collaboration between the L2F and the GTTS groups in the context of their activities in the Mediaeval Spoken Web Search (SWS) task. The method has been successfully tested in the SWS2012 and SWS2013 tasks. &lt;br /&gt;
&lt;br /&gt;
* Unfortunately, we did not have time to consolidate (and clean up) the code, and what we are making available is very much the same set of scripts that we developed during the first experiments. Thus, the code is written in different pieces in different languages (bash, Perl, Matlab, etc.) and has some external dependencies. Still, we expect that it can be useful for researchers who want to fuse their own STD systems.&lt;br /&gt;
&lt;br /&gt;
* The package can be downloaded from here.&lt;br /&gt;
&lt;br /&gt;
== Package contents ==&lt;br /&gt;
&lt;br /&gt;
* The package contains the following files and directories:&lt;br /&gt;
&lt;br /&gt;
         README.txt - Contains pretty much the same information as this wiki page&lt;br /&gt;
         ./bin/ - Contains the different scripts necessary for calibration and fusion&lt;br /&gt;
         ./bin/PrepareForFusion.sh  - Main script that normalizes, aligns, hypothesizes missing scores and creates the&lt;br /&gt;
           ground truth for the input systems&lt;br /&gt;
         ./bin/align_score_files.pl - Used by ./bin/PrepareForFusion.sh&lt;br /&gt;
         ./bin/create_groundtruth_centerdistance.pl - Used by ./bin/PrepareForFusion.sh&lt;br /&gt;
         ./bin/heuristicScoring.pl - Used by ./bin/PrepareForFusion.sh &lt;br /&gt;
         ./bin/fusion.sh  - This script takes the output of ./bin/PrepareForFusion.sh to learn the fusion parameters and apply them to the evaluation scores&lt;br /&gt;
         ./bin/raw2stdslist.sh - Converts scores from a raw (internal) format to the stdlist xml format of the SWS evaluation&lt;br /&gt;
         ./scoring_atwv_sws2013/ - Contains the Mediaeval 2013 atwv scoring package&lt;br /&gt;
         ./scores/ - Contains sample dev (./scores/dev/akws_br-devterms.stdlist.xml, ./scores/dev/dtw_br-devterms.stdlist.xml) and eval scores (./scores/eval/akws_br-evalterms.stdlist.xml, ./scores/eval/dtw_br-evalterms.stdlist.xml)&lt;br /&gt;
         ./sample_results/ - Contains the sample (intermediate) results that are obtained if you follow the instructions below&lt;br /&gt;
&lt;br /&gt;
== Example of use ==&lt;br /&gt;
&lt;br /&gt;
'''STEP1''' Prepare the development scores for learning the calibration/fusion&lt;br /&gt;
&lt;br /&gt;
* Type the following command:&lt;br /&gt;
&lt;br /&gt;
          ./bin/PrepareForFusion.sh \&lt;br /&gt;
          -q ./scoring_atwv_sws2013/sws2013_dev/sws2013_dev.tlist.xml \&lt;br /&gt;
          -r ./scoring_atwv_sws2013/sws2013_dev/sws2013_dev.rttm \&lt;br /&gt;
          -o DEV_SCORES_4FUSION.txt \&lt;br /&gt;
          -t queries_in_data.ref \&lt;br /&gt;
          -g -z qnorm -m 1 -n 1 \&lt;br /&gt;
          ./scores/dev/akws_br-devterms.stdlist.xml \&lt;br /&gt;
          ./scores/dev/dtw_br-devterms.stdlist.xml&lt;br /&gt;
&lt;br /&gt;
:* '''Warning''' Before running the script, you will have to change the MATLAB_BIN variable to be the path where your Matlab binary is actually installed.&lt;br /&gt;
&lt;br /&gt;
:* This process  will generate 2 output files:&lt;br /&gt;
&lt;br /&gt;
::'''1.''' DEV_SCORES_4FUSION.txt - Contains the scores ready to be used in the subsequent fusion stage, in the following general format (one row per detection candidate):&lt;br /&gt;
&lt;br /&gt;
                  &amp;lt;query_id&amp;gt; &amp;lt;file_id&amp;gt; &amp;lt;start_time&amp;gt; &amp;lt;duration&amp;gt; &amp;lt;sc1&amp;gt; &amp;lt;sc2&amp;gt; ... &amp;lt;scN&amp;gt; &amp;lt;label&amp;gt;&lt;br /&gt;
            &lt;br /&gt;
::*Notice that now all systems produce a score for all candidate detections (the score matrix is full). &lt;br /&gt;
&lt;br /&gt;
::* If the -g option is selected (as in this case), the &amp;lt;label&amp;gt; column contains 0s and 1s for the false and true trials respectively (derived from the rttm file). If the -g option is not selected, the last column simply contains 1s.&lt;br /&gt;
&lt;br /&gt;
::'''2.''' queries_in_data.ref - This second (optional) output file contains the number of times each query appears in the collection data; it is used later in the fusion stage.&lt;br /&gt;
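As a quick check of the prepared development file, the true and false trials can be counted from the last (label) column. A sketch only: it assumes the whitespace-separated format above and that -g was used, so the last column holds 0/1 ground-truth labels.

```shell
# Count false (label 0) and true (label 1) trials using the last column
# of the prepared development score file.
if [ -f DEV_SCORES_4FUSION.txt ]; then
    awk '{ n[$NF]++ } END { printf "false=%d true=%d\n", n[0], n[1] }' \
        DEV_SCORES_4FUSION.txt
fi
```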
&lt;br /&gt;
*You can see the general usage of this script by calling it without arguments:&lt;br /&gt;
  &lt;br /&gt;
    Usage: PrepareForFusion.sh -q &amp;lt;tlistxml&amp;gt; -r &amp;lt;rttm&amp;gt; -o &amp;lt;outputfile&amp;gt; [opts] &amp;lt;stdlistfile1&amp;gt; [stdlistfile2] [stdlistfile3]  ... [stdlistfileN] &lt;br /&gt;
&lt;br /&gt;
     &amp;lt;stdlistfile*&amp;gt; input score stdlist file in the SWS2012 format (*.stdlist.xml)        | - Required argument (at least 1)&lt;br /&gt;
     -q &amp;lt;tlistxml&amp;gt; termlist file in the SWS2012 format (*.tlist.xml)                      | - Required parameter&lt;br /&gt;
     -r &amp;lt;rttm&amp;gt; rttm file in the SWS2012 format (*.rttm)                                   | - Required parameter&lt;br /&gt;
     -o &amp;lt;outputfile&amp;gt; output file name                                                     | - Required parameter&lt;br /&gt;
     -z &amp;lt;value&amp;gt; score z-norm type (none, qnorm, fnorm, qfnorm, fqnorm)                    | - Default: none&lt;br /&gt;
     -g add ground-truth information to the outputfile&lt;br /&gt;
     -t &amp;lt;filename&amp;gt; saves the number of true terms in the reference per query (implies -g)&lt;br /&gt;
     -m &amp;lt;value&amp;gt; apply majority voting fusion with &amp;lt;value&amp;gt; minimum number of votes         | - Default: 1&lt;br /&gt;
     -n &amp;lt;value&amp;gt; method for creating default scores (0: average of the other detections (MV approach); 1: min per query; 2: global min; 3: histogram based)  | - Default: 0&lt;br /&gt;
     -d debug mode, do not remove auxiliary files stored in /tmp/tmpdir.$$&lt;br /&gt;
     -h help &lt;br /&gt;
&lt;br /&gt;
     NOTE: Requires Matlab (or Octave) and Perl. It also depends on the Perl scripts align_score_files.pl, heuristicScoring.pl and create_groundtruth_centerdistance.pl, &lt;br /&gt;
           which should be located in the same folder as the main script&lt;br /&gt;
&lt;br /&gt;
  STEP2 - Prepare the eval scores for fusion by typing the following command:&lt;br /&gt;
&lt;br /&gt;
    ./bin/PrepareForFusion.sh \&lt;br /&gt;
    -q ./scoring_atwv_sws2013/sws2013_eval/sws2013_eval.tlist.xml \&lt;br /&gt;
    -r ./scoring_atwv_sws2013/sws2013_eval/sws2013_eval.rttm \&lt;br /&gt;
    -o EVAL_SCORES_4FUSION.txt \&lt;br /&gt;
    -z qnorm -m 1 -n 1 \&lt;br /&gt;
    ./scores/eval/akws_br-evalterms.stdlist.xml \&lt;br /&gt;
    ./scores/eval/dtw_br-evalterms.stdlist.xml&lt;br /&gt;
&lt;br /&gt;
    Notice that, in contrast to the case of the development scores, we did not use the -g option, since we do not need the ground truth in this case. &lt;br /&gt;
    &lt;br /&gt;
    It is also '''fundamental''' to provide the stdlist.xml score files in the same order used in the preparation of the development scores.&lt;br /&gt;
    &lt;br /&gt;
&lt;br /&gt;
  STEP3 - Train the fusion on the development scores and apply it to the eval scores by typing the following command:&lt;br /&gt;
&lt;br /&gt;
  ./bin/fusion.sh DEV_SCORES_4FUSION.txt EVAL_SCORES_4FUSION.txt  queries_in_data.ref&lt;br /&gt;
&lt;br /&gt;
  Before running this script you will also need to change the value of the MATLAB_BIN variable, download the Bosaris toolkit, and set the BOSARIS variable to the path that contains this toolkit.&lt;br /&gt;
&lt;br /&gt;
  Notice also that the script contains 4 hard-coded variables that are specific to the Mediaeval SWS2013 evaluation: the cost parameters P_TARGET, C_MISS and C_FA, and the total duration in seconds of the collection data, TSPEECH.&lt;br /&gt;
&lt;br /&gt;
  This fusion script uses DEV_SCORES_4FUSION.txt (with ground-truth information) to learn the calibration and fusion parameters, which are applied both to the development set and to the evaluation set. &lt;br /&gt;
&lt;br /&gt;
  As mentioned previously, queries_in_data.ref contains the statistics about the true number of instances of each query in the data; it is used for hypothesizing missing scores and for computing ATWV.&lt;br /&gt;
&lt;br /&gt;
  This call (which can take several minutes) will produce one output file for the dev scores (dev_fusion.scores) and one output file for the eval scores (eval_fusion.scores). The format of these output files is as follows:&lt;br /&gt;
&lt;br /&gt;
  &amp;lt;query_id&amp;gt; &amp;lt;file_id&amp;gt; &amp;lt;start_time&amp;gt; &amp;lt;duration&amp;gt; &amp;lt;fusion_score&amp;gt; &amp;lt;decision&amp;gt;&lt;br /&gt;
  ...&lt;br /&gt;
  ...&lt;br /&gt;
  &amp;lt;query_id&amp;gt; &amp;lt;file_id&amp;gt; &amp;lt;start_time&amp;gt; &amp;lt;duration&amp;gt; &amp;lt;fusion_score&amp;gt; &amp;lt;decision&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
  The &amp;lt;decision&amp;gt; field is 0 if the score is lower than the minimum-cost Bayes optimal threshold, and 1 if it is greater (see the Interspeech paper for details).&lt;br /&gt;
&lt;br /&gt;
  Additionally, the fusion parameters are stored in fuse_params.txt.&lt;br /&gt;
&lt;br /&gt;
  The final step consists of converting these result files to the format used in the SWS2013 challenge:&lt;br /&gt;
&lt;br /&gt;
  ./bin/raw2stdslist.sh dev_fusion.scores ./scoring_atwv_sws2013/sws2013_dev/sws2013_dev.tlist.xml &amp;gt; fusion-devterms.stdlist.xml&lt;br /&gt;
  ./bin/raw2stdslist.sh eval_fusion.scores ./scoring_atwv_sws2013/sws2013_eval/sws2013_eval.tlist.xml &amp;gt; fusion-evalterms.stdlist.xml&lt;br /&gt;
  &lt;br /&gt;
  Notice that this script can take a third parameter: a threshold value that you can use to apply a decision threshold different from the optimal Bayes one.&lt;br /&gt;
&lt;br /&gt;
  Reference TWV results in Mediaeval SWS2013 task:&lt;br /&gt;
&lt;br /&gt;
			      dev		   eval		&lt;br /&gt;
			mtwv	   atwv		mtwv	atwv&lt;br /&gt;
	  akws_br	0.1571     0.1408 &lt;br /&gt;
	   dtw_br&lt;br /&gt;
	   fusion&lt;/div&gt;</summary>
		<author><name>Alberto</name></author>
	</entry>
	<entry>
		<id>https://www.hlt.inesc-id.pt/wiki/index.php?title=STDfusion&amp;diff=7146</id>
		<title>STDfusion</title>
		<link rel="alternate" type="text/html" href="https://www.hlt.inesc-id.pt/wiki/index.php?title=STDfusion&amp;diff=7146"/>
		<updated>2013-10-16T22:18:32Z</updated>

		<summary type="html">&lt;p&gt;Alberto: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== Introduction ==&lt;br /&gt;
&lt;br /&gt;
On this wiki page we make available the necessary code and basic instructions to carry out discriminative calibration and fusion of heterogeneous spoken term detection (STD) systems, as described in:&lt;br /&gt;
&lt;br /&gt;
           A. Abad, L. J. Rodriguez Fuentes, M. Penagarikano, A. Varona, M. Diez, and G. Bordel. &lt;br /&gt;
           On the Calibration and Fusion of Heterogeneous Spoken Term Detection Systems. &lt;br /&gt;
           In Interspeech 2013, August 25-29 2013&lt;br /&gt;
&lt;br /&gt;
The proposed method is the result of a collaboration between the L2F and the GTTS groups in the context of their activities in the Mediaeval Spoken Web Search (SWS) task. The method has been successfully tested in the SWS2012 and SWS2013 tasks. &lt;br /&gt;
&lt;br /&gt;
Unfortunately, we did not have time to consolidate (and clean up) the code, and what we are making available is very much the same set of scripts that we developed during the first experiments. Thus, the code is written in different pieces in different languages (bash, Perl, Matlab, etc.) and has some external dependencies. Still, we expect that it can be useful for researchers who want to fuse their own STD systems.&lt;br /&gt;
&lt;br /&gt;
The package can be downloaded from here.&lt;br /&gt;
&lt;br /&gt;
== Package contents ==&lt;br /&gt;
&lt;br /&gt;
The package contains the following files and directories:&lt;br /&gt;
&lt;br /&gt;
         README.txt - Contains pretty much the same information as this wiki page&lt;br /&gt;
         ./bin/ - Contains the different scripts necessary for calibration and fusion&lt;br /&gt;
         ./bin/PrepareForFusion.sh  - Main script that normalizes, aligns, hypothesizes missing scores and creates the&lt;br /&gt;
           ground truth for the input systems&lt;br /&gt;
         ./bin/align_score_files.pl - Used by ./bin/PrepareForFusion.sh&lt;br /&gt;
         ./bin/create_groundtruth_centerdistance.pl - Used by ./bin/PrepareForFusion.sh&lt;br /&gt;
         ./bin/heuristicScoring.pl - Used by ./bin/PrepareForFusion.sh &lt;br /&gt;
         ./bin/fusion.sh  - This script takes the output of ./bin/PrepareForFusion.sh to learn the fusion parameters and apply them to the evaluation scores&lt;br /&gt;
         ./bin/raw2stdslist.sh - Converts scores from a raw (internal) format to the stdlist xml format of the SWS evaluation&lt;br /&gt;
         ./scoring_atwv_sws2013/ - Contains the Mediaeval 2013 atwv scoring package&lt;br /&gt;
         ./scores/ - Contains sample dev (./scores/dev/akws_br-devterms.stdlist.xml, ./scores/dev/dtw_br-devterms.stdlist.xml) and eval scores (./scores/eval/akws_br-evalterms.stdlist.xml, ./scores/eval/dtw_br-evalterms.stdlist.xml)&lt;br /&gt;
         ./sample_results/ - Contains the sample (intermediate) results that are obtained if you follow the instructions below&lt;br /&gt;
&lt;br /&gt;
== Example of use ==&lt;br /&gt;
&lt;br /&gt;
'''STEP1''' Prepare the development scores for learning the calibration/fusion&lt;br /&gt;
&lt;br /&gt;
* Type the following command:&lt;br /&gt;
&lt;br /&gt;
          ./bin/PrepareForFusion.sh \&lt;br /&gt;
          -q ./scoring_atwv_sws2013/sws2013_dev/sws2013_dev.tlist.xml \&lt;br /&gt;
          -r ./scoring_atwv_sws2013/sws2013_dev/sws2013_dev.rttm \&lt;br /&gt;
          -o DEV_SCORES_4FUSION.txt \&lt;br /&gt;
          -t queries_in_data.ref \&lt;br /&gt;
          -g -z qnorm -m 1 -n 1 \&lt;br /&gt;
          ./scores/dev/akws_br-devterms.stdlist.xml \&lt;br /&gt;
          ./scores/dev/dtw_br-devterms.stdlist.xml&lt;br /&gt;
&lt;br /&gt;
* '''Warning''' Before running the script, you will have to change the MATLAB_BIN variable to be the path where your Matlab binary is actually installed.&lt;br /&gt;
&lt;br /&gt;
* This process  will generate 2 output files:&lt;br /&gt;
&lt;br /&gt;
* DEV_SCORES_4FUSION.txt: Contains the scores ready to be used in the subsequent fusion stage, in the following format:&lt;br /&gt;
&lt;br /&gt;
  &amp;lt;query_id&amp;gt; &amp;lt;file_id&amp;gt; &amp;lt;start_time&amp;gt; &amp;lt;duration&amp;gt; &amp;lt;score_system1&amp;gt; &amp;lt;score_system2&amp;gt; ... &amp;lt;score_systemN&amp;gt; &amp;lt;groundtruth_label&amp;gt;&lt;br /&gt;
  ...&lt;br /&gt;
  ...&lt;br /&gt;
  &amp;lt;query_id&amp;gt; &amp;lt;file_id&amp;gt; &amp;lt;start_time&amp;gt; &amp;lt;duration&amp;gt; &amp;lt;score_system1&amp;gt; &amp;lt;score_system2&amp;gt; ... &amp;lt;score_systemN&amp;gt; &amp;lt;groundtruth_label&amp;gt;&lt;br /&gt;
&lt;br /&gt;
  Notice that all systems have produced a score for all candidate detections. &lt;br /&gt;
&lt;br /&gt;
  If the -g option is selected (as in this case), the last column will contain 0s and 1s for the false and true trials respectively, derived from the rttm file. If the -g option is not selected, the last column simply contains 1s.&lt;br /&gt;
&lt;br /&gt;
  The second (optional) output file is queries_in_data.ref. This file simply contains the number of times each query appears in the collection data; it is used later in the fusion stage.&lt;br /&gt;
&lt;br /&gt;
  You can see the general usage of this script by calling it without arguments:&lt;br /&gt;
  &lt;br /&gt;
    Usage: PrepareForFusion.sh -q &amp;lt;tlistxml&amp;gt; -r &amp;lt;rttm&amp;gt; -o &amp;lt;outputfile&amp;gt; [opts] &amp;lt;stdlistfile1&amp;gt; [stdlistfile2] [stdlistfile3]  ... [stdlistfileN] &lt;br /&gt;
&lt;br /&gt;
     &amp;lt;stdlistfile*&amp;gt; input score stdlist file in the SWS2012 format (*.stdlist.xml)        | - Required argument (at least 1)&lt;br /&gt;
     -q &amp;lt;tlistxml&amp;gt; termlist file in the SWS2012 format (*.tlist.xml)                      | - Required parameter&lt;br /&gt;
     -r &amp;lt;rttm&amp;gt; rttm file in the SWS2012 format (*.rttm)                                   | - Required parameter&lt;br /&gt;
     -o &amp;lt;outputfile&amp;gt; output file name                                                     | - Required parameter&lt;br /&gt;
     -z &amp;lt;value&amp;gt; score z-norm type (none, qnorm, fnorm, qfnorm, fqnorm)                    | - Default: none&lt;br /&gt;
     -g add ground-truth information to the outputfile&lt;br /&gt;
     -t &amp;lt;filename&amp;gt; saves the number of true terms in the reference per query (implies -g)&lt;br /&gt;
     -m &amp;lt;value&amp;gt; apply majority voting fusion with &amp;lt;value&amp;gt; minimum number of votes         | - Default: 1&lt;br /&gt;
     -n &amp;lt;value&amp;gt; method for creating default scores (0: average of the other detections (MV approach); 1: min per query; 2: global min; 3: histogram based)  | - Default: 0&lt;br /&gt;
     -d debug mode, do not remove auxiliary files stored in /tmp/tmpdir.$$&lt;br /&gt;
     -h help &lt;br /&gt;
&lt;br /&gt;
     NOTE: Requires Matlab (or Octave) and Perl. It also depends on the Perl scripts align_score_files.pl, heuristicScoring.pl and create_groundtruth_centerdistance.pl, &lt;br /&gt;
           which should be located in the same folder as the main script&lt;br /&gt;
&lt;br /&gt;
  STEP2 - Prepare the eval scores for fusion by typing the following command:&lt;br /&gt;
&lt;br /&gt;
    ./bin/PrepareForFusion.sh \&lt;br /&gt;
    -q ./scoring_atwv_sws2013/sws2013_eval/sws2013_eval.tlist.xml \&lt;br /&gt;
    -r ./scoring_atwv_sws2013/sws2013_eval/sws2013_eval.rttm \&lt;br /&gt;
    -o EVAL_SCORES_4FUSION.txt \&lt;br /&gt;
    -z qnorm -m 1 -n 1 \&lt;br /&gt;
    ./scores/eval/akws_br-evalterms.stdlist.xml \&lt;br /&gt;
    ./scores/eval/dtw_br-evalterms.stdlist.xml&lt;br /&gt;
&lt;br /&gt;
    Notice that, in contrast to the case of the development scores, we did not use the -g option, since we do not need the ground truth in this case. &lt;br /&gt;
    &lt;br /&gt;
    It is also '''fundamental''' to provide the stdlist.xml score files in the same order used in the preparation of the development scores.&lt;br /&gt;
    &lt;br /&gt;
&lt;br /&gt;
  STEP3 - Train the fusion on the development scores and apply it to the eval scores by typing the following command:&lt;br /&gt;
&lt;br /&gt;
  ./bin/fusion.sh DEV_SCORES_4FUSION.txt EVAL_SCORES_4FUSION.txt  queries_in_data.ref&lt;br /&gt;
&lt;br /&gt;
  Before running this script you will also need to change the value of the MATLAB_BIN variable, download the Bosaris toolkit, and set the BOSARIS variable to the path that contains this toolkit.&lt;br /&gt;
&lt;br /&gt;
  Notice also that the script contains 4 hard-coded variables that are specific to the Mediaeval SWS2013 evaluation: the cost parameters P_TARGET, C_MISS and C_FA, and the total duration in seconds of the collection data, TSPEECH.&lt;br /&gt;
&lt;br /&gt;
  This fusion script uses DEV_SCORES_4FUSION.txt (with ground-truth information) to learn the calibration and fusion parameters, which are applied both to the development set and to the evaluation set. &lt;br /&gt;
&lt;br /&gt;
  As mentioned previously, queries_in_data.ref contains the statistics about the true number of instances of each query in the data; it is used for hypothesizing missing scores and for computing ATWV.&lt;br /&gt;
&lt;br /&gt;
  This call (which can take several minutes) will produce one output file for the dev scores (dev_fusion.scores) and one output file for the eval scores (eval_fusion.scores). The format of these output files is as follows:&lt;br /&gt;
&lt;br /&gt;
  &amp;lt;query_id&amp;gt; &amp;lt;file_id&amp;gt; &amp;lt;start_time&amp;gt; &amp;lt;duration&amp;gt; &amp;lt;fusion_score&amp;gt; &amp;lt;decision&amp;gt;&lt;br /&gt;
  ...&lt;br /&gt;
  ...&lt;br /&gt;
  &amp;lt;query_id&amp;gt; &amp;lt;file_id&amp;gt; &amp;lt;start_time&amp;gt; &amp;lt;duration&amp;gt; &amp;lt;fusion_score&amp;gt; &amp;lt;decision&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
  The &amp;lt;decision&amp;gt; field is 0 if the score is lower than the minimum-cost Bayes optimal threshold, and 1 if it is greater (see the Interspeech paper for details).&lt;br /&gt;
&lt;br /&gt;
  Additionally, the fusion parameters are stored in fuse_params.txt.&lt;br /&gt;
&lt;br /&gt;
  The final step consists of converting these result files to the format used in the SWS2013 challenge:&lt;br /&gt;
&lt;br /&gt;
  ./bin/raw2stdslist.sh dev_fusion.scores ./scoring_atwv_sws2013/sws2013_dev/sws2013_dev.tlist.xml &amp;gt; fusion-devterms.stdlist.xml&lt;br /&gt;
  ./bin/raw2stdslist.sh eval_fusion.scores ./scoring_atwv_sws2013/sws2013_eval/sws2013_eval.tlist.xml &amp;gt; fusion-evalterms.stdlist.xml&lt;br /&gt;
  &lt;br /&gt;
  Notice that this script can take a third parameter: a threshold value that you can use to apply a decision threshold different from the optimal Bayes one.&lt;br /&gt;
&lt;br /&gt;
  Reference TWV results in Mediaeval SWS2013 task:&lt;br /&gt;
&lt;br /&gt;
			      dev		   eval		&lt;br /&gt;
			mtwv	   atwv		mtwv	atwv&lt;br /&gt;
	  akws_br	0.1571     0.1408 &lt;br /&gt;
	   dtw_br&lt;br /&gt;
	   fusion&lt;/div&gt;</summary>
		<author><name>Alberto</name></author>
	</entry>
	<entry>
		<id>https://www.hlt.inesc-id.pt/wiki/index.php?title=STDfusion&amp;diff=7145</id>
		<title>STDfusion</title>
		<link rel="alternate" type="text/html" href="https://www.hlt.inesc-id.pt/wiki/index.php?title=STDfusion&amp;diff=7145"/>
		<updated>2013-10-16T22:14:49Z</updated>

		<summary type="html">&lt;p&gt;Alberto: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== Introduction ==&lt;br /&gt;
&lt;br /&gt;
* On this wiki page we make available the necessary code and basic instructions to carry out discriminative calibration and fusion of heterogeneous spoken term detection (STD) systems, as described in:&lt;br /&gt;
&lt;br /&gt;
           A. Abad, L. J. Rodriguez Fuentes, M. Penagarikano, A. Varona, M. Diez, and G. Bordel. &lt;br /&gt;
           On the Calibration and Fusion of Heterogeneous Spoken Term Detection Systems. &lt;br /&gt;
           In Interspeech 2013, August 25-29 2013&lt;br /&gt;
&lt;br /&gt;
* The proposed method is the result of a collaboration between the L2F and the GTTS groups in the context of their activities in the Mediaeval Spoken Web Search (SWS) task. The method has been successfully tested in the SWS2012 and SWS2013 tasks. &lt;br /&gt;
&lt;br /&gt;
* Unfortunately, we did not have time to consolidate (and clean up) the code, and what we are making available is very much the same set of scripts that we developed during the first experiments. Thus, the code is written in different pieces in different languages (bash, Perl, Matlab, etc.) and has some external dependencies. Still, we expect that it can be useful for researchers who want to fuse their own STD systems.&lt;br /&gt;
&lt;br /&gt;
* The package can be downloaded from here.&lt;br /&gt;
&lt;br /&gt;
== Package contents ==&lt;br /&gt;
&lt;br /&gt;
* The package contains the following files and directories:&lt;br /&gt;
&lt;br /&gt;
         README.txt - Contains pretty much the same information as this wiki page&lt;br /&gt;
         ./bin/ - Contains the different scripts necessary for calibration and fusion&lt;br /&gt;
         ./bin/PrepareForFusion.sh  - Main script that normalizes, aligns, hypothesizes missing scores and creates the&lt;br /&gt;
           ground truth for the input systems&lt;br /&gt;
         ./bin/align_score_files.pl - Used by ./bin/PrepareForFusion.sh&lt;br /&gt;
         ./bin/create_groundtruth_centerdistance.pl - Used by ./bin/PrepareForFusion.sh&lt;br /&gt;
         ./bin/heuristicScoring.pl - Used by ./bin/PrepareForFusion.sh &lt;br /&gt;
         ./bin/fusion.sh  - This script takes the output of ./bin/PrepareForFusion.sh to learn the fusion parameters and apply them to the evaluation scores&lt;br /&gt;
         ./bin/raw2stdslist.sh - Converts scores from a raw (internal) format to the stdlist xml format of the SWS evaluation&lt;br /&gt;
         ./scoring_atwv_sws2013/ - Contains the Mediaeval 2013 atwv scoring package&lt;br /&gt;
         ./scores/ - Contains sample dev (./scores/dev/akws_br-devterms.stdlist.xml, ./scores/dev/dtw_br-devterms.stdlist.xml) and eval scores (./scores/eval/akws_br-evalterms.stdlist.xml, ./scores/eval/dtw_br-evalterms.stdlist.xml)&lt;br /&gt;
         ./sample_results/ - Contains the sample (intermediate) results that are obtained if you follow the instructions below&lt;br /&gt;
&lt;br /&gt;
== Example of use ==&lt;br /&gt;
&lt;br /&gt;
'''STEP1''' Prepare the development scores for learning the calibration/fusion&lt;br /&gt;
&lt;br /&gt;
* Type the following command:&lt;br /&gt;
&lt;br /&gt;
          ./bin/PrepareForFusion.sh \&lt;br /&gt;
          -q ./scoring_atwv_sws2013/sws2013_dev/sws2013_dev.tlist.xml \&lt;br /&gt;
          -r ./scoring_atwv_sws2013/sws2013_dev/sws2013_dev.rttm \&lt;br /&gt;
          -o DEV_SCORES_4FUSION.txt \&lt;br /&gt;
          -t queries_in_data.ref \&lt;br /&gt;
          -g -z qnorm -m 1 -n 1 \&lt;br /&gt;
          ./scores/dev/akws_br-devterms.stdlist.xml \&lt;br /&gt;
          ./scores/dev/dtw_br-devterms.stdlist.xml&lt;br /&gt;
&lt;br /&gt;
* '''Warning''' Before running the script, you will have to change the MATLAB_BIN variable to be the path where your Matlab binary is actually installed.&lt;br /&gt;
&lt;br /&gt;
* This process  will generate 2 output files:&lt;br /&gt;
&lt;br /&gt;
1. DEV_SCORES_4FUSION.txt contains the scores ready to be used in the subsequent fusion stage, in the following format:&lt;br /&gt;
&lt;br /&gt;
  &amp;lt;query_id&amp;gt; &amp;lt;file_id&amp;gt; &amp;lt;start_time&amp;gt; &amp;lt;duration&amp;gt; &amp;lt;score_system1&amp;gt; &amp;lt;score_system2&amp;gt; ... &amp;lt;score_systemN&amp;gt; &amp;lt;groundtruth_label&amp;gt;&lt;br /&gt;
  ...&lt;br /&gt;
  ...&lt;br /&gt;
  &amp;lt;query_id&amp;gt; &amp;lt;file_id&amp;gt; &amp;lt;start_time&amp;gt; &amp;lt;duration&amp;gt; &amp;lt;score_system1&amp;gt; &amp;lt;score_system2&amp;gt; ... &amp;lt;score_systemN&amp;gt; &amp;lt;groundtruth_label&amp;gt;&lt;br /&gt;
&lt;br /&gt;
  Notice that all systems have produced a score for all candidate detections. &lt;br /&gt;
&lt;br /&gt;
  If the -g option is selected (as in this case), the last column will contain 0s and 1s for the false and true trials respectively, derived from the rttm file. If the -g option is not selected, the last column simply contains 1s.&lt;br /&gt;
&lt;br /&gt;
  The second (optional) output file is queries_in_data.ref. This file contains the number of times each query appears in the collection data; it is used later in the fusion stage.&lt;br /&gt;
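  The trial format above is straightforward to consume programmatically. As an illustrative sketch (not part of the package; the function name and signature are hypothetical), a small Python reader for the fusion-ready score file could look like this:&lt;br /&gt;

```python
# Illustrative sketch: reading the fusion-ready score file produced by
# PrepareForFusion.sh. The field layout follows the format described above;
# the trailing column is the ground-truth label when -g was used.
def read_fusion_scores(path, n_systems):
    """Return a list of (query_id, file_id, start, dur, scores, label)."""
    trials = []
    with open(path) as f:
        for line in f:
            parts = line.split()
            query_id, file_id = parts[0], parts[1]
            start, dur = float(parts[2]), float(parts[3])
            scores = [float(s) for s in parts[4:4 + n_systems]]
            label = int(parts[4 + n_systems])  # 1 = true trial, 0 = false
            trials.append((query_id, file_id, start, dur, scores, label))
    return trials
```

  For N fused systems there are 4 + N + 1 whitespace-separated fields per line (the last one is only meaningful when -g was used).&lt;br /&gt;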
&lt;br /&gt;
  You can see the general usage of this script by running it without arguments:&lt;br /&gt;
  &lt;br /&gt;
    Usage: PrepareForFusion.sh -q &amp;lt;tlistxml&amp;gt; -r &amp;lt;rttm&amp;gt; -o &amp;lt;outputfile&amp;gt; [opts] &amp;lt;stdlistfile1&amp;gt; [stdlistfile2] [stdlistfile3]  ... [stdlistfileN] &lt;br /&gt;
&lt;br /&gt;
     &amp;lt;stdlistfile*&amp;gt; input score stdlist file in the SWS2012 format (*.stdlist.xml)        | - Required argument (at least 1)&lt;br /&gt;
     -q &amp;lt;tlistxml&amp;gt; termlist file in the SWS2012 format (*.tlist.xml)                      | - Required parameter&lt;br /&gt;
     -r &amp;lt;rttm&amp;gt; rttm file in the SWS2012 format (*.rttm)                                   | - Required parameter&lt;br /&gt;
     -o &amp;lt;outputfile&amp;gt; output file name                                                     | - Required parameter&lt;br /&gt;
     -z &amp;lt;value&amp;gt; score z-norm type (none, qnorm, fnorm, qfnorm, fqnorm)                    | - Default: none&lt;br /&gt;
     -g add ground-truth information to the outputfile&lt;br /&gt;
     -t &amp;lt;filename&amp;gt; saves the number of true terms in the reference per query (implies -g)&lt;br /&gt;
     -m &amp;lt;value&amp;gt; apply majority voting fusion with &amp;lt;value&amp;gt; minimum number of votes         | - Default: 1&lt;br /&gt;
     -n &amp;lt;value&amp;gt; method for creating default scores (0: average of the other detections (MV approach); 1: min per query; 2: global min; 3: histogram based)  | - Default: 0&lt;br /&gt;
     -d debug mode, do not remove auxiliary files stored in /tmp/tmpdir.$$&lt;br /&gt;
     -h help &lt;br /&gt;
&lt;br /&gt;
     NOTE: Requires Matlab (or Octave) and Perl. It also depends on the Perl scripts align_score_files.pl, heuristicScoring.pl and create_groundtruth_centerdistance.pl, &lt;br /&gt;
           which should be located in the same folder as the main script&lt;br /&gt;
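  The -z option selects a score normalization. As a rough sketch of what a per-query z-norm ("qnorm") typically computes, here is an illustrative Python version; the actual normalization is performed by the package's Matlab code and may differ in details (e.g. the handling of single-score queries):&lt;br /&gt;

```python
# Hedged sketch of a per-query z-norm ("qnorm"): normalize each system's
# scores to zero mean and unit variance within each query.
from collections import defaultdict
from math import sqrt

def qnorm(trials):
    """trials: list of (query_id, score) pairs -- returns normalized scores."""
    by_query = defaultdict(list)
    for q, s in trials:
        by_query[q].append(s)
    stats = {}
    for q, scores in by_query.items():
        mean = sum(scores) / len(scores)
        var = sum((s - mean) ** 2 for s in scores) / len(scores)
        stats[q] = (mean, sqrt(var) or 1.0)  # guard against zero variance
    return [(s - stats[q][0]) / stats[q][1] for q, s in trials]
```

  The fnorm, qfnorm and fqnorm variants apply the same idea per file, or per query and file in sequence.&lt;br /&gt;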
&lt;br /&gt;
'''STEP2''' Prepare the eval scores for fusion by typing the following command:&lt;br /&gt;
&lt;br /&gt;
    ./bin/PrepareForFusion.sh \&lt;br /&gt;
    -q ./scoring_atwv_sws2013/sws2013_eval/sws2013_eval.tlist.xml \&lt;br /&gt;
    -r ./scoring_atwv_sws2013/sws2013_eval/sws2013_eval.rttm \&lt;br /&gt;
    -o EVAL_SCORES_4FUSION.txt \&lt;br /&gt;
    -z qnorm -m 1 -n 1 \&lt;br /&gt;
    ./scores/eval/akws_br-evalterms.stdlist.xml \&lt;br /&gt;
    ./scores/eval/dtw_br-evalterms.stdlist.xml&lt;br /&gt;
&lt;br /&gt;
    Notice that, in contrast to the development scores, we did not use the -g option, since we do not need the ground truth in this case. &lt;br /&gt;
    &lt;br /&gt;
    It is also '''fundamental''' to provide the stdlist.xml score files in the same order used in the preparation of the development scores.&lt;br /&gt;
    &lt;br /&gt;
&lt;br /&gt;
'''STEP3''' Train the fusion on the development scores and apply it to the eval scores by typing the following command:&lt;br /&gt;
&lt;br /&gt;
  ./bin/fusion.sh DEV_SCORES_4FUSION.txt EVAL_SCORES_4FUSION.txt  queries_in_data.ref&lt;br /&gt;
&lt;br /&gt;
  Before running this script, you will also need to set the MATLAB_BIN variable, download the Bosaris toolkit, and set the BOSARIS variable to the path containing the toolkit.&lt;br /&gt;
&lt;br /&gt;
  Notice also that the script contains four hard-coded variables that are specific to the Mediaeval SWS2013 evaluation: the cost parameters P_TARGET, C_MISS and C_FA, and the total duration in seconds of the collection data, TSPEECH.&lt;br /&gt;
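  For reference, for well-calibrated log-likelihood-ratio scores the minimum-cost Bayes threshold follows directly from these cost parameters. A hedged sketch follows (the actual values of P_TARGET, C_MISS and C_FA are hard-coded in fusion.sh; the function names here are illustrative):&lt;br /&gt;

```python
# Sketch of the minimum-cost Bayes threshold implied by the cost parameters.
# For calibrated log-likelihood-ratio scores the optimal threshold is
# log(beta), with beta = (C_FA * (1 - P_TARGET)) / (C_MISS * P_TARGET).
from math import log

def bayes_threshold(p_target, c_miss, c_fa):
    beta = (c_fa * (1.0 - p_target)) / (c_miss * p_target)
    return log(beta)

def decide(score, p_target, c_miss, c_fa):
    """Return 1 if the calibrated score exceeds the Bayes threshold, else 0."""
    return 1 if score > bayes_threshold(p_target, c_miss, c_fa) else 0
```

  This is the threshold behind the 0/1 decision field written by the fusion script.&lt;br /&gt;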
&lt;br /&gt;
  This fusion script uses DEV_SCORES_4FUSION.txt (with ground-truth information) to learn the calibration and fusion parameters, which are applied both to the development set and to the evaluation set. &lt;br /&gt;
&lt;br /&gt;
  As mentioned previously, queries_in_data.ref contains the true number of instances of each query in the data; it is used for hypothesizing missing scores and for computing ATWV.&lt;br /&gt;
&lt;br /&gt;
  This call (which can take several minutes) produces one output file for the dev scores (dev_fusion.scores) and one for the eval scores (eval_fusion.scores). The format of these output files is as follows:&lt;br /&gt;
&lt;br /&gt;
  &amp;lt;query_id&amp;gt; &amp;lt;file_id&amp;gt; &amp;lt;start_time&amp;gt; &amp;lt;duration&amp;gt; &amp;lt;fusion_score&amp;gt; &amp;lt;decision&amp;gt;&lt;br /&gt;
  ...&lt;br /&gt;
  ...&lt;br /&gt;
  &amp;lt;query_id&amp;gt; &amp;lt;file_id&amp;gt; &amp;lt;start_time&amp;gt; &amp;lt;duration&amp;gt; &amp;lt;fusion_score&amp;gt; &amp;lt;decision&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
  The &amp;lt;decision&amp;gt; field is 0 or 1 depending on whether the score is below or above the minimum-cost Bayes optimal threshold (see the Interspeech paper for details).&lt;br /&gt;
&lt;br /&gt;
  Additionally, the fusion parameters are stored in fuse_params.txt.&lt;br /&gt;
&lt;br /&gt;
  The final step consists of converting these result files to the format used in the SWS2013 challenge:&lt;br /&gt;
&lt;br /&gt;
  ./bin/raw2stdslist.sh dev_fusion.scores ./scoring_atwv_sws2013/sws2013_dev/sws2013_dev.tlist.xml &amp;gt; fusion-devterms.stdlist.xml&lt;br /&gt;
  ./bin/raw2stdslist.sh eval_fusion.scores ./scoring_atwv_sws2013/sws2013_eval/sws2013_eval.tlist.xml &amp;gt; fusion-evalterms.stdlist.xml&lt;br /&gt;
  &lt;br /&gt;
  Notice that this script can take a third parameter: a threshold value that lets you apply a decision threshold different from the optimal Bayes one.&lt;br /&gt;
&lt;br /&gt;
  Reference TWV results in Mediaeval SWS2013 task:&lt;br /&gt;
&lt;br /&gt;
              dev               eval&lt;br /&gt;
            mtwv     atwv     mtwv     atwv&lt;br /&gt;
   akws_br  0.1571   0.1408&lt;br /&gt;
    dtw_br&lt;br /&gt;
    fusion&lt;/div&gt;</summary>
		<author><name>Alberto</name></author>
	</entry>
	<entry>
		<id>https://www.hlt.inesc-id.pt/wiki/index.php?title=STDfusion&amp;diff=7144</id>
		<title>STDfusion</title>
		<link rel="alternate" type="text/html" href="https://www.hlt.inesc-id.pt/wiki/index.php?title=STDfusion&amp;diff=7144"/>
		<updated>2013-10-16T22:09:36Z</updated>

		<summary type="html">&lt;p&gt;Alberto: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== Introduction ==&lt;br /&gt;
&lt;br /&gt;
* This wiki provides the code and basic instructions to carry out discriminative calibration and fusion of heterogeneous spoken term detection (STD) systems, as described in:&lt;br /&gt;
&lt;br /&gt;
           A. Abad, L. J. Rodriguez Fuentes, M. Penagarikano, A. Varona, M. Diez, and G. Bordel. &lt;br /&gt;
           On the Calibration and Fusion of Heterogeneous Spoken Term Detection Systems. &lt;br /&gt;
           In Interspeech 2013, August 25-29 2013&lt;br /&gt;
&lt;br /&gt;
* The proposed method is the result of a collaboration between the L2F and GTTS groups in the context of their activities in the Mediaeval Spoken Web Search (SWS) task. The method has been successfully tested in the SWS2012 and SWS2013 tasks. &lt;br /&gt;
&lt;br /&gt;
* Unfortunately, we did not have time to consolidate (and clean up) the code, so what we are making available is essentially the same set of scripts that we developed during the first experiments. The code is thus split across different languages (bash, perl, matlab, etc.) and has some external dependencies. We nevertheless expect it to be useful for researchers who want to fuse their STD systems.&lt;br /&gt;
&lt;br /&gt;
* The package can be downloaded from here.&lt;br /&gt;
&lt;br /&gt;
== Package contents ==&lt;br /&gt;
&lt;br /&gt;
* The package contains the following files and directories:&lt;br /&gt;
&lt;br /&gt;
         README.txt - Contains pretty much the same information as this wiki page&lt;br /&gt;
         ./bin/ - Contains the different scripts necessary for calibration and fusion&lt;br /&gt;
         ./bin/PrepareForFusion.sh  - Main script that normalizes, aligns, hypothesizes missing scores and creates the&lt;br /&gt;
           ground truth for the input systems&lt;br /&gt;
         ./bin/align_score_files.pl - Used by ./bin/PrepareForFusion.sh&lt;br /&gt;
         ./bin/create_groundtruth_centerdistance.pl - Used by ./bin/PrepareForFusion.sh&lt;br /&gt;
         ./bin/heuristicScoring.pl - Used by ./bin/PrepareForFusion.sh &lt;br /&gt;
         ./bin/fusion.sh  - This script takes the output of ./bin/PrepareForFusion.sh to learn the fusion parameters and apply them to the evaluation scores&lt;br /&gt;
         ./bin/raw2stdslist.sh - Converts scores from a raw (internal) format to the stdlist xml format of the SWS evaluation&lt;br /&gt;
         ./scoring_atwv_sws2013/ - Contains the Mediaeval 2013 atwv scoring package&lt;br /&gt;
         ./scores/ - Contains sample dev (./scores/dev/akws_br-devterms.stdlist.xml, ./scores/dev/dtw_br-devterms.stdlist.xml) and eval scores (./scores/eval/akws_br-evalterms.stdlist.xml, ./scores/eval/dtw_br-evalterms.stdlist.xml)&lt;br /&gt;
         ./sample_results/ - Contains the sample (intermediate) results that are obtained if you run the instructions below&lt;br /&gt;
&lt;br /&gt;
== Example of use ==&lt;br /&gt;
&lt;br /&gt;
'''STEP1''' Prepare the development scores for training the fusion parameters&lt;br /&gt;
&lt;br /&gt;
* Type the following command:&lt;br /&gt;
&lt;br /&gt;
  ./bin/PrepareForFusion.sh \&lt;br /&gt;
  -q ./scoring_atwv_sws2013/sws2013_dev/sws2013_dev.tlist.xml \&lt;br /&gt;
  -r ./scoring_atwv_sws2013/sws2013_dev/sws2013_dev.rttm \&lt;br /&gt;
  -o DEV_SCORES_4FUSION.txt \&lt;br /&gt;
  -t queries_in_data.ref \&lt;br /&gt;
  -g -z qnorm -m 1 -n 1 \&lt;br /&gt;
  ./scores/dev/akws_br-devterms.stdlist.xml \&lt;br /&gt;
  ./scores/dev/dtw_br-devterms.stdlist.xml&lt;br /&gt;
&lt;br /&gt;
  Before running the script, change the MATLAB_BIN variable to be the path to your Matlab binary.&lt;br /&gt;
&lt;br /&gt;
  This script call will generate 2 output files:&lt;br /&gt;
&lt;br /&gt;
  The first one is DEV_SCORES_4FUSION.txt, which contains the scores ready to be used in the following stage. The format is as follows:&lt;br /&gt;
&lt;br /&gt;
  &amp;lt;query_id&amp;gt; &amp;lt;file_id&amp;gt; &amp;lt;start_time&amp;gt; &amp;lt;duration&amp;gt; &amp;lt;score_system1&amp;gt; &amp;lt;score_system2&amp;gt; ... &amp;lt;score_systemN&amp;gt; &amp;lt;groundtruth_label&amp;gt;&lt;br /&gt;
  ...&lt;br /&gt;
  ...&lt;br /&gt;
  &amp;lt;query_id&amp;gt; &amp;lt;file_id&amp;gt; &amp;lt;start_time&amp;gt; &amp;lt;duration&amp;gt; &amp;lt;score_system1&amp;gt; &amp;lt;score_system2&amp;gt; ... &amp;lt;score_systemN&amp;gt; &amp;lt;groundtruth_label&amp;gt;&lt;br /&gt;
&lt;br /&gt;
  Notice that all systems have produced a score for all candidate detections. &lt;br /&gt;
&lt;br /&gt;
  If the -g option is selected (as in this case), the last column contains 0s and 1s for false and true trials respectively, derived from the rttm. If the -g option is not selected, the last column simply contains a column of 1s.&lt;br /&gt;
&lt;br /&gt;
  The second (optional) output file is queries_in_data.ref. This file contains the number of times each query appears in the collection data; it is used later in the fusion stage.&lt;br /&gt;
&lt;br /&gt;
  You can see the general usage of this script by running it without arguments:&lt;br /&gt;
  &lt;br /&gt;
    Usage: PrepareForFusion.sh -q &amp;lt;tlistxml&amp;gt; -r &amp;lt;rttm&amp;gt; -o &amp;lt;outputfile&amp;gt; [opts] &amp;lt;stdlistfile1&amp;gt; [stdlistfile2] [stdlistfile3]  ... [stdlistfileN] &lt;br /&gt;
&lt;br /&gt;
     &amp;lt;stdlistfile*&amp;gt; input score stdlist file in the SWS2012 format (*.stdlist.xml)        | - Required argument (at least 1)&lt;br /&gt;
     -q &amp;lt;tlistxml&amp;gt; termlist file in the SWS2012 format (*.tlist.xml)                      | - Required parameter&lt;br /&gt;
     -r &amp;lt;rttm&amp;gt; rttm file in the SWS2012 format (*.rttm)                                   | - Required parameter&lt;br /&gt;
     -o &amp;lt;outputfile&amp;gt; output file name                                                     | - Required parameter&lt;br /&gt;
     -z &amp;lt;value&amp;gt; score z-norm type (none, qnorm, fnorm, qfnorm, fqnorm)                    | - Default: none&lt;br /&gt;
     -g add ground-truth information to the outputfile&lt;br /&gt;
     -t &amp;lt;filename&amp;gt; saves the number of true terms in the reference per query (implies -g)&lt;br /&gt;
     -m &amp;lt;value&amp;gt; apply majority voting fusion with &amp;lt;value&amp;gt; minimum number of votes         | - Default: 1&lt;br /&gt;
     -n &amp;lt;value&amp;gt; method for creating default scores (0: average of the other detections (MV approach); 1: min per query; 2: global min; 3: histogram based)  | - Default: 0&lt;br /&gt;
     -d debug mode, do not remove auxiliary files stored in /tmp/tmpdir.$$&lt;br /&gt;
     -h help &lt;br /&gt;
&lt;br /&gt;
     NOTE: Requires Matlab (or Octave) and Perl. It also depends on the Perl scripts align_score_files.pl, heuristicScoring.pl and create_groundtruth_centerdistance.pl, &lt;br /&gt;
           which should be located in the same folder as the main script&lt;br /&gt;
&lt;br /&gt;
'''STEP2''' Prepare the eval scores for fusion by typing the following command:&lt;br /&gt;
&lt;br /&gt;
  ./bin/PrepareForFusion.sh \&lt;br /&gt;
  -q ./scoring_atwv_sws2013/sws2013_eval/sws2013_eval.tlist.xml \&lt;br /&gt;
  -r ./scoring_atwv_sws2013/sws2013_eval/sws2013_eval.rttm \&lt;br /&gt;
  -o EVAL_SCORES_4FUSION.txt \&lt;br /&gt;
  -z qnorm -m 1 -n 1 \&lt;br /&gt;
  ./scores/eval/akws_br-evalterms.stdlist.xml \&lt;br /&gt;
  ./scores/eval/dtw_br-evalterms.stdlist.xml&lt;br /&gt;
&lt;br /&gt;
    Notice that, in contrast to the development scores, we did not use the -g option, since we do not need the ground truth in this case. &lt;br /&gt;
    &lt;br /&gt;
    It is also '''fundamental''' to provide the stdlist.xml score files in the same order used in the preparation of the development scores.&lt;br /&gt;
    &lt;br /&gt;
&lt;br /&gt;
'''STEP3''' Train the fusion on the development scores and apply it to the eval scores by typing the following command:&lt;br /&gt;
&lt;br /&gt;
  ./bin/fusion.sh DEV_SCORES_4FUSION.txt EVAL_SCORES_4FUSION.txt  queries_in_data.ref&lt;br /&gt;
&lt;br /&gt;
  Before running this script, you will also need to set the MATLAB_BIN variable, download the Bosaris toolkit, and set the BOSARIS variable to the path containing the toolkit.&lt;br /&gt;
&lt;br /&gt;
  Notice also that the script contains four hard-coded variables that are specific to the Mediaeval SWS2013 evaluation: the cost parameters P_TARGET, C_MISS and C_FA, and the total duration in seconds of the collection data, TSPEECH.&lt;br /&gt;
&lt;br /&gt;
  This fusion script uses DEV_SCORES_4FUSION.txt (with ground-truth information) to learn the calibration and fusion parameters, which are applied both to the development set and to the evaluation set. &lt;br /&gt;
&lt;br /&gt;
  As mentioned previously, queries_in_data.ref contains the true number of instances of each query in the data; it is used for hypothesizing missing scores and for computing ATWV.&lt;br /&gt;
&lt;br /&gt;
  This call (which can take several minutes) produces one output file for the dev scores (dev_fusion.scores) and one for the eval scores (eval_fusion.scores). The format of these output files is as follows:&lt;br /&gt;
&lt;br /&gt;
  &amp;lt;query_id&amp;gt; &amp;lt;file_id&amp;gt; &amp;lt;start_time&amp;gt; &amp;lt;duration&amp;gt; &amp;lt;fusion_score&amp;gt; &amp;lt;decision&amp;gt;&lt;br /&gt;
  ...&lt;br /&gt;
  ...&lt;br /&gt;
  &amp;lt;query_id&amp;gt; &amp;lt;file_id&amp;gt; &amp;lt;start_time&amp;gt; &amp;lt;duration&amp;gt; &amp;lt;fusion_score&amp;gt; &amp;lt;decision&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
  The &amp;lt;decision&amp;gt; field is 0 or 1 depending on whether the score is below or above the minimum-cost Bayes optimal threshold (see the Interspeech paper for details).&lt;br /&gt;
&lt;br /&gt;
  Additionally, the fusion parameters are stored in fuse_params.txt.&lt;br /&gt;
&lt;br /&gt;
  The final step consists of converting these result files to the format used in the SWS2013 challenge:&lt;br /&gt;
&lt;br /&gt;
  ./bin/raw2stdslist.sh dev_fusion.scores ./scoring_atwv_sws2013/sws2013_dev/sws2013_dev.tlist.xml &amp;gt; fusion-devterms.stdlist.xml&lt;br /&gt;
  ./bin/raw2stdslist.sh eval_fusion.scores ./scoring_atwv_sws2013/sws2013_eval/sws2013_eval.tlist.xml &amp;gt; fusion-evalterms.stdlist.xml&lt;br /&gt;
  &lt;br /&gt;
  Notice that this script can take a third parameter: a threshold value that lets you apply a decision threshold different from the optimal Bayes one.&lt;br /&gt;
&lt;br /&gt;
  Reference TWV results in Mediaeval SWS2013 task:&lt;br /&gt;
&lt;br /&gt;
              dev               eval&lt;br /&gt;
            mtwv     atwv     mtwv     atwv&lt;br /&gt;
   akws_br  0.1571   0.1408&lt;br /&gt;
    dtw_br&lt;br /&gt;
    fusion&lt;/div&gt;</summary>
		<author><name>Alberto</name></author>
	</entry>
	<entry>
		<id>https://www.hlt.inesc-id.pt/wiki/index.php?title=STDfusion&amp;diff=7143</id>
		<title>STDfusion</title>
		<link rel="alternate" type="text/html" href="https://www.hlt.inesc-id.pt/wiki/index.php?title=STDfusion&amp;diff=7143"/>
		<updated>2013-10-16T22:01:38Z</updated>

		<summary type="html">&lt;p&gt;Alberto: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== Introduction ==&lt;br /&gt;
&lt;br /&gt;
* This wiki provides the code and basic instructions to carry out discriminative calibration and fusion of heterogeneous spoken term detection (STD) systems, as described in:&lt;br /&gt;
&lt;br /&gt;
           A. Abad, L. J. Rodriguez Fuentes, M. Penagarikano, A. Varona, M. Diez, and G. Bordel. &lt;br /&gt;
           On the Calibration and Fusion of Heterogeneous Spoken Term Detection Systems. &lt;br /&gt;
           In Interspeech 2013, August 25-29 2013&lt;br /&gt;
&lt;br /&gt;
* The proposed method is the result of a collaboration between the L2F and GTTS groups in the context of their activities in the Mediaeval Spoken Web Search (SWS) task. The method has been successfully tested in the SWS2012 and SWS2013 tasks. &lt;br /&gt;
&lt;br /&gt;
* Unfortunately, we did not have time to consolidate (and clean up) the code, so what we are making available is essentially the same set of scripts that we developed during the first experiments. The code is thus split across different languages (bash, perl, matlab, etc.) and has some external dependencies. We nevertheless expect it to be useful for researchers who want to fuse their STD systems.&lt;br /&gt;
&lt;br /&gt;
* The package can be downloaded from here.&lt;br /&gt;
&lt;br /&gt;
== Package contents ==&lt;br /&gt;
&lt;br /&gt;
* The package contains the following files and directories:&lt;br /&gt;
&lt;br /&gt;
         README.txt - Contains pretty much the same information as this wiki page&lt;br /&gt;
         ./bin/ - This directory contains the different scripts necessary for calibration and fusion:&lt;br /&gt;
         ./bin/PrepareForFusion.sh  - Main script that normalizes, aligns, hypothesizes missing scores and creates the&lt;br /&gt;
                                                     ground truth for the input systems&lt;br /&gt;
         ./bin/align_score_files.pl - This script is used by ./bin/PrepareForFusion.sh&lt;br /&gt;
         ./bin/create_groundtruth_centerdistance.pl - This script is used by ./bin/PrepareForFusion.sh&lt;br /&gt;
         ./bin/heuristicScoring.pl - This script is used by ./bin/PrepareForFusion.sh &lt;br /&gt;
         ./bin/fusion.sh  - This script takes the output of ./bin/PrepareForFusion.sh to learn the fusion parameters and apply them to the evaluation set&lt;br /&gt;
         ./bin/raw2stdslist.sh - Converts scores from a raw (internal) format to the stdlist xml format of the evaluation&lt;br /&gt;
         ./scoring_atwv_sws2013/ - This directory contains the Mediaeval 2013 official scoring package (only for atwv)&lt;br /&gt;
        ./scores/ - Contains sample dev (./scores/dev/akws_br-devterms.stdlist.xml, ./scores/dev/dtw_br-devterms.stdlist.xml) and eval scores (./scores/eval/akws_br-evalterms.stdlist.xml, ./scores/eval/dtw_br-evalterms.stdlist.xml)&lt;br /&gt;
         ./sample_results/ - Contains the sample results that should be obtained if you run the scripts following the instructions below&lt;br /&gt;
&lt;br /&gt;
  STEP BY STEP SAMPLE INSTRUCTIONS&lt;br /&gt;
&lt;br /&gt;
  STEP1 - Prepare the dev scores for training the fusion by typing the following command:&lt;br /&gt;
&lt;br /&gt;
  ./bin/PrepareForFusion.sh -q ./scoring_atwv_sws2013/sws2013_dev/sws2013_dev.tlist.xml -r ./scoring_atwv_sws2013/sws2013_dev/sws2013_dev.rttm -o DEV_SCORES_4FUSION.txt -t queries_in_data.ref -g -z qnorm -m 1 -n 1 ./scores/dev/akws_br-devterms.stdlist.xml ./scores/dev/dtw_br-devterms.stdlist.xml&lt;br /&gt;
&lt;br /&gt;
  Before running the script, change the MATLAB_BIN variable to be the path to your Matlab binary.&lt;br /&gt;
&lt;br /&gt;
  This script call will generate 2 output files:&lt;br /&gt;
&lt;br /&gt;
  The first one is the DEV_SCORES_4FUSION.txt that contains the scores ready to be used for the following stage. The format is something like:&lt;br /&gt;
&lt;br /&gt;
  &amp;lt;query_id&amp;gt; &amp;lt;file_id&amp;gt; &amp;lt;start_time&amp;gt; &amp;lt;duration&amp;gt; &amp;lt;score_system1&amp;gt; &amp;lt;score_system2&amp;gt; ... &amp;lt;score_systemN&amp;gt; &amp;lt;groundtruth_label&amp;gt;&lt;br /&gt;
  ...&lt;br /&gt;
  ...&lt;br /&gt;
  &amp;lt;query_id&amp;gt; &amp;lt;file_id&amp;gt; &amp;lt;start_time&amp;gt; &amp;lt;duration&amp;gt; &amp;lt;score_system1&amp;gt; &amp;lt;score_system2&amp;gt; ... &amp;lt;score_systemN&amp;gt; &amp;lt;groundtruth_label&amp;gt;&lt;br /&gt;
&lt;br /&gt;
  Notice that all systems have produced a score for all candidate detections. &lt;br /&gt;
&lt;br /&gt;
  If the -g option is selected (as in this case), the last column contains 0s and 1s for false and true trials respectively, derived from the rttm. If the -g option is not selected, the last column simply contains a column of 1s.&lt;br /&gt;
&lt;br /&gt;
  The second (optional) output file is queries_in_data.ref. This file contains the number of times each query appears in the collection data; it is used later in the fusion stage.&lt;br /&gt;
&lt;br /&gt;
  You can see the general usage of this script by running it without arguments:&lt;br /&gt;
  &lt;br /&gt;
    Usage: PrepareForFusion.sh -q &amp;lt;tlistxml&amp;gt; -r &amp;lt;rttm&amp;gt; -o &amp;lt;outputfile&amp;gt; [opts] &amp;lt;stdlistfile1&amp;gt; [stdlistfile2] [stdlistfile3]  ... [stdlistfileN] &lt;br /&gt;
&lt;br /&gt;
     &amp;lt;stdlistfile*&amp;gt; input score stdlist file in the SWS2012 format (*.stdlist.xml)        | - Required argument (at least 1)&lt;br /&gt;
     -q &amp;lt;tlistxml&amp;gt; termlist file in the SWS2012 format (*.tlist.xml)                      | - Required parameter&lt;br /&gt;
     -r &amp;lt;rttm&amp;gt; rttm file in the SWS2012 format (*.rttm)                                   | - Required parameter&lt;br /&gt;
     -o &amp;lt;outputfile&amp;gt; output file name                                                     | - Required parameter&lt;br /&gt;
     -z &amp;lt;value&amp;gt; score z-norm type (none, qnorm, fnorm, qfnorm, fqnorm)                    | - Default: none&lt;br /&gt;
     -g add ground-truth information to the outputfile&lt;br /&gt;
     -t &amp;lt;filename&amp;gt; saves the number of true terms in the reference per query (implies -g)&lt;br /&gt;
     -m &amp;lt;value&amp;gt; apply majority voting fusion with &amp;lt;value&amp;gt; minimum number of votes         | - Default: 1&lt;br /&gt;
     -n &amp;lt;value&amp;gt; method for creating default scores (0: average of the other detections (MV approach); 1: min per query; 2: global min; 3: histogram based)  | - Default: 0&lt;br /&gt;
     -d debug mode, do not remove auxiliary files stored in /tmp/tmpdir.$$&lt;br /&gt;
     -h help &lt;br /&gt;
&lt;br /&gt;
     NOTE: Requires Matlab (or Octave) and Perl. It also depends on the Perl scripts align_score_files.pl, heuristicScoring.pl and create_groundtruth_centerdistance.pl, &lt;br /&gt;
           which should be located in the same folder as the main script&lt;br /&gt;
&lt;br /&gt;
  STEP2 - Prepare the eval scores for fusion by typing the following command:&lt;br /&gt;
&lt;br /&gt;
    ./bin/PrepareForFusion.sh -q ./scoring_atwv_sws2013/sws2013_eval/sws2013_eval.tlist.xml -r ./scoring_atwv_sws2013/sws2013_eval/sws2013_eval.rttm -o EVAL_SCORES_4FUSION.txt -z qnorm -m 1 -n 1 ./scores/eval/akws_br-evalterms.stdlist.xml  ./scores/eval/dtw_br-evalterms.stdlist.xml&lt;br /&gt;
&lt;br /&gt;
    Notice that, in contrast to the development scores, we did not use the -g option, since we do not need the ground truth in this case. &lt;br /&gt;
    &lt;br /&gt;
    It is also '''fundamental''' to provide the stdlist.xml score files in the same order used in the preparation of the development scores.&lt;br /&gt;
    &lt;br /&gt;
&lt;br /&gt;
  STEP3 - Train the fusion on the development scores and apply it to the eval scores by typing the following command:&lt;br /&gt;
&lt;br /&gt;
  ./bin/fusion.sh DEV_SCORES_4FUSION.txt EVAL_SCORES_4FUSION.txt  queries_in_data.ref&lt;br /&gt;
&lt;br /&gt;
  Before running this script, you will also need to set the MATLAB_BIN variable, download the Bosaris toolkit, and set the BOSARIS variable to the path containing the toolkit.&lt;br /&gt;
&lt;br /&gt;
  Notice also that the script contains four hard-coded variables that are specific to the Mediaeval SWS2013 evaluation: the cost parameters P_TARGET, C_MISS and C_FA, and the total duration in seconds of the collection data, TSPEECH.&lt;br /&gt;
&lt;br /&gt;
  This fusion script uses DEV_SCORES_4FUSION.txt (with ground-truth information) to learn the calibration and fusion parameters, which are applied both to the development set and to the evaluation set. &lt;br /&gt;
&lt;br /&gt;
  As mentioned previously, queries_in_data.ref contains the true number of instances of each query in the data; it is used for hypothesizing missing scores and for computing ATWV.&lt;br /&gt;
&lt;br /&gt;
  This call (which can take several minutes) produces one output file for the dev scores (dev_fusion.scores) and one for the eval scores (eval_fusion.scores). The format of these output files is as follows:&lt;br /&gt;
&lt;br /&gt;
  &amp;lt;query_id&amp;gt; &amp;lt;file_id&amp;gt; &amp;lt;start_time&amp;gt; &amp;lt;duration&amp;gt; &amp;lt;fusion_score&amp;gt; &amp;lt;decision&amp;gt;&lt;br /&gt;
  ...&lt;br /&gt;
  ...&lt;br /&gt;
  &amp;lt;query_id&amp;gt; &amp;lt;file_id&amp;gt; &amp;lt;start_time&amp;gt; &amp;lt;duration&amp;gt; &amp;lt;fusion_score&amp;gt; &amp;lt;decision&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
  The &amp;lt;decision&amp;gt; field is 0 or 1 depending on whether the score is below or above the minimum-cost Bayes optimal threshold (see the Interspeech paper for details).&lt;br /&gt;
&lt;br /&gt;
  Additionally, the fusion parameters are stored in fuse_params.txt.&lt;br /&gt;
&lt;br /&gt;
  The final step consists of converting these result files to the format used in the SWS2013 challenge:&lt;br /&gt;
&lt;br /&gt;
  ./bin/raw2stdslist.sh dev_fusion.scores ./scoring_atwv_sws2013/sws2013_dev/sws2013_dev.tlist.xml &amp;gt; fusion-devterms.stdlist.xml&lt;br /&gt;
  ./bin/raw2stdslist.sh eval_fusion.scores ./scoring_atwv_sws2013/sws2013_eval/sws2013_eval.tlist.xml &amp;gt; fusion-evalterms.stdlist.xml&lt;br /&gt;
  &lt;br /&gt;
  Notice that this script can take a third parameter: a threshold value that lets you apply a decision threshold different from the optimal Bayes one.&lt;br /&gt;
&lt;br /&gt;
  Reference TWV results in Mediaeval SWS2013 task:&lt;br /&gt;
&lt;br /&gt;
              dev               eval&lt;br /&gt;
            mtwv     atwv     mtwv     atwv&lt;br /&gt;
   akws_br  0.1571   0.1408&lt;br /&gt;
    dtw_br&lt;br /&gt;
    fusion&lt;/div&gt;</summary>
		<author><name>Alberto</name></author>
	</entry>
	<entry>
		<id>https://www.hlt.inesc-id.pt/wiki/index.php?title=STDfusion&amp;diff=7142</id>
		<title>STDfusion</title>
		<link rel="alternate" type="text/html" href="https://www.hlt.inesc-id.pt/wiki/index.php?title=STDfusion&amp;diff=7142"/>
		<updated>2013-10-16T21:53:57Z</updated>

		<summary type="html">&lt;p&gt;Alberto: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== Introduction ==&lt;br /&gt;
&lt;br /&gt;
* In this wiki we make available the necessary code and the basic instructions to carry out discriminative calibration and fusion of heterogeneous spoken term detection (STD) systems. &lt;br /&gt;
&lt;br /&gt;
* The proposed method is the result of a collaboration between the L2F and GTTS groups in the context of their activities in the Mediaeval Spoken Web Search (SWS) task. The method has been successfully tested in the SWS2012 and SWS2013 tasks. &lt;br /&gt;
&lt;br /&gt;
* Unfortunately, we did not have time to consolidate (and clean up) the code, so what we are making available is essentially the same set of scripts that we developed during the first experiments. The code is thus split across different languages (bash, perl, matlab, etc.) and has some external dependencies. We nevertheless expect it to be useful for researchers who want to fuse their STD systems.&lt;br /&gt;
&lt;br /&gt;
* The package can be downloaded from here. Please cite the following work if you find it useful:&lt;br /&gt;
&lt;br /&gt;
     A. Abad, L. J. Rodriguez Fuentes, M. Penagarikano, A. Varona, M. Diez, and G. Bordel. &lt;br /&gt;
     On the Calibration and Fusion of Heterogeneous Spoken Term Detection Systems. &lt;br /&gt;
     In Interspeech 2013, August 25-29 2013&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The package contains the following files and directories:&lt;br /&gt;
&lt;br /&gt;
  README.txt - This file&lt;br /&gt;
  ./bin/ - This directory contains the different scripts necessary for calibration and fusion:&lt;br /&gt;
  ./bin/PrepareForFusion.sh  - Main script that normalizes, aligns, hypothesizes missing scores and creates the ground truth for a variable number of input systems&lt;br /&gt;
  ./bin/align_score_files.pl - This script is used by ./bin/PrepareForFusion.sh&lt;br /&gt;
  ./bin/create_groundtruth_centerdistance.pl - This script is used by ./bin/PrepareForFusion.sh&lt;br /&gt;
  ./bin/heuristicScoring.pl - This script is used by ./bin/PrepareForFusion.sh &lt;br /&gt;
  ./bin/fusion.sh  - This script takes the output of ./bin/PrepareForFusion.sh to learn the fusion parameters and apply them to the evaluation set&lt;br /&gt;
  ./bin/raw2stdslist.sh - Converts scores from a raw (internal) format to the stdlist xml format of the evaluation&lt;br /&gt;
  ./scoring_atwv_sws2013/ - This directory contains the Mediaeval 2013 official scoring package (only for atwv)&lt;br /&gt;
  ./scores/ - Contains sample dev (./scores/dev/akws_br-devterms.stdlist.xml, ./scores/dev/dtw_br-devterms.stdlist.xml) and eval scores (./scores/eval/akws_br-evalterms.stdlist.xml, ./scores/eval/dtw_br-evalterms.stdlist.xml)&lt;br /&gt;
  ./sample_results/ - Contains the sample results that should be obtained if you run the scripts following the instructions below&lt;br /&gt;
&lt;br /&gt;
  STEP BY STEP SAMPLE INSTRUCTIONS&lt;br /&gt;
&lt;br /&gt;
  STEP1 - Prepare the dev scores for training the fusion by typing the following command:&lt;br /&gt;
&lt;br /&gt;
  ./bin/PrepareForFusion.sh -q ./scoring_atwv_sws2013/sws2013_dev/sws2013_dev.tlist.xml -r ./scoring_atwv_sws2013/sws2013_dev/sws2013_dev.rttm -o DEV_SCORES_4FUSION.txt -t queries_in_data.ref -g -z qnorm -m 1 -n 1 ./scores/dev/akws_br-devterms.stdlist.xml ./scores/dev/dtw_br-devterms.stdlist.xml&lt;br /&gt;
&lt;br /&gt;
  Before running the script, change the MATLAB_BIN variable to be the path to your Matlab binary.&lt;br /&gt;
&lt;br /&gt;
  This script call will generate 2 output files:&lt;br /&gt;
&lt;br /&gt;
  The first is DEV_SCORES_4FUSION.txt, which contains the scores ready to be used in the following stage. The format is as follows:&lt;br /&gt;
&lt;br /&gt;
  &amp;lt;query_id&amp;gt; &amp;lt;file_id&amp;gt; &amp;lt;start_time&amp;gt; &amp;lt;duration&amp;gt; &amp;lt;score_system1&amp;gt; &amp;lt;score_system2&amp;gt; ... &amp;lt;score_systemN&amp;gt; &amp;lt;groundtruth_label&amp;gt;&lt;br /&gt;
  ...&lt;br /&gt;
  ...&lt;br /&gt;
  &amp;lt;query_id&amp;gt; &amp;lt;file_id&amp;gt; &amp;lt;start_time&amp;gt; &amp;lt;duration&amp;gt; &amp;lt;score_system1&amp;gt; &amp;lt;score_system2&amp;gt; ... &amp;lt;score_systemN&amp;gt; &amp;lt;groundtruth_label&amp;gt;&lt;br /&gt;
&lt;br /&gt;
  Notice that all systems have produced a score for all candidate detections. &lt;br /&gt;
&lt;br /&gt;
  If the -g option is selected (as in this case), the last column contains 0s and 1s for the false and true trials respectively, derived from the rttm. If the -g option is not selected, the last column is simply a column of 1s.&lt;br /&gt;
&lt;br /&gt;
  The second (optional) output file is the queries_in_data.ref. This file simply contains the number of times each query appears in the collection data and it is used later in the fusion stage.&lt;br /&gt;
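  As an illustrative aside (not part of the original package), the prepared score file can be inspected with standard shell tools. The rows below are invented for illustration; only the column layout follows the format described above, with the ground-truth label in the last column:

```shell
# Invented sample in the PrepareForFusion.sh output format:
# query_id file_id start_time duration score_system1 score_system2 label
printf 'q001 f01 1.20 0.45 0.81 0.67 1\nq001 f02 3.10 0.40 0.12 0.09 0\nq002 f01 7.05 0.50 0.55 0.48 1\n' > dev_sample.txt

# Total candidate detections and number of true trials (label 1)
awk 'END { print NR }' dev_sample.txt
awk '$NF == 1 { n++ } END { print n+0 }' dev_sample.txt

# Per-query count of true occurrences, in the spirit of queries_in_data.ref
# (the real file is written by the script itself; its exact format may differ)
awk '$NF == 1 { c[$1]++ } END { for (q in c) print q, c[q] }' dev_sample.txt | sort
```

  This is only a sanity-check sketch; the actual alignment and normalization are done by PrepareForFusion.sh.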
&lt;br /&gt;
  You can see the general usage of this script by running it without arguments:&lt;br /&gt;
  &lt;br /&gt;
    Usage: PrepareForFusion.sh -q &amp;lt;tlistxml&amp;gt; -r &amp;lt;rttm&amp;gt; -o &amp;lt;outputfile&amp;gt; [opts] &amp;lt;stdlistfile1&amp;gt; [stdlistfile2] [stdlistfile3]  ... [stdlistfileN] &lt;br /&gt;
&lt;br /&gt;
     &amp;lt;stdlistfile*&amp;gt; input score stdlist file in the SWS2012 format (*.stdlist.xml)        | - Required argument (at least 1)&lt;br /&gt;
     -q &amp;lt;tlistxml&amp;gt; termlist file in the SWS2012 format (*.tlist.xml)                      | - Required parameter&lt;br /&gt;
     -r &amp;lt;rttm&amp;gt; rttm file in the SWS2012 format (*.rttm)                                   | - Required parameter&lt;br /&gt;
     -o &amp;lt;outputfile&amp;gt; output file name                                                     | - Required parameter&lt;br /&gt;
     -z &amp;lt;value&amp;gt; score z-norm type (none, qnorm, fnorm, qfnorm, fqnorm)                    | - Default: none&lt;br /&gt;
     -g add ground-truth information to the outputfile&lt;br /&gt;
     -t &amp;lt;filename&amp;gt; saves the number of true terms in the reference per query (implies -g)&lt;br /&gt;
     -m &amp;lt;value&amp;gt; apply majority voting fusion with &amp;lt;value&amp;gt; minimum number of votes         | - Default: 1&lt;br /&gt;
     -n &amp;lt;value&amp;gt; method for creating default scores (0: average of the other detections (MV approach), 1: min per query, 2: global min, 3: histogram based)  | - Default: 0&lt;br /&gt;
     -d debug mode, do not remove auxiliary files stored in /tmp/tmpdir.$$&lt;br /&gt;
     -h help &lt;br /&gt;
&lt;br /&gt;
     NOTE: Requires Matlab (or Octave) and Perl. It also depends on the Perl scripts align_score_files.pl, heuristicScoring.pl and create_groundtruth_centerdistance.pl, &lt;br /&gt;
           which should be located in the same folder as the main script&lt;br /&gt;
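  As a hedged sketch only: assuming qnorm means a per-query z-normalization of each system's scores (the actual definition lives inside PrepareForFusion.sh and may differ), the idea can be illustrated with a two-pass awk over invented data:

```shell
# Invented rows: query_id file_id start_time duration raw_score
printf 'q1 f1 0.0 0.5 1.0\nq1 f2 1.0 0.5 3.0\nq2 f1 2.0 0.5 10.0\nq2 f2 3.0 0.5 14.0\n' > raw.txt

# Pass 1 accumulates per-query mean and variance; pass 2 emits z-normalized scores
awk 'NR==FNR { n[$1]++; s[$1]+=$5; ss[$1]+=$5*$5; next }
     { m=s[$1]/n[$1]; v=ss[$1]/n[$1]-m*m; sd=(v>0)?sqrt(v):1;
       printf "%s %s %s %s %.2f\n", $1, $2, $3, $4, ($5-m)/sd }' raw.txt raw.txt
```

  After this step each query's scores have zero mean and unit variance, which is what makes scores from heterogeneous systems comparable before fusion.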
&lt;br /&gt;
  STEP2 - Prepare the eval scores for fusion by typing the following command:&lt;br /&gt;
&lt;br /&gt;
    ./bin/PrepareForFusion.sh -q ./scoring_atwv_sws2013/sws2013_eval/sws2013_eval.tlist.xml -r ./scoring_atwv_sws2013/sws2013_eval/sws2013_eval.rttm -o EVAL_SCORES_4FUSION.txt -z qnorm -m 1 -n 1 ./scores/eval/akws_br-evalterms.stdlist.xml  ./scores/eval/dtw_br-evalterms.stdlist.xml&lt;br /&gt;
&lt;br /&gt;
    Notice that, in contrast to the development scores, we did not use the -g option, since we do not need the ground truth in this case. &lt;br /&gt;
    &lt;br /&gt;
    It is also FUNDAMENTAL to provide the stdlist.xml score files in the same order as used in the preparation of the development scores.&lt;br /&gt;
    &lt;br /&gt;
&lt;br /&gt;
  STEP3 - Train the fusion on the development scores and apply it to the eval scores by typing the following command:&lt;br /&gt;
&lt;br /&gt;
  ./bin/fusion.sh DEV_SCORES_4FUSION.txt EVAL_SCORES_4FUSION.txt  queries_in_data.ref&lt;br /&gt;
&lt;br /&gt;
  Before running this script, you will also need to set the MATLAB_BIN variable, download the Bosaris toolkit, and set the BOSARIS variable to the path that contains this toolkit.&lt;br /&gt;
&lt;br /&gt;
  Notice also that the script contains 4 hard-coded variables that are specific to the Mediaeval SWS2013 evaluation: the cost parameters P_TARGET, C_MISS and C_FA, and the total duration in seconds of the collection data, TSPEECH.&lt;br /&gt;
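  For context only (this is the standard NIST-style Term-Weighted Value definition, not code from this package), these parameters enter the metric as:

```latex
\mathrm{TWV}(\theta) = 1 - P_{\mathrm{miss}}(\theta) - \beta\,P_{\mathrm{FA}}(\theta),
\qquad
\beta = \frac{C_{\mathrm{FA}}\,(1 - P_{\mathrm{target}})}{C_{\mathrm{miss}}\,P_{\mathrm{target}}}
```

  ATWV is the TWV averaged over queries at the chosen operating threshold, while MTWV is its maximum over all thresholds; the number of non-target trials used for the false-alarm probability is derived from the total speech duration, which is why TSPEECH is needed.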
&lt;br /&gt;
  This fusion script uses DEV_SCORES_4FUSION.txt (with ground-truth information) to learn the calibration and fusion parameters, which are applied both to the development set and to the evaluation set. &lt;br /&gt;
&lt;br /&gt;
  As mentioned previously, queries_in_data.ref contains the statistics about the true number of instances of each query in the data; it is used for hypothesizing missing scores and for computing ATWV.&lt;br /&gt;
&lt;br /&gt;
  This call (which can take several minutes) will produce one output file for the dev scores (dev_fusion.scores) and one for the eval scores (eval_fusion.scores). The format of these output files is as follows:&lt;br /&gt;
&lt;br /&gt;
  &amp;lt;query_id&amp;gt; &amp;lt;file_id&amp;gt; &amp;lt;start_time&amp;gt; &amp;lt;duration&amp;gt; &amp;lt;fusion_score&amp;gt; &amp;lt;decision&amp;gt;&lt;br /&gt;
  ...&lt;br /&gt;
  ...&lt;br /&gt;
  &amp;lt;query_id&amp;gt; &amp;lt;file_id&amp;gt; &amp;lt;start_time&amp;gt; &amp;lt;duration&amp;gt; &amp;lt;fusion_score&amp;gt; &amp;lt;decision&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
  The &amp;lt;decision&amp;gt; field is 0 or 1 depending on whether the score is below or above the minimum-cost Bayes optimal threshold (see the Interspeech paper for details).&lt;br /&gt;
&lt;br /&gt;
  Additionally, the fusion parameters are stored in fuse_params.txt.&lt;br /&gt;
&lt;br /&gt;
  The final step consists of converting these result files to the format used in the SWS2013 challenge:&lt;br /&gt;
&lt;br /&gt;
  ./bin/raw2stdslist.sh dev_fusion.scores ./scoring_atwv_sws2013/sws2013_dev/sws2013_dev.tlist.xml &amp;gt; fusion-devterms.stdlist.xml&lt;br /&gt;
  ./bin/raw2stdslist.sh eval_fusion.scores ./scoring_atwv_sws2013/sws2013_eval/sws2013_eval.tlist.xml &amp;gt; fusion-evalterms.stdlist.xml&lt;br /&gt;
  &lt;br /&gt;
  Notice that this script can take a third parameter: a threshold value that you can use to apply a decision threshold different from the optimal Bayes one.&lt;br /&gt;
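  As a hedged sketch of what an alternative threshold amounts to (the file name and rows below are invented; raw2stdslist.sh handles this internally):

```shell
# Invented rows in the fusion output format:
# query_id file_id start_time duration fusion_score decision
printf 'q001 f01 1.20 0.45 2.31 1\nq002 f01 7.05 0.50 -0.52 0\n' > fused_sample.txt

# Re-derive the decision column with a custom threshold in place of the Bayes one
awk -v thr=0.0 '{ d = (($5 + 0) > thr) ? 1 : 0; print $1, $2, $3, $4, $5, d }' fused_sample.txt
```

  Raising the threshold trades false alarms for misses; the right operating point depends on the cost parameters discussed above.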
&lt;br /&gt;
  Reference TWV results in Mediaeval SWS2013 task:&lt;br /&gt;
&lt;br /&gt;
			      dev		   eval		&lt;br /&gt;
			mtwv	   atwv		mtwv	atwv&lt;br /&gt;
	  akws_br	0.1571     0.1408 &lt;br /&gt;
	   dtw_br&lt;br /&gt;
	   fusion&lt;/div&gt;</summary>
		<author><name>Alberto</name></author>
	</entry>
	<entry>
		<id>https://www.hlt.inesc-id.pt/wiki/index.php?title=STDfusion&amp;diff=7141</id>
		<title>STDfusion</title>
		<link rel="alternate" type="text/html" href="https://www.hlt.inesc-id.pt/wiki/index.php?title=STDfusion&amp;diff=7141"/>
		<updated>2013-10-16T21:42:43Z</updated>

		<summary type="html">&lt;p&gt;Alberto: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== Introduction ==&lt;br /&gt;
&lt;br /&gt;
In this wiki we make available the necessary code and the instructions to carry out discriminative calibration and fusion of heterogeneous spoken term detection systems. This approach is the result of the collaboration between the L2F and the GTTS groups in the context of their activities, and is described in:&lt;br /&gt;
&lt;br /&gt;
     A. Abad, L. J. Rodriguez Fuentes, M. Penagarikano, A. Varona, M. Diez, and G. Bordel. &lt;br /&gt;
     On the Calibration and Fusion of Heterogeneous Spoken Term Detection Systems. &lt;br /&gt;
     In Interspeech 2013, August 25-29 2013&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The proposed approach has been successfully tested on the query-by-example spoken search task of Mediaeval 2012 and Mediaeval 2013. Sample Mediaeval 2013 system results are provided to test the package.&lt;br /&gt;
&lt;br /&gt;
Unfortunately, we did not have time to consolidate (and clean) the code; what we are making available is essentially the same set of scripts we developed while preparing the experiments reported in the Interspeech paper. Thus, the code is split into pieces written in different languages (bash, perl, matlab, etc.) and has some dependencies. We nevertheless hope it can still be useful for researchers who want to fuse their STD systems.&lt;br /&gt;
&lt;br /&gt;
  The package contains the following files and directories:&lt;br /&gt;
&lt;br /&gt;
  README.txt - This file&lt;br /&gt;
  ./bin/ - This directory contains the different scripts necessary for calibration and fusion:&lt;br /&gt;
  ./bin/PrepareForFusion.sh  - Main script that normalizes, aligns, hypothesizes missing scores and creates the ground truth for a variable number of input systems&lt;br /&gt;
  ./bin/align_score_files.pl - This script is used by ./bin/PrepareForFusion.sh&lt;br /&gt;
  ./bin/create_groundtruth_centerdistance.pl - This script is used by ./bin/PrepareForFusion.sh&lt;br /&gt;
  ./bin/heuristicScoring.pl - This script is used by ./bin/PrepareForFusion.sh &lt;br /&gt;
  ./bin/fusion.sh  - This script takes the output of ./bin/PrepareForFusion.sh to learn the fusion parameters and apply them to the evaluation set&lt;br /&gt;
  ./bin/raw2stdslist.sh - Converts scores from a raw (internal) format to the stdlist xml format of the evaluation&lt;br /&gt;
  ./scoring_atwv_sws2013/ - This directory contains the Mediaeval 2013 official scoring package (only for atwv)&lt;br /&gt;
  ./scores/ - Contains sample dev (./scores/dev/akws_br-devterms.stdlist.xml, ./scores/dev/dtw_br-devterms.stdlist.xml) and eval scores (./scores/eval/akws_br-evalterms.stdlist.xml, ./scores/eval/dtw_br-evalterms.stdlist.xml)&lt;br /&gt;
  ./sample_results/ - Contains the sample results that should be obtained if you run the scripts following the instructions below&lt;br /&gt;
&lt;br /&gt;
  STEP BY STEP SAMPLE INSTRUCTIONS&lt;br /&gt;
&lt;br /&gt;
  STEP1 - Prepare the dev scores for fusion training by typing the following command:&lt;br /&gt;
&lt;br /&gt;
  ./bin/PrepareForFusion.sh -q ./scoring_atwv_sws2013/sws2013_dev/sws2013_dev.tlist.xml -r ./scoring_atwv_sws2013/sws2013_dev/sws2013_dev.rttm -o DEV_SCORES_4FUSION.txt -t queries_in_data.ref -g -z qnorm -m 1 -n 1 ./scores/dev/akws_br-devterms.stdlist.xml ./scores/dev/dtw_br-devterms.stdlist.xml&lt;br /&gt;
&lt;br /&gt;
  Before running the script, change the MATLAB_BIN variable to be the path to your Matlab binary.&lt;br /&gt;
&lt;br /&gt;
  This script call will generate 2 output files:&lt;br /&gt;
&lt;br /&gt;
  The first is DEV_SCORES_4FUSION.txt, which contains the scores ready to be used in the following stage. The format is as follows:&lt;br /&gt;
&lt;br /&gt;
  &amp;lt;query_id&amp;gt; &amp;lt;file_id&amp;gt; &amp;lt;start_time&amp;gt; &amp;lt;duration&amp;gt; &amp;lt;score_system1&amp;gt; &amp;lt;score_system2&amp;gt; ... &amp;lt;score_systemN&amp;gt; &amp;lt;groundtruth_label&amp;gt;&lt;br /&gt;
  ...&lt;br /&gt;
  ...&lt;br /&gt;
  &amp;lt;query_id&amp;gt; &amp;lt;file_id&amp;gt; &amp;lt;start_time&amp;gt; &amp;lt;duration&amp;gt; &amp;lt;score_system1&amp;gt; &amp;lt;score_system2&amp;gt; ... &amp;lt;score_systemN&amp;gt; &amp;lt;groundtruth_label&amp;gt;&lt;br /&gt;
&lt;br /&gt;
  Notice that all systems have produced a score for all candidate detections. &lt;br /&gt;
&lt;br /&gt;
  If the -g option is selected (as in this case), the last column contains 0s and 1s for the false and true trials respectively, derived from the rttm. If the -g option is not selected, the last column is simply a column of 1s.&lt;br /&gt;
&lt;br /&gt;
  The second (optional) output file is the queries_in_data.ref. This file simply contains the number of times each query appears in the collection data and it is used later in the fusion stage.&lt;br /&gt;
&lt;br /&gt;
  You can see the general usage of this script by running it without arguments:&lt;br /&gt;
  &lt;br /&gt;
    Usage: PrepareForFusion.sh -q &amp;lt;tlistxml&amp;gt; -r &amp;lt;rttm&amp;gt; -o &amp;lt;outputfile&amp;gt; [opts] &amp;lt;stdlistfile1&amp;gt; [stdlistfile2] [stdlistfile3]  ... [stdlistfileN] &lt;br /&gt;
&lt;br /&gt;
     &amp;lt;stdlistfile*&amp;gt; input score stdlist file in the SWS2012 format (*.stdlist.xml)        | - Required argument (at least 1)&lt;br /&gt;
     -q &amp;lt;tlistxml&amp;gt; termlist file in the SWS2012 format (*.tlist.xml)                      | - Required parameter&lt;br /&gt;
     -r &amp;lt;rttm&amp;gt; rttm file in the SWS2012 format (*.rttm)                                   | - Required parameter&lt;br /&gt;
     -o &amp;lt;outputfile&amp;gt; output file name                                                     | - Required parameter&lt;br /&gt;
     -z &amp;lt;value&amp;gt; score z-norm type (none, qnorm, fnorm, qfnorm, fqnorm)                    | - Default: none&lt;br /&gt;
     -g add ground-truth information to the outputfile&lt;br /&gt;
     -t &amp;lt;filename&amp;gt; saves the number of true terms in the reference per query (implies -g)&lt;br /&gt;
     -m &amp;lt;value&amp;gt; apply majority voting fusion with &amp;lt;value&amp;gt; minimum number of votes         | - Default: 1&lt;br /&gt;
     -n &amp;lt;value&amp;gt; method for creating default scores (0: average of the other detections (MV approach), 1: min per query, 2: global min, 3: histogram based)  | - Default: 0&lt;br /&gt;
     -d debug mode, do not remove auxiliary files stored in /tmp/tmpdir.$$&lt;br /&gt;
     -h help &lt;br /&gt;
&lt;br /&gt;
     NOTE: Requires Matlab (or Octave) and Perl. It also depends on the Perl scripts align_score_files.pl, heuristicScoring.pl and create_groundtruth_centerdistance.pl, &lt;br /&gt;
           which should be located in the same folder as the main script&lt;br /&gt;
&lt;br /&gt;
  STEP2 - Prepare the eval scores for fusion by typing the following command:&lt;br /&gt;
&lt;br /&gt;
    ./bin/PrepareForFusion.sh -q ./scoring_atwv_sws2013/sws2013_eval/sws2013_eval.tlist.xml -r ./scoring_atwv_sws2013/sws2013_eval/sws2013_eval.rttm -o EVAL_SCORES_4FUSION.txt -z qnorm -m 1 -n 1 ./scores/eval/akws_br-evalterms.stdlist.xml  ./scores/eval/dtw_br-evalterms.stdlist.xml&lt;br /&gt;
&lt;br /&gt;
    Notice that, in contrast to the development scores, we did not use the -g option, since we do not need the ground truth in this case. &lt;br /&gt;
    &lt;br /&gt;
    It is also FUNDAMENTAL to provide the stdlist.xml score files in the same order as used in the preparation of the development scores.&lt;br /&gt;
    &lt;br /&gt;
&lt;br /&gt;
  STEP3 - Train the fusion on the development scores and apply it to the eval scores by typing the following command:&lt;br /&gt;
&lt;br /&gt;
  ./bin/fusion.sh DEV_SCORES_4FUSION.txt EVAL_SCORES_4FUSION.txt  queries_in_data.ref&lt;br /&gt;
&lt;br /&gt;
  Before running this script, you will also need to set the MATLAB_BIN variable, download the Bosaris toolkit, and set the BOSARIS variable to the path that contains this toolkit.&lt;br /&gt;
&lt;br /&gt;
  Notice also that the script contains 4 hard-coded variables that are specific to the Mediaeval SWS2013 evaluation: the cost parameters P_TARGET, C_MISS and C_FA, and the total duration in seconds of the collection data, TSPEECH.&lt;br /&gt;
&lt;br /&gt;
  This fusion script uses DEV_SCORES_4FUSION.txt (with ground-truth information) to learn the calibration and fusion parameters, which are applied both to the development set and to the evaluation set. &lt;br /&gt;
&lt;br /&gt;
  As mentioned previously, queries_in_data.ref contains the statistics about the true number of instances of each query in the data; it is used for hypothesizing missing scores and for computing ATWV.&lt;br /&gt;
&lt;br /&gt;
  This call (which can take several minutes) will produce one output file for the dev scores (dev_fusion.scores) and one for the eval scores (eval_fusion.scores). The format of these output files is as follows:&lt;br /&gt;
&lt;br /&gt;
  &amp;lt;query_id&amp;gt; &amp;lt;file_id&amp;gt; &amp;lt;start_time&amp;gt; &amp;lt;duration&amp;gt; &amp;lt;fusion_score&amp;gt; &amp;lt;decision&amp;gt;&lt;br /&gt;
  ...&lt;br /&gt;
  ...&lt;br /&gt;
  &amp;lt;query_id&amp;gt; &amp;lt;file_id&amp;gt; &amp;lt;start_time&amp;gt; &amp;lt;duration&amp;gt; &amp;lt;fusion_score&amp;gt; &amp;lt;decision&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
  The &amp;lt;decision&amp;gt; field is 0 or 1 depending on whether the score is below or above the minimum-cost Bayes optimal threshold (see the Interspeech paper for details).&lt;br /&gt;
&lt;br /&gt;
  Additionally, the fusion parameters are stored in fuse_params.txt.&lt;br /&gt;
&lt;br /&gt;
  The final step consists of converting these result files to the format used in the SWS2013 challenge:&lt;br /&gt;
&lt;br /&gt;
  ./bin/raw2stdslist.sh dev_fusion.scores ./scoring_atwv_sws2013/sws2013_dev/sws2013_dev.tlist.xml &amp;gt; fusion-devterms.stdlist.xml&lt;br /&gt;
  ./bin/raw2stdslist.sh eval_fusion.scores ./scoring_atwv_sws2013/sws2013_eval/sws2013_eval.tlist.xml &amp;gt; fusion-evalterms.stdlist.xml&lt;br /&gt;
  &lt;br /&gt;
  Notice that this script can take a third parameter: a threshold value that you can use to apply a decision threshold different from the optimal Bayes one.&lt;br /&gt;
&lt;br /&gt;
  Reference TWV results in Mediaeval SWS2013 task:&lt;br /&gt;
&lt;br /&gt;
			      dev		   eval		&lt;br /&gt;
			mtwv	   atwv		mtwv	atwv&lt;br /&gt;
	  akws_br	0.1571     0.1408 &lt;br /&gt;
	   dtw_br&lt;br /&gt;
	   fusion&lt;/div&gt;</summary>
		<author><name>Alberto</name></author>
	</entry>
	<entry>
		<id>https://www.hlt.inesc-id.pt/wiki/index.php?title=STDfusion&amp;diff=7140</id>
		<title>STDfusion</title>
		<link rel="alternate" type="text/html" href="https://www.hlt.inesc-id.pt/wiki/index.php?title=STDfusion&amp;diff=7140"/>
		<updated>2013-10-16T21:37:08Z</updated>

		<summary type="html">&lt;p&gt;Alberto: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;This package contains the code necessary to run discriminative calibration and fusion of spoken term detection systems, resulting from the collaboration between L2F and the GTTS group of the University of the Basque Country, and described in:&lt;br /&gt;
&lt;br /&gt;
 A. Abad, L. J. Rodriguez Fuentes, M. Penagarikano, A. Varona, M. Diez, and G. Bordel. On the Calibration and Fusion of Heterogeneous Spoken Term Detection Systems. In Interspeech 2013, August 25-29 2013&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The proposed approach has been successfully tested on the query-by-example spoken search task of Mediaeval 2012 and Mediaeval 2013. Sample Mediaeval 2013 system results are provided to test the package.&lt;br /&gt;
&lt;br /&gt;
Unfortunately, we did not have time to consolidate (and clean) the code; what we are making available is essentially the same set of scripts we developed while preparing the experiments reported in the Interspeech paper. Thus, the code is split into pieces written in different languages (bash, perl, matlab, etc.) and has some dependencies. We nevertheless hope it can still be useful for researchers who want to fuse their STD systems.&lt;br /&gt;
&lt;br /&gt;
  The package contains the following files and directories:&lt;br /&gt;
&lt;br /&gt;
  README.txt - This file&lt;br /&gt;
  ./bin/ - This directory contains the different scripts necessary for calibration and fusion:&lt;br /&gt;
  ./bin/PrepareForFusion.sh  - Main script that normalizes, aligns, hypothesizes missing scores and creates the ground truth for a variable number of input systems&lt;br /&gt;
  ./bin/align_score_files.pl - This script is used by ./bin/PrepareForFusion.sh&lt;br /&gt;
  ./bin/create_groundtruth_centerdistance.pl - This script is used by ./bin/PrepareForFusion.sh&lt;br /&gt;
  ./bin/heuristicScoring.pl - This script is used by ./bin/PrepareForFusion.sh &lt;br /&gt;
  ./bin/fusion.sh  - This script takes the output of ./bin/PrepareForFusion.sh to learn the fusion parameters and apply them to the evaluation set&lt;br /&gt;
  ./bin/raw2stdslist.sh - Converts scores from a raw (internal) format to the stdlist xml format of the evaluation&lt;br /&gt;
  ./scoring_atwv_sws2013/ - This directory contains the Mediaeval 2013 official scoring package (only for atwv)&lt;br /&gt;
  ./scores/ - Contains sample dev (./scores/dev/akws_br-devterms.stdlist.xml, ./scores/dev/dtw_br-devterms.stdlist.xml) and eval scores (./scores/eval/akws_br-evalterms.stdlist.xml, ./scores/eval/dtw_br-evalterms.stdlist.xml)&lt;br /&gt;
  ./sample_results/ - Contains the sample results that should be obtained if you run the scripts following the instructions below&lt;br /&gt;
&lt;br /&gt;
  STEP BY STEP SAMPLE INSTRUCTIONS&lt;br /&gt;
&lt;br /&gt;
  STEP1 - Prepare the dev scores for fusion training by typing the following command:&lt;br /&gt;
&lt;br /&gt;
  ./bin/PrepareForFusion.sh -q ./scoring_atwv_sws2013/sws2013_dev/sws2013_dev.tlist.xml -r ./scoring_atwv_sws2013/sws2013_dev/sws2013_dev.rttm -o DEV_SCORES_4FUSION.txt -t queries_in_data.ref -g -z qnorm -m 1 -n 1 ./scores/dev/akws_br-devterms.stdlist.xml ./scores/dev/dtw_br-devterms.stdlist.xml&lt;br /&gt;
&lt;br /&gt;
  Before running the script, change the MATLAB_BIN variable to be the path to your Matlab binary.&lt;br /&gt;
&lt;br /&gt;
  This script call will generate 2 output files:&lt;br /&gt;
&lt;br /&gt;
  The first is DEV_SCORES_4FUSION.txt, which contains the scores ready to be used in the following stage. The format is as follows:&lt;br /&gt;
&lt;br /&gt;
  &amp;lt;query_id&amp;gt; &amp;lt;file_id&amp;gt; &amp;lt;start_time&amp;gt; &amp;lt;duration&amp;gt; &amp;lt;score_system1&amp;gt; &amp;lt;score_system2&amp;gt; ... &amp;lt;score_systemN&amp;gt; &amp;lt;groundtruth_label&amp;gt;&lt;br /&gt;
  ...&lt;br /&gt;
  ...&lt;br /&gt;
  &amp;lt;query_id&amp;gt; &amp;lt;file_id&amp;gt; &amp;lt;start_time&amp;gt; &amp;lt;duration&amp;gt; &amp;lt;score_system1&amp;gt; &amp;lt;score_system2&amp;gt; ... &amp;lt;score_systemN&amp;gt; &amp;lt;groundtruth_label&amp;gt;&lt;br /&gt;
&lt;br /&gt;
  Notice that all systems have produced a score for all candidate detections. &lt;br /&gt;
&lt;br /&gt;
  If the -g option is selected (as in this case), the last column contains 0s and 1s for the false and true trials respectively, derived from the rttm. If the -g option is not selected, the last column is simply a column of 1s.&lt;br /&gt;
&lt;br /&gt;
  The second (optional) output file is the queries_in_data.ref. This file simply contains the number of times each query appears in the collection data and it is used later in the fusion stage.&lt;br /&gt;
&lt;br /&gt;
  You can see the general usage of this script by running it without arguments:&lt;br /&gt;
  &lt;br /&gt;
    Usage: PrepareForFusion.sh -q &amp;lt;tlistxml&amp;gt; -r &amp;lt;rttm&amp;gt; -o &amp;lt;outputfile&amp;gt; [opts] &amp;lt;stdlistfile1&amp;gt; [stdlistfile2] [stdlistfile3]  ... [stdlistfileN] &lt;br /&gt;
&lt;br /&gt;
     &amp;lt;stdlistfile*&amp;gt; input score stdlist file in the SWS2012 format (*.stdlist.xml)        | - Required argument (at least 1)&lt;br /&gt;
     -q &amp;lt;tlistxml&amp;gt; termlist file in the SWS2012 format (*.tlist.xml)                      | - Required parameter&lt;br /&gt;
     -r &amp;lt;rttm&amp;gt; rttm file in the SWS2012 format (*.rttm)                                   | - Required parameter&lt;br /&gt;
     -o &amp;lt;outputfile&amp;gt; output file name                                                     | - Required parameter&lt;br /&gt;
     -z &amp;lt;value&amp;gt; score z-norm type (none, qnorm, fnorm, qfnorm, fqnorm)                    | - Default: none&lt;br /&gt;
     -g add ground-truth information to the outputfile&lt;br /&gt;
     -t &amp;lt;filename&amp;gt; saves the number of true terms in the reference per query (implies -g)&lt;br /&gt;
     -m &amp;lt;value&amp;gt; apply majority voting fusion with &amp;lt;value&amp;gt; minimum number of votes         | - Default: 1&lt;br /&gt;
     -n &amp;lt;value&amp;gt; method for creating default scores (0: average of the other detections (MV approach), 1: min per query, 2: global min, 3: histogram based)  | - Default: 0&lt;br /&gt;
     -d debug mode, do not remove auxiliary files stored in /tmp/tmpdir.$$&lt;br /&gt;
     -h help &lt;br /&gt;
&lt;br /&gt;
     NOTE: Requires Matlab (or Octave) and Perl. It also depends on the Perl scripts align_score_files.pl, heuristicScoring.pl and create_groundtruth_centerdistance.pl, &lt;br /&gt;
           which should be located in the same folder as the main script&lt;br /&gt;
&lt;br /&gt;
  STEP2 - Prepare the eval scores for fusion by typing the following command:&lt;br /&gt;
&lt;br /&gt;
    ./bin/PrepareForFusion.sh -q ./scoring_atwv_sws2013/sws2013_eval/sws2013_eval.tlist.xml -r ./scoring_atwv_sws2013/sws2013_eval/sws2013_eval.rttm -o EVAL_SCORES_4FUSION.txt -z qnorm -m 1 -n 1 ./scores/eval/akws_br-evalterms.stdlist.xml  ./scores/eval/dtw_br-evalterms.stdlist.xml&lt;br /&gt;
&lt;br /&gt;
    Notice that, in contrast to the development scores, we did not use the -g option, since we do not need the ground truth in this case. &lt;br /&gt;
    &lt;br /&gt;
    It is also FUNDAMENTAL to provide the stdlist.xml score files in the same order as used in the preparation of the development scores.&lt;br /&gt;
    &lt;br /&gt;
&lt;br /&gt;
  STEP3 - Train the fusion on the development scores and apply it to the eval scores by typing the following command:&lt;br /&gt;
&lt;br /&gt;
  ./bin/fusion.sh DEV_SCORES_4FUSION.txt EVAL_SCORES_4FUSION.txt  queries_in_data.ref&lt;br /&gt;
&lt;br /&gt;
  Before running this script, you will also need to set the MATLAB_BIN variable, download the Bosaris toolkit, and set the BOSARIS variable to the path that contains this toolkit.&lt;br /&gt;
&lt;br /&gt;
  Notice also that the script contains 4 hard-coded variables that are specific to the Mediaeval SWS2013 evaluation: the cost parameters P_TARGET, C_MISS and C_FA, and the total duration in seconds of the collection data, TSPEECH.&lt;br /&gt;
&lt;br /&gt;
  This fusion script uses DEV_SCORES_4FUSION.txt (with ground-truth information) to learn the calibration and fusion parameters, which are applied both to the development set and to the evaluation set. &lt;br /&gt;
&lt;br /&gt;
  As mentioned previously, queries_in_data.ref contains the statistics about the true number of instances of each query in the data; it is used for hypothesizing missing scores and for computing ATWV.&lt;br /&gt;
&lt;br /&gt;
  This call (which can take several minutes) will produce one output file for the dev scores (dev_fusion.scores) and one for the eval scores (eval_fusion.scores). The format of these output files is as follows:&lt;br /&gt;
&lt;br /&gt;
  &amp;lt;query_id&amp;gt; &amp;lt;file_id&amp;gt; &amp;lt;start_time&amp;gt; &amp;lt;duration&amp;gt; &amp;lt;fusion_score&amp;gt; &amp;lt;decision&amp;gt;&lt;br /&gt;
  ...&lt;br /&gt;
  ...&lt;br /&gt;
  &amp;lt;query_id&amp;gt; &amp;lt;file_id&amp;gt; &amp;lt;start_time&amp;gt; &amp;lt;duration&amp;gt; &amp;lt;fusion_score&amp;gt; &amp;lt;decision&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
  The &amp;lt;decision&amp;gt; field is 0 or 1 depending on whether the score is below or above the minimum-cost Bayes optimal threshold (see the Interspeech paper for details).&lt;br /&gt;
&lt;br /&gt;
  Additionally, the fusion parameters are stored in fuse_params.txt.&lt;br /&gt;
&lt;br /&gt;
  The final step consists of converting these result files to the format used in the SWS2013 challenge:&lt;br /&gt;
&lt;br /&gt;
  ./bin/raw2stdslist.sh dev_fusion.scores ./scoring_atwv_sws2013/sws2013_dev/sws2013_dev.tlist.xml &amp;gt; fusion-devterms.stdlist.xml&lt;br /&gt;
  ./bin/raw2stdslist.sh eval_fusion.scores ./scoring_atwv_sws2013/sws2013_eval/sws2013_eval.tlist.xml &amp;gt; fusion-evalterms.stdlist.xml&lt;br /&gt;
  &lt;br /&gt;
  Notice that this script can take a third parameter: a threshold value that you can use to apply a decision threshold different from the optimal Bayes one.&lt;br /&gt;
&lt;br /&gt;
  Reference TWV results in Mediaeval SWS2013 task:&lt;br /&gt;
&lt;br /&gt;
			      dev		   eval		&lt;br /&gt;
			mtwv	   atwv		mtwv	atwv&lt;br /&gt;
	  akws_br	0.1571     0.1408 &lt;br /&gt;
	   dtw_br&lt;br /&gt;
	   fusion&lt;/div&gt;</summary>
		<author><name>Alberto</name></author>
	</entry>
	<entry>
		<id>https://www.hlt.inesc-id.pt/wiki/index.php?title=DTW&amp;diff=7127</id>
		<title>DTW</title>
		<link rel="alternate" type="text/html" href="https://www.hlt.inesc-id.pt/wiki/index.php?title=DTW&amp;diff=7127"/>
		<updated>2013-10-02T12:25:16Z</updated>

		<summary type="html">&lt;p&gt;Alberto: Created page with &amp;quot;This page will contain DTW code&amp;quot;&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;This page will contain DTW code&lt;/div&gt;</summary>
		<author><name>Alberto</name></author>
	</entry>
	<entry>
		<id>https://www.hlt.inesc-id.pt/wiki/index.php?title=STDfusion&amp;diff=7126</id>
		<title>STDfusion</title>
		<link rel="alternate" type="text/html" href="https://www.hlt.inesc-id.pt/wiki/index.php?title=STDfusion&amp;diff=7126"/>
		<updated>2013-10-02T12:24:52Z</updated>

		<summary type="html">&lt;p&gt;Alberto: Created page with &amp;quot;This page will contain the STD fusion tools used for&amp;quot;&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;This page will contain the STD fusion tools used for&lt;/div&gt;</summary>
		<author><name>Alberto</name></author>
	</entry>
	<entry>
		<id>https://www.hlt.inesc-id.pt/wiki/index.php?title=Tools&amp;diff=7125</id>
		<title>Tools</title>
		<link rel="alternate" type="text/html" href="https://www.hlt.inesc-id.pt/wiki/index.php?title=Tools&amp;diff=7125"/>
		<updated>2013-10-02T12:22:57Z</updated>

		<summary type="html">&lt;p&gt;Alberto: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;{{TOCright}}&lt;br /&gt;
{| align=&amp;quot;center&amp;quot; style=&amp;quot;border-style: solid; border-width: 1px; border-color: gray80;&amp;quot;&lt;br /&gt;
|[[Image:tools1.jpg|197px]]&lt;br /&gt;
|[[Image:tools2.jpg|197px]]&lt;br /&gt;
|[[Image:tools3.jpg|197px]]&lt;br /&gt;
|}&lt;br /&gt;
Language processing uses tools such as syntactic and semantic analyzers. In order to perform their tasks, some of these  tools use linguistic information (for instance, dictionaries and grammars), making natural language processing by computers closer to the human process.&lt;br /&gt;
&lt;br /&gt;
We are using natural language processing  tools in many of our applications, namely in dialog management, automatic summarization, information retrieval, question answering, discourse analysis, term and emotion extraction.&lt;br /&gt;
Besides applying these tools to text we are applying them to automatic transcriptions of spoken documents, leading to new challenges. &lt;br /&gt;
&lt;br /&gt;
== Morphology ==&lt;br /&gt;
&lt;br /&gt;
* [[MARv]] - a morphosyntactic disambiguation tool;&lt;br /&gt;
* [[monge]] - a word form generator;&lt;br /&gt;
* [[PAsMo]] - a rule-based morphology processor, tag converter, and sentence splitter;&lt;br /&gt;
* [[RuDriCo]] - a rule-based morphology processor;&lt;br /&gt;
* [[SMorph]] - a morphological analyser;&lt;br /&gt;
* [[XA]] - a morphological analyser similar to ispell and jspell;&lt;br /&gt;
* [[YAH]] - yet another hyphenator;&lt;br /&gt;
* [[VerbForms]] - a verbal form generator.&lt;br /&gt;
&lt;br /&gt;
== Syntax ==&lt;br /&gt;
&lt;br /&gt;
* [[Algas]] - establishes dependency relations between chunks and words;&lt;br /&gt;
* [[ParVO]] - a C++ implementation of Earley's algorithm with attribute unification (as in an attribute grammar);&lt;br /&gt;
* [[SuSAna]] - a surface syntactic analyser and chunker;&lt;br /&gt;
* [[TiraTeimas]] - verifies whether a set of chunks satisfies a set of constraints.&lt;br /&gt;
&lt;br /&gt;
== Syntax/Semantics Interface ==&lt;br /&gt;
&lt;br /&gt;
* [[AsDeCopas]] - applies contextual rules (possibly hierarchically organized) to a graph;&lt;br /&gt;
* [[Ogre]] - transforms a structure in which both chunks and words are connected into a dependency structure.&lt;br /&gt;
&lt;br /&gt;
== Semantics ==&lt;br /&gt;
&lt;br /&gt;
* [[ATA (system)|ATA]] - automatic term extraction (semantics-like processing).&lt;br /&gt;
&lt;br /&gt;
== Discourse Analysis ==&lt;br /&gt;
&lt;br /&gt;
* [[DID]] - a discourse identifier.&lt;br /&gt;
&lt;br /&gt;
== Multi-purpose ==&lt;br /&gt;
&lt;br /&gt;
* [[Galinha]] - a portal for building and running applications.&lt;br /&gt;
* [[Language Resources Database|LRDB]] - a language resources database and access framework.&lt;br /&gt;
* [[fstk|FSTk]] - a finite-state transducer library.&lt;br /&gt;
* [[ShReP]] - a framework for constructing NLP systems.&lt;br /&gt;
&lt;br /&gt;
== Speech Annotation ==&lt;br /&gt;
&lt;br /&gt;
* [[L2F_PhoneAlign]] - a DTW-based phonetic aligner.&lt;br /&gt;
* [[L2F_MuLA]] - a tool to synchronize annotation of speech at various levels of granularity.&lt;br /&gt;
&lt;br /&gt;
== Speech Recognition ==&lt;br /&gt;
&lt;br /&gt;
* [[AUDIMUS]] - automatic large-vocabulary continuous speech recognition for European Portuguese.&lt;br /&gt;
&lt;br /&gt;
== Speech Search ==&lt;br /&gt;
&lt;br /&gt;
* [[DTW]] - a DTW system developed for the MediaEval 2013 Spoken Web Search challenge;&lt;br /&gt;
* [[STDfusion]] - tools for discriminative calibration/fusion of heterogeneous spoken term detection systems.&lt;br /&gt;
&lt;br /&gt;
[[category:Tools]]&lt;/div&gt;</summary>
		<author><name>Alberto</name></author>
	</entry>
	<entry>
		<id>https://www.hlt.inesc-id.pt/wiki/index.php?title=Master_Theses_Themes_at_L2F&amp;diff=7027</id>
		<title>Master Theses Themes at L2F</title>
		<link rel="alternate" type="text/html" href="https://www.hlt.inesc-id.pt/wiki/index.php?title=Master_Theses_Themes_at_L2F&amp;diff=7027"/>
		<updated>2013-05-10T13:26:47Z</updated>

		<summary type="html">&lt;p&gt;Alberto: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;* Assessment of presentation techniques&lt;br /&gt;
&lt;br /&gt;
* My-ROBOT: a robotic dialogue-based learning companion for a maths game&lt;br /&gt;
&lt;br /&gt;
* Automatic text simplification/translation&lt;br /&gt;
&lt;br /&gt;
* An audio search tool for lawyers&lt;br /&gt;
&lt;br /&gt;
* Microphone network selection and calibration approaches in multi-room environments (Student: Miguel Matos, Supervisor: Alberto Abad)&lt;br /&gt;
&lt;br /&gt;
* Mobile alert: combining human mobile motion detection and voice analysis&lt;br /&gt;
&lt;br /&gt;
* Audio archives of yesterday and today&lt;br /&gt;
&lt;br /&gt;
* Query-by-example speaker search in large speech data collections (Student: ???, Supervisor: Alberto Abad)&lt;br /&gt;
&lt;br /&gt;
* Spoken term detection in speech collections using spoken queries (Student: ???, Supervisor: Alberto Abad)&lt;br /&gt;
&lt;br /&gt;
* Towards a universal language recognition system (Student: ???, Supervisor: Alberto Abad)&lt;br /&gt;
&lt;br /&gt;
* Using Shazam-like audio fingerprinting to block advertisements in audio podcasts&lt;br /&gt;
&lt;br /&gt;
Advertisement-blocking applications in internet browsers (e.g. AdBlock for Firefox or Chrome) prevent advertisements on websites from being displayed on screen. Applications like Shazam recognize a song from a short segment by means of audio fingerprinting. The objective of this thesis is to design a program able to block advertisements found in audio available for download on the web (podcasts) in a semi-supervised way. The method should exploit fingerprinting techniques like those of Shazam, together with the fact that ads, unlike the regular content, do not change from podcast to podcast. The work will include:&lt;br /&gt;
&lt;br /&gt;
- Gathering a small set of audio examples containing advertisements from the web (e.g. BBC podcasts) that is representative of the problem to be solved.&lt;br /&gt;
&lt;br /&gt;
- Using fingerprinting algorithms to identify similar segments in the audio podcasts, store fingerprints, and filter out advertisements.&lt;br /&gt;
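The two steps above can be illustrated in miniature (a hypothetical, stdlib-only sketch, not part of the proposal: `fingerprints`, `shared_segments`, the frame size and the quantization step are all invented here, and a real Shazam-style system hashes spectral peak constellations rather than raw samples):

```python
# Toy sketch: hash coarsely quantized frames of two "podcasts", then report
# runs of frames whose fingerprints recur in both -- candidate advertisements.
import hashlib

FRAME = 4  # samples per fingerprint frame (tiny, for illustration only)

def fingerprints(samples):
    """Hash each frame's coarsely quantized samples into a hex digest."""
    prints = []
    for i in range(0, len(samples) - FRAME + 1, FRAME):
        quantized = tuple(s // 8 for s in samples[i:i + FRAME])  # tolerate small variation
        prints.append(hashlib.md5(repr(quantized).encode()).hexdigest())
    return prints

def shared_segments(pod_a, pod_b, min_run=2):
    """Frame indices of pod_a whose fingerprints also occur in pod_b,
    in runs of at least min_run consecutive frames."""
    fa, fb = fingerprints(pod_a), fingerprints(pod_b)
    in_b = set(fb)
    shared, run = [], []
    for idx, h in enumerate(fa):
        if h in in_b:
            run.append(idx)
        else:
            if len(run) >= min_run:
                shared.extend(run)
            run = []
    if len(run) >= min_run:
        shared.extend(run)
    return shared

ad = [100, 104, 99, 102, 55, 60, 58, 61]     # the shared "advertisement"
pod1 = [30, 31, 32, 33] + ad + [9, 8, 7, 6]  # two podcasts with different
pod2 = [5, 5, 5, 5] + ad + [2, 2, 2, 2]      # content around the same ad
print(shared_segments(pod1, pod2))           # → [1, 2]: the ad's frames
```

The `min_run` threshold discards isolated hash collisions between unrelated frames; only runs of consecutive matching frames, as an embedded advertisement would produce, are reported.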
&lt;br /&gt;
&lt;br /&gt;
* Teaching a chatbot to answer credibly based on different types of feedback&lt;br /&gt;
&lt;br /&gt;
* Just.Chat – answering like a human based on movie subtitles&lt;br /&gt;
&lt;br /&gt;
* SERTO – Sobrevivendo a Erros de Reconhecimento, de Teclado e Outros (surviving recognition, typing, and other errors)&lt;br /&gt;
&lt;br /&gt;
* POE: PlatafOrma de Escrita (a writing platform)&lt;br /&gt;
&lt;br /&gt;
* Can you tell me where Scarlett Johansson lives? – development of a question-answering system for Portuguese&lt;br /&gt;
&lt;br /&gt;
* Medicine.Ask: An intelligent search facility for medicine information&lt;br /&gt;
&lt;br /&gt;
* Teach me! A verbal interface for teaching artificial agents&lt;br /&gt;
&lt;br /&gt;
* Application of active learning methods in natural language (Student: Nuno Aniceto, Supervisor: Nuno Mamede)&lt;br /&gt;
&lt;br /&gt;
* A text classifier for teaching Portuguese as a second language (Student: Pedro Curto, Supervisor: Nuno Mamede)&lt;br /&gt;
&lt;br /&gt;
* Spell checking and suffix identification with transducers (Student: Marco Ferreira, Supervisor: Nuno Mamede)&lt;br /&gt;
&lt;br /&gt;
* Semantic disambiguation of nouns (Student: Rita Policarpo, Supervisor: Nuno Mamede)&lt;br /&gt;
&lt;br /&gt;
* Syntactic/semantic disambiguation of verbs (Student: Goncalo Suissas, Supervisor: Nuno Mamede)&lt;/div&gt;</summary>
		<author><name>Alberto</name></author>
	</entry>
	<entry>
		<id>https://www.hlt.inesc-id.pt/wiki/index.php?title=Master_Theses_Themes_at_L2F&amp;diff=7026</id>
		<title>Master Theses Themes at L2F</title>
		<link rel="alternate" type="text/html" href="https://www.hlt.inesc-id.pt/wiki/index.php?title=Master_Theses_Themes_at_L2F&amp;diff=7026"/>
		<updated>2013-05-10T13:26:14Z</updated>

		<summary type="html">&lt;p&gt;Alberto: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;* Assessment of presentation techniques&lt;br /&gt;
&lt;br /&gt;
* My-ROBOT: a robotic dialogue-based learning companion for a maths game&lt;br /&gt;
&lt;br /&gt;
* Automatic text simplification/translation&lt;br /&gt;
&lt;br /&gt;
* An audio search tool for lawyers&lt;br /&gt;
&lt;br /&gt;
* Microphone network selection and calibration approaches in multi-room environments (Student: Miguel Matos, Supervisor: Alberto Abad)&lt;br /&gt;
&lt;br /&gt;
* Mobile alert: combining human mobile motion detection and voice analysis&lt;br /&gt;
&lt;br /&gt;
* Audio archives of yesterday and today&lt;br /&gt;
&lt;br /&gt;
* Query-by-example speaker search in large speech data collections (Student: ???, Supervisor: Alberto Abad)&lt;br /&gt;
&lt;br /&gt;
* Spoken term detection in speech collections using spoken queries (Student: ???, Supervisor: Alberto Abad)&lt;br /&gt;
&lt;br /&gt;
* Towards a universal language recognition system (Student: ???, Supervisor: Alberto Abad)&lt;br /&gt;
&lt;br /&gt;
* Using Shazam-like audio fingerprinting to block advertisements in audio podcasts&lt;br /&gt;
&lt;br /&gt;
Advertisement-blocking applications in internet browsers (e.g. AdBlock for Firefox or Chrome) prevent advertisements on websites from being displayed on screen. Applications like Shazam recognize a song from a short segment by means of audio fingerprinting. The objective of this thesis is to design a program able to block advertisements found in audio available for download on the web (podcasts) in a semi-supervised way. The method should exploit fingerprinting techniques like those of Shazam, together with the fact that ads, unlike the regular content, do not change from podcast to podcast. The work will include:&lt;br /&gt;
&lt;br /&gt;
- Gathering a small set of audio examples containing advertisements from the web (e.g. BBC podcasts) that is representative of the problem to be solved.&lt;br /&gt;
&lt;br /&gt;
- Using fingerprinting algorithms to identify similar segments in the audio podcasts, store fingerprints, and filter out advertisements.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* Teaching a chatbot to answer credibly based on different types of feedback&lt;br /&gt;
&lt;br /&gt;
* Just.Chat – answering like a human based on movie subtitles&lt;br /&gt;
&lt;br /&gt;
* SERTO – Sobrevivendo a Erros de Reconhecimento, de Teclado e Outros (surviving recognition, typing, and other errors)&lt;br /&gt;
&lt;br /&gt;
* POE: PlatafOrma de Escrita (a writing platform)&lt;br /&gt;
&lt;br /&gt;
* Can you tell me where Scarlett Johansson lives? – development of a question-answering system for Portuguese&lt;br /&gt;
&lt;br /&gt;
* Medicine.Ask: An intelligent search facility for medicine information&lt;br /&gt;
&lt;br /&gt;
* Teach me! A verbal interface for teaching artificial agents&lt;br /&gt;
&lt;br /&gt;
* Application of active learning methods in natural language (Student: Nuno Aniceto, Supervisor: Nuno Mamede)&lt;br /&gt;
&lt;br /&gt;
* A text classifier for teaching Portuguese as a second language (Student: Pedro Curto, Supervisor: Nuno Mamede)&lt;br /&gt;
&lt;br /&gt;
* Spell checking and suffix identification with transducers (Student: Marco Ferreira, Supervisor: Nuno Mamede)&lt;br /&gt;
&lt;br /&gt;
* Semantic disambiguation of nouns (Student: Rita Policarpo, Supervisor: Nuno Mamede)&lt;br /&gt;
&lt;br /&gt;
* Syntactic/semantic disambiguation of verbs (Student: Goncalo Suissas, Supervisor: Nuno Mamede)&lt;/div&gt;</summary>
		<author><name>Alberto</name></author>
	</entry>
	<entry>
		<id>https://www.hlt.inesc-id.pt/wiki/index.php?title=People&amp;diff=6945</id>
		<title>People</title>
		<link rel="alternate" type="text/html" href="https://www.hlt.inesc-id.pt/wiki/index.php?title=People&amp;diff=6945"/>
		<updated>2013-02-22T18:44:09Z</updated>

		<summary type="html">&lt;p&gt;Alberto: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;__NOTOC__ &amp;lt;!--[[Image:People1.jpg|thumb|right|350px|Click for larger picture.]]--&amp;gt; &lt;br /&gt;
&amp;lt;!-- == Researchers ==&lt;br /&gt;
&lt;br /&gt;
== Invited Researchers ==&lt;br /&gt;
&lt;br /&gt;
== Associated Researchers ==&lt;br /&gt;
&lt;br /&gt;
== Post-Doc Researchers == &amp;lt; ! - - junior researcher + phd - - &amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Junior Researchers == &amp;lt; ! - - phd students - - &amp;gt;&lt;br /&gt;
--&amp;gt;&lt;br /&gt;
== Researchers  ==&lt;br /&gt;
&lt;br /&gt;
{| width=&amp;quot;100%&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
! style=&amp;quot;font-weight: normal; text-align: left; width: 25%;&amp;quot; | [[Ramon Fernandez Astudillo]]&lt;br /&gt;
! style=&amp;quot;font-weight: normal; text-align: left; width: 25%;&amp;quot; | [[Gracinda Carvalho]] &lt;br /&gt;
| style=&amp;quot;font-weight: normal; text-align: left; width: 25%;&amp;quot; | [[Nuno Mamede]]&lt;br /&gt;
| style=&amp;quot;font-weight: normal; text-align: left; width: 25%;&amp;quot; | [[Thomas Pellegrini]]&lt;br /&gt;
|-&lt;br /&gt;
| [[Anabela Barreiro]]&lt;br /&gt;
| [[Joao Paulo Carvalho]] &lt;br /&gt;
| [[David Martins de Matos]]&lt;br /&gt;
| [[Ricardo Daniel Ribeiro|Ricardo Ribeiro]]&lt;br /&gt;
|-&lt;br /&gt;
| [[Fernando Batista]]&lt;br /&gt;
| [[Luísa Coheur]]&lt;br /&gt;
| [[Hugo Meinedo]]&lt;br /&gt;
| [[António Serralheiro|António J. Serralheiro]]&lt;br /&gt;
|-&lt;br /&gt;
| [[Luís Caldas de Oliveira|Luís C. Oliveira]]&lt;br /&gt;
| [[Alberto Abad Gareta]]&lt;br /&gt;
| [[João Paulo Neto]]&lt;br /&gt;
| [[Isabel Trancoso|Isabel M. Trancoso]] (Coordinator)&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
== PhD Students  ==&lt;br /&gt;
&lt;br /&gt;
{| width=&amp;quot;100%&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
! style=&amp;quot;font-weight: normal; text-align: left; width: 25%;&amp;quot; | [[Gopala Krishna Anumanchipalli]]  &lt;br /&gt;
! style=&amp;quot;font-weight: normal; text-align: left; width: 25%;&amp;quot; | [[José David Lopes]]&lt;br /&gt;
! style=&amp;quot;font-weight: normal; text-align: left; width: 25%;&amp;quot; | [[Helena Moniz]]&lt;br /&gt;
! style=&amp;quot;font-weight: normal; text-align: left; width: 25%;&amp;quot; | [[Hugo Rodrigues]]&lt;br /&gt;
|-&lt;br /&gt;
| [[Rui Correia]]&lt;br /&gt;
| [[Isabel Mascarenhas]]&lt;br /&gt;
| [[Pedro Mota]]&lt;br /&gt;
| [[Paula Cristina Vaz|Paula Cristina Vaz]]&lt;br /&gt;
|-&lt;br /&gt;
| [[Ângela Costa]]&lt;br /&gt;
| [[Luís Marujo]]&lt;br /&gt;
| [[Joana Paulo Pardal]]&lt;br /&gt;
|-&lt;br /&gt;
| [[Luís Garcia]] (external)&lt;br /&gt;
| [[Ana Cristina Mendes]]&lt;br /&gt;
| [[José Portêlo]] &lt;br /&gt;
|-&lt;br /&gt;
| [[Wang Ling]] &lt;br /&gt;
| [[João Miranda]]&lt;br /&gt;
| [[Eugénio Ribeiro]]&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
== Research Associates  ==&lt;br /&gt;
&lt;br /&gt;
{| width=&amp;quot;100%&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
! width=&amp;quot;25%&amp;quot; style=&amp;quot;text-align: left; font-weight: normal;&amp;quot; | [[Vera Cabarrão]]&lt;br /&gt;
! width=&amp;quot;25%&amp;quot; style=&amp;quot;text-align: left; font-weight: normal;&amp;quot; | [[Cláudio Diniz]] &lt;br /&gt;
! width=&amp;quot;25%&amp;quot; style=&amp;quot;text-align: left; font-weight: normal;&amp;quot; | [[Pedro Fialho]]&lt;br /&gt;
! width=&amp;quot;25%&amp;quot; style=&amp;quot;text-align: left; font-weight: normal;&amp;quot; | &lt;br /&gt;
|-&lt;br /&gt;
| [[Sérgio Curto]]&lt;br /&gt;
| [[Jaime Ferreira]]&lt;br /&gt;
| [[Justyna Kosmala]]&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
== Associated Researchers  ==&lt;br /&gt;
&lt;br /&gt;
{| width=&amp;quot;100%&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
! width=&amp;quot;25%&amp;quot; style=&amp;quot;text-align: left; font-weight: normal;&amp;quot; | [[Jorge Baptista]] &lt;br /&gt;
! width=&amp;quot;25%&amp;quot; style=&amp;quot;text-align: left; font-weight: normal;&amp;quot; | [[Diamantino Caseiro]]&lt;br /&gt;
! width=&amp;quot;25%&amp;quot; style=&amp;quot;text-align: left; font-weight: normal;&amp;quot; | [[Ciro Martins]]&lt;br /&gt;
! width=&amp;quot;25%&amp;quot; style=&amp;quot;text-align: left; font-weight: normal;&amp;quot; | &lt;br /&gt;
|-&lt;br /&gt;
| [[Miguel Bugalho]] &lt;br /&gt;
| [[João Graça]]&lt;br /&gt;
| [[Maria do Céu Viana]]&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
== Masters Students  ==&lt;br /&gt;
&lt;br /&gt;
{| width=&amp;quot;100%&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
! width=&amp;quot;25%&amp;quot; style=&amp;quot;text-align: left; font-weight: normal;&amp;quot; | [[Cristiana Amorim]]&lt;br /&gt;
! width=&amp;quot;25%&amp;quot; style=&amp;quot;text-align: left; font-weight: normal;&amp;quot; | [[Pedro Figueirinha]]&lt;br /&gt;
! width=&amp;quot;25%&amp;quot; style=&amp;quot;text-align: left; font-weight: normal;&amp;quot; | [[Sérgio Morais]]&lt;br /&gt;
! width=&amp;quot;25%&amp;quot; style=&amp;quot;text-align: left; font-weight: normal;&amp;quot; | [[Tiago Travanca]]&lt;br /&gt;
|-&lt;br /&gt;
| [[Viviana Cabrita]]&lt;br /&gt;
| [[Alfredo Gomes]]&lt;br /&gt;
| [[Miguel Neto]]&lt;br /&gt;
| [[Pedro Valério]]&lt;br /&gt;
|-&lt;br /&gt;
| [[João Camejo]]&lt;br /&gt;
| [[Tiago Gonçalves]]&lt;br /&gt;
| [[Anna Pompili]] *&lt;br /&gt;
| [[Alexandre Vicente]]&lt;br /&gt;
|-&lt;br /&gt;
| [[Filipe Carapinha]]&lt;br /&gt;
| [[Vahid Keshavarz Hedayati]] *&lt;br /&gt;
| [[Daniel Rosa]]&lt;br /&gt;
| [[Zoran Vitez]]&lt;br /&gt;
|-&lt;br /&gt;
| [[Tagore Dinis]]&lt;br /&gt;
| [[Jorge Jorge]] *&lt;br /&gt;
| [[Ricardo Silva]] *&lt;br /&gt;
|-&lt;br /&gt;
| [[Vanessa Feliciano]]&lt;br /&gt;
| [[Miguel Matos]] *&lt;br /&gt;
| [[João Silvestre Marques]]&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
[*] Scholarship&lt;br /&gt;
&lt;br /&gt;
== Trainees ==&lt;br /&gt;
&lt;br /&gt;
{| width=&amp;quot;100%&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
! width=&amp;quot;25%&amp;quot; style=&amp;quot;text-align: left; font-weight: normal;&amp;quot; | [[Yegor Afanasyev]]&lt;br /&gt;
! width=&amp;quot;25%&amp;quot; style=&amp;quot;text-align: left; font-weight: normal;&amp;quot; | [[Guilherme Ferreira]]&lt;br /&gt;
! width=&amp;quot;25%&amp;quot; style=&amp;quot;text-align: left; font-weight: normal;&amp;quot; | [[Filipe Pires da Silva]]&lt;br /&gt;
! width=&amp;quot;25%&amp;quot; style=&amp;quot;text-align: left; font-weight: normal;&amp;quot; | &lt;br /&gt;
|-&lt;br /&gt;
| &lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
== Administrative Support  ==&lt;br /&gt;
&lt;br /&gt;
{| width=&amp;quot;100%&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
! style=&amp;quot;font-weight: normal; text-align: left; width: 33%;&amp;quot; | [[Teresa Mimoso]] &lt;br /&gt;
! style=&amp;quot;font-weight: normal; text-align: left; width: 33%;&amp;quot; | &lt;br /&gt;
| &lt;br /&gt;
|-&lt;br /&gt;
| &lt;br /&gt;
| &lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
== Former L²F Members  ==&lt;br /&gt;
&lt;br /&gt;
*List of [[Former L²F Members]]&lt;/div&gt;</summary>
		<author><name>Alberto</name></author>
	</entry>
	<entry>
		<id>https://www.hlt.inesc-id.pt/wiki/index.php?title=File:Logo-dirha.jpg&amp;diff=6711</id>
		<title>File:Logo-dirha.jpg</title>
		<link rel="alternate" type="text/html" href="https://www.hlt.inesc-id.pt/wiki/index.php?title=File:Logo-dirha.jpg&amp;diff=6711"/>
		<updated>2012-06-19T11:58:39Z</updated>

		<summary type="html">&lt;p&gt;Alberto: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&lt;/div&gt;</summary>
		<author><name>Alberto</name></author>
	</entry>
	<entry>
		<id>https://www.hlt.inesc-id.pt/wiki/index.php?title=File:Logo-dirha.png&amp;diff=6710</id>
		<title>File:Logo-dirha.png</title>
		<link rel="alternate" type="text/html" href="https://www.hlt.inesc-id.pt/wiki/index.php?title=File:Logo-dirha.png&amp;diff=6710"/>
		<updated>2012-06-19T11:55:46Z</updated>

		<summary type="html">&lt;p&gt;Alberto: uploaded a new version of &amp;amp;quot;File:Logo-dirha.png&amp;amp;quot;&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&lt;/div&gt;</summary>
		<author><name>Alberto</name></author>
	</entry>
	<entry>
		<id>https://www.hlt.inesc-id.pt/wiki/index.php?title=File:Logo-dirha.png&amp;diff=6709</id>
		<title>File:Logo-dirha.png</title>
		<link rel="alternate" type="text/html" href="https://www.hlt.inesc-id.pt/wiki/index.php?title=File:Logo-dirha.png&amp;diff=6709"/>
		<updated>2012-06-19T11:53:45Z</updated>

		<summary type="html">&lt;p&gt;Alberto: uploaded a new version of &amp;amp;quot;File:Logo-dirha.png&amp;amp;quot;&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&lt;/div&gt;</summary>
		<author><name>Alberto</name></author>
	</entry>
	<entry>
		<id>https://www.hlt.inesc-id.pt/wiki/index.php?title=File:Logo-dirha.png&amp;diff=6708</id>
		<title>File:Logo-dirha.png</title>
		<link rel="alternate" type="text/html" href="https://www.hlt.inesc-id.pt/wiki/index.php?title=File:Logo-dirha.png&amp;diff=6708"/>
		<updated>2012-06-19T11:51:13Z</updated>

		<summary type="html">&lt;p&gt;Alberto: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&lt;/div&gt;</summary>
		<author><name>Alberto</name></author>
	</entry>
</feed>