<?xml version="1.0"?>
<feed xmlns="http://www.w3.org/2005/Atom" xml:lang="en">
	<id>https://www.hlt.inesc-id.pt/wiki/index.php?action=history&amp;feed=atom&amp;title=Audio_Pre-Processing</id>
	<title>Audio Pre-Processing - Revision history</title>
	<link rel="self" type="application/atom+xml" href="https://www.hlt.inesc-id.pt/wiki/index.php?action=history&amp;feed=atom&amp;title=Audio_Pre-Processing"/>
	<link rel="alternate" type="text/html" href="https://www.hlt.inesc-id.pt/wiki/index.php?title=Audio_Pre-Processing&amp;action=history"/>
	<updated>2026-05-08T19:12:15Z</updated>
	<subtitle>Revision history for this page on the wiki</subtitle>
	<generator>MediaWiki 1.41.0</generator>
	<entry>
		<id>https://www.hlt.inesc-id.pt/wiki/index.php?title=Audio_Pre-Processing&amp;diff=2822&amp;oldid=prev</id>
		<title>Imt at 18:25, 3 July 2006</title>
		<link rel="alternate" type="text/html" href="https://www.hlt.inesc-id.pt/wiki/index.php?title=Audio_Pre-Processing&amp;diff=2822&amp;oldid=prev"/>
		<updated>2006-07-03T18:25:14Z</updated>

		<summary type="html">&lt;p&gt;&lt;/p&gt;
&lt;p&gt;&lt;b&gt;New page&lt;/b&gt;&lt;/p&gt;&lt;div&gt;Broadcast News media monitoring is an important technology, but it poses a&lt;br /&gt;
number of challenges for speech processing, both in&lt;br /&gt;
terms of computational complexity and transcription accuracy. In this&lt;br /&gt;
kind of application, the speech signal has to be not only transcribed&lt;br /&gt;
but also characterized in terms of acoustic content. Most present-day&lt;br /&gt;
transcription systems perform some kind of audio characterization&lt;br /&gt;
(segmentation and labelling) as a first step in the processing&lt;br /&gt;
chain.&lt;br /&gt;
&lt;br /&gt;
Audio pre-processing offers some practical advantages: no time is&lt;br /&gt;
wasted processing non-speech intervals, very long speech chunks need&lt;br /&gt;
not be processed, and gender- or speaker-dependent acoustic models can&lt;br /&gt;
be selected more easily during recognition. On the other hand,&lt;br /&gt;
indexing errors may cause extra transcription errors if a speaker&lt;br /&gt;
change is hypothesized in the middle of an utterance or, even worse,&lt;br /&gt;
in the middle of a word.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
[[Image:audio_pre_processor.v1.block_diagram.png|center]]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
To accomplish this audio characterization, the first stage in our media&lt;br /&gt;
monitoring system is an audio pre-processor. This pre-processor is&lt;br /&gt;
responsible for:&lt;br /&gt;
&lt;br /&gt;
# Segmentation of the signal into acoustically homogeneous regions&lt;br /&gt;
# Classification of those segments as speech or non-speech, and by background conditions and speaker gender&lt;br /&gt;
# Identification of all segments uttered by the same speaker&lt;br /&gt;
&lt;br /&gt;
The segmentation provides information regarding speaker&lt;br /&gt;
turns and identities, allowing for automatic retrieval of all&lt;br /&gt;
occurrences of a particular speaker. The segmentation can also be used&lt;br /&gt;
to improve recognition performance through adaptation of the speech recognition&lt;br /&gt;
acoustic models. Additionally, the final transcriptions, enriched by the&lt;br /&gt;
pre-processing information, are somewhat more human-readable.&lt;/div&gt;</summary>
		<author><name>Imt</name></author>
	</entry>
</feed>