Summarizing Speech
Speaker: Gerald Penn, University of Toronto
Speech is arguably the most basic, most natural form of communication
that we engage in, so it should come as no surprise that there has
been a consistent pressure to deliver spoken audio content on web
pages that, in principle, can be searched through. Even once the
search problem is solved, however, the low-bandwidth, non-visual,
traditional delivery of spoken audio makes it much more difficult to
browse through. This makes the automated summarization of
speech particularly attractive: given a number N, prepare a summary of
a spoken "document" that is N seconds long, or N utterances long, or N
percent of the original document's length, and that contains its most
important or salient content.
This talk will present a (human-prepared) summary of our research on
summarizing speech. We'll talk about how speech summarization is
usually evaluated, including some of the appropriate baselines in this
area, the dependence of the performance and tuning of summarizers on
genre, the role of automated speech transcription in
summarization, and the usefulness of some of the acoustic,
untranscribed features of the speech signal.
Gerald Penn is an Associate Professor of Computer Science at the
University of Toronto. He received his Ph.D. in 2000 from the School
of Computer Science at Carnegie Mellon University. From 1999 to
2001, he was a Member of Technical Staff in the Multimedia Communications
Research Laboratory at Bell Labs in the United States. His other
research interests include mathematical linguistics, parsing in freer
word-order languages, spoken language processing, and programming
languages.