Zahorian, a professor of electrical and computer engineering, recently received a grant of nearly half a million dollars from the Air Force Office of Scientific Research. The funds will support the two-year development of a multi-language, multi-speaker audio database that will be available for spoken-language processing research. Zahorian and his team plan to gather and annotate recordings of several hundred speakers each in English, Spanish and Mandarin Chinese.
“The challenge,” he said, “is to get speech recognition working better in real-life situations.”
That’s why the samples in the new database will come from publicly available sources, such as YouTube, that capture speech as people actually use it.
Zahorian’s team will annotate each sample, creating a more detailed version of closed captioning that includes time stamps and descriptions of background sounds. Once a human listener has transcribed a recording, automatic speech recognition algorithms will be used to align the recording with the captions. The team will then develop software to verify and correct errors in the time alignment.
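Zahorian’s team has not described how its verification software will work. As a hypothetical illustration only, a first pass could scan the word-level time stamps produced by an aligner for obvious inconsistencies: words that end before they start, words that overlap the previous word, or words that last implausibly long. The `Segment` type, the `check_alignment` and `fix_overlaps` functions, and the two-second duration limit below are all assumptions made for this sketch.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """One aligned word (hypothetical format): times are in seconds."""
    word: str
    start: float
    end: float


def check_alignment(segments, max_word_dur=2.0):
    """Return (index, problem) pairs for suspicious segments.

    max_word_dur is an illustrative threshold, not a published value.
    """
    problems = []
    for i, seg in enumerate(segments):
        if seg.end <= seg.start:
            problems.append((i, "non-positive duration"))
        elif seg.end - seg.start > max_word_dur:
            problems.append((i, "implausibly long"))
        if i > 0 and seg.start < segments[i - 1].end:
            problems.append((i, "overlaps previous"))
    return problems


def fix_overlaps(segments):
    """Return a corrected copy: any start that overlaps the previous word
    is moved forward to the previous word's end time."""
    fixed = []
    for seg in segments:
        start = seg.start
        if fixed and start < fixed[-1].end:
            start = fixed[-1].end
        fixed.append(Segment(seg.word, start, max(start, seg.end)))
    return fixed
```

For three words where "cat" starts at 0.15 s but "the" ends at 0.20 s, `check_alignment` reports `[(1, "overlaps previous")]`, and after `fix_overlaps` the same check returns an empty list. Real correction software would more likely re-run the aligner on the flagged region than simply clamp times.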
“Speech-recognition algorithms begin by mimicking what your ear does,” Zahorian said. “But we want the algorithms to extract just the most useful characteristics of the speech, not all of the possible data. That’s because more detail can actually hurt performance, past a certain point.”
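One common way to keep only the most useful characteristics of a sound is to take a discrete cosine transform (DCT) of a frame’s log spectrum and keep only the first few coefficients. Those coefficients describe the smooth spectral envelope and discard fine detail. This technique is assumed here for illustration and is not necessarily the representation Zahorian uses. The function names and the default of 13 coefficients are illustrative choices.

```python
import math


def dct_ii(values):
    """Unnormalized DCT-II: X[m] = sum_k x[k] * cos(pi * m * (k + 0.5) / N)."""
    n = len(values)
    return [sum(values[k] * math.cos(math.pi * m * (k + 0.5) / n) for k in range(n))
            for m in range(n)]


def compact_features(log_spectrum, num_coeffs=13):
    """Keep only the lowest-order DCT coefficients of a log spectrum.

    Low orders capture the smooth spectral envelope; the discarded higher
    orders mostly carry fine detail such as pitch harmonics.
    """
    if num_coeffs > len(log_spectrum):
        raise ValueError("num_coeffs cannot exceed the spectrum length")
    return dct_ii(log_spectrum)[:num_coeffs]
```

A flat log spectrum, for instance, puts all of its energy in coefficient 0, and the remaining coefficients come out as zero up to rounding. Choosing how many coefficients to keep is exactly the trade-off in the quote: too few lose information, while too many add detail that can hurt performance.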
The field of automatic speech recognition has a long history, dating back to projects at Bell Labs before the computer age. These days, much of the technology relies on algorithms that convert sounds into numbers.
In Zahorian’s research, he represents speech as a picture in a time-frequency plane, much like a spectrogram, and then uses image-processing techniques to extract features from that picture. This approach has led him to focus more on how speech changes over time than on its frequency content.
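The sketch below shows what “speech as a picture” can mean. It assumes a plain Hamming-windowed short-time Fourier transform, which is not Zahorian’s published method. Each row of the resulting two-dimensional array is a moment in time, and each column is a frequency. A second, equally hypothetical step summarizes how each frequency band changes over time by applying a low-order DCT along the time axis. This is one simple way to emphasize temporal rather than spectral detail. The frame length (64 samples), hop size (32 samples) and coefficient count are arbitrary choices.

```python
import cmath
import math


def spectrogram(signal, frame_len=64, hop=32):
    """Log-magnitude spectrogram as a 2-D list (an "image").

    Rows are frames (time); columns are frequency bins 0..frame_len//2.
    Uses a Hamming window and a naive DFT to stay dependency-free.
    """
    window = [0.54 - 0.46 * math.cos(2 * math.pi * n / (frame_len - 1))
              for n in range(frame_len)]
    image = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = [signal[start + n] * window[n] for n in range(frame_len)]
        row = []
        for k in range(frame_len // 2 + 1):
            coeff = sum(frame[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                        for n in range(frame_len))
            row.append(math.log(abs(coeff) + 1e-10))  # small floor avoids log(0)
        image.append(row)
    return image


def temporal_dct(image, num_coeffs=3):
    """Summarize each frequency bin's trajectory over time with low-order
    DCT-II coefficients; returns one list of num_coeffs values per bin."""
    n_frames = len(image)
    features = []
    for k in range(len(image[0])):
        track = [image[t][k] for t in range(n_frames)]
        features.append([
            sum(track[t] * math.cos(math.pi * m * (t + 0.5) / n_frames)
                for t in range(n_frames))
            for m in range(num_coeffs)
        ])
    return features
```

For a steady tone, every frame of the “image” is the same. The peak sits in one frequency column, and the higher-order temporal coefficients are near zero. Speech, by contrast, changes from frame to frame, and those temporal coefficients capture that movement.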
When researchers are ready to test an algorithm, they rely on a common set of databases held by the Linguistic Data Consortium. Zahorian’s unusual image-based approach has given his team some of the best results ever reported for automatic speech recognition experiments using two of the consortium’s best-known databases.
The database Zahorian’s team builds with the new funding will join these databases, giving researchers around the world a new way to test their theories against samples of real-life speech.
Some mistakes are inevitable, given the variations in pitch, tone and pronunciation from person to person. Still, the field does have a clear standard, Zahorian said: “In order to be useful, a system should have a word-error rate of no more than 10 percent.”
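Word-error rate is conventionally computed by aligning the recognizer’s output with a reference transcript. The errors are counted as substitutions (S), deletions (D) and insertions (I) and divided by the number of words in the reference (N): WER = (S + D + I) / N. The following is a minimal sketch using a word-level edit distance. The function name is our own, and splitting on whitespace is a simplifying assumption.

```python
def word_error_rate(reference, hypothesis):
    """(substitutions + deletions + insertions) / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")
    # dist[i][j] = fewest edits turning the first i reference words
    # into the first j hypothesis words
    dist = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dist[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        dist[0][j] = j  # insert all j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            substitution = dist[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            deletion = dist[i - 1][j] + 1
            insertion = dist[i][j - 1] + 1
            dist[i][j] = min(substitution, deletion, insertion)
    return dist[len(ref)][len(hyp)] / len(ref)
```

A 10-word reference with one substituted word yields 0.1, exactly the 10 percent limit Zahorian cites. Because insertions count as errors, the rate can exceed 100 percent when a recognizer produces many spurious words.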
Zahorian is interested in language modeling (if someone has said these three words, what is the fourth word likely to be?) as well as conversation modeling, that is, predicting when speakers will take turns. He is also intrigued by the potential for advances that borrow established methods from other fields, including the neural networks developed by researchers working in artificial intelligence.
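The example of predicting a fourth word from three describes what is usually called a 4-gram language model. The following is a minimal counting sketch. It assumes whitespace-tokenized text and applies no smoothing, whereas practical systems smooth these counts or replace them with neural networks. `train_fourgram` and `predict_next` are hypothetical names.

```python
from collections import Counter, defaultdict


def train_fourgram(tokens):
    """Map each three-word context to a Counter of the words that followed it."""
    counts = defaultdict(Counter)
    for i in range(len(tokens) - 3):
        context = tuple(tokens[i:i + 3])
        counts[context][tokens[i + 3]] += 1
    return counts


def predict_next(counts, w1, w2, w3):
    """Return (most likely fourth word, its estimated probability),
    or None if the three-word context never appeared in training."""
    followers = counts.get((w1, w2, w3))
    if not followers:
        return None
    word, n = followers.most_common(1)[0]
    return word, n / sum(followers.values())
```

Trained on the tokens of “i want to go home i want to go out i want to go home”, `predict_next(counts, "want", "to", "go")` returns `("home", 2/3)`, because “home” followed that context twice and “out” once.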
He sees a future in which automatic speech recognition will enable technology to extract the meaning of speech as well as the words.
“The dream,” Zahorian said, “is that someday travelers will be able to speak into a little gadget that will translate what they’ve said into another language instantly and accurately.”
For more Binghamton University research news, visit http://discovere.binghamton.edu/
Gail Glover | Newswise Science News