Zahorian, a professor of electrical and computer engineering, recently received a grant of nearly half a million dollars from the Air Force Office of Scientific Research. The funds will support the two-year development of a multi-language, multi-speaker audio database that will be available for spoken-language processing research. Zahorian and his team plan to gather and annotate recordings of several hundred speakers each in English, Spanish and Mandarin Chinese.
“The challenge,” he said, “is to get speech recognition working better in real-life situations.”
That’s why the samples in the new database will come from publicly available sources such as YouTube.
Zahorian’s team will annotate each sample, creating a more detailed version of closed captioning, including time stamps and descriptions of background sounds. Once the human listener has finished with the transcription, automatic speech recognition algorithms will be used to align the recording with the captions. Next, software will be developed to verify and correct errors in the time alignment.
“Speech-recognition algorithms begin by mimicking what your ear does,” Zahorian said. “But we want the algorithms to extract just the most useful characteristics of the speech, not all of the possible data. That’s because more detail can actually hurt performance, past a certain point.”
The field of automatic speech recognition has a long history, dating back to projects at Bell Labs before the computer age. These days, much of the technology relies on algorithms that convert sounds into numbers.
In his research, Zahorian represents speech as a picture in a time-frequency plane. He then uses image-processing techniques to extract features of the speech, an approach that has led him to focus more on time than on frequency.
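To make the "picture in a time-frequency plane" idea concrete, here is a minimal sketch of a spectrogram, the standard way to lay out a sound as a grid of time frames by frequency bins. It is not Zahorian's method, only a generic illustration; the function name `spectrogram` and the `frame_len` and `hop` parameters are assumptions chosen for this example, and a plain discrete Fourier transform is used so that nothing beyond the Python standard library is needed.

```python
import cmath
import math


def spectrogram(samples, frame_len=64, hop=32):
    """Return a list of frames; each frame lists one magnitude per frequency bin.

    Hypothetical helper for illustration: rows are time, columns are frequency.
    """
    # Hann window: tapers each frame toward zero at its edges to limit leakage.
    window = [
        0.5 - 0.5 * math.cos(2 * math.pi * n / (frame_len - 1))
        for n in range(frame_len)
    ]
    frames = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frame = [samples[start + n] * window[n] for n in range(frame_len)]
        magnitudes = []
        # Keep only bins 0..frame_len/2, which carry all the information
        # for a real-valued signal.
        for k in range(frame_len // 2 + 1):
            acc = sum(
                frame[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                for n in range(frame_len)
            )
            magnitudes.append(abs(acc))
        frames.append(magnitudes)
    return frames


# A pure tone that completes 8 cycles in every 64 samples.
tone = [math.sin(2 * math.pi * 8 * n / 64) for n in range(256)]
image = spectrogram(tone)
loudest_bin = image[0].index(max(image[0]))
```

In the resulting grid, the tone shows up as a bright stripe in a single frequency column across all time frames. Real speech produces a far richer pattern, which is what makes image-style analysis of such a grid plausible.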
When researchers are ready to test an algorithm, they rely on a common set of databases held by the Linguistic Data Consortium. Zahorian’s unusual image-based approach has given his team some of the best results ever reported for automatic speech recognition experiments using two of the consortium’s best-known databases.
The database Zahorian develops with the new funding will join these others, offering researchers around the world a new way to test their theories with samples of real-life speech.
Some mistakes are inevitable, given the variations in pitch, tone and pronunciation from person to person. Still, the field does have a clear standard, Zahorian said: “In order to be useful, a system should have a word-error rate of no more than 10 percent.”
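Word-error rate is commonly computed as the minimum number of word substitutions, deletions and insertions needed to turn the system's output into the reference transcript, divided by the number of words in the reference. The sketch below assumes that common definition; the function name `word_error_rate` and the simple whitespace tokenization are choices made for this example, not details from Zahorian's work.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # reference word missing from the output
                curr[j - 1] + 1,     # extra word in the output
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


# One wrong word out of ten sits exactly at the 10 percent threshold.
wer = word_error_rate(
    "please call the office before noon on friday next week",
    "please call the office before noon on friday next month",
)
```

Because inserted words count as errors too, the rate can exceed 100 percent when the output is much longer than the reference.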
Zahorian is interested in language modeling – if someone has said these three words, what’s the fourth word likely to be? – as well as conversation modeling – that is, predicting when the speakers will switch. He’s also intrigued by the potential to make advances by using established methods from other fields, including the neural networks developed by researchers working in artificial intelligence.
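The "three words, then guess the fourth" question can be illustrated with a simple counting model. This is only a toy sketch under stated assumptions: the function names `train_four_gram` and `predict_fourth` and the tiny corpus are hypothetical, and practical systems train on far more text and smooth their estimates rather than using raw counts.

```python
from collections import Counter, defaultdict


def train_four_gram(sentences):
    """Count which word follows each run of three consecutive words."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.lower().split()
        for i in range(len(words) - 3):
            context = tuple(words[i:i + 3])
            counts[context][words[i + 3]] += 1
    return counts


def predict_fourth(counts, w1, w2, w3):
    """Return the most frequent follower of (w1, w2, w3), or None if unseen."""
    followers = counts.get((w1, w2, w3))
    if not followers:
        return None
    return followers.most_common(1)[0][0]


# Hypothetical training text, for illustration only.
corpus = [
    "how are you today",
    "how are you doing",
    "how are you today my friend",
]
model = train_four_gram(corpus)
guess = predict_fourth(model, "how", "are", "you")
```

Here "today" follows "how are you" twice and "doing" only once, so the model guesses "today". A recognizer can use such predictions to choose between similar-sounding candidate words.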
He sees a future in which automatic speech recognition will enable technology to extract the meaning of speech as well as the words.
“The dream,” Zahorian said, “is that someday travelers will be able to speak into a little gadget that will translate what they’ve said into another language instantly and accurately.”
For more Binghamton University research news, visit http://discovere.binghamton.edu/
Gail Glover | Newswise Science News