Researchers Teach Medical Search Engines to Learn Slang

18.11.2010
Medical websites like WebMD give consumers more access than ever before to comprehensive health and medical information, but the sites’ usefulness is limited when users describe conditions in unclear or unorthodox language in a site search.

A group of Georgia Tech researchers has created a machine-learning model that enables the sites to “learn” dialect and other medical vernacular, thereby improving their performance for users who rely on such language.

Called “diaTM” (short for “dialect topic modeling”), the system learns by comparing multiple medical documents written in different levels of technical language. By comparing enough of these documents, diaTM eventually learns which medical conditions, symptoms and procedures are associated with certain dialectal words or phrases, thus shrinking the “language gap” between consumers with health questions and the medical databases they turn to for answers.

“The language gap problem seems to be the most acute in the medical domain,” said Hongyuan Zha, professor in the School of Computational Science & Engineering and a paper co-author. “Providing a solution for this domain will have a high impact on maintaining and improving people’s health.”

To educate diaTM in various modes of medical language, Steven Crain and his fellow researchers pulled publicly available documents not only from WebMD but also from Yahoo! Answers, PubMed Central, the Centers for Disease Control & Prevention website, and other sources. After processing enough documents, Crain said, diaTM can learn that the word “gunk,” for example, is often a vernacular term for “discharge,” and it can then appropriately process user searches that include the word “gunk.”

In this initial study using small-scale experiments, the researchers found that diaTM can achieve a 25 percent improvement in nDCG (“normalized discounted cumulative gain”), a standard measure of how well a search engine places the most relevant results near the top of its results list. Zha, whose research focuses on Internet search engines and their related algorithms, said a 5 percent improvement in nDCG is “very significant.”
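
For readers unfamiliar with the metric, the following is a minimal Python sketch of one common nDCG formulation (linear gain with a log2 rank discount). The article does not say which variant the researchers used, and the relevance grades below are made-up illustrations, not data from the study.

```python
import math

def dcg(relevances):
    """Discounted cumulative gain: each result's relevance grade is
    divided by log2(rank + 1), so lower-ranked results count for less."""
    return sum(rel / math.log2(rank + 1)
               for rank, rel in enumerate(relevances, start=1))

def ndcg(relevances):
    """Normalize DCG by the DCG of the ideal (best possible) ordering,
    giving a score between 0 and 1."""
    ideal = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / ideal if ideal > 0 else 0.0

# Hypothetical relevance grades (0 = irrelevant, 3 = highly relevant),
# listed in the order each search engine returned the results.
baseline = [0, 2, 3, 1]   # best result buried in third place
reranked = [3, 2, 0, 1]   # best result moved to the top

relative_gain = (ndcg(reranked) - ndcg(baseline)) / ndcg(baseline)
print(f"baseline nDCG: {ndcg(baseline):.3f}")
print(f"reranked nDCG: {ndcg(reranked):.3f}")
print(f"relative improvement: {relative_gain:.1%}")
```

A reported "percent improvement in nDCG" is presumably a relative change of this kind, averaged over many queries rather than computed for a single one.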

“DiaTM figures out enough language relationships that over time it does quite well,” said Steven Crain, Ph.D. student in computer science and lead author of the paper that describes diaTM. “Another benefit is we’re not doing word-for-word equivalencies, so ‘gunk’ doesn’t necessarily have to be connected to ‘discharge,’ as long as it’s recognized that ‘gunk’ is related to infections.”
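
Crain's point about avoiding word-for-word equivalencies can be illustrated with a toy Python sketch. It is not diaTM: the topic labels, sample sentences and function names are all hypothetical, and simple co-occurrence counting stands in for the topic model, which would infer topics without being given labels.

```python
from collections import Counter, defaultdict

# Hypothetical toy corpus of (topic label, text) pairs mixing lay and
# technical wording. The labels are an assumption made to keep this short.
DOCS = [
    ("infection", "yellow gunk in my eye and it itches"),
    ("infection", "purulent discharge is a common sign of bacterial conjunctivitis"),
    ("infection", "eye discharge with redness suggests infection"),
    ("fracture", "i think i broke my wrist it hurts to move"),
    ("fracture", "distal radius fracture presents with wrist pain"),
]

def word_topic_counts(docs):
    """Count, for every word, how many documents of each topic contain it."""
    counts = defaultdict(Counter)
    for topic, text in docs:
        for word in set(text.split()):
            counts[word][topic] += 1
    return counts

def likely_topic(word, counts):
    """Return the topic a word co-occurs with most often, or None if unseen."""
    if word not in counts:
        return None
    return counts[word].most_common(1)[0][0]

counts = word_topic_counts(DOCS)

# "gunk" and "discharge" are never mapped to each other directly;
# they meet because both point to the same topic.
for term in ("gunk", "discharge", "wrist"):
    print(term, "->", likely_topic(term, counts))
```

In this sketch a search for "gunk" could be routed to infection-related documents even though no rule ever states that "gunk" means "discharge".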

Also, diaTM is not limited to medical search; it is a machine-learning technique that would work equally well in any topic-related search. In addition to approaching websites about incorporating diaTM into their search engines, Crain said one next step is to develop the model so that it can learn dialects by looking at patterns that do not make sense from a topical perspective. For example, using a similar algorithm he was able to automatically discover dialects, including a text-speak dialect (e.g. “b4” as a substitute for “before”), but the dialects were mixed in with topically related groups of words.

“We’re trying to get to where you can isolate just the dialects,” Crain said.

“This feature will help common users of medical websites,” Zha said. “It will help enable consumers with a relatively low level of health literacy to access the critical medical information they need.”

DiaTM is described in the paper, “Dialect Topic Modeling for Improved Consumer Medical Search,” to be presented by Crain at the American Medical Informatics Association Annual Symposium, Nov. 17 in Washington, D.C. Crain’s coauthors include Hongyuan Zha, professor in the School of Computational Science & Engineering; Shuang-Hong Yang, a Ph.D. student in Computational Science and Engineering; and Yu Jiao, research scientist at Oak Ridge National Laboratory (ORNL). The research was conducted with partial funding from ORNL, Microsoft and Hewlett-Packard.

About the Georgia Tech College of Computing
The Georgia Tech College of Computing is a national leader in the creation of real-world computing breakthroughs that drive social and scientific progress. With its graduate program ranked 10th nationally by U.S. News & World Report, the College’s unconventional approach to education is defining the new face of computing by expanding the horizons of traditional computer science students through interdisciplinary collaboration and a focus on human-centered solutions. For more information about the Georgia Tech College of Computing, its academic divisions and research centers, please visit http://www.cc.gatech.edu.

Michael Terrazas | Newswise Science News
Further information:
http://www.gatech.edu
