Forum for Science, Industry and Business

Sponsored by:     3M 
Search our Site:

 

Researchers Teach Medical Search Engines to Learn Slang

18.11.2010
Medical websites like WebMD provide consumers with more access than ever before to comprehensive health and medical information, but the sites’ utility becomes limited if users use unclear or unorthodox language to describe conditions in a site search.

However, a group of Georgia Tech researchers have created a machine-learning model that enables the sites to “learn” dialect and other medical vernacular, thereby improving their performance for users who use such language themselves.

Called “diaTM” (short for “dialect topic modeling”), the system learns by comparing multiple medical documents written in different levels of technical language. By comparing enough of these documents, diaTM eventually learns which medical conditions, symptoms and procedures are associated with certain dialectal words or phrases, thus shrinking the “language gap” between consumers with health questions and the medical databases they turn to for answers.

“The language gap problem seems to be the most acute in the medical domain,” said Hongyuan Zha, professor in the School of Computational Science & Engineering and a paper co-author. “Providing a solution for this domain will have a high impact on maintaining and improving people’s health.”

To educate diaTM in various modes of medical language, Crain and his fellow researchers pulled publicly available documents not only from WebMD but also Yahoo! Answers, PubMed Central, the Centers for Disease Control & Prevention website, and other sources. After processing enough documents, he said, diaTM can learn that the word “gunk,” for example, is often a vernacular term for “discharge,” and it can process user searches that incorporate the word “gunk” appropriately.

In this initial study using small-scale experiments, the researchers found that diaTM can achieve a 25 percent improvement in nDCG (“normalized discounted cumulative gain”), a scientific term that refers to the relevance of information retrieval in a web search. Zha, whose research focuses on Internet search engines and their related algorithms, said a 5 percent improvement in nDCG is “very significant.”

“DiaTM figures out enough language relationships that over time it does quite well,” said Steven Crain, Ph.D. student in computer science and lead author of the paper that describes diaTM. “Another benefit is we’re not doing word-for-word equivalencies, so ‘gunk’ doesn’t necessarily have to be connected to ‘discharge,’ as long as it’s recognized that ‘gunk’ is related to infections.”

Also, diaTM is not limited to medical search; it is a machine-learning technique that would work equally well in any topic-related search. In addition to approaching websites about incorporating diaTM into their search engines, Crain said one next stop is to develop the model so that it can learn dialects by looking at patterns that do not make sense from a topical perspective. For example, using a similar algorithm he was able to automatically discover dialects including text-speak dialect (e.g. “b4” as a subsititue for “before”), but the dialects were mixed in with topically-related groups of words.

“We’re trying to get to where you can isolate just the dialects,” Crain said.

“This feature will help common users of medical websites,” Zha said. “It will help enable consumers with a relatively low level of health literacy to access the critical medical information they need.”

DiaTM is described in the paper, “Dialect Topic Modeling for Improved Consumer Medical Search,” to be presented by Crain at the American Medical Informatics Association Annual Symposium, Nov. 17 in Washington, D.C. Crain’s coauthors include Hongyuan Zha, professor in the School of Computational Science & Engineering; Shuang-Hong Yang, a Ph.D. student in Computational Science and Engineering; and Yu Jiao, research scientist at Oak Ridge National Laboratory (ORNL). The research was conducted with partial funding from ORNL, Microsoft and Hewlett-Packard.

About the Georgia Tech College of Computing
The Georgia Tech College of Computing is a national leader in the creation of real-world computing breakthroughs that drive social and scientific progress. With its graduate program ranked 10th nationally by U.S. News and World Report, the College’s unconventional approach to education is defining the new face of computing by expanding the horizons of traditional computer science students through interdisciplinary collaboration and a focus on human centered solutions. For more information about the Georgia Tech College of Computing, its academic divisions and research centers, please visit http://www.cc.gatech.edu.

Michael Terrazas | Newswise Science News
Further information:
http://www.gatech.edu

More articles from Studies and Analyses:

nachricht The personality factor: How to foster the sharing of research data
06.09.2017 | ZBW – Leibniz-Informationszentrum Wirtschaft

nachricht Europe’s Demographic Future. Where the Regions Are Heading after a Decade of Crises
10.08.2017 | Berlin-Institut für Bevölkerung und Entwicklung

All articles from Studies and Analyses >>>

The most recent press releases about innovation >>>

Die letzten 5 Focus-News des innovations-reports im Überblick:

Im Focus: The pyrenoid is a carbon-fixing liquid droplet

Plants and algae use the enzyme Rubisco to fix carbon dioxide, removing it from the atmosphere and converting it into biomass. Algae have figured out a way to increase the efficiency of carbon fixation. They gather most of their Rubisco into a ball-shaped microcompartment called the pyrenoid, which they flood with a high local concentration of carbon dioxide. A team of scientists at Princeton University, the Carnegie Institution for Science, Stanford University and the Max Plank Institute of Biochemistry have unravelled the mysteries of how the pyrenoid is assembled. These insights can help to engineer crops that remove more carbon dioxide from the atmosphere while producing more food.

A warming planet

Im Focus: Highly precise wiring in the Cerebral Cortex

Our brains house extremely complex neuronal circuits, whose detailed structures are still largely unknown. This is especially true for the so-called cerebral cortex of mammals, where among other things vision, thoughts or spatial orientation are being computed. Here the rules by which nerve cells are connected to each other are only partly understood. A team of scientists around Moritz Helmstaedter at the Frankfiurt Max Planck Institute for Brain Research and Helene Schmidt (Humboldt University in Berlin) have now discovered a surprisingly precise nerve cell connectivity pattern in the part of the cerebral cortex that is responsible for orienting the individual animal or human in space.

The researchers report online in Nature (Schmidt et al., 2017. Axonal synapse sorting in medial entorhinal cortex, DOI: 10.1038/nature24005) that synapses in...

Im Focus: Tiny lasers from a gallery of whispers

New technique promises tunable laser devices

Whispering gallery mode (WGM) resonators are used to make tiny micro-lasers, sensors, switches, routers and other devices. These tiny structures rely on a...

Im Focus: Ultrafast snapshots of relaxing electrons in solids

Using ultrafast flashes of laser and x-ray radiation, scientists at the Max Planck Institute of Quantum Optics (Garching, Germany) took snapshots of the briefest electron motion inside a solid material to date. The electron motion lasted only 750 billionths of the billionth of a second before it fainted, setting a new record of human capability to capture ultrafast processes inside solids!

When x-rays shine onto solid materials or large molecules, an electron is pushed away from its original place near the nucleus of the atom, leaving a hole...

Im Focus: Quantum Sensors Decipher Magnetic Ordering in a New Semiconducting Material

For the first time, physicists have successfully imaged spiral magnetic ordering in a multiferroic material. These materials are considered highly promising candidates for future data storage media. The researchers were able to prove their findings using unique quantum sensors that were developed at Basel University and that can analyze electromagnetic fields on the nanometer scale. The results – obtained by scientists from the University of Basel’s Department of Physics, the Swiss Nanoscience Institute, the University of Montpellier and several laboratories from University Paris-Saclay – were recently published in the journal Nature.

Multiferroics are materials that simultaneously react to electric and magnetic fields. These two properties are rarely found together, and their combined...

All Focus news of the innovation-report >>>

Anzeige

Anzeige

Event News

“Lasers in Composites Symposium” in Aachen – from Science to Application

19.09.2017 | Event News

I-ESA 2018 – Call for Papers

12.09.2017 | Event News

EMBO at Basel Life, a new conference on current and emerging life science research

06.09.2017 | Event News

 
Latest News

Rainbow colors reveal cell history: Uncovering β-cell heterogeneity

22.09.2017 | Life Sciences

Penn first in world to treat patient with new radiation technology

22.09.2017 | Medical Engineering

Calculating quietness

22.09.2017 | Physics and Astronomy

VideoLinks
B2B-VideoLinks
More VideoLinks >>>