Researchers Teach Medical Search Engines to Learn Slang

18.11.2010
Medical websites like WebMD give consumers more access than ever before to comprehensive health and medical information, but the sites become less useful when users describe conditions in unclear or unorthodox language in a site search.

However, a group of Georgia Tech researchers has created a machine-learning model that enables the sites to “learn” dialect and other medical vernacular, thereby improving their performance for users who use such language themselves.

Called “diaTM” (short for “dialect topic modeling”), the system learns by comparing multiple medical documents written in different levels of technical language. By comparing enough of these documents, diaTM eventually learns which medical conditions, symptoms and procedures are associated with certain dialectal words or phrases, thus shrinking the “language gap” between consumers with health questions and the medical databases they turn to for answers.

“The language gap problem seems to be the most acute in the medical domain,” said Hongyuan Zha, professor in the School of Computational Science & Engineering and a paper co-author. “Providing a solution for this domain will have a high impact on maintaining and improving people’s health.”

To educate diaTM in various modes of medical language, Steven Crain and his fellow researchers pulled publicly available documents not only from WebMD but also from Yahoo! Answers, PubMed Central, the Centers for Disease Control & Prevention website, and other sources. After processing enough documents, he said, diaTM can learn that the word “gunk,” for example, is often a vernacular term for “discharge,” and it can appropriately process user searches that incorporate the word “gunk.”

In this initial study using small-scale experiments, the researchers found that diaTM can achieve a 25 percent improvement in nDCG (“normalized discounted cumulative gain”), a standard measure of how well a search engine ranks the most relevant results near the top of its list. Zha, whose research focuses on Internet search engines and their related algorithms, said a 5 percent improvement in nDCG is “very significant.”
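For readers unfamiliar with nDCG, the following minimal Python sketch shows the common textbook form of the metric. It is not taken from the paper: the function names and the example relevance grades are assumptions made for illustration, and the study may have used a different variant of the formula.

```python
import math

def dcg(relevances):
    """Discounted cumulative gain: each result's relevance grade is
    divided by log2(rank + 1), so lower-ranked results count less."""
    return sum(rel / math.log2(rank + 1)
               for rank, rel in enumerate(relevances, start=1))

def ndcg(relevances):
    """Normalize DCG by the DCG of the ideal (best possible) ordering,
    giving a score between 0 and 1."""
    ideal = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / ideal if ideal > 0 else 0.0

# Hypothetical relevance grades (0 = irrelevant, 3 = highly relevant),
# listed in the order a search engine returned the results.
poor_ranking = [0, 1, 3, 2]   # best results buried lower in the list
good_ranking = [3, 2, 1, 0]   # best results first

print(ndcg(poor_ranking))
print(ndcg(good_ranking))
```

A ranking that places the most relevant documents first scores 1.0; the further they are pushed down the list, the lower the score.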

“DiaTM figures out enough language relationships that over time it does quite well,” said Steven Crain, Ph.D. student in computer science and lead author of the paper that describes diaTM. “Another benefit is we’re not doing word-for-word equivalencies, so ‘gunk’ doesn’t necessarily have to be connected to ‘discharge,’ as long as it’s recognized that ‘gunk’ is related to infections.”
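The point about tying “gunk” to a topic rather than to one exact synonym can be illustrated with a toy co-occurrence count. This sketch is not diaTM, which is a topic model that learns from unlabeled documents; here the mini-corpus, the topic labels, and the function names are all invented for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical mini-corpus of (topic label, text) pairs, mixing
# consumer and clinical wording. Invented for illustration only.
DOCS = [
    ("infection", "yellow gunk in my eye and it itches"),
    ("infection", "purulent discharge indicates bacterial conjunctivitis"),
    ("infection", "gunk and redness after swimming"),
    ("fracture", "my wrist hurts after a fall and is swollen"),
    ("fracture", "distal radius fracture with swelling"),
]

def topic_scores(docs):
    """Map each word to the share of its documents that fall under each topic."""
    counts = defaultdict(Counter)
    for topic, text in docs:
        for word in set(text.lower().split()):
            counts[word][topic] += 1
    return {
        word: {topic: n / sum(by_topic.values()) for topic, n in by_topic.items()}
        for word, by_topic in counts.items()
    }

SCORES = topic_scores(DOCS)

def best_topic(word):
    """Return the topic most strongly associated with a word."""
    by_topic = SCORES[word]
    return max(by_topic, key=by_topic.get)

# "gunk" and "discharge" never appear in the same document, yet both
# point to the same topic, so a search for one can surface the other.
print(best_topic("gunk"), best_topic("discharge"))
```

In this toy setting the two words are linked only through their shared topic, which is the kind of indirect relationship Crain describes.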

Also, diaTM is not limited to medical search; it is a machine-learning technique that would work equally well in any topic-related search. In addition to approaching websites about incorporating diaTM into their search engines, Crain said one next step is to develop the model so that it can learn dialects by looking at patterns that do not make sense from a topical perspective. For example, using a similar algorithm he was able to automatically discover dialects, including a text-speak dialect (e.g. “b4” as a substitute for “before”), but the dialects were mixed in with topically related groups of words.

“We’re trying to get to where you can isolate just the dialects,” Crain said.

“This feature will help common users of medical websites,” Zha said. “It will help enable consumers with a relatively low level of health literacy to access the critical medical information they need.”

DiaTM is described in the paper, “Dialect Topic Modeling for Improved Consumer Medical Search,” to be presented by Crain at the American Medical Informatics Association Annual Symposium, Nov. 17 in Washington, D.C. Crain’s coauthors include Hongyuan Zha, professor in the School of Computational Science & Engineering; Shuang-Hong Yang, a Ph.D. student in Computational Science and Engineering; and Yu Jiao, research scientist at Oak Ridge National Laboratory (ORNL). The research was conducted with partial funding from ORNL, Microsoft and Hewlett-Packard.

About the Georgia Tech College of Computing
The Georgia Tech College of Computing is a national leader in the creation of real-world computing breakthroughs that drive social and scientific progress. With its graduate program ranked 10th nationally by U.S. News and World Report, the College’s unconventional approach to education is defining the new face of computing by expanding the horizons of traditional computer science students through interdisciplinary collaboration and a focus on human-centered solutions. For more information about the Georgia Tech College of Computing, its academic divisions and research centers, please visit http://www.cc.gatech.edu.

Michael Terrazas | Newswise Science News
Further information:
http://www.gatech.edu
