Forum for Science, Industry and Business

Sponsored by:     3M 
Search our Site:

 

Researchers Teach Medical Search Engines to Learn Slang

18.11.2010
Medical websites like WebMD provide consumers with more access than ever before to comprehensive health and medical information, but the sites’ utility becomes limited if users use unclear or unorthodox language to describe conditions in a site search.

However, a group of Georgia Tech researchers have created a machine-learning model that enables the sites to “learn” dialect and other medical vernacular, thereby improving their performance for users who use such language themselves.

Called “diaTM” (short for “dialect topic modeling”), the system learns by comparing multiple medical documents written in different levels of technical language. By comparing enough of these documents, diaTM eventually learns which medical conditions, symptoms and procedures are associated with certain dialectal words or phrases, thus shrinking the “language gap” between consumers with health questions and the medical databases they turn to for answers.

“The language gap problem seems to be the most acute in the medical domain,” said Hongyuan Zha, professor in the School of Computational Science & Engineering and a paper co-author. “Providing a solution for this domain will have a high impact on maintaining and improving people’s health.”

To educate diaTM in various modes of medical language, Crain and his fellow researchers pulled publicly available documents not only from WebMD but also Yahoo! Answers, PubMed Central, the Centers for Disease Control & Prevention website, and other sources. After processing enough documents, he said, diaTM can learn that the word “gunk,” for example, is often a vernacular term for “discharge,” and it can process user searches that incorporate the word “gunk” appropriately.

In this initial study using small-scale experiments, the researchers found that diaTM can achieve a 25 percent improvement in nDCG (“normalized discounted cumulative gain”), a scientific term that refers to the relevance of information retrieval in a web search. Zha, whose research focuses on Internet search engines and their related algorithms, said a 5 percent improvement in nDCG is “very significant.”

“DiaTM figures out enough language relationships that over time it does quite well,” said Steven Crain, Ph.D. student in computer science and lead author of the paper that describes diaTM. “Another benefit is we’re not doing word-for-word equivalencies, so ‘gunk’ doesn’t necessarily have to be connected to ‘discharge,’ as long as it’s recognized that ‘gunk’ is related to infections.”

Also, diaTM is not limited to medical search; it is a machine-learning technique that would work equally well in any topic-related search. In addition to approaching websites about incorporating diaTM into their search engines, Crain said one next stop is to develop the model so that it can learn dialects by looking at patterns that do not make sense from a topical perspective. For example, using a similar algorithm he was able to automatically discover dialects including text-speak dialect (e.g. “b4” as a subsititue for “before”), but the dialects were mixed in with topically-related groups of words.

“We’re trying to get to where you can isolate just the dialects,” Crain said.

“This feature will help common users of medical websites,” Zha said. “It will help enable consumers with a relatively low level of health literacy to access the critical medical information they need.”

DiaTM is described in the paper, “Dialect Topic Modeling for Improved Consumer Medical Search,” to be presented by Crain at the American Medical Informatics Association Annual Symposium, Nov. 17 in Washington, D.C. Crain’s coauthors include Hongyuan Zha, professor in the School of Computational Science & Engineering; Shuang-Hong Yang, a Ph.D. student in Computational Science and Engineering; and Yu Jiao, research scientist at Oak Ridge National Laboratory (ORNL). The research was conducted with partial funding from ORNL, Microsoft and Hewlett-Packard.

About the Georgia Tech College of Computing
The Georgia Tech College of Computing is a national leader in the creation of real-world computing breakthroughs that drive social and scientific progress. With its graduate program ranked 10th nationally by U.S. News and World Report, the College’s unconventional approach to education is defining the new face of computing by expanding the horizons of traditional computer science students through interdisciplinary collaboration and a focus on human centered solutions. For more information about the Georgia Tech College of Computing, its academic divisions and research centers, please visit http://www.cc.gatech.edu.

Michael Terrazas | Newswise Science News
Further information:
http://www.gatech.edu

More articles from Studies and Analyses:

nachricht Antarctic Ice Sheet mass loss has increased
14.06.2018 | Technische Universität Dresden

nachricht WAKE-UP provides new treatment option for stroke patients | International study led by UKE
17.05.2018 | Universitätsklinikum Hamburg-Eppendorf

All articles from Studies and Analyses >>>

The most recent press releases about innovation >>>

Die letzten 5 Focus-News des innovations-reports im Überblick:

Im Focus: Overdosing on Calcium

Nano crystals impact stem cell fate during bone formation

Scientists from the University of Freiburg and the University of Basel identified a master regulator for bone regeneration. Prasad Shastri, Professor of...

Im Focus: AchemAsia 2019 will take place in Shanghai

Moving into its fourth decade, AchemAsia is setting out for new horizons: The International Expo and Innovation Forum for Sustainable Chemical Production will take place from 21-23 May 2019 in Shanghai, China. With an updated event profile, the eleventh edition focusses on topics that are especially relevant for the Chinese process industry, putting a strong emphasis on sustainability and innovation.

Founded in 1989 as a spin-off of ACHEMA to cater to the needs of China’s then developing industry, AchemAsia has since grown into a platform where the latest...

Im Focus: First real-time test of Li-Fi utilization for the industrial Internet of Things

The BMBF-funded OWICELLS project was successfully completed with a final presentation at the BMW plant in Munich. The presentation demonstrated a Li-Fi communication with a mobile robot, while the robot carried out usual production processes (welding, moving and testing parts) in a 5x5m² production cell. The robust, optical wireless transmission is based on spatial diversity; in other words, data is sent and received simultaneously by several LEDs and several photodiodes. The system can transmit data at more than 100 Mbit/s and five milliseconds latency.

Modern production technologies in the automobile industry must become more flexible in order to fulfil individual customer requirements.

Im Focus: Sharp images with flexible fibers

An international team of scientists has discovered a new way to transfer image information through multimodal fibers with almost no distortion - even if the fiber is bent. The results of the study, to which scientist from the Leibniz-Institute of Photonic Technology Jena (Leibniz IPHT) contributed, were published on 6thJune in the highly-cited journal Physical Review Letters.

Endoscopes allow doctors to see into a patient’s body like through a keyhole. Typically, the images are transmitted via a bundle of several hundreds of optical...

Im Focus: Photoexcited graphene puzzle solved

A boost for graphene-based light detectors

Light detection and control lies at the heart of many modern device applications, such as smartphone cameras. Using graphene as a light-sensitive material for...

All Focus news of the innovation-report >>>

Anzeige

Anzeige

VideoLinks
Industry & Economy
Event News

Munich conference on asteroid detection, tracking and defense

13.06.2018 | Event News

2nd International Baltic Earth Conference in Denmark: “The Baltic Sea region in Transition”

08.06.2018 | Event News

ISEKI_Food 2018: Conference with Holistic View of Food Production

05.06.2018 | Event News

 
Latest News

Carbon nanotube optics provide optical-based quantum cryptography and quantum computing

19.06.2018 | Physics and Astronomy

How to track and trace a protein: Nanosensors monitor intracellular deliveries

19.06.2018 | Life Sciences

New material for splitting water

19.06.2018 | Physics and Astronomy

VideoLinks
Science & Research
Overview of more VideoLinks >>>