However, a group of Georgia Tech researchers has created a machine-learning model that enables the sites to “learn” dialect and other medical vernacular, thereby improving their performance for users who speak that way themselves.
Called “diaTM” (short for “dialect topic modeling”), the system learns by comparing multiple medical documents written in different levels of technical language. By comparing enough of these documents, diaTM eventually learns which medical conditions, symptoms and procedures are associated with certain dialectal words or phrases, thus shrinking the “language gap” between consumers with health questions and the medical databases they turn to for answers.
“The language gap problem seems to be the most acute in the medical domain,” said Hongyuan Zha, professor in the School of Computational Science & Engineering and a paper co-author. “Providing a solution for this domain will have a high impact on maintaining and improving people’s health.”
To educate diaTM in various modes of medical language, lead author Steven Crain and his fellow researchers pulled publicly available documents not only from WebMD but also from Yahoo! Answers, PubMed Central, the Centers for Disease Control & Prevention website, and other sources. After processing enough documents, Crain said, diaTM can learn that the word “gunk,” for example, is often a vernacular term for “discharge,” and it can then appropriately process user searches that include the word “gunk.”
In this initial study using small-scale experiments, the researchers found that diaTM can achieve a 25 percent improvement in nDCG (“normalized discounted cumulative gain”), a standard information-retrieval measure of how well a search ranks its most relevant results near the top of the list. Zha, whose research focuses on Internet search engines and their related algorithms, said even a 5 percent improvement in nDCG is “very significant.”
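For readers unfamiliar with the metric, the following is a minimal Python sketch of how nDCG is commonly computed. It assumes the widely used form in which each result's relevance grade is divided by the base-2 logarithm of its rank plus one; the article does not say which variant the researchers used, and the relevance grades below are invented purely for illustration.

```python
import math

def dcg(relevances):
    """Discounted cumulative gain for relevance grades given in ranked order."""
    # Rank 1 is divided by log2(2) = 1, so the top result counts in full;
    # lower-ranked results are discounted progressively.
    return sum(rel / math.log2(rank + 1)
               for rank, rel in enumerate(relevances, start=1))

def ndcg(relevances):
    """DCG normalized by the DCG of the best possible ordering (range 0 to 1)."""
    ideal = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / ideal if ideal > 0 else 0.0

# Hypothetical relevance grades (0 = irrelevant, 3 = highly relevant)
# for the same four documents under two different rankings.
baseline_ranking = [0, 2, 1, 3]   # best document buried at the bottom
improved_ranking = [3, 2, 0, 1]   # best document ranked first

if __name__ == "__main__":
    print("baseline nDCG:", round(ndcg(baseline_ranking), 3))
    print("improved nDCG:", round(ndcg(improved_ranking), 3))
```

Because the score is normalized against the ideal ordering, a ranking that places the most relevant documents first scores close to 1, which is why percentage gains in nDCG are a natural way to compare search systems.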
“DiaTM figures out enough language relationships that over time it does quite well,” said Steven Crain, Ph.D. student in computer science and lead author of the paper that describes diaTM. “Another benefit is we’re not doing word-for-word equivalencies, so ‘gunk’ doesn’t necessarily have to be connected to ‘discharge,’ as long as it’s recognized that ‘gunk’ is related to infections.”
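The idea of matching through shared topics rather than word-for-word equivalents can be illustrated with a deliberately simplified Python sketch. This is not diaTM: the actual model learns its topics and dialects from unlabeled documents, whereas this toy assumes a tiny hand-labeled corpus, and every document, topic label and function name in it is hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical toy corpus of (topic label, text) pairs mixing lay and
# technical wording. Real topic models infer topics; here they are given.
DOCS = [
    ("infection", "my eye has yellow gunk and it itches"),
    ("infection", "purulent discharge is a common sign of bacterial infection"),
    ("infection", "gunk in the eye can mean an infection"),
    ("fracture", "a fracture causes pain and swelling in the bone"),
    ("fracture", "i think i busted my arm and the bone hurts"),
]

def word_topic_counts(docs):
    """Count, for each word, how many documents of each topic contain it."""
    counts = defaultdict(Counter)
    for topic, text in docs:
        for word in set(text.split()):
            counts[word][topic] += 1
    return counts

def best_topic(query, counts):
    """Return the topic most associated with the query words, or None."""
    scores = Counter()
    for word in query.split():
        per_topic = counts.get(word)
        if not per_topic:
            continue  # word never seen in the corpus
        total = sum(per_topic.values())
        for topic, n in per_topic.items():
            scores[topic] += n / total  # share of this word's usage per topic
    return scores.most_common(1)[0][0] if scores else None

COUNTS = word_topic_counts(DOCS)

if __name__ == "__main__":
    for query in ("gunk", "discharge", "busted bone"):
        print(query, "->", best_topic(query, COUNTS))
```

In this toy, “gunk” and “discharge” are never linked to each other directly; both simply land on the same topic, so a search phrased in either vocabulary could be routed to the same set of documents.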
Also, diaTM is not limited to medical search; it is a machine-learning technique that would work equally well in any topic-related search. In addition to approaching websites about incorporating diaTM into their search engines, Crain said one next step is to develop the model so that it can learn dialects by looking at patterns that do not make sense from a topical perspective. For example, using a similar algorithm he was able to automatically discover dialects, including a text-speak dialect (e.g., “b4” as a substitute for “before”), but the dialects were mixed in with topically related groups of words.
“We’re trying to get to where you can isolate just the dialects,” Crain said.
“This feature will help common users of medical websites,” Zha said. “It will help enable consumers with a relatively low level of health literacy to access the critical medical information they need.”
DiaTM is described in the paper, “Dialect Topic Modeling for Improved Consumer Medical Search,” to be presented by Crain at the American Medical Informatics Association Annual Symposium, Nov. 17 in Washington, D.C. Crain’s coauthors include Hongyuan Zha, professor in the School of Computational Science & Engineering; Shuang-Hong Yang, a Ph.D. student in Computational Science and Engineering; and Yu Jiao, research scientist at Oak Ridge National Laboratory (ORNL). The research was conducted with partial funding from ORNL, Microsoft and Hewlett-Packard.
Michael Terrazas | Newswise Science News