Forum for Science, Industry and Business

Sponsored by:     3M 
Search our Site:

 

Computer program learns language rules and composes sentences, all without outside help

01.09.2005


Cornell University and Tel Aviv University researchers have developed a method for enabling a computer program to scan text in any of a number of languages, including English and Chinese, and autonomously and without previous information infer the underlying rules of grammar. The rules can then be used to generate new and meaningful sentences. The method also works for such data as sheet music or protein sequences.



The development -- which has a patent pending -- has implications for speech recognition and for other applications in natural language engineering, as well as for genomics and proteomics. It also offers new insights into language acquisition and psycholinguistics.

"The algorithm -- the computational method -- for language learning and processing that we have developed can take a body of text, abstract from it a collection of recurring patterns or rules and then generate new material," explained Shimon Edelman, a computer scientist who is a professor of psychology at Cornell and co-author of a new paper, "Unsupervised Learning of Natural Languages," published in the Proceedings of the National Academy of Sciences (PNAS, Vol. 102, No. 33).


"This is the first time an unsupervised algorithm is shown capable of learning complex syntax, generating grammatical new sentences and proving useful in other fields that call for structure discovery from raw data, such as bioinformatics," he said.

Unlike previous attempts at developing computer algorithms for language learning, the new method, called Automatic Distillation of Structure (ADIOS), successfully identifies complex patterns in raw texts. The algorithm discovers the patterns by repeatedly aligning sentences and looking for overlapping parts.

For example, the sentences I would like to book a first-class flight to Chicago, I want to book a first-class flight to Boston and Book a first-class flight for me, please may give rise to the pattern book a first-class flight -- if this candidate pattern passes the novel statistical significance test that is the core of the algorithm.

If the system also encounters the sentences I need to book a direct flight from New York to Tel Aviv andI would like to book an economy flight , it may infer that the phrases first-class, direct and economy are equivalent in the context of the new pattern. "Because such equivalence sets can contain other patterns -- in turn containing further patterns, and so on -- the resulting body of knowledge grows recursively, as a sort of forest of branching trees of possibilities," said Edelman.

He added, "ADIOS relies on a statistical method for pattern extraction and on structured generalization -- two processes that have been implicated in language acquisition. Our experiments show that it can acquire intricate structures from raw data, including transcripts of parents’ speech directed at 2- or 3-year-olds. This may eventually help researchers understand how children, who learn language in a similar item-by-item fashion and with very little supervision, eventually master the full complexities of their native tongue."

In addition to child-directed language, the algorithm has been tested on the full text of the Bible in several languages, on artificial context-free languages with thousands of rules and on musical notation. It also has been applied to biological data, such as nucleotide base pairs and amino acid sequences. In analyzing proteins, for example, the algorithm was able to extract from amino acid sequences patterns that were highly correlated with the functional properties of the proteins.

The new method was developed jointly with David Horn and Eytan Ruppin, professors of physics and computer science, respectively, at Tel Aviv University, and with Zach Solan, a doctoral student there and the lead author on the paper. Their collaboration with Edelman was supported in part by the U.S.-Israel Binational Science Foundation.

| EurekAlert!
Further information:
http://www.cornell.edu

More articles from Information Technology:

nachricht Robots as Tools and Partners in Rehabilitation
17.08.2018 | Albert-Ludwigs-Universität Freiburg im Breisgau

nachricht Low bandwidth? Use more colors at once
17.08.2018 | Purdue University

All articles from Information Technology >>>

The most recent press releases about innovation >>>

Die letzten 5 Focus-News des innovations-reports im Überblick:

Im Focus: Color effects from transparent 3D-printed nanostructures

New design tool automatically creates nanostructure 3D-print templates for user-given colors
Scientists present work at prestigious SIGGRAPH conference

Most of the objects we see are colored by pigments, but using pigments has disadvantages: such colors can fade, industrial pigments are often toxic, and...

Im Focus: Unraveling the nature of 'whistlers' from space in the lab

A new study sheds light on how ultralow frequency radio waves and plasmas interact

Scientists at the University of California, Los Angeles present new research on a curious cosmic phenomenon known as "whistlers" -- very low frequency packets...

Im Focus: New interactive machine learning tool makes car designs more aerodynamic

Scientists develop first tool to use machine learning methods to compute flow around interactively designable 3D objects. Tool will be presented at this year’s prestigious SIGGRAPH conference.

When engineers or designers want to test the aerodynamic properties of the newly designed shape of a car, airplane, or other object, they would normally model...

Im Focus: Robots as 'pump attendants': TU Graz develops robot-controlled rapid charging system for e-vehicles

Researchers from TU Graz and their industry partners have unveiled a world first: the prototype of a robot-controlled, high-speed combined charging system (CCS) for electric vehicles that enables series charging of cars in various parking positions.

Global demand for electric vehicles is forecast to rise sharply: by 2025, the number of new vehicle registrations is expected to reach 25 million per year....

Im Focus: The “TRiC” to folding actin

Proteins must be folded correctly to fulfill their molecular functions in cells. Molecular assistants called chaperones help proteins exploit their inbuilt folding potential and reach the correct three-dimensional structure. Researchers at the Max Planck Institute of Biochemistry (MPIB) have demonstrated that actin, the most abundant protein in higher developed cells, does not have the inbuilt potential to fold and instead requires special assistance to fold into its active state. The chaperone TRiC uses a previously undescribed mechanism to perform actin folding. The study was recently published in the journal Cell.

Actin is the most abundant protein in highly developed cells and has diverse functions in processes like cell stabilization, cell division and muscle...

All Focus news of the innovation-report >>>

Anzeige

Anzeige

VideoLinks
Industry & Economy
Event News

LaserForum 2018 deals with 3D production of components

17.08.2018 | Event News

Within reach of the Universe

08.08.2018 | Event News

A journey through the history of microscopy – new exhibition opens at the MDC

27.07.2018 | Event News

 
Latest News

Smallest transistor worldwide switches current with a single atom in solid electrolyte

17.08.2018 | Physics and Astronomy

Robots as Tools and Partners in Rehabilitation

17.08.2018 | Information Technology

Climate Impact Research in Hannover: Small Plants against Large Waves

17.08.2018 | Life Sciences

VideoLinks
Science & Research
Overview of more VideoLinks >>>