Techniques from natural-language processing enable computers to efficiently search video for actions
With the commodification of digital cameras, digital video has become so easy to produce that human beings can have trouble keeping up with it. Among the tools that computer scientists are developing to make the profusion of video more useful are algorithms for activity recognition, which determine what the people on camera are doing and when.
At the Conference on Computer Vision and Pattern Recognition in June, Hamed Pirsiavash, a postdoc at MIT, and his former thesis advisor, Deva Ramanan of the University of California at Irvine, will present a new activity-recognition algorithm that has several advantages over its predecessors.
One is that the algorithm's execution time scales linearly with the size of the video file it's searching. That means that if one file is 10 times the size of another, the new algorithm will take 10 times as long to search it — not 1,000 times as long, as some earlier algorithms would.
Another is that the algorithm is able to make good guesses about partially completed actions, so it can handle streaming video. Partway through an action, it will issue a probability that the action is of the type that it's looking for. It may revise that probability as the video continues, but it doesn't have to wait until the action is complete to assess it.
Finally, the amount of memory the algorithm requires is fixed, regardless of how many frames of video it's already reviewed. That means that, unlike many of its predecessors, it can handle video streams of any length (or files of any size).
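As a rough illustration of how a parser can read video one frame at a time while keeping its memory fixed, consider the minimal Python sketch below. It is not the researchers' implementation: it assumes a much simpler grammar in which subactions occur in one fixed order, and the class name `StreamingActionScorer` and the per-frame log-scores fed to it are hypothetical stand-ins for whatever a real system would compute from the pixels.

```python
import math


class StreamingActionScorer:
    """Illustrative left-to-right scorer over a fixed sequence of subactions.

    Assumption: subactions occur in order 0, 1, ..., k-1, and each frame
    belongs to exactly one subaction. Only k numbers are stored, however
    many frames have been seen.
    """

    def __init__(self, num_subactions):
        self.k = num_subactions
        # best[j]: best log-score of any valid labeling whose latest frame
        # is assigned to subaction j.
        self.best = [-math.inf] * num_subactions
        self.frames_seen = 0

    def update(self, frame_log_scores):
        """Consume one frame; frame_log_scores[j] is a hypothetical
        log-score that this frame depicts subaction j."""
        new_best = [-math.inf] * self.k
        for j in range(self.k):
            if self.frames_seen == 0:
                # The first frame must belong to the first subaction.
                previous = 0.0 if j == 0 else -math.inf
            else:
                stay = self.best[j]
                advance = self.best[j - 1] if j > 0 else -math.inf
                previous = max(stay, advance)
            new_best[j] = previous + frame_log_scores[j]
        self.best = new_best
        self.frames_seen += 1
        # Running estimate: score of the best partial parse so far.
        return max(self.best)

    def complete_score(self):
        """Score of the best parse that has reached the final subaction."""
        return self.best[-1]
```

Each call to `update` does an amount of work that depends only on the number of subactions, so total time grows linearly with the number of frames, and the running score is available before the action has finished.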
The grammar of action
All of these advances stem from adapting a type of algorithm used in natural-language processing, the computer science discipline that seeks techniques for interpreting sentences written in natural language.
"One of the challenging problems they try to solve is, if you have a sentence, you want to basically parse the sentence, saying what is the subject, what is the verb, what is the adverb," Pirsiavash says. "We see an analogy here, which is, if you have a complex action — like making tea or making coffee — that has some subactions, we can basically stitch together these subactions and look at each one as something like verb, adjective, and adverb."
On that analogy, the rules defining relationships between subactions are like rules of grammar. When you make tea, for instance, it doesn't matter whether you first put the teabag in the cup or put the kettle on the stove. But it's essential that you put the kettle on the stove before pouring the water into the cup. Similarly, in a given language, it could be the case that nouns can either precede or follow verbs, but that adjectives must always precede nouns.
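The tea example can be written down as a small set of ordering rules. The Python sketch below is only an illustration of that idea: the function `respects_order`, the rule list `TEA_RULES`, and the subaction names are hypothetical, and a learned grammar in the researchers' system would be richer than a list of "A before B" pairs.

```python
def respects_order(steps, must_precede):
    """Return True if every (earlier, later) rule holds for the observed steps.

    Rules that mention a step not present in `steps` are ignored.
    """
    position = {step: index for index, step in enumerate(steps)}
    return all(
        position[earlier] < position[later]
        for earlier, later in must_precede
        if earlier in position and later in position
    )


# Hypothetical grammar for making tea: the kettle must go on the stove
# before water is poured; the teabag may go in the cup at any point.
TEA_RULES = [("kettle_on_stove", "pour_water")]
```

Under these rules, putting the teabag in the cup first or second is equally acceptable, while any sequence that pours the water before the kettle goes on the stove is rejected.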
For any given action, Pirsiavash and Ramanan's algorithm must thus learn a new "grammar." And the mechanism that it uses is the one that many natural-language-processing systems rely on: machine learning. Pirsiavash and Ramanan feed their algorithm training examples of videos depicting a particular action, and specify the number of subactions that the algorithm should look for. But they don't give it any information about what those subactions are, or what the transitions between them look like.
The rules relating subactions are the key to the algorithm's efficiency. As a video plays, the algorithm constructs a set of hypotheses about which subactions are being depicted where, and it ranks them according to probability. It can't limit itself to a single hypothesis, as each new frame could require it to revise its probabilities. But it can eliminate hypotheses that don't conform to its grammatical rules, which dramatically limits the number of possibilities it has to canvass.
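To get a feel for how much a grammar can shrink the search, the following Python sketch counts frame-by-frame labelings with and without a rule. It assumes a deliberately simple grammar (subactions must appear in a fixed order, each at least once), which is a stand-in chosen for illustration rather than the grammar the researchers learn; the function names are hypothetical.

```python
from itertools import product


def follows_grammar(labels, num_subactions):
    """Assumed rule: start in subaction 0, end in the last subaction,
    and from one frame to the next either stay put or advance by one."""
    if labels[0] != 0 or labels[-1] != num_subactions - 1:
        return False
    return all(b - a in (0, 1) for a, b in zip(labels, labels[1:]))


def count_hypotheses(num_frames, num_subactions):
    """Return (all labelings, labelings that satisfy the assumed grammar)."""
    all_labelings = list(product(range(num_subactions), repeat=num_frames))
    valid = [
        labels for labels in all_labelings
        if follows_grammar(labels, num_subactions)
    ]
    return len(all_labelings), len(valid)
```

For a clip of only six frames and three subactions, `count_hypotheses(6, 3)` finds 729 possible labelings, of which 10 obey the rule. The brute-force enumeration here is just for counting; it is not how an efficient parser would search.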
The researchers tested their algorithm on eight different types of athletic endeavor — such as weightlifting and bowling — with training videos culled from YouTube. They found that, according to metrics standard in the field of computer vision, their algorithm identified new instances of the same activities more accurately than its predecessors.
Pirsiavash is particularly interested in possible medical applications of action detection. The proper execution of physical-therapy exercises, for instance, could have a grammar that's distinct from improper execution; similarly, the return of motor function in patients with neurological damage could be identified by its unique grammar. Action-detection algorithms could also help determine whether, for instance, elderly patients remembered to take their medication — and issue alerts if they didn't.
Abby Abazorius | newswise