Compression helps a computer tell Dante from Machiavelli
New computer program could settle literary debates.
To date, unlike us, computers have struggled to differentiate a page of Jane Austen from one by Jackie Collins. Now researchers in Italy have developed a program that can spot enough subtle differences between two authors’ works to attribute authorship [1].
The program can tell a text by Machiavelli from one by Pirandello, Dante or a host of other great Italian writers. It has also constructed a language tree showing the degree of affinity between 50 different tongues. The tree identifies all the main linguistic groups, such as Romance, Celtic and Slavic, and highlights Maltese (an Afro-Asiatic language) and Basque as anomalies.
Clash of symbols
Dario Benedetto and colleagues at the Università ‘La Sapienza’ in Rome try a different approach. They start from the premise that written language is in the end no more than a string of symbols. It might look rather random, but it is not.
Some groups of characters recur commonly (such as ‘the’ in English), and particular authors favour certain constructions and turns of phrase. These patterns can be measured, rather than judged by subjective impressions or anecdotal comparisons.
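As a rough illustration of how recurring character groups can be counted, here is a minimal Python sketch. The function name `ngram_counts` and the sample sentence are made up for this example; the article does not say that the researchers count character groups this way.

```python
from collections import Counter


def ngram_counts(text: str, n: int = 3) -> Counter:
    """Count every run of n consecutive characters in text (case-insensitive)."""
    lowered = text.lower()
    # Slide a window of width n across the text, one character at a time.
    return Counter(lowered[i:i + n] for i in range(len(lowered) - n + 1))


# Hypothetical sample sentence, invented for this illustration.
sample = "The cat chased the other cat around the garden."
print(ngram_counts(sample).most_common(5))
```

Run on a long text, the most common entries give a crude statistical fingerprint of the language or the writer.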
The team begin from the classic insight of telecommunications engineer Claude Shannon in the 1940s that the information content of a message is related to its entropy. Roughly speaking, entropy measures how little redundancy a message contains: the more redundant the message, the lower its entropy. It can be defined as the length of the smallest program that will produce the original message as its output.
For a random string of characters, this program would simply specify every character, so it would be the same size as the original message. For a string of just A’s, the program could be very concise: ‘repeat A’. Most real messages lie somewhere in between: they can usually be compressed a little without losing significant information. This is the basis of data-compression computer algorithms, used to make ‘zip’ files, for instance.
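The contrast between a random string and a string of A’s can be checked with any general-purpose compressor. The sketch below uses Python’s standard `zlib` module as a stand-in; the helper name `compression_ratio` and the 10,000-byte test strings are choices made for this example only.

```python
import random
import zlib


def compression_ratio(data: bytes) -> float:
    """Compressed size divided by original size (smaller means more redundant)."""
    return len(zlib.compress(data, 9)) / len(data)


n = 10_000
rng = random.Random(0)  # fixed seed so the run is repeatable
random_bytes = bytes(rng.getrandbits(8) for _ in range(n))
repeated = b"A" * n

print("random  :", round(compression_ratio(random_bytes), 3))
print("all A's :", round(compression_ratio(repeated), 3))
```

The random bytes barely shrink at all, while the run of A’s collapses to a tiny fraction of its original size. Ordinary prose would be expected to fall between the two.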
Benedetto and his colleagues borrow the principles of data-compression algorithms to calculate a kind of relative entropy for two different character strings: a measure of how much they differ. This distance between two texts is smaller for two works by the same author than for two works by different authors.
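One simple way to turn a compressor into a measure of difference is to ask how many extra bytes it takes to compress a sample once the compressor has already seen a reference text. The Python sketch below does this with `zlib`. It is an illustration under stated assumptions, not the researchers’ actual procedure: the function names, the toy corpus and the author labels are all hypothetical.

```python
import zlib


def compressed_size(text: str) -> int:
    """Number of bytes zlib needs for the UTF-8 encoding of text."""
    return len(zlib.compress(text.encode("utf-8"), 9))


def extra_cost(reference: str, sample: str) -> int:
    """Extra bytes needed to compress sample after reference has been seen.

    A small value means the sample reuses patterns already in the reference.
    """
    return compressed_size(reference + sample) - compressed_size(reference)


def attribute(sample: str, corpus: dict) -> str:
    """Return the corpus label whose reference text makes sample cheapest."""
    return min(corpus, key=lambda author: extra_cost(corpus[author], sample))


# Toy corpus with invented text and labels, for illustration only.
corpus = {
    "author_a": "the cat sat on the mat and the cat slept on the mat " * 20,
    "author_b": "stocks fell sharply as markets reacted to interest rate rises " * 20,
}
sample = "the cat slept on the mat and the cat sat on the mat"
print(attribute(sample, corpus))
```

Note that `zlib` only looks back about 32 kilobytes, so with long reference texts only the tail of the reference would influence the result. Real attribution work would also need far larger and more varied samples than this toy corpus.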
PHILIP BALL | © Nature News Service