AI Acquires Reading Comprehension: From Position to Meaning
The linguistic abilities of contemporary artificial intelligence systems are remarkable. We can now hold genuine dialogues with systems such as ChatGPT, Gemini, and others with a fluency close to that of a human. Yet our understanding of the internal mechanisms that produce such extraordinary results remains limited.
A recent study published in the Journal of Statistical Mechanics: Theory and Experiment (JSTAT) uncovers a piece of this puzzle. It shows that when trained on small amounts of data, neural networks rely primarily on the positions of words within a sentence. Once the system sees enough data, it switches to a strategy based on the meanings of the words. The study finds that this shift happens abruptly once a critical data threshold is crossed, much like a phase transition in physical systems. The results offer valuable insight into how these models work.
Like a child learning to read, a neural network starts out understanding sentences through the arrangement of words: it infers their relationships (subject, verb, object, and so on) from where they sit in the sentence. As training continues, however, a transition occurs, and word meaning becomes the primary source of information.
Inside the Transformer: The Role of Self-Attention
The new work shows that this is precisely what happens in a simplified model of self-attention, the core mechanism of transformer language models such as those we use every day (ChatGPT, Gemini, Claude, etc.). A transformer is a neural network architecture designed to process sequential data such as text, and it underlies many modern language models. Transformers excel at capturing relationships within a sequence and use self-attention to weigh the importance of each word relative to the others.
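As a rough sketch of what self-attention computes (a generic, minimal illustration in Python with NumPy, not the simplified solvable model analysed in the paper), each word is compared with every other word via dot products, and the resulting scores determine how strongly each word attends to the others:

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    """Single-head dot-product self-attention.

    X          : (seq_len, d_model) token representations
    Wq, Wk, Wv : (d_model, d_head) projection matrices
    """
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])   # relevance of word j to word i
    weights = softmax(scores, axis=-1)        # attention weights, each row sums to 1
    return weights @ V                        # each output mixes the values of all words

# Toy usage: 3 "words", 8-dimensional representations, 4-dimensional head
rng = np.random.default_rng(0)
X = rng.normal(size=(3, 8))
Wq, Wk, Wv = (rng.normal(size=(8, 4)) for _ in range(3))
print(self_attention(X, Wq, Wk, Wv).shape)  # (3, 4)
```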
“To assess relationships between words,” explains Hugo Cui, a postdoctoral researcher at Harvard University and first author of the study, “the network can use two strategies, one of which is to exploit the positions of words.” In English, for example, the subject typically precedes the verb, which in turn precedes the object: “Mary eats the apple” is a simple example of this sequence.
“This is the first strategy that spontaneously emerges when the network is trained,” Cui explains. “However, in our study, we observed that if training continues and the network receives enough data, at a certain point — once a threshold is crossed — the strategy abruptly shifts: the network starts relying on meaning instead.”
“When we designed this work, we simply wanted to study which strategies, or mix of strategies, the networks would adopt. But what we found was somewhat surprising: below a certain threshold, the network relied exclusively on position, while above it, only on meaning.”
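To make the two strategies concrete, here is a hypothetical toy contrast (again a simplified illustration, not the authors' model): attention scores can be built from positional encodings alone, or from word embeddings alone. In the first case, which token attends to which is fixed by the slots in the sentence, regardless of the words occupying them; in the second, the pattern follows the words themselves.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

rng = np.random.default_rng(0)
seq_len, d = 3, 8
word_emb = rng.normal(size=(seq_len, d))  # stand-ins for learned word (semantic) embeddings
pos_enc = rng.normal(size=(seq_len, d))   # stand-ins for positional encodings

def attention_pattern(reps):
    """Dot-product attention weights built from the given token representations."""
    return softmax(reps @ reps.T / np.sqrt(d))

positional = attention_pattern(pos_enc)   # depends only on where the tokens sit
semantic = attention_pattern(word_emb)    # depends only on what the tokens are

# Swap the words in positions 0 and 2 (e.g. "the apple eats Mary").
swapped_words = word_emb[[2, 1, 0]]

# The positional pattern is unchanged, because the slots are the same...
print(np.allclose(positional, attention_pattern(pos_enc)))       # True
# ...but the semantic pattern moves with the words to their new slots.
print(np.allclose(semantic, attention_pattern(swapped_words)))   # False
```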
Phase Transitions in AI: Borrowing Ideas from Physics
Cui describes this shift as a phase transition, borrowing a concept from physics. Statistical physics studies systems made up of enormous numbers of particles (such as atoms or molecules) by describing their collective behaviour statistically. Neural networks, the foundation of modern AI systems, are likewise composed of large numbers of “nodes”, or neurons (by analogy with the brain), each connected to the others and performing simple operations. The system’s intelligence emerges from the interaction of these neurons, a phenomenon that can be described with statistical methods.
This is why a sudden change in the network’s behaviour can be described as a phase transition, analogous to water turning from liquid to gas under particular conditions of temperature and pressure.
“Understanding from a theoretical viewpoint that the strategy shift happens in this manner is important,” Cui emphasizes. “Our networks are simplified compared to the complex models people interact with daily, but they can give us hints to begin to understand the conditions that cause a model to stabilize on one strategy or another. This theoretical knowledge could hopefully be used in the future to make the use of neural networks more efficient, and safer.”
The study by Hugo Cui, Freya Behrens, Florent Krzakala, and Lenka Zdeborová, entitled “A Phase Transition between Positional and Semantic Learning in a Solvable Model of Dot-Product Attention,” is published in JSTAT as part of the Machine Learning 2025 special issue and is also included in the proceedings of the NeurIPS 2024 conference.
Original Publication
Journal: Journal of Statistical Mechanics: Theory and Experiment
Method of Research: Data/statistical analysis
Article Title: A Phase Transition between Positional and Semantic Learning in a Solvable Model of Dot-Product Attention
Article Publication Date: 7-Jul-2025

