Making a sentence grammatically acceptable is only part of the challenge. Communicating effectively also requires that the person on the receiving end can process and understand what is being said as quickly as possible.
In a new study of 37 languages, three MIT researchers show that most languages move toward "dependency length minimization" (DLM) in practice: speakers prefer word orders that group grammatically dependent words more locally whenever possible. "People want words that are related to each other in a sentence to be close together," said Richard Futrell, a Ph.D. student in the Department of Brain and Cognitive Sciences at MIT and a lead author of the new paper detailing the results. "There is this idea that the distance between grammatically related words in a sentence should be short, as a principle" (http://www.biosciencetechnology.com/news/2015/08/how-language-gives-your-brain-break?et_cid=4716205&et_rid=45505806&location=top).
As Futrell said, "When I'm talking to you, and you're trying to understand what I'm saying, you have to parse it, and figure out which words are related to each other. If there is a large amount of time between one word and another related word, that means you have to hold one of those words in memory, and that can be hard to do."
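The notion of dependency length can be made concrete with a small sketch. This is not the authors' code; it simply illustrates the standard measure: each word in a parsed sentence points to the position of its grammatical "head," and the sentence's dependency length is the sum of the distances between heads and their dependents.

```python
def total_dependency_length(heads):
    """Sum of head-dependent distances in a parsed sentence.

    heads[i] is the 1-based position of the head of word i+1;
    0 marks the sentence's root word, which has no head.
    """
    return sum(abs(dep - head)
               for dep, head in enumerate(heads, start=1)
               if head != 0)

# Hypothetical parse of "John threw out the trash":
# "threw" (word 2) is the root; "John", "out", and "trash" depend on
# "threw"; "the" depends on "trash".
heads = [2, 0, 2, 5, 2]
print(total_dependency_length(heads))  # 1 + 1 + 1 + 3 = 6
```

The longer the distances in this sum, the longer a listener must hold an unresolved word in memory, which is the cost DLM is hypothesized to minimize.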
While the existence of DLM had previously been posited and identified in a couple of languages, this is the largest study of its kind to date. Edward Gibson, a professor of cognitive science and co-author of the paper, said, "It was pretty interesting, because people had really only looked at it in one or two languages. We thought it was probably true [more widely], but that's pretty important to show. ... We're not showing perfect optimization, but [DLM] is a factor that's involved."
According to the authors, "We provide the first large-scale, quantitative, cross-linguistic evidence for a universal syntactic property of languages: that dependency lengths are shorter than chance. Our work supports long-standing ideas that speakers prefer word orders with short dependency lengths and that languages do not enforce word orders with long dependency lengths. Dependency length minimization is well motivated, because it allows for more efficient parsing and generation of natural language. Over the last 20 years, the hypothesis of a pressure to minimize dependency length has been invoked to explain many of the most striking recurring properties of languages. Our broad-coverage findings support those explanations" (http://www.pnas.org/content/early/2015/07/28/1502134112.abstract).
The researchers conducted the study using four large databases of sentences that have been parsed grammatically: one from Charles University in Prague, one from Google, one from the Universal Dependencies Consortium (a new group of computational linguists) and a Chinese-language database from the Linguistic Data Consortium at the University of Pennsylvania. The sentences, derived from published texts, represent everyday language use.
To determine the effect of placing related words closer to each other, the researchers compared the dependency lengths of the sentences to baselines for dependency length in each language. One baseline randomizes the distance between each "head" word in a sentence and the "dependent" words. Because some languages have relatively strict word-order rules, the researchers also used a second baseline that accounted for the effects of those word-order relationships.
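The first, fully randomized baseline can be sketched in simplified form: keep the sentence's dependency tree fixed, shuffle the linear order of the words many times, and compare the observed dependency length to the average over the random orders. This is an illustration only; the study's actual baselines are more careful (the second one, for instance, preserves word-order constraints), and the edge list below is a hypothetical parse.

```python
import random

def tree_dependency_length(order, edges):
    """Dependency length of a sentence under a given linear order.

    order maps word id -> position; edges lists (head, dependent) pairs.
    """
    return sum(abs(order[h] - order[d]) for h, d in edges)

def random_baseline(n_words, edges, samples=1000, seed=0):
    """Mean dependency length over random linear orders of the same tree."""
    rng = random.Random(seed)
    words = list(range(n_words))
    total = 0.0
    for _ in range(samples):
        shuffled = words[:]
        rng.shuffle(shuffled)
        order = {w: pos for pos, w in enumerate(shuffled)}
        total += tree_dependency_length(order, edges)
    return total / samples

# Hypothetical 5-word sentence: word 1 is the root, with dependents
# 0, 2, and 4; word 3 depends on word 4.
edges = [(1, 0), (1, 2), (4, 3), (1, 4)]
observed = tree_dependency_length({i: i for i in range(5)}, edges)
print(observed, random_baseline(5, edges))
```

If the observed length is reliably shorter than the random average across a corpus, the language's word orders are shorter than chance, which is the pattern the study reports in all 37 languages.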
In both cases, the researchers discovered, the DLM tendency exists, to varying degrees, among languages. "Italian appears to be highly optimized for short sentences; German, which has some notoriously indirect sentence constructions, is far less optimized," according to the analysis.
Futrell, Gibson, and Kyle Mahowald, the paper's third author, acknowledge that the study raises larger questions: Does the DLM tendency aid the production of language, its reception, a more strictly cognitive function, or all of the above?
"It could be for the speaker, the listener or both," Gibson said. "It's very difficult to separate those."