Lost in translation

Anne-Marie Di Sciullo wants to make herself perfectly understood. Because the Internet is not.

The professor of linguistics at the Université du Québec in Montreal recently received a $2.5 million grant from the Social Sciences and Humanities Research Council of Canada to develop search technology that is tied to the nature of human language but whose structures do not depend on any mother tongue. Di Sciullo will be working with a team of 43 researchers on the project over the next five years.

ITBusiness.ca recently spoke with Di Sciullo about the nature of her project and how it might help the IT industry.

ITBusiness.ca: How exactly are you going to use this grant over the next five years?

Anne-Marie Di Sciullo: I’ve been working on this project for five years already. Now I’m working on a newer project, mostly based on interfaces, but using the results from the previous grant I received. It’s a project that combines formal grammar and linguistic theory and computer science. It also takes what we know about human cognition. The idea is to try to develop software that mimics the human activity when you get information and want to transmit it.

ITB: What exactly did you discover over five years in your initial research?

AD: What we did was to develop a (software) language for natural language processing which could be used for Internet searches. Now, in the next five years, we’re going to fine-tune this language, making it very similar to what humans do when they process natural language. It’s a new kind of technology; it’s a new generation of search engine and information processing system. It’s not based on engineering, with statistical calculations or Boolean relations.

ITB: Have we been really using computers long enough that we can define what those processes are?

AD: Well, it’s not a question of computer science. It’s a question of cognitive linguistics. Engineers, while they don’t care about cognition or even natural language, have formulas and they apply them to any object. For example, text on the Internet, a set of characters. In order to find the information with respect to a query, what they do is they analyze the keywords in the query and match documents that contain a significant number of these words. More often than not, they’re not taking into account the specific relationships of the words and the query. What you get is a result that is not accurate, but approximate.
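Editor's illustration: the keyword-matching approach Di Sciullo describes can be sketched in a few lines of Python. This is a minimal sketch, not code from her project or from any real search engine; the function name `keyword_score` and the sample sentences are hypothetical. It scores each document by how many query words it contains and ignores how those words relate to one another.

```python
def keyword_score(query: str, document: str) -> int:
    """Count how many distinct query words also appear in the document."""
    query_words = set(query.lower().split())
    document_words = set(document.lower().split())
    return len(query_words & document_words)


# Hypothetical sample data, chosen only to illustrate the point.
query = "dog bites man"
documents = ["the dog bites the man", "the man bites the dog", "the cat sleeps"]

# Word order and grammatical roles are ignored, so the first two
# documents are indistinguishable to this scorer.
scores = [keyword_score(query, doc) for doc in documents]
print(scores)
```

The first two documents say opposite things yet receive the same score, which is the kind of approximate result the answer above refers to.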

ITB: What’s your hypothesis?

AD: My own research has led me to show that there’s a basic relationship in language that is asymmetrical. The words are structured in terms of pairs or elements, but the relationship between the elements is not symmetrical. If you have a grammar based on asymmetry, you can do much better (with search engine queries). Boolean operations are symmetrical, for example.
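Editor's illustration: the contrast between symmetrical and asymmetrical relations can be made concrete with a small Python sketch. The helper names `boolean_and_match` and `ordered_pairs` are hypothetical, and adjacent word pairs are used here only as a deliberately naive stand-in for grammatical structure; this is not the grammar used in Di Sciullo's research.

```python
def boolean_and_match(term_a: str, term_b: str, document: str) -> bool:
    """Symmetric: true when both terms occur, regardless of their order."""
    words = set(document.lower().split())
    return term_a in words and term_b in words


def ordered_pairs(text: str) -> set[tuple[str, str]]:
    """Asymmetric: each pair records which word comes first."""
    words = text.lower().split()
    return set(zip(words, words[1:]))


document = "dog bites man"

# Swapping the terms of a Boolean AND changes nothing.
same_either_way = (
    boolean_and_match("dog", "man", document)
    == boolean_and_match("man", "dog", document)
)

# An ordered pair is sensitive to which element comes first.
has_dog_bites = ("dog", "bites") in ordered_pairs(document)
has_bites_dog = ("bites", "dog") in ordered_pairs(document)
print(same_either_way, has_dog_bites, has_bites_dog)
```

A Boolean AND cannot tell its two terms apart, while the ordered pair keeps track of which element is first, so only the latter can separate "dog bites" from "bites dog".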

ITB: Where would you like to see the fruits of your research used?

AD: It could be applied to any tools that have to do with language — written language or spoken language. For example, information retrieval and extraction is one possibility, either on the Web or on intranets for corporations with lots of data to deal with.

ITB: How are you applying recent discoveries in brain science?

AD: Now we have these MRI scans, and you can have a picture of the brain where we can see the linguistic activity. In fact, there has been research in this area which suggests that even in people who have been impaired from the linguistic point of view — those who are deaf, for example — blood runs to the same area when they are trying to communicate. So we know that some activity is going on when we’re generating natural language. But the question is, what kind of activity? Well, my project suggests an answer to that, and the activity at the sub-physiological level is kind of a mirror of the asymmetry that you need to process language. We know, for example, that there were studies showing chimps that can mimic the way humans can speak, but they don’t have what they call X-bar structure.

ITB: What’s X-bar structure?

AD: It’s part of the asymmetry of natural language. There are typical recurrent patterns that you observe in the parts of a sentence that are called X-bar structure, but those kinds of patterns weren’t evidenced in chimpanzees . . . When you ask people to read a sentence, you see that there are different time reactions to different types of elements, which we know are asymmetrically related.

ITB: Why do you think the computer industry hasn’t looked at cognition more closely before?

AD: Well, I think that basically the people that started the Web — (Tim) Berners-Lee and even now with the Semantic Web — were engineers to start with. The properties of natural language were not their topic of inquiry. They did well with respect to the knowledge they had, but now with Berners-Lee doing the Semantic Web, that’s not the semantics of natural language. It’s just to put a universal computational language to standardize the way you put information in the Web. But it has nothing to do with real natural language or comprehension.

ITB: Aren’t computers always going to be several steps behind the complexity of the human brain?

AD: Maybe, but I still feel aggravated sometimes when I’m looking for something and there’s something completely different or not exactly what I want with a search engine. Of course, you can have a behavioural attitude and say, “Oh, they’re not so bad, so we can live with them.” But they’re not as good as they could be.

Comment: info@itbusiness.ca
