Arvind Gupta has noticed a pattern: Take people who know their math, pair them with companies that know how to sell, and eventually innovation enters the marketplace.
Gupta and his colleagues examine much more complex relationships as part of a group called MITACS, or Mathematics of Information Technology and Complex Systems. This week the national research network held its two-day Quebec Interchange, where scientists and large firms discussed advances in data mining made possible through the application of advanced mathematical techniques.
“It’s wrong-headed to try and make universities into companies, and it’s wrong-headed to make companies into universities,” Gupta says. “The right thing to do is to have more abstract ideas and share them with industry, and for industry to take those to market.”
In a recent conversation with ITBusiness.ca, Gupta tried to break down some of the complexity behind the math that could change the enterprise.
ITBusiness.ca: Can you describe in layman’s terms the kinds of advances in data mining being discussed at the conference?
Arvind Gupta: We have a wide range of companies and organizations that are interested in these kinds of problems. But on the actual technology side, data mining as a field has progressed really rapidly. I think there’s a real appreciation for working with these large data sets.
On the statistical side, some of the new techniques involve machine learning, where we try to train the computer to recognize certain patterns in your data by providing both positive and negative examples of the kinds of patterns you’re looking for. We call these training sets. The idea is that the machine learns via these training sets what kinds of patterns we want. We build a probabilistic model, and it tells us what the likelihood is that new data will fit the pattern we’re looking for.
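The training-set idea Gupta describes can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not the method used by any MITACS project: it assumes each example is a short list of numbers, models the positive and negative examples with a simple Gaussian naive Bayes classifier, and uses made-up data. The function names (`fit`, `log_likelihood`, `probability_of_match`) and the example values are hypothetical.

```python
import math


def fit(examples):
    """Estimate a per-feature mean and variance from a list of feature vectors."""
    n = len(examples)
    dims = len(examples[0])
    means = [sum(x[d] for x in examples) / n for d in range(dims)]
    # A small constant keeps the variance from collapsing to zero.
    variances = [
        sum((x[d] - means[d]) ** 2 for x in examples) / n + 1e-6
        for d in range(dims)
    ]
    return means, variances


def log_likelihood(x, model):
    """Log-density of x under independent Gaussians, one per feature."""
    means, variances = model
    return sum(
        -0.5 * math.log(2 * math.pi * v) - (xi - m) ** 2 / (2 * v)
        for xi, m, v in zip(x, means, variances)
    )


def probability_of_match(x, pos_model, neg_model, prior_pos=0.5):
    """Posterior probability that x fits the positive pattern (Bayes' rule)."""
    log_pos = math.log(prior_pos) + log_likelihood(x, pos_model)
    log_neg = math.log(1 - prior_pos) + log_likelihood(x, neg_model)
    # Subtract the larger term before exponentiating to avoid underflow.
    top = max(log_pos, log_neg)
    p = math.exp(log_pos - top)
    q = math.exp(log_neg - top)
    return p / (p + q)


# Hypothetical training sets: examples that show the pattern, and examples that do not.
positive_examples = [[5.0, 1.0], [6.0, 1.2], [5.5, 0.9]]
negative_examples = [[1.0, 3.0], [1.5, 3.2], [0.8, 2.9]]

pos_model = fit(positive_examples)
neg_model = fit(negative_examples)

# Score new data: values near 1 mean "likely fits the pattern", near 0 mean "likely does not".
likely_match = probability_of_match([5.4, 1.0], pos_model, neg_model)
likely_miss = probability_of_match([1.1, 3.0], pos_model, neg_model)
print(likely_match, likely_miss)
```

The model never answers yes or no outright. It returns a probability, which matches the description above of a model that reports how likely it is that new data fits the pattern.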
ITB: Where did these techniques come from? Were they from academia or learned in the field?
AG: It’s mainly academia. I guess they’ve been developed over the last 20 years. I always hate to say where an idea comes from, because there’s that whole philosophy that no idea is really new (laughs). But I think the academy at some point became very interested, for example, in human cognition and how humans recognize patterns. We’re pretty good at seeing patterns in small data sets, and the belief is we do have some sort of neural network that does something like machine learning in our brain. There’s a whole philosophy that we’re using a probabilistic model. When you see an item, you’re guessing it’s an apple because you have a pathway in your brain that’s firing, “There’s a strong probability that it’s an apple.” Occasionally you could get fooled by a plastic apple, but after enough plastic apples you’d begin to tell real apples from plastic apples. But with the formal method of doing machine learning, the kinds of values we use are very specific to the kinds of problems that are being worked on. We don’t develop machine learning for general patterns.
ITB: So how are these algorithms making their way into today’s enterprise, like in business intelligence software, for example?
AG: This is a huge field now. Anywhere from the financial community looking for patterns in trade on the stock exchange, or setting prices for auction or other financial instruments, developing hedging strategies. In health care data —
ITB: But is that actually happening now, or is that still to come?
AG: It’s happening now. One of the things this conference is about is the melding of academic and industrial research programs. We’re seeing a lot of that. We’re seeing a much quicker turnaround from theoretical models developed at the university to applications in industry. We have a project, for example, where we’re working with the telecoms on identifying patterns in customer calling and individualized long distance plans. They’re really using state-of-the-art techniques to look for these kinds of patterns. I think it’s happening more and more, and it’s happening at a quicker pace.
ITB: If some of these models have been around for 20 years, why do you think it’s taken this long for them to make their way to more practical applications?
AG: When I say they’ve been around for 20 years I meant sort of in trying to understand human cognition. I hate to say that these models just got developed two years ago. Because in the 20 years that people were thinking about “Do humans reason?”, it’s taken a while to get some of those abstract questions down to the point where we recognize that we’re not going to build the computer that reasons like a human being, at least not in the foreseeable future.
You know, the Japanese tried this in the 1970s, with something called the Fifth Generation Project. We now recognize it’s not going to happen, but people learned a lot. I mean, going through that process and developing those human cognition models. And then maybe people looked to see if we could apply those models to restricted domains. It does take time for academics to take a larger problem, dissect it and understand the components. I think now we’re in a position to apply them to real problems.
ITB: It’s coming at a good time, considering that the onus is on companies to do a better job at targeting their approach to customers.
AG: Yeah, and there are also social and economic issues. For example, how do you handle health-care data? The health-care system is a $70 billion industry in Canada. You could potentially save a lot of money there by gleaning patterns of drug ordering and how procedures are done. Are certain procedures being done too often on certain identifiable populations? The same with customer relationship data. There are a lot of privacy issues. One also has to worry about the social implication side. I think that’s something that a lot more companies are becoming aware of.
ITB: We’ve talked before about the need to better apply mathematics to IT. How is the relationship between those two worlds evolving?
AG: Essentially, software development is no longer a high-tech industry. Just as hardware became commoditized 15 years ago, software is now becoming something where we can write our specs up and somebody else could write the code.
I think what we have to do in Canada (the U.S. has started to do this, and I think Canada has started to also) is be the suppliers of the ideas and the algorithms and the thought processes behind the software. Because that’s something that’s going to be very high end, very value-added. Once we have the ideas, it doesn’t really matter where the implementation occurs. It’s like the car companies: they do their designs in the U.S. and they build the cars elsewhere. The designs are the “knowledge” part of the product.
For mathematicians, this is really a great trend, because mathematics is the ultimate resource for ideas and the brains behind the software. We’re starting to notice, finally, an upswing in enrolments in graduate and undergraduate mathematics and related areas. At these kinds of events now, we’re seeing a lot of companies coming. They seem very keen to understand what the mathematicians can offer. I hate to say “mathematicians,” because people think of people in math departments, but I mean people who think more abstractly. I think there’s an appreciation for the kinds of contributions they can make.