The public will get its first chance Monday to test a search engine from start-up Powerset that eschews conventional keyword technology and instead is designed to understand the meaning of Web pages.
As such, Powerset’s search engine holds the promise of fundamentally changing people’s expectations for search engines by, in theory, offering a smarter, more efficient experience.
However, Powerset’s beta version, while delivering impressive results, has a limited scope and index, leaving unanswered questions about its ability to work its magic at the massive scale of Google’s keyword-based search engine.
“We’re changing the way information is searched by doing a much deeper analysis of the pages we index,” said Scott Prevost, Powerset’s product director.
Keyword engines treat pages as word bags, indexing their content without grasping its meaning, he said. Meanwhile, Powerset’s engine, applying technology developed in-house as well as licensed from Xerox’s PARC subsidiary, creates a semantic representation by parsing each sentence and extracting its meaning. “Meaning is what we index,” he said.
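The "word bag" limitation Prevost describes can be illustrated with a minimal Python sketch. It is not Powerset's or any keyword engine's actual code; the function names `bag_of_words` and `naive_triple` are hypothetical, and the triple extractor is a deliberately crude stand-in for real sentence parsing that only handles three-word subject-verb-object sentences.

```python
from collections import Counter


def bag_of_words(text: str) -> Counter:
    """Lowercase the text, split on whitespace and count each word.

    Word order is discarded, which is the point of the illustration.
    """
    return Counter(text.lower().split())


def naive_triple(sentence: str) -> tuple:
    """Hypothetical toy 'parser': treat a three-word sentence as
    (subject, verb, object). Real semantic parsing is far more involved.
    """
    words = sentence.lower().split()
    if len(words) != 3:
        raise ValueError("this toy only handles three-word sentences")
    return tuple(words)


if __name__ == "__main__":
    # Same words, opposite meanings: the bags are identical.
    print(bag_of_words("dog bites man") == bag_of_words("man bites dog"))
    # An order-aware representation keeps the two sentences apart.
    print(naive_triple("dog bites man") == naive_triple("man bites dog"))
```

Under these assumptions, the two sentences are indistinguishable as word bags, while even a crude structured representation keeps track of who did what to whom.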
In an interview in October with IDG News Service, Marissa Mayer, Google’s vice president of Search Products & User Experience, acknowledged that the company’s search engine should — and will — overcome its keyword dependence in time.
“People should be able to ask questions and we should understand their meaning, or they should be able to talk about things at a conceptual level. We see a lot of concept-based questions — not about what words will appear on the page but more like ‘what is this about?’. A lot of people will turn to things like the semantic Web as a possible answer to that,” she said.
But she added that Google’s search engine acts smart thanks to the humongous amount of data it crunches. “With a lot of data, you ultimately see things that seem intelligent even though they’re done through brute force,” she said. As an example, she cited the query “GM,” which the engine interprets as “General Motors,” whereas for the query “GM foods” it delivers results for “genetically modified foods.” “Because we’re processing so much data, we have a lot of context around things like acronyms. Suddenly, the search engine seems smart, like it achieved that semantic understanding, but it hasn’t really,” she said.
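The brute-force effect Mayer describes can be sketched in a few lines of Python. This is not Google's method: the co-occurrence counts in `TOY_COUNTS` are invented for illustration, and the `expand` function is a hypothetical name. The sketch assumes only that an engine has counted how often each expansion of an acronym appears alongside other query words.

```python
# Invented, illustrative counts: how often each expansion of an acronym
# co-occurs with a context word ("" means the acronym appeared alone).
TOY_COUNTS = {
    "gm": {
        "general motors": {"": 90, "cars": 40, "foods": 1},
        "genetically modified": {"": 10, "foods": 60, "crops": 30},
    }
}


def expand(query: str) -> str:
    """Pick the expansion whose counts best match the query's context words.

    Returns the query unchanged if the first word is not a known acronym.
    """
    words = query.lower().split()
    if not words:
        return query
    acronym, context = words[0], words[1:]
    candidates = TOY_COUNTS.get(acronym)
    if candidates is None:
        return query

    def score(expansion: str) -> int:
        counts = candidates[expansion]
        if not context:
            return counts.get("", 0)
        return sum(counts.get(word, 0) for word in context)

    return max(candidates, key=score)


if __name__ == "__main__":
    print(expand("GM"))        # general motors
    print(expand("GM foods"))  # genetically modified
```

Nothing in the sketch understands what either phrase means; the apparent intelligence comes entirely from the counts, which is the distinction Mayer draws.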
For now, Powerset’s index is very limited, consisting only of millions of pages from Wikipedia and Metaweb Technologies’ Freebase, a Web-based structured database of information. However, Prevost vows that the index will begin growing within a month of launch and will eventually rival in size those of Google, Yahoo and others. “Our technology fully scales,” he said.
Still, Powerset’s search engine is impressive in action and shows promise. Instead of returning the proverbial 10 blue links for search results, Powerset can do more, such as assembling a collection of facts related to the query and summarizing the information it finds. It can also provide direct answers to factual questions.
Because the content from Wikipedia and Freebase can be re-published, Powerset can remain relevant after a user clicks through to a search result, by providing an outline to navigate through the page and a summary of facts. This, of course, isn’t something that Powerset could do with copyrighted content, but the company will seek partnerships with publishers to obtain permission, Prevost said. “We think it’ll be a situation where publishers will want their content to be served up in this way,” he said.
Industry analyst Greg Sterling of Sterling Market Intelligence calls Powerset’s capabilities “impressive” and particularly likes its search results interface. “What they’ve created is both a better search engine for Wikipedia and a massive ‘proof of concept’ for their algorithm and technology,” he said in an e-mail interview.
Now Powerset has to prove that its search engine can scale and deliver against an index of billions upon billions of Web pages while serving millions of concurrent end users. “There’s certainly potential there to build a better mousetrap, it would appear. But bringing what Powerset has done for Wikipedia to the entire Internet seems an enormous challenge that will take both time and lots of additional resources,” Sterling said.
Prevost acknowledges that this type of deep processing takes a lot of computational power, although once pages are indexed, retrieving their information doesn’t pose any special challenge.
Powerset also faces the challenges of a start-up technology company, such as generating revenue and going through growing pains. The company has already had some management upheaval, announcing in November the departure of co-founder and Chief Operating Officer Steve Newcomb and its search for a CEO, as co-founder Barney Pell gave up that post to become chief technology officer. “The CEO search is still in process, but we have a strong internal management structure and board of directors,” Prevost said.
Prevost added that the company’s investors are committed to it and to seeing that it has the resources necessary to scale up the search engine to the level of those with indexes of 20 billion pages.
Powerset’s business model is based on advertising, although the search engine will not serve up ads from the beginning. “There’s a lot of cool stuff we can do in the ad space by matching the meaning of queries to the relevance of ads, but that’s much more longer term,” he said.
The search engine will be limited to Web search at first, although Powerset has contemplated adding specialty engines for things like images and video later, as well as targeting verticals such as health, product reviews and travel, he said.
“We’ve only shown the tip of the iceberg in language analysis,” he said.