TV star Watson is a step towards a new kind of search engine.
IBM’s supercomputer Watson is going up against top players of the US television quiz programme Jeopardy! this week, stirring up excitement in the artificial-intelligence community and prompting computer science departments across the country to gather and watch.
“It is, in my mind, a historic moment,” says Oren Etzioni, director of the Turing Center at the University of Washington, Seattle. “I watched Garry Kasparov playing Deep Blue. This absolutely ranks up there with that.”
Jeopardy! contestants are given clues in the form of answers and must try to come up with the right questions. Watson, with its 16-terabyte memory, is capable of tackling normal Jeopardy! clues — including all the puns, quips and ambiguities they typically contain. It dissects the clue, compares it against a ream of facts and rules that it has gleaned from reading a raft of books (from encyclopaedias to the complete works of Shakespeare), and assigns probabilities to its answers before coming up with a response.
In the time it takes host Alex Trebek to read the clue to the human contestants, Watson comes up with an answer and decides whether it is confident enough to ring in. The match, which was filmed in advance in January, airs 14–16 February (see the programme’s Watson website).
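The two-step decision described above — score candidate answers, then ring in only if the best one clears a confidence bar — can be sketched in a few lines. This is only an illustrative toy, not IBM’s actual algorithm: the candidate scores and the threshold value are made up for the example.

```python
def decide_to_buzz(candidates, threshold=0.5):
    """Pick the best-scoring candidate answer; buzz in only if its
    estimated probability of being correct clears the threshold.

    candidates: dict mapping candidate answer -> estimated probability.
    Returns (answer, should_buzz)."""
    if not candidates:
        return None, False
    best_answer, best_score = max(candidates.items(), key=lambda kv: kv[1])
    return best_answer, best_score >= threshold

# Toy candidate list for a clue about the inventor of the Analytical Engine.
candidates = {"Charles Babbage": 0.92, "Alan Turing": 0.05, "Ada Lovelace": 0.02}
answer, buzz = decide_to_buzz(candidates)
# answer == "Charles Babbage", buzz is True
```

The key design point is that declining to answer is itself a decision: a low-confidence top candidate means staying silent, which in Jeopardy! avoids the penalty for a wrong response.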
Although it might sound like nothing more than a stunt, computer scientists say that Watson is an important advance in artificial intelligence, marking a shift that will create much better search engines and help scientists to keep up in their fields.
What Watson doesn’t do is attempt to mimic the human ability to use common sense, make leaps of logic or imagine the future, notes Patrick Winston of the Massachusetts Institute of Technology in Cambridge. As a result, it has failed to capture his interest. “I’m planning to go to bed early. I’ll watch the re-runs,” he says.
The computer system is based on IBM’s DeepQA project, which aims to answer ‘natural-language’ questions in standard English, such as ‘Which nanotechnology companies are hiring on the West Coast?’ The trick is for it to both understand that type of query and provide a meaningful answer. “Good luck getting that from Google,” says Etzioni.
That goal is at the core of many computer-science endeavours, including the long-running artificial-intelligence project Cyc, started in 1984 and now run by Cycorp in Austin, Texas. One trouble with Cyc, says Etzioni, is that its database relies on human beings typing coded facts and knowledge into its system. The alternative is to train computers to learn by reading. There are several big projects in the works on this front — including the Never-Ending Language Learning system (NELL) at Carnegie Mellon University in Pittsburgh, Pennsylvania, and Etzioni’s KnowItAll system — most of which are part-funded by the US Defense Advanced Research Projects Agency.
What IBM has done differently, says Etzioni, is to focus on a very specific setting (the game of Jeopardy!), spend a lot of time on how to interpret cunning clues, create a database that is the equivalent of about a million books, and make the system fast enough to produce answers in seconds. IBM hasn’t released all the details of how Watson works, so exactly how it achieves this is unclear. But Etzioni guesses that the way it collates facts from its reading is similar to how his KnowItAll system approaches the problem. It is impressive, he notes, that DeepQA has essentially achieved in a few years what Cyc set out to do decades ago.
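The “learning by reading” idea behind systems such as KnowItAll — harvesting structured facts from plain text rather than having humans type them in — can be illustrated with a deliberately crude sketch. The single “X is a Y” pattern below is an assumption made for the example; real systems use far richer patterns plus statistical filtering.

```python
import re

# Toy fact extractor: pull (subject, "is-a", object) triples from
# sentences that match a simple "X is a Y." pattern.
IS_A = re.compile(r"^(?P<subj>[A-Z][\w ]+?) is an? (?P<obj>[\w ]+?)\.$")

def extract_facts(sentences):
    """Return a list of (subject, relation, object) triples found
    by the IS_A pattern; sentences that don't match are skipped."""
    facts = []
    for sentence in sentences:
        match = IS_A.match(sentence)
        if match:
            facts.append((match.group("subj"), "is-a", match.group("obj")))
    return facts

text = [
    "Hamlet is a tragedy.",
    "Shakespeare wrote many plays.",   # skipped: no "is a" pattern
    "Watson is a computer system.",
]
facts = extract_facts(text)
# facts == [("Hamlet", "is-a", "tragedy"),
#           ("Watson", "is-a", "computer system")]
```

Run over a million books' worth of text, even simple patterns like this accumulate a large fact base — the hard part, which this sketch omits entirely, is filtering out the noise and resolving ambiguity.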
“Just like with Deep Blue, it’s really bringing together the state-of-the-art in hardware and software,” says Henry Kautz, a computer scientist at the University of Rochester, New York, and president of the Association for the Advancement of Artificial Intelligence in Menlo Park, California, who is also impressed by Watson.
Etzioni says he expects natural-language software to make a big dent in search applications over the next five years, although at the moment systems such as Watson aren’t ready for ‘prime time’: he notes that Microsoft bought a natural-language processing company called Powerset in 2008 for US$100 million, “but you don’t see Microsoft using it in any visible way”. Kautz agrees that systems as broad and powerful as Watson could be available for general use “surprisingly soon… Let’s say three to four years.”
Crying for help
Etzioni argues that a search engine that can deal with natural-language queries is necessary for scientists trying to keep up with the mass of knowledge now being generated in their field, so they can ask, say, “What are the top ten genes currently being studied in cancer research?”, rather than having to trawl through the literature to find out.
Others disagree. Canadian writer Malcolm Gladwell said in a recent discussion about the future of search technologies that current projects “are solving lots of problems that aren’t really problems… You cannot point to any area of intellectual activity or innovation or what have you that is today being compromised or hamstrung by some failure in their search technology. Can we honestly go to some scientist and say the reason we can’t cure cancer is you don’t have access to information about cancer research? No!”
Etzioni laughs at that. “To me, that’s as short-sighted as the famous statement that there’s only a world market for five computers,” he says — a statement that, ironically, is attributed to IBM founder Thomas Watson, after whom Watson is named.
“There’s massive production of knowledge, particularly in the biological community, and researchers can’t keep up with it,” says Etzioni. “Applying these tools specifically for medical researchers to keep track of what’s relevant in what they’re interested in is a huge area of my field. It’s true we don’t yet have a killer app, but you talk to anybody and they’re crying out for help.”
PS: This post is based on an article by Nicola Jones from Nature News.