What’s the future of Wolfram Alpha?

Wolfram Alpha launched the other day. The reception has been mixed, but most of the positive press seems to be from folks who’ve gotten a guided tour. You can get the same experience, if you haven’t already, by watching Stephen Wolfram’s screencast. Awesome, right? All the world’s knowledge at your fingertips?

Well, not quite. Farhad Manjoo’s reaction is actually typical for folks who’ve played with Alpha for a little bit:

Once you start conjuring your own searches, it’s clear that the samples offer a misleading impression of the site’s depth. Ask how many calories that male runner would burn if he were swimming, cycling, playing tennis, cross-country skiing, or golfing—it’s clueless. Say you wanted to know how life expectancy differed by state in the United States—what’s the life expectancy of a male in California, and how does that compare to the life expectancy of a male in Kansas? “Wolfram Alpha doesn’t know what to do with your input,” the site tells me. And on and on it goes. Wolfram Alpha doesn’t know the homicide rate in South Africa or Baltimore, it doesn’t know how many copies M.I.A.‘s last album sold, it can’t tell you the per-capita GDP of the San Francisco Bay Area, and it’s got nothing about the top speed of the Bugatti Veyron. It may sound like I’m nitpicking, but I was careful to construct questions that emulated Wolfram’s own examples. As it kept coming up empty, Wolfram Alpha came to seem less like HAL 9000 and more like a chatbot.

Even more embarrassing, Alpha sometimes shows outdated data when the more up-to-date data is easily findable on Google.

What’s happening here is that Wolfram Alpha is pulling information from discrete online databases, including the CIA World Factbook, Census reports, and stock market data. These sources are connected to Alpha by its human handlers, who teach it how the data relate to each other. (Many other websites, e.g. EveryBlock, do the same thing, albeit on a smaller scale.) Since it has only just launched, we should expect Alpha to get smarter and smarter as the months turn into years, right? Well, not necessarily. Using human operators to organize the web is how Yahoo! search used to rule the world, until Google came along and let machines give it a try. As the web grew, it outpaced the ability of any number of human librarians to keep up.

So how can we get software to understand the databases that are scattered around the internet? Well, mostly you do it by changing the way the databases are organized, to make them more easily machine-parseable. What we’re talking about is the semantic web, which Tim Berners-Lee (the inventor of the World Wide Web) explained earlier this year at TED. He describes the frustration of connecting data one database at a time, and he lays out some ideas of how things might work when data are connected. (Raw Data Now! It’s going to improve the world!)
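To make "machine-parseable" a bit more concrete, here is a minimal sketch in Python of the general idea, not of how Wolfram Alpha or any real semantic web tool works. It assumes two independent sources that publish facts as (subject, predicate, object) triples and use the same identifier for the same thing. The place names, numbers, and the `build_graph` and `per_capita` helpers are all hypothetical, made up purely for illustration.

```python
from collections import defaultdict

# Hypothetical data: two independent "sources" that publish facts as
# (subject, predicate, object) triples and share identifiers.
# All names and numbers are made-up placeholders, not real statistics.
SOURCE_A = [
    ("place:Exampleville", "population", 1000),
    ("place:Sampleton", "population", 4000),
]
SOURCE_B = [
    ("place:Exampleville", "gdp", 50000),
    ("place:Sampleton", "gdp", 100000),
]


def build_graph(*sources):
    """Merge any number of triple lists into a subject -> {predicate: object} map."""
    graph = defaultdict(dict)
    for source in sources:
        for subject, predicate, obj in source:
            graph[subject][predicate] = obj
    return graph


def per_capita(graph, subject, numerator, denominator):
    """Combine two facts about one subject; return None if either is missing."""
    facts = graph.get(subject, {})
    if numerator not in facts or denominator not in facts:
        return None
    if facts[denominator] == 0:
        return None
    return facts[numerator] / facts[denominator]


graph = build_graph(SOURCE_A, SOURCE_B)
print(per_capita(graph, "place:Exampleville", "gdp", "population"))
```

The point of the sketch is that nobody had to teach the program how SOURCE_A relates to SOURCE_B. Because both use the same identifiers and the same simple shape, adding a third source is just one more argument to `build_graph`, with no human librarian in the loop.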

Berners-Lee throws out a few examples of how this might work, but it’s mostly along the lines of mashups. We can create all sorts of data mashups (that second thing is cool, click on it, seriously), but the act of creating the mashup and the act of using it are distinct, right? The genius of Wolfram Alpha is that, within its currently limited sphere, it allows you to mash anything. Well, once the semantic web tips and more and more data becomes available in machine-readable format, the potential power of tools like Alpha will grow exponentially.

Whether the tool that ends up winning the race to harness this data is Wolfram Alpha, Google, or something else remains to be seen. But at least we can now imagine what that tool will look like — something like what Wolfram Alpha looks like today, but able to handle any question that Farhad Manjoo (or any research scientist, or any kid in school) can throw at it.