Success in Natural Language Processing is Human-Level Intelligence

Sunday October 28, 2012

There's a lot of talk about Natural Language Processing (NLP): using computers to deal with lots of text. The current state of the art is like cooking with only a colander and whatever ingredients fall out of the tree in your back yard. People are entirely too excited about a collection of weak-sauce results.

Sentiment analysis is a joke on the unpopular kids (brands) desperate to know if people like them. "How many times did they say the words Nike and Love in the same sentence? Huh? How many?" AKA let's-reduce-all-human-discourse-to-one-linear-scale.
More general word frequency analysis can be fun, just like the index of an arbitrarily long book. But you hit problems with grammatical changes right away, so you start using some clever stemming approach, and that makes things either better or worse, and the machine is sure as heck not going to know which it is.
Co-occurrence? That's the best you got?
Okay, Google's machine translation is pretty cool, but a Chinese Room is not real understanding or analysis.
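
To make the complaints above concrete, here is a minimal Python sketch of the kind of tooling being mocked. Everything in it is made up for illustration: the sample sentences, the function names, and especially `naive_stem`, which is a toy suffix-stripper and not any real library's stemmer. Treat it as a caricature under those assumptions, not as how any actual sentiment product works.

```python
import re
from collections import Counter


def sentences(text):
    # Crude sentence splitter: break after ., ! or ? followed by whitespace.
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def words(sentence):
    # Lowercase the sentence and keep runs of letters and apostrophes.
    return re.findall(r"[a-z']+", sentence.lower())


def cooccurrence_count(text, a, b):
    # Count sentences containing both words, ignoring order and meaning.
    return sum(1 for s in sentences(text) if a in words(s) and b in words(s))


def naive_stem(word):
    # Hypothetical toy stemmer: strip one common suffix and nothing more.
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def stem_counts(text):
    # Word frequencies after toy stemming.
    return Counter(naive_stem(w) for w in words(text))


# Invented sample text: only the first sentence is actually positive.
sample = (
    "I love my Nike shoes. "
    "I would love to never see another Nike ad again. "
    "Nobody could love Nike after that."
)

print(cooccurrence_count(sample, "nike", "love"))  # 3
print(stem_counts("running runs ran news new"))
```

All three sample sentences count as "Nike" plus "love", although only one of them is a compliment. The toy stemmer, meanwhile, fails to merge "running", "runs" and "ran" (it produces "runn", "run" and "ran") while happily merging "news" into "new". Better or worse? The code has no idea.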

Take a look at this public service ad on the NYC subway:

Here's what it says:

MTA

.info

What's next?

Poetry is back

in Motion.

Many of you felt parting was not such sweet sorrow.

So we're bringing poetry back in a very artful way.

Hopefully, you'll feel transported.

Improving, non-stop.

When I was in Korea and studying Korean, I tried to read and understand signs that I came across. If nothing else, I should be able to read the signs in the subway, right? Even if you speak English as a first language, if you don't regularly ride the subway in New York, you may not know what this sign is really saying. If you're clever, you can sort of guess, but probably not perfectly.

MTA is the Metropolitan Transportation Authority. This isn't stated in the ad, but humans can guess that this public-service-announcement-looking sign on the subway is probably from the subway people.

"What's next?" is a rhetorical question, and in fact the title of a sort of MTA advertising series in which they tell us what updates and changes are happening now or in the near future.

"Poetry is back in Motion" refers to the MTA "Poetry in Motion" series, a separate initiative that puts short poems in subway ad slots. Apparently they stopped doing this for a while, and now it's coming back. Or maybe they've just failed to sell all the ad slots. Who knows.

Next our friends the MTA allude to Juliet's parting words to Romeo from her balcony. Apparently by the end of the play both Poetry in Motion and all subway riders will be dead. But seriously, this allusion just doesn't make sense. Does no one understand the feeling Juliet was conveying? Honestly?

The "feel transported" is actually kind of a fun double-meaning pun. The "non-stop" is less fun but would probably be better if it weren't following a bunch of other junk just like it.

I suppose you could say that the MTA has really done a noble job in making a dull message a little more fun. No doubt. The point is that really understanding even this fairly simple message is not so easy. What if your NLP doesn't have the NYC-subway-rider plug-in? Or the dual-meaning-pun plug-in? The Shakespeare plug-in? I suspect that until machine text analysis is done by an embodied learning computer with human-equivalent intelligence, we will be limited to the frankly unimpressive kinds of tools that we have so far. To make my suggestion even less helpful, I suspect that as soon as such technologies exist, they will have the same drawbacks that humans do. Perhaps computer users will all have to be managers. Will I have to give my computer the weekend off? Maybe I should have... ramble mode OFF

This post was originally hosted elsewhere.