Programmers have been trying to devise methods of making machines process natural language since long before Tim Berners-Lee talked about achieving a Web that implemented such technology – indeed, since around his birth in 1955.
In 1950, Alan Turing published “Computing Machinery and Intelligence”, which opened by posing the question, “Can machines think?” In that work, he introduced his “Turing Test”, which remains the most widely recognized benchmark for gauging artificial intelligence to date.
The Turing Test essentially uses human judges to determine whether a machine can provide responses to questions that are sufficiently human-like to be indistinguishable from a person’s. Such an ability would be essential to any successful realization of what Berners-Lee was talking about in 1994.
What is Natural Language Processing
Simply stated, natural language processing (NLP) is the combined application of computer science, artificial intelligence (AI) and linguistics to the problem of computers interacting with human language. The greatest challenge of NLP is getting a computer to achieve a complex understanding of the many nuances of human language.
As an example, consider the simple sentence, “Flying planes can be dangerous.” The average person will immediately grasp the meaning to be either that piloting planes can endanger the pilot, or that planes in flight can endanger people on the ground.
An NLP algorithm, however, must work out not only whether flying is a gerund (the act of flying) or an adjective (planes that fly), but whether can is an auxiliary verb or a noun, and whether planes refers to aircraft, woodworking tools or geometric objects.
If the sentence is analyzed in conjunction with other sentences, then the surrounding context may help clarify some of those issues. But still, it is a challenging and frustrating field of study. Computers simply don’t think like humans.
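The ambiguity described above can be made concrete. The following sketch (the tree structures and part-of-speech labels are illustrative assumptions, not the output of any real parser) represents the two competing readings as nested tuples and shows that both cover the same surface words while differing in structure:

```python
# Reading 1: "flying" is a gerund -- the act of flying planes is dangerous.
reading_gerund = ("S",
    ("NP", ("VBG", "flying"), ("NNS", "planes")),   # "[the act of] flying planes"
    ("VP", ("MD", "can"), ("VB", "be"), ("JJ", "dangerous")))

# Reading 2: "flying" is an adjective modifying "planes" -- planes in flight are dangerous.
reading_participle = ("S",
    ("NP", ("JJ", "flying"), ("NNS", "planes")),    # "planes that are flying"
    ("VP", ("MD", "can"), ("VB", "be"), ("JJ", "dangerous")))

def leaves(tree):
    """Flatten a parse tree back into its surface words."""
    if isinstance(tree, str):
        return [tree]
    words = []
    for child in tree[1:]:
        words.extend(leaves(child))
    return words

# Both parses yield the same sentence; only the structure differs.
print(" ".join(leaves(reading_gerund)))      # flying planes can be dangerous
print(reading_gerund == reading_participle)  # False: same words, different trees
```

The NLP algorithm's task is to choose between such structures, which the surface string alone cannot decide.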
History of NLP and AI
Initial efforts were based upon establishing a vast number of hand-written rules by which the computer would process language. The rules would simply be a portion of the program that stated, “if it says ‘this’, then it means ‘that’.” Those if-then rules multiplied rapidly as the program’s vocabulary was expanded.
Thus, the programs became quite large, and as a result, processing time was slow. That could probably have been dealt with by advanced processors, but the understanding of the language being analyzed simply wasn’t there.
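A minimal sketch of such a rule-based approach might look like the following (the rules and responses are invented for illustration, not taken from any historical system); note that every new phrase the program should understand requires yet another hand-written rule:

```python
# Hand-written if-then rules mapping a recognized phrase to a fixed interpretation.
RULES = [
    ("how are you", "The user is asking about my state."),
    ("what time", "The user is asking for the time."),
    ("goodbye", "The user is ending the conversation."),
]

def interpret(sentence):
    lowered = sentence.lower()
    for trigger, meaning in RULES:
        if trigger in lowered:          # "if it says 'this'..."
            return meaning              # "...then it means 'that'"
    return "No rule matches; the program has no idea what was said."

print(interpret("Well, how are you today?"))
print(interpret("Flying planes can be dangerous."))
```

The second query falls through every rule, illustrating why such systems had to grow enormous to cover even ordinary conversation.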
The computer could access the definition of a word, even a phrase… but not the true meaning. Nuances in emotion, irony or sarcasm, for instance, were totally lost on the machine. There were some small gains, such as the chatterbot, Eliza, in 1966, which used simple pattern matching to determine which canned response should be issued to a statement or query.
Eliza seemed to be an astounding achievement at the time, but the program was extremely limited, and by its nature, it simply wasn’t scalable. A different approach was needed, if artificial intelligence was ever to be achieved.
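Eliza's pattern-matching design can be sketched in a few lines. The patterns and templates below are illustrative stand-ins, not Weizenbaum's actual script, but they show the core trick: match a regular expression, then slot the captured text into a canned template.

```python
import random
import re

# ELIZA-style pattern/response pairs: each regex captures part of the user's
# statement, which is reflected back inside a canned template.
PATTERNS = [
    (re.compile(r"i need (.*)", re.I),
     ["Why do you need {0}?", "Would it really help you to get {0}?"]),
    (re.compile(r"i am (.*)", re.I),
     ["How long have you been {0}?", "Why do you think you are {0}?"]),
]

def eliza_reply(statement):
    for pattern, templates in PATTERNS:
        match = pattern.search(statement)
        if match:
            return random.choice(templates).format(match.group(1).rstrip("."))
    return "Please tell me more."   # default canned response

print(eliza_reply("I need a vacation"))  # e.g. "Why do you need a vacation?"
```

The program understands nothing; it merely reflects the user's own words back, which is precisely why it could not scale.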
In the late 1980s, machine learning algorithms were introduced, which opened new opportunities for NLP and AI. The if-then rules could now be generated automatically by the program itself, as the computer was exposed to new situations.
Eventually, the use of if-then statements was replaced by statistical modeling, which involved probabilistic decisions by the program. This required far fewer rules, as the program would attach weights to the input data and would render results that were much more reliable, though still imperfect.
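To make the idea of weighted, probabilistic decisions concrete, here is a toy word-sense disambiguator (the training data is invented for illustration, not drawn from a real corpus). Each sense of “planes” accumulates counts of the context words seen alongside it, and a new context is scored by how much weight each sense assigns to its words:

```python
from collections import Counter

# Illustrative training data: context words observed with each sense of "planes".
training = {
    "aircraft": ["pilot", "airport", "flight", "pilot", "runway"],
    "tool":     ["wood", "carpenter", "blade", "wood"],
    "geometry": ["angle", "surface", "euclid"],
}
weights = {sense: Counter(words) for sense, words in training.items()}

def best_sense(context_words):
    def score(sense):
        counts = weights[sense]
        total = sum(counts.values())
        # Add-one smoothing so an unseen word doesn't zero out a sense.
        return sum((counts[w] + 1) / (total + 1) for w in context_words)
    return max(weights, key=score)

print(best_sense(["pilot", "runway"]))   # aircraft
```

No hand-written rule says that “pilot” implies aircraft; the weighting emerges from the counts, and adding more observed text simply sharpens the probabilities.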
The Present State of AI and NLP
Computers are particularly adept at probabilistic decisions, although they still require some rules. The beauty of this model is that the program learns as it goes, essentially generating its own rules and adjusting probabilities according to the statistics it encounters in its interactions with the language.
As a result, programs have been developed whose responses are, in limited settings, difficult to distinguish from a human’s in Turing-style tests. The progress made to date seems to indicate that artificial intelligence is indeed a possibility, and a great deal of research is dedicated to achieving such a capability.
The list of potential applications for such a capability is virtually endless. Such a system could weigh all of the available information and make reliable decisions far faster than a human, who is limited in the number of different datasets he or she can consider in a given case.
Artificial Intelligence in Search
Search engines are just one of the applications that could benefit greatly from artificial intelligence and natural language processing. Rather than simply pointing a user to sources of data that are relevant to a query, such a program would be able to comprehend the nature of the information needed, and use the data to extrapolate intricate responses.
Imagine being able to receive a reliable response from a search engine to a query such as:
I need a secular interpretation of what Dostoevsky meant by ‘… he has created him in his own image and likeness.’
The program would be able to comprehend precisely what the user seeks, find the referenced quotation in The Brothers Karamazov, analyze the context in which the statement was made, research various psychological texts to determine what the statement probably referred to, balance that weighted data against a weighted analysis of the book’s content, and compile an interpretation in response.
This process would possibly take a few hundred milliseconds, in contrast to the research, analysis and composition time needed by a human – easily an hour, possibly much more. And a human might have difficulty setting aside his cultural background to provide a truly secular interpretation.
At the end of the process, the computer would have learned a good deal about the philosophical motivation behind Dostoevsky’s statement, as well as about human nature and the psychological analysis of some men’s outlook on life, and it would have added all of that to its knowledge base.
That addition would affect the weighting of future queries, data and responses, improving both the accuracy and response time of the program.
We are well on the way to such capability – how far along we already are and how long it will be before we see it in widespread use is open to debate. But make no mistake… the Semantic Web is coming.
Well, yes, your analysis of the history of NLP (natural language processing) is OK. The first part – if-then-else statements – is the basis of ALL computer languages. Noam Chomsky was certainly right about that.
However, computer languages can only solve problems that have finite solutions. This scope is much smaller than “all possible thoughts” and their responses, solutions, semi-solutions, remarks, etc. — so it (problems with finite solutions) can be solved by syntax alone, with if-then-else statements.
Eliza was a joke. Read about it on the web. The last part is correct; but it is short and only covers the waterfront. It does not give any ideas about how to proceed. I don’t think a good-enough language translator has to know all things. Syntax should ‘almost’ do the job.
If a language translator had to know everything, then it would be too big and too slow. However, going from Google “keywords” to adding some syntax should be a big improvement in any search engine. Learn something about diagramming sentences — especially questions. Remember that questions always end with a ? and generally start with an interrogative pronoun (what, when, why, where, etc.) followed by a statement (noun, verb, object phrase, each of which can be a noun, verb, object phrase, and so on, by recursion).
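The commenter's recipe for recognizing a question can be sketched directly. The word list and the check below are illustrative assumptions, not a complete parser (in particular, the recursive decomposition of the trailing statement is left out):

```python
# A question, per the recipe above: an interrogative pronoun, followed by a
# statement, ending in "?".
INTERROGATIVES = {"what", "when", "why", "where", "who", "how"}

def looks_like_question(sentence):
    text = sentence.strip()
    if not text.endswith("?"):
        return False                      # questions always end with "?"
    words = text.rstrip("?").lower().split()
    return bool(words) and words[0] in INTERROGATIVES

print(looks_like_question("Where is the nearest airport?"))   # True
print(looks_like_question("Flying planes can be dangerous.")) # False
```

Even this crude syntactic check already distinguishes a query from a statement, which supports the commenter's point that syntax alone buys a search engine a good deal.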
So… interpreting these comments, we would have to agree that computer languages can only solve finite problems. However, by integrating the ability to infer meaning from the corpus, that language should be able to perform a certain level of analysis. As the algorithm progresses on its learning path, its ability to extract the semantics of the content will continue to improve. We have already seen this ability developing.
Our AI scientist is correct about Eliza being a joke. It was little more than a toy, to be certain. But it was also a representation of a dream… one of an engine that could truly converse with its user. A baby step, but one which captured the imagination of many technologists who continue to work toward the dream of artificial intelligence.
We’re still a long way from that, but with the recent launch of the new Hummingbird search algorithm, we seem to have taken a big step closer. The algorithms no longer have to search for exact-match terms, synonyms or even proximate phrases… they are now able to detect the general theme of an entire document, even if only at an elementary level.
The Internet is on the cusp of a dramatically important phase in its development. Web 3.0 is gradually becoming a reality and improved semantic analysis of both online content and search queries will change the way we generate content and the way users search the web.
The argument over whether or not true artificial intelligence is possible or achievable will undoubtedly rage on for some time. In the meantime, though, those who use the Internet to either locate or offer products, services or information will find the experience is evolving rapidly.