What is Intelligence: The Higher Magic of Pre-Built Nets

We have reached a peak in developing neural networks that solve basic kinds of problems. Training a neural network is usually a computationally expensive process, one that hogs huge amounts of resources and can run for weeks; what we obtain at the end is a set of weights, the product of the training process.

These weights are what serve us at inference time, when we want to ask questions of our network with new data. For example, once we have trained an image classifier and arrived at the set of weights that best generalizes over our dataset, at inference time we ask the network to classify input it has never seen before: in this case, a new image belonging to one of the classes it knows. If the network correctly places new data in a class it already knows, we have a network that works.

With transfer learning, we can even use these weights for classes we did not train the network on: we keep the pretrained layers that extract general, low-level features and stitch new layers on top of them, which lets us reuse this set of weights for a task the network was not originally trained for.
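A minimal sketch of that idea in Keras, assuming a hypothetical new task with 10 classes (the dataset variables are placeholders): the pretrained convolutional base is frozen and only a fresh classification head gets trained.

```python
import tensorflow as tf

# Pretrained convolutional base: ImageNet weights, classification head removed.
base = tf.keras.applications.MobileNetV2(
    input_shape=(224, 224, 3), include_top=False, weights="imagenet")
base.trainable = False  # reuse the learned low-level feature extractors as-is

# Stitch a new head on top for 10 classes the base was never trained on.
model = tf.keras.Sequential([
    base,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
# model.fit(new_images, new_labels, epochs=5)  # placeholders for the new dataset
```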

When we have laboriously trained a neural network, we can save the weights we have learnt and let other people use the network in their own projects without having to train it themselves.

Apart from sharing weights, we can also share network architectures with others, allowing them to train these networks by themselves.
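In Keras, for example, these two kinds of sharing are separate operations. A minimal sketch, with a made-up file name and a deliberately tiny architecture:

```python
import tensorflow as tf

def build_net():
    # The architecture itself: shareable as code or as a JSON description.
    net = tf.keras.Sequential([
        tf.keras.layers.Dense(64, activation="relu"),
        tf.keras.layers.Dense(10, activation="softmax"),
    ])
    net.build(input_shape=(None, 32))  # architectures must match for weights to load
    return net

trained = build_net()
# ... the expensive training runs here ...
trained.save_weights("shared.weights.h5")  # share the learned weights
arch_json = trained.to_json()              # or share the architecture alone

# A recipient of the weights rebuilds the architecture and skips training:
reused = build_net()
reused.load_weights("shared.weights.h5")

# A recipient of only the architecture gets an untrained net to train themselves:
untrained = tf.keras.models.model_from_json(arch_json)
```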

But the true magic comes when we share pretrained networks with others. The level of AI we are currently engrossed with is akin to the machine-language level of the computer abstraction hierarchy that I have discussed elsewhere. The public really doesn't know how low-level the stuff we currently call AI is. To ignite the imagination of the public, we usually build lots of plumbing in high-level software to create products like Google Assistant, Amazon Echo, etc.

Telling the average person that what gave birth to the modern AI industry was the ability to recognize images without hand-coding much of the feature extraction would be boring, and it would not spark images of the Terminator. But when people watch Amazon Echo give trivial answers to trivial questions, they feel they are witnessing the rise of the machines!

What the public doesn't know is that at the lowest levels, where the real AI occurs, it is not really that exciting. To create such products, a lot of hand-coding of rules and other plumbing is needed to get Echo to generate speech. The true machine intelligence inside is minimal, and the human effort spent hand-coding far outweighs it.

Even the much-celebrated victories over world Go champions by programs like AlphaGo disguise the fact that a lot of hardware and software engineering, by a large team of engineers, went into achieving those victories in a very narrow domain.

I don't think the intelligent people who design systems like AlphaGo intentionally seek to deceive the public about the capabilities of their systems. Maybe it's just the survival instinct of keeping your industry relevant, or maybe it's plain egotism, but the truth is that what goes on in systems like AlphaGo is very, very narrow, and it would take Manhattan Project-scale engineering, if that is even possible, to make it relevant in the real world.

The researchers have tried to best “human” effort by besting narrow definitions of “human intelligence”. This is how it goes:

Someone says that the game of Go is so complicated, and requires so much human skill, that a computer cannot defeat humans at it. Software engineers, in an attempt to prove the opposite, start building a computer that can beat humans at Go.

The argument comes with this kind of proof: so far, we have tried every current method and we are not able to defeat a human at the game of Go.

The software engineers conclude: if we can beat a human at Go, then we have attained artificial intelligence. They then go ahead and engineer a very specialized system, full of specialized algorithms, and they defeat a human at Go.

The news gets out into the world, because there is a lot of PR before such events, and stomachs churn as audiences watch the computer mercilessly defeat a human over and over, the human appearing hopelessly outmatched.

The researchers are gratified and funding flows in. Yet it seems the researchers involved are working hard to promote software engineering capabilities rather than working directly on the kind of intelligence that could enable humanity to solve problems that have long confounded it, the kind of problems a non-biological, substrate-free intelligence could take on.

No doubt we learn a lot from the kind of software engineering that makes game-playing AI possible, and this doesn't just apply to Go: every game-playing AI, from Dota to StarCraft, is guilty of this kind of mostly engineered-for-appearance AI.

It's as if the AI engineers are magicians trying hard to convince the crowd that the illusions they display are real. In the case of the magic show the aim is to amuse; in the case of the AI show it is to strike fear and attract funding.

I don't blame the researchers at all for trying their best to beat humans at playing games; they are probably not armed with a good definition of intelligence as a premise from which they could start trying to solve it. Trying to define intelligence is as difficult as trying to engineer systems that perform it, and I have a chapter dedicated to that discussion elsewhere.

The same display went on in the early days of AI development, in the 60s, when media attention was captured by dramatizations of AI performance on toy problems. While modern researchers focused on building game-playing AI that beats humans would argue that these are not toy problems, it is hard to see how these programs will help us solve the real problems that humanity encounters.

We know that research products usually transfer out of the research work that created them, but before the algorithms that play these games so well get transferred out, they will be stripped down and generalized to the point where it is hard to see any AI going on at all.

The guys in the 60s were doing what they thought was AI when they attempted to conquer humans at things like chess. A toy problem does not necessarily mean a simple problem: chess may be hard, but its domain is narrow, and any algorithm designed exclusively to beat humans or other machines at chess is hard to generalize to other problems. Alpha-beta search, which is used to explore chess game trees, might beat a human, and we might attribute this ability to “intelligence”, but when we reapply alpha-beta search in another domain we usually would not say that it is intelligent.
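To make that narrowness concrete, here is a minimal alpha-beta sketch in Python over a toy game tree (nested lists standing in for real positions). Nothing inside the search is chess-specific; all the "chess knowledge" would live outside it, in how the leaf positions are scored.

```python
def alphabeta(node, alpha, beta, maximizing):
    # Leaves are plain numbers: the score of a finished line of play.
    if isinstance(node, (int, float)):
        return node
    if maximizing:
        value = float("-inf")
        for child in node:
            value = max(value, alphabeta(child, alpha, beta, False))
            alpha = max(alpha, value)
            if alpha >= beta:  # prune: the opponent will never allow this branch
                break
        return value
    value = float("inf")
    for child in node:
        value = min(value, alphabeta(child, alpha, beta, True))
        beta = min(beta, value)
        if alpha >= beta:
            break
    return value

# A two-ply toy tree: the maximizer picks a branch, the minimizer replies.
print(alphabeta([[3, 5], [6, 9]], float("-inf"), float("inf"), True))  # prints 6
```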

So the domain to which these algorithms are applied, whether it's Monte Carlo tree search or anything else, determines whether the public views them as AI or not. This is clearly a distraction, and after the low-hanging fruit of public and investor interest has been plucked, researchers might have a very hard time convincing investors to fund AI just because they defeated the best human players at a game.

Calculators are better than humans at calculating, but nobody writes articles about the powers of the calculator as the first step to AI taking over the world. Calculators do the domain-specific work of calculation, so game-playing AI is in a sense equivalent to the electronic calculator: both are domain-specific.

The true progress AI has made is in the ability to share pretrained nets. This enables other researchers to work at a higher level of abstraction, building higher-level systems that use these pretrained networks as atoms. This is the true progress, and it doesn't sound fancy enough for headlines.

In the early days of neural network research, we were occupied with just getting the network to learn weights at all, and we were able to do that with backpropagation, together with innovations in hardware that made the computationally expensive process of training neural networks feasible.
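At its core, that learning step is just gradient descent on the weights. A minimal sketch, fitting a single weight to the made-up relationship y = 2x; backpropagation applies this same gradient step to millions of weights at once via the chain rule.

```python
# Training data for the toy relationship y = 2x.
data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]
w = 0.0     # the lone "weight" to be learned
lr = 0.05   # learning rate

for epoch in range(100):
    for x, y in data:
        error = w * x - y            # prediction minus target
        gradient = 2 * error * x     # derivative of squared error w.r.t. w
        w -= lr * gradient           # step downhill on the error surface

print(w)  # converges to roughly 2.0, the weight that fits the data
```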

The next stage was building deeper and deeper networks, which let computers achieve, and even beat, human-level performance at many tasks, notably image recognition. This involved lots of hyperparameter engineering and was for a while the main preoccupation of AI engineers.

After we got stable networks, trained on massive machines to high accuracy, the next step was to share these networks with other researchers, so that without the effort of hyperparameter tuning and the cost of training, any researcher could take advantage of a pretrained network and start making inferences with it, using the pretrained network as a foundation for performing higher magic.

If we watch this process carefully, we see that it mirrors the kind of abstraction climbing we did in computer architecture: today we have high-level languages like Python, with which we build systems like TensorFlow and Keras, in which we can not only design network architectures and train them but also start using the trained networks that other people have built for our own higher-level tasks.
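The same toy task as the single-weight sketch above, expressed one level of abstraction up in Keras: the hand-written gradient arithmetic disappears behind compile() and fit().

```python
import numpy as np
import tensorflow as tf

xs = np.array([[1.0], [2.0], [3.0]])  # the same toy y = 2x data
ys = np.array([[2.0], [4.0], [6.0]])

model = tf.keras.Sequential([tf.keras.layers.Dense(1)])
model.compile(optimizer=tf.keras.optimizers.SGD(learning_rate=0.05), loss="mse")
model.fit(xs, ys, epochs=500, verbose=0)  # backpropagation, handled for us

print(model.predict(np.array([[4.0]])))   # approximately 8.0
```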

In systems like Wolfram Mathematica, the abstraction is neatly built, as we can simply summon the powers of some gigantic pretrained neural net with something as basic as a function call.
[Figure: Using the NetModel function to summon a large pretrained net that does age estimation.]

Using the pretrained net is then as easy as making a single function call, supplying an image as the argument.
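The same abstraction exists outside Mathematica. A rough Keras sketch, with "cat.jpg" standing in for any image file of your own: one call summons a net pretrained on ImageNet, a second call runs it on the image.

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras.applications.resnet50 import (
    ResNet50, decode_predictions, preprocess_input)

net = ResNet50(weights="imagenet")  # downloads the pretrained net on first use

img = tf.keras.utils.load_img("cat.jpg", target_size=(224, 224))
batch = preprocess_input(np.expand_dims(tf.keras.utils.img_to_array(img), 0))
print(decode_predictions(net.predict(batch), top=1))  # e.g. a "tabby" guess
```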

When you see a product like Google Assistant or Google Translate, you are seeing the work of higher-level magic whose building blocks are simpler atoms: different kinds of trained neural networks.

But we must not limit our thinking to these very direct applications of pretrained nets. Just as the function abstraction, which was obtained by combining simpler jump statements in assembly language, gave us so much power to express our ideas about practical computing, pretrained networks are like functions: another layer of abstraction for expressing ideas that would have been extremely difficult to attain if we were working at a lower level.

With the ability to query a pretrained network in a single function call, we have now incorporated AI algorithms into regular, routine programming, and we can use the already existing tools of our programming languages to build more elaborate things. But to go beyond this, towards Strong AI (AGI), we will have to revise ideas as fundamental as general-purpose programming languages themselves.
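A hypothetical sketch of that kind of composition: the net from the previous snippet wrapped as an ordinary function and driven by plain Python control flow to file photos by their contents. The folder names and the label_of helper are inventions for illustration, not any library's API.

```python
import pathlib
import shutil

import numpy as np
import tensorflow as tf
from tensorflow.keras.applications.resnet50 import (
    ResNet50, decode_predictions, preprocess_input)

net = ResNet50(weights="imagenet")

def label_of(path):
    # Wrap the pretrained net as a routine function: image path in, label out.
    img = tf.keras.utils.load_img(path, target_size=(224, 224))
    batch = preprocess_input(np.expand_dims(tf.keras.utils.img_to_array(img), 0))
    return decode_predictions(net.predict(batch), top=1)[0][0][1]

# Ordinary programming around an AI call: sort photos into folders by label.
for photo in pathlib.Path("photos").glob("*.jpg"):
    dest = pathlib.Path("sorted") / label_of(photo)
    dest.mkdir(parents=True, exist_ok=True)
    shutil.copy(photo, dest / photo.name)
```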
