Search! The Algorithm of Intelligence

I have previously talked about the network being the best representation for all kinds of problems and thus equivalent to what we can call general intelligence. My argument is that if we can find the most general representation of any problem whatsoever then we have discovered general intelligence because then it will be only a matter of manipulating the structure of this representation to solve any problem.

It must have come as a shock, because people have been hooked on the idea that general intelligence is some kind of procedure and that we can achieve it by inventing the most general kind of procedure ever. But now I am saying that general intelligence is more about data structures than procedures.

When we think of Strong AI (AGI), we usually visualize some kind of computer program that can do all kinds of things. If you are a programmer, you will imagine some mountain of code that does all kinds of specific tasks. This is the kind of mindset that led the early AI researchers to write mountains of code to solve all kinds of specific problems they thought were critical to intelligence, but from history we know that they didn't succeed.

Deep learning is the modern computing paradigm that led to the revelation that, rather than manually writing a specific program to solve a specific problem, you could just express the problem as a network and then search for properties of the network, like weights and biases, that solve the problem. Deep learning represents the solution to problems of pattern recognition as a network of nodes and edges. The nodes are called neurons and the edges carry weights. Using this very general representation, we are able to arrive at solutions to previously insurmountable problems like image recognition.
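To make the idea of a problem expressed as a network concrete, here is a minimal sketch in plain Python. The `network` structure, the `forward` function, and the hand-picked weights are my own illustrative assumptions, not taken from any library. The weights happen to encode XOR, just to show that the "program" lives entirely in the weights and biases rather than in task-specific code.

```python
import math

# A hypothetical 2-2-1 network stored purely as data: each layer is a list of
# neurons, and each neuron is a (weights, bias) pair. The weights are the edges.
network = [
    [([20.0, 20.0], -10.0), ([-20.0, -20.0], 30.0)],  # hidden layer: OR-like, NAND-like
    [([20.0, 20.0], -30.0)],                           # output layer: AND-like
]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(network, inputs):
    """Push the inputs through every layer and return the final activations."""
    activations = inputs
    for layer in network:
        activations = [
            sigmoid(sum(w * a for w, a in zip(weights, activations)) + bias)
            for weights, bias in layer
        ]
    return activations

# The same generic forward() would compute a different function
# if only the numbers inside `network` were changed.
for x in ([0, 0], [0, 1], [1, 0], [1, 1]):
    print(x, round(forward(network, x)[0]))
```

Here the weights were chosen by hand. Training replaces that manual step with a search over the same numbers.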

Although implicitly every computer program is a network, it is with deep learning that we are able to explicitly state a problem as a network and then manipulate the properties of that network until we reach a state which corresponds to the solution.

In a deep neural network, the trained weight matrix is usually the solution to a problem such as recognizing a certain image. But what is the algorithm that operates on this data structure to produce that solution? It is backpropagation, which computes the gradients that gradient descent uses to update the weights.

It is easy to lose sight of the fact that backpropagation is a search algorithm. If you are immersed in the details of training a neural network, such as choosing a loss function and running backpropagation to update the weights so that the loss is minimized, you don't see clearly that what you are doing is merely searching for a set of weights that minimizes the loss.
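Seen as code, the search is hard to miss. The following is a minimal sketch, assuming a single linear neuron and a toy target `y = 2x + 1` that I made up for illustration. With only one neuron, backpropagation reduces to a single application of the chain rule, so this is the simplest possible instance of the idea, not a full training loop.

```python
# Toy data from the hypothetical target y = 2x + 1.
data = [(x, 2.0 * x + 1.0) for x in (-2.0, -1.0, 0.0, 1.0, 2.0)]

def loss(w, b):
    """Mean squared error of the neuron w * x + b on the toy data."""
    return sum((w * x + b - y) ** 2 for x, y in data) / len(data)

def gradients(w, b):
    # Chain rule: dL/dw = mean(2 * error * x), dL/db = mean(2 * error).
    n = len(data)
    dw = sum(2.0 * (w * x + b - y) * x for x, y in data) / n
    db = sum(2.0 * (w * x + b - y) for x, y in data) / n
    return dw, db

def search_by_gradient(steps=200, lr=0.1):
    """Walk through weight space, always stepping downhill on the loss."""
    w, b = 0.0, 0.0              # starting point of the search
    history = [loss(w, b)]
    for _ in range(steps):
        dw, db = gradients(w, b)
        w -= lr * dw             # move against the gradient
        b -= lr * db
        history.append(loss(w, b))
    return w, b, history

w, b, history = search_by_gradient()
print(w, b, history[0], history[-1])
```

The loop never "knows" the answer. It only visits one candidate `(w, b)` after another, using the gradient as a hint about where to look next.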

Looking at things from a bird's-eye view can give you the insight to make innovative leaps that would not be possible if you were consumed only by the details. In general, when we are finding solutions to a problem, we are in a sense performing a search.

The problem-solution structure is such that whatever lies between a problem and its solution is some kind of search.

We have been so focused on specific iterative search algorithms like backpropagation that we are failing to see the bigger picture: casting the problem of finding weights that minimize some cost function as a generalized search problem will lead us to search more widely for solutions, instead of trying to hyper-optimize the already known backpropagation algorithm.

Actually, most progress in improving deep learning has come from improving the optimizers that minimize the cost function, such as Adam, Adagrad, Nesterov momentum, and vanilla SGD, and not from the underlying backpropagation, which is more fundamental. No matter how powerful the update rule for gradient descent is, it is still limited by the fact that it is based on backpropagation.
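The split between the gradient and the update rule can be sketched in a few lines. This is only an illustration under my own assumptions: `grad` stands in for whatever backpropagation would supply, the loss `(w - 3)^2` is a made-up toy, and `sgd_step` and `momentum_step` are hypothetical names for plain gradient descent and classical momentum.

```python
def grad(w):
    # Stand-in for the gradient that backpropagation would supply.
    # Toy loss (hypothetical): (w - 3)^2, so the gradient is 2 * (w - 3).
    return 2.0 * (w - 3.0)

def sgd_step(w, state, lr=0.1):
    """Plain gradient descent: step directly against the gradient."""
    return w - lr * grad(w), state

def momentum_step(w, velocity, lr=0.1, beta=0.9):
    """Classical momentum: accumulate past gradients into a velocity."""
    velocity = beta * velocity + grad(w)
    return w - lr * velocity, velocity

def run(step, steps=300):
    w, state = 0.0, 0.0
    for _ in range(steps):
        w, state = step(w, state)
    return w

print(run(sgd_step), run(momentum_step))
```

Both rules consume exactly the same `grad`. Swapping one for the other changes how the search moves, but not where its information comes from.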

Despite its limitations, backprop is still a very powerful algorithm and has brought us most of the goods of modern AI, but we shouldn't stop here.

In this work so far, I have emphasized the power of representations. I have argued that when you have the most general representation for all kinds of systems, which is the network, you have general intelligence indeed. I have also said that it is only a matter of manipulating this representation and you have the solution to all kinds of problems at your fingertips.

Backpropagation is just one search algorithm out of the many possible algorithms for updating the weights of a network. With backpropagation, we manipulate the representation of the problem, which is the network of weights, until we find an appropriate set of weights. In a typical classification problem, that set of weights is the solution to the problem of reducing the loss between the network's output and a target.
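To show that other searches over the same weights are possible, here is a minimal sketch of a gradient-free alternative: random hill climbing. Everything in it is an assumption for illustration, including the toy target `y = 2x + 1`, the single linear neuron, and the name `search_by_hill_climbing`. I am not claiming it competes with backpropagation, only that it is a different search over the same representation.

```python
import random

# Toy data from the hypothetical target y = 2x + 1.
data = [(x, 2.0 * x + 1.0) for x in (-2.0, -1.0, 0.0, 1.0, 2.0)]

def loss(w, b):
    """Mean squared error of the neuron w * x + b on the toy data."""
    return sum((w * x + b - y) ** 2 for x, y in data) / len(data)

def search_by_hill_climbing(steps=2000, step_size=0.1, seed=0):
    """Perturb the weights at random and keep only the changes that help."""
    rng = random.Random(seed)    # seeded so the run is repeatable
    w, b = 0.0, 0.0
    best = loss(w, b)
    history = [best]
    for _ in range(steps):
        cand_w = w + rng.gauss(0.0, step_size)
        cand_b = b + rng.gauss(0.0, step_size)
        cand_loss = loss(cand_w, cand_b)
        if cand_loss < best:     # accept only improvements, no gradient needed
            w, b, best = cand_w, cand_b, cand_loss
        history.append(best)
    return w, b, history

w, b, history = search_by_hill_climbing()
print(w, b, history[0], history[-1])
```

This search uses no derivatives at all, only the ability to evaluate the loss. It is far less efficient on large networks, but it makes the point that the weights and the loss define the problem, while the algorithm that explores them is interchangeable.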

On one side of the equation of intelligence is some fixed structure, like the brain matter. On the other side are the dynamic structures, like the electrical communication of the neurons, the internal metabolism of the brain cells, etc. The fixed structure of brain matter is a network, and all the numerous dynamic activities that run on this hardware give rise to cognition. The beautiful thing is that these dynamic activities can actually be viewed as some kind of search for parameters that model the solution to all kinds of cognitive problems.
