Towards program based models

The deep learning fever has calmed down, as I predicted it would, because we have not seen any general AI arise from it. It is time to reflect on what we have learned, so that we can avoid the same folly in the future, correct our trajectory towards AGI, and get there in a reasonable time.

Deep learning is a valuable technology, as important as computer programming itself. For the first time it enabled us to get computers to do tasks by learning from data, without so much explicit programming. But rather than welcome this tool with a cool head, we thought we had hit upon a path to AGI, and billions of dollars were consequently spent on training large, expensive models for identifying cats. We do also have language models like GPTx, which are very powerful tools and have seriously improved the capacity of computers to understand language, but the hype was excessive, and much of it was pumped out by marketing departments rather than by practitioners themselves.

If it were possible to hype things into existence, then deep learning would have given us AGI, but that is not the way reality works. For the next few years there will still be those who think they can tweak the deep learning method until it goes beyond its current level, because they cannot let go and move on.

Big companies will keep improving the core method because most of their services depend on standardized large scale models, but any slight improvement will come at a massive cost and only these big companies will be able to shoulder those costs.

A few researchers who lack real creativity will keep churning out new papers because they must publish or die. For some years to come we will see attendance at deep learning conferences decline as we move to another phase of technology, which in my opinion will come not from advances in software alone but from hardware as well.

Further improvements in deep learning will come at a huge cost, both financially and conceptually. It will be like trying to push the basic multiplication algorithm further once you already have something like the Karatsuba method. Theoretical results that can't be replicated will keep being published, and that is how the world works.

But is this the end of AI? Will we ever have AGI? I believe that at a certain level of sophistication the internet as we know it will become some kind of AGI, like the OS in the movie Her. We won't be able to point in any one direction and say "this is AGI", but we will feel its presence. It won't be owned by any company or individual. It won't even be like cloud computing; it will be an emergent phenomenon arising from the general sophistication of our entire technical infrastructure. But without getting philosophical and flying in the sky, let's come down to earth and see what the present and near future hold for us in terms of the development of AGI.

Program based models

It would be totally wrong to say that we have not learnt anything from deep learning. Deep learning models are generalizations of equational models and are thus limited by the capacity of equational models. What do I mean by this? If you have a system of equations that models some system, the main task is to find values for the unknowns (in a model-fitting setting, the coefficients) that satisfy every equation at once, so that each equation's left side minus its right side equals zero. When the equations are linear, the system can be represented as a matrix and solved using the methods of linear algebra.
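To make the idea of an equational model concrete, here is a minimal sketch of solving a small linear system by Gaussian elimination. The function name `solve_linear_system` and the two example equations (2x + y = 5 and x - y = 1) are assumptions chosen for illustration, not something taken from a particular library.

```python
def solve_linear_system(a, b):
    """Solve a * x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Build the augmented matrix [a | b] as floats so the inputs are untouched.
    m = [[float(v) for v in row] + [float(rhs)] for row, rhs in zip(a, b)]
    for col in range(n):
        # Use the row with the largest entry in this column as the pivot row.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("system has no unique solution")
        m[col], m[pivot] = m[pivot], m[col]
        # Eliminate this column from every row below the pivot row.
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back-substitute from the last row upward.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


# Illustrative system: 2x + y = 5 and x - y = 1.
solution = solve_linear_system([[2, 1], [1, -1]], [5, 1])
print(solution)  # [2.0, 1.0]
```

Here the answer is exact and found in one pass, which is the contrast with the iterative, approximate fitting used for large models.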

A deep learning system, no matter how sophisticated, is at its core based on such a simple system. There are many methods for solving systems of equations represented as matrices, but the extra sophistication of deep learning comes from backpropagation and its ability to iteratively arrive at solutions that are not perfect but good enough. As these systems become large, as they are in typical deep learning models, one does not hope to find a global minimum. One settles for an approximation of it by reducing the error, that is, the difference between what the model outputs for a set of inputs and the known correct values.
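The iterative error reduction described here can be sketched with plain gradient descent on a two-coefficient model. This is a deliberately tiny stand-in: it fits a single line rather than backpropagating through many layers, and the toy data, the learning rate, and the names `fit` and `mean_squared_error` are all assumptions made for the example.

```python
# Toy data generated from y = 3x + 1: the "correct values" the model should match.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [3.0 * x + 1.0 for x in xs]


def mean_squared_error(w, b):
    """Average squared difference between the model output w*x + b and the targets."""
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def fit(steps=5000, learning_rate=0.05):
    """Repeatedly nudge the coefficients w and b against the error gradient."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Partial derivatives of the mean squared error with respect to w and b.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


w, b = fit()
print(w, b, mean_squared_error(w, b))
```

Nothing in the loop solves the equations directly. It only keeps shrinking the error until the coefficients are close enough, which here means close to 3 and 1.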

If you are able to think abstractly without getting too bogged down in the details, you will get the gist of what I am saying in the paragraphs above. In summary, deep learning models are generalizations of equational models. When you have an equation, you have a function that takes a set of parameters and returns a value as its result. When you have interdependent functions, you have a system of equations, and you can solve all of them at once, for example by representing them as a matrix. Deep learning simply uses backpropagation coupled with an optimizer that continuously adjusts the weights (that is, the coefficients) of these equations until we reach a point in the "training" process where we are satisfied with the error. But a trained deep learning model is something like a compiled executable: it receives an input, maybe the picture of a cat, and returns an output, the confirmation that it is seeing a cat.

The question now is whether we can ever go beyond these kinds of systems (input -> fixed model -> output) to something closer to how the mind operates. Note that I did not say how the brain operates, because all we can really know is how aspects of our own cognition operate, and that is what I call the mind. Even though we can use instruments to observe what our brains are doing, we must not pretend that we really know how a mind comes out of a brain.

We are caged by our own thoughts and can't go beyond our perceptions. Whatever concepts we have of what our minds are amount to speculations that sound right to us. We are using the mind to understand the mind, and whatever imperfections exist in its structure and function will leak into our perception, both internal and external. We see the brain, with blood and electricity flowing around. We can poke at it and observe "activity", summarize that activity, and try to come up with a reasonable hypothesis of what is going on. But this is limited: however sophisticated our equipment becomes, we can only understand what our limited minds are capable of understanding.

So far the human creative mind has come up with deep learning and backpropagation, and that has been a major technological leap forward for humanity, but it is only the tip of the iceberg and there is still a lot to do.

The main problem with deep learning systems, and something we can start improving immediately, is that the operations available at their core are too simple to model increasing sophistication. Doing AI solely with deep learning models, no matter what kind of model, is akin to programming a computer in machine language alone: 100% possible, but unnecessarily difficult and inefficient.

The way forward is to learn from the system of abstractions we have built in classical computer programming, where we use loops rather than JMP statements for control flow. The fact that we can even build deep learning systems on modern computing infrastructure is a testament to the power of software development itself. If software development lets us create the abstractions needed to perform deep learning, why don't we apply that same capacity for abstraction to building AI systems?

What I am trying to say is that the basic operations we now have for building deep learning systems are limited. We need the richer set of abstract operations available in everyday computer programming to make progress towards a world where AGI comes about as an emergent phenomenon available to individuals, like a personal God, a higher self, cosmic consciousness, or whatever we call that aspect of ourselves that seems to have all the answers we don't have and that we tend to call upon in times of distress.

One may argue that the sophistication of our entire computing infrastructure depends on simple logic circuits, and thus that the primitive operations of deep learning systems are necessary and sufficient for working towards AGI, but this is not so. What I am urging is that we work at a higher level of abstraction than we currently do, because that is how we leverage the core capability of our minds: the capacity to build increasingly higher towers of abstraction. The mind likes to roam free at the top, and that is where it performs best. When you can put all the lower-level stuff in a black box, you can bundle those boxes to create even bigger black boxes, ad infinitum.

In summary, we should not be manually building any kind of model. Rather, we should work towards a problem description language: a language that specifies the kind of problem we want to solve, so that a system can then go ahead and evolve the solution.

The closest existing thing to the system I propose is neural architecture search. But this too is limited, because it has already circumscribed the domain to neural architectures. What if we did a general program search rather than a neural architecture search? Now that is something.
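As a rough illustration of what a general program search could look like at toy scale, the sketch below takes a problem description consisting only of input/output examples and enumerates small arithmetic programs until one matches. The tiny expression language (`x`, the constants 1 and 2, `+` and `*`), the tuple representation, and every name in the block are assumptions invented for this example. Real program synthesis systems need far smarter search than brute-force enumeration.

```python
import itertools
import operator

# Hypothetical problem description: nothing but input/output examples.
EXAMPLES = [(0, 1), (1, 2), (2, 5), (3, 10)]

# A deliberately tiny language: leaves are the input x or a constant,
# and inner nodes are tuples of the form (operator symbol, left, right).
LEAVES = ["x", 1, 2]
OPERATORS = {"+": operator.add, "*": operator.mul}


def programs_up_to_depth(depth):
    """Yield every program tree whose depth is at most `depth`."""
    if depth == 0:
        yield from LEAVES
        return
    smaller = list(programs_up_to_depth(depth - 1))
    yield from smaller
    for symbol in OPERATORS:
        for left, right in itertools.product(smaller, repeat=2):
            yield (symbol, left, right)


def run(program, x):
    """Evaluate a program tree on the input x."""
    if program == "x":
        return x
    if isinstance(program, int):
        return program
    symbol, left, right = program
    return OPERATORS[symbol](run(left, x), run(right, x))


def search(examples, max_depth=2):
    """Return the first enumerated program consistent with all examples, or None."""
    for depth in range(max_depth + 1):
        for program in programs_up_to_depth(depth):
            if all(run(program, x) == y for x, y in examples):
                return program
    return None


found = search(EXAMPLES)
print(found)
```

The search is not told what kind of model to build, only what behaviour is wanted. The cost is that the number of candidate programs explodes with depth, which is why a structured search matters.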

Another objection to this proposal is that deep learning frameworks like Keras already allow us to build things at a higher level of abstraction. But this is akin to using computers to solve equations, as we did with FORTRAN. Computers were invented to do calculation work, but as we generalized further we were able to use them for much more, so languages like FORTRAN fell out of favour with the general non-scientific public, who were interested in doing more than translating formulas into FORTRAN programs. This is what deep learning frameworks help us do: they let us explicitly express networks that are solutions to problems in image recognition, speech, language translation and so on. They don't let us express a problem we want to solve and have the system automatically start building a program that solves it. I don't like the word "building" in this context; I prefer the word "evolving".

So do we have any systems for creating program based models in the world today, apart from the neural architecture search approximation? Yes, we do: genetic programming. One might say that if genetic programming were powerful enough it would have surpassed deep learning by now, and one might be correct in some sense. What I am saying is that genetic programming is a path which, if trodden with the same energy and dexterity we have devoted to deep learning, will lead to automatic program synthesis, which in my opinion is the holy grail we seek.
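For readers who have not met genetic programming, here is a minimal sketch of the idea: a population of random expression trees is scored against examples, the best quarter survives unchanged, and the rest is refilled by crossover and mutation. The target behaviour, the operator set, the population settings, the depth cap, and all function names are assumptions picked for the example. It is a toy under those assumptions, not a reference implementation, and it is not guaranteed to find an exact solution.

```python
import operator
import random

OPERATORS = {"+": operator.add, "-": operator.sub, "*": operator.mul}
LEAVES = ["x", 1, 2]
MAX_DEPTH = 6  # cap on tree depth so programs cannot grow without bound
# Hypothetical target behaviour, given only as examples of x*x + x + 1.
EXAMPLES = [(x, x * x + x + 1) for x in range(-3, 4)]


def random_tree(rng, depth):
    """Grow a random tree: a leaf, or (operator symbol, left, right)."""
    if depth == 0 or rng.random() < 0.3:
        return rng.choice(LEAVES)
    symbol = rng.choice(list(OPERATORS))
    return (symbol, random_tree(rng, depth - 1), random_tree(rng, depth - 1))


def run(tree, x):
    if tree == "x":
        return x
    if isinstance(tree, int):
        return tree
    symbol, left, right = tree
    return OPERATORS[symbol](run(left, x), run(right, x))


def error(tree):
    """Total absolute error over the examples; 0 means a perfect program."""
    return sum(abs(run(tree, x) - y) for x, y in EXAMPLES)


def depth(tree):
    if not isinstance(tree, tuple):
        return 0
    return 1 + max(depth(tree[1]), depth(tree[2]))


def random_subtree(rng, tree):
    """Walk down from the root, stopping at a randomly chosen node."""
    while isinstance(tree, tuple) and rng.random() < 0.7:
        tree = rng.choice(tree[1:])
    return tree


def replace_random_subtree(rng, tree, replacement):
    """Return a copy of `tree` with one random subtree swapped for `replacement`."""
    if not isinstance(tree, tuple) or rng.random() < 0.3:
        return replacement
    symbol, left, right = tree
    if rng.random() < 0.5:
        return (symbol, replace_random_subtree(rng, left, replacement), right)
    return (symbol, left, replace_random_subtree(rng, right, replacement))


def mutate(rng, tree):
    return replace_random_subtree(rng, tree, random_tree(rng, 2))


def crossover(rng, mother, father):
    return replace_random_subtree(rng, mother, random_subtree(rng, father))


def evolve(seed=0, population_size=60, generations=30):
    rng = random.Random(seed)
    population = [random_tree(rng, 3) for _ in range(population_size)]
    initial_error = min(error(tree) for tree in population)
    for _ in range(generations):
        population.sort(key=error)
        # The best quarter survives unchanged, so the best error never gets worse.
        survivors = population[: population_size // 4]
        children = []
        while len(survivors) + len(children) < population_size:
            mother, father = rng.choice(survivors), rng.choice(survivors)
            child = mutate(rng, crossover(rng, mother, father))
            children.append(child if depth(child) <= MAX_DEPTH else mother)
        population = survivors + children
    best = min(population, key=error)
    return initial_error, error(best), best


initial_error, final_error, best = evolve()
print(initial_error, final_error, best)
```

The only thing the sketch promises is that the best error after evolution is no worse than the best error in the initial random population. How close it gets to zero depends on the seed and the settings.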

Our minds are so sophisticated that I think it is a mistake to believe we can somehow manually program some software or design some neural network capable of doing 100% of what our minds can do. Much as the matter we can observe makes up only a small fraction of the universe, we are conscious of maybe 4% of what goes on in our minds. Rather than try to explicitly program a mind, I think we should try to evolve one by searching the incredibly massive space of possible computer programs, using a structured search such as the one genetic programming provides.

If we get this system rolling, we might even outgrow the "genetic" in genetic programming and focus on the abstract idea of finding structured ways to automatically search for programs that do the things we want.

Jackreece Ejini
