What is Intelligence: Software writing Software

Sometimes I wonder why programmers are hell-bent on writing programs that can communicate in natural language while not putting adequate effort into writing programs that write other programs. Maybe it is because of the natural tendency to protect one's source of livelihood by not attempting to automate it away, or maybe it is because writing programs is hard enough that writing a program that writes programs looks even harder.
Natural Language Processing (NLP) is making great strides, as we can see from products like Google Assistant, Amazon Echo, Cortana, etc. But I wonder why no one is really putting much effort into creating programs that write code as well as they generate text in natural language.

The most difficult task in natural language processing is language translation, which was difficult for machines to perform until this age of neural networks. But it is now possible, in things like Google Translate, to enter text in one language and get output in a target language.

The reason we should start putting more effort into writing programs that can write other programs is that we may not be able to achieve the much dreamed of Strong Artificial Intelligence by explicitly writing programs.

Our early attempts at building AI in the 60s right up to the 80s failed because we thought that we could build AI by explicitly writing programs that perform all kinds of cognitive tasks. These failed because of reasons that I explained elsewhere in my work so I will not go into it again. But in summary, we were trying to copy the actions of an intelligent agent and not find out the fundamental root structures that make that intelligence possible.

If we had been interested in finding the seeds of intelligence, and not merely in copying the appearance of intelligence in the actions of agents, we would have gone far. But this was not to happen, as we didn’t yet have things like neural networks, which make certain aspects of intelligence possible. A neural network in and of itself is just a system of interacting nodes with weighted edges, and it doesn’t expose any explicit feature that marks it as intelligent.

With neural networks, when presented with a task that requires some kind of cognitive capacity like recognizing images, we don’t usually explicitly code out the algorithm that does the image recognition although this was what was attempted during the early days of AI with only minor success. What we do rather is code some system that searches for the right kind of program that is able to perform the image recognition tasks.

This is, in a sense, software writing software. We build huge machine learning software systems like TensorFlow that include things like automatic differentiation, matrix multiplication, etc. Then we use this software to design what is called a neural network architecture, which in a sense is a kind of computer program at a higher level of abstraction than the base machine learning infrastructure software. Rather than write a program using the kinds of functions that we used to build the machine learning platform and infrastructure, we write a very narrow kind of program at a higher level of abstraction, whose individual functions consist of what we call LAYERS connected in a certain way.

In traditional programming, when we write a program, which is a bunch of connected functions (deliberately ignoring classes if it is object-oriented), we execute it with some input and the program takes a certain execution path, making decisions as it goes along, until at the end there is some output or desired behaviour.

In neural network programs, we usually have a fixed set of functions (layers) that we can use, and we connect them in a fixed way (at least as of current designs; this could change in the future). In traditional programming we can pass any value around depending on the goals of the program we are creating. We could pass along text, numbers or any other structure that the programming language supports. In neural networks, we pass in numbers, and these numbers are transformed by the weights and biases of each function (layer) as they pass from one to the next, until the output is usually some other number.
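To make the idea of layers as a fixed chain of functions concrete, here is a minimal sketch in plain Python. The names dense, relu and forward are my own illustrative choices and do not come from any library, and the weights are made-up numbers rather than trained ones.

```python
# Hypothetical sketch: a "network" as a fixed chain of layer functions.
# dense, relu and forward are illustrative names, not a real library API.

def dense(weights, biases):
    """Return a layer function that computes weights * x + biases on plain lists."""
    def layer(x):
        return [sum(w * xi for w, xi in zip(row, x)) + b
                for row, b in zip(weights, biases)]
    return layer

def relu(x):
    """Element-wise max(0, v)."""
    return [max(0.0, v) for v in x]

def forward(layers, x):
    """Pass the numbers through each layer in a fixed order."""
    for layer in layers:
        x = layer(x)
    return x

# The architecture is fixed: dense -> relu -> dense.
# Only the numbers inside the dense layers would change during training.
network = [
    dense([[1.0, -1.0], [0.5, 0.5]], [0.0, -1.0]),
    relu,
    dense([[2.0, 1.0]], [0.5]),
]

output = forward(network, [3.0, 1.0])
print(output)
```

Notice that the only things flowing between the functions are lists of numbers, and that the shape of the chain never changes no matter what input we feed in.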

While the typical goal of traditional software development is producing some executable, the goal of training a typical neural network is to get a set of weights that enables us to perform some task. So while in traditional software we create a software stack that takes inputs and produces some output, in the neural network paradigm we are interested in the intermediate parameters (weights, biases) between the input and the output.

So in neural networks, we don’t explicitly write the program that does the image recognition or language translation task. We do write some fixed program, though, which sets up the architecture, or what I personally like to call the search space, and then we search for the weights and biases, which are our items of interest in this case. The weights and biases that solve some AI task are actually the programs that we obtain through automatic search. The underlying software stack, written in traditional programming languages, just creates the space where these weights and biases can be searched for.
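A toy version of this search can be sketched in a few lines. Here the fixed architecture is assumed to be the single formula y = w * x + b, and plain gradient descent on the mean squared error stands in for SGD. The function names, the data and the learning rate are all assumptions made for illustration.

```python
# Hypothetical sketch: the fixed "architecture" is y = w * x + b.
# Training searches the (w, b) space; the numbers it finds are the "program".

def predict(w, b, x):
    return w * x + b

def train(data, steps=2000, lr=0.05):
    """Plain gradient descent on mean squared error (a stand-in for SGD)."""
    w, b = 0.0, 0.0
    n = len(data)
    for _ in range(steps):
        grad_w = sum(2 * (predict(w, b, x) - y) * x for x, y in data) / n
        grad_b = sum(2 * (predict(w, b, x) - y) for x, y in data) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

# Input/target pairs produced by the rule y = 2x + 1.
# The search only ever sees the pairs, never the rule itself.
data = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]
w, b = train(data)
print(round(w, 3), round(b, 3))
```

The search should settle on values very close to w = 2 and b = 1. Nobody wrote the rule down as code; the two numbers that came out of the search are the program.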

In traditional software systems, we have a program executable as the result of our programming efforts. In a neural network, our executable is actually the collection of weights and biases that results from searching the space of possible weights and biases.

One might be tempted to think that we have achieved automatic programming through this neural network search for weights and biases, and in some sense one might be correct. But in reality this is only a narrow kind of search. The space of weights and biases is very large in its own right, but it is only a very narrow program space in the computational universe of possible programs, which includes programs for all kinds of tasks that can't easily be framed within the neural network paradigm.

Theoretically it could be possible to build a neural network that enables us to write any kind of program by expressing the results only in terms of weights and biases. But in reality the engineering requirements would be enormous, and without raising the level of abstraction at which we can express such a network beyond the current paradigm of neural networks, it would be very hard to achieve.

Current neural networks have fixed architectures that do not change during training, and while that is not bound to remain the same in future designs, it is the current paradigm. With neural architecture search, we are beginning to automatically search for networks that perform tasks better than those designed by human AI engineers.

Neural architecture search, with some engineering, can lead us to create dynamic neural networks that change their structure during training (beyond things like dropout and other kinds of network pruning). This could result in searching an even wider space of possible programs because, apart from searching for weights and biases, we could also search for neural network structures like layer configurations automatically during training, and this would be a great addition.

But the thing is that with neural-network-like things, even if we actually start automatically searching for new network architectures during training we will still be searching some kind of predefined program space where we can only obtain programs of a certain type. To create systems that automatically create software we will still have to search some predefined space of programs but these programs will be more generic than the narrow kinds we have with neural networks.

Another thing with neural networks is that although at a fundamental level they all deal with weights and biases, in actual implementation they come in wildly varying configurations, such that one can barely see any solid relationship between a typical layer-by-layer network for doing classification and something like an LSTM recurrent neural network. Yet they are all equivalent systems in the sense that they work by manipulating weights and biases.

This raises the question: could the current neural network architectures be specializations of more general networks? This is the theme of this book. Because of the wildly varying types of networks in existence, I am proposing that there could be some generalized network structure of which every other type of deep network is merely a special case. The goal of this book is to urge researchers to start seriously exploring this idea.

I think the success of neural networks, and especially the deep types, is actually hinting at something more general, and that if we get at that generalization, we could merely mutate this generic structure in standard ways to instantiate any kind of network of choice, be it vanilla classification, LSTMs or GANs.

Although arriving at a generalized network theory would be a great step in the direction of creating neural networks that we understand better, could it directly lead to things like programs that could write any kind of software system that humans currently have to write, like a program that can write a full machine learning platform like TensorFlow? I think even a generalized network system that is capable of being instantiated as any kind of network that does any kind of AI task will not be general enough to capture the structure of traditional software programs in their entirety.

When we think of software writing software, we imagine a scenario where software writes software in some traditional programming language like C or Python. But this is not the form we should be imagining although it easily comes to mind. We must remember that our high-level programming languages are convenience tools for our human minds to capture the kinds of things we want to solve with computer programs and not in any way essential to the computer itself.

If we had some super-intelligent human who would not get bored writing machine code, then this human would simply write any kind of software directly in machine language, without the crutch of a higher-level language to ease things.

So if we have software that writes software, it wouldn’t be writing that software in some higher-level language. Rather, it would be writing programs directly in machine code because, unlike humans, it doesn’t have the same kinds of limitations on memory and mental processing power.

You could bring up declarative programming languages as an example of software writing software, and this is partially true. I think that in the future programming will be done in some kind of declarative style, and computer systems will then use all the knowledge available to them to create the software that we have described declaratively. But for this to work, a lot of low-level plumbing will have to be done.

When you query a knowledge engine like Wolfram Alpha or do a search on Google, you are actually doing some kind of automatic programming, because you pass in some input and get some output after your input has caused a complicated cascade of software executions. Even though you are dealing with prebuilt programs, the idea I am trying to focus your mind on here is that you have some input, it is passed on to some software system, a large swath of activity goes on in that system, and then you have some output. To generalize this, think of how weights and biases are the output of a neural network system: if you were able to get, as code output, all the pieces of software that were activated as your results were produced, you would get a different one every time you used Wolfram Alpha or Google search with a new query.

To clarify the previous paragraph, remember that we said that with neural networks you supply the input and the desired output to a neural network architecture. After an algorithm like SGD operates on your architecture, in the process called training, you get an output of weights and biases.

Neural networks:

Input -> Weights and Biases -> Outputs

In traditional programming scenarios, we have a program, we feed it some input, and our goal is to get some output. So while in neural networks the middle item (weights and biases) is of interest because we start with the input and the output, in traditional programming the output stage is of interest because we start with the input, build a program, and then want to get some output.

Traditional programming:

Input -> Software program -> Outputs

If we think about the scenario of using a knowledge engine like Wolfram Alpha or a search engine like Google:

Search input -> Software programs -> Search output

If rather than focus on the search output we want to capture an image of all the software code that was executed to get us our search output, then we would be looking at a very naive kind of implementation of software writing software.

Now imagine a scenario where we obtain this software from something like Wolfram Research or Google, probably in machine language. We could, in theory, distribute this software as a program that receives a particular search input and produces some output. Although this is not really a practical scenario, what I am trying to emphasize here is that we could get a program as an output rather than a particular search result, and this is an example of software writing software.
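As a very naive illustration of capturing the executed software rather than the search result, consider the sketch below. Everything in it is a toy assumption: the OPERATIONS table, the answer function and the query format are invented for this example and have nothing to do with how Wolfram Alpha or Google actually work.

```python
# Hypothetical sketch: record which prebuilt operations a query triggers,
# then replay that record as a standalone program. All names are illustrative.

OPERATIONS = {
    "strip": str.strip,
    "upper": str.upper,
    "split": str.split,
    "count": len,
}

def answer(query, trace):
    """A toy 'engine': picks a chain of operations based on the query, logging each step."""
    command, _, text = query.partition(":")
    plan = ["strip", "split", "count"] if command == "words" else ["strip", "upper"]
    value = text
    for name in plan:
        value = OPERATIONS[name](value)
        trace.append(name)
    return value

def program_from_trace(trace):
    """Turn the recorded step names into a new callable: the 'written' program."""
    steps = [OPERATIONS[name] for name in trace]
    def program(value):
        for step in steps:
            value = step(value)
        return value
    return program

words_trace, shout_trace = [], []
words_result = answer("words: software writing software", words_trace)
shout_result = answer("shout: hello", shout_trace)

word_counter = program_from_trace(words_trace)
print(words_result, words_trace, word_counter("one two"))
print(shout_result, shout_trace)
```

The two queries leave behind two different traces, and each trace can be turned into a small program that can be reused on fresh input without going back to the engine.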

So how might we achieve this kind of automatic programming in reality? The first thing we need to think of is architecture. What makes our modern software environment possible is the fixed hardware architecture we have.

We have some fixed hardware like a processor and memory, and then we have some kind of machine language which is able to coordinate the various parts of this hardware to give us a tool to program the hardware.

When we need to achieve higher-level programs that do the kinds of things we want to do in the real world, we start building one crutch after another to make it easy for the human mind to express its intentions as programs that a computer must execute.

The thing to remember from history is that we didn’t get high-level languages like C from the get-go when we were designing computers. The human mind can see only so far, and we first of all had to get some stable hardware system with machine code programming. Then we started building higher levels of abstraction like assembly language, until finally we have gone beyond very high-level languages like Python to computational languages like the Wolfram Language.

The thing to take away from this is that there is some fixed underlying hardware, and we get a machine language that works by connecting one fixed part of the hardware with another. This is the beginning of the dynamism that we take to the highest level of abstraction, where we are doing things that look more like thinking and less and less like the hardwired routines of the hardware.

With robust high-level languages, we are able to build fixed architectures like neural networks using the machine language of layers. The dynamic aspect of these fixed neural architectures is the weights and biases that we obtain from the training algorithms. These weights and biases are kind of like a software executable because we can simply query them with input and get some output, just like we would do in a traditional software system.

If we want to create programs that write programs of any kind, then we need some kind of generalized architecture that is capable of expressing any kind of program we want to express, plus some kind of dynamic structure that we can modify, either iteratively like in neural networks or by using some direct clever method, to create specific programs that we can execute on a typical computer and that solve problems we are interested in.

If you look at the kinds of programs humans use, like word processors and browsers, you will imagine that the code must be very complicated, and indeed it is when expressed in the kinds of high-level languages that we have invented. But when we go down to the level of machine language code, we don’t have much of a variety of atoms. The atoms are the instruction set of the particular machine you are using.

Rather, most of the complexity of the higher-level code is converted to a lot of repetition of instructions at the level of the machine language, much like how the complex physical world is built from an enormous repetition of just 92 or so types of atoms, which makes it simpler if viewed from a particular point of view. We humans don’t want to repeat ourselves too often, so we do not like coding in machine language. We want to be able to express ourselves more succinctly, so we invent higher-level languages.

If you observe machine language code, you will see that despite a huge amount of repetition, there are blobs of code that stick together and are repeated together in many places. This is because in solving our programming problems we find certain routines (blobs of code) useful for achieving our tasks, so these blobs of code occur together.

In designing higher-level programming languages, we abstract away these blobs and map each one to something like a single instruction, so that when we want to specify a blob we just use a single instruction rather than repeating ourselves over and over.
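Here is a small sketch of that abstraction step: find the run of instructions that repeats most often and replace every occurrence with one named instruction. The instruction mnemonics and the macro name INCREMENT are made up for illustration and do not belong to any real instruction set.

```python
from collections import Counter

def most_common_blob(code, size):
    """Return the most frequent run of `size` consecutive instructions and its count."""
    windows = [tuple(code[i:i + size]) for i in range(len(code) - size + 1)]
    blob, count = Counter(windows).most_common(1)[0]
    return blob, count

def abstract_blob(code, blob, name):
    """Replace every non-overlapping occurrence of `blob` with one named instruction."""
    out, i, size = [], 0, len(blob)
    while i < len(code):
        if tuple(code[i:i + size]) == blob:
            out.append(name)
            i += size
        else:
            out.append(code[i])
            i += 1
    return out

# A made-up instruction stream; the mnemonics are illustrative, not a real ISA.
code = ["LOAD", "ADD", "STORE", "JMP",
        "LOAD", "ADD", "STORE", "CMP",
        "LOAD", "ADD", "STORE"]

blob, count = most_common_blob(code, 3)
shorter = abstract_blob(code, blob, "INCREMENT")
print(blob, count)
print(shorter)
```

Eleven low-level instructions shrink to five once the repeated blob has a name, which is the same trade that every higher-level language makes on our behalf.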

As we go up the abstraction hierarchy, we keep finding blobs of code that go together and give them names. But for a high-level language to work we first of all have to create a fixed architecture, which is the programming environment itself.

In compiled languages, we create compilers, which are like an interface between one level of abstraction and the next. The compiler is a fixed architecture and gives us the environment to start writing programs in. When we have these programs and want to run them, the compiler turns our high-level instructions into low-level instructions and passes them down the hierarchy until we get to the lowest machine level.
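The lowering step can be sketched with a toy compiler. It borrows Python's standard ast module to parse an arithmetic expression and turns it into instructions for an imaginary stack machine. The instruction names PUSH, ADD, SUB and MUL, and the functions compile_expr and run, are assumptions made for this example and are far simpler than anything a real compiler does.

```python
import ast

def compile_expr(source):
    """Lower an arithmetic expression to instructions for a toy stack machine."""
    ops = {ast.Add: "ADD", ast.Sub: "SUB", ast.Mult: "MUL"}
    code = []

    def emit(node):
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            code.append(("PUSH", node.value))
        elif isinstance(node, ast.BinOp) and type(node.op) in ops:
            emit(node.left)   # operands first...
            emit(node.right)
            code.append((ops[type(node.op)], None))  # ...then the operation
        else:
            raise ValueError("unsupported expression")

    emit(ast.parse(source, mode="eval").body)
    return code

def run(code):
    """Execute the low-level instructions one by one on a stack."""
    stack = []
    for op, arg in code:
        if op == "PUSH":
            stack.append(arg)
        else:
            right, left = stack.pop(), stack.pop()
            results = {"ADD": left + right, "SUB": left - right, "MUL": left * right}
            stack.append(results[op])
    return stack.pop()

program = compile_expr("2 * (3 + 4) - 5")
print(program)
print(run(program))
```

One short high-level line becomes seven low-level instructions, and the machine at the bottom only ever sees the flat sequence.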

If you take a typical piece of software like a browser, as complicated as it is, at the lowest level it is just a sequence of machine code. All software is just a sequence of machine code. The complicated algorithms are eventually executed on the machine as machine code.

To build a software system that writes software, we would first of all have to build some kind of architecture. You don't need to think too hard about this kind of architecture, because traditional software systems and neural networks are both just networks of functions passing data between one another.

Machine instructions are executed by a machine, and we can usually bundle a blob of them together and map it to some higher-level assembly language macro. We could model this assembly language macro as a function. A function is anything that takes some input and returns some output. A software application is an execution chain of functions that manipulate their input to get some output.

In machine learning, these functions are the layers and the parameters are the weights. In traditional programming at a high level, the functions and parameters are arbitrary depending on what we are designing.

If we want to build software that writes software, we first of all have to build some kind of environment. This is the fixed architecture, analogous to the computer hardware or the neural network architecture.

In machine learning, we have supervised and unsupervised types of learning algorithms. In supervised learning, we supply training data as input and target pairs, and it is left to the algorithm to figure out the weights and biases that enable us to predict, with some confidence, the correct target for some arbitrary input.

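In unsupervised learning, we supply the data to the learning system and then it is a matter of finding some kind of structure in the data, such as a clustering. In semi-supervised learning, we combine a small amount of labeled data with a larger amount of unlabeled data, for example by using some unsupervised system to drive a supervised system.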

Just like we search for neural networks using neural architecture search, machines could search for blobs of related machine code using some unsupervised system and automatically map them to functions. These function-to-code mappings could then be used to drive some kind of supervised learning system, where a mapping can be learnt as weights and biases.

The input side will be a mapping between some kind of declarative programming statement, which accurately describes at least at a high level what some blob of machine code does, and the blob of machine code itself. It won't be a mapping between a function name and a blob of code, but rather between a full program describing the blob and the low-level machine code.

The supervised system learns a representation of the input -> output mapping, and after learning from a huge database of code, maybe consisting of all the software in the world as machine code, we will have a fairly robust system: one where all we have to do is describe in this declarative language the kind of program we are looking for, and the system will produce the executable machine language code.
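To give a feel for the description -> code mapping, here is a deliberately crude sketch. It does not learn weights and biases at all. A simple word-overlap lookup stands in for the supervised system, and the descriptions, instruction blobs and function names are all invented for illustration.

```python
# Hypothetical training pairs: (declarative description, instruction blob).
# A real system would learn from vastly more data; these three are made up.
PAIRS = [
    ("add two numbers and store the result", ["LOAD", "ADD", "STORE"]),
    ("compare two numbers and jump if equal", ["LOAD", "CMP", "JE"]),
    ("copy a value from one place to another", ["LOAD", "STORE"]),
]

def overlap(a, b):
    """Jaccard similarity between the word sets of two descriptions."""
    words_a, words_b = set(a.lower().split()), set(b.lower().split())
    return len(words_a & words_b) / len(words_a | words_b)

def generate(description):
    """Return the blob whose stored description best matches the request.

    This nearest-match lookup is only a stand-in for a trained model.
    """
    best = max(PAIRS, key=lambda pair: overlap(description, pair[0]))
    return best[1]

generated = generate("store the sum of two numbers")
print(generated)
```

A lookup like this can only return blobs it has already seen. The hope with a learned representation is that it could compose new code for descriptions that match nothing in the database exactly.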

You must also keep in mind that different machines have different machine instructions and different compilers generate different types of code. Getting a unified machine format is not really much of a problem; with some solid software engineering we can get one.

With this kind of system, or some better realization of it, we could design a system that automatically generates executable machine code. A bigger system with lots of software engineering could generate full-featured software applications with GUIs, etc.

RNNs can generate programs just like they generate natural language text, but I doubt that this is the way forward.

This is merely pointing at one possibility for the realization of software systems that write software. The purpose of this chapter is to bring to attention this direction of research as it might be essential to the realization of Strong AI down the road.
