What is intelligence?

I will define intelligence not by using more words but by describing the actions of an intelligent entity, system or model. This is a far better approach because a definition in a natural language merely uses more familiar words to describe terminology we are not familiar with, and in most cases that leaves us more confused than when we started. I am writing this book with a special focus on people actively working in the field of Strong Artificial Intelligence (work towards AGI), because it seems the biggest problem in that field is actually describing the problem that it seeks to solve. People in other fields who are curious about the question of intelligence, both natural and artificial, are also welcome.
Without further ado, let us proceed. Suppose an intelligence, whether it is running on a biological substrate as in a human being or on some synthetic substrate like a computer, is faced with the problem of finding the solution to a Sudoku puzzle. What follows is one way it will most likely go about it. The actions that it performs in order to solve the problem encode the main algorithm that we call “intelligence”. By understanding the roots of this process, we can focus more of our efforts on engineering the cause of intelligence and not its effects.

Once we have access to the cause of intelligence, that which lies underneath the observed effects of intelligence demonstrated as intelligent action, we gain the real power to design systems that can evolve those features that we describe as intelligence.

First of all, we must take as a given that there exists at least one entity, probably the inventor of the Sudoku game who knows what the solution of the game looks like. This entity knows how to solve the game that it invented. It used intelligence! Let us say that the game creator is intelligent, but it wants to test out another entity to see if that entity is also intelligent.

There is another assumption we must make, which is that the inventor of the game possesses some kind of naive procedure for solving the Sudoku game. But the inventor sets the opponent (the entity whose intelligence is being challenged) an extra challenge: to impress the inventor, the opponent must perform far better than the inventor does.

So, the challenge is on and the opponent listens to the instructions of the game (the rules of the game) and then sees what a solution looks like. Then the opponent proceeds to demonstrate intelligence by solving the puzzle from just the instructions and the goal of achieving a solution that it had been shown.

If we view this situation clearly we will see that there is a problem to solve which is the starting point and there is a goal, the endpoint. In between is intelligence which tries to achieve the goal, which is solving the problem. Intelligence, in this case, is applicative. It is applied to solving a well-defined problem which in this special case is solving the Sudoku puzzle. This is a powerful definition of intelligence which can guide our thoughts on how to achieve it on a synthetic substrate beyond the human brain.

The first step would be that the opponent will proceed to try all possibilities randomly thereby inventing a naive algorithm for achieving its goals. We must also assume that the opponent which we can also call an agent will perform its moves with perfect statistical randomness. That is there will be no pattern or structure to its actions.

Practically, it is nearly impossible for a human agent to act in a manner that is perfectly random; there will always be some structure to the actions of human agents. That structure might come from a deeply held superstition that unconsciously motivates action, from some predilection for certain choices that stems from nature or nurture, or from some earlier input that now influences current actions. But for the purpose of defining an absolute base for intelligence, we will say that the agent’s actions are statistically random, which is the base, the lowest level of intelligence.

With this view in mind, we can see that intelligence is equivalent to "structure"! And structure can be viewed as anything above pure statistical randomness. I say anything, which can include actions or information. If an action or some bits of information demonstrates any kind of structure beyond pure statistical randomness, even if we do not understand the goal for which such action or information exists, then it encodes intelligence.

An agent tasked with solving a Sudoku puzzle and acting with randomness will place a number from 1 to 9 on each of the 81 squares, trying to obey the constraint that each row, each column and each of the nine 3×3 boxes contains a permutation of the sequence {1,2,3,4,5,6,7,8,9}.


Above is an image of a completed Sudoku puzzle.

Below is a typical partially completed Sudoku puzzle.


The difficulty of solving Sudoku depends on how much pre-information already exists, that is, how many squares have already been filled. A peculiar feature of the game is that a puzzle with no prefilled squares is very easy because the constraints are simpler, but with some squares filled the difficulty rises. This is just the nature of this particular game. In some scenarios, a lack of pre-information might make it more difficult to solve a problem. We must assume that the challenger has filled in just enough squares to make the hardest possible puzzle setup.

The reason why we choose the hardest board setup is that we want to challenge the intelligence of the agent to the maximum. Being able to solve the hardest possible board setup in the most efficient manner will demonstrate the highest level of intelligence possible for this specific task.

Usually, when we think of building Artificial General Intelligence, what is in our mind as we think of this task is to find out the most general algorithm or procedure, that will enable us to solve any problem whatsoever. The error that arises in our mind is that we think of a “procedure” that will enable us to solve any new problem that we are faced with.

There is another band of researchers that don’t really care much about finding out the most general procedure possible. They just want to build systems that solve a particular task that we are faced with at any point in time.

Many of us are aware of the endless battle of words that has raged on between these two camps. The more practically minded people will usually go forward to build systems that solve specific tasks, while the more theoretical will search endlessly for the master algorithm (procedure).

Looking at these camps from a different point of view, I will posit that the real problem is that we have been viewing intelligence from the wrong angle. Both camps have been looking for a “procedure”. The AGI band has been on the lookout for the most general procedure, one that enables us to solve every kind of problem, while the AI band has been looking for specific procedures for solving specific problems.
Rather than looking for procedures, we should look for representations. The representation of a problem, that is, the format in which we express the problem, is more important than any specific procedure or algorithm that we employ in solving it.

In the space of procedures, there is no super general procedure for solving all kinds of problems. Different problems will require different kinds of procedures but different problems can be represented in the same way, which is a more powerful way of thinking about problems. 

If we found the ultimate representation for every kind of problem, then applying general algorithms that manipulate this representation until a solution is found would lead us to Strong AI.
Without much ado, I will declare that the most powerful and most general way of representing any problem is in the form of a Network. A Network is simply a collection of nodes and edges.


Each circle above with a number within is a node while each line connecting the circles is an edge. As you can see in this particular image there is a disconnected node and an edge that loops on a particular node.
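To make the idea of nodes and edges concrete, here is a minimal sketch of a network as a data structure. The class name Network, its method names and the node numbers are my own illustrative assumptions and are not taken from the image; the example simply reproduces the two features mentioned above, a disconnected node and an edge that loops on a node.

```python
from dataclasses import dataclass, field


@dataclass
class Network:
    """A network as a set of nodes plus a set of undirected edges (illustrative only)."""
    nodes: set = field(default_factory=set)
    edges: set = field(default_factory=set)  # each edge is a frozenset of 1 or 2 nodes

    def add_node(self, node):
        self.nodes.add(node)

    def add_edge(self, a, b):
        # Adding an edge implicitly adds its endpoints.
        self.nodes.update((a, b))
        # When a == b the frozenset has one element: an edge that loops on a node.
        self.edges.add(frozenset((a, b)))

    def neighbours(self, node):
        result = set()
        for edge in self.edges:
            if node in edge:
                others = edge - {node}
                # A self-loop makes a node its own neighbour.
                result |= others if others else {node}
        return result

    def isolated_nodes(self):
        return {n for n in self.nodes if not self.neighbours(n)}


net = Network()
net.add_edge(1, 2)
net.add_edge(2, 3)
net.add_edge(3, 3)   # an edge that loops on node 3
net.add_node(4)      # a disconnected node
print(net.isolated_nodes())  # {4}
```

Nothing in this structure says what the nodes or edges mean. That is the point: the same container can hold a road map, a circuit, or the cells of a game board.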

So, the race to build synthetic intelligence has been focused on procedures, and this is responsible for the bifurcation of the intelligence research community into the pragmatists, focused on more specific problems, and the theorists, focused on the problem of general intelligence. But at this point we can end the arms race, if we choose to, with the recognition that it was never really about procedures but about representations. The network is the most general representation format for problems and can be viewed as general intelligence.

This is responsible for the fervent, almost religious following for Artificial Neural Networks. ANNs, which are just a specialization of the more generic network systems, have brought us some of the most powerful results in modern times when it comes to the problem of building systems that approximate human cognitive abilities.

Let us go back to our initial problem of an agent trying to solve the Sudoku puzzle. We must assume that the agent can understand the constraints of the game, which are communicated to it in a manner that it understands, and that all the mechanical or mental movements that are required to move from problem to solution on the game board are already understood.

The reason for these assumptions is that when we try to solve some complex problem by building a model, we temporarily minimize the importance of certain variables so that we can zoom in on the ones we are currently interested in and gain understanding. In some other model, the very method of moving the actuators that place a number in a square, or that move data from place to place in some electronic system, might be of greater importance. But for now we must operate at the most abstract possible level: we must focus on Problem -> Application of Intelligence -> Solution, and ignore a lot of details about how intelligence is actually applied.

You don’t really need to know much about Sudoku puzzles to understand this writeup. All you need to keep in mind is the abstract idea that there is an initial state for the game and an expected solution state (the final state). How you proceed to this final state is really not important; all that is required is that it satisfies all the constraints of the game and that the solution is recognizable as one.
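What it means for a solution to be "recognizable as one" can be sketched as a small checker that knows nothing about how the board was filled. The function name is_valid_solution and the formula used to build a sample grid are assumptions of mine for illustration; the rules checked are the standard Sudoku rules (rows, columns and 3×3 boxes).

```python
def is_valid_solution(board):
    """Return True if a 9x9 board (list of 9 lists of 9 ints) satisfies every Sudoku constraint."""
    digits = set(range(1, 10))
    rows = [set(row) for row in board]
    cols = [set(col) for col in zip(*board)]
    boxes = [
        {board[r][c] for r in range(br, br + 3) for c in range(bc, bc + 3)}
        for br in (0, 3, 6) for bc in (0, 3, 6)
    ]
    # Every row, column and box must contain exactly the digits 1..9.
    return all(unit == digits for unit in rows + cols + boxes)


# One valid grid, built by cyclically shifting a base row (illustrative construction).
solved = [[(3 * (r % 3) + r // 3 + c) % 9 + 1 for c in range(9)] for r in range(9)]
print(is_valid_solution(solved))  # True

# Swapping two cells in a row keeps the row valid but breaks two columns.
broken = [row[:] for row in solved]
broken[0][0], broken[0][1] = broken[0][1], broken[0][0]
print(is_valid_solution(broken))  # False
```

The checker plays the role of the game's inventor: it can recognize a solution without saying anything about how to find one.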

So, using purely random brute force actions, which is the most unintelligent thing to do, the opponent tries all possibilities of placing numbers in squares until a solution is achieved. Depending on the initial setup of the board, and of course on chance, the brute force method takes anywhere from a little time to a near-infinite amount of time to solve the problem. On a board that is configured with only a few missing pieces, the naive brute force algorithm will take little time, but when the board is difficult because only very few squares have a number in them, it becomes ridiculously hard to solve with just trial and error. This is where greater intelligence helps: with a more efficient algorithm encoding a particular intelligent solution, the opponent can cut the time required to solve the problem.

To implement a much more efficient algorithm, rather than just eyeballing the board and using what is no different from brute force trial and error, the opponent proceeds to analyze the problem on a “memory device”, which in this case is pen and paper.

One must remember that in our brute force assumptions we did not require the opponent to have any kind of memory, so it will repeat moves that failed in the past. We are assuming raw random action: just selecting moves randomly with no memory of whether a particular configuration has been selected before.

We have made certain assumptions here about prerequisites. The first one is that there is something called writing and a pen and paper and the capacity to encode communication in symbols of some language. These must have come from somewhere, the intelligent product of other entities.  Always remember that we must forget some details when we attempt to understand complex systems, this is an attribute of intelligence too, but we will come to that later.

So, in order to prove intelligence beyond brute force, the opponent uses the memory device of pen and paper to perform a leap of intelligence, which is recording its moves so that it doesn't repeat setups that failed it in the past. Each candidate solution to a particular board configuration is still randomly generated, but once generated the agent records it in the memory device of pen and paper.

When the agent reaches an impasse in gameplay beyond which it cannot proceed, it checks its records, applies some intelligence to comparing the current game state to previous game states and tries to avoid repeating steps that it thinks might have failed it before. This is some kind of rudimentary application of intelligence.
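The difference between raw random action and random action with a record of failures can be sketched on a deliberately tiny instance. Everything here is an illustrative assumption: the function names, the sample grid, and the choice to leave only three cells open so that blind guessing finishes quickly. The set called tried stands in for the pen and paper.

```python
import random


def is_valid(board):
    """True if every row, column and 3x3 box holds the digits 1..9."""
    digits = set(range(1, 10))
    units = [set(r) for r in board] + [set(c) for c in zip(*board)]
    units += [{board[r][c] for r in range(br, br + 3) for c in range(bc, bc + 3)}
              for br in (0, 3, 6) for bc in (0, 3, 6)]
    return all(u == digits for u in units)


def random_search(board, blanks, use_memory, rng):
    """Guess digits for the blank cells at random until the board is valid.

    Returns the number of boards that were actually checked.
    """
    tried = set()   # the "pen and paper": guesses already known to fail
    checked = 0
    while True:
        guess = tuple(rng.randint(1, 9) for _ in blanks)
        if use_memory and guess in tried:
            continue   # recorded as a failure, so do not check it again
        checked += 1
        for (r, c), d in zip(blanks, guess):
            board[r][c] = d
        if is_valid(board):
            return checked
        if use_memory:
            tried.add(guess)


# A valid grid built by shifting a base row, with three cells treated as open.
solved = [[(3 * (r % 3) + r // 3 + c) % 9 + 1 for c in range(9)] for r in range(9)]
blanks = [(0, 0), (4, 4), (8, 8)]
rng = random.Random(0)
for use_memory in (False, True):
    puzzle = [row[:] for row in solved]
    print("memory" if use_memory else "no memory",
          random_search(puzzle, blanks, use_memory, rng))
```

With three open cells there are 9^3 = 729 possible guesses. The agent with memory can never check more than 729 boards, while the memoryless agent has no such bound and only finishes by chance.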

In the next step of its demonstration of intelligence, the agent proceeds to analyze sudoku formally and because we must assume that this entity possesses intelligence it invents a “mechanical procedure” for crunching through Sudoku problems. So to clarify things a bit, the agent possesses intelligence, but we do not know how much so we are challenging it with tasks that require intelligence so that we observe its actions with the goal of deducing the cause of the intelligent action.

The goal of our method is to observe the agent demonstrate its intelligence in many different tasks so that we can get at the root of what that intelligence is by seeing the most common structure in all its activities. You must note that our goal is not to simulate its intelligence by building another agent that mimics its actions, our goal is to find the general structure of its intelligence and from there the most generalized structure of intelligence, by observing the root structure of its actions in specific tasks.

Another error we must avoid: as we observe the intelligent agent demonstrate its intelligence, we are not trying to deduce a general procedure that it applies in all tasks. Suppose we use common induction and say that because the agent acted in a certain way in 10,000 different situations, this must be its general procedure, and we go ahead and replicate it. We then get a shock when the agent faces the 10,001st situation and does something completely different, and we have to abandon our "general" procedure.

My theory is that if we can see how the agent represents the problems it encounters then we can use any kind of procedure to manipulate that representation till we find a solution rather than copy the procedure of the agent, "Don't give me fishes, teach me how to fish". We assume that the opponent in this sudoku game is intelligent but we do not know how much or even if this is the highest level of intelligence possible. We don't want to get stuck mimicking an intelligent agent because we don't know if that agent is the maximum of possibility. We want to get at the root of intelligence so that we can apply it in any way we can. We must always keep in mind that any intelligence we observe in any entity is just an example, an instance, and not the most generalized representation of intelligence.

Well, moving from random action to using memory devices to avoid repetition and eventually to game analysis was fast, but in order to get into the meat of intelligence we have to take several things as given: things like paper, pen, writing, intelligible symbols, etc. All this narration and description is an attempt to peel away the effects so we can get at the originating cause. It is this originating cause that we should attempt to understand so that we can build it. Although I have said that general intelligence is actually the network, it will take some exposition for this very abstract conception to make sense to us.

The mechanical procedure is a general method, so it enables the agent/entity to solve not just one Sudoku problem but all possible Sudoku problems. Even the random movements can be stated first on a memory device as a mechanical procedure before they are executed by the actuators of the agent. The reason we are talking about an external memory device is that we know there are hard limits on the typical bioresources of an agent, because at this point we are restricting the discussion to a biological entity. If we assumed that the agent was not limited biologically then we could assume that it has infinite and perfect memory. But we are trying to understand how we can replicate human intelligence in non-biological substrates, so we must first of all try to understand intelligence as limited to the human condition.

The process of inventing a mechanical procedure that enables an entity to solve Sudoku (or other problems) is the core process of intelligence. The procedure itself can be viewed as an effect of the intelligence, because the mechanical procedure, whether it is brute force or some well-thought-out procedure like convex optimization, is not actually intelligence itself but an expression of it.

Brute force can be done blindly through trial and error and does not involve any intelligence if the actions do not deviate from statistical randomness. When we add memory to brute force, we have taken a step forward in enabling brute force with intelligence, and it is no longer totally unintelligent because it has a memory of its mistakes and avoids them the next time. This assumes that the agent has some objective function that requires it to perform better over time.

If we are to take speed to solution as a major goal beyond just solving the problem, because the agent could spend eternity dancing around the board if it keeps repeating mistakes over and over due to the random nature of its action, then using the paper and pen as memory to track actions that have already been taken is the first optimization we apply to the problem of finding a solution.

By avoiding past mistakes, we have taken a step from totally chaotic unintelligence to intelligence using memory as a device. Memory adds structure to the actions executed. There are other problems that arise from the board problem like searching the board for a slot that is not filled and using some method to determine the best way to fill that slot, etc. But we will ignore this for now, not because they are not aspects of intelligence but because we want the highest and most abstract goal-oriented view possible that leads us to the deepest representation of intelligence.

To speed up the process of finding a solution to a Sudoku problem, the agent/entity/opponent could decide to take its external memory manipulation up a notch, beyond basic pen and paper, towards a system that requires less physical effort.

With pen and paper, the agent has to write down the intermediate symbols. We assume that some intelligent subprocess of this being has already found a symbol system, together with a set of rules for using it to represent the state of the board and all the transformations of the board that lead from the initial state all the way to a final solution of the puzzle. If there are many mechanical steps to consider during the evaluation, the amount of pen and paper needed will increase.

The entity could use mental memorization to reduce the amount of paper and pen used but we are aware of the fact that this entity is probably biological and limited on many fronts when it comes to memory capacity. Another problem the entity could encounter would be the fact that when given a new problem the entity will need new sheets of paper to solve it and this will result in massive use of resources. An optimization that can be applied at this point would be to try to reduce the amount of paper consumed in the process of elaborating a procedure with which to solve the problem.

Smaller writing could be employed, thinner pen strokes, more symbols written per page, until the entity hits the limits of its apparatus or gets bored to death with all the details of using these optimizations. A more succinct symbol system with a larger alphabet for representing things could be invented to pack more information per symbol. These optimizations explore only one dimension in the space of possible optimizations of writing a symbolic procedure (program) for solving the Sudoku puzzle.
What is this dimension we are currently dealing with? We are still merely dealing with brute force which involves trying all possibilities of board placements without much structure except by remembering a previous configuration that has been tried and avoiding that in the future to save time. The only extra tool we have is a pen and a paper to write down a procedure for memory sake so that we do not repeat ourselves in the future.

As we focused on the dimension of paper, pen and symbols we also tried to apply optimizations like increasing alphabet size to reduce writing repeated symbols and to be more succinct in our expression, reducing the font size and even font thinness. We also thought about other paper/pen/symbol-based optimizations like increasing character density.

We are not limited to searching for solutions in these dimensions alone, there are other dimensions we can explore and intelligence somehow comes up with ways and methods of exploring other dimensions for solutions to the problems that we are confronted with.

About alphabets
I have been mentioning alphabets without trying to explain in detail what they are and as such only our computer science people would have understood what I meant. Let me explain more so that we all understand how optimizing the character/symbol dimension could lead to more succinct expression.

When you hear the word alphabet, you shouldn’t limit your mind to the alphabet of your native written language but you should expand your understanding to include any visual representation that is distinctive from normal background noise, that is different from randomness. If one sees some squiggles on the floor or on some wall, it could either be an intelligible symbol written by someone or something to convey some information or it could just as well be generated by natural processes.

Without going too general we want to limit ourselves to the concept of a symbol as is drawn out by some human entity or represented by some other means. A symbol usually belongs to an alphabet and an alphabet contains all the symbols that belong to it.

A simple alphabet is the English alphabet, which consists of 26 symbols. These symbols are arranged linearly in groups to represent words, delimited by spaces, that have meaning to someone who can read English. There are other alphabets, like the Hebrew alphabet, which contains 27 symbols: 22 original letters, each representing one character, and 5 final forms, which represent certain of those 22 characters but are written differently. While the letters of the English alphabet are used only to represent words, the Hebrew letters also represent numbers or, as computer people will say, have ordinal values. There are many other symbol systems, like the characters used to write Chinese or the Cyrillic alphabet used to write Russian.

There is a system of characters used to represent numbers, the decimal digits, which include 1, 2 and 3. In computer circuits we use just 0 and 1 to represent the two states Off and On, which is the basis of computation executed on a digital computer.

The number of symbols an alphabet consists of determines how succinctly the alphabet can represent information. By succinct I mean how economical it is, that is, how little repetition is required to express different concepts.

In a simple alphabet like the 1 and 0 used in computing, we can have arbitrarily long strings of characters, such that a unique string of some particular length represents every single concept imaginable. But this is not practical, and thus words of limited length need to be created to represent concepts. For instance, if a single symbol like the 1 or the 0 is called a bit, then with 8 bits we can represent 2^8 = 256 individual “things”, or we can say we have 8-bit words that represent 256 individual items.

We can see that with only 2 symbols in our alphabet we will have to repeat the symbols over and over again to represent individual things that we can identify in the world but with more symbols in an alphabet, we could represent more things with fewer symbols.

In the English language, where we have an alphabet of 26 characters, even if we choose to limit the length of our words to exactly 8 characters we still have 26 to the power 8 possible words, which is 208,827,064,576. We can see that by merely increasing the number of uniquely shaped symbols we can represent more things with a fixed-length word.
Theoretically, we could represent everything we want to identify with its own unique symbol, but this would be inefficient too, just like representing everything with arbitrarily long strings of 1 and 0. We have to strike a balance between the number of symbols and the maximum length of the words that can be written in that symbol system.
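The trade-off between alphabet size and word length is simple arithmetic, and a short sketch makes it easy to play with. The two function names are my own; the numbers are the ones used above.

```python
def distinct_words(alphabet_size, length):
    """Number of distinct words of exactly `length` symbols."""
    return alphabet_size ** length


def symbols_needed(alphabet_size, items):
    """Shortest fixed word length that can give `items` things a unique label.

    Assumes alphabet_size >= 2 and items >= 1.
    """
    length = 0
    while alphabet_size ** length < items:
        length += 1
    return length


print(distinct_words(2, 8))     # 256
print(distinct_words(26, 8))    # 208827064576
# Labelling as many things with the two-symbol alphabet takes much longer words:
print(symbols_needed(2, distinct_words(26, 8)))   # 38
# ...while 26 symbols cover 256 things with words of length 2:
print(symbols_needed(26, 256))  # 2
```

So what an 8-letter word can distinguish takes a 38-bit binary word, which is the sense in which a larger alphabet is more succinct.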

These all boil down to the paper/pen/symbol dimension that we were trying to optimize earlier. Suppose we choose our symbol system and design words that represent states of the sudoku boards and actions that can be performed on those states that will lead us from initial configuration to final goal, we will still be limited by the symbol system that we choose and even though we might optimize variables like length of word, number of symbols in alphabet, written size (as concerns expression on our memory medium, which is pen and paper in this case) we will still run into some limits and thus would have to seek other dimensions to express our mechanized procedure.

The next obvious step would be to build some kind of mechanical computer with gears and cranks whose input is a Sudoku problem and whose output is the solution. The problem must be transformed into a proper representation so that it can be input into the mechanical system of gears and cranks, and the output might not be some print on paper but might be encoded in some representation that eases the construction of the mechanical computer.
I said “the next obvious step” in the previous paragraph because I am trying to mimic the actual evolution of computation in human society. The first attempt to mechanize the computation of solutions to problems was the attempt to do it on paper. With paper and pen, the human had an external means of representing its thoughts and manipulating them in an objective and well-defined way. Symbol systems progressed from a simple tally system for counting sheep to more sophisticated writing systems and eventually mathematical symbols.

With paper and pen, everything would be done on this medium by writing and re-writing the symbols that represent the abstractions we are concerned about. Eventually as some aspects of mathematics could be seen as very concise and repeatable, humans decided to build machines that would carry out some of the rigorous calculations involved as far as these calculations could be defined exactly.

This led to the construction of many mechanical systems for performing these boring, repetitive computation tasks. Using an abstract symbol system, the human could create representations of the problem, a representation for a machine that consumes symbols and performs fixed actions per symbol, and a system for outputting the results of the computation as symbols representing the solution to the problem.

In the paper and pen world, the entity might choose to represent in the paper the exact physical sudoku board. Then go ahead to solve it in this abstract “representation” and then return the solution as board positions on the real sudoku board. This process of abstract representation, manipulation and output is the core of computation and in the middle phase, the manipulation phase, two things are needed, memory to hold that which is being manipulated and mechanism of manipulation that transforms the symbols of the system from one state to another till a state that encodes the solution is arrived at.

In paper and pen systems the encoding can be a direct visual representation of the board or it can be more abstract in the nature of some formal systems. Mathematics is the most common system for creating formal representations but one could use any kind of systematic representation system as far as it is consistent and makes efficient use of the symbols that are required to represent the problem.

As we go from pure visual presentation to more abstract representations of the problem the economy of space and memory increases, but the difficulty of understanding such a system also increases because so much background knowledge is required to understand increasingly abstract systems.

In the mechanical system of cranks and gears (Babbage kind of thinking) we can either go for a highly specialized system that solves only sudoku or a more generalized system that solves a wider range of problems as far as they can be represented in the form that the machine can handle. The more general computation system might be more difficult to design than the highly specialized one.

We must also note that the formal representation we developed in the paper and pen phase can be applied to the design of the mechanical system. The abstract computation is done on pen and paper using standard mathematics or some other formal system like the one for solving general constraint satisfaction problems.
Constraint satisfaction problems look like the Sudoku problem we encountered earlier. They require that we solve some problem in some domain while making sure that certain constraints are satisfied. There are algorithms for solving constraint satisfaction problems (an algorithm is a formal description of some mechanical process for moving from a problem to a solution). The formal description of a constraint satisfaction problem solver (CSPS) can be moved from being manipulated on paper onto some mechanical crank and gear device. If we succeed in building a crank and gear system that runs our CSPS algorithm, we have succeeded in building a mechanical computer.
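As an illustration of what such a mechanical procedure can look like, here is a minimal sketch of one common way to solve constraint satisfaction problems: backtracking search that always works on the most constrained empty cell. I am not claiming this is the only CSPS or the best one; the function names and the sample puzzle (a valid grid with every other cell erased) are assumptions made for the example.

```python
def candidates(board, r, c):
    """Digits that can go in cell (r, c) without breaking a constraint."""
    used = set(board[r]) | {board[i][c] for i in range(9)}
    br, bc = 3 * (r // 3), 3 * (c // 3)
    used |= {board[i][j] for i in range(br, br + 3) for j in range(bc, bc + 3)}
    return [d for d in range(1, 10) if d not in used]


def solve(board):
    """Fill `board` in place (0 marks an empty cell). Return True on success."""
    empties = [(r, c) for r in range(9) for c in range(9) if board[r][c] == 0]
    if not empties:
        return True   # nothing left to fill: every constraint held on the way here
    # Heuristic: branch on the empty cell with the fewest legal digits.
    r, c = min(empties, key=lambda rc: len(candidates(board, *rc)))
    for d in candidates(board, r, c):
        board[r][c] = d
        if solve(board):
            return True
        board[r][c] = 0   # undo the move and try the next digit
    return False          # dead end: the caller must change an earlier choice


# Sample puzzle: a valid grid (built by shifting a base row) with every other cell erased.
full = [[(3 * (r % 3) + r // 3 + c) % 9 + 1 for c in range(9)] for r in range(9)]
puzzle = [[0 if (r + c) % 2 == 0 else full[r][c] for c in range(9)] for r in range(9)]

if solve(puzzle):
    for row in puzzle:
        print(" ".join(map(str, row)))
```

Every step here can be followed blindly, which is exactly what makes it a candidate for cranks and gears: look up the legal digits, write one down, and erase it again if a dead end is reached.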

All that is required after now is just to represent our sudoku problem in a form that is acceptable to this machine and then it will crank out a solution. Maybe we will have to supply the physical energy needed for this through the manipulation of levers or some other physical solutions for making the system move.

As we go along, I hope you are noting the constant mention of representation! As I said earlier, representation is very important because it determines not only how efficiently we will solve the problem but also how many types of problems can be solved with our chosen representation system. In the world of mathematics, the most common representation method for a problem is in the form of equations that capture the equivalences in the problem and solution domains.

In the problem of Sudoku, mathematics might not be the most efficient or even the easiest way to represent the problem, so we go into the field of formal constraint satisfaction for help. It is my intuition that the most general representation format for problems is the network. This is why neural networks are so powerful for solving problems that are hard to specify by hand. The learning algorithm we choose is not as important as the network that is used to represent the problem itself, because networks enable us to represent any kind of problem and to solve that problem by modifying properties of the network itself, like the weights on the connections between nodes. Neural networks require that the inputs to the network, as well as the weights between the nodes, be numerical, but nothing stops us from building arbitrary kinds of networks that represent any kind of problem using any kind of symbolic weights and inputs. The core idea to hold in mind is that all that is needed is computation nodes and weighted edges that represent the importance of a particular pathway in the representation of a problem as the solution is computed.

This might not be the most general description possible but it suffices for now. The goal of this work is to steer attention toward this direction. 

A graph is a specialized form of a network, with special properties designed to achieve certain goals, but it is still a network. Many problems, even full computer programs, can be represented as graphs, and these graphs can be manipulated with certain algorithms to go from problem presentation to problem solution.
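
As a minimal sketch of what representing sudoku as a graph could look like: each cell is a node, and an edge joins two cells that may not hold the same digit (same row, same column or same 3x3 box). The choice of Python and the names `build_sudoku_graph` and `candidates` are illustrative assumptions, not something prescribed by the discussion above.

```python
from itertools import product


def build_sudoku_graph():
    """Return {cell: set of neighbouring cells}; a cell is a (row, col) pair."""
    cells = list(product(range(9), range(9)))
    graph = {cell: set() for cell in cells}
    for (r1, c1), (r2, c2) in product(cells, cells):
        if (r1, c1) == (r2, c2):
            continue  # a cell is not its own neighbour
        same_row = r1 == r2
        same_col = c1 == c2
        same_box = (r1 // 3, c1 // 3) == (r2 // 3, c2 // 3)
        if same_row or same_col or same_box:
            graph[(r1, c1)].add((r2, c2))
    return graph


def candidates(graph, board, cell):
    """Digits still allowed in `cell`, given the digits placed on its neighbours.

    `board` is a dict mapping filled cells to their digit (an assumed format).
    """
    used = {board[n] for n in graph[cell] if n in board}
    return set(range(1, 10)) - used


graph = build_sudoku_graph()
board = {(0, 1): 5, (1, 1): 3, (8, 0): 9}
print(sorted(candidates(graph, board, (0, 0))))
```

Once the puzzle is in this form, a solver only has to manipulate the graph: pick a node, look at its edges, and rule out digits.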

The entity who was charged with demonstrating intelligence beyond brute force has elevated its status to that of an intelligent being because, and this is very important so you should pay attention, it was capable of analyzing the problem “systematically” and thus built “tools” that enable it to produce an automatic system for arriving at a solution.
“Systematically” here can be read as capable of being mechanized, or so highly structured that it can be followed blindly. This is the bulk of the work of intelligence. The entity who created the system that solves its problems might still be mentally incapable of solving the problem efficiently without the system it has created. If denied its mechanical system, it will fall back on what I call its “coping” algorithms to solve the problem, and these coping algorithms are no different from brute-force trial and error, with some memory of course.

Now if the entity wishes to become a master sudoku solver, it can analyze the game thoroughly, this time not with the purpose of finding a mechanical system to solve it automatically but with a view to improving its own skill. But there is an upper threshold on how much “skill” a human master sudoku solver can attain. This so-called skill depends on several variables: the amount of memory the master is able to hold over the problem in their brain, the amount of memory they can access from their mental memory stores, and the speed with which they can manipulate this memory (manipulation implies referencing memories and changing them).

The human coping algorithm for doing almost everything is highly memory intensive and less computationally intensive. This is because, in raw switching power, the computer beats the human brain. It is this switching speed that determines how fast things can be accessed and manipulated. The coping algorithm which the human entity uses to solve any problem it encounters is just a pattern recognition → action system. In reinforcement learning parlance, you can add “to maximize reward”.

In the sudoku problem, this pattern recognition → action system can be seen as: recognize the board configuration pattern, fill in the square with the appropriate number while making sure the constraints of the game are obeyed, and store the new pattern as a solution. Or in more detail, it could be: recognize the board configuration pattern, then search memory for a past encounter with this pattern and the solution discovered then. If the pattern has been encountered before, use the past action sequence to solve the current board configuration. If it is not found, search, using some kind of algorithm or efficient procedure, for a number that will solve the current board configuration while obeying the constraints of the game.
The entity recognizes a pattern from memory, searches a kind of table mapping patterns to actions, and selects the action that the encountered pattern elicits. I think it is this knowledge that gives the deep learning community so much confidence in the power of their systems, but there is a lot more that enables a human to function beyond this basic system.
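
The pattern → action loop just described can be sketched in a few lines. This is only an illustration under stated assumptions: the "table mapping patterns to actions" is modelled as a plain Python dict, and `missing_digit` is a hypothetical stand-in for the search step, working on a row with exactly one empty square (written as 0).

```python
def respond(pattern, memory, search):
    """Return an action for `pattern`, reusing memory before searching."""
    if pattern in memory:        # pattern seen before: reuse the stored action
        return memory[pattern]
    action = search(pattern)     # otherwise fall back to a search procedure
    memory[pattern] = action     # store the new pattern -> action pair
    return action


def missing_digit(row):
    """Hypothetical search step: the one digit absent from a nearly full row."""
    return (set(range(1, 10)) - set(row)).pop()


memory = {}
row = (1, 2, 3, 4, 0, 6, 7, 8, 9)
print(respond(row, memory, missing_digit))  # found by search the first time
print(respond(row, memory, missing_digit))  # answered from memory the second time
```

The second call never runs the search at all, which is the sense in which this basic system trades computation for memory.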

This basic system is a memory-intensive system that operates on as little power as possible, enabling the entity to navigate its environment in a meaningful way and preserve its life. The coping algorithms of the entity enable it to get by in a natural environment in a very simple way: obtaining food, finding a mate and generally surviving. All this is done by recognizing a configuration of events, “a pattern”, and choosing from a list of actions to perform in response to that pattern. In the most basic form, the list of actions represents simple instinctual programs pre-encoded in the genetic function of the organism. It can be as simple as: recognize opposite sex, activate procreation system. This meta-program can include subprograms that contain very specific instructions on how to move towards the mate and initiate the actions required to obtain the mate.

A program for obtaining food might be:

Recognize food item (mostly colour coded in nature to attract)
Has this food item been consumed before?
If YES, was it a pleasant experience? (What was the reward obtained?)
If YES, then activate muscular systems to move towards obtaining the food item.

There is also the NO pathway like:

Has this food item been consumed before?
If NO, does it look attractive? (Attraction is usually pre-encoded in the entity and requires very little analysis.)
If it looks attractive, move towards it and try to pluck it off its tree,
then eat a little bit.
Does it taste good?
If YES, continue.
If NO, spit it out and throw it away.
Also store the memory in this manner:
This fruit of this colour, shape and form does not taste good, therefore when encountered do not eat. And also, be cautious about fruits that look like this.

The adventurous will try out other fruits that look like this while the cautious will eliminate a large swath of fruits that look like this from their diet.
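
The YES and NO pathways of the food program can be folded into one small function. This is a sketch under assumptions, not a model of any real organism: `memory` is a dict from an item's appearance to whether eating it was pleasant, `looks_attractive` and `taste` are hypothetical callbacks standing in for the entity's built-in senses, and the "ignore" branch for unattractive items is my own addition since the text does not spell it out.

```python
def forage(item, memory, looks_attractive, taste):
    """Decide what to do with a recognized food item.

    memory maps an item's appearance to True (pleasant) or False (unpleasant).
    """
    if item in memory:                   # YES pathway: consumed before
        return "eat" if memory[item] else "avoid"
    if not looks_attractive(item):       # assumed branch: never eaten, unattractive
        return "ignore"
    good = taste(item)                   # pluck it and eat a little bit
    memory[item] = good                  # store the experience for next time
    return "eat" if good else "spit out"


memory = {}
first = forage("red berry", memory, lambda i: True, lambda i: False)
second = forage("red berry", memory, lambda i: True, lambda i: False)
print(first, "->", second)
```

On the second encounter the stored memory decides the outcome without any tasting, which is the "do not eat" rule from the program above.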

This is the basic “coping” algorithm of human entities at its highest level of representation. This coping algorithm, our natural “naive” algorithm, can be extended beyond satisfying primal urges to satisfying the need for social status, or even to situations like playing a game that has been structured by another human being. The ability to create games is actually an act of higher intelligence, beyond the coping algorithms. The coping algorithms usually possess intelligence too, but when we think of intelligence we should think of it as lying on a spectrum. At the lower end of this spectrum is trial-and-error brute-force search, while at the high end is inventing constraint satisfaction problem solvers or the Alpha-Beta search that is used to play chess.

We have used Sudoku as a major example, but other games like chess can also be solved using an extension of the naive general human algorithm of pattern recognition → action. In Sudoku we can search our memory for the best number to write in a square; usually we are doing some kind of brute-force search through the pattern stores in our brain before we bring out a single solution for where to place a number. We may find that our number is the right one and does not need to be changed until the end of the game, or we may find that certain other board configurations make our placement invalid, so that we need to adjust. Some steps of our manipulations take place in our heads while others take place outside, like actually placing a number in a square.

If the memory of the entity is limited, it will usually resort to doing more of the computation externally, using paper and pen as memory aids. When we move beyond our basic algorithm of pattern recognition → action, we go into applying intelligence to designing highly structured systems like CSP solvers, Alpha-Beta search, etc. These systems have the quality that we do not need to do much manipulation in our heads. Everything can be done explicitly on paper with a pen, or in a mechanical geared system.
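
To make "highly structured" concrete, here is a minimal sketch of Alpha-Beta search on a toy game tree. The tree format is an assumption for illustration: a node is either a number (the score of a finished position) or a list of child nodes. Every step is explicit enough to be carried out with pen and paper.

```python
def alphabeta(node, maximizing, alpha=float("-inf"), beta=float("inf")):
    """Minimax value of `node`, skipping branches that cannot change the result."""
    if not isinstance(node, list):   # leaf: the score of a finished position
        return node
    if maximizing:
        best = float("-inf")
        for child in node:
            best = max(best, alphabeta(child, False, alpha, beta))
            alpha = max(alpha, best)
            if alpha >= beta:        # the opponent would never allow this line
                break
        return best
    best = float("inf")
    for child in node:
        best = min(best, alphabeta(child, True, alpha, beta))
        beta = min(beta, best)
        if alpha >= beta:            # we already have a better option elsewhere
            break
    return best


tree = [[3, 5], [2, 9], [0, 1]]
print(alphabeta(tree, maximizing=True))
```

In this tree the leaves 9 and 1 are never examined: once a branch is known to be worse than one already found, the rest of it is skipped.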

I know that I have been using the word intelligence carelessly and the more rigorous amongst us might grumble, but this is more because of the limitations of language than anything else. We could break intelligence down into very basic parts, view the collection of parts as a whole later, and form our own independent view of what intelligence should consist of. At the beginning of this work, I stated that I would not be defining exactly what intelligence means. This would require that I use a bunch of English words to define an English word and I do not find this very satisfactory. Although the title of this work is: WHAT IS INTELLIGENCE, it is more of an exploration of the topic of intelligence in such a manner that the practical AI/AGI person gains a foothold on a path, beaten or new, that they could follow on their way to building synthetic intelligence.

I have found out from my exploration of the sciences that certain problems have remained insoluble for long periods of time mostly because the problems were not well defined, or because people were trying too hard to force all the facts to fit the existing model. It usually took some kind of wild individual to come into the field and define a new model which eventually explained or represented all the facts in a much better way.

This is why I am exploring intelligence in this work, looking at stuff that has been overlooked, challenging assumptions, providing insight and reviewing stuff that has been underestimated or basically clarifying certain ideas.

Intelligence can be viewed from an execution point of view, as something that has to act to solve a problem, or from a representational point of view, as something that can represent larger classes of problems. There will always be a specific procedure for solving every imaginable problem in every imaginable domain. But chasing after these specific procedures is an endless game, and it is one that is currently being encouraged by the most practically minded researchers in the world.

We do have to solve intelligence, and we will eventually achieve that, but we will do ourselves much good if we try to solve the seed of intelligence, build fewer chatbots and talk less about “consciousness”, which drives us into a corner of words without meaning.

Even the word intelligence itself is difficult to understand fully in a direct way, so I will instead talk about the attributes that we are used to ascribing to whatever we recognize as intelligent.

Solving sudoku can be seen as an intelligent process. An entity, model, system or agent solving sudoku can be seen as possessing intelligence, as demonstrating intelligence, or as having intelligence as an attribute.

If our goal is to build synthetic intelligence, then we must go beyond demonstrations of intelligence and into the core of intelligence. But as we go into the core of intelligence, we must take off our “bird flight copying to create flying machines” instincts and put on our “Wright brothers thinking helmets” instead. We are getting closer to building synthetic intelligence; we just have to solve intelligence using intelligence in a systematic way. The first step is getting the representation right. Networks are the most powerful representation for any kind of problem, and thus networks and the operations on a network are the most basic structures of general intelligence.

When we don’t tame our “bird flight copying to create flying machines” instincts, it leads us to copying the effects of intelligence rather than getting at the core. We try to copy high level observations in programs of our own making and hope that somehow we will arrive at the core of intelligence by solving one grasped feature after another of the many changing characteristics of demonstrated intelligence.

The electronic computer teaches us a lot about how we should think about the problem of creating general intelligence. When operating at the highest level of abstraction of the computer architecture, like watching a movie or browsing the internet, we are apt to forget that at the bottom of it all are simple, stupid, unintelligent-looking structures. Even when we are using our fancy Amazon Echo or Google Home, with their demonstration of simulated intelligence, we forget that the machine instructions at the bottom of these machines are simple and stupid-looking and have no attribute of intelligence at all.

Most computer architectures can do very few things. Apart from fetching stuff from memory, writing stuff to memory and some very simple logic, all of which can be built from a single logical operator, the NAND gate, there is nothing more to computing. Whether it is Apple's Siri, an enormous computation platform like Wolfram Mathematica, NASA supercomputers or those gigantic machines in research labs like CERN, all of these run on the same simple computation primitives.
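
The claim about the NAND gate is easy to check for yourself. The sketch below builds NOT, AND, OR and XOR out of nothing but a `nand` function, then combines two of them into a half adder, the first step towards arithmetic. The function names are illustrative; only the constructions themselves are standard.

```python
def nand(a, b):
    """The single primitive: false only when both inputs are true."""
    return not (a and b)


def not_(a):
    return nand(a, a)


def and_(a, b):
    return not_(nand(a, b))


def or_(a, b):
    return nand(not_(a), not_(b))


def xor_(a, b):
    n = nand(a, b)
    return nand(nand(a, n), nand(b, n))


def half_adder(a, b):
    """Add two one-bit values; returns (sum bit, carry bit)."""
    return xor_(a, b), and_(a, b)


for a in (False, True):
    for b in (False, True):
        print(a, b, half_adder(a, b))
```

Nothing in this listing looks intelligent, and yet chaining enough of these adders together gives the arithmetic unit on which everything else is stacked.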

The power of general computing lies in our ability to build abstraction upon abstraction using simple memory manipulation operations. This is all there is to computation. The quality of the programs we write depends on our creativity; in this case we use the computer as paper and pen, and the code we write on it could be any of our conceptions for executing activities in the world.
It is exciting that we have come back to this simple realization with Artificial Neural Networks. As programming grew in abstraction, we assumed that humans could hand-engineer programs that defeat humans at every conceivable task. Programming languages like LISP were designed for the specific task of giving the human programmer the ability to express programs that were intelligent. All the while we didn’t know that we were appealing to our “bird flight copying” instinct. We were trying to replicate the highest-level features of intelligence rather than discovering the seed of intelligence.

Replicating the high-level features of intelligence was very rewarding in the short term because we could simulate intelligence with enough accuracy to deceive ourselves and others that we were onto something. But sooner or later this approach began to fail in our eyes. After the failure of gigantic expert systems like Cyc, and the ridiculous difficulty of hand-engineering features for image recognition, we understood that our approach had fallen short and that going back to the foundations was the pathway to rediscovering the methods of engineering intelligence.

With simple systems like Artificial Neural Networks (ANNs), in which a simple network with weighted edges and simple computational nodes performing basic addition and multiplication defeats hand-engineered systems at image recognition, the path to true general intelligence has been opened up again. Whether we will achieve it this time around, or veer onto the wrong path as many are doing now, will be determined by the future. ANNs have revealed to humanity that manipulating properties of nodes and edges is the foundation of designing general intelligence.
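
To show how little is going on inside each node, here is a sketch of one artificial neuron and one layer of them in plain Python. The weights, the bias term and the choice of ReLU as the activation are assumptions made for the example; the point is only that the node does nothing beyond multiplying, adding and a simple comparison.

```python
def neuron(inputs, weights, bias=0.0):
    """Weighted sum of the inputs, passed through ReLU (an assumed activation)."""
    total = bias
    for x, w in zip(inputs, weights):
        total += x * w               # one multiplication and one addition per edge
    return max(0.0, total)


def layer(inputs, weight_rows, biases):
    """A layer is just several neurons reading the same inputs."""
    return [neuron(inputs, w, b) for w, b in zip(weight_rows, biases)]


print(layer([1.0, 2.0], [[0.5, 0.25], [0.5, -1.0]], [1.0, 0.0]))
```

Learning, in this picture, means nothing more than adjusting the numbers in `weight_rows` and `biases`, that is, modifying properties of the edges and nodes.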

The path of failure that we are veering onto is assuming that current ANNs and their method of manipulating nodes and edges are already all that is required for general intelligence: that if only we could build hardware with more “neurons”, as they are called, we would achieve human-grade general intelligence, even leading to the “Singularity”, a point where computers exceed humans completely in capabilities in every imaginable field.

So we are on the wrong path with this kind of reasoning. Rather than taking the success of ANNs like Deep Neural Networks as a reason to study network-based structures fundamentally, and to understand their capacity completely by experimenting with different modes of operation until we get something that could be the foundation of full general intelligence, we have short-circuited our progress by stating that current ANNs and their simple add/multiply operations are enough to express singularity-grade general intelligence, and that all that is needed is more neurons.

I won’t even talk about the ridiculous idea of brain simulation for building artificial intelligence, which is the most direct expression of the “bird flight copying” instinct still resident in the minds of modern humans. Studying the brain for medical or purely scientific reasons is a noble cause, and even being able to simulate the brain itself would be a great achievement. What I am strongly against is the idea that copying someone’s brain will give us access to their consciousness, and the mind-upload crap that goes along with it.

