How to become an AI researcher

Artificial Intelligence is all the rage these days. Everyone is getting on the bandwagon, yet there seems to be a shortage of AI researchers everywhere. Although many people are talking about doing AI, not many are actually doing AI research.


We may or may not be heading into another AI winter soon, but given the large volume of hype going around the internet, much of it built on unrealistic claims about the technology, the likelihood of another AI winter is increasing.

The demand for real AI researchers is not going to drop anytime soon. We have at least a good decade more of demand, even if an AI winter comes and wipes the fake AI people out of business, just as the dot-com bust did with the bogus e-commerce sites. So despite the hype, it is really a good time to be learning about AI, or just getting into software and programming in general. The demand for highly skilled technical people is not about to drop.

With that said the rest of this post will be about helping you know what you need to know to become a real AI researcher.

If you look at the history of AI from the 1950s you will notice a trend. In the pre-deep-learning days, when we did what is referred to as GOFAI (Good Old-Fashioned AI), researchers came up with system after system that tried to directly replicate some aspect of human cognition. AI was not a public utility back then. It was done in esoteric research circles, and although some large organizations sponsored and made use of this research, it wasn't something a person without a PhD could easily get into. Although there were smart people on the side doing some solid research, it was a field mostly dominated by academia.

But deep learning changed all of that, and now it is possible to get on something like Google Colab and train a model. This is a good thing because, for the first time, ordinary people can get their hands on the most advanced kind of research being done on earth. But it has also brought about unrealistic expectations, and many people are abandoning the need to study the entire field of computing because they think that training a deep learning model has qualified them to call themselves AI engineers or even researchers.

The accessibility of AI these days is both a gift and a curse. As well as bringing esoteric computer science research to the everyday person, it has given people a false sense of what they can accomplish by just training models without a solid knowledge of computer science. And this belief is being encouraged by some of the masters of AI, who know how hard real research is but, perhaps for commercial reasons, choose to downplay the difficulty of building robust AI systems.

I am not implying that real AI research can only be done by PhDs. That is not the case in these days when MIT has its entire content online. Any dedicated person can gain PhD-grade knowledge without actually owning the title. The discipline gained from following a PhD program is a good thing in and of itself, but it is not a prerequisite for doing high-quality computer science research. If you want to play with genes or do particle physics research, then a PhD is a prerequisite, because apart from gaining knowledge you will gain practical experience in handling very delicate systems and following esoteric techniques that may not be publicly available. This doesn't mean that you cannot keep up with particle physics by reading research papers, and with good mathematical skills you can even make discoveries, but to do actual research in a lab you will need to follow the discipline of gaining a PhD.

The same cannot be said of computer science, where your laboratory is an actual computer. With a good memory, solid math skills, discipline, and some tenacity you can do solid research and even make new discoveries.

Apart from teaching people how to train deep learning models, many online AI courses forget to emphasize that the AI/ML aspect of any software project is only a small part of the entire project, and that many other layers of real software are needed to deploy any system that includes AI as a subsystem.

Although many people are attempting to teach AI/ML as an independent thing separate from regular computer science, I do not think that this approach will benefit you in the long run if your real goal is to become an AI researcher and not just a data scientist. I think the term data scientist is less glamorous than artificial intelligence, so many people opt for Artificial Intelligence when what they are dealing with is just data science.

Modern AI, as it is practiced today, is largely a specialization of machine learning, which is a big field in and of itself. The topic is large enough that it can be taught independently of computer science, and you can learn to train machine learning models without a background in computer science. But to do AI research, which involves making discoveries, whether fundamental and paradigm-shifting or incremental, you must know more than how to train machine learning models.

Beyond pure AI research, without a solid background in computer science you will not be able to engineer a robust software system that uses AI/ML as a subsystem. If you follow those "AI" courses very well and understand the mathematical preliminaries and the rudimentary programming needed to understand the code, what you become is a data scientist, able to take some input data and either design your own model or use some pre-existing model to understand the data and perform a whole range of tasks that might be of interest to you.

Am I implying that learning how to train machine learning models, which is what is classically taught as AI, is not beneficial? No! This is not what I am saying. Many people see an AI course or book and pick it up. Most of the time they have really high expectations of the kind of knowledge they will be getting from these educational materials. They usually think that at the end of their AI course they will be able to solve all kinds of fundamental problems with AI, but at the end of the course they find out that what they have actually learnt is mostly how to train models, with no idea how something like an AdamOptimizer or a ReLU unit actually works, why it is even necessary, or the real reason why we use Stochastic Gradient Descent and not some other optimization algorithm. Even if these things are covered in the course, without some deeper knowledge of certain other computer science topics you might never gain real intuition and just end up with undigested information.

Most courses carry information that says: hey, this is a ReLU unit and we can use it as an activation function for a neuron. Students will swallow this information and be able to reproduce it on request, but the details of why ReLU and not Sigmoid will be very difficult to understand, and it is this understanding that will enable you to build robust systems, because you know what is going on and do not just see things as a black box.
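To make the "why ReLU and not Sigmoid" question a little more concrete, here is a small sketch of one commonly cited reason, the vanishing gradient. It is only an illustration under simplifying assumptions of my own: one unit per layer, every weight equal to 1, and the same pre-activation value at every layer. The function names are made up for this example and do not come from any library.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_grad(x):
    """Derivative of the sigmoid; it is never larger than 0.25."""
    s = sigmoid(x)
    return s * (1.0 - s)

def relu_grad(x):
    """Derivative of ReLU: 1 for positive inputs, 0 otherwise."""
    return 1.0 if x > 0 else 0.0

def chained_gradient(grad_fn, pre_activation, depth):
    """Multiply the local derivative `depth` times, which is what the
    chain rule does when a gradient flows back through `depth` stacked
    activations (assumption: all weights are 1)."""
    g = 1.0
    for _ in range(depth):
        g *= grad_fn(pre_activation)
    return g

for depth in (1, 5, 20):
    print(depth,
          chained_gradient(sigmoid_grad, 0.0, depth),
          chained_gradient(relu_grad, 1.0, depth))
```

Even at its best point (an input of 0), the sigmoid derivative is 0.25, so the gradient shrinks by at least a factor of four per layer in this toy setting, while an active ReLU passes it through unchanged. Real networks have weights and varying inputs, so treat this as intuition rather than a full account.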

Another reason for understanding computer science and programming in general is that AI is just an application of the core ideas of computer science and not a separate entity on its own. Although it is large enough to be learnt on its own, you won't make any progress without a solid knowledge of the fundamentals.

Many AI teachers tell their students not to worry about the maths involved in AI and to just know that this is a COST function that monitors the LOSS and that our goal is to reduce the loss. This can be swallowed and reproduced by a student without any need for deep understanding, and it can really help you design models that receive input data and make some predictions. Although this is enough for non-technical people who just want a basic understanding of what AI is all about, it is really insufficient if your goal is not just to learn how to train a model but to build an app that uses the model as part of some software pipeline.
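For readers who want to see what "a cost function whose loss we reduce" means in practice, here is a minimal sketch. It assumes the simplest possible model, y = w * x with a single weight, a mean squared error cost, and plain gradient descent; the data, the learning rate and all the names are invented for this illustration.

```python
def mse_loss(w, xs, ys):
    """Mean squared error of the one-parameter model y = w * x."""
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def mse_grad(w, xs, ys):
    """Derivative of mse_loss with respect to w."""
    return sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)

def fit(xs, ys, lr=0.05, steps=200):
    """Plain gradient descent: repeatedly step against the gradient."""
    w = 0.0
    for _ in range(steps):
        w -= lr * mse_grad(w, xs, ys)
    return w

# Toy data generated by y = 2 * x, so the best weight is 2.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]

w = fit(xs, ys)
print("loss before training:", mse_loss(0.0, xs, ys))
print("learned weight:", w)
print("loss after training:", mse_loss(w, xs, ys))
```

The derivative in mse_grad is the part that courses often skip. Frameworks compute it for you automatically, but knowing that it is there is what lets you reason about learning rates and why training sometimes diverges.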

Many people talk as if AI replaces programming: AI has come and we do not need to write programs any more! We just call the AI and there is our result. But the truth is that AI is like classes in object-oriented programming. The appearance of classes did not remove the need for functions, and claiming that we have classes now so there is no need to know about functions is a terrible idea. Actually, methods are just functions that are attached to a class, so you will still need to know about functions.
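The point that methods are just functions attached to a class can be checked directly in Python. The class and function names below are hypothetical and exist only for this small illustration.

```python
class Model:
    def __init__(self, weight):
        self.weight = weight

    def predict(self, x):
        return self.weight * x

def predict_doubled(model, x):
    """A plain function; nothing about it is special to classes."""
    return 2 * model.predict(x)

m = Model(3)

# Calling the method on the instance...
a = m.predict(2)
# ...is the same as calling the underlying function and passing
# the instance explicitly as the first argument.
b = Model.predict(m, 2)

# A plain function can be attached to the class afterwards and then
# behaves like any other method.
Model.predict_doubled = predict_doubled
c = m.predict_doubled(2)

print(a, b, c)  # prints "6 6 12"
```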

AI is like that: you can't throw away computer science and math. You will need some understanding of these to build robust systems that utilize AI in their pipeline. There are some people who just want to understand what this AI thing is all about. These are people like CEOs, managers and all other kinds of non-technical people. All these AI courses and books that de-emphasize the math and computer science required are meant for these people. I am not saying this in a derogatory manner. What I am emphasizing is that if you want to do real AI research or engineering then you have to go way further than that.

You must have noticed that I have used the terms AI engineer and AI researcher interchangeably. This is because in a fundamental sense they are the same thing, although of course there are subtle differences. An AI researcher is more of a science type who is seeking discoveries, while the engineer is more like a bridge builder or an aerospace engineer, using science to build useful stuff for people.

But in the field of AI it is hard to distinguish engineering from research, because the end product is usually some useful system. There is no hard line between them, because an advanced AI engineer is usually doing some kind of research too, while the strict researcher, who is a really rare creature, is just exploring the fundamentals and limits of AI without the desire to put out any real-world products that can be used in a direct sense.

When I mention researcher in this writing I am referring more to the advanced engineer who apart from designing AI systems and implementing them in some software pipeline, is also capable of doing fundamental research. Unlike the other hard sciences like physics, the AI research tools are easy to reach. Math knowledge and a computer are all that is needed to be an AI researcher.


Now that we have gotten all that out of the way in the first part of this post, let's get to the details of what you need to become an AI researcher.

In the early days of modern AI, post-1985, you needed a PhD or some other highly technical qualification to venture into AI. You couldn't get into AI by just going online and watching a video or reading some article; the web as we have it today did not even exist. AI was the outgrowth of some very high-end computer science and engineering research. The problems tackled by AI were those that no other paradigm handled properly. And although we had been through an AI hype earlier on, in the 60s and 70s, AI was still some very advanced stuff that was not easily communicated to the layperson.

With the resurgence of deep learning in this current iteration of AI, we now have something whose design is clear-cut enough to be communicated to any kind of person without introducing too much terminology. The main reason AI people found it very difficult to communicate their work to the layperson in the beginning was that they were working on very fundamental issues at a very low level of abstraction. The direction AI would take was not very clear and everything was still experimental. Rather than come up with the stunning infographics that you can find everywhere on the web these days, they communicated their models in mathematical language and pseudocode.

As AI became concrete, libraries were released that enabled researchers to share their code and rapidly code up new experiments and systems. Papers were still published because most of these researchers had a scientific background, mostly PhDs from prestigious institutes. And as soon as a PhD is minted from some institution, they are hastily grabbed by the major software companies in the world. Although a lot of researchers still do their work within academic boundaries, many are found in Google, Amazon, Microsoft, etc.

Due to the enormous expense of doing AI research, with large expensive computers needing to run for weeks or months, it was necessary for large companies to pick up where academia left off. Companies needed to produce results that were commercially viable, so most of these researchers had to work on stuff that was profitable to the company. Google developed image recognition to a great degree because it powered their search engine, and other companies developed AI software too.

Academic papers are still published, and their number is even increasing as the popularity of AI is fueled by big companies making grand announcements through their media outreach departments. All of this has done a lot of good, because people outside academic circles are now exposed to some fundamentals of AI. But it has also fueled the hype, as people who were not very knowledgeable about the technology took whatever announcements were made and used their imaginations to create all kinds of fantastic scenarios.

As the AI bandwagon raced along and many people became interested, people with AI knowledge started creating courses and other educational materials to educate the public on what AI was all about. This was possible because the libraries used by researchers were growing in abstraction and thus in ease of use.

You could just download TensorFlow, PyTorch, Caffe, etc. and start building some models. Most of the early AI courses simply taught people how to use these libraries to build AI models for image recognition, where the "hello, world" program was MNIST digit recognition.

Popular AI libraries/Frameworks

In the early days, these libraries were very complicated and, I would say, required a lot of specialized AI knowledge. But with the appearance of full-featured AI infrastructure like the one included in Wolfram Mathematica, and of Keras, a very powerful library now included with TensorFlow, everybody has a chance at AI, because these tools make it super easy to create models and use prebuilt ones. In my opinion, the Wolfram AI system included as part of Wolfram Mathematica is much more elegant because it is built on top of the elegance and simplicity of the Wolfram Language. Python is the language of choice for Google's own TensorFlow, and I suspect that Google is beginning to steer programmers towards Swift, which allows for more efficient implementation, especially for programmers who still implement AI algorithms in C++.

Python is one of the most popular programming languages in the world for doing AI-related stuff. Personally I love Python because it was the first language I really got to start writing programs in, but I think the Wolfram Language is simpler and more powerful, especially for data-related stuff, because it is functional and symbolic from the start and thus enables you to do things that Keras has only recently started to do in its functional API.

As a programmer, you should not be tied to one language and one way of thinking. I think you should have both the Wolfram Language and Python/TensorFlow/Keras under your belt, because you don't want to limit yourself to just one platform when different scenarios might require different strategies. Keras is integrated into core TensorFlow as of the recent 2.0 release, so you can use it directly without downloading and installing standalone Keras.

The main advantage of using the Wolfram Language is its simplicity and elegance, especially for a beginner. The very powerful notebook environment allows you to perform computations more flexibly than the Python Jupyter environment; of course, the Wolfram notebook came first. Another reason to use the Wolfram Language is its superior visualization, which I find far easier and more powerful than anything you will be doing with Matplotlib, the basic visualization system used in most Jupyter/Python environments.

The highly integrated environment allows you to do work with a single flexible modality, instead of importing another library to do plotting and another to do data science work like Pandas. You never need to import anything in the Wolfram language as almost anything is available instantly without installing any library.

For a beginning AI researcher, I would advise you to start with the Wolfram Language even though later on you will want to experiment with other systems. The language is very forgiving and fits so well with your mental model of thinking about things, especially AI things, that you don't have to re-engineer the system to match your mental model. This is exactly what Keras was built to do for the base TensorFlow system, which was mostly used by experts in the field.

The main advantage of also learning TensorFlow is the Google Colab platform, which allows you to do real machine learning live and online on Google's cloud. There are other cloud providers like AWS and Azure. TensorFlow runs on both, and both can be used from Python, so you do not have to learn a new language to move between them.

But since this post is mostly for beginners who are trying to get into AI, I would say get your hands on some practical training as soon as possible: either get the Wolfram system or go to Google Colab to start.

At the beginning of this post, I emphasized that you need some general knowledge of computer science to get started. If you don't have any knowledge of programming and your goal is to educate yourself to the level where you will be able to do real AI research and build full software systems, then I recommend my book: How to learn programming. In the book, I help you navigate the sometimes confusing field of software engineering and avoid many of the pitfalls that people fall into when trying to get into this field.

To learn the Wolfram Language there is a free book online you can start with: An Elementary Introduction to the Wolfram Language. You can run the programs in the book online without installing anything on your local computer. To learn the second fundamental language of AI, which is Python, you can simply learn the core of the Python language online. But there is another skill you need in order to start thinking properly about AI, and that is computational thinking. Doing computational thinking in the Wolfram Language is very easy. If you want to learn computational thinking in Python then you should get this book: Introduction to Computation and Programming Using Python by John V. Guttag. The reason I recommend this book is that apart from drilling you in Python, it will introduce you to data science, which is what you will be doing in AI.

When you have built some knowledge of programming, it will be time to get your hands on a real AI book. I would recommend that you start with the Deeplearningbook. This book requires that you have some good knowledge of mathematics, but its mathematical preliminaries sections go into the details of the math that is required for deep learning, which is the foundation of modern AI.

If you have no knowledge of math at all then you have to do some warmup before reading this book. Understanding calculus is a good thing because the core algorithm of deep learning, backpropagation, uses the concept of derivatives. For a mild introduction to calculus, I recommend Khan Academy. But if you want a good college-level treatment then I recommend MIT OCW.
In the Deeplearningbook, the treatment of linear algebra and probability is sufficient.
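To show why derivatives matter for backpropagation, here is a minimal sketch for a single neuron. It assumes a sigmoid activation and a squared-error loss, which are my choices for the illustration and not the only options, and it checks the chain-rule gradient against a numerical estimate. All names are made up for this example.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def loss(w, b, x, y):
    """Squared error of a single sigmoid neuron on one example."""
    return (sigmoid(w * x + b) - y) ** 2

def backprop(w, b, x, y):
    """Gradient of the loss with respect to w and b via the chain rule."""
    # Forward pass: keep the intermediate values.
    z = w * x + b
    a = sigmoid(z)
    # Backward pass: multiply local derivatives from the loss back to the inputs.
    dloss_da = 2.0 * (a - y)
    da_dz = a * (1.0 - a)
    dloss_dz = dloss_da * da_dz
    return dloss_dz * x, dloss_dz  # (dL/dw, dL/db)

def numeric_grad(w, b, x, y, eps=1e-6):
    """Central finite differences, used only to check backprop."""
    dw = (loss(w + eps, b, x, y) - loss(w - eps, b, x, y)) / (2 * eps)
    db = (loss(w, b + eps, x, y) - loss(w, b - eps, x, y)) / (2 * eps)
    return dw, db

print("chain rule:", backprop(0.5, -0.2, 1.5, 1.0))
print("numerical: ", numeric_grad(0.5, -0.2, 1.5, 1.0))
```

The two results should agree to several decimal places. Deep learning libraries do the same bookkeeping automatically across many layers, which is why a basic grasp of derivatives and the chain rule goes a long way.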

The Wolfram Language has good documentation on how to use the platform for AI. If you are opting for Python as your first language then I would recommend getting Francois Chollet's book: Deep Learning with Python. He is the author of Keras and a Google researcher, so you know you are learning from someone who knows the library inside out.

With solid foundational knowledge, discipline and creativity, if you follow the path I have laid out then you are well on your way to AI mastery.

