The best language for data science

If I had not seen Wolfram language, I would have said that LISP is the best language for data science. But thankfully, Stephen Wolfram sought a general method for dealing with complicated integrals and built Wolfram language to do so. It turns out that this symbolic language can do more than math. After all, math is the lingua franca of science, which is our best method for describing the world, so any language that lets you do math in the most efficient way, without adding too much "fat", would be the best language.

In this post we focus on data science because it is the most interesting application of computation in recent times. In the introductory paragraph I said that if I didn't know about Wolfram language I would choose LISP. But why?

LISP was invented for AI research, and it was a simple language whose goal was to process lists (its name comes from "list processing"). Although it was built on the fundamentals of computing, and could therefore express the solution to any computational problem regardless of its use for AI, the intention behind its creation was to do AI, and it sought to do this by manipulating list structures.

On the other side of the equation was FORTRAN (short for Formula Translator), which was designed specifically for scientific computing. LISP failed to deliver the AI dream and, despite possessing very powerful ideas, was abandoned by the mainstream, partly because it hogged a lot of memory. FORTRAN won the day! Most mainstream languages, like C (the ancestor of C++), Java, and Python, are based on the kind of thinking that FORTRAN fostered. I am not ignoring other influences on C, like its direct ancestor B; I am focusing on the fact that the programming paradigm FORTRAN represented was passed on to its descendants.

LISP failed to give us AI. This was not for lack of sophistication in the language itself: the problem of AI was not well understood at the time, so any attempt was doomed to fail regardless of how sophisticated the tooling was. Still, LISP gave us a very powerful data structure, the list.

In the days of C, the array was the primary container, and a string was simply a pointer to the beginning of a null-terminated character array. You built lists by performing some magic with pointers and structs. Other things, like sound and images, could be handled as byte arrays. Lists were not a fundamental part of the language, but they could be built on demand for problems that needed them.
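To make the pointers-and-structs approach concrete, here is a minimal Python sketch that mimics the shape of a C singly linked list. Each node holds a value and a reference to the next node, just as a C struct would hold a value and a next pointer. The names Node, push_front and to_python_list are hypothetical and chosen for illustration. In C you would also have to allocate and free every node by hand, which Python's garbage collector does for you here.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class Node:
    """One cell of a singly linked list, like a C struct holding a value and a next pointer."""
    value: Any
    next: Optional["Node"] = None


def push_front(head: Optional[Node], value: Any) -> Node:
    """Return a new head node whose next field points at the old head."""
    return Node(value, head)


def to_python_list(head: Optional[Node]) -> list:
    """Walk the chain of next references, collecting each value in order."""
    out = []
    node = head
    while node is not None:
        out.append(node.value)
        node = node.next
    return out


# Build the list 1 -> 2 -> 3 by pushing onto the front in reverse order.
head = None
for v in (3, 2, 1):
    head = push_front(head, v)

print(to_python_list(head))  # [1, 2, 3]
```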

C did what it did very well and enabled us to build other programming languages, like Java and Python, and, most importantly, operating systems. Java and Python later gave us lists as part of the language itself. Java used its class system to give us collections such as ArrayList, a generic, resizable array that can act like a list on demand. In Python, though, the list was more explicit: it is a built-in type and one of the primitives of the language. It was the presence of lists in Python by default that made many people fall in love with the language.

The power of lists is that they let you create data structures very easily. The list serves as a container for the data, and the code that operates on the list gives you an interface. By defining routines that let you interact with the data in a certain way, you can create data structures suited to solving different kinds of algorithmic problems.
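Here is a small illustration of a list plus a few routines turning into a data structure. The Python sketch below wraps a built-in list in a stack interface. Only push, pop and peek are exposed, so the list behaves as a last-in, first-out container. The Stack class is a hypothetical example and does not come from any library.

```python
class Stack:
    """A last-in, first-out stack whose storage is a plain Python list."""

    def __init__(self):
        self._items = []  # the list is the container

    def push(self, item):
        self._items.append(item)  # add to the top (the end of the list)

    def pop(self):
        if not self._items:
            raise IndexError("pop from empty stack")
        return self._items.pop()  # remove and return the top item

    def peek(self):
        if not self._items:
            raise IndexError("peek at empty stack")
        return self._items[-1]  # look at the top item without removing it

    def __len__(self):
        return len(self._items)


s = Stack()
for x in ("a", "b", "c"):
    s.push(x)
print(s.pop(), s.peek(), len(s))  # c b 2
```

The container is the same list you would use anywhere else. Only the routines around it make it a stack, and a different set of routines over the same list would give you a different data structure.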

With the advent of modern deep learning, data has come to the forefront of programming. In the era before the internet, we had a lot of code performing clever operations on little data. In the present AI age, we have far more data than any amount of code we could write. Our goal now is to write programs that operate on datasets much larger than the code itself.

This modern AI age began with the glut of data that came about as the web exploded in size. Since then, there has been increasing demand for handling kinds of data that never existed before. Databases could store data better than file systems, but using that data to solve problems requires bringing it into memory and manipulating it with some programming language. As a result, languages with very efficient container structures like lists are in greater demand.

The web initially brought a boost in text processing requirements, and languages like Perl were all the rage for a while. As data became less about just text, Perl fell out of favour.

The AI age made the data-centric age concrete. It is not just about data storage, which databases were already doing very well, or data transfer, which formats like JSON made more efficient and standardized. The analysis of raw data is the root and foundation of the modern data-centric AI/ML age.

                                                                    *   *   *

Whenever we have a large amount of raw data about anything, what we want from it is insight. Insight gives us predictive power. The more data we have, the more predictive power we have, and thus the more intelligently we can act in response to new scenarios.

Object-oriented programming arose to help us build large-scale commercial software systems. If you were into algorithm development, you were better off using a functional language. Even so, object-oriented programming became dominant, with its promise of turning everything into an object that holds some data and can perform some actions. This was especially true for graphical user interfaces and certain other classes of software.

As the data glut continued, and while data analysis was still mostly pure statistical analysis, languages with a cleaner separation between data and code became more important. Python was the natural choice because it let you switch between object-oriented and functional styles of programming. In data work, you need to keep your data separate from your code so that the code can operate neatly on the data without side effects, and purely functional languages excel at this.
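The difference between operating on data with and without side effects is easy to see in a small Python sketch. The first function mutates the list it is given. The second leaves its input alone and returns a new list, which is the style functional languages encourage. Both function names are hypothetical and serve only as illustration.

```python
def normalize_in_place(values):
    """Side-effecting version: overwrites the caller's list with scaled values."""
    total = sum(values)
    for i, v in enumerate(values):
        values[i] = v / total
    return values


def normalize(values):
    """Pure version: returns a new list and leaves the input untouched."""
    total = sum(values)
    return [v / total for v in values]


data = [1.0, 3.0]
scaled = normalize(data)
print(data, scaled)  # [1.0, 3.0] [0.25, 0.75]

normalize_in_place(data)
print(data)  # [0.25, 0.75]
```

With the pure version, the original data is still available for the next analysis step. The in-place version silently changes it for every other piece of code holding a reference to the same list.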

As we shift from pure data analysis toward AI/ML, the need for languages that can handle numerical data efficiently has increased. This is because, for most AI models to work, data must be converted into numbers before it is passed into the model.
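Making data numerical can be as simple as one-hot encoding a categorical column. The Python sketch below uses only the standard library. It maps each distinct category to a position and turns every value into a vector with a single 1 in that position. The one_hot helper and the sample colours are hypothetical. Real pipelines usually rely on a library encoder instead.

```python
def one_hot(values):
    """Encode a list of categorical values as one-hot vectors.

    Categories are sorted so that the column order is deterministic.
    Returns (categories, vectors).
    """
    categories = sorted(set(values))
    index = {c: i for i, c in enumerate(categories)}
    vectors = []
    for v in values:
        row = [0] * len(categories)
        row[index[v]] = 1  # mark the position of this value's category
        vectors.append(row)
    return categories, vectors


cats, vecs = one_hot(["red", "green", "red", "blue"])
print(cats)  # ['blue', 'green', 'red']
print(vecs)  # [[0, 0, 1], [0, 1, 0], [0, 0, 1], [1, 0, 0]]
```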

This has brought the list data type into greater dominance. Tensors (multidimensional arrays), the containers that hold the data passed into AI models, are based on lists: a tensor is just a list of lists of lists, and so on. Frameworks like TensorFlow and NumPy have efficient underlying implementations of tensors, but conceptually a tensor is still a list of lists.
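To see the list-of-lists view of a tensor, the Python sketch below builds a rank-3 tensor from plain nested lists and computes its dimensions, much as Wolfram language's Dimensions function reports them. The dimensions helper is hypothetical. It assumes the nested lists are rectangular, meaning every sublist at a given depth has the same shape, and it raises an error when that assumption fails.

```python
def dimensions(tensor):
    """Return the shape of a rectangular nested list as a tuple.

    Scalars (anything that is not a list) have shape ().
    Raises ValueError if sibling sublists have different shapes.
    """
    if not isinstance(tensor, list):
        return ()
    if not tensor:
        return (0,)
    inner = [dimensions(item) for item in tensor]
    if any(shape != inner[0] for shape in inner):
        raise ValueError("ragged nested list: not a tensor")
    return (len(tensor),) + inner[0]


# A 2 x 3 x 4 tensor: a list of 2 lists, each holding 3 lists of 4 numbers.
t = [[[i * 12 + j * 4 + k for k in range(4)] for j in range(3)] for i in range(2)]
print(dimensions(t))       # (2, 3, 4)
print(dimensions([1, 2]))  # (2,)
print(dimensions(7))       # ()
```

If NumPy is installed, numpy.array(t).shape gives the same (2, 3, 4). NumPy then stores the numbers in its own efficient buffer rather than as Python lists.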

Wolfram language, with roots in APL and LISP, fixes the problems inherent in those languages and adds a lot of powerful features. Wolfram language is symbolic, meaning that everything is a symbol. Without going into the details of the language, a symbol is a blob of anything.

A function is a symbol, a list is a symbol, a character is a symbol, and so on. How useful are symbols? You can pass them around and manipulate them through standard interfaces. Programming in Wolfram language is about manipulating symbolic expressions. This is a very powerful idea, one you cannot fully grasp until you have spent some time learning Wolfram language and unlearning some bad habits inherited from other languages.

Let's look at this idea a little more closely. A list in Wolfram language is a symbol that can hold other symbols separated by commas, as in {a, b, c}. Anything that operates on a list will just operate on it without worrying about what the list contains. The list could contain images, functions, or a symbol representing your entire datacenter.

In Wolfram language, a symbol is the most literal possible description of a thing. Symbols let things be what they are without being evaluated. You build expressions by combining symbols, and these symbolic expressions can be treated as independent blobs of stuff that you can pass around and transform.

A symbolic expression has a structure, and the power of Wolfram language comes from the ability to transform these expressions using rules. For example, x^2 + 1 /. x -> 2 replaces x with 2 and evaluates to 5. This might sound complicated, but once you get into the nitty-gritty of programming in Wolfram language, the power of these operations is enormous.
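Rule-based transformation is easier to appreciate with a toy version. The Python sketch below is only an analogy and is not how Wolfram language is implemented. An expression is a nested tuple whose first element is the head, similar to Plus[x, 0] in Wolfram language. A rule is a Python function that either rewrites a node or returns None. The rewrite helper applies the rules bottom-up and keeps repeating until nothing changes, somewhat in the spirit of ReplaceRepeated (//.). The tuple encoding and the rule names are hypothetical choices made for this illustration.

```python
def plus_zero(expr):
    """Rule: ("Plus", a, 0) -> a"""
    if isinstance(expr, tuple) and expr[0] == "Plus" and len(expr) == 3 and expr[2] == 0:
        return expr[1]
    return None


def times_one(expr):
    """Rule: ("Times", a, 1) -> a"""
    if isinstance(expr, tuple) and expr[0] == "Times" and len(expr) == 3 and expr[2] == 1:
        return expr[1]
    return None


def rewrite_once(expr, rules):
    """Rewrite the children first, then try each rule on the resulting node."""
    if isinstance(expr, tuple):
        expr = (expr[0],) + tuple(rewrite_once(arg, rules) for arg in expr[1:])
    for rule in rules:
        result = rule(expr)
        if result is not None:
            return result
    return expr


def rewrite(expr, rules):
    """Apply rewrite_once until the expression stops changing."""
    while True:
        new = rewrite_once(expr, rules)
        if new == expr:
            return new
        expr = new


# Plus[Times[x, 1], 0] simplifies to x
expr = ("Plus", ("Times", "x", 1), 0)
print(rewrite(expr, [plus_zero, times_one]))  # x
```

The rewriter never needs to know what x stands for. It only matches the structure of the expression, which is the property the paragraph above describes.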

I always tell newbies to Wolfram language that the language will lead them to programming freedom. Newbies (and I was once one) usually try to make the language work in the restricted fashion they are used to from other languages. Wolfram language, being very forgiving, will simply let you do your inefficient stuff without complaining too much. As you become adept, however, you will be able to perform feats of software magic that outshine the most adept programmers in traditional languages, and this happens very fast compared to other languages.

Wolfram language is the best language for doing data science because, apart from supporting a functional style, it is also symbolic. Being functional, it lets you neatly separate data from operations, which makes for clearer programs.

Complicated data structures like tensors need no special constructs: you can handle a tensor directly in the language itself rather than installing special libraries. The conventions you are used to in regular programming transfer well to data science, and especially to AI work. You don't need to learn a special interface to interact with data of any complexity.

Symbolic constructs let you transform data in powerful ways without having to worry about the details of memory management. If you have a 3TB file on your hard disk, all you need to do is wrap its path in a symbolic File expression and operate on your data. Thanks to the symbolic nature of the language and the mountain of automation beneath it, operations on that file will be performed in the best way possible, bringing as much data into memory as your system can handle at any point in time. You don't need to start performing tricks to deal with the memory limitations of your system.
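For contrast, here is the kind of trick the paragraph above says you otherwise have to perform by hand. In Python, processing a file far larger than memory usually means streaming it in fixed-size chunks, so that only one chunk is held in memory at a time. The count_lines helper and its 1 MiB default chunk size are hypothetical choices for this sketch.

```python
import os
import tempfile


def count_lines(path, chunk_size=1 << 20):
    """Count newline bytes in a file, reading at most chunk_size bytes at a time."""
    count = 0
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:  # an empty bytes object means end of file
                break
            count += chunk.count(b"\n")
    return count


# Small demo: write a temporary file, then count its lines with a tiny chunk size
# to show that the result does not depend on how the file is split into chunks.
with tempfile.NamedTemporaryFile("wb", delete=False) as tmp:
    tmp.write(b"alpha\nbeta\ngamma\n")
    path = tmp.name

n = count_lines(path, chunk_size=4)
os.remove(path)
print(n)  # 3
```

Memory use is bounded by chunk_size regardless of how large the file is. The programmer still has to choose the chunk size and write the loop, which is exactly the bookkeeping the author is happy to hand over to Wolfram language.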

If you want to read sensor data, you are presented with a symbolic representation of your device, and functions let you perform the operations the device accepts by operating on that representation. You can put your device into a list of devices arranged in a structure that symbolically represents a meta-device with other devices as components. You can then create rules that operate on these structures in any way you desire.

When creating neural networks, you can use pre-built nets, which are brought in with the NetModel function, and operate on them through standard interfaces. Alternatively, you can create your own computational graph, either from the raw graph primitives available in the language or with the NetGraph interface, which works at a higher level of abstraction specific to neural networks.

Another major win for doing data science in Wolfram language is its highly advanced visualization system. I have used visualization systems in other languages, and I can tell you that they are nothing compared to what I have used extensively in Wolfram language. It is not that the designers of those visualization libraries did a bad job; they are obviously very smart people and must have done their best. It is the language that limits them and forces you to use too many hacks to get something to work.

The biggest example is Matplotlib, and the same goes for other non-Wolfram visualization systems: they are very quirky to use for someone like me coming from the sophistication of Wolfram language.

When doing data science, you want to visualize a lot of stuff very quickly. You don't want to adjust your data too much to fit the requirements of a library. If some piece of data is important, getting a pictorial representation of it as fast as possible, ideally with a single line of code, is very important as you optimize your models.

As an example, below we obtain some embeddings using the pretrained GPT Transformer trained on BookCorpus data. The result is a list of lists, specifically 6 lists of 768 elements each, which we can see by calling the Dimensions function on our embedding data. To get a sense of the data, in the third line we call MatrixPlot on the embedding to obtain a visual representation.

Below are exactly two lines of code that perform style transfer from one image to another. In the first line, we call the NetModel function to retrieve a pretrained net: "AdaIN-Style Trained on MS-COCO and Painter by Numbers Data". The output of this function is a beautiful visualization of a NetGraph. You can head over to the Wolfram Cloud, try this code out now, and see the other, more interactive features you get by hovering over the individual pieces of the graph.

In the second line, we simply feed our images into the net returned by NetModel to perform the style transfer.

Below is the final output of the style transfer process.

Above is a visualization of ResNet-50, a deep neural network for image classification. I have not seen any visualization system for deep neural nets that is as beautiful as this one. If you know of one, please let me know in the comments.
