What a CEO needs to know about Machine Learning algorithms

During my first project at McKinsey in 2011, I served the CEO of a bank on his small business strategy. I wanted to run a linear regression on the bank’s data, but my boss told me: “Don’t do it. They don’t understand statistics.” (We did not use Machine Learning but, seven years down the road, I still believe we developed the right strategy.)


Artificial Intelligence is the most general-purpose technology of our time. New products and processes are being developed thanks to better vision systems, speech recognition technologies or recommendation engines based on Machine Learning. In fact, most recent advances in Artificial Intelligence have been achieved in the area of Machine Learning.

Long before McKinsey, in 2004, I started my career as a mobile software developer. At that time I had to write precise instructions for every step of my code. Developing the voice recognition system of today’s phones would have been tedious and error-prone back then. It would have required literally hundreds of thousands of detailed instructions to codify every single step, including identifying phonemes from sound waves, grouping them into phonetic words, looking them up in a phonetic dictionary, codifying the meaning with predefined blocks, identifying an answer to the question through a gigantic predefined semantic decision tree …

Building Amazon Alexa back in 2004 would have been impossible. And it was impossible: Alexa was released in 2014. Machine Learning is a technological breakthrough that allows developers to write software which learns how to solve a problem without being given step-by-step instructions. It is a revolution.

There are three major types of Machine Learning methodologies:

  • Supervised learning: the machine learns by looking at examples which are already solved. This is how you prepared for your math exams in high school.
  • Unsupervised learning: the machine learns by identifying similarities and differences in patterns. This is how you learned that a tree is different from a cat or a house.
  • Reinforcement learning: the machine learns by trying to maximize a reward. This is how you learned to play Super Mario by trial and error.


Supervised learning: how you prepared for math exams in high school by reviewing already solved problems


There are many teaching methods, and different teachers use different approaches. Some prefer to teach theory, others prefer to encourage students to practice. One method I found particularly useful in high school was studying already solved math problems in order to learn how to solve similar ones.

Supervised learning is a collection of algorithms that do exactly that. They can learn how to solve a problem by looking at a lot of examples of the correct answer to that particular problem. This process is called training the algorithm. The solution to the problem can be either categorical or continuous:

  • A machine can learn to infer which future customers will default on their loans and which ones will not, by looking at characteristics of historical customers and their loans, for which we already know the outcome. This is a categorical outcome: default or no default?
  • A machine can learn to estimate the price of a new apartment based on a historical data set containing transacted prices and characteristics of apartments such as size, number of rooms or post code. This is a continuous outcome: what price exactly?

There are many kinds of supervised algorithms, but most of them are variants of decision trees, regressions, support vector machines and neural networks, which include deep learning.

A decision tree is a flowchart-like structure in which tests on different attributes are made one after another until a final decision is reached. Simple decision trees are highly interpretable and are widely used by managers. But when the machine tests thousands of attributes over hundreds of branches, things can become too complex for a manager to follow. Some examples of these more advanced tree-based algorithms are random forests, which improve on the accuracy of a single decision tree by averaging many decision trees, and gradient-boosted trees, which also use many decision trees but build them sequentially, so that each tree tries to correct the errors of the previous one.
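As a rough illustration of the flowchart idea, the sketch below first hand-writes a tiny decision tree for the loan-default example and then learns the simplest possible tree, a single test on one attribute, from a handful of records. The records, field names and thresholds are all invented for illustration.

```python
# Hypothetical loan records: (annual_income_k, missed_payments, defaulted).
LOANS = [
    (25, 3, True),
    (30, 0, False),
    (45, 2, True),
    (60, 0, False),
    (80, 1, False),
    (20, 4, True),
]


def hand_written_tree(income_k, missed_payments):
    """A decision tree is a series of nested tests, like a flowchart."""
    if missed_payments >= 2:
        return True           # predict default
    if income_k < 22:
        return True           # predict default
    return False              # predict no default


def learn_stump(records, feature_index):
    """Find the threshold t for which the test 'feature >= t' makes the
    fewest mistakes on the historical records."""
    best_threshold, best_errors = None, len(records) + 1
    for candidate in sorted({r[feature_index] for r in records}):
        errors = sum((r[feature_index] >= candidate) != r[2] for r in records)
        if errors < best_errors:
            best_threshold, best_errors = candidate, errors
    return best_threshold, best_errors


threshold, errors = learn_stump(LOANS, feature_index=1)
print(f"Learned rule: default if missed_payments >= {threshold} ({errors} errors)")
```

A real tree learner repeats this search at every branch and over every attribute; random forests and gradient boosting then combine many such trees.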

Regressions are a very popular family of supervised algorithms. Due to their simplicity and numerous applications in the field of economics, they were the core part of the statistics curriculum during my MBA at Chicago Booth.

Regressions try to fit a formula with a certain shape to your data. For example, if you think there is a linear relation between your sales and independent factors such as competitors’ prices, your advertising spend, promotions or even the weather, you can train a linear regression algorithm with historical data for which you already know your sales. The algorithm will find the exact parameters of the linear formula, which can then be used to predict future sales. There are linear, quadratic, exponential and logarithmic regressions, and many others. A very useful kind of regression is logistic regression, which is based on a simple exponential formula and predicts a binary output instead of a continuous one: will my customer churn or not, based on all the data I have about her?
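To make the idea of “finding the parameters of the formula” concrete, here is a minimal sketch that fits a straight line to made-up sales data using ordinary least squares with a single factor. The numbers are invented, and the churn weights at the end are picked by hand rather than fitted; they only show how logistic regression squashes a linear formula into a probability.

```python
import math

# Hypothetical history: advertising spend and resulting sales, both in k$.
ad_spend = [10, 20, 30, 40, 50]
sales = [25, 45, 65, 85, 105]   # constructed as exactly 5 + 2 * ad_spend


def fit_line(xs, ys):
    """Ordinary least squares with one factor: returns (intercept, slope)."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    return mean_y - slope * mean_x, slope


def logistic(z):
    """Squash any number into a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


intercept, slope = fit_line(ad_spend, sales)
print(f"sales = {intercept:.1f} + {slope:.1f} * ad_spend")
print(f"predicted sales at 60k spend: {intercept + slope * 60:.1f}")

# Logistic regression uses the same kind of linear formula, then squashes it.
# The weights -2.0 and 0.5 are invented for illustration, not fitted.
churn_probability = logistic(-2.0 + 0.5 * 6)   # e.g. a customer with 6 complaints
print(f"churn probability: {churn_probability:.2f}")
```

With several factors the principle is the same; the algorithm simply has more parameters to find.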

Support vector machines are typically, but not always, used for categorical problems, for example identifying the gender of the person in a picture, or whether a customer is price sensitive. Imagine you could represent your customers on a two-dimensional graph in such a way that price-sensitive customers are above a certain boundary line and non-sensitive customers are below it. A support vector machine does exactly this, using hundreds of dimensions and boundaries of any shape.
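The boundary-line picture can be sketched in a few lines. The code below is not a support vector machine solver: it only compares two hand-picked candidate lines and keeps the one that leaves the widest gap (the “margin”) to the closest customer, which is the quantity a linear support vector machine maximises. The customers, features and candidate lines are all hypothetical.

```python
import math

# Hypothetical customers: (discount_usage, basket_size) and a label,
# +1 = price sensitive, -1 = not price sensitive. All values are invented.
CUSTOMERS = [
    ((1.0, 1.0), -1), ((2.0, 1.0), -1), ((1.0, 2.0), -1),
    ((5.0, 5.0), +1), ((6.0, 5.0), +1), ((5.0, 6.0), +1),
]


def score(point, weights, bias):
    """Positive on one side of the line w.x + b = 0, negative on the other."""
    return sum(w * x for w, x in zip(weights, point)) + bias


def margin(weights, bias):
    """Distance from the boundary to the closest customer (negative if any
    customer ends up on the wrong side)."""
    norm = math.hypot(*weights)
    return min(label * score(p, weights, bias) / norm for p, label in CUSTOMERS)


# Two candidate boundary lines that both separate the two groups.
candidates = [((1.0, 1.0), -4.0), ((1.0, 1.0), -6.5)]
best_weights, best_bias = max(candidates, key=lambda c: margin(*c))


def is_price_sensitive(point):
    """Classify a new customer by the side of the chosen boundary."""
    return score(point, best_weights, best_bias) > 0


print(best_weights, best_bias, round(margin(best_weights, best_bias), 3))
```

A real implementation searches over all possible boundaries rather than two candidates, and can bend the boundary into other shapes.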

The last and most interesting kind of supervised learning algorithm is the neural network. When I studied engineering, my professor used neural networks to recognize characters on license plates and paper ballots.

Neural networks try to replicate how our brain works. In simple terms, neurons are organized in layers in such a way that each neuron receives input from neurons in the previous layer and passes its output to neurons in the next layer. The final aggregated output after many layers is the output of the algorithm. When you train your neural network with a lot of examples, the parameters of the formulas that dictate how the output of each neuron is calculated settle on the right values, which leads to the right predictions. See below an example of face recognition.


Simple neural networks have evolved into deep learning networks, which have a significant advantage over earlier algorithms. The performance of simple networks improves as the number of examples in the training data set increases but only up to a point after which additional data does not lead to better predictions. Deep learning does not seem to have this limitation.
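To make the layer-by-layer computation concrete, here is a minimal sketch of a network with one hidden layer of two neurons. The weights are set by hand, not learned, and are chosen so that the network outputs 1 when exactly one of its two inputs is 1. Training a network means finding such weights automatically from examples.

```python
def relu(value):
    """A common neuron activation: pass positive values, block negative ones."""
    return max(0.0, value)


def layer(inputs, weights, biases, activation):
    """Each neuron computes a weighted sum of the previous layer's outputs,
    adds its bias and applies the activation function."""
    return [
        activation(sum(w * x for w, x in zip(neuron_weights, inputs)) + bias)
        for neuron_weights, bias in zip(weights, biases)
    ]


# Hand-picked parameters (assumptions for illustration, not trained values).
HIDDEN_WEIGHTS = [[1.0, 1.0], [1.0, 1.0]]
HIDDEN_BIASES = [0.0, -1.0]
OUTPUT_WEIGHTS = [[1.0, -2.0]]
OUTPUT_BIASES = [0.0]


def predict(x1, x2):
    hidden = layer([x1, x2], HIDDEN_WEIGHTS, HIDDEN_BIASES, relu)
    output = layer(hidden, OUTPUT_WEIGHTS, OUTPUT_BIASES, lambda v: v)
    return output[0]


for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(a, b, "->", predict(a, b))
```

Deep learning networks follow the same pattern with many more layers and neurons per layer.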

There are two main kinds of deep learning networks: those which can remember the context and those which cannot:

  • In those neural networks which cannot remember the context, each prediction is independent of former predictions. They are called convolutional neural networks and are typically used for image processing, for example to diagnose a disease such as cancer from a medical image, to find your logo on social media, or to detect defective products from images of your production line.
  • But some problems require the neural network to remember the context, because the value of each prediction depends on previous inputs or previous outputs. These neural networks are called recurrent neural networks and are typically used for applications that involve sequences, such as language translation, chatbots that can address customers’ inquiries, or identifying a series of transactions that constitutes fraudulent behavior.
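The difference between remembering and not remembering the context can be shown with a single neuron. In this sketch, which uses hand-picked weights and is not a full convolutional or recurrent network, the stateless unit always answers the same way for the same input, while the recurrent unit carries a hidden state forward so that the same final input produces different outputs depending on what came before.

```python
import math


def stateless_unit(x, weight=1.5):
    """No memory: the output depends on the current input only."""
    return math.tanh(weight * x)


def recurrent_unit(inputs, input_weight=1.5, memory_weight=0.8):
    """Carries a hidden state forward, so earlier inputs shape later outputs."""
    state = 0.0
    outputs = []
    for x in inputs:
        state = math.tanh(input_weight * x + memory_weight * state)
        outputs.append(state)
    return outputs


quiet_history = recurrent_unit([0.0, 0.0, 1.0])
busy_history = recurrent_unit([1.0, 1.0, 1.0])

print(stateless_unit(1.0))                    # same answer whatever came before
print(quiet_history[-1], busy_history[-1])    # same last input, different outputs
```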

In addition to the algorithms themselves, a number of mathematical disciplines are often used in machine learning, both supervised and unsupervised, for example time series analysis and graph theory:

  • Time series analysis allows us to understand sequences of events and predict future values or to detect anomalies. Daily sales, the stock market and human language are time series.
  • Graph theory allows us to analyse and model pairwise relations between objects, such as how we interact with our Facebook friends.
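Both disciplines can be illustrated in a few lines. The sketch below flags a day whose sales are far from the average of the preceding days, and represents a small friendship network as a graph to find mutual friends. The sales figures, the names and the three-standard-deviation rule are assumptions made for the example.

```python
from statistics import mean, stdev

# Hypothetical daily sales with one unusual day.
daily_sales = [100, 102, 98, 101, 99, 103, 100, 180, 101, 97]


def find_anomalies(series, window=5, threshold=3.0):
    """Flag positions whose value is more than `threshold` standard deviations
    away from the mean of the preceding `window` values."""
    anomalies = []
    for i in range(window, len(series)):
        history = series[i - window:i]
        spread = stdev(history)
        if spread > 0 and abs(series[i] - mean(history)) > threshold * spread:
            anomalies.append(i)
    return anomalies


# A graph stored as "who is connected to whom" (hypothetical people).
friends = {
    "Ana": {"Ben", "Chloe"},
    "Ben": {"Ana", "Chloe", "Dev"},
    "Chloe": {"Ana", "Ben"},
    "Dev": {"Ben"},
}


def mutual_friends(a, b):
    """People connected to both a and b."""
    return friends[a] & friends[b]


print(find_anomalies(daily_sales))
print(mutual_friends("Ana", "Dev"))
```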

Unsupervised learning: how you tell trees apart from houses or cats

Humans are excellent unsupervised learners. We can learn to tell the difference between a cat, a tree and a house with little or no training, just by looking at many of them and inferring that they are different things. But it is difficult to build a machine that works this way. For the time being, unsupervised Machine Learning is far less developed than supervised Machine Learning. When unsupervised learning becomes effective and widespread, it will be a quantum leap for Artificial Intelligence.



Unsupervised learning is typically used when you do not know how to classify or cluster your data and you need the algorithm to find patterns and classify the data for you. These algorithms are used, for example, to segment customers or to recommend a product to customers based on these segments.

There are two main kinds of unsupervised clustering algorithms: those which require you to know a priori how many clusters you need and those which do not. For example, if you know you want one hundred customer segments, algorithms such as K-means clustering or Gaussian mixture models will group your customers into one hundred clusters in such a way that the clusters are as different as possible from each other and the elements inside each cluster are as similar as possible to each other. Otherwise, if you don’t know the number of clusters, algorithms such as hierarchical clustering will build a tree of nested clusters, which you can cut at whichever level gives a useful number of segments.
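The K-means idea fits in a short sketch. The version below is deliberately naive: it starts from the first k points instead of a random or smarter initialisation, runs a fixed number of rounds, and uses six invented customers described by two made-up numbers each.

```python
def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_point(points):
    """Centre of a group of points, coordinate by coordinate."""
    return tuple(sum(coords) / len(points) for coords in zip(*points))


def k_means(points, k, iterations=10):
    """Alternate between assigning each point to its nearest centre and
    moving each centre to the middle of the points assigned to it."""
    centroids = list(points[:k])          # naive start: the first k points
    clusters = []
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for point in points:
            nearest = min(range(k), key=lambda c: squared_distance(point, centroids[c]))
            clusters[nearest].append(point)
        centroids = [
            mean_point(cluster) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    return centroids, clusters


# Hypothetical customers: (visits_per_month, average_basket).
CUSTOMERS = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]

centroids, clusters = k_means(CUSTOMERS, k=2)
print(centroids)
print(clusters)
```

You only tell the algorithm how many segments you want; it decides on its own which customers belong together.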

Beyond clustering, unsupervised learning can also be used for feature extraction or anomaly detection.

Reinforcement learning: how you learned to play Super Mario by trial and error


Some of my friends in the 80s and 90s managed to get to the final screen of video games such as Mario or Sonic the Hedgehog in a matter of days, just by playing for hours and hours. This is reinforcement learning: trial and error.

Sometimes you do not have a lot of data and the only way you can learn about the environment is by interacting with it. This is when the third kind of Machine Learning algorithm comes in handy. Reinforcement learning algorithms learn to perform a task simply by doing it and by trying to maximise a reward. You will need to specify the rules of the game to the reinforcement learning machine: the current state and the goal, the list of allowable actions and the constraints on those actions, as well as the reward system.
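The ingredients listed above (state, goal, allowable actions, constraints and reward) can be seen in a toy example. The sketch below uses tabular Q-learning, one common reinforcement learning method, on an invented game: an agent in a five-cell corridor is rewarded only when it reaches the last cell. The learning rate, discount factor and number of episodes are arbitrary choices, and the agent explores purely at random, whereas a practical system would gradually favour the actions it has found to work.

```python
import random

N_STATES = 5              # positions 0..4 in a corridor; the reward sits at 4
GOAL = N_STATES - 1
ACTIONS = (-1, +1)        # allowable actions: step left or step right
ALPHA, GAMMA = 0.5, 0.9   # learning rate and discount factor (assumed values)


def step(state, action):
    """Rules of the game: move, stay inside the corridor, reward at the goal."""
    next_state = min(max(state + action, 0), GOAL)
    reward = 1.0 if next_state == GOAL else 0.0
    return next_state, reward


def train(episodes=500, seed=0):
    """Learn the value of each (state, action) pair by trial and error."""
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _ in range(episodes):
        state = 0
        while state != GOAL:
            action = rng.choice(ACTIONS)                  # try something
            next_state, reward = step(state, action)
            best_next = max(q[(next_state, a)] for a in ACTIONS)
            # Nudge the estimate towards reward plus discounted future value.
            q[(state, action)] += ALPHA * (reward + GAMMA * best_next - q[(state, action)])
            state = next_state
    return q


q_table = train()
# The learned policy: in each non-goal cell, take the action valued highest.
policy = [max(ACTIONS, key=lambda a: q_table[(s, a)]) for s in range(GOAL)]
print(policy)
```

Nobody tells the agent that walking right is the answer; it discovers this from the reward alone.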

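For example, DeepMind’s AlphaGo used reinforcement learning, together with deep neural networks, to master the game of Go and defeated the top professional player Lee Sedol in 2016. (IBM’s Deep Blue, which defeated the world chess champion Garry Kasparov in 1997, relied on brute-force search and hand-crafted evaluation rules rather than on learning.) Other applications of reinforcement learning include war-gaming (for example optimizing your trading strategy), optimizing the driving behavior of self-driving cars, optimizing pricing in real time, and many others. Neural networks can also be used within reinforcement learning algorithms.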

The most complex business problems I have been confronted with required reinforcement learning: for example managing exceptions in the maritime industry.

“Captain Kapoor is steering a 1,200-foot-long vessel loaded with 10 thousand containers from Manzanillo (Mexico) to Shanghai at 17 knots across the Pacific Ocean. Captain Kapoor starts playing with the tip of his long black moustache: the weather report is bad. Storm ahead. Despite his 20 years of service at sea, he cannot avoid a slight feeling of nervousness. He not only needs to deliver the ship, the cargo and the crew safely, but also needs to make the most economical decision for the shipping company when an exception like this storm occurs. The cost of any decision is magnified by the huge size of the vessel and can reach millions of dollars. Captain Kapoor is an old sea dog, not a businessman!

Shall I speed up and call Shanghai as planned? But the faster I sail, the more bunker I consume. Shall I call my next port, Hong Kong, first and call Shanghai afterwards? But Golden Week in China is coming soon. Shall I just cancel Shanghai? How much will the penalties for delay be? Or can the company arrange a smaller ship from my next port to Shanghai? How many containers are due in Shanghai? Are they all full? Shall I just slow down and keep the course of the vessel?

Fortunately, Captain Kapoor’s ship is equipped with a reinforcement learning system called Dooriya (a fictional name), which simulates hundreds of options and recommends the safe option that leads to the lowest extra cost for the company. Dooriya concludes that the best option is to skip Shanghai and to call Shenzhen instead, then to call Hong Kong. The regional headquarters in Singapore agrees with Dooriya. This time, there were not many containers for Shanghai, and alternative transport can be shared with another company. Captain Kapoor orders the crew to change course south of the storm toward Shenzhen. Although he will never admit it, the Captain breathes out, reassured.”


Machines can be exceptionally good, sometimes orders of magnitude better than humans, at very specific tasks, such as identifying cancer from a medical image. But the fact that a computer can detect cancer more accurately than a doctor does not mean the computer understands what cancer is, how it develops, how to stop it or what it means for humans. A machine’s narrow understanding of a very specific problem does not involve broader intelligence. At least not yet. This is the biggest source of confusion about Artificial Intelligence.

Disclaimer: Opinions in the article do not represent the ones endorsed by the author’s employer.


Image Credits:

  • Human with a green binary background, from http://www.networkworld.com
  • Math Teacher from gettingsmart.com
  • Image recognition through deep learning from edureka.com, Ashish Bakshi
  • Super Mario from thenextweb.com
  • Cat, tree and house from decorreport.com
