Simplifying AI: The case for classification

Posted by Jakob Van De Velde

To target or not to target, that is the question. It is not the only question, but it is the one I am asked most often as a data scientist. This question can take many forms, e.g. “Which customers are going to churn in the next 3 months?”, “Which customers will buy our new product at launch?”, “Which customers need frequent support after acquisition?”. Despite the variation, these examples all boil down to the same idea: “How likely is it that my customer will do X?”. So how do data scientists answer this kind of question?

Contrary to popular belief, we do not possess fortune-telling abilities, but we do have the next best thing: computers and statistics. With these tools, we build and train classification models. For anyone still suffering from math-related PTSD, there is no need to start shaking, as this blog gives an (almost) numbers-free explanation. To really reap the benefits of classification models you just need the answers to four questions:

  1. What do they give?
  2. What do they need?
  3. How do they work?
  4. How do they integrate with your business?

What they give: Probabilities

The classification model, once properly configured, will answer our core question with probabilities. To really understand how that works, let’s work through an example of customer churn, also known as customer loss. Using the power of fast computing, we can create a statistical model that predicts for each customer how likely they are to churn. For example, we could learn that our customer Mary only has a 5% chance of churning, but John needs to be contacted as soon as possible, as his chance of churning is 88%.


Fig 1. Say hello to Mary and John, one a loyal customer and the other ready to leave.

This information can be of great value to any kind of customer retention efforts. Fortunately for me, I have never seen any statistical model that pulls these numbers out of thin air. I would be unemployed rather quickly if those existed. We need to allow our model to learn from the past by giving it the right data.

What they need: Data, data, data …

The most common way to gather information is by taking the historical data we have on our customers. In our churn example we look at data from the past year and divide customers into two groups, those that churned by the end of that year vs those that remained loyal customers. For all of them we look up the information we have at our disposal. This information often consists of some standard demographics (e.g. age, gender & address info) combined with specific information related to your business (e.g. annual spend, customer lifetime & number of support calls). When gathering data, we always try to involve those with extensive domain knowledge. They might identify some additional customer info that could influence their churn chance (e.g. customer installed an app with known performance issues). Consolidating all this information, we now have a set of features (i.e. customer information) and a label (churn/no churn) for every customer. This is exactly the type of data we can feed to our model.
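To make the idea of features and a label concrete, here is a minimal sketch in Python. The field names and the four records are made up purely for illustration; they are not real customer data, and your own business will have its own set of features.

```python
from dataclasses import dataclass


@dataclass
class CustomerRecord:
    """One historical customer: three features plus the churn label."""
    age: int
    lifetime_years: float
    support_calls_last_month: int
    churned: bool  # the label: did this customer leave by the end of the year?


# Hypothetical records, invented for this example only.
history = [
    CustomerRecord(age=64, lifetime_years=9.0, support_calls_last_month=0, churned=False),
    CustomerRecord(age=45, lifetime_years=6.0, support_calls_last_month=1, churned=False),
    CustomerRecord(age=29, lifetime_years=1.0, support_calls_last_month=5, churned=True),
    CustomerRecord(age=24, lifetime_years=0.5, support_calls_last_month=7, churned=True),
]


def to_features_and_labels(records):
    """Split records into a feature table and a matching list of 0/1 labels."""
    features = [
        [r.age, r.lifetime_years, r.support_calls_last_month] for r in records
    ]
    labels = [1 if r.churned else 0 for r in records]
    return features, labels


features, labels = to_features_and_labels(history)
```

Each row in `features` lines up with one entry in `labels`, which is the shape most classification algorithms expect as input.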

How they work: Pattern learning

There are dozens of different algorithms, all capable of solving our classification problem with their own unique approach (e.g. Logistic regression, Naïve Bayes, Gradient boosting, Random forests …). Depending on the amount and type of available data, one approach might be preferable over the other. For now (elegantly jumping over the rabbit hole here), all you need to know is that all these algorithms have the same goal: “find the best way of combining customer features, so that they reliably predict the chance of churn.” In essence, training a model is a pattern finding exercise. For example, our model could learn the following patterns:

(a) older clients are less likely to churn, (b) a sudden increase in customer support calls often precedes churn, and (c) the chance of churning decreases as customer lifetime increases.

Given these patterns, let us look at Mary & John again. Mary (66) has been a loyal customer for the past ten years and has never once contacted our service centre, while John (27) was acquired 9 months ago and already called customer support 6 times last month. With what you know now, you can draw the same (albeit less numerical) conclusion as the model we created: John = high risk, Mary = low risk.
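For the curious, the pattern-finding exercise can be sketched in a few lines of plain Python. This is only one possible approach: a bare-bones logistic regression (one of the algorithms named above) trained with gradient descent. The training records, the scaling constants, and the function names are all assumptions made for this illustration, and a real project would normally rely on an established library instead.

```python
import math

# Hypothetical history: (age, lifetime in years, support calls last month), churned?
training = [
    ((64, 9.0, 0), 0), ((58, 7.5, 1), 0), ((45, 6.0, 0), 0),
    ((52, 4.0, 1), 0), ((38, 5.0, 2), 0),
    ((29, 1.0, 5), 1), ((24, 0.5, 7), 1), ((33, 1.5, 4), 1),
    ((41, 2.0, 6), 1), ((26, 0.8, 3), 1),
]

# Rough divisors (an assumption) that bring every feature to about 0..1.
SCALE = (100.0, 10.0, 10.0)


def scale(row):
    return [value / divisor for value, divisor in zip(row, SCALE)]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def churn_probability(row, weights, bias):
    """Combine the scaled features into a single probability of churn."""
    z = bias + sum(w * v for w, v in zip(weights, scale(row)))
    return sigmoid(z)


def train(data, epochs=5000, learning_rate=0.5):
    """Batch gradient descent on the logistic loss."""
    weights, bias = [0.0, 0.0, 0.0], 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0, 0.0, 0.0], 0.0
        for row, label in data:
            error = churn_probability(row, weights, bias) - label
            for i, value in enumerate(scale(row)):
                grad_w[i] += error * value
            grad_b += error
        n = len(data)
        weights = [w - learning_rate * g / n for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b / n
    return weights, bias


weights, bias = train(training)

# Score Mary and John using the details given in the text.
p_mary = churn_probability((66, 10.0, 0), weights, bias)
p_john = churn_probability((27, 0.75, 6), weights, bias)
print(f"Mary: {p_mary:.0%}, John: {p_john:.0%}")
```

The exact percentages depend entirely on the invented training data, so they will not match the 5% and 88% quoted earlier. What matters is the ordering: the learned weights should place John well above Mary.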


Fig 2. Here we see John and Mary again. Don’t let John’s smile fool you, he’s furious on every support call.

How they integrate: From probability to strategy

In a production environment, our model can do this operation on a grand scale and return a churn probability for every customer in your database. Crucially, this process can be repeated as often as you like (e.g. once a month) to make sure you have up-to-date scores. Knowing which customers are at risk of churning, we can proactively start retention efforts based on our model output. Just like there are different algorithms, there are different ways of approaching this. For example, below are two very different retention strategies:

  1. A dedicated customer care team handling support calls of all customers with churn risk higher than 90%.
  2. A retention promo to the 10% of customers who are most at risk.
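The two strategies above select customers in different ways: the first uses a fixed probability cut-off, the second a fixed share of the customer base. A small sketch, assuming we already have a churn score per customer (the names and scores below are hypothetical, apart from Mary and John):

```python
import math

# Hypothetical model output: churn probability per customer.
scores = {
    "Mary": 0.05, "John": 0.88, "Ann": 0.93, "Tom": 0.91, "Lea": 0.40,
    "Sam": 0.12, "Eva": 0.67, "Ben": 0.22, "Zoe": 0.35, "Kim": 0.08,
}


def above_threshold(scores, threshold=0.90):
    """Strategy 1: everyone whose churn risk is higher than the threshold."""
    return sorted(name for name, p in scores.items() if p > threshold)


def top_fraction(scores, fraction=0.10):
    """Strategy 2: the given share of customers with the highest churn risk."""
    count = max(1, math.ceil(len(scores) * fraction))
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:count]


care_team_list = above_threshold(scores)  # fixed cut-off, variable group size
promo_list = top_fraction(scores)         # fixed group size, variable cut-off
```

With a threshold, the size of the group changes from month to month; with a top percentage, the group size is fixed but the risk level of the last customer included changes. Which trade-off suits you depends on your resources.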

A good strategy is always tailored to your business. Depending on your available resources and your customers' reasons for churning, one approach might work better than another. This last step warrants extra attention. A model might be well trained and performant, but its success will ultimately depend on how well it is integrated into your interactions with the customer. Anything less than that is simply a waste of valuable resources.

As promised, the amount of numerical hocus-pocus in this post has been kept to a bare minimum. Nevertheless, you now have a basic understanding of classification models. You know they provide us with probabilities, telling us how likely it is that our customers will show a certain behaviour. To create these insights, they need historical data on your customers to identify relevant patterns. In the end, these insights are only useful when followed by a strategy that fits your business. That knowledge is enough to start leveraging the power of data science for your business.

“Data is a key, unlocking the path to success. But it always requires a guiding hand to open the right doors.” Me