So What Is Advanced Analytics?

Estimated reading time: 18 minutes

Introduction

Advanced Analytics (AA) is commonly split into descriptive, diagnostic, predictive, and prescriptive analytics. The purpose of AA is to draw on internal and external sources to transform data into insight that leads to smarter decisions and spurs growth. Most recent analytical advances have taken place in predictive and prescriptive analytics, driven by progress in artificial intelligence and machine learning. Innovation in operations research, big data, data wrangling, ETL, and visualisation technologies has similarly contributed to the rise of advanced analytics.

Advanced analytics can therefore also be viewed as a bundle of techniques used to discover intricate relationships, recognize complex patterns, and predict future trends. A key tenet of advanced analytics is automation, driven by both intelligent and non-intelligent decision-making systems. Automated analytics is the final big stride in analytics. The graph below shows the different levels and types of analytics, from the most basic to the most sophisticated.

An AA system starts with descriptive and diagnostic analytics in an attempt to describe not only what happened but also why it happened. The next analytical step is predictive modelling, which in turn is followed by decision making and execution based on these predictions. The extent of automation increases with each successive step towards prescriptive analytics, a result of the current success of AI in narrow decision-making systems. Each of the four analytical disciplines uses statistical and mathematical techniques to achieve its goals. AA technologies can help to automate decision-making while safeguarding the quality of those decisions and their contribution to the business.

Areas of Application:

State of Development and Use:

An example of a financial services firm’s self-diagnosed AA readiness across four main areas.

Area        Descriptive  Diagnostic  Predictive  Prescriptive
Customer    85%          80%         90%         80%
Operations  85%          75%         80%         50%
People      80%          65%         70%         40%
Accounting  90%          70%         70%         20%

Customer Analytics

Data about browsing and buying patterns are everywhere. From credit card transactions and online shopping carts, to customer loyalty programs and user-generated ratings/reviews, there is a staggering amount of data that can be used to describe customers’ past buying behaviours, predict future ones, and prescribe new ways to influence future purchasing decisions.

Operations Analytics

Recent extraordinary improvements in data-collecting technologies have changed the way firms make informed and effective business decisions. Operations analytics includes modelling future demand uncertainties, predicting the outcomes of competing policy choices, and choosing the best course of action in the face of risk.

People Analytics

People analytics is a data-driven approach to managing people at work. For the first time in history, business leaders can make decisions about their people based on deep analysis of data rather than the traditional methods of personal relationships and ‘experienced’ opinion. People analytics includes the techniques used to recruit and retain great people, and the sophisticated analysis brought to bear on people-related issues such as recruiting, performance evaluation, leadership, hiring and promotion, job design, compensation, and collaboration.

Accounting Analytics

Accounting Analytics explores how financial statement data and non-financial metrics can be linked to financial performance. Accounting analytics includes the use of data to assess what drives financial performance and to forecast future financial scenarios. While many accounting and financial organizations deliver data, accounting analytics deploys that data to deliver insight. It spans many areas in which accounting data provides insight into other parts of the business, including consumer behaviour prediction, corporate strategy, risk management, optimization, and more.

Illustration of Company Specific AA Dashboard

Isolated Problems and Solutions:

Disruptive data-driven models and capabilities are reshaping some industries and could transform many more. Following is a list of characteristics that signal the potential of new advanced analytics approaches to improve the current state:

Problem                                          Analytics Type  Analysis                                Techniques
Inefficient matching of supply and demand.       Operations      Sales and Inventory Prediction          Gradient Boosting; LightGBM; Neural Network; LSTM RNN; Mechanical TS; ARIMA
Prevalence of underutilised assets.              Accounting      Ratio Analysis / Fixed Asset Analysis   Return on Total Assets; Fixed Asset Acquisition to Total Assets
Unused demographic and customer sales data.      Customer        Customer Segmentation / Psychographics  K-Means Clustering; HDBSCAN; GMM Clustering
Large unstructured behavioural data dump.        Customer        Big Data                                Random Forest; Deep Learning; Data Wrangling
Customers sign up but leave within a year.       Customer        Churn Analysis                          LightGBM; Prediction Metrics; Feature Analysis; SHAP Values; Interactions; PDP Plots; Correlation Matrix
Most employees get fired within three years.     People          Termination Analysis                    See above
Excessive employee turnover.                     People          Attrition Analysis                      See above
Debtors stop paying after six months.            Accounting      Aged Debtors Analysis                   Outlier Analysis; Descriptive Statistics; Average Payment Date
Uncertain effects lead to bad customer service.  Operations      Causal Analysis                         Causal Regression; Regression Discontinuity
Actual expenditures exceed budget.               Accounting      Budget Analysis                         Incremental Budgeting; Zero-Based Budgeting (ZBB)
Product X is outperforming product Y.            Operations      Causal Analysis                         A/B Testing; MV Testing
Some customers are costly to the business.       Customer        Customer Lifetime Value Analysis        Gradient Boosting (LightGBM); RFM Analysis
Resource constraints lead to supply shortages.   Operations      Profit Optimisation                     Constraint Programming (CP); MILP Programming
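As a small illustration of the ratio-analysis entry above, both accounting ratios can be computed directly with pandas. This is a sketch only: the figures, years, and column names below are invented for the example.

```python
import pandas as pd

# Hypothetical balance-sheet extract; all figures are illustrative only.
accounts = pd.DataFrame({
    "net_income": [120_000, 95_000, 143_000],
    "total_assets": [1_500_000, 1_480_000, 1_520_000],
    "fixed_asset_acquisitions": [60_000, 15_000, 210_000],
}, index=["2017", "2018", "2019"])

# Return on Total Assets: profit generated per unit of assets held.
accounts["roa"] = accounts["net_income"] / accounts["total_assets"]

# Fixed Asset Acquisition to Total Assets: flags unusually heavy
# (or absent) reinvestment in fixed assets.
accounts["fa_acq_to_assets"] = (
    accounts["fixed_asset_acquisitions"] / accounts["total_assets"]
)

print(accounts[["roa", "fa_acq_to_assets"]].round(3))
```

A year-on-year jump in the acquisition ratio, as in the fictitious 2019 row, is the kind of pattern an underutilised-asset review would investigate.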

Analytics and Automation

Analytics Automation:

Automated data analytics is essential for keeping track of the many sources of data modern organizations use today, ensuring data scientists don’t waste time working with bad, out-of-date, or incomplete data. With a more streamlined data analysis process, important opportunities can become apparent, introducing agility to big data analysis and, ultimately, increasing your organization’s business intelligence and competitive edge.
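As a rough sketch of what such automated checks might look like, the snippet below scans a toy pandas feed for missing, duplicated, stale, and implausible values. The column names, defect types, and cutoff date are assumptions chosen for illustration.

```python
import pandas as pd

# Toy data feed seeded with the kinds of defects automated checks catch.
feed = pd.DataFrame({
    "customer_id": [1, 2, 2, 4],
    "sales": [250.0, None, 310.0, -40.0],
    "loaded_at": pd.to_datetime(["2019-05-01", "2019-05-01",
                                 "2019-05-01", "2018-11-20"]),
})

# Each check counts rows that would need review before analysis.
issues = {
    "missing_values": int(feed["sales"].isna().sum()),
    "duplicate_ids": int(feed["customer_id"].duplicated().sum()),
    "negative_sales": int((feed["sales"] < 0).sum()),
    "stale_rows": int((feed["loaded_at"] < "2019-01-01").sum()),
}

print(issues)
```

Running such checks on every refresh, rather than on demand, is what keeps data scientists from working with bad, out-of-date, or incomplete data.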

Robotic Process Automation

Tools such as robotic process automation (RPA) allow labour-intensive, error-prone processes that formerly took days to be completed accurately in a matter of minutes.

In financial services, automation in the form of “straight-through processing,” where transaction workflows are digitized end-to-end, can increase the scalability of transaction throughput by 80%, while reducing errors by half.

Some of the experiments the FCA (2018–2019) is conducting with advanced analytics this year include:

  • Automated detection of unauthorised business activity on the internet through a variety of new technologies
  • Testing advanced Natural Language Processing (NLP) technologies and semantic language models in an effort to automate what would otherwise be manual supervisory tasks
  • Automated evaluation and detection of misleading advertising

The automated processes will allow the FCA to review the total population of high-risk markets, rather than only sampling a proportion.
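As a loose illustration of the NLP direction described above, a supervised text classifier can flag potentially misleading promotional wording. The tiny corpus and labels below are invented and bear no relation to the FCA's actual models; real supervisory systems are trained on large labelled datasets.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Invented promotional snippets; 1 = potentially misleading wording.
ads = [
    "guaranteed returns with zero risk, act now",
    "double your money in thirty days, no risk",
    "risk free profits guaranteed every month",
    "our fund invests in diversified global equities",
    "savings account with a variable interest rate",
    "mortgage advice from our qualified advisers",
]
labels = [1, 1, 1, 0, 0, 0]

# TF-IDF features feeding a logistic regression classifier.
model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(ads, labels)

flagged = model.predict(["guaranteed risk free returns"])
print(flagged)
```

At scale, a model of this shape is what lets a regulator score every advert in a high-risk market instead of sampling a handful by hand.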

Physical Robots Automation

Physical robots have been around for a long time in manufacturing, but more capable, more flexible, safer, and less expensive robots are now engaging in ever-expanding activities, combining mechanization with cognitive and learning capabilities. They improve over time as they are trained by their human co-workers on the shop floor or, increasingly, learn by themselves. Already today, a range of automation technologies is generating real value. For example, Rio Tinto has deployed automated haul trucks and drilling machines at its mines in Pilbara, Australia, and says it is seeing 10–20% increases in utilization there.

Deeper Dive

Advanced Analytics (AA) contains descriptive, diagnostic, predictive and prescriptive analytics. Following is a more in-depth discussion of each of these areas.

Descriptive Analytics (What Happened)

Descriptive analytics is the first stage of data analysis; it creates a summary of historical data to yield useful information and possibly prepare the data for further analysis. Descriptive statistics are a very important part of data analysis: they surface historical insights regarding the company’s financials, production, operations, sales, and customers. This phase consists of tables and graphs so that the user can easily interpret the information. Some of the processes carried out at this stage are described below.

Descriptive Statistics

Basic statistics are a very valuable source of information when designing a model, since they can alert us to the presence of spurious data. It is essential to check the correctness of the most important statistical measures for every single variable.

Variable               Mean   Median  Mode   25% Quartile  SD    Var     Max     Min   Range
Customer Sales         2,200  1,190   300    670           110   12,100  10,000  100   9,900
Advertising Campaigns  4,900  2,290   2,000  1,000         50    2,500   6,000   800   5,200
Extra Customers PW     10     7       4      6             1.5   2.25    30      10    20
Employee Leave Days    20     22      21     18            2     4       40      10    30
Annual Interest Rate   1.45   1.40    1.50   1.30          0.3   0.09    1.70    1.30  0.4

These numbers are fictional and are provided purely for illustration.
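Summary statistics like those in the table can be produced with a couple of pandas calls. The sample below is an invented sales series used purely to show the mechanics.

```python
import pandas as pd

# Invented sample of weekly customer sales, for illustration only.
sales = pd.Series([300, 300, 670, 1_190, 2_400, 3_500,
                   7_040, 10_000, 100, 2_500])

summary = sales.describe()          # count, mean, std, min, quartiles, max
extra = {
    "mode": sales.mode().iloc[0],   # most frequent value
    "variance": sales.var(),        # square of the standard deviation
    "range": sales.max() - sales.min(),
}
print(summary)
print(extra)
```

Running this over every variable in a dataset is a quick, automatable first pass at the kind of sanity checking described above.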

Exploratory Visualisations

For example, histograms show how the data are distributed over their entire range. In approximation problems, a roughly uniform distribution across all variables is generally desirable; if the data are very irregularly distributed, the resulting model will probably be of poor quality.

Diagnostic analytics (Why did it happen)

Diagnostic Analytics is the next level of analysis. It is a form of Advanced Analytics that is focused on determining the factors and events that contributed to the outcome. This phase consists of techniques such as calculating correlations and interpreting interactive visualizations.

Logistic correlations

In classification applications, it might be interesting to look for logistic dependencies between single input and single target variables. The logistic correlation is a numerical value between 0 and 1 that expresses the strength of the logistic relationship between a single input variable and a single output variable.

Logistic Correlation Matrix

                       Customer Sales  Advertising Campaigns  Extra Customers PW  Employee Leave Days  Annual Interest Rate
Customer Sales         1.00            0.70                   0.60                0.10                 0.05
Advertising Campaigns  0.70            1.00                   0.80                0.15                 0.07
Extra Customers PW     0.60            0.80                   1.00                0.02                 0.10
Employee Leave Days    0.10            0.15                   0.02                1.00                 0.00
Annual Interest Rate   0.05            0.07                   0.10                0.00                 1.00

Logistic Correlation Matrix Heatmap
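A matrix of this shape can be computed with pandas' `corr` method. Note that pandas computes Pearson correlations by default, used here as a stand-in for the logistic correlations above; the data below are synthetic and only mimic the variables in the table.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)
n = 200

# Synthetic stand-ins for the variables above (figures are illustrative):
# sales partly driven by campaigns, leave days unrelated to either.
campaigns = rng.normal(4_900, 500, n)
sales = 0.8 * campaigns + rng.normal(0, 400, n)
leave_days = rng.normal(20, 2, n)

df = pd.DataFrame({
    "customer_sales": sales,
    "advertising_campaigns": campaigns,
    "employee_leave_days": leave_days,
})

# Pearson correlation matrix; the diagonal is always 1.0.
corr = df.corr()
print(corr.round(2))
```

The off-diagonal cells reproduce the structure of the matrix above: a strong sales-to-campaigns relationship and near-zero correlation with leave days.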

Scatter plot

This technique plots graphs of inputs versus targets. These charts can help to reveal the dependencies of the targets on the inputs.

Predictive analytics (What will happen)

Predictive analytics is the branch of Advanced Analytics that is used to make predictions about unknown future events. This is the most important phase of the analysis; its output is a predictive model that estimates what is likely to happen in the future. It encompasses a variety of machine learning techniques, such as k-nearest neighbours, decision trees, random forests, and neural networks, to identify the likelihood of future outcomes based on historical data. Some of these are explained below. See the machine learning section for more detail.

K-Nearest Neighbours: a simple method used for classification and regression. It stores all available cases and classifies new cases based on a similarity measure.

Random Forest: a combination of decision tree predictors for classification, regression, and other tasks. It helps to decrease the bias and variance experienced with traditional decision trees.

Neural Networks: Artificial Neural Networks (ANNs) are computational models based on the neural structure of the brain. They perform well with large datasets. Deep learning occurs when the ANN has multiple layers, usually more than three.
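To make the comparison concrete, the sketch below fits all three model families on a synthetic classification task with scikit-learn. The dataset, hyperparameters, and split are arbitrary choices for illustration, not a benchmark.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier

# Synthetic data standing in for, e.g., a "will buy / won't buy" label.
X, y = make_classification(n_samples=500, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

models = {
    "k-nearest neighbours": KNeighborsClassifier(n_neighbors=5),
    "random forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "neural network": MLPClassifier(hidden_layer_sizes=(32, 16),
                                    max_iter=1_000, random_state=0),
}

# Fit each model and score held-out accuracy.
scores = {name: m.fit(X_train, y_train).score(X_test, y_test)
          for name, m in models.items()}
print(scores)
```

On real data the ranking between these families varies with dataset size and structure, which is why practitioners typically try several.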

Prescriptive analytics (How can we make it happen)

Prescriptive analysis is the last step of advanced data analysis. It applies the predictive model to determine the best solution or outcome among various choices, given the known parameters. In this phase, the model not only predicts what will happen in the future but also shows the decision maker the implications of each option.

Example of Workflow:

Descriptive - Predictive - Prescriptive

For example, let’s assume that a retail bank sells several products (mortgage accounts, savings accounts, and pension accounts) to its customers. It keeps a record of all historical data, and this data is available for analysis and reuse. Following a merger in 2017, the bank has new customers and wants to start some marketing campaigns.

The budget for the campaigns is limited. The bank wants to contact a customer and propose only one product. The marketing department needs to decide:

  • Who should be contacted?
  • Which product should be proposed? Proposing too many products is counterproductive, so only one product per customer contact.
  • How will a customer be contacted? There are different ways, with different costs and efficiency.
  • How can they optimally use the limited budget?
  • Will such campaigns be profitable?

From the historical data, we can train a machine learning product-based classifier on customer profile (age, income, account level, …) to predict whether a customer would subscribe to a mortgage, savings, or pension account.

  • We can apply this predictive model to the new customer data to predict what each new customer will buy.
  • On this new data, we decide which product is offered to which customer through which channel:
    • a. with a greedy algorithm that reproduces what a human being would do
    • b. with an optimization model built with Google OR-Tools
  • The solutions can be displayed, compared, and analysed.

First, understand current customers (Descriptive)

We can see that:

  • The greater a customer’s income, the more likely it is s/he will buy a savings account.
  • The older a customer is, the more likely it is s/he will buy a pension account.
  • There is a correlation between the number of people in a customer’s household, the number of loan accounts held by the customer, and the likelihood a customer buys a mortgage account. To see the correlation, look at the upper right and lower left corners of the mortgage chart.

Predict Future Customer Behaviour (Predictive)
  • First we will train on old data:

In our dataset we have numerous features (variables) such as:

'customer_id', 'age', 'age_youngest_child', 'debt_equity', 'gender', 'bad_payment', 'gold_card', 'pension_plan', 'household_debt_to_equity_ratio', 'income', 'members_in_household', 'months_current_account', 'months_customer', 'call_center_contacts', 'loan_accounts', 'number_products', 'number_transactions', 'non_worker_percentage', 'white_collar_percentage', 'rfm_score', 'Mortgage', 'Pension', 'Savings', 'nb_products'

We will just select four now to implement a simple machine learning model.

cols = ['age', 'income', 'members_in_household', 'loan_accounts']

age  income  members_in_household  loan_accounts
45   55,870  2                     4
43   43,900  2                     0
23   49,000  2                     1
35   60,000  2                     1
43   70,400  3                     1

The next step is to create and train a simple machine learning algorithm to predict what the new clients will buy. We use a standard gradient boosting algorithm to predict whether a customer might buy product A, B, or C. Following is an example of a Python snippet for this purpose.

from sklearn import ensemble

# X holds the selected customer features; ys[i] holds the observed
# buy/no-buy labels for product i (both prepared earlier).
classifiers = []
for i, p in enumerate(products):
    clf = ensemble.GradientBoostingClassifier()
    clf.fit(X, ys[i])  # one binary classifier per product
    classifiers.append(clf)
  • Next we will load and predict new customer data

Load the new customer data, predict behaviours using the trained classifiers, and do some visual analysis. We have the same characteristics for the new customers as are available for historic customers.

import pandas as pd

# Load the new (unlabelled) customer data.
unknown_behaviors = pd.read_csv("load_data.csv")

to_predict = unknown_behaviors[cols]
predicted = [classifiers[i].predict(to_predict) for i in range(len(products))]
for i, p in enumerate(products):
    to_predict[p] = predicted[i]  # one prediction column per product
to_predict["id"] = unknown_behaviors["customer_id"]

offers = to_predict

This is a random sample of 5 of the 2,756 new predictions:

age  income  members_in_household  loan_accounts  savings  mortgage  pension  id
38   47,000  4                     1              0        0         0        44256
30   48,600  2                     4              0        0         0        46883
41   42,150  4                     0              0        0         1        32387
42   39,700  3                     3              0        0         0        25504
42   44,300  6                     2              0        1         0        35979
  • Visualisation of Predicted Data

The predicted data has the same semantics as the base data, with even clearer boundaries:

  • for savings, there is a clear frontier at $50K revenue.
  • for pensions, there is a clear frontier at 55-year-old customers.

Some details about the prediction:

  • Number of new customers: 2,756
  • Number of customers predicted to buy mortgages: 380
  • Number of customers predicted to buy pensions: 142
  • Number of customers predicted to buy savings: 713

Revenue Maximisation (Prescriptive Analytics)

The goal is to contact customers to sell each of them only one product, so we cannot select all of them. This increases the complexity of the problem: we need to determine the best contact channel, but also which product will be sold to a given customer. This may be hard to compute, so we will compare two techniques:

  • a greedy algorithm
  • Google CP Solver.
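A minimal sketch of the greedy baseline, assuming invented customers, channels, costs, and values: rank every (customer, channel) pair by expected revenue and take pairs while the budget lasts, with at most one offer per customer.

```python
# All names and figures below are illustrative, not the bank's real data.
customers = [
    {"id": 1, "product": "savings", "value": 120},
    {"id": 2, "product": "mortgage", "value": 300},
    {"id": 3, "product": "pension", "value": 150},
    {"id": 4, "product": "savings", "value": 120},
]
channels = [
    {"name": "phone", "cost": 25, "factor": 0.25},
    {"name": "mail",  "cost": 5,  "factor": 0.05},
]

budget = 40
plan, spent, revenue = [], 0, 0.0

# Rank every (customer, channel) pairing by expected revenue.
options = sorted(
    ((c, ch, ch["factor"] * c["value"]) for c in customers for ch in channels),
    key=lambda t: t[2], reverse=True,
)

contacted = set()
for cust, ch, expected in options:
    if cust["id"] in contacted or spent + ch["cost"] > budget:
        continue  # one offer per customer, and stay within budget
    contacted.add(cust["id"])
    plan.append((cust["id"], cust["product"], ch["name"]))
    spent += ch["cost"]
    revenue += expected

print(plan, spent, revenue)
```

Because greedy commits to the most expensive, highest-value contact first, it can exhaust the budget on a few phone calls; the CP solver instead searches over all feasible combinations, which is why it wins in the comparison below.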

Set up the constraints

  • Offer only one product per customer.
  • Compute the budget and set a maximum on it.
  • Compute the number of offers to be made.
  • Ensure at least 10% of offers are made via each channel.

Example code snippet - 1/20th of the entire script

obj = 0
for c in channelsR:
    for p in productsR:
        product = products[p]
        # Expected value of offering this product through this channel.
        coef = channels.at[c, "factor"] * value_per_product[product]
        # tap[o, p, c] is the decision variable: offer o, product p, channel c.
        obj += solver.Sum(tap[o, p, c] * int(coef) * offers.at[o, product]
                          for o in offersR)

Algorithm Comparison:

Here are the results of the two algorithms:

Algorithm  Revenue  Number of clients  Mortgage offers  Pension offers  Savings offers  Budget Spent
Greedy     50,800   1,123              299              111             713             21,700
Google CP  72,600   1,218              381              117             691             25,000
  • As you can see, with decision optimization we can safely run this marketing campaign, contacting 1,218 of the 2,756 customers.
  • This will lead to $72.6K revenue, significantly greater than the $50.8K revenue given by the greedy algorithm.
  • With the greedy algorithm, we would:
    • be unable to focus on the right customers (it selects fewer of them),
    • spend less of the available budget for a smaller revenue,
    • focus on selling the savings accounts that have the biggest revenue.

Marketing campaign analysis

  • We need a minimum of $16K to be able to start a valid campaign and we expect it will generate $47.5K.
  • Due to the business constraints, we will be able to address 1680 customers maximum using a budget of $35.5K. Any money above that amount won’t be spent. The expected revenue is therefore $87K.

Scenario  Budget  Revenue  Number of clients  Mortgage offers  Pension offers  Savings offers
Standard  25,000  72,600   1,218              381              117             691
Minimum   16,000  47,500   825                374              142             309
Maximum   35,500  87,000   1,680              406              155             1,119

Conclusion

Simply put, traditional analytics is not sufficient to remain competitive in a world ignited by machine intelligence. Advanced analytics offers companies and agencies the ability to obtain state-of-the-art insights from small, strategic technology investments, while relying on traditional analytics alone exposes an organisation to unnecessary uncertainties and avoidable risks. In the future we can expect the development of automated advanced analytics to lead to easy-to-use drag-and-drop interfaces. Advanced analytics will keep expanding in functionality and ease of use, improving in parallel with artificial intelligence, machine learning, operations research, big data, and visualisation.