So What Is Advanced Analytics?
Estimated reading time: 18 minutes

Introduction
Advanced Analytics (AA) is split into descriptive, diagnostic, predictive and prescriptive analytics. The purpose of AA is to draw from internal and external sources to transform data into insight that leads to smarter decisions and spurs growth. Modern analytical advances have mostly taken place in predictive and prescriptive analytics as a result of advances in artificial intelligence and machine learning. Innovation in operations research, big data, data wrangling, ETL and visualisation technologies has similarly contributed to the rise of advanced analytics.
Advanced analytics can therefore also be viewed as a bundle of techniques used to discover intricate relationships, recognise complex patterns or predict future trends. A key tenet of advanced analytics is automation. Automation is driven by intelligent and non-intelligent decision-making systems. Automated analytics are the final big stride in analytics. In the graph below, we can see the different levels and types of analytics, from the most basic to the most sophisticated.
An AA system starts with descriptive and diagnostic analytics in an attempt to describe not only what happened but also why it happened. The next analytical step is predictive modelling, which in turn is followed by decision making and execution based on these predictions. The extent of automation increases with each successive step towards prescriptive analytics, a result of the current success of AI in narrow decision-making systems. Each of the four analytical disciplines uses statistical and mathematical techniques to achieve its goals. AA technologies can help to automate decision-making by assuring the quality of these decisions and their contribution to the business.
Areas of Application:
State of Development and Use:
An example of a financial services firm’s self-diagnosed AA readiness across four main areas.
| | Descriptive | Diagnostic | Predictive | Prescriptive |
|---|---|---|---|---|
| Customer | 85% | 80% | 90% | 80% |
| Operations | 85% | 75% | 80% | 50% |
| People | 80% | 65% | 70% | 40% |
| Accounting | 90% | 70% | 70% | 20% |
Customer Analytics
Data about browsing and buying patterns are everywhere. From credit card transactions and online shopping carts, to customer loyalty programs and user-generated ratings/reviews, there is a staggering amount of data that can be used to describe customers’ past buying behaviours, predict future ones, and prescribe new ways to influence future purchasing decisions.
Operations Analytics
Recent extraordinary improvements in data-collecting technologies have changed the way firms make informed and effective business decisions. Operations analytics includes modelling future demand uncertainties, predicting the outcomes of competing policy choices, and choosing the best course of action in the face of risk.
People Analytics
People analytics is a data-driven approach to managing people at work. For the first time in history, business leaders can make decisions about their people based on deep analysis of data rather than the traditional methods of personal relationships and ‘experienced’ opinion. People analytics include techniques used to recruit and retain great people. It is the sophisticated analysis brought to bear on people-related issues, such as recruiting, performance evaluation, leadership, hiring and promotion, job design, compensation, and collaboration.
Accounting Analytics
Accounting Analytics explores how financial statement data and non-financial metrics can be linked to financial performance. Accounting analytics include the use of data to assess what drives financial performance and to forecast future financial scenarios. While many accounting and financial organizations deliver data, accounting analytics deploys that data to deliver insight. Accounting analytics include many areas in which accounting data provides insight into other business areas including consumer behaviour predictions, corporate strategy, risk management, optimization, and more.
Illustration of Company Specific AA Dashboard
Isolated Problems and Solutions:
Disruptive data-driven models and capabilities are reshaping some industries, and could transform many more. Following is a list of characteristics that signals the potential of new advanced analytics approaches to improve the current state:
| Problem | Analytics | Type | Techniques |
|---|---|---|---|
| Inefficient matching of supply and demand. | Operations | Sales and Inventory Prediction | Gradient Boosting, LightGBM, Neural Network, LSTM RNN, Mechanical TS, ARIMA |
| Prevalence of underutilised assets. | Accounting | Ratio Analysis / Fixed Asset Analysis | Return on Total Assets, Fixed Asset Acquisition to Total Assets |
| Unused demographic and customer sales data. | Customer | Customer Segmentation / Psychographics | K-Means Clustering, HDBSCAN, GMM Clustering |
| Large unstructured behavioural data dump. | Customer | Big Data | Random Forest, Deep Learning, Data Wrangling |
| Customers sign up but leave within a year. | Customer | Churn Analysis | LightGBM, Prediction Metric, Feature Analysis, SHAP Values, Interaction, PDP Plots, Correlation Matrix |
| Most employees get fired within three years. | Employee | Termination Analysis | See above |
| Excessive employee turnover. | Employee | Attrition Analysis | See above |
| Debtors stop paying after 6 months. | Accounting | Aged Debtors Analysis | Outlier Analysis, Descriptive Statistics, Average Payment Date |
| Uncertain effects lead to bad customer service. | Operations | Causal Analysis | Causal Regression, Regression Discontinuity |
| Actual expenditures are more than budgeted. | Accounting | Budget Analysis | Incremental Budgeting, ZBB Budgeting |
| Product X is outperforming product Y. | Operations | Causal Analysis | A/B Testing, MV Testing |
| Some customers are costly to the business. | Customer | Customer Lifetime Value Analysis | Gradient Boosting (LightGBM), RFM Analysis |
| Resource constraints lead to supply shortages. | Operations | Profit Optimisation | Constraint Programming (CP), MILP Programming |
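As an illustration of one technique from the table above, here is a minimal customer-segmentation sketch using K-Means clustering. The feature names, values, and cluster count are invented for illustration, not taken from any real dataset:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Hypothetical customer features: [age, annual_income]
customers = np.array([
    [25, 30_000], [27, 32_000], [24, 29_000],   # younger, lower income
    [45, 80_000], [48, 85_000], [44, 78_000],   # middle-aged, higher income
    [60, 40_000], [62, 42_000], [58, 41_000],   # older, mid income
])

# Scale features so income does not dominate the distance metric
X = StandardScaler().fit_transform(customers)

# Partition customers into three segments
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
labels = kmeans.labels_
```

Each resulting segment can then be profiled and targeted separately, which is the essence of the segmentation/psychographics use case in the table.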
Analytics and Automation
Analytics Automation:
Automated data analytics is essential for keeping track of the many sources of data modern organizations use today, ensuring data scientists don’t waste time working with bad, out-of-date, or incomplete data. With a more streamlined data analysis process, important opportunities can become apparent, introducing agility to big data analysis and, ultimately, increasing your organization’s business intelligence and competitive edge.
Robotic Process Automation
Tools such as robotic process automation (RPA) allow labour-intensive, error prone processes that formerly took days to now be accurately completed in a matter of minutes.
In financial services, automation in the form of “straight-through processing,” where transaction workflows are digitized end-to-end, can increase the scalability of transaction throughput by 80%, while reducing errors by half.
Some of the experiments the FCA (2018–2019) is conducting with advanced analytics this year include:
- Automated detection of unauthorised business activity on the internet through a variety of new technologies
- Testing advanced Natural Language Processing (NLP) technologies and semantic language models in an effort to automate what would otherwise be manual supervisory tasks
- Automated evaluation and detection of misleading advertising

The automated processes will allow the FCA to review the total population of high-risk markets, rather than only sampling a proportion.
Physical Robots Automation
Physical robots have been around for a long time in manufacturing, but more capable, more flexible, safer, and less expensive robots are now engaging in ever-expanding activities, combining mechanisation with cognitive and learning capabilities. They improve over time as they are trained by their human co-workers on the shop floor, or increasingly learn by themselves. Already today, a range of automation technologies is generating real value. For example, Rio Tinto has deployed automated haul trucks and drilling machines at its mines in Pilbara, Australia, and says it is seeing 10–20% increases in utilisation there.
Deeper Dive
Advanced Analytics (AA) contains descriptive, diagnostic, predictive and prescriptive analytics. Following is a more in-depth discussion of each of these areas.
Descriptive Analytics (What Happened)
Descriptive analytics is the first stage of data analysis. It creates a summary of historical data to yield useful information and possibly prepares the data for further analysis. Descriptive statistics are a very important part of data analysis; they provide historical insight into the company’s financials, production, operations, sales, and customers. This phase consists of tables and graphs so that the user can easily interpret the information. Some of the processes that are carried out at this stage are described below.
Descriptive Statistics
Basic statistics are a very valuable source of information when designing a model, since they may alert us to the presence of spurious data. It is essential to check the correctness of the most important statistical measures of every single variable.
| | Mean | Median | Mode | 25% Quartile | SD | Var | Max | Min | Range |
|---|---|---|---|---|---|---|---|---|---|
| Customer Sales | 2,200 | 1,190 | 300 | 670 | 100 | 10,000 | 10,000 | 100 | 9,900 |
| Advertising Campaigns | 4,900 | 2,290 | 2,000 | 1,000 | 50 | 2,500 | 6,000 | 800 | 5,200 |
| Extra Customers PW | 10 | 7 | 4 | 6 | 1.5 | 2.25 | 30 | 10 | 20 |
| Employee Leave Days | 20 | 22 | 21 | 18 | 2 | 4 | 40 | 10 | 30 |
| Annual Interest Rate | 1.45 | 1.40 | 1.50 | 1.30 | 0.3 | 0.09 | 1.70 | 1.30 | 0.40 |
These numbers are fictional and are simply for illustrative purposes
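Summary measures like those in the table above take only a few lines with pandas. The column names and values below are fabricated for illustration:

```python
import pandas as pd

# Fictional data, mirroring the style of the table above
df = pd.DataFrame({
    "customer_sales": [300, 670, 1190, 2200, 10_000],
    "employee_leave_days": [10, 18, 21, 22, 40],
})

# describe() gives count, mean, std, min, quartiles, and max per column
summary = df.describe()

# Individual measures are also available directly
mean_sales = df["customer_sales"].mean()
sales_range = df["customer_sales"].max() - df["customer_sales"].min()
```

Running the measures on every variable up front is a cheap way to catch spurious values before any modelling starts.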
Exploratory Visualisations
For example, histograms show how the data is distributed over its entire range. In approximation problems, a uniform distribution for all the variables is, in general, desirable. If the data is very irregularly distributed, then the model will probably be of bad quality.
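A histogram can be computed before it is ever plotted, which makes the distribution check scriptable. A sketch with NumPy, using an illustrative skewed sample rather than real data:

```python
import numpy as np

rng = np.random.default_rng(42)
# Illustrative sample: 1,000 draws from a right-skewed distribution
data = rng.exponential(scale=2.0, size=1000)

# Bin the data over its full range; the counts show how evenly it is spread
counts, edges = np.histogram(data, bins=10)

# A heavily skewed histogram (most mass in the first bins) suggests the
# variable is irregularly distributed and may need transforming first
first_bin_share = counts[0] / counts.sum()
```

Here most observations fall into the first few bins, exactly the kind of irregular distribution the text warns can degrade model quality.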
Diagnostic Analytics (Why Did It Happen)
Diagnostic Analytics is the next level of analysis. It is a form of Advanced Analytics that is focused on determining the factors and events that contributed to the outcome. This phase consists of techniques such as calculating correlations and interpreting interactive visualizations.
Logistic correlations
In classification applications, it might be interesting to look for logistic dependencies between single input and single target variables. The logistic correlation is a numerical value between 0 and 1 that expresses the strength of the logistic relationship between a single input variable and a single target variable.
Logistic Correlation Matrix
| | Customer Sales | Advertising Campaigns | Extra Customers PW | Employee Leave Days | Annual Interest Rate |
|---|---|---|---|---|---|
| Customer Sales | 1 | 0.7 | 0.6 | 0.10 | 0.05 |
| Advertising Campaigns | 0.7 | 1 | 0.8 | 0.15 | 0.07 |
| Extra Customers PW | 0.6 | 0.8 | 1 | 0.02 | 0.1 |
| Employee Leave Days | 0.10 | 0.15 | 0.02 | 1 | 0 |
| Annual Interest Rate | 0.05 | 0.07 | 0.1 | 0 | 1 |
Logistic Correlation Matrix Heatmap
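A plain Pearson correlation matrix, a common stand-in when the logistic correlation described above is not available in your toolkit, can be computed with pandas. All of the data below is fabricated to mimic the variables in the matrix:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
campaigns = rng.normal(50, 10, 200)
# Sales fabricated to co-move with advertising campaigns
sales = 2 * campaigns + rng.normal(0, 5, 200)
leave_days = rng.normal(20, 2, 200)          # unrelated variable

df = pd.DataFrame({
    "advertising_campaigns": campaigns,
    "customer_sales": sales,
    "employee_leave_days": leave_days,
})

# corr() returns a symmetric matrix with ones on the diagonal,
# analogous in shape to the matrix shown above
corr = df.corr()
```

Note that Pearson correlations range from -1 to 1 rather than 0 to 1; passing the matrix through `abs()` gives a strength-only view closer to the logistic correlation in spirit.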
Scatter plot
This technique plots graphs of inputs versus targets. These charts might help to see the dependencies of the targets with the inputs.
Predictive Analytics (What Will Happen)
Predictive analytics is the branch of Advanced Analytics used to make predictions about unknown future events. This is the most important phase of the analysis; its output is a predictive model that estimates what is likely to happen in the future. It encompasses a variety of machine learning techniques, such as k-nearest neighbours, decision trees, random forests and neural networks, to identify the likelihood of future outcomes based on historical data. Some of these are explained below. See the machine learning section for more on this.
| K-Nearest Neighbours | Random Forest | Neural Networks |
|---|---|---|
| K-nearest neighbours is a simple method used for classification and regression. It stores all available cases and classifies new cases based on a similarity measure. | Random forests are a combination of decision tree predictors for classification, regression and other tasks. They help to decrease the bias and variance experienced in traditional decision trees. | Artificial Neural Networks (ANNs) are computational models based on the neural structure of the brain. They perform well with large datasets. Deep learning occurs when the ANN has multiple layers, usually more than three. |
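A minimal sketch of the first of these, k-nearest neighbours, with scikit-learn. The toy features, labels, and the pension-buying scenario are invented for illustration:

```python
from sklearn.neighbors import KNeighborsClassifier

# Toy training data: [age, income_in_thousands] -> bought pension (1) or not (0)
X_train = [[25, 30], [30, 35], [28, 32],     # younger customers: no pension
           [60, 45], [65, 50], [62, 48]]     # older customers: pension
y_train = [0, 0, 0, 1, 1, 1]

# A new case is classified by majority vote among its 3 nearest stored cases
knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(X_train, y_train)

prediction = knn.predict([[63, 47]])[0]   # close to the older cluster
```

Because KNN stores all cases and compares distances at prediction time, feature scaling matters in practice; the toy features here happen to be on similar scales.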
Prescriptive Analytics (How Can We Make It Happen)
Prescriptive analysis is the last step of advanced data analysis. It consists of applying the predictive model to determine the best solution or outcome among various choices, given the known parameters. In this phase, we not only predict what will happen in the future using our predictive model, but also show the decision maker the implications of each option.
Example of Workflow:
Descriptive - Predictive - Prescriptive
For example, let’s assume that a retail bank sells several products (mortgage account, savings account, and pension account) to its customers. It keeps a record of all historical data, and this data is available for analysis and reuse. Following a merger in 2017, the bank has new customers and wants to start some marketing campaigns.
The budget for the campaigns is limited. The bank wants to contact a customer and propose only one product. The marketing department needs to decide:
- Who should be contacted?
- Which product should be proposed? Proposing too many products is counterproductive, so only one product per customer contact.
- How will a customer be contacted? There are different ways, with different costs and efficiency.
- How can they optimally use the limited budget?
- Will such campaigns be profitable?
From the historical data, we can train a machine learning product-based classifier on customer profile (age, income, account level, …) to predict whether a customer would subscribe to a mortgage, savings, or pension account.
- We can apply this predictive model to the new customers data to predict for each new customer what they will buy.
- On this new data, we decide which offers are proposed, i.e. which product is offered to which customer through which channel:
  - with a greedy algorithm that reproduces what a human being would do
  - with an optimization model built using Google OR-Tools
- The solutions can be displayed, compared, and analysed.
First understand current customers (Descriptive).
We can see that:
- The greater a customer’s income, the more likely it is s/he will buy a savings account.
- The older a customer is, the more likely it is s/he will buy a pension account.
- There is a correlation between the number of people in a customer’s household, the number of loan accounts held by the customer, and the likelihood a customer buys a mortgage account. To see the correlation, look at the upper right and lower left corners of the mortgage chart.
Predict Future Customer Behaviour (Predictive)
- First we will train on old data:
In our dataset we have numerous features (variables) such as:
'customer_id', 'age', 'age_youngest_child', 'debt_equity', 'gender','bad_payment', 'gold_card', 'pension_plan','household_debt_to_equity_ratio', 'income', 'members_in_household' ,'months_current_account', 'months_customer', 'call_center_contacts','loan_accounts', 'number_products', 'number_transactions', 'non_worker_percentage', 'white_collar_percentage', 'rfm_score','Mortgage', 'Pension', 'Savings', 'nb_products'
We will just select four now to implement a simple machine learning model.
cols = ['age', 'income', 'members_in_household', 'loan_accounts']
| age | income | members_in_household | loan_accounts |
|---|---|---|---|
| 45 | 55,870 | 2 | 4 |
| 43 | 43,900 | 2 | 0 |
| 23 | 49,000 | 2 | 1 |
| 35 | 60,000 | 2 | 1 |
| 43 | 70,400 | 3 | 1 |
The next step is to create and train a simple machine learning algorithm to predict what the new clients will buy. We use a standard gradient boosting algorithm to predict whether a customer might buy product A, B, or C. Following is an example of a piece of Python code for this purpose.
```python
from sklearn import ensemble

# Train one gradient boosting classifier per product
classifiers = []
for i, p in enumerate(products):
    clf = ensemble.GradientBoostingClassifier()
    clf.fit(X, ys[i])
    classifiers.append(clf)
```
- Next we will load and predict new customer data
Load the new customer data, predict behaviours using the trained classifiers, and do some visual analysis. We have the same characteristics for the new customers as were available for the historic customers.
```python
# Load new customer data and predict a purchase decision for each product
unknown_behaviors = pd.read_csv("load_data.csv")
to_predict = unknown_behaviors[cols]
predicted = [classifiers[i].predict(to_predict) for i in range(len(products))]
for i, p in enumerate(products):
    to_predict[p] = predicted[i]
to_predict["id"] = unknown_behaviors["customer_id"]
offers = to_predict
```
This is a random sample of 5 of the 2,756 new predictions:
| age | income | members_in_household | loan_accounts | savings | mortgage | pension | id |
|---|---|---|---|---|---|---|---|
| 38 | 47,000 | 4 | 1 | 0 | 0 | 0 | 44256 |
| 30 | 48,600 | 2 | 4 | 0 | 0 | 0 | 46883 |
| 41 | 42,150 | 4 | 0 | 0 | 0 | 1 | 32387 |
| 42 | 39,700 | 3 | 3 | 0 | 0 | 0 | 25504 |
| 42 | 44,300 | 6 | 2 | 0 | 1 | 0 | 35979 |
- Visualisation of Predicted Data
The predicted data has the same semantics as the base data, with even clearer frontiers:

- for savings, there is a clear frontier at around $50K income.
- for pension, there is a clear frontier at around 55 years of age.
Some details about the prediction:

- Number of new customers: 2,756
- Number of customers predicted to buy mortgages: 380
- Number of customers predicted to buy pensions: 142
- Number of customers predicted to buy savings: 713
Revenue Maximisation (Prescriptive Analytics)
The goal is to contact customers to sell them only one product each, so we cannot select all of them. This increases the complexity of the problem: we need not only to determine the best contact channel, but also to select which product to offer a given customer. This may be hard to compute. To check, we will use two techniques:
- a greedy algorithm
- Google CP Solver.
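A sketch of what the greedy baseline might look like: for each predicted buyer, take the highest-value product through the cheapest channel until the budget runs out. The product values, channel costs, and scoring rule are assumptions for illustration, not the article's actual implementation:

```python
# Hypothetical expected value per product and cost per contact channel
value_per_product = {"savings": 200, "mortgage": 300, "pension": 400}
channel_cost = {"gift": 20, "newsletter": 15, "seminar": 23}

# (customer_id, predicted_product) pairs from the predictive step
predictions = [(1, "savings"), (2, "mortgage"), (3, "pension"), (4, "savings")]

def greedy_campaign(predictions, budget):
    """Assign offers highest-value-first until the budget is exhausted."""
    cheapest = min(channel_cost, key=channel_cost.get)
    cost = channel_cost[cheapest]
    plan, revenue, spent = [], 0, 0
    # Visit offers in descending order of expected value
    for cid, product in sorted(predictions,
                               key=lambda t: value_per_product[t[1]],
                               reverse=True):
        if spent + cost > budget:
            break
        plan.append((cid, product, cheapest))
        revenue += value_per_product[product]
        spent += cost
    return plan, revenue, spent

plan, revenue, spent = greedy_campaign(predictions, budget=50)
```

Note that this myopic rule ignores the channel-mix and budget trade-offs a constraint solver can weigh globally, which is exactly why the CP result later beats it.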
Set up the constraints
- Offer only one product per customer.
- Compute the budget and set a maximum on it.
- Compute the number of offers to be made.
- Ensure at least 10% of offers are made via each channel.
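The constraints above can be sketched as a simple validity check on a candidate offer plan. The channel names, costs, and plans below are illustrative assumptions, not values from the article:

```python
from collections import Counter

def plan_is_valid(plan, channel_cost, budget, channels, min_share=0.10):
    """Check a list of (customer_id, product, channel) offers against the
    campaign constraints: one offer per customer, a budget cap, and at
    least `min_share` of offers made through every channel."""
    customers = [cid for cid, _, _ in plan]
    # Constraint: only one product offered per customer
    if len(customers) != len(set(customers)):
        return False
    # Constraint: total contact cost stays within the budget
    spent = sum(channel_cost[ch] for _, _, ch in plan)
    if spent > budget:
        return False
    # Constraint: every channel carries at least 10% of the offers
    counts = Counter(ch for _, _, ch in plan)
    return all(counts[ch] >= min_share * len(plan) for ch in channels)

channel_cost = {"gift": 20, "newsletter": 15, "seminar": 23}
channels = list(channel_cost)

good = [(1, "savings", "gift"), (2, "pension", "newsletter"),
        (3, "mortgage", "seminar")]
bad = [(1, "savings", "gift"), (1, "pension", "gift")]  # same customer twice
```

In the real CP model these rules become hard constraints on decision variables rather than an after-the-fact check, but the check is useful for validating any solver's output.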
Example code snippet - 1/20th of the entire script
```python
# Objective: maximise expected revenue over all (offer, product, channel) triples
obj = 0
for c in channelsR:
    for p in productsR:
        product = products[p]
        # channel effectiveness factor times the value of the product
        coef = channels.at[c, "factor"] * value_per_product[product]
        obj += solver.Sum(tap[o, p, c] * int(coef) * offers.at[o, product]
                          for o in offersR)
```
**Algorithm Comparison:**

Here are the results of the two algorithms:

| Algorithm | Revenue | Number of clients | Mortgage offers | Pension offers | Savings offers | Budget Spent |
|---|---|---|---|---|---|---|
| Greedy | 50,800 | 1,123 | 299 | 111 | 713 | 21,700 |
| CP (Google) | 72,600 | 1,218 | 381 | 117 | 691 | 25,000 |
- As you can see, with decision optimization we can safely run this marketing campaign, contacting 1,218 of the 2,756 customers.
- This will lead to $72.6K revenue, significantly greater than the $50.8K revenue given by the greedy algorithm.
- With the greedy algorithm, we would:
  - fail to focus on the right customers (it selects fewer of them),
  - spend less of the available budget for a smaller revenue, and
  - over-focus on selling the savings accounts that carry the biggest revenue.
Marketing campaign analysis
- We need a minimum of $16K to be able to start a valid campaign and we expect it will generate $47.5K.
- Due to the business constraints, we will be able to address 1680 customers maximum using a budget of $35.5K. Any money above that amount won’t be spent. The expected revenue is therefore $87K.
| Scenario | Budget | Revenue | Number of clients | Mortgage offers | Pension offers | Savings offers |
|---|---|---|---|---|---|---|
| Standard | 25,000 | 72,600 | 1,218 | 381 | 117 | 691 |
| Minimum | 16,000 | 47,500 | 825 | 374 | 142 | 309 |
| Maximum | 35,500 | 87,000 | 1,680 | 406 | 155 | 1,119 |
Conclusion
Simply put, traditional analytics are not sufficient to remain competitive in a world ignited by machine intelligence. Advanced analytics offers companies and agencies the ability to obtain state-of-the-art insights from small strategic technology investments. Traditional analytics exposes your company or institution to unnecessary uncertainties and avoidable risks. In the future, we can expect the development of automated advanced analytics to lead to easy-to-use drag-and-drop interfaces. Advanced analytics will keep expanding in functionality and in ease of use. It will improve in parallel with artificial intelligence, machine learning, operations research, big data and visualisation.