Machine Learning

Firms have three goals: discovering, predicting and decision making. Machine learning helps with all three. There are also many non-machine-learning techniques for discovering (correlation maps), predicting (ARIMA) and decision making (linear programming and data envelopment analysis, DEA). In the illustration below, decision making is grouped with discovering as ‘strategy’, with reinforcement learning as the primary task.

Firms can benefit from a wide range of machine learning tasks. Of all the task types, as is illustrated by the above four green boxes, predictive machine learning offers the greatest benefit to financial technologies. A further, higher-level categorisation of machine learning techniques is:

  1. supervised and
  2. unsupervised learning

In the above illustration, predictive machine learning falls under supervised learning, whereas clustering and anomaly detection generally fall under unsupervised learning. So what is meant by supervised and unsupervised learning?

Supervised Learning:

Finds patterns (and develops predictive models) using both input data and output data.

All supervised learning techniques are divided into classification or regression prediction tasks.

Classification is used for predicting discrete responses (binary: 1 or 0; multi-class: 1, 2, 3).

Binary Prediction:

  • Predict the future direction of commodity, currency, equity and bond prices.
  • Predict customer gender.
  • Predict whether customers will or will not respond to direct mail.
  • Predict the likelihood that a grant application will succeed.
  • Predict whether someone is willing to donate to a cause.
  • Predict which shoppers are most likely to repeat purchase.

Multi-class Prediction:

  • Item-specific sales prediction, i.e. unit sales across stores.
  • Predict unit sales of multiple items in a single store.
  • Predict the likelihood of certain crimes occurring at different points geographically and at different times.
  • Predict when, where and at what severity the flu will strike.
  • Predict the most pressing community issue amongst alternatives.
  • Predict which customers will purchase which insurance policy.
  • Predict which blog post from a selection would be most popular.
  • Predict the destination of a taxi from its initial partial trajectory.
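
To make the idea of a discrete response concrete, here is a minimal sketch of a classifier in plain Python. It uses k-nearest neighbours with a majority vote, which is only one of many possible classification techniques. The customer data, the feature choice (age and annual spend) and the function name `knn_predict` are all hypothetical and exist purely for illustration.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, x, k=3):
    """Predict a discrete label for x by majority vote among its k nearest neighbours."""
    # Euclidean distance from x to every labelled training example.
    distances = [(math.dist(row, x), label) for row, label in zip(train_X, train_y)]
    distances.sort(key=lambda pair: pair[0])
    nearest_labels = [label for _, label in distances[:k]]
    # The most frequent label among the k nearest examples wins.
    return Counter(nearest_labels).most_common(1)[0][0]


# Hypothetical customers: (age, annual spend in $1000s) -> purchased policy 1, 2 or 3.
train_X = [(22, 1.0), (25, 1.5), (24, 1.2),
           (41, 4.0), (45, 4.5), (43, 3.8),
           (63, 9.0), (67, 8.5), (65, 9.5)]
train_y = [1, 1, 1, 2, 2, 2, 3, 3, 3]

new_customer = (44, 4.2)
print(knn_predict(train_X, train_y, new_customer))
```

The same function handles the binary case unchanged: the labels simply take two values (for example 1 and 0) instead of three.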


Regression is used for predicting continuous responses (e.g. 35 times, $34,000).

Regression prediction:

  • Predict how many times a customer would call customer service in the next year.
  • Corporate valuation.
  • Salary prediction and recommendation.
  • Predict sales dollars of a product at launch.
  • Predict probabilistic distribution of hourly rain using polarimetric radar measurements.
  • Predict the sale price at auction.
  • Predict census return rates.
  • Predict customer lifetime value.
  • Predict severity of claims/final cost.
  • Predict how many clicks, or how much interest, something will receive based on its characteristics.
  • House price valuations.
  • Predict the duration of a process.
  • Predict prescription volume.
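
As a contrast with classification, the following sketch predicts a continuous value. It fits a straight line by ordinary least squares, the simplest regression technique, in plain Python. The house areas and prices are hypothetical and were constructed to be exactly linear so that the fitted line is easy to check by hand. The names `fit_line` and `predict` are illustrative, not from any library.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance and variance terms (unnormalised; the 1/n factors cancel).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def predict(model, x):
    """Apply a fitted (intercept, slope) pair to a new input."""
    intercept, slope = model
    return intercept + slope * x


# Hypothetical data: floor area in square metres -> sale price in $1000s.
areas = [50, 70, 90, 110, 130]
prices = [150, 200, 250, 300, 350]

model = fit_line(areas, prices)
print(model)                # fitted (intercept, slope)
print(predict(model, 100))  # predicted price for a 100 square metre house
```

Unlike the classifier's fixed set of labels, the output here can be any real number, which is what distinguishes a regression task.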

Unsupervised Learning:

Finds patterns based only on input data. This technique is useful when you’re not quite sure what to look for, and it is often used for exploratory analysis of raw data. Many unsupervised learning techniques are a form of cluster analysis.

Cluster Analysis: In Cluster Analysis, you group data items that have some measure of similarity based on characteristic values.

  • Cluster customers to understand behaviour.
  • Understand markets by clustering buyers and sellers according to characteristics.
  • Cluster data to discriminate between supplier service delivery.
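
The grouping-by-similarity idea can be sketched with k-means, one common clustering algorithm, written here in plain Python. Note that no output labels are supplied: the algorithm only sees the input points. The customer data is hypothetical, and starting from the first `k` points is a simplifying assumption made to keep the example deterministic (real implementations usually pick the starting centroids randomly).

```python
import math


def k_means(points, k, iterations=10):
    """Plain k-means: alternate between assigning points and recomputing centroids."""
    # Simplifying assumption: use the first k points as the initial centroids.
    centroids = [points[i] for i in range(k)]
    assignments = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins the cluster with the nearest centroid.
        assignments = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return assignments, centroids


# Hypothetical customers: (store visits per month, average basket value in $).
customers = [(2, 10), (20, 95), (3, 12), (22, 100), (1, 8), (21, 98)]

assignments, centroids = k_means(customers, k=2)
print(assignments)  # cluster index for each customer
print(centroids)    # centre of each cluster
```

The resulting cluster indices carry no meaning on their own; interpreting them (for example as "occasional" versus "frequent" shoppers) is left to the analyst.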

Structured and Unstructured Data:

Machine learning models allow us to incorporate both structured and unstructured data.

*What is structured data?* Structured data is the type of data most of us are probably used to working with. Think of data that fits neatly within fixed fields and columns in relational databases and spreadsheets. Types of structured data include numbers, currency amounts, alphabetic text, names, dates, and addresses.

Structured data is highly organized and easily processed by machines. Those working within relational databases can input, search, and manipulate structured data relatively quickly. This is the most attractive feature of structured data.

The language used for managing structured data is Structured Query Language, also known as SQL. It was developed at IBM in the early 1970s and is particularly useful for handling relationships in databases.
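
As a small illustration of how SQL handles relationships, the sketch below uses Python's built-in `sqlite3` module with an in-memory database. The `customers` and `orders` tables, their columns and the sample rows are hypothetical. The query joins the two tables on the customer id and totals the order amounts per customer.

```python
import sqlite3

# In-memory database, so nothing is written to disk.
conn = sqlite3.connect(":memory:")

# Two related tables: each order points at a customer through customer_id.
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, joined TEXT)")
conn.execute(
    "CREATE TABLE orders ("
    "id INTEGER PRIMARY KEY, "
    "customer_id INTEGER REFERENCES customers(id), "
    "amount REAL)"
)

conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?)",
    [(1, "Ada", "2021-03-01"), (2, "Grace", "2022-07-15")],
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, 1, 120.0), (2, 1, 80.0), (3, 2, 45.5)],
)

# Follow the relationship between the tables and aggregate per customer.
rows = conn.execute(
    """
    SELECT c.name, SUM(o.amount) AS total
    FROM customers AS c
    JOIN orders AS o ON o.customer_id = c.id
    GROUP BY c.name
    ORDER BY total DESC
    """
).fetchall()

print(rows)
conn.close()
```

Because every value sits in a named, typed column, the database can search, join and aggregate the data without any preprocessing.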

*What is unstructured data?* Unstructured data is the chaotic brother of structured data, as it cannot be processed and analysed using conventional tools. Examples of unstructured data include text files, video files, audio files, mobile activity, social media activity, sensor activity, geo-location activity, satellite imagery, surveillance imagery – honestly, the list goes on and on.

Unstructured data is difficult to make sense of because it has no pre-defined structure that makes it easy to store and classify in a relational database. Instead, non-relational (NoSQL) databases are the best fit for managing unstructured data. An often-quoted estimate is that around 80 percent of all data generated today is unstructured, and this share is expected to keep rising as new internet-connected devices come online.
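
Before a machine learning model can use unstructured data, the data generally has to be turned into numeric features. A minimal sketch for text is the bag-of-words representation, which counts how often each vocabulary word occurs. The reviews, the simple regular-expression tokeniser and the function names below are hypothetical; practical pipelines typically use richer preprocessing.

```python
import re
from collections import Counter


def tokenise(text):
    """Lower-case the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def bag_of_words(text, vocabulary):
    """Turn free text into a fixed-length vector of word counts."""
    counts = Counter(tokenise(text))
    # A Counter returns 0 for words that never occurred.
    return [counts[word] for word in vocabulary]


# Hypothetical customer reviews (unstructured text).
reviews = ["Great service, great price!", "Terrible service. Never again."]

# The vocabulary is every distinct word seen, in a fixed (sorted) order.
vocabulary = sorted({word for review in reviews for word in tokenise(review)})
vectors = [bag_of_words(review, vocabulary) for review in reviews]

print(vocabulary)
for vector in vectors:
    print(vector)
```

Each review becomes a row of numbers with one column per vocabulary word, which is a structured form that a model can consume.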