Research Projects

Estimated reading time: 56 minutes

About:

The primary aim of the project is to make research and software more usable to the everyday person or company. Since 2016, FirmAI has developed the world’s first open source intelligent report, with 1,500 users. It also houses the leading catalogue of industry applications of machine learning, with 120,000 GitHub views and an average of 300 views per day. FirmAI has developed the largest code base for machine learning in asset management, which already has over 3,000 downloads from SSRN. Other projects include more than ten social and public media scripts for accessing publicly available data. FirmAI is currently building the largest code base for applied mathematics in business; part of this work is another open source project to develop the first Google Colab aggregation engine. In total, three FirmAI open source packages (PandaPy, AtsPy and PandasVault) have been downloaded more than 20,000 times. FirmAI’s work happens mainly on GitHub and Medium.

Articles:

Title | Date | Description
Machine Learning Birds-eye | 2017-02 | Looking at types of ML and potential applications.
Advanced Analytics | 2018-01 | Explaining and applying advanced analytics.
Business Machine Learning | 2019-07 | 150+ business data science applications in Python.

 

Papers:

Title | Date | Description
Financial Services Regulation | 2019-02 | Regulatory implications in the age of machine intelligence.
Earnings Surprise Prediction | 2017-07 | Predicting earnings surprises using non-linear models.
Bankruptcy Prediction | 2019-11 | Predicting the occurrence of litigated bankruptcies.
Restaurant Closure Prediction | 2018-01 | Predicting restaurant closures within the next one to two years.
Machine Learning in Asset Management | 2019-07 | ML trading and portfolio optimisation models and techniques.

 

Repositories:

 

Title | Description
Python Business Analytics | Python solutions to solve practical business problems.
Industry Machine Learning | Industry machine learning and data science notebooks.
Business Machine Learning | Practical business machine learning (BML) and business data science (BDS) applications.
Financial Machine Learning | ML trading and portfolio optimisation models and techniques.
Datasets | Unique business data with an API interface.
Newsletter | A linkletter covering the open source industry machine learning space.

 


Industry Machine Learning:

A curated list of applied machine learning and data science notebooks and libraries across different industries. The code in this repository is in Python (primarily using Jupyter notebooks) unless otherwise stated.

Accommodation & Food | Agriculture | Banking & Insurance
Biotechnological & Life Sciences | Construction & Engineering | Education & Research
Emergency & Relief | Finance | Manufacturing
Government and Public Works | Healthcare | Media & Publishing
Justice, Law and Regulations | Miscellaneous | Accounting
Real Estate, Rental & Leasing | Utilities | Wholesale & Retail

 

Table of Contents

 

Accommodation & Food

Food

Restaurant

Accommodation

 

Accounting

Machine Learning

Analytics
  • Forensic Accounting - Collection of case studies on forensic accounting using data analysis. If you have more data for practising forensic accounting, please get in touch.
  • General Ledger (FirmAI) - Data processing over a general ledger as exported through an accounting system.
  • Bullet Graph (FirmAI) - Bullet graph visualisation helpful for tracking sales, commission and other performance.
  • Aged Debtors (FirmAI) - Example analysis to investigate aged debtors.
  • Automated FS XBRL - Automated financial statement (FS) analysis in XBRL, an XML-based language; the analysis could possibly be ported to Python.
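The Aged Debtors entry above analyses how long receivables have been outstanding. A minimal sketch of that idea follows, assuming invoices arrive as hypothetical (debtor, amount outstanding, due date) tuples and using conventional 30-day ageing buckets; the bucket boundaries and sample data are illustrative assumptions, not taken from the FirmAI notebook.

```python
from collections import defaultdict
from datetime import date


def ageing_bucket(days_overdue: int) -> str:
    """Map days past the due date to a 30-day ageing bucket (assumed boundaries)."""
    if days_overdue <= 0:
        return "current"
    if days_overdue <= 30:
        return "1-30"
    if days_overdue <= 60:
        return "31-60"
    if days_overdue <= 90:
        return "61-90"
    return "90+"


def aged_debtors(invoices, as_of: date) -> dict:
    """Total the outstanding amount per debtor and ageing bucket.

    `invoices` is an iterable of hypothetical (debtor, amount_outstanding, due_date) tuples.
    """
    report = defaultdict(lambda: defaultdict(float))
    for debtor, amount, due in invoices:
        report[debtor][ageing_bucket((as_of - due).days)] += amount
    # Convert the nested defaultdicts into plain dicts for display.
    return {debtor: dict(buckets) for debtor, buckets in report.items()}


invoices = [
    ("Acme", 1200.0, date(2024, 1, 31)),
    ("Acme", 300.0, date(2024, 3, 15)),
    ("Birch", 850.0, date(2023, 12, 1)),
]
print(aged_debtors(invoices, as_of=date(2024, 3, 31)))
```

With a reporting date of 31 March 2024, Acme’s January invoice is 60 days overdue and lands in the 31-60 bucket, while Birch’s December invoice lands in 90+.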

Textual Analysis

Data, Parsing and APIs

Research And Articles
  • Understanding Accounting Analytics - An article that tackles the importance of accounting analytics.
  • VLFeat - VLFeat is an open and portable library of computer vision algorithms, with a MATLAB toolbox.

Websites
  • Rutgers Raw - Good digital accounting research from Rutgers.

Courses

 

Agriculture

Economics

  • Prices - Agricultural price prediction.
  • Prices 2 - Agricultural price prediction.
  • Yield - Agricultural analysis looking at crop yields in Ukraine.
  • Recovery - Strategic land use for agriculture and ecosystem recovery
  • MPR - Mandatory Price Reporting data from the USDA’s Agricultural Marketing Service.

Development

 

Banking & Insurance

Consumer Finance

Management and Operation

Valuation
  • Zillow Prediction - Zillow valuation prediction as performed on Kaggle.
  • Real Estate - Predicting real estate prices from the urban environment.
  • Used Car - Used vehicle price prediction.

Fraud

Insurance and Risk

Physical

 

Biotechnological & Life Sciences

General

  • Programming - Python Programming for Biologists
  • Introduction DL - A Primer on Deep Learning in Genomics
  • Pose - Estimating animal poses using DL.
  • Privacy - Privacy preserving NNs for clinical data sharing.
  • Population Genetics - DL for population genetic inference.
  • Bioinformatics Course - Course materials for Computational Biology and Bioinformatics
  • Applied Stats - Applied Statistics for High-Throughput Biology
  • Scripts - Python scripts for biologists.
  • Molecular NN - A mini-framework to build and train neural networks for molecular biology.
  • Systems Biology Simulations - Systems biology practical on writing simulators with F# and Z3
  • Cell Movement - LSTM to predict biological cell movement.
  • Deepchem - Democratizing Deep-Learning for Drug Discovery, Quantum Chemistry, Materials Science and Biology

Sequencing

Chemoinformatics and drug discovery

  • Novel Molecules - A convolutional net that can learn features.
  • Automating Chemical Design - Generate new molecules for efficient exploration.
  • GAN drug Discovery - A method that combines generative models with reinforcement learning.
  • RL - Generating compounds predicted to be active against a biological target.
  • One-shot learning - Python library that aims to make the use of machine-learning in drug discovery straightforward and convenient.

Genomics

Life-sciences

  • Plants Disease - App that detects diseases in plants using a deep learning model.
  • Leaf Identification - Identification of plants through plant leaves on the basis of their shape, color and texture.
  • Crop Analysis - An imaging library to detect and track future position of ears on maize plants
  • Seedlings - Plant seedlings classification from a Kaggle competition.
  • Plant Stress - An ontology containing plant stresses; biotic and abiotic.
  • Animal Hierarchy - Package for calculating animal dominance hierarchies.
  • Animal Identification - Deep learning for animal identification.
  • Species - Big Data analysis of different species of animals
  • Animal Vocalisations - A generative network for animal vocalizations
  • Evolutionary - Evolution Strategies Tool
  • Glaciers - Educational material about glaciers.

 

Construction & Engineering

Construction

Engineering:

Material Science

 

Economics

General

Machine Learning

  • EconML - Automated Learning and Intelligence for Causation and Economics.
  • Auctions - Optimal auctions using deep learning.

Computational

 

Education & Research

Student

School

 

Emergency & Police

Preventative and Reactive

Crime

Ambulance:

  • Ambulance Analysis - An investigation of Local Government Area ambulance time variation in Victoria.
  • Site Location - Ambulance site locations.
  • Dispatching - Applying game theory and discrete event simulation to find optimal solution for ambulance dispatching
  • Ambulance Allocation - Time series analysis of ambulance dispatches in the City of San Diego.
  • Response Time - An analysis on the improvements of ambulance response time.
  • Optimal Routing - Project to find optimal routing of ambulances in Ithaca.
  • Crash Analysis - Predicting the probability of accidents on a given segment at a given time.

Disaster Management

 

Finance

Trading and Investment

Data

  • Datastream - Datastream from Thomson Reuters accessible through Python.
  • AlphaVantage - API wrapper to simplify the process of acquiring free financial data.
  • FSA - A project to transfer SEC EDGAR filings’ financial data to custom financial statement analysis models.
  • TradeConnector - A layer to connect with market data providers.
  • Employee Count SEC Filings - Extraction to get the exact employee count values for companies from SEC filings.
  • SEC Parsing - NLP to find and extract specific information from long, unstructured documents
  • Open Edgar - OpenEDGAR (openedgar.io)
  • Rating Industries - Histories from multiple agencies converted to CSV format

Personal Papers

 

Healthcare

General

 

Justice, Law & Regulations

Tools

Policy and Regulatory

Judicial Applied

 

Manufacturing

General

Maintenance

Failure

Quality

 

Media & Publishing

Marketing

 

Miscellaneous

Art

Tourism

  • Flickr - Metadata mining tool for tourism research.
  • Fashion - A clothing retrieval and visual recommendation model for fashion images

 

Physics

General

Machine Learning

 

Government and Public Works

Social Policies

Charities

Election Analysis

Politics

  • Congressional politics - House and senate congressional partisanship.
  • Politico - A platform for profiling public figures in Brazilian politics.
  • Bots - Tools and algorithms to analyze Paraguayan Tweets in times of election
  • Gerrymander tests - Lots of metrics for quantifying gerrymandering.
  • Sentiment - Analyse newspapers with respect to their political conviction using entity sentiments of party representatives.
  • DL Politics - Prediction of Spanish Political Affinity with Deep Neural Nets: Socialist vs People’s Party
  • PAC Money - Effects of PAC money on US politics.
  • Power Networks - Constructing a watchdog for Indian corporate and political networks
  • Elite - Political elite in the US.
  • Debate Analysis - Program to analyze political debates.
  • Political Affiliation - Political affiliation prediction using Twitter metadata.
  • Political Ads - Investigation into Facebook Political Ads and Targeting
  • Political Identity - Multi-axial political model.
  • YT Politics - Mapping Politics on YouTube
  • Political Ideology - Unsupervised learning of political ideology by word vector projections
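The Gerrymander tests entry collects metrics for quantifying gerrymandering. One widely cited metric is the efficiency gap, which compares the votes each party "wastes" across districts. A minimal two-party sketch follows; it assumes the winning threshold is exactly 50% of the district vote, does not handle tied districts, and is not claimed to match the linked repository’s implementation. The district figures are made up.

```python
def efficiency_gap(districts):
    """Two-party efficiency gap for a list of (votes_a, votes_b) district results.

    Every vote for a district's loser is wasted; for the winner, votes beyond
    50% of the district total are wasted. Returns (wasted_a - wasted_b) / total,
    so a positive value means party A wasted more votes (the map favours B).
    """
    wasted_a = wasted_b = total = 0.0
    for votes_a, votes_b in districts:
        if votes_a == votes_b:
            raise ValueError("tied districts are not handled in this sketch")
        half = (votes_a + votes_b) / 2
        if votes_a > votes_b:
            wasted_a += votes_a - half
            wasted_b += votes_b
        else:
            wasted_a += votes_a
            wasted_b += votes_b - half
        total += votes_a + votes_b
    return (wasted_a - wasted_b) / total


# Hypothetical map: a 150-150 statewide split, yet B wins two of three seats.
print(round(efficiency_gap([(70, 30), (40, 60), (40, 60)]), 3))
```

In this toy map party A wastes 100 votes and party B 50, so the gap is 50 / 300, about 0.167, against party A.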

 

Real Estate, Rental & Leasing

Real Estate

  • Finding Donuts - Finding real estate opportunities by predicting transforming neighbourhoods.
  • Neighbourhood - Predicting real estate prices from the urban environment.
  • Real Estate Classification - Classifying the type of property given Real Estate, satellite and Street view Images
  • Recommender - A tool that recommends to a user the top 5 real estate properties that match their search.
  • House Price - Predicting house prices using Linear Regression and GBR
  • House Price Portland - Predict housing prices in Portland.
  • Zillow Prediction - Zillow valuation prediction as performed on Kaggle.
  • Real Estate - Predicting real estate prices from the urban environment.

Rental & Leasing

 

Utilities

Electricity

Coal, Oil & Gas

Water & Pollution

  • Safe Water - Predict health-based drinking water violations in the United States.
  • Hydrology Data - A suite of convenience functions for exploring water data in Python.
  • Water Observatory - Monitoring water levels of lakes and reservoirs using satellite imagery.
  • Water Pipelines - Using machine learning to find water pipelines in aerial images.
  • Water Modelling - Australian Water Resource Assessment (AWRA) Community Modelling System.
  • Drought Restrictions - A Los Angeles Times analysis of water usage after the state eased drought restrictions
  • Flood Prediction - Applying LSTM on river water level data
  • Sewage Overflow - Insights into sanitary sewage overflows (SSO). Note: this repository has been removed.
  • Water Accounting - Assembles water budget data for the US from existing data sources.
  • Air Quality Prediction - Predict air quality (AQ) in Beijing and London over the next 48 hours.

Transportation

 

Wholesale & Retail

Wholesale

  • Customer Analysis - Wholesale customer analysis.
  • Distribution - JB wholesale distribution analysis.
  • Clustering - Unsupervised learning techniques are applied on product spending data collected for customers
  • Market Basket Analysis - Uses the Instacart public dataset to report which products are often bought together.
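The Market Basket Analysis entry reports which products are often bought together. The usual building blocks are support (the share of baskets containing an itemset) and confidence (how often one item appears given that another does). A minimal pair-only sketch follows, using made-up baskets rather than the Instacart data; a full Apriori or FP-growth implementation would extend the counting to larger itemsets.

```python
from collections import Counter
from itertools import combinations


def pair_support(baskets, min_support=0.0):
    """Support of each product pair: the share of baskets containing both items."""
    counts = Counter()
    for basket in baskets:
        # Deduplicate and sort so ("a", "b") and ("b", "a") count as the same pair.
        for pair in combinations(sorted(set(basket)), 2):
            counts[pair] += 1
    n = len(baskets)
    return {pair: c / n for pair, c in counts.items() if c / n >= min_support}


def confidence(baskets, antecedent, consequent):
    """Estimate P(consequent in basket | antecedent in basket)."""
    with_antecedent = [set(b) for b in baskets if antecedent in b]
    if not with_antecedent:
        return 0.0
    return sum(consequent in b for b in with_antecedent) / len(with_antecedent)


baskets = [
    ["banana", "milk", "bread"],
    ["banana", "milk"],
    ["bread", "eggs"],
    ["banana", "eggs", "milk"],
]
print(pair_support(baskets, min_support=0.5))
print(confidence(baskets, "banana", "milk"))
```

Only banana and milk clear the 50% support threshold here, and every basket containing banana also contains milk.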

Retail


 

Business Machine Learning:

A curated list of applied business machine learning (BML) and business data science (BDS) examples and libraries. The code in this repository is in Python (primarily using Jupyter notebooks) unless otherwise stated.

 

Table of Contents

Also see Python Business Analytics

 

Accounting

Machine Learning

Analytics
  • Forensic Accounting - Collection of case studies on forensic accounting using data analysis. If you have more data for practising forensic accounting, please get in touch.
  • General Ledger (FirmAI) - Data processing over a general ledger as exported through an accounting system.
  • Bullet Graph (FirmAI) - Bullet graph visualisation helpful for tracking sales, commission and other performance.
  • Aged Debtors (FirmAI) - Example analysis to investigate aged debtors.
  • Automated FS XBRL - Automated financial statement (FS) analysis in XBRL, an XML-based language; the analysis could possibly be ported to Python.

Textual Analysis

Data, Parsing and APIs

Research And Articles
  • Understanding Accounting Analytics - An article that tackles the importance of accounting analytics.
  • VLFeat - VLFeat is an open and portable library of computer vision algorithms, with a MATLAB toolbox.

Websites
  • Rutgers Raw - Good digital accounting research from Rutgers.

Courses

Lifetime Value
  • Pareto/NBD Model - Calculate the CLV using a Pareto/NBD model.
  • Gamma-Gamma Model - Estimate the monetary value component of CLV using a Gamma-Gamma model.
  • Cohort Analysis - Cohort analysis to group customers into mutually exclusive cohorts measured over time.
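The Cohort Analysis entry groups customers into mutually exclusive cohorts and tracks them over time. A minimal sketch follows, assuming cohorts are defined by the month of each customer’s first order and that a customer counts as retained in month k if they place at least one order k months after acquisition; the order data is hypothetical.

```python
from collections import defaultdict
from datetime import date


def month_index(d: date) -> int:
    """Count months from a fixed origin so that differences are whole-month offsets."""
    return d.year * 12 + d.month - 1


def cohort_retention(orders):
    """Retention table keyed by acquisition month ("YYYY-MM").

    `orders` is an iterable of (customer_id, order_date). Each customer belongs to
    the cohort of their first order month; entry k is the share of that cohort
    with at least one order k months after acquisition (k = 0 is always 1.0).
    """
    orders = list(orders)
    first = {}
    for customer, d in orders:
        m = month_index(d)
        first[customer] = min(first.get(customer, m), m)

    cohort_size = defaultdict(int)
    for cohort in first.values():
        cohort_size[cohort] += 1

    active = defaultdict(set)  # (cohort, month offset) -> customers who ordered then
    for customer, d in orders:
        cohort = first[customer]
        active[(cohort, month_index(d) - cohort)].add(customer)

    table = defaultdict(dict)
    for (cohort, offset), customers in sorted(active.items()):
        label = f"{cohort // 12}-{cohort % 12 + 1:02d}"
        table[label][offset] = len(customers) / cohort_size[cohort]
    return dict(table)


orders = [
    ("a", date(2024, 1, 5)), ("a", date(2024, 2, 10)), ("b", date(2024, 1, 20)),
    ("c", date(2024, 2, 3)), ("c", date(2024, 3, 8)), ("b", date(2024, 3, 1)),
]
print(cohort_retention(orders))
```

Reading across a row shows how a single cohort decays (or returns) over time, which is the view a cohort heatmap usually plots.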

Segmentation
  • E-commerce - E-commerce customer segmentation.
  • Groceries - Segmentation for grocery customers.
  • Online Retailer - Online retailer segmentation.
  • Bank - Bank customer segmentation.
  • Wholesale - Clustering of wholesale customers.
  • Various - Multiple types of segmentation and clustering techniques.

Behaviour
  • RNN - Investigating customer behaviour over time with sequential analysis using an RNN model.
  • Neural Net - Demand forecasting using artificial neural networks.
  • Temporal Analytics - Investigating customer temporal regularities.
  • POS Analytics - Analytics driven customer behaviour ranking for retail promotions using POS data.
  • Wholesale Customer - Wholesale customer exploratory data analysis.
  • RFM - An RFM (recency, frequency, monetary) analysis.
  • Returns Behaviour - Predicting total returns and fraudulent returns.
  • Visits - Predicting which day of week a customer will visit.
  • Bank: Next Purchase - A project to predict bank customers’ most probable next purchase.
  • Bank: Customer Prediction - Predicting target customers who will subscribe to the bank’s new policy.
  • Next Purchase - Predict a customer’s next purchase, also using feature engineering.
  • Customer Purchase Repeats - Using the lifetimes Python library and real jewellery retailer data to analyse customer repeat purchases.
  • AB Testing - Find the best KPI and do A/B testing.
  • Customer Survey (FirmAI) - Example of parsing and analysing a customer survey.
  • Happiness - Analysing customer happiness from hotel stays using reviews.
  • Miscellaneous Customer Analytics - Various tools and techniques for customer analysis.
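Several entries above (RFM, Customer Purchase Repeats) rest on recency, frequency and monetary summaries per customer. A minimal RFM sketch follows, assuming transactions as hypothetical (customer, date, amount) tuples and a simple rank-based score from 1 to `bins`; quintiles (bins=5) are a common choice, and ties here are broken arbitrarily rather than sharing a score.

```python
from datetime import date


def rfm_table(transactions, as_of: date) -> dict:
    """Raw (recency in days, frequency, monetary) per customer.

    `transactions` is an iterable of hypothetical (customer_id, order_date, amount) tuples.
    """
    stats = {}
    for customer, d, amount in transactions:
        last, count, total = stats.get(customer, (d, 0, 0.0))
        stats[customer] = (max(last, d), count + 1, total + amount)
    return {c: ((as_of - last).days, n, m) for c, (last, n, m) in stats.items()}


def rank_score(values, higher_is_better=True, bins=5):
    """Score each value 1..bins by rank (a simple stand-in for quantile scoring)."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=not higher_is_better)
    scores = [0] * len(values)
    for rank, i in enumerate(order):
        scores[i] = rank * bins // len(values) + 1
    return scores


def rfm_scores(transactions, as_of: date, bins=5) -> dict:
    """(R, F, M) scores per customer; fewer days since the last order scores higher on R."""
    table = rfm_table(transactions, as_of)
    customers = sorted(table)
    r = rank_score([table[c][0] for c in customers], higher_is_better=False, bins=bins)
    f = rank_score([table[c][1] for c in customers], bins=bins)
    m = rank_score([table[c][2] for c in customers], bins=bins)
    return {c: (r[i], f[i], m[i]) for i, c in enumerate(customers)}


transactions = [
    ("a", date(2024, 3, 28), 120.0), ("a", date(2024, 3, 1), 80.0), ("a", date(2024, 2, 1), 50.0),
    ("b", date(2024, 1, 10), 30.0),
    ("c", date(2024, 3, 20), 500.0), ("c", date(2023, 12, 5), 40.0),
]
print(rfm_scores(transactions, as_of=date(2024, 3, 31), bins=3))
```

Customer "a" ordered most recently and most often, while "c" spent the most; segment labels (for example "champions" or "at risk") are typically assigned from these score triples.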

Recommender

Churn Prediction
  • Ride Sharing - Identify customer churn rates in order to target customers for retention campaigns.
  • KKBox I - Variational deep autoencoder to predict customer churn.
  • KKBox II - A three-step customer churn prediction framework using feature engineering.
  • Personal Finance - Predict customer subscription churn for a personal finance business.
  • ANN - Churn analysis using artificial neural networks.
  • Bike - Customer bike churn analysis.
  • Cost Sensitive - Cost-sensitive churn analysis driven by economic performance.

Sentiment

 

Employee

Management

Performance

Turnover

Conversations

Physical

 

Tools

Policy and Regulatory

Judicial Applied

 

Management

Strategy
  • Topic Model Reviews - Amazon reviews for product development.
  • Patents - Forecasting strategy using patents.
  • Networks - Networks of business categories built from Yelp reviews can help identify pockets of demand.
  • Company Clustering - Hierarchical clusters and topics from companies by extracting information from their descriptions on their websites
  • Marketing Management - Programmatic marketing management.

Decision Optimisation

Causal Inference

Statistics
  • Various - Various applied statistical solutions.

Quantitative
  • Applied RL - Reinforcement Learning and Decision Making tutorials explained at an intuitive level and with Jupyter Notebooks
  • Process Mining - Leveraging A-priori Knowledge in Predictive Business Process Monitoring
  • TS Forecasting - Time series forecasting for important business applications.

Data
  • Web Scraping (FirmAI) - Web scraping solutions for Facebook, Glassdoor, Instagram, Morningstar, SimilarWeb, Yelp, SpyFu, LinkedIn and AngelList.

 

Operations

Failure and Anomalies

Load and Capacity Management

Prediction Management

 

Financial Machine Learning:

A curated list of practical financial machine learning (FinML) tools and applications. This collection is primarily in Python.

 

Trading

Deep Learning
Reinforcement Learning
  • RL Trading - A collection of 25+ Reinforcement Learning Trading Strategies - Google Colab.
  • RL - OpenGym with Deep Q-learning and Policy Gradient.
  • RL II - Reinforcement learning on the stock market, where an agent tries to learn to trade.
  • RL III - Deep reinforcement learning-based trading agent for Bitcoin (GitHub).
  • RL IV - Reinforcement Learning for finance.
  • RL V - Building an Agent to Trade with Reinforcement Learning.
  • Pair Trading RL - Using deep actor-critic model to learn best strategies in pair trading.
Other Models
Data Processing Techniques and Transformations
  • Advanced ML - Exercises from Advances in Financial Machine Learning (De Prado).
  • Advanced ML II - More implementations from Advances in Financial Machine Learning (De Prado).

 

Portfolio Management

Portfolio Selection and Optimisation
Factor and Risk Analysis:

 

Techniques

Unsupervised:
Textual:

 

Other Assets

Derivatives and Hedging:
Fixed Income
  • Vasicek - Bootstrapping and interpolation.
  • Binomial Tree - Utility functions in fixed income securities.
  • Corporate Bonds - Predicting the buying and selling volume of corporate bonds.
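The Vasicek entry lists bootstrapping and interpolation as its topics. A minimal sketch of those two steps follows, assuming hypothetical annual-coupon par yields at 1, 2 and 3 years, annual compounding, and linear interpolation of zero rates. It does not implement the Vasicek short-rate model itself, and the linked repository may use different conventions.

```python
from bisect import bisect_right


def bootstrap_discount_factors(par_yields):
    """Discount factors DF_1..DF_N from annual-coupon par yields for 1..N years.

    A par bond prices at 1, so c_n * (DF_1 + ... + DF_n) + DF_n = 1; each DF_n is
    solved using the factors already bootstrapped for shorter maturities.
    """
    dfs = []
    for c in par_yields:
        dfs.append((1 - c * sum(dfs)) / (1 + c))
    return dfs


def zero_rates(dfs):
    """Annually compounded zero rates z_n from DF_n = (1 + z_n) ** -n."""
    return [df ** (-1 / n) - 1 for n, df in enumerate(dfs, start=1)]


def interpolate(maturities, rates, t):
    """Linearly interpolate a zero curve at maturity t, held flat outside the grid."""
    if t <= maturities[0]:
        return rates[0]
    if t >= maturities[-1]:
        return rates[-1]
    i = bisect_right(maturities, t)  # maturities[i - 1] <= t < maturities[i]
    t0, t1 = maturities[i - 1], maturities[i]
    r0, r1 = rates[i - 1], rates[i]
    return r0 + (r1 - r0) * (t - t0) / (t1 - t0)


par = [0.02, 0.025, 0.03]  # hypothetical 1y, 2y and 3y par yields
zeros = zero_rates(bootstrap_discount_factors(par))
print([round(z, 5) for z in zeros], round(interpolate([1, 2, 3], zeros, 2.5), 5))
```

A quick sanity check of the bootstrap: a flat par curve should reproduce itself, since every discount factor then equals (1 + c) ** -n.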
Alternative Finance
Extended Research:

 

Courses

 

Data

 

Personal Papers