Research Projects
About:
The primary aim of the project is to make research and software more usable to the everyday person or company. Since 2016, FirmAI has developed the world’s first open source intelligent report, with 1,500 users. It also houses the leading catalogue for industry applications in machine learning, with 120,000 GitHub views and an average of 300 views per day. FirmAI has developed the largest code base for machine learning in asset management, which already has over 3,000 downloads from SSRN. Other projects include the development of more than ten social and public media scripts to access publicly available data. Currently FirmAI is working on building the largest code base for applied mathematics in business. Part of this process includes the development of another open source project to develop the first ever Google Colab aggregation engine. In total, three different open source FirmAI packages (PandaPy, AtsPy, and PandasVault) have been downloaded more than 20,000 times. FirmAI’s work happens mainly on GitHub and on Medium.
Articles:
Title | Date | Description |
---|---|---|
Machine Learning Birds-eye | 2017-02 | Looking at types of ML and potential applications. |
Advanced Analytics | 2018-01 | Explaining and applying advanced analytics. |
Business Machine Learning | 2019-07 | 150+ Business Data Science Application in Python. |
Papers:
Title | Date | Description |
---|---|---|
Financial Services Regulation | 2019-02 | Regulatory implications in the age of Machine Intelligence |
Earnings Surprise Prediction | 2017-07 | Predicting earnings surprises using non-linear models |
Bankruptcy Prediction | 2019-11 | Predict the occurrence of litigated bankruptcies |
Restaurant Closure Prediction | 2018-01 | Predict restaurant closures within the next one to two years |
Machine Learning in Asset Management | 2019-07 | ML trading and portfolio optimisation models and techniques. |
Repositories:
Title | Description |
---|---|
Python Business Analytics | Python solutions to solve practical business problems. |
Industry Machine Learning | Industry machine learning and data science notebooks |
Business Machine Learning | Practical business machine learning (BML) and business data science (BDS) applications |
Financial Machine Learning | ML trading and portfolio optimisation models and techniques. |
Datasets | Unique business data with an API interface. |
Newsletter | Linkletter in the open source industry machine learning space |
Industry Machine Learning:
A curated list of applied machine learning and data science notebooks and libraries across different industries. The code in this repository is in Python (primarily using Jupyter notebooks) unless otherwise stated.
Table of Contents
- Accommodation & Food
- Accounting
- Agriculture
- Banking & Insurance
- Biotechnological & Life Sciences
- Construction & Engineering
- Economics
- Education & Research
- Emergency & Relief
- Finance
- Healthcare
- Justice, Law and Regulations
- Manufacturing
- Media & Publishing
- Miscellaneous
- Physics
- Government and Public Works
- Real Estate, Rental & Leasing
- Utilities
- Wholesale & Retail
Accommodation & Food
- RobotChef - Refining recipes based on user reviews.
- Food Amenities - Predicting the demand for food amenities using neural networks
- Recipe Cuisine and Rating - Predict the rating and type of cuisine from a list of ingredients.
- Food Classification - Classification using Keras.
- Image to Recipe - Translate an image to a recipe using deep learning.
- Calorie Estimation - Estimate calories from photos of food.
- Fine Food Reviews - Sentiment analysis on Amazon Fine Food Reviews.
- Restaurant Violation - Food inspection violation forecasting.
- Restaurant Success - Predict whether a restaurant is going to fail.
- Predict Michelin - Predict the likelihood that a restaurant is a Michelin restaurant.
- Restaurant Inspection - An inspection analysis to see if cleanliness is related to rating.
- Sales - Restaurant sales forecasting with LSTM.
- Visitor Forecasting - Reservation and visitation number prediction.
- Restaurant Profit - Restaurant regression analysis.
- Competition - Restaurant competitiveness analysis.
- Business Analysis - Restaurant business analysis project.
- Location Recommendation - Restaurant location recommendation tool and analysis.
- Closure, Rating and Recommendation - Three prediction tasks using Yelp data.
- Anti-recommender - Find restaurants you don’t want to attend.
- Menu Analysis - Deeper analysis of restaurants through their menus.
- Menu Recommendation - NLP to recommend restaurants with similar menus.
- Food Price - Predict food cost.
- Automated Restaurant Report - Automated machine learning company report.
- Peer-to-Peer Housing - The effect of peer to peer rentals on housing.
- Roommate Recommendation - A system for students seeking roommates.
- Room Allocation - Room allocation process.
- Dynamic Pricing - Hotel dynamic pricing calculations.
- Hotel Similarity - Compare brands that directly compete
- Hotel Reviews - Cluster hotel reviews.
- Predict Prices - Predict hotel room rates.
- Hotels vs Airbnb - Comparing the two approaches.
- Hotel Improvement - Analyse reviews to suggest hotel improvements.
- Orders - Order cancellation prediction for hotels.
- Fake Reviews - Identify whether reviews are fake/spam.
- Reverse Image Lodging - Find your preferred lodging by uploading an image.
Accounting
Machine Learning
- Chart of Account Prediction - Using labeled data to suggest the account name for every transaction.
- Accounting Anomalies - Using deep-learning frameworks to identify accounting anomalies.
- Financial Statement Anomalies - Detecting anomalies before filing, using R.
- Useful Life Prediction (FirmAI) - Predict the useful life of assets using sensor observations and feature engineering.
- AI Applied to XBRL - Standardized representation of XBRL into AI and Machine learning.
Analytics
- Forensic Accounting - Collection of case studies on forensic accounting using data analysis. More data to practise forensic accounting is welcome; please get in touch.
- General Ledger (FirmAI) - Data processing over a general ledger as exported through an accounting system.
- Bullet Graph (FirmAI) - Bullet graph visualisation helpful for tracking sales, commission and other performance.
- Aged Debtors (FirmAI) - Example analysis to investigate aged debtors.
- Automated FS XBRL - Written in an XML language; the analysis could possibly be ported to Python.
Textual Analysis
- Financial Sentiment Analysis - Sentiment, distance and proportion analysis for trading signals.
- Extensive NLP - Comprehensive NLP techniques for accounting research.
Data, Parsing and APIs
- EDGAR - A walk-through in how to obtain EDGAR data.
- PyEDGAR - A library for downloading, caching, and accessing EDGAR filings.
- IRS - Accessing and parsing IRS filings.
- Financial Corporate - Rutgers corporate financial datasets.
- Non-financial Corporate - Rutgers non-financial corporate dataset.
- PDF Parsing - Extracting useful data from PDF documents.
- PDF Table to Excel - How to output an Excel file from a PDF.
Research And Articles
- Understanding Accounting Analytics - An article that tackles the importance of accounting analytics.
- VLFeat - VLFeat is an open and portable library of computer vision algorithms, with a MATLAB toolbox.
Websites
- Rutgers Raw - Good digital accounting research from Rutgers.
Courses
- Computer Augmented Accounting - A video series from Rutgers University looking at the use of computation to improve accounting.
- Accounting in a Digital Era - Another series by Rutgers investigating the effects the digital age will have on accounting.
Agriculture
- Prices - Agricultural price prediction.
- Prices 2 - Agricultural price prediction.
- Yield - Agricultural analysis looking at crop yields in Ukraine.
- Recovery - Strategic land use for agriculture and ecosystem recovery
- MPR - Mandatory Price Reporting data from the USDA’s Agricultural Marketing Service.
- Segmentation - Agricultural field parcel segmentation using satellite images.
- Water Table - Predicting water table depth in agricultural areas.
- Assistant - Notebooks from agricultural assistant.
- Eco-evolutionary - Eco-evolutionary dynamics.
- Diseases - Identification of crop diseases and pests using Deep Learning framework from the images.
- Irrigation and Pest Prediction - Analyse irrigation and predict pest likelihood.
Banking & Insurance
Consumer Finance
- Loan Acceptance - Classification and time-series analysis for loan acceptance.
- Predict Loan Repayment - Predict whether a loan will be repaid using automated feature engineering.
- Loan Eligibility Ranking - System to help the banks check if a customer is eligible for a given loan.
- Home Credit Default (FirmAI) - Predict home credit default.
- Mortgage Analytics - Extensive mortgage loan analytics.
- Credit Approval - A system for credit card approval.
- Loan Risk - Predictive model to help to reduce charge-offs and losses of loans.
- Amortisation Schedule (FirmAI) - Simple amortisation schedule in Python for personal use.
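As a rough illustration of what an amortisation schedule of this kind computes, here is a minimal Python sketch. It is not taken from the FirmAI notebook: the function name `amortisation_schedule` and its parameters are hypothetical, and it assumes a fixed-rate loan repaid in equal instalments using the standard annuity formula.

```python
def amortisation_schedule(principal, annual_rate, years, payments_per_year=12):
    """Return rows of (period, payment, interest, principal_paid, balance).

    Hypothetical helper for illustration only; assumes a fixed rate and
    equal payments at the end of each period.
    """
    n = years * payments_per_year          # total number of payments
    r = annual_rate / payments_per_year    # periodic interest rate
    if r == 0:
        payment = principal / n            # interest-free loan: straight division
    else:
        # Standard annuity formula for a level payment
        payment = principal * r / (1 - (1 + r) ** -n)

    balance = principal
    rows = []
    for period in range(1, n + 1):
        interest = balance * r             # interest accrued this period
        principal_paid = payment - interest
        balance -= principal_paid
        # Clamp tiny negative balances caused by floating-point rounding
        rows.append((period, payment, interest, principal_paid, max(balance, 0.0)))
    return rows


if __name__ == "__main__":
    for row in amortisation_schedule(10_000, 0.06, 1):
        print("{:>3}  {:>9.2f}  {:>8.2f}  {:>9.2f}  {:>10.2f}".format(*row))
```

Each row splits the level payment into its interest and principal parts, so the interest share falls and the principal share rises over the life of the loan.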
Management and Operation
- Credit Card - Estimate the CLV of credit card customers.
- Survival Analysis - Perform a survival analysis of customers.
- Next Transaction - Deep learning model to predict the transaction amount and days to next transaction.
- Credit Card Churn - Predicting credit card customer churn.
- Bank of England Minutes - Textual analysis over bank minutes.
- CEO - Analysis of CEO compensation.
Valuation
- Zillow Prediction - Zillow valuation prediction as performed on Kaggle.
- Real Estate - Predicting real estate prices from the urban environment.
- Used Car - Used vehicle price prediction.
Fraud
- XGBoost - Fraud Detection by tuning XGBoost hyper-parameters with Simulated Annealing
- Fraud Detection Loan in R - Fraud detection in bank loans.
- AML Finance Due Diligence - Search news articles to do finance AML DD.
- Credit Card Fraud - Detecting credit card fraud.
Insurance and Risk
- Car Damage Detective - Assessing car damage with convolutional neural networks for personal auto claims.
- Medical Insurance Claims - Predicting medical insurance claims.
- Claim Denial - Predicting insurance claim denial
- Claim Fraud - Predictive models to determine which automobile claims are fraudulent.
- Claims Anomalies - Anomaly detection system for medical insurance claims data.
- Actuarial Sciences (R) - A range of actuarial tools in R.
- Bank Failure - Predicting bank failure.
- Risk Management - Finance risk engagement course resources.
- VaR GaN - Estimate Value-at-Risk for market risk management using Keras and TensorFlow.
- Compliance - Bank Grievance Compliance Management.
- Stress Testing - ECB stress testing.
- Stress Testing Techniques - A notebook with various stress testing exercises.
- Reverse Stress Test - Given a portfolio and a predefined loss size, determine which factor stresses (scenarios) would lead to that loss.
- BoE stress test - Stress test results and plotting.
- Recovery - Recovery of money owed.
- Quality Control - Quality control for banking using LDA
Physical
- Bank Note Fraud Detection - Bank Note Authentication Using DNN Tensorflow Classifier and RandomForest.
- ATM Surveillance - ATM Surveillance in banks use case.
Biotechnological & Life Sciences
- Programming - Python Programming for Biologists
- Introduction DL - A Primer on Deep Learning in Genomics
- Pose - Estimating animal poses using DL.
- Privacy - Privacy preserving NNs for clinical data sharing.
- Population Genetics - DL for population genetic inference.
- Bioinformatics Course - Course materials for Computational Biology and Bioinformatics
- Applied Stats - Applied Statistics for High-Throughput Biology
- Scripts - Python scripts for biologists.
- Molecular NN - A mini-framework to build and train neural networks for molecular biology.
- Systems Biology Simulations - Systems biology practical on writing simulators with F# and Z3
- Cell Movement - LSTM to predict biological cell movement.
- Deepchem - Democratizing Deep-Learning for Drug Discovery, Quantum Chemistry, Materials Science and Biology
- DNA, RNA and Protein Sequencing - A new representation for biological sequences using DL.
- CNN Sequencing - A toolbox for learning motifs from DNA/RNA sequence data using convolutional neural networks
- NLP Sequencing - Language transfer learning model for genomics
Chemoinformatics and drug discovery
- Novel Molecules - A convolutional net that can learn features.
- Automating Chemical Design - Generate new molecules for efficient exploration.
- GAN drug Discovery - A method that combines generative models with reinforcement learning.
- RL - Generating compounds predicted to be active against a biological target.
- One-shot learning - Python library that aims to make the use of machine-learning in drug discovery straightforward and convenient.
- Jupyter Genomics - Collection of computation biology and bioinformatics notebooks.
- Variant calling - Correctly identify variations from the reference genome in an individual’s DNA.
- Gene Expression Graphs - Using convolutions on an image.
- Autoencoding Expression - Extracting relevant patterns from large sets of gene expression data
- Gene Expression Inference - Predict the expression of specified target genes from a panel of about 1,000 pre-selected “landmark genes”.
- Plant Genomics - Presentation and example material for Plant and Pathogen Genomics
- Plants Disease - App that detects diseases in plants using a deep learning model.
- Leaf Identification - Identification of plants through plant leaves on the basis of their shape, color and texture.
- Crop Analysis - An imaging library to detect and track future position of ears on maize plants
- Seedlings - Plant Seedlings Classification from a Kaggle competition
- Plant Stress - An ontology containing plant stresses; biotic and abiotic.
- Animal Hierarchy - Package for calculating animal dominance hierarchies.
- Animal Identification - Deep learning for animal identification.
- Species - Big Data analysis of different species of animals
- Animal Vocalisations - A generative network for animal vocalizations
- Evolutionary - Evolution Strategies Tool
- Glaciers - Educational material about glaciers.
Construction & Engineering
- DL Architecture - Deep learning classifier and image generator for building architecture.
- Construction Materials - A course on construction materials.
- Bad Actor Risk Model - Risk model to improve construction related building safety
- Inspectors - Determine the assigned inspections.
- Corrupt Social Interactions - Uncover potential corrupt social interactions between an industry member and the staff at the DOB
- Risk Construction - Identify high risk construction.
- Facade Risk - A risk model to predict unsafe facades.
- Staff Levels - Predicting staff levels for front line workers.
- Injuries - Building related injuries topic modelling.
- Building Violations - Predictive analysis of building violations.
- Productivity - Productivity analysis and inspection with Tableau.
- Structural Analysis - 2D Structural Analysis in Python.
- Structural Engineering - Structural engineering modules.
- Nusa - Structural analysis using the finite element method.
- StructPy - Structural Analysis Library for Python based on the direct stiffness method
- Aileron - Structural analysis of the aileron of a Boeing 737
- Vibration - Educational vibration programs.
- Civil - Collection of civil engineering tools in FreeCAD
- GEstimator - Simple civil estimation software
- Fatpack - Functions and classes for fatigue analysis of data series.
- Pysteel - Automated design of different steel structures
- Structural Uncertainty - Quantifying structural uncertainty with deep learning.
- Pymech - A Python module for mechanical engineers
- Aerospace Engineering - Astrodynamics and Statistics
- Interactive Quantum Chemistry - Combining Psi4 and Numpy for education and development.
- Chemical and Process Engineering - Various resources.
- PyTherm - Applied Thermodynamics
- Aerogami - Aerodynamics using planes.
- Electro geophysics - Interactive applications for electromagnetics in geophysics
- Graph Signal - Graph signal processing tutorial.
- Mechanical Vibrations - Mechanical Vibrations at the University of Louisiana.
- Process Dynamics - Process Dynamics and Control
- Battery Life Cycle - Data driven prediction of battery life cycle.
- Wind Energy - Python for wind energy
- Energy Use - Standard methods for calculating normalized metered energy consumption
- Nuclear Radiation - How people are affected by radiation emitted by nuclear power plants
- Python Materials Genomics - Robust material analysis code used in a well-established project.
- Materials Mining - Scripts for simulations and analysis of materials.
- Emmet - Build databases of material properties.
- Megnet - Graph networks as a ML framework for Molecules and Crystals
- Atomate - Pre-built workflows for computational material science.
- Bylaws Compliance - Predicting property fines.
- Asphalt Binder - Construction materials, free energy and chemical composition of asphalt binder.
- Steel - Optimisation of steel.
- Awesome Materials Informatics - Curated list of known efforts in materials informatics.
Economics
- Trading Economics API - Information for 196 countries.
- Development Economics - Development microeconomics, written mostly as interactive Jupyter notebooks
- Applied Econ & Fin - Applied Computational Economics and Finance
- Macroeconomics - Topics in macroeconomics with notebook examples.
- EconML - Automated Learning and Intelligence for Causation and Economics.
- Auctions - Optimal auctions using deep learning.
- Quant Econ - Quantitative economics course by NYU
- Computational - Computational methods in economics.
- Computational 2 - Small course in computational economics.
- Econometric Theory - Notebooks of A Primer on Econometric theory.
Education & Research
- Student Performance - Mining student performance using machine learning.
- Student Performance 2 - Student exam performance.
- Student Performance 3 - Student achievement in secondary education.
- Student Performance 4 - Students Performance Evaluation using Feature Engineering
- Student Intervention - Building a student intervention system.
- Student Enrolment - Student enrolment and performance analysis.
- Academic Performance - Explore the demographic and family features that have an impact on a student’s academic performance.
- Grade Analysis - Student achievement analysis.
- School Choice - Data analysis for education’s school choice.
- School Budgets and Priorities - Helping the school board and mayor make strategic decisions regarding future school budgets and priorities
- School Performance - Data analysis practice using data from data.utah.gov on school performance.
- School Performance 2 - Using pandas to analyze school and student performance within a district
- School Performance 3 - Philadelphia School Performance
- School Performance 4 - NJ School Performance
- School Closure - Identify schools at risk for closure by performance and other characteristics.
- School Budgets - Tools and techniques for school budgeting.
- School Budgets - Same as above, from DataCamp.
- PyCity - School analysis.
- PyCity 2 - School budget vs school results.
- Budget NLP - NLP classification for budget resources.
- Budget NLP 2 - Further classification exercise.
- Budget NLP 3 - Budget classification.
- Survey Analysis - Education survey analysis.
Emergency & Police
- Emergency Mapping - Detection of destroyed houses in California
- Emergency Room - Supporting emergency room decision making
- Emergency Readmission - Adjusted Risk of Emergency Readmission.
- Forest Fire - Forest fire detection through UAV imagery using CNNs
- Emergency Response - Emergency response analysis.
- Emergency Transportation - Transportation prompt on emergency services
- Emergency Dispatch - Reducing response times with predictive modeling, optimization, and automation
- Emergency Calls - Emergency calls analysis project.
- Calls Data Analysis - 911 data analysis.
- Emergency Response - Chemical factory RL.
- Crime Classification - Times analysis of serious assaults misclassified by LAPD.
- Article Tagging - Natural Language Processing of Chicago news articles
- Crime Analysis - Association Rule Mining from Spatial Data for Crime Analysis
- Chicago Crimes - Exploring public Chicago crimes data set in Python
- Graph Analytics - The Hague Crimes.
- Crime Prediction - Crime classification, analysis & prediction in Indore city.
- Crime Prediction - Developed predictive models for crime rate.
- Crime Review - Crime review data analysis.
- Crime Trends - The Crime Trends Analysis Tool analyses crime trends and surfaces problematic crime conditions
- Crime Analytics - Analysis of crime data in Seattle and San Francisco.
- Ambulance Analysis - An investigation of Local Government Area ambulance time variation in Victoria.
- Site Location - Ambulance site locations.
- Dispatching - Applying game theory and discrete event simulation to find optimal solution for ambulance dispatching
- Ambulance Allocation - Time series analysis of ambulance dispatches in the City of San Diego.
- Response Time - An analysis on the improvements of ambulance response time.
- Optimal Routing - Project to find optimal routing of ambulances in Ithaca.
- Crash Analysis - Predicting the probability of accidents on a given segment at a given time.
- Conflict Prediction - Notebooks on conflict prediction.
- Burglary Prediction - Spatio-Temporal Modelling for burglary prediction.
- Predicting Disease Outbreak - Machine Learning implementation based on multiple classifier algorithm implementations.
- Road accident prediction - Prediction on type of victims on federal road accidents in Brazil.
- Text Mining - Disaster Management using Text mining.
- Twitter and disasters - Try to correctly predict whether tweets are about disasters.
- Flood Risk - Impact of catastrophic flood events.
- Fire Prediction - We used 4 different algorithms to predict the likelihood of future fires.
Finance
- For more see financial-machine-learning
- For asset management see financial-machine-learning
- Deep Portfolio - Deep learning for finance; predict the volume of bonds.
- AI Trading - Modern AI trading techniques.
- Corporate Bonds - Predicting the buying and selling volume of the corporate bonds.
- Simulation - Investigating simulations as part of computational finance.
- Industry Clustering - Project to cluster industries according to financial attributes.
- Financial Modeling - HFT trading and implied volatility modeling.
- Trend Following - A futures trend following portfolio investment strategy.
- Financial Statement Sentiment - Extracting sentiment from financial statements using neural networks.
- Applied Corporate Finance - Studies the empirical behaviors in stock market.
- Market Crash Prediction - Predicting market crashes using an LPPL model.
- NLP Finance Papers - Curating quantitative finance papers using machine learning.
- ARIMA-LSTM Hybrid - Hybrid model to predict future price correlation coefficients of two assets
- Basic Investments - Basic investment tools in Python.
- Basic Derivatives - Basic forward contracts and hedging.
- Basic Finance - Source code notebooks for basic finance applications.
- Advanced Pricing ML - Additional implementation of Advances in Financial Machine Learning (Book)
- Options and Regression - Financial engineering project for option pricing techniques.
- Quant Notebooks - Educational notebooks on quant finance, algorithmic trading and investment strategy.
- Forecasting Challenge - Financial forecasting challenge by G-Research (Hedge Fund)
- XGBoost - A trading algorithm using XGBoost
- Research Paper Trading - A strategy implementation based on a paper using Alpaca Markets.
- Various - Options, Allocation, Simulation
- ML & RL NYU - Machine Learning and Reinforcement Learning in Finance.
- Datastream - Datastream from Thomson Reuters accessible through Python.
- AlphaVantage - API wrapper to simplify the process of acquiring free financial data.
- FSA - A project to transfer SEC Edgar Filings’ financial data to custom financial statement analysis models.
- TradeConnector - A layer to connect with market data providers.
- Employee Count SEC Filings - Extraction to get the exact employee count values for companies from SEC filings.
- SEC Parsing - NLP to find and extract specific information from long, unstructured documents
- Open Edgar - OpenEDGAR (openedgar.io)
- Rating Industries - Histories from multiple agencies converted to CSV format
Personal Papers
- Financial Machine Learning Regulation
- Predicting Restaurant Facility Closures
- Predicting Corporate Bankruptcies
- Predicting Earnings Surprises
- Machine Learning in Asset Management
Healthcare
- zEpid - Epidemiology analysis package.
- Python For Epidemiologists - Tutorial to introduce epidemiology analysis in Python.
- Prescription Compliance - An analysis of prescription and medical compliance
- Respiratory Disease - Tracking respiratory diseases in Olympic athletes
- Bubonic Plague - Bubonic plague and SIR model.
Justice, Law & Regulations
Tools
- LexPredict - Software package and library.
- AI Para-legal - Lobe is the world’s first AI paralegal.
- Legal Entity Detection - NER For Legal Documents.
- Legal Case Summarisation - Implementation of different summarisation algorithms applied to legal case judgements.
- Legal Documents Google Scholar - Using Google Scholar to extract cases programmatically.
- Chat Bot - Chat-bot and email notifications.
- Congress API - ProPublica congress API access.
- Data Generator GDPR - Dummy data generator for GDPR compliance
Policy and Regulatory
- GDPR scores - Predicting GDPR Scores for Legal Documents.
- Driving Factors FINRA - Identify the driving factors that influence the FINRA arbitration decisions.
- Securities Bias Correction - Bias-Corrected Estimation of Price Impact in Securities Litigation.
- Public Firm to Legal Decision - Embed public firms based on their reaction to legal decisions.
- Night Life Regulation - Australian nightlife and its regulation and policing
- Comments - Public comments on government regulations.
- Clustering - Clustering Canadian regulations.
- Environment - Regulation of Energy and the Environment
- Risk - Systematic risk of various financial regulations.
- FINRA Compliance - Topic modelling on compliance.
Judicial Applied
- Supreme Court Prediction - Predicting the ideological direction of Supreme Court decisions: ensemble vs. unified case-based model.
- Supreme Court Topic Modeling - Multiple steps necessary to implement topic modeling on supreme court decisions.
- Judge Opinion - Using text mining and machine learning to analyze judges’ opinions for a particular concern.
- ML Law Matching - A machine learning law match maker.
- Bert Multi-label Classification - Fine Grained Sentiment Analysis from AI.
- Some Computational AI Course - Video series Law MIT.
- Financial Machine Learning Regulation (Paper)
Manufacturing
- Green Manufacturing - Mercedes-Benz Greener Manufacturing competition on Kaggle.
- Semiconductor Manufacturing - Semiconductor manufacturing process line data analysis.
- Smart Manufacturing - Shared work on a modelling methodology.
- Bosch Manufacturing - Bosch manufacturing project, Kaggle.
- Predictive Maintenance 1 - Predict remaining useful life of aircraft engines
- Predictive Maintenance 2 - Time-To-Failure (TTF) or Remaining Useful Life (RUL)
- Manufacturing Maintenance - Simulation of maintenance in manufacturing systems.
- Predictive Analytics - Method for Predicting failures in Equipment using Sensor data.
- Detecting Defects - Anomaly detection for defective semiconductors
- Defect Detection - Smart defect detection for pill manufacturing.
- Manufacturing Failures - Reducing manufacturing failures.
- Manufacturing Anomalies - Intelligent anomaly detection for manufacturing line.
- Quality Control - Bosch failure of quality control.
- Manufacturing Quality - Intelligent Manufacturing Quality Forecast
- Auto Manufacturing - Regression Case Study Project on Manufacturing Auction Sale Data.
Media & Publishing
- Video Popularity - HIP model for predicting the popularity of videos.
- YouTube transcriber - Automatically transcribe YouTube videos.
- Marketing Analytics - Marketing analytics case studies.
- Algorithmic Marketing - Models from Introduction to Algorithmic Marketing book
- Marketing Scripts - Marketing data science applications.
- Social Mining - Mining the social web.
Miscellaneous
- Painting Forensics - Analysing paintings to find out their year of creation.
- Flickr - Metadata mining tool for tourism research.
- Fashion - A clothing retrieval and visual recommendation model for fashion images
Physics
- Gamma-hadron Reconstruction - Tools used in Gamma-ray ground based astronomy.
- Curriculum - Newtonian notebooks.
- Interaction Networks - Interaction Networks for Learning about Objects, Relations and Physics.
- Particle Physics - Training, generation, and analysis code for learning Particle Physics
- Computational Physics - A computational physics repository.
- Medical Physics - Useful python for medical physics.
- Medical Physics 2 - A common, core Python package for Medical Physics
- Flow Physics - Flow Physics and Aeroacoustics Toolbox with Python
- Physics ML and Stats - Machine learning and statistics for physicists
- High Energy - Machine Learning for High Energy Physics.
- High Energy GAN - Generative Adversarial Networks for High Energy Physics.
- Neural Networks - Physics meets neural networks
Government and Public Works
Social Policies
- Triage - General Purpose Risk Modeling and Prediction Toolkit for Policy and Social Good Problems.
- World Bank Poverty I - A comparative assessment of machine learning classification algorithms applied to poverty prediction.
- World Bank Poverty II - Repository for the World Bank Pover-t Test Competition Solution.
- Overseas Company Land Ownership - Identifying foreign ownership in the UK.
- CFPB - Consumer Financial Protection Bureau complaints analysis.
- Cannabis Legalisation Effect - Effects of cannabis legalization on crime.
- Public Credit Card - Identification of potential fraud for council credit cards.
- Recidivism Prediction - Transparency and auditability for recidivism risk assessment
- Household Poverty - Predict poverty in households in Costa Rica.
- NLP Public Policy - An example of an NLP use-case in public policy.
- World Food Production - Comparing Top food and feed Producers around the globe.
- Tax Inequality - Data project around taxation and inequality in Basel Stadt.
- Sheriff Compliance - Compliance to ICE requests.
- Apps Detection - Suspicious app detection for kids.
- Social Assistance - Trending information on social assistance
- Computational Social Science - Social data science summer school course.
- Liquor and Crime - Effect of liquor licenses issued on the crime rate.
- Animal Placement Kennels - Optimising animal placement in shelters.
- Staffing Wall - Independent exploration project on U.S. Mexican Border wall
- Worker Fatalities - Worker Fatalities and Catastrophes Map from OSHA data
- Census Data API - Pull variables from the 5-year American Community Survey.
- Philanthropic Giving - Work done by numerous DataKind volunteers on harnessing Form 990 data
- Charity Recommender - NYC Charity Collaborative Recommender System on an Implicit DataSet.
- Donor Identification - A machine learning project in which we need to find donors for charity.
- US Charities - Charity exploration and machine learning.
- Charity Effectiveness - Scraping online data about charities to understand effectiveness
Election Analysis
- Election Analysis - Election Analysis and Prediction Models
- American Election Causal - Using ANES data with causal inference models.
- Campaign Finance and Election Results - Investigating the relation between campaign finance and subsequent election results.
- Voting System - Proportional representation voting methods.
- President Vote - Vote by income level analysis.
- Congressional politics - House and Senate congressional partisanship.
- Politico - A platform for profiling public figures in Brazilian politics.
- Bots - Tools and algorithms to analyze Paraguayan Tweets in times of election
- Gerrymander tests - Lots of metrics for quantifying gerrymandering.
- Sentiment - Analyse newspapers with respect to their political conviction using entity sentiments of party representatives.
- DL Politics - Prediction of Spanish Political Affinity with Deep Neural Nets: Socialist vs People’s Party
- PAC Money - Effects of PAC money on US politics.
- Power Networks - Constructing a watchdog for Indian corporate and political networks
- Elite - Political elite in the US.
- Debate Analysis - Program to analyze political debates.
- Political Affiliation - Political affiliation prediction using Twitter metadata.
- Political Ads - Investigation into Facebook Political Ads and Targeting
- Political Identity - Multi-axial political model.
- YT Politics - Mapping Politics on YouTube
- Political Ideology - Unsupervised learning of political ideology by word vector projections
Real Estate, Rental & Leasing
- Finding Donuts - Finding real estate opportunities by predicting transforming neighbourhoods.
- Neighbourhood - Predicting real estate prices from the urban environment.
- Real Estate Classification - Classifying the type of property given Real Estate, satellite and Street view Images
- Recommender - This tool aims to recommend to a user the top 5 real estate properties that match their search.
- House Price - Predicting house prices using Linear Regression and GBR
- House Price Portland - Predict housing prices in Portland.
- Zillow Prediction - Zillow valuation prediction as performed on Kaggle.
- Real Estate - Predicting real estate prices from the urban environment.
- Analysing Rentals - Analyzing and visualizing rental listings data.
- Interest Prediction - Predict people’s interest in renting specific NYC apartments.
- Housing Uni vs Non-Uni - The effect on university lodging after the GFC.
- Predict Household Poverty - Predict the poverty of households in Costa Rica using automated feature engineering.
- Airbnb public analytics competition: - Now strategic management.
Utilities
- Electricity Price - Electricity price comparison Singapore.
- Electricity-Coal Correlation - Determining the correlation between state electricity rates and coal generation over the past decade.
- Electricity Capacity - A Los Angeles Times analysis of California’s costly power glut.
- Electricity Systems - Optimal Wind+Hydrogen+Other+Battery+Solar (WHOBS) electricity systems for European countries.
- Load Disaggregation - Smart meter load disaggregation with Hidden Markov Models
- Price Forecasting - Forecasting Day-Ahead electricity prices in the German bidding zone with deep neural networks.
- Carbon Index - Calculation of electricity CO₂ intensity at national, state, and NERC regions from 2001-present.
- Demand Forecasting - Electricity demand forecasting for Austin.
- Electricity Consumption - Estimating Electricity Consumption from Household Surveys
- Household power consumption - Individual household power consumption LSTM.
- Electricity French Distribution - An analysis of electricity data provided by the French Distribution Network (RTE)
- Renewable Power Plants - Time series of cumulated installed capacity.
- Wind Farm Flow - A repository of wind plant flow models connected to FUSED-Wind.
- Power Plant - The dataset contains 9568 data points collected from a Combined Cycle Power Plant over 6 years (2006-2011).
- Coal Phase Out - Generation adequacy issues with Germany’s coal phaseout.
- Coal Prediction - Predicting coal production.
- Oil & Gas - Oil & Natural Gas price prediction using ARIMA & Neural Networks
- Gas Formula - Calculating potential economic effect of price indexation formula.
- Demand Prediction - Natural gas demand prediction.
- Consumption Forecasting - Natural gas consumption forecasting.
- Gas Trade - World Model for Natural Gas Trade.
- Safe Water - Predict health-based drinking water violations in the United States.
- Hydrology Data - A suite of convenience functions for exploring water data in Python.
- Water Observatory - Monitoring water levels of lakes and reservoirs using satellite imagery.
- Water Pipelines - Using machine learning to find water pipelines in aerial images.
- Water Modelling - Australian Water Resource Assessment (AWRA) Community Modelling System.
- Drought Restrictions - A Los Angeles Times analysis of water usage after the state eased drought restrictions
- Flood Prediction - Applying LSTM on river water level data
- Sewage Overflow - Insights into the sanitary sewage overflow (SSO). - This has been removed
- Water Accounting - Assembles water budget data for the US from existing data source
- Air Quality Prediction - Predict air quality (AQ) in Beijing and London in the next 48 hours.
- Transdim - Creating accurate and efficient solutions for the spatio-temporal traffic data imputation and prediction tasks.
- Transport Recommendation - Context-Aware Multi-Modal Transportation Recommendation
- Transport Data - Data and notebooks for Toronto transport.
- Transport Demand - Predicting demand for public transportation in Nairobi.
- Demand Estimation - Implementation of dynamic origin-destination demand estimation.
- Congestion Analysis - Transportation systems analysis
- TS Analysis - Time series analysis on transportation data.
- Network Graph Subway - Vulnerability analysis for transportation networks. - Has been taken down
- Transportation Inefficiencies - Quantifying the inefficiencies of Transportation Networks
- Train Optimisation - Train schedule optimisation
- Traffic Prediction - Multi-attention recurrent neural networks for time series (city traffic).
- Predict Crashes - Crash prediction modelling application that leverages multiple data sources
- AI Supply chain - Supply chain optimisation system.
- Transfer Learning Flight Delay - Using variational autoencoders in Keras to predict flight delay.
- Replenishment - Retail replenishment code for supply chain management.
Wholesale & Retail
- Customer Analysis - Wholesale customer analysis.
- Distribution - JB wholesale distribution analysis.
- Clustering - Unsupervised learning techniques are applied on product spending data collected for customers
- Market Basket Analysis - Instacart public dataset to report which products are often shopped together.
- Retail Analysis - Studying Online Retail Dataset and getting insights from it.
- Online Insights - Analyzing the Online Transactions in UK
- Retail Use-case - Notebooks & Data for CyberShop Retail Use Case
- Dwell Time - Customer dwell time and other analysis.
- Retail Cohort - Cohort analysis.
Business Machine Learning:
A curated list of applied business machine learning (BML) and business data science (BDS) examples and libraries. The code in this repository is in Python (primarily using Jupyter notebooks) unless otherwise stated.
Table of Contents
Also see Python Business Analytics
Accounting
Machine Learning
- Chart of Account Prediction - Using labeled data to suggest the account name for every transaction.
- Accounting Anomalies - Using deep-learning frameworks to identify accounting anomalies.
- Financial Statement Anomalies - Detecting anomalies before filing, using R.
- Useful Life Prediction (FirmAI) - Predict the useful life of assets using sensor observations and feature engineering.
- AI Applied to XBRL - Standardized representation of XBRL into AI and Machine learning.
Analytics
- Forensic Accounting - Collection of case studies on forensic accounting using data analysis. On the lookout for more data to practise forensic accounting; please get in touch.
- General Ledger (FirmAI) - Data processing over a general ledger as exported through an accounting system.
- Bullet Graph (FirmAI) - Bullet graph visualisation helpful for tracking sales, commission and other performance.
- Aged Debtors (FirmAI) - Example analysis to investigate aged debtors.
- Automated FS XBRL - Written in XML; the analysis could possibly be ported to Python.
Textual Analysis
- Financial Sentiment Analysis - Sentiment, distance and proportion analysis for trading signals.
- Extensive NLP - Comprehensive NLP techniques for accounting research.
Data, Parsing and APIs
- EDGAR - A walk-through in how to obtain EDGAR data.
- IRS - Accessing and parsing IRS filings.
- Financial Corporate - Rutgers corporate financial datasets.
- Non-financial Corporate - Rutgers non-financial corporate dataset.
- PDF Parsing - Extracting useful data from PDF documents.
- PDF Table to Excel - How to output an Excel file from a PDF.
Research And Articles
- Understanding Accounting Analytics - An article that tackles the importance of accounting analytics.
- VLFeat - VLFeat is an open and portable library of computer vision algorithms, which has a MATLAB toolbox.
Websites
- Rutgers Raw - Good digital accounting research from Rutgers.
Courses
- Computer Augmented Accounting - A video series from Rutgers University looking at the use of computation to improve accounting.
- Accounting in a Digital Era - Another series by Rutgers investigating the effects the digital age will have on accounting.
Customer
Lifetime Value
- Pareto/NBD Model - Calculate the CLV using a Pareto/NBD model.
- Gamma-Gamma Model - Estimate the CLV using a Gamma-Gamma model.
- Cohort Analysis - Cohort analysis to group customers into mutually exclusive cohorts measured over time.
Segmentation
- E-commerce - E-commerce customer segmentation.
- Groceries - Segmentation for grocery customers.
- Online Retailer - Online retailer segmentation.
- Bank - Bank customer segmentation.
- Wholesale - Clustering of wholesale customers.
- Various - Multiple types of segmentation and clustering techniques.
Behaviour
- RNN - Investigating customer behaviour over time with sequential analysis using an RNN model.
- Neural Net - Demand forecasting using artificial neural networks.
- Temporal Analytics - Investigating customer temporal regularities.
- POS Analytics - Analytics driven customer behaviour ranking for retail promotions using POS data.
- Wholesale Customer - Wholesale customer exploratory data analysis.
- RFM - Doing an RFM (recency, frequency, monetary) analysis.
- Returns Behaviour - Predicting total returns and fraudulent returns.
- Visits - Predicting which day of week a customer will visit.
- Bank: Next Purchase - A project to predict bank customers’ most probable next purchase.
- Bank: Customer Prediction - Predicting target customers who will subscribe to the bank’s new policy.
- Next Purchase - Predict a customer’s next purchase, also using feature engineering.
- Customer Purchase Repeats - Using the lifetimes Python library and real jewellery retailer data to analyse customer repeat purchases.
- AB Testing - Find the best KPI and do A/B testing.
- Customer Survey (FirmAI) - Example of parsing and analysing a customer survey.
- Happiness - Analysing customer happiness from hotel stays using reviews.
- Miscellaneous Customer Analytics - Various tools and techniques for customer analysis.
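As a rough illustration of the RFM (recency, frequency, monetary) idea listed above, the following minimal sketch computes the three measures per customer using only the standard library. The transactions, the `rfm_table` helper and the reference date are hypothetical and are not taken from any of the linked repositories.

```python
from datetime import date

# Hypothetical transactions: (customer_id, purchase_date, amount)
transactions = [
    ("A", date(2024, 1, 5), 120.0),
    ("A", date(2024, 3, 1), 80.0),
    ("B", date(2023, 11, 20), 40.0),
    ("C", date(2024, 2, 14), 300.0),
    ("C", date(2024, 2, 28), 150.0),
    ("C", date(2024, 3, 10), 50.0),
]


def rfm_table(rows, as_of):
    """Return {customer: (recency_days, frequency, monetary)}."""
    table = {}
    for customer, day, amount in rows:
        recency, frequency, monetary = table.get(customer, (None, 0, 0.0))
        days_ago = (as_of - day).days
        # Recency is the number of days since the most recent purchase.
        recency = days_ago if recency is None else min(recency, days_ago)
        table[customer] = (recency, frequency + 1, monetary + amount)
    return table


rfm = rfm_table(transactions, as_of=date(2024, 3, 31))
for customer, (recency, frequency, monetary) in sorted(rfm.items()):
    print(customer, recency, frequency, monetary)
```

In practice the three values are usually binned into scores (for example quintiles) before customers are segmented; that step is left out here to keep the sketch short.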
Recommender
- Recommendation - Recommend the songs that a customer on a music app would prefer listening to.
- General Recommender - Identifying which products to recommend to which customers.
- Collaborative Filtering - Customer recommendation using collaborative filtering.
- Up-selling (FirmAI) - Analysis to identify up-selling opportunities.
Churn Prediction
- Ride Sharing - Identify customer churn rates in order to target customers for retention campaigns.
- KKBox I - Variational deep autoencoder to predict customer churn.
- KKBox II - A three-step customer churn prediction framework using feature engineering.
- Personal Finance - Predict customer subscription churn for a personal finance business.
- ANN - Churn analysis using artificial neural networks.
- Bike - Customer bike churn analysis.
- Cost Sensitive - Cost-sensitive churn analysis driven by economic performance.
Sentiment
- Topic Modelling - Topic modelling on a corpus of customer surveys from the VR industry.
- Customer Satisfaction - Predict customer satisfaction using Kaggle data.
Employee
Management
- Personality Prediction - Predict Big 5 Personality from text.
- Salary Prediction Resume - Textual analyses over resume to predict appropriate salary [Project Disappeared, still a cool idea]
- Employee Review Analysis - Review analytics for top 50 retail companies on Indeed.
- Diversity Analysis - A simple analysis of gender and race disparity in the tech industry.
- Occupation Prediction - Predict the likelihood that an occupation is analytical.
Performance
- Training Hours Performance - The impact of training hours on employee performance.
- Promotion Prediction - Analysing promotion patterns.
- Employee Attendance prediction - Various tools to predict employee attendance.
Turnover
- Early Leaving Employees - Identifying why the best and most experienced employees are leaving prematurely.
- Employee Turnover - Identifying factors associated with employee turnover.
Conversations
- Slack Communication Analysis - Producing meaningful visualisations from slack conversations.
- Employee Relationships from Conversations - Identifying employee relationships from emails for improved HR analytics.
- Categorise Employee Requests - Classifying employee requests via a TF-IDF vectorizer and RandomForestClassifier.
Physical
- Employee Face Recognition - A face recognition implementation.
- Attendance Management System - An attendance management system using face recognition.
Legal
Tools
- LexPredict - Software package and library.
- AI Para-legal - Lobe is the world’s first AI paralegal.
- Legal Entity Detection - NER For Legal Documents.
- Legal Case Summarisation - Implementation of different summarisation algorithms applied to legal case judgements.
- Legal Documents Google Scholar - Using Google Scholar to extract cases programmatically.
- Chat Bot - Chat-bot and email notifications.
Policy and Regulatory
- GDPR scores - Predicting GDPR Scores for Legal Documents.
- Driving Factors FINRA - Identify the driving factors that influence the FINRA arbitration decisions.
- Securities Bias Correction - Bias-Corrected Estimation of Price Impact in Securities Litigation.
- Public Firm to Legal Decision - Embed public firms based on their reaction to legal decisions.
Judicial Applied
- Supreme Court Prediction - Predicting the ideological direction of Supreme Court decisions: ensemble vs. unified case-based model.
- Supreme Court Topic Modeling - Multiple steps necessary to implement topic modeling on supreme court decisions.
- Judge Opinion - Using text mining and machine learning to analyze judges’ opinions for a particular concern.
- ML Law Matching - A machine learning law match maker.
- BERT Multi-label Classification - Fine Grained Sentiment Analysis from AI.
- Some Computational AI Course - Video series Law MIT.
Management
Strategy
- Topic Model Reviews - Amazon reviews for product development.
- Patents - Forecasting strategy using patents.
- Networks - Business categories from Yelp reviews using networks can help to identify pockets of demand.
- Company Clustering - Hierarchical clusters and topics from companies by extracting information from their descriptions on their websites
- Marketing Management - Programmatic marketing management.
Decision Optimisation
- Constraint Learning - Machine learning that takes into account constraints.
- Fairlearn - I think it is called cost-sensitive machine learning.
- Multi-label Classification - Cost-Sensitive Multi-Label Classification
- Multi-class Classification - Cost-sensitive multi-class classification (Weighted-All-Pairs, Filter-Tree & others)
- CostCla - Costcla is a Python module for cost-sensitive machine learning (classification) built on top of Scikit-Learn
- DEA Software - pyDEA is a software package developed in Python for conducting data envelopment analysis (DEA).
- Covering Set (FirmAI) - Constraint programming analysis.
- Insurance (FirmAI) - CP Insurance analysis.
- Machine Learning + CP (FirmAI) - Machine Learning + Optimisation.
- Post Office (FirmAI) - Post Office optimisation.
- Soda - CP (FirmAI) - Constraint Programming + ML.
- Soda - Knapsack (FirmAI) - Knapsack algorithm + ML.
- Soda - MLP (FirmAI) - MLP analysis + ML.
Causal Inference
- Marketing AB Testing - A/B Testing Experiment.
- Legal Studies - Instrumental and discontinuity causal approach.
- A-B Test Result (FirmAI) - Initial A-B Results.
- Causal Regression (FirmAI) - Regression technique for causal estimate.
- Frequentist vs Bayesian A-B Test (FirmAI) - Comparison between frequentist and Bayesian A-B testing.
- A-B Test Power Analysis (FirmAI) - Sample size estimation to match testing power.
- Variance Reduction A-B test (FirmAI) - Techniques to reduce variance in A-B tests.
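To make the power-analysis entry above more concrete, here is a minimal sketch of a per-group sample size calculation for a two-sided test comparing two conversion rates, using the common normal-approximation formula with unpooled variances. The function name, the default significance level and power, and the example rates are assumptions chosen for illustration; the FirmAI notebooks may use a different method.

```python
import math
from statistics import NormalDist


def sample_size_per_group(p_control, p_variant, alpha=0.05, power=0.8):
    """Approximate observations needed per group (normal approximation)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_beta = NormalDist().inv_cdf(power)           # quantile for the target power
    variance = p_control * (1 - p_control) + p_variant * (1 - p_variant)
    effect = p_variant - p_control
    return math.ceil((z_alpha + z_beta) ** 2 * variance / effect ** 2)


# Hypothetical example: detect a lift from a 10% to a 12% conversion rate.
n = sample_size_per_group(0.10, 0.12)
print(n)
```

Halving the detectable lift roughly quadruples the required sample, which is why the minimum effect of interest is usually fixed before the test starts.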
Statistics
- Various - Various applied statistical solutions.
Quantitative
- Applied RL - Reinforcement Learning and Decision Making tutorials explained at an intuitive level and with Jupyter Notebooks
- Process Mining - Leveraging A-priori Knowledge in Predictive Business Process Monitoring
- TS Forecasting - Time series forecasting for important business applications.
Data
- Web Scraping (FirmAI) - Web scraping solutions for Facebook, Glassdoor, Instagram, Morningstar, Similarweb, Yelp, Spyfu, Linkedin, Angellist.
Operations
Failure and Anomalies
- Anomalies - Anomaly detection resources.
- Intrusion Detection - Detecting network intrusions.
- APS Failure, Data - Investigating APS failures in Scania trucks.
- Hardware Failure - Using different machine learning techniques in detecting anomalies.
- Anomaly KPIs, Paper - Anomaly detection algorithm for seasonal KPIs.
Load and Capacity Management
- House Load Energy - Linear, SVR and Random Forest models to predict house’s appliances energy Load.
- Uber Load Management - Uber predictive load management.
- Capacity Management - Investigating whether IT stability issues are caused by capacity constraints.
- Bike Sharing - XGBRegressor, RandomForestRegressor, GradientBoostingRegressor combined with feature selection.
- Airline Fleet Segmentation - Analysis of Delta airlines.
- Airbnb - Airbnb Booking Analysis.
Prediction Management
- Dispute Prediction - Financial service complaint management.
- Flight Delay Prediction - Transfer learning for flight-delay prediction via variational autoencoders in Keras.
- Electric Fault Prediction - Predict tripping at grid stations by applying simple machine learning algorithms.
- Popularity Prediction in R - Marked Hawkes Point Process.
Financial Machine Learning:
A curated list of practical financial machine learning (FinML) tools and applications. This collection is primarily in Python.
Trading
Deep Learning
- Deep Learning - Technical experimentations to beat the stock market using deep learning.
- Deep Learning II - Tensorflow Regression.
- Deep Learning III - Algorithmic trading with deep learning experiments.
- Deep Learning IV - Bulbea: Deep Learning based Python Library.
- LSTM GRU - Stock Market Forecasting using LSTM/GRU.
- LSTM Recurrent - OHLC Average Prediction of Apple Inc. Using LSTM Recurrent Neural Network.
- ARIMA-LSTM Hybrid - Hybrid model to predict future price correlation coefficients of two assets.
- Neural Network - Neural networks to predict stock prices.
- AI Trading - AI to predict stock market movements.
Reinforcement Learning
- RL Trading - A collection of 25+ Reinforcement Learning Trading Strategies - Google Colab.
- RL - OpenGym with Deep Q-learning and Policy Gradient.
- RL II - Reinforcement learning on the stock market, where an agent tries to learn trading.
- RL III - Deep Reinforcement Learning based Trading Agent for Bitcoin.
- RL IV - Reinforcement Learning for finance.
- RL V - Building an Agent to Trade with Reinforcement Learning.
- Pair Trading RL - Using deep actor-critic model to learn best strategies in pair trading.
Other Models
- Mixture Models I - Mixture models to predict market bottoms.
- Mixture Models II - Mixture models and stock trading.
- Scikit-learn Stock Prediction - Using python and scikit-learn to make stock predictions.
- Fundamental LT Forecasts - Research in investment finance for long term forecasts.
- Short-Term Movement Cues - Identify social/historical cues for short term stock movement.
- Trend Following - A futures trend following portfolio investment strategy.
Data Processing Techniques and Transformations
- Advanced ML - Exercises for Financial Machine Learning (De Prado).
- Advanced ML II - More implementations of Financial Machine Learning (De Prado).
Portfolio Management
Portfolio Selection and Optimisation
- Distribution Characteristic Optimisation - Extends classical portfolio optimisation to take the skewness and kurtosis of the distribution of market invariants into account.
- Reinforcement Learning - Reinforcement Learning for Portfolio Management.
- Efficient Frontier - Modern Portfolio Theory.
- Policy Gradient Portfolio - A Deep Reinforcement Learning Framework for the Financial Portfolio Management Problem.
- Deep Portfolio Theory - Autoencoder framework for portfolio selection.
- 401K Portfolio Optimisation - Portfolio analyses and optimisation for 401K.
- Online Portfolio Selection - Comparing OLPS algorithms on a diversified set of ETFs.
- OLMAR Algorithm - Relative importance of each component of the OLMAR algorithm.
- Modern Portfolio Theory - Universal portfolios; modern portfolio theory.
Factor and Risk Analysis:
- Various Risk Measures - Risk measures and factors for alternative and responsible investments.
- Pyfolio - Portfolio and risk analytics in Python.
- Risk Basic - Active portfolio risk management.
- CAPM - Expected returns using CAPM.
- Factor Analysis - Factor analysis for mutual funds.
- VaR GAN - Estimate Value-at-Risk for market risk management using Keras and TensorFlow.
- VaR - Value-at-risk calculations.
- Python for Finance - Various financial notebooks.
- Performance Analysis - Performance analysis of predictive (alpha) stock factors.
- Quant Finance - General quant repository.
- Risk and Return - Riskiness of portfolios and assets.
- Convex Optimisation - Convex Optimization for Finance.
- Factor Analysis - Factor strategy notebooks.
- Statistical Finance - Various financial experiments.
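Several entries above rely on CAPM-style factor estimates. As a minimal sketch, beta can be estimated as the sample covariance of asset and market returns divided by the market variance, and the expected return then follows from the CAPM relation (risk-free rate plus beta times the market risk premium). The return series and the rates below are made-up numbers, and `estimate_beta` and `capm_expected_return` are illustrative helper names rather than functions from any linked repository.

```python
def estimate_beta(asset_returns, market_returns):
    """Sample covariance of asset and market returns over market variance."""
    n = len(market_returns)
    mean_a = sum(asset_returns) / n
    mean_m = sum(market_returns) / n
    cov = sum((a - mean_a) * (m - mean_m)
              for a, m in zip(asset_returns, market_returns)) / (n - 1)
    var = sum((m - mean_m) ** 2 for m in market_returns) / (n - 1)
    return cov / var


def capm_expected_return(risk_free, beta, market_return):
    """CAPM: E[R] = r_f + beta * (E[R_m] - r_f)."""
    return risk_free + beta * (market_return - risk_free)


# Hypothetical periodic returns; the asset moves twice as much as the market.
market = [0.01, -0.02, 0.03, 0.00]
asset = [0.02, -0.04, 0.06, 0.00]

beta = estimate_beta(asset, market)
expected = capm_expected_return(risk_free=0.02, beta=beta, market_return=0.08)
print(round(beta, 4), round(expected, 4))
```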
Techniques
Unsupervised:
- PCA Pairs Trading - PCA, Factor Returns, and trading strategies.
- Fund Clusters - Data exploration of fund clusters.
- VRA Stock Embedding - Variational Recurrent Autoencoder for embedding stocks to vectors based on the price history.
- Industry Clustering - Clustering of industries.
- Pairs Trading - Finding pairs with cluster analysis.
- Industry Clustering - Project to cluster industries according to financial attributes.
Textual:
- NLP - This project assembles a lot of NLP operations needed for the finance domain.
- Earning call transcripts - Correlation between mutual fund investment decision and earning call transcripts.
- Buzzwords - Return performance and mutual fund selection.
- Fund classification - Fund classification using text mining and NLP.
- NLP Event - Applying Deep Learning and NLP in Quantitative Trading.
- Financial Sentiment Analysis - Sentiment, distance and proportion analysis for trading signals.
- Financial Statement Sentiment - Extracting sentiment from financial statements using neural networks.
- Extensive NLP - Comprehensive NLP techniques for accounting research.
- Accounting Anomalies - Using deep-learning frameworks to identify accounting anomalies.
Other Assets
Derivatives and Hedging:
- Options - Introduction to options.
- Derivative Markets - The economics of futures, options, and swaps.
- Black Scholes - Options pricing.
- Computational Derivatives - Projects focusing on investigating simulations and computational techniques applied in finance.
- Reinforcement Learning - Hedging portfolios with reinforcement learning.
- Delta Hedging - Advanced derivatives.
- Options Risk Measures - Efficient financial risk estimation via computer experiment design (regression + variance-reduced sampling).
- Derivatives Python - Derivative analytics with Python.
- Volatility and Variance Derivatives - Volatility derivatives analytics.
- Options - Black Scholes and Copula.
- Option Strategies - Valuation of Vanilla and Exotic option strategies (Butterfly, Risk Reversal etc.) with widget animations.
- Derman - Binomial tree for American call.
- Hull White - Callable Bond, Hull White.
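For readers new to the option-pricing entries above, the following is a minimal sketch of the Black-Scholes formula for European call and put prices on a non-dividend-paying asset, written with the standard library only. The parameter values are hypothetical, and the linked repositories may structure their implementations differently.

```python
import math


def norm_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def black_scholes(spot, strike, maturity, rate, vol):
    """Return (call, put) prices for a European option without dividends."""
    d1 = (math.log(spot / strike) + (rate + 0.5 * vol ** 2) * maturity) / (
        vol * math.sqrt(maturity)
    )
    d2 = d1 - vol * math.sqrt(maturity)
    discount = math.exp(-rate * maturity)
    call = spot * norm_cdf(d1) - strike * discount * norm_cdf(d2)
    put = strike * discount * norm_cdf(-d2) - spot * norm_cdf(-d1)
    return call, put


# Hypothetical inputs: at-the-money option, one year, 5% rate, 20% volatility.
call, put = black_scholes(spot=100.0, strike=100.0, maturity=1.0, rate=0.05, vol=0.2)
print(round(call, 4), round(put, 4))
```

A quick sanity check on any implementation is put-call parity: the call price minus the put price should equal the spot minus the discounted strike.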
Fixed Income
- Vasicek - Bootstrapping and interpolation.
- Binomial Tree - Utility functions in fixed income securities.
- Corporate Bonds - Predicting the buying and selling volume of the corporate bonds.
Alternative Finance
- Kiva Crowdfunding - Exploratory data analysis.
- Venture Capital - Insight into a new founder to make data-driven investment decisions.
- Venture Capital NN - Cox-PH neural network predictions for VC/innovations finance research.
- Private Equity - Valuation models.
- VC OLS - VC regression.
- Watch Valuation - Analysis of luxury watch data to classify whether a certain model is likely to be over- or undervalued.
- Art Valuation - Art evaluation analytics.
- Blockchain - Repository for distributed autonomous investment banking.
Extended Research:
- HFT - High frequency trading.
- Deep Portfolio - Deep learning for finance: predict the volume of bonds.
- Mathematical Finance - Notebooks for math and financial tutorials.
- NLP Finance Papers - Curating quantitative finance papers using machine learning.
- Simulation - Investigating simulations as part of computational finance.
- Market Crash Prediction - Predicting market crashes using an LPPL model.
- Commodity - Commodity influence over Brazilian stocks.
- Finance Graph Theory - Modelling Contentedness of Firms in Financial Markets with Heterogeneous Agents.
- Real Estate Property Fraud - Unsupervised fraud detection model that can identify likely candidates of fraud.
- Behavioural Economics - Behavioural Economics and Finance Python Notebooks.
- Bayesian Finance - Notebook PyMC3 implementation.
- Bayesian Finance I - Stochastic Process Calibration using Bayesian Inference & Probabilistic Programs.
- Currency PCA - Forex spots PCA.
- Backtests - Trading data and algorithms.
- High Frequency - A Python toolkit for high-frequency trade research.
- Financial Economics - Financial Economics Models.
- Critical Transitions - Detecting critical transitions in financial networks with topological data analysis.
- Economic Foundations - Basic economic models.
- Corporate Finance - Basic corporate finance.
- Applied Corporate Finance - Studies the empirical behaviours in the stock market.
- M&A - Mergers and Acquisitions.
- Life-cycle - Company life cycle.
- Computational Finance - Applied Computational Economics and Finance.
- Liquidity and Momentum - Various factors and portfolio constructions.
Courses
- Mathematical Finance - NYU Math-GA 2048: Scientific Computing in Finance.
- Algo Trading - Intro to algo trading.
- Python for Finance - CEU Python for finance course material.
- Hands-on Python for Finance - Hands-on Python for Finance published by Packt.
- Machine Learning for Trading - Notebooks, resources and references accompanying the book Machine Learning for Algorithmic Trading.
- ML Specialisation - Machine Learning in Finance.
- Risk Management - Finance risk engagement course resources.
- Basic Investments - Basic investment tools in Python.
- Basic Derivatives - Basic forward contracts and hedging.
- Basic Finance - Source code notebooks for basic finance applications.
Data
- Employee Count SEC Filings
- SEC Parsing
- Open Edgar
- EDGAR
- IRS
- Rating Industries
- Web Scraping (FirmAI)
- Financial Corporate
- Non-financial Corporate
- http://finance.yahoo.com/
- https://fred.stlouisfed.org/
- https://stooq.com
- https://github.com/timestocome/StockMarketData
Personal Papers
- Financial Machine Learning Regulation
- Predicting Restaurant Facility Closures
- Predicting Corporate Bankruptcies
- Predicting Earnings Surprises
- Machine Learning in Asset Management