Open Source Machine Learning Tools


Machine learning technology is advancing quickly, improving the way we analyze and understand our data. Some of the most powerful machine learning tools are free and easily accessible through the open source community.

Here at makepath, we advocate for the open source community and actively work to integrate machine learning into open source GIS applications through our contributions to open source projects.

We’ve created a list of tools to highlight the latest and greatest in open source machine learning software.

Open Source Machine Learning Tools

TensorFlow

TensorFlow is an end-to-end platform that provides a flexible ecosystem of tools for state-of-the-art machine learning.

Languages: Python, C++, Haskell, Java, Go, Rust, JavaScript

Type: Model development, Model optimization, Deep Learning, Scaling, Reinforcement Learning

Creator(s): Google

Date Started: 2015

Cortex

Cortex is a cloud infrastructure for scalable machine learning.

Languages: Python

Type: Model optimization, Scaling

Creator(s): Cortex Labs

Date Started: 2019

auto-sklearn

auto-sklearn takes over algorithm selection and hyperparameter tuning from the user, leveraging Bayesian optimization, meta-learning, and ensemble construction.

Languages: Python

Type: Model optimization, Classical Machine Learning

Creator(s): Matthias Feurer, Katharina Eggensperger, Stefan Falkner, Marius Lindauer, and Frank Hutter

Date Started: 2015
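
To make "algorithm selection and hyperparameter tuning" concrete, here is a minimal sketch in plain Python. It is not auto-sklearn's API, and it swaps in plain random search where auto-sklearn uses Bayesian optimization, meta-learning, and ensembles. The toy dataset, the two candidate models, and every name below are assumptions made up for illustration.

```python
import random

# Toy 1-D dataset (hypothetical): the label is 1 when x > 5.
TRAIN = [(x, int(x > 5)) for x in range(0, 11, 2)]   # x = 0, 2, ..., 10
VALID = [(x, int(x > 5)) for x in range(1, 10, 2)]   # x = 1, 3, ..., 9


def threshold_model(params):
    """Predict 1 when x exceeds a tunable cutoff."""
    return lambda x: int(x > params["cutoff"])


def knn_model(params):
    """Predict by majority vote among the k nearest training points."""
    def predict(x):
        nearest = sorted(TRAIN, key=lambda p: abs(p[0] - x))[: params["k"]]
        votes = sum(label for _, label in nearest)
        return int(2 * votes > len(nearest))
    return predict


# Search space: each algorithm is paired with a sampler for its hyperparameters.
SEARCH_SPACE = {
    "threshold": (threshold_model, lambda rng: {"cutoff": rng.uniform(0, 10)}),
    "knn": (knn_model, lambda rng: {"k": rng.choice([1, 3, 5])}),
}


def accuracy(model):
    """Fraction of validation points the model labels correctly."""
    return sum(model(x) == y for x, y in VALID) / len(VALID)


def random_search(n_trials=30, seed=0):
    """Jointly pick an algorithm and its hyperparameters by validation accuracy."""
    rng = random.Random(seed)
    best = None
    for _ in range(n_trials):
        name = rng.choice(sorted(SEARCH_SPACE))
        build, sample = SEARCH_SPACE[name]
        params = sample(rng)
        score = accuracy(build(params))
        if best is None or score > best[0]:
            best = (score, name, params)
    return best


best_score, best_name, best_params = random_search()
print(best_name, best_params, best_score)
```

The point of the sketch is that the model family and its settings are searched together, so the user only supplies data and a budget of trials.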

PyTorch

PyTorch is a machine learning framework for applications such as computer vision and natural language processing.

Languages: Python, C++

Type: Model development, Model optimization, Deep Learning, Reinforcement Learning

Creator(s): Facebook AI Research

Date Started: 2016

Apache Mahout

Apache Mahout produces free implementations of distributed and scalable machine learning algorithms.

Languages: Java, Scala

Type: Model optimization, Scaling

Creator(s): Apache Software Foundation

Date Started: 2009

Shogun

Shogun offers methods for efficient and unified machine learning.

Languages: Python, Octave, Java/Scala, Ruby, C#, R, Perl, JavaScript

Type: Model optimization, Classical Machine Learning

Creator(s): Soeren Sonnenburg and Gunnar Rätsch (now sponsored by NumFOCUS)

Date Started: 1999

Compose

Compose is for automated prediction engineering. With Compose you can structure predictions and generate labels for supervised learning.

Languages: Python

Type: Data preparation

Creator(s): Alteryx

Date Started: 2019
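
The idea of generating labels for supervised learning can be sketched in plain Python. This is an illustration of the concept, not Compose's API: the transaction log, the `make_labels` helper, and the `spent_over_100` labeling function are all hypothetical. The sketch slides a fixed-size window over each customer's history and applies the labeling function to each window.

```python
from collections import defaultdict
from datetime import datetime, timedelta

# Hypothetical transaction log: (customer_id, timestamp, amount).
TRANSACTIONS = [
    ("a", datetime(2021, 1, 1), 20.0),
    ("a", datetime(2021, 1, 3), 90.0),
    ("a", datetime(2021, 1, 9), 15.0),
    ("b", datetime(2021, 1, 2), 40.0),
    ("b", datetime(2021, 1, 10), 70.0),
]


def spent_over_100(window):
    """Labeling function: did the customer spend more than 100 in this window?"""
    return sum(amount for _, amount in window) > 100


def make_labels(transactions, window_size, labeling_function):
    """Slide a fixed-size window over each customer's history and label each window."""
    by_customer = defaultdict(list)
    for customer, when, amount in sorted(transactions, key=lambda t: t[1]):
        by_customer[customer].append((when, amount))

    labels = []
    for customer, events in sorted(by_customer.items()):
        start, last = events[0][0], events[-1][0]
        while start <= last:
            end = start + window_size
            window = [e for e in events if start <= e[0] < end]
            # Each label is stamped with the start time of its window.
            labels.append((customer, start, labeling_function(window)))
            start = end
    return labels


labels = make_labels(TRANSACTIONS, timedelta(days=7), spent_over_100)
for row in labels:
    print(row)
```

Changing the prediction problem then only means swapping the labeling function or the window size, which is the appeal of treating label generation as its own step.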

r-spatial

R-Spatial is an ecosystem of code and packages developed using R for working with and adding value to spatial data. Packages include, but are not limited to, sf, stars, mapview, gstat, spdep, raster and terra.

Languages: R

Type: Data preparation, Classical Machine Learning

Creator(s): R-Spatial community

Date Started: 2003

Weka

Weka includes tools for data preparation, classification, regression, clustering, and other machine learning algorithms used for data mining.

Languages: Java

Type: Data preparation, Model optimization, Classical Machine Learning

Creator(s): University of Waikato, NZ

Date Started: 1993

Keras

Keras is a deep learning API that offers consistent and simple interfaces to reduce cognitive load.

Languages: Python

Type: Model development, Model optimization, Deep Learning, API

Creator(s): Google

Date Started: 2015

Caffe

Caffe is a deep learning framework designed with expression, speed, and modularity in mind.

Languages: C++, Python

Type: Model optimization, Deep Learning

Creator(s): Berkeley AI Research

Date Started: 2013

H2O

H2O is a platform for distributed and scalable machine learning. It works well with big data tech like Hadoop and Spark.

Languages: R, Python, Scala, Java, JSON

Type: Model optimization, Scaling, Classical Machine Learning, Deep Learning

Creator(s): H2O.ai

Date Started: 2015

GoLearn

GoLearn is a machine learning library for Go.

Languages: Go

Type: Model development

Creator(s): Stephen James Whitworth

Date Started: 2014

Gradio

Gradio is for creating web-based UIs that enable users to interact with models in real time. It makes model demos easy.

Languages: Python

Type: UI, API

Creator(s): Gradio

Date Started: 2019

Featuretools

Featuretools is a framework for automated feature engineering that excels at transforming temporal and relational datasets into feature matrices.

Languages: Python

Type: Data preparation

Creator(s): Alteryx

Date Started: 2017
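
As a rough illustration of what "transforming temporal and relational datasets into feature matrices" means, the sketch below aggregates a child table into one row per parent record, using only data before a cutoff date. It is plain Python, not the Featuretools API; the tables, the `build_feature_matrix` helper, and the feature names are assumptions chosen for the example.

```python
from datetime import date
from statistics import mean

# Hypothetical relational data: one parent table and one child table.
CUSTOMERS = [{"customer_id": 1}, {"customer_id": 2}]
ORDERS = [
    {"customer_id": 1, "day": date(2021, 3, 1), "amount": 30.0},
    {"customer_id": 1, "day": date(2021, 3, 8), "amount": 50.0},
    {"customer_id": 2, "day": date(2021, 3, 5), "amount": 10.0},
    {"customer_id": 1, "day": date(2021, 4, 2), "amount": 99.0},
]


def build_feature_matrix(customers, orders, cutoff):
    """Return one feature row per customer using only orders before the cutoff."""
    matrix = []
    for customer in customers:
        cid = customer["customer_id"]
        history = [
            o for o in orders if o["customer_id"] == cid and o["day"] < cutoff
        ]
        amounts = [o["amount"] for o in history]
        matrix.append({
            "customer_id": cid,
            "COUNT(orders)": len(history),
            "SUM(orders.amount)": sum(amounts),
            "MEAN(orders.amount)": mean(amounts) if amounts else 0.0,
            "days_since_last_order": (
                (cutoff - max(o["day"] for o in history)).days if history else None
            ),
        })
    return matrix


feature_matrix = build_feature_matrix(CUSTOMERS, ORDERS, cutoff=date(2021, 4, 1))
for row in feature_matrix:
    print(row)
```

Filtering on the cutoff keeps the order placed on April 2 out of the features, which is the kind of temporal bookkeeping that is easy to get wrong by hand.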

Scikit-learn

scikit-learn is a Python module for machine learning built on top of SciPy.

Languages: Python

Type: Model optimization, Classical Machine Learning

Creator(s): David Cournapeau

Date Started: 2007

aesara

aesara lets users define, optimize, and evaluate mathematical expressions with multi-dimensional arrays.

Languages: Python

Type: Math

Creator(s): Aesara Contributors

Date Started: 2020
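
The "define, optimize, and evaluate" workflow can be sketched with a tiny symbolic expression graph. This is a conceptual toy in plain Python, not aesara's API: it works on scalars rather than multi-dimensional arrays, and the node classes and the two rewrite rules are assumptions chosen for brevity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Var:
    name: str


@dataclass(frozen=True)
class Const:
    value: float


@dataclass(frozen=True)
class Add:
    left: object
    right: object


@dataclass(frozen=True)
class Mul:
    left: object
    right: object


def simplify(node):
    """Rewrite the graph with two algebraic identities: x * 1 -> x and x + 0 -> x."""
    if isinstance(node, (Add, Mul)):
        left, right = simplify(node.left), simplify(node.right)
        identity = Const(0.0) if isinstance(node, Add) else Const(1.0)
        if left == identity:
            return right
        if right == identity:
            return left
        return type(node)(left, right)
    return node


def evaluate(node, env):
    """Evaluate the graph given concrete values for its variables."""
    if isinstance(node, Var):
        return env[node.name]
    if isinstance(node, Const):
        return node.value
    left, right = evaluate(node.left, env), evaluate(node.right, env)
    return left + right if isinstance(node, Add) else left * right


# Define: (x * 1) + (y + 0)
x, y = Var("x"), Var("y")
expr = Add(Mul(x, Const(1.0)), Add(y, Const(0.0)))

# Optimize: the rewrite pass reduces the graph to x + y.
optimized = simplify(expr)

# Evaluate: plug in concrete numbers.
result = evaluate(optimized, {"x": 2.0, "y": 3.0})
print(optimized, result)
```

Separating the three steps is what lets a library rewrite the expression before any numbers are involved.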

gobrain

GoBrain provides Neural Networks written in Go.

Languages: Go

Type: Model optimization, Deep Learning

Creator(s): Jonas Trevisan

Date Started: 2014

fastai

fastai is a deep learning library that provides high-level components for getting strong results quickly, and lets researchers mix and match low-level components to build new approaches.

Languages: Python

Type: Model development, Model optimization, Deep Learning

Creator(s): Jeremy Howard and Dr. Rachel Thomas

Date Started: 2020

polyaxon

polyaxon enables building, training, and monitoring of large scale deep learning apps.

Languages: Python

Type: Scaling, Model optimization, Deep Learning

Creator(s): Mourad Mourafiq

Date Started: 2004

Oryx 2

Oryx 2 specializes in real-time large scale machine learning. Not only a framework for building applications, it includes packaged end-to-end apps for collaborative filtering, classification, regression, and clustering.

Languages: Java

Type: Model optimization, Scaling, Classical Machine Learning

Creator(s): Sean Owen

Date Started: 2015

PyTorch Lightning

PyTorch Lightning is a research framework for scaling models.

Languages: Python

Type: Model development, Model optimization, Deep Learning, Reinforcement Learning

Creator(s): William Falcon

Date Started: 2019

MLlib

MLlib is a scalable machine learning library. It fits into Spark’s APIs and interoperates with NumPy in Python.

Languages: Java, Python

Type: Model optimization, Scaling, Classical Machine Learning

Creator(s): Apache Software Foundation

Date Started: 2015

Microsoft Cognitive Toolkit

Microsoft Cognitive Toolkit is a toolkit for commercial-grade distributed deep learning. Users can combine popular model types, and implement algorithms across multiple servers and GPUs.

Languages: Python, C++, BrainScript

Type: Model development, Model optimization, Deep Learning, Scaling

Creator(s): Microsoft

Date Started: 2016

Theano

Theano lets you define, optimize, and evaluate mathematical expressions with multi-dimensional arrays.

Languages: Python

Type: Model optimization, Classical Machine Learning

Creator(s): University of Montreal

Date Started: 2007

Torch

Torch is a scientific computing framework that supports machine learning algorithms on GPUs.

Languages: Lua, LuaJIT, C, CUDA, C++

Type: Model development, Neural Networks, Predictions, Training

Creator(s): Ronan Collobert, Samy Bengio, and Johnny Mariéthoz

Date Started: 2002

Accord.NET

Accord.NET is a machine learning framework combined with audio and image processing libraries.

Languages: C#

Type: Data preparation, Model optimization, Classical Machine Learning

Creator(s): Cesar De Souza

Date Started: 2007

MLflow

MLflow streamlines machine learning development by tracking experiments, packaging code, and sharing and deploying models.

Languages: Python, R, Java/Scala

Type: Model optimization, Model development, Scaling, Deep Learning, Classical Machine Learning

Creator(s): Databricks

Date Started: 2018
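
Experiment tracking, the first item in that description, boils down to recording the parameters and metrics of every run so they can be compared later. The sketch below shows the idea with the standard library only. It is not MLflow's API; the `RunLog` class, the JSON-lines file, and the hyperparameter values are hypothetical.

```python
import json
import tempfile
from pathlib import Path


class RunLog:
    """Append-only experiment log: one JSON line per run (hypothetical helper)."""

    def __init__(self, path):
        self.path = Path(path)

    def log_run(self, params, metrics):
        """Record the settings and results of one training run."""
        record = {"params": params, "metrics": metrics}
        with self.path.open("a", encoding="utf-8") as handle:
            handle.write(json.dumps(record) + "\n")

    def runs(self):
        """Load every run recorded so far."""
        if not self.path.exists():
            return []
        with self.path.open(encoding="utf-8") as handle:
            return [json.loads(line) for line in handle if line.strip()]

    def best_run(self, metric):
        """Return the logged run with the highest value of the given metric."""
        return max(self.runs(), key=lambda run: run["metrics"][metric])


with tempfile.TemporaryDirectory() as tmp:
    log = RunLog(Path(tmp) / "runs.jsonl")
    # Made-up hyperparameters and scores, purely for illustration.
    log.log_run({"learning_rate": 0.1}, {"accuracy": 0.81})
    log.log_run({"learning_rate": 0.01}, {"accuracy": 0.88})
    all_runs = log.runs()
    best = log.best_run("accuracy")

print(best)
```

A dedicated tracking tool adds storage backends, a UI, and model packaging on top of this basic record-and-compare loop.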

BigDL

BigDL is a distributed deep learning library for Apache Spark. Users can write deep learning apps as Spark programs.

Languages: Scala, Python, Java

Type: Model optimization, Scaling

Creator(s): Jason (Jinquan) Dai, Yiheng Wang, Xin Qiu, Ding Ding, Yao Zhang, Yanzhang Wang, Xianyan Jia, Li (Cherry) Zhang, Yan Wan, Zhichao Li, Jiao Wang, Shengsheng Huang, Zhongyuan Wu, Yang Wang, Yuhao Yang, Bowen She, Dongjie Shi, Qi Lu, Kai Huang, and Guoqiong Song

Date Started: 2017

OpenCV

OpenCV is a computer vision and machine learning library that provides common infrastructure to accelerate machine perception.

Languages: C++, Python, Java, MATLAB

Type: Data preparation, Feature engineering

Creator(s): Intel

Date Started: 2000

Dask

Dask is a flexible parallel computing library for analytics.

Languages: Python

Type: Scaling

Creator(s): Matt Rocklin

Date Started: 2015

Ray

Ray provides a universal API for building distributed applications.

Languages: Python, C++

Type: Model development, Model optimization, Reinforcement Learning, Scaling

Creator(s): Anyscale

Date Started: 2017

TVM

TVM bridges the gap between productivity-focused deep learning frameworks and performance- and efficiency-focused hardware backends.

Languages: Python, C++

Type: Model development, Scaling, Deep Learning

Creator(s): OctoML

Date Started: 2017

Transformers

Transformers provides machine learning for PyTorch, TensorFlow, and JAX. It offers thousands of pretrained models for text, vision, and audio tasks.

Languages: Python

Type: Model optimization, Deep Learning

Creator(s): Hugging Face

Date Started: 2018

DVC

Data Version Control (DVC) is a version control system for the data and models in data science and machine learning projects.

Languages: Python

Type: Model optimization, Data preparation

Creator(s): Iterative

Date Started: 2017
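
One common way to version data is to identify each dataset by a hash of its contents and keep only a small pointer under source control. The sketch below illustrates that general idea in plain Python. It is a conceptual toy, not DVC's commands or on-disk format; the `DataStore` class, the choice of SHA-256, and the file name are assumptions.

```python
import hashlib


def fingerprint(data: bytes) -> str:
    """Content hash that identifies one exact version of a dataset."""
    return hashlib.sha256(data).hexdigest()


class DataStore:
    """In-memory content-addressed store (hypothetical, for illustration only)."""

    def __init__(self):
        self.blobs = {}      # hash -> raw bytes of that version
        self.pointers = {}   # tracked name -> hash of the current version

    def add(self, name, data):
        """Store a new version and point the tracked name at it."""
        digest = fingerprint(data)
        self.blobs[digest] = data
        self.pointers[name] = digest
        return digest

    def checkout(self, digest):
        """Retrieve the exact bytes of an earlier version by its hash."""
        return self.blobs[digest]


store = DataStore()
v1 = store.add("train.csv", b"x,y\n1,2\n")
v2 = store.add("train.csv", b"x,y\n1,2\n3,4\n")

print(store.pointers["train.csv"] == v2)   # the name now points at the new version
print(store.checkout(v1))                  # the old version is still retrievable
```

Because the pointer is just a short string, it can live alongside the code while the large files sit in separate storage.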

CML

Continuous Machine Learning (CML) is a tool for implementing continuous integration and delivery in machine learning projects. It can automate development workflows such as machine provisioning, model training, and comparing ML experiments.

Languages: Python

Type: Model optimization, Data preparation

Creator(s): Iterative

Date Started: 2020

feast

feast is a feature store for machine learning applications, and a path to putting analytic data into production for model training.

Languages: Java

Type: Feature engineering

Creator(s): Tecton

Date Started: 2019

How to Contribute to Machine Learning Projects

The steps for contributing to machine learning projects are the same as those for contributing to other open source projects.

Check out our guide to Getting Started with Open Source for step-by-step instructions on how to get involved and make your first open source contribution.

Contact Us

Our goal is to create a comprehensive list of all the best open source machine learning tools.

Have any suggestions for new libraries?

Please email us at contact@makepath.com.