Best Production Machine Learning Tools & Tech Stack
Machine Learning Pipelines
Machine Learning (ML) pipelines help automate the ML life cycle, streamline the workflow, and unlock faster iteration of models from development to deployment. Pipelines also make data, models, and experiments easier to track and monitor. This is especially important when many models must be maintained in production, and when collaborating with other developers on a team.
Some advantages of pipelines include:
- Fully automated process
- Easier to manage and maintain a large number of models
- Better scaling
- Unlocks faster iteration of ML life cycle
- Automated testing and performance monitoring
- Version-controlled
- Standardization
A typical ML workflow consists of: data processing, feature engineering, model training, model validation and model deployment.
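To make those five stages concrete, here is a minimal sketch in plain Python, with no pipeline framework, that chains them so each stage's output feeds the next and a failed validation blocks deployment. Every function name, the toy threshold "model", and the validation rule are hypothetical illustrations, not part of any tool listed below.

```python
"""Five ML workflow stages chained into one pipeline (hypothetical toy example)."""
from typing import Callable, Optional

Record = dict[str, float]


def process_data(raw: list[dict[str, Optional[float]]]) -> list[Record]:
    # Data processing: drop rows with missing values and cast everything to float.
    return [
        {k: float(v) for k, v in row.items()}
        for row in raw
        if all(v is not None for v in row.values())
    ]


def engineer_features(rows: list[Record]) -> list[Record]:
    # Feature engineering: derive a "ratio" feature from the raw columns.
    return [{**r, "ratio": r["x"] / (r["y"] + 1.0)} for r in rows]


def train_model(rows: list[Record]) -> float:
    # Model training: the toy "model" is a threshold at the mean ratio.
    return sum(r["ratio"] for r in rows) / len(rows)


def validate_model(threshold: float, rows: list[Record]) -> bool:
    # Model validation: a sanity gate that must pass before deployment.
    ratios = [r["ratio"] for r in rows]
    return min(ratios) <= threshold <= max(ratios)


def deploy_model(threshold: float) -> Callable[[Record], bool]:
    # Model deployment: return a predictor that reuses the same feature code,
    # so training and serving compute features identically.
    return lambda r: engineer_features([r])[0]["ratio"] > threshold


def run_pipeline(raw: list[dict[str, Optional[float]]]) -> Callable[[Record], bool]:
    rows = engineer_features(process_data(raw))
    threshold = train_model(rows)
    if not validate_model(threshold, rows):
        raise RuntimeError("validation failed; model not deployed")
    return deploy_model(threshold)
```

Reusing `engineer_features` inside the deployed predictor is the design choice worth noting: it keeps the feature logic in one place, which is exactly what a real pipeline tool helps enforce across stages.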

But with so many resources and tools out there, which ones should I use for production ML?
In this article, I have compiled a list of some of the best open source tools and libraries I have come across. (This list is not exhaustive, and I plan to keep updating it.) These tools can give data scientists, ML engineers, and developers a boost in orchestrating, deploying, and monitoring their ML workflows in production.
Tools for Production Machine Learning:
1. ETL and Data Pipelines:
Airflow — A platform to programmatically build, schedule, and monitor workflows. You can use Airflow to build workflows as DAGs of tasks. The Airflow scheduler executes your tasks on an array of workers while following the specified dependencies.
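The core idea behind Airflow's scheduler, running each task only after its upstream tasks finish while letting independent tasks run side by side, can be sketched with Python's standard library alone. This is not Airflow's API (in Airflow you would declare the same edges on operators, e.g. `extract >> transform`); the task names and the `run_dag` helper are hypothetical, and a thread pool stands in for Airflow's workers. It needs Python 3.9+ for `graphlib`.

```python
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait
from graphlib import TopologicalSorter


def run_dag(tasks, deps, max_workers=2):
    """Run callables in dependency order; independent tasks may run in parallel.

    tasks: task name -> zero-argument callable
    deps:  task name -> set of upstream task names
    Returns task names in the order they completed.
    """
    sorter = TopologicalSorter(deps)
    for name in tasks:              # include tasks that have no dependencies
        sorter.add(name)
    sorter.prepare()                # raises graphlib.CycleError on a cycle

    completed = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        running = {}
        while sorter.is_active():
            # Submit every task whose upstream tasks have all finished.
            for name in sorter.get_ready():
                running[pool.submit(tasks[name])] = name
            finished, _ = wait(running, return_when=FIRST_COMPLETED)
            for future in finished:
                name = running.pop(future)
                future.result()     # re-raise a task failure instead of hiding it
                completed.append(name)
                sorter.done(name)   # unblocks downstream tasks
    return completed
```

With `extract` feeding two transforms that both feed `load`, the two transforms can run concurrently, but `load` is only submitted after both have finished.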
Argo Workflows — A container-native workflow engine for orchestrating parallel jobs on Kubernetes.
Luigi — Helps build complex pipelines of batch jobs. It handles dependency resolution, workflow management, visualization, failure handling, command line integration, etc.
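Luigi tasks declare their upstream tasks in `requires()` and their result in `output()`, and a task whose output already exists is treated as complete and skipped. The sketch below imitates that dependency resolution in plain Python so the behavior is easy to see; the `Task` class, the `build` function, and the in-memory `store` (standing in for output files) are hypothetical and are not Luigi's API.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Task:
    name: str
    run: Callable[[dict], object]          # computes output from upstream outputs
    requires: list[str] = field(default_factory=list)


def build(target, tasks, store, _visiting=None):
    """Recursively build `target`, skipping tasks whose output is already in `store`."""
    visiting = _visiting if _visiting is not None else set()
    if target in store:                    # already complete: nothing to re-run
        return store[target]
    if target in visiting:
        raise ValueError(f"dependency cycle at {target!r}")
    visiting.add(target)
    task = tasks[target]
    # Resolve every upstream dependency first (depth-first).
    inputs = {dep: build(dep, tasks, store, visiting) for dep in task.requires}
    store[target] = task.run(inputs)       # written only if run() succeeds
    visiting.discard(target)
    return store[target]
```

Because a result is written to `store` only after `run` succeeds, a failed task leaves its upstream results in place, and re-running `build` resumes from the failure instead of starting over.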
Dagster — A data orchestrator for ML, analytics, and ETL.
2. Feature Engineering:
auto-sklearn — An automated machine learning toolkit that automatically searches for the right learning algorithm for a new ML dataset and optimizes its hyperparameters.
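Stripped to its essentials, "search for the right algorithm and its hyperparameters" means scoring (algorithm, hyperparameter) pairs on held-out data and keeping the best. The sketch below does this with an exhaustive grid search over two toy classifiers written in plain Python; auto-sklearn itself uses more sophisticated strategies (Bayesian optimization, meta-learning, and ensembling), and every name here is hypothetical.

```python
from statistics import mean


def knn_fit(train, k):
    # k-nearest neighbours on a single numeric feature; majority vote of labels.
    def predict(x):
        nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
        votes = sum(label for _, label in nearest)
        return 1 if votes * 2 > k else 0
    return predict


def threshold_fit(train, margin):
    # Cut midway between the class means (assumes class 1 has the larger mean),
    # shifted by the `margin` hyperparameter.
    m0 = mean(x for x, y in train if y == 0)
    m1 = mean(x for x, y in train if y == 1)
    cut = (m0 + m1) / 2 + margin
    return lambda x: 1 if x > cut else 0


# Search space: algorithm name -> (fit function, hyperparameter grid).
SEARCH_SPACE = {
    "knn": (knn_fit, [1, 3, 5]),
    "threshold": (threshold_fit, [-0.5, 0.0, 0.5]),
}


def accuracy(model, data):
    return sum(model(x) == y for x, y in data) / len(data)


def search(train, valid, space=SEARCH_SPACE):
    """Try every (algorithm, hyperparameter) pair; return (score, algo, param) of the best."""
    best = None
    for algo, (fit, grid) in space.items():
        for param in grid:
            score = accuracy(fit(train, param), valid)
            if best is None or score > best[0]:
                best = (score, algo, param)
    return best
```

Ties keep the first candidate found, so the order of `SEARCH_SPACE` acts as a tie-breaker.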
AutoML-GS — A zero code interface for getting an optimized model and data transformation pipeline in multiple popular ML/DL frameworks, with minimal Python dependencies (pandas + scikit-learn + your framework of choice).
FEAST (Feature Store) — A tool to help bridge data and machine learning models, allowing teams to register, ingest, serve, and monitor features in production.
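A feature store has to answer two kinds of questions: "what are the latest feature values for this entity?" for online inference, and "what were the values as of this past timestamp?" for building training sets without leaking future data. Feast exposes this split through methods such as `get_online_features` and `get_historical_features`. The in-memory class below is a hypothetical sketch of the same register / ingest / serve cycle, not Feast's API.

```python
from bisect import bisect_right
from collections import defaultdict


class ToyFeatureStore:
    """In-memory illustration of register / ingest / serve (not Feast's API)."""

    def __init__(self):
        self.schemas = {}                  # feature view -> list of feature names
        self.rows = defaultdict(list)      # (view, entity_id) -> [(ts, values)], sorted by ts

    def register(self, view, features):
        self.schemas[view] = list(features)

    def ingest(self, view, entity_id, ts, values):
        if view not in self.schemas:
            raise KeyError(f"unregistered feature view {view!r}")
        missing = set(self.schemas[view]) - set(values)
        if missing:
            raise ValueError(f"missing features: {sorted(missing)}")
        history = self.rows[(view, entity_id)]
        history.append((ts, dict(values)))
        history.sort(key=lambda r: r[0])

    def get_online(self, view, entity_id):
        # Online serving: the most recent values, for low-latency inference.
        history = self.rows.get((view, entity_id))
        return history[-1][1] if history else None

    def get_as_of(self, view, entity_id, ts):
        # Point-in-time lookup for training: latest values at or before `ts`,
        # so a training example never sees data from its own future.
        history = self.rows.get((view, entity_id), [])
        i = bisect_right([t for t, _ in history], ts)
        return history[i - 1][1] if i else None
```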
3. Orchestrator for Model Training:
Kubeflow — A cloud native platform dedicated to making deployments of machine learning (ML) workflows on Kubernetes simple, portable, and scalable.
Airflow — Aside from being useful for data and ETL pipelines, Airflow can also be used as an orchestration tool to train ML models.
4. Data and Model Versioning:
DVC (Data Version Control) — A version control system for Data Science and ML projects. It uses a Git-based workflow to organize your data, models, and experiments.
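DVC keeps large files out of Git by storing them in a cache keyed by a content hash and committing only a small `.dvc` pointer file. The standard-library sketch below illustrates that idea; it is loosely modeled on DVC rather than a reimplementation of it, and the `snapshot`/`restore` helpers, the `.ptr.json` file name, and the cache layout are hypothetical.

```python
import hashlib
import json
import shutil
from pathlib import Path


def snapshot(data_path, cache_dir):
    """Copy a data file into a content-addressed cache and write a small pointer file.

    The pointer (file name + hash) is what you would commit to Git;
    the bulky data itself stays in the cache.
    """
    data_path, cache_dir = Path(data_path), Path(cache_dir)
    digest = hashlib.md5(data_path.read_bytes()).hexdigest()
    cached = cache_dir / digest[:2] / digest[2:]
    if not cached.exists():                 # identical content is stored only once
        cached.parent.mkdir(parents=True, exist_ok=True)
        shutil.copyfile(data_path, cached)
    pointer = {"path": data_path.name, "md5": digest}
    pointer_file = data_path.with_name(data_path.name + ".ptr.json")
    pointer_file.write_text(json.dumps(pointer))
    return pointer


def restore(pointer, cache_dir, dest_dir):
    """Check out the exact data version a pointer refers to."""
    digest = pointer["md5"]
    src = Path(cache_dir) / digest[:2] / digest[2:]
    dest = Path(dest_dir) / pointer["path"]
    dest.parent.mkdir(parents=True, exist_ok=True)
    shutil.copyfile(src, dest)
    return dest
```

Because every version is addressed by its hash, checking out an old pointer file from Git is enough to recover the matching data.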
MLflow — A platform to streamline ML development, including tracking experiments, packaging code into reproducible runs, and sharing and deploying models. MLflow offers a set of lightweight APIs that can be used with any existing machine learning application or library (TensorFlow, PyTorch, XGBoost, etc.).
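Experiment tracking boils down to recording, for every run, the parameters that went in and the metrics that came out, so runs can be compared later. In MLflow you would record these with calls such as `mlflow.log_param` and `mlflow.log_metric` inside a `with mlflow.start_run():` block. The file-based `RunTracker` below is a hypothetical, standard-library sketch of the same idea, not MLflow's implementation or storage format.

```python
import json
import time
import uuid
from pathlib import Path


class RunTracker:
    """Minimal file-based experiment tracker: one JSON file per run."""

    def __init__(self, root):
        self.root = Path(root)
        self.root.mkdir(parents=True, exist_ok=True)

    def log_run(self, params, metrics):
        run_id = uuid.uuid4().hex
        record = {"run_id": run_id, "start_time": time.time(),
                  "params": params, "metrics": metrics}
        (self.root / f"{run_id}.json").write_text(json.dumps(record))
        return run_id

    def runs(self):
        return [json.loads(p.read_text()) for p in self.root.glob("*.json")]

    def best_run(self, metric, higher_is_better=True):
        # Only runs that actually logged `metric` are comparable.
        scored = [r for r in self.runs() if metric in r["metrics"]]
        if not scored:
            return None
        pick = max if higher_is_better else min
        return pick(scored, key=lambda r: r["metrics"][metric])
```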
Pachyderm — A platform for data versioning, data pipelines, and data lineage.
5. Monitoring and Serving:
Seldon Core — Seldon Core converts your ML models (TensorFlow, PyTorch, H2O, etc.) or language wrappers (Python, Java, etc.) into production REST/gRPC microservices. It also deploys and monitors your ML models on Kubernetes.
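Monitoring a deployed model often starts with checking whether live inputs still look like the training data. One common statistic for this is the Population Stability Index (PSI), which sums (live share - reference share) * ln(live share / reference share) over bins. The sketch below computes it in plain Python; it is not Seldon Core's monitoring implementation, the bin edges are supplied by the caller, and the 0.2 alert threshold is a widely quoted rule of thumb rather than a universal constant.

```python
import math
from bisect import bisect_right


def psi(reference, live, edges, eps=1e-6):
    """Population Stability Index between two samples over fixed bin edges."""
    def proportions(values):
        counts = [0] * (len(edges) + 1)
        for v in values:
            counts[bisect_right(edges, v)] += 1
        # Floor empty bins at eps so the logarithm stays finite.
        return [max(c / len(values), eps) for c in counts]

    ref, cur = proportions(reference), proportions(live)
    return sum((c - r) * math.log(c / r) for r, c in zip(ref, cur))


def drift_alert(reference, live, edges, threshold=0.2):
    # Threshold is a rule-of-thumb assumption; tune it per feature in practice.
    return psi(reference, live, edges) > threshold
```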
TensorFlow Serving — A flexible, high-performance serving system for ML models, designed for production environments. It provides out-of-the-box integration with TensorFlow models, but can be easily extended to serve other types of models.
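Both Seldon Core and TensorFlow Serving ultimately put a model behind a network endpoint. The standard-library sketch below shows the bare idea: a JSON request body goes in, predictions come out. The request/response shape loosely mirrors TensorFlow Serving's REST predict API (`{"instances": [...]}` in, `{"predictions": [...]}` out), but the stand-in model, the `/v1/models/toy:predict` route, the port, and all function names are hypothetical; this is not how either tool is implemented.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def model(features):
    # Hypothetical stand-in model: a fixed weighted sum of two features.
    return 0.25 * features[0] + 0.75 * features[1]


def handle_predict(body: bytes) -> tuple[int, dict]:
    """Turn a JSON request body into (HTTP status, JSON-serializable response)."""
    try:
        instances = json.loads(body)["instances"]
        return 200, {"predictions": [model(x) for x in instances]}
    except (ValueError, KeyError, TypeError, IndexError) as exc:
        return 400, {"error": str(exc)}   # malformed input is a client error


class PredictHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        if self.path != "/v1/models/toy:predict":
            self.send_error(404)
            return
        length = int(self.headers.get("Content-Length", 0))
        status, response = handle_predict(self.rfile.read(length))
        data = json.dumps(response).encode()
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)


def serve(host="127.0.0.1", port=8501):
    """Block forever, serving POST /v1/models/toy:predict."""
    HTTPServer((host, port), PredictHandler).serve_forever()
```

Keeping the request logic in the pure `handle_predict` function makes it testable without starting a server. After calling `serve()`, POSTing `{"instances": [[4.0, 8.0]]}` to that route should return `{"predictions": [7.0]}`.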
Some of these tools overlap in functionality; make sure to choose the ones that best fit your tasks and your team.
Join us on Project Alesia, where I talk more about Machine Learning, MLOps, Data Science, and much more.