PyCaret: Low-code Machine Learning Library that Accelerates Model Building Pipeline

Yicheng Bao
4 min read · Mar 25, 2021

Overview

In this article, we share our experience using PyCaret, a low-code library for building machine learning pipelines. We introduce how PyCaret is used and how it can speed up the model building pipeline. The article has three parts: an introduction, a demo of applying PyCaret, and a discussion of its strengths and limitations.

Part 1 : Introduction

Machine learning is now widely used in production, but it is hard for people without a technical background to carry out machine learning tasks without professional training. Even software engineers and data scientists still need to write a lot of code to train or deploy machine learning models. PyCaret's low-code approach addresses both problems.

What is PyCaret?

PyCaret is an open-source, low-code machine learning library in Python that aims to accelerate the process of constructing ML experiments and building machine learning pipelines. With this library, users can train, evaluate, and deploy machine learning models efficiently. Compared to other machine learning libraries, PyCaret is simple and easy to use: it requires only a few lines of code to perform complex machine learning tasks. All the operations performed in PyCaret are automatically stored in a custom Pipeline that is fully orchestrated for deployment. The parts of the ML workflow it covers include data preprocessing, model comparison, training and tuning, and evaluation, each of which is demonstrated in Part 2.

Problems addressed by PyCaret :

PyCaret can train models for both supervised and unsupervised learning problems. Specifically, it handles classification and regression in supervised learning and clustering-related tasks in unsupervised learning.

Tasks such as spam detection (a classification problem) and stock price prediction (a regression problem) are very common in industry, and PyCaret covers these common ML problems.

Part 2 : A demo of applying PyCaret to explore a movie rating dataset

In this section, we apply PyCaret to a movie dataset for a prediction task. We use the MovieLens dataset, the same one we used for milestone 1, and explore whether we can predict movie ratings from user and movie features.

Figure 1. The dataset

Data preprocessing in one line of code

With the setup() function, you can handle data preprocessing tasks such as filling in missing data, one-hot encoding, and data transformation, as well as feature engineering, clustering, and the train/test split, all in one step by passing parameters.
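The original screenshot of this step is not available, so the following is only a minimal sketch of what such a call might look like. It assumes rating prediction is treated as a regression problem (hence `pycaret.regression`), and the column names `rating`, `user_id`, and `movie_id` are hypothetical placeholders rather than the actual columns of the dataset in Figure 1.

```python
# Keyword arguments for pycaret.regression.setup().
# All values here are illustrative assumptions, not the article's settings.
SETUP_KWARGS = {
    "target": "rating",                          # hypothetical label column
    "train_size": 0.8,                           # 80/20 train/test split
    "normalize": True,                           # scale numeric features
    "numeric_imputation": "mean",                # fill missing numeric values
    "ignore_features": ["user_id", "movie_id"],  # hypothetical ID columns
    "session_id": 42,                            # seed for reproducibility
}


def run_setup(ratings_df):
    """Run PyCaret's preprocessing on a pandas DataFrame of ratings."""
    # Imported lazily so the configuration above can be inspected
    # even on a machine where PyCaret is not installed.
    from pycaret.regression import setup

    return setup(data=ratings_df, **SETUP_KWARGS)
```

Calling `run_setup(df)` on a DataFrame that contains these columns would initialize the experiment. Categorical columns that are not ignored are encoded by setup() itself.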

Model comparison and selection

PyCaret ships with many built-in models, which lets us quickly compare them across a selection of metrics.
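As a rough illustration, the comparison might be run as below. This sketch assumes setup() has already been called in the same session and that the regression module is in use. The sort metric and the number of models kept are illustrative choices, not values from the original demo.

```python
# Illustrative choices, not taken from the original article.
SORT_METRIC = "MAE"  # rank candidate models by mean absolute error
TOP_N = 3            # keep the three best models


def compare_candidates(sort_metric=SORT_METRIC, top_n=TOP_N):
    """Cross-validate the built-in regressors and return the best ones."""
    # Lazy import: requires PyCaret and a prior call to setup().
    from pycaret.regression import compare_models, pull

    # With n_select > 1, compare_models returns a list of trained models.
    top_models = compare_models(sort=sort_metric, n_select=top_n)
    # pull() returns the most recently displayed score grid as a DataFrame.
    leaderboard = pull()
    return top_models, leaderboard
```

The returned leaderboard is an ordinary DataFrame, so it can be saved or filtered like any other table.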

Model training and tuning

There is no need to go back and forth trying different combinations of hyperparameters. Instead, PyCaret takes care of the search.
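A hedged sketch of this step follows. It assumes setup() has already been run with the regression module, and the model ID `"rf"` (random forest), the iteration count, and the metric are illustrative choices rather than the ones used in the original demo. By default, tune_model performs a random search over a predefined grid, so the tuned model is not guaranteed to beat the base model.

```python
# Illustrative choices, not taken from the original article.
MODEL_ID = "rf"    # PyCaret's ID for a random forest regressor
N_ITER = 20        # number of hyperparameter combinations to try
OPTIMIZE = "MAE"   # metric the search tries to improve


def train_and_tune(model_id=MODEL_ID, n_iter=N_ITER, optimize=OPTIMIZE):
    """Train a base model with cross-validation, then tune it."""
    # Lazy import: requires PyCaret and a prior call to setup().
    from pycaret.regression import create_model, tune_model

    base_model = create_model(model_id)
    tuned_model = tune_model(base_model, n_iter=n_iter, optimize=optimize)
    return base_model, tuned_model
```

Returning both models makes it easy to compare their cross-validated scores before deciding which one to keep.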

Model evaluation and interpretation

Evaluate the model over all metrics with one call.

Plot different metrics in a predefined style.
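The evaluation and plotting calls might look roughly like the sketch below, again assuming the regression module and a model trained after setup(). The plot names are valid options for regression models in PyCaret, but which plots the original demo showed is not known, so this selection is an assumption.

```python
# Plot types to draw; an illustrative selection for regression models.
REGRESSION_PLOTS = ("residuals", "error", "feature")


def evaluate_and_plot(model, plots=REGRESSION_PLOTS):
    """Score a trained model on the hold-out set and draw diagnostics."""
    # Lazy import: requires PyCaret and a prior call to setup().
    from pycaret.regression import plot_model, predict_model

    # Without a `data` argument, predict_model scores the hold-out set
    # created by setup() and displays the resulting metrics.
    holdout_predictions = predict_model(model)
    for plot_name in plots:
        plot_model(model, plot=plot_name)
    return holdout_predictions
```

In a notebook, `evaluate_model(model)` from the same module offers an interactive alternative that lets you switch between the available plots.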

Part 3 : Strengths and limitations

Strengths

Low-code: The most significant strength of PyCaret is its low-code design. Compared to other machine learning frameworks such as scikit-learn, it is easy to see that the same tasks take less code with PyCaret. This lets users carry out common machine learning tasks efficiently, which is a huge advantage.

Integration: PyCaret also integrates with other tools such as spaCy and Tableau, so it can be applied to problems in multiple fields, including Natural Language Processing and Data Visualization.

Easy to use: PyCaret is easy for anyone to pick up thanks to its clean interface and detailed documentation.

Limitations

Maintainability: The library is still relatively new, so it is prone to bugs.

Explainability: Because of its low-code design, each function bundles many steps together, so it is hard to explain exactly what happens inside a single call.

Limited ML fields: Some newer fields of ML, such as Reinforcement Learning, are not covered by PyCaret, so some machine learning engineers still cannot use it for all of their work.

References:

https://pycaret.org/
