Workflow in ML#

Refer to the PyTorch workflow.

| Topic | Contents |
| --- | --- |
| 1. Getting data ready | Data can be almost anything, but to get started we’re going to create a simple straight line. |
| 2. Building a model | Here we’ll create a model to learn patterns in the data; we’ll also choose a loss function, optimizer and build a training loop. |
| 3. Fitting the model to data (training) | We’ve got data and a model, now let’s let the model (try to) find patterns in the (training) data. |
| 4. Making predictions and evaluating a model (inference) | Our model’s found patterns in the data, let’s compare its findings to the actual (testing) data. |
| 5. Saving and loading a model | You may want to use your model elsewhere, or come back to it later; here we’ll cover that. |

1. Data (preparing and loading)#

Data in ML can be almost anything, but it must be turned into numbers (normally represented as tensors).

Turn data into numbers#

Turning data into numbers is called numerical encoding.
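For example, here’s a minimal sketch of the straight-line data this workflow uses (the weight and bias values of 0.7 and 0.3 are illustrative choices, not required ones):

import torch

# Create *known* parameters (illustrative values; the model will try to recover them)
weight = 0.7
bias = 0.3

# Create the data: a simple straight line y = weight * x + bias
X = torch.arange(0, 1, 0.02).unsqueeze(dim=1)  # features
y = weight * X + bias                          # labels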

Split data into training and test sets#

| Split | Purpose | Amount of total data | How often is it used? |
| --- | --- | --- | --- |
| Training set | The model learns from this data (like the course materials you study during the semester). | ~60-80% | Always |
| Validation set | The model gets tuned on this data (like the practice exam you take before the final exam). | ~10-20% | Often, but not always |
| Testing set | The model gets evaluated on this data to test what it has learned (like the final exam you take at the end of the semester). | ~10-20% | Always |
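A minimal sketch of the split, assuming the X and y tensors created above (no validation set here, since the straight-line problem is simple):

# Create an 80/20 train/test split
train_split = int(0.8 * len(X))  # 80% of data used for the training set
X_train, y_train = X[:train_split], y[:train_split]
X_test, y_test = X[train_split:], y[train_split:]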

Note

Keep in mind the data explorer’s motto: “visualize, visualize, visualize!”

Think of this whenever you’re working with data and turning it into numbers: if you can visualize something, it can do wonders for understanding.

Machines love numbers, and we humans like numbers too, but we also like to look at things.
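In that spirit, here’s a quick plotting sketch for the splits above (assuming matplotlib is installed):

import matplotlib.pyplot as plt

# Plot training data in blue and testing data in green
plt.scatter(X_train, y_train, c="b", s=4, label="Training data")
plt.scatter(X_test, y_test, c="g", s=4, label="Testing data")
plt.legend()
plt.show()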

2. Build model#

PyTorch model building essentials#

PyTorch has four (give or take) essential modules you can use to create almost any kind of neural network you can imagine. They are:

| PyTorch module | What does it do? |
| --- | --- |
| torch.nn | Contains all of the building blocks for computational graphs (essentially a series of computations executed in a particular way). |
| torch.nn.Parameter | Stores tensors that can be used with nn.Module. If requires_grad=True, gradients (used for updating model parameters via gradient descent) are calculated automatically; this is often referred to as “autograd”. |
| torch.nn.Module | The base class for all neural network modules; all the building blocks for neural networks are subclasses. If you’re building a neural network in PyTorch, your models should subclass nn.Module. Requires a forward() method to be implemented. |
| torch.optim | Contains various optimization algorithms (these tell the model parameters stored in nn.Parameter how to change, via gradient descent, to reduce the loss). |
| def forward() | All nn.Module subclasses require a forward() method; this defines the computation that will take place on the data passed to the particular nn.Module (e.g. the linear regression formula in the code below). |

import torch
from torch import nn

# Create a Linear Regression model class
class LinearRegressionModel(nn.Module): # nn.Module is the base for almost everything in PyTorch (think neural network LEGO blocks)
    def __init__(self):
        super().__init__()
        self.weights = nn.Parameter(torch.randn(1, # start with a random weight (this will get adjusted as the model learns)
                                    dtype=torch.float), # <- PyTorch loves float32 by default
                                    requires_grad=True) # <- can we update this value with gradient descent?

        self.bias = nn.Parameter(torch.randn(1, # start with a random bias (this will get adjusted as the model learns)
                                    dtype=torch.float), # <- PyTorch loves float32 by default
                                    requires_grad=True) # <- can we update this value with gradient descent?

    # forward() defines the computation in the model
    def forward(self, x: torch.Tensor) -> torch.Tensor: # <- "x" is the input data (e.g. training/testing features)
        return self.weights * x + self.bias # <- the linear regression formula (y = m*x + b)
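To see what this produces, a quick sketch: seed the random number generator (so the random parameters are reproducible), instantiate the model and inspect its parameters:

# Set a manual seed so the random parameters are reproducible
torch.manual_seed(42)

# Instantiate the model (this creates random weight and bias values)
model_0 = LinearRegressionModel()

# Inspect the parameters and the named parameter dictionary
print(list(model_0.parameters()))
print(model_0.state_dict())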

Making predictions using torch.inference_mode()#

torch.inference_mode() is used when using a model for inference (making predictions).

torch.inference_mode() turns off a bunch of things (like gradient tracking, which is necessary for training but not for inference) to make forward-passes (data going through the forward() method) faster.

# Make predictions with model
with torch.inference_mode(): 
    y_preds = model_0(X_test)

3. Train model#

Creating a loss function and optimizer in PyTorch#

In ML, a loss function is also called a cost function or objective function; it’s the quantity that training tries to minimize.

| Function | What does it do? | Where does it live in PyTorch? | Common values |
| --- | --- | --- | --- |
| Loss function | Measures how wrong your model’s predictions (e.g. y_preds) are compared to the truth labels (e.g. y_test). Lower is better. | PyTorch has plenty of built-in loss functions in torch.nn. | Mean absolute error (MAE) for regression problems (torch.nn.L1Loss()). Binary cross entropy for binary classification problems (torch.nn.BCELoss()). |
| Optimizer | Tells your model how to update its internal parameters to best lower the loss. | You can find various optimization function implementations in torch.optim. | Stochastic gradient descent (torch.optim.SGD()). Adam optimizer (torch.optim.Adam()). |

Common optimizers:

  • SGD (stochastic gradient descent) optimizer

  • Adam optimizer
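A minimal sketch creating both for our regression model (MAE loss plus SGD; the learning rate lr=0.01 is just an assumed starting value):

# Create the loss function (mean absolute error)
loss_fn = nn.L1Loss()

# Create the optimizer (stochastic gradient descent on the model's parameters)
optimizer = torch.optim.SGD(params=model_0.parameters(),
                            lr=0.01)  # learning rate (a hyperparameter)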

Creating an optimization loop in PyTorch#

We will create a training loop (and testing loop).

The training loop involves the model going through the training data and learning the relationships between the features and labels.

The testing loop involves going through the testing data and evaluating how good the patterns are that the model learned on the training data (the model never sees the testing data during training).

Each of these is called a “loop” because we want our model to look at (loop through) each sample in each dataset.

PyTorch training loop#

For the training loop, we’ll build the following steps:

| Number | Step name | What does it do? | Code example |
| --- | --- | --- | --- |
| 1 | Forward pass | The model goes through all of the training data once, performing its forward() function calculations. | model(x_train) |
| 2 | Calculate the loss | The model’s outputs (predictions) are compared to the ground truth and evaluated to see how wrong they are. | loss = loss_fn(y_pred, y_train) |
| 3 | Zero gradients | The optimizer’s gradients are set to zero (they are accumulated by default) so they can be recalculated for the specific training step. | optimizer.zero_grad() |
| 4 | Perform backpropagation on the loss | Computes the gradient of the loss with respect to every model parameter to be updated (each parameter with requires_grad=True). This is known as backpropagation, hence “backwards”. | loss.backward() |
| 5 | Update the optimizer (gradient descent) | Update the parameters with requires_grad=True with respect to the loss gradients in order to improve them. | optimizer.step() |

Figure: an annotated PyTorch training loop.

Note

The above is just one example of how the steps could be ordered or described. With experience you’ll find making PyTorch training loops can be quite flexible.

And on the ordering of things, the above is a good default order but you may see slightly different orders. Some rules of thumb:

  • Calculate the loss (loss = ...) before performing backpropagation on it (loss.backward()).

  • Zero gradients (optimizer.zero_grad()) before stepping them (optimizer.step()).

  • Step the optimizer (optimizer.step()) after performing backpropagation on the loss (loss.backward()).

PyTorch testing loop#

As for the testing loop (evaluating our model), the typical steps include:

| Number | Step name | What does it do? | Code example |
| --- | --- | --- | --- |
| 1 | Forward pass | The model goes through all of the testing data once, performing its forward() function calculations. | model(x_test) |
| 2 | Calculate the loss | The model’s outputs (predictions) are compared to the ground truth and evaluated to see how wrong they are. | loss = loss_fn(y_pred, y_test) |
| 3 | Calculate evaluation metrics (optional) | Alongside the loss value you may want to calculate other evaluation metrics, such as accuracy on the test set. | Custom functions |

Notice the testing loop doesn’t perform backpropagation (loss.backward()) or step the optimizer (optimizer.step()). This is because no parameters in the model are being changed during testing; they’ve already been calculated. For testing, we’re only interested in the output of the forward pass through the model.
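As a sketch, a single evaluation step could look like this; note there’s no loss.backward() and no optimizer.step():

# Put the model in evaluation mode, then forward pass + loss on the test data
model_0.eval()
with torch.inference_mode():
    test_pred = model_0(X_test)
    test_loss = loss_fn(test_pred, y_test)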

Figure: an annotated PyTorch testing loop.

Note

  • The training loop and testing loop are normally performed together.

  • In ML, an epoch means one full pass over the training data (a counting unit, much like a step in MD).

Let’s put all of the above together and train our model for 100 epochs (full passes over the training data), evaluating it every 10 epochs.

torch.manual_seed(42)  # seed so the same random numbers are reproduced across runs

epochs = 100    # Set the number of epochs (how many times the model will pass over the training data)

# Create empty loss lists to track values
train_loss_values = []
test_loss_values = []
epoch_count = []

for epoch in range(epochs):
    ### Training
    model_0.train()           # Put model in training mode (this is the default state of a model)

    y_pred = model_0(X_train)        # 1. Forward pass on train data using the forward() method inside 
    loss = loss_fn(y_pred, y_train)  # 2. Calculate the loss (how different are our models predictions to the ground truth)
    optimizer.zero_grad()            # 3. Zero grad of the optimizer
    loss.backward()                  # 4. Loss backwards
    optimizer.step()                 # 5. Progress the optimizer

    ### Testing
    model_0.eval()  # Put the model in evaluation mode

    with torch.inference_mode():
        test_pred = model_0(X_test)                              # 1. Forward pass on test data
        test_loss = loss_fn(test_pred, y_test.type(torch.float)) # 2. Calculate loss on test data (note: predictions come in torch.float, so comparisons need tensors of the same type)

        # Print out what's happening
        if epoch % 10 == 0:
            epoch_count.append(epoch)
            train_loss_values.append(loss.detach().numpy())
            test_loss_values.append(test_loss.detach().numpy())
            print(f"Epoch: {epoch} | MAE Train Loss: {loss} | MAE Test Loss: {test_loss} ")
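Since we tracked the loss values, here’s a quick sketch to plot the curves (matplotlib assumed, as earlier):

# Plot the training and test loss curves
plt.plot(epoch_count, train_loss_values, label="Train loss")
plt.plot(epoch_count, test_loss_values, label="Test loss")
plt.title("Training and test loss curves")
plt.ylabel("Loss")
plt.xlabel("Epochs")
plt.legend()
plt.show()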

4. Make predictions with a trained model (inference)#

We already made predictions inside the testing loop above, but it’s worth spelling out on its own.

There are three things to remember when making predictions (also called performing inference) with a PyTorch model:

  1. Set the model in evaluation mode (model.eval()).

  2. Make the predictions using the inference mode context manager (with torch.inference_mode(): ...).

  3. All predictions should be made with objects on the same device (e.g. data and model on GPU only or data and model on CPU only).

# Device-agnostic setup (assumption: use the GPU if available, otherwise the CPU)
device = "cuda" if torch.cuda.is_available() else "cpu"

model_0.eval()                 # 1. Set the model in evaluation mode
with torch.inference_mode():   # 2. Set up the inference mode context manager
    model_0.to(device)         # 3. Put the model and data on the same device (device-agnostic code)
    X_test = X_test.to(device)
    y_preds = model_0(X_test)

5. Saving and loading a model#

For saving and loading models in PyTorch, there are three main methods you should be aware of (PyTorch saving and loading models guide):

| PyTorch method | What does it do? |
| --- | --- |
| torch.save | Saves a serialized object to disk using Python’s pickle utility. Models, tensors and various other Python objects like dictionaries can be saved using torch.save. |
| torch.load | Uses pickle’s unpickling features to deserialize and load pickled Python object files (like models, tensors or dictionaries) into memory. You can also set which device to load the object to (CPU, GPU etc.). |
| torch.nn.Module.load_state_dict | Loads a model’s parameter dictionary (model.state_dict()) using a saved state_dict() object. |

Note

As stated in Python’s pickle documentation, the pickle module is not secure. That means you should only ever unpickle (load) data you trust. That goes for loading PyTorch models as well. Only ever use saved PyTorch models from sources you trust.

Saving a PyTorch model’s state_dict()#

The recommended way for saving and loading a model for inference (making predictions) is by saving and loading a model’s state_dict().

Call torch.save(obj, f), where obj is the target model’s state_dict() and f is the filename to save the model to.

torch.save(obj=model_0.state_dict(), f=filename)  # Save the model state_dict(), only saves the models learned parameters
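For example, a minimal sketch using pathlib (the MODEL_PATH name is an assumption; the filename matches the path referenced in the next section):

from pathlib import Path

# 1. Create a models directory
MODEL_PATH = Path("models")
MODEL_PATH.mkdir(parents=True, exist_ok=True)

# 2. Create the model save path
filename = MODEL_PATH / "01_pytorch_workflow_model_0.pth"

# 3. Save the model state_dict()
torch.save(obj=model_0.state_dict(), f=filename)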

Loading a saved PyTorch model’s state_dict()#

Since we’ve now got a saved model state_dict() at models/01_pytorch_workflow_model_0.pth we can now load it in using torch.nn.Module.load_state_dict(torch.load(f)) where f is the filepath of our saved model state_dict().

Why call torch.load() inside torch.nn.Module.load_state_dict()?

Because we only saved the model’s state_dict() which is a dictionary of learned parameters and not the entire model, we first have to load the state_dict() with torch.load() and then pass that state_dict() to a new instance of our model (which is a subclass of nn.Module).

Why not save the entire model?

Saving the entire model rather than just the state_dict() is more intuitive; however, to quote the PyTorch documentation (italics mine):

Tip

The disadvantage of this approach (saving the whole model) is that the serialized data is bound to the specific classes and the exact directory structure used when the model is saved…

Because of this, your code can break in various ways when used in other projects or after refactors.

So instead, we’re using the flexible method of saving and loading just the state_dict(), which again is basically a dictionary of model parameters.

Let’s test it out by creating another instance of LinearRegressionModel(), which is a subclass of torch.nn.Module and will hence have the built-in method load_state_dict().

# Instantiate a new instance of our model (this will be instantiated with random weights)
loaded_model_0 = LinearRegressionModel()

# Load the state_dict of our saved model (this will update the new instance of our model with trained weights)
loaded_model_0.load_state_dict(torch.load(f=filename))
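To check the load worked, a quick sketch comparing the loaded model’s predictions against the originals (keeping everything on the same device):

# Put the loaded model on the same device as the data, then predict
loaded_model_0.to(device)
loaded_model_0.eval()
with torch.inference_mode():
    loaded_model_preds = loaded_model_0(X_test)

# Compare with the original model's predictions (should be identical)
print(y_preds == loaded_model_preds)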

Improving a model (hyperparameter tuning)#

When the model gives bad predictions, there are a few things you can try to make it better (a sketch of the first two follows the table). See an example here.

| Model improvement technique* | What does it do? |
| --- | --- |
| Add more layers | Each layer potentially increases the learning capabilities of the model, with each layer being able to learn some kind of new pattern in the data; more layers is often referred to as making your neural network deeper. |
| Add more hidden units | Similar to the above: more hidden units per layer means a potential increase in the learning capabilities of the model; more hidden units is often referred to as making your neural network wider. |
| Fit for longer (more epochs) | Your model might learn more if it has more opportunities to look at the data. |
| Change the activation functions | Some data just can’t be fit with only straight lines (like what we’ve seen); using non-linear activation functions can help with this (hint, hint). |
| Change the learning rate | Less model-specific, but still related: the learning rate of the optimizer decides how much a model should change its parameters each step. Too much and the model overcorrects; too little and it doesn’t learn enough. |
| Change the loss function | Again, less model-specific but still important: different problems require different loss functions. For example, a binary cross entropy loss function won’t work with a multi-class classification problem. |
| Use transfer learning | Take a pretrained model from a problem domain similar to yours and adjust it to your own problem. We cover transfer learning in notebook 06. |
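As a sketch of the first two techniques, here’s a deeper and wider variant of our model (the class name LinearRegressionModelV2 and the layer sizes are illustrative assumptions):

# A deeper and wider variant built from nn.Linear layers
class LinearRegressionModelV2(nn.Module):
    def __init__(self):
        super().__init__()
        self.layers = nn.Sequential(
            nn.Linear(in_features=1, out_features=10),   # more hidden units = a wider network
            nn.Linear(in_features=10, out_features=10),  # more layers = a deeper network
            nn.Linear(in_features=10, out_features=1),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.layers(x)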

Note

  • Because you can adjust all of these by hand, they’re referred to as hyperparameters.

  • This is also where machine learning’s half art, half science comes in: there’s no real way to know ahead of time what the best combination of values is for your project, so it’s best to follow the data scientist’s motto of “experiment, experiment, experiment!”