Code reproducibility and Python virtual environments.


Creating and managing Python virtual environments

In this article, we will learn to use Python virtual environments to isolate our code from the rest of the system and to make it reproducible. A virtual environment is a Python tool for dependency management and project isolation. We can create one for each project, with project-specific packages and dependencies, so that the project runs independently of the system-level packages. Let's assume that our project directory tree is
code/ 
--- hello.py

and the content of our hello.py is

import os
import numpy as np

print(os.listdir())
As we can see, the script imports numpy and prints the contents of the current working directory. Now, let's create a Python virtual environment named myVirtualEnv to run our hello.py script.
python3 -m venv myVirtualEnv/

After this, the project directory tree will look like

code/ 
--- hello.py
--- myVirtualEnv/
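The same environment can also be created from Python itself through the standard library's venv module. The following is a minimal sketch, not part of the original workflow: the helper name create_env is hypothetical, and the demo builds the environment in a temporary directory with pip left out so that it stays quick. Note that venv.create() does not install pip unless with_pip=True is passed, whereas the command line installs it by default.

```python
import tempfile
import venv
from pathlib import Path


def create_env(env_dir, with_pip=True):
    """Create a virtual environment at env_dir, roughly like `python3 -m venv env_dir`."""
    venv.create(env_dir, with_pip=with_pip)
    return Path(env_dir)


# Demo in a throwaway directory so the real project folder is left untouched.
# with_pip=False is an assumption made only to keep the demo fast.
with tempfile.TemporaryDirectory() as tmp:
    env_path = create_env(Path(tmp) / "myVirtualEnv", with_pip=False)
    # Every environment created by venv contains a pyvenv.cfg file at its root.
    cfg_exists = (env_path / "pyvenv.cfg").is_file()
    print("pyvenv.cfg created:", cfg_exists)
```

For the project in this article, calling create_env("myVirtualEnv") from inside the code/ directory should give the same layout as the command above.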
The next step is to activate the virtual environment and install the numpy package since our hello.py needs to import numpy.
# activate the virtual environment
source myVirtualEnv/bin/activate

Now you should be in your virtual environment. Let’s install numpy.

# install numpy 
pip install numpy

After this, we can run our hello.py script and see the output.

# run the hello.py script
python3 hello.py

Next, to deactivate the virtual environment, just run this command.

deactivate
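If you are ever unsure whether a script is running inside a virtual environment, Python can tell you. This small sketch relies on the fact that, for environments created with venv, sys.prefix points at the environment while sys.base_prefix points at the Python installation it was created from. The function name in_virtual_environment is just an illustrative choice.

```python
import sys


def in_virtual_environment():
    """Return True when the running interpreter belongs to a venv-style environment."""
    # Outside a virtual environment both prefixes are the same path.
    return sys.prefix != sys.base_prefix


print("Interpreter:", sys.executable)
print("Inside a virtual environment:", in_virtual_environment())
```

Running this before and after the deactivate command should show the answer change from True to False.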

Reproducibility

When we share our code, we need to make sure that the person running it has all the correct packages and dependencies set up. For this, we can use a Python requirements file. The requirements file contains the names and versions of all the packages needed to run the code. To generate the requirements file, use the following command.
# do this with the virtual environment activated.
pip freeze > requirements.txt
Since we only installed the numpy package for our toy example, the contents of the file will be
# the numpy version number will be different
numpy==numpy_version_number
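To get a feel for where these name==version lines come from, here is a rough approximation written with the standard library's importlib.metadata (Python 3.8 or newer). It is only a sketch: pip freeze handles more cases, such as editable installs, and the function name pinned_requirements is made up for this example.

```python
from importlib.metadata import distributions


def pinned_requirements():
    """Return sorted 'name==version' lines for the packages visible to this interpreter."""
    pins = set()
    for dist in distributions():
        name = dist.metadata["Name"]
        if name:  # skip entries with broken or missing metadata
            pins.add(f"{name}=={dist.version}")
    return sorted(pins, key=str.lower)


for line in pinned_requirements():
    print(line)
```

Run inside the activated environment, the output should include a numpy line with the version you installed.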
Now let's imagine your friend downloaded your code from GitHub. Her project directory will look like this:
code/ 
--- hello.py
--- requirements.txt
Your friend will now create and activate a virtual environment like we did before. After that, she can install the required packages using the requirements.txt file like this.
# with the virtual environment activated
pip install -r requirements.txt
After this, she can run the hello.py script without doing anything else. Isn't that neat? :grin:
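If your friend wants to double-check that her environment really matches the pinned versions, something like the following could help. It is a simplified sketch that only understands plain name==version lines and comments, not the full requirements file syntax, and check_pins is a hypothetical helper name. The package in the demo is deliberately fictional and assumed not to be installed.

```python
from importlib.metadata import PackageNotFoundError, version


def check_pins(requirement_lines):
    """Compare 'name==version' lines with what is installed; return a list of problems."""
    problems = []
    for raw in requirement_lines:
        line = raw.split("#", 1)[0].strip()  # drop comments and surrounding whitespace
        if not line:
            continue
        if "==" not in line:
            problems.append(f"{line}: not pinned with ==")
            continue
        name, wanted = (part.strip() for part in line.split("==", 1))
        try:
            found = version(name)
        except PackageNotFoundError:
            problems.append(f"{name}: not installed (wanted {wanted})")
            continue
        if found != wanted:
            problems.append(f"{name}: found {found}, wanted {wanted}")
    return problems


# Demo with a made-up package name that should not exist on any system.
sample = ["# a comment", "", "hypothetical-missing-pkg-xyz==1.0"]
print(check_pins(sample))
```

With a real file, the lines can be read using open("requirements.txt").read().splitlines(); an empty result means every pinned package was found at the expected version.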

Automating the whole process

Until now, we have learned to create the virtual environment and install the required packages manually. How about automating this whole process with just one script file? We can create a script which creates the virtual environment, installs the necessary packages, runs the code, and finally deletes the virtual environment. For our example case, the script file setup.sh will look like this
#!/bin/bash

# create the virtual environment named myvenv
python3 -m venv myvenv/

# activate the virtual environment
source myvenv/bin/activate

# install the necessary packages
pip install -r requirements.txt

# run the python script
python3 hello.py

# deactivate the virtual environment
deactivate

# delete the virtual environment
rm -r myvenv

To run the script, do this

bash setup.sh

Here, I have assumed that you have Python 3 installed and that your operating system is some variant of Linux. The same can be achieved on macOS or Windows.
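One way to make the automation less dependent on the operating system is to write it in Python instead of shell. The sketch below is an assumption-laden alternative to setup.sh, not a drop-in replacement: the function names venv_python and run_in_fresh_venv are hypothetical, and instead of activating the environment it calls the environment's own interpreter directly, which has the same effect for installing packages and running a script.

```python
import shutil
import subprocess
import sys
import venv
from pathlib import Path


def venv_python(env_dir):
    """Path of the interpreter inside env_dir (the layout differs on Windows)."""
    env_dir = Path(env_dir)
    if sys.platform == "win32":
        return env_dir / "Scripts" / "python.exe"
    return env_dir / "bin" / "python"


def run_in_fresh_venv(script="hello.py", requirements="requirements.txt", env_dir="myvenv"):
    """Create an environment, install the requirements, run the script, then clean up."""
    venv.create(env_dir, with_pip=True)
    python = str(venv_python(env_dir))
    try:
        # Using the environment's interpreter installs into that environment.
        subprocess.run([python, "-m", "pip", "install", "-r", requirements], check=True)
        subprocess.run([python, script], check=True)
    finally:
        # Delete the environment even if an earlier step failed.
        shutil.rmtree(env_dir, ignore_errors=True)
```

Calling run_in_fresh_venv() from the code/ directory should mirror the steps of setup.sh. Nothing runs on import, so the functions can be reused from another script.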

Anaconda is a popular Python distribution, and its conda tool manages packages and environments for Python and other programming languages. In the next post, we will learn how to achieve the process described above using conda virtual environments. Cheers :grin: