Code reproducibility and Python virtual environments
Creating and managing Python virtual environments
In this article, we will learn to use Python virtual environments to isolate our code from the rest of the system while it runs, and to make that code reproducible for others.
A virtual environment is a self-contained directory with its own Python interpreter and its own set of installed packages. It lets us run our code in isolation from the system-level packages, so each project can have its own project-specific packages and dependencies without affecting the system installation or other projects.
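To make the idea of isolation concrete, here is a minimal sketch of how a script can tell whether it is running inside a virtual environment. It assumes the environment was created with the standard venv module; the helper name in_virtual_environment is ours, not part of any library.

```python
import sys


def in_virtual_environment() -> bool:
    """Return True when the interpreter runs inside a venv-style environment."""
    # Inside a venv, sys.prefix points at the environment directory, while
    # sys.base_prefix still points at the Python installation it was created from.
    return sys.prefix != sys.base_prefix


print("interpreter:", sys.executable)
print("inside a virtual environment:", in_virtual_environment())
```

Running this once with the system interpreter and once with an activated environment should show the interpreter path and the flag changing.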
Let's assume that our project directory tree is
code/
--- hello.py
and the content of our hello.py is
import os
import numpy as np
print(os.listdir())
As we can see, the script imports numpy and prints the contents of the directory it is run from (the current working directory). Now, let's create a Python virtual environment named myVirtualEnv to run our hello.py script.
python3 -m venv myVirtualEnv/
After this, the project directory tree will look like
code/
--- hello.py
--- myVirtualEnv/
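The same step can also be done from Python itself, which is handy when the environment has to be created by a script. The sketch below uses the standard-library venv.create function; the helper name create_environment and the throwaway temporary directory are assumptions made for the demonstration.

```python
import os
import tempfile
import venv
from pathlib import Path


def create_environment(env_dir: Path, with_pip: bool = True) -> Path:
    """Create a virtual environment at env_dir and return its path."""
    # venv.create is the standard-library function behind `python3 -m venv`.
    # Unlike the command line, it does not install pip unless asked to.
    # symlinks is set the way the command line sets it by default.
    venv.create(env_dir, with_pip=with_pip, symlinks=(os.name != "nt"))
    return env_dir


# Demonstrate in a throwaway directory; with_pip=False keeps the demo fast.
with tempfile.TemporaryDirectory() as tmp:
    env = create_environment(Path(tmp) / "myVirtualEnv", with_pip=False)
    # Every venv contains a pyvenv.cfg file describing its base interpreter.
    print("pyvenv.cfg present:", (env / "pyvenv.cfg").is_file())
```

For everyday use the one-line command above is simpler; the function form mainly helps when automating.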
The next step is to activate the virtual environment and install the numpy package, since our hello.py needs to import numpy.
# activate the virtual environment
source myVirtualEnv/bin/activate
Now you should be in your virtual environment. Let's install numpy.
# install numpy
pip install numpy
After this, we can run our hello.py script and see the output.
# run the hello.py script
python3 hello.py
Next, to deactivate the virtual environment, just run this command.
deactivate
Reproducibility
When we share our code, we need to make sure that the person running it has all the correct packages and dependencies set up. For this we can use a Python requirements file, which lists the name and version of every package needed to run the code. To generate the requirements file, use the following command.
# do this with the virtual environment activated.
pip freeze > requirements.txt
Since we only installed the numpy package for our toy example, the contents of the file will be
# the numpy version number will be different
numpy==numpy_version_number
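To see where those name==version lines come from, here is a rough sketch that builds a similar list with the standard-library importlib.metadata module. It only approximates pip freeze: by default pip freeze leaves out a few packaging tools such as pip itself and treats special installs differently, and the helper name freeze_like is our own.

```python
from importlib.metadata import distributions


def freeze_like() -> list[str]:
    """Return 'name==version' lines for every installed distribution."""
    pins = set()
    for dist in distributions():
        name = dist.metadata["Name"]
        if name:  # skip entries whose metadata is incomplete
            pins.add(f"{name}=={dist.version}")
    # Sort case-insensitively so the listing is easy to scan.
    return sorted(pins, key=str.lower)


for line in freeze_like():
    print(line)
```

Run inside the activated environment, this should list numpy along with whatever else is installed there.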
Now let's imagine your friend downloaded your code from GitHub, and her project directory looks like this:
code/
--- hello.py
--- requirements.txt
Your friend will now create a virtual environment like we did before. After that, she can install the required packages using the requirements.txt file like this.
# with the virtual environment activated
pip install -r requirements.txt
After this, she can run the hello.py script without doing anything else. Isn't that neat?
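If your friend wants to double-check that her environment really matches the file, a small script can compare the pinned versions with what is installed. This is only a sketch: it understands exact name==version pins and nothing else (no extras, markers, or version ranges), and the helper name check_pins and the package name in the demo are hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version


def check_pins(requirement_lines):
    """Compare 'name==version' pins with the installed packages.

    Returns a list of (name, wanted, installed) tuples, one per mismatch;
    installed is None when the package is missing.
    """
    problems = []
    for raw in requirement_lines:
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line or "==" not in line:
            continue  # this sketch only understands exact pins
        name, wanted = (part.strip() for part in line.split("==", 1))
        try:
            installed = version(name)
        except PackageNotFoundError:
            installed = None
        if installed != wanted:
            problems.append((name, wanted, installed))
    return problems


# Hypothetical pin, used only to show what a mismatch looks like.
print(check_pins(["surely-not-installed-pkg==1.0"]))
```

With the environment activated, calling check_pins(open("requirements.txt")) should return an empty list when everything matches.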
Automating the whole process
Until now, we have learned to create the virtual environment and install the required packages manually. How about automating this whole process with just one script file?
We can create a script which creates the virtual environment, installs the necessary packages, runs the code, and finally deletes the virtual environment. For our example case, the script file setup.sh will look like this
#!/bin/bash
# create the virtual environment named myvenv
python3 -m venv myvenv/
# activate the virtual environment
source myvenv/bin/activate
# install the necessary packages
pip install -r requirements.txt
# run the python script
python3 hello.py
# deactivate the virtual environment
deactivate
# delete the virtual environment
rm -r myvenv
To run the script, do this
bash setup.sh
Here, I have assumed that you have Python 3 installed and that your operating system is some variant of Linux. A similar result can be achieved on macOS or Windows.
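For readers who are not on Linux, the same create, install, run, delete cycle can be sketched in Python, which avoids shell differences between platforms. This is an illustration rather than a drop-in replacement for setup.sh: the function name run_in_fresh_venv and its parameters are our own, and instead of activating the environment it calls the environment's interpreter directly, which has the same effect for a single command.

```python
import os
import shutil
import subprocess
import tempfile
import venv
from pathlib import Path


def run_in_fresh_venv(script, requirements=None, env_dir="myvenv"):
    """Create a venv, optionally install requirements, run script, delete the venv.

    Returns the exit code of the script.
    """
    env_dir = Path(env_dir)
    # pip is only needed when there is something to install.
    venv.create(env_dir, with_pip=requirements is not None,
                symlinks=(os.name != "nt"))
    # The interpreter lives in Scripts\ on Windows and in bin/ elsewhere.
    bin_dir = env_dir / ("Scripts" if os.name == "nt" else "bin")
    python = bin_dir / ("python.exe" if os.name == "nt" else "python")
    try:
        if requirements is not None:
            subprocess.run(
                [str(python), "-m", "pip", "install", "-r", str(requirements)],
                check=True,
            )
        return subprocess.run([str(python), str(script)]).returncode
    finally:
        # Mirror `rm -r myvenv`: remove the environment even if a step failed.
        shutil.rmtree(env_dir, ignore_errors=True)


if __name__ == "__main__":
    # Small demo in a temporary directory with a script that needs no packages.
    with tempfile.TemporaryDirectory() as tmp:
        demo = Path(tmp) / "hello.py"
        demo.write_text("import os\nprint(os.listdir())\n")
        code = run_in_fresh_venv(demo, env_dir=Path(tmp) / "myvenv")
        print("exit code:", code)
```

For our example project, the call would be run_in_fresh_venv("hello.py", requirements="requirements.txt"), run from the code/ directory.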
Anaconda is a popular Python distribution whose package manager, conda, handles packages and environments for Python and other programming languages. In the next post, we will learn how to achieve the process described above using Anaconda (conda) environments. Cheers!