Chapter 2: Modern Python Development Environments

A deep understanding of the programming language of choice is the most important part of being a programming expert. Still, it is really hard to develop good software efficiently without knowing the best tools and practices that are common within the given language community. Python has no single feature that cannot be found in some other language. So, when comparing the syntax, expressiveness, or performance, there will always be a solution that is better in one or more fields. But the area in which Python really stands out from the crowd is the whole ecosystem built around the language. The Python community has spent many years polishing standard practices and libraries that help to create high-quality software in a shorter time.

Writing new software is always an expensive and time-consuming process. However, being able to reuse existing code instead of reinventing the wheel greatly reduces development times and costs. For some companies, it is the only reason why their projects are economically feasible. That's why the most important part of the ecosystem is a huge collection of reusable packages that solve a multitude of problems. A tremendous number of these packages are available as open-source through the Python Package Index (PyPI).

Because of the importance of Python's open-source community, Python developers put a lot of effort into creating tools and standards to work with Python packages that have been created by others—starting from isolated virtual environments, improved interactive shells, and debuggers, to utilities that help you discover, search, and analyze the huge collection of packages available on PyPI.

In this chapter, we will cover the following topics:

  • Overview of the Python packaging ecosystem
  • Isolating the runtime environment
  • Using Python's venv
  • System-level environment isolation
  • Popular productivity tools

Before we get into some specific elements of the Python ecosystem, let's begin by considering the technical requirements.

Technical requirements

You can install the free system virtualization tools that are mentioned in this chapter from their respective project websites.

The following are the Python packages that are mentioned in this chapter that you can download from PyPI:

  • poetry
  • flask
  • wait-for-it
  • watchdog
  • ipython
  • ipdb

Information on how to install packages is included in the Installing Python packages using pip section.

The code files for this chapter can be found at https://github.com/PacktPublishing/Expert-Python-Programming-Fourth-Edition/tree/main/Chapter%202.

Python's packaging ecosystem

The core of Python's packaging ecosystem is the Python Package Index (PyPI). PyPI is a vast public repository of (mostly) free-to-use Python projects that at the time of writing hosts almost three and a half million distributions of more than 250,000 packages. That's not the biggest number among all package repositories (npm surpassed a million packages in 2019) but it still places Python among the leaders of packaging ecosystems.

Such a large ecosystem of packages doesn't come without a price. Modern applications are often built using multiple packages from PyPI that often have their own dependencies. Those dependencies can also have their own dependencies. In large applications, such dependency chains can go on and on. Add the fact that some packages may require specific versions of other packages and you may quickly run into dependency hell—a situation where it is almost impossible to resolve conflicting version requirements manually.
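To make the dilemma concrete, here is a hypothetical requirements-style fragment (all package names and versions are invented) that no resolver can satisfy:

```
# Hypothetical example -- these packages and constraints are invented.
package-a==1.0   # package-a itself requires shared-lib<2.0
package-b==2.0   # package-b itself requires shared-lib>=2.0
# No version of shared-lib satisfies both constraints at once.
```

With two or three dependencies, you can still spot such a conflict by eye; with dozens of transitive dependencies, you need tooling.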

That's why it is crucial to know the tools that can help you work with packages available on PyPI.

Installing Python packages using pip

Nowadays, a lot of operating systems come with Python as a standard component. Most Linux distributions and UNIX-like systems (such as FreeBSD, NetBSD, OpenBSD, and macOS) come with Python either installed by default or available through system package repositories. Many of them even use it as part of some core components—Python powers the installers of Ubuntu (Ubiquity), Red Hat Linux (Anaconda), and Fedora (Anaconda again). Unfortunately, the Python version preinstalled with operating systems is often older than the latest Python release.

Due to Python's popularity as an operating system component, a lot of packages from PyPI are also available as native packages managed by the system's package management tools, such as apt-get (Debian, Ubuntu), rpm (Red Hat Linux), or emerge (Gentoo). It should be remembered, however, that the list of available libraries is often very limited, and they are mostly outdated compared to PyPI. Sometimes they are even distributed with platform-specific patches to make sure that they properly support other system components.

Due to these facts, when building your own applications, you should always rely on package distributions available on PyPI. The Python Packaging Authority (PyPA)—a group of maintainers of standard Python packaging tools—recommends pip for installing packages. This command-line tool allows you to install packages directly from PyPI. Although it is an independent project, starting from versions 2.7.9 and 3.4 of CPython, every Python release comes with an ensurepip module. This simple utility module ensures that pip is installed in your environment, regardless of whether the release maintainers decided to bundle it. The pip installation can be bootstrapped using the ensurepip module as in the following example:

$ python3 -m ensurepip
Looking in links: /var/folders/t6/n6lw_s3j4nsd8qhsl1jhgd4w0000gn/T/tmpouvorgu0
Requirement already satisfied: setuptools in ./.venv/lib/python3.9/site-packages (49.2.1)
Processing /private/var/folders/t6/n6lw_s3j4nsd8qhsl1jhgd4w0000gn/T/tmpouvorgu0/pip-20.2.3-py2.py3-none-any.whl
Installing collected packages: pip
Successfully installed pip-20.2.3 

When you have pip available, installing a new package is as simple as this:

$ pip install <package-name>

So, if you want to install a package named django, you simply run:

$ pip install django

Among other features, pip allows specific versions of packages to be installed (using pip install <package-name>==<version>) or upgraded to the latest available version (using pip install --upgrade <package-name>).

pip is not just a package installer. Besides the install command, it offers additional commands that allow you to inspect packages, search through PyPI, or build your own package distributions. The list of all available commands can be obtained by pip --help as in the following command:

$ pip --help

And it will produce the following output:

Usage:
  pip <command> [options]
Commands:
  install                     Install packages.
  download                    Download packages.
  uninstall                   Uninstall packages.
  freeze                      Output installed packages in requirements format.
  list                        List installed packages.
  show                        Show information about installed packages.
  check                       Verify installed packages have compatible dependencies.
  config                      Manage local and global configuration.
  search                      Search PyPI for packages.
  cache                       Inspect and manage pip's wheel cache.
  wheel                       Build wheels from your requirements.
  hash                        Compute hashes of package archives.
  completion                  A helper command used for command completion.
  debug                       Show information useful for debugging.
  help                        Show help for commands.
(...)

The most up-to-date information on how to install pip for older Python versions is available on the project's documentation page at https://pip.pypa.io/en/stable/installing/.

Isolating the runtime environment

When you use pip to install a new package from PyPI, it will be installed into one of the available site-packages directories. The exact location of site-packages directories is specific to the operating system. You can inspect paths where Python will be searching for modules and packages by using the site module as a command as follows:

$ python3 -m site

The following is an example output of running python3 -m site on macOS:

sys.path = [
    '/Users/swistakm',
    '/Library/Frameworks/Python.framework/Versions/3.9/lib/python39.zip',
    '/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9',
    '/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/lib-dynload',
    '/Users/swistakm/Library/Python/3.9/lib/python/site-packages',
    '/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages',
]
USER_BASE: '/Users/swistakm/Library/Python/3.9' (exists)
USER_SITE: '/Users/swistakm/Library/Python/3.9/lib/python/site-packages' (exists)
ENABLE_USER_SITE: True

The sys.path variable in the preceding output is a list of module search locations. These are the locations that Python will attempt to load modules from. The first entry is always the current working directory (in this case, /Users/swistakm) and the last is the global site-packages directory (on Debian-based systems referred to as the dist-packages directory).

The USER_SITE entry in the preceding output describes the location of the user site-packages directory, which is always specific to the user currently invoking the Python interpreter. Packages installed in the user site-packages directory take precedence over packages installed in the global site-packages directory.
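You can observe this search order directly from the interpreter. The following sketch simply prints the entries of sys.path; the exact paths will differ on every machine:

```python
import sys

# Python scans sys.path in order when importing a module; the first
# match wins, so earlier entries shadow later ones.
for index, path in enumerate(sys.path):
    print(index, path)
```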

An alternative way to obtain the global site-packages paths is by invoking site.getsitepackages(). The following is an example of using that function in an interactive shell:

>>> import site
>>> site.getsitepackages()
['/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages']

You can also obtain the user site-packages directory by invoking the site.getusersitepackages() function like so:

>>> import site
>>> site.getusersitepackages()
'/Users/swistakm/Library/Python/3.9/lib/python/site-packages'

When running pip install, packages will be installed in either the user or the global site-packages directory depending on several conditions evaluated in the following order:

  1. user site-packages if the --user switch is specified
  2. global site-packages if the global site-packages directory is writable to the user invoking pip
  3. user site-packages otherwise

The preceding conditions simply mean that without the --user switch, pip will always attempt to install packages to a global site-packages directory and only fall back to user site-packages if that is not possible. On most operating systems where Python is available by default (many Linux distributions, macOS), the global site-packages directory of the system's Python distribution is protected from writes from non-privileged users. This means that in order to install a package in the global site-packages directory using a system's Python distributions, you will have to use a command that grants you superuser privileges, like sudo. On UNIX-like and Linux systems, such superuser invocation of pip will be as follows:

$ sudo -H pip install <package-name>

Superuser privileges for installing system-wide Python packages are not required on Windows since it does not provide the Python interpreter by default. Also, for some other operating systems (like macOS) if you install Python from the installer available on the python.org website, it will be installed in such a way that the global site-packages directory will be writable to normal users.

Although installing packages directly from PyPI into the global site-packages directory is possible, and in certain environments will happen by default, it is usually not recommended and should be avoided. Bear in mind that pip will only install a single version of a package in the site-packages directory. If an older version is already available, the new installation will overwrite it. This may be problematic, especially if you are planning to build different applications with Python. Advising against installing anything in the global site-packages directory may sound confusing, because this is pip's semi-default behavior, but there are some serious reasons for that.

As we mentioned earlier, Python is often an important part of many packages that are available through operating system package repositories and may power a lot of important services. System distribution maintainers put in a lot of effort to select the correct versions of packages to match various package dependencies.

Very often, Python packages that are available from a system's package repositories (like apt, yum, or rpm) contain custom patches or are purposely kept outdated to ensure compatibility with some other system components. Forcing an update of such a package, using pip, to a version that breaks some backward compatibility might cause bugs in some crucial system service.

Last but not least, if you're working on multiple projects in parallel, you'll notice that maintaining a single list of package versions that works for all of your projects is practically impossible. Packages evolve fast and not every change is backward compatible. You will eventually run into a situation where one of your new projects desperately needs the latest version of some library, but some other project cannot use it because there is some backward-incompatible change. If you install a package into global site-packages you will be able to use only one version of that package.

Fortunately, there is an easy solution to this problem: environment isolation. There are various tools that allow the isolation of the Python runtime environment at different levels of system abstraction. The main idea is to isolate project dependencies from packages that are required by different projects and/or system services. The benefits of this approach are as follows:

  • It solves the project X depends on package 1.x but project Y needs package 4.x dilemma. The developer can work on multiple projects with different dependencies that may even collide without the risk of affecting each other.
  • The project is no longer constrained by versions of packages that are provided in the developer's system distribution repositories (like apt, yum, rpm, and so on).
  • There is no risk of breaking other system services that depend on certain package versions, because new package versions are only available inside such an environment.
  • A list of packages that are project dependencies can be easily locked. Locking usually captures exact versions of all packages within all dependency chains so it is very easy to reproduce such an environment on another computer.

If you're working on multiple projects in parallel, you'll quickly find that it is impossible to maintain their dependencies without some kind of isolation.

Let's discuss the difference between application-level isolation and system-level isolation in the next section.

Application-level isolation versus system-level isolation

The easiest and most lightweight approach to isolation is to use application-level isolation through virtual environments. Python has a built-in venv module that greatly simplifies the usage and creation of such virtual environments.

Virtual environments focus on isolating the Python interpreter and the packages available within it. Such environments are very easy to set up but aren't portable, mostly because they rely on absolute system paths. This means that they cannot be easily copied between computers and operating systems without breaking things. They cannot even be moved between directories on the same filesystem. Still, they are robust enough to ensure proper isolation during the development of small projects and packages. Thanks to built-in support within Python distributions, they can also be easily replicated by your peers.

Virtual environments are usually sufficient for writing focused libraries that are independent of the operating system or projects of low complexity that don't have too many external dependencies. Also, if you write software that is to be run only on your own computer, virtual environments should be enough to provide sufficient isolation and reproducibility.

Unfortunately, in some cases, this may not be enough to ensure enough consistency and reproducibility. Despite the fact that software written in Python is usually considered very portable, not every package will behave the same on every operating system. This is especially true for packages that rely on third-party shared libraries (DLL on Windows, .so on Linux, .dylib on macOS) or make heavy use of compiled Python extensions written in either C or C++, but can also happen for pure Python libraries that use APIs that are specific to a given operating system.

In such cases, system-level isolation is a good addition to the workflow. This kind of approach usually tries to replicate and isolate complete operating systems with all of their libraries and crucial system components, either with classical operating system virtualization tools (for example, VMware, Parallels, and VirtualBox) or container systems (for example, Docker and Rocket). Some of the available solutions that provide this kind of isolation are detailed later in the System-level environment isolation section.

System-level isolation should be your preferred option for the development environment if you're writing software on a different computer than the one you'll be executing it on. If you are running your software on remote servers, you should definitely consider system-level isolation from the very beginning as it may save you from portability issues in the future. And you should do that regardless of whether your application relies on compiled code (shared libraries, compiled extensions) or not. Using system-level isolation is also worth considering if your application makes heavy use of external services like databases, caches, search engines, and so on. That's because many system-level isolation solutions allow you to easily isolate those dependencies too.

Since both approaches to environment isolation have their place in modern Python development, we will discuss them both in detail. Let's start with the simpler one—virtual environments using Python's venv module.

Application-level environment isolation

Python has built-in support for creating virtual environments. It comes in the form of a venv module that can be invoked directly from your system shell. To create a new virtual environment, simply use the following command:

$ python3.9 -m venv <env-name>

Here, env-name should be replaced with the desired name for the new environment (it can also be an absolute path). Note how we used the python3.9 command instead of plain python3. That's because depending on the environment, python3 may be linked to different interpreter versions and it is always better to be very explicit about the Python version when creating new virtual environments. The python3.9 -m venv command will create a new env-name directory in the current working directory path. Inside, it will contain a few sub-directories:

  • bin/: This is where the new Python executable and scripts/executables provided by other packages are stored.

    Note for Windows users

    The venv module under Windows uses a different naming convention for its internal directory structure. You need to use Scripts/, Lib/, and Include/ instead of bin/, lib/, and include/ to match the development conventions commonly used on that operating system. The commands used for activating/deactivating the environment are also different; you need to use ENV-NAME/Scripts/activate.bat and ENV-NAME/Scripts/deactivate.bat instead of using source on the activate and deactivate scripts.

  • lib/ and include/: These directories contain the supporting library files for the new Python interpreter inside the virtual environment. New packages will be installed in ENV-NAME/lib/pythonX.Y/site-packages/.

    Many developers keep their virtual environments together with the source code and pick a generic path name like .venv or venv. Many Python Integrated Development Environments (IDEs) are able to recognize that convention and automatically load the libraries for syntax completion. Generic names also allow you to automatically exclude virtual environment directories from code versioning, which is generally a good idea. Git users can, for instance, add this path name to their global .gitignore file, which lists path patterns that should be ignored when versioning the source code.
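You can inspect this layout without leaving Python by using the venv module programmatically. The following sketch creates a disposable environment (with_pip=False keeps it fast) and prints its pyvenv.cfg file, whose "home" key records the absolute path of the base interpreter; such hardcoded absolute paths are one reason virtual environments cannot be freely moved around:

```python
import tempfile
import venv
from pathlib import Path

# Create a throwaway virtual environment and show its pyvenv.cfg file.
with tempfile.TemporaryDirectory() as tmp:
    env_dir = Path(tmp) / "demo-env"
    venv.EnvBuilder(with_pip=False).create(env_dir)
    print((env_dir / "pyvenv.cfg").read_text())
```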

Once the new environment has been created, it needs to be activated in the current shell session. If you're using Bash as a shell, you can activate the virtual environment using the source command:

$ source env-name/bin/activate

There's also a shorter version that should work under any POSIX-compatible system regardless of the shell:

$ . env-name/bin/activate

This changes the state of the current shell session by affecting its environment variables. In order to make the user aware that they have activated the virtual environment, it will change the shell prompt by prepending the (ENV-NAME) string to it. To illustrate this, here is an example session that creates a new environment and activates it:

$ python3 -m venv example
$ source example/bin/activate
(example) $ which python
/home/swistakm/example/bin/python 
(example) $ deactivate
$ which python
/usr/local/bin/python

The important thing to note about venv is that it does not provide any additional abilities to track what packages should be installed in it. Virtual environments are also not portable and should not be moved to another system/machine or even a different filesystem path. This means that a new virtual environment needs to be created every time you want to install your application on a new host.

Because of this, there is a best practice that's used by pip users to store the definition of all project dependencies in a single place. The easiest way to do this is by creating a requirements.txt file (this is the naming convention), with contents as shown in the following code:

# lines followed by hash (#) are treated as a comment.
 
# pinned version specifiers are best for reproducibility 
eventlet==0.17.4 
graceful==0.1.1 
 
# for projects that are well tested with different 
# dependency versions the version ranges are acceptable
falcon>=0.3.0,<0.5.0 
 
# packages without versions should be avoided unless 
# latest release is always required/desired 
pytz

With such a file, all dependencies can be easily installed in a single step. The pip install command understands the format of such requirements files. You can specify the path to a requirements file using the -r flag as in the following example:

$ pip install -r requirements.txt

Remember that requirements files specify only packages to be installed and not packages that are currently in your environment. If you install something manually in your environment, it won't be reflected in your requirements file automatically. So, great care needs to be taken to keep your requirements file up to date, especially for large and complex projects.
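One way to spot drift between a requirements file and the actual environment is the standard library's importlib.metadata module. The following is a minimal sketch; the pinned dictionary is an invented example (in practice it would be parsed from requirements.txt):

```python
from importlib import metadata

# Invented example pins; None means "any installed version is acceptable".
pinned = {"pip": None, "no-such-package": "1.0"}

for name, wanted in pinned.items():
    try:
        installed = metadata.version(name)
    except metadata.PackageNotFoundError:
        print(f"{name}: not installed")
        continue
    if wanted is None or wanted == installed:
        print(f"{name}: {installed} (ok)")
    else:
        print(f"{name}: {installed} (expected {wanted})")
```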

There is the pip freeze command, which prints all packages in the current environment together with their versions, but it should be used carefully. This list will also include dependencies of your dependencies, so for large projects, it will quickly become very large. You will have to carefully inspect whether the list contains anything installed accidentally or by mistake.

For projects that require better reproducibility of virtual environments and strict control of installed dependencies, you may need a more sophisticated tool. We will discuss such a tool—Poetry—in the following section.

Poetry as a dependency management system

Poetry is quite a novel approach to dependency and virtual environment management in Python. It is an open-source project that aims to provide a more predictable and convenient environment for working with the Python packaging ecosystem.

As Poetry is a package on PyPI, you can install it using pip:

$ pip install --user poetry

Be aware that Poetry takes care of creating Python virtual environments so it should not be installed inside of a virtual environment itself. You can install it in either user site-packages or global site-packages although user site-packages is the recommended option (see the Isolating the runtime environment section).

As already highlighted in the Installing Python packages using pip section, the above command will install the poetry package in your site-packages directory. Depending on your system configuration it will be either the global site-packages directory or the user site-packages directory. To avoid this ambiguity, the Poetry project creators recommend using an alternative bootstrapping method.

On macOS, Linux, and other POSIX-compatible systems Poetry can be installed using the curl utility:

$ curl -sSL https://raw.githubusercontent.com/python-poetry/poetry/master/get-poetry.py | python

On Windows it can be installed using PowerShell:

> (Invoke-WebRequest -Uri https://raw.githubusercontent.com/python-poetry/poetry/master/get-poetry.py -UseBasicParsing).Content | python -

Once installed, Poetry can be used to:

  • Create new Python projects together with virtual environments
  • Initialize existing projects with a virtual environment
  • Manage project dependencies
  • Package libraries

To create a completely new project with Poetry, you can use the poetry new command as in the following example:

$ poetry new my-project

The above command will create a new my-project directory with some initial files in it. The structure of that directory will be roughly as follows:

my-project/
├── README.rst
├── my_project
│   └── __init__.py
├── pyproject.toml
└── tests
    ├── __init__.py
    └── test_my_project.py

As you can see, it creates some files that can be used as stubs for further development. If you have a preexisting project, you can initialize Poetry within it using the poetry init command inside of your project directory. The difference is that it won't create any new project files except the pyproject.toml configuration file.

The core of Poetry is the pyproject.toml file, which stores the project configuration. For the my-project example it may have the following content:

[tool.poetry]
name = "my-project"
version = "0.1.0"
description = ""
authors = ["Michał Jaworski <swistakm@gmail.com>"]
[tool.poetry.dependencies]
python = "^3.9"
[tool.poetry.dev-dependencies]
pytest = "^5.2"
[build-system]
requires = ["poetry-core>=1.0.0"]
build-backend = "poetry.core.masonry.api"

As you can see, the pyproject.toml file is divided into four sections. Those are:

  • [tool.poetry]: This is a set of basic project metadata like the name, version, description, and authors. This information is necessary if you would like to publish your project as a package on PyPI.
  • [tool.poetry.dependencies]: This is a list of project dependencies. On fresh projects, it lists only the Python version but can also include all package versions that normally would be described in the requirements.txt file.
  • [tool.poetry.dev-dependencies]: This is a list of dependencies required for local development, like testing frameworks or productivity tools. It is common practice to keep a separate list of such dependencies as they are usually not required in production environments.
  • [build-system]: This describes the build system used to manage the project (in this case, Poetry).

The pyproject.toml file is part of the official Python standard described in the PEP 518 document. You can read more information about its structure at https://www.python.org/dev/peps/pep-0518/.

If you create a new project or initialize an existing one using Poetry, it will be able to create a new virtual environment in a shared location whenever you need it. You can activate it using Poetry instead of "sourcing" the activate scripts. That's more convenient than using the plain venv module because you don't need to remember where the actual virtual environment is stored. The only thing you need to do is to move your shell to any place in your project source tree and use the poetry shell command as in the following example:

$ cd my-project
$ poetry shell

From that moment on, the current shell will have Poetry's virtual environment activated. You can verify it with either the which python or python -m site command.

Another thing that Poetry changes is how you manage dependencies. As we already mentioned, requirements.txt files are a very basic way of managing dependencies. They describe what packages to install but do not automatically track what has been installed in the environment during development. If you install something with pip but forget to reflect that change in the requirements.txt file, other programmers may have a problem recreating your environment.

With Poetry, that problem is gone. There's only one way of adding dependencies to your project and it is with the poetry add <package-name> command. It will:

  • Resolve whole dependency trees if other packages share dependencies
  • Install all packages from the dependency tree in the virtual environment associated with your project
  • Reflect the change in the pyproject.toml file

The following transcript presents the process of installation of the Flask framework within the my-project environment:

$ poetry add flask

This will produce an output like the following:

Using version ^1.1.2 for Flask
Updating dependencies
Resolving dependencies... (38.9s)
Writing lock file
Package operations: 15 installs, 0 updates, 0 removals
  • Installing markupsafe (1.1.1)
  • Installing pyparsing (2.4.7)
  • Installing six (1.15.0)
  • Installing attrs (20.3.0)
  • Installing click (7.1.2)
  • Installing itsdangerous (1.1.0)
  • Installing jinja2 (2.11.2)
  • Installing more-itertools (8.6.0)
  • Installing packaging (20.4)
  • Installing pluggy (0.13.1)
  • Installing py (1.9.0)
  • Installing wcwidth (0.2.5)
  • Installing werkzeug (1.0.1)
  • Installing flask (1.1.2)
  • Installing pytest (5.4.3)

And the following is the resulting pyproject.toml file; note the new Flask entry among the project dependencies:

[tool.poetry]
name = "my-project"
version = "0.1.0"
description = ""
authors = ["Michał Jaworski <swistakm@gmail.com>"]
[tool.poetry.dependencies]
python = "^3.9"
Flask = "^1.1.2"
[tool.poetry.dev-dependencies]
pytest = "^5.2"
[build-system]
requires = ["poetry-core>=1.0.0"]
build-backend = "poetry.core.masonry.api"

The preceding transcript shows that Poetry has installed 15 packages when we asked for only one dependency. That's because Flask has its own dependencies and those dependencies have their own dependencies. Such dependencies of dependencies are called transitive dependencies. Libraries often have lax version specifiers like six >=1.0.0 to denote that they are able to accept a wide range of versions. Poetry implements a dependency resolution algorithm to find a set of versions that satisfies all transitive dependency constraints.
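The following toy sketch (this is not Poetry's actual algorithm, and the version numbers are invented) illustrates the core of the problem: several constraints imposed on one shared transitive dependency must all hold simultaneously:

```python
# Toy illustration of dependency resolution, not Poetry's real algorithm.
available = ["1.0.0", "1.4.0", "1.15.0", "2.0.0"]

def parse(version):
    # Turn "1.4.0" into (1, 4, 0) so versions compare numerically.
    return tuple(int(part) for part in version.split("."))

# Imagine one package requires >=1.0.0 and another requires <1.15.0.
constraints = [
    lambda v: v >= parse("1.0.0"),
    lambda v: v < parse("1.15.0"),
]

compatible = [v for v in available if all(c(parse(v)) for c in constraints)]
print(compatible)  # ['1.0.0', '1.4.0']; resolvers typically pick the newest
```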

The problem with transitive dependencies is their ability to change over time. Remember that libraries can have lax version specifiers for their dependencies. It is thus possible that two environments created on different dates will have different final versions of packages installed. The inability to reproduce exact versions of all transitive dependencies can be a big problem for large projects and manually tracking them in requirements.txt files is usually a big challenge.

Poetry solves the problem of transitive dependencies by using so-called dependency lock files. Whenever you are sure that your environment has a working and tested set of package versions, you can issue the following command:

$ poetry lock

This will create a very verbose poetry.lock file that is a complete snapshot of the dependency resolution process. That file will then be used to determine the versions of transitive dependencies instead of running the ordinary resolution process.
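An abridged, hypothetical excerpt of a single poetry.lock entry could look like this (exact fields vary between Poetry versions):

```toml
[[package]]
name = "flask"
version = "1.1.2"
description = "A simple framework for building complex web applications."
category = "main"
optional = false
python-versions = ">=2.7, !=3.0.*, !=3.1.*, !=3.2.*, !=3.3.*, !=3.4.*"
```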

Whenever new packages are added with the poetry add command, Poetry will evaluate the dependency tree and update the poetry.lock file. The lock file approach is so far the best and most reliable way of handling transitive dependencies in your project.
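The core idea of a lock file can be sketched in a few lines: persist the exact versions the resolver picked, and on the next install read that snapshot back instead of resolving lax specifiers again. The real poetry.lock is a TOML document that also records package hashes and metadata; the versions below are taken from the install transcript earlier in this section:

```python
import json

# Suppose the resolver picked these exact versions (an illustration
# based on the install transcript above).
resolved = {
    "flask": "1.1.2",
    "werkzeug": "1.0.1",
    "pytest": "5.4.3",
}

# "Locking" is essentially serializing that snapshot to a file...
lock_content = json.dumps(resolved, indent=2, sort_keys=True)

# ...and a later "install" reads the snapshot back, so every
# environment gets exactly the same versions.
pinned = json.loads(lock_content)
print(pinned["flask"])  # 1.1.2
```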

You can find more information about advanced usage of Poetry in the official documentation under https://python-poetry.org.

System-level environment isolation

The key enabler of the rapid iteration of software implementation is the reuse of existing software components. Don't repeat yourself—this is a common mantra of many programmers. Including other packages and modules in your codebase is only one part of that mindset. Binary libraries, databases, system services, third-party APIs, and so on can also be considered reused components. Even whole operating systems can be treated as components that are being reused.

The backend services of web-based applications are a great example of how complex such applications can be. The simplest software stack usually consists of a few layers. Consider an imaginary application that allows you to store information about its users and exposes it to the internet over the HTTP protocol. It could have at least the following three layers (starting from the lowest):

  • A database or other kind of storage engine
  • The application code implemented in Python
  • An HTTP server working in reverse proxy mode, such as Apache or NGINX

Although very simple applications can be single-layered, it rarely happens for complex applications or applications that are designed to handle very large traffic. In fact, big applications are sometimes so complex that they cannot be represented as a stack of layers but rather as a patchwork or mesh of interconnected services. Both small and big applications can use many different databases, be divided into multiple independent processes, and use many other system services for caching, queuing, logging, service discovery, and so on. Sadly, there are no limits to this complexity.

What is really important is that not all software stack elements can be isolated on the level of Python runtime environments. No matter whether it is an HTTP server, such as NGINX, or an RDBMS, such as PostgreSQL, or a shared library, those elements are usually not part of the Python distribution or Python package ecosystem and can't be encapsulated within Python's virtual environments. That's why they are considered external dependencies of your software.

Equally important, external dependencies are usually available in different versions and flavors on different operating systems. For instance, if two developers are using completely different Linux distributions, let's say Debian and Gentoo, it is really unlikely that at any given time they will have access to the same version of software like NGINX through their system's package repositories. Moreover, those packages can be compiled using different compile-time flags (for instance, enabling or disabling optional features), or be provided with custom extensions or distribution-specific patches.

So, making sure that everyone in a development team uses the same versions of every component is very hard without the proper tools. It is theoretically possible that all developers in a team working on a single project will be able to get the same versions of services on their development boxes. But all this effort is futile if they do not use the same operating system as the one used in their production environment. And forcing programmers to work on an operating system other than their beloved system of choice is not always possible.

The production environment, or production for short, is the actual environment where your application is installed and running to serve its very purpose. For instance, the production environment for a desktop application would be the actual desktop computer on which your users install the application. The production environment of a backend server for a web application available through the internet is usually a remote server (sometimes virtual) operating in some sort of datacenter.

The problem lies in the fact that portability is still a big challenge. Not all services will work exactly the same in the production environments as they do on the developer's machines. And this is unlikely to change. Even Python can behave differently on different systems, despite how much work is put into making it cross-platform. Usually, for Python, this is well-documented and happens only in places that interact directly with the operating system. Still, relying on the programmer's ability to remember a long list of compatibility quirks is quite an error-prone strategy.
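As a small illustration of such platform-dependent code paths, many tools pick a per-user configuration directory based on the operating system they run on. The directory layout below is a simplified convention for a hypothetical myapp application, not any official standard:

```python
import sys


def default_config_dir(platform, home):
    """Pick a per-user configuration directory the way many tools do.

    The platform strings mirror the values of sys.platform:
    "win32", "darwin", or "linux".
    """
    if platform.startswith("win"):
        return home + "\\AppData\\Roaming\\myapp"
    if platform == "darwin":
        return home + "/Library/Application Support/myapp"
    # Linux and other UNIX-like systems
    return home + "/.config/myapp"


# sys.platform differs per host, so this line already behaves
# differently on Windows, macOS, and Linux
print(default_config_dir(sys.platform, "/home/user"))
```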

A popular solution to this problem is isolating whole systems as the application environment. This is usually achieved by leveraging different types of system virtualization tools. Virtualization, of course, may have an impact on performance; but with modern CPUs that have hardware support for virtualization, the performance loss is greatly reduced. On the other hand, the list of possible gains is very long:

  • The development environment can exactly match the system version, services, and shared libraries used in production, which helps to solve compatibility issues.
  • Definitions for system configuration tools, such as Puppet, Chef, or Ansible (if used), can be reused to configure both the production and development environments.
  • The newly hired team members can easily hop into the project if the creation of such environments is automated.
  • The developers can work directly with low-level system features that may not be available on operating systems they use for work. For example, Filesystem in Userspace (FUSE) is a feature of Linux operating systems that you could not work with on Windows without virtualization.

In the next section, we'll take a look at two different approaches to achieving the system-level isolation of development environments.

Containerization versus virtualization

There are two main ways that system-level isolation techniques can be used for development purposes:

  • Machine virtualization, which emulates the whole computer system
  • Operating system-level virtualization, known also as containerization, which isolates complete user spaces within a single operating system

Machine virtualization techniques concentrate on emulating whole computer systems within other computer systems. Think of it as providing virtual hardware that can be run as a piece of software on your own computer. As this is full hardware emulation, it gives you the possibility to run any operating system within your host environments. This is the technology that drives the infrastructure of Virtual Private Server (VPS) and cloud computing providers, as it allows you to run multiple independent and isolated operating systems within a single host computer.

This is also a convenient method of running many operating systems for development purposes, as starting a new operating system does not require rebooting your computer. You can also easily dispose of virtual machines when not needed. That's something that cannot be done easily with typical multi-boot system installation.

Operating system-level virtualization, on the other hand, does not rely on emulating the hardware. It encapsulates a user-space environment (shared libraries, resource constraints, filesystem volumes, code, and so on) in the form of containers that cannot operate outside the strictly defined container environment. All containers are running on the same operating system kernel but cannot interfere with each other unless you explicitly allow them to.

Operating system-level virtualization does not require emulation of the hardware. Still, it can set specific constraints on the use of system resources like storage space, CPU time, RAM, or network. These constraints are managed only by the system kernel, so the performance overhead is usually smaller than in machine virtualization. That's why operating system-level virtualization is often called lightweight virtualization.

Usually, a container contains only application code and its system-level dependencies, mostly shared libraries or runtime binaries like the Python interpreter, but can be as large as you want. Images for Linux containers are often based on whole system distributions like Debian, Ubuntu, or Fedora. From the perspective of processes running inside a container, it looks like a completely isolated system environment.

When it comes to system-level isolation for development purposes, both methods provide a similarly sufficient level of isolation and reproducibility. Nevertheless, due to its more lightweight nature, operating system-level virtualization seems to be more favored by developers as it allows cheaper, faster, and more streamlined usage of such environments together with convenient packaging and portability. This is especially useful for programmers that work on multiple projects in parallel or need to share their environments with other programmers.

There are two leading tools for providing system-level isolation of development environments:

  • Docker for operating system-level virtualization
  • Vagrant for machine virtualization

Docker and Vagrant seem to overlap in features. The main difference between them is the reason why they were built. Vagrant was built primarily as a tool for development. It allows you to bootstrap the whole virtual machine with a single command but is rarely used to simply pack such an environment as a complete artifact that could be easily delivered to a production environment and executed as is. Docker, on the other hand, is built exactly for that purpose—preparing complete containers that can be sent and deployed to production as a complete package. If implemented well, this can greatly improve the process of product deployment.

Due to some implementation nuances, the environments that are based on containers may sometimes behave differently than environments based on virtual machines. They also do not package the operating system kernel, so for code that is highly operating system-specific, they may not always behave the same on every host. Also, if you decide to use containers for development, but don't decide to use them on target production environments, you'll lose some of the consistency guarantees that were the main reason for environment isolation.

But, if you already use containers in your target production environments, then you should always replicate production conditions in the development stage using the same technique. Fortunately, Docker, which is currently the most popular container solution, provides an amazing docker-compose tool that makes the management of local containerized environments extremely easy.

Containers are a great alternative to full machine virtualization. It is a lightweight method of virtualization, where the kernel and operating system allow multiple isolated user-space instances to be run. If your operating system supports containers natively, this method of virtualization will require less overhead than full machine virtualization.

Virtual environments using Docker

Software containers got their popularity mostly thanks to Docker, which is one of the available implementations for the Linux operating system.

Docker allows you to describe an image of the container in the form of a simple text document called a Dockerfile. Images built from such definitions can be stored in image repositories. Image repositories allow multiple programmers to reuse existing images without the need to build them all by themselves. Docker also supports incremental changes, so if new things are added to an image, it does not need to be rebuilt from scratch.

Docker is an operating system virtualization method for Linux operating systems, so it isn't natively supported by kernels of Windows and macOS. Still, this doesn't mean that you can't use Docker on Windows or macOS. On those operating systems, Docker becomes kind of a hybrid between machine virtualization and operating system-level virtualization. Docker installation on those two systems will create an intermediary virtual machine with the Linux operating system that will act as a host for your containers. The Docker daemon and command-line utilities will take care of proxying any traffic and images between your own operating system and containers running on that virtual machine seamlessly.

You can find Docker installation instructions on https://www.docker.com/get-started.

The existence of an intermediary virtual machine means that Docker on Windows or macOS isn't as lightweight as it is on Linux. Still, the performance overhead shouldn't be noticeably higher than the performance overhead of other development environments based strictly on machine virtualization.

Writing your first Dockerfile

Every Docker-based environment starts with a Dockerfile. A Dockerfile is a description of how to create a Docker image. You can think about the Docker images in a similar way to how you would think about images of virtual machines. It is a single file (composed of many layers) that encapsulates all system libraries, files, source code, and other dependencies that are required to execute your application.

Every layer of a Docker image is described in the Dockerfile by a single instruction in the following format:

INSTRUCTION arguments

Docker supports plenty of instructions, but the most basic ones that you need to know in order to get started are as follows:

  • FROM <image-name>: This describes the base image that your image will be based on. They are often based on common Linux system distributions and usually come with additional libraries and software installed. The default Docker images repository is called Docker Hub. It can be accessed for free and browsed at https://hub.docker.com/.
  • COPY <src>... <dst>: This copies files from the local build context (usually project files) and adds them to the container's filesystem.
  • ADD <src>... <dst>: This works similarly to COPY but automatically unpacks archives and allows <src> to be URLs.
  • RUN <command>: This runs a specified command on top of previous layers. After execution, it commits changes that this command made to the filesystem as a new image layer.
  • ENTRYPOINT ["<executable>", "<param>", ...]: This configures the default command to be run as your container starts. If no entry point is specified anywhere in the image layers, then Docker defaults to /bin/sh -c, which is the default shell of a given image (usually Bash but can also be another shell).
  • CMD ["<param>", ...]: This specifies the default parameters for image entry points. Knowing that the default entry point for Docker is /bin/sh -c, this instruction can also take the form of CMD ["<executable>", "<param>", ...]. It is recommended to define the target executable directly in the ENTRYPOINT instruction and use CMD only for default arguments.
  • WORKDIR <dir>: This sets the current working directory for any of the RUN, CMD, ENTRYPOINT, COPY, and ADD instructions that follow.

To properly illustrate the typical structure of a Dockerfile, we will try to dockerize a simple Python application. Let's imagine we want to create an HTTP echo web server that replies with the details of the HTTP request it received. We will use Flask, which is a very popular Python web microframework.

Flask isn't a part of the Python standard library. You can install it in your environment using pip as follows:

$ pip install flask

You can find more information about the Flask framework at https://flask.palletsprojects.com/.

The code of our application, which would be saved in a Python script, echo.py, could be as follows:

from flask import Flask, request
app = Flask(__name__)
@app.route('/')
def echo():
    return (
        f"METHOD: {request.method}\n"
        f"HEADERS:\n{request.headers}"
        f"BODY:\n{request.data.decode()}"
    )
if __name__ == '__main__':
    app.run(host="0.0.0.0")

Our script starts with the import of the Flask class and the request object. The instance of the Flask class represents our web application. The request object is a special global object that always represents the context of the currently processed HTTP request.

echo() is a so-called view function, which is responsible for handling incoming requests. @app.route('/') registers the echo() view function under the / path. This means that only requests that match the / path will be dispatched to this view function. Inside of our view, we read incoming request details (method, headers, and body) and return them in text form. Flask will include that text output in the request response body.

Our script ends with the call to the app.run() method. It starts the local development server of our application. This development server is not intended for production environment use but is good enough for development purposes and greatly simplifies our example.

If you have the Flask package installed, you can run your application using the following command:

$ python3 echo.py

The above command will start the Flask development server on port 5000. You can either visit the http://localhost:5000 address in your browser or use a command-line HTTP utility such as curl.

The following is an example of invoking a GET request using curl:

$ curl localhost:5000
METHOD: GET
HEADERS:
Host: localhost:5000
User-Agent: curl/7.64.1
Accept: */*
BODY:

Having confirmed that our application returns the HTTP details of the requests it receives, we're almost ready to dockerize it. The structure of our project files could be as follows:

.
├── Dockerfile
├── echo.py
└── requirements.txt

The requirements.txt file will contain only one entry, flask==1.1.2, to make sure our image will always use the same version of Flask. Before we jump to the Dockerfile, let's decide how we want our image to work. What we want to achieve is the following:

  • Hide some complexity from the user—especially the fact that we use Python and Flask
  • Package the Python 3.9 executable with all its dependencies
  • Package all project dependencies defined in the requirements.txt file

Knowing the above requirements, we are ready to write our first Dockerfile. It will take the following form:

FROM python:3.9-slim
WORKDIR /app/
COPY requirements.txt .
RUN pip install -r requirements.txt
COPY echo.py .
ENTRYPOINT ["python", "echo.py"]

FROM python:3.9-slim defines the base image for our custom container image. Python has a collection of official images on Docker Hub and python:3.9-slim is one of them. 3.9-slim is the tag of the image including Python 3.9 with only a minimal set of system packages needed to run Python. It is usually a sensible starting point for Python-based application images.

In the next section, we will learn how to build a Docker image from the above Dockerfile and how to run our container.

Running containers

Before your container can be started, you'll first need to build an image defined in the Dockerfile. You can build the image using the following command:

$ docker build -t <name> <path>

The -t <name> argument allows us to name the image with a readable identifier. It is totally optional, but without it, you won't be able to easily reference the newly created image. The <path> argument specifies the path to the directory where your Dockerfile is located. Let's assume that we are running the command from the root of the project presented in the previous section. We also want to tag our image with the name echo. The docker build command invocation will be the following:

$ docker build -t echo .

Its output may be as follows:

Sending build context to Docker daemon   16.8MB
Step 1/6 : FROM python:3.9-slim
3.9-slim: Pulling from library/python
bb79b6b2107f: Pull complete
35e30c3f3e2b: Pull complete
b13c2c0e2577: Pull complete
263be93302fa: Pull complete
30e7021a7001: Pull complete
Digest: sha256:c13fda093489a1b699ee84240df4f5d0880112b9e09ac21c5d6875003d1aa927
Status: Downloaded newer image for python:3.9-slim
 ---> a90139e6bc2f
Step 2/6 : WORKDIR /app/
 ---> Running in fd85d9ac44a6
Removing intermediate container fd85d9ac44a6
 ---> b781318cdec7
Step 3/6 : COPY requirements.txt .
 ---> 6d56980fedf6
Step 4/6 : RUN pip install -r requirements.txt
 ---> Running in 5cd9b86ac454
(...)
Successfully installed Jinja2-2.11.2 MarkupSafe-1.1.1 Werkzeug-1.0.1 click-7.1.2 flask-1.1.2 itsdangerous-1.1.0
Removing intermediate container 5cd9b86ac454
 ---> 0fbf85e8f6da
Step 5/6 : COPY echo.py .
 ---> a546d22e8c98
Step 6/6 : ENTRYPOINT ["python", "echo.py"]
 ---> Running in 0b4e57680ac4
Removing intermediate container 0b4e57680ac4
 ---> 0549d15959ef
Successfully built 0549d15959ef
Successfully tagged echo:latest

Once created, you can inspect the list of available images using the docker images command:

$ docker images
REPOSITORY      TAG       IMAGE ID         CREATED              SIZE
echo            latest    0549d15959ef     About a minute ago   126MB
python          3.9-slim  a90139e6bc2f     10 days ago          115MB

The shocking size of container images

Our image has a size of 126 MB because it actually captures the whole Linux system distribution needed for running our Python application. It may sound like a lot, but it isn't really anything to worry about. For the sake of brevity, we have used a base image that is simple to use. There are other images that have been crafted specially to minimize this size, but these are usually dedicated to more experienced Docker users. Also, thanks to the layered structure of Docker images, if you're using many containers, the base layers can be cached and reused, so an eventual space overhead is rarely an issue. In the preceding example, the total size of storage used for both images will be only 126 MB because the echo:latest image only adds 11 MB on top of the python:3.9-slim image.

Once your image is built and tagged, you can run a container using the docker run command. Our container is an example of a web service, so we will have to additionally tell Docker that we want to publish the container's ports by binding them locally:

$ docker run -it --rm --publish 5000:5000 echo

Here is an explanation of the specific arguments of the preceding command:

  • -it: These are actually two concatenated options: -i and -t. The -i (for interactive) option keeps STDIN open, even if the container process is detached, and -t (for tty) allocates a pseudo-TTY for the container. TTY stands for teletypewriter and on Linux and UNIX-like operating systems represents the terminal connected to a program's standard input and output. In short, thanks to these two options, we will be able to see live logs from our application and ensure that a keyboard interrupt will cause the process to exit. It will simply behave the same way as if we had started Python straight from the command line.
  • --rm: Tells Docker to automatically remove the container when it exits. Without this option, the container will be kept so you can reattach to it in order to diagnose its state. By default, Docker does not remove containers, just to make debugging easier. Exited containers can quickly pile up on your disk, so it is good practice to use --rm by default unless you really need to keep an exited container for later review.
  • --publish 5000:5000: Tells Docker to publish the container's port 5000 by binding port 5000 on the host's interface. You can use this option to also remap application ports. If you would like, for instance, to expose the echo application on port 8080 locally, you could use the --publish 8080:5000 argument.

Building and running your own images using the docker command is quite simple and straightforward but can become cumbersome after a while. It requires using quite long command invocations and remembering a lot of custom identifiers. It can be quite inconvenient for more complex environments. In the next section, we will see how a Docker workflow can be simplified with the Docker Compose utility.

Setting up complex environments

While basic Docker usage is pretty straightforward for simple setups, it can become a bit overwhelming once you start to use it in multiple projects. It is really easy to forget specific command-line options, or which ports should be published for which images.

But things start to get really complicated when you have one service that needs to communicate with others. A single Docker container should only contain one running process.

This means that you really shouldn't put any additional process supervision tools, such as Supervisor and Circus, into the container image, and instead set up multiple containers that communicate with each other. Each service may use a completely different image, provide different configuration options, and expose ports that may or may not overlap. If you want to run multiple different processes, each process should be a separate container.

Large production deployments of containers use dedicated container orchestration systems like Kubernetes, Nomad, or Docker Swarm to keep track of all containers and their execution details like images, ports, volumes, configuration, and so on. You could use one of those tools locally, but that would be overkill for development purposes.

The best container development tool that you can use on your computer that works well for both simple and complex scenarios is Docker Compose. Docker Compose is usually distributed with Docker, but in some Linux distributions (for example, Ubuntu), it may not be available by default. In such a case, it must be installed as a separate package from the system package repository. Docker Compose provides a powerful command-line utility named docker-compose and allows you to describe multi-container applications using the YAML syntax.

Compose expects the specially named docker-compose.yml file to be in your project root directory. An example of such a file for our previous project could be as follows:

version: '3.8'
services:
  echo-server:
    # this tells Docker Compose to build the image from
    # the local (.) directory
    build: .
    # this is equivalent to "-p" option of
    # the "docker run" command
    ports:
    - "5000:5000"
    # this is equivalent to "-t" option of
    # the "docker run" command
    tty: true

If you create such a docker-compose.yml file in your project, then your whole application environment can be started and stopped with two simple commands:

  • docker-compose up: This starts all containers defined in the docker-compose.yml file and actively prints their standard output
  • docker-compose down: This stops all containers started by docker-compose in the current project directory

Docker Compose will automatically build your image if it hasn't been built yet. That's a great way of encoding the development environment in the configuration file. If you work with other programmers, you can provide one docker-compose.yml file for your project. This way, setting up a fully working local development environment will be a matter of one docker-compose up command. The docker-compose.yml file should definitely be versioned together with the rest of your code if you use the code versioning tools.

Moreover, if your application requires additional external services, you can easily add them to your Docker Compose environment instead of installing them on your host system. Consider the following example that adds one instance of a PostgreSQL database and Redis memory storage using official Docker Hub images:

version: '3.8'
services:
  echo-server:
    build: .
    ports:
    - "5000:5000"
    tty: true
  database:
    image: postgres
  cache:
    image: redis

Docker Hub is the official repository of Docker images. Many open-source developers host their official project images there. You can find more info about Docker Hub at https://hub.docker.com.

It is as simple as that. To ensure better reproducibility, you should always specify version tags of external images (like postgres:13.1 and redis:6.0.9). That way you will ensure everyone using your docker-compose.yml file will be using exactly the same versions of external services. Thanks to Docker Compose you can use multiple versions of the same service simultaneously without any interference. That's because different Docker Compose environments are by default isolated on the network level.
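Following that advice, the pinned variant of the previous docker-compose.yml file would look as follows:

```yaml
version: '3.8'
services:
  echo-server:
    build: .
    ports:
    - "5000:5000"
    tty: true
  database:
    # a pinned tag instead of the implicit "latest"
    image: postgres:13.1
  cache:
    image: redis:6.0.9
```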

Useful Docker and Docker Compose recipes for Python

Docker and containers in general are such a vast topic that it is impossible to cover them in one short section of this book. Thanks to Docker Compose, it is really easy to start working with Docker without knowing a lot about how it works internally. If you're new to Docker, you'll have to eventually slow down a bit, take the Docker documentation, and read it thoroughly.

The official Docker documentation can be found at https://docs.docker.com/.

The following are some quick tips and recipes that allow you to defer that moment and solve most of the common problems that you may have to deal with sooner or later.

Reducing the size of containers

A common concern of new Docker users is the size of their container images. It's true that containers introduce a lot of space overhead compared to plain Python packages, but it is usually nothing if we compare it to the size of images for virtual machines. It is still very common to host many services on a single virtual machine, whereas with a container-based approach you should have a separate image for every service. This means that with a lot of services, the overhead may become noticeable.

If you want to limit the size of your images, you can use two complementary techniques:

  • Use a base image that is designed specifically for that purpose: Alpine Linux is an example of a compact Linux distribution that is specifically tailored to provide very small and lightweight Docker images. The base image is around 5 MB in size and provides an elegant package manager that allows you to keep your images compact.
  • Take into consideration the characteristics of the Docker overlay filesystem: Docker images consist of layers where each layer encapsulates the difference in the root filesystem between itself and the previous layer. Once the layer is committed, the size of the image cannot be reduced. This means that if you need a system package as a build dependency, and it may be later discarded from the image, then instead of using multiple RUN instructions, it may be better to do everything in a single RUN instruction with chained shell commands to avoid excessive layer commits.

These two techniques can be illustrated by the following Dockerfile:

FROM alpine:3.13
WORKDIR /app/
RUN apk add --no-cache python3
COPY requirements.txt .
RUN apk add --no-cache py3-pip && \
    pip3 install -r requirements.txt && \
    apk del py3-pip
COPY echo.py .
CMD ["python3", "echo.py"]

The above example uses the alpine:3.13 base image to illustrate the technique of cleaning up needless dependencies before committing the layer. Unfortunately, the apk manager in the Alpine distribution doesn't give proper control of which version of Python will be installed. That's why the recommended Alpine base images for Python projects come from the official Python repository. For Python 3.9 that would be the python:3.9-alpine base image.

The --no-cache flag of apk (Alpine's package manager) has two effects. First, it will cause apk to ignore the existing cache of package lists so it will install the latest package version that is available officially in the package repository. Second, it won't update the existing package lists cache, so the layer created with this instruction will be slightly smaller than using the --update-cache flag that is otherwise necessary to install the package in its latest version. The difference is not that big (probably around 2 MB) but those small chunks of cache can add up in bigger images that have many layers of apk add invocations. Package managers of different Linux distributions usually offer a similar way of disabling their package list caches.

The second RUN instruction is an example of taking into account the way Docker image layers work. On Alpine, the Python package doesn't come with pip installed so we need to install it on our own. Generally, after all the required Python packages have been installed, pip is no longer required and can be removed. We could use the ensurepip module to bootstrap pip but then we wouldn't have an obvious way of removing it. Instead, we use a long-chained instruction that relies on apk to install the py3-pip package and remove it after installing the other Python packages. This trick on Alpine 3.13 may even save us up to 16 MB.

If you run the docker images command, you will see that there is a substantial size difference between the images based on the Alpine and python:3.9-slim base images:

$ docker images
REPOSITORY    TAG        IMAGE ID         CREATED              SIZE
echo-alpine   latest     e7e3a2bc7b71     About a minute ago   53.7MB
echo          latest     6b036d212e8f     40 minutes ago       126MB

The resulting image is less than half the size of the one based on the python:3.9-slim image. That's mostly due to the streamlined Alpine distribution, whose base image is only around 5 MB in total. Without our trick of deleting pip and using the --no-cache flag, the image size would probably be around 72 MB (package list caches are around 2 MB, and py3-pip around 16 MB), so in total the trick saved us almost 25% of the size. Such a reduction will not be as meaningful for larger applications with more dependencies, where 18 MB doesn't make much of a difference. Still, this technique can be used for other build-time dependencies. Some packages, for instance, require an additional compiler such as gcc (GNU Compiler Collection) and extra header files at installation time. In such a situation, you can use the same pattern to avoid having the full compiler toolchain in the final image, and that can have quite a big impact on the image size.

Addressing services inside of a Docker Compose environment

Complex applications often consist of multiple services that communicate with each other. Compose allows us to define such applications with ease. The following is an example docker-compose.yml file that defines the application as a composition of two services:

version: '3.8'
services:
  echo-server:
    build: .
    ports:
    - "5000:5000"
    tty: true
  database:
    image: postgres
    restart: always

The preceding configuration defines two services:

  • echo-server: This is our echo application service container with the image built from the local Dockerfile
  • database: This is a PostgreSQL database container from an official postgres Docker image

We assume that the echo-server service wants to communicate with the database service over the network. In order to set up such communications, we need to know the service IP address or hostname so that it can be used as an application configuration. Thankfully, Docker Compose is a tool that was designed exactly for such scenarios, so it will make it a lot easier for us.

Whenever you start your environment with the docker-compose up command, Docker Compose will create a dedicated Docker network by default and will register all services in that network using their names as their hostnames. This means that the echo-server service can use the database:5432 address to communicate with the database (5432 is the default PostgreSQL port), and any other service in that Docker Compose environment will be able to access the HTTP endpoint of the echo-server service under the http://echo-server:5000 address (5000 being the port the echo server listens on inside the container, as declared in the ports mapping).

Even though the service hostnames in Docker Compose are easily predictable, it isn't good practice to hardcode any addresses in your application code. The best approach would be to provide them as environment variables that can be read by your application on startup. The following example shows how arbitrary environment variables can be defined for each service in a docker-compose.yml file:

version: '3.8'
services:
  echo-server:
    build: .
    ports:
    - "5000:5000"
    tty: true
    environment:
      - DATABASE_HOSTNAME=database
      - DATABASE_PORT=5432
      - DATABASE_PASSWORD=password
  database:
    image: postgres
    restart: always
    environment:
      POSTGRES_PASSWORD: password

The highlighted lines provide environment variables that tell our echo server what the hostname and port of the database are. Environment variables are the most recommended way of providing configuration parameters for containers.
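
For illustration, the following sketch shows how the echo server could read this configuration at startup. The variable names match the docker-compose.yml example above, while the fallback defaults are only an assumption for running the code outside of Compose:

```python
import os

# Read connection parameters from the environment; the defaults are
# hypothetical fallbacks for running the code outside of Docker Compose.
DATABASE_HOSTNAME = os.environ.get("DATABASE_HOSTNAME", "localhost")
DATABASE_PORT = int(os.environ.get("DATABASE_PORT", "5432"))
DATABASE_PASSWORD = os.environ.get("DATABASE_PASSWORD", "")

print(f"Database at {DATABASE_HOSTNAME}:{DATABASE_PORT}")
```

Because the defaults live in one place, the same code runs unchanged both inside the Compose network and directly on the host.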

Docker containers are ephemeral. This means that once the container is removed (usually on exit), its internal filesystem changes are lost. For databases, this means that if you don't want to lose data in the database running in the container, you should mount a volume inside a container under the directory where the data is supposed to be stored. Maintainers of Docker images for databases usually document how to mount such volumes, so always refer to the documentation of the Docker image you are using if you want to keep database data safe. An example of using Docker volumes for slightly different purposes is shown in the Adding live reload for absolutely any code section.
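
For instance, the official postgres image documents /var/lib/postgresql/data as its data directory, so a named volume mounted there will survive container removal. A minimal sketch:

```yaml
version: '3.8'
services:
  database:
    image: postgres
    restart: always
    environment:
      POSTGRES_PASSWORD: password
    volumes:
      - db-data:/var/lib/postgresql/data
volumes:
  db-data:
```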

Communicating between Docker Compose environments

If you build a system composed of multiple independent services and/or applications, you will very likely want to keep their code in multiple independent code repositories (projects). The docker-compose.yml files for every Docker Compose application are usually kept in the same code repository as the application code. The default network that was created by Compose for a single application is isolated from the networks of other applications. So, what can you do if you suddenly want your multiple independent applications to communicate with each other?

Fortunately, this is another thing that is extremely easy with Compose. The syntax of the docker-compose.yml file allows you to define a named external Docker network as the default network for all services defined in that configuration.

The following is an example configuration that defines an external network named my-interservice-network:

version: '3.8'
networks:
  default:
    external:
      name: my-interservice-network
services:
  webserver:
    build: .
    ports:
    - "80:80"
    tty: true
    environment:
      - DATABASE_HOSTNAME=database
      - DATABASE_PORT=5432
      - DATABASE_PASSWORD=password
  database:
    image: postgres
    restart: always
    environment:
      POSTGRES_PASSWORD: password

Such external networks are not managed by Docker Compose, so you'll have to create them manually with the docker network create command, as follows:

$ docker network create my-interservice-network

Once you have done this, you can use this external network in other docker-compose.yml files for all applications that should have their services registered in the same network. The following is an example configuration for other applications that will be able to communicate with both database and webserver services over my-interservice-network, even though they are not defined in the same docker-compose.yml file:

version: '3.8'
networks:
  default:
    external:
      name: my-interservice-network
services:
  other-service:
    build: .
    ports:
    - "80:80"
    tty: true
    environment:
      - DATABASE_HOSTNAME=database
      - DATABASE_PORT=5432
      - WEBSERVER_ADDRESS=http://webserver:80

The above approach allows you to start two independent Docker Compose environments in separate shells. All services will be able to communicate with each other through a shared Docker network.

Delaying code startup until service ports are open

If you run docker-compose up, all services will be started at the same time. You can control to some extent the service startup using the depends_on key in the service definition as in the following example:

version: '3.8'
services:
  echo-server:
    build: .
    ports:
    - "5000:5000"
    tty: true
    depends_on:
      - database
  database:
    image: postgres
    environment:
      POSTGRES_PASSWORD: password

The preceding setup will make sure that our echo server will be started after the database service. Unfortunately, it is not always enough to ensure proper startup ordering of services within the development environment.

Consider a situation where echo-server would have to read something from the database immediately after starting. Docker Compose will make sure that services will be started in order but will not make sure that PostgreSQL will be ready to actually accept connections from the echo server. That's because PostgreSQL initialization can take a couple of seconds.

The solution for this is pretty simple. There are numerous scripting utilities that allow you to test if a specific network port is open before proceeding with the execution of a command. One such utility is named wait-for-it and is actually written in Python so you can easily install it with pip.

You can invoke wait-for-it using the following syntax:

$ wait-for-it --service <service-address> -- command [...]

The -- command [...] usage pattern is a common pattern for utilities that wrap different command execution where [...] represents any set of arguments for command. The wait-for-it process will try to create a TCP connection and when it succeeds, it will execute command [...]. For instance, if we would like to wait for localhost connection on port 2000 before starting the python echo.py command we would simply execute:

$ wait-for-it --service localhost:2000 -- python echo.py
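
The core of such utilities is simple enough to sketch in a few lines of Python. The following wait_for_port() function is a hypothetical, simplified equivalent of the waiting part, not the actual wait-for-it implementation:

```python
import socket
import time

def wait_for_port(host: str, port: int, timeout: float = 15.0) -> bool:
    """Return True once a TCP connection to (host, port) succeeds,
    or False if the timeout elapses first."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # A successful connect means the service accepts connections.
            with socket.create_connection((host, port), timeout=1.0):
                return True
        except OSError:
            time.sleep(0.5)  # service not ready yet; retry shortly
    return False
```

A real wrapper would then replace its own process with the target command (for example, via os.execvp()) once this function returns True.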

The following is an example of a modified docker-compose.yml file that elegantly overrides the default Docker image command and decorates it with the call to the wait-for-it utility to ensure our echo server starts only when it would be able to connect to the database:

version: '3.8'
services:
  echo-server:
    build: .
    ports:
    - "5000:5000"
    tty: true
    depends_on:
      - database
    command:
      wait-for-it --service database:5432 --
      python echo.py
  database:
    image: postgres
    environment:
      POSTGRES_PASSWORD: password

wait-for-it by default times out after 15 seconds. After that timeout, it will start the process after the -- mark regardless of whether it succeeded in connecting or not. You can disable the timeout using the --timeout 0 argument; without a timeout, wait-for-it will wait indefinitely.

Adding live reload for absolutely any code

When developing a new application, we usually work with code iteratively. We implement changes and see results. We either verify the code manually or run the tests. There is a constant feedback loop.

With Docker, we need to enclose our code in the container image to make it work. But running docker build or docker-compose build every time you make a change in your host system would be highly counterproductive.

That's why the best way to provide code to the container while working with Docker in the development stage is through Docker volumes. The idea is to bind your local filesystem directory to the container's internal filesystem path. That way any changes made to the files in the host's filesystem will be automatically reflected inside of the container. With Docker Compose, it is extremely easy as it allows you to define volumes in the service configuration. The following is a modified version of our docker-compose.yml file for the echo service that mounts the project's root directory under the /app/ path:

version: '3.8'
services:
  echo-server:
    build: .
    ports:
    - "5000:5000"
    tty: true
    volumes:
      - .:/app/

Changes happening on mounted Docker volumes are actively propagated on both sides. Many Python frameworks or servers support active hot reloading whenever they notice that your code has changed. This dramatically improves the development experience because you can see how the behavior of your application changes as you write it and without the need for manual restarts.

Probably not every piece of code you write will be built using a framework that supports active reloading. Fortunately, there is a great Python package named watchdog that allows you to reload any application by watching for code changes. This package provides a handy watchmedo utility that, similarly to wait-for-it, can wrap any process execution.

The watchmedo utility from the watchdog package requires some additional dependencies in order to execute. To install that package with extra dependencies use the following pip install syntax:

$ pip install "watchdog[watchmedo]"

The following is the basic usage format for reloading specified processes whenever there is a change to any Python file in the current working directory:

$ watchmedo auto-restart --patterns "*.py" --recursive -- command [...]

The --patterns "*.py" option indicates which files the watchmedo process should monitor for changes. The --recursive flag makes it traverse the current working directory recursively, so it will be able to pick up changes even if they are nested deep in the directory tree. The -- command [...] usage pattern is the same as for the wait-for-it command mentioned in the Delaying code startup until service ports are open section. It simply means that everything after the -- mark will be treated as a single command with (optional) arguments. watchmedo starts that command and restarts it whenever it discovers a change in the monitored files.

If you install the watchdog package in your Docker image, you will be able to elegantly include it in your docker-compose.yml as in the following example:

version: '3.8'
services:
  echo-server:
    build: .
    ports:
    - "5000:5000"
    tty: true
    depends_on:
      - database
    command:
      watchmedo auto-restart --patterns "*.py" --recursive --
      python echo.py
    volumes:
      - .:/app/

The above Docker Compose setup will restart the process inside of a container every time there is a change to your Python code. In our example, this will be any file with the .py extension that lives under the /app/ path. Thanks to mounting the source directory as a Docker volume, the watchmedo utility will be able to pick up any change made on the host filesystem and restart as soon as you save changes in your editor.

Development environments with Docker and Docker Compose are extremely useful and convenient but have their limitations. The obvious one is that they only allow you to run your code under the Linux operating system. Even though Docker is available for macOS and Windows, it still relies on a Linux virtual machine as an intermediary layer, so your Docker containers will still be running under Linux. If you need to develop your application as if it were running exactly on a specific system that is different from Linux, you need a completely different approach to environment isolation. In the next section, we will learn about one such tool.

Virtual development environments using Vagrant

Although Docker together with Docker Compose provides a very good foundation for creating reproducible and isolated development environments, there are cases where a real virtual machine will simply be a better (or only) choice. An example of such a situation may be a need to do some system programming for an operating system different than Linux.

Vagrant currently seems to be one of the most popular tools for developers to manage virtual machines for the purpose of local development. It provides a simple and convenient way to describe development environments with all system dependencies in a way that is directly tied to the source code of your project. It is available for Windows, macOS, and a few popular Linux distributions (refer to https://www.vagrantup.com).

Vagrant itself does not have any additional dependencies. It creates new development environments in the form of virtual machines or containers; the exact implementation depends on the choice of virtualization provider. VirtualBox is the default provider and is bundled with the Vagrant installer, but additional providers are available as well. The most notable choices are VMware, Docker, Linux Containers (LXC), and Hyper-V.

The main configuration is provided to Vagrant in a single file named Vagrantfile. Each project should have its own. The following are the most important things it defines:

  • Choice of virtualization provider
  • A box, which is used as a virtual machine image
  • Choice of provisioning method
  • Shared storage between the virtual machine and the virtual machine's host
  • Ports that need to be forwarded between the virtual machine and its host

The Vagrantfile syntax is Ruby, but the example configuration file provides a good template to start a project and has excellent documentation, so knowledge of Ruby is not required. The template configuration can be created using a single command:

$ vagrant init

This will create a new file named Vagrantfile in the current working directory. The best place to store this file is usually the root of the related project sources. This file is already a valid configuration that will create a new virtual machine using the VirtualBox provider and box image based on an Ubuntu Linux distribution. The default Vagrantfile content that's created with the vagrant init command contains a lot of comments that will guide you through the complete configuration process.

The following is a minimal example of a Vagrantfile for the Python 3.9 development environment based on the Ubuntu operating system, with some sensible defaults that, among others, enable port 80 forwarding in case you want to do some web development with Python:

Vagrant.configure("2") do |config|
  # Every Vagrant development environment requires a box.
  # You can search for boxes at https://vagrantcloud.com/search.
  # Here we use the Bionic (18.04) version of Ubuntu for the x64 architecture.
  config.vm.box = "ubuntu/bionic64"
  # Create a forwarded port mapping which allows access to a specific
  # port within the machine from a port on the host machine and only
  # allow access via 127.0.0.1 to disable public access
  config.vm.network "forwarded_port", guest: 80, host: 8080, host_ip: "127.0.0.1"
  config.vm.provider "virtualbox" do |vb|
    vb.gui = false
    # Customize the amount of memory on the VM:
    vb.memory = "1024"
  end
  # Enable provisioning with a shell script.
  config.vm.provision "shell", inline: <<-SHELL
    apt-get update
    apt-get install python3.9 -y
  SHELL
end

In the preceding example, we have set the additional provisioning of system packages with a simple shell script inside of the config.vm.provision section. The default virtual machine image provided by the ubuntu/bionic64 "box" does not include the Python 3.9 version, so we need to install it using the apt-get package manager.

When you feel that the Vagrantfile is ready, you can run your virtual machine using the following command:

$ vagrant up

The initial startup can take a few minutes, because the actual box image must be downloaded from the web. There are also some initialization processes that may take a while every time the existing virtual machine is brought up, and the amount of time depends on the choice of provider, image, and your system's performance. Usually, once the image has been downloaded, this takes only a couple of seconds. When the Vagrant environment is up and running, you can connect to it through SSH using the following shell shorthand:

$ vagrant ssh

This can be done anywhere in the project source tree below the location of the Vagrantfile. For the developers' convenience, Vagrant will traverse all directories above the user's current working directory in the filesystem tree, looking for the configuration file and matching it with the related virtual machine instance. Then, it establishes the secure shell connection, so the development environment can be interacted with just like an ordinary remote machine. The only difference is that the whole project source tree (root defined as the location of the Vagrantfile) is available on the virtual machine's filesystem under /vagrant/. This directory is automatically synchronized with your host filesystem, so you can normally use an IDE or code editor of your choice on the host and simply treat the SSH session to your Vagrant virtual machine just like a normal local shell session.

Popular productivity tools

Almost every open-source Python package that has been released on PyPI is a kind of productivity booster—it provides ready-to-use solutions to some problem. That way we don't have to reinvent the wheel all the time. Some could also say that Python itself is all about productivity. Almost everything in this language and the community surrounding it seems to be designed to make software development as productive as possible.

This creates a positive feedback loop. Since writing code with Python is fun and easy, a lot of programmers use their free time to create tools that make it even easier and more fun. And this fact will be used here as a basis for a very subjective and non-scientific definition of a productivity tool—a piece of software that makes development easier and more fun.

By nature, productivity tools focus mainly on certain elements of the development process, such as testing, debugging, and managing packages, and are not core parts of the products that they help to build. In some cases, they may not even be referred to anywhere in the project's codebase, despite being used on a daily basis.

We've already discussed tools revolving around package management and the isolation of virtual environments. These are undoubtedly productivity tools as their aim is to simplify and ease the tedious processes of setting up your local working environment. Later in the book, we will discuss more productivity tools that help to solve specific problems, such as profiling and testing. This section is dedicated to other tools that are really worth mentioning but have no specific chapter in this book where they could be introduced.

Custom Python shells

Python programmers spend a lot of time in interactive interpreter sessions. These sessions are very good for testing small code snippets, accessing documentation, or even debugging code at runtime. The default interactive Python session is very simple and does not provide many features, such as tab completion or code introspection helpers. Fortunately, the default Python shell can be easily extended and customized.

If you use the interactive shell very often, you can easily modify the behavior of its prompt. At startup, Python reads the PYTHONSTARTUP environment variable, looking for the path of a custom initialization script. Some operating system distributions where Python is a common system component (for example, Linux or macOS) may already be preconfigured to provide a default startup script. It is commonly found in the user's home directory under the .pythonstartup name.

These scripts often use the readline module (based on the GNU readline library) together with rlcompleter in order to provide interactive tab completion and command history. Both modules are part of the Python standard library.

The readline module is not available on Windows. Windows users often use the pyreadline package available on PyPI as a substitute for the missing module.

If you don't have a default Python startup script, you can easily build your own. A basic script for command history and tab completion can be as simple as the following:

# python startup file
import atexit
import os

try:
    import readline
except ImportError:
    print("Completion unavailable: readline module not available")
else:
    import rlcompleter

    # tab completion
    readline.parse_and_bind('tab: complete')

    # Path to history file in user's home directory.
    # Can use your own path.
    history_file = os.path.expanduser('~/.python_shell_history')
    try:
        readline.read_history_file(history_file)
    except IOError:
        pass

    atexit.register(readline.write_history_file, history_file)
    del os, history_file, readline, rlcompleter

Create this file in your home directory and call it .pythonstartup. Then, add a PYTHONSTARTUP variable in your environment using the path of your file.

If you are running Linux or macOS, you can create the Python startup script in your home folder. Then, link it with a PYTHONSTARTUP environment variable that's been set in the system shell startup script. For example, the Bash and Korn shells use the .profile file, where you can insert a line, as follows:

export PYTHONSTARTUP=~/.pythonstartup

If you are running Windows, it is easy to set a new environment variable as an administrator in the system preferences and save the script in a common place instead of using a specific user location.

Writing your own PYTHONSTARTUP script may be a good exercise, but creating a good custom shell on your own is a challenge that few can find time for. Fortunately, there are a few custom Python shell implementations that immensely improve the experience of interactive sessions in Python. In the next section, we will take a closer look at one that is particularly popular: IPython.

Using IPython

IPython provides an extended Python command shell. It is available as a package on PyPI, so you can easily install it with either pip or Poetry. Among the features it provides, some of the most interesting are as follows:

  • Dynamic object introspection
  • System shell access from the prompt
  • Multiline code editing
  • Syntax highlighting
  • Copy-paste helpers
  • Direct profiling support
  • Debugging facilities

Now, IPython is a part of a larger project called Jupyter, which provides interactive notebooks with live code that can be written in many different languages. Jupyter notebooks are really popular within the data science community where Python really shines. So it is good to know their shell sibling.

The IPython shell is invoked through the ipython command. After starting IPython you will immediately notice that the standard Python prompt is replaced by a colorful number of execution cells:

$ ipython
Python 3.9.0 (v3.9.0:9cf6752276, Oct  5 2020, 11:29:23)
Type 'copyright', 'credits' or 'license' for more information
IPython 7.19.0 -- An enhanced Interactive Python. Type '?' for help.
In [1]:

There are two really handy properties of an IPython shell:

  • It allows you to easily work with multiline code, including code that has been pasted from the clipboard
  • It provides shortcuts for inspecting docstrings, module documentation, and the code of imported modules

These two features alone make IPython great for learning purposes. First, if you find any useful snippet of code (including ones in this book), you can easily paste it from the system's clipboard and modify it as if the Python interpreter were a code editor. The following is a screenshot of a terminal with an interactive IPython session into which the source code of the echo application was pasted:


Figure 2.1: Pasting code into IPython

When it comes to code introspection, IPython provides a really quick way of looking into the documentation and source code of imported modules, functions, and classes. Simply type a name you want to inspect and follow it with ? to see the docstring. The following terminal transcript presents an example exploration session of the urlunparse() function from the urllib.parse module:

In [1]: urllib.parse.urlunparse?
Signature: urllib.parse.urlunparse(components)
Docstring:
Put a parsed URL back together again.  This may result in a
slightly different, but equivalent URL, if the URL that was parsed
originally had redundant delimiters, e.g. a ? with an empty query
(the draft states that these are equivalent).
File:      /Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/urllib/parse.py
Type:      function

Use ?? after the function name instead and you'll see the whole source code:

In [2]: urllib.parse.urlunparse??
Signature: urllib.parse.urlunparse(components)
Source:
def urlunparse(components):
    """Put a parsed URL back together again.  This may result in a
    slightly different, but equivalent URL, if the URL that was parsed
    originally had redundant delimiters, e.g. a ? with an empty query
    (the draft states that these are equivalent)."""
    scheme, netloc, url, params, query, fragment, _coerce_result = (
                                                  _coerce_args(*components))
    if params:
        url = "%s;%s" % (url, params)
    return _coerce_result(urlunsplit((scheme, netloc, url, query, fragment)))
File:      /Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/urllib/parse.py
Type:      function

IPython is not the only enhanced Python shell at your disposal. You may also want to look at the bpython and ptpython projects, which have similar capabilities but a slightly different user experience.

Interactive sessions are great for experimentation and module exploration but sometimes can also be useful in final applications. In the next section, you will learn how to embed them inside of your own code.

Incorporating shells in your own scripts and programs

Sometimes, there is a need to incorporate a read-eval-print loop (REPL), similar to Python's interactive session, inside your own software. This allows easier experimentation with your code and inspection of its internal state. Sometimes it is simply easier to embed an interactive terminal than to design a custom Command-Line Interface (CLI) for your application (especially if it will only be used on rare occasions). Interactive interpreters are often embedded in web application frameworks to allow developers to interact with data stored within applications using a Python REPL instead of database-specific terminal utilities.

The simplest module that allows emulating Python's interactive interpreter already comes with the standard library and is named code.

The script that starts interactive sessions consists of one import and a single function call:

import code
code.interact()

You can easily do some minor tuning, such as modifying the prompt value or adding banner and exit messages, but anything fancier will require a lot more work. If you want more features, such as code highlighting, completion, or direct access to the system shell, it is always better to use something that has already been built by someone else. Fortunately, the IPython shell mentioned in the previous section can be embedded in your own program as easily as the code module.
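
As an example of such minor tuning, the following sketch wraps code.interact() in a small helper with a custom banner, an exit message, and a prepopulated namespace (the debug_console name and the answer variable are just illustrative choices):

```python
import code

def debug_console(**namespace):
    """Start an embedded interactive console with the given objects
    preloaded into its namespace."""
    code.interact(
        banner="debug console (Ctrl-D to exit)",
        exitmsg="leaving debug console",
        local=namespace,
    )

# Example: debug_console(answer=42) drops you into a session where
# typing "answer * 2" prints 84; pressing Ctrl-D resumes the program.
```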

The following are examples of how to invoke all of the previously mentioned shells inside of your code:

# Example for IPython
import IPython
IPython.embed()
# Example for bpython
import bpython
bpython.embed()
# Example for ptpython
from ptpython.repl import embed
embed(globals(), locals())

The first two arguments to the embed() function are dictionaries of objects that will be available as global and local namespaces during the interactive session. This can be used to prepopulate the interactive session with modules, variables, functions, or classes that are likely to be used during that session.

Interactive sessions are great for providing a low-level interface of an application directly to the user. Sometimes they can be used to manually inspect the internal state of an application by providing access to either local or global variables. Still, if you want to interactively trace how your application executes the code, you will probably need to use a debugger. Fortunately, Python comes with a built-in debugger that offers such a possibility in the form of an interactive session.

Interactive debuggers

Code debugging is an integral element of the software development process. Many programmers can spend most of their life using only extensive logging and print() functions as their primary debugging tools, but most professional developers prefer to rely on some kind of debugger.

Python already ships with a built-in interactive debugger called pdb. It can be invoked from the command line on an existing script so that Python enters post-mortem debugging if the program exits abnormally:

$ python3 -m pdb -c continue script.py

Another way to achieve similar behavior is running the interpreter with the -i flag:

$ python3 -i script.py

The preceding code will open an interactive session at the moment where Python would normally exit. From there, you can start a post-mortem debugging session by importing the pdb module and using the pdb.pm() function as in the following example:

>>> import pdb
>>> pdb.pm()

Post-mortem debugging, while useful, does not cover every scenario. It applies only when the application exits with an exception once the bug occurs. In many cases, faulty code behaves abnormally but does not exit unexpectedly. In such cases, a custom breakpoint can be set on a specific line of code using the breakpoint() function. The following is an example of setting a breakpoint inside a simple function:

import math
def circumference(r: float):
    breakpoint()
    return 2 * math.pi * r

The breakpoint() function was not available prior to Python 3.7, so you may see some older Python developers using the following idiom:

import pdb; pdb.set_trace()

This will cause the Python interpreter to start the debugger session on this line during runtime.
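The breakpoint() function (introduced in PEP 553) simply delegates to sys.breakpointhook(), so the hook can be replaced with your own callable. The following sketch, with an illustrative hook that merely logs hits instead of starting pdb, shows the mechanism:

```python
import math
import sys

def logging_hook(*args, **kwargs):
    # Report where breakpoint() fired instead of entering a debugger.
    caller = sys._getframe(1)
    print(f"breakpoint() hit in {caller.f_code.co_name}()")

# Replace the default hook (which would start pdb) with our own.
sys.breakpointhook = logging_hook

def circumference(r: float):
    breakpoint()  # now handled by logging_hook
    return 2 * math.pi * r

print(round(circumference(1.0), 4))  # 6.2832
```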

The pdb module is very useful for tracing issues, and at first glance, it may look very similar to the well-known GNU Debugger (GDB). Because Python is a dynamic language, the pdb session is very similar to an ordinary interpreter session. This means that the developer is not limited to tracing code execution but can call any code and even perform module imports.

Sadly, because of its gdb heritage, your first experience with pdb can be a bit overwhelming due to the existence of cryptic single-letter debugger commands such as h, b, s, n, j, and r. When in doubt, the help pdb command, which can be typed during the debugger session, will provide extensive usage information. You can also use the h shortcut.

The debugger session in pdb is very simple and does not provide additional features such as tab completion or code highlighting. Fortunately, just as with enhanced Python shells, there are a couple of enhanced debugging shells available on PyPI. There is even one based on IPython, named ipdb.

If you want to use ipdb instead of plain pdb, you can either use a modified debugging idiom (import ipdb; ipdb.set_trace()) or set the PYTHONBREAKPOINT environment variable to the ipdb.set_trace value.
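Because the default hook consults PYTHONBREAKPOINT on every breakpoint() call, the same variable can also disable all breakpoints for a single run without touching the code. For example:

```shell
# Route breakpoint() to ipdb for one run (assumes ipdb is installed):
#   PYTHONBREAKPOINT=ipdb.set_trace python3 script.py

# Turn every breakpoint() call into a no-op for this run:
PYTHONBREAKPOINT=0 python3 -c 'breakpoint(); print("not interrupted")'
```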

Last but not least, many IDEs offer visual debuggers and some developers find them extremely useful. These debuggers allow you to set breakpoints in multiple places of your application without the need for modifying the code with manual breakpoint() calls. They also often allow adding variable watches that stop program execution when the selected variable has a specific value.

Other productivity tools

We've concentrated so far on the productivity tools that are specific to Python. But the truth is that programming in different languages is not that different. No matter what language programmers use, they often face the same problems and tedious tasks: massaging data in various formats, downloading network artifacts, searching through filesystems, and navigating projects.

Probably the most flexible productivity tool of all time is Bash together with the common standard utilities found in every POSIX and UNIX-like operating system. Knowing them all thoroughly is probably impossible for an ordinary human, but knowing a few of them well will make you really productive.

Simply put, sometimes there's no need to write a sophisticated Python script for a one-off job if you can quickly wire and pipe together a few invocations of the curl, grep, sed, and sort commands. Sometimes, there is already a specialized tool for a specific and non-trivial job (counting lines of code, for instance) that would take a lot of time to implement from scratch.
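As a tiny illustration, here is the kind of one-off pipeline meant above; printf merely stands in for real input such as a log file:

```shell
# Find the most frequent item in a stream using only classic POSIX
# tools -- no Python script required.
printf 'GET\nPOST\nGET\nPUT\nGET\n' | sort | uniq -c | sort -rn | head -n 1
```

The same sort | uniq -c | sort -rn pattern answers a surprising number of "what occurs most often?" questions.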

The following is a short list of such useful utilities that I find invaluable when working with any code. Think of it as a mini awesome list of programming productivity tools:

• jq (https://stedolan.github.io/jq/): Utility for manipulating data in the form of JSON documents. Extremely useful for manipulating the output of web APIs directly in the shell. Data is read from standard input and results are printed on standard output. Manipulation is described through a custom domain-specific language that is very easy to learn.

• yq (https://pypi.org/project/yq/): Sibling of jq that uses the same syntax for manipulating YAML documents.

• curl (https://curl.se): Old-fashioned classic for transferring data through URLs. Most often used for interfacing with HTTP but it actually supports over 20 protocols.

• HTTPie (https://httpie.io): Python-based utility for interfacing with HTTP servers. Many developers find it more convenient to use than curl.

• autojump (https://github.com/wting/autojump): Shell utility that lets you quickly navigate to your most recently visited directories. Indispensable for programmers working on dozens of projects in parallel. Simply type j and a few characters of the desired directory name and you will probably land in the right place.

• cloc (https://github.com/AlDanial/cloc): One of the best and most complete utilities for counting lines of code. Sometimes you need to see how big a project is and how many programming or markup languages it uses. cloc will give you the right answer quickly.

• ack-grep (https://beyondgrep.com): grep on steroids. Lets you quickly search through large codebases for a specific string. Allows filtering by programming language and is often simply faster and better than opening a project in an IDE.

• GNU parallel (https://www.gnu.org/software/parallel/): Enhanced replacement for xargs. Really invaluable if you want to do many things in parallel inside a shell or Bash script, especially if you want to do it reliably and efficiently.

Summary

This chapter was all about development environments for Python programmers. We've discussed the importance of environment isolation for Python projects. You've learned about two different levels of environment isolation (application-level and system-level) and about multiple tools that allow you to create them in a consistent and repeatable manner. We've also discussed some essential topics for managing Python dependencies in your projects. The chapter ended with a review of a few tools that improve the ways in which you can experiment with Python, debug your programs, and work effectively.

Once you have all of these tools in your tool belt, you are well prepared for the next few chapters, where we will discuss multiple features of modern Python syntax. You're probably already hungry for Python code, so we will start with a quick overview of the new things that have been included in Python over the last few releases.

If you're quite up to date with what's happening in Python, you can probably skip the next chapter. Still, take a quick look at the headings—it is possible that you have missed something, as Python evolves really fast.
