9

Bridging Python with C and C++

Python is great but it isn't suited for everything. Sometimes you may find that particular problems can be solved more easily with a different language. Those languages may be a better fit due to greater expressiveness in certain technical domains (like control engineering, image processing, or system programming) or may offer natural performance gains over Python. And Python (together with the default CPython implementation) has a few characteristics that aren't necessarily good for performance:

  • Threading usability is greatly reduced for CPU-bound tasks due to the existence of the Global Interpreter Lock (GIL) in CPython; this limitation is dependent on the Python implementation of choice
  • Python is not a compiled language (in the way C and Go are) so it lacks many compile-time optimizations
  • Python does not provide static typing and the possible optimizations that come with it

But the fact that some languages are better for specific tasks doesn't mean that you have to completely forgo Python when faced with such problems. With proper techniques, it is possible to write applications that take advantage of many technologies.

One such technique is architecting applications as independent components that communicate with each other through well-defined communication channels. This often comes in the form of service-oriented or microservice architectures. This is extremely common in distributed systems where every component (service) of system architecture can run independently on a different host. Systems written in multiple languages are often nicknamed polyglot systems.

The disadvantage of using polyglot service-oriented or microservice architectures is that you will usually have to recreate a lot of application scaffolding for every such language. This includes things like application configuration, logging, monitoring, and communication layers, as well as different frameworks, libraries, build tools, common conventions, and design patterns. Introducing those tools and conventions will cost time and future maintenance effort, which can often exceed the gains of adding another language to your architecture.

Fortunately, there's another way to overcome this problem. Often what we really need from a different language can be packaged as an isolated library that does one thing and does it well. What we need to do is to find a bridge between Python and other languages that will allow us to use their libraries in Python applications. This can be done either through custom CPython extensions or so-called Foreign Function Interfaces (FFIs).

In both cases, the C and C++ programming languages act as a gateway to libraries and code written in different languages. The CPython interpreter is itself written in C and provides the Python/C API (defined in the Python.h header file) that allows you to create shared C libraries that can be loaded by the interpreter. C (and C++, due to its native interoperability with the C language) can be used to create such extensions. The FFIs on the other hand can be used to interact with any compatible compiled shared library regardless of the language it is written in. These libraries will still rely on C calling conventions and basic types.

This chapter will discuss the main reasons for writing your own extensions in other languages and introduce you to the popular tools that help to create them. We will learn about the following topics in this chapter:

  • C and C++ as the core of Python extensibility
  • Compiling and loading Python C extensions
  • Writing extensions
  • Downsides of using extensions
  • Interfacing with compiled dynamic libraries without extensions

In order to bridge Python with different languages, we will need a handful of extra tools and libraries so let's take a look at the technical requirements for this chapter.

Technical requirements

In order to compile the Python extensions mentioned in this chapter, you will need C and C++ compilers, which are available for free for all major operating systems.

On Linux, the GCC and Clang compilers are usually available through the package management system specific to the given distribution. On macOS, the compiler is part of the Xcode IDE (available through the App Store).

The following Python packages, which are mentioned in this chapter, can be downloaded from PyPI:

  • Cython
  • cffi

Information on how to install packages is included in Chapter 2, Modern Python Development Environments.

The code files for this chapter can be found at https://github.com/PacktPublishing/Expert-Python-Programming-Fourth-Edition/tree/main/Chapter%209.

C and C++ as the core of Python extensibility

The reference implementation of Python, the CPython interpreter, is written in C. Because of that, Python interoperability with other languages revolves around C and C++, the latter thanks to its native interoperability with C. There is even a full superset of the Python language called Cython, which uses a source-to-source compiler to create C extensions for CPython from an extended Python syntax.

In fact, you can use dynamic/shared libraries written in any language if the language supports compilation in the form of dynamic/shared libraries. So, interlanguage integration possibilities go way beyond C and C++. This is because shared libraries are intrinsically generic. They can be used in any language that supports their loading. So, even if you write such a library in a completely different language (let's say Delphi or Prolog), you can use it in Python. Still, it is hard to call such a library a Python extension if it does not use the Python/C API.

Unfortunately, writing your own extensions in C or C++ using the bare Python/C API is quite demanding, not only because it requires a good understanding of one of two languages that are relatively hard to master, but also because it requires an exceptional amount of boilerplate. You will have to write a lot of repetitive code whose only purpose is to provide an interface that glues your core C or C++ code with the Python interpreter and its datatypes.

Still, it is good to know how pure C extensions are built, for the following reasons:

  • You will better understand how Python works in general
  • One day, you may need to debug or maintain a native C/C++ extension
  • It helps in understanding how higher-level tools for building extensions work

That's why in this chapter we will first learn how to build a simple Python C extension from scratch. We will later reimplement it with different techniques that do not require the usage of the low-level Python/C API.

But before we dive into the details of writing extensions, let's see how to compile and load one.

Compiling and loading Python C extensions

The Python interpreter is able to load extensions from dynamic/shared libraries as if they were ordinary Python modules, provided they expose an applicable interface using the Python/C API. The definition of all functions, types, and macros constituting the Python/C API is included in the Python.h C header file that is distributed with Python sources. In many distributions of Linux, this header file is contained in a separate package (for example, python-dev in Debian/Ubuntu) but under Windows, it is distributed by default with the interpreter. On POSIX and POSIX-compatible systems (for example, Linux and macOS), it can be found in the include/ directory of your Python installation. On Windows, it can be found in the Include/ directory of your Python installation.

The Python/C API traditionally changes with every release of Python. In most cases, these are only additions of new features to the API, so the changes are generally source-compatible. However, in most cases, they are not binary-compatible due to changes in the Application Binary Interface (ABI). This means that extensions must be compiled separately for every major version of Python. Also, different operating systems have incompatible ABIs, so this makes it practically impossible to create a single binary distribution for every possible environment. This is the reason why most Python extensions are distributed in source form.

Since Python 3.2, a subset of the Python/C API has been defined to have a stable ABI. Thanks to this, it is possible to build extensions using this limited API (with a stable ABI), so extensions can be compiled only once for a given operating system and they will work with any version of Python higher than or equal to 3.2 without the need for recompilation. However, this limits the number of API features and does not solve the problems of older Python versions. It also does not allow you to create a single binary distribution that would work on multiple operating systems. This is a trade-off, and the price of the stable ABI sometimes may be a bit too high for a very low gain.
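
As a minimal sketch, opting into the limited API is done by defining the Py_LIMITED_API macro before including Python.h; the value selects the lowest Python version whose stable ABI the extension should target (here, Python 3.2):

#define Py_LIMITED_API 0x03020000  /* target the stable ABI of Python 3.2+ */
#include <Python.h>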

It is important to know that the Python/C API is a feature that is limited only to CPython. Some efforts were made to bring extension support to alternative implementations such as PyPy, Jython, or IronPython, but it seems that there is no stable and complete solution for them at the moment. The only alternative Python implementation that should deal easily with extensions is Stackless Python because it is in fact only a modified version of CPython.

C extensions for Python need to be compiled into shared/dynamic libraries before they can be imported. There is no native way to import C/C++ code in Python directly from sources. Fortunately, the setuptools package provides helpers to define compiled extensions as modules, so compilation and distribution can be handled using the setup.py script as if they were ordinary Python packages.

We will learn more details about creating Python packages in Chapter 11, Packaging and Distributing Python Code.

The following is an example of the setup.py script from the official documentation that handles the preparation of a simple package distribution that has an extension written in C:

from setuptools import setup, Extension 
 
module1 = Extension( 
    'demo', 
    sources=['demo.c'] 
) 
 
 
setup( 
    name='PackageName', 
    version='1.0', 
    description='This is a demo package', 
    ext_modules=[module1] 
)

We will learn more about distributing Python packages and the setup.py script in Chapter 11, Packaging and Distributing Python Code.

Once prepared this way, the following additional step is required in your distribution flow:

python3 setup.py build

This step will compile all the extensions defined in the ext_modules argument according to any additional compiler settings provided with the Extension() constructor. The compiler that will be used is the one that is the default for your environment. This compilation step is not required if the package is going to be distributed as a source distribution. In that case, you need to be sure that the target environment has all the compilation prerequisites, such as the compiler, header files, and additional libraries that are going to be linked to your binary (if your extension needs any). More details about packaging Python extensions will be explained later in the Downsides of using extensions section.

In the next section, we will discuss why you may need to use extensions.

The need to use extensions

It's not easy to say when it is a reasonable decision to write extensions in C/C++. The general rule of thumb could be "never, unless you have no other choice". But this is a very subjective statement that leaves a lot of room for interpretation regarding what is not doable in Python. In fact, it is hard to find a thing that cannot be done using pure Python code.

Still, there are some problems where extensions may be especially useful by adding the following benefits:

  • Bypassing GIL in the CPython threading model
  • Improving performance in critical code sections
  • Integrating source code written in different languages
  • Integrating third-party dynamic libraries
  • Creating efficient custom datatypes

Of course, for every such problem, there is usually a viable native Python solution. For example, the core CPython interpreter constraints, such as GIL, can easily be overcome with a different approach to concurrency, such as coroutines or multiprocessing, instead of a threading model (we discussed these options in Chapter 6, Concurrency). To work around third-party dynamic libraries and custom datatypes, third-party libraries can be integrated with the ctypes module, and every datatype can be implemented in Python.

Still, the native Python approach may not always be optimal. The Python-only integration of an external library may be clumsy and hard to maintain. The implementation of custom datatypes may be suboptimal without access to low-level memory management. So, the final decision of what path to take must always be taken very carefully and take many factors into consideration. A good approach is to start with a pure Python implementation first and consider extensions only when the native approach proves to be not good enough.

The next section will explain how extensions can be used to improve the performance in critical code sections.

Improving performance in critical code sections

Let's be honest, Python is not chosen by developers because of its performance. It does not execute fast but it allows you to develop fast. Still, no matter how productive this language makes us as programmers, we may sometimes encounter a problem that cannot be solved efficiently using pure Python.

In most cases, solving performance problems is really mostly about choosing proper algorithms and data structures and not about limiting the constant factor of language overhead. Usually, it is not a good approach to rely on extensions in order to shave off some CPU cycles if the code is already written poorly or does not use efficient algorithms.

It is often possible that performance can be improved to an acceptable level without the need to increase the complexity of your project by adding yet another language to your technology stack. And if it is possible to use only one programming language, it should be done that way in the first place.

Still, it is also very likely that even with a state-of-the-art algorithmic approach and the best-suited data structures, you will not be able to meet some arbitrary technological constraints using Python alone.

An example field that puts some well-defined limits on an application's performance is the Real-Time Bidding (RTB) business. In short, the whole of RTB is about buying and selling advertisement inventory (places for online ads) in a way that is similar to how real auctions or stock exchanges work. The trading usually takes place through some ad exchange service that sends information about available inventory to demand-side platforms (DSPs) interested in buying space for their advertisements. And this is the place where things get exciting. Most of the ad exchanges use the OpenRTB protocol (which is based on HTTP) for communication with potential bidders. The DSP is the side responsible for serving responses to these OpenRTB HTTP requests. And ad exchanges always put very strict time constraints on how long the whole process can take. It can be as little as 50 ms, from the first TCP packet received to the last byte written by the DSP server. To spice things up, it is not uncommon for DSP platforms to process tens of thousands of requests per second. Being able to shave off a few milliseconds from the response times often determines service profitability. This means that porting even trivial code to C may be reasonable in that situation, but only if it's part of a performance bottleneck and cannot be improved any further algorithmically. As Guido once said:

If you feel the need for speed, (...) – you can't beat a loop written in C.

A completely different use-case for custom extensions is integrating code written in different languages, which is explained in the next section.

Integrating existing code written in different languages

Although computer science is young when compared to other fields of technical studies, we are already standing on the shoulders of giants. Many great programmers have written a lot of useful libraries for solving common problems using many programming languages. It would be a great loss to forget about all that heritage every time a new programming language pops up, but it is also impossible to reliably port every piece of software that was ever written to every new language.

The C and C++ languages seem to be the most important languages that provide a lot of libraries and implementations that you would like to integrate into your application code without the need to port them completely to Python. Fortunately, CPython is already written in C, so the most natural way to integrate such code is precisely through custom extensions.

The next section explains a very similar use-case: integrating third-party dynamic libraries.

Integrating third-party dynamic libraries

Integrating code written using different technologies does not end with C/C++. A lot of libraries, especially third-party software with closed sources, are distributed as compiled binaries. In C, it is really easy to load such shared/dynamic libraries and call their functions. This means that you can use any C library as long as you wrap it as a Python extension using the Python/C API.

This, of course, is not the only solution and there are tools such as ctypes and CFFI that allow you to interact with dynamic libraries directly using pure Python code without the need for writing extensions in C. Very often, the Python/C API may still be a better choice because it provides better separation between the integration layer (written in C) and the rest of your application.

Last but not least, extensions can be used to enhance Python with novel and performant data structures.

Creating efficient custom datatypes

Python provides a very versatile selection of built-in datatypes. Some of them really use state-of-the-art internal implementations (at least in CPython) that are specifically tailored for usage in the Python language. The number of basic types and collections available out of the box may look impressive for newcomers, but it is clear that it does not cover all of a programmer's needs.

You can, of course, create many custom data structures in Python, either by subclassing built-in types or by building them from scratch as completely new classes. Unfortunately, sometimes the performance of such a data structure may be suboptimal. The whole power of complex collections such as dict or set comes from their underlying C implementation. Why not do the same and implement some of your custom data structures in C too?

Since we already know the possible reasons to create custom Python extensions, let's see how to actually build one.

Writing extensions

As already said, writing extensions is not a simple task but, in return for your hard work, it can give you a lot of advantages. The easiest approach to creating extensions is to use tools such as Cython. Cython allows you to write C extensions using a language that greatly resembles Python, without all the intricacies of the Python/C API. It will increase your productivity and make code easier to develop, read, and maintain.

Still, if you are new to this topic, it is good to start your adventure with extensions by writing one using nothing more than the bare C language and the Python/C API. This will improve your understanding of how extensions work and will also help you to appreciate the advantages of alternative solutions. For the sake of simplicity, we will take a simple algorithmic problem as an example and try to implement it using the following two approaches:

  • Writing a pure C extension
  • Using Cython

Our problem will be finding the nth number of the Fibonacci sequence. This is a sequence of numbers where each element is the sum of two preceding ones. The sequence starts with 0 and 1. The first 10 numbers of the sequence are as follows:

0, 1, 1, 2, 3, 5, 8, 13, 21, 34

As you see, the sequence is easy to explain and also easy to implement. It is very unlikely that you would need to create a compiled extension solely for solving this problem. But it is very simple so it will serve as a very good example of wiring any C function to the Python/C API. Our goals are only clarity and simplicity, so we won't try to provide the most efficient solution.

Before we create our first extension, let's define a reference implementation that will allow us to compare different solutions. Our reference implementation of the Fibonacci function implemented in pure Python looks as follows:

"""Python module that provides fibonacci sequence function""" 
 
def fibonacci(n): 
    """Return nth Fibonacci sequence number computed recursively."""
    if n == 0:
        return 0
    if n == 1: 
        return 1 
    else: 
        return fibonacci(n - 1) + fibonacci(n - 2)

Note that this is one of the simplest implementations of the fibonacci() function. A lot of improvements could be applied to it. We don't optimize our implementation (using a memoization pattern, for instance) because this is not the purpose of our example. In the same manner, we won't optimize our code later when discussing implementations in C or Cython, even though the compiled code gives us many more possibilities to do so.

Memoization is a popular technique of saving past results of function calls for later reference to optimize application performance. We explain it in detail in Chapter 13, Code Optimization.

Let's look into pure C extensions in the next section.

Pure C extensions

If you have decided that you need to write C extensions for Python, I assume that you already know the C language at a level that will allow you to fully understand the examples that are presented. This book is about Python, and as such nothing other than the Python/C API details will be explained here. This API, despite being crafted with great care, is definitely not a good introduction to C, so if you don't know C at all, you should avoid attempting to write Python extensions in C until you gain experience in this language. Leave it to others and stick with Cython or Pyrex, which are a lot safer from a beginner's perspective.

As announced earlier, we will try to port the fibonacci() function to C and expose it to the Python code as an extension. Let's start with a base implementation that would be analogous to the previous Python example. The bare function without any Python/C API usage could be roughly as follows:

long long fibonacci(unsigned int n) { 
    if (n == 0) {
        return 0;
    } else if (n == 1) { 
        return 1; 
    } else { 
        return fibonacci(n - 2) + fibonacci(n - 1); 
    } 
} 

And here is the example of a complete, fully functional extension that exposes this single function in a compiled module:

#define PY_SSIZE_T_CLEAN
#include <Python.h> 
 
long long fibonacci(unsigned int n) { 
    if (n == 0) {
        return 0;
    } else if (n == 1) { 
        return 1; 
    } else { 
        return fibonacci(n - 2) + fibonacci(n - 1); 
    } 
}  
 
static PyObject* fibonacci_py(PyObject* self, PyObject* args) { 
    PyObject *result = NULL; 
    long n; 
 
    if (PyArg_ParseTuple(args, "l", &n)) { 
        result = Py_BuildValue("L", fibonacci((unsigned int)n)); 
    } 
 
    return result; 
}
 
static char fibonacci_docs[] = 
    "fibonacci(n): Return nth Fibonacci sequence number " 
    "computed recursively\n"; 
 
 
static PyMethodDef fibonacci_module_methods[] = { 
    {"fibonacci", (PyCFunction)fibonacci_py, 
     METH_VARARGS, fibonacci_docs}, 
    {NULL, NULL, 0, NULL} 
}; 
 
 
static struct PyModuleDef fibonacci_module_definition = { 
    PyModuleDef_HEAD_INIT, 
    "fibonacci", 
    "Extension module that provides fibonacci sequence function", 
    -1, 
    fibonacci_module_methods 
}; 
 
 
PyMODINIT_FUNC PyInit_fibonacci(void) {
    return PyModule_Create(&fibonacci_module_definition);
}

I know what you are thinking. The preceding example might be a bit overwhelming at first glance. We had to add four times more code just to make the fibonacci() C function accessible from Python. We will discuss every bit of that code step by step later, so don't worry. But before we do that, let's see how it can be packaged and executed in Python.

The following minimal setuptools configuration for our module uses the setuptools.Extension class in order to describe how our extension should be compiled:

from setuptools import setup, Extension 
 
setup( 
    name='fibonacci', 
    ext_modules=[ 
        Extension('fibonacci', ['fibonacci.c']), 
    ] 
) 

The build process for extensions can be initialized with the setup.py build command, but it will also be automatically performed upon package installation. The following transcript presents the result of the installation in editable mode (using pip with the -e flag):

$ python3 -m pip install -e .
Obtaining file:///Users/.../Expert-Python-Programming-Fourth-Edition/Chapter%209/02%20-%20Pure%20C%20extensions
Installing collected packages: fibonacci
  Running setup.py develop for fibonacci
Successfully installed fibonacci

Using the editable mode of pip allows us to take a peek at files created during the build step. The following is an example of files that could be created in your working directory during the installation:

$ ls -1ap
./
../
build/
fibonacci.c
fibonacci.cpython-39-darwin.so
fibonacci.egg-info/
setup.py

The fibonacci.c and setup.py files are our source files. fibonacci.egg-info/ is a special directory that stores package metadata, and we should not be concerned about it at the moment. What is really important is the fibonacci.cpython-39-darwin.so file. This is our binary shared library that is compatible with the CPython interpreter. That's the library that the Python interpreter will load when we attempt to import our fibonacci module. Let's try to import it and review it in an interactive session:

$ python3
Python 3.9.1 (v3.9.1:1e5d33e9b9, Dec  7 2020, 12:10:52)
[Clang 6.0 (clang-600.0.57)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import fibonacci
>>> help(fibonacci)
Help on module fibonacci:
NAME
    fibonacci - Extension module that provides fibonacci sequence function
FUNCTIONS
    fibonacci(...)
        fibonacci(n): Return nth Fibonacci sequence number computed recursively
FILE
    /(...)/fibonacci.cpython-39-darwin.so
>>> [fibonacci.fibonacci(n) for n in range(10)]
[0, 1, 1, 2, 3, 5, 8, 13, 21, 34]

Now let's take a closer look at the anatomy of our extension.

A closer look at the Python/C API

Since we know how to properly package, compile, and install a custom C extension and we are sure that it works as expected, now is the right time to discuss our code in detail.

The extension module starts with the following C preprocessor directives, which define the PY_SSIZE_T_CLEAN macro (making argument-parsing functions use Py_ssize_t for size values) and include the Python.h header file:

#define PY_SSIZE_T_CLEAN
#include <Python.h>

Including Python.h pulls in the whole Python/C API and is all you need to be able to write your extensions. In more realistic cases, your code will require a lot more preprocessor directives to benefit from the C standard library functions or to integrate other source files. Our example was simple, so no more directives were required.
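
For illustration only, a more realistic extension might start with something like the following, where my_library.h is a hypothetical header of a wrapped C library:

#define PY_SSIZE_T_CLEAN
#include <Python.h>

#include <stdlib.h>
#include <string.h>

#include "my_library.h"  /* hypothetical header of the wrapped C library */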

Next, we have the core of our module as follows:

long long fibonacci(unsigned int n) {
    if (n == 0) {
        return 0;
    } else if (n == 1) {
        return 1;
    } else {
        return fibonacci(n - 2) + fibonacci(n - 1);
    }
}

The preceding fibonacci() function is the only part of our code that does something useful. It is a pure C implementation that Python by default can't understand. The rest of our example will create the interface layer that will expose it through the Python/C API.

The first step of exposing this code to Python is the creation of a C function that is compatible with the CPython interpreter. In Python, everything is an object. This means that C functions called from Python also need to return real Python objects. The Python/C API provides a PyObject type, and every callable must return a pointer to it. The signature of our function is as follows:

static PyObject* fibonacci_py(PyObject* self, PyObject* args)

Note that the preceding signature does not specify the exact list of arguments, but PyObject* args will hold the pointer to the structure that contains the tuple of the provided values.

The actual validation of the argument list must be performed inside the function body, and this is exactly what fibonacci_py() does. It parses the args argument list, assuming it contains a single integer value, and uses that value as an argument to the fibonacci() function to retrieve the Fibonacci sequence element, as shown in the following code:

static PyObject* fibonacci_py(PyObject* self, PyObject* args) { 
    PyObject *result = NULL; 
    long n; 
 
    if (PyArg_ParseTuple(args, "l", &n)) { 
        result = Py_BuildValue("L", fibonacci((unsigned int)n)); 
    } 
 
    return result; 
} 

The preceding example function has a serious bug, which the eyes of an experienced developer should spot very easily. Try to find it as an exercise in working with C extensions. For now, we'll leave it as it is for the sake of brevity. We will try to fix it later when discussing the details of dealing with errors and exceptions in the Exception handling section.

The "l" (lowercase L) string in the PyArg_ParseTuple(args, "l", &n) call means that we expect args to contain only a single long value. In the case of failure, it will return NULL and store information about the exception in the per-thread interpreter state.

The actual signature of the parsing function is int PyArg_ParseTuple(PyObject *args, const char *format, ...) and what goes after the format string is a variable-length list of arguments that represents parsed value output (as pointers). This is analogous to how the scanf() function from the C standard library works. If our assumption fails and the user provides an incompatible arguments list, then PyArg_ParseTuple() will raise the proper exception. This is a very convenient way to encode function signatures once you get used to it but has a huge downside when compared to plain Python code. Such Python call signatures implicitly defined by the PyArg_ParseTuple() calls cannot be easily inspected inside the Python interpreter. You need to remember this fact when using the code provided as extensions.
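
As a hedged illustration (not part of our fibonacci module), parsing several positional arguments with different format units could look as follows; the "|" marker makes everything after it optional:

int count;
const char *name;
double ratio = 1.0;  /* default used when the optional argument is omitted */

/* "is|d" expects an int, a C string, and optionally a double */
if (!PyArg_ParseTuple(args, "is|d", &count, &name, &ratio)) {
    return NULL;
}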

As already said, Python expects objects to be returned from callables. This means that we cannot return a raw long value obtained from the fibonacci() function as a result of fibonacci_py(). Such an attempt would not even compile and there is no automatic casting of basic C types to Python objects.

The Py_BuildValue(const char *format, ...) function must be used instead. It is the counterpart of PyArg_ParseTuple() and accepts a similar set of format strings. The main difference is that the list of arguments is not a function output but an input, so actual values must be provided instead of pointers.
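
For illustration, here are a few typical Py_BuildValue() calls and the Python objects they produce (these examples are not part of our module):

PyObject *num = Py_BuildValue("i", 42);                /* int: 42 */
PyObject *txt = Py_BuildValue("s", "hello");           /* str: "hello" */
PyObject *tup = Py_BuildValue("(is)", 42, "hello");    /* tuple: (42, "hello") */
PyObject *map = Py_BuildValue("{s:i}", "answer", 42);  /* dict: {"answer": 42} */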

After fibonacci_py() is defined, most of the heavy work is done. The last step is to perform module initialization and add metadata to our function that will make usage a bit simpler for the users. This is the boilerplate part of our extension code. For simple examples, such as this one, it can take up more space than the actual functions that we want to expose. In most cases, it simply consists of some static structures and one initialization function that will be executed by the interpreter on module import.

At first, we create a static string that will be the content of the Python docstring for the fibonacci_py() function as follows:

static char fibonacci_docs[] = 
    "fibonacci(n): Return nth Fibonacci sequence number " 
    "computed recursively\n"; 

Note that this could be inlined somewhere later in fibonacci_module_methods, but it is a good practice to have docstrings separated and stored in close proximity to the actual function definition that they refer to.

The next part of our definition is the array of the PyMethodDef structures that define methods (functions) that will be available in our module. The PyMethodDef structure contains exactly four fields:

  • char* ml_name: This is the name of the method.
  • PyCFunction ml_meth: This is the pointer to the C implementation of the function.
  • int ml_flags: This includes the flags indicating either the calling convention or binding convention. The latter is applicable only for the definition of class methods.
  • char* ml_doc: This is the pointer to the content of the method/function docstring.

Such an array must always end with a sentinel value of {NULL, NULL, 0, NULL}. This sentinel value simply indicates the end of the structure. In our simple case, we created the static PyMethodDef fibonacci_module_methods[] array that contains only two elements (including sentinel value):

static PyMethodDef fibonacci_module_methods[] = { 
    {"fibonacci", (PyCFunction)fibonacci_py, 
     METH_VARARGS, fibonacci_docs}, 
    {NULL, NULL, 0, NULL} 
};

And this is how the first entry maps to the PyMethodDef structure:

  • ml_name = "fibonacci": Here, the fibonacci_py() C function will be exposed as a Python function under the fibonacci name.
  • ml_meth = (PyCFunction)fibonacci_py: Here, the casting to PyCFunction is simply required by the Python/C API and is dictated by the call convention defined later in ml_flags.
  • ml_flags = METH_VARARGS: Here, the METH_VARARGS flag indicates that the calling convention of our function accepts a variable list of arguments and no keyword arguments.
  • ml_doc = fibonacci_docs: Here, the Python function will be documented with the content of the fibonacci_docs string.

When an array of function definitions is complete, we can create another structure that contains the definition of the whole module. It is described using the PyModuleDef type and contains multiple fields. Some of them are useful only for more complex scenarios, where fine-grained control over the module initialization process is required. Here, we are interested only in the first five of them:

  • PyModuleDef_Base m_base: This should always be initialized with PyModuleDef_HEAD_INIT.
  • char* m_name: This is the name of the newly created module. In our case, it is fibonacci.
  • char* m_doc: This is the pointer to the docstring content for the module. We usually have only a single module defined in one C source file, so it is OK to inline our documentation string in the whole structure.
  • Py_ssize_t m_size: This is the size of the memory allocated to keep the module state. This is used only when support for multiple subinterpreters or multiphase initialization is required. In most cases, you don't need that and it gets the value -1.
  • PyMethodDef* m_methods: This is a pointer to the array containing module-level functions described by the PyMethodDef values. It could be NULL if the module does not expose any functions. In our case, it is fibonacci_module_methods.

The other fields are explained in detail in the official Python documentation (refer to https://docs.python.org/3/c-api/module.html) but are not needed in our example extension. They should be set to NULL if not required and they will be initialized with that value implicitly when not specified. This is why our module description contained in the fibonacci_module_definition variable can take the following simple form:

static struct PyModuleDef fibonacci_module_definition = { 
    PyModuleDef_HEAD_INIT, 
    "fibonacci", 
    "Extension module that provides fibonacci sequence function", 
    -1, 
    fibonacci_module_methods 
}; 

The last piece of code that crowns our work is the module initialization function. This must follow a very specific naming convention, so the Python interpreter can easily find it when the dynamic/shared library is loaded. It should be named PyInit_<name>, where <name> is the name of your module. So it is exactly the same string that was used as the m_name field in the PyModuleDef definition and as the first argument of the setuptools.Extension() call. If you don't require a complex initialization process for the module, it takes a very simple form, exactly like in our example:

PyMODINIT_FUNC PyInit_fibonacci(void) { 
    return PyModule_Create(&fibonacci_module_definition); 
} 

PyMODINIT_FUNC is a preprocessor macro that declares the return type of this initialization function as PyObject* and adds any special linkage declarations required by the platform.

One very important difference between Python and C functions is the calling and binding conventions. This is quite a verbose topic, so let's discuss that in a separate section.

Calling and binding conventions

Python is an object-oriented language with flexible calling conventions using both positional and keyword arguments. Consider the following print() function call:

print("hello", "world", sep=" ", end="!\n")

The first two expressions provided to the call (the "hello" and "world" expressions) are positional and will be matched with the positional arguments of the print() function. Order is important, and if we modify it, the function call will give a different result. On the other hand, the following " " and "!\n" expressions will be matched with keyword arguments. Their order is irrelevant as long as the names don't change.

C is a procedural language with only positional arguments. When writing Python extensions, there is a need to support Python's argument flexibility and object-oriented data model. That is done mostly through the explicit declaration of supported calling and binding conventions.

As explained in the A closer look at the Python/C API section, the ml_flags bit field of the PyMethodDef structure contains flags for calling and binding conventions. Calling convention flags are as follows:

  • METH_VARARGS: This is a typical convention for a Python function or method that accepts only positional arguments. The type provided as the ml_meth field for such a function should be PyCFunction. The function will be provided with two arguments of the PyObject* type. The first is either the self object (for methods) or the module object (for module functions). A typical signature for the C function with that calling convention is PyObject* function(PyObject* self, PyObject* args).
  • METH_KEYWORDS: This is the convention for the Python function that accepts keyword arguments when called. Its associated C type is PyCFunctionWithKeywords. The C function must accept three arguments of the PyObject* type — self, args, and a dictionary of keyword arguments. If combined with METH_VARARGS, the first two arguments have the same meaning as for the previous calling convention, otherwise, args will be NULL. The typical C function signature is PyObject* function(PyObject* self, PyObject* args, PyObject* keywds).
  • METH_NOARGS: This is the convention for Python functions that do not accept any arguments. The C function should be of the PyCFunction type, so the signature is the same as that of the METH_VARARGS convention (with self and args arguments). The only difference is that args will always be NULL, so there is no need to call PyArg_ParseTuple(). This cannot be combined with any other calling convention flag.
  • METH_O: This is the shorthand for functions and methods accepting single object arguments. The type of the C function is again PyCFunction, so it accepts two PyObject* arguments: self and args. Its difference from METH_VARARGS is that there is no need to call PyArg_ParseTuple() because PyObject* provided as args will already represent the single argument provided in the Python call to that function. This also cannot be combined with any other calling convention flag.

A function that accepts keywords is described either with METH_KEYWORDS or bitwise combinations of calling convention flags in the form of METH_VARARGS | METH_KEYWORDS. If so, it should parse its arguments with PyArg_ParseTupleAndKeywords() instead of PyArg_ParseTuple() or PyArg_UnpackTuple().

Here is an example module with a single function that returns None and accepts two named arguments that are printed on standard output:

#define PY_SSIZE_T_CLEAN
#include <Python.h> 
 
static PyObject* print_args(PyObject *self, PyObject *args, 
 PyObject *keywds) 
{ 
    char *first; 
    char *second; 
 
    static char *kwlist[] = {"first", "second", NULL}; 
 
    if (!PyArg_ParseTupleAndKeywords(args, keywds, "ss", kwlist, 
                                     &first, &second)) 
        return NULL; 
 
    printf("%s %s\n", first, second); 
 
    Py_INCREF(Py_None); 
    return Py_None; 
} 
 
 
static PyMethodDef module_methods[] = { 
    {"print_args", (PyCFunction)print_args, 
     METH_VARARGS | METH_KEYWORDS, 
     "print provided arguments"}, 
    {NULL, NULL, 0, NULL} 
}; 
 
 
static struct PyModuleDef module_definition = { 
    PyModuleDef_HEAD_INIT, 
    "kwargs", 
    "Keyword argument processing example", 
    -1, 
    module_methods 
}; 
 
 
PyMODINIT_FUNC PyInit_kwargs(void) { 
    return PyModule_Create(&module_definition); 
}

Argument parsing in the Python/C API is very flexible and is extensively described in the official documentation at https://docs.python.org/3/c-api/arg.html.

The format argument in PyArg_ParseTuple() and PyArg_ParseTupleAndKeywords() allows fine-grained control over the argument number and types. Every advanced calling convention known from Python can be coded in C with this API, including the following:

  • Functions with default values for arguments
  • Functions with arguments specified as keyword-only
  • Functions with arguments specified as positional-only
  • Functions with a variable number of arguments
  • Functions without arguments
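
As a hedged sketch covering some of the points above, the "|" marker in a format string starts the optional arguments and "$" starts the keyword-only arguments (the argument names used here are purely hypothetical):

static char *kwlist[] = {"path", "mode", "buffering", NULL};
const char *path;
const char *mode = "r";  /* optional */
int buffering = -1;      /* optional and keyword-only */

if (!PyArg_ParseTupleAndKeywords(args, keywds, "s|s$i", kwlist,
                                 &path, &mode, &buffering)) {
    return NULL;
}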

The additional binding convention flags METH_CLASS, METH_STATIC, and METH_COEXIST are reserved for methods and cannot be used to describe module functions. The first two are quite self-explanatory. They are C counterparts of the @classmethod and @staticmethod decorators and change the meaning of the self argument passed to the C function.

METH_COEXIST allows loading a method in place of an existing definition. It is rarely useful, mostly when you would like to provide a C implementation of a method that would otherwise be generated automatically from other features of the defined type. The Python documentation gives the example of the __contains__() wrapper method that would be generated if the type has the sq_contains slot defined. Unfortunately, defining your own classes and types using the Python/C API is beyond the scope of this introductory chapter.

Let's take a look at exception handling in the next section.

Exception handling

C, unlike Python or even C++, does not have syntax for raising and catching exceptions. All error handling is usually handled with function return values and optional global state for storing details that can explain the cause of the last failure.

Exception handling in the Python/C API is built around that simple principle. There is a global per-thread indicator of the last error that occurred. It is set to describe the cause of a problem. There is also a standardized way to inform the caller of a function if this state was changed during the call, for example:

  • If the function is supposed to return a pointer, it returns NULL
  • If the function is supposed to return a value of type int, it returns -1

The only exceptions to the preceding rules in the Python/C API are the PyArg_*() functions, which return 1 to indicate success and 0 to indicate failure.

To see how this works in practice, let's recall our fibonacci_py() function from the example in the previous sections:

static PyObject* fibonacci_py(PyObject* self, PyObject* args) { 
    PyObject *result = NULL; 
    long n; 
 
    if (PyArg_ParseTuple(args, "l", &n)) {
        result = Py_BuildValue("L", fibonacci((unsigned int) n)); 
    } 
 
    return result; 
} 

Error handling starts at the very beginning of our function with the initialization of the result variable. This variable is supposed to store the return value of our function. It is initialized with NULL, which, as we already know, is an indicator of error. And this is how you will usually code your extensions—assuming that error is the default state of your code.

Later we have the PyArg_ParseTuple() call that will set error information in the case of an exception and return 0. This is part of the if statement, so in the case of an exception, we don't do anything more and the function will return NULL. Whoever calls our function will be notified about the error.

Py_BuildValue() can also raise an exception. It is supposed to return PyObject* (pointer), so in the case of failure, it gives NULL. We can simply store this as our result variable and pass it on as a return value.

But our job does not end with caring for exceptions raised by Python/C API calls. It is very probable that you will need to inform the extension user about what kind of error or failure occurred. The Python/C API has multiple functions that help you to raise an exception but the most common one is PyErr_SetString(). It sets an error indicator with the given exception type and with the additional string provided as the explanation of the error cause. The full signature of this function is as follows:

void PyErr_SetString(PyObject* type, const char* message)

You may have already noticed the problematic issue in the fibonacci_py() function from the A closer look at the Python/C API section. If not, now is the right time to uncover it and fix it. Fortunately, we finally have the proper tools to do that.

The problem lies in the insecure casting of the long type to unsigned int in the following lines:

if (PyArg_ParseTuple(args, "l", &n)) { 
    result = Py_BuildValue("L", fibonacci((unsigned int) n)); 
} 

Thanks to the PyArg_ParseTuple() call, the first and only argument will be interpreted as a long type (the "l" specifier) and stored in the local n variable. Then it is cast to unsigned int so the issue will occur if the user calls the fibonacci() function from Python with a negative value. For instance, -1 as a signed 32-bit integer will be interpreted as 4294967295 when casting to an unsigned 32-bit integer. Such a value will cause a very deep recursion and will result in a stack overflow and segmentation fault. Note that the same may happen if the user gives an arbitrarily large positive argument. We cannot fix this without a complete redesign of the C fibonacci() function, but we can at least try to ensure that the function input argument meets some preconditions. Here, we check whether the value of the n argument is greater than or equal to 0 and we raise a ValueError exception if that's not true, as follows:

static PyObject* fibonacci_py(PyObject* self, PyObject* args) { 
    PyObject *result = NULL; 
    long n; 
    long long fib; 
 
    if (PyArg_ParseTuple(args, "l", &n)) { 
        if (n<0) { 
            PyErr_SetString(PyExc_ValueError, 
                            "n must not be less than 0"); 
        } else { 
            result = Py_BuildValue("L", fibonacci((unsigned int) n)); 
        } 
    } 
 
    return result; 
} 

The last note about exception handling is that the global error state does not clear by itself. Some of the errors can be handled gracefully in your C functions (the same as using the try ... except clause in Python) and you need to be able to clear the error indicator if it is no longer valid. The function for that is PyErr_Clear().
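
A minimal hedged sketch of such graceful handling (not related to our fibonacci module) could look as follows, assuming we want to read an optional attribute and fall back to a default when it is missing or invalid:

static long get_optional_flag(PyObject* obj) {
    PyObject* attr = PyObject_GetAttrString(obj, "flag");
    long value = 0;  /* default used when the attribute is missing */

    if (attr == NULL) {
        PyErr_Clear();  /* discard the AttributeError and carry on */
        return value;
    }
    value = PyLong_AsLong(attr);
    Py_DECREF(attr);  /* PyObject_GetAttrString() returned a new reference */
    if (value == -1 && PyErr_Occurred()) {
        PyErr_Clear();  /* the attribute was not an integer: use the default */
        value = 0;
    }
    return value;
}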

One of the great advantages of C extensions is the ability to bypass the GIL, a limitation that can be detrimental to threaded concurrency in Python applications. In the next section, we will discuss the possibility of releasing the GIL in C extensions.

Releasing GIL

We have already mentioned that extensions can be a way to bypass Python's GIL. It is a famous limitation of the CPython implementation that only one thread at a time can execute the Python code. Multiprocessing is the suggested approach to circumvent this problem (see Chapter 6, Concurrency) but it may not be the best solution for some highly parallelizable algorithms, due to the resource overhead of running additional processes.

Because extensions are mostly used in cases where a bigger part of the work is performed in pure C without any calls to the Python/C API, it is possible (or even advisable) to release GIL in some application sections while doing non-Python data processing. Thanks to this, you can still benefit from having multiple CPU cores and multithreaded application designs. The only thing you need to do is to wrap blocks of code that are known to not use any of the Python/C API calls or Python structures with specific macros provided by the Python/C API. These two following preprocessor macros are provided to simplify the whole procedure of releasing and reacquiring the GIL:

  • Py_BEGIN_ALLOW_THREADS: This declares the hidden local variable where the current thread state is saved and it releases GIL.
  • Py_END_ALLOW_THREADS: This reacquires GIL and restores the thread state from the local variable declared with the previous macro.

When we look carefully at our fibonacci extension example, we can clearly see that the fibonacci() function does not execute any Python code and does not touch any of the Python structures. This means that the fibonacci_py() function that simply wraps the fibonacci(n) execution could be updated to release GIL around that call as follows:

static PyObject* fibonacci_py(PyObject* self, PyObject* args) { 
    PyObject *result = NULL; 
    long n; 
    long long fib; 
 
    if (PyArg_ParseTuple(args, "l", &n)) { 
        if (n<0) { 
            PyErr_SetString(PyExc_ValueError, 
                            "n must not be less than 0"); 
        } else { 
            Py_BEGIN_ALLOW_THREADS; 
            fib = fibonacci(n); 
            Py_END_ALLOW_THREADS; 
 
            result = Py_BuildValue("L", fib); 
        }
    }
 
    return result; 
}

Another important topic regarding the Python/C API is memory management and garbage collection. The most common garbage collection mechanism among dynamic programming languages is tracing garbage collection, which works by tracing whether objects can be reached from a program's root reference. If objects become unreachable, they can be released from program memory to reclaim memory space.

Python has a minimal tracing garbage collector for finding reference cycles but in fact uses reference counting as a main memory management mechanism. That's not a problem in plain Python code but adds some substantial work when writing C extensions. Let's dive deeper into this topic in the next section.

Tracing garbage collection is such a common garbage collection strategy that it is often treated as a synonym to garbage collection. That's why some people argue that Python isn't garbage collected (because it uses reference counting as the main memory management technique) and others argue that it is (because it uses tracing for finding reference cycles and reference counting can be understood as an alternative garbage collection strategy).

Reference counting

Finally, we come to the important topic of memory management in Python. Python has its own garbage collector, but it is designed only to solve the issue of cyclic references in the reference counting algorithm. Reference counting is the primary method of managing the deallocation of objects that are no longer needed.

The Python/C API documentation introduces ownership of references to explain how it deals with the deallocation of objects. Objects in Python are never owned by extension code and thus cannot be created or released by extensions themselves. The actual creation of objects is managed by Python's memory manager. That's why we say that objects in Python are owned by the memory manager.

The memory manager is the internal component of the CPython interpreter that is the only one responsible for allocating and deallocating memory for objects that are stored in a private heap. What can be owned instead is a reference to the object.

Every object in Python that is represented by a reference (PyObject* pointer) has an associated reference count. When it goes to zero, it means that no one holds any valid references to that object and the deallocator associated with its type can be invoked. The Python/C API provides a few macros for increasing and decreasing reference counts:

  • Py_INCREF() and Py_DECREF(): The first one increases the reference count and the second one decreases it. These macros accept object pointers that must not be NULL.
  • Py_XINCREF() and Py_XDECREF(): The first one increases the reference count and the second one decreases it. These macros accept NULL values so you should use them whenever you are not sure if you are dealing with NULL pointers.

But before we discuss their details, we need to understand the following terms related to reference ownership:

  • Passing of ownership: Whenever we say that the function passes the ownership over a reference, it means that it has already increased the reference count and it is the responsibility of the caller to decrease the count when the reference to the object is no longer needed. Most of the functions that return the newly created objects, such as Py_BuildValue, are doing that. If that object is going to be returned from our function to another caller, then the ownership is passed again. We do not decrease the reference count in that case because it is no longer our responsibility. This is why the fibonacci_py() function does not call Py_DECREF() on the result variable.
  • Borrowed references: The borrowing of references happens when the function receives a reference to some Python object as an argument. The reference count for such a reference should never be decreased in that function unless it was explicitly increased in its scope. In our fibonacci_py() function, the self and args arguments are such borrowed references and thus we do not call Py_DECREF() on them. Some of the Python/C API functions may also return borrowed references. The notable examples are PyTuple_GetItem() and PyList_GetItem(). It is often said that such references are unprotected. There is no need to dispose of their ownership unless they will be returned as a function's return value. In most cases, extra care should be taken if we use such borrowed references as arguments of other Python/C API calls. It may be necessary in some circumstances to additionally protect such references with a separate Py_INCREF() call before using them as arguments to other functions and then calling Py_DECREF() when they are no longer needed. We'll see an example of such a situation at the end of the section.
  • Stolen references: It is also possible for the Python/C API function to steal the reference instead of borrowing it when provided as a call argument. This is the case of exactly two functions—PyTuple_SetItem() and PyList_SetItem(). They fully take over the responsibility of the reference passed to them. They do not increase the reference count by themselves but will call Py_DECREF() when the reference is no longer needed.
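
To illustrate passing of ownership and stolen references together, the following hedged sketch (not part of our module) builds a two-element tuple; error checks for PyLong_FromLong() are omitted for brevity:

PyObject *pair = PyTuple_New(2);  /* we own the new reference to the tuple */
if (pair == NULL) {
    return NULL;
}
/* PyTuple_SetItem() steals the references to the values, so no Py_DECREF()
   calls are needed for them */
PyTuple_SetItem(pair, 0, PyLong_FromLong(1L));
PyTuple_SetItem(pair, 1, PyLong_FromLong(2L));
return pair;  /* ownership of the tuple is passed on to our caller */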

Keeping an eye on the reference counts is one of the hardest things when writing complex extensions. Some of the non-obvious issues may not be noticed until the code is run in a multithreaded setup.

The other common problem is caused by the very nature of Python's object model and the fact that some functions return borrowed references. When the reference count goes to zero, the deallocation function is executed. For user-defined classes, it is possible to define a __del__() method that will be called at that moment.

This can be any Python code and it is possible that it will affect other objects and their reference counts. The official Python documentation gives the following example of code that may be affected by this problem:

void bug(PyObject *list) { 
    PyObject *item = PyList_GetItem(list, 0); 
 
    PyList_SetItem(list, 1, PyLong_FromLong(0L)); 
    PyObject_Print(item, stdout, 0); /* BUG! */ 
}

It looks completely harmless, but the problem is in fact that we cannot know what elements the list object contains. When PyList_SetItem() sets a new value at the list[1] index, the ownership of the object that was previously stored at that index is disposed of. If it was the only existing reference, the reference count will become 0 and the object may be deallocated. It is possible that it was some user-defined class with a custom implementation of the __del__() method. A serious issue will occur if, as a result of such a __del__() execution, the item at list[0] is removed from the list.

Note that PyList_GetItem() returns a borrowed reference! It does not call Py_INCREF() before returning a reference. So in that code, it is possible that PyObject_Print() will be called with a reference to an object that no longer exists. This will cause a segmentation fault and crash the Python interpreter.

The proper approach is to protect borrowed references for the whole time that we need them because there is a possibility that any call in between may cause the deallocation of that object. This can happen even if they are seemingly unrelated, as shown in the following code:

void no_bug(PyObject *list) { 
    PyObject *item = PyList_GetItem(list, 0); 
 
    Py_INCREF(item); 
    PyList_SetItem(list, 1, PyLong_FromLong(0L)); 
    PyObject_Print(item, stdout, 0); 
    Py_DECREF(item); 
}

As you can see, writing Python extensions in C using the Python/C API can be a challenge, especially if you are not experienced with C. It requires a lot of knowledge about CPython internals and precise memory management. Fortunately, there's an easier path to custom extensions: Cython, a special dialect of Python. We will discuss it in the next section.

Writing extensions with Cython

Cython is both an optimizing static compiler and the name of a programming language that is a superset of Python. It can be used to speed up Python applications by compiling them to machine code but can also be used as a "wrapping language" for code written in C or C++.

As a compiler, it performs the source-to-source compilation of native Python code and Cython dialect to Python C extensions using the Python/C API. It allows you to combine the power of Python and C without the need to manually deal with the Python/C API.

As a superset of Python, it offers the ability to use static typing, static linking of C libraries (as opposed to dynamic linking of shared libraries), the ability to interact with C header files, and direct control over CPython's GIL.

Let's first discuss Cython as a source-to-source compiler.

Cython as a source-to-source compiler

For extensions created using Cython, the major advantage you will get is the superset language that it provides. Nevertheless, it is possible to create extensions from plain Python code using source-to-source compilation. This is the simplest approach to Cython because it requires almost no changes to the code and can give some significant performance improvements with very little effort.

To begin with, in order to build Cython extensions you will need the Cython package. It can be installed from PyPI using pip:

$ python3 -m pip install Cython

Cython provides a simple cythonize utility function that allows you to easily integrate the compilation process with the setuptools package. Let's assume that we would like to compile a pure Python implementation of our fibonacci() function to a C extension. If it is located in the fibonacci.py module, the minimal setup.py script could be as follows:

from setuptools import setup 
from Cython.Build import cythonize 
 
setup( 
    name='fibonacci', 
    ext_modules=cythonize(['fibonacci.py']) 
)

You can install such a module with pip the same way as you would do with a plain C extension:

$ python3 -m pip install -e .
Installing collected packages: fibonacci
  Running setup.py develop for fibonacci
Successfully installed fibonacci

The above command installs the package in editable mode so we can take a look at all files generated in the process. If you execute it in your own shell, you can see it creates some additional build artifacts:

$ ls -1ap
./
../
build/
fibonacci.c
fibonacci.cpython-39-darwin.so
fibonacci.egg-info/
fibonacci.py
setup.py

fibonacci.c in the preceding output is autogenerated C extension code. Cython translates the plain Python code into raw C code. During installation, this C code will be used to build the extension module library. In our case, it is the fibonacci.cpython-39-darwin.so file.

You can take a look at the fibonacci.c file to see how much work Cython does behind the curtain. It is actually pretty long. For our simple fibonacci.py module, it can even be over 4000 lines long.

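If browsing thousands of lines of generated C does not sound appealing, Cython can also produce an annotated HTML report that highlights how much Python/C API interaction hides behind every source line. Assuming you use the cython command-line tool that is installed together with the Cython package, the invocation could look as follows:

$ cython -a fibonacci.py

This should create a fibonacci.html file next to the generated C code; the more intense the highlighting of a line, the more Python/C API calls it translates to.
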
Cython, when used as a source compilation tool for the Python language, has another benefit. Source-to-source compilation to an extension can be a fully optional part of the source distribution installation process. If the environment where the package needs to be installed does not have Cython or any other building prerequisites, it can be installed as a normal pure Python package. The user should not notice any functional difference in the behavior of code distributed that way. A common approach for distributing extensions built with Cython is to include both Python/Cython sources and C code that would be generated from these source files.

This way, the package can be installed in the following three different ways, depending on the existence of building prerequisites:

  • If the installation environment has Cython available, the extension C code is generated from the Python/Cython sources that are provided.
  • If Cython is not available but there are available building prerequisites (C compiler, Python/C API headers), the extension is built from distributed pregenerated C files.
  • If neither of the preceding is available but the extension is created from pure Python sources, the modules are installed like ordinary Python code, and the compilation step is skipped.

Note that the Cython documentation says that including generated C files as well as Cython sources is the recommended way of distributing Cython extensions. The same documentation says that Cython compilation should be disabled by default because the user may not have the required version of Cython in their environment, and this may result in unexpected compilation issues.

You can read more about the official guidelines on distributing Cython code at https://cython.readthedocs.io/src/userguide/source_files_and_compilation.html.

Anyway, with the advent of environment isolation, this seems to be a less worrying problem today. Also, Cython is a valid Python package that is available on PyPI, so it can easily be defined as your project requirement in a specific version. Including such a prerequisite is, of course, a decision with serious implications and should be considered very carefully. The safer solution is to leverage the power of the extras_require feature in the setuptools package and allow the user to decide whether they want to use Cython with a specific environment variable, for example:

import os 
 
from setuptools import setup, Extension
try: 
    # cython source to source compilation
    # available only when Cython is available
    # and specific environment variable says 
    # explicitly that Cython should be used 
    # to generate C sources 
    USE_CYTHON = bool(os.environ.get("USE_CYTHON")) 
    import Cython
except ImportError: 
    USE_CYTHON = False 
 
ext = '.pyx' if USE_CYTHON else '.c' 
 
extensions = [Extension("fibonacci", ["fibonacci"+ext])] 
 
if USE_CYTHON: 
    from Cython.Build import cythonize 
    extensions = cythonize(extensions) 
 
setup( 
    name='fibonacci', 
    ext_modules=extensions, 
    extras_require={ 
        # Cython will be set in that specific version 
        # as a requirement if package will be installed 
        # with '[with-cython]' extra feature 
        'with-cython': ['cython==0.29.22'] 
    } 
) 

The pip installation tool supports the installation of packages with the extras option by adding the [extra-name] suffix to the package name. For the preceding example, the optional Cython requirement and compilation during the installation from local sources can be enabled using the following command:

$ USE_CYTHON=1 pip install .[with-cython]

The USE_CYTHON environment variable tells the setup script to use Cython to compile the .pyx sources to C, and the [with-cython] extra ensures that the Cython compiler will actually be installed before the build starts.

Although you can use Cython to compile plain Python code, you will get the most benefit from using the Cython dialect. It has a few additional features that are not available in plain Python. We will take a closer look at Cython as a separate language in the next section.

Cython as a language

Cython is not only a compiler but also a superset of the Python language. Superset means that any valid Python code is allowed, and it can be further enhanced with additional features, such as support for calling C functions or declaring C types on variables and class attributes. So, any valid Python code is also valid Cython code, but the reverse is not true. This explains why ordinary Python modules can be so easily compiled to C using the Cython compiler.

But we won't stop at that simple fact. Instead of just saying that our reference fibonacci() function is also Cython code, we will try to improve it a bit. This won't be any real optimization because we still want to implement our Fibonacci sequence recursively. But we will do some minor updates that will allow it to benefit more from being written in Cython.

Cython sources use a different file extension. It is .pyx instead of .py. The content of the fibonacci.pyx file might look like this:

"""Cython module that provides fibonacci sequence function."""
def fibonacci(unsigned int n):
    """Return nth Fibonacci sequence number computed recursively."""
    if n == 0:
        return 0
    if n == 1:
        return 1
    else:
        return fibonacci(n - 1) + fibonacci(n - 2)

As you can see, the only thing that has really changed is the signature of the fibonacci() function. Thanks to optional static typing in Cython, we can declare the n argument as unsigned int, which should slightly improve the way our function works. Additionally, Cython does a lot more for us than we did previously when writing the extension by hand. If the argument of a Cython function is declared with a static type, the extension will automatically handle conversion and overflow errors by raising proper exceptions. The following is an example of an interactive session showing how our fibonacci() function written in Cython deals with conversion and overflow errors:

>>> from fibonacci import fibonacci
>>> fibonacci(5)
5
>>> fibonacci(0)
0
>>> fibonacci(-1)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "fibonacci.pyx", line 4, in fibonacci.fibonacci
    def fibonacci(unsigned int n):
OverflowError: can't convert negative value to unsigned int
>>> fibonacci(10 ** 10)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "fibonacci.pyx", line 4, in fibonacci.fibonacci
    def fibonacci(unsigned int n):
OverflowError: value too large to convert to unsigned int

We already know that Cython compiles only source to source and that the generated code uses the same Python/C API that we would use when writing C code for extensions by hand. Note that fibonacci() is a recursive function, so it calls itself very often. This means that although we declared a static type for the input argument, during the recursive call it will treat itself like any other Python function. So n - 1 and n - 2 will be packed back into Python objects and then passed through the hidden wrapper layer of the internal fibonacci() implementation, which will again convert them back to the unsigned int type. This will happen again and again until we reach the final depth of recursion. This is not necessarily a problem but involves a lot more argument processing than is really required.

We can cut off the overhead of Python function calls and argument processing by delegating more of the work to the pure C function that does not know anything about Python structures. We did this previously when creating C extensions with pure C and we can do that in Cython too. We can use the cdef keyword to declare C-style functions that accept and return only C types as follows:

cdef long long fibonacci_cc(unsigned int n):
    if n == 0:
        return 0
    if n == 1:
        return 1
    else:
        return fibonacci_cc(n - 1) + fibonacci_cc(n - 2)
def fibonacci(unsigned int n):
    """ Return nth Fibonacci sequence number computed recursively
    """
    return fibonacci_cc(n)

The fibonacci_cc() function will not be available to import in the final compiled fibonacci module. The fibonacci() function forms a façade to the low-level fibonacci_cc() implementation.

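If you rebuild the extension from the updated fibonacci.pyx file, you can verify that in an interactive session. The following transcript shows what you can expect to see:

>>> import fibonacci
>>> fibonacci.fibonacci(10)
55
>>> fibonacci.fibonacci_cc(10)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: module 'fibonacci' has no attribute 'fibonacci_cc'
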
We can go even further. With the plain C example, we showed how to release the GIL during the call of our pure C function, so that the extension was a bit nicer for multithreaded applications. In previous examples, we used the Py_BEGIN_ALLOW_THREADS and Py_END_ALLOW_THREADS preprocessor macros from the Python/C API headers to mark a section of code as free from Python calls. The Cython syntax is a lot shorter and easier to remember. The GIL can be released around a section of code using a simple with nogil statement like the following (note that any function called inside such a block must itself be declared as callable without the GIL, which we will do in a moment):

def fibonacci(unsigned int n): 
    """ Return nth Fibonacci sequence number computed recursively 
    """ 
    with nogil: 
        return fibonacci_cc(n)

You can also mark the whole C-style function as safe to call without GIL as follows:

cdef long long fibonacci_cc(unsigned int n) nogil: 
    if n < 2: 
        return n 
    else: 
        return fibonacci_cc(n - 1) + fibonacci_cc(n - 2) 

It is important to know that such functions cannot have Python objects as arguments or return types. Whenever a function marked as nogil needs to perform any Python/C API call, it must acquire the GIL using the with gil statement, as in the sketch that follows.

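The following sketch is a contrived variant of fibonacci_cc() (not a real optimization) that briefly re-acquires the GIL inside a nogil function, only for the duration of a Python-level call:

cdef long long fibonacci_cc(unsigned int n) nogil:
    if n > 92:
        # results for n > 92 do not fit in a signed 64-bit integer,
        # so re-acquire the GIL just long enough to warn with print()
        with gil:
            print("warning: result will not fit in a C long long")
    if n < 2:
        return n
    return fibonacci_cc(n - 1) + fibonacci_cc(n - 2)
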
We already know two ways of creating Python extensions: using plain C code with the Python/C API and using Cython. The first one gives you the most power and flexibility at the cost of quite complex and verbose code and the second one makes writing extensions easier but does a lot of magic behind your back. We also learned some potential advantages of extensions so it's time to take a closer look at some potential downsides.

Downsides of using extensions

To be honest, I started my adventure with Python only because I was tired of all the difficulty of writing software in C and C++. In fact, it is very common that programmers start to learn Python when they realize that other languages do not deliver what their users need.

Programming in Python, when compared to C, C++, or Java, is a breeze. Everything seems to be simple and well designed. You might think there is nothing left to trip over and that no other programming languages are required anymore.

And of course, nothing could be more wrong. Yes, Python is an amazing language with a lot of cool features, and it is used in many fields. But it doesn't mean that it is perfect and doesn't have any downsides. It is easy to understand and write, but this easiness comes with a price. It is not as slow as many people think but will never be as fast as C. It is highly portable, but its interpreter is not available on as many architectures as compilers of some other languages are. We could go on with that list for a while.

One of the solutions to that problem is to write extensions. That gives us the ability to bring some of the advantages of good old C back to Python. And in most cases, it works well. The question is: are we really using Python because we want to extend it with C? The answer is no. This is only an inconvenient necessity in situations where we don't have any better options.

Extensions always come with a cost and one of the biggest downsides of using extensions is increased complexity.

Additional complexity

It is not a secret that developing applications in many different languages is not an easy task. Python and C are completely different technologies, and it is very hard to find anything that they have in common. It is also true that there is no application that is free of bugs. If extensions become common in your code base, debugging can become painful. Not only because the debugging of C code requires a completely different workflow and tools, but also because you will need to switch context between two different languages very often.

We are all human and we all have limited cognitive capabilities. There are, of course, people who can handle multiple layers of abstraction at the same time efficiently, but they seem to be a very rare specimen. No matter how skilled you are, there is always an additional price to pay for maintaining such hybrid solutions. This will either involve extra effort and time required to switch between C and Python, or additional stress that will make you eventually less efficient.

According to the TIOBE index, C is still one of the most popular programming languages. Despite this fact, it is very common for Python programmers to know very little or almost nothing about it. Personally, I think that C should be the lingua franca in the programming world, but my opinion is very unlikely to change anything in this matter.

Python is also so seductive and easy to learn that a lot of programmers forget their previous experience and completely switch to the new technology. And programming is not like riding a bike. This particular skill erodes very fast if not used and polished sufficiently. Even programmers with a strong C background risk gradually losing their previous C proficiency if they decide to dive into Python for too long.

All of the above leads to one simple conclusion—it is harder to find people who will be able to understand and extend your code. For open-source packages, this means fewer voluntary contributors. In closed source, this means that not all of your teammates will be able to develop and maintain extensions without breaking things. And debugging broken things is definitely harder in extensions than in plain Python code.

Harder debugging

When it comes to failures, the extensions may break very badly. One could think that static typing gives you a lot of advantages over Python and allows you to catch a lot of issues during the compilation step that would be hard to notice in Python. And that can happen even without a rigorous testing routine and full test coverage. But that's only one side of the coin.

On the other side, we have all the memory management that must be performed manually. Faulty memory management is the main cause of most programming errors in C. In the best-case scenario, such mistakes will result only in memory leaks that gradually eat up all of your environment's resources. The best case does not mean easy to handle. Memory leaks are really tricky to find without proper external tools such as Valgrind. In most cases, memory management issues in your extension code will result in a segmentation fault that is unrecoverable in Python and will cause the interpreter to crash without raising an exception that would explain the cause. This means that you will eventually need to arm yourself with additional tools that most Python programmers usually don't need to use. This adds complexity to your development environment and workflow.

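For example, a common setup is to run the interpreter that imports your extension under Valgrind. The following invocation is only a sketch: the script name is hypothetical, and the suppression file ships with the CPython source distribution, so its path will differ on your system:

$ valgrind --tool=memcheck --leak-check=full \
    --suppressions=Misc/valgrind-python.supp \
    python3 script_using_extension.py
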
The downsides of using extensions mean that they are not always the best tool to bridge Python with other languages. If the only thing you need to do is to interact with already built shared libraries, sometimes the best option is to use a completely different approach. The next section discusses ways of interacting with dynamic libraries without using extensions.

Interfacing with dynamic libraries without extensions

Thanks to ctypes (a module in the standard library) or cffi (an external package available on PyPI), you can integrate every compiled dynamic/shared library in Python, no matter what language it was written in. And you can do that in pure Python without any compilation step. Those two packages are known as foreign function libraries. They are interesting alternatives to writing your own extensions in C.

Although using foreign function libraries does not require writing C code, it does not mean you don't need to know anything about C to use them effectively. Both ctypes and cffi require a reasonable understanding of C and of how dynamic libraries work in general. On the other hand, they remove the burden of dealing with Python reference counting and greatly reduce the risk of making painful mistakes. Also, interfacing with C code through ctypes or cffi is more portable than writing and compiling C extension modules.

Let's first take a look at ctypes, which is a part of the Python standard library.

The ctypes module

The ctypes module is the most popular module to call functions from dynamic or shared libraries without the need to write custom C extensions. The reason for that is obvious. It is part of the standard library, so it is always available and does not require any external dependencies.

The first step to use code from a shared library is to load it. Let's see how to do that with ctypes.

Loading libraries

There are exactly four types of dynamic library loaders available in ctypes and two conventions to use them. The classes that represent dynamic and shared libraries are ctypes.CDLL, ctypes.PyDLL, ctypes.OleDLL, and ctypes.WinDLL. The differences between them are as follows:

  • ctypes.CDLL: This class represents loaded shared libraries. The functions in these libraries use the standard calling convention and are assumed to return the int type. GIL is released during the call.
  • ctypes.PyDLL: This class works like ctypes.CDLL, but GIL is not released during the call. After execution, the Python error flag is checked, and an exception is raised if the flag was set during the execution. It is only useful when the loaded library is directly calling functions from the Python/C API or uses callback functions that may be Python code.
  • ctypes.OleDLL: This class is only available on Windows. Functions in these libraries use Windows' stdcall calling convention and return Windows-specific HRESULT code about call success or failure. Python will automatically raise an OSError exception after a result code indicating a failure.
  • ctypes.WinDLL: This class is only available on Windows. Functions in these libraries use Windows' stdcall calling convention and return values of type int by default. Python does not automatically inspect whether these values indicate failure or not.

To load the library, you can either instantiate one of the preceding classes with proper arguments or call the LoadLibrary() function from the submodule associated with a specific class:

  • ctypes.cdll.LoadLibrary() for ctypes.CDLL
  • ctypes.pydll.LoadLibrary() for ctypes.PyDLL
  • ctypes.windll.LoadLibrary() for ctypes.WinDLL
  • ctypes.oledll.LoadLibrary() for ctypes.OleDLL

The main challenge when loading shared libraries is how to find them in a portable way. Different systems use different suffixes for shared libraries (.dll on Windows, .dylib on macOS, .so on Linux) and search for them in different places. The main offender in this area is Windows, which does not have a predefined naming scheme for libraries. Because of that, we won't discuss details of loading libraries with ctypes on this system and will concentrate mainly on Linux and macOS, which deal with this problem in a consistent and similar way.

If you are interested in the Windows platform, refer to the official ctypes documentation, which has plenty of information about supporting that system. It can be found at https://docs.python.org/3/library/ctypes.html.

Both library loading conventions (the LoadLibrary() functions and specific library-type classes) require you to use the full library name. This means all the predefined library prefixes and suffixes need to be included. For example, to load the C standard library on Linux, you need to write the following:

>>> import ctypes
>>> ctypes.cdll.LoadLibrary('libc.so.6')
<CDLL 'libc.so.6', handle 7f0603e5f000 at 7f0603d4cbd0>

Here, for macOS, this would be the following:

>>> import ctypes
>>> ctypes.cdll.LoadLibrary('libc.dylib')

Fortunately, the ctypes.util submodule provides a find_library() function that allows you to load a library using its name without any prefixes or suffixes and will work on any system that has a predefined scheme for naming shared libraries:

>>> import ctypes
>>> from ctypes.util import find_library
>>> ctypes.cdll.LoadLibrary(find_library('c'))
<CDLL 'libc.so.6', handle 7f2e82f12000 at 0x7f2e8288e220>
>>> ctypes.cdll.LoadLibrary(find_library('bz2'))
<CDLL 'libbz2.so.1.0', handle 55fb3c2d1660 at 0x7f2e827e8af0>

So, if you are writing a package that uses ctypes and is supposed to work under both macOS and Linux, always use ctypes.util.find_library().

When your shared library is loaded, it is time to use its functions. Calling C functions using ctypes is explained in the next section.

Calling C functions using ctypes

When the dynamic/shared library is successfully loaded to the Python object, the common pattern is to store it as a module-level variable with the same name as the name of the loaded library. The functions can be accessed as object attributes, so calling them is like calling a Python function from any other imported module, for example:

>>> import ctypes
>>> from ctypes.util import find_library
>>> libc = ctypes.cdll.LoadLibrary(find_library('c'))
>>> libc.printf(b"Hello world!\n")
Hello world!
13

The 13 in the preceding transcript is the return value of printf(): the number of characters it has written. Unfortunately, all the built-in Python types except integers, strings, bytes, and None are incompatible with C datatypes and thus must be wrapped in the corresponding classes provided by the ctypes module. Here is the full list of compatible datatypes, as found in the ctypes documentation:

ctypes type     C type                                  Python type
c_bool          _Bool                                   bool
c_char          char                                    1-character bytes
c_wchar         wchar_t                                 1-character string
c_byte          char                                    int
c_ubyte         unsigned char                           int
c_short         short                                   int
c_ushort        unsigned short                          int
c_int           int                                     int
c_uint          unsigned int                            int
c_long          long                                    int
c_ulong         unsigned long                           int
c_longlong      __int64 or long long                    int
c_ulonglong     unsigned __int64 or unsigned long long  int
c_size_t        size_t                                  int
c_ssize_t       ssize_t or Py_ssize_t                   int
c_float         float                                   float
c_double        double                                  float
c_longdouble    long double                             float
c_char_p        char* (NULL-terminated)                 bytes or None
c_wchar_p       wchar_t* (NULL-terminated)              string or None
c_void_p        void*                                   int or None

As you can see, the preceding table does not contain dedicated types that would reflect any of the Python collections as C arrays. The recommended way to create types for C arrays is to simply use the multiplication operator with the desired basic ctypes type as follows:

>>> import ctypes
>>> IntArray5 = ctypes.c_int * 5
>>> c_int_array = IntArray5(1, 2, 3, 4, 5)
>>> FloatArray2 = ctypes.c_float * 2
>>> c_float_array = FloatArray2(0, 3.14)
>>> c_float_array[1]
3.140000104904175

The above syntax works for every basic ctypes type.

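Wrapping is needed not only for arguments but also for return values. By default, ctypes assumes that every function returns int, so other return types have to be declared through the function's restype attribute, and the expected argument types can be declared through argtypes. The following is a short sketch based on examples from the ctypes documentation:

>>> import ctypes
>>> from ctypes.util import find_library
>>> libc = ctypes.cdll.LoadLibrary(find_library('c'))
>>> libc.printf(b"pi is approximately %f\n", ctypes.c_double(3.14159))
pi is approximately 3.141590
29
>>> strchr = libc.strchr
>>> strchr.restype = ctypes.c_char_p
>>> strchr.argtypes = [ctypes.c_char_p, ctypes.c_char]
>>> strchr(b"foobar", b"b")
b'bar'
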
Let's look at how Python functions are passed as C callbacks in the next section.

Passing Python functions as C callbacks

It is a very popular design pattern to delegate part of the work of a function's implementation to custom callbacks provided by the user. The best-known function from the C standard library that accepts such a callback is qsort(), which provides a generic implementation of the quicksort algorithm. It is rather unlikely that you would want to use this algorithm instead of the default Timsort implemented in the CPython interpreter, which is better suited for sorting Python collections. Anyway, qsort() is a canonical example of an efficient sorting algorithm and of a C API that uses the callback mechanism, and it appears in many programming books. This is why we will use it as an example of passing a Python function as a C callback.

An ordinary Python function type will not be compatible with the callback function type required by the qsort() specification. Here is the signature of qsort() from the BSD man page, which also shows the type of the accepted callback (the compar argument):

void qsort(void *base, size_t nel, size_t width, 
           int (*compar)(const void*, const void *)); 

So in order to execute qsort() from libc, you need to pass the following:

  • base: This is the array that needs to be sorted as a void* pointer.
  • nel: This is the number of elements as size_t.
  • width: This is the size of the single element in the array as size_t.
  • compar: This is the pointer to the function that is supposed to return int and accepts two void* pointers. It points to the function that compares the size of two elements that are being sorted.

We already know from the Calling C functions using ctypes section how to construct the C array from other ctypes types using the multiplication operator. nel should be size_t and that maps to Python int, so it does not require any additional wrapping and can be passed as len(iterable). The width value can be obtained using the ctypes.sizeof() function once we know the type of our base array. The last thing we need to know is how to create the pointer to the Python function compatible with the compar argument.

The ctypes module contains a CFUNCTYPE() factory function that allows you to wrap Python functions and represent them as C callable function pointers. The first argument is the C return type that the wrapped function should return.

It is followed by the variable list of C types that the function accepts as the arguments. The function type compatible with the compar argument of qsort() will be as follows:

CMPFUNC = ctypes.CFUNCTYPE( 
    # return type 
    ctypes.c_int, 
    # first argument type 
    ctypes.POINTER(ctypes.c_int), 
    # second argument type 
    ctypes.POINTER(ctypes.c_int), 
) 

CFUNCTYPE() uses the cdecl calling convention, so it is compatible only with the CDLL and PyDLL shared libraries. The dynamic libraries on Windows that are loaded with WinDLL or OleDLL use the stdcall calling convention. This means that the other factory must be used to wrap Python functions as C callable function pointers. In ctypes, it is WINFUNCTYPE().

To wrap everything up, let's assume that we want to sort a randomly shuffled list of integer numbers with a qsort() function from the standard C library. Here is the example script that shows how to do that using everything that we have learned about ctypes so far:

from random import shuffle 
 
import ctypes 
from ctypes.util import find_library 
 
libc = ctypes.cdll.LoadLibrary(find_library('c')) 
 
CMPFUNC = ctypes.CFUNCTYPE( 
    # return type 
    ctypes.c_int, 
    # first argument type 
    ctypes.POINTER(ctypes.c_int), 
    # second argument type 
    ctypes.POINTER(ctypes.c_int), 
) 
 
 
def ctypes_int_compare(a, b): 
    # arguments are pointers so we access using [0] index 
    print(" %s cmp %s" % (a[0], b[0])) 
 
    # according to qsort specification this should return: 
    # * less than zero if a < b 
    # * zero if a == b 
    # * more than zero if a > b 
    return a[0] - b[0] 
 
 
def main(): 
    numbers = list(range(5)) 
    shuffle(numbers) 
    print("shuffled: ", numbers) 
 
    # create new type representing array with length 
    # same as the length of numbers list 
    NumbersArray = ctypes.c_int * len(numbers) 
    # create new C array using a new type 
    c_array = NumbersArray(*numbers) 
 
    libc.qsort( 
        # pointer to the sorted array 
        c_array, 
        # length of the array 
        len(c_array), 
        # size of single array element 
        ctypes.sizeof(ctypes.c_int), 
        # callback (pointer to the C comparison function) 
        CMPFUNC(ctypes_int_compare) 
    ) 
    print("sorted:   ", list(c_array)) 
 
 
if __name__ == "__main__": 
    main() 

The comparison function provided as a callback contains an additional print() call, so we can see how it is executed during the sorting process as follows:

$ python3 ctypes_qsort.py 
shuffled:  [4, 3, 0, 1, 2]
 4 cmp 3
 4 cmp 0
 3 cmp 0
 4 cmp 1
 3 cmp 1
 0 cmp 1
 4 cmp 2
 3 cmp 2
 1 cmp 2
sorted:    [0, 1, 2, 3, 4]

Of course, using qsort in Python doesn't make a lot of sense because Python has its own specialized sorting algorithm. Anyway, passing Python functions as C callbacks is a very useful technique for integrating many third-party libraries.

The ctypes module is very popular among Python programmers because it is part of the standard library. Its downside is a lot of low-level type handling and a bit of boilerplate required to interact with loaded libraries. That's why some developers prefer using a third-party package, CFFI, that streamlines the usage of foreign function calls. We will take a look at it in the next section.

CFFI

CFFI is a foreign function interface for Python that is an interesting alternative to ctypes. It is not a part of the standard library, but it is easily available from PyPI as the cffi package. It is different from ctypes because it puts more emphasis on reusing plain C declarations instead of providing extensive Python APIs in a single module. It is way more complex and also has a feature that allows you to automatically compile some parts of your integration layer into extensions using the C compiler. This means it can be used as a hybrid solution that fills the gap between plain C extensions and ctypes.

Because it is a very large project, it is impossible to briefly introduce it in a few paragraphs. On the other hand, it would be a shame to not say something more about it. We have already discussed one example of integrating the qsort() function from the standard library using ctypes. So, the best way to show the main differences between these two solutions would be to reimplement the same example with cffi.

I hope that the following block of code is worth more than a few paragraphs of text:

from random import shuffle 
 
from cffi import FFI 
 
ffi = FFI() 
 
ffi.cdef(""" 
void qsort(void *base, size_t nel, size_t width, 
           int (*compar)(const void *, const void *)); 
""") 
C = ffi.dlopen(None) 
 
 
@ffi.callback("int(void*, void*)") 
def cffi_int_compare(a, b): 
    # Callback signature requires exact matching of types. 
    # This involves less magic than in ctypes 
    # but also makes you more specific and requires 
    # explicit casting 
    int_a = ffi.cast('int*', a)[0] 
    int_b = ffi.cast('int*', b)[0] 
    print(" %s cmp %s" % (int_a, int_b)) 
 
    # according to qsort specification this should return: 
    # * less than zero if a < b 
    # * zero if a == b 
    # * more than zero if a > b 
    return int_a - int_b 
 
 
def main(): 
    numbers = list(range(5)) 
    shuffle(numbers) 
    print("shuffled: ", numbers) 
 
    c_array = ffi.new("int[]", numbers) 
 
    C.qsort( 
        # pointer to the sorted array 
        c_array, 
        # length of the array 
        len(c_array), 
        # size of single array element 
        ffi.sizeof('int'), 
        # callback (pointer to the C comparison function) 
        cffi_int_compare, 
    ) 
    print("sorted:   ", list(c_array)) 
 
if __name__ == "__main__": 
    main()

The output will be similar to the one presented earlier when discussing the example of C callbacks in ctypes. Using CFFI to integrate qsort in Python doesn't make any more sense than using ctypes for the same purpose. Anyway, the preceding example should show the main differences between ctypes and cffi regarding handling datatypes and function callbacks.

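The preceding script uses CFFI in its so-called ABI mode, in which the shared library is opened at runtime with ffi.dlopen(). The compilation feature mentioned at the beginning of this section is known as API mode. The following is a minimal sketch of that mode (the _fibonacci module name and the build script are hypothetical); it generates and compiles a real C extension from the embedded C source:

# build_fibonacci.py
from cffi import FFI

ffibuilder = FFI()

# declarations that will be accessible from Python through the lib object
ffibuilder.cdef("long long fibonacci_cc(unsigned int n);")

# C source that will be compiled into the generated extension module
ffibuilder.set_source(
    "_fibonacci",
    """
    long long fibonacci_cc(unsigned int n) {
        if (n < 2) return n;
        return fibonacci_cc(n - 1) + fibonacci_cc(n - 2);
    }
    """,
)

if __name__ == "__main__":
    ffibuilder.compile(verbose=True)

After running this script with python3 build_fibonacci.py, the compiled module exposes the declared function through its lib attribute:

>>> from _fibonacci import lib
>>> lib.fibonacci_cc(10)
55
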
Summary

This chapter explained one of the most complex topics in the book. We discussed the reasons and tools for building Python extensions as a way of bridging Python with other languages. We started by writing a pure C extension that depends only on the Python/C API and then reimplemented it with Cython to show how easy it can be if you only choose the proper tool.

There are still some reasons for doing things the hard way and using nothing more than the pure C compiler and the Python.h headers. Anyway, the best recommendation is to use tools such as Cython because this will make your code base more readable and maintainable. It will also save you from most of the issues caused by incautious reference counting and memory mismanagement.

Our discussion of extensions ended with the presentation of ctypes and CFFI as alternative ways to solve the problems of integrating shared libraries. Because they do not require writing custom extensions to call functions from compiled binaries, they should be your tools of choice for integrating closed-source dynamic/shared libraries—especially if you don't need to use custom C code.

In the last few chapters, we have discussed multiple complex topics: from advanced design patterns, through concurrency and event-driven programming, to bridging Python with different languages. Now we will be moving on to the topic of maintaining Python applications: from testing and quality assurance to packaging, monitoring, and optimizing applications of any size.

One of the most important challenges of software maintenance is how to assure that the code we wrote is correct. As our software inevitably becomes more complex, it is harder to ensure that it is working properly without an organized testing regime. And as it will grow bigger, it will be impossible to effectively test it without any kind of automation. That's why in the next chapter, we will take a look at various Python tools and techniques that allow you to automate testing and quality processes.
