8

Elements of Metaprogramming

Metaprogramming is a collection of programming techniques that focus on the ability of programs to introspect themselves, understand their own code, and modify themselves. Such an approach to programming gives programmers a lot of power and flexibility. Without metaprogramming techniques, we probably wouldn't have modern programming frameworks, or at least those frameworks would be way less expressive.

The term metaprogramming is often shrouded in an aura of mystery. Many programmers associate it almost exclusively with programs that can inspect and manipulate their own code at the source level. Programs manipulating their own source code are definitely one of the most striking and complex examples of applied metaprogramming, but metaprogramming takes many forms and doesn't always have to be complex or hard. Python is especially rich in features and modules that make certain metaprogramming techniques simple and natural.

In this chapter, we will explain what metaprogramming really is and present a few practical approaches to metaprogramming in Python. We will start with simple metaprogramming techniques like function and class decorators but will also cover advanced techniques to override the class instance creation process and the use of metaclasses. We will finish with examples of the most powerful but also dangerous approach to metaprogramming, which is code generation patterns.

In this chapter, we will cover the following topics:

  • What is metaprogramming?
  • Using decorators to modify function behavior before use
  • Intercepting the class instance creation process
  • Metaclasses
  • Code generation

Before we get into some metaprogramming techniques available for Python developers, let's begin by considering the technical requirements.

Technical requirements

The following are Python packages that are mentioned in this chapter that you can download from PyPI:

  • inflection
  • macropy3
  • falcon
  • hy

Information on how to install packages is included in Chapter 2, Modern Python Development Environments.

The code files for this chapter can be found at https://github.com/PacktPublishing/Expert-Python-Programming-Fourth-Edition/tree/main/Chapter%208.

What is metaprogramming?

Maybe we could find a good academic definition of metaprogramming, but this is a book that is more about good software craftsmanship than about computer science theory. This is why we will use our own informal definition of metaprogramming:

"Metaprogramming is a technique of writing computer programs that can treat themselves as data, so they can introspect, generate, and/or modify themselves while running."

Using this definition, we can distinguish between two major branches of metaprogramming in Python:

  • Introspection-oriented metaprogramming: Focused on natural introspection capabilities of the language and dynamic definitions of functions and types
  • Code-oriented metaprogramming: Metaprogramming focused on treating code as mutable data structures

Introspection-oriented metaprogramming concentrates on the language's ability to introspect its basic elements, such as functions, classes, or types, and to create or modify them on the go. Python really provides a lot of tools in this area. This feature of the Python language is often used by Integrated Development Environments (IDEs) to provide real-time code analysis and name suggestions. The easiest possible metaprogramming tools in Python that utilize language introspection features are decorators that allow for adding extra functionality to existing functions, methods, or classes. Next are special methods of classes that allow you to interfere with the class instance creation process. The most powerful are metaclasses, which allow programmers to even completely redesign Python's implementation of object-oriented programming.

Code-oriented metaprogramming allows programmers to work directly with code, either in its raw (plain text) format or in the more programmatically accessible abstract syntax tree (AST) form. This second approach is, of course, more complicated and difficult to work with but allows for really extraordinary things, such as extending Python's language syntax or even creating your own domain-specific language (DSL).

In the next section, we'll discuss what decorators are in the context of metaprogramming.

Using decorators to modify function behavior before use

Decorators are one of the most common introspection-oriented metaprogramming techniques in Python. Because functions in Python are first-class objects, they can be inspected and modified at runtime. Decorators are special functions capable of inspecting, modifying, or wrapping other functions.

The decorator syntax was explained in Chapter 4, Python in Comparison with Other Languages, and is in fact syntactic sugar that makes it easier to work with functions that extend existing code objects with additional behavior.

You can write code that uses the simple decorator syntax as follows:

@some_decorator
def decorated_function(): 
    pass 

You can also write it in the following (more verbose) way:

def decorated_function(): 
    pass 
decorated_function = some_decorator(decorated_function)

This verbose form of function decoration clearly shows what the decorator does. It takes a function object and modifies it at runtime. A decorator usually returns a new function object that replaces the pre-existing decorated function name.

We've already seen how function decorators are indispensable in implementing many design patterns, in Chapter 5, Interfaces, Patterns, and Modularity. Function decorators are often used to intercept and preprocess original function arguments, modify the return values, or enhance the function call context with additional functional aspects like logging, profiling, or evaluating a caller's authorization/authentication claims.

Let's, for instance, consider the following example usage of the @lru_cache decorator from the functools module:

from functools import lru_cache
@lru_cache(maxsize=100)
def expensive(*args, **kwargs):
    ...

The @lru_cache decorator creates a Least Recently Used (LRU) cache of return values for a given function. It intercepts incoming function arguments and compares them with a list of recently used argument sets. If there is a match, it returns the cached value instead of calling the decorated function. If there is no match, the original function will be called first and the return value will be stored in the cache for later use. In our example, the cache will hold no more than 100 values.
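
The caching is easy to observe in an interactive session. In the following sketch, expensive() simply echoes its argument (our own toy stand-in for a costly computation); cache_info() is the introspection method that @lru_cache attaches to every decorated function:

>>> from functools import lru_cache
>>> @lru_cache(maxsize=100)
... def expensive(x):
...     return x
...
>>> expensive(42)
42
>>> expensive(42)
42
>>> expensive.cache_info()
CacheInfo(hits=1, misses=1, maxsize=100, currsize=1)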

What is really interesting is that the use of @lru_cache is already a metaprogramming technique. It takes an existing code object (here, the expensive() function) and modifies its behavior. It also intercepts arguments and inspects their value and type to decide whether these can be cached or not.

This is good news. We've seen already, in Chapter 4, Python in Comparison with Other Languages, that decorators are relatively easy to write and use in Python. In most cases, decorators make code shorter, easier to read, and also cheaper to maintain. This means that they serve as a perfect introductory technique to metaprogramming. Other metaprogramming tools that are available in Python may be more difficult to understand and master.

The natural step forward from function decorators is class decorators. We will take a look at them in the next section.

One step deeper: class decorators

One of the lesser-known syntax features of Python is class decorators. Their syntax and implementation are exactly the same as for function decorators. The only difference is that they are expected to return a class instead of a function object.

We've already used some class decorators in previous chapters. These were the @dataclass decorator from the dataclasses module explained in Chapter 4, Python in Comparison with Other Languages, and @runtime_checkable from the typing module, explained in Chapter 5, Interfaces, Patterns, and Modularity. Both decorators rely on Python's introspection capabilities to enhance existing classes with extra behavior:

  • The @dataclass decorator inspects class attribute annotations to create a default implementation of the __init__() method and the comparison protocol, saving developers from writing repetitive boilerplate code. It also allows you to create "frozen" classes with immutable and hashable instances that can be used as dictionary keys (a quick refresher sketch follows this list).
  • The @runtime_checkable decorator marks Protocol subclasses as "runtime checkable." This means that isinstance() and issubclass() can be used at runtime to determine whether an object or class implements the interface defined by the protocol class.
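
As promised, here is a minimal refresher of the @dataclass decorator at work. The Point class is our own illustration; frozen=True is the documented flag that makes instances immutable and hashable:

>>> from dataclasses import dataclass
>>> @dataclass(frozen=True)
... class Point:
...     x: int
...     y: int
...
>>> Point(1, 2)
Point(x=1, y=2)
>>> {Point(1, 2): "first quadrant"}[Point(1, 2)]
'first quadrant'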

The best way to understand how class decorators work is to learn by doing. The @dataclass and @runtime_checkable decorators have rather complex inner workings, so instead of looking at their actual code, we will try to build our own simple example.

One of the great features of dataclasses is the ability to provide a default implementation of the __repr__() method. That method returns a string representation of the object that can be displayed in an interactive session, logs, or in standard output.

For custom classes, this __repr__() method will by default include only the class name and memory address, but for dataclasses it automatically includes a representation of each individual field of the dataclass. We will try to build a class decorator that provides a similar capability for any class.

We'll start by writing a function that can return a human-readable representation of any class instance if given a list of attributes to represent:

from typing import Any, Iterable
UNSET = object()
def repr_instance(instance: object, attrs: Iterable[str]) -> str:
    attr_values: dict[str, Any] = {
        attr: getattr(instance, attr, UNSET)
        for attr in attrs
    }
    sub_repr = ", ".join(
        f"{attr}={repr(val) if val is not UNSET else 'UNSET'}"
        for attr, val in attr_values.items()
    )
    return f"<{instance.__class__.__qualname__}: {sub_repr}>"

Our repr_instance() function starts by traversing instance attributes using the getattr() function over all attribute names provided in the attrs argument. Some instance attributes may not be set at the time we are creating the representation. Without a default value, getattr() would raise an AttributeError for such unset attributes. We could pass None as the default, but None is also a valid attribute value, so we need a way to distinguish unset attributes from explicit None values. That's why we use the UNSET sentinel value.

UNSET = object() is a common pattern for creating a unique sentinel value. A bare object instance evaluates as identical (using the is operator) only when compared with itself.

Once attributes and their values are known, our function uses f-strings to create an actual representation of the class instance that will include a representation of each individual attribute defined in the attrs argument.

We will soon look at how to automatically include such representations in custom classes, but first, let's try to see how it deals with existing objects. Here, for instance, is an example of using the repr_instance() function in an interactive session to get a representation of a complex number:

>>> repr_instance(1+10j, ["real", "imag"])
'<complex: real=1.0, imag=10.0>'

It's good so far, but we need to pass the object instance explicitly and know all the possible attribute names before we can print them. That's not very convenient, because we would have to update the arguments of repr_instance() every time the structure of the class changes. We will write a class decorator that takes the repr_instance() function and injects it into a decorated class. We will also use the class attribute annotations stored under a class's __annotations__ attribute to determine which attributes we want to include in the representation. Following is the code of our decorator:

def autorepr(cls):
    attrs = set.union(
        *(
            set(c.__annotations__.keys())
            for c in cls.mro()
            if hasattr(c, "__annotations__")
        )
    )
    def __repr__(self):
        return repr_instance(self, sorted(attrs))
    cls.__repr__ = __repr__
    return cls

In those few lines, we use a lot of things that we learned about in Chapter 4, Python in Comparison with Other Languages. We start by collecting the annotated attribute names from the __annotations__ dictionary of each class in the class Method Resolution Order (MRO). We have to traverse the whole MRO because annotations are not inherited from base classes.

Later, we use a closure to define an inner __repr__() function that has access to the attrs variable from the outer scope. When that's done, we override the existing cls.__repr__() method with the new implementation. We can do that because function objects are non-data descriptors. This means that in the class context they become methods and simply receive the instance object as the first argument.
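
This behavior is easy to verify in isolation. The following short session (our own toy example) shows that a plain function assigned to a class attribute automatically becomes a method:

>>> class Greeter:
...     pass
...
>>> def greet(self):
...     return f"Hello from {self.__class__.__name__}"
...
>>> Greeter.greet = greet
>>> Greeter().greet()
'Hello from Greeter'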

Now we can test our decorator on some custom instance. Let's save our code in the autorepr.py file and define some trivial class with attribute annotations that will be decorated with our @autorepr decorator:

from typing import Any
@autorepr
class MyClass:
    attr_a: Any
    attr_b: Any
    attr_c: Any
    def __init__(self, a, b):
        self.attr_a = a
        self.attr_b = b

If you are vigilant, you've probably noticed that we have missed the attr_c attribute initialization. This is intentional. It will allow us to see how @autorepr deals with unset attributes. Let's start Python, import our class, and see the automatically generated representations:

>>> from autorepr import MyClass
>>> MyClass("Ultimate answer", 42)
<MyClass: attr_a='Ultimate answer', attr_b=42, attr_c=UNSET>
>>> MyClass([1, 2, 3], ["a", "b", "c"])
<MyClass: attr_a=[1, 2, 3], attr_b=['a', 'b', 'c'], attr_c=UNSET>
>>> instance = MyClass(None, None)
>>> instance.attr_c = None
>>> instance
<MyClass: attr_a=None, attr_b=None, attr_c=None>

The above example from an interactive Python session shows how the @autorepr decorator can use class attribute annotations to discover the fields that need to be included in the instance representation. It is also able to distinguish unset attributes from those that have an explicit None value. The decorator is reusable, so you can easily apply it to any class that has type annotations for attributes instead of writing new __repr__() methods manually.

Moreover, it does not require constant maintenance. If you extend the class with an additional attribute annotation, it will be automatically included in instance representation.

Modifying existing classes in place (also known as monkey patching) is a common technique used in class decorators. The other way to enhance existing classes with decorators is through utilizing closures to create new subclasses on the fly. If we had to rewrite our example as a subclassing pattern, we could write it as follows:

def autorepr(cls):
    attrs = cls.__annotations__.keys()
    class Klass(cls):
        def __repr__(self):
            return repr_instance(self, attrs)
    return Klass

The major drawback of using closures in class decorators this way is that this technique affects the class hierarchy. Among other things, it overrides the class's __name__, __qualname__, and __doc__ attributes. In our case, that would also mean that part of the intended functionality would be lost. The following are the representations that instances of MyClass would have if decorated with such a decorator:

<autorepr.<locals>.Klass: attr_a='Ultimate answer', attr_b=42, attr_c=UNSET>
<autorepr.<locals>.Klass: attr_a=[1, 2, 3], attr_b=['a', 'b', 'c'], attr_c=UNSET>

This cannot be easily fixed. The functools module provides the @wraps utility decorator, which can be used in ordinary function decorators to preserve the metadata of a decorated function. Unfortunately, it can't be used with class decorators. This makes the use of subclassing in class decorators limited. It can, for instance, break the results of automated documentation generation tools.
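
For reference, this is how the @wraps pattern looks in an ordinary function decorator (a minimal sketch; the logged() decorator is our own example built only on documented functools behavior):

from functools import wraps

def logged(func):
    @wraps(func)  # copies __name__, __qualname__, __doc__, and so on
    def wrapper(*args, **kwargs):
        print(f"calling {func.__name__}")
        return func(*args, **kwargs)
    return wrapper

@logged
def add(a, b):
    """Add two numbers."""
    return a + b

Thanks to @wraps, add.__name__ is still "add" and the docstring survives decoration; without it, the metadata would be that of the inner wrapper() function.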

Still, despite this single caveat, class decorators are a simple and lightweight alternative to the popular mixin class pattern. A mixin in Python is a class that is not meant to be instantiated but is instead used to provide some reusable API or functionality to other existing classes. Mixin classes are almost always added using multiple inheritance. Their usage usually takes the following form:

class SomeConcreteClass(MixinClass, SomeBaseClass): 
    pass 

Mixin classes form a useful design pattern that is utilized in many libraries and frameworks. To name one, Django is an example framework that uses them extensively. While useful and popular, mixin classes can cause some trouble if not designed well because, in most cases, they require the developer to rely on multiple inheritance. As we stated earlier, Python handles multiple inheritance relatively well, thanks to its clear MRO implementation. Still, try to avoid subclassing multiple classes if you can, as multiple inheritance makes code complex and hard to work with. This is why class decorators may be a good replacement for mixin classes.
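
To make the comparison concrete, here is a small self-made mixin that adds dictionary serialization to any class that keeps its state in instance attributes:

class AsDictMixin:
    # reusable behavior: expose instance attributes as a dictionary
    def as_dict(self):
        return dict(vars(self))

class User:
    def __init__(self, name):
        self.name = name

class SerializableUser(AsDictMixin, User):
    pass

With this setup, SerializableUser("John").as_dict() returns {'name': 'John'}. A class decorator could graft the same as_dict() method onto User directly, without touching its inheritance tree.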

In general, decorators concentrate on modifying the behavior of functions and classes before they are actually used. Function decorators replace existing functions with their wrapped alternatives, while class decorators usually modify the class definition in place. But there are some metaprogramming techniques that concentrate more on modifying code behavior when it is actually in use. One of those techniques relies on intercepting the class instance creation process by overriding the __new__() method. We will discuss this in the next section.

Intercepting the class instance creation process

There are two special methods concerned with the class instance creation and initialization process. These are __init__() and __new__().

The __init__() method is the closest to the concept of the constructor found in many object-oriented programming languages. It receives a fresh class instance together with initialization arguments and is responsible for initializing the class instance state.

The special method __new__() is a static method that is actually responsible for creating class instances. The __new__(cls[, ...]) method is called prior to the __init__() initialization method. Typically, the implementation of an overridden __new__() method invokes its superclass version using super().__new__() with suitable arguments and modifies the instance before returning it.

The __new__() method is a special-cased static method so there is no need to declare it as a static method using the staticmethod decorator.

The following is an example class with the overridden __new__() method implementation in order to count the number of class instances:

class InstanceCountingClass:
    created = 0
    number: int
    def __new__(cls, *args, **kwargs):
        instance = super().__new__(cls)
        instance.number = cls.created
        cls.created += 1
        return instance
    def __repr__(self):
        return (
            f"<{self.__class__.__name__}: "
            f"{self.number} of {self.created}>"
        )

Here is the log of the example interactive session that shows how our InstanceCountingClass implementation works:

>>> instances = [InstanceCountingClass() for _ in range(5)]
>>> for i in instances:
...     print(i)
...
<InstanceCountingClass: 0 of 5>
<InstanceCountingClass: 1 of 5>
<InstanceCountingClass: 2 of 5>
<InstanceCountingClass: 3 of 5>
<InstanceCountingClass: 4 of 5>
>>> InstanceCountingClass.created
5

The __new__() method should usually return an instance of its own class, but it is also possible for it to return an instance of a different class. If that happens, the call to the __init__() method is skipped. This fact is useful when there is a need to modify the creation/initialization behavior of immutable class instances, like some of Python's built-in types.

Following is an example of a subclassed int type that forbids a value of zero:

class NonZero(int): 
    def __new__(cls, value): 
        return super().__new__(cls, value) if value != 0 else None 
 
    def __init__(self, skipped_value): 
        # __init__() could be omitted in this case, but it is kept 
        # to demonstrate that it is not always called 
        print("__init__() called") 
        super().__init__()

The above example includes a print statement to show how Python skips the __init__() method call in certain situations. Let's review these in the following interactive session:

>>> type(NonZero(-12))
__init__() called
<class '__main__.NonZero'>
>>> type(NonZero(0))
<class 'NoneType'>
>>> NonZero(-3.123)
__init__() called
-3

So, when should we use __new__()? The answer is simple: only when __init__() is not enough. One such case was already mentioned, that is, subclassing immutable built-in Python types such as int, str, float, frozenset, and so on. This is because there is no way to modify such an immutable object instance in the __init__() method, which is called only after the instance has been created.

Some programmers would argue that __new__() may be useful for performing important object initialization that may be missed if the user forgets to use the super().__init__() call in the overridden initialization method. While it sounds reasonable, this has a major drawback. With such an approach, it becomes harder for the programmer to explicitly skip previous initialization steps if that is the desired behavior. It also breaks the unspoken rule that all initialization is performed in __init__().

Because __new__() is not constrained to return the same class instance, it can be easily abused. Irresponsible usage of this method might do a lot of harm to code readability, so it should always be used carefully and backed with extensive documentation. Generally, it is better to search for other solutions that may be available for the given problem, instead of affecting object creation in a way that will break a basic programmer's expectations. Even the overridden initialization of immutable types can be replaced with more predictable and well-established design patterns like factory methods.

Factory methods in Python are usually defined with the use of the classmethod decorator, which can intercept arguments before the class constructor is invoked. That usually allows you to pack more than one initialization semantic into a single class. Following is an example of a UserList subclass that has two factory methods for creating list instances that are doubled or tripled in size:

from collections import UserList
class XList(UserList):
    @classmethod
    def double(cls, iterable):
        return cls(iterable) * 2
    @classmethod
    def triple(cls, iterable):
        return cls(iterable) * 3
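
The factory methods act as alternative constructors. Because UserList subclasses return instances of their own type from multiplication, the following results (shown in an interactive session) are XList instances:

>>> XList.double([1, 2])
[1, 2, 1, 2]
>>> XList.triple("ab")
['a', 'b', 'a', 'b', 'a', 'b']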

There is at least one aspect of Python programming where extensive usage of the __new__() method is well justified. These are metaclasses, which are described in the next section.

Metaclasses

A metaclass is a Python feature that is considered by many as one of the most difficult things to understand in the language and is thus avoided by a great number of developers. In reality, it is not as complicated as it sounds once you understand a few basic concepts. As a reward, knowing how to use metaclasses grants you the ability to do things that are not possible without them.

A metaclass is a type (class) that defines other types (classes). The most important thing to know in order to understand how they work is that classes (so, types that define object structure and behavior) are objects too. So, if they are objects, then they have an associated class. The basic type of every class definition is simply the built-in type class (see Figure 8.1).

Figure 8.1: How classes are typed

In Python, it is possible to substitute the metaclass of a class with your own type. Usually, the new metaclass is still a subclass of the type metaclass (refer to Figure 8.2) because not doing so would make the resulting classes highly incompatible with other classes in terms of inheritance:

Figure 8.2: Usual implementation of custom metaclasses

Let's take a look at the general syntaxes for metaclasses in the next section.

The general syntax

The call to the built-in type() class can be used as a dynamic equivalent of the class statement. The following is an example of a class definition with the type() call:

def method(self): 
    return 1 
 
MyClass = type('MyClass', (), {'method': method})

The first argument is the class name, the second is a tuple of base classes (here, empty), and the third is a dictionary of class attributes (usually methods). This is equivalent to the explicit definition of the class with the class keyword:

class MyClass: 
    def method(self): 
        return 1

Every class that's created with the class statement implicitly uses type as its metaclass. This default behavior can be changed by providing the metaclass keyword argument to the class statement, as follows:

class ClassWithAMetaclass(metaclass=type): 
    pass 

The value that's provided as a metaclass argument is usually another class object, but it can be any other callable that accepts the same arguments as the type class and is expected to return another class object.

The detailed call signature of a metaclass is type(name, bases, namespace) and the meanings of the arguments are as follows:

  • name: This is the name of the class that will be stored in the __name__ attribute
  • bases: This is the tuple of parent classes that will become the __bases__ attribute and will be used to construct the MRO of a newly created class
  • namespace: This is a namespace (mapping) with definitions for the class body that will become the __dict__ attribute

One way of thinking about metaclasses is as the __new__() method, but operating at the higher level of class definition: it creates class objects instead of class instances.

Despite the fact that functions that explicitly call type() can be used in place of metaclasses, the usual approach is to use a different class that inherits from type for this purpose. The common template for a metaclass is as follows:

class Metaclass(type): 
    def __new__(mcs, name, bases, namespace): 
        return super().__new__(mcs, name, bases, namespace) 
 
    @classmethod 
    def __prepare__(mcs, name, bases, **kwargs): 
        return super().__prepare__(name, bases, **kwargs) 
 
    def __init__(cls, name, bases, namespace, **kwargs): 
        super().__init__(name, bases, namespace) 
 
    def __call__(cls, *args, **kwargs): 
        return super().__call__(*args, **kwargs) 

The name, bases, and namespace arguments have the same meaning as in the type() call we explained earlier, but each of these four methods is invoked at a different stage of the class lifecycle:

  • __new__(mcs, name, bases, namespace): This is responsible for the actual creation of the class object, in the same way that __new__() creates instances of ordinary classes. The first positional argument is the metaclass object. In the preceding example, it would simply be Metaclass. Note that mcs is a popular naming convention for this argument.
  • __prepare__(mcs, name, bases, **kwargs): This creates an empty namespace object. By default, it returns an empty dict instance, but it can be overridden to return any other dict subclass instance. Note that it does not accept namespace as an argument because, before calling it, the namespace does not exist yet. Example usage of that method will be explained later, in the Metaclass usage section.
  • __init__(cls, name, bases, namespace, **kwargs): This is not common in metaclass implementations but has the same meaning as in ordinary classes. It can perform additional class object initialization once the class is created with __new__(). The first positional argument is now named cls by convention to mark that this is an already created class object (a metaclass instance) and not a metaclass object. When __init__() is called, the class has already been constructed, so the __init__() method can do less than the __new__() method. Implementing such a method is very similar to using class decorators, but the main difference is that __init__() will be called for every subclass, while class decorators are not called for subclasses.
  • __call__(cls, *args, **kwargs): This is called when an instance of a metaclass is called. The instance of a metaclass is a class object (refer to Figure 8.1); it is invoked when you create new instances of a class. This can be used to override the default way of how class instances are created and initialized.

Each of the preceding methods can accept additional extra keyword arguments, all of which are represented by **kwargs. These arguments can be passed to the metaclass object using extra keyword arguments in the class definition in the form of the following code:

class Klass(metaclass=Metaclass, extra="value"): 
    pass 
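
As a minimal sketch of how such extra keyword arguments can be consumed (this Injector metaclass is our own illustration), the following metaclass turns every extra class keyword into a class attribute:

class Injector(type):
    def __new__(mcs, name, bases, namespace, **kwargs):
        cls = super().__new__(mcs, name, bases, namespace)
        # every extra class keyword becomes a class attribute
        for key, value in kwargs.items():
            setattr(cls, key, value)
        return cls

class Klass(metaclass=Injector, extra="value"):
    pass

After the class statement executes, Klass.extra is "value". Note that the kwargs are consumed in __new__() and are not passed to super().__new__(), which expects only the three standard arguments.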

This amount of information can be overwhelming at first without proper examples, so let's trace the creation of metaclasses, classes, and instances with some print() calls:

class RevealingMeta(type): 
    def __new__(mcs, name, bases, namespace, **kwargs): 
        print(mcs, "METACLASS __new__ called") 
        return super().__new__(mcs, name, bases, namespace) 
 
    @classmethod 
    def __prepare__(mcs, name, bases, **kwargs): 
        print(mcs, " METACLASS __prepare__ called") 
        return super().__prepare__(name, bases, **kwargs) 
 
    def __init__(cls, name, bases, namespace, **kwargs): 
        print(cls, " METACLASS __init__ called") 
        super().__init__(name, bases, namespace) 
 
    def __call__(cls, *args, **kwargs): 
        print(cls, " METACLASS __call__ called") 
        return super().__call__(*args, **kwargs) 

Using RevealingMeta as a metaclass to create a new class definition will give the following output in the Python interactive session:

>>> class RevealingClass(metaclass=RevealingMeta):
...     def __new__(cls):
...         print(cls, "CLASS __new__ called")
...         return super().__new__(cls)
...     def __init__(self):
...         print(self, "CLASS __init__ called")
...         super().__init__()
... 
<class '__main__.RevealingMeta'> METACLASS __prepare__ called
<class '__main__.RevealingMeta'> METACLASS __new__ called
<class '__main__.RevealingClass'> METACLASS __init__ called 

As you can see, during the class definition, only metaclass methods are called. The first one is the __prepare__() method, which prepares a new class namespace. It is immediately followed by the __new__() method, which is responsible for the actual class creation and receives the namespace created by the __prepare__() method. Last is the __init__() method, which receives the class object created by the __new__() method (here, the RevealingClass definition).

Metaclass methods cooperate with class methods during class instance creation. We can trace the order of method calls by creating a new RevealingClass instance in the Python interactive session:

>>> instance = RevealingClass()
<class '__main__.RevealingClass'> METACLASS __call__ called
<class '__main__.RevealingClass'> CLASS __new__ called
<__main__.RevealingClass object at 0x10f594748> CLASS __init__ called

The first method called was the __call__() method of a metaclass. At this point, it has access to the class object (here, the RevealingClass definition) but no class instance has been created yet. It is called just before class instance creation, which should happen in the __new__() method of the class definition. The last step of the class instance creation process is the call to the class __init__() method responsible for instance initialization.

We know roughly how metaclasses work in theory so let's now take a look at example usage of metaclasses.

Metaclass usage

Metaclasses are a great tool for doing unusual things. They give a lot of flexibility and power in modifying typical class behavior. So, it is hard to tell what common examples of their usage are. It would be easier to say that most usages of metaclasses are pretty uncommon.

For instance, let's take a look at the __prepare__() method, which is responsible for preparing the namespace of class attributes. The default type for a class namespace is a plain dictionary. For years, the canonical example of the __prepare__() method was providing a collections.OrderedDict instance as a class namespace.

Preserving the order of attributes in the class namespace allowed for things like repeatable object representation and serialization. But since Python 3.7, dictionaries are guaranteed to preserve key insertion order, so that use case is gone. That doesn't mean, however, that we can't play with namespaces.

Let's imagine the following problem: we have a large Python code base that was developed over dozens of years and the majority of the code was written way before anyone in the team cared about coding standards. We may have, for instance, classes mixing camelCase and snake_case as the method naming convention. If we cared about consistency, we would be forced to spend a tremendous amount of effort to refactor the whole code base into either of the naming conventions. Or we could just use some clever metaclass that could be added on top of existing classes that would allow for calling methods in both ways. We could write new code using the new calling convention (preferably snake_case) while leaving the old code untouched and waiting for a gradual update.

That's an example of a situation where the __prepare__() method could be used! Let's start by writing a dict subclass that automatically interpolates camelCase names into snake_case keys:

from typing import Any
import inflection
class CaseInterpolationDict(dict):
    def __setitem__(self, key: str, value: Any):
        super().__setitem__(key, value)
        super().__setitem__(inflection.underscore(key), value)

To save some work, we use the inflection module, which is not a part of the standard library. It is able to convert strings between various "string cases." You can download it from PyPI using pip:

$ pip install inflection

Our CaseInterpolationDict class works almost like an ordinary dict type but whenever it stores a new value, it saves it under two keys: the original one and one converted to snake_case. Note that we used the dict type as a parent class instead of the recommended collections.UserDict. This is because we will use this class in the metaclass __prepare__() method and Python requires namespaces to be dict instances.
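
Before wiring the dictionary into a metaclass, we can verify its behavior in isolation (assuming the inflection package is installed):

>>> d = CaseInterpolationDict()
>>> d["getDisplayName"] = "some value"
>>> d
{'getDisplayName': 'some value', 'get_display_name': 'some value'}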

Now it's time to write an actual metaclass that will override the class namespace type. It will be surprisingly short:

class CaseInterpolatedMeta(type):
    @classmethod
    def __prepare__(mcs, name, bases):
        return CaseInterpolationDict()

With that set up, we can now use the CaseInterpolatedMeta metaclass to create a dummy class with a few methods that use the camelCase naming convention:

class User(metaclass=CaseInterpolatedMeta):
    def __init__(self, firstName: str, lastName: str):
        self.firstName = firstName
        self.lastName = lastName
    def getDisplayName(self):
        return f"{self.firstName} {self.lastName}"
    def greetUser(self):
        return f"Hello {self.getDisplayName()}!"

Let's save all that code in the case_user.py file and start an interactive session to see how the User class behaves:

>>> from case_user import User

The first important thing to notice is the contents of the User.__dict__ attribute:

>>> User.__dict__
mappingproxy({
    '__module__': 'case_user',
    '__init__': <function case_user.User.__init__(self, firstName: str, lastName: str)>,
    'getDisplayName': <function case_user.User.getDisplayName(self)>,
    'get_display_name': <function case_user.User.getDisplayName(self)>,
    'greetUser': <function case_user.User.greetUser(self)>,
    'greet_user': <function case_user.User.greetUser(self)>,
    '__dict__': <attribute '__dict__' of 'User' objects>,
    '__weakref__': <attribute '__weakref__' of 'User' objects>,
    '__doc__': None
})

The first thing that catches the eye is the fact that the methods got duplicated. That was exactly what we wanted to achieve. The second important thing is the fact that User.__dict__ is of the mappingproxy type. That's because Python always copies the contents of the namespace object to a new dict when creating the final class object, and then exposes the class namespace through a read-only proxy that guards it against direct modification.

So, let's see if our solution works by invoking all of its methods:

>>> user = User("John", "Doe")
>>> user.getDisplayName()
'John Doe'
>>> user.get_display_name()
'John Doe'
>>> user.greetUser()
'Hello John Doe!'
>>> user.greet_user()
'Hello John Doe!'

It works! We could call all the snake_case methods even though we haven't defined them. For an unaware developer, that could look almost like magic!

However, this is a kind of magic that should be used very carefully. Remember that what you have just seen is a toy example. The real purpose of it was to show what is possible with metaclasses and just a few lines of code. In fact, doing something similar in a large and complex code base could be really dangerous. Metaclasses interact with the very core of the Python data model and can lead to various pitfalls. Some of them are discussed in the next section.

Metaclass pitfalls

Metaclasses, once mastered, are a powerful feature, but always complicate the code. Metaclasses also do not compose well and you'll quickly run into problems if you try to mix multiple metaclasses through inheritance.

Like some other advanced Python features, metaclasses are very elastic and can be easily abused. While the call signature of a metaclass is rather strict, Python does not enforce the type of its return value. It can be anything as long as it accepts incoming arguments on calls and has the required attributes whenever they are needed.

One such object that can be anything-anywhere is an instance of the Mock class provided in the unittest.mock module. Mock is not a metaclass; it does not inherit from the type class, and it does not return a class object when instantiated. Still, it can be passed as the metaclass keyword argument in a class definition, and this will not raise any syntax errors.

Using Mock as a metaclass is, of course, complete nonsense, but let's consider the following example:

>>> from unittest.mock import Mock
>>> class Nonsense(metaclass=Mock):  # pointless, but illustrative
...     pass
... 
>>> Nonsense
<Mock spec='str' id='4327214664'>

It's not hard to predict that any attempt to instantiate our Nonsense pseudo-class will fail. What is really interesting is the following exception and traceback you'll get trying to do so:

>>> Nonsense()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/unittest/mock.py", line 917, in __call__
    return _mock_self._mock_call(*args, **kwargs)
  File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/unittest/mock.py", line 976, in _mock_call
    result = next(effect)
StopIteration  

Does the StopIteration exception give you any clue that there may be a problem with our class definition on the metaclass level? Obviously not. This example illustrates how hard it may be to debug metaclass code if you don't know where to look for errors.

But there are situations where things cannot be easily done without metaclasses. For instance, it is hard to imagine Django's ORM implementation built without extensive use of metaclasses. It could be possible, but it is rather unlikely that the resulting solution would be similarly easy to use. Frameworks are the place where metaclasses really shine. They usually have a lot of complex internal code that is not easy to understand and follow but eventually allow other programmers to write more condensed and readable code that operates on a higher level of abstraction.

For simple things, like changing the read/write attributes or adding new ones, metaclasses can be avoided in favor of simpler solutions, such as properties, descriptors, or class decorators. There is also a special method named __init_subclass__(), which can be used as an alternative to metaclasses in many situations. Let's take a closer look at it in the next section.

Using the __init_subclass__() method as an alternative to metaclasses

The @autorepr decorator presented in the One step deeper: class decorators section was fairly simple and useful. Unfortunately, it has one problem that we haven't discussed yet: it doesn't play well with subclassing.

It will work well with simple one-off classes that do not have any descendants but once you start subclassing the originally decorated class, you will notice that it doesn't work as one might expect. Consider the following class inheritance:

from typing import Any
from autorepr import autorepr
@autorepr
class MyClass:
    attr_a: Any
    attr_b: Any
    attr_c: Any
    def __init__(self, a, b):
        self.attr_a = a
        self.attr_b = b
class MyChildClass(MyClass):
    attr_d: Any
    def __init__(self, a, b):
        super().__init__(a, b)

If you try to obtain a representation of MyChildClass instances in an interactive interpreter session, you will see the following output:

<MyChildClass: attr_a='Ultimate answer', attr_b=42, attr_c=UNSET>
<MyChildClass: attr_a=[1, 2, 3], attr_b=['a', 'b', 'c'], attr_c=UNSET>

That's understandable. The @autorepr decorator was used only on the base class, so it didn't have access to the subclass annotations. MyChildClass inherited the unmodified __repr__() method.

The way to fix that is to add the @autorepr decorator to the subclass as well:

@autorepr
class MyChildClass(MyClass):
    attr_d: Any
    def __init__(self, a, b):
        super().__init__(a, b)

But how can we make the class decorator auto-apply on subclasses? We could clearly replicate the same behavior with the use of metaclasses but we already know that this can really complicate things. That would also make usage way harder as you can't really mix the inheritance of classes using different metaclasses.

Fortunately, there's a method for that. Python classes provide the __init_subclass__() hook method, which will be invoked for every subclass. It is a convenient alternative to otherwise problematic metaclasses. This hook lets the base class know that it has new subclasses. It is often used to facilitate various event-driven and signaling patterns (see Chapter 7, Event-Driven Programming) but can also be employed to create "inheritable" class decorators.
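
Before applying it to @autorepr, let's look at a standalone sketch of the hook in its most classic role: automatic registration of subclasses (the Plugin hierarchy below is our own toy example):

class Plugin:
    registry = []

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        # every new subclass registers itself with the base class
        Plugin.registry.append(cls)

class CsvExporter(Plugin):
    pass

class JsonExporter(Plugin):
    pass

After both class statements execute, Plugin.registry holds [CsvExporter, JsonExporter], with no explicit registration calls anywhere.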

Consider the following modification to our @autorepr decorator:

def autorepr(cls):
    attrs = set.union(
        *(
            set(c.__annotations__.keys())
            for c in cls.mro()
            if hasattr(c, "__annotations__")
        )
    )
    def __repr__(self):
        return repr_instance(self, sorted(attrs))
    cls.__repr__ = __repr__
    def __init_subclass__(cls):
        autorepr(cls)
    cls.__init_subclass__ = classmethod(__init_subclass__)
    return cls

What is new is the __init_subclass__() method, which will be invoked with the new class object every time the decorated class is subclassed. In that method, we simply re-apply the @autorepr decorator. It will have access to all new annotations and will also be able to hook itself in for further subclasses. That way you don't have to manually add the decorator for every new subclass and can be sure that all __repr__() methods will always have access to the latest annotations.

So far, we have discussed the built-in features of Python that facilitate the metaprogramming techniques. We've seen that Python is quite generous in this area thanks to natural introspection capabilities, metaclasses, and the flexible object model. But there's a branch of metaprogramming available to practically any language, and regardless of its features. It is code generation. We will discuss that in the next section.

Code generation

As we already mentioned, dynamic code generation is the most difficult approach to metaprogramming. There are tools in Python that allow you to generate and execute code or even make some modifications to already compiled code objects.

Various projects, such as Hy (mentioned later), show that even whole languages can be reimplemented in Python using code generation techniques. This proves that the possibilities are practically limitless. Knowing how vast this topic is and how badly it is riddled with various pitfalls, we won't even try to give detailed suggestions on how to create code this way, or provide useful code samples.

Anyway, knowing what is possible may be useful for you if you plan to study this field in more depth by yourself. So, treat this section only as a short summary of possible starting points for further learning.

Let's take a look at how to use the exec(), eval(), and compile() functions.

exec, eval, and compile

Python provides the following three built-in functions to manually execute, evaluate, and compile arbitrary Python code:

  • exec(object, globals, locals): This allows you to dynamically execute Python code. The object argument should be a string or code object (see the compile() function) representing a single statement or a sequence of multiple statements. The globals and locals arguments provide global and local namespaces for the executed code and are optional.

    If they are not provided, then the code is executed in the current scope. If provided, globals must be a dictionary, while locals may be any mapping object. The exec() function always returns None.

  • eval(expression, globals, locals): This is used to evaluate the given expression by returning its value. It is similar to exec(), but it expects the expression argument to be a single Python expression and not a sequence of statements. It returns the value of the evaluated expression.
  • compile(source, filename, mode): This compiles the source into the code object or AST object. The source code is provided as a string value in the source argument. The filename should be the name of the file from which the code was read. If it has no file associated (for example, because it was created dynamically), then "<string>" is the value that is commonly used. The mode argument should be either "exec" (a sequence of statements), "eval" (a single expression), or "single" (a single interactive statement, such as in a Python interactive session).

The exec() and eval() functions are the easiest to start with when trying to dynamically generate code because they can operate on strings. If you already know how to program in Python, then you may already know how to correctly generate working source code programmatically.
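
The following short interactive session (our own illustration) shows all three functions cooperating:

>>> eval("2 ** 10")
1024
>>> namespace = {}
>>> exec("result = 2 ** 10", namespace)
>>> namespace["result"]
1024
>>> code = compile("result * 2", "<string>", "eval")
>>> eval(code, namespace)
2048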

The most useful function in the context of metaprogramming is obviously exec() because it allows you to execute any sequence of Python statements. The word any should be alarming for you. Even eval(), which allows only the evaluation of expressions, can lead to serious security holes when fed with user input, even in the hands of a skillful programmer.

Note that crashing the Python interpreter is the scenario you should be least afraid of. Introducing vulnerability to remote execution exploits due to irresponsible use of exec() and eval() could cost you your image as a professional developer, or even your job. This means that neither exec() nor eval() should ever be used with untrusted input. And every input coming from end users should always be considered unsafe.

Even if used with trusted input, there is a list of little details about exec() and eval() that is too long to be included here but might affect how your application works in ways you would not expect. Armin Ronacher has a good article that lists the most important of them, titled Be careful with exec and eval in Python (refer to http://lucumr.pocoo.org/2011/2/1/exec-in-python/).

Despite all these frightening warnings, there are natural situations where the usage of exec() and eval() is really justified. Still, in the case of even the tiniest doubt, you should not use them and try to find a different solution.

The signature of the eval() function might make you think that if you provide empty globals and locals namespaces and wrap it with proper try ... except statements, then it will be reasonably safe. Nothing could be more wrong. Ned Batchelder has written a very good article in which he shows how to cause an interpreter segmentation fault in the eval() call, even with erased access to all Python built-ins (see http://nedbatchelder.com/blog/201206/eval_really_is_dangerous.html). This should be enough proof that both exec() and eval() should never be used with untrusted input.

We'll take a look at the abstract syntax tree in the next section.

The abstract syntax tree

The Python syntax is converted into AST format before it is compiled into byte code. This is a tree representation of the abstract syntactic structure of the source code. Processing of Python grammar is available thanks to the built-in ast module. Raw ASTs of Python code can be created using the compile() function with the ast.PyCF_ONLY_AST flag, or by using the ast.parse() helper. Direct translation in reverse is not as simple, although since Python 3.9 the ast.unparse() function can produce source code that is equivalent to a given AST.

The ast module provides some helper functions that allow you to work with the AST, for example:

>>> import ast
>>> tree = ast.parse('def hello_world(): print("hello world!")')
>>> tree
<_ast.Module object at 0x00000000038E9588>
>>> print(ast.dump(tree, indent=4))
Module(
    body=[
        FunctionDef(
            name='hello_world',
            args=arguments(
                posonlyargs=[],
                args=[],
                kwonlyargs=[],
                kw_defaults=[],
                defaults=[]),
            body=[
                Expr(
                    value=Call(
                        func=Name(id='print', ctx=Load()),
                        args=[
                            Constant(value='hello world!')],
                        keywords=[]))],
            decorator_list=[])],
    type_ignores=[])

It is important to know that the AST can be modified before being passed to compile(). This gives you many new possibilities. For instance, new syntax nodes can be used for additional instrumentation, such as test coverage measurement. It is also possible to modify the existing code tree in order to add new semantics to the existing syntax. Such a technique is used by the MacroPy project (https://github.com/lihaoyi/macropy) to add syntactic macros to Python using the already existing syntax (refer to Figure 8.3):

Figure 8.3: How MacroPy adds syntactic macros to Python modules on import

Unfortunately, MacroPy isn't compatible with the latest Python versions and is only tested to run on Python 3.4. Anyway, it is a very interesting project that shows what can be achieved with AST manipulation.

ASTs can also be created in a purely artificial manner, and there is no need to parse any source code at all. This gives Python programmers the ability to create Python bytecode for custom DSLs, or even to completely implement other programming languages on top of the Python VM.
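
As a small taste of that idea, the following sketch builds a print() call by hand using documented ast node types (Python 3.8+) and executes it without parsing any source code:

import ast

call = ast.Expr(
    value=ast.Call(
        func=ast.Name(id="print", ctx=ast.Load()),
        args=[ast.Constant(value="built from a handmade AST!")],
        keywords=[],
    )
)
tree = ast.Module(body=[call], type_ignores=[])
# compile() requires line/column information on every node
ast.fix_missing_locations(tree)
exec(compile(tree, "<ast>", "exec"))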

Import hooks

Taking advantage of MacroPy's ability to modify original ASTs is as easy as using the import macropy.activate statement, because it overrides the Python import behavior. It is not magic: Python provides a way for every developer to intercept imports using the following two kinds of import hooks:

  • Meta hooks: These are called before any other import processing has occurred. Using meta hooks, you can override the way in which sys.path is processed, even for frozen and built-in modules. To add a new meta hook, a new meta path finder object must be added to the sys.meta_path list.
  • Import path hooks: These are called as part of sys.path processing. They are used if the path item associated with the given hook is encountered. The import path hooks are added by extending the sys.path_hooks list with a new path entry finder object.

The details of implementing both path finders and meta path finders are extensively documented in the official Python documentation (see https://docs.python.org/3/reference/import.html). The official documentation should be your primary resource if you want to interact with imports on that level. This is so because the import machinery in Python is rather complex and any attempt to summarize it in a few paragraphs would inevitably fail. Here, we just want to make you aware that such things are possible.
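
Just to give a taste of the mechanism, here is a minimal meta hook sketch (our own example) that logs every import attempt and then defers to the remaining finders:

import sys
from importlib.abc import MetaPathFinder

class ImportLogger(MetaPathFinder):
    def find_spec(self, fullname, path, target=None):
        print(f"import attempted: {fullname}")
        # returning None means "not handled here", so the next
        # finder on sys.meta_path will be consulted
        return None

sys.meta_path.insert(0, ImportLogger())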

We'll take a look at projects that use code generation patterns in the following sections.

Notable examples of code generation in Python

It is hard to find a really usable library implementation that relies on code generation patterns and is not merely an experiment or a simple proof of concept. The reasons for that situation are fairly obvious:

  • Deserved fear of the exec() and eval() functions because, if used irresponsibly, they can cause real disasters
  • Successful code generation is very difficult to develop and maintain because it requires a deep understanding of the language and exceptional programming skills in general

Despite these difficulties, there are some projects that successfully take this approach either to improve performance or achieve things that would be impossible by other means.

Falcon's compiled router

Falcon (http://falconframework.org/) is a minimalist Python WSGI web framework for building fast and lightweight web APIs. It strongly encourages the REST architectural style that is currently very popular around the web. It is a good alternative to other rather heavy frameworks, such as Django or Pyramid. It is also a strong competitor for other micro-frameworks that aim for simplicity, such as Flask, Bottle, or web2py.

One of the best Falcon features is its very simple routing mechanism. It is not as complex as the routing provided by Django urlconf and does not provide as many features, but in most cases it is just enough for any API that follows the REST architectural design. What is most interesting about Falcon's routing is its internal construction. Falcon's router is implemented using code generated from the list of routes, and the code changes every time a new route is registered. This effort is needed to make routing fast.

Consider this very short API example, taken from Falcon's web documentation:

# sample.py 
import falcon 
import json 
  
class QuoteResource: 
    def on_get(self, req, resp): 
        """Handles GET requests""" 
        quote = { 
            'quote': 'I\'ve always been more interested in ' 
                     'the future than in the past.', 
            'author': 'Grace Hopper' 
        } 
 
        resp.body = json.dumps(quote) 
  
api = falcon.API() 
api.add_route('/quote', QuoteResource())

In short, the call to the api.add_route() method dynamically updates the whole generated code tree for Falcon's request router. It also compiles it using the compile() function and generates the new route-finding function using eval(). Let's take a closer look at the following __code__ attribute of the api._router._find() function:

>>> api._router._find.__code__
<code object find at 0x00000000033C29C0, file "<string>", line 1>
>>> api.add_route('/none', None)
>>> api._router._find.__code__
<code object find at 0x00000000033C2810, file "<string>", line 1>

This transcript shows that the code of this function was generated from the string and not from the real source code file (the "<string>" file). It also shows that the actual code object changes with every call to the api.add_route() method (the object's address in memory changes).

Hy

Hy (http://docs.hylang.org/) is a dialect of Lisp that is written entirely in Python. Many similar projects that implement other programming languages in Python usually try only to tokenize the plain form of code that's provided either as a file-like object or string and interpret it as a series of explicit Python calls. Unlike others, Hy can be considered a language that runs fully in the Python runtime environment, just like Python does. Code written in Hy can use the existing built-in modules and external packages, and vice versa: code written with Hy can be imported back into Python.

To embed Lisp in Python, Hy translates Lisp code directly into Python AST. Import interoperability is achieved using the import hook that is registered once the Hy module is imported into Python. Every module with the .hy extension is treated as a Hy module and can be imported like the ordinary Python module. The following is a "hello world" program written in this Lisp dialect:

;; hyllo.hy 
(defn hello [] (print "hello world!")) 

It can be imported and executed with the following Python code:

>>> import hy
>>> import hyllo
>>> hyllo.hello()
hello world!

If we dig deeper and try to disassemble hyllo.hello using the built-in dis module, we will notice that the bytecode of the Hy function does not differ significantly from its pure Python counterpart, as shown in the following code:

>>> import dis
>>> dis.dis(hyllo.hello)
  2           0 LOAD_GLOBAL        0 (print)
              3 LOAD_CONST         1 ('hello world!')
              6 CALL_FUNCTION      1 (1 positional, 0 keyword pair)
              9 RETURN_VALUE
>>> def hello(): print("hello world!")
...
>>> dis.dis(hello)
  1           0 LOAD_GLOBAL        0 (print)
              3 LOAD_CONST         1 ('hello world!')
              6 CALL_FUNCTION      1 (1 positional, 0 keyword pair)
              9 POP_TOP
             10 LOAD_CONST         0 (None)
             13 RETURN_VALUE

As you can see, the bytecode for the Hy-based function is shorter than the bytecode for the plain Python counterpart. Maybe a similar tendency can be observed for larger chunks of code. It shows that creating a completely new language on top of a Python VM is definitely possible and may be worth experimenting with.

Summary

In this chapter, we've explored the vast topic of metaprogramming in Python. We've described the syntax features that favor the various metaprogramming patterns in detail. These are mainly decorators and metaclasses.

We've also taken a look at another important aspect of metaprogramming, dynamic code generation. We described it only briefly as it is too vast to fit into the limited size of this book. However, it should be a good starting point that gives you a quick summary of the possible options in that field.

With the example of Hy, we've seen that metaprogramming can even be used to implement other languages on top of the Python runtime. The road taken by Hy developers is of course quite unusual and, generally, the best way to bridge Python with other languages is through custom Python interpreter extensions or the use of shared libraries and foreign function interfaces. And these are exactly the topics of the next chapter.
