Manjusaka

Manjusaka

Implement the simplest object model with Python

A Simple Object Model#

Carl Friedrich Bolz is a researcher at King's College London, deeply engrossed in the implementation and optimization of dynamic languages. He is one of the core developers of PyPy/RPython, and at the same time, he contributes code to languages such as Prolog, Racket, Smalltalk, PHP, and Ruby. This is his Twitter @cfbolz.

Introduction#

Object-oriented programming is a widely used programming paradigm that is supported by many modern programming languages. Although most languages provide similar object-oriented mechanisms for programmers, there are still many differences when delving into the details. The commonality among most languages lies in their object handling and inheritance mechanisms. However, not every language perfectly supports classes. For example, languages like Self or JavaScript, which use prototype inheritance, do not have the concept of classes; their inheritance behavior occurs between objects.

Understanding the object models of different languages is a fascinating endeavor. This allows us to appreciate the similarities among different programming languages. It must be said that such experiences can leverage our existing knowledge when learning new languages, enabling us to master them quickly.

This article will guide you in implementing a simple object model. First, we will create a simple class and its instances, allowing access to some methods through these instances. This is the object-oriented model adopted by early object-oriented languages such as Simula 67 and Smalltalk. Then, we will gradually expand this model, showcasing different language design philosophies in the next two steps, and finally optimizing the performance of our object model. The model we arrive at is not one used by any real existing language; however, if one must say, you can view our final model as a low-end version of the Python object model.

The object model presented in this article is implemented in Python. The code runs perfectly on Python 2.7 and Python 3.4. To help everyone better understand the design philosophy behind the model, this article also provides unit tests for the object model we designed, which can be run using py.test or nose.

To be honest, using Python as the implementation language for the object model is not a good choice. Generally speaking, language virtual machines are implemented in lower-level languages like C/C++, and many details need to be carefully managed to ensure execution efficiency. However, a simple language like Python allows us to focus our main efforts on different behaviors rather than getting bogged down in implementation details.

Basic Method Model#

We will start by explaining our object model using a very simple object model implemented in Smalltalk. Smalltalk is an object-oriented language developed in the 1970s by a group led by Alan Kay at Xerox PARC. It popularized object-oriented programming, and many features it introduced can still be seen in today's programming languages. One of the core design principles of Smalltalk is: "Everything is an object." The most well-known descendant of Smalltalk is Ruby, a language that retains the Smalltalk object model while using syntax similar to C.

In this section, our object model will include classes, instances, property access and modification, method calls, and will allow for the existence of subclasses. Before we begin, let me clarify that the classes here are ordinary classes with their own properties and methods.

A very good practice is to write test code first to constrain the behavior of the specific implementation. The test code written in this article consists of two parts. The first part consists of regular Python code that may use classes and some other advanced features in Python. The second part will use our self-defined object model to replace Python's classes.

When writing test code, we need to manually maintain the mapping between regular Python classes and our self-defined classes. For example, in our self-defined class, we will use obj.read_attr("attribute") as a substitute for obj.attribute in Python. In real life, such mapping would be handled by the language's compiler/interpreter.

In this article, we further simplify the model so that the code for implementing the object model and the code for writing methods in the object look quite similar. In reality, this is generally not possible, as these two are usually implemented by different languages.

First, let's write a piece of code to test reading and modifying object fields:


def test_read_write_field():
    # Python code
    class A(object):
            pass
    obj = A()
    obj.a = 1
    assert obj.a == 1
    obj.b = 5
    assert obj.a == 1
    assert obj.b == 5
    obj.a = 2
    assert obj.a == 2
    assert obj.b == 5

    # Object model code
    A = Class(name="A", base_class=OBJECT, fields={}, metaclass=TYPE)
    obj = Instance(A)
    obj.write_attr("a", 1)
    assert obj.read_attr("a") == 1
    obj.write_attr("b", 5)
    assert obj.read_attr("a") == 1
    assert obj.read_attr("b") == 5
    obj.write_attr("a", 2)
    assert obj.read_attr("a") == 2
    assert obj.read_attr("b") == 5

The above test code includes three things we must implement. The Class and Instance classes represent the classes and instances in our object model, respectively. There are also two special instances of classes: OBJECT and TYPE. OBJECT corresponds to the object class, which serves as the starting point of Python's inheritance system (Note: In Python 2.x, there are actually two class systems, one referred to as new style class and the other as old style class, with object being the base class of new style class). TYPE corresponds to the type in Python's type system.

To provide common operation support for instances of the Class and Instance classes, both classes will inherit from a base class called Base, which provides a series of methods:

class Base(object):
    """ The base class that all of the object model classes inherit from. """
    def __init__(self, cls, fields):
        """ Every object has a class. """
        self.cls = cls
        self._fields = fields
    def read_attr(self, fieldname):
        """ read field 'fieldname' out of the object """
        return self._read_dict(fieldname)
    def write_attr(self, fieldname, value):
        """ write field 'fieldname' into the object """
        self._write_dict(fieldname, value)
    def isinstance(self, cls):
        """ return True if the object is an instance of class cls """
        return self.cls.issubclass(cls)
    def callmethod(self, methname, *args):
        """ call method 'methname' with arguments 'args' on object """
        meth = self.cls._read_from_class(methname)
        return meth(self, *args)
    def _read_dict(self, fieldname):
        """ read an field 'fieldname' out of the object's dict """
        return self._fields.get(fieldname, MISSING)
    def _write_dict(self, fieldname, value):
        """ write a field 'fieldname' into the object's dict """
        self._fields[fieldname] = value

MISSING = object()

Base implements the storage of object classes and uses a dictionary to store the values of object fields. Now, we need to implement the Class and Instance classes. The constructor of Instance will complete the instantiation of the class and initialize the fields and dict. In other words, Instance is just a subclass of Base and does not add any additional methods.

The constructor of Class will accept several operations: class name, base class, class dictionary, and metaclass. For classes, these variables will be passed to the constructor by the user during class initialization. The constructor will also get default values for variables from its base class. However, we will discuss this point in the next chapter.

class Instance(Base):
    """Instance of a user-defined class. """
    def __init__(self, cls):
        assert isinstance(cls, Class)
        Base.__init__(self, cls, {})

class Class(Base):
    """ A User-defined class. """
    def __init__(self, name, base_class, fields, metaclass):
        Base.__init__(self, metaclass, fields)
        self.name = name
        self.base_class = base_class

You may have noticed that classes are still a special kind of object; they indirectly inherit from Base. Therefore, classes are also special instances of a special class, which is called a metaclass.

Now, we can successfully pass our first set of tests. However, we have not yet defined instances of Type and OBJECT. We will not construct these according to Smalltalk's object model, as it is too complex for us. As an alternative, we will adopt the type system of ObjVlisp1, from which Python's type system has absorbed a lot.

In the object model of ObjVlisp, OBJECT and TYPE are intertwined. OBJECT is the parent class of all classes, meaning OBJECT has no parent class. TYPE is a subclass of OBJECT. Generally, every class is an instance of TYPE. In specific cases, both TYPE and OBJECT are instances of TYPE. However, programmers can derive a class from TYPE to serve as a metaclass:

# set up the base hierarchy as in Python (the ObjVLisp model)
# the ultimate base class is OBJECT
OBJECT = Class(name="object", base_class=None, fields={}, metaclass=None)
# TYPE is a subclass of OBJECT
TYPE = Class(name="type", base_class=OBJECT, fields={}, metaclass=None)
# TYPE is an instance of itself
TYPE.cls = TYPE
# OBJECT is an instance of TYPE
OBJECT.cls = TYPE

To write a new metaclass, we need to derive from TYPE. However, in this article, we will not do that; we will only use TYPE as the metaclass for each of our classes.

Figure 14.1 - Inheritance

Now that the first set of tests has completely passed, let's take a look at the second set of tests, where we will test whether object property reading and writing work correctly. This code is also easy to write.

def test_read_write_field_class():
    # classes are objects too
    # Python code
    class A(object):
        pass
    A.a = 1
    assert A.a == 1
    A.a = 6
    assert A.a == 6

    # Object model code
    A = Class(name="A", base_class=OBJECT, fields={"a": 1}, metaclass=TYPE)
    assert A.read_attr("a") == 1
    A.write_attr("a", 5)
    assert A.read_attr("a") == 5

isinstance Check#

So far, we have not utilized the characteristic of objects having classes. The next test code will automatically implement isinstance.

def test_isinstance():
    # Python code
    class A(object):
        pass
    class B(A):
        pass
    b = B()
    assert isinstance(b, B)
    assert isinstance(b, A)
    assert isinstance(b, object)
    assert not isinstance(b, type)

    # Object model code
    A = Class(name="A", base_class=OBJECT, fields={}, metaclass=TYPE)
    B = Class(name="B", base_class=A, fields={}, metaclass=TYPE)
    b = Instance(B)
    assert b.isinstance(B)
    assert b.isinstance(A)
    assert b.isinstance(OBJECT)
    assert not b.isinstance(TYPE)

We can determine whether the obj object is an instance of certain classes cls by checking if cls is the class of obj or one of its superclasses. To determine if a class is a superclass of another class, we check if the class is in the superclass chain. This chain, which includes the superclass and the class itself, is called the method resolution order (MRO). It can be easily computed recursively:

 class Class(Base):
     ...

     def method_resolution_order(self):
         """ compute the method resolution order of the class """
         if self.base_class is None:
             return [self]
         else:
             return [self] + self.base_class.method_resolution_order()

     def issubclass(self, cls):
         """ is self a subclass of cls? """
         return cls in self.method_resolution_order()

After modifying the code, the tests can pass completely.

Method Calls#

The object model we established earlier lacks the important feature of method calls. In this chapter, we will establish a simple inheritance model.

def test_callmethod_simple():
    # Python code
    class A(object):
        def f(self):
            return self.x + 1
    obj = A()
    obj.x = 1
    assert obj.f() == 2

    class B(A):
        pass
    obj = B()
    obj.x = 1
    assert obj.f() == 2 # works on subclass too

    # Object model code
    def f_A(self):
        return self.read_attr("x") + 1
    A = Class(name="A", base_class=OBJECT, fields={"f": f_A}, metaclass=TYPE)
    obj = Instance(A)
    obj.write_attr("x", 1)
    assert obj.callmethod("f") == 2

    B = Class(name="B", base_class=A, fields={}, metaclass=TYPE)
    obj = Instance(B)
    obj.write_attr("x", 2)
    assert obj.callmethod("f") == 3

To find the correct implementation for calling object methods, we now begin to discuss the method resolution order of class objects. The first method found in the class object dictionary according to the MRO will be called:

class Class(Base):
    ...

    def _read_from_class(self, methname):
        for cls in self.method_resolution_order():
            if methname in cls._fields:
                return cls._fields[methname]
        return MISSING

After completing the implementation of the callmethod in the Base class, the tests can pass.

To ensure that function parameter passing is correct and that our previous code can achieve method overloading, we can write the following test code, and the result will pass the test perfectly:

def test_callmethod_subclassing_and_arguments():
    # Python code
    class A(object):
        def g(self, arg):
            return self.x + arg
    obj = A()
    obj.x = 1
    assert obj.g(4) == 5

    class B(A):
        def g(self, arg):
            return self.x + arg * 2
    obj = B()
    obj.x = 4
    assert obj.g(4) == 12

    # Object model code
    def g_A(self, arg):
        return self.read_attr("x") + arg
    A = Class(name="A", base_class=OBJECT, fields={"g": g_A}, metaclass=TYPE)
    obj = Instance(A)
    obj.write_attr("x", 1)
    assert obj.callmethod("g", 4) == 5

    def g_B(self, arg):
        return self.read_attr("x") + arg * 2
    B = Class(name="B", base_class=A, fields={"g": g_B}, metaclass=TYPE)
    obj = Instance(B)
    obj.write_attr("x", 4)
    assert obj.callmethod("g", 4) == 12

Basic Property Model#

Now that the simplest version of the object model is operational, we still need to continue improving it. This section will introduce the differences between the basic method model and the basic property model. This is also the core difference among Smalltalk, Ruby, JavaScript, Python, and Lua.

The basic method model will call methods in the most primitive way:

result = obj.f(arg1, arg2)

The basic property model will divide the calling process into two steps: finding the property and returning the execution result:

    method = obj.f
    result = method(arg1, arg2)

You can experience the differences mentioned above in the following tests:

def test_bound_method():
    # Python code
    class A(object):
        def f(self, a):
            return self.x + a + 1
    obj = A()
    obj.x = 2
    m = obj.f
    assert m(4) == 7

    class B(A):
        pass
    obj = B()
    obj.x = 1
    m = obj.f
    assert m(10) == 12 # works on subclass too

    # Object model code
    def f_A(self, a):
        return self.read_attr("x") + a + 1
    A = Class(name="A", base_class=OBJECT, fields={"f": f_A}, metaclass=TYPE)
    obj = Instance(A)
    obj.write_attr("x", 2)
    m = obj.read_attr("f")
    assert m(4) == 7

    B = Class(name="B", base_class=A, fields={}, metaclass=TYPE)
    obj = Instance(B)
    obj.write_attr("x", 1)
    m = obj.read_attr("f")
    assert m(10) == 12

We can set up property calls in the same way as we did for method calls, but there are some changes compared to method calls. First, we will look for the method name corresponding to the function name in the object. This lookup process results in what is called a bound method, specifically, this result is a special object that binds the method to a specific object. Then this bound method will be called in the subsequent operations.

To implement this operation, we need to modify the implementation of Base.read_attr. If the corresponding property is not found in the instance dictionary, we need to look it up in the class dictionary. If the property is found in the class dictionary, we will perform the method binding operation. We can easily simulate the bound method using a closure. Besides changing the implementation of Base.read_attr, we can also modify the Base.callmethod method to ensure our code passes the tests.

class Base(object):
    ...
    def read_attr(self, fieldname):
        """ read field 'fieldname' out of the object """
        result = self._read_dict(fieldname)
        if result is not MISSING:
            return result
        result = self.cls._read_from_class(fieldname)
        if _is_bindable(result):
            return _make_boundmethod(result, self)
        if result is not MISSING:
            return result
        raise AttributeError(fieldname)

    def callmethod(self, methname, *args):
        """ call method 'methname' with arguments 'args' on object """
        meth = self.read_attr(methname)
        return meth(*args)

def _is_bindable(meth):
    return callable(meth)

def _make_boundmethod(meth, self):
    def bound(*args):
        return meth(self, *args)
    return bound

The rest of the code does not need to be modified.

Metaclass Protocol#

In addition to regular class methods, many dynamic languages also support special methods. These methods are called by the object system rather than through regular calls. In Python, you can see these methods' names start and end with two underscores, such as __init__. Special methods can be used to overload some regular operations and provide custom functionality. Therefore, their existence can inform the object model how to automatically handle different tasks. For a description of related special methods in Python, you can refer to this documentation.

The concept of the metaclass protocol was introduced by Smalltalk and is widely used in the object model of general Lisp systems like CLOS. This concept includes a collection of special methods (Note: The reference to coined3 here is unclear; please refer to the proofreader for clarification).

In this chapter, we will add three metacall operations to our object model. They will be used to provide finer control over our read and modify operations on objects. The first two methods we need to add are __getattr__ and __setattr__, which look very similar to the method names of functions with the same functionality in Python.

Custom Property Read and Write Operations#

The __getattr__ method will be called when a property cannot be found through regular means, in other words, when the corresponding property cannot be found in the instance dictionary, class dictionary, superclass dictionary, etc. We will pass the name of the property being searched for as a parameter to this method. In early Smalltalk, this method was called doesNotUnderstand:.

In the case of __setattr__, things may change a bit. First, we need to clarify that setting a property usually means we need to create it, and at this time, the __setattr__ method will typically be triggered when setting a property. To ensure the existence of __setattr__, we need to implement the __setattr__ method in the OBJECT object. This basic implementation will complete the operation of writing properties to the corresponding dictionary. This allows users to delegate their defined __setattr__ to the OBJECT.__setattr__ method.

The test cases for these two special methods are as follows:

def test_getattr():
    # Python code
    class A(object):
        def __getattr__(self, name):
            if name == "fahrenheit":
                return self.celsius * 9. / 5. + 32
            raise AttributeError(name)

        def __setattr__(self, name, value):
            if name == "fahrenheit":
                self.celsius = (value - 32) * 5. / 9.
            else:
                # call the base implementation
                object.__setattr__(self, name, value)
    obj = A()
    obj.celsius = 30
    assert obj.fahrenheit == 86 # test __getattr__
    obj.celsius = 40
    assert obj.fahrenheit == 104

    obj.fahrenheit = 86 # test __setattr__
    assert obj.celsius == 30
    assert obj.fahrenheit == 86

    # Object model code
    def __getattr__(self, name):
        if name == "fahrenheit":
            return self.read_attr("celsius") * 9. / 5. + 32
        raise AttributeError(name)
    def __setattr__(self, name, value):
        if name == "fahrenheit":
            self.write_attr("celsius", (value - 32) * 5. / 9.)
        else:
            # call the base implementation
            OBJECT.read_attr("__setattr__")(self, name, value)

    A = Class(name="A", base_class=OBJECT,
              fields={"__getattr__": __getattr__, "__setattr__": __setattr__},
              metaclass=TYPE)
    obj = Instance(A)
    obj.write_attr("celsius", 30)
    assert obj.read_attr("fahrenheit") == 86 # test __getattr__
    obj.write_attr("celsius", 40)
    assert obj.read_attr("fahrenheit") == 104
    obj.write_attr("fahrenheit", 86) # test __setattr__
    assert obj.read_attr("celsius") == 30
    assert obj.read_attr("fahrenheit") == 86

To pass the tests, we need to modify the Base.read_attr and Base.write_attr methods:

class Base(object):
    ...

    def read_attr(self, fieldname):
        """ read field 'fieldname' out of the object """
        result = self._read_dict(fieldname)
        if result is not MISSING:
            return result
        result = self.cls._read_from_class(fieldname)
        if _is_bindable(result):
            return _make_boundmethod(result, self)
        if result is not MISSING:
            return result
        meth = self.cls._read_from_class("__getattr__")
        if meth is not MISSING:
            return meth(self, fieldname)
        raise AttributeError(fieldname)

    def write_attr(self, fieldname, value):
        """ write field 'fieldname' into the object """
        meth = self.cls._read_from_class("__setattr__")
        return meth(self, fieldname, value)

The process of obtaining properties becomes a call to the __getattr__ method, passing the field name as a parameter. If the field does not exist, an exception will be raised. Note that __getattr__ can only be called within the class (the same applies to special methods in Python), and we need to avoid recursive calls like self.read_attr("__getattr__"), as this would cause infinite recursion if the __getattr__ method is not defined.

The modification operation for properties will also be executed by the __setattr__ method, just like reading. To ensure that this method can function correctly, OBJECT needs to implement the default behavior of __setattr__, such as:

def OBJECT__setattr__(self, fieldname, value):
    self._write_dict(fieldname, value)
OBJECT = Class("object", None, {"__setattr__": OBJECT__setattr__}, None)

The specific implementation of OBJECT.__setattr__ is similar to the previous implementation of the write_attr method. After completing these modifications, we can successfully pass our tests.

Descriptor Protocol#

In the above tests, we frequently switched between different temperature scales. It must be said that it can be quite tedious to check the names of the properties used during modification operations. To solve this problem, Python introduced the concept of the descriptor protocol.

We will obtain specific properties from the __getattr__ and __setattr__ methods, while the descriptor protocol will trigger a special method when the property call process ends and returns a result. The descriptor protocol can be seen as a special means of binding classes and methods, and we can use the descriptor protocol to complete the binding of methods to objects. In addition to binding methods, one of the most important use cases for descriptors in Python is staticmethod, classmethod, and property.

In the following text, we will introduce how to use descriptors for object binding. We can achieve this by using the __get__ method, as shown in the following test code:

def test_get():
    # Python code
    class FahrenheitGetter(object):
        def __get__(self, inst, cls):
            return inst.celsius * 9. / 5. + 32

    class A(object):
        fahrenheit = FahrenheitGetter()
    obj = A()
    obj.celsius = 30
    assert obj.fahrenheit == 86

    # Object model code
    class FahrenheitGetter(object):
        def __get__(self, inst, cls):
            return inst.read_attr("celsius") * 9. / 5. + 32

    A = Class(name="A", base_class=OBJECT,
              fields={"fahrenheit": FahrenheitGetter()},
              metaclass=TYPE)
    obj = Instance(A)
    obj.write_attr("celsius", 30)
    assert obj.read_attr("fahrenheit") == 86

The __get__ method will be called on the instance of FahrenheitGetter after the property lookup is complete. The parameters passed to __get__ are the instance at the end of the lookup process.

Implementing this functionality is quite simple; we can easily modify the _is_bindable and _make_boundmethod methods:

def _is_bindable(meth):
    return hasattr(meth, "__get__")

def _make_boundmethod(meth, self):
    return meth.__get__(self, None)

Now, this simple modification ensures that we pass the tests. The previous tests regarding method binding can also pass. In Python, after the execution of the __get__ method, a bound method object will be returned.

In practice, the descriptor protocol can indeed seem quite complex. It also includes the __set__ method used for setting properties. Additionally, the version we have implemented here is somewhat simplified. Note that the previous _make_boundmethod method calling __get__ is an implementation-level operation, rather than using meth.read_attr('__get__'). This is necessary because our object model borrows functions and methods from Python rather than showcasing Python's object model. Further refining the model could effectively address this issue.

Instance Optimization#

The establishment of the object model in the previous three parts has been accompanied by many behavioral changes, while the final part of optimization will not involve behavioral changes. This optimization method is called map, which is widely used in self-booting language virtual machines. It is one of the most important optimization techniques for object models: it is applied in PyPy and modern JavaScript virtual machines like V8 (in V8, this method is referred to as hidden classes).

This optimization technique is based on the observation that, in the object model we have implemented so far, all instances use a complete dictionary to store their properties. Dictionaries are implemented based on hash tables, which can consume a lot of memory. Often, instances of the same class will have the same properties. For example, a class Point will have instances that all contain the same properties x and y.

The Map optimization takes advantage of this fact. It will split each instance's dictionary into two parts. One part stores property names that can be shared among all instances. The other part only stores a reference to the Map generated from the first part and the actual values. The map that stores property names will serve as an index for the values.

We will write some test cases for the above requirements, as follows:

def test_maps():
    # white box test inspecting the implementation
    Point = Class(name="Point", base_class=OBJECT, fields={}, metaclass=TYPE)
    p1 = Instance(Point)
    p1.write_attr("x", 1)
    p1.write_attr("y", 2)
    assert p1.storage == [1, 2]
    assert p1.map.attrs == {"x": 0, "y": 1}

    p2 = Instance(Point)
    p2.write_attr("x", 5)
    p2.write_attr("y", 6)
    assert p1.map is p2.map
    assert p2.storage == [5, 6]

    p1.write_attr("x", -1)
    p1.write_attr("y", -2)
    assert p1.map is p2.map
    assert p1.storage == [-1, -2]

    p3 = Instance(Point)
    p3.write_attr("x", 100)
    p3.write_attr("z", -343)
    assert p3.map is not p1.map
    assert p3.map.attrs == {"x": 0, "z": 1}

Note that the style of the test code here looks a bit different from our previous test code. Previously, all tests only tested the functionality of classes through implemented interfaces. Here, the tests inspect the internal properties of the classes and compare them with preset values. This testing method is known as white-box testing.

The map containing attrs in p1 stores the properties x and y, with values 0 and 1 stored in p1, respectively. Then, when creating the second instance p2, we add the same properties to the same map in the same way. In other words, if different properties are added, then the map will not be shared.

The Map class looks like this:

class Map(object):
    def __init__(self, attrs):
        self.attrs = attrs
        self.next_maps = {}

    def get_index(self, fieldname):
        return self.attrs.get(fieldname, -1)

    def next_map(self, fieldname):
        assert fieldname not in self.attrs
        if fieldname in self.next_maps:
            return self.next_maps[fieldname]
        attrs = self.attrs.copy()
        attrs[fieldname] = len(attrs)
        result = self.next_maps[fieldname] = Map(attrs)
        return result

EMPTY_MAP = Map({})

The Map class has two methods: get_index and next_map. The former is used to look up the corresponding property name in the index of the object's storage space. The latter should be used when new properties are added to the object. In this case, different instances need to calculate different mappings using next_map. This method will use next_maps to look up existing mappings. Thus, similar instances will use similar Map objects.

image

Figure 14.2 - Map transitions

The implementation of Instance using map is as follows:

class Instance(Base):
    """Instance of a user-defined class. """

    def __init__(self, cls):
        assert isinstance(cls, Class)
        Base.__init__(self, cls, None)
        self.map = EMPTY_MAP
        self.storage = []   

    def _read_dict(self, fieldname):
        index = self.map.get_index(fieldname)
        if index == -1:
            return MISSING
        return self.storage[index]

    def _write_dict(self, fieldname, value):
        index = self.map.get_index(fieldname)
        if index != -1:
            self.storage[index] = value
        else:
            new_map = self.map.next_map(fieldname)
            self.storage.append(value)
            self.map = new_map

Now this class passes None as the field dictionary to the Base class because Instance will construct the storage dictionary in another way. Therefore, it needs to override _read_dict and _write_dict. In practical operations, we will restructure the Base class so that it no longer handles the storage of field dictionaries. However, for now, passing None as a parameter is sufficient.

When a new instance is created, it uses EMPTY_MAP, which contains no objects. After implementing _read_dict, we will look up the index of the property name from the instance's map, and then map it to the corresponding storage table.

Writing data to the field dictionary is divided into two situations. The first is modifying an existing property value, which simply involves modifying the corresponding value in the mapped list. If the corresponding property does not exist, then a map transformation will occur (as shown in the above diagram), and the next_map method will be called, and the new value will be stored in the storage list.

You might be wondering what this optimization method actually optimizes. Generally speaking, it can optimize memory usage well when there are many instances with similar structures. However, keep in mind that this is not a universal optimization technique. Sometimes, when the code is filled with instances of different structures, this method may consume even more space.

This is a common issue in dynamic language optimization. Generally, it is unlikely to find a one-size-fits-all method to optimize code for speed and space efficiency. Therefore, specific situations need to be analyzed, and we need to choose optimization methods based on different circumstances.

An interesting point about the Map optimization is that although it only consumes memory, it can also significantly improve program performance when the VM uses JIT technology. To achieve this, JIT technology uses mappings to find the offset of properties in the storage space, completely eliminating the need for dictionary lookups.

Potential Extensions#

Extending our object model and introducing design choices from different languages is quite easy. Here are some possible directions:

  • The simplest is to add more special methods, such as __init__, __getattribute__, __set__, which are very easy and interesting to implement.

  • Extend the model to support multiple inheritance. To achieve this, each class would need a list of parent classes. Then, Class.method_resolution_order would need to be modified to support method lookup. A simple MRO calculation rule could use a depth-first principle, while a more complex one could adopt the C3 algorithm, which better handles issues arising from diamond inheritance structures.

  • A more radical idea is to switch to a prototype model, which requires eliminating the differences between classes and instances.

Conclusion#

The core of object-oriented programming language design lies in the details of its object model. Writing a simple object model is a simple and enjoyable task. You can use this approach to understand how existing languages work and delve into the design principles of object-oriented languages. Writing different object models to validate different design ideas is a fantastic method. You no longer need to focus on other trivial matters, such as parsing and executing code.

The work of writing object models is also very useful in practice. Besides serving as experimental tools, they can also be used by other languages. There are many examples of this: for instance, the GObject model, which is written in C and used in GLib and other Gnome projects, as well as various object models implemented in JavaScript.

References#

  1. P. Cointe, “Metaclasses are first class: The ObjVlisp Model,” SIGPLAN Not, vol. 22, no. 12, pp. 156–162, 1987.↩

  2. It seems that the attribute-based model is conceptually more complex because it needs both method lookup and call. In practice, calling something is defined by looking up and calling a special attribute __call__, so conceptual simplicity is regained. This won't be implemented in this chapter, however.↩

  3. G. Kiczales, J. des Rivieres, and D. G. Bobrow, The Art of the Metaobject Protocol. Cambridge, Mass: The MIT Press, 1991.↩

  4. A. Goldberg, Smalltalk-80: The Language and its Implementation. Addison-Wesley, 1983, page 61.↩

  5. In Python, the second argument is the class where the attribute was found, though we will ignore that here.↩

  6. C. Chambers, D. Ungar, and E. Lee, “An efficient implementation of SELF, a dynamically-typed object-oriented language based on prototypes,” in OOPSLA, 1989, vol. 24.↩

  7. How that works is beyond the scope of this chapter. I tried to give a reasonably readable account of it in a paper I wrote a few years ago. It uses an object model that is basically a variant of the one in this chapter: C. F. Bolz, A. Cuni, M. Fijałkowski, M. Leuschel, S. Pedroni, and A. Rigo, “Runtime feedback in a meta-tracing JIT for efficient dynamic languages,” in Proceedings of the 6th Workshop on Implementation, Compilation, Optimization of Object-Oriented Languages, Programs and Systems, New York, NY, USA, 2011, pp. 9:1–9:8.↩

Loading...
Ownership of this post data is guaranteed by blockchain and smart contracts to the creator alone.