Manjusaka

Manjusaka

Let's talk about generators and coroutines in Python.

Foreword#

I originally planned to continue writing about Flask this week, but after some thought, I decided to change things up and talk about the very important but often misunderstood generators and coroutines in Python.

Generators Explained#

I guess everyone is somewhat familiar with generators, but to allow me to continue to show off, let's take some time to explain what a generator is.
For example, in Python, if we want to generate a list within the range (1,100000), we might write the following code without much thought:

def generateList(start, stop):
    tempList = []
    for i in range(start, stop):
        tempList.append(i)
    return tempList

Note 1: Some might ask why we don't just return range(start, stop). Nice question! This involves a fundamental issue regarding how range works. This varies by version; in Python 2.x, range(start, stop) essentially pre-generates a list, and the list object is an Iterator, which can be used in a for statement.
range in Python 2.x
In Python 2.x, there is also a statement called xrange, which generates a Generator object.
xrange in Python 2.x
In Python 3, things changed a bit; the community felt that the split between range and xrange was too cumbersome, so they merged them. Now in Python 3, the syntax for xrange has been removed, and the mechanism for range has changed to generate a Generator instead of a list.
range in Python 3

But have you ever considered the problem that if we want to generate a very large amount of data, pre-generating it is undoubtedly unwise, as it would consume a lot of memory? Thus, Python provides us with a new approach: Generator.

def generateList1(start, stop):
    for i in range(start, stop):
        yield i

if __name__ == "__main__":
    c = generateList1(1, 100000)
    for i in c:
        print(i)

Yes, one of the features of a Generator is that it does not generate data all at once; instead, it generates an iterable object that controls its activation timing based on the logic we write.

Deep Dive into Generators#

You might wonder if Python developers would create a Generator mechanism just for this use case. So, do we have other scenarios for using Generators? Of course! Look at the title; indeed, another significant use of Generators is as coroutines. However, before we delve into that, we need to understand Generators more deeply to facilitate our later discussions.

Built-in Methods of Generators#

Background Knowledge on Iterable Objects in Python#

First, let's look at the iteration process in Python.
In Python, there are two concepts in iteration: Iterable and Iterator. Let's examine them separately.
Firstly, Iterable can be understood as a protocol. The way to determine if an Object is Iterable is to check if it implements iter. If it does, then it can be considered an Iterable object. Let's look at some code to understand this better:

class Counter:
    def __init__(self, low, high):
        self.current = low
        self.high = high

    def __iter__(self):
        return self

    def next(self):  # Python 3: def __next__(self)
        if self.current > self.high:
            raise StopIteration
        else:
            self.current += 1
            return self.current - 1

if __name__ == '__main__':
    a = Counter(3, 8)
    for c in a:
        print(c)

Now, let's see what happens in the code above. First, the for statement checks whether the object being iterated is an Iterable or an Iterator. If it is an object that implements the __iter__ method, then it is an Iterable object. The for loop first calls the object's __iter__ method to obtain an Iterator object. So what is an Iterator object? It can be roughly understood as one that implements the next() method (Note: in Python 3, it's the next method).

OK, let's return to what we were discussing. In the code above, the for statement first determines whether it is an Iterable or an Iterator. If it is an Iterable, it calls its iter method to get an Iterator object, and then the for loop will call the next() (Note: in Python 3, it's __next__) method of the Iterator object to iterate until the iteration ends and raises a StopIteration exception.

Now, Let's Talk About Generators#

Let's revisit the earlier code:

def generateList1(start, stop):
    for i in range(start, stop):
        yield i

if __name__ == "__main__":
    c = generateList1(1, 100000)
    for i in generateList1:
        print(i)

First, we need to establish that Generators are also Iterator objects. Let's look at the code above. The for statement confirms that generateList1 is an Iterator object, and then it starts calling the next() method for further iteration. At this point, you might wonder how the next() method allows generateList1 to continue iterating. The answer lies in the built-in send() method of the Generator. Let's look at another piece of code.

def generateList1(start, stop):
    for i in range(start, stop):
        yield i

if __name__ == "__main__":
    a = generateList1(0, 5)
    for i in range(0, 5):
        print(a.send(None))

What should we expect as output here? The answer is 0, 1, 2, 3, 4, which is the same result we would get using a for loop. Now we can conclude that:

The essence of Generator iteration is to call the built-in send() method through the built-in next() or __next__() method.

Continuing to Discuss Built-in Methods of Generators#

Earlier, we mentioned a conclusion:

The essence of Generator iteration is to call the built-in send() method through the built-in next() or __next__() method.

Now let's look at an example:

def countdown(n):
    print("Counting down from", n)
    while n >= 0:
        newvalue = (yield n)
        # If a new value got sent in, reset n with it
        if newvalue is not None:
            n = newvalue
        else:
            n -= 1

if __name__ == '__main__':
    c = countdown(5)
    for x in c:
        print(x)
        if x == 5:
            c.send(3)

What should the output of this code be?
The answer is [5, 2, 1, 0]. Isn't that confusing? Don't worry; let's look at the execution flow of this code.

Code Execution Flow

In short, when we call the send() function, the value we send(x) is sent to newvalue, and execution continues until the next yield is encountered, returning the value as the end of the process. Our Generator quietly sleeps in memory, waiting for the next send to wake it up.

Note 2: Someone asked, "I don't quite understand here; is c.send(3) equivalent to yield n returning a 3 to newvalue?" Good question! We can see from the execution diagram earlier that c.send(3) first assigns 3 to newvalue, then the program runs the remaining code until it encounters the next yield. At this point, the value of n has already changed to 3 before we reach yield n, so yield n is essentially equivalent to return 3. Then, the Generator freezes all variable states and quietly stays in memory, waiting for the next next or __next__() method or send() method to wake it up.

Tip: If we call send() directly, make sure to use send(None) the first time; only then is a Generator truly activated, allowing us to proceed with the next operation.

Let's Talk About Coroutines#

First, regarding the definition of coroutines, let's look at a segment from Wikipedia:

Coroutines are computer program components that generalize subroutines for nonpreemptive multitasking, by allowing multiple entry points for suspending and resuming execution at certain locations. Coroutines are well-suited for implementing more familiar program components such as cooperative tasks, exceptions, event loops, iterators, infinite lists, and pipes.
According to Donald Knuth, the term coroutine was coined by Melvin Conway in 1958, after he applied it to the construction of an assembly program. The first published explanation of the coroutine appeared later, in 1963.

In short, coroutines are a lighter model than threads, allowing us to control the timing of their start and stop. In Python, there isn't a dedicated concept for coroutines; the community generally views Generators as a special type of coroutine. Think about it: we can wake up our Generator using the next or __next__() method or the send() method, and after executing the code we specified, the Generator returns and freezes all its states. Isn't that exciting?!!

A Little Homework on Generators#

Now we want to perform a post-order traversal of a binary tree. I know that those reading this article can easily write it out, so let's look at the code first:

class Node(object):
    def __init__(self, val, left, right):
        self.val = val
        self.left = left
        self.right = right

def visit_post(node):
    if node.left:
        return visit_post(node.left)
    if node.right:
        return visit_post(node.right)
    return node.val

if __name__ == '__main__':
    node = Node(-1, None, None)
    for val in range(100):
        node = Node(val, None, node)
    print(list(visit_post(node)))

However, we know that if the recursion depth is too deep, we might either run out of stack space or fail the Python transaction. OK, the Generator method is good for keeping you safe, so let's look at the code directly:

def visit_post(node):
    if node.left:
        yield node.left
    if node.right:
        yield node.right
    yield node.val

def visit(node, visit_method):
    stack = [visit_method(node)]
    while stack:
        last = stack[-1]
        try:
            yielded = next(last)
        except StopIteration:
            stack.pop()
        else:
            if isinstance(yielded, Node):
                stack.append(visit_method(yielded))
            elif isinstance(yielded, int):
                yield yielded

if __name__ == '__main__':
    node = Node(-1, None, None)
    for val in range(100):
        node = Node(val, None, node)
    visit_generator = visit(node, visit_method=visit_post)
    print(list(visit_generator))

Looks complex, right? No worries; consider it homework. Everyone can leave comments, and we can discuss Python transactions together!

  1. Improve Your Python: Explaining 'yield' and 'Generators'
  2. The Power of yield
  3. http://my.oschina.net/1123581321/blog/160560
  4. Why Must Python Iterators Implement the iter Method (Regarding iterators, I simplified some things for easier understanding; please refer to the highly-rated answers to this question for specifics.)
Loading...
Ownership of this post data is guaranteed by blockchain and smart contracts to the creator alone.