Python: Iterable, Iterator

Iteration

Iteration is a general programming term, not one specific to Python. It refers to visiting the elements of a data structure such as an array one by one with a looping statement.

arr = [1, 2, 3, 4]

for i in arr:
    print(f"i is {i}")

i is 1
i is 2
i is 3
i is 4

Abstract Base Class (ABC)

An Abstract Base Class is a concept used to lay out the hierarchy of data structures in object-oriented programming, and it is not limited to Python. Many languages that support OOP use abstract classes to design such hierarchies; Java even has an abstract keyword for defining them.

Then what is abstract?

In computer science, 'abstract' often means far from the hardware (Python is more abstract than a language such as C because the programmer does not manage memory directly; the garbage collector does it). In software engineering, 'abstract' can roughly be read as not spelling out the specific properties or behavior of an object.

In Python, everything is an object, and every object is an instance of a class. list, tuple, set, and dict are the four basic built-in data structures of Python.

The four data structures each have their own behavior and characteristics, and each is treated as a subclass of specific abstract data structures defined by Python.

Each abstract data structure declares the methods that a class inheriting from it must implement and, in keeping with the meaning of 'abstract', does not specify what those methods concretely do.

These abstract data structures, which form the backbone of the other data structures including the four core ones, are provided as classes in the standard library module collections.abc.

from collections.abc import Container, Collection, Sequence

print(Container)
print(Collection)
print(Sequence)

----
<class 'collections.abc.Container'>
<class 'collections.abc.Collection'>
<class 'collections.abc.Sequence'>

Each abstract data structure declares the methods that an inheriting class must implement; they can be inspected through the attribute cls.__abstractmethods__.

print(Container.__abstractmethods__)
print(Collection.__abstractmethods__)
print(Sequence.__abstractmethods__)

frozenset({'__contains__'})
frozenset({'__len__', '__contains__', '__iter__'})
frozenset({'__len__', '__getitem__'})

A child class (data structure) that inherits from them must implement every one of these methods.
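To see this requirement enforced, here is a minimal sketch. The class names Incomplete and Bag are hypothetical and exist only for illustration; only Collection comes from collections.abc.

```python
from collections.abc import Collection


class Incomplete(Collection):
    """Hypothetical class: implements only one of the three abstract methods."""

    def __len__(self):
        return 0


class Bag(Collection):
    """Hypothetical class: implements all three abstract methods."""

    def __init__(self, items):
        self._items = list(items)

    def __len__(self):
        return len(self._items)

    def __iter__(self):
        return iter(self._items)

    def __contains__(self, item):
        return item in self._items


# Instantiating a class with missing abstract methods fails.
try:
    Incomplete()
    instantiated = True
except TypeError:
    instantiated = False
print(instantiated)  # False

# A class that implements every abstract method can be instantiated.
bag = Bag([1, 2, 3])
print(len(bag), 2 in bag)  # 3 True
```

Note that the check happens when the class is instantiated, not when it is defined.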

The __len__ method is executed when the len() built-in function is called.

We have used this function many times to find the length of a list.

This works because list implements __len__, one of the abstract methods required by the Collection abstract class.

In addition, len() can be applied not only to list but also to tuple, set, and dict, all of which are subclasses of Collection.

_list = [1, 2, 3]
_tuple = (4, 5)
_dict = {'six': 6, "seven": 7, "eight": 8}
_set = set()

assert len(_list) == 3
assert len(_tuple) == 2
assert len(_dict) == 3
assert len(_set) == 0


assert issubclass(list, Collection)
assert issubclass(tuple, Collection)
assert issubclass(dict, Collection)
assert issubclass(set, Collection)

issubclass() is a built-in function that checks whether its first argument (a class) is a subclass of its second argument.

The four basic data structures are all subclasses of the abstract class Collection, both conceptually and as far as issubclass() is concerned. What should be remembered here is that the concrete implementation of each method differs from one data structure to another. This is polymorphism in object-oriented programming.
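One detail worth checking in code: in CPython the built-in containers do not name Collection as a real base class. They pass issubclass() because the ABCs in collections.abc either register them or recognize them by the methods they define. The sketch below illustrates this; the Deck class is hypothetical.

```python
from collections.abc import Collection

# list does not have Collection among its real base classes...
print(list.__mro__)                  # (<class 'list'>, <class 'object'>)
print(Collection in list.__mro__)    # False
# ...yet the ABC machinery still reports it as a subclass.
print(issubclass(list, Collection))  # True


class Deck:
    """Hypothetical class that never mentions Collection."""

    def __init__(self, cards):
        self._cards = list(cards)

    def __len__(self):
        return len(self._cards)

    def __iter__(self):
        return iter(self._cards)

    def __contains__(self, card):
        return card in self._cards


# Defining __len__, __iter__ and __contains__ is enough to be recognized.
print(issubclass(Deck, Collection))  # True
```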

Iterable, Iterator, and Generator are all both concepts and abstract classes defined in collections.abc. Because these three are abstract classes, each basic data structure either is or is not a subclass of each of them.

from collections.abc import Iterable, Iterator, Generator

abcs = (Iterable, Iterator, Generator)  # the three ABCs to test against
basic = (list, tuple, set, dict)

for b in basic:
    for a in abcs:
        print(f'Did {b.__name__} inherit {a.__name__}?', issubclass(b, a))
    print()  # blank line between data structures

-----
Did list inherit Iterable? True
Did list inherit Iterator? False
Did list inherit Generator? False

Did tuple inherit Iterable? True
Did tuple inherit Iterator? False
Did tuple inherit Generator? False

Did set inherit Iterable? True
Did set inherit Iterator? False
Did set inherit Generator? False

Did dict inherit Iterable? True
Did dict inherit Iterator? False
Did dict inherit Generator? False
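For contrast, here is a small sketch of objects that do pass the Iterator and Generator checks. It uses isinstance() on objects rather than issubclass() on classes; the variable names are illustrative only.

```python
from collections.abc import Iterable, Iterator, Generator

list_iter = iter([1, 2, 3])        # the iterator produced from a list
gen = (n * n for n in range(3))    # a generator expression

print(isinstance(list_iter, Iterable),
      isinstance(list_iter, Iterator),
      isinstance(list_iter, Generator))
# True True False

print(isinstance(gen, Iterable),
      isinstance(gen, Iterator),
      isinstance(gen, Generator))
# True True True
```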

Iterable and Iterator

The two classes are a pair defined by Python's Iterator Protocol.

So we first look briefly at what the Iterator Protocol is, and then examine Iterable and Iterator in turn.

Once we understand these two, we can see much more clearly how the for statement works in Python. At the end of this chapter, the inheritance relationship between the two is confirmed both in code and conceptually.

Iterator Protocol

Let's first explain the Iterator Protocol conceptually.

The Iterator Protocol defines the rules that an Iterable and an Iterator must follow, both conceptually and in practice.

Simply put, it is the recipe for making an Iterable and an Iterator.

All the Iterable and Iterator classes we use are implemented as this protocol requires.

In other words, as discussed above, list, dict, etc. are Iterable because they meet all the Iterable requirements defined by the Iterator Protocol.

This also means that if we write a class that complies with this protocol, we can create our own customized Iterables and Iterators (not only predefined ones such as dict and list).
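As a rough sketch of what the protocol means for the for statement, the loop from the Iteration section can be written out by hand with the built-in functions iter() and next(). This is an illustration of the idea, not the interpreter's actual implementation; the seen list is added only to record the values.

```python
arr = [1, 2, 3, 4]

# Roughly what `for i in arr: print(f"i is {i}")` does.
iterator = iter(arr)          # calls arr.__iter__()
seen = []
while True:
    try:
        i = next(iterator)    # calls iterator.__next__()
    except StopIteration:
        break                 # the for statement handles this exception silently
    seen.append(i)
    print(f"i is {i}")
```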


Iterable

Iterable refers to any object that can be traversed. In other words, any value that can follow the in keyword of a for statement in Python is Iterable.

So, in addition to list, tuple, set, and dict, strings, files, etc. are also Iterable.

s = 'abc'
for c in s:
    print(c)

file_name = 'any_textfile_path'  # placeholder path
with open(file_name) as f:       # the file object is Iterable, line by line
    for line in f:
        print(line)

---
a
b
c

...

Are they really Iterable?

import io
from collections.abc import Iterable

assert issubclass(str, Iterable)
assert issubclass(io.TextIOWrapper, Iterable)

Both assertions pass without raising AssertionError. In other words, the string and file classes are Iterable.

Then, how can we make an Iterable according to the Iterator Protocol? Let's check which abstract method has to be implemented, using the attribute we looked at earlier.
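As a counter-example, here is a short sketch with a type that is not Iterable. int is used only as a convenient illustration.

```python
from collections.abc import Iterable

print(issubclass(int, Iterable))   # False

try:
    for digit in 42:               # an int cannot follow `in`
        print(digit)
    looped = True
except TypeError:
    looped = False
print(looped)                      # False
```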

print(Iterable.__abstractmethods__)

frozenset({'__iter__'})

That is, for a class to inherit from Iterable (in other words, for the class to be Iterable):

The __iter__ abstract method must actually be implemented, and it must return a new Iterator each time it is called.

For a data structure or class to be Iterable, this is the only condition that needs to be satisfied.

Once a class implements __iter__, an Iterator can be created by applying the built-in function iter() to an instance of it.

l = [1, 2, 3]
t = (3, 4)
d = {'a': 1, 'b': 2}
s = set()
r = range(10)

print(iter(l))
print(iter(t))
print(iter(d))
print(iter(s))
print(iter(r))


<list_iterator object at 0x7f08026e4fd0>
<tuple_iterator object at 0x7f08026e49e8>
<dict_keyiterator object at 0x7f0802721228>
<set_iterator object at 0x7f08031b9480>
<range_iterator object at 0x7f0801dc6390>

Iterator

An Iterable can be defined as 'any value that can be put in a for statement', but an Iterator is a little trickier.

An Iterator holds state and returns one element each time it is asked, up to the last value it can return.

"Returning elements one by one up to the last value that can be returned" means the same thing as a for statement handing out the values of a list one by one.

l = [1, 2, 3, 4, 5]

for i in l:
    print(i)

---
1
2
3
4
5

Each value was returned one by one through the for statement, and once the end is reached (5 in this case) nothing more is returned. So far this is no different from an Iterable; what matters is that an Iterator has state.

Each Iterator has a state. Every time the iter() function is called on an Iterable, a new Iterator is created.

Each of these Iterators maintains its own state. In other words, operating one Iterator does not affect any other Iterator.

assert iter(l) is not iter(l)  # two distinct Iterator objects

So what is the state managed by each Iterator? It can be many things, but here we can think of it as the position the Iterator has reached in its traversal.

In other words, given different Iterators created from one Iterable (e.g., [1, 2, 3, 4]), one Iterator may have returned values up to 2, so that its next return value is 3, while another may have returned values up to 4, so that it has nothing left to return.

The two Iterators are completely separate objects that each maintain and manage their own state.
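The scenario just described can be reproduced directly. This is a small sketch; the variable names are illustrative.

```python
data = [1, 2, 3, 4]

first = iter(data)
second = iter(data)

# Advance the first Iterator twice; its next value will be 3.
print(next(first), next(first))   # 1 2

# Drain the second Iterator completely.
print(list(second))               # [1, 2, 3, 4]

# The first Iterator was not affected by the second one.
third_value = next(first)
print(third_value)                # 3

# The second Iterator has nothing left; None is the default passed to next().
leftover = next(second, None)
print(leftover)                   # None
```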

An Iterator is the object returned by passing an Iterable to the iter() function; conceptually, it is an object that returns values one by one while maintaining and managing its state.

print(Iterator.__abstractmethods__)

frozenset({'__next__'})

It seems that the essential method for a class to be an Iterator is __next__.

For a class to be an Iterator, the following conditions must be met:

  • The class should implement __iter__, and that method should return the object itself.

  • The class should implement the __next__ method, which defines the next value to be returned when the object is passed as an argument to the next() built-in function.

  • If the Iterator no longer has a value to return, the __next__ method raises a StopIteration exception.

When all three requirements are implemented, the class can be called an Iterator.
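The three requirements can be checked against a built-in Iterator, here the one produced from a list. The last two lines also confirm the inheritance relationship between the two abstract classes: since every Iterator implements __iter__, Iterator is a subclass of Iterable, but not the other way around.

```python
from collections.abc import Iterable, Iterator

it = iter(["a", "b"])

# 1. __iter__ returns the Iterator itself.
print(iter(it) is it)            # True

# 2. next() hands back one element at a time.
print(next(it), next(it))        # a b

# 3. Once exhausted, next() raises StopIteration.
try:
    next(it)
    exhausted = False
except StopIteration:
    exhausted = True
print(exhausted)                 # True

print(issubclass(Iterator, Iterable))   # True
print(issubclass(Iterable, Iterator))   # False
```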

Creating our own Iterable and Iterator

Let’s make our own Iterable and Iterator.

  1. First, we have to ensure that the __iter__ method of the Iterable returns a new Iterator each time.

class RandomIntIterable:
    def __init__(self, n) -> None:
        self.n = n

    def __iter__(self):
        # Hand out a fresh Iterator on every call.
        return RandomIntIterator(self.n)

Then, let’s define the Iterator.

from random import randint


class RandomIntIterator:
    def __init__(self, n):
        self.count = 0  # how many integers have been returned so far
        self.n = n

    def __iter__(self):
        # An Iterator returns itself.
        return self

    def __next__(self):
        if self.count < self.n:
            self.count += 1
            return randint(1, 100)
        else:
            raise StopIteration


able = RandomIntIterable(3)
tor1 = iter(able)
tor2 = iter(able)

assert tor1 is not tor2    # each call to iter() gives a new Iterator

assert tor1 is iter(tor1)  # an Iterator's __iter__ returns itself

print(next(tor1))
print(next(tor1))
print(next(tor1))
print(next(tor1))  # raises StopIteration!

We have defined RandomIntIterator.

It receives the number n through its constructor and stores it as an instance attribute.

The count variable, which stores how many integers have been returned so far, is initialized to 0. This variable can be called the state.

The __next__ method returns a random integer as long as the number of integers returned so far is less than n.

Once n integers have been returned, every later call raises a StopIteration exception.
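As a usage sketch, the pair can be put into a for statement. The block below restates the two classes compactly so that it runs on its own. It illustrates the practical difference between the two roles: the Iterable can be traversed again and again, while a single Iterator is used up after one pass.

```python
from random import randint


class RandomIntIterator:
    def __init__(self, n):
        self.count = 0
        self.n = n

    def __iter__(self):
        return self

    def __next__(self):
        if self.count >= self.n:
            raise StopIteration
        self.count += 1
        return randint(1, 100)


class RandomIntIterable:
    def __init__(self, n):
        self.n = n

    def __iter__(self):
        return RandomIntIterator(self.n)


able = RandomIntIterable(3)
first_pass = list(able)      # each pass gets a fresh Iterator
second_pass = list(able)
print(len(first_pass), len(second_pass))   # 3 3

tor = iter(able)
for value in tor:            # a for statement accepts an Iterator too
    print(value)
remaining = list(tor)
print(remaining)             # [] because the Iterator is already exhausted
```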
