posted: June 29, 2019
tl;dr: The container model for variables doesn’t work for some of today’s most popular languages, yet it is still taught...
The two languages that I use the most these days, JavaScript and Python, are both dynamically-typed object-oriented languages. In fact they are probably more similar than you think, but that’s a blog post for another day. In both languages all variable values are actually objects stored somewhere in memory, and the variable names are pointers or references to the appropriate object. A single object can, and often will, have more than one variable name pointing to it. If that object is mutable, changing it through one variable name changes what every other name pointing to it sees, which gives the appearance of changing the value of the other variable.
For example, in the Node.js REPL:
> a = b = {}
{}
> a['x'] = 1
1
> b
{ x: 1 }
When the exact same keystrokes are input into the Python REPL, you get the same result, albeit formatted slightly differently:
>>> a = b = {}
>>> a['x'] = 1
>>> b
{'x': 1}
This apparent anomaly is easy to explain if you first have a proper model in your head for how these languages manage memory and variables.
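One way to check that the two names really refer to a single object is Python's `is` operator, which compares identity rather than equality. The following is only a minimal sketch; the names `c` and `same_object` and the use of `copy.copy` are my own additions for illustration, not part of the sessions above.

```python
import copy

a = b = {}               # one dict object, two names bound to it
a['x'] = 1               # mutate the object through the name a

same_object = a is b     # identity check: do both names reference one object?

c = copy.copy(a)         # a shallow copy is a new, separate dict object
c['y'] = 2               # mutating the copy does not touch the original

print(same_object)       # True
print(b)                 # {'x': 1}
print(c)                 # {'x': 1, 'y': 2}
```

If two independent dictionaries are what you want, writing `a = {}` and `b = {}` as separate assignments (or copying, as above) gives each name its own object.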
I recently had the pleasure of teaching a Python 101 course at meltmedia, which meant introducing some Python newbies to the language. Knowing that we would soon run into this issue, I felt it was important, at the beginning of the course, to have the students learn a proper memory model.
So I spent much of the second class presenting a (still somewhat simplified) model of how Python manages memory, with particular emphasis on how Python creates objects which are referenced by variable names. I talked about how even each integer is an object that lives somewhere in memory, and how variables are just names, labels, or references that point to the actual object. As an analogy I talked about how a person’s name is not the actual person, and how people can and do have multiple names, all of which reference the same person. Even though it was the second class, and even though this built-in function is rarely used in actual Python programs, I showed them the id() function, which in the CPython interpreter returns the memory address of the object referenced by the variable name. It was then easy to see that assigning a new value does not overwrite the old object in place; instead the name is rebound to a different object at a different address:
>>> spam = 40
>>> id(spam)
4372780416
>>> spam = spam + 2
>>> id(spam)
4372780480
This blew the minds of a few students who had experience with some other languages, such as C. In C the “container” model (a.k.a. the “labeled box” model) for variables is an accurate one: a variable such as an integer is a fixed location in memory that contains a value, and when the value changes, the location of the variable remains the same, while what is stored at that location changes.
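To make the contrast concrete in Python itself, here is a small sketch that puts rebinding a name next to mutating an object in place. The names `old`, `eggs`, and the two boolean flags are hypothetical, added only for illustration. The actual numbers returned by id() vary from run to run, so the sketch compares identities instead of printing addresses.

```python
spam = 40
old = spam                 # keep a second name on the original int object
spam = spam + 2            # creates the int 42 and rebinds the name spam to it

rebound_to_new_object = spam is not old   # spam now names a different object
original_unchanged = (old == 40)          # the object for 40 was never modified

eggs = [40]
eggs_id_before = id(eggs)
eggs[0] = eggs[0] + 2      # mutates the list object in place; no rebinding of eggs
same_list_object = (id(eggs) == eggs_id_before)

print(rebound_to_new_object, original_unchanged, same_list_object)
```

Integers are immutable, so "changing" `spam` can only mean pointing the name at another object, whereas a list can change its contents while remaining the same object.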
Yet even though Python and JavaScript do not behave like C, and the container memory model isn’t accurate, I still see it being used more often than not in introductory texts and tutorials. One of the many reasons that we used the Think Python book for the class is that it presents an accurate object model for variables. But the book we’re going to use in the follow-on Python 111: Applied Python course, Automate the Boring Stuff with Python, presents the container model, as do many others. The container memory model is simpler for novices to grasp initially, but its inaccuracies are just going to cause problems down the road.
The benefits of the more-accurate object memory model are:
It took a little bit of explaining, but I felt the students understood the object model. There’s no need for the container model in languages such as Python and JavaScript: it is time to deprecate it.