To type check or not to type check?

posted: August 4, 2018

tl;dr: A dynamic type system, with optional static type checking, seems like the best way to go...

In one of my last projects at Uprising Technology we decided to implement some type checking in a particular area of our Python-based back end, and I wrote some code to do so. It got me thinking about where and when type checking can be of value.

Python is a dynamically typed language in which the type information of variables, functions, and methods does not have to be declared in advance, and in which the type of a variable can change over time. It stands in stark contrast to a strongly, statically typed language such as Java, in which the type information must be declared in advance in the source code; any attempt to pass something of the wrong type to a Java method will result in a compilation error.

Java is dominant throughout large enterprises, where there are often teams of dozens or hundreds of programmers working on the same codebase. More than a few veteran software leaders insist that large enterprise-class applications can only be written in a statically typed language such as Java. There is, however, a growing body of evidence against that belief: the success of companies (often webapp companies) which use Python extensively, including Instagram, YouTube, and Dropbox.

Types matter in Python, although they can often be ignored

Having developed in both Java and Python (as well as languages too numerous and old to mention), I have no doubt which language I would choose for a webapp startup company like Uprising that needs to develop code quickly with a very small team: Python. Writing code to perform the same functionality takes significantly less development time in Python than in Java. Not having to explicitly prescribe type information saves time, and usually it is obvious what the intended types are. The resulting code is more succinct, with fewer lines and words, and more closely resembles pseudocode. Changing the type of a variable is actually not done all that often in practice, but it also helps cut down on verbosity. If a ‘start_date’ value comes into a function as a string and needs to be transformed into a datetime object, the same ‘start_date’ variable name can be reused instead of declaring another variable with a similar-but-different name, such as ‘start_datetime’.
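A minimal sketch of that ‘start_date’ reuse follows. The function name ‘days_between’ and the ISO date format are assumptions made for illustration; this is not code from Uprising.

```python
from datetime import datetime


def days_between(start_date, end_date):
    """Return the number of whole days from start_date to end_date.

    Both arguments are assumed to arrive as strings such as '2018-08-04'.
    """
    # Rebind the same names to datetime objects rather than introducing
    # start_datetime / end_datetime alongside the original strings.
    start_date = datetime.strptime(start_date, "%Y-%m-%d")
    end_date = datetime.strptime(end_date, "%Y-%m-%d")
    return (end_date - start_date).days


elapsed = days_between("2018-08-01", "2018-08-04")
```

In Java the string and the parsed date would need two differently named, separately declared variables.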

Duck typing also reduces the amount of code because it eliminates the need to copy-and-paste methods that do similar operations but on different types. The Pythonic way of coding is to not check the types of parameters passed to a function, but rather to attempt to use whatever is passed into a function; if an exception is raised, it’s up to the caller to deal with it. This also speeds development. Some misinterpret this to mean that the Python language does not care about types. On the contrary: everything in Python has a type; you can always find out what the type of something is by calling ‘type()’; certain operations can’t be performed on certain types; and TypeErrors definitely do happen. It’s just that, most of the time, types can be glossed over, which saves development time.
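To make the duck typing point concrete, here is a small sketch. The function ‘total_length’ is a made-up example, not something from our codebase: one function handles several types without checking any of them, yet a TypeError still appears when an operation does not apply.

```python
def total_length(items):
    """Sum len() over the items; works for anything that supports len()."""
    return sum(len(item) for item in items)


# One function, four different types, no copy-and-paste variants.
mixed = ["abc", [1, 2], {"k": "v"}, (1, 2, 3, 4)]
total = total_length(mixed)

# Every value still has a type that can be inspected.
kind = type(mixed[0])

# Some operations are simply not defined for some types.
try:
    total_length([1, 2, 3])  # ints do not support len()
except TypeError as exc:
    error_message = str(exc)
```

The function itself never inspects a type; the failure surfaces in the caller, which decides how to handle it.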

At Uprising we finally ran into a situation where it made sense to implement explicit type checking. Some new front end functionality required that a large JSON object would be passed by the front end into the Python back end, where it would go through all the layers of the stack and databases. Ultimately, perhaps hours later, it would be delivered to an external client via an API, where the data from it would be stored in the client’s SQL database.

The back end is incredibly flexible because it uses Python and two object-oriented database technologies (Neo4j and Elasticsearch), and because it has a RESTful API using JSON. It would be hard to create a JSON object in the front end that would cause an exception anywhere in the back end stack. The incoming JSON gets turned into a Python dict, which can have any possible key names, values, and types; that object gets stored in two databases that do not enforce much in the way of a schema (we basically just insist that the object have a unique ID field); and it could easily be passed out the API as a JSON object. However, when the object leaves the external API it is no longer in Uprising’s tech stack but is rather in the tech stack of a client which intends to store the values in one or more SQL tables with predefined static schemas. At that point, every field in the object very much needs to have the right field name, a value in the proper range, and the right type, else an error would be thrown when inserting into a SQL table.

I wrote a simple type checker to check the Python dict that comes in from the front end to make sure that the field names are all valid and expected, and to make sure that the types are good. There are a few subtleties, as there always are when moving type information across technology boundaries. JSON doesn’t have a datetime type: dates should come in as a string that can be parsed into a datetime, so the type checker does that for date fields. JSON also only has number types, not ints and floats, but fortunately Python has an Abstract Base Class for numbers, which can be used to accept both ints and floats. The intent is to use the type checker at the boundaries or interfaces of our system, to check data traversing an interface. Once inside the system there’s no need for further type checking.
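A boundary checker along these lines might look like the sketch below. It is not the code we shipped: the field names in ‘SCHEMA’, the date format, and the function name ‘check_payload’ are all assumptions chosen for illustration. It uses ‘numbers.Number’ to accept both ints and floats, and treats date fields as strings that must parse.

```python
import json
import numbers
from datetime import datetime

# Hypothetical schema: field name -> expected type.
SCHEMA = {
    "id": str,
    "quantity": numbers.Number,   # accepts both int and float
    "price": numbers.Number,
    "start_date": datetime,       # arrives as a string in JSON
}
DATE_FORMAT = "%Y-%m-%d"          # assumed wire format for dates


def check_payload(payload, schema=SCHEMA):
    """Return a list of error strings; an empty list means the dict passed."""
    errors = []
    for name, value in payload.items():
        if name not in schema:
            errors.append(f"unexpected field: {name}")
            continue
        expected = schema[name]
        if expected is datetime:
            # JSON has no datetime type, so the value must be a parseable string.
            try:
                datetime.strptime(value, DATE_FORMAT)
            except (TypeError, ValueError):
                errors.append(f"{name}: not a date in {DATE_FORMAT} format")
        elif isinstance(value, bool) and expected is not bool:
            # bool is a subclass of int, so reject it explicitly.
            errors.append(f"{name}: expected {expected.__name__}, got bool")
        elif not isinstance(value, expected):
            errors.append(
                f"{name}: expected {expected.__name__}, got {type(value).__name__}"
            )
    return errors


good = json.loads('{"id": "a1", "quantity": 3, "price": 9.5, "start_date": "2018-08-04"}')
bad = json.loads('{"id": 7, "quantity": "3", "start_date": "yesterday", "colour": "red"}')
good_errors = check_payload(good)
bad_errors = check_payload(bad)
```

One subtlety worth noting: because Python’s bool is a subclass of int, a JSON ‘true’ would otherwise slip through as a number, so the sketch rejects it explicitly. Collecting all the errors rather than stopping at the first one lets the front end fix everything in a single round trip.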

This illustrates one of the scenarios in which type checking can be worth the cost. Another is for runtime performance: if all the types are known ahead of time and are static, the machine code can be significantly optimized by the compiler, interpreter, or JIT compiler. Handling dynamic types at runtime is one of the main performance penalties of using Python. A final area where static typing can have value is on large teams of developers working on codebases that are many years old, where the original authors of various modules may have long ago departed and others must maintain them. The explicit static type information acts as a form of documentation.

Given that Python emeritus Benevolent Dictator for Life Guido van Rossum works at Dropbox, which has grown tremendously in size and scale over the past decade, it makes a certain amount of sense that his latest major Python project has been mypy, which adds optional static type checking to Python. I like the fact that it is optional, because I don’t see a benefit to adding type checking to every Python variable, function, and method declaration. TypeScript is taking a similar approach in the JavaScript world. There are certainly places where type checking can help, but by making it optional the programmer gets to choose where to use it.
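As a rough sketch of what optional means in practice, the two functions below live in the same module; one carries type annotations and the other does not. The function names are invented for this example.

```python
from typing import List


def mean_score(scores: List[float]) -> float:
    """Annotated: a static checker such as mypy can verify the callers."""
    return sum(scores) / len(scores)


def describe(record):
    """Unannotated: by default mypy does not check this function's body."""
    return "{} ({})".format(record["name"], record["id"])


average = mean_score([1.0, 2.0, 3.0])
label = describe({"name": "widget", "id": 42})

# A call such as mean_score("abc") would be reported by mypy as an
# incompatible argument type before the program ever runs. The interpreter
# itself does not enforce the annotations at runtime.
```

The annotated function gets the documentation and checking benefits; the unannotated one stays as terse as ordinary Python.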