Identifying users in simple applications

posted: December 11, 2021

tl;dr: For simple applications, email address is the best way to identify users, whereas name plus mailing address poses significant challenges.

At my current and previous companies, I’ve repeatedly faced the challenge of identifying users without assigning them a unique user ID. This comes up, for example, on websites that let users fill out a form to receive something, such as an email campaign or a shipment in the mail. You may want to detect when the same user returns to the site later and fills out the form again, so that the user doesn’t end up in the system twice and doesn’t receive multiple copies of whatever is being sent. There is no perfect solution to this problem.

Many times I’ve been tempted to build an authenticated account login system, with a username (a unique identifier), an email address (which could also serve as the unique identifier), and a password (along with a password recovery mechanism). But this is often too much overhead when what the user receives is of modest value. So how can returning users be recognized as duplicates if they don’t have to log in to personal accounts?

In 2021, in the United States and other countries where nearly everyone is online, email address is the best simple solution for systems designed for adult users. Nearly everyone has an email address at which they can be reached, even elderly people who grew up long before the Internet achieved mass-market adoption. Someone who doesn’t use email personally can usually be reached through the address of someone close to them. Many computer systems and websites require an email address today, so people are used to supplying one.

[Image: cover of The Who's 1978 album 'Who Are You']

The Who asked a surprisingly difficult question back in 1978.

Email addresses may be shared, but the vast majority of people have their own personal email address. People can, of course, have multiple email addresses and can register for something multiple times with a different address each time. Some email systems, such as Google Mail, also let users create new addresses on the fly that deliver to the same base mailbox (for example, by appending "+tag" to the username), among other reasons so they can track how a particular address is used. But when the task is simply to recognize when the same user mistakenly returns and re-registers, email address is a good solution, since most typical users are likely to register with their primary personal email address each time.
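If you do use email as the identifier, it helps to compare a normalized form of the address rather than the raw string. Here is a minimal Python sketch of one possible policy, not a standard: it lowercases the whole address and, for gmail.com and googlemail.com only, drops dots and any "+tag" from the username, since Gmail delivers those variants to the same mailbox. The names normalize_email, register, and GMAIL_DOMAINS are hypothetical. Lowercasing the username is itself an assumption: the email standards allow the part before the "@" to be case-sensitive, even though most providers ignore case.

```python
from __future__ import annotations

# Hypothetical policy: both domains are treated as the same Gmail service.
GMAIL_DOMAINS = {"gmail.com", "googlemail.com"}


def normalize_email(address: str) -> str:
    """Return a canonical form of an email address for duplicate detection."""
    address = address.strip()
    # Split on the last "@"; everything after it is the domain.
    local, sep, domain = address.rpartition("@")
    if not sep or not local or not domain:
        raise ValueError(f"not an email address: {address!r}")
    # Assumption: treat the whole address as case-insensitive.
    local, domain = local.lower(), domain.lower()
    if domain in GMAIL_DOMAINS:
        # Gmail ignores dots in the username and anything after a "+".
        local = local.split("+", 1)[0].replace(".", "")
        domain = "gmail.com"
    return f"{local}@{domain}"


def register(address: str, seen: set[str]) -> bool:
    """Record a registration; return False if this user already registered."""
    key = normalize_email(address)
    if key in seen:
        return False
    seen.add(key)
    return True


seen: set[str] = set()
print(register("janedoe@gmail.com", seen))         # True: first registration
print(register("Jane.Doe+promo@Gmail.com", seen))  # False: same mailbox
```

Other providers have their own alias rules (or none), so a conservative system might apply only the lowercasing everywhere and keep provider-specific rules to the few providers it is sure about.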

In general, physical mailing addresses are poor unique identifiers for individual users, for several reasons. First, a user might enter the same mailing address with different formatting at different times, for example typing "123 S Main St" once, "123 S Main Street" the next time, and "123 South Main Street" a third time.

For these to be recognized as the same address, the system needs logic that "normalizes" all the differently formatted variants to one canonical address. Address prediction systems sometimes do this: they use type-ahead to suggest or auto-populate an address in a normalized format by interpreting what the user is typing into the address fields. Many users are familiar with the address prediction in map applications such as Google Maps and Apple Maps. But map applications only have to resolve user input to an address already in their map database, and that database does not cover every possible mailing address.
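A minimal Python sketch of that kind of normalization, just enough to collapse the three example addresses into one. The abbreviation tables are a hypothetical handful of entries (a real system would need the full lists of standard street-suffix and directional abbreviations, or an address verification service), and the code assumes a simple "number, optional direction, street name, optional suffix" shape.

```python
import re

# Hypothetical, deliberately tiny tables; real systems need the full
# standard abbreviation lists.
STREET_SUFFIXES = {
    "street": "st", "st": "st",
    "avenue": "ave", "ave": "ave",
    "road": "rd", "rd": "rd",
}
DIRECTIONALS = {
    "north": "n", "south": "s", "east": "e", "west": "w",
    "n": "n", "s": "s", "e": "e", "w": "w",
}


def normalize_street(line: str) -> str:
    """Map formatting variants of a street line to one canonical string.

    Assumes the shape "<number> [direction] <name...> [suffix]".
    """
    # Lowercase and split on anything that isn't a letter or digit,
    # which also drops periods such as "S." or "St.".
    tokens = re.findall(r"[a-z0-9]+", line.lower())
    canonical = []
    for i, token in enumerate(tokens):
        if i == 1 and token in DIRECTIONALS:
            # Direction immediately after the house number.
            canonical.append(DIRECTIONALS[token])
        elif i == len(tokens) - 1 and token in STREET_SUFFIXES:
            # Street suffix in the final position.
            canonical.append(STREET_SUFFIXES[token])
        else:
            canonical.append(token)
    return " ".join(canonical)


variants = ["123 S Main St", "123 S Main Street", "123 South Main Street"]
print({normalize_street(v) for v in variants})  # {'123 s main st'}
```

Anything outside the assumed shape, such as a rural route or a military address, comes back merely lowercased and stripped of punctuation, which is exactly where this kind of normalization stops helping.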

The general case is trickier for address prediction systems. When collecting the address where a user will receive mail, an address prediction system must still let the user enter a custom address in whatever format the user chooses. There is a wide variety of address formats, and in some cases the user's actual address will not be recognized as valid by the address prediction system. Atypical formats that address prediction systems handle poorly include rural addresses and addresses used to reach military personnel.

Using physical mailing addresses to identify users has other corner-case problems as well. Mailing addresses can and do change over time as people move. They can even change while a person stays in the same place: streets get renamed, added, and occasionally even renumbered. There might also be multiple people with the same name living at the same address, such as a father and son with the same name, if the user omits a suffix such as "Jr".
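These corner cases show up directly if name plus address is used as a lookup key. In this hypothetical Python sketch, the key is simply the lowercased, whitespace-collapsed name, street line, and postal code (a real system would also normalize the address). The Registrant record and identity_key function are illustrative names, not part of any existing system. A father and son registered without a suffix collapse into one key, while one person who moves produces two.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Registrant:
    # Hypothetical record shape for a mail-out form.
    name: str
    street: str
    postal_code: str


def _clean(text: str) -> str:
    # Lowercase and collapse runs of whitespace.
    return " ".join(text.lower().split())


def identity_key(r: Registrant) -> tuple[str, str, str]:
    """Hypothetical duplicate-detection key built from name plus address."""
    return (_clean(r.name), _clean(r.street), _clean(r.postal_code))


# Father and son share a name; the son omitted "Jr", so they look identical.
father = Registrant("John Smith", "123 S Main St", "12345")
son = Registrant("john  smith", "123 S Main St", "12345")
print(identity_key(father) == identity_key(son))    # True: two people, one key

# The same person after moving gets a new key and looks like a new user.
moved = Registrant("John Smith", "45 Oak Ave", "67890")
print(identity_key(father) == identity_key(moved))  # False: one person, two keys
```

No amount of string cleanup fixes either failure, because the information needed to tell the two cases apart simply isn't in the name and address fields.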

For simple systems, email addresses are a better unique identifier than mailing addresses plus name. The more time I spend on identity management, the more I realize just how challenging the problem actually is.

Related post: Storing database records that lack a unique ID