Aliases vs. IDs

posted: April 3, 2021

tl;dr: Especially in non-table-oriented databases, it can help to have multiple aliases for each record...

Non-table-oriented databases, such as document stores, key/value stores, and graph databases, differ in some fundamental ways from traditional SQL table-oriented databases. (Note: I hesitate to call these table-oriented databases “relational databases” because in other database technologies, such as graph databases, the relationships between records/objects are even more explicit and foundational.) One of those fundamental differences involves the concept of a record’s ID.

In a table-oriented database, there is typically an ID column in every table, usually as the very first column. Each record, which is a row in the table, gets assigned a unique ID value. Two common ID formats are sequential IDs and UUIDs. The ID is typically thought of as part of the record itself, and is commonly used to look up a particular record. Behind the scenes, the ID column is typically indexed to speed up this lookup: the database maintains a separate data structure, usually a B-tree or sometimes a hash table, so that it can find a record quickly (from memory or disk) when presented with an ID value.
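
To make the indexing idea concrete, here is a minimal Node.js sketch. It is not how any particular database engine works: the names rows, idIndex, and findById are made up for illustration, the second row is invented sample data, and a JavaScript Map simply stands in for whatever index structure a real database maintains.

```javascript
// Hypothetical table: each row carries its own ID as the first "column".
const rows = [
    { id: 1000000001, firstName: 'Chris', lastName: 'Shaver' },
    { id: 1000000002, firstName: 'Pat', lastName: 'Example' }, // invented sample row
];

// The "index": a separate structure mapping each ID to the row's position.
const idIndex = new Map();
rows.forEach((row, position) => idIndex.set(row.id, position));

// Look up a row through the index instead of scanning every row.
function findById(id) {
    const position = idIndex.get(id);
    return position === undefined ? undefined : rows[position];
}

console.log(findById(1000000002).firstName); // prints: Pat
```

The point of the separate structure is that the lookup cost no longer depends on walking the whole table; the trade-off is that the index has to be kept in sync whenever rows are added or removed.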

Non-table-oriented databases can often easily emulate this functionality. They might have an ID field that is part of each record/object coupled with a fast way to find a particular record/object when presented with an ID. Key/value stores are fundamentally structured this way, where the key is the ID of each record and the value is the record/object itself. Yet it is quite common, when using a key/value store, to have multiple ways to look up a record. Looking up the record by ID is great if you know the ID; but what if you don’t know the ID? Perhaps some other piece of information about the record can be used to find it.

That’s where aliases come in handy. Aliases are other ways to look up a record. They are analogous to the programming concepts of pointers, references, and bindings. A single record/object can have multiple aliases or references that each point to it. If you know just one of the aliases or references, you can find the record/object.

Here’s a small example program, written in Node.js, which demonstrates this. This code:

// Create an object representing a person
const chrisShaver = {
    firstName: 'Chris',
    lastName: 'Shaver',
    website: 'https://cjshaver.com',
};

// Set up multiple aliases to that object
const aliases = new Map();
aliases.set(1000000001, chrisShaver);
aliases.set('c8e032de-316c-4b6c-a3d0-e56076f57ebe', chrisShaver);
aliases.set('chris_shaver@hotmail.com', chrisShaver);

// Use the aliases to retrieve the object and access various fields/properties
console.log('Using 1000000001 to lookup firstName produces:', aliases.get(1000000001).firstName);
console.log('Using c8e032de-316c-4b6c-a3d0-e56076f57ebe to lookup lastName produces:',
    aliases.get('c8e032de-316c-4b6c-a3d0-e56076f57ebe').lastName);
console.log('Using chris_shaver@hotmail.com to lookup website produces:',
    aliases.get('chris_shaver@hotmail.com').website);

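produces this output:

Using 1000000001 to lookup firstName produces: Chris
Using c8e032de-316c-4b6c-a3d0-e56076f57ebe to lookup lastName produces: Shaver
Using chris_shaver@hotmail.com to lookup website produces: https://cjshaver.com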

For simplicity’s sake, the program creates a single record/object. Note that the object doesn’t actually have an ID field, although there is a binding named chrisShaver that references the object. The next section sets up an alias map, with three different alias value formats: a sequential ID, a UUID, and something with a bit more meaning to humans: the email address of the person that the object represents. Any one of these three alias values can be used to find the correct record/object, as demonstrated in the final section. Although written in Node.js, this could easily be translated into the structure of a key/value store database such as Amazon’s DynamoDB service.

Does this record/object have an ID? Sort of; I suppose the sequential integer ID could be called the primary ID. But any one of the aliases can easily be used to find the record/object. In particular, since the alias map allows UUIDs, more than one UUID can be generated which refers to the same record/object. More than one sequential ID can be generated which refers to the same record/object, for that matter. Just as certain criminals have gone by multiple aliases, so can records/objects.
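
As a rough illustration of that last point, the sketch below generates two UUIDs and points both at one object. It assumes a Node.js version that provides crypto.randomUUID; the names person, aliasMap, firstUuid, and secondUuid are hypothetical and chosen only for this example.

```javascript
const { randomUUID } = require('crypto');

// Hypothetical record, with no ID field of its own.
const person = { firstName: 'Chris', lastName: 'Shaver' };

// Alias map: every key is just another name for the same object.
const aliasMap = new Map();

// Generate two independent UUIDs and point both at the same record.
const firstUuid = randomUUID();
const secondUuid = randomUUID();
aliasMap.set(firstUuid, person);
aliasMap.set(secondUuid, person);

// Both aliases resolve to the very same object, not to copies.
console.log(aliasMap.get(firstUuid) === aliasMap.get(secondUuid)); // prints: true
```

Neither UUID is more "real" than the other here; which one counts as the primary ID, if any, is purely a matter of convention.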

I don’t want to be unfair to table-oriented databases: it is certainly possible to set up a schema that will create aliases that can be used to look up records. One way of doing so would be to have the sequential ID be in the record table as a column, and to have a separate alias table with two columns: the alias (a string or VARCHAR) and the sequential ID (an integer). The alias table could then be joined to the record table, to look up a record by alias.
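
The two-table arrangement just described can be mimicked in Node.js, staying in the same language as the earlier example rather than writing actual SQL. This is only a sketch under that assumption: recordTable, aliasTable, and findByAlias are hypothetical names, and two Map objects stand in for the tables.

```javascript
// Record "table": keyed by sequential ID (an integer), one entry per record.
const recordTable = new Map([
    [1000000001, { id: 1000000001, firstName: 'Chris', lastName: 'Shaver' }],
]);

// Alias "table": two columns, the alias (a string) and the sequential ID.
const aliasTable = new Map([
    ['c8e032de-316c-4b6c-a3d0-e56076f57ebe', 1000000001],
    ['chris_shaver@hotmail.com', 1000000001],
]);

// The "join": resolve the alias to an ID, then the ID to the record.
function findByAlias(alias) {
    const id = aliasTable.get(alias);
    return id === undefined ? undefined : recordTable.get(id);
}

console.log(findByAlias('chris_shaver@hotmail.com').lastName); // prints: Shaver
```

Unlike the earlier alias map, where every alias points straight at the object, this layout adds one level of indirection: each alias leads to the sequential ID, and only the ID leads to the record.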

Aliases are more common in key/value stores and other non-table-oriented database technologies. Sometimes records/objects may not have a single unique ID. For those used to thinking in terms of traditional table-oriented databases, a shift in mindset may be necessary.

Related post: Sequential IDs vs. UUIDs

Related post: Storing database records that lack a unique ID