Choosing and creating a database

posted: October 19, 2019

tl;dr: A typical high-level process for choosing and creating a database...

This week I was asked about the typical process for choosing and creating a database, including what stakeholders should be involved. I thought I’d share my answer:

1. Decide what data is going to be stored in the database, by meeting with all the stakeholders who will be using the database. Is the database going to be storing lots of small mostly textual and numerical records in tables, or large files, or unstructured text? The typical stakeholders would be the team developing software to use the database, the IT folks who will run the database, the Program/Product Manager overseeing the project, and all the various other departments and functional areas that will access the data either directly, by querying the database, or indirectly, by using software which will access the database.

2. Decide what the typical query patterns will be, i.e. how often and how much data will be put into and read from the database. Will it be part of a system processing thousands of transactions per second, or will it receive several data uploads a day and be used mostly for reporting purposes, in which case the number of queries is much lower? The stakeholders involved are the same as the previous step.

3. Decide upon any other special requirements of the database, such as logging and transaction auditing, redundancy for maximum uptime, geographic distribution in case of a need to provide quick responses across the globe for a worldwide app, and interoperability with various other systems and tools. Same set of stakeholders.

4. Choose the database technology and database hosting strategy. There are a variety of newer database technologies such as document stores, key/value stores, and graph databases, but a traditional SQL relational database might be the right default choice, and there are a variety of SQL options. (For a good use case of a graph database, see my post on The Power of the Graph.) Most companies use cloud-based database services, to offload the hosting responsibilities onto a cloud provider such as Amazon Web Services, Google Cloud Platform, Microsoft Azure, and others. The development team and IT would make the call here.

5. Design the database schema, which defines the structure for all the data to be stored in the database. This can be pretty trivial for a document store, and can be pretty complex for a SQL relational database with many tables. Now we're into the development and implementation phase of the project, which is primarily the development team and IT.

6. Provision the database, including redundancy, backup strategy, geographic location, and security permissions to access the database. IT and development would do this.

7. Write the code and/or hook up the tools which will load data into the database and retrieve data from it. Start testing with some non-production data. At this point an IT security audit should be conducted, although the IT security team can also certainly review the database architecture beforehand. A small subset of users from the larger stakeholder community should do so trial runs.

8. Review, test and finalize all aspects of the system, and cut it over to production. Development and IT would do this, and then all the stakeholders can start using it.

The devil, as always, is in the details, but hopefully this provides a useful high-level guide.