The Art of SQL Report Writing

Part 2 of 5

Organizing the DB Diagram

The first step to any development project is to organize yourself and map out what you are going to work on. I always like to start by getting a good diagram of the database.

There are a number of tools out there that will inspect your database and provide a data diagram. Whatever tool you do decide to use is fine. Here we will look at how to organize the diagram and any good tool should facilitate this.

We prefer to use simple diagram tools like Visio or Gliffy. We generally use these tools when designing the database so we know what we are creating during the design phase and before build. It is, unfortunately, very common to not have these diagrams even after build is complete so you may want to use a tool that helps you reverse engineer the database. If you also do not have foreign keys, which is also all too common, or you have lots of virtual foreign keys, a reverse engineering tool will be fairly limited so you may end up having to use a simple diagramming tool and abandon the tools that inspect your database to build the diagram automatically.

Begin the diagram by adding the tables to the diagram. Leave any columns off the diagram - we suggest you add those last.

Then draw in the foreign keys (whether they are physical or virtual). You need a visual of exactly how your tables are connected to each other. Rearrange the tables so you have as few lines crossing over each other as possible. I find that when I do this during the database design that if there are a lot of overlapping foreign key lines that I may need to consider refactoring the design. Well normalized databases most often (but not always) don't have a large number of foreign keys on any one table.

In a large database, you may find this quickly turns into a mess. The more tables you have, the more foreign keys you end up with and it becomes impossible to arrange your tables to avoid crossing foreign keys. This is when we typically break the database down into separate diagrams, each diagram focusing on a particular type of data set in the database. Make sure each table appears in at least one diagram. A table may appear in multiple diagrams. We typically pick one diagram where each table "lives" and then every other diagram the table appears in that table is separated from the others visually so we know it is used there but didn't originate there (this is not necessary for our purposes here though).

As an aside, if you are interested in creating a tool that automatically rearranges the tables to minimize crossing foreign keys, check out Graph Theory. This will help you with the mathematical algorithms you will need to inspect and optimize the layout.

As you rearrange your diagram, make sure you organize the tables in a top down fashion. This is where you arrange the tables so every foreign key has its one or zero relation pointing to a table that appears on the diagram above the table with the many relation. In essence you are arranging the tables so that the table at the bottom of the foreign key contains the actual foreign key column and points to the primary key of the table at the top of the key in the diagram. Another way to think about this is if you were to populate your database, you would have to do your insert into the top table first before inserting into the table on the bottom. Likewise, if you were to do deletes, you would need to delete from the bottom table before you can delete from the top table.

This way of organizing your diagrams will help anyone who needs to understand your database. At a glance you know how everything is connected and how to do inserts, deletes and, as you will see later, how to construct your queries quickly.

Here is an example of how we might construct the database diagram for a blog:

Some of you may say this is impossible because you have circular references in your database. This is where table A points to table B which points back to table A (or A to B to C and back to A, etc.).

While some people may disagree with me, I will say this: You NEVER need to have circular references in your database design and such references are very bad - it's time to refactor your database if possible. Yes, I should NEVER say NEVER. I am sure someone will come up with an example they think is an exception. Go ahead and try and let me know if you have an example to make me eat my words. I challenge you! No one has proved me wrong yet.

If you have no control over the database, you will just have to make due with what you have, at least for the time being. When we encounter this sometimes. When we do we try to arrange the tables so as few foreign keys point the wrong direction as possible.

Keep in mind that once you have these diagrams, you should share them with the team and keep them updated whenever the database changes. These documents are gold during development and will make your life so much easier.