Why Foreign Key Constraints Are Not Optional
Joe was hired into a company to maintain a PHP application with a MySQL database. They would have random reports of duplicate data and data with missing information. When Joe looked into the problem 9 times out of 10 it was due to an issue where a user deleted one row but the same delete operation didn’t delete rows related to the original row. This caused that orphaned data to show up in some reports and cause odd join results in others.
SQL provides an excellent way to make sure that our databases don’t have orphaned rows or invalid relationships. Using them is a must for all table relationships.
Why Should We Use Foreign Key Relationships?
Let’s look at an example. In our application, we have a users
table and a user_logins
table to track our user accounts and their login history respectively.
We’re going to create a set of users for our experiments.
Now what we would want to do is insert a new row into user_logins
using an SQL statement like the one below.
But let’s say we have an error and instead of using the user’s id it uses the user’s id plus 10. In that case, we’ll have a query like the following.
Now what we would hope for is that instead of the row being inserted we would get an error so we could see the error and fix it. However, when we run the query it’s successful.
Now in this case this isn’t going to cause problems but when we eventually get our 11th user it will create some odd results where a user logged in before they were ever created.
Foreign Keys to the Rescue
Most variants of SQL have a feature called foreign key constraints. This allows us to tell our database engine that a column references another column. Setting up this relationship will cause the database engine to fail when it creates, updates, or deletes rows that create invalid data like the example above.
In this case, we want to tell the database engine that the user_id
column in the user_logins
table is always going to be a value in the id
column of the users
table.
As a quick aside, this relationship isn’t limited to being separate tables. We could have a supervisor_id
column in the users
table that describes a supervisor/direct relationship and link them using foreign keys.
How to Create Our Foreign Key Relationship
To create our relationship we’re going to use the alter table
command.
For our example, this is going to look like the following.
Now because we have “bad” data in our user_logins
table we’re going to get the following error.
There are a couple of ways we can fix this. We can go through and delete the rows that have problems, we can delete all of the data, or we can temporarily disable the checks for the foreign keys during this operation. We do not recommend disabling the checks as this will cause us to still have rows with bad data which will cause problems in the future. This is a common solution on the web that it is a big no-no.
If this were production database we would go through and remove the individual rows that are problems using a left join but because this is just a demo application we’re just gonna delete all of the rows within the table.
Now we can successfully add the foreign key constraint.
Now that our foreign key constraint has been added let’s look at what happens when we try to insert the bad user_logins
data.
As we can see the foreign key constraint has prevented us from adding this bad data into the database. We now no longer need to worry about the fact that a new user will have bad data associate with their account. This should also show us that there is an error long before it causes problems.
What Happens When We Try To Delete
Now that we’ve shown what happens when we try to add a new row that does not have a matching relationship, let’s look at what happens when we try to delete a row while it still has a reference to it.
To test this will insert a new row.
Now when we try to delete the user
row with an id of 1 we get an error message.
This is how the foreign key constraint prevents us from orphaning records.
Cascading Deletes
MySQL can automatically delete these rows using a cascading delete. Without the cascading delete, we need to manually delete rows in the referencing tables which takes more time but causes less accidental deletions.
We’re going to reset our database to set this up.
Now we’ll run the command that sets up the foreign key so it automatically deletes when the referenced row is deleted. Note that it’s essentially the same command as above but it has an on delete cascade
at the end.
Let’s look at what happens when we insert a new user_logins
row and then delete the users
row that is associated with it.
Now we can double-check to make sure that the rows were deleted.
Downsides
There is a slight performance penalty when using foreign key constraints because the database must verify that all the columns are valid before it performs the operation. This tends to not be noticeable with more commands and the data integrity improvements easily outweigh the performance hit.
Conclusion
Foreign key constraints provide an excellent way to prevent orphaned and invalid data in our database. It is a must for all columns that reference another table and should be a requirement.
Scott Keck-Warren
Scott is the Director of Technology at WeCare Connect where he strives to provide solutions for his customers needs. He's the father of two and can be found most weekends working on projects around the house with his loving partner.
Top Posts
- Working With Soft Deletes in Laravel (By Example)
- Fixing CMake was unable to find a build program corresponding to "Unix Makefiles"
- Upgrading to Laravel 8.x
- Get The Count of the Number of Users in an AD Group
- Multiple Vagrant VMs in One Vagrantfile
- Fixing the "this is larger than GitHub's recommended maximum file size of 50.00 MB" error
- Changing the Directory Vagrant Stores the VMs In
- Accepting Android SDK Licenses From The OSX Command Line
- Fixing the 'Target class [config] does not exist' Error
- Using Rectangle to Manage MacOS Windows