Posted on

Code leakage: The first is that soft deletion logic bleeds out into all parts of your code. And forgetting that extra predicate on deleted_at can have dangerous consequences as it accidentally returns data that’s no longer meant to be seen.


Losing foreign keys: Another consequence of soft deletion is that foreign keys are effectively lost. The major benefit of foreign keys is that they guarantee referential integrity. But with soft deletion, this goes out the window. A customer may be soft deleted with its deleted_at flag set, but we’re now back to being able to forget do the same for its invoices.


Pruning data is hard: You may eventually find yourself writing a hard deletion process which looks at soft deleted records beyond a certain horizon and permanently deletes them from the database. But the same foreign keys that soft deletion rendered mostly useless now make this job more difficult because a record can’t be removed without also making sure that all its dependencies are removed as well. And even with liberal testing, this kind of query can still end up being a reliability problem because in case a new dependency is added in the future but an update to the query is forgotten, it’ll suddenly start failing after a year’s (or whatever the hard delete horizon is) delay.


Discouraged and rarely used: Once again, soft deletion is theoretically a hedge against accidental data loss. As a last argument against it, I’d ask you to consider, realistically, whether undeletion is something that’s ever actually done. The biggest reason for that is that almost always, data deletion also has non-data side effects. Calls may have been made to foreign systems to archive records there, objects may have been removed in blob stores, or servers spun down. The process can’t simply be reversed by setting NULL on deleted_at – equivalent undos need to exist for all those other operations too, and they rarely do.


Alternative - A deleted records table: Although I’ve never seen an undelete work in practice, soft deletion wasn’t completely useless because we would occasionally use it to refer to deleted data – usually a manual process where someone wanted to see to a deleted object for purposes of assisting with a support ticket or trying to squash a bug. And while I’d argue against the traditional soft deletion pattern due to the downsides listed above, luckily there’s a compromise. Instead of keeping deleted data in the same tables from which it was deleted from, there can be a new relation specifically for storing all deleted data, and with a flexible jsonb column so that it can capture the properties of any other table.

  • Queries for normal, non-deleted data no longer need to include deleted_at IS NULL everywhere.

  • Foreign keys still work. Attempting to remove a record without also getting its dependencies is an error.

  • Hard deleting old records for regulatory requirements gets really, really easy: DELETE FROM deleted_record WHERE deleted_at < now() - '1 year'::interval.

  • Deleted data is a little harder to get at, but not by much, and is still kept around in case someone needs to look at it.

Link

Easy, alternative soft deletion: deleted_record_insert