The Denormalized Product Tree: Initial Relief, Later a Deep Pit

While working on a manufacturing ERP, I've repeatedly seen how appealing the product tree structure looks initially. Typically, denormalization is preferred to speed up database queries and access data directly without getting into complex relationships. This might seem logical, especially during the initial development phases or in systems with a limited number of products. However, this relief often doesn't last long, leading to a complexity that becomes impossible to maintain and scale as the system grows.

For example, in one of my client's ERP systems, there was a field containing all the top-level categories for each product. This meant, for a "Washing Machine" product, this field would contain a string like "Home Appliances > Kitchen & Bath > Washing Machines". While this was a perfectly fast way to list products initially, when a category name changed or a new main category was added, a script had to be run to update this string. This script could take hours to run on millions of product records and had a high probability of error.

Why Does Denormalization Seem Appealing?

The primary appeal of denormalization is that it simplifies data access. In a normalized database, you might need to JOIN multiple tables to get all the information for a product. This can lead to performance issues, especially when dealing with many tables and complex relationships. With denormalization, frequently accessed data is consolidated into a single table, significantly speeding up read operations.