Tuesday, April 25, 2006

Leaky Databases

Programmers used to have to manage memory by hand. When a programmer created an object in memory, he had to remember to destroy it. If the object was never deleted and the programmer lost the reference to it, that memory was effectively unusable for the life of the application: the allocator could not hand out the location again because it was still marked as allocated, and the programmer could not free it because he no longer had a reference to the object.

Nowadays, most high-level languages have a garbage collector: a part of the language runtime that detects which parts of memory the application can no longer reach and frees them. (This is not a new idea; Lisp has had a garbage collector since the 1960s.)
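To make the idea concrete, here is a toy sketch of how a tracing collector decides what is garbage. It assumes the "heap" is just a list of object names and that `refs` maps each object to the objects it points at; both, along with the function names, are made up for illustration. A real collector works on actual memory, not on a dictionary.

```python
def reachable(roots, refs):
    """Mark phase: return every object that can be reached from the roots."""
    marked = set()
    stack = list(roots)
    while stack:
        obj = stack.pop()
        if obj in marked:
            continue
        marked.add(obj)
        # Follow every reference held by this object.
        stack.extend(refs.get(obj, ()))
    return marked


def collect_garbage(heap, roots, refs):
    """Sweep phase: anything allocated but not marked is garbage."""
    live = reachable(roots, refs)
    return sorted(set(heap) - live)


def demo():
    heap = ["a", "b", "c", "d"]
    roots = ["a"]                      # what the application still holds
    refs = {"a": ["b"], "c": ["d"], "d": ["c"]}
    return collect_garbage(heap, roots, refs)
```

In `demo()`, objects `c` and `d` point at each other but nothing the application holds points at them, so both are reported as garbage.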

In databases, records in one table refer to records in other tables. A misbehaving application can lose or overwrite those references. It is also common practice to mark a record as deleted without actually deleting it. Either way, you're left with records in the database that may have no use whatsoever but still take up space.
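Here is a rough sketch of what a sweep over such a database might look like, using Python's built-in sqlite3 module. The `authors` and `posts` tables, the `deleted` flag column, and the function names are all hypothetical, and the sketch assumes that a soft-deleted author and everything pointing at it really are safe to remove, which a real application would have to decide for itself.

```python
import sqlite3


def sweep_database(conn):
    """Purge soft-deleted authors, then posts whose author no longer exists."""
    cur = conn.cursor()
    # Rows that were only marked as deleted (hypothetical 'deleted' flag).
    cur.execute("DELETE FROM authors WHERE deleted = 1")
    purged = cur.rowcount
    # Rows whose reference points at nothing: the database "leak".
    cur.execute(
        "DELETE FROM posts WHERE author_id NOT IN (SELECT id FROM authors)"
    )
    orphans = cur.rowcount
    conn.commit()
    return purged, orphans


def demo():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT,
                              deleted INTEGER DEFAULT 0);
        CREATE TABLE posts (id INTEGER PRIMARY KEY, author_id INTEGER,
                            title TEXT);
        INSERT INTO authors (id, name, deleted)
            VALUES (1, 'alice', 0), (2, 'bob', 1);
        INSERT INTO posts (id, author_id, title) VALUES
            (10, 1, 'kept'),
            (11, 2, 'owner soft-deleted'),
            (12, 99, 'dangling reference');
    """)
    counts = sweep_database(conn)
    remaining = [row[0] for row in conn.execute("SELECT id FROM posts ORDER BY id")]
    return counts, remaining
```

Foreign key constraints with cascading deletes would prevent some of these orphans from appearing in the first place, but only if the schema declares them and the application does not work around them.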

This is a "database leak," analogous to a "memory leak." To my knowledge, though, no one is particularly concerned about it, probably because hard drive space is cheap and plentiful.

However, I expect that we'll eventually see embedded servers, where even small devices like your cell phone or your microwave can serve up web pages. The limited storage on these platforms might require more frugal management of database resources.

Anyone know of database garbage collectors?
