Designing a database file format

Posted by RoliSoft on Stack Overflow See other posts from Stack Overflow or by RoliSoft
Published on 2012-12-18T22:53:50Z Indexed on 2012/12/18 23:03 UTC
Read the original article Hit count: 370

Filed under:

I would like to design my own database engine for educational purposes, for the time being. Designing a binary file format is not hard nor the question, I've done it in the past, but while designing a database file format, I have come across a very important question:

How to handle the deletion of an item?

So far, I've thought of the following two options:

Each item will have a "deleted" bit which is set to 1 upon deletion.
- Pro: relatively fast.
- Con: potentially sensitive data will remain in the file.
0x00 out the whole item upon deletion.
- Pro: potentially sensitive data will be removed from the file.
- Con: relatively slow.
Recreating the whole database.
- Pro: no empty blocks which makes the follow-up question void.
- Con: it's a really good idea to overwrite the whole 4 GB database file because a user corrected a typo. I will sell this method to Twitter ASAP!

Now let's say you already have a few empty blocks in your database (deleted items). The follow-up question is how to handle the insertion of a new item?

Append the item to the end of the file.
- Pro: fastest possible.
- Con: file will get huge because of all the empty blocks that remain because deleted items aren't actually deleted.
Search for an empty block exactly the size of the one you're inserting.
- Pro: may get rid of some blocks.
- Con: you may end up scanning the whole file at each insert only to find out it's very unlikely to come across a perfectly fitting empty block.
Find the first empty block which is equal or larger than the item you're inserting.
- Pro: you probably won't end up scanning the whole file, as you will find an empty block somewhere mid-way; this will keep the file size relatively low.
- Con: there will still be lots of leftover 0x00 bytes at the end of items which were inserted into bigger empty blocks than they are.

Rigth now, I think the first deletion method and the last insertion method are probably the "best" mix, but they would still have their own small issues. Alternatively, the first insertion method and scheduled full database recreation. (Probably not a good idea when working with really large databases. Also, each small update in that method will clone the whole item to the end of the file, thus accelerating file growth at a potentially insane rate.)

Unless there is a way of deleting/inserting blocks from/to the middle of the file in a file-system approved way, what's the best way to do this? More importantly, how do databases currently used in production usually handle this?

Developer IT

Designing a database file format - Developer IT

Designing a database file format

c#

database

file

binary

format

Related posts about c#

.NET WebRequest.PreAuthenticate not quite what it sounds like

HttpWebRequest and Ignoring SSL Certificate Errors

The dynamic Type in C# Simplifies COM Member Access from Visual FoxPro

Dynamic Type to do away with Reflection

Finding a Relative Path in .NET

Related posts about database

SQL SERVER Retrieve and Explore Database Backup without Restoring Database Idera virtual database

Cloning A Database On The Same Server Using Rman Duplicate From Active Database

cPickle ImportError: No module named multiarray

SQL SERVER – 2008 – Introduction to Snapshot Database – Restore From Snapshot

OTN ???? ?????? ???????

Categories cloud