All About Programming: Primary Keys: IDs versus GUIDs

Primary Keys: IDs versus GUIDs

Long-time readers of this blog know that I have an inordinate fondness for GUIDs. Each globally unique ID is like a beautiful snowflake: every one a unique item waiting to be born.

Perhaps that's why I read with great interest recent accounts of people switching their database tables from traditional integer primary keys ...

ID  Value  --  -----  1  Apple  2  Orange  3  Pear  4  Mango

.. to GUID keys.

ID                                    Value  ------------------------------------  -----  C87FC84A-EE47-47EE-842C-29E969AC5131  Apple  2A734AE4-E0EF-4D77-9F84-51A8365AC5A0  Orange  70E2E8DE-500E-4630-B3CB-166131D35C21  Pear  15ED815C-921C-4011-8667-7158982951EA  Mango

I know what you're thinking. Using sixteen bytes instead of four bytes for a primary key? Have you lost your mind? Those additional 12 bytes do come at a cost. But that cost may not be as great as you think:

The Cost of GUIDs as Primary Keys (SQL Server 2000)
Myths, GUID vs. Autoincrement (MySQL 5)

Using a GUID as a row identity value feels more natural-- and certainly more truly unique-- than a 32-bit integer. Database guru Joe Celko seems to agree. GUID primary keys are a natural fit for many development scenarios, such as replication, or when you need to generate primary keys outside the database. But it's still a question of balancing the tradeoffs between traditional 4-byte integer IDs and 16-byte GUIDs:

GUID Pros

Unique across every table, every database, every server
Allows easy merging of records from different databases
Allows easy distribution of databases across multiple servers
You can generate IDs anywhere, instead of having to roundtrip to the database
Most replication scenarios require GUID columns anyway

GUID Cons

It is a whopping 4 times larger than the traditional 4-byte index value; this can have serious performance and storage implications if you're not careful
Cumbersome to debug (where userid='{BAE7DF4-DDF-3RG-5TY3E3RF456AS10}')
The generated GUIDs should be partially sequential for best performance (eg, newsequentialid() on SQL 2005) and to enable use of clustered indexes

I'm not proposing that every database switch to GUID primary keys, but I do think it's important to know the option is out there. If you're still on the fence, what should I choose for my primary key? has excellent advice and a solid analysis of the tradeoffs.

Read full article from Primary Keys: IDs versus GUIDs

Primary Keys: IDs versus GUIDs

No comments:

Post a Comment

Labels

Popular Posts