Primary Keys: IDs versus GUIDs



Primary Keys: IDs versus GUIDs

Long-time readers of this blog know that I have an inordinate fondness for GUIDs. Each globally unique ID is like a beautiful snowflake: every one a unique item waiting to be born.

Perhaps that's why I read with great interest recent accounts of people switching their database tables from traditional integer primary keys ...

ID  Value  --  -----  1  Apple  2  Orange  3  Pear  4  Mango  

.. to GUID keys.

ID                                    Value  ------------------------------------  -----  C87FC84A-EE47-47EE-842C-29E969AC5131  Apple  2A734AE4-E0EF-4D77-9F84-51A8365AC5A0  Orange  70E2E8DE-500E-4630-B3CB-166131D35C21  Pear  15ED815C-921C-4011-8667-7158982951EA  Mango  

I know what you're thinking. Using sixteen bytes instead of four bytes for a primary key? Have you lost your mind? Those additional 12 bytes do come at a cost. But that cost may not be as great as you think:

Using a GUID as a row identity value feels more natural-- and certainly more truly unique-- than a 32-bit integer. Database guru Joe Celko seems to agree. GUID primary keys are a natural fit for many development scenarios, such as replication, or when you need to generate primary keys outside the database. But it's still a question of balancing the tradeoffs between traditional 4-byte integer IDs and 16-byte GUIDs:

GUID Pros

  • Unique across every table, every database, every server
  • Allows easy merging of records from different databases
  • Allows easy distribution of databases across multiple servers
  • You can generate IDs anywhere, instead of having to roundtrip to the database
  • Most replication scenarios require GUID columns anyway
GUID Cons

  • It is a whopping 4 times larger than the traditional 4-byte index value; this can have serious performance and storage implications if you're not careful
  • Cumbersome to debug (where userid='{BAE7DF4-DDF-3RG-5TY3E3RF456AS10}')
  • The generated GUIDs should be partially sequential for best performance (eg, newsequentialid() on SQL 2005) and to enable use of clustered indexes

I'm not proposing that every database switch to GUID primary keys, but I do think it's important to know the option is out there. If you're still on the fence, what should I choose for my primary key? has excellent advice and a solid analysis of the tradeoffs.


Read full article from Primary Keys: IDs versus GUIDs


No comments:

Post a Comment

Labels

Algorithm (219) Lucene (130) LeetCode (97) Database (36) Data Structure (33) text mining (28) Solr (27) java (27) Mathematical Algorithm (26) Difficult Algorithm (25) Logic Thinking (23) Puzzles (23) Bit Algorithms (22) Math (21) List (20) Dynamic Programming (19) Linux (19) Tree (18) Machine Learning (15) EPI (11) Queue (11) Smart Algorithm (11) Operating System (9) Java Basic (8) Recursive Algorithm (8) Stack (8) Eclipse (7) Scala (7) Tika (7) J2EE (6) Monitoring (6) Trie (6) Concurrency (5) Geometry Algorithm (5) Greedy Algorithm (5) Mahout (5) MySQL (5) xpost (5) C (4) Interview (4) Vi (4) regular expression (4) to-do (4) C++ (3) Chrome (3) Divide and Conquer (3) Graph Algorithm (3) Permutation (3) Powershell (3) Random (3) Segment Tree (3) UIMA (3) Union-Find (3) Video (3) Virtualization (3) Windows (3) XML (3) Advanced Data Structure (2) Android (2) Bash (2) Classic Algorithm (2) Debugging (2) Design Pattern (2) Google (2) Hadoop (2) Java Collections (2) Markov Chains (2) Probabilities (2) Shell (2) Site (2) Web Development (2) Workplace (2) angularjs (2) .Net (1) Amazon Interview (1) Android Studio (1) Array (1) Boilerpipe (1) Book Notes (1) ChromeOS (1) Chromebook (1) Codility (1) Desgin (1) Design (1) Divide and Conqure (1) GAE (1) Google Interview (1) Great Stuff (1) Hash (1) High Tech Companies (1) Improving (1) LifeTips (1) Maven (1) Network (1) Performance (1) Programming (1) Resources (1) Sampling (1) Sed (1) Smart Thinking (1) Sort (1) Spark (1) Stanford NLP (1) System Design (1) Trove (1) VIP (1) tools (1)

Popular Posts