Skip to content

Are GUIDs Really the Way to Go?


I recently read a (slightly inflammatory) posting entitled “The Gospel of the GUID”.  In it, the author attempts to put forward several arguments in favor of using GUIDs for all database primary keys (as opposed to the more pedestrian sequence-generated INTEGER).  I’ve heard similar arguments in the past, and I keep coming back to the same conclusion: it’s all bunk.

To get right to the heart of my opinion, I really don’t see a compelling reason to use GUIDs for 90% of database use-cases.  Consider for a moment some of the really large databases in this world (Wikipedia, Slashdot’s comments table, Digg, etc).  I can’t think of a single web application in that league which uses GUIDs.  If you don’t believe me, look at the code for MediaWiki, the URL structure for Digg and the idiocy involving INTEGER vs BIGINT on Slashdot.  While de facto practice may not dictate optimal design, it does point to a trend that’s worthy of notice.  More importantly, it proves that INTEGER (or rather, BIGINT) primary key fields are practicable under real-world stress.  But what about the theoretical use-case?

Well to be honest, the theoretical use-case doesn’t interest me all that much.  I mean, if you tell me that your schema can support theoretically 29 trillion rows in its users table, I’ll geek out right along with you.  But if you try to feed that to a client, you’ll get either blank stares, or the equally common: “but there are only 5 billion people to chose from.”  (unless you’re talking to someone in upper management)  For an even better party, try telling your client that you can merge their data with one of their competitor’s in 45 minutes flat.  Somehow I doubt that argument will fly.

Now let’s look at the cons involved.  GUIDs are really, really hard to remember and type by hand.  When you’re in a hurry, pulling a little data-mine-while-you-wait for your boss, you’re not going to want to dig around and find that string you copy/pasted into Notepad.  Oh, and heaven forbid you actually type the GUID wrong!  Finding a single-bit deviation in a 64-bit alpha-numeric is not a trivial task let me tell you.  It’s at about this point that you start to think that maybe INTEGER keys would have been a better way to go.  At least then you can throw the userID into good ol’ short-term memory and bang it out ten seconds later in your SQL query window.  Oh, in case you’ve never seen a GUID before, here’s a couple quick examples:

  • {3F2504E0-4F89-11D3-9A0C-0305E82C3301}
  • {52335ABA-2D8B-4892-A8B7-86B817AAC607}
  • {79E9F560-FD70-4807-BEED-50A87AA911B1}

I’m not even sure how to read that, much less remember it.

Another thing worth considering is that GUIDs are VARCHARs within the database.  This means that not all ORMs will handle them appropriately.  Hibernate will, as will iBatis and ActiveObjects.  But if you’re using something like Django or ActiveRecord, you’re out of luck (disclaimer: Django might actually support VARCHAR primary keys, I’m not sure).  On top of that, some database clustering techniques don’t seem to work nicely when applied to GUID-based key schemes.  I haven’t run into this myself, but my good friend Lowell Heddings has told me tales of strange mystics and evil tides when playing with distributed indexes and GUIDs in MS SQL Server.  Technically speaking, while any sane database may work with a VARCHAR for the table’s primary key, the architects really had INTEGERs in mind when they designed the algorithms.  This means that by using GUIDs, you’re flirting with a less-well-tested use-case.  It’s certainly not unsupported, but you should consider yourself in the minority of users if you go that route.

Oh, and one little myth I’d like to debug: there really isn’t that much more overhead in grabbing the generated value of an INTEGER primary key than in sending an in-code generated GUID value.  Granted, some databases make this harder than others, but that’s their fault isn’t it?  As long as you’re using a well designed ORM, you really won’t see much of a performance increase from removing that miniscule callback.  In fact, depending on how your GUID algorithm works, you may see a significant degradation in overall performance due to extra overhead in your application.


So, the obligatory thirty second overview for our attention-deprived era:

Benefits of GUIDs perfectly unique across all your data and tables; way, way more head-room in terms of values
Problems with GUIDs really, really un-developer friendly; weird side-effects in databases and ORMs; unconventional approach to a “solved” problem; easy way to upset your DBA

So it really seems the only upshot for GUIDs is their massive range.  In trade-off, you lose usability from a raw SQL standpoint, your result sets are that much more cluttered and you risk odd bugs in your persistence backend.  Hmm, I wonder which one I’m going to use in my next app?

Before you make the choice, ask yourself: “how much uniqueness do I really require?”  Over-designing a solution is just as much a problem as under-designing one.  Try to find the mid-point that works best for your use-case.


  1. Just FYI, GUID’s are a native type in SQL Server, and are not stored as VARCHARs.

    Jon H Wednesday, November 7, 2007 at 12:03 pm
  2. While GUID itself might be a native type for SQL Server, that’s hardly something which is common across databases. As such, ORMs such as Hibernate or even low-level wrappers like JDBC can’t support it as such and usually revert back to the VARCHAR technique.

    Daniel Spiewak Wednesday, November 7, 2007 at 1:07 pm
  3. Considering that GUIDs have a known length (either as hex-with-punctuation or plain hex), why would you use VARCHAR? Versus, say, CHAR(32)?

    Mark Murphy Wednesday, November 7, 2007 at 1:57 pm
  4. If I recall, VARCHAR(32) is equivalent to CHAR(32) on almost every database.

    Daniel Spiewak Wednesday, November 7, 2007 at 2:01 pm
  5. I am sure there are use cases for using a GUID but I would hardly (as the originial author/article stated) make them mandatory.

    Robert Wednesday, November 7, 2007 at 2:30 pm
  6. you are missing the point of GUID’s.

    first let me teach you about hexadeciaml. that “random string” that your eyes pick up as VARCHARs are actually restricted a relatively small subset of characters between 0-9A-F. that is sixteen values, two to the fourth. two two to the fourth powers makes 256 values, or a byte. thus, the VARCHAR you see can actually be downconverted into a much series of bytes, and, if you count the number of characters in those GUID’s, you’ll find that they are all uniformly representable as 16 bytes long integers, or 128bits, by looking at their hex values.

    the process for converting between the human readable format and these bytestrings is fairly simple in any programming langauge, google for “hexadecimal conversion” and a keyword for your language. whenever you are given a human readable GUID, simply downconvert it into its 128bit value, and toss that around inside your program.

    second, arguments that a 16byte hex string is “less human readable” or “more error prone” than a 64 bit integer represented in decimal is crazy. foremost, the use of hex and blocking (those -’s in the guid) are there to assist in preventing error and increasing identifiability: they are there for very good reason. the first case, preventing error, happens because the longest hex component is 6 bytes, or 12 characters, which is significantly shorter than the 19 decimal characters a 64 bit string could potential represent.

    the increased identifiability is by way of a common word identifiability pattern in humans, and the fact that each GUID has 5 “words” in it: we tend to discriminate words by their first and last letters, and are very adept at identifying words where the first and last letters do not match, and whereas a single integer gives us two character to look at for this comparison, a 5 word phrase gives us 10 seperate characters for comparison. the notion is that it should be somewhat easy to identify things that look “close” very quickly, and from those close matches, we can further discriminate any exact matches. try this for a working example.

    since GUID’s are supposed to be permuted by way of a hash rather than sequentially, you may not be able to remember the exact GUID you need, but when looking at a table of GUIDs you should be able to select the one you are looking for very quickly.

    lets reanalyze your cons:
    really, really un-developer friendly;
    the first qualm is “single bit derivation”. indeed a problem, but you are supposed to generate GUIDs in a hash, not linearly. how many adjecies do you have from random allocation out of a 10^19 potential space? how many adjecencies do you have if you map every primary you have ever generated randomly into a single 10^19 space? my guess is: not many.
    so, what you mean is, “you cannot remember it”. i dunno about you, but i max out a at a 10 digit phone number, word nothing of a 19 digit non-blocked integer, so the 64bit int, to me and the developers i know, is no more rememberable. most of my databses still use INT, not BIGINT, and I still dont try and remember 3753236.

    weird side-effects in databases and ORMs
    problem exists between keyboard and chair. you are handling it wrong, its not a VARCHAR it is two 64 bit integers or 4 32 bit integers.

    unconventional approach to a “solved” problem;
    you assume tightly coupled databases & fixed for life schemas. as soon as you start having varying schemas, BIGINT is no longer a solution.

    easy way to upset your DBA
    problem exists between keyboard and chair. find a real dba.

    rektide Wednesday, November 7, 2007 at 2:43 pm
  7. @rektide
    Storing the down-converted decimal value of the GUID as a BIGINT in the database is indeed a better way to go than sticking with VARCHAR, but it doesn’t quite solve the major problem, which is usability from a DBA and developer standpoint. Regardless of whether you consider it to be a valid concern or not, most people don’t like dealing with very long and (seemingly) random strings or numbers. And no, picking out the “correct” GUID from a table is neither easy nor even useful for most scenarios.

    Like it or not, some developers and any DBA has to deal with the raw database regularly. When you’re looking around for a certain value, often times you’ll run a quick query to find a certain ID that you’re looking for, then run another query using that ID to get the data you really want. Almost all of the time, I just retype the ID as I see it in the query results rather than reaching for the mouse and copy/pasting. This is especially an issue when accessing the database from a command-line tool which may not have copy/paste. You literally take away this option if you’re using a GUID, since the value becomes too unwieldy to deal with in such a fashion.

    Changing the schema really doesn’t have anything to do with the primary key type you choose. I refactor my schemata constantly and never feel the crunch of INTEGER.

    The whole point of my post is that GUID is an extremely over-engineered solution to a very basic problem. 90% of the time, a full-blown GUID is overkill. Yes, it’s an important tool to have when you run into that 10%, but you’re never going to convince me that you should *only* use GUID keys in *any* situation.

    Daniel Spiewak Wednesday, November 7, 2007 at 2:53 pm
  8. 1) ActiveRecord supports GUID primary keys with a simple plugin. I’ve used it and there were no problems.

    2) In your conclusion, why did you list two benefits for GUIDs and then declare that “the only upshot … is their massive range”? The unique IDs across all databases/tables is a valid benefit and the one that is actually compelling my team to possibly go that way.

    3) You may be right about the “no compelling reason in 90% of cases”, but are you willing to bet that your case is never going to *become* one of those 10% cases? I’d sure rather use GUIDs from the start than face the prospect of switching a large db from integer keys to GUIDs down the road (which may be happening to me soon). There are certainly compelling reasons to do it now but there weren’t any reasons to do it six years ago when we built the database. Now I wish we had.

    Ryan Eibling Wednesday, November 7, 2007 at 3:11 pm
  9. I think the point of the original “guid guy” is that guids can provide the chance to save you a heap-load of trouble on the very few occasions that you’re going to wish you had them, but not cause you too much trouble otherwise, and as such in a large scale important and production situation it is something worth having as a kind of insurance.

    You could always included a unique integer index on the table as a command line helper as well.

    anonymous Wednesday, November 7, 2007 at 3:14 pm
  10. I wasn’t aware of the ActiveRecord plugin. Good news. :-)

    Unique ID across all tables is covered in the “massive range” advantage. Though, I suppose the algorithm used to generate the GUIDs is more relevant to that fact than the range itself.

    Switching INTEGER ids to GUIDs may not be a single query, but I’m willing to bet I could convert a well-designed schema with a script in under 2 hours (including write-time for the script). I’m more concerned about over-engineering early in the design process and cramping things in the short term. IMHO, it’s better to keep the design simple, maintaining the agility required to refactor later to meet the requirements of the moment.

    Daniel Spiewak Wednesday, November 7, 2007 at 3:17 pm
  11. As far as GUIDs being difficult to remember goes, one advantage is that if you misremember/mistype the GUID, you will almost certainly enter a GUID will not exist. If you mistype a normal integer id, there’s a strong chance that you’ll end up referencing a different existing id instead. With GUIDs, you’re more likely to realise your mistake immediately.

    Stephen Wednesday, November 7, 2007 at 4:36 pm
  12. Daniel, I don’t know what database you are using where GUIDs are stored as varchars, but you should literally take the hard drive that DBMS is installed on and drive magnetically charged nails through it.

    I will agree though with your follow up comments about readability and ease of use for DAs, they are ugly and tough to debug with or write adhoc SQL statements with.

    Craig Wednesday, November 7, 2007 at 6:38 pm
  13. @Stephen
    Very insightful! It is indeed better to fail spectacularly when you mistype something than to appear to have worked properly, but in fact messed something up.

    Daniel Spiewak Wednesday, November 7, 2007 at 7:34 pm
  14. (Posted this to the original article as well, but since there’s an actual discussion going on here;)

    I worked on several very large (hundreds-of-servers kind of large) databases, about a decade ago, and we always used a BIGINT primary key; the first longword identified which database it was in, and the second was uniquely assigned by that particular database.

    If you’re using application-generated GUIDs, how do you map a GUID to the particular database that it’s in? You can’t use a lookup table, or you’ve got a single point of failure. You can’t use a calculation, because when you create the GUID, you have no way to know that the database will still be up. You can’t create the GUID after writing the record, because the GUID has to be stored in the record when you write it.

    I’m sure this is a solved problem, I just don’t find the solution obvious at the moment…

    Jay Levitt Wednesday, November 7, 2007 at 7:45 pm
  15. Django’s ORM is also happy to handle non-integer primary keys such as GUIDs; Django simply defaults to an auto-incrementing integer if you don’t otherwise specify a primary key for your model. Some popular Django-based applications make assumptions about primary keys being integers, though (which, in context, sometimes makes sense).

    James Bennett Wednesday, November 7, 2007 at 9:20 pm
  16. @Jay
    Every algorithm I’ve ever seen uses a calculation at object creation time. So in ActiveObjects, you might do this using a ValueGenerator<String> (or long if you prefer). You then rely on your hashing algorithm to produce a key which is totally unique every time. Thus, it doesn’t really matter if the DB is up, down, or sideways, the failure point won’t be in the GUID generation and you’ll be guaranteed (statistically) that your primary keys are always unique.

    Daniel Spiewak Wednesday, November 7, 2007 at 10:01 pm
  17. “because i cant type that number in” is the wrong reason not to use something. its the your language deficiency is not a feature thread. copy paste is very powerful, you can get great tools that let you keep multiple items in your selection buffer and hotkey between the different buffers, such that you dont have to keep re-selecting your id while you are copy-pasting other code. if you’re really using completely braindamaged non-cut-paste software, you can always find some software to let you write automation macros, such that control-p or whatever will generate the same key signals as having typed the command. most of all though, i would love to know what environment you are using that doesnt support traditional copy paste shortcuts, such that i can go mock its developers in person. in linux at least, the command line cut paste operations work quite normally with gpm.

    often new schemas require new tables altogether. a common technique i’ve seen and used is having the store proceedures access new and old data tables depending on where the key is found, such that the application doesnt even know the schema has changed and that there are two totally different modles in the background.

    for the most part you are in the clear. the only exception i take with “90% of the time, a full-blown GUID is overkill” is that its at least 99.9% of the time that its serious overkill. otoh, i am a huge supporter of overkill when people deeply grasp the cost/benefit weigh in and want to see how the benefits play out in real life. and part of making that decision is knowing what your getting into, hence my posts.

    @Jay: sadly the only technique i’ve seen is randomly generating a GUID then permuting the value such that its checksum modulus 16 becomes a certain value. i’ve been told you can also used fix values for some of the “data” fields of the GUID, but i have never seen that in practice.

    rektide Wednesday, November 7, 2007 at 10:10 pm
  18. Speaking of mocking developers in prison…who refactors schemata by creating new tables? I literally cringed when you talked about spreading queries across both tables, maintaining them simultaneously as live data. If that’s your use-case for GUIDs then we’re in serious trouble. To be blunt: laziness in writing migration scripts isn’t an excuse to over-engineer your key types. :-)

    I agree that overkill has its place. However, a more appropriate technique is almost always to design things so that they can be easily upgraded/changed later. If your schema is rigidly dependent on auto-incrementing primary keys (not sure how it could be, but just in case), then obviously it’s going to be tough to upgrade to GUIDs later. *This* is the root of the problem, not the original decision to go with INTEGERs. Use-cases of the moment and foreseeable future should dictate design attributes, not the outside chance that some edge case *might* manifest itself years down the road. Agile, agile, agile.

    Granted: copy/paste is pretty much universal these days. That isn’t to say that I haven’t done a lot of database access in environments where it’s either unavailable or just very clumsy (*ahem* cmd.exe *ahem*). Nor is that to say that such odd environments don’t have their place (there’s a reason I was using them). However, I will admit that the tool I use for database access *most often* is fully copy/paste enabled. Here’s the problem though: GUIDs forces its use. With an auto-incrementing value, chances are I can just read and type. This is (in my ad hoc benchmarking) 5-10 times faster than reaching for the mouse, selecting the text, Ctrl+C, clicking focus back into the text field (hitting the right spot in the query) and tapping Ctrl+V. Copy paste would hardly be more than 1-2 seconds, but typing would be measured in milliseconds. When you’re doing a lot of ad hoc querying, or any sort of data mining, input efficiency in such areas is critical. Not an excuse to avoid GUIDs at all cost, but definitely a good reason not to just go with them irregardless of the circumstances.

    Daniel Spiewak Wednesday, November 7, 2007 at 11:07 pm
  19. Writing a single join query eliminates the need to type ids by hand. I almost always do it that way – less to type and you get it done in one step. Even if I write one query to get the id, I always copy it in the clipboard, because the next result list will remove it out of sight.

    Yuri Thursday, November 8, 2007 at 1:29 am
  20. We use GUID as primary keys in Oracle (type RAW(16)) with Hibernate. Hibernate works fine with fields declared as byte[] or String (and probably other). It also provides a UUID generator to save round trips (though Oracle provies sys_guid() as well). I can agree that GUID is probably overkill but you know what…it doesn’t cost you anything and provides some benefits from the start. I sleep pretty well.

    Jose Noheda Thursday, November 8, 2007 at 1:33 am
  21. @Daniel: Sorry, I think I wasn’t clear about the multi-database thing. It’s not a question of “the database” being up. There isn’t one database; there are a hundred physical databases acting as one logical database. The problem isn’t uniqueness; it’s findability.

    If I want to write a record to “the database”, I round-robin through my list of physical database servers, and I ask the next available server to write the record out and give me back the ID (just like you get back the AUTOINCREMENT ID in a single-database situation). If it isn’t reachable, or returns an error, I move on to the next. If, say, database server #57 is the one that successfully writes it, then the top longword of the BIGINT is 57.

    That way, when I go to load the record later, I know that I should be doing the SELECT on database #57 (or one of its replicants). I don’t have to go to an intermediate lookup table to find that out; I can get the information directly from the primary key. (It works the same way if you’re hashing the primary key, instead of splitting it – the point is that each database generates IDs from which you can derive the database number without an extra lookup).

    But that only works if the database itself generates the ID. One of the listed advantages of GUIDs is that the application generates the ID and passes it TO the database, rather than fetching it FROM the database. I don’t see how that can work with multiple databases. When you read a record, you need to know, from the primary key, which database it’s on. But when you write a record, and you pre-calculate the key, you can’t possibly know which database it’s GOING to be on.


    Jay Levitt Thursday, November 8, 2007 at 8:12 am
  22. Whew! Complicated setup. :-)

    If I understand you correctly, you’re letting the database generate the int and then ORing it with a shifted database id (in this case, maybe 57 << 12, depending on how many keys you want to accommodate). This way you can just mask off the top few bits of the id and know which database you’re looking at.

    In such a situation, I think the only way you could keep your round-robin working is to have the database generate a GUID based on an algorithm which would ensure some mathematical property of the GUID was reversable to the database index. For example, taking the GUID MOD num_databases could return the database index, or something like that. This could be done using some sort of stored procedure. When you try the INSERT, you would then pass that stored procedure a parameter giving the database number, which it would use as a seed then in the GUID generation.

    Does that make any sense? Basically, the trick is the same (being able to find the database number from the primary key value), and you absolutely *must* have the key generated on the database server, be it auto incremented or GUID.

    Daniel Spiewak Thursday, November 8, 2007 at 11:19 am
  23. @rektide: Just saw your post above. Permuting the GUID doesn’t help, though; if the application’s permuting it, then the application still has to know which DB it wrote to before finalizing the GUID. And if the database is permuting it, then the application still has to wait for the database response to find out the GUID. Either way, you don’t get the “I can make [fewer] round trips to the database” benefit that the Gospels promise us.

    @Daniel: Yep, you understand exactly. Maybe that’s not generally needed anymore; at the time, database-layer clustering solutions were immature and not very performant. Still, I have to figure that at some level – either in the application, or hidden in the clustering layer – that’s what’s happening. I mean, somehow, when I SELECT by primary key, some layer has to be able to find that record from among multiple servers.

    I think you’re right – that the database has to generate the GUID, just as we did it. Which removes the big GUID “win” of “Hey, now I don’t have to wait for the database to respond with its auto-generated ID!”.

    So I’m curious to hear how the pro-GUID camp handles that… rektide, Ryan?

    Jay Levitt Thursday, November 8, 2007 at 11:44 am
  24. @Jay
    Just for the record, in your situation I would be very much *in favor* of using GUIDs. :-) When you’ve got a setup that’s permuted across multiple disparate servers like that, I think it’d probably be most flexible to go with something more “guaranteed unique” than an auto-incremented BIGINT.

    Daniel Spiewak Thursday, November 8, 2007 at 11:49 am
  25. @Jose – one thing to be careful of with Oracle and RAW datatypes with GUIDs is implicit conversion of Strings to RAW and vice-versa. We ran into performance problems with the exact same setup as you until we created function-based indexes with RAWTOHEX on the RAW columns. Just something to be aware of :)

    Peter Mularien Thursday, November 15, 2007 at 6:06 pm
  26. @Peter: Yes, thank you. We took another approach with custom user types. We noticed it when loaded a couple million rows in a table :-)

    Jose Noheda Friday, November 16, 2007 at 1:22 am
  27. It is not at all unusual to use chars as primary keys in some situations (for example, a well-defined code universe, such as states, country codes, sex codes, race codes, etc.). In fact, this is the correct modeling technique (natural keys) in many situations. It would be retarded to create a reference table where 17 stands for New York state instead of “NY”. (No, this is not the same situation as using a natural key in a master table, such as using an SSN for an Employees table. Different table types call for different considerations.) Your statement that DBMS architects “really had integers in mind for primary keys”, that anything other than integers is “flirting with a less-well-tested-use-case” is wrong, very wrong, on both the technical and the relational-theory level, and reveals you as a someone who has not done much reading. You would strengthen your post by removing this claim.

    Also, for SQL Server at least, GUIDS have their own 128-bit type called “uniqueidentifier”. The Microsoft middleware layers (OLE-DB, ADO.NET, the CLR) also have this type. So on that platform at least, your varchar argument doesn’t hold water. The argument against GUIDS on that platform is that 128 bits is more than 32 bits, and GUIDS do not cluster as nicely as an auto-incrementing integer.

    Brian H. Monday, May 26, 2008 at 12:56 pm
  28. If you’re interested, we did a podcast about UUIDs in the database:


    Craig Wednesday, May 28, 2008 at 7:54 am
  29. The major appeal of GUIDs to me ended up being their ease of use in a distributed system – e.g. a whole mess of remote-area laptops and a couple of satellite offices, apart from the main office.

    You no longer have to check out key ranges from a central repository or, worse yet, deal with the after-effects of someone installing, getting a key range, and then *ghosting the machine*. That’s not fun.

    There are other schemes for merging data in non-ranged, non-GUID systems, but they’re not pretty.

    Ritchie Annand Monday, May 11, 2009 at 9:25 am
  30. After supporting several C# apps that were heavily GUIDed on the database side I am with you. There are clearly situations where a GUID is a very useful feature. Useful enough to make it worth using such cryptic IDs.

    As much as possible I still find a way to tie a GUID to an integer ID they are just easier to work with remember and COMMUNICATE. I cannot emphasize enough the value of these attributes in troubleshooting situations. In the end I agree that

    Larry Groove Monday, January 11, 2010 at 10:57 pm
  31. You are missing a HUUUUUGE pro of using GUID’s. When having to persist entities to a DB that reference each other, you can generate the PK in code without having to go to the DB first to try and find your next available sequence.

    e.g. PK = GUID.NewGuid()
    insertChild(GUID.NewGuid(),’peter’,PK) => peter is now a child of john

    with an integer sequence, you would have had to do a query first to determine the next sequence to use as PK, instead of using GUID.NewGuid().

    Strider Wednesday, July 28, 2010 at 5:22 am

Post a Comment

Comments are automatically formatted. Markup are either stripped or will cause large blocks of text to be eaten, depending on the phase of the moon. Code snippets should be wrapped in <pre>...</pre> tags. Indentation within pre tags will be preserved, and most instances of "<" and ">" will work without a problem.

Please note that first-time commenters are moderated, so don't panic if your comment doesn't appear immediately.