Not sure I like the new spec's "increment within same ms" change + more thoughts
You changed the spec to include:
To ensure sortability for IDs generated within the same millisecond, implementations should remember the most recently generated randomness string and increment the most significant character that is not the last character in the encoding.
I don’t think you understand the impact of this change; it brings a whole lot of hurt into your world. For instance, you can’t rely on wall-clock time (which you fell for). Unless you can get a reliable, monotonic, strictly increasing time source (which is (near?) impossible AFAIK in JS), you’ll run into trouble later. Why? Because wall-clock time goes back (DST), stands still (leap seconds are sometimes implemented that way), or does other weird stuff (and there’s even more). Trust me, I know 😉 Sidenote: some systems may not even have millisecond-resolution timers, which is something to consider too.
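To make the clock distinction concrete, here is a minimal Python sketch (Python rather than JS, purely for illustration) that asks the runtime what it reports about its wall clock and its monotonic clock. The helper name `describe_clock` is made up for this example; `time.get_clock_info` is the standard-library call.

```python
import time


def describe_clock(name: str) -> dict:
    """Summarise what Python reports about one of its named clocks.

    `describe_clock` is a hypothetical helper for this sketch.
    """
    info = time.get_clock_info(name)
    return {
        "clock": name,
        "monotonic": info.monotonic,      # True if the clock cannot go backwards
        "adjustable": info.adjustable,    # True if it can be changed (admin, NTP, ...)
        "resolution_s": info.resolution,  # reported resolution in seconds
    }


wall = describe_clock("time")            # the clock behind time.time()
steady = describe_clock("monotonic")     # the clock behind time.monotonic()

print(wall)
print(steady)
```

Note that a monotonic clock usually has an arbitrary starting point, so it cannot simply replace the wall clock as the ULID timestamp; it only tells you how much time has passed.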
There are 80 bits of random data; I think it’s safe to assume (even with the birthday paradox in mind) that the chance of a collision (within any random millisecond) is similar to the chance of a GUID collision. However, that would only hold if ULIDs were spread evenly across time (which would, in effect, make a ULID similar or even equal to a GUID). Given that ULIDs are typically ‘clustered’ in a certain timespan, the chance of a collision goes up. The worst case would be a DST change plus some other ‘random event’ that causes the same ‘millisecond’ point in time to exist (say) 3 or 4 times. The chance of a collision would increase a lot but (I haven’t done the math yet) the scheme would still suffice for most purposes, unless maybe you’re Google and generating many millions or even billions of ULIDs per ms.
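A rough way to put numbers on this is the standard birthday-bound approximation, p ≈ 1 - exp(-n(n-1) / (2 · 2^80)), for n IDs that share one millisecond timestamp. The sketch below assumes the 80 random bits are uniformly random and independent; the function name is made up for illustration.

```python
import math

RANDOM_BITS = 80  # size of the random part of a ULID


def collision_probability(ids_in_same_ms: int, bits: int = RANDOM_BITS) -> float:
    """Approximate chance that at least two of n random values collide.

    Uses the birthday-bound approximation 1 - exp(-n(n-1) / (2 * 2**bits)),
    assuming uniformly random, independent values.
    """
    n = ids_in_same_ms
    exponent = -n * (n - 1) / (2 * 2 ** bits)
    # -expm1(x) equals 1 - exp(x) but stays accurate for tiny probabilities.
    return -math.expm1(exponent)


# One million IDs in a single millisecond, and the same millisecond
# "existing" four times at that rate (the worst case described above).
for n in (10**6, 4 * 10**6, 10**9):
    print(n, collision_probability(n))
```

Under these assumptions, a million IDs in one millisecond gives a collision chance on the order of 4e-13, and repeating the same millisecond four times at that rate multiplies it by roughly sixteen.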
So I guess my advice is to leave that part out of the spec (and, let’s be honest, this method is no beauty either; it could be improved but that wouldn’t be worth the trouble IMHO) and keep implementations much simpler that way. You could compare a ULID with a v4 UUID, with the only difference being that the first 6 bytes (48 bits) are reserved for a timestamp for better ordering. However, it does mean that, within the millisecond, the ordering is ‘random’, so the spec may need to be clearer about that.
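For comparison, a stateless generator along those lines can be very small. This is only a sketch of the "48-bit timestamp plus 80 random bits" idea using the Crockford base32 alphabet; the function names are made up here and it is not taken from any existing port.

```python
import os
import time

# Crockford base32 alphabet (no I, L, O, U).
CROCKFORD = "0123456789ABCDEFGHJKMNPQRSTVWXYZ"


def encode_ulid(timestamp_ms: int, randomness: bytes) -> str:
    """Encode a 48-bit timestamp and 10 random bytes as 26 base32 characters."""
    if not 0 <= timestamp_ms < 2 ** 48:
        raise ValueError("timestamp must fit in 48 bits")
    if len(randomness) != 10:
        raise ValueError("randomness must be exactly 10 bytes (80 bits)")
    value = (timestamp_ms << 80) | int.from_bytes(randomness, "big")
    chars = []
    for _ in range(26):                 # 26 * 5 = 130 bits covers all 128
        chars.append(CROCKFORD[value & 0x1F])
        value >>= 5
    return "".join(reversed(chars))     # most significant character first


def new_ulid() -> str:
    """Stateless: nothing is remembered between calls."""
    return encode_ulid(time.time_ns() // 1_000_000, os.urandom(10))


print(new_ulid())
```

Two IDs generated in the same millisecond sort in random order relative to each other here, which is exactly the trade-off described above.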
Besides time, by the way, this tiny change in the spec also requires you to keep state between generating ULIDs (so you’ll need to keep the generator object around, for example) and introduces chances for race conditions (multithreading: N threads generating ULIDs will, without proper locking etc., inevitably generate exactly the same ULIDs), plus all sorts of other things that either get taken for granted (which will bite you in the *ss sooner rather than later) or need to be worked out / specified.
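To show how much machinery the "increment within the same ms" rule drags in, here is a hedged sketch of the state and locking it needs. It simplifies the spec wording to "add 1 to the 80-bit random value", which is an assumption of this example, and the class name `MonotonicState` is hypothetical.

```python
import os
import threading
import time
from typing import Optional, Tuple


class MonotonicState:
    """Hypothetical holder for the state the spec change requires."""

    def __init__(self) -> None:
        self._lock = threading.Lock()   # without this, threads can emit duplicates
        self._last_ms = -1
        self._last_random = 0

    def next_parts(self, now_ms: Optional[int] = None) -> Tuple[int, int]:
        """Return (timestamp_ms, 80-bit random value) for the next ID."""
        if now_ms is None:
            now_ms = time.time_ns() // 1_000_000
        with self._lock:
            if now_ms == self._last_ms:
                # Same millisecond: increment instead of drawing new randomness.
                self._last_random += 1
                if self._last_random >= 2 ** 80:
                    raise OverflowError("random part overflowed within one ms")
            else:
                # New millisecond. If the clock went BACKWARDS we also land
                # here, and ordering is silently broken.
                self._last_ms = now_ms
                self._last_random = int.from_bytes(os.urandom(10), "big")
            return self._last_ms, self._last_random


state = MonotonicState()
print(state.next_parts(), state.next_parts())
```

The generator object now has to be shared and kept alive, and the backwards-clock case still is not handled; that is the kind of thing a spec would have to spell out.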
For inspiration about some more of the problems you might run into you might want to look at Twitter’s Snowflake (retired but still available) or have a look at my “Snowflake-alike” IdGen. Both are 64 bit (actually 63*) equivalents of ulid.
* Come to think of it: you might also want to steer clear of the sign bit; when systems order on the (internal) 128-bit representation / byte array (for ‘speed’), the order may ‘break’ as soon as the sign bit is set to 1. Twitter thought of it way before it happened, but I think it’s a good idea. Sacrificing 1 bit out of 128 shouldn’t be that much of a problem (though it does halve your “ulid-space” from 2^128 to 2^127).
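A small sketch of what goes wrong: two 128-bit values that differ only in whether the top bit of the timestamp is set. Compared as unsigned bytes they order correctly; interpreted as a signed 128-bit integer the later one becomes negative and sorts first. The values are made up for illustration.

```python
def as_signed_128(raw: bytes) -> int:
    """Interpret 16 big-endian bytes as a signed (two's complement) integer."""
    return int.from_bytes(raw, "big", signed=True)


# Largest timestamp with the top bit clear, and the next one with it set;
# the 80 random bits are zero in both to keep the example readable.
earlier = (0x7FFFFFFFFFFF << 80).to_bytes(16, "big")
later = (0x800000000000 << 80).to_bytes(16, "big")

unsigned_order_ok = earlier < later                              # byte-wise compare
signed_order_ok = as_signed_128(earlier) < as_signed_128(later)  # signed compare

print(unsigned_order_ok, signed_order_ok)
```

With a 48-bit millisecond timestamp the top bit only flips after 2^47 ms, which is more than four thousand years after the Unix epoch, so this is a long-range concern rather than an immediate one.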
I was/am doing a ulid .NET port (waaay more extensive than fvilvers’, with all due respect!) and also implemented the binary form and created methods to convert to/from UUIDs (since ULIDs and UUIDs are both 128 bits, you can easily convert one to the other and vice versa). I also implemented a Parse(...) method and some others and have put it on GitHub. There are some things I thought of / came across which might need ‘agreement’ and be put in the spec:
- ULIDs are, IMHO, case-insensitive (i.e. NOT case-sensitive). We seem to agree on that; I missed it in your README 😛
- We may want to allow / think about / specify a(n optional?) separator for readability. No { and } mess like (G/U)UIDs, but something like ttttt-ttttt-rrrrrr-rrrrr-rrrrr (lengths 5-5-6-5-5) might make ULIDs more readable. Other formats/separators are of course a possibility (as well as none of this, of course; I’m just bringing my ideas to the table)
- You/we may want to decide on pronunciation and document it: “you-lid”? “you-el-i-dee”? “uhl-id”? Something else? You/we want to avoid GIF’s “Jif” vs. “Gif” wars
- Besides the pronunciation, the same goes for a “standardized” spelling. I catch myself writing ulid, ULID, UlId, … so that may have to be standardized too
- I had more… but can’t think of it now. Will add here / open a new issue when I think of it.
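To illustrate the separator and case-insensitivity ideas from the list above, here is a minimal sketch. The 5-5-6-5-5 grouping and the "-" separator are only the suggestion made in that list, not anything in the spec, and the function names are made up.

```python
GROUPS = (5, 5, 6, 5, 5)   # 10 timestamp characters, then 16 random characters


def with_separators(ulid: str, sep: str = "-") -> str:
    """Format a 26-character ULID as ttttt-ttttt-rrrrrr-rrrrr-rrrrr."""
    if len(ulid) != 26:
        raise ValueError("a ULID string has exactly 26 characters")
    parts, pos = [], 0
    for size in GROUPS:
        parts.append(ulid[pos:pos + size])
        pos += size
    return sep.join(parts)


def normalize(text: str, sep: str = "-") -> str:
    """Accept any letter case and optional separators; return canonical form."""
    canonical = text.replace(sep, "").upper()
    if len(canonical) != 26:
        raise ValueError("not a 26-character ULID once separators are removed")
    return canonical


pretty = with_separators("01ARZ3NDEKTSV4RRFFQ69G5FAV")
print(pretty)
print(normalize(pretty.lower()))
```

A parser that normalizes like this keeps the separator purely cosmetic, so the stored and sorted form stays the plain 26-character string.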
So far, for now, another $0.02 of mine 😝
Edit: FYI; I just released V1.0 of NUlid. 🎉 Maybe you can add it to your list of community contributed ports?
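For readers wondering what the ULID-to-UUID conversion mentioned earlier amounts to: since both are 128 bits, it can be as simple as reinterpreting the same 16 bytes. The sketch below is in Python and is not NUlid’s code; the function names are made up for illustration.

```python
import uuid


def ulid_bytes_to_uuid(raw: bytes) -> uuid.UUID:
    """Reinterpret the 16 bytes of a binary ULID as a UUID."""
    if len(raw) != 16:
        raise ValueError("a binary ULID is exactly 16 bytes")
    return uuid.UUID(bytes=raw)


def uuid_to_ulid_bytes(value: uuid.UUID) -> bytes:
    """Go back from a UUID to the 16 raw bytes."""
    return value.bytes


raw = bytes(range(16))                 # stand-in for a real binary ULID
as_uuid = ulid_bytes_to_uuid(raw)
print(as_uuid)
print(uuid_to_ulid_bytes(as_uuid) == raw)
```

The result is a UUID only in shape: its version and variant bits are whatever the ULID’s bits happen to be, so tools that validate those fields may not treat it as a well-formed UUID.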
Issue Analytics
- Created 7 years ago
- Reactions:2
- Comments:16 (1 by maintainers)
Top GitHub Comments
Okay, so clearly the change in spec was not as well thought out as it should have been. I admit, it was made in (hopefully) uncharacteristic haste. While the twitter/snowflake implementation gets around that by remembering the highest last seen time, I agree that it’s not worth enforcing that complication on implementations. I’d be happy to roll it back if everyone agrees.
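One possible reading of “remembering the highest last seen time” is sketched below: never hand out a timestamp lower than one already used. Clamping is an assumption of this sketch (a generator could just as well refuse to issue IDs while the clock is behind), and the class name is hypothetical.

```python
import time
from typing import Optional


class ClampedClock:
    """Hypothetical clock wrapper that never returns a lower value than before."""

    def __init__(self) -> None:
        self._highest_ms = 0

    def now_ms(self, wall_ms: Optional[int] = None) -> int:
        """Return the wall-clock time in ms, clamped to the highest value seen."""
        if wall_ms is None:
            wall_ms = time.time_ns() // 1_000_000
        # If the wall clock jumped backwards, keep using the old maximum.
        self._highest_ms = max(self._highest_ms, wall_ms)
        return self._highest_ms


clock = ClampedClock()
print(clock.now_ms(1000), clock.now_ms(900), clock.now_ms(1001))
```

Even this small wrapper is shared mutable state, so it would need the same locking and lifetime guarantees discussed in the issue, which is the complication being weighed here.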
Closing issue, feel free to re-open!