Code faster by representing datetimes as ISO8601 in UTC.
ISO8601 will save us from the drag of grokking datetime libraries and parsing errors. UTC will save us from dealing with timezones, and we all know how awful timezones can be.
How to use ISO8601
Format time as a string like one of the following, all for April 25, 2020, 10:28pm UTC:
As a date
2020-04-25
As a datetime in UTC with second precision (fun fact: the Z stands for Zulu Time)
2020-04-25T22:28:00Z
As a datetime in UTC with millisecond precision
2020-04-25T22:28:00.000Z
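Here's a minimal TypeScript sketch of producing all three forms. It assumes a JavaScript runtime and relies on the built-in `Date.prototype.toISOString()`, which always returns UTC with millisecond precision and a trailing Z. The helper names `isoDate`, `isoSeconds`, and `isoMillis` are made up for illustration.

```typescript
// Hypothetical helpers: each derives one of the three formats above from
// Date.prototype.toISOString(), which returns "YYYY-MM-DDTHH:mm:ss.sssZ"
// in UTC (for years 0000-9999).
function isoMillis(d: Date): string {
  return d.toISOString();
}

function isoSeconds(d: Date): string {
  // Drop the ".sss" milliseconds and re-append the Zulu designator.
  return d.toISOString().slice(0, 19) + "Z";
}

function isoDate(d: Date): string {
  // The first 10 characters are the calendar date, YYYY-MM-DD.
  return d.toISOString().slice(0, 10);
}

// April 25, 2020, 10:28pm UTC. Date.UTC takes a 0-based month, so 3 = April.
const example = new Date(Date.UTC(2020, 3, 25, 22, 28, 0));
console.log(isoDate(example));    // 2020-04-25
console.log(isoSeconds(example)); // 2020-04-25T22:28:00Z
console.log(isoMillis(example));  // 2020-04-25T22:28:00.000Z
```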
Can we use ISO8601 for non-UTC?
2020-04-25T17:28:00-05:00
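Yes. See the -05:00 instead of the Z? That means five hours behind UTC, so this is the same instant as 2020-04-25T22:28:00Z. (It's a fixed offset, not a named timezone: EST is -05:00, but in April the US East Coast is on EDT, -04:00.) But why would we? Timezones are for people.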
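A quick TypeScript check that the offset form names the same instant as the Z form. It relies on JavaScript's `Date` parser accepting both `Z` and `±HH:mm` offsets in this format; `sameInstant` is a hypothetical helper, not a library function.

```typescript
// Hypothetical helper: true when two ISO8601 strings denote the same instant.
function sameInstant(a: string, b: string): boolean {
  return new Date(a).getTime() === new Date(b).getTime();
}

const zulu: string = "2020-04-25T22:28:00Z";
const offset: string = "2020-04-25T17:28:00-05:00"; // 17:28 + 5h = 22:28 UTC

console.log(sameInstant(zulu, offset));      // true
console.log(new Date(offset).toISOString()); // 2020-04-25T22:28:00.000Z
console.log(zulu === offset);                // false: same instant, different strings
```

Note the last line: equal instants, unequal strings. That gap is a good reason to pick one representation and stick to it.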
Okay, we’re done, right? With time, sure. But if we use ISO8601 as a case study, we can learn how to design data formats that are fast to code.
Benefits of ISO8601
Fast for humans to read and write
When was 1587843046?
Yeah, I don’t know either. Humans are really slow at reading and writing datetimes represented as the number of seconds since 1970-01-01T00:00:00Z. Usually we need a computer’s help.
When was 2020-04-25T12:43:00Z?
Much easier, right? ISO8601 is transparent, so humans can read and write it wicked fast. A data format that’s easy to read and write is fast to code.
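For the record, here's the computer's help that epoch timestamp needed, as a TypeScript sketch. JavaScript's `Date` counts milliseconds, so seconds get multiplied by 1000 first; `epochSecondsToIso` and `isoToEpochSeconds` are hypothetical names. Spoiler: 1587843046 is the same day as our running example.

```typescript
// Hypothetical helper: Unix epoch seconds -> ISO8601 UTC string.
function epochSecondsToIso(seconds: number): string {
  // Date expects milliseconds since 1970-01-01T00:00:00Z.
  return new Date(seconds * 1000).toISOString();
}

// Hypothetical helper: ISO8601 string -> whole epoch seconds.
function isoToEpochSeconds(iso: string): number {
  return Math.floor(Date.parse(iso) / 1000);
}

console.log(epochSecondsToIso(1587843046));             // 2020-04-25T19:30:46.000Z
console.log(isoToEpochSeconds("2020-04-25T19:30:46Z")); // 1587843046
```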
String comparison is date comparison
Try comparing “Jan 1, 1970 1:41pm” to “Feb 4, 2020 8:32am”.
Hard, right? We’d slog through some datetime library, if our language even makes that easy, and figure out how to parse it using the cryptic datetime parsing syntax. How do I specify the month again? Oh right, %b. Because b is what I think of when I think of month.
What about “1970-01-01T13:41:00Z” and “2020-02-04T08:32:00Z”?
Easy: we’ll use string comparison. Is “1970-01-01T13:41:00Z” < “2020-02-04T08:32:00Z”? Well, let’s look at the first character. Is “1” < “2”? Yes. Ergo, the first date is less than the second date.
ISO8601 datetimes in the same timezone and with the same precision sort lexicographically, so string comparison just works. If serializing a custom data format preserves its order, it’s much faster to code. No futzing with deserialization just to compare two values. Just operate on it as a string.
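A TypeScript sketch of the claim: the default string sort orders ISO8601 UTC strings the same way as parsing them into timestamps does. It also shows the precision caveat. A "." sorts before "Z", so mixing second and millisecond precision breaks string order.

```typescript
// Datetimes from the text, deliberately out of order.
const stamps: string[] = [
  "2020-02-04T08:32:00Z",
  "1970-01-01T13:41:00Z",
  "2020-04-25T22:28:00Z",
];

// Plain string sort: no parsing involved.
const byString = [...stamps].sort();

// Chronological sort via parsing, for comparison.
const byTime = [...stamps].sort((a, b) => Date.parse(a) - Date.parse(b));

console.log(byString.join() === byTime.join()); // true

// The caveat: 22:28:00.500Z is half a second LATER than 22:28:00Z,
// yet it compares as smaller because "." (0x2E) < "Z" (0x5A).
console.log("2020-04-25T22:28:00.500Z" < "2020-04-25T22:28:00Z"); // true
```

So pick one precision for stored values and never mix them.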
Space-free data makes bash happy
Sometimes other date formats are used. Like:
2020-04-26 00:00:00Z
Not bad: it’s easy for humans to read and write, and it preserves order. But that space. The space between the date and time can mess with bash. Expand it unquoted and bash splits on whitespace, turning our one datetime into two separate words. Huh, why isn’t my program working? What’s the bug? Oh, that space.
ISO8601 never has a space, so bash is always happy. Which means we’re happy.
So what? I’m not using bash
Say we’re writing some enterprise Java application and storing our data in databases. Well, how do we grok logs when debugging? Bash.
How do we quickly test the DB? Bash.
How would we prototype a client to make basic requests? Bash.
Even if bash isn’t part of our product’s tech stack, it will always be part of a fast developer’s development tech stack. So we need to keep bash happy.
Benefits of UTC
Bugs can hide in implicit or local timezones
If a datetime’s timezone is implicit, a bug can emerge. One system assumes EST, another assumes PST. That’s 3 hours apart. A total mess. But if every datetime has an explicit, unambiguous timezone (UTC), there’s no ambiguity. There’s no way for “local” to mean two different things.
When designing a data format, make it explicit and unambiguous. This gives bugs nowhere to hide. And no bugs means no debugging, which means faster coding.
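A minimal TypeScript sketch of "explicit and unambiguous" at a system boundary: accept only UTC strings ending in Z and reject everything else instead of guessing. The guard matters in JavaScript specifically, because `new Date("2020-04-25T22:28:00")` (no Z, no offset) is interpreted in the machine's local timezone. `parseUtc` and its exact pattern are illustrative assumptions.

```typescript
// Hypothetical strict pattern: YYYY-MM-DDTHH:mm:ss, optional .sss, then Z.
const UTC_PATTERN = /^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(\.\d{3})?Z$/;

function parseUtc(s: string): Date {
  if (!UTC_PATTERN.test(s)) {
    // No trailing Z: the timezone is implicit or non-UTC, so refuse to guess.
    throw new Error(`not an explicit UTC datetime: ${s}`);
  }
  const d = new Date(s);
  if (Number.isNaN(d.getTime())) {
    // The shape matched, but the Date constructor rejected the values.
    throw new Error(`invalid datetime: ${s}`);
  }
  return d;
}

console.log(parseUtc("2020-04-25T22:28:00Z").toISOString()); // 2020-04-25T22:28:00.000Z

try {
  parseUtc("2020-04-25T22:28:00"); // implicit local time: rejected
} catch (e) {
  console.log((e as Error).message);
}
```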
No normalization drag
Let’s say all of our time data is serialized and persisted in UTC. Then all of our data has the same timezone. This means we never have to normalize time: we never have to write a converter from one equivalent format to another, which means there’s nowhere for a timezone bug to hide. If we need to write a data format and it has multiple equivalent representations, pick a canonical one for internal serialization and persistence. This prevents bugs where two equivalent serializations of the same value are treated as different values. Other representations can still be used on the fly or to interface with other systems.
When designing a data format, pick a canonical representation if other valid representations exist. If data is always persisted in the canonical format, we never have to slow down to normalize our data.
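A TypeScript sketch of normalizing once, on the way in: every accepted spelling becomes a single canonical string before it's persisted. Choosing millisecond-precision UTC (the output of `toISOString()`) as canonical is an assumption for this example, and `toCanonical` is a hypothetical name.

```typescript
// Hypothetical canonicalizer: Z or ±HH:mm input -> UTC, millisecond precision.
// Caveat: Date.parse also accepts some implementation-specific, non-ISO
// formats, so a real system would pair this with a strict format check.
function toCanonical(s: string): string {
  const ms = Date.parse(s);
  if (Number.isNaN(ms)) {
    throw new Error(`unparseable datetime: ${s}`);
  }
  return new Date(ms).toISOString();
}

const inputs: string[] = [
  "2020-04-25T22:28:00Z",
  "2020-04-25T22:28:00.000Z",
  "2020-04-25T17:28:00-05:00",
];

// All three spellings collapse to one string, so === and < work on stored values.
const canonical = inputs.map(toCanonical);
console.log(new Set(canonical).size); // 1
console.log(canonical[0]);            // 2020-04-25T22:28:00.000Z
```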
But customers don’t speak UTC!?
I hear you. It’s fine. Internally, everything’s normalized. But outside, we speak whatever is most effective. If someone writes a bad API that we need to work with, or humans like reading “Feb 2nd”, we’re happy to oblige, externally. When rendering the date to the customer, or to an API, use a rendering method:
> renderDate('2020-02-04T14:01:00Z', customer.timezone)
'Feb 4, 2020 9:01am EST'
But never serialize this rendering, and never let it breach the system, or we’ll pay for it with drag on our speed.
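One possible sketch of the `renderDate` helper above, in TypeScript. It's hypothetical. It assumes `customer.timezone` holds an IANA zone name like "America/New_York", and it uses the standard `Intl.DateTimeFormat` API. `formatToParts` lets us assemble the pieces ourselves, so the output follows the example's layout instead of the locale's default punctuation.

```typescript
// Hypothetical renderDate: canonical UTC string in, display string out.
// timeZone is assumed to be an IANA name such as "America/New_York".
function renderDate(iso: string, timeZone: string): string {
  const parts = new Intl.DateTimeFormat("en-US", {
    timeZone,
    month: "short",
    day: "numeric",
    year: "numeric",
    hour: "numeric",
    minute: "2-digit",
    timeZoneName: "short",
  }).formatToParts(new Date(iso));

  // Look up one named part (month, hour, ...); literal separators are ignored.
  const get = (type: string): string =>
    parts.find((p) => p.type === type)?.value ?? "";

  return `${get("month")} ${get("day")}, ${get("year")} ` +
    `${get("hour")}:${get("minute")}${get("dayPeriod").toLowerCase()} ` +
    `${get("timeZoneName")}`;
}

console.log(renderDate("2020-02-04T14:01:00Z", "America/New_York"));
```

The return value is for display only. Per the rule above, it never gets written back to storage.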