ISO 8601 Date Validation That Doesn’t Suck

John Reeve | May 20th, 2009

For the Intervals API, we’re wrestling with issues surrounding data input validation. This recently became interesting when the matter of date validation came up. Ordinarily, Intervals accepts many different date formats, depending on the locale the customer is using (for example, Intervals may expect ‘mm/dd/yyyy’ for US customers and ‘dd.mm.yy’ for a customer in Austria).

For our API developers, we wanted to use a common, universal format, one that would be easily compatible with our application and database layers. For that we selected ISO 8601, which is great in terms of widespread use, but not so great in terms of how complicated its specifications are.

Generally, ISO 8601 looks something like ‘2009-05-20’ for dates and ‘2009-05-20 12:30:30’ for date/time combinations. These two examples encompass 98% of the user input we’re likely to encounter. But we wanted to make sure that if we told developers they could use ISO 8601 dates, our system would support it. Unfortunately, there’s not a lot of code out there for the validation of ISO 8601 dates (especially regular expressions), and most of the stuff that is out there doesn’t encompass the entirety of the ISO 8601 spec.
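As a rough illustration of those two common shapes, here is a minimal Python sketch that leans on the standard library's `fromisoformat` methods (Python 3.7 or later). The helper name `parse_common` is made up for this example, and the sketch deliberately makes no attempt at week dates, ordinal dates, or the other corners of the spec.

```python
from datetime import date, datetime


def parse_common(text):
    """Parse 'YYYY-MM-DD' or 'YYYY-MM-DD HH:MM:SS' (hypothetical helper)."""
    # A bare calendar date in extended format is exactly ten characters long.
    if len(text) == 10:
        return date.fromisoformat(text)
    # Anything longer is assumed to be a date/time combination.
    return datetime.fromisoformat(text)


if __name__ == "__main__":
    print(parse_common("2009-05-20"))
    print(parse_common("2009-05-20 12:30:30"))
```

Anything outside those two shapes is left to whatever validation comes next.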

Starting off, here are some dates that the validator should match (all these are valid ISO 8601 dates to the best of my knowledge):

2009-12T12:34
2009
2009-05-19
20090519
2009123
2009-05
2009-123
2009-222
2009-001
2009-W01-1
2009-W51-1
2009-W511
2009-W33
2009W511
2009-05-19 00:00
2009-05-19 14
2009-05-19 14:31
2009-05-19 14:39:22
2009-05-19T14:39Z
2009-W21-2
2009-W21-2T01:22
2009-139
2009-05-19 14:39:22-06:00
2009-05-19 14:39:22+0600
2009-05-19 14:39:22-01
20090621T0545Z
2007-04-06T00:00
2007-04-05T24:00
2010-02-18T16:23:48.5
2010-02-18T16:23:48,444
2010-02-18T16:23:48,3-06:00
2010-02-18T16:23.4
2010-02-18T16:23,25
2010-02-18T16:23.33+0600
2010-02-18T16.23334444
2010-02-18T16,2283
2009-05-19 143922.500
2009-05-19 1439,55

And here are some of the strings that the validator should not match (i.e., should reject):

200905
2009367
2009-
2007-04-05T24:50
2009-000
2009-M511
2009M511
2009-05-19T14a39r
2009-05-19T14:3924
2009-0519
2009-05-1914:39
2009-05-19 14:
2009-05-19r14:39
2009-05-19 14a39a22
200912-01
2009-05-19 14:39:22+06a00
2009-05-19 146922.500
2010-02-18T16.5:23.35:48
2010-02-18T16:23.35:48
2010-02-18T16:23.35:48.45
2009-05-19 14.5.44
2010-02-18T16:23.33.600
2010-02-18T16,25:23:48,444

The code we came up with was the following:

^([\+-]?\d{4}(?!\d{2}\b))((-?)((0[1-9]|1[0-2])(\3([12]\d|0[1-9]|3[01]))?|W(0[1-9]|[1-4]\d|5[0-3])(-?[1-7])?|(00[1-9]|0[1-9]\d|[12]\d{2}|3([0-5]\d|6[0-6])))([T\s]((([01]\d|2[0-3])((:?)[0-5]\d)?|24\:?00)([\.,]\d+(?!:))?)?(\17[0-5]\d([\.,]\d+)?)?([zZ]|([\+-])([01]\d|2[0-3]):?([0-5]\d)?)?)?)?$

I should add the caveat that this code doesn’t support the time interval or duration parts of the ISO 8601 spec. It also only supports dates and date/times, not standalone times, since right now we don’t have to deal with time input (for the Intervals API, all time is input in decimal format rather than ISO 8601). But it should support everything else. Please let me know if this works for you or doesn’t, or if you can fine-tune it.
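For anyone who wants to try it, here is a minimal Python sketch that runs the pattern against a handful of the sample strings listed earlier. The names `ISO_8601`, `is_iso8601`, `SHOULD_MATCH`, `SHOULD_REJECT` and `failures` are made up for this example. The pattern is meant to be the one from this post, written with week numbers 01 to 53 and ordinal days 001 to 366, and split across lines only for readability.

```python
import re

# The pattern from the post, split into adjacent raw strings for readability.
ISO_8601 = re.compile(
    r"^([\+-]?\d{4}(?!\d{2}\b))"                                # year
    r"((-?)((0[1-9]|1[0-2])(\3([12]\d|0[1-9]|3[01]))?"          # month, day
    r"|W(0[1-9]|[1-4]\d|5[0-3])(-?[1-7])?"                      # week, weekday
    r"|(00[1-9]|0[1-9]\d|[12]\d{2}|3([0-5]\d|6[0-6])))"         # ordinal day
    r"([T\s]((([01]\d|2[0-3])((:?)[0-5]\d)?|24\:?00)([\.,]\d+(?!:))?)?"
    r"(\17[0-5]\d([\.,]\d+)?)?"                                 # seconds
    r"([zZ]|([\+-])([01]\d|2[0-3]):?([0-5]\d)?)?)?)?$"          # timezone
)

# A subset of the strings the post says should be accepted.
SHOULD_MATCH = [
    "2009", "2009-05-19", "20090519", "2009123", "2009-W01-1", "2009W511",
    "2009-05-19 14:39:22", "2009-05-19T14:39Z", "2009-05-19 14:39:22-06:00",
    "20090621T0545Z", "2007-04-05T24:00", "2010-02-18T16:23:48,444",
    "2010-02-18T16,2283", "2009-05-19 143922.500",
]

# A subset of the strings the post says should be rejected.
SHOULD_REJECT = [
    "200905", "2009367", "2009-", "2007-04-05T24:50", "2009-000",
    "2009-M511", "2009-0519", "2009-05-1914:39", "2009-05-19 14:",
    "200912-01", "2010-02-18T16:23.35:48", "2009-05-19 146922.500",
]


def is_iso8601(text):
    """Return True when the whole string has the shape the pattern allows."""
    return ISO_8601.match(text) is not None


def failures():
    """Return every sample that behaves differently from what the post lists."""
    wrong = [s for s in SHOULD_MATCH if not is_iso8601(s)]
    wrong += [s for s in SHOULD_REJECT if is_iso8601(s)]
    return wrong


if __name__ == "__main__":
    print(failures() or "all samples behave as listed")
```

One thing worth keeping in mind: the pattern checks the shape of the string, not the calendar. `2009-02-31` passes, so a stricter validator would probably follow a successful match with an actual date construction.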

4 Responses to “ISO 8601 Date Validation That Doesn’t Suck”

  1. Teodor Väänänen says:

    Stumbled across your monster regexp for ISO 8601 validation a while ago, and found it useful in a project of mine. BTW, if you or others need to grab the individual parts of the date, here are the regexp capture groups you need to pay attention to:
    1. Year
    5. Month
    7. Day
    8. Week Number
    9. Weekday
    10. Ordinal date
    15. Hours
    16. Minutes (prefixed by “:”, use last two digits)
    19. Seconds (prefixed by “:”, use last two digits)
    21. Timezone, “Z” or offset
    23. Hours Offset
    24. Minutes Offset
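The group numbers above can be tried out with a short Python sketch. The names `ISO_8601`, `_digits` and `parts` are made up for this example, and the pattern is assumed to be the one from the post (week numbers 01 to 53, ordinal days 001 to 366). Two details the sketch works around: group 9 may carry a leading "-", and group 19 also contains any fractional seconds, so it keeps the first two digits after the optional ":" rather than the last two. For a `24:00` time, group 15 is empty and the text sits in group 14 instead.

```python
import re

# The pattern from the post, split into adjacent raw strings for readability.
ISO_8601 = re.compile(
    r"^([\+-]?\d{4}(?!\d{2}\b))"
    r"((-?)((0[1-9]|1[0-2])(\3([12]\d|0[1-9]|3[01]))?"
    r"|W(0[1-9]|[1-4]\d|5[0-3])(-?[1-7])?"
    r"|(00[1-9]|0[1-9]\d|[12]\d{2}|3([0-5]\d|6[0-6])))"
    r"([T\s]((([01]\d|2[0-3])((:?)[0-5]\d)?|24\:?00)([\.,]\d+(?!:))?)?"
    r"(\17[0-5]\d([\.,]\d+)?)?"
    r"([zZ]|([\+-])([01]\d|2[0-3]):?([0-5]\d)?)?)?)?$"
)


def _digits(group_text, count=2):
    """Drop an optional leading ':' or '-' and keep the first `count` digits."""
    if group_text is None:
        return None
    return group_text.lstrip(":-")[:count]


def parts(text):
    """Return the captured pieces as strings, or None when nothing matches."""
    m = ISO_8601.match(text)
    if m is None:
        return None
    return {
        "year": m.group(1),
        "month": m.group(5),
        "day": m.group(7),
        "week": m.group(8),
        "weekday": _digits(m.group(9), 1),
        "ordinal": m.group(10),
        "hours": m.group(15),
        "minutes": _digits(m.group(16)),
        "seconds": _digits(m.group(19)),
        "timezone": m.group(21),
        "offset_hours": m.group(23),
        "offset_minutes": m.group(24),
    }


if __name__ == "__main__":
    print(parts("2009-05-19 14:39:22-06:00"))
    print(parts("2009-W21-2T01:22"))
```

Pieces that are absent from the input come back as `None`, so callers can tell a calendar date from a week date or an ordinal date by checking which keys are filled in.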

  2. K> says:

    The regexp incorrectly validates date in format YYYYMM – first one in the list of dates to be rejected.
    (tested in GNU Octave 4.2.2, that uses PCRE to my knowledge).
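On the YYYYMM report: the part of the pattern that is supposed to stop `200905` is the lookahead `(?!\d{2}\b)` right after the year. One way a result like this can arise (an assumption, not something the comment confirms) is the pattern passing through a string literal that turns `\b` into a backspace character before the regex engine ever sees it. The Python sketch below shows the effect on a cut-down, hypothetical version of the pattern covering only the year and an optional month; `RAW`, `MANGLED` and `accepts` are names made up for this example.

```python
import re

# Cut-down pattern: four-digit year, the same lookahead, optional month.
RAW = r"^\d{4}(?!\d{2}\b)(-?(0[1-9]|1[0-2]))?$"

# The same text after an escape-processing layer has replaced the two
# characters backslash + b with a single backspace character (U+0008).
MANGLED = RAW.replace(r"\b", "\x08")


def accepts(pattern, text):
    """Return True when the pattern matches the whole string."""
    return re.match(pattern, text) is not None


if __name__ == "__main__":
    for label, pattern in (("raw", RAW), ("mangled", MANGLED)):
        print(label, accepts(pattern, "200905"), accepts(pattern, "2009-05"))
```

With the word boundary intact, `200905` is rejected while `2009-05` is accepted. With the backspace in its place the lookahead can no longer find anything to object to, and `200905` slips through as year plus month.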

  3. Torsten says:

    Hi,

    thanks for putting this regex together. May I ask you why it accepts a lower case z as timezone indicator?

    kind regards,
    Torsten.

  4. John Reeve says:

    I’m not sure, but I think the lower case z is there in case the time data was not formatted correctly, for example if someone accidentally used a lower case z instead of upper case. So it’s there to catch human error: a lower case z may not be to spec, but it can be assumed it was meant to be a timezone indicator.
