ISO 8601 Date Validation That Doesn’t Suck

UPDATED February 19th, 2010: As BobM pointed out, the original solution to this problem didn’t account for decimal fractions in the time. I originally left them out because Intervals didn’t require that level of precision, but decimal fractions are apparently quite common elsewhere. I’ve updated this post, along with the regex, to support them.

For the Intervals API, we’re wrestling with issues surrounding data input validation. This recently became interesting when the matter of date validation came up. Ordinarily, Intervals allows many, many different date formats, depending on the locale the customer is using (for example, Intervals may expect ‘mm/dd/yyyy’ for US customers and ‘dd.mm.yy’ for a customer in Austria).

For our API developers, we wanted to use a common, universal format, one that would be easily compatible with our application and database layers. For that we selected ISO 8601, which is great in terms of widespread use, but not so great in terms of how complicated its specifications are.

Generally, ISO 8601 looks something like ‘2009-05-20’ for dates and ‘2009-05-20 12:30:30’ for date/time combinations. These two forms cover 98% of the user input we’re likely to encounter. But we wanted to make sure that if we told developers they could use ISO 8601 dates, our system would actually support them. Unfortunately, there’s not a lot of code out there for validating ISO 8601 dates (especially regular expressions), and most of what is out there doesn’t cover the entire ISO 8601 spec.

Starting off, here are some dates that the validator should match (all these are valid ISO 8601 dates to the best of my knowledge):

2009-12T12:34
2009
2009-05-19
20090519
2009123
2009-05
2009-123
2009-222
2009-001
2009-W01-1
2009-W51-1
2009-W511
2009-W33
2009W511
2009-05-19 00:00
2009-05-19 14
2009-05-19 14:31
2009-05-19 14:39:22
2009-05-19T14:39Z
2009-W21-2
2009-W21-2T01:22
2009-139
2009-05-19 14:39:22-06:00
2009-05-19 14:39:22+0600
2009-05-19 14:39:22-01
20090621T0545Z
2007-04-06T00:00
2007-04-05T24:00

Added Feb 19 2010:
2010-02-18T16:23:48.5
2010-02-18T16:23:48,444
2010-02-18T16:23:48,3-06:00
2010-02-18T16:23.4
2010-02-18T16:23,25
2010-02-18T16:23.33+0600
2010-02-18T16.23334444
2010-02-18T16,2283
2009-05-19 143922.500
2009-05-19 1439,55

And here are some of the strings that the validator should not match (i.e., should reject):

200905
2009367
2009-
2007-04-05T24:50
2009-000
2009-M511
2009M511
2009-05-19T14a39r
2009-05-19T14:3924
2009-0519
2009-05-1914:39
2009-05-19 14:
2009-05-19r14:39
2009-05-19 14a39a22
200912-01
2009-05-19 14:39:22+06a00

Added Feb 19 2010:
2009-05-19 146922.500
2010-02-18T16.5:23.35:48
2010-02-18T16:23.35:48
2010-02-18T16:23.35:48.45
2009-05-19 14.5.44
2010-02-18T16:23.33.600
2010-02-18T16,25:23:48,444

The code we came up with was the following:

Updated Feb 19 2010:
^([\+-]?\d{4}(?!\d{2}\b))((-?)((0[1-9]|1[0-2])(\3([12]\d|0[1-9]|3[01]))?|W([0-4]\d|5[0-2])(-?[1-7])?|(00[1-9]|0[1-9]\d|[12]\d{2}|3([0-5]\d|6[1-6])))([T\s]((([01]\d|2[0-3])((:?)[0-5]\d)?|24\:?00)([\.,]\d+(?!:))?)?(\17[0-5]\d([\.,]\d+)?)?([zZ]|([\+-])([01]\d|2[0-3]):?([0-5]\d)?)?)?)?$
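
As a rough sketch of how the pattern might be used outside PHP, here is the same expression in Python, split into commented pieces so the structure is easier to follow. The names ISO_8601_PATTERN and is_iso8601 are made up for this illustration, and the re.ASCII flag and the use of fullmatch are choices of this sketch rather than part of the original code.

```python
import re

# The pattern from the post, split into commented pieces. Python joins
# adjacent string literals, so the result is the same one-line regex.
ISO_8601_PATTERN = (
    # Optional sign and four-digit year; the lookahead rejects a bare YYYYMM (e.g. 200905).
    r"^([\+-]?\d{4}(?!\d{2}\b))"
    # Group 3: optional date separator, reused below as \3.
    r"((-?)"
    # Month 01-12, optionally followed by the same separator and a day 01-31.
    r"((0[1-9]|1[0-2])(\3([12]\d|0[1-9]|3[01]))?"
    # Or a week (W00-W52 as written) with an optional weekday 1-7.
    r"|W([0-4]\d|5[0-2])(-?[1-7])?"
    # Or an ordinal day 001-366.
    r"|(00[1-9]|0[1-9]\d|[12]\d{2}|3([0-5]\d|6[1-6])))"
    # 'T' or whitespace introduces the optional time part.
    r"([T\s]"
    # Hour with optional minute (or 24:00), then an optional decimal fraction.
    r"((([01]\d|2[0-3])((:?)[0-5]\d)?|24\:?00)([\.,]\d+(?!:))?)?"
    # Seconds, using the same separator captured in group 17, plus optional fraction.
    r"(\17[0-5]\d([\.,]\d+)?)?"
    # 'Z' or a numeric UTC offset such as -06:00, +0600 or -01.
    r"([zZ]|([\+-])([01]\d|2[0-3]):?([0-5]\d)?)?"
    r")?)?$"
)

# Assumption of this sketch: re.ASCII keeps \d and \s to ASCII characters.
_ISO_8601_RE = re.compile(ISO_8601_PATTERN, re.ASCII)


def is_iso8601(value: str) -> bool:
    """Return True if value has the shape of an ISO 8601 date or date/time."""
    # fullmatch requires the whole string to match, with no trailing newline.
    return _ISO_8601_RE.fullmatch(value) is not None


if __name__ == "__main__":
    for s in ("2009-05-19", "2009-W21-2T01:22", "2010-02-18T16:23:48,444"):
        print(s, is_iso8601(s))  # each prints True
    for s in ("200905", "2009-0519", "2007-04-05T24:50"):
        print(s, is_iso8601(s))  # each prints False
```

Keep in mind that this only checks the shape of the string. For example, 2009-02-31 passes, since the day part accepts 01 to 31 for any month.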

I should add the caveat that this regex doesn’t cover the time interval or duration parts of the ISO 8601 spec. It also only supports dates and date/time combinations, not standalone times, since right now we don’t have to deal with time input (for the Intervals API, all time is entered in decimal format rather than ISO 8601). It should support everything else. Please let me know whether this works for you, or if you can fine-tune it.

5 Responses to “ISO 8601 Date Validation That Doesn’t Suck”

  1. Dennis Gearon Says:

    How do you use this? Have you used it in production? How much processing does a regex like that use?

  2. Cameron Brooks Says:

    You can use that like any other regular expression to match a string. We do use it in production in our API for validating the format of dates passed to us, and it’s not particularly taxing on the system (though I can’t give you an exact estimation of the processing power it uses; obviously it’s more taxing than, say, matching a phone number via regex). For more information on using Perl Compatible Regular Expressions in PHP, check out http://us.php.net/manual/en/book.pcre.php, and to learn more about the preg_match function specifically, check out http://us.php.net/preg_match.

    Cameron

  3. BobM Says:

    This did not work for me on the string 2010-02-18T16:23:48.541-06:00. I tested using the regex tester at http://www.fileformat.info/tool/regex.htm, which is good for testing how Java will process a regex. The date came to me as an xs:date (XML date) passed through a web service. I think this may well not be a truly ISO 8601 date format, but it is what Java developers will often see when working with XML. If this is not 8601, what is wrong with it?

    Thanks,
    Bob

  4. Cameron Says:

    BobM,

    You are indeed correct. The date you had is a valid ISO 8601 date and should have passed. My original regex didn’t support decimal fractions, as Intervals didn’t require that level of precision. I’ve updated this post and the regex to add support for them. Thank you for finding this bug, and please let me know if you find anything else.

    Cameron

  5. parse this type of date format in java? - Stack Overflow Says:

    [...] Another answer, since you seem to be focused on simply tearing the String apart (not a good idea, IMHO.) Let's assume the string is valid ISO8601. Can you assume it will always be in the form you cite, or is it just valid 8601? If the latter, you have to cope with a bunch of scenarios as these guys did. [...]
