ISO 8601 Date Validation That Doesn’t Suck
UPDATED February 19th, 2010: As BobM pointed out, the original solution to this problem didn’t account for fractional decimals. Originally I didn’t include them because Intervals didn’t require that level of precision, but apparently fractional decimals are quite common elsewhere. Because of that, I’ve updated this post, along with the regex, to include support for fractional decimals.
For the Intervals API, we’re wrestling with issues surrounding data input validation. This recently became interesting when the matter of date validation came up. Ordinarily, Intervals allows many, many different date formats, depending on the customer’s locale (for example, Intervals may expect ‘mm/dd/yyyy’ for US customers and ‘dd.mm.yy’ for a customer in Austria).
For our API developers, we wanted to use a common, universal format, one that would be easily compatible with our application and database layers. For that we selected ISO 8601, which is great in terms of widespread use, but not so great in terms of how complicated its specifications are.
Generally, ISO 8601 looks something like ‘2009-05-20’ for dates and ‘2009-05-20 12:30:30’ for date/time combinations. These two examples encompass 98% of the user input we’re likely to encounter. But we wanted to make sure that if we told developers they could use ISO 8601 dates, our system would support it. Unfortunately, there’s not a lot of code out there for the validation of ISO 8601 dates (especially regular expressions), and most of the stuff that is out there doesn’t encompass the entirety of the ISO 8601 spec.
Starting off, here are some dates that the validator should match (all these are valid ISO 8601 dates to the best of my knowledge):
2009-12T12:34
2009
2009-05-19
20090519
2009123
2009-05
2009-123
2009-222
2009-001
2009-W01-1
2009-W51-1
2009-W511
2009-W33
2009W511
2009-05-19 00:00
2009-05-19 14
2009-05-19 14:31
2009-05-19 14:39:22
2009-05-19T14:39Z
2009-W21-2
2009-W21-2T01:22
2009-139
2009-05-19 14:39:22-06:00
2009-05-19 14:39:22+0600
2009-05-19 14:39:22-01
20090621T0545Z
2007-04-06T00:00
2007-04-05T24:00
Added Feb 19 2010:
2010-02-18T16:23:48.5
2010-02-18T16:23:48,444
2010-02-18T16:23:48,3-06:00
2010-02-18T16:23.4
2010-02-18T16:23,25
2010-02-18T16:23.33+0600
2010-02-18T16.23334444
2010-02-18T16,2283
2009-05-19 143922.500
2009-05-19 1439,55
And here are some of the strings that the validator should not match (i.e., should reject):
200905
2009367
2009-
2007-04-05T24:50
2009-000
2009-M511
2009M511
2009-05-19T14a39r
2009-05-19T14:3924
2009-0519
2009-05-1914:39
2009-05-19 14:
2009-05-19r14:39
2009-05-19 14a39a22
200912-01
2009-05-19 14:39:22+06a00
Added Feb 19 2010:
2009-05-19 146922.500
2010-02-18T16.5:23.35:48
2010-02-18T16:23.35:48
2010-02-18T16:23.35:48.45
2009-05-19 14.5.44
2010-02-18T16:23.33.600
2010-02-18T16,25:23:48,444
The code we came up with was the following:
Updated Feb 19 2010:
^([\+-]?\d{4}(?!\d{2}\b))((-?)((0[1-9]|1[0-2])(\3([12]\d|0[1-9]|3[01]))?|W(0[1-9]|[1-4]\d|5[0-3])(-?[1-7])?|(00[1-9]|0[1-9]\d|[12]\d{2}|3([0-5]\d|6[0-6])))([T\s]((([01]\d|2[0-3])((:?)[0-5]\d)?|24\:?00)([\.,]\d+(?!:))?)?(\17[0-5]\d([\.,]\d+)?)?([zZ]|([\+-])([01]\d|2[0-3]):?([0-5]\d)?)?)?)?$
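A pattern this dense is easier to check when it is spread out. The sketch below lays the same expression out with Python's `re.VERBOSE` flag and runs a handful of the sample strings through it. Python is an assumption made for illustration (our production code is PHP), and the names `ISO_8601` and `is_iso_8601` are made up for this example. The day-of-year branch is written as `6[0-6]` and the week branch as `0[1-9]|[1-4]\d|5[0-3]`, so that day 360 and week 53 are accepted. Everything else, including the group numbering that `\3` and `\17` rely on, follows the pattern above.

```python
import re

# The pattern from the post, spread out so each piece can carry a comment.
# Group numbers are unchanged: \3 is the date separator, \17 the time separator.
# re.ASCII keeps \d limited to 0-9 (an assumption about the intended behavior).
ISO_8601 = re.compile(r"""
^([\+-]?\d{4}(?!\d{2}\b))                       # 1: signed year; refuses a bare "200905"
(
  (-?)                                          # 3: date separator, reused as \3
  (
    (0[1-9]|1[0-2])                             # 5: month 01-12
    (\3([12]\d|0[1-9]|3[01]))?                  # 6-7: optional day 01-31
    |W(0[1-9]|[1-4]\d|5[0-3])(-?[1-7])?         # 8-9: week 01-53, optional weekday
    |(00[1-9]|0[1-9]\d|[12]\d{2}|3([0-5]\d|6[0-6]))  # 10-11: ordinal day 001-366
  )
  (
    [T\s]                                       # 12: "T" or whitespace before the time
    (
      (([01]\d|2[0-3])((:?)[0-5]\d)?|24\:?00)   # 14-17: hour, optional minute, or 24:00
      ([\.,]\d+(?!:))?                          # 18: fraction of the hour or minute
    )?
    (\17[0-5]\d([\.,]\d+)?)?                    # 19-20: seconds and their fraction
    ([zZ]|([\+-])([01]\d|2[0-3]):?([0-5]\d)?)?  # 21-24: "Z" or a UTC offset
  )?
)?$
""", re.VERBOSE | re.ASCII)


def is_iso_8601(text: str) -> bool:
    """Return True when the whole string matches the pattern."""
    return ISO_8601.fullmatch(text) is not None


# A subset of the sample strings listed in the post.
should_match = [
    "2009", "2009-05-19", "20090519", "2009-123", "2009-W21-2T01:22",
    "2009-05-19 14:39:22-06:00", "20090621T0545Z", "2007-04-05T24:00",
    "2010-02-18T16:23:48.541-06:00", "2010-02-18T16,2283", "2009-05-19 143922.500",
]
should_reject = [
    "200905", "2009367", "2009-", "2007-04-05T24:50", "2009-000",
    "2009-0519", "200912-01", "2010-02-18T16:23.35:48", "2009-05-19 14.5.44",
]

for sample in should_match + should_reject:
    print(f"{sample!r:34} {'valid' if is_iso_8601(sample) else 'rejected'}")
```

Note that the pattern only checks the shape of the string. It does not know how many days a given month has, so a calendar check would still be a separate step.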
I should add the caveat that this regex doesn’t support the time interval or duration parts of the ISO 8601 spec. It also only matches dates and date/times, not times on their own, since right now we don’t have to deal with time input (for the Intervals API, all time is input in decimal format, rather than ISO 8601). But it should support everything else. Please let me know if this works for you or doesn’t, or if you can fine-tune it.
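To make those limits concrete, here is a small sketch that feeds a duration, two intervals, and two time-only strings to the pattern and shows that each one is turned away. It uses Python's `re` module as an assumption for illustration, the sample strings are hypothetical inputs rather than anything from our test suite, and the pattern is written with `6[0-6]` in the day-of-year branch and `0[1-9]|[1-4]\d|5[0-3]` in the week branch so that day 360 and week 53 match.

```python
import re

# Pattern from the post, with 6[0-6] for day 360 and weeks 01-53.
ISO_8601 = re.compile(
    r"^([\+-]?\d{4}(?!\d{2}\b))((-?)((0[1-9]|1[0-2])(\3([12]\d|0[1-9]|3[01]))?|W(0[1-9]|[1-4]\d|5[0-3])(-?[1-7])?|(00[1-9]|0[1-9]\d|[12]\d{2}|3([0-5]\d|6[0-6])))([T\s]((([01]\d|2[0-3])((:?)[0-5]\d)?|24\:?00)([\.,]\d+(?!:))?)?(\17[0-5]\d([\.,]\d+)?)?([zZ]|([\+-])([01]\d|2[0-3]):?([0-5]\d)?)?)?)?$",
    re.ASCII,
)

# Hypothetical inputs covering the parts of ISO 8601 the pattern leaves out.
out_of_scope = {
    "duration": "P1Y2M10DT2H30M",
    "interval": "2009-05-19/2009-05-20",
    "repeating interval": "R5/2008-03-01T13:00:00Z/P1Y2M10DT2H30M",
    "time only": "14:39:22",
    "time only with T": "T14:39:22",
}

for kind, text in out_of_scope.items():
    accepted = ISO_8601.fullmatch(text) is not None
    print(f"{kind:20} {text!r:42} accepted={accepted}")
```

Every string here starts with something other than a four-digit year or carries a `/` the pattern has no branch for, which is why none of them get through.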
Tags: iso-8601, regex, regular expressions, validation
August 7th, 2009 at 6:54 pm
How do you use this? Have you used it in production? How much processing does a regex like that use?
August 10th, 2009 at 8:12 am
You can use that like any other regular expression to match a string. We do use it in production in our API for validating the format of dates passed to us, and it’s not particularly taxing on the system (though I can’t give you an exact estimation of the processing power it uses; obviously it’s more taxing than, say, matching a phone number via regex). For more information on using Perl Compatible Regular Expressions in PHP, check out http://us.php.net/manual/en/book.pcre.php, and to learn more about the preg_match function specifically, check out http://us.php.net/preg_match.
Cameron
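For anyone who wants a rough number rather than a guess, timing the pattern directly is straightforward. The sketch below uses Python's `re` and `timeit` modules instead of PHP's `preg_match`, which is an assumption made purely for illustration, so the figures will not carry over to PHP and will vary by machine. The names `samples` and `validate_all` are hypothetical, and the pattern is written with `6[0-6]` in the day-of-year branch and `0[1-9]|[1-4]\d|5[0-3]` in the week branch so that day 360 and week 53 match.

```python
import re
import timeit

# Compile once and reuse; pattern from the post, with 6[0-6] and weeks 01-53.
ISO_8601 = re.compile(
    r"^([\+-]?\d{4}(?!\d{2}\b))((-?)((0[1-9]|1[0-2])(\3([12]\d|0[1-9]|3[01]))?|W(0[1-9]|[1-4]\d|5[0-3])(-?[1-7])?|(00[1-9]|0[1-9]\d|[12]\d{2}|3([0-5]\d|6[0-6])))([T\s]((([01]\d|2[0-3])((:?)[0-5]\d)?|24\:?00)([\.,]\d+(?!:))?)?(\17[0-5]\d([\.,]\d+)?)?([zZ]|([\+-])([01]\d|2[0-3]):?([0-5]\d)?)?)?)?$",
    re.ASCII,
)

# Two strings that should pass and two that should fail.
samples = [
    "2009-05-19",
    "2010-02-18T16:23:48.541-06:00",
    "2009-05-19 14a39a22",
    "200912-01",
]


def validate_all():
    """Check every sample once and return the list of results."""
    return [ISO_8601.fullmatch(s) is not None for s in samples]


runs = 10_000
seconds = timeit.timeit(validate_all, number=runs)
per_check = seconds / (runs * len(samples))
print(f"{per_check * 1e6:.2f} microseconds per string")
```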
February 18th, 2010 at 4:27 pm
This did not work for me on the string 2010-02-18T16:23:48.541-06:00. I tested using the regex tester at http://www.fileformat.info/tool/regex.htm, which is good for testing how Java will process a regex. The date came to me as an xs:date (XML date) passed through a web service. I think this may well not be a true ISO 8601 date format, but it is what Java developers will often see when working with XML. If this is not 8601, what is wrong with it?
Thanks,
Bob
February 19th, 2010 at 8:55 am
BobM,
You are indeed correct. The date you had is a valid ISO 8601 date, and should have passed. My original regex didn’t support fractional decimals, as Intervals didn’t require that level of precision. I’ve updated this post and the regex to add support for this. Thank you for finding this bug and please let me know if you find anything else.
Cameron
February 8th, 2011 at 7:17 am