[MDEV-8322] Distinguish between time and date strings more carefully Created: 2015-06-16  Updated: 2021-05-11

Status: Open
Project: MariaDB Server
Component/s: Temporal Types
Fix Version/s: None

Type: Task Priority: Major
Reporter: Alexander Barkov Assignee: Unassigned
Resolution: Unresolved Votes: 0
Labels: None

Sprint: 10.1.6-1, 10.1.6-2

 Description   

We need to distinguish better between date ad time strings when parsing strings in str_to_date(), str_to_time(), etc.

The problem is well described in this letter from Sergei:

Hi, Alexander!

On Jun 15, Alexander Barkov wrote:

>> (preference matters, as it tells how to parse ambiguous strings like
>> "10:10:10").
>
> I'd say '10:10:10' should be unambiguously treated as time.
> Colon is never used to delimit date parts. Is it?

yes, I believe delimiters are pretty much ignored in our code. so any
delimiter can be used anywhere.

> Date parts are usually delimited by as follows:
> '01-01-01'
> '01.01.01'
> '01/01/01'
>
> But this is a kind of separate issue. Would you like me to create a task 
> for this?

The way it works now - after your patch - there's no much need for a
"preference" flag. The only issue I've uncovered in testing was related
to parsing strings with time preference. Like in

  WHERE time_column > '2010-12-11'

The code is

  my_bool str_to_time(const char *str, uint length, MYSQL_TIME *l_time,
                      ulonglong fuzzydate, MYSQL_TIME_STATUS *status)
  {
  ...
    /* Check first if this is a full TIMESTAMP */
    if (length >= 12)
    {                                             /* Probably full timestamp */
      (void) str_to_datetime(str, length, l_time,
                             (fuzzydate & ~TIME_TIME_ONLY) | TIME_DATETIME_ONLY,
                              status);

Which is very stupid, it decides solely on the string length. That is '2010-12-11' is parsed as a time (when there's time preference), but '10:11:12.123456' is parsed as a date (but fails and falls back to time).

I would suggest to get rid of this ad hoc detection code (check the length, try and fall back, etc). And use a systematic approach based on patterns. Like

   patterns[]=
   {
     { 'yyyy-mm-dd' , parse_date },
     { 'hh:mm:ss.uuuuuu', parse_time },
     ...
   }

Note, I wrote "like". I do not mean literally these patterns or string patterns whatsoever. I'd prefer something much faster. May be some compact "signature" number that describes the format, or may be a decision tree (where the string is parsed into an array of ints and then analyzed like three numbers? first is 4 digit? etc).


Generated at Thu Feb 08 07:26:17 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.