Details
-
Technical task
-
Status: Closed (View Workflow)
-
Major
-
Resolution: Fixed
-
None
Description
This function is overloaded and supports NUMBER, DATE, DATETIME, TIMESTAMP etc as parameters and returns a formatted/converted TEXT value
https://docs.oracle.com/cd/B28359_01/olap.111/b28126/dml_functions_2113.htm#OLADM691
This is currently not available under ORACLE compatibility mode.
Attachments
Issue Links
- blocks
-
MDEV-10342 Providing compatibility for basic SQL built-in functions
-
- Closed
-
- relates to
-
MDEV-25697 TO_CHAR does not take into account capitalization in format elements
-
- Open
-
-
MDEV-28136 MariaDB ASAN Unknown Crash in Item_func_to_char::parse_format_string
-
- Confirmed
-
Activity
Some quick comments:
- TO_CHAR in Oracle can take either a datatime or numerical argument. The patch seams to only handle date formats. I
think we should add some code to at least check the argument type and give an error if it's not of type date, datetime or timstamp. - RRRR format should produce different results with 2 and 4 digit years. I didn't see a test case for that.
- It looks like the special characters (':', '--' etc) are supported in the patch. Why do you say above they are not supported?
- What does Oracle return if one uses something that is not supported, like '$' ?
In MariaDB we should not just abort a statement (except in strict mode) as this may cause problems for non transactional tables? - Would it not be good to support all non-ascii characters as 'as is', instead just a list of some characters?
- You added a function 'is_special' which is not used anywhere. Should be removed
- There is several small optimizations that can be done in make_date_time_oracle():
- Store upper case version of *ptr in a variable: This will make the jump switch and smaller. (my_toupper is very efficient)
- This construct is used in a lot of places and could be simplified by a function.
length= (uint) (int10_to_str(l_time->year%100, intbuff, 10) - intbuff);
str->append_with_prefill(intbuff, length, 2, '0'); - {
return 1;
}
Could be just a return. In some cases the function will be textually shorter and cause less indentations if you first tests for false and return at once.
- MONTH has to be filled with end space up to max month name (according to Oracle manual)
An example of the changes suggested above:
case 'S':
case 's':
if (*(ptr+1) == 'S' || *(ptr+1) == 's')
else
{ return 1; }->
case 'S':
if (my_toupper(system_charset_info(*(ptr+1)) != 'S')
return 1;
append_with_zero_prefill(str, l_time->second, 2);
ptr++;
break;
Replace
const MY_LOCALE *lc= 0;
with
const MY_LOCALE *lc= locale;
I don't see why this code is needed. Either remove or add a comment:
+ if (size < MAX_DATE_STRING_REP_LENGTH)
+ size= MAX_DATE_STRING_REP_LENGTH;
Note that val_str() may return a string that is not null terminated, which means that the format argument to
Item_func_tochar::format_length() and make_date_time_oracle() may not null terminated!
Most easily fixed by adding a \0 to the format argument in the caller:
(void) str->c_ptr()
Some comments from Alexander Barkov:
If I wrote this patch, I'd probably first parse the format string to an array of INTs, with tokens like FMT_YYYY, FMT_YYY, FMT_YY, FMT_RRRR, FMT_RRR, FMT_RR, and so on.
Then, made functions format_length() and make_date_time_oracle() use this array of INTs instead of the original format string.
This should make the code faster and less duplicate.
- TO_CHAR in Oracle can take either a datatime or numerical argument. The patch seams to only handle date formats. I
think we should add some code to at least check the argument type and give an error if it's not of type date, datetime or timstamp.
----> yes, I will add code for checking argument type
- RRRR format should produce different results with 2 and 4 digit years. I didn't see a test case for that.
----> yes, I will add some more case
- It looks like the special characters (':', '--' etc) are supported in the patch. Why do you say above they are not supported?
----> ':' is supported special characters
'--' is considered to be two '-', so you can see 2000--11-11 when select to_char(c1, 'yyyy--mm-dd'), Oracle supports multiple special characters to be used together, such as ‘yyyy-:mm: dd’
- What does Oracle return if one uses something that is not supported, like '$' ?
----> There is no description about '$','@','!','%' ( and other special characters) in the document, but the test found that it has the same function as our special characters,
- Would it not be good to support all non-ascii characters as 'as is', instead just a list of some characters?
----> I will support all the special characters like oralce.
- In MariaDB we should not just abort a statement (except in strict mode) as this may cause problems for non transactional tables?
----> Is this the reason why the built-in function returns NULL instead of exiting with an error?
- You added a function 'is_special' which is not used anywhere. Should be removed
----> yes, I will remove it
- There is several small optimizations that can be done in make_date_time_oracle():
----> I will do that
- Some comments from Alexander Barkov:
----> I will consider this implementation method
Based on the review comments, I made changes. The main changes in this patch(10.6-to_char_01_13.patch) are:
1. An error code and error message have been added to indicate that error for oracle compatible function . If an error occurs during the execution of TO_CHAR, set warning and return NULL
2. Check the input parameters, currently support date/time/datetime/timestamp and string type data, for other types of data, the function returns NULL and writes warning information
3. Format string processing optimization(thanks! Alexander Barkov ), define an integer for each format unit (such as YYYY, MON, DD), and store it in the array in order, and store its ascii code directly for the separator, and traverse the array directly To fill in the results
4. Support all special characters as separators. Note: Because ''&'' is used for variable references in Oracle, it does not support a single'&', but supports multiple'&'s
5. Support more formats: AD/BC/AM/PM/A.D./B.C./A.M./P.M./DAY
Answer to your previous question:
"In MariaDB we should not just abort a statement (except in strict mode) as this may cause problems for non transactional tables?
----> Is this the reason why the built-in function returns NULL instead of exiting with an error?"
Yes, in not strict mode, functions that gets wrong input should return NULL.
This is to make statements like the following repeatable for non
transaction tables (in other words, it will be excuted for all rows).
UPDATE t1 set c=a/d;
In strict mode is set to STRICT_TRANS_TABLES then the above will generate an
error for transactional tables if d would be 0 for some row. If strict_mode is
STRICT_ALL_TABLES it would generate an error also for MyISAM tables (with
possible wrong results on a slave as the rows can be in different order on the slave and thus the update may abort at a different place).
Reviewing new patch:
- I didn't find a test case with RRRR and two digits
- We also need a test for using the function in strict mode
(I will add one).
-In the test suite, use --error ER_WRONG_PARAMCOUNT_TO_NATIVE_FCT instead of
--error=1582
-This makes the test suite easier to read and also more recillant if against
changes in error numbers.
+static bool check_quotation(const short cfmt, bool *quotation_flag)
+{
+ if (cfmt == '"')
+
+
+ if (*quotation_flag)
+ return true;
+ return false;
+}
Can be written a bit shorter:
static bool check_quotation(const char cfmt, bool *quotation_flag)
{
if (cfmt == '"')
return *quotation_flag;
}
....
+static int parse_sepcial(char cfmt, const char *ptr, const char *end, const short *arr)
parse_sepcial -> parse_special
...
+ /*
+ * '&' with text is used for variable input, but '&' with other
+ * special charaters like '|'. '*' is used as separator
+ */
+ tmp1= my_toupper(system_charset_info, *(beg+1));
+ if (cfmt == '&' && (tmp1 >= 'A' && tmp1 <= 'Z'))
+ return 0;
This is a more efficent version (which also fixes a bug where the code tried
to access things outside the string):
if (cfmt == '&' && ptr + 1 < end)
{ tmp1= my_toupper(system_charset_info, *(beg+1)); if (tmp1 >= 'A' && tmp1 <= 'Z') return 0; }+ /*
+ * Continuous store the special characters in fmt_array until non-special
+ * characters appear
+ */
+ tmp1= my_toupper(system_charset_info, *(beg+1));
+ if (INVALID_CHARACTER(tmp1) || tmp1 == '"')
+ {
+ if (tmp1 != '"')
+
+ break;
+ }
+
+ *array= *beg;
+ offset++;
+ beg++;
+ array++;
+ } while (beg<end);
The above could be written as just:
do
while (!INVALID_CHARACTER(tmp1) && tmp1 != '"');
....
+bool Item_func_tochar::parse_format_string(const String *format, uint *fmt_len)
+{
+ char cfmt= 0;
+ uint tmp_len= 0, ulen= 0;
+ const char *ptr, *end;
+ short *tmp_fmt= fmt_array;
+ bool quotation_flag= false;
+ int offset= 0;
cftm and ulen should be moved to the following for() loop as they are
only used there. (This gives the compiler more options to check
and optimize these variables)
bool Item_func_tochar::parse_format_string(const String *format, uint *fmt_len)
{
...
+ case 'H':
+
9 should be changed to 'max_month_name_length', to be calculated in
fix_length_and_dec()
+ if (my_toupper(system_charset_info, *(ptr+2)) != 'Y')
+ return 1;
+ *tmp_fmt= FMT_DAY;
+ tmp_len= 9;
+ ptr+= 1;
9 should be changed to 'max_day_name_length', to be calculated in
fix_length_and_dec()
...
I noticed you have two almost identical calls to parse_format_string(),
in Item_func_tochar::fix_length_and_dec() and Item_func_tochar::val_str().
+ if (str->length() > MAX_DATETIME_FORMAT_MODEL_LEN)
+
+
+ if (parse_format_string(str, &ulen))
+
It's better to move the checking and setting the warning message to
parse_format_string() as then you have only one copy of this code.
Another thing is that parse_format_string may access characters after
end of string, for example when checking for 'HH'. In the reviwed
patch I have attached to this MDEV, I have fixed this.
+String* Item_func_tochar::val_str(String* str)
+ if (warning_message.length())
+ goto null_date;
The above is ok for fixed length format strings but not for dynamic ones.
The easy fix is to add this:
push_warning_printf(thd,
Sql_condition::WARN_LEVEL_WARN,
ER_ORACLE_COMPAT_FUNCTION_ERROR,
ER_THD(thd, ER_ORACLE_COMPAT_FUNCTION_ERROR),
warning_message.c_ptr());
+ if (!fixed_length)
+ warning_message.length(0);
if (str->alloc(max_length))
goto null_date;
The above is not good for the case of !fixed_length as this length is
much bigger than expected. Better to set it to the value calculated by
parse_format_string().
+ * To prevent conflict with the ascii code of the separator, such as: ':', '%', '?'
+ * the starting value of the formatting unit is 128
+ */
+#define FMT_BASE 128
+#define FMT_AD FMT_BASE+1
...
As we are storing the format in as a short int, I changed FMT_BASE to 128.
This would allow us to potentially most extended characters, like åäö in
the format clause.
All the above changes are done in my proposed patch for 10.6.
This is based on my development 10.6 tree (bb-10.6-monty), so a few things
are slightly different from standard 10.6, but that will be fixed when I
push my tree (shortly...)
Other thing done in the patch:
- Fixed function comments to be according to MariaDB guidelines
- Optimized some code to be smaller and faster
- Added more comments
- If the format is constant string, give an error at once. This is safe as
no rows has been processed. - I moved the Item_func_tochar implementation from Item_strfunc to Item_timefunc
to be close to Item_dateformat, which is a related function. - Added proper calculation of month and day name lengths. (Was a bit complicated)
I had problem with one expression:
TO_CHAR(c3, 'YY&&&
MM|&&|DD HH24|| MI&||"abx"|SS')
In my new code, I got
00&&&\01&&|01 00| 00&||abx00
While the old code gave:
00&&&\01&&|01 00| 00&|abx00
I checked with Oracle 18c at dbfiddle.uk and Oracle gave same answer as my
current code. The difference is that the old code did a pre-check for '"'
and ignored the character before '"' in parse_special(). This caused any
following '|' to be lost.
I added also to the error message "Oracle compatibility function error: date format not recognized" a few characters from where things went wrong to make it easier to find errors:
"ERROR HY000: Oracle compatibility function error: date format not recognized at MQ-DD"
Here is my implementation, please reivew the patch in attachment, thanks!
TO_CHAR(expr, fmt)
Compared with the oracle implementation, there are the following limitations: