This article is contributed. See the original author and article here.
Azure Data Factory will soon add support to Data Flows for “no quote character” in DelimitedText datasets for both source and sink. QuoteChar is the single character to quote column values if the column data contains the column delimiter. The typical scenario is double quotes (“). But with no quote char, it means there is no quote char present and the column value is not quoted, while escapeChar is used to escape the column delimiter and itself.
You can find and set the property in the Delimited Text dataset type.
*** Potential Breaking Change ***
If you currently are using the Delimited Text dataset in a dataflow that has quote char to “No quote char” and escape char set to any character except “No escape char”, this may result in unexpected behavior, depending upon your actual data, as described below.
If the dataset is used in dataflow source, the breaking change will occur only if your column value contains *multiple consecutive escape characters* (default: ); or it contains an escape character following with a column delimiter (default: ,).
E.g.
- If original data has column values with a double escape character, like abcd, the previous behavior will treat it as abcd. But the new behavior will treat it as abcd. Since are translated as escape ‘escape char’ itself.
- If original data has value with an escape character following with a column delimiter like ab,cd, the previous behavior will treat it as ab and cd, two columns. But the new behavior will treat it as ab,cd, one column. Since , are translated as an escape column delimiter.
If the dataset is used in a dataflow sink, the new behavior will occur if your column value contains column delimiters (default: ,); or it contains escape character (default: )
E.g.
- If the original data has column values with a column delimiter like “ab,cd,” the previous behavior will output “ab,cd”. But the new behavior will output “ab,cd”. This is because the column delimiter needs to be escaped.
- If original data has column values with an escape character like abcd, the previous behavior will output abcd. But the new behavior will output abcd. Since escape character needs to be escaped.
Mitigation
- If your pipeline doesn’t face these conditions mentioned above, you can just keep your pipeline as is.
- If your pipeline has breaking changes mentioned above, you can mitigate this by setting the quote char to a character that would not be part of your data values.
Brought to you by Dr. Ware, Microsoft Office 365 Silver Partner, Charleston SC.
Recent Comments