JsonML serialization not escaping characters

Bug #878508 reported by Carlos Manuel Lopez
This bug affects 2 people
Affects: Zorba
Status: Fix Released
Importance: High
Assigned to: Paul J. Lucas

Bug Description

The module doesn't convert escaped characters as you would expect. Instead, you get a string containing the unescaped value. A conversion needs to be implemented, something such as:
JSON <-> XML
\" <-> &quot;
\\ <-> \
\/ <-> /
\b <-> &#x8;
\f <-> &#xC;
\n <-> *actual newline*
\r <-> *actual carriage return*
\t <-> *actual tab*
\u$$$$ <-> &#x$$$$; or &#$$$$$; with the correct hexadecimal-to-decimal conversion
< <-> &lt;
> <-> &gt;
& <-> &amp;
' <-> &apos;
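
For the JSON half of the table above, here is a minimal sketch in Python of the escaping a serializer would need to apply to string values. It is an illustration only, not Zorba's implementation, and the helper name escape_json_string is hypothetical. It follows the JSON string grammar: the double quote, the backslash, and control characters below U+0020 must be escaped, and everything else may pass through unchanged.

```python
import json

# Short escapes defined by the JSON string grammar.
_SHORT_ESCAPES = {
    '"': '\\"',
    '\\': '\\\\',
    '\b': '\\b',
    '\f': '\\f',
    '\n': '\\n',
    '\r': '\\r',
    '\t': '\\t',
}


def escape_json_string(text):
    """Hypothetical helper: escape text for use between JSON double quotes."""
    out = []
    for ch in text:
        if ch in _SHORT_ESCAPES:
            out.append(_SHORT_ESCAPES[ch])
        elif ord(ch) < 0x20:
            # Remaining control characters have no short form.
            out.append('\\u%04x' % ord(ch))
        else:
            # <, >, &, ' and / may appear unescaped in a JSON string
            # (\/ is permitted but not required).
            out.append(ch)
    return ''.join(out)


sample = 'say "hi"\t<b> & \\'
escaped = escape_json_string(sample)
# Round-trip through a standard parser to confirm the result is valid JSON.
assert json.loads('"' + escaped + '"') == sample
print(escaped)
```

The XML half of the table is a separate concern: it only applies if the resulting JSON text is later serialized as XML.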

This proposal might create a regression related to bug #866757.

Changed in zorba:
assignee: nobody → Sorin Marian Nasoi (sorin.marian.nasoi)
Changed in zorba:
importance: Undecided → Medium
Changed in zorba:
status: New → In Progress
Sorin Marian Nasoi (sorin.marian.nasoi) wrote :

This bug will be fixed in version 2.0 of the core JSON module; as a result, it is linked to the lp:~zorba-coders/zorba/feature-json_parser branch.

Changed in zorba:
status: In Progress → Fix Committed
Changed in zorba:
status: Fix Committed → Fix Released
mb21 (mauro-bieg) wrote :

Is it possible that this hasn't been fixed yet? I still get unescaped double quotes when converting a node that contains a string with quotes...

Chris Hillery (ceejatec)
Changed in zorba:
status: Fix Released → Confirmed
assignee: Sorin Marian Nasoi (sorin.marian.nasoi) → Paul J. Lucas (paul-lucas)
importance: Medium → High
milestone: none → 2.7
Chris Hillery (ceejatec) wrote :

Yep, it looks like this isn't working. Paul, please investigate.

I'm attaching a query that demonstrates most of the missing escape sequences in JSON serialization (based on Mauro's original). The only thing I left out is Unicode escapes, since I don't know how to generate those correctly.

I suspect that JSONiq does many of these correctly, so perhaps the serialization code could be shared? On the other hand, now would be a good time to create some test cases for all of these with JSONiq and verify that it does indeed handle them right...

Paul J. Lucas (paul-lucas) wrote :

FYI: when I run this query, I get:

static error [err:XQST0090]: "8": invalid character reference in XML 1.0

It's referring to the &#8;
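
For context on that error: XML 1.0 only allows a character reference to name a code point matched by its Char production, which excludes most control characters. A small sketch of that check follows (the function name is_xml10_char is made up for illustration; the ranges are those of the XML 1.0 Char production). It suggests that the \b <-> &#x8; and \f <-> &#xC; rows in the bug description cannot be written as XML 1.0 character references at all.

```python
def is_xml10_char(code_point):
    """Return True if code_point matches the XML 1.0 Char production."""
    return (
        code_point in (0x9, 0xA, 0xD)
        or 0x20 <= code_point <= 0xD7FF
        or 0xE000 <= code_point <= 0xFFFD
        or 0x10000 <= code_point <= 0x10FFFF
    )


# Backspace, tab, form feed, and '<' respectively.
for cp in (0x8, 0x9, 0xC, 0x3C):
    print(hex(cp), is_xml10_char(cp))
```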

Paul J. Lucas (paul-lucas) wrote :

After some investigation, I don't think there's anything I can do about this. When *I* get a string's value via store::Item::getStringValue(), it is in its canonical representation, i.e., &lt; is actually a literal <, etc. There's some other XML serializer "downstream" from me that converts the illegal characters in XML to their &'d counterparts. If it's serializing JSON, clearly, it shouldn't do that.

Note that characters like <, >, and & are actually *legal* in JSON, so those shouldn't be converted at all. (In fact, it's illegal to backslash-escape them.)
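
This point can be checked against any conforming JSON implementation. The sketch below uses Python's standard json module purely as an independent reference (it says nothing about Zorba's code); parses_as_json is a hypothetical helper.

```python
import json

# A standard serializer leaves <, > and & untouched inside strings.
print(json.dumps("a < b && c > d"))


def parses_as_json(text):
    """Hypothetical helper: True if text is accepted by a strict JSON parser."""
    try:
        json.loads(text)
        return True
    except ValueError:
        return False


print(parses_as_json('"a < b"'))    # literal '<' inside a JSON string
print(parses_as_json('"a \\< b"'))  # backslash followed by '<'
```

The second document is rejected because a backslash followed by < is not one of the escapes the JSON grammar defines.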

Paul J. Lucas (paul-lucas) wrote :

BTW: when I wrote, "... I don't think there's anything I can do about this ..." I meant, "... until the 'downstream' XML serializer is fixed first."

Chris Hillery (ceejatec) wrote :

I think your comment is partially true for Mauro's query but not, as far as I can tell, for mine. Mauro returned the result of json:serialize(), which meant that it would then be serialized as an XML string by Zorba, causing the escapes to be screwed up. However, in my query, I used file:write() to output the string directly to a file named "output.txt". All the same problems are evident in that file, which means that the returned value of json:serialize() is already wrong.

Paul J. Lucas (paul-lucas) wrote :

And my point is that when *I* see the string values, I see them in their in-memory form. For example, for the < character, I see it as a < character. This is perfectly legal in JSON which means I should do *nothing* to it. Yet some other code downstream from me is converting that to &lt;. My code isn't doing that so there's nothing for me to fix.
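
To illustrate the distinction being drawn here, the sketch below takes one in-memory string and passes it through two different output steps. It is a Python analogy rather than Zorba code: as_text stands in for a "text" output method that writes the characters as they are, and xml.sax.saxutils.escape stands in for an XML output method that replaces markup characters.

```python
from xml.sax.saxutils import escape

# The value as a JSON serializer would see it: literal characters, no entities.
in_memory = 'if (a < b && c > d) "quoted"'

# Text-style output writes the string unchanged.
as_text = in_memory

# XML-style output replaces &, < and > after the fact.
as_xml = escape(in_memory)

print(as_text)
print(as_xml)
```

The entity references appear only in the second result, so they are introduced by the output step and not by whatever produced the in-memory string.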

Paul J. Lucas (paul-lucas) wrote :

And I am using your query. The < is being converted to &lt; even when using json:serialize(). It's not *me* doing that.

Chris Hillery (ceejatec) wrote :

Ok, you're correct about that. Apparently file:write() still does XML-ish serialization by default. So that explains the introduction of entities and character references.

However, there is still a bug here, as far as I can tell. I'm attaching a new reproducer. You should run it with

  zorba -f -q foo.xq --serialize-text

This uses the "text" serialization method, which should simply dump the in-memory contents of the string to the screen. When run this way, all the entities and character references are gone. But, illegal double-quotes and backslashes remain, as do several illegal control characters (you can see the latter by piping the output through cat -vet). The output also contains newlines inside JSON strings which I'm pretty sure aren't legal. The tab character is gone entirely, but I think that is happening during query parsing; not sure.

So, unless there is something else going on, the output string from json:serialize() is still not guaranteed to be valid JSON. The first thing you should do is put some debug code into the implementation of that function to output the actual return value, to be 100% sure that the on-screen output is in fact exactly the same bytes as the return value. But, assuming that it is, there's a bug.
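
One way to make the "is this valid JSON" check mechanical is to feed the captured output to a strict JSON parser. The sketch below uses Python's json module for that; check_json is a hypothetical helper, and the sample strings are invented stand-ins, not the actual output of the attached reproducer.

```python
import json


def check_json(text):
    """Hypothetical helper: report whether text parses as strict JSON."""
    try:
        json.loads(text)
    except json.JSONDecodeError as err:
        return "invalid at offset %d: %s" % (err.pos, err.msg)
    return "valid"


# Properly escaped newline inside a string.
print(check_json('["a\\nb"]'))
# Raw newline inside a string, as described in the comment above.
print(check_json('["a\nb"]'))
# Unescaped double quotes inside a string.
print(check_json('["say "hi""]'))
```

Printing repr(text) on the captured output also makes raw control characters visible, much like piping through cat -vet.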

summary: - JSON Module not escaping escape characters
+ JsonML serialization not escaping characters
Changed in zorba:
status: Confirmed → Fix Committed
Changed in zorba:
status: Fix Committed → Fix Released