Cirq-ionq uses unparseable ASCII character in encoding #5216

Cynocracy · 2022-04-07T01:37:16Z

Filing this as both a slight-annoyance we plan on tackling and a 'buganizer Wonderbug' for @dabacon since I can't bug him on buganizer for real anymore ;)

Cirq uses unparseable ASCII characters in its serialization format for IonQ. This is not actively causing issues, but it has broken us in the past. We would like to stop using them, and this bug tells the story of why.

How should it (even) work?

It's explained in a comment here: https://github.com/quantumlib/Cirq/blob/master/cirq-ionq/cirq_ionq/serializer.py#L241-L246

Each key and targets are serialized into a string of the form `key` + the ASCII unit
        separator (chr(31)) + targets as a comma separated value.  These are then combined
        into a string with a seperator character of the ASCII record separator (chr(30)).
        Finally this full string is serialized as the values in the metadata dict with keys
        given by `measurementX` for X = 0,1, .. 9 and X large enough to contain the entire
        string.

Okay, cool, we're using some ASCII control point characters (Unit Separator, or the 31st/1F-th ASCII character, and Record Separator, the 30th/1E-th). These are characters that don't ever occur in any normal text, they're part of ASCII, so they work even if they're being serialized into a non-unicode-safe storage layer. What could go wrong?

Why it doesn't work

"Modern" languages really don't like those characters in JSON. Here's an example, using vanilla python 3.9

z = '{ "foo": "b' + chr(31) + 'ar" }'
json.loads(z)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python3.9/json/__init__.py", line 346, in loads
    return _default_decoder.decode(s)
  File "/usr/lib/python3.9/json/decoder.py", line 337, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/usr/lib/python3.9/json/decoder.py", line 353, in raw_decode
    obj, end = self.scan_once(s, idx)
json.decoder.JSONDecodeError: Invalid control character at: line 1 column 12 (char 11)

You get similar results in other languages (in javascript for example).

That's a bummer, the trick to use valid ASCII breaks parsing of JSON, and since we want to embed these strings in JSON, that's a problem.

But wait, it's like this, right now! Which brings us to

Why it works anyway

Cirq/cirq-ionq/cirq_ionq/job.py

Lines 167 to 170 in fe7fe4e

    
               for key_value in full_str.split(chr(30)): 
        
                   key, value = key_value.split(chr(31)) 
        
                   measurement_dict[key] = [int(t) for t in value.split(',')] 
        
           return measurement_dict

Cirq requires these characters when it splits the metadata on a returned job to get the measurement keys out.

If this is true, how does it work? The character is needed here, but /can't/ be returned by the API, or browsers/clients everywhere will choke on the data!

The answer is implicit unicode conversion!

If you dump the response you get from api.ionq.co you'll see

0060: 22 2c 22 6d 65 74 61 64 61 74 61 22 3a 7b 22 6d ","metadata":{"m
0070: 65 61 73 75 72 65 6d 65 6e 74 30 22 3a 22 78 5c easurement0":"x\
0080: 75 30 30 31 66 30 5c 75 30 30 31 65 79 5c 75 30 u001f0\u001ey\u0
0090: 30 31 66 31 22 2c 22 73 68 6f 74 73 22 3a 22 31 01f1","shots":"1
00a0: 30 30 30 22 7d 2c 22 70 72 65 64 69 63 74 65 64 000"},"predicted

Ah, so it's using escape sequences \u001f and \u001e, not chr(31) at all!
When python serializes to json, it handles this properly:

x = {'foo': 'bar\x1f'}

json.dumps(x)
'{"foo": "bar\\u001f"}'

And that's how we see it at the API itself.

However, we've now lost the original motivation for using these characters! It's not ASCII at all. And if a change were made to support only ASCII, it might even be reasonable to encode the string as '\x1f', and, well, bad things result.

So this is the story of how we ended up trying to use ASCII, fell backward into unicode without realizing, and are now in the uncomfortable position of requiring the client and server to be unicode-aware.

The text was updated successfully, but these errors were encountered:

Cynocracy · 2022-04-07T01:51:43Z

Anyway, we found this one day when our server-side serializer got a little less smart, and started returning '\x1f' without encoding it.

This is called out in https://www.rfc-editor.org/rfc/rfc7159#page-8

All Unicode characters may be placed within the
   quotation marks, except for the characters that must be escaped:
   quotation mark, reverse solidus, and the control characters (U+0000
   through U+001F).

but you'd have to unicode to know that :)

dabacon · 2022-04-07T18:22:11Z

Wonderbug indeed!

There is no reason to not use another character, as far as I remember. This is a hack to get around that the IonQ api doesn't have a concept of measurement key IIRC.

tanujkhattar · 2022-04-08T20:39:44Z

xref #5151

Cynocracy · 2022-05-11T18:50:52Z

Adding comment so this can be assigned to me

vtomole · 2022-05-11T19:04:36Z

@Cynocracy Done.

dstrain115 · 2022-06-06T13:53:22Z

@Cynocracy What is the status of this issue? Do you need help implementing this? (or does it rely on coordinated changes server-side as well?)

dabacon · 2022-06-09T19:56:20Z

I'm not sure if this is needed for pre-1.0. Changing this to a new delimiter should work transparently given that the API maintains the "we send you this, we get it back". There may need to be something done on the API side if things are introspected, but that's a separate issue. Going to move to after-1.0.

Cynocracy added the kind/bug-report Something doesn't seem to work. label Apr 7, 2022

Cynocracy changed the title ~~Cirq-ionq uses unparseable ASCII character~~ Cirq-ionq uses unparseable ASCII character in encoding Apr 7, 2022

95-martin-orion added triage/accepted A consensus emerged that this bug report, feature request, or other action should be worked on time/before-1.0 labels Apr 27, 2022

vtomole assigned Cynocracy May 11, 2022

viathor added area/interop area/serialization area/ecosystem labels May 20, 2022

dabacon added time/after-1.0 and removed time/before-1.0 labels Jun 9, 2022

MichaelBroughton removed the time/after-1.0 label Jul 18, 2022

mhucka added this to the 2025-T3: Optimize circuit serialization milestone Nov 13, 2024

mhucka added this to Cirq 2025 roadmap Nov 13, 2024

mhucka added the area/json label Nov 15, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Cirq-ionq uses unparseable ASCII character in encoding #5216

Cirq-ionq uses unparseable ASCII character in encoding #5216

Cynocracy commented Apr 7, 2022 •

edited

Loading

Cynocracy commented Apr 7, 2022

dabacon commented Apr 7, 2022

tanujkhattar commented Apr 8, 2022

Cynocracy commented May 11, 2022

vtomole commented May 11, 2022

dstrain115 commented Jun 6, 2022

dabacon commented Jun 9, 2022

Cirq-ionq uses unparseable ASCII character in encoding #5216

Cirq-ionq uses unparseable ASCII character in encoding #5216

Comments

Cynocracy commented Apr 7, 2022 • edited Loading

Cynocracy commented Apr 7, 2022

dabacon commented Apr 7, 2022

tanujkhattar commented Apr 8, 2022

Cynocracy commented May 11, 2022

vtomole commented May 11, 2022

dstrain115 commented Jun 6, 2022

dabacon commented Jun 9, 2022

Cynocracy commented Apr 7, 2022 •

edited

Loading