bugfix: fixed incorrect bytestring encoding PlutusData #269
Conversation
Is there a reason to introduce MetadataIndefiniteList? I don't see the benefit over using IndefiniteList directly.
pycardano/serialization.py
# Currently, cbor2 doesn't support indefinite list, therefore we need special
# handling here to explicitly write header (b'\x9f'), each body item, and footer (b'\xff') to
# the output bytestring.
encoder.write(b"\x9f")
for item in value:
    encoder.encode(item)
if (
    isinstance(value, MetadataIndefiniteList)
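To make the encoder hook above concrete, here is a minimal, hedged sketch (using plain cbor2 rather than the PR's encoder, with made-up sample items) of how an indefinite-length CBOR array is hand-rolled: the 0x9f start marker, each item encoded as usual, then the 0xff break marker.

```python
import cbor2

items = [1, b"abc", "hello"]  # arbitrary sample items

# Indefinite-length CBOR array: 0x9f start marker, normally encoded items,
# then the 0xff "break" marker that closes the array.
encoded = b"\x9f" + b"".join(cbor2.dumps(i) for i in items) + b"\xff"

# cbor2 can *decode* indefinite-length arrays even though (at the time of
# this PR) it offered no direct way to emit them.
assert cbor2.loads(encoded) == [1, b"abc", "hello"]
```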
I don't see why the encoding as MetadataIndefiniteList is necessary; can we not simply encode all bytes longer than 64 bytes as a list?
That's a good question. I'm still new to Cardano. These were my thoughts, and how I would push back on encoding all bytes the same way:
- If the byte-length restriction only applies to metadata, will encoding bytes the same way in other parts of the message cause an issue?
- Why impose the same criterion on other parts of the message if it's not strictly required?
- Won't chunking byte data produce nominally larger messages and, by extension, marginally higher tx fees? (See the rough size comparison sketched below.)
If you feel these are non-issues, I think it's easy enough to remove the dummy class and encode everything the same way.
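As a rough, hedged check of the fee point above (plain cbor2, arbitrary 1000-byte payload): chunking adds only a 2-byte header per 64-byte piece plus the start and break markers, so the size overhead, and hence the fee impact, is marginal.

```python
import cbor2

payload = b"\x00" * 1000  # arbitrary example payload

# Definite-length encoding: a single 3-byte header for a 1000-byte string.
definite = cbor2.dumps(payload)

# Chunked (indefinite-length) encoding: 0x5f start marker, each <=64-byte
# chunk encoded as its own definite bytestring, then the 0xff break marker.
chunks = [payload[i : i + 64] for i in range(0, len(payload), 64)]
chunked = b"\x5f" + b"".join(cbor2.dumps(c) for c in chunks) + b"\xff"

print(len(definite), len(chunked))  # 1003 vs. 1034: ~31 extra bytes of framing
```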
I think in retrospect your comments on the issue I raised make more sense now.
IIRC, the Cardano ledger generally specifies that the CBOR encoding of bytestrings should be split into pieces of at most 64 bytes each. This helps prevent OOM attacks when reading very long bytestrings. However, the ledger does not enforce it, so different encodings can be found on chain.
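For illustration (a hedged sketch, not ledger code): decoders reassemble the chunks of an indefinite-length bytestring transparently, so the 64-byte convention only changes the framing on the wire, not what the consumer sees.

```python
import cbor2

# A 70-byte payload split into two chunks and framed as an indefinite-length
# bytestring (0x5f ... 0xff). cbor2 joins the chunks back together on decode.
payload = b"\xaa" * 70
encoded = b"\x5f" + cbor2.dumps(payload[:64]) + cbor2.dumps(payload[64:]) + b"\xff"
assert cbor2.loads(encoded) == payload
```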
But isn't an OOM attack prevented by the maximum transaction size anyway? Again, just playing devil's advocate here; I'm still relatively new to Cardano.
I can revert the changes back to the original without the dummy class for PlutusData. Just give me a yay or nay. With these changes I was able to successfully submit to smart contracts, so I know they work correctly.
Just wanted to bump this so I can finish it off :)
I would prefer this without the dummy class. Maybe you can revert to that and see if both the test cases and your submission pass? Then Jerry or I can create a test case for exactly this datum submission.
Removed the dummy class. All unit tests pass, including the specific test created for a long bytestring. My specific use case works. I'm happy to do any additional work required to finish this off. I didn't see a contributing doc, so I don't know how versioning is handled.
Codecov Report
@@            Coverage Diff             @@
##             main     #269      +/-   ##
==========================================
- Coverage   85.14%   84.81%   -0.34%
==========================================
  Files          26       26
  Lines        2995     3022      +27
  Branches      719      728       +9
==========================================
+ Hits         2550     2563      +13
- Misses        335      345      +10
- Partials      110      114       +4
Thanks for fixing this! Posted a question in the code. Apologies for the delayed response.
pycardano/serialization.py
@@ -176,7 +176,14 @@ def default_encoder(
# the output bytestring.
encoder.write(b"\x9f")
for item in value:
    encoder.encode(item)
    if isinstance(item, bytes) and len(item) > 64:
        encoder.write(b"\x5f")
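For context, a hedged sketch (not the exact PR code) of what the branch started in the diff above presumably does with a long bytes item inside the indefinite list:

```python
def encode_item(encoder, item):
    """Sketch: chunk bytes longer than 64 bytes into an indefinite-length
    bytestring; pass every other item through unchanged."""
    if isinstance(item, bytes) and len(item) > 64:
        encoder.write(b"\x5f")                # indefinite-length bytestring header
        for i in range(0, len(item), 64):
            encoder.encode(item[i : i + 64])  # each chunk is at most 64 bytes
        encoder.write(b"\xff")                # break marker closes the bytestring
    else:
        encoder.encode(item)
```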
This is only activated when an item is inside an indefinite list. Do we need to break up byte strings that are not part of an indefinite list?
AFAIK we need to break up all bytes values that are longer than 64 bytes.
I may have misunderstood, but it seemed to me that this was the best place to put it, since all PlutusData are cast to IndefiniteList. If I pull it out of the IndefiniteList block, will it be handled properly? I guess it should.
I guess you (correctly) noticed that all PlutusData fields are part of an indefinite list. However, PlutusData can also contain bytes that are not inside an indefinite list (e.g. pure bytes, or bytes that are keys in dictionaries).
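To illustrate that with plain cbor2 (a hedged example; the 70-byte key is arbitrary): a long bytes value used as a map key is emitted as a single definite-length bytestring, so logic that only rewrites items inside an indefinite list would never see it.

```python
import cbor2

long_key = b"\x01" * 70
encoded = cbor2.dumps({long_key: 42})

# a1 = map with one pair, 58 46 = bytestring header with length 0x46 (70):
# the key is a single definite-length bytestring, not chunked.
print(encoded[:3].hex())  # "a15846"
```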
So is the final answer to pull it outside of the IndefiniteList block?
In this documentation it seems that yes, we need dummy classes. But not for lists, for bytes! :)
I am also wondering if there are cases where integers are incorrectly encoded (when they exceed 64 bytes in size), since I implemented a special case for this here: https://github.com/OpShin/uplc/blob/448f634cc1225de6dd7390b670b01396d2e71156/uplc/ast.py#L430
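For reference, a rough hedged check of when that case can arise: integers outside the 64-bit range are encoded as CBOR bignums (tag 2/3) whose payload is the big-endian byte representation, and that payload can itself exceed 64 bytes.

```python
n = 1 << 520  # arbitrary example value
payload = n.to_bytes((n.bit_length() + 7) // 8, "big")
print(len(payload))  # 66 -> would need the same chunked treatment as long bytes
```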
I guess I am seeing more and more the intuition behind all the custom classes in OpShin.
I realize it's a bigger lift, but is there any reason why we wouldn't just take OpShin's implementation and pull it over here? Then we could just rely on pycardano rather than duplicating effort across repos?
I apologize if I'm speaking out of ignorance and there are things I'm not considering, but this seems like it might be the more lasting implementation.
No worries at all. The code I wrote for OpShin/UPLC was created after pycardano was written, hence there might be a point in copying it over. Then again, the UPLC implementation really only caters to PlutusData, while pycardano also handles serialization of all other kinds of things, so I'm not sure if anything would break.
Long story short: The only reason that there are two different implementations is that no one yet tried to unify them.
Okay, I would like to have this done sooner rather than later. Can I just create a dummy class for bytes to patch this and open a more general issue about syncing datum handling between OpShin and pycardano?
Yes, sounds good to me! I would also prefer to get this resolved over leaving a big stale PR open :)
Alright, this was annoying. I made a new dummy class to wrap bytes. There's still one failing unit test that has to do with generating the script hash. I didn't have a chance to dig into it. Maybe you can take a look and more easily see what the issue might be. Once that is resolved, this should be good.
Alright, as best as I can tell, the last failing test was caused by serializing the
test/pycardano/test_util.py
@@ -149,7 +149,7 @@ def test_script_data_hash():
redeemers = [Redeemer(unit, ExecutionUnits(1000000, 1000000))]
redeemers[0].tag = RedeemerTag.SPEND
assert ScriptDataHash.from_primitive(
-    "032d812ee0731af78fe4ec67e4d30d16313c09e6fb675af28f825797e8b5621d"
+    "b11ed6f6046df925b6409b850ac54a829cd1e7603145c9aaf765885d8ec64da7"
Not sure if this should change. If we write the same test in Haskell, it would generate the same hash.
I think, as you noticed in the other comment, this changes because all bytes are now being encoded the same way per nielstron's suggestion.
Your comment makes sense. If we only change the encoding in Metadata/PlutusData, then the hash would not change.
pycardano/serialization.py
elif isinstance(value, bytes):
    return ByteString(value)
IMO, it seems incorrect to replace every bytes with ByteString. Instead, we should just offer ByteString for users to use in PlutusData or Metadata.
For internal implementations that generate bytes as intermediate values, e.g. script_data_hash, we don't want to change the type arbitrarily.
Should be implementable by a simple parameter?
That was one of my original points, and how I originally had it implemented.
Can someone please make a definitive final decision so I can fix it and be done? I've implemented and reimplemented this multiple times.
@nielstron Could you elaborate how adding a parameter will work?
It might not work as straightforwardly as I imagined. @theeldermillenial was right; maybe we should just roll with the initial design. I appreciate the detour though, because now we know precisely which bytes to encode this way 😅 Sorry for the divergence.
Maybe we can document this (and ideally find some supporting documentation on the discrepancy in the implementations).
My preferred approach is to offer users a ByteString class, which the encoder can automatically break down into a byte array. If a plain bytes object longer than 64 bytes is found instead, pycardano should raise an exception and recommend that users use ByteString.
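A minimal sketch of that approach (the class and function names here are illustrative assumptions, not necessarily what landed in pycardano):

```python
from dataclasses import dataclass


@dataclass
class ByteString:
    """Wrapper marking a value as 'encode me as chunked bytes'."""

    value: bytes


def encode_bytestring(encoder, bs: ByteString) -> None:
    # Emit an indefinite-length bytestring: 0x5f header, <=64-byte chunks
    # encoded as definite bytestrings, then the 0xff break marker.
    encoder.write(b"\x5f")
    for i in range(0, len(bs.value), 64):
        encoder.encode(bs.value[i : i + 64])
    encoder.write(b"\xff")
```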
@cffls But should this error only be thrown for PlutusData and Metadata? Or should we apply it globally?
My two cents is "only implement exactly what is defined". The bytes length restriction appears to be limited to metadata, so maybe we only apply it to metadata.
We already have this check for metadata: https://github.com/Python-Cardano/pycardano/blob/main/pycardano/metadata.py#L40-L49
I thought this should also be enforced in PlutusData, which was the reason this PR was raised. If not, I am fine with only providing ByteString as an option for users in this PR.
My apologies. I misspoke. When I said Metadata, I also meant PlutusData.
Also, I now see exactly what you're saying, and I think your solution makes the most sense. You are saying we should inject a check for long byte strings in PlutusData and throw an error similar to what is seen in Metadata, and that part of the error message should indicate that the user can use the new ByteString class to allow longer bytes.
I think this is the most transparent approach, and it keeps in line with what I see as pycardano's philosophy of being unopinionated.
If this is what you mean, I'll make the changes and we can be done. I will revert any hashes I altered, since this should really only affect the test I created. If there are any other unit tests you would like to see, I'm happy to add them.
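A hedged sketch of the agreed check (the function name and exception type are illustrative, not the exact pycardano code):

```python
def check_plutus_bytes(value) -> None:
    # Mirror the existing Metadata validation: reject raw bytes fields longer
    # than 64 bytes and point users at ByteString, which is chunked instead.
    if isinstance(value, bytes) and len(value) > 64:
        raise ValueError(
            "bytes fields in PlutusData cannot exceed 64 bytes; "
            "wrap the value in ByteString to have it encoded in 64-byte chunks"
        )
```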
Yes, this is exactly what I meant. Please go ahead with this approach. Thank you for confirming.
Looks good to me. Some code failed mypy analysis; you can run make qa to check the errors.
Should be good to go now. Now that I know all your QA tooling, it should be easier for me to contribute. I'm dedicated to building things on Cardano in Python, so I'll try to contribute as I find things, like #273.
LGTM. Thank you for your contribution!
Fix #268
Summary
pycardano incorrectly encodes bytes longer than 64 bytes for PlutusData. Currently, a bytes element is encoded the same regardless of length, but if the length is larger than 64 bytes it should be broken up into 64-byte chunks per the following spec: https://developers.cardano.org/docs/get-started/cardano-serialization-lib/transaction-metadata/#metadata-limitations
This PR fixes the issue by creating a dummy class to catch PlutusData objects during serialization and properly break up bytes values longer than 64 bytes. If further explanation is needed, I can provide examples. The PR includes a unit test to verify that the expected output is obtained with a long bytes input.
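For illustration, a hedged usage sketch of the fix (the CONSTR_ID pattern follows pycardano's documented PlutusData usage; the ByteString import path and the exact output format are assumptions):

```python
from dataclasses import dataclass

from pycardano import PlutusData
from pycardano.serialization import ByteString  # wrapper discussed in this PR


@dataclass
class MyDatum(PlutusData):
    CONSTR_ID = 0
    payload: ByteString  # may hold more than 64 bytes; the encoder chunks it


datum = MyDatum(payload=ByteString(b"\x00" * 100))
print(datum.to_cbor())  # the 100-byte field is emitted as 64- and 36-byte chunks
```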