Skip to content

Add new read_setup() for control endpoints #153

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 7 commits into
base: master
Choose a base branch
from

Conversation

ianrrees
Copy link
Contributor

Background can be found at atsamd-rs/atsamd#738 .

The USB spec says that the SETUP transactions used with control endpoints must always be accepted by the device [1], and at least in the Microchip SAMD21's USB peripheral that behaviour is handled in hardware [2]. We've observed that some hosts will send an OUT ZLP, followed very shortly by a SETUP for a new transaction - on the SAMD21 parts this results in the OUT data (none, in this case) being overwritten by SETUP, which in the existing implementation results in an overflow error (since read() was called with a zero length slice for the OUT ZLP, but >0 bytes are available from the SETUP that overwrote it). The user-observable result is that the device fails to enumerate at all on some hosts, or sporadically on others.

In light of this, it seems the control pipe code needs some way to differentiate reads of OUT data from reads of SETUP data. This PR changes the existing read() methods to return an error if SETUP data is present, making it specific to OUT data, and adds a new read_setup() method specifically for reading SETUP data.

Most of the commits/diff here are work on the test framework, to add a new test that resets the device and times how long it takes to re-enumerate. The test is a bit crude, but seems to validate the changes to the read methods.

[1] USB 2.0 spec section 5.5.5 Control Transfer Data Sequences:

If a Setup transaction is received by an endpoint before a previously initiated control transfer is completed,
the device must abort the current transfer/operation and handle the new control Setup transaction. A Setup
transaction should not normally be sent before the completion of a previous control transfer. However, if a
transfer is aborted, for example, due to errors on the bus, the host can send the next Setup transaction
prematurely from the endpoint’s perspective.

[2] SAMD21 datasheet DS40001882H section 32.6.2.6 Management of SETUP Transactions. The key thing here is that the data is copied in to the buffer regardless of BK0RDY.

@ianrrees
Copy link
Contributor Author

@ithinuel - this is the PR discussed the other day in Matrix, I've put the corresponding atsamd-rs changes here, and test class implementations for SAMD21 and SAMD51 are here

@ianrrees
Copy link
Contributor Author

@9names ran some tests of this with Raspberry Pi 5 host and SAMD21, confirming the problem (enumeration being unreliable) and that this appears to resolve it.

While confirming the problem, 9names saw the new test case error out in an unexpected way - reporting Entity not found - I think that might require a bit more work to make the enumeration time test more reliable.

@ianrrees ianrrees changed the title Draft: Add new read_setup() for control endpoints Add new read_setup() for control endpoints Oct 12, 2024
@ianrrees
Copy link
Contributor Author

Sorry, I've not had an opportunity to look at the test code changes here, and probably won't be able to test Rust USB stuff until mid December at earliest. Since the test stuff isn't really critical, I'd like to proceed with this PR so that implementors can do their part.

@rnd-ash
Copy link

rnd-ash commented Feb 27, 2025

Out of interest, what is blocking this PR? As it is required to get USB functioning on the ATSAMD platform properly.

@ianrrees
Copy link
Contributor Author

ianrrees commented Mar 3, 2025

@eldruin @ryan-summers - Apologies for my sporadic involvement with this, I'd like to get it over the line but it's not clear to me what the next step should be. The important change is about the API change, but in order to demonstrate the problem I made some fairly large changes to the testing infrastructure, maybe those should be separated?

Also, I'm hesitant to offer as I've not been doing much programming lately, but I could try to help out with maintaining usb-device by reviewing other PRs/issues if that might help share the load a bit. Would prefer to not accept my own PRs of course!

Copy link
Member

@eldruin eldruin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for your work on this.

  1. Have you looked deeper into the Entity not found test error? I hesitate to merge a test that may fail for unrelated and unclear reasons when another PR is submitted.
  2. IIUC, this is a breaking change, so we would need to release a 0.4 version, which will lead to a lot of friction across the ecosystem given the position of this crate. Do you have an impression of the breakage and solutions when attempting to mix usb-device versions in applications and HALs?
  3. One thing missing here is an entry to the changelog.
  4. I would welcome you to the org if you are interested to join. Mostly we do 4-eye principle so that at least one more person has a look at the changes. This is not a formal policy, though. Have a look here for further info.

ianrrees added 6 commits March 5, 2025 11:32
Need to support a test that hard resets the device and ensures that it
re-enumerates.  The current model supplies a device handle to the test
case, but this case will require a handle to the rusb context so that it
can determine when the device has enumerated.
This reduces log noise for tests that reset the device under test
Needed for testing enumeration time
@ianrrees
Copy link
Contributor Author

ianrrees commented Mar 4, 2025

  1. No, sorry I've not followed up. From memory, enumeration failures+retries are the most obvious symptom of the problem that the API change aims to fix, so the new test triggers hard resets of the device and measures the time that the OS takes to enumerate the device. I don't believe that libusb provides an API that would allow a more direct test of the device behaviour. The test case that failed is new and quite different type of test from what was here before, which is a part of why I'm thinking of separating the test changes (which might need more work) from the API change (which is relatively straightforward).
  2. Yes, this is a difficult issue - a device needs to use exactly one version of usb-device, which means that all the USB class implementations, HALs, etc need to update when there's a new release usb-device.
  3. Done, cheers!
  4. Thanks! If I can make more time for this sort of thing, I'll put my hand up to join as a maintainer.

@ianrrees
Copy link
Contributor Author

ianrrees commented Mar 4, 2025

One thing I'm curious about: have other UsbBus implementations than the Microchip SAM HAL got some clever solution for this, with the existing usb-device? My hunch is that it isn't possible to make a correctly-behaving device without some way to distinguish between reads of SETUP and OUT data, but my usb-device experience is limited to SAMD devices.

@ithinuel
Copy link
Contributor

ithinuel commented Mar 5, 2025

I had some troubles with zlp followed by setup while implementing it on rp2040.
The device keeps flags for things it received. I'm on my phone rn but will edit when I get a chance to get back to my computer to link the impl we have.

EDIT:
We capture in the poll method if the setup flag was set here in inner.read_setup and remember it until a read is issued on ep_0 here.

And if a ZLP got discarded by the hardware/software, we ignore it here.

@ianrrees
Copy link
Contributor Author

ianrrees commented Mar 5, 2025

Interesting, thanks @ithinuel ! That is a clever workaround, and I think the need for it shows that the usb-device API has a bit of a gap in this area. I'll rework this PR a little so it doesn't talk about the SAMD target, because your example shows the same underlying issue.

Presuming that the RP2040 USB peripheral works like the SAMD one, I'd note that the workaround leaves a possibly-large timing hole - if poll() is called after the ZLP, poll() returns, SETUP appears, then ep_read() is called.

@ithinuel
Copy link
Contributor

ithinuel commented Mar 5, 2025

True, I realize that as I was looking at the code & thinking of your use case:

- -> ZLP
- poll()
- -> SETUP
- read(0, 0)
- poll()
- read(0, 8) <- would probably crash.

That being said, I never experienced any issue with it so far. I’m using the rp2040 as my keyboard through a KVM causing the usb to re-enumerate everytime I switch computer.
My guess would be that because the rp2040 runs at least 3 times faster than the SAMD21, so the poll-read sequence is superfast.

@ianrrees
Copy link
Contributor Author

ianrrees commented Mar 5, 2025

When that happened in the SAMD case, the read() would fail (silently, IIRC) and the host would retry the SETUP unless it had failed too many times. With most hosts, it was reliable enough to just be an annoyance (enumeration would take a second or two longer) but some never worked.

This change aims to resolve issues where an expected OUT Zero Length
Packet (ZLP) was overwritten by a subsequent SETUP transaction.  The USB
spec requires that devices accept SETUP transactions, this behaviour is
implemented directly in some USB hardware.  To allow the control pipe
implementation to distinguish between OUT data and SETUP data, the
read() method was changed to return an error if SETUP data was present,
and a new read_setup() was added that expects SETUP data not OUT data.
@ianrrees
Copy link
Contributor Author

ianrrees commented Mar 22, 2025

Just a nudge on this - maybe good to integrate a couple outstanding PRs (eg #147) and cut a release.

My feeling is that, although it's a pain to update all the other crates, this is as the documentation states a work-in-progress and this PR allows the HAL implementations to fix incorrect behaviour.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants