Handle faulty TCP connection between XHarness and Apple devices
Context
When we run mobile device tests on Apple platforms, we create a TCP tunnel through the USB cable that connects the AppleTV / iPhone to the host. The tunnel is created by mlaunch, a tool that XHarness uses to talk to the device. XHarness launches a separate mlaunch process on the side to create the tunnel and reads that process's stdout.
flowchart LR
Step1[XHarness]
Step2[mlaunch --tcp]
Step3[TCP tunnel]
Step4[Apple Device]
Step5[mlaunch]
Step1--spawns a background mlaunch process-->Step2
Step3--listens to TCP-->Step1
Step2--creates TCP tunnel-->Step3
Step4--writes data-->Step3
Step1--calls -->Step5
Step5--installs + runs app-->Step4
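The "spawn a background process and read its stdout" part of the diagram can be sketched roughly as follows. This is a minimal illustration, not XHarness code: the child command is a stand-in Python one-liner rather than a real mlaunch invocation, and the function name `collect_tunnel_output` is made up for this example.

```python
import subprocess
import sys


def collect_tunnel_output(command):
    """Spawn the tunnel helper and return its stdout lines, tagged for the log."""
    process = subprocess.Popen(
        command,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # fold stderr into the same stream
        text=True,
    )
    lines = []
    # Read line by line so a caller could react while the helper is still running.
    for line in process.stdout:
        lines.append("[TCP tunnel] " + line.rstrip("\n"))
    process.wait()
    return lines


# Stand-in for the real helper (assumption): a child process that prints two lines.
STAND_IN_COMMAND = [
    sys.executable,
    "-c",
    "print('tunnel starting'); print('tunnel ready')",
]

if __name__ == "__main__":
    for entry in collect_tunnel_output(STAND_IN_COMMAND):
        print(entry)
```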
Problem with TCP
It can happen that the TCP tunnel between XHarness and the device fails:
[12:46:55] dbug: [TCP tunnel] Xamarin.Hosting: Attempting USB tunnel between the port 56661 on the device and the port 56661 (21981) on the mac: 61
[12:46:55] dbug: [TCP tunnel] Xamarin.Hosting: Failed to connect to port 56661 (21981) on device (error: 61)
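One way to recognize this failure is to scan the tunnel log for the "Failed to connect" line. The sketch below assumes the message format is exactly as in the two log lines quoted above; the helper name `find_tunnel_failure` is hypothetical, and the real format may differ between mlaunch versions.

```python
import re

# Pattern derived from the "Failed to connect" log line quoted in this issue.
# The number in parentheses after the port is matched but not interpreted.
TUNNEL_FAILURE = re.compile(
    r"Failed to connect to port (?P<port>\d+) \(\d+\) "
    r"on device \(error: (?P<error>\d+)\)"
)


def find_tunnel_failure(log_lines):
    """Return (port, error_code) for the first tunnel failure, or None."""
    for line in log_lines:
        match = TUNNEL_FAILURE.search(line)
        if match:
            return int(match.group("port")), int(match.group("error"))
    return None


SAMPLE_LOG = [
    "[12:46:55] dbug: [TCP tunnel] Xamarin.Hosting: Attempting USB tunnel "
    "between the port 56661 on the device and the port 56661 (21981) on the mac: 61",
    "[12:46:55] dbug: [TCP tunnel] Xamarin.Hosting: Failed to connect to "
    "port 56661 (21981) on device (error: 61)",
]

if __name__ == "__main__":
    print(find_tunnel_failure(SAMPLE_LOG))
```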
It can also happen that the tunnel is created but the device cannot connect to it (described in https://github.com/dotnet/xharness/issues/934).
When either of these happens, the app running on the device fails to connect and, as a plan B, just writes into stdout instead of the TCP connection (the stdout then ends up in the net.dot.System.Runtime.Tests.log log). So while the app did run the actual unit tests fine, the run is considered an APP_CRASH because the app never talked back.
Goal
The goal of this issue is not to solve the TCP flakiness, as that lies somewhere between iOS, macOS, mlaunch and the .NET runtime. The problem we want to solve here is that we classify the TCP failure as APP_CRASH, which puts it in the same boat as when the app actually crashes.
Instead, we’d like to recognize the TCP tunnel being the problem and exit with some new exit code (e.g. TCP_CONNECTION_FAILED).
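The intended distinction could look roughly like this. It is a sketch under assumptions: the two exit code names come from this issue, but the `SUCCESS` member, the numeric values (generated by `auto()`), and the `classify_run` helper with its two boolean inputs are placeholders for illustration and not the actual XHarness implementation.

```python
from enum import Enum, auto


class ExitCode(Enum):
    # Names APP_CRASH and TCP_CONNECTION_FAILED come from the issue text.
    # SUCCESS and all numeric values are placeholders (assumption).
    SUCCESS = auto()
    APP_CRASH = auto()
    TCP_CONNECTION_FAILED = auto()


def classify_run(app_reported_results, tunnel_connected):
    """Pick an exit code for a finished run.

    app_reported_results: the app talked back over TCP with test results.
    tunnel_connected: the tunnel was created AND the app connected to it.
    """
    if app_reported_results:
        return ExitCode.SUCCESS
    if not tunnel_connected:
        # The app may have run fine; we just never heard from it (infra failure).
        return ExitCode.TCP_CONNECTION_FAILED
    # The tunnel worked but the app went silent: treat as a real crash.
    return ExitCode.APP_CRASH


if __name__ == "__main__":
    print(classify_run(app_reported_results=False, tunnel_connected=False).name)
```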
We can then set retries for this specific case as this can be considered an infra failure and let APP_CRASH not retry.
To summarize:
- Add a new exit code to XHarness, TCP_CONNECTION_FAILED, and return it instead of APP_CRASH when TCP dies
- Allow retries for this new exit code + revert retries for APP_CRASH (see #11689)
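The retry behavior summarized above might be sketched like this. It is only an illustration: exit codes are represented as plain strings, and `run_with_retries`, `RETRYABLE`, and the default of three attempts are assumptions made for this example rather than existing Helix or XHarness settings.

```python
# Only the infra failure is retried; APP_CRASH is reported immediately.
RETRYABLE = frozenset({"TCP_CONNECTION_FAILED"})


def run_with_retries(run_once, max_attempts=3):
    """Call run_once() until it returns a non-retryable exit code name.

    Returns (final_exit_code_name, attempts_used).
    """
    attempts = 0
    while True:
        attempts += 1
        result = run_once()
        if result not in RETRYABLE or attempts >= max_attempts:
            return result, attempts


if __name__ == "__main__":
    # Simulated work item: the first attempt hits the tunnel problem, the second passes.
    outcomes = iter(["TCP_CONNECTION_FAILED", "SUCCESS"])
    print(run_with_retries(lambda: next(outcomes)))
```

Keeping the retryable set explicit makes it easy to see that a genuine crash is surfaced on the first attempt, while the tunnel failure gets a bounded number of extra tries.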
Development instructions
It should be easy to stage a repro. You can do it like this:
- We create the TCP tunnel here: https://github.com/dotnet/xharness/blob/bc9877ac24c13ef8d4ad4c2c3652291a1b02b78f/src/Microsoft.DotNet.XHarness.Apple/AppOperations/AppTester.cs#L166-L172
- We get the port it was created on https://github.com/dotnet/xharness/blob/bc9877ac24c13ef8d4ad4c2c3652291a1b02b78f/src/Microsoft.DotNet.XHarness.Apple/AppOperations/AppTester.cs#L177
- We send it to the mobile application via env variable https://github.com/dotnet/xharness/blob/bc9877ac24c13ef8d4ad4c2c3652291a1b02b78f/src/Microsoft.DotNet.XHarness.Apple/AppOperations/AppTester.cs#L223
- The app then connects to that port and this event is called: https://github.com/dotnet/xharness/blob/bc9877ac24c13ef8d4ad4c2c3652291a1b02b78f/src/Microsoft.DotNet.XHarness.Apple/AppOperations/AppTester.cs#L197
To repro, you can just send the wrong port to the app (e.g. deviceListenerPort + 1) and this way the phone/simulator won’t connect.
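The idea of the repro, sketched outside of XHarness: build the environment passed to the app, and deliberately shift the port by one when a broken tunnel should be simulated. The environment variable name below is a made-up placeholder (the real one is set at the AppTester.cs line linked above), and `app_environment` is a hypothetical helper.

```python
# Placeholder name (assumption); the real variable is defined in AppTester.cs.
PORT_ENV_VAR = "HYPOTHETICAL_LISTENER_PORT"


def app_environment(device_listener_port, break_tunnel=False):
    """Return the env vars handed to the app under test.

    With break_tunnel=True the app receives a port nobody listens on,
    so it can never connect back.
    """
    port = device_listener_port + 1 if break_tunnel else device_listener_port
    return {PORT_ENV_VAR: str(port)}


if __name__ == "__main__":
    print(app_environment(56661))
    print(app_environment(56661, break_tunnel=True))
```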
You can use the XHarness E2E tests to create a Helix job from dotnet/xharness: https://github.com/dotnet/xharness/blob/main/tools/run-e2e-test.ps1
You should be able to work with this. More details are also here: https://github.com/dotnet/xharness/issues/934
Release Note Category
- Feature changes/additions
- Bug fixes
- Internal Infrastructure Improvements
Release Note Description
Improved handling of faulty TCP connection between XHarness and Apple devices
Top GitHub Comments
Thanks @AlitzelMendez, assigning to myself and will monitor this throughout the week.
Actually, will keep the customer issue up for tracking of this: https://github.com/dotnet/arcade/issues/11683