Connections forcibly closed
Dear lschoe,
I’ve been experiencing very strange MPyC behaviour since Monday. I run the following code with parameter “-M 3”:
from mpyc.runtime import mpc

async def main():
    async with mpc:
        secnum = mpc.SecFxp()
        a = secnum(2 if mpc.pid == 0 else None)
        a = mpc.input(a, 0)
        print(await mpc.output(a))
        b = 1 / a
        print(await mpc.output(b))

if __name__ == "__main__":
    mpc.run(main())
Expected outcome:
2.0
0.5
However, the code results in the following error:
Traceback (most recent call last):
File "C:/Users/kamphorstb/git_repositories/CONVINCED/kaplan-meier/scripts/tester.py", line 15, in <module>
mpc.run(main())
File "C:\Users\kamphorstb\git_repositories\CONVINCED\kaplan-meier\.venv\lib\site-packages\mpyc\runtime.py", line 161, in run
return self._loop.run_until_complete(f)
File "C:\Users\kamphorstb\AppData\Local\Programs\Python\Python37\lib\asyncio\base_events.py", line 581, in run_until_complete
raise RuntimeError('Event loop stopped before Future completed.')
RuntimeError: Event loop stopped before Future completed.
2020-07-15 14:44:03,707 Start MPyC runtime v0.6.8
2020-07-15 14:44:05,048 All 3 parties connected.
2.0
2020-07-15 14:44:05,075 Exception in callback mpc_coro.<locals>.typed_asyncoro.<locals>.<lambda>(<Task finishe...ut of range')>) at C:\Users\kamphorstb\git_repositories\CONVINCED\kaplan-meier\.venv\lib\site-packages\mpyc\asyncoro.py:387
handle: <Handle mpc_coro.<locals>.typed_asyncoro.<locals>.<lambda>(<Task finishe...ut of range')>) at C:\Users\kamphorstb\git_repositories\CONVINCED\kaplan-meier\.venv\lib\site-packages\mpyc\asyncoro.py:387>
Traceback (most recent call last):
File "C:\Users\kamphorstb\AppData\Local\Programs\Python\Python37\lib\asyncio\events.py", line 88, in _run
self._context.run(self._callback, *self._args)
File "C:\Users\kamphorstb\git_repositories\CONVINCED\kaplan-meier\.venv\lib\site-packages\mpyc\asyncoro.py", line 387, in <lambda>
d.add_done_callback(lambda v: _reconcile(decl, v))
File "C:\Users\kamphorstb\git_repositories\CONVINCED\kaplan-meier\.venv\lib\site-packages\mpyc\asyncoro.py", line 299, in _reconcile
givn = givn.result()
File "C:\Users\kamphorstb\git_repositories\CONVINCED\kaplan-meier\.venv\lib\site-packages\mpyc\asyncoro.py", line 237, in _wrap
return await coro
File "C:\Users\kamphorstb\git_repositories\CONVINCED\kaplan-meier\.venv\lib\site-packages\mpyc\asyncoro.py", line 227, in __await__
val = self.coro.send(None)
File "C:\Users\kamphorstb\git_repositories\CONVINCED\kaplan-meier\.venv\lib\site-packages\mpyc\runtime.py", line 393, in _recombine
return thresha.recombine(field, points)
File "C:\Users\kamphorstb\git_repositories\CONVINCED\kaplan-meier\.venv\lib\site-packages\mpyc\thresha.py", line 79, in recombine
s = shares[i][h]
IndexError: list index out of range
2020-07-15 14:44:05,085 Task was destroyed but it is pending!
task: <Task pending coro=<_wrap() running at C:\Users\kamphorstb\git_repositories\CONVINCED\kaplan-meier\.venv\lib\site-packages\mpyc\asyncoro.py:237> wait_for=<Future pending cb=[<TaskWakeupMethWrapper object at 0x000002E001D386A8>()]> cb=[mpc_coro.<locals>.typed_asyncoro.<locals>.<lambda>() at C:\Users\kamphorstb\git_repositories\CONVINCED\kaplan-meier\.venv\lib\site-packages\mpyc\asyncoro.py:387]>
2020-07-15 14:44:05,085 Task was destroyed but it is pending!
Running the same code in three terminals (additional parameter “-I x”, x = 0, 1, 2) results in similar messages for players 0 and 1. Player 2 instead shows the following:
2020-07-15 14:30:25,173 Start MPyC runtime v0.6.8
2020-07-15 14:30:25,574 All 3 parties connected.
2.0
a: <mpyc.sectypes.SecFxp32:16 object at 0x000001753828C548>
f: 16
2020-07-15 14:30:26,047 Exception in callback _SelectorSocketTransport._call_connection_lost(ConnectionRes..., 10054, None))
handle: <Handle _SelectorSocketTransport._call_connection_lost(ConnectionRes..., 10054, None))>
Traceback (most recent call last):
File "C:\Users\kamphorstb\AppData\Local\Programs\Python\Python37\lib\asyncio\events.py", line 88, in _run
self._context.run(self._callback, *self._args)
File "C:\Users\kamphorstb\AppData\Local\Programs\Python\Python37\lib\asyncio\selector_events.py", line 926, in _call_connection_lost
super()._call_connection_lost(exc)
File "C:\Users\kamphorstb\AppData\Local\Programs\Python\Python37\lib\asyncio\selector_events.py", line 700, in _call_connection_lost
self._protocol.connection_lost(exc)
File "C:\Users\kamphorstb\git_repositories\CONVINCED\kaplan-meier\.venv\lib\site-packages\mpyc\asyncoro.py", line 125, in connection_lost
raise exc
File "C:\Users\kamphorstb\AppData\Local\Programs\Python\Python37\lib\asyncio\selector_events.py", line 814, in _read_ready__data_received
data = self._sock.recv(self.max_size)
ConnectionResetError: [WinError 10054] An existing connection was forcibly closed by the remote host
2020-07-15 14:30:28,334 Exception in callback _SelectorSocketTransport._call_connection_lost(ConnectionRes..., 10054, None))
handle: <Handle _SelectorSocketTransport._call_connection_lost(ConnectionRes..., 10054, None))>
Traceback (most recent call last):
File "C:\Users\kamphorstb\AppData\Local\Programs\Python\Python37\lib\asyncio\events.py", line 88, in _run
self._context.run(self._callback, *self._args)
File "C:\Users\kamphorstb\AppData\Local\Programs\Python\Python37\lib\asyncio\selector_events.py", line 926, in _call_connection_lost
super()._call_connection_lost(exc)
File "C:\Users\kamphorstb\AppData\Local\Programs\Python\Python37\lib\asyncio\selector_events.py", line 700, in _call_connection_lost
self._protocol.connection_lost(exc)
File "C:\Users\kamphorstb\git_repositories\CONVINCED\kaplan-meier\.venv\lib\site-packages\mpyc\asyncoro.py", line 125, in connection_lost
raise exc
File "C:\Users\kamphorstb\AppData\Local\Programs\Python\Python37\lib\asyncio\selector_events.py", line 814, in _read_ready__data_received
data = self._sock.recv(self.max_size)
ConnectionResetError: [WinError 10054] An existing connection was forcibly closed by the remote host
Here, the lines a: <mpyc.sectypes.SecFxp32:16 object at 0x000001753828C548> and f: 16 are the output of print statements that I put in mpc._rec(). Players 0 and 1 did not print anything here.
The issue arises in MPyC versions 0.6, 0.6.5, 0.6.7, and 0.6.8 (versions where this used to work!) and for various versions of Python 3. It also arises in a Docker environment (kind of ruling out firewall issues). My colleague experiences the same issue on a different workstation.
My colleague later noted that a simpler example (which we did not investigate as thoroughly) kept running forever:
from mpyc.runtime import mpc

async def main():
    async with mpc:
        secnum = mpc.SecFxp()
        a = secnum(2 if mpc.pid == 0 else None)
        a = mpc.input(a, 0)
        print(await mpc.output(a))
        b = a * a  # simpler than 1 / a
        print(await mpc.output(b))

if __name__ == "__main__":
    mpc.run(main())
I hope that you can reproduce the issue. If you need any further information I’d be happy to provide that.
Issue Analytics
- Created 3 years ago
- Comments: 6 (6 by maintainers)
Top GitHub Comments
Agreed. Will try and include the switch to the new default behavior soon.
Indeed, I had not considered multiple senders. Although this can probably be solved, I agree that things get messier and that this might not be the optimal solution.
I appreciate the elegance of a default check that sets the integral bit to True, but I guess that such an automatic and under-the-hood feature should never cause the issues that I ran into – any user could run into this issue. Explicitly opting for integral=True ensures that the user (to some extent) is aware that something has changed, making it slightly easier to find the cause of the issue (even though it is still hard to identify in the not-so-explicit traceback / infinite loop). Moreover, the novel default behaviour can no longer cause the error. I like your suggestion to have this as the default behaviour.
Of course I cannot quite predict the effects or potential side-effects of making this change as well as you can, but I don’t foresee any functionality issues. Just the efficiency loss in specific protocol implementations that do not explicitly set the integral bit, as you mentioned. In my own work, I don’t expect any change in efficiency as I heavily utilise the support for non-integer secure numbers. Also, you have quite an extensive test suite, so that should give some confidence in the robustness of this change 😃
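To make the integral-bit discussion above concrete, here is a minimal pure-Python sketch of how an automatically inferred flag could leave the parties on different protocol paths. It does not use MPyC. ToyFxp, make, needs_truncation and party_views are hypothetical names, and the rule that a product needs rescaling unless a factor is known to be integral is an assumption of this toy model, not a statement about MPyC's implementation.

```python
from dataclasses import dataclass
from typing import Optional

FRAC_LENGTH = 16  # matches the "f: 16" printed in the report


@dataclass
class ToyFxp:
    """Hypothetical stand-in for a secure fixed-point number (not MPyC's class)."""
    value: Optional[int]      # scaled integer, or None if this party has no input
    integral: Optional[bool]  # True if the number is known to be whole


def make(value=None, integral=None):
    """Infer the integral bit from the value unless the caller sets it explicitly."""
    if value is None:
        # A party without input cannot infer anything from the value.
        return ToyFxp(None, integral)
    if integral is None:
        integral = float(value).is_integer()
    return ToyFxp(round(value * (1 << FRAC_LENGTH)), integral)


def needs_truncation(x, y):
    """Toy rule (assumption): rescale a product unless a factor is known integral."""
    return not (x.integral or y.integral)


def party_views(explicit):
    """Decision each of three parties would take locally for a * a."""
    views = []
    for pid in range(3):
        a = make(2 if pid == 0 else None, integral=explicit)
        views.append(needs_truncation(a, a))
    return views


inferred = party_views(None)   # flag inferred from the value: parties disagree
explicit = party_views(False)  # flag set explicitly: all parties agree
print(inferred, explicit)
```

In this toy model, only the party that holds the value 2 infers an integral flag, so it skips a step that the other two parties perform. Setting the flag explicitly on every party removes the disagreement, which is the motivation for the opt-in default discussed above.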