Feature: Multiprocessing-based Backend using SharedMemory + Pickle 5 for 2-3x Faster IPC
Introduction
Hi all. I am writing a custom backend that uses multiprocessing processes, but for IPC it uses Python 3.8's SharedMemory together with out-of-band Pickle protocol 5. I am opening this issue to bring this work to your attention, since its improved IPC performance could benefit you, and to propose potentially merging it as a separate backend.
This new method of IPC is usually 2-3x faster than traditional Pipes.
Explanation
The purpose of SharedMemory (not file-based, but provided by the OS kernel) is that it is a much more efficient mechanism for sharing large amounts of data than a standard Pipe. It is also faster than mmap'd file-based sharing.
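As a minimal illustration (independent of the demo files below), a SharedMemory segment created by one process can be attached by name from another; for brevity this snippet keeps both handles in a single process:

```python
from multiprocessing.shared_memory import SharedMemory

# Create a 1 KiB kernel-backed shared segment and write into it
shm = SharedMemory(create=True, size=1024)
shm.buf[:5] = b"hello"

# A second handle (normally opened in another process) attaches by name
shm2 = SharedMemory(name=shm.name)
read_back = bytes(shm2.buf[:5])
print(read_back)  # b'hello'

# Every handle closes its view; exactly one handle unlinks the segment
shm2.close()
shm.close()
shm.unlink()
```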
The purpose of out-of-band Pickle protocol 5 is that it performs fewer copies of the data during pickling. dumps returns a list of buffers rather than one aggregated data buffer, avoiding the aggregating copy. loads then builds the Python object directly from the buffers without a copy of its own, effectively for free.
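For example (a small sketch, separate from the demo files), NumPy arrays support protocol 5: dumps hands the array's backing buffer to buffer_callback instead of copying it into the pickle stream, and loads rebuilds the array from the buffers supplied to it:

```python
import pickle
import numpy as np

arr = np.arange(10)

# With protocol 5, large contiguous buffers are passed to buffer_callback
# as PickleBuffer objects instead of being copied into the byte stream.
buffers = []
data = pickle.dumps(arr, protocol=5, buffer_callback=buffers.append)

# loads reconstructs the array directly from the supplied buffers.
arr2 = pickle.loads(data, buffers=buffers)
print(np.array_equal(arr, arr2))  # True
```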
Demo
To illustrate the benefits of this type of IPC, here is a simple program comparing SharedMemory + Pickle 5 against Pipes + regular Pickle. Note that the SharedMemory example does not set up a separate process, but the transfer itself would be unaffected.
When transferring a large Python object, in this case big_array = np.arange(5 * 10**7):
SharedMemory + Pickle 5 takes 0.6724 seconds
Pipes + regular Pickle takes 1.6099 seconds
Demo Files
SharedMemory:
from multiprocessing.shared_memory import SharedMemory
import time
import numpy as np
import pickle
import pickle_utils
import copy
def sender(obj):
    # Pickle the object using out-of-band buffers (pickle protocol 5)
    buffers = []
    data = pickle.dumps(
        obj,
        protocol=pickle.HIGHEST_PROTOCOL,
        buffer_callback=lambda b: buffers.append(b.raw()),
    )  # type: ignore
    # Pack the buffers to be written to memory
    data_sz, data_ls = pickle_utils.pack_frames([data] + buffers)
    # Create and write to shared memory
    shared_mem = SharedMemory(create=True, size=data_sz)
    write_offset = 0
    for data in data_ls:
        write_end = write_offset + len(data)
        shared_mem.buf[write_offset:write_end] = data  # type: ignore
        write_offset = write_end
    # Clean up
    shared_mem.close()
    return shared_mem.name, data_sz

def receiver(shared_mem_name, data_sz):
    # Read the shared memory
    shared_mem = SharedMemory(name=shared_mem_name)
    data = shared_mem.buf[:data_sz]
    # Unpack and un-pickle the data buffers
    buffers = pickle_utils.unpack_frames(data)
    obj = pickle.loads(buffers[0], buffers=buffers[1:])  # type: ignore
    # Bring the `obj` out of shared memory
    ret = copy.deepcopy(obj)
    # Clean up: drop all views into the segment before closing it
    del data
    del buffers
    del obj
    shared_mem.close()
    shared_mem.unlink()
    return ret

start_time = time.time()
# Our big python data object
big_array = np.arange(5 * 10**7)
shared_mem_name, data_sz = sender(big_array)
obj = receiver(shared_mem_name, data_sz)
print("--- Total %s seconds ---" % (time.time() - start_time))
print(obj)  # [ 0 1 2 ... 49999997 49999998 49999999]
Pipes:
from multiprocessing import Process, Pipe
import time
import numpy as np
def sender(send_conn):
    # Our big python data object
    big_array = np.arange(5 * 10**7)
    send_conn.send(big_array)
    send_conn.close()

def receiver(recv_conn):
    obj = recv_conn.recv()
    recv_conn.close()
    return obj

recv_conn, send_conn = Pipe(duplex=False)
start_time = time.time()
p = Process(target=sender, args=(send_conn,))
p.start()
obj = receiver(recv_conn)
p.join()
print("--- Total %s seconds ---" % (time.time() - start_time))
print(obj)  # [ 0 1 2 ... 49999997 49999998 49999999]
Issue Analytics
- State:
- Created 3 years ago
- Reactions: 8
- Comments: 5
Top GitHub Comments
You are all in luck:
@DamianBarabonkovQC could you add the function pickle_utils.pack_frames so that I am able to reproduce the provided code? I cannot find a library pickle_utils which provides the function pack_frames.
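pickle_utils does not appear to be a published package, so the following is only a guess at what pack_frames / unpack_frames might look like, assuming a simple 8-byte length-prefixed framing consistent with how the demo uses them: pack returns a total size plus the chunks to write sequentially, and unpack walks a buffer of exactly that size:

```python
import struct

def pack_frames(frames):
    # Hypothetical reimplementation: prefix each frame with its
    # little-endian 8-byte length. Returns (total_size, chunks) so the
    # caller can size a SharedMemory segment and write sequentially.
    chunks = []
    total = 0
    for frame in frames:
        header = struct.pack("<Q", len(frame))
        chunks.append(header)
        chunks.append(frame)
        total += len(header) + len(frame)
    return total, chunks

def unpack_frames(data):
    # Walk the length-prefixed stream, returning zero-copy memoryviews.
    view = memoryview(data)
    frames = []
    offset = 0
    while offset < len(view):
        (n,) = struct.unpack_from("<Q", view, offset)
        offset += 8
        frames.append(view[offset:offset + n])
        offset += n
    return frames

# Round trip
size, chunks = pack_frames([b"header", b"payload"])
restored = [bytes(f) for f in unpack_frames(b"".join(chunks))]
print(size, restored)  # 29 [b'header', b'payload']
```

Because unpack_frames returns memoryviews into the shared buffer, the demo's deepcopy-then-del dance before shared_mem.close() is consistent with this framing.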