@@ -195,34 +195,29 @@ The :mod:`pickle` module provides the following constants:
 The :mod:`pickle` module provides the following functions to make the pickling
 process more convenient:
 
-.. function:: dump(obj, file, protocol=None, \*, fix_imports=True)
+.. function:: dump(obj, file, protocol=None, \*, fix_imports=True, buffer_callback=None)
 
    Write a pickled representation of *obj* to the open :term:`file object` *file*.
    This is equivalent to ``Pickler(file, protocol).dump(obj)``.
 
-   The optional *protocol* argument, an integer, tells the pickler to use
-   the given protocol; supported protocols are 0 to :data:`HIGHEST_PROTOCOL`.
-   If not specified, the default is :data:`DEFAULT_PROTOCOL`. If a negative
-   number is specified, :data:`HIGHEST_PROTOCOL` is selected.
-
-   The *file* argument must have a write() method that accepts a single bytes
-   argument. It can thus be an on-disk file opened for binary writing, an
-   :class:`io.BytesIO` instance, or any other custom object that meets this
-   interface.
+   Arguments *file*, *protocol*, *fix_imports* and *buffer_callback* have
+   the same meaning as in :class:`Pickler`.
 
-   If *fix_imports* is true and *protocol* is less than 3, pickle will try to
-   map the new Python 3 names to the old module names used in Python 2, so
-   that the pickle data stream is readable with Python 2.
+   .. versionchanged:: 3.8
+      The *buffer_callback* argument was added.
 
-.. function:: dumps(obj, protocol=None, \*, fix_imports=True)
+.. function:: dumps(obj, protocol=None, \*, fix_imports=True, buffer_callback=None)
 
    Return the pickled representation of the object as a :class:`bytes` object,
    instead of writing it to a file.
 
-   Arguments *protocol* and *fix_imports* have the same meaning as in
-   :func:`dump`.
+   Arguments *protocol*, *fix_imports* and *buffer_callback* have the same
+   meaning as in :class:`Pickler`.
+
+   .. versionchanged:: 3.8
+      The *buffer_callback* argument was added.
 
-.. function:: load(file, \*, fix_imports=True, encoding="ASCII", errors="strict")
+.. function:: load(file, \*, fix_imports=True, encoding="ASCII", errors="strict", buffers=None)
 
    Read a pickled object representation from the open :term:`file object`
    *file* and return the reconstituted object hierarchy specified therein.
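This is a minimal sketch of what the new *buffer_callback* argument of :func:`dump` and :func:`dumps` changes. It assumes Python 3.8 or later. The same :class:`PickleBuffer` is pickled with and without a callback, and the two stream sizes are compared. The 1 MB ``payload`` is an arbitrary illustrative value. Because ``list.append`` returns None, which is a false value, every buffer passed to it is treated as out-of-band.

```python
import pickle
from pickle import PickleBuffer

# Illustrative payload; any writable buffer-providing object would do.
payload = bytearray(b"x" * 1_000_000)

# No callback: the buffer's bytes are copied into the pickle stream.
in_band = pickle.dumps(PickleBuffer(payload), protocol=5)

# With a callback: list.append returns None (a false value), so each
# PickleBuffer is collected out-of-band and only a marker is written.
buffers = []
out_of_band = pickle.dumps(PickleBuffer(payload), protocol=5,
                           buffer_callback=buffers.append)

print(len(in_band) > len(payload))  # True
print(len(out_of_band) < 100)       # True
print(len(buffers))                 # 1
```

The collected ``buffers`` list now holds the :class:`PickleBuffer` itself. The caller is responsible for delivering it to the receiving side alongside ``out_of_band``.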
@@ -232,24 +227,13 @@ process more convenient:
    protocol argument is needed. Bytes past the pickled object's
    representation are ignored.
 
-   The argument *file* must have two methods, a read() method that takes an
-   integer argument, and a readline() method that requires no arguments. Both
-   methods should return bytes. Thus *file* can be an on-disk file opened for
-   binary reading, an :class:`io.BytesIO` object, or any other custom object
-   that meets this interface.
-
-   Optional keyword arguments are *fix_imports*, *encoding* and *errors*,
-   which are used to control compatibility support for pickle stream generated
-   by Python 2. If *fix_imports* is true, pickle will try to map the old
-   Python 2 names to the new names used in Python 3. The *encoding* and
-   *errors* tell pickle how to decode 8-bit string instances pickled by Python
-   2; these default to 'ASCII' and 'strict', respectively. The *encoding* can
-   be 'bytes' to read these 8-bit string instances as bytes objects.
-   Using ``encoding='latin1'`` is required for unpickling NumPy arrays and
-   instances of :class:`~datetime.datetime`, :class:`~datetime.date` and
-   :class:`~datetime.time` pickled by Python 2.
+   Arguments *file*, *fix_imports*, *encoding*, *errors* and *buffers*
+   have the same meaning as in :class:`Unpickler`.
+
+   .. versionchanged:: 3.8
+      The *buffers* argument was added.
 
-.. function:: loads(bytes_object, \*, fix_imports=True, encoding="ASCII", errors="strict")
+.. function:: loads(bytes_object, \*, fix_imports=True, encoding="ASCII", errors="strict", buffers=None)
 
    Read a pickled object hierarchy from a :class:`bytes` object and return the
    reconstituted object hierarchy specified therein.
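:func:`load` gains the matching *buffers* argument. The sketch below assumes Python 3.8 or later. ``record`` and its fields are made-up illustrative data, and an :class:`io.BytesIO` stands in for an on-disk file. The stream is written with :func:`dump` while the out-of-band buffers are collected, and the same list is then passed back to :func:`load`.

```python
import io
import pickle
from pickle import PickleBuffer

# Hypothetical record mixing ordinary data with a binary field.
record = {
    "name": "frame-0",
    "pixels": PickleBuffer(bytearray(b"\x00\x01\x02\x03")),
}

buffers = []
f = io.BytesIO()  # stands in for a file opened in binary mode
pickle.dump(record, f, protocol=5, buffer_callback=buffers.append)

f.seek(0)
restored = pickle.load(f, buffers=buffers)

print(restored["name"])           # frame-0
print(bytes(restored["pixels"]))  # b'\x00\x01\x02\x03'
```

``restored["pixels"]`` is the very object that was supplied in *buffers*, not a copy of it.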
@@ -258,16 +242,11 @@ process more convenient:
    protocol argument is needed. Bytes past the pickled object's
    representation are ignored.
 
-   Optional keyword arguments are *fix_imports*, *encoding* and *errors*,
-   which are used to control compatibility support for pickle stream generated
-   by Python 2. If *fix_imports* is true, pickle will try to map the old
-   Python 2 names to the new names used in Python 3. The *encoding* and
-   *errors* tell pickle how to decode 8-bit string instances pickled by Python
-   2; these default to 'ASCII' and 'strict', respectively. The *encoding* can
-   be 'bytes' to read these 8-bit string instances as bytes objects.
-   Using ``encoding='latin1'`` is required for unpickling NumPy arrays and
-   instances of :class:`~datetime.datetime`, :class:`~datetime.date` and
-   :class:`~datetime.time` pickled by Python 2.
+   Arguments *fix_imports*, *encoding*, *errors* and *buffers*
+   have the same meaning as in :class:`Unpickler`.
+
+   .. versionchanged:: 3.8
+      The *buffers* argument was added.
 
 
 The :mod:`pickle` module defines three exceptions:
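For :func:`loads`, the pickle stream and its out-of-band buffers must travel together. This sketch assumes Python 3.8 or later, and the ``try_loads`` helper is hypothetical. It shows the successful round trip and two failure modes: passing no *buffers* at all, and passing fewer buffers than the stream references. In CPython both failures are raised as :exc:`UnpicklingError`. The exact error messages are not relied on here.

```python
import pickle
from pickle import PickleBuffer

buffers = []
data = pickle.dumps([PickleBuffer(b"a"), PickleBuffer(b"b")], protocol=5,
                    buffer_callback=buffers.append)

# Round trip: supply the collected buffers in their original order.
restored = pickle.loads(data, buffers=buffers)
print([bytes(x) for x in restored])  # [b'a', b'b']


def try_loads(**kwargs):
    """Hypothetical helper: return the error message, or "ok" on success."""
    try:
        pickle.loads(data, **kwargs)
    except pickle.UnpicklingError as exc:
        return str(exc)
    return "ok"


# Fails: the stream refers to out-of-band data but no buffers were given.
print(try_loads())
# Fails: the stream references two buffers but only one is supplied.
print(try_loads(buffers=buffers[:1]))
```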
@@ -295,10 +274,10 @@ The :mod:`pickle` module defines three exceptions:
    IndexError.
 
 
-The :mod:`pickle` module exports two classes, :class:`Pickler` and
-:class:`Unpickler`:
+The :mod:`pickle` module exports three classes, :class:`Pickler`,
+:class:`Unpickler` and :class:`PickleBuffer`:
 
-.. class:: Pickler(file, protocol=None, \*, fix_imports=True)
+.. class:: Pickler(file, protocol=None, \*, fix_imports=True, buffer_callback=None)
 
    This takes a binary file for writing a pickle data stream.
 
@@ -316,6 +295,17 @@ The :mod:`pickle` module exports two classes, :class:`Pickler` and
    map the new Python 3 names to the old module names used in Python 2, so
    that the pickle data stream is readable with Python 2.
 
+   If *buffer_callback* is None (the default), buffer views are
+   serialized into *file* as part of the pickle stream.
+
+   If *buffer_callback* is not None, then it can be called any number
+   of times with a buffer view. If the callback returns a false value
+   (such as None), the given buffer is out-of-band; otherwise the
+   buffer is serialized in-band, i.e. inside the pickle stream.
+
+   .. versionchanged:: 3.8
+      The *buffer_callback* argument was added.
+
    .. method:: dump(obj)
 
       Write a pickled representation of *obj* to the open file object given in
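The :class:`Pickler` description above says that a true return value from *buffer_callback* keeps a buffer in-band and a false one sends it out-of-band. The sketch below uses that rule to implement a size-based policy. It assumes Python 3.8 or later. The ``THRESHOLD`` value and the ``keep_small_in_band`` function are hypothetical and chosen only for illustration. The sketch also shows that CPython rejects a *buffer_callback* combined with a protocol below 5 by raising :exc:`ValueError`.

```python
import io
import pickle
from pickle import PickleBuffer

THRESHOLD = 1024  # hypothetical cut-off in bytes, for illustration only
out_of_band = []


def keep_small_in_band(buf):
    """Hypothetical policy: small buffers in-band, large ones out-of-band."""
    with memoryview(buf) as m:
        size = m.nbytes
    if size < THRESHOLD:
        return True             # true value: copy into the pickle stream
    out_of_band.append(buf)     # the caller now transports this buffer
    return False                # false value: write only a marker


small = PickleBuffer(bytearray(16))
large = PickleBuffer(bytearray(4096))

f = io.BytesIO()
pickler = pickle.Pickler(f, protocol=5, buffer_callback=keep_small_in_band)
pickler.dump([small, large])

restored = pickle.loads(f.getvalue(), buffers=out_of_band)
print(len(out_of_band))            # 1
print(type(restored[0]).__name__)  # bytearray
print(restored[1] is large)        # True

# buffer_callback is only accepted together with protocol 5 or higher.
try:
    pickle.Pickler(io.BytesIO(), protocol=4, buffer_callback=out_of_band.append)
except ValueError:
    print("buffer_callback rejected for protocol 4")
```

The in-band buffer comes back as a fresh :class:`bytearray`. A read-only source would come back as :class:`bytes` instead. The out-of-band buffer comes back as whatever object the *buffers* iterable yields.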
@@ -379,26 +369,43 @@ The :mod:`pickle` module exports two classes, :class:`Pickler` and
       Use :func:`pickletools.optimize` if you need more compact pickles.
 
 
-.. class:: Unpickler(file, \*, fix_imports=True, encoding="ASCII", errors="strict")
+.. class:: Unpickler(file, \*, fix_imports=True, encoding="ASCII", errors="strict", buffers=None)
 
    This takes a binary file for reading a pickle data stream.
 
    The protocol version of the pickle is detected automatically, so no
    protocol argument is needed.
 
-   The argument *file* must have two methods, a read() method that takes an
-   integer argument, and a readline() method that requires no arguments. Both
-   methods should return bytes. Thus *file* can be an on-disk file object
+   The argument *file* must have three methods, a read() method that takes an
+   integer argument, a readinto() method that takes a buffer argument
+   and a readline() method that requires no arguments, as in the
+   :class:`io.BufferedIOBase` interface. Thus *file* can be an on-disk file
    opened for binary reading, an :class:`io.BytesIO` object, or any other
    custom object that meets this interface.
 
-   Optional keyword arguments are *fix_imports*, *encoding* and *errors*,
-   which are used to control compatibility support for pickle stream generated
-   by Python 2. If *fix_imports* is true, pickle will try to map the old
-   Python 2 names to the new names used in Python 3. The *encoding* and
-   *errors* tell pickle how to decode 8-bit string instances pickled by Python
-   2; these default to 'ASCII' and 'strict', respectively. The *encoding* can
+   The optional arguments *fix_imports*, *encoding* and *errors* are used
+   to control compatibility support for pickle streams generated by Python 2.
+   If *fix_imports* is true, pickle will try to map the old Python 2 names
+   to the new names used in Python 3. The *encoding* and *errors* tell
+   pickle how to decode 8-bit string instances pickled by Python 2;
+   these default to 'ASCII' and 'strict', respectively. The *encoding* can
    be 'bytes' to read these 8-bit string instances as bytes objects.
+   Using ``encoding='latin1'`` is required for unpickling NumPy arrays and
+   instances of :class:`~datetime.datetime`, :class:`~datetime.date` and
+   :class:`~datetime.time` pickled by Python 2.
+
+   If *buffers* is None (the default), then all data necessary for
+   deserialization must be contained in the pickle stream. This means that
+   no buffer was passed out-of-band (for instance, *buffer_callback* was
+   None when calling :class:`Pickler`, :func:`dump` or :func:`dumps`).
+
+   If *buffers* is not None, it should be an iterable of buffer-enabled
+   objects that is consumed each time the pickle stream references
+   an out-of-band buffer view. It should yield the buffers in the order
+   in which they were passed to the *buffer_callback* of a :class:`Pickler`.
+
+   .. versionchanged:: 3.8
+      The *buffers* argument was added.
 
    .. method:: load()
 
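*buffers* may be any iterable of buffer-enabled objects, so the receiving side does not have to hand back the original :class:`PickleBuffer` instances. This sketch assumes Python 3.8 or later. The buffers are copied into plain :class:`bytes` to simulate a transport. A hypothetical generator, ``buffer_source``, then feeds them to an :class:`Unpickler` that reads the stream from an :class:`io.BytesIO` file.

```python
import io
import pickle
from pickle import PickleBuffer

# Sending side: collect the out-of-band buffers.
buffers = []
stream = pickle.dumps([PickleBuffer(b"first"), PickleBuffer(b"second")],
                      protocol=5, buffer_callback=buffers.append)

# Simulated transport: copy each buffer into plain bytes, as if it had
# travelled over a pipe or socket separately from the pickle stream.
received = [bytes(buf) for buf in buffers]


def buffer_source():
    """Hypothetical generator; the Unpickler pulls one item per reference."""
    for chunk in received:
        yield chunk


unpickler = pickle.Unpickler(io.BytesIO(stream), buffers=buffer_source())
result = unpickler.load()
print(result)  # [b'first', b'second']
```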
@@ -428,6 +435,34 @@ The :mod:`pickle` module exports two classes, :class:`Pickler` and
       :ref:`pickle-restrict` for details.
 
 
+.. class:: PickleBuffer(buffer)
+
+   A wrapper for a potentially out-of-band buffer. *buffer* must be a
+   :ref:`buffer-providing <bufferobjects>` object, such as a
+   :term:`bytes-like object` or an N-dimensional array.
+
+   :class:`PickleBuffer` is itself a buffer provider, so it can be passed
+   to other APIs expecting a buffer-providing object, such as
+   :class:`memoryview`.
+
+   :class:`PickleBuffer` objects can only be serialized using pickle
+   protocol 5 or higher. They are eligible for
+   :ref:`out-of-band serialization <pickle-oob>`.
+
+   .. versionadded:: 3.8
+
+   .. method:: raw()
+
+      Return a :class:`memoryview` of the memory area underlying this buffer.
+      The returned object is a one-dimensional, C-contiguous memoryview
+      with format ``B`` (unsigned bytes). :exc:`BufferError` is raised if
+      the buffer is neither C- nor Fortran-contiguous.
+
+   .. method:: release()
+
+      Release the underlying buffer exposed by the PickleBuffer object.
+
+
 .. _pickle-picklable:
 
 What can be pickled and unpickled?
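The :class:`PickleBuffer` methods added in the hunk above can be exercised directly. This sketch assumes Python 3.8 or later, and the data is illustrative. It wraps a :class:`bytearray`, views it through :class:`memoryview` and ``raw()``, and shows that ``raw()`` raises :exc:`BufferError` for a strided, non-contiguous view.

```python
from pickle import PickleBuffer

data = bytearray(b"abcdef")
pb = PickleBuffer(data)

# PickleBuffer is a buffer provider, so memoryview accepts it directly.
with memoryview(pb) as view:
    print(view.readonly)  # False: a bytearray is writable

# raw() exposes the same memory as a flat, C-contiguous view of unsigned bytes.
with pb.raw() as raw:
    print(raw.format, raw.ndim, raw.nbytes)  # B 1 6
    raw[0] = ord("A")  # writes through to the original bytearray

print(data)  # bytearray(b'Abcdef')

# A strided (non-contiguous) view can be wrapped, but raw() refuses it.
strided = PickleBuffer(memoryview(data)[::2])
try:
    strided.raw()
except BufferError:
    print("raw() rejected a non-contiguous buffer")

# Release the underlying buffers once they are no longer needed.
strided.release()
pb.release()
```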
@@ -863,6 +898,128 @@ a given class::
    assert unpickled_class.my_attribute == 1
 
 
+.. _pickle-oob:
+
+Out-of-band Buffers
+-------------------
+
+.. versionadded:: 3.8
+
+In some contexts, the :mod:`pickle` module is used to transfer massive amounts
+of data. Therefore, it can be important to minimize the number of memory
+copies, to preserve performance and limit resource consumption. However,
+normal operation of the :mod:`pickle` module, as it transforms a graph-like
+structure of objects into a sequential stream of bytes, intrinsically
+involves copying data to and from the pickle stream.
+
+This copying can be avoided if both the *provider* (the implementation
+of the object types to be transferred) and the *consumer* (the implementation
+of the communications system) support the out-of-band transfer facilities
+provided by pickle protocol 5 and higher.
+
+Provider API
+^^^^^^^^^^^^
+
+The large data objects to be pickled must implement a :meth:`__reduce_ex__`
+method specialized for protocol 5 and higher, which returns a
+:class:`PickleBuffer` instance (instead of e.g. a :class:`bytes` object)
+for any large data.
+
+A :class:`PickleBuffer` object *signals* that the underlying buffer is
+eligible for out-of-band data transfer. Those objects remain compatible
+with normal usage of the :mod:`pickle` module. However, consumers can also
+opt in to telling :mod:`pickle` that they will handle those buffers
+themselves.
+
+Consumer API
+^^^^^^^^^^^^
+
+A communications system can enable custom handling of the :class:`PickleBuffer`
+objects generated when serializing an object graph.
+
+On the sending side, it needs to pass a *buffer_callback* argument to
+:class:`Pickler` (or to the :func:`dump` or :func:`dumps` function), which
+will be called with each :class:`PickleBuffer` generated while pickling
+the object graph. Buffers accumulated by the *buffer_callback* do not
+have their data copied into the pickle stream; only a cheap marker is
+inserted.
+
+On the receiving side, it needs to pass a *buffers* argument to
+:class:`Unpickler` (or to the :func:`load` or :func:`loads` function),
+which is an iterable of the buffers that were passed to *buffer_callback*.
+That iterable should produce buffers in the same order as they were passed
+to *buffer_callback*. Those buffers will provide the data expected by the
+reconstructors of the objects whose pickling produced the original
+:class:`PickleBuffer` objects.
+
+Between the sending side and the receiving side, the communications system
+is free to implement its own transfer mechanisms for out-of-band buffers.
+Potential optimizations include the use of shared memory or datatype-dependent
+compression.
+
+Example
+^^^^^^^
+
+Here is a trivial example where we implement a :class:`bytearray` subclass
+able to participate in out-of-band buffer pickling::
+
+   import pickle
+   from pickle import PickleBuffer
+
+   class ZeroCopyByteArray(bytearray):
+
+       def __reduce_ex__(self, protocol):
+           if protocol >= 5:
+               return type(self)._reconstruct, (PickleBuffer(self),), None
+           else:
+               # PickleBuffer is forbidden with pickle protocols <= 4.
+               return type(self)._reconstruct, (bytearray(self),)
+
+       @classmethod
+       def _reconstruct(cls, obj):
+           with memoryview(obj) as m:
+               # Get a handle on the original buffer object
+               obj = m.obj
+               if type(obj) is cls:
+                   # Original buffer object is a ZeroCopyByteArray, return it
+                   # as-is.
+                   return obj
+               else:
+                   return cls(obj)
+
+We see that the reconstructor (the ``_reconstruct`` class method) returns
+the buffer's providing object if it has the right type. This is an easy way
+to simulate zero-copy behaviour in this toy example.
+
+On the consumer side, we can pickle those objects the usual way, which,
+when unserialized, will give us a copy of the original object::
+
+   b = ZeroCopyByteArray(b"abc")
+   data = pickle.dumps(b, protocol=5)
+   new_b = pickle.loads(data)
+   print(b == new_b)  # True
+   print(b is new_b)  # False: a copy was made
+
+But if we pass a *buffer_callback* and then give back the accumulated
+buffers when unserializing, we are able to get back the original object::
+
+   b = ZeroCopyByteArray(b"abc")
+   buffers = []
+   data = pickle.dumps(b, protocol=5, buffer_callback=buffers.append)
+   new_b = pickle.loads(data, buffers=buffers)
+   print(b == new_b)  # True
+   print(b is new_b)  # True: no copy was made
+
+This example is limited by the fact that :class:`bytearray` allocates its
+own memory: you cannot create a :class:`bytearray` instance that is backed
+by another object's memory. However, third-party datatypes such as NumPy
+arrays do not have this limitation, and allow use of zero-copy pickling
+(or making as few copies as possible) when transferring between distinct
+processes or systems.
+
+.. seealso:: :pep:`574` -- Pickle protocol 5 with out-of-band data
+
+
 .. _pickle-restrict:
 
 Restricting Globals
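The Consumer API section leaves the transfer mechanism to the communications system. One possible sketch sends the pickle stream and its out-of-band buffers as length-prefixed frames. The wire format and the ``to_frames``, ``encode``, ``decode`` and ``from_frames`` helpers are hypothetical and are not part of :mod:`pickle`. On the receiving side, frames are sliced out of the message as :class:`memoryview` objects, so the buffers handed to :func:`loads` are not copied again.

```python
import pickle
import struct
from pickle import PickleBuffer


def to_frames(obj):
    """Hypothetical helper: [pickle stream, buffer 1, buffer 2, ...]."""
    buffers = []
    stream = pickle.dumps(obj, protocol=5, buffer_callback=buffers.append)
    # raw() gives a flat byte view without copying; it raises BufferError
    # for non-contiguous buffers, which this sketch does not handle.
    return [memoryview(stream)] + [buf.raw() for buf in buffers]


def encode(frames):
    """Hypothetical wire format: frame count, then (length, bytes) pairs."""
    parts = [struct.pack("!I", len(frames))]
    for frame in frames:
        parts.append(struct.pack("!Q", frame.nbytes))
        parts.append(frame)
    return b"".join(parts)  # the one copy a single byte string requires


def decode(message):
    """Slice the frames back out of *message* as zero-copy memoryviews."""
    view = memoryview(message)
    (count,) = struct.unpack_from("!I", view, 0)
    offset, frames = 4, []
    for _ in range(count):
        (size,) = struct.unpack_from("!Q", view, offset)
        offset += 8
        frames.append(view[offset:offset + size])
        offset += size
    return frames


def from_frames(frames):
    """Hypothetical helper: first frame is the stream, the rest are buffers."""
    stream, *buffers = frames
    return pickle.loads(stream, buffers=buffers)


payload = {"id": 7, "blob": PickleBuffer(bytearray(b"spam" * 1000))}
frames = to_frames(payload)
message = encode(frames)  # in practice: written to a socket, pipe, etc.

restored = from_frames(decode(message))
print(restored["id"])                             # 7
print(bytes(restored["blob"]) == b"spam" * 1000)  # True
```

A real transport would typically send each frame separately, for example with scatter/gather I/O, rather than joining them. This avoids the one copy that ``encode`` makes.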