Decompression APIs

ZstdDecompressor

class zstandard.ZstdDecompressor(dict_data=None, max_window_size=0, format=0)

   Context for performing zstandard decompression.

   Each instance is essentially a wrapper around a ZSTD_DCtx from zstd's
   C API.

   An instance can decompress data in various ways. Instances can be used
   multiple times.

   The interface of this class is very similar to zstandard.ZstdCompressor
   (by design).

   Assume that each ZstdDecompressor instance can only handle a single
   logical decompression operation at a time. i.e. if you call a method
   like decompressobj() to obtain multiple objects derived from the same
   ZstdDecompressor instance and attempt to use them simultaneously,
   errors will likely occur.

   If you need to perform multiple logical decompression operations and
   you can't guarantee those operations are temporally non-overlapping,
   you need to obtain multiple ZstdDecompressor instances.

   Unless specified otherwise, assume that no two methods of
   ZstdDecompressor instances can be called from multiple Python threads
   simultaneously. In other words, assume instances are not thread safe
   unless stated otherwise.

   Parameters:
   - dict_data – Compression dictionary to use.
   - max_window_size – Sets an upper limit on the window size for
     decompression operations in kibibytes. This setting can be used to
     prevent large memory allocations for inputs using large compression
     windows.
   - format – Set the format of data for the decoder. By default this is
     zstandard.FORMAT_ZSTD1. It can be set to
     zstandard.FORMAT_ZSTD1_MAGICLESS to allow decoding frames without
     the 4 byte magic header. Not all decompression APIs support this
     mode.
   copy_stream(ifh, ofh, read_size=131075, write_size=131072)

      Copy data between streams, decompressing in the process.

      Compressed data will be read from ifh, decompressed, and written to
      ofh:

      >>> dctx = zstandard.ZstdDecompressor()
      >>> dctx.copy_stream(ifh, ofh)

      e.g. to decompress a file to another file:

      >>> dctx = zstandard.ZstdDecompressor()
      >>> with open(input_path, 'rb') as ifh, open(output_path, 'wb') as ofh:
      ...     dctx.copy_stream(ifh, ofh)

      The size of chunks being read() and write() from and to the streams
      can be specified:

      >>> dctx = zstandard.ZstdDecompressor()
      >>> dctx.copy_stream(ifh, ofh, read_size=8192, write_size=16384)

      Parameters:
      - ifh – Source stream to read compressed data from. Must have a
        read() method.
      - ofh – Destination stream to write uncompressed data to. Must have
        a write() method.
      - read_size – The number of bytes to read() from the source in a
        single operation.
      - write_size – The number of bytes to write() to the destination in
        a single operation.

      Returns:
      2-tuple of integers representing the number of bytes read and
      written, respectively.
   decompress(data, max_output_size=0, read_across_frames=False, allow_extra_data=True)

      Decompress data in a single operation.

      This method will decompress the input data in a single operation and
      return the decompressed data.

      The input bytes are expected to contain at least 1 full zstandard
      frame (something compressed with ZstdCompressor.compress() or
      similar). If the input does not contain a full frame, an exception
      will be raised.

      read_across_frames controls whether to read multiple zstandard
      frames in the input. When False, decompression stops after reading
      the first frame. This feature is not yet implemented but the
      argument is provided for forward API compatibility when the default
      is changed to True in a future release. For now, if you need to
      decompress multiple frames, use an API like
      ZstdDecompressor.stream_reader() with read_across_frames=True.

      allow_extra_data controls how to handle extra input data after a
      fully decoded frame. If False, any extra data (which could be a
      valid zstd frame) will result in ZstdError being raised. If True,
      extra data is silently ignored. The default will likely change to
      False in a future release when read_across_frames defaults to True.

      If the frame header of the compressed data does not contain the
      content size, max_output_size must be specified or ZstdError will be
      raised. An allocation of size max_output_size will be performed and
      an attempt will be made to perform decompression into that buffer.
      If the buffer is too small or cannot be allocated, ZstdError will be
      raised. The buffer will be resized if it is too large.

      Uncompressed data could be much larger than compressed data. As a
      result, calling this function could result in a very large memory
      allocation being performed to hold the uncompressed data. This could
      potentially result in MemoryError or system memory swapping. If you
      don't need the full output data in a single contiguous array in
      memory, consider using streaming decompression for more resilient
      memory behavior.

      Usage:

      >>> dctx = zstandard.ZstdDecompressor()
      >>> decompressed = dctx.decompress(data)

      If the compressed data doesn't have its content size embedded within
      it, decompression can be attempted by specifying the max_output_size
      argument:

      >>> dctx = zstandard.ZstdDecompressor()
      >>> uncompressed = dctx.decompress(data, max_output_size=1048576)

      Ideally, max_output_size will be identical to the decompressed
      output size.

      Important
      If the exact size of decompressed data is unknown (not passed in
      explicitly and not stored in the zstd frame), for performance
      reasons it is encouraged to use a streaming API.

      Parameters:
      - data – Compressed data to decompress.
      - max_output_size – Integer max size of response. If 0, there is no
        limit and we can attempt to allocate an output buffer of infinite
        size.

      Returns:
      bytes representing decompressed output.
   decompress_content_dict_chain(frames)

      Decompress a series of frames using the content dictionary chaining
      technique.

      Such a list of frames is produced by compressing discrete inputs
      where each non-initial input is compressed with a prefix dictionary
      consisting of the content of the previous input.

      For example, say you have the following inputs:

      >>> inputs = [b"input 1", b"input 2", b"input 3"]

      The zstd frame chain consists of:

      1. b"input 1" compressed in standalone/discrete mode
      2. b"input 2" compressed using b"input 1" as a prefix dictionary
      3. b"input 3" compressed using b"input 2" as a prefix dictionary

      Each zstd frame must have the content size written.

      The following Python code can be used to produce a prefix dictionary
      chain:

      >>> def make_chain(inputs):
      ...     frames = []
      ...
      ...     # First frame is compressed in standalone/discrete mode.
      ...     zctx = zstandard.ZstdCompressor()
      ...     frames.append(zctx.compress(inputs[0]))
      ...
      ...     # Subsequent frames use the previous fulltext as a prefix
      ...     # dictionary.
      ...     for i, raw in enumerate(inputs[1:]):
      ...         dict_data = zstandard.ZstdCompressionDict(
      ...             inputs[i], dict_type=zstandard.DICT_TYPE_RAWCONTENT)
      ...         zctx = zstandard.ZstdCompressor(dict_data=dict_data)
      ...         frames.append(zctx.compress(raw))
      ...
      ...     return frames

      decompress_content_dict_chain() returns the uncompressed data of the
      last element in the input chain.

      Note
      It is possible to implement prefix dictionary chain decompression on
      top of other APIs. However, this function will likely be faster -
      especially for long input chains - as it avoids the overhead of
      instantiating and passing around intermediate objects between
      multiple functions.

      Parameters:
      frames – List of bytes holding compressed zstd frames.

      Returns:
      bytes representing the uncompressed data of the last frame in the
      chain.
   decompressobj(write_size=131072)

      Obtain a standard library compatible incremental decompressor.

      See ZstdDecompressionObj for more documentation and usage examples.

      Parameters:
      write_size – Integer size of the temporary output chunks used when
      decompressing.

      Returns:
      zstandard.ZstdDecompressionObj
   memory_size()

      Size of decompression context, in bytes.

      >>> dctx = zstandard.ZstdDecompressor()
      >>> size = dctx.memory_size()
   multi_decompress_to_buffer(frames, decompressed_sizes=None, threads=0)

      Decompress multiple zstd frames to output buffers as a single
      operation.

      (Experimental. Not available in the CFFI backend.)

      Compressed frames can be passed to the function as a
      BufferWithSegments, a BufferWithSegmentsCollection, or as a list
      containing objects that conform to the buffer protocol. For best
      performance, pass a BufferWithSegmentsCollection or a
      BufferWithSegments, as minimal input validation will be done for
      those types. If calling from Python (as opposed to C), constructing
      one of these instances may add overhead, cancelling out the
      performance gain of reduced validation relative to list inputs.

      Returns a BufferWithSegmentsCollection containing the decompressed
      data. All decompressed data is allocated in a single memory buffer.
      The BufferWithSegments instances track which objects are at which
      offsets and their respective lengths.

      >>> dctx = zstandard.ZstdDecompressor()
      >>> results = dctx.multi_decompress_to_buffer([b'...', b'...'])

      The decompressed size of each frame MUST be discoverable. It can
      either be embedded within the zstd frame or passed in via the
      decompressed_sizes argument.

      The decompressed_sizes argument is an object conforming to the
      buffer protocol which holds an array of 64-bit unsigned integers in
      the machine's native format, defining the decompressed sizes of each
      frame. If this argument is passed, it avoids having to scan each
      frame for its decompressed size. This frame scanning can add
      noticeable overhead in some scenarios.

      >>> frames = [...]
      >>> sizes = struct.pack('=QQQQ', len0, len1, len2, len3)
      >>>
      >>> dctx = zstandard.ZstdDecompressor()
      >>> results = dctx.multi_decompress_to_buffer(frames, decompressed_sizes=sizes)

      Note
      It is possible to pass a mmap.mmap() instance into this function by
      wrapping it with a BufferWithSegments instance (which will define
      the offsets of frames within the memory mapped region).

      This function is logically equivalent to performing
      ZstdDecompressor.decompress() on each input frame and returning the
      result.

      This function exists to perform decompression on multiple frames as
      fast as possible by having as little overhead as possible. Since
      decompression is performed as a single operation and since the
      decompressed output is stored in a single buffer, extra memory
      allocations, Python objects, and Python function calls are avoided.
      This is ideal for scenarios where callers know up front that they
      need to access data for multiple frames, such as when delta chains
      are being used.

      Currently, the implementation always spawns multiple threads when
      requested, even if the amount of work to do is small. In the future,
      it will be smarter about avoiding threads and their associated
      overhead when the amount of work to do is small.

      Parameters:
      - frames – Source defining zstd frames to decompress.
      - decompressed_sizes – Array of integers representing sizes of
        decompressed zstd frames.
      - threads – How many threads to use for decompression operations.
        Negative values will use the same number of threads as logical
        CPUs on the machine. Values 0 or 1 use a single thread.

      Returns:
      BufferWithSegmentsCollection
   read_to_iter(reader, read_size=131075, write_size=131072, skip_bytes=0)

      Read compressed data to an iterator of uncompressed chunks.

      This method will read data from reader, feed it to a decompressor,
      and emit bytes chunks representing the decompressed result.

      >>> dctx = zstandard.ZstdDecompressor()
      >>> for chunk in dctx.read_to_iter(fh):
      ...     # Do something with original data.

      read_to_iter() accepts an object with a read(size) method that will
      return compressed bytes or an object conforming to the buffer
      protocol.

      read_to_iter() returns an iterator whose elements are chunks of the
      decompressed data.

      The size of requested read() from the source can be specified:

      >>> dctx = zstandard.ZstdDecompressor()
      >>> for chunk in dctx.read_to_iter(fh, read_size=16384):
      ...     pass

      It is also possible to skip leading bytes in the input data:

      >>> dctx = zstandard.ZstdDecompressor()
      >>> for chunk in dctx.read_to_iter(fh, skip_bytes=1):
      ...     pass

      Tip
      Skipping leading bytes is useful if the source data contains extra
      header data. Traditionally, you would need to create a slice or
      memoryview of the data you want to decompress. This would create
      overhead. It is more efficient to pass the offset into this API.

      Similarly to ZstdCompressor.read_to_iter(), the consumer of the
      iterator controls when data is decompressed. If the iterator isn't
      consumed, decompression is put on hold.

      When read_to_iter() is passed an object conforming to the buffer
      protocol, the behavior may seem similar to what occurs when the
      simple decompression API is used. However, this API works when the
      decompressed size is unknown. Furthermore, if feeding large inputs,
      the decompressor will work in chunks instead of performing a single
      operation.

      Parameters:
      - reader – Source of compressed data. Can be any object with a
        read(size) method or any object conforming to the buffer protocol.
      - read_size – Integer size of data chunks to read from reader and
        feed into the decompressor.
      - write_size – Integer size of data chunks to emit from iterator.
      - skip_bytes – Integer number of bytes to skip over before sending
        data into the decompressor.

      Returns:
      Iterator of bytes representing uncompressed data.
   stream_reader(source, read_size=131075, read_across_frames=False, closefd=True)

      Read-only stream wrapper that performs decompression.

      This method obtains an object that conforms to the io.RawIOBase
      interface and performs transparent decompression via read()
      operations. Source data is obtained by calling read() on a source
      stream or object implementing the buffer protocol.

      See zstandard.ZstdDecompressionReader for more documentation and
      usage examples.

      Parameters:
      - source – Source of compressed data to decompress. Can be any
        object with a read(size) method or that conforms to the buffer
        protocol.
      - read_size – Integer number of bytes to read from the source and
        feed into the decompressor at a time.
      - read_across_frames – Whether to read data across multiple zstd
        frames. If False, decompression is stopped at frame boundaries.
      - closefd – Whether to close the source stream when this instance
        is closed.

      Returns:
      zstandard.ZstdDecompressionReader
   stream_writer(writer, write_size=131072, write_return_read=True, closefd=True)

      Push-based stream wrapper that performs decompression.

      This method constructs a stream wrapper that conforms to the
      io.RawIOBase interface and performs transparent decompression when
      writing to the wrapped stream.

      See zstandard.ZstdDecompressionWriter for more documentation and
      usage examples.

      Parameters:
      - writer – Destination for decompressed output. Can be any object
        with a write(data) method.
      - write_size – Integer size of chunks to write() to writer.
      - write_return_read – Whether write() should return the number of
        bytes of input consumed. If False, write() returns the number of
        bytes sent to the inner stream.
      - closefd – Whether to close() the inner stream when this stream is
        closed.

      Returns:
      zstandard.ZstdDecompressionWriter
ZstdDecompressionWriter

class zstandard.ZstdDecompressionWriter(decompressor, writer, write_size, write_return_read, closefd=True)

   Write-only stream wrapper that performs decompression.

   This type provides a writable stream that performs decompression and
   writes decompressed data to another stream.

   This type implements the io.RawIOBase interface. Only methods that
   involve writing will do useful things.

   Behavior is similar to ZstdCompressor.stream_writer(): compressed data
   is sent to the decompressor by calling write(data) and decompressed
   output is written to the inner stream by calling its write(data)
   method:

   >>> dctx = zstandard.ZstdDecompressor()
   >>> decompressor = dctx.stream_writer(fh)
   >>> # Will call fh.write() with uncompressed data.
   >>> decompressor.write(compressed_data)

   Instances can be used as context managers. However, context managers
   add no extra special behavior other than automatically calling close()
   when they exit.

   Calling close() will mark the stream as closed and subsequent I/O
   operations will raise ValueError (per the documented behavior of
   io.RawIOBase). close() will also call close() on the underlying stream
   if such a method exists and the instance was created with closefd=True.

   The size of chunks to write() to the destination can be specified:

   >>> dctx = zstandard.ZstdDecompressor()
   >>> with dctx.stream_writer(fh, write_size=16384) as decompressor:
   ...     pass

   You can see how much memory is being used by the decompressor:

   >>> dctx = zstandard.ZstdDecompressor()
   >>> with dctx.stream_writer(fh) as decompressor:
   ...     byte_size = decompressor.memory_size()

   stream_writer() accepts a write_return_read boolean argument to control
   the return value of write(). When True (the default), write() returns
   the number of bytes that were read from the input. When False, write()
   returns the number of bytes that were written to the inner stream.
   The following methods and attributes are also defined:

   close(), closed, fileno(), flush(), isatty(), memory_size(),
   read(size=-1), readable(), readall(), readinto(b), readline(size=-1),
   readlines(hint=-1), seek(offset, whence=None), seekable(), tell(),
   truncate(size=None), writable(), write(data), writelines(lines)
ZstdDecompressionReader

class zstandard.ZstdDecompressionReader(decompressor, source, read_size, read_across_frames, closefd=True)

   Read-only decompressor that pulls uncompressed data from another
   stream.

   This type provides a read-only stream interface for performing
   transparent decompression from another stream or data source. It
   conforms to the io.RawIOBase interface. Only methods relevant to
   reading are implemented.

   >>> with open(path, 'rb') as fh:
   ...     dctx = zstandard.ZstdDecompressor()
   ...     reader = dctx.stream_reader(fh)
   ...     while True:
   ...         chunk = reader.read(16384)
   ...         if not chunk:
   ...             break
   ...         # Do something with decompressed chunk.

   The stream can also be used as a context manager:

   >>> with open(path, 'rb') as fh:
   ...     dctx = zstandard.ZstdDecompressor()
   ...     with dctx.stream_reader(fh) as reader:
   ...         ...

   When used as a context manager, the stream is closed and the underlying
   resources are released when the context manager exits. Future
   operations against the stream will fail.

   The source argument to stream_reader() can be any object with a
   read(size) method or any object implementing the buffer protocol.

   If the source is a stream, you can specify how large read() requests
   to that stream should be via the read_size argument. It defaults to
   zstandard.DECOMPRESSION_RECOMMENDED_INPUT_SIZE:

   >>> with open(path, 'rb') as fh:
   ...     dctx = zstandard.ZstdDecompressor()
   ...     # Will perform fh.read(8192) when obtaining data for the
   ...     # decompressor.
   ...     with dctx.stream_reader(fh, read_size=8192) as reader:
   ...         ...

   Instances are partially seekable. Absolute and relative positions
   (SEEK_SET and SEEK_CUR) forward of the current position are allowed.
   Offsets behind the current read position and offsets relative to the
   end of stream are not allowed and will raise ValueError if attempted.
   tell() returns the number of decompressed bytes read so far.

   Not all I/O methods are implemented. Notably missing is support for
   readline(), readlines(), and linewise iteration. This is because
   streams operate on binary data - not text data. If you want to convert
   decompressed output to text, you can chain an io.TextIOWrapper to the
   stream:

   >>> with open(path, 'rb') as fh:
   ...     dctx = zstandard.ZstdDecompressor()
   ...     stream_reader = dctx.stream_reader(fh)
   ...     text_stream = io.TextIOWrapper(stream_reader, encoding='utf-8')
   ...     for line in text_stream:
   ...         ...
   The following methods and attributes are also defined:

   close(), closed, flush(), isatty(), next(), read(size=-1),
   read1(size=-1), readable(), readall(), readinto(b), readinto1(b),
   readline(size=-1), readlines(hint=-1), seek(pos, whence=0), seekable(),
   tell(), writable(), write(data), writelines(lines)
ZstdDecompressionObj

class zstandard.ZstdDecompressionObj(decompressor, write_size)

   A standard library API compatible decompressor.

   This type implements a decompressor that conforms to the API provided
   by other decompressors in Python's standard library, e.g.
   zlib.decompressobj or bz2.BZ2Decompressor. This allows callers to use
   zstd decompression while conforming to a similar API.

   Compressed data chunks are fed into decompress(data) and uncompressed
   output (or an empty bytes) is returned. Output from subsequent calls
   needs to be concatenated to reassemble the full decompressed byte
   sequence.

   Each instance is single use: once an input frame is decoded,
   decompress() can no longer be called.

   >>> dctx = zstandard.ZstdDecompressor()
   >>> dobj = dctx.decompressobj()
   >>> data = dobj.decompress(compressed_chunk_0)
   >>> data = dobj.decompress(compressed_chunk_1)

   By default, calls to decompress() write output data in chunks of size
   DECOMPRESSION_RECOMMENDED_OUTPUT_SIZE. These chunks are concatenated
   before being returned to the caller. It is possible to define the size
   of these temporary chunks by passing write_size to decompressobj():

   >>> dctx = zstandard.ZstdDecompressor()
   >>> dobj = dctx.decompressobj(write_size=1048576)

   Note
   Because calls to decompress() may need to perform multiple memory
   (re)allocations, this streaming decompression API isn't as efficient
   as other APIs.
   decompress(data)

      Send compressed data to the decompressor and obtain decompressed
      data.

      Parameters:
      data – Data to feed into the decompressor.

      Returns:
      Decompressed bytes.

   eof

      Whether the end of the compressed data stream has been reached.

   flush(length=0)

      Effectively a no-op.

      Implemented for compatibility with the standard library APIs.

      Safe to call at any time.

      Returns:
      Empty bytes.

   unconsumed_tail

      Data that has not yet been fed into the decompressor.

   unused_data

      Bytes past the end of compressed data.

      If decompress() is fed additional data beyond the end of a zstd
      frame, this value will be non-empty once decompress() fully decodes
      the input frame.