Decompression APIs¶
ZstdDecompressor¶
-
class
zstandard.ZstdDecompressor(dict_data=None, max_window_size=0, format=0)¶ Context for performing zstandard decompression.
Each instance is essentially a wrapper around a ZSTD_DCtx from zstd’s C API.
An instance can decompress data in various ways. Instances can be used multiple times.
The interface of this class is very similar to zstandard.ZstdCompressor (by design).
Unless specified otherwise, assume that no two methods of ZstdDecompressor instances can be called from multiple Python threads simultaneously. In other words, assume instances are not thread safe unless stated otherwise.
Parameters: - dict_data – Compression dictionary to use.
- max_window_size – Sets an upper limit on the window size for decompression operations in kibibytes. This setting can be used to prevent large memory allocations for inputs using large compression windows.
- format – Set the format of data for the decoder. By default this is zstandard.FORMAT_ZSTD1. It can be set to zstandard.FORMAT_ZSTD1_MAGICLESS to allow decoding frames without the 4 byte magic header. Not all decompression APIs support this mode.
-
copy_stream(ifh, ofh, read_size=131075, write_size=131072)¶ Copy data between streams, decompressing in the process.
Compressed data will be read from ifh, decompressed, and written to ofh.
>>> dctx = zstandard.ZstdDecompressor()
>>> dctx.copy_stream(ifh, ofh)
e.g. to decompress a file to another file:
>>> dctx = zstandard.ZstdDecompressor()
>>> with open(input_path, 'rb') as ifh, open(output_path, 'wb') as ofh:
...     dctx.copy_stream(ifh, ofh)
The size of chunks being read() and write() from and to the streams can be specified:
>>> dctx = zstandard.ZstdDecompressor()
>>> dctx.copy_stream(ifh, ofh, read_size=8192, write_size=16384)
Parameters: - ifh – Source stream to read compressed data from. Must have a read() method.
- ofh – Destination stream to write uncompressed data to. Must have a write() method.
- read_size – The number of bytes to read() from the source in a single operation.
- write_size – The number of bytes to write() to the destination in a single operation.
Returns: 2-tuple of integers representing the number of bytes read and written, respectively.
-
decompress(data, max_output_size=0)¶ Decompress data in its entirety in a single operation.
This method will decompress the entirety of the argument and return the result.
The input bytes are expected to contain a full Zstandard frame (something compressed with ZstdCompressor.compress() or similar). If the input does not contain a full frame, an exception will be raised.
If the frame header of the compressed data does not contain the content size, max_output_size must be specified or ZstdError will be raised. An allocation of size max_output_size will be performed and an attempt will be made to perform decompression into that buffer. If the buffer is too small or cannot be allocated, ZstdError will be raised. The buffer will be resized if it is too large.
Uncompressed data could be much larger than compressed data. As a result, calling this function could result in a very large memory allocation being performed to hold the uncompressed data. This could potentially result in MemoryError or system memory swapping. Therefore it is highly recommended to use a streaming decompression method instead of this one.
>>> dctx = zstandard.ZstdDecompressor()
>>> decompressed = dctx.decompress(data)
If the compressed data doesn’t have its content size embedded within it, decompression can be attempted by specifying the max_output_size argument:
>>> dctx = zstandard.ZstdDecompressor()
>>> uncompressed = dctx.decompress(data, max_output_size=1048576)
Ideally, max_output_size will be identical to the decompressed output size.
Important
If the exact size of decompressed data is unknown (not passed in explicitly and not stored in the zstd frame), for performance reasons it is encouraged to use a streaming API.
Parameters: - data – Compressed data to decompress.
- max_output_size – Integer max size of response. If 0, no explicit limit is applied and the output buffer is sized from the content size recorded in the frame header, however large that is.
Returns: bytes representing decompressed output.
-
decompress_content_dict_chain(frames)¶ Decompress a series of frames using the content dictionary chaining technique.
Such a list of frames is produced by compressing discrete inputs where each non-initial input is compressed with a prefix dictionary consisting of the content of the previous input.
For example, say you have the following inputs:
>>> inputs = [b"input 1", b"input 2", b"input 3"]
The zstd frame chain consists of:
- b"input 1" compressed in standalone/discrete mode
- b"input 2" compressed using b"input 1" as a prefix dictionary
- b"input 3" compressed using b"input 2" as a prefix dictionary
Each zstd frame must have the content size written.
The following Python code can be used to produce a prefix dictionary chain:
>>> def make_chain(inputs):
...     frames = []
...
...     # First frame is compressed in standalone/discrete mode.
...     zctx = zstandard.ZstdCompressor()
...     frames.append(zctx.compress(inputs[0]))
...
...     # Subsequent frames use the previous fulltext as a prefix dictionary
...     for i, raw in enumerate(inputs[1:]):
...         dict_data = zstandard.ZstdCompressionDict(
...             inputs[i], dict_type=zstandard.DICT_TYPE_RAWCONTENT)
...         zctx = zstandard.ZstdCompressor(dict_data=dict_data)
...         frames.append(zctx.compress(raw))
...
...     return frames
decompress_content_dict_chain() returns the uncompressed data of the last element in the input chain.
Note
It is possible to implement prefix dictionary chain decompression on top of other APIs. However, this function will likely be faster - especially for long input chains - as it avoids the overhead of instantiating and passing around intermediate objects between multiple functions.
Parameters: frames – List of bytes holding compressed zstd frames.
Returns: bytes holding the uncompressed data of the last frame in the chain.
-
decompressobj(write_size=131072)¶ Obtain a standard library compatible incremental decompressor.
See ZstdDecompressionObj for more documentation and usage examples.
Parameters: write_size – Size of the temporary output chunks that decompress() fills before concatenating them.
Returns: zstandard.ZstdDecompressionObj
-
memory_size()¶ Size of decompression context, in bytes.
>>> dctx = zstandard.ZstdDecompressor()
>>> size = dctx.memory_size()
-
multi_decompress_to_buffer(frames, decompressed_sizes=None, threads=0)¶ Decompress multiple zstd frames to output buffers as a single operation.
(Experimental. Not available in CFFI backend.)
Compressed frames can be passed to the function as a BufferWithSegments, a BufferWithSegmentsCollection, or as a list containing objects that conform to the buffer protocol. For best performance, pass a BufferWithSegmentsCollection or a BufferWithSegments, as minimal input validation will be done for those types. If calling from Python (as opposed to C), constructing one of these instances may add overhead that cancels out the savings from skipping the validation performed on list inputs.
Returns a BufferWithSegmentsCollection containing the decompressed data. All decompressed data is allocated in a single memory buffer. The BufferWithSegments instance tracks which objects are at which offsets and their respective lengths.
>>> dctx = zstandard.ZstdDecompressor()
>>> results = dctx.multi_decompress_to_buffer([b'...', b'...'])
The decompressed size of each frame MUST be discoverable. It can either be embedded within the zstd frame or passed in via the decompressed_sizes argument.
The decompressed_sizes argument is an object conforming to the buffer protocol which holds an array of 64-bit unsigned integers in the machine’s native format defining the decompressed sizes of each frame. If this argument is passed, it avoids having to scan each frame for its decompressed size. This frame scanning can add noticeable overhead in some scenarios.
>>> frames = [...]
>>> sizes = struct.pack('=QQQQ', len0, len1, len2, len3)
>>>
>>> dctx = zstandard.ZstdDecompressor()
>>> results = dctx.multi_decompress_to_buffer(frames, decompressed_sizes=sizes)
Note
It is possible to pass a mmap.mmap() instance into this function by wrapping it with a BufferWithSegments instance (which will define the offsets of frames within the memory mapped region).
This function is logically equivalent to performing ZstdDecompressor.decompress() on each input frame and returning the result.
This function exists to perform decompression on multiple frames as fast as possible by having as little overhead as possible. Since decompression is performed as a single operation and since the decompressed output is stored in a single buffer, extra memory allocations, Python objects, and Python function calls are avoided. This is ideal for scenarios where callers know up front that they need to access data for multiple frames, such as when delta chains are being used.
Currently, the implementation always spawns multiple threads when requested, even if the amount of work to do is small. In the future, it will be smarter about avoiding threads and their associated overhead when the amount of work to do is small.
Parameters: - frames – Source defining zstd frames to decompress.
- decompressed_sizes – Array of integers representing sizes of decompressed zstd frames.
- threads – How many threads to use for decompression operations. Negative values will use the same number of threads as logical CPUs on the machine. Values 0 or 1 use a single thread.
Returns: BufferWithSegmentsCollection
-
read_to_iter(reader, read_size=131075, write_size=131072, skip_bytes=0)¶ Read compressed data to an iterator of uncompressed chunks.
This method will read data from reader, feed it to a decompressor, and emit bytes chunks representing the decompressed result.
>>> dctx = zstandard.ZstdDecompressor()
>>> for chunk in dctx.read_to_iter(fh):
...     # Do something with original data.
...     pass
read_to_iter() accepts an object with a read(size) method that will return compressed bytes or an object conforming to the buffer protocol.
read_to_iter() returns an iterator whose elements are chunks of the decompressed data.
The size of requested read() from the source can be specified:
>>> dctx = zstandard.ZstdDecompressor()
>>> for chunk in dctx.read_to_iter(fh, read_size=16384):
...     pass
It is also possible to skip leading bytes in the input data:
>>> dctx = zstandard.ZstdDecompressor()
>>> for chunk in dctx.read_to_iter(fh, skip_bytes=1):
...     pass
Tip
Skipping leading bytes is useful if the source data contains extra header data. Traditionally, you would need to create a slice or memoryview of the data you want to decompress. This would create overhead. It is more efficient to pass the offset into this API.
Similarly to ZstdCompressor.read_to_iter(), the consumer of the iterator controls when data is decompressed. If the iterator isn’t consumed, decompression is put on hold.
When read_to_iter() is passed an object conforming to the buffer protocol, the behavior may seem similar to what occurs when the simple decompression API is used. However, this API works when the decompressed size is unknown. Furthermore, if feeding large inputs, the decompressor will work in chunks instead of performing a single operation.
Parameters: - reader – Source of compressed data. Can be any object with a read(size) method or any object conforming to the buffer protocol.
- read_size – Integer size of data chunks to read from reader and feed into the decompressor.
- write_size – Integer size of data chunks to emit from iterator.
- skip_bytes – Integer number of bytes to skip over before sending data into the decompressor.
Returns: Iterator of bytes representing uncompressed data.
-
stream_reader(source, read_size=131075, read_across_frames=False, closefd=True)¶ Read-only stream wrapper that performs decompression.
This method obtains an object that conforms to the io.RawIOBase interface and performs transparent decompression via read() operations. Source data is obtained by calling read() on a source stream or from an object implementing the buffer protocol.
See zstandard.ZstdDecompressionReader for more documentation and usage examples.
Parameters: - source – Source of compressed data to decompress. Can be any object with a read(size) method or that conforms to the buffer protocol.
- read_size – Integer number of bytes to read from the source and feed into the decompressor at a time.
- read_across_frames – Whether to read data across multiple zstd frames. If False, decompression is stopped at frame boundaries.
- closefd – Whether to close the source stream when this instance is closed.
Returns: zstandard.ZstdDecompressionReader
-
stream_writer(writer, write_size=131072, write_return_read=True, closefd=True)¶ Push-based stream wrapper that performs decompression.
This method constructs a stream wrapper that conforms to the io.RawIOBase interface and performs transparent decompression when writing to the wrapper stream.
See zstandard.ZstdDecompressionWriter for more documentation and usage examples.
Parameters: - writer – Destination for decompressed output. Can be any object with a write(data) method.
- write_size – Integer size of chunks to write() to writer.
- write_return_read – Whether write() should return the number of bytes of input consumed. If False, write() returns the number of bytes sent to the inner stream.
- closefd – Whether to close() the inner stream when this stream is closed.
Returns: zstandard.ZstdDecompressionWriter
ZstdDecompressionWriter¶
-
class
zstandard.ZstdDecompressionWriter(decompressor, writer, write_size, write_return_read, closefd=True)¶ Write-only stream wrapper that performs decompression.
This type provides a writable stream that performs decompression and writes decompressed data to another stream.
This type implements the io.RawIOBase interface. Only methods that involve writing will do useful things.
Behavior is similar to ZstdCompressor.stream_writer(): compressed data is sent to the decompressor by calling write(data) and decompressed output is written to the inner stream by calling its write(data) method:
>>> dctx = zstandard.ZstdDecompressor()
>>> decompressor = dctx.stream_writer(fh)
>>> # Will call fh.write() with uncompressed data.
>>> decompressor.write(compressed_data)
Instances can be used as context managers. However, context managers add no extra special behavior other than automatically calling close() when they exit.
Calling close() will mark the stream as closed and subsequent I/O operations will raise ValueError (per the documented behavior of io.RawIOBase). close() will also call close() on the underlying stream if such a method exists and the instance was created with closefd=True.
The size of chunks to write() to the destination can be specified:
>>> dctx = zstandard.ZstdDecompressor()
>>> with dctx.stream_writer(fh, write_size=16384) as decompressor:
...     pass
You can see how much memory is being used by the decompressor:
>>> dctx = zstandard.ZstdDecompressor()
>>> with dctx.stream_writer(fh) as decompressor:
...     byte_size = decompressor.memory_size()
stream_writer() accepts a write_return_read boolean argument to control the return value of write(). When True (the default), write() returns the number of bytes that were read from the input. When False, write() returns the number of bytes that were written to the inner stream.
-
close()¶
-
closed¶
-
fileno()¶
-
flush()¶
-
isatty()¶
-
memory_size()¶
-
read(size=-1)¶
-
readable()¶
-
readall()¶
-
readinto(b)¶
-
readline(size=-1)¶
-
readlines(hint=-1)¶
-
seek(offset, whence=None)¶
-
seekable()¶
-
tell()¶
-
truncate(size=None)¶
-
writable()¶
-
write(data)¶
-
writelines(lines)¶
-
ZstdDecompressionReader¶
-
class
zstandard.ZstdDecompressionReader(decompressor, source, read_size, read_across_frames, closefd=True)¶ Read-only decompressor that pulls uncompressed data from another stream.
This type provides a read-only stream interface for performing transparent decompression from another stream or data source. It conforms to the io.RawIOBase interface. Only methods relevant to reading are implemented.
>>> with open(path, 'rb') as fh:
...     dctx = zstandard.ZstdDecompressor()
...     reader = dctx.stream_reader(fh)
...     while True:
...         chunk = reader.read(16384)
...         if not chunk:
...             break
...         # Do something with decompressed chunk.
The stream can also be used as a context manager:
>>> with open(path, 'rb') as fh:
...     dctx = zstandard.ZstdDecompressor()
...     with dctx.stream_reader(fh) as reader:
...         ...
When used as a context manager, the stream is closed and the underlying resources are released when the context manager exits. Future operations against the stream will fail.
The source argument to stream_reader() can be any object with a read(size) method or any object implementing the buffer protocol.
If the source is a stream, you can specify how large read() requests to that stream should be via the read_size argument. It defaults to zstandard.DECOMPRESSION_RECOMMENDED_INPUT_SIZE:
>>> with open(path, 'rb') as fh:
...     dctx = zstandard.ZstdDecompressor()
...     # Will perform fh.read(8192) when obtaining data for the decompressor.
...     with dctx.stream_reader(fh, read_size=8192) as reader:
...         ...
Instances are partially seekable. Absolute and relative positions (SEEK_SET and SEEK_CUR) forward of the current position are allowed. Offsets behind the current read position and offsets relative to the end of stream are not allowed and will raise ValueError if attempted.
tell() returns the number of decompressed bytes read so far.
Not all I/O methods are implemented. Notably missing is support for readline(), readlines(), and linewise iteration. This is because streams operate on binary data - not text data. If you want to convert decompressed output to text, you can chain an io.TextIOWrapper to the stream:
>>> with open(path, 'rb') as fh:
...     dctx = zstandard.ZstdDecompressor()
...     stream_reader = dctx.stream_reader(fh)
...     text_stream = io.TextIOWrapper(stream_reader, encoding='utf-8')
...     for line in text_stream:
...         ...
-
close()¶
-
closed¶
-
flush()¶
-
isatty()¶
-
next()¶
-
read(size=-1)¶
-
read1(size=-1)¶
-
readable()¶
-
readall()¶
-
readinto(b)¶
-
readinto1(b)¶
-
readline(size=-1)¶
-
readlines(hint=-1)¶
-
seek(pos, whence=0)¶
-
seekable()¶
-
tell()¶
-
writable()¶
-
write(data)¶
-
writelines(lines)¶
-
ZstdDecompressionObj¶
-
class
zstandard.ZstdDecompressionObj(decompressor, write_size)¶ A standard library API compatible decompressor.
This type implements a decompressor that conforms to the API exposed by other decompressors in Python’s standard library, e.g. zlib.decompressobj or bz2.BZ2Decompressor. This allows callers to use zstd decompression while conforming to a similar API.
Compressed data chunks are fed into decompress(data) and uncompressed output (or an empty bytes) is returned. Output from subsequent calls needs to be concatenated to reassemble the full decompressed byte sequence.
Each instance is single use: once an input frame is decoded, decompress() can no longer be called.
>>> dctx = zstandard.ZstdDecompressor()
>>> dobj = dctx.decompressobj()
>>> data = dobj.decompress(compressed_chunk_0)
>>> data = dobj.decompress(compressed_chunk_1)
By default, calls to decompress() write output data in chunks of size DECOMPRESSION_RECOMMENDED_OUTPUT_SIZE. These chunks are concatenated before being returned to the caller. It is possible to define the size of these temporary chunks by passing write_size to decompressobj():
>>> dctx = zstandard.ZstdDecompressor()
>>> dobj = dctx.decompressobj(write_size=1048576)
Note
Because calls to decompress() may need to perform multiple memory (re)allocations, this streaming decompression API isn’t as efficient as other APIs.
-
decompress(data)¶ Send compressed data to the decompressor and obtain decompressed data.
Parameters: data – Data to feed into the decompressor. Returns: Decompressed bytes.
-
eof¶ Whether the end of the compressed data stream has been reached.
-
flush(length=0)¶ Effectively a no-op.
Implemented for compatibility with the standard library APIs.
Safe to call at any time.
Returns: Empty bytes.
-
unconsumed_tail¶ Data that has not yet been fed into the decompressor.
-
unused_data¶ Bytes past the end of compressed data.
If decompress() is fed additional data beyond the end of a zstd frame, this value will be non-empty once decompress() fully decodes the input frame.
-