
gh-150583: Fix zstd compression with digested ZstdDict#150586

Open: Rogdham wants to merge 1 commit into python:main from Rogdham:fix-zst-non-deterministic-compression-with-digested-zstddict

Rogdham commented May 29, 2026

Changes

Explicitly set compression_level to ZSTD_CLEVEL_DEFAULT when creating the ZstdCompressor object.

Question for reviewers: should I use _zstd_set_c_level(ZSTD_CLEVEL_DEFAULT) instead?

See reproducer in #150583.

Analysis

When a ZstdCompressor object is created (_zstd_ZstdCompressor_new_impl) and no level is passed to the constructor, the compression_level field is never initialized. As a result, it keeps whatever value was previously in that memory, which is not deterministic.

Then, when a dictionary is passed, it is loaded into the compression context with _zstd_load_c_dict, which in turn calls _zstd_load_impl.

In the case of a digested dictionary, _get_CDict(zd, self->compression_level) is called, which in turn calls ZSTD_createCDict(..., compressionLevel).

So the uninitialized, arbitrary compression level is passed to that libzstd call.

Bug impact

Prerequisites:

  • Compress data with compression.zstd
  • Do not specify a compression level (neither with level nor through options)
  • Pass a digested dictionary with zstd_dict (e.g. with ZstdDict(...).as_digested_dict)

Consequence: the compression level used is arbitrary, and may change from one run to another.

Misc

Backport suggestion: 3.15 and 3.14.

AI disclosure: the analysis was performed with the help of an LLM.

@emmatyping 👋
