Skip to content

feat: add binary row serializers and blob implementation#31

Open
lszskye wants to merge 1 commit into
apache:mainfrom
lszskye:p2-7
Open

feat: add binary row serializers and blob implementation#31
lszskye wants to merge 1 commit into
apache:mainfrom
lszskye:p2-7

Conversation

@lszskye
Copy link
Copy Markdown
Contributor

@lszskye lszskye commented May 29, 2026

Purpose

Introduce serialization infrastructure for Paimon's binary row format, including BinaryRowSerializer, BinarySerializerUtils, RowCompactedSerializer, and the Blob implementation.

Changes

BinaryRowSerializer

  • Serializes/deserializes BinaryRow

BinarySerializerUtils

  • Utility class for converting InternalRow, InternalArray, InternalMap into their binary counterparts (BinaryRow, BinaryArray, BinaryMap)

RowCompactedSerializer

  • Compact serialization format for InternalRow, using bitset-based null tracking and variable-length integer encoding
  • Provides SerializeToBytes() / Deserialize() for row-level serialization

Blob

  • Implementation of Blob class for managing blob descriptors and blob data

Tests

  • BinaryRowSerializerTest
  • BinarySerializerUtilsTest
  • RowCompactedSerializerTest

Copy link
Copy Markdown
Contributor

@leaves12138 leaves12138 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR. I found two blockers that should be fixed before merge: one compile-time missing include and one moved-from unique_ptr dereference risk in Blob construction.

#include "paimon/common/memory/memory_segment.h"
#include "paimon/common/memory/memory_segment_utils.h"
#include "paimon/common/memory/memory_slice.h"
#include "paimon/common/utils/var_length_int_utils.h"
Copy link
Copy Markdown
Contributor

@leaves12138 leaves12138 May 29, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This header is not present in the PR head or the current base branch, so including paimon/common/utils/var_length_int_utils.h makes this code fail to compile. Please add the missing utility header/implementation or switch to an existing varint helper.

PAIMON_ASSIGN_OR_RAISE(std::string normalized_path, PathUtil::NormalizePath(path));
PAIMON_ASSIGN_OR_RAISE(std::unique_ptr<BlobDescriptor> descriptor,
BlobDescriptor::Create(normalized_path, offset, length));
auto impl = std::make_unique<Blob::Impl>(std::move(descriptor), descriptor->Uri());
Copy link
Copy Markdown
Contributor

@leaves12138 leaves12138 May 29, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This passes std::move(descriptor) and descriptor->Uri() in the same function call. The argument evaluation order is unspecified, so descriptor->Uri() may run after the unique_ptr has been moved from and become null. Please capture the URI before moving descriptor; the same pattern below in FromDescriptor needs the same fix.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants