Skip to content
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
15 changes: 15 additions & 0 deletions src/iceberg/test/truncate_util_test.cc
Original file line number Diff line number Diff line change
Expand Up @@ -19,6 +19,8 @@

#include "iceberg/util/truncate_util.h"

#include <limits>

#include <gtest/gtest.h>

#include "iceberg/expression/literal.h"
Expand Down Expand Up @@ -51,6 +53,19 @@ TEST(TruncateUtilTest, TruncateLiteral) {
Literal::Binary(std::vector<uint8_t>(expected.begin(), expected.end())));
}

TEST(TruncateUtilTest, TruncateLiteralAvoidsSignedIntegerOverflow) {
EXPECT_EQ(TruncateUtils::TruncateLiteral(
Literal::Int(std::numeric_limits<int32_t>::max() - 1),
std::numeric_limits<int32_t>::max()),
Literal::Int(0));
EXPECT_EQ(TruncateUtils::TruncateLiteral(
Literal::Int(std::numeric_limits<int32_t>::min()), 10),
Literal::Int(std::numeric_limits<int32_t>::max() - 1));
EXPECT_EQ(TruncateUtils::TruncateLiteral(
Literal::Long(std::numeric_limits<int64_t>::min()), 10),
Literal::Long(std::numeric_limits<int64_t>::max() - 1));
}

TEST(TruncateUtilTest, TruncateLiteralRejectsInvalidWidth) {
std::vector<uint8_t> data{1, 2, 3};

Expand Down
12 changes: 11 additions & 1 deletion src/iceberg/util/truncate_util.h
Original file line number Diff line number Diff line change
Expand Up @@ -22,11 +22,13 @@
#include <cstdint>
#include <optional>
#include <string>
#include <type_traits>
#include <utility>

#include "iceberg/iceberg_export.h"
#include "iceberg/result.h"
#include "iceberg/type_fwd.h"
#include "iceberg/util/int128.h"

namespace iceberg {

Expand Down Expand Up @@ -84,7 +86,15 @@ class ICEBERG_EXPORT TruncateUtils {
template <typename T>
requires std::is_same_v<T, int32_t> || std::is_same_v<T, int64_t>
static inline T TruncateInteger(T v, int32_t W) {
return v - (((v % W) + W) % W);

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was from the spec:

The remainder, v % W, must be positive. For languages where % can produce negative values, the correct truncate function is: v - (((v % W) + W) % W)

I don't think we need to use int128_t since W is alway a int32_t and greater than 0.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, the formula is from the spec. The issue is not the spec math, but evaluating it with signed C++ integer types. For int32_t, (v % W) + W can overflow when W is large. For int64_t, the final v - remainder can overflow at INT64_MIN. The wider type keeps the spec formula while avoiding signed UB.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems deviate from the Java implementation with some very large values. Perhaps it is worth raising a discussion in the dev@iceberg.apache.org?

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried this on Compiler Explorer. For W > INT32_MAX / 2, (v % W) + W can overflow, causing the remainder to become negative, which violates the spec that the remainder must be positive.

For values close to std::numeric_limits<int64_t>::min(), our current implementation produces the same result as the modified version, but it triggers

runtime error: signed integer overflow

when compiled with -fsanitize=undefined.

I think Java has a similar issue with the first int32_t overflow case due to wraparound arithmetic, no idea if it suffers from the signed integer overflow problem that C++ has(maybe not).

If we change this, I'd suggest keep v - (((v % W) + W) % W) as a comment and clarify why the current form.

I agree this is worth discussing on the dev mailing list.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not very experienced about this process, how do we proceed?

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

using WideInt = std::conditional_t<std::is_same_v<T, int32_t>, int64_t, int128_t>;
using UnsignedT = std::make_unsigned_t<T>;

const auto value = static_cast<WideInt>(v);
const auto width = static_cast<WideInt>(W);
const auto remainder = ((value % width) + width) % width;
const auto truncated = value - remainder;
// Preserve modulo conversion when the mathematical result is below the signed range.
return static_cast<T>(static_cast<UnsignedT>(truncated));
}

/// \brief Truncate a Decimal to a specified width.
Expand Down
Loading