ref(utils): Make ParityChecker print out mismatches in a PII safe way by Christinarlong · Pull Request #116038 · getsentry/sentry

Christinarlong · 2026-05-21T18:00:57Z

Context, I want to use ParityChecker (PC) for programmatically validating webhook payloads so we don't log PII. Currently, PC if it finds differences will output the repr. describe_value instead will give us a "censored" form of what the mismatches were. See the tests in #116040

This is 99% the same as the old ParityChecker but

moves the class to be in utils
Takes in a function parameter (format_value) that when a mismatch is found gets run on the old/new value so the default was repr.

…param Move _ParityChecker and _qualify from testutils/helpers/serializer_parity.py to sentry/utils/payload_comparison.py so it can be used by production code. Changes from the original: - Renamed _ParityChecker -> PayloadComparator, _qualify -> qualify - Added format_value parameter (defaults to repr, preserving current behavior) - Added describe_value() for PII-safe value formatting in production serializer_parity.py now imports from the new location; all existing callers of assert_serializer_parity are unaffected.

Move _ParityChecker and _qualify from testutils/helpers/serializer_parity.py to sentry/utils/payload_comparison.py so it can be used by production code. Changes from the original: - Added format_value parameter (defaults to repr, preserving current behavior) - Added describe_value() for PII-safe value formatting in production serializer_parity.py now imports from the new location; all existing callers of assert_serializer_parity are unaffected.

ParityChecker only recursed into nested structures when known_diffs or unreliable field paths referenced something inside. When known_diffs is empty (production validation case), differing nested dicts were reported as a single flat mismatch instead of drilling into specific fields. Now always recurses into Mapping and list types.

Christinarlong · 2026-05-21T20:20:09Z

+                        )
+                    elif old_item != new_item:
+                        self.mismatches.append(
+                            f"{item_path}: old={self.format_value(old_item)}, new={self.format_value(new_item)}"


This section is new and are to address when the nested dictionary has a mismatch but we those mismatches aren't known before hand (thus not in nested_diffs or nested_unreliable)

elif isinstance(old_val, Mapping) and isinstance(new_val, Mapping): self.compare(old_val, new_val, frozenset(), full_path, _qualify(diffs_path, key))

This conditional is for when both values are dicts , recur into the nested dict to find which of the keys val pairs are different. Previously, this would just print the keys of the old and new value dicts.

elif isinstance(old_val, list) and isinstance(new_val, list): if len(old_val) != len(new_val): self.mismatches.append( f"{full_path} count: old={len(old_val)}, new={len(new_val)}" ) for i, (old_item, new_item) in enumerate(zip(old_val, new_val)): item_path = f"{full_path}[{i}]" if isinstance(old_item, Mapping) and isinstance(new_item, Mapping): self.compare( old_item, new_item, frozenset(), item_path, _qualify(diffs_path, key) ) elif old_item != new_item: self.mismatches.append( f"{item_path}: old={self.format_value(old_item)}, new={self.format_value(new_item)}"

If both old and new are lists, first report if the length is same or not. Then we iterate through the list and recur if the items are and compare again. Didn't add something for nested lists, since the webhook payload doesn't have anything of that format.

I'm pretty sure we can reformulate this to have just a generalized parpallel walk and diff phase that is basically a Generator[tuple[Path, ComparisonResult]] that is consumed by book-keeping code to validate known diffs, ignore unreliables, and produce a nice result at the end. Probably not literally a generator, as I think we'd want to be able to tell it to not bother recursing into know diffs once they're actually different or something.

We shouldn't have to have nested data walking code that is limited by how much we're willing to write out, and we can probably generalize list/dict and treat both as [K,V] sequences, differentiating by type checks.

That said, not important if this is good enough for our purposes.
But we do need to reconsidere whether DeepDiff could work here instead.

I do think DeepDiff would solve all the cases I'm trying to tack onto the paritychecker. Honestly would prefer that as it would also be way less maintenance and understanding work required than refactoring the checker to be handle lists/dicts generically and separating out the walk and diff phase. Perhaps llms can roll through it, but even taking the time to understand and grok the checker is a bit much imo

Though reading the last pr descr, it seems like it was considered and said nah too(?).

kcons

Seems reasonable; but I think we need to test this within the commit that changes it.

kcons · 2026-05-21T21:31:28Z

+                        )
+                    elif old_item != new_item:
+                        self.mismatches.append(
+                            f"{item_path}: old={self.format_value(old_item)}, new={self.format_value(new_item)}"


I'm pretty sure we can reformulate this to have just a generalized parpallel walk and diff phase that is basically a Generator[tuple[Path, ComparisonResult]] that is consumed by book-keeping code to validate known diffs, ignore unreliables, and produce a nice result at the end. Probably not literally a generator, as I think we'd want to be able to tell it to not bother recursing into know diffs once they're actually different or something.

We shouldn't have to have nested data walking code that is limited by how much we're willing to write out, and we can probably generalize list/dict and treat both as [K,V] sequences, differentiating by type checks.

That said, not important if this is good enough for our purposes.
But we do need to reconsidere whether DeepDiff could work here instead.

github-actions Bot added the Scope: Backend Automatically applied to PRs that change backend components label May 21, 2026

Christinarlong mentioned this pull request May 21, 2026

feat(webhooks): Add payload validation during dual-write migration #116040

Open

Christinarlong changed the title ~~ref: Extract PayloadComparator from _ParityChecker with format_value param~~ ref(utils): Make ParityChecker print out mismatches in a PII safe way May 21, 2026

Christinarlong commented May 21, 2026

View reviewed changes

Christinarlong marked this pull request as ready for review May 21, 2026 20:20

Christinarlong requested a review from a team as a code owner May 21, 2026 20:20

Christinarlong requested review from a team and kcons May 21, 2026 20:52

kcons requested changes May 21, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

ref(utils): Make ParityChecker print out mismatches in a PII safe way#116038

ref(utils): Make ParityChecker print out mismatches in a PII safe way#116038
Christinarlong wants to merge 3 commits into
masterfrom
crl/webhook-payload-validation

Christinarlong commented May 21, 2026 •

edited

Loading

Uh oh!

Christinarlong May 21, 2026

Uh oh!

kcons May 21, 2026

Uh oh!

Christinarlong May 21, 2026

Uh oh!

kcons left a comment

Uh oh!

kcons May 21, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Uh oh!

Conversation

Christinarlong commented May 21, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Christinarlong May 21, 2026

Choose a reason for hiding this comment

Uh oh!

kcons May 21, 2026

Choose a reason for hiding this comment

Uh oh!

Christinarlong May 21, 2026

Choose a reason for hiding this comment

Uh oh!

kcons left a comment

Choose a reason for hiding this comment

Uh oh!

kcons May 21, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Christinarlong commented May 21, 2026 •

edited

Loading