31 commits
45ea7ec docs: design PostgreSQL compatibility harness (renecannao, Jun 10, 2026)
ca78d6b docs: distinguish deep parsing from classification (renecannao, Jun 10, 2026)
b797917 docs: define regression SQL split fallback (renecannao, Jun 10, 2026)
d3a132c docs: plan PostgreSQL compatibility harness (renecannao, Jun 10, 2026)
91dab09 chore: ignore local worktrees (renecannao, Jun 10, 2026)
676d5a5 chore: pin PostgreSQL parser oracle sources (renecannao, Jun 10, 2026)
10a8857 fix: harden PostgreSQL source cache (renecannao, Jun 10, 2026)
c736fc5 fix: recover stale PostgreSQL cache locks (renecannao, Jun 11, 2026)
6669125 fix: interrupt PostgreSQL cache subprocesses (renecannao, Jun 11, 2026)
599ad46 test: define PostgreSQL compatibility result mapping (renecannao, Jun 11, 2026)
1d127af fix: harden PostgreSQL statement type mapping (renecannao, Jun 11, 2026)
6f3e6e3 fix: unify PostgreSQL reviewed node cases (renecannao, Jun 11, 2026)
1c819c7 feat: add libpg_query differential runner (renecannao, Jun 11, 2026)
38ae2a5 fix: guard libpg_query cleanup lifecycle (renecannao, Jun 11, 2026)
f99923c fix: harden libpg_query differential runner (renecannao, Jun 11, 2026)
1d064c5 feat: build deterministic PostgreSQL statement inventories (renecannao, Jun 11, 2026)
f9e5890 fix: reject abbreviated inventory options (renecannao, Jun 11, 2026)
5ed1427 fix: harden PostgreSQL inventory extraction (renecannao, Jun 11, 2026)
241a8df fix: detect portable inventory path aliases (renecannao, Jun 11, 2026)
6aeb5ff fix: normalize complete inventory paths (renecannao, Jun 11, 2026)
6fec5dc fix: include algorithm for local txn remove (renecannao, Jun 12, 2026)
01a2e3a feat: detect PostgreSQL release and grammar deltas (renecannao, Jun 13, 2026)
bcc707b feat: enforce PostgreSQL compatibility baselines (renecannao, Jun 13, 2026)
1f9122e test: add reviewed PostgreSQL syntax witnesses (renecannao, Jun 15, 2026)
7663f51 feat: generate PostgreSQL compatibility reports (renecannao, Jun 15, 2026)
5d1a12d build: add PostgreSQL compatibility commands (renecannao, Jun 15, 2026)
5fcdd18 test: baseline ParserSQL against PostgreSQL 18 (renecannao, Jun 16, 2026)
e8241f9 ci: check PostgreSQL parser compatibility (renecannao, Jun 16, 2026)
7bbe5d1 fix: release PostgreSQL cache lock on termination (renecannao, Jun 16, 2026)
2dcc0d4 test: recover PostgreSQL cache lock after termination (renecannao, Jun 16, 2026)
901f666 docs: document PostgreSQL compatibility workflow (renecannao, Jun 16, 2026)
46 changes: 46 additions & 0 deletions .github/workflows/ci.yml
@@ -5,9 +5,12 @@ on:
    branches: [main]
  pull_request:
    branches: [main]
  schedule:
    - cron: '17 3 * * 1'

jobs:
  build-and-test:
    if: github.event_name != 'schedule'
    strategy:
      matrix:
        os: [ubuntu-22.04, ubuntu-24.04]
@@ -22,6 +25,7 @@ jobs:
        run: make -f Makefile clean && make -f Makefile all

  macos:
    if: github.event_name != 'schedule'
    runs-on: macos-latest
    steps:
      - uses: actions/checkout@v4
@@ -38,6 +42,7 @@ jobs:
        run: make -f Makefile clean && make -f Makefile all MYSQL_CFLAGS="-I/opt/homebrew/opt/mysql-client/include"

  benchmark:
    if: github.event_name != 'schedule'
    runs-on: ubuntu-24.04
    steps:
      - uses: actions/checkout@v4
@@ -58,6 +63,7 @@ jobs:
          path: benchmark_results.json

  corpus-test:
    if: github.event_name != 'schedule'
    runs-on: ubuntu-24.04
    steps:
      - uses: actions/checkout@v4
@@ -96,3 +102,43 @@ jobs:
          grep -ohP '"((?:SELECT|INSERT|UPDATE|DELETE|SET|CREATE|ALTER|DROP|EXPLAIN|WITH)[^"]*)"' \
            /tmp/sqlparser-rs/tests/sqlparser_postgres.rs 2>/dev/null | \
            sed 's/^"//' | sed 's/"$//' | sed 's/\\"/"/g' | ./corpus_test pgsql

  pg-compat:
    if: github.event_name != 'schedule'
    runs-on: ubuntu-24.04
    timeout-minutes: 30
    env:
      PG_COMPAT_CACHE: /tmp/parsersql-pg-compat
    steps:
      - uses: actions/checkout@v4


⚠️ Potential issue | 🔴 Critical | ⚡ Quick win

Consolidation: Pin all GitHub Actions to commit SHAs.

Four action references use mutable version tags instead of commit SHAs. This violates the repository's security policy and means that if an upstream tag is moved, the workflow runs the new action code without review.

  • .github/workflows/ci.yml#L113: actions/checkout@v4 → pin to SHA
  • .github/workflows/ci.yml#L116: actions/cache@v4 → pin to SHA
  • .github/workflows/ci.yml#L133: actions/checkout@v4 → pin to SHA
  • .github/workflows/ci.yml#L136: actions/cache@v4 → pin to SHA

Example: replace actions/checkout@v4 with actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af172 (or the current latest v4 commit SHA).

🧰 Tools
🪛 zizmor (1.25.2)

[warning] 113-113: credential persistence through GitHub Actions artifacts (artipacked): does not set persist-credentials: false

[error] 113-113: unpinned action reference (unpinned-uses): action is not pinned to a hash (required by blanket policy)

Source: Linters/SAST tools
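
The finding reduces to one rule: a `uses:` value counts as pinned only when the part after `@` is a full 40-character commit SHA, because tags such as `v4` can be moved. A minimal Python sketch of that check follows; the regular expression and the helper names `is_pinned` and `unpinned` are illustrative assumptions, not zizmor's implementation, and local (`./...`) or `docker://` references are out of scope.

```python
import re
from typing import List

# Pinned means "owner/repo[/path]@<full 40-character lowercase hex commit SHA>".
# Tags such as "v4" and branch names are mutable, so they do not count.
PINNED_USES = re.compile(r"[\w.-]+/[\w./-]+@[0-9a-f]{40}")


def is_pinned(uses: str) -> bool:
    """Return True if a workflow `uses:` value is pinned to a full commit SHA."""
    return PINNED_USES.fullmatch(uses.strip()) is not None


def unpinned(refs: List[str]) -> List[str]:
    """Return the references that still point at a tag or branch."""
    return [ref for ref in refs if not is_pinned(ref)]


if __name__ == "__main__":
    refs = [
        "actions/checkout@v4",
        "actions/cache@v4",
        "actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af172",
    ]
    for ref in unpinned(refs):
        print("unpinned:", ref)
```

Running the sketch prints the two `@v4` references; the SHA-pinned checkout passes.
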


      - name: Cache PostgreSQL compatibility sources
        uses: actions/cache@v4
        with:
          path: /tmp/parsersql-pg-compat
          key: pg-compat-${{ runner.os }}-${{ hashFiles('tests/pg_compat/upstream_pins.json') }}
          restore-keys: |
            pg-compat-${{ runner.os }}-

      - name: Test PostgreSQL compatibility gate
        run: make -f Makefile test-pg-compat
Comment on lines +106 to +124


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Consolidation: Restrict job permissions and disable credential persistence in both pg-compat jobs.

Both the pg-compat and pg-compat-full jobs omit explicit permissions configuration and allow credential persistence, creating overly broad access and credential leakage risks.

  • .github/workflows/ci.yml#L106-L124 (pg-compat): add permissions: { contents: read } and set persist-credentials: false on checkout
  • .github/workflows/ci.yml#L126-L144 (pg-compat-full): add permissions: { contents: read } and set persist-credentials: false on checkout

Both jobs only need read access to the repository; they do not write artifacts or deploy.

🧰 Tools
🪛 zizmor (1.25.2)

[warning] 113-113: credential persistence through GitHub Actions artifacts (artipacked): does not set persist-credentials: false

[warning] 106-124: overly broad permissions (excessive-permissions): default permissions used due to no permissions: block

[error] 113-113: unpinned action reference (unpinned-uses): action is not pinned to a hash (required by blanket policy)

[error] 116-116: unpinned action reference (unpinned-uses): action is not pinned to a hash (required by blanket policy)

📍 Affects 1 file
  • .github/workflows/ci.yml#L106-L124 (this comment)
  • .github/workflows/ci.yml#L126-L144

Source: Linters/SAST tools
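
Both requested changes can be checked mechanically on the parsed workflow. The sketch below works on a job already loaded into a Python dict (a YAML library would normally do that; the dict literal stands in for it) and reports a missing `permissions` block and any `actions/checkout` step that does not set `persist-credentials: false`. The function name `job_findings` is hypothetical, and the logic is a simplified approximation of the two audits named above, not zizmor's actual code.

```python
from typing import Any, Dict, List


def job_findings(name: str, job: Dict[str, Any]) -> List[str]:
    """Report a missing permissions block and checkout steps that keep credentials."""
    findings: List[str] = []
    if "permissions" not in job:
        findings.append(f"{name}: no permissions block, default token permissions apply")
    for step in job.get("steps", []):
        if str(step.get("uses", "")).startswith("actions/checkout@"):
            # actions/checkout defaults persist-credentials to true, so it must be set to false.
            if step.get("with", {}).get("persist-credentials") is not False:
                findings.append(f"{name}: checkout does not set persist-credentials: false")
    return findings


# Stand-in for the parsed pg-compat job as it appears in this diff.
pg_compat = {
    "runs-on": "ubuntu-24.04",
    "steps": [
        {"uses": "actions/checkout@v4"},
        {"name": "Cache PostgreSQL compatibility sources", "uses": "actions/cache@v4"},
        {"name": "Test PostgreSQL compatibility gate", "run": "make -f Makefile test-pg-compat"},
    ],
}

# The same job with the two suggested changes applied.
pg_compat_fixed = {
    **pg_compat,
    "permissions": {"contents": "read"},
    "steps": [{"uses": "actions/checkout@v4", "with": {"persist-credentials": False}}]
    + pg_compat["steps"][1:],
}

print(job_findings("pg-compat", pg_compat))
print(job_findings("pg-compat", pg_compat_fixed))
```

The first call returns both findings for the job as written in this diff; the second returns an empty list.
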


  pg-compat-full:
    if: github.event_name == 'schedule'
    runs-on: ubuntu-24.04
    timeout-minutes: 45
    env:
      PG_COMPAT_CACHE: /tmp/parsersql-pg-compat
    steps:
      - uses: actions/checkout@v4

      - name: Cache PostgreSQL compatibility sources
        uses: actions/cache@v4
        with:
          path: /tmp/parsersql-pg-compat
          key: pg-compat-${{ runner.os }}-${{ hashFiles('tests/pg_compat/upstream_pins.json') }}
          restore-keys: |
            pg-compat-${{ runner.os }}-

      - name: Validate full PostgreSQL compatibility baseline
        run: make -f Makefile pg-compat
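
Both pg-compat jobs share a cache key that embeds a hash of `tests/pg_compat/upstream_pins.json`, plus a `restore-keys` prefix. With `actions/cache`, an exact `key` match is restored first; otherwise each restore key is tried as a prefix and the most recently created matching cache wins. The Python sketch below models only that lookup order (it ignores branch scoping and the save step) with made-up hash suffixes, to show that a pin change falls back to the newest earlier cache instead of an empty directory. The function name `restored_cache` is hypothetical.

```python
from typing import Dict, List, Optional


def restored_cache(key: str, restore_keys: List[str], caches: Dict[str, int]) -> Optional[str]:
    """Model the actions/cache lookup order.

    `caches` maps existing cache keys to a creation time (larger means newer).
    Returns the key that would be restored, or None on a full miss.
    """
    if key in caches:
        return key  # exact hit on the primary key
    for prefix in restore_keys:
        matches = [existing for existing in caches if existing.startswith(prefix)]
        if matches:
            return max(matches, key=lambda existing: caches[existing])  # newest prefix match
    return None


# Made-up hash suffixes; in CI they come from hashFiles('tests/pg_compat/upstream_pins.json').
caches = {"pg-compat-Linux-aaa111": 1, "pg-compat-Linux-bbb222": 2}
restore_keys = ["pg-compat-Linux-"]

print(restored_cache("pg-compat-Linux-aaa111", restore_keys, caches))  # pins unchanged
print(restored_cache("pg-compat-Linux-ccc333", restore_keys, caches))  # pins changed
```

With these caches, the first call restores `pg-compat-Linux-aaa111` (exact hit) and the second restores `pg-compat-Linux-bbb222`, the newest cache sharing the prefix.
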
5 changes: 5 additions & 0 deletions .gitignore
@@ -31,6 +31,9 @@
*.out
*.app

# Local git worktrees
.worktrees/

# Build artifacts
libsqlparser.a
sqlengine
@@ -39,4 +42,6 @@ run_tests_debug
corpus_test
run_bench
run_bench_compare
pg_compat_17
pg_compat_18
bench/sqlparser_rs_bench/target/
17 changes: 16 additions & 1 deletion Makefile
@@ -7,6 +7,7 @@ MYSQL_CFLAGS = $(shell mysql_config --cflags 2>/dev/null)
MYSQL_LIBS = $(shell mysql_config --libs 2>/dev/null)
PG_CFLAGS = -I$(shell pg_config --includedir 2>/dev/null || echo /usr/include/postgresql)
PG_LIBS = -L$(shell pg_config --libdir 2>/dev/null || echo /usr/lib/x86_64-linux-gnu) -lpq
PG_COMPAT_CACHE ?= /tmp/parsersql-pg-compat

PROJECT_ROOT = .
SRC_DIR = $(PROJECT_ROOT)/src/sql_parser
@@ -121,12 +122,26 @@ ENGINE_STRESS_TARGET = engine_stress_test
MYSQL_SERVER_SRC = $(PROJECT_ROOT)/tools/mysql_server.cpp
MYSQL_SERVER_TARGET = mysql_server

.PHONY: all lib test test-sqlengine test-sqlengine-in-memory test-sqlengine-single test-sqlengine-sharded bench bench-compare bench-distributed build-corpus-test build-sqlengine engine-stress mysql-server clean
.PHONY: all lib test test-sqlengine test-sqlengine-in-memory test-sqlengine-single test-sqlengine-sharded bench bench-compare bench-distributed build-corpus-test build-sqlengine build-pg-compat pg-compat pg-compat-refresh test-pg-compat engine-stress mysql-server clean

build-corpus-test: $(CORPUS_TEST_TARGET)

build-sqlengine: $(SQLENGINE_TARGET)

build-pg-compat: lib
	PG_COMPAT_CACHE=$(PG_COMPAT_CACHE) ./scripts/pg_compat/fetch_libpg_query.sh
	python3 ./scripts/pg_compat/run_compat.py build --cache $(PG_COMPAT_CACHE)

test-pg-compat: build-pg-compat
	PG_COMPAT_RUNNER=$(PG_COMPAT_CACHE)/bin/pg_compat-18 python3 -m unittest discover -s tests/pg_compat -p 'test_*.py' -v
	python3 ./scripts/pg_compat/run_compat.py test --cache $(PG_COMPAT_CACHE)

pg-compat: build-pg-compat
	python3 ./scripts/pg_compat/run_compat.py full --cache $(PG_COMPAT_CACHE)

pg-compat-refresh: build-pg-compat
	python3 ./scripts/pg_compat/run_compat.py refresh --cache $(PG_COMPAT_CACHE)

all: lib test

lib: $(LIB_TARGET)
20 changes: 20 additions & 0 deletions README.md
@@ -64,6 +64,26 @@ make build-sqlengine # Interactive SQL CLI
make build-corpus-test # Corpus validation harness
```

### PostgreSQL compatibility harness

ParserSQL uses `libpg_query` as the PostgreSQL syntax oracle for compatibility checks. The harness pins `libpg_query` `17-latest` and `18-latest`, extracts accepted PostgreSQL regression statements, and compares ParserSQL behavior against the committed PostgreSQL 18 baseline in [`docs/compatibility/postgresql-18.md`](docs/compatibility/postgresql-18.md).

Compatibility results separate full parser parity from classification coverage:

- `DEEP_SUPPORTED` means ParserSQL accepted the statement with the expected statement type and a full AST.
- `CLASSIFIED_ONLY` means ParserSQL recognized the top-level statement class, but did not reach full parser parity for that SQL.
- `PARTIAL`, `ERROR`, `TRAILING_INPUT`, and `TYPE_MISMATCH` are tracked separately in the committed baseline.
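
The two headline numbers follow from these definitions: a `DEEP_SUPPORTED` statement also has the expected statement type, so classification coverage counts both `DEEP_SUPPORTED` and `CLASSIFIED_ONLY`, while deep parity counts only `DEEP_SUPPORTED`. The Python sketch below computes both from a list of outcome labels. The function name `summarize` and the sample outcomes are illustrative, and the committed report may aggregate differently.

```python
from collections import Counter
from typing import Dict, List

# Outcome labels as named in the harness documentation.
DEEP_SUPPORTED = "DEEP_SUPPORTED"
CLASSIFIED_ONLY = "CLASSIFIED_ONLY"


def summarize(outcomes: List[str]) -> Dict[str, float]:
    """Return deep-parity and classification-coverage ratios for a list of outcomes."""
    total = len(outcomes)
    if total == 0:
        return {"deep_parity": 0.0, "classification_coverage": 0.0}
    counts = Counter(outcomes)
    deep = counts[DEEP_SUPPORTED]
    # A DEEP_SUPPORTED statement also has the expected type, so it counts as classified.
    classified = deep + counts[CLASSIFIED_ONLY]
    return {"deep_parity": deep / total, "classification_coverage": classified / total}


sample = ["DEEP_SUPPORTED", "DEEP_SUPPORTED", "CLASSIFIED_ONLY", "ERROR", "TYPE_MISMATCH"]
print(summarize(sample))
```

For this made-up sample the sketch prints `{'deep_parity': 0.4, 'classification_coverage': 0.6}`; `TYPE_MISMATCH` is excluded from coverage because the statement class was wrong.
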

The report includes the full PostgreSQL 18 backlog, the PG17-to-PG18 release delta, statement routing coverage, and reviewed structural grammar/keyword changes.
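
One plausible reading of the PG17-to-PG18 release delta is a set difference between the statement inventories extracted for each release. The sketch below assumes exactly that; the function name, the output keys, and the sample statements are hypothetical, and the harness may also consult the reviewed grammar and keyword changes the report mentions. Sorting keeps the output deterministic.

```python
from typing import Dict, List, Set


def release_delta(pg17: Set[str], pg18: Set[str]) -> Dict[str, List[str]]:
    """Compare two statement inventories; sorted lists keep the output deterministic."""
    return {
        "added_in_pg18": sorted(pg18 - pg17),
        "dropped_in_pg18": sorted(pg17 - pg18),
    }


# Placeholder inventories; in the harness they would come from each release's regression SQL.
pg17_inventory = {"SELECT 1", "VACUUM t"}
pg18_inventory = {"SELECT 1", "SELECT 2"}
print(release_delta(pg17_inventory, pg18_inventory))
```

With the placeholder inventories, `SELECT 2` appears only on the PG18 side and `VACUUM t` only on the PG17 side.
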

```bash
make test-pg-compat # Build the pinned oracle runner, run harness unit tests, and replay committed CI cases
make pg-compat # Run the full pinned PG17/PG18 comparison against the committed baseline
make pg-compat-refresh # Regenerate expected_results.jsonl, ci_cases.jsonl, and docs/compatibility/postgresql-18.md
```

`PG_COMPAT_CACHE` controls where external `libpg_query` and PostgreSQL source checkouts are cached. It defaults to `/tmp/parsersql-pg-compat`.

### Interactive CLI (`sqlengine`)

```bash
43 changes: 42 additions & 1 deletion docs/benchmarks/REPRODUCING.md
@@ -9,7 +9,7 @@ Step-by-step instructions to reproduce all comparison benchmarks from scratch on
```bash
# Required tools
sudo apt-get update
sudo apt-get install -y build-essential git curl
sudo apt-get install -y build-essential git curl tar bzip2

# Rust (for sqlparser-rs benchmark)
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh -s -- -y
@@ -19,6 +19,9 @@ source ~/.cargo/env
g++ --version # need GCC 8+ (C++17 support)
cargo --version # need Rust 1.70+
git --version
curl --version
tar --version
python3 --version
```

---
@@ -76,6 +79,44 @@ ls -la third_party/libpg_query/libpg_query.a

---

## PostgreSQL Compatibility Harness

The PostgreSQL compatibility harness uses pinned `libpg_query` builds as the syntax oracle. It runs ParserSQL over the PostgreSQL 17 and PostgreSQL 18 regression SQL, checks the results against a committed PostgreSQL 18 baseline, and reports the PG17-to-PG18 syntax delta in `docs/compatibility/postgresql-18.md`.

Required external tools:

```text
git
curl
tar with bzip2 support
python3
C/C++ compiler
make
```

`PG_COMPAT_CACHE` controls where external sources and built oracle runners are cached. It defaults to `/tmp/parsersql-pg-compat`.
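
The Makefile declares `PG_COMPAT_CACHE ?= /tmp/parsersql-pg-compat`, so a value already present in the environment (or passed on the `make` command line) takes precedence, and `test-pg-compat` looks for the PG18 oracle runner at `$(PG_COMPAT_CACHE)/bin/pg_compat-18`. The sketch below mirrors that resolution in Python for scripts that need the same paths. The helper names are hypothetical, and generalizing the runner name to other versions is an assumption, since only the PG18 path appears in the Makefile.

```python
import os
from pathlib import Path
from typing import Mapping

DEFAULT_CACHE = "/tmp/parsersql-pg-compat"  # same default as the Makefile


def cache_dir(environ: Mapping[str, str] = os.environ) -> Path:
    """Mirror `PG_COMPAT_CACHE ?= ...`: a value already set in the environment wins."""
    return Path(environ.get("PG_COMPAT_CACHE", DEFAULT_CACHE))


def runner_path(version: int = 18, environ: Mapping[str, str] = os.environ) -> Path:
    """Oracle runner location, following the Makefile's bin/pg_compat-18 layout."""
    return cache_dir(environ) / "bin" / f"pg_compat-{version}"


print(runner_path(environ={}))
print(runner_path(environ={"PG_COMPAT_CACHE": "/var/tmp/pgc"}))
```

On a POSIX system this prints `/tmp/parsersql-pg-compat/bin/pg_compat-18` and then `/var/tmp/pgc/bin/pg_compat-18`.
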

```bash
# Fast deterministic gate: unit tests plus committed CI cases
make test-pg-compat

# Full pinned PG17/PG18 comparison against committed expected_results.jsonl
make pg-compat

# Refresh committed artifacts after intentionally updating pins or mappings
make pg-compat-refresh
```

The harness result model distinguishes parser depth from statement routing:

- `DEEP_SUPPORTED` means ParserSQL accepted the SQL with the expected statement type and a full AST.
- `CLASSIFIED_ONLY` means ParserSQL recognized the top-level statement class, but did not reach full parser parity for that SQL.
- `PARTIAL`, `ERROR`, `TRAILING_INPUT`, and `TYPE_MISMATCH` are committed baseline outcomes that must not regress silently.
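
One way to keep baseline outcomes from regressing silently is to surface every outcome change against `tests/pg_compat/expected_results.jsonl` until the baseline is refreshed on purpose. A minimal Python sketch of that comparison follows, assuming each JSONL record carries hypothetical `id` and `outcome` fields (the real schema is not shown here). It reports every case whose outcome changed, appeared, or disappeared; a gate built on it would fail when the list is non-empty.

```python
import json
from typing import Dict, Iterable, List, Tuple

MISSING = "MISSING"  # placeholder for a case present on only one side


def load_outcomes(lines: Iterable[str]) -> Dict[str, str]:
    """Map case id -> outcome from JSONL lines (field names are assumptions)."""
    outcomes: Dict[str, str] = {}
    for line in lines:
        line = line.strip()
        if line:
            record = json.loads(line)
            outcomes[record["id"]] = record["outcome"]
    return outcomes


def changed_cases(expected: Dict[str, str], actual: Dict[str, str]) -> List[Tuple[str, str, str]]:
    """List (id, expected, actual) for every case whose outcome differs."""
    changes = []
    for case_id in sorted(expected.keys() | actual.keys()):
        before = expected.get(case_id, MISSING)
        after = actual.get(case_id, MISSING)
        if before != after:
            changes.append((case_id, before, after))
    return changes


# Hypothetical case ids and outcomes, for illustration only.
expected = load_outcomes([
    '{"id": "select.sql:1", "outcome": "DEEP_SUPPORTED"}',
    '{"id": "copy.sql:4", "outcome": "CLASSIFIED_ONLY"}',
])
actual = load_outcomes([
    '{"id": "select.sql:1", "outcome": "PARTIAL"}',
    '{"id": "copy.sql:4", "outcome": "CLASSIFIED_ONLY"}',
])
print(changed_cases(expected, actual))
```

Here the sketch reports the single downgraded case, `('select.sql:1', 'DEEP_SUPPORTED', 'PARTIAL')`. Treating improvements as changes too keeps the committed baseline an exact record, which matches the documented refresh step.
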

The refresh target updates `tests/pg_compat/expected_results.jsonl`, `tests/pg_compat/ci_cases.jsonl`, and `docs/compatibility/postgresql-18.md`.

---

## Step 4: Build the comparison benchmark

```bash