feat: capture SQL security/governance policy applications and definitions#87

Merged
lustefaniak merged 16 commits into main from lukasz-capture-security-governance-policies
Jun 2, 2026
@lustefaniak

@lustefaniak lustefaniak commented Jun 1, 2026

Collaborator

Adds parser/AST support for capturing security & governance policies across all supported dialects — which masking/access/governance policies are applied (and to which columns), plus the policy definitions themselves. Previously these clauses were parsed-and-discarded or skipped by the generic CREATE/ALTER/DROP fallbacks, losing both the policy references and the lineage-bearing condition expressions.

Applications (CREATE TABLE / views)

  • Snowflake table-level ROW ACCESS / AGGREGATION / JOIN / STORAGE LIFECYCLE policies → TablePolicy; table- and column-level TAG (...) → Tag; conditional masking (... MASKING POLICY p) USING (cols).
  • Databricks WITH ROW FILTER f ON (cols) → TablePolicyKind::RowFilter; column MASK f USING COLUMNS (...) → ColumnMask; masks/filters in (materialized) views.

Policy definitions (bodies parsed as real Expr, so subqueries/table refs stay visible)

  • Snowflake CREATE [OR REPLACE] {MASKING|ROW ACCESS|AGGREGATION|PROJECTION|JOIN} POLICY ... AS (sig) RETURNS type -> body → Statement::CreatePolicy; CREATE TAG → Statement::CreateTag.
  • BigQuery CREATE/DROP [ALL] ROW ACCESS POLICY ... ON <table> [GRANT TO (...)] FILTER USING (...).
  • PostgreSQL CREATE/ALTER/DROP POLICY ... ON <table> (AS PERMISSIVE/RESTRICTIVE, FOR, TO, USING, WITH CHECK).
  • SQL Server CREATE/ALTER/DROP SECURITY POLICY with ADD/ALTER/DROP FILTER/BLOCK PREDICATE actions; column MASKED WITH (FUNCTION=...).
  • Redshift CREATE RLS POLICY / CREATE MASKING POLICY + ATTACH/DETACH/DROP/ALTER MASKING POLICY.

ALTER / DROP / toggles

  • Snowflake ALTER TABLE SET/UNSET {AGGREGATION|JOIN} POLICY, UNSET TAG; DROP {kind} POLICY / DROP TAG.
  • Databricks ALTER TABLE SET/DROP ROW FILTER, SET/UNSET TAGS, ALTER COLUMN SET/DROP MASK, ALTER COLUMN SET/UNSET TAGS.
  • SQL Server ALTER COLUMN ADD/DROP MASKED.
  • PostgreSQL ALTER TABLE { ENABLE | DISABLE | FORCE | NO FORCE } ROW LEVEL SECURITY.

Every new construct is justified against the vendor grammar (doc links in each commit body). All forms round-trip via Display. Dialect-overlapping shapes (e.g. MASKING POLICY across Snowflake/Redshift/ClickHouse, ROW ACCESS POLICY across Snowflake/BigQuery) are disambiguated with maybe_parse/lookahead so each dialect's form is parsed and the others fall through unchanged.

Validation

  • Full unit suite green (1149 tests); new per-dialect tests assert the AST exposes policy / tag / table / column references.
  • Corpus: 0 regressions across the whole branch vs origin/main; ~14 previously-failing real files now parse (Snowflake policy applications + conditional masking + ALTER/DROP, Databricks ROW FILTER/MASK incl. materialized views, 4 customer views via the FORMAT/PRIOR keyword fix). Definition files that previously only "parsed" via the skip-fallback now produce real AST.

Also fixes two general keyword-handling bugs surfaced by the corpus: FORMAT (reserved only for the ClickHouse FORMAT clause) and PRIOR (the CONNECT BY prefix operator) are now accepted as ordinary identifiers in other dialects.

Snowflake CREATE TABLE may attach security/governance policies at the table
level: ROW ACCESS, AGGREGATION, JOIN, and (dynamic tables) STORAGE LIFECYCLE.
These were parsed and discarded; this captures them in a new TablePolicy
{kind, with, policy_name, columns} on CreateTable so column-level lineage can
surface which policy guards a table and over which columns.

Each kind selects its column-list keyword: ROW ACCESS / STORAGE LIFECYCLE use
ON (cols), AGGREGATION uses ENTITY KEY (cols), JOIN uses ALLOWED JOIN KEYS
(cols); all column lists are optional (GET_DDL omits ON when the caller lacks
privilege). The optional WITH prefix round-trips.
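
As a rough sketch of how the per-kind column-list keyword could drive rendering: the types below are simplified stand-ins (plain `String` names, and hypothetical helper methods `keyword` and `column_list_keyword`), not the definitions added by this PR.

```rust
use std::fmt;

// Hypothetical, simplified stand-in for the PR's TablePolicyKind.
#[derive(Debug, Clone, PartialEq)]
pub enum TablePolicyKind {
    RowAccess,
    Aggregation,
    Join,
    StorageLifecycle,
}

impl TablePolicyKind {
    /// Keyword(s) naming the policy kind (helper name is an assumption).
    fn keyword(&self) -> &'static str {
        match self {
            TablePolicyKind::RowAccess => "ROW ACCESS",
            TablePolicyKind::Aggregation => "AGGREGATION",
            TablePolicyKind::Join => "JOIN",
            TablePolicyKind::StorageLifecycle => "STORAGE LIFECYCLE",
        }
    }

    /// Keyword(s) introducing the optional column list for this kind.
    fn column_list_keyword(&self) -> &'static str {
        match self {
            TablePolicyKind::RowAccess | TablePolicyKind::StorageLifecycle => "ON",
            TablePolicyKind::Aggregation => "ENTITY KEY",
            TablePolicyKind::Join => "ALLOWED JOIN KEYS",
        }
    }
}

// Field names follow the commit message; field types are assumptions.
#[derive(Debug, Clone, PartialEq)]
pub struct TablePolicy {
    pub kind: TablePolicyKind,
    pub with: bool,
    pub policy_name: String,
    pub columns: Option<Vec<String>>,
}

impl fmt::Display for TablePolicy {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // The optional WITH prefix is kept so the input form round-trips.
        if self.with {
            write!(f, "WITH ")?;
        }
        write!(f, "{} POLICY {}", self.kind.keyword(), self.policy_name)?;
        // The column list is optional for every kind.
        if let Some(cols) = &self.columns {
            write!(f, " {} ({})", self.kind.column_list_keyword(), cols.join(", "))?;
        }
        Ok(())
    }
}
```

Keeping the keyword choice on the kind means one `Display` impl can cover all four policy kinds.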

Grammar per Snowflake docs:
https://docs.snowflake.com/en/sql-reference/sql/create-table
https://docs.snowflake.com/en/sql-reference/sql/create-aggregation-policy
https://docs.snowflake.com/en/sql-reference/sql/create-join-policy
https://docs.snowflake.com/en/sql-reference/sql/create-dynamic-table

Fixes 3 corpus test failures (Snowflake).

Snowflake CREATE TABLE may attach governance tags at the table level via
[WITH] TAG (tag_name = 'value', ...). These were consumed and discarded; this
captures them in a new Tag {name, value} list (table_tags on CreateTable) so
lineage can surface which tags are applied to a table.

Tag names may be qualified (db.schema.tag); values are string literals. Both
the WITH-prefixed and bare forms parse and normalize to the canonical
`WITH TAG (k = 'v', ...)` on round-trip. Removed the redundant pre-loop tag
discarder that previously swallowed a leading WITH TAG before the clause loop.
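
The normalization to the canonical form could look roughly like this. It is a sketch with a simplified `Tag` (both fields as `String`) and a hypothetical `render_table_tags` helper; quote escaping inside values is deliberately left out.

```rust
use std::fmt;

// Simplified stand-in: the real Tag likely uses a qualified-name type.
#[derive(Debug, Clone, PartialEq)]
pub struct Tag {
    pub name: String,  // may be qualified, e.g. db.schema.tag
    pub value: String, // string literal contents, without quotes
}

impl fmt::Display for Tag {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // Assumption: values contain no single quotes (no escaping here).
        write!(f, "{} = '{}'", self.name, self.value)
    }
}

/// Hypothetical helper: render the tag list in the canonical
/// WITH-prefixed form, whether or not the input used WITH.
pub fn render_table_tags(tags: &[Tag]) -> String {
    if tags.is_empty() {
        return String::new();
    }
    let items: Vec<String> = tags.iter().map(|t| t.to_string()).collect();
    format!("WITH TAG ({})", items.join(", "))
}
```

Because the AST stores only the tag list, both `TAG (...)` and `WITH TAG (...)` inputs would render identically under this scheme.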

Grammar per Snowflake docs:
https://docs.snowflake.com/en/sql-reference/sql/create-table

Column definitions in Snowflake CREATE TABLE may carry [WITH] TAG (k = 'v', ...)
governance tags. These were consumed and discarded; this captures them in a new
`tags: Vec<Tag>` field on ColumnDef so lineage can surface per-column tagging.

Both the WITH-prefixed and bare forms parse and normalize to the canonical
`WITH TAG (...)` rendered after the column options/policy. All existing
ColumnDef constructions across the test suite gain the new (empty) field.

Grammar per Snowflake docs:
https://docs.snowflake.com/en/sql-reference/sql/create-table

Databricks attaches row filters and column masks via plain scalar UDFs:
  CREATE TABLE ... WITH ROW FILTER <func> ON (cols)
  <col> <type> MASK <func> [USING COLUMNS (<col>|<literal>, ...)]

ROW FILTER is captured as a new TablePolicyKind::RowFilter (uniform with the
Snowflake table policies; ON-list holds the function arguments). Column MASK is
upgraded from a bare ObjectName to a ColumnMask {function, using_columns} so the
USING COLUMNS arguments (other column names and/or constant literals) are
preserved for lineage — previously USING COLUMNS failed to parse.
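
A minimal sketch of the upgraded column mask, assuming a hypothetical `MaskArg` enum to distinguish column names from constant literals (the real `using_columns` element type is not shown in this PR text) and a hypothetical `referenced_columns` accessor for lineage:

```rust
use std::fmt;

// Hypothetical argument type; the real field may hold expressions instead.
#[derive(Debug, Clone, PartialEq)]
pub enum MaskArg {
    Column(String),
    /// Literal kept as already-rendered SQL text, e.g. "'US'" or "42".
    Literal(String),
}

// Field names follow the commit message; field types are assumptions.
#[derive(Debug, Clone, PartialEq)]
pub struct ColumnMask {
    pub function: String,
    pub using_columns: Vec<MaskArg>,
}

impl ColumnMask {
    /// Extra columns the mask function reads, besides the masked column.
    pub fn referenced_columns(&self) -> Vec<&str> {
        self.using_columns
            .iter()
            .filter_map(|arg| match arg {
                MaskArg::Column(name) => Some(name.as_str()),
                MaskArg::Literal(_) => None,
            })
            .collect()
    }
}

impl fmt::Display for ColumnMask {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "MASK {}", self.function)?;
        // USING COLUMNS is optional and omitted when there are no arguments.
        if !self.using_columns.is_empty() {
            let args: Vec<&str> = self
                .using_columns
                .iter()
                .map(|arg| match arg {
                    MaskArg::Column(s) | MaskArg::Literal(s) => s.as_str(),
                })
                .collect();
            write!(f, " USING COLUMNS ({})", args.join(", "))?;
        }
        Ok(())
    }
}
```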

Grammar per Databricks docs:
https://docs.databricks.com/aws/en/sql/language-manual/sql-ref-syntax-ddl-row-filter
https://docs.databricks.com/aws/en/sql/language-manual/sql-ref-syntax-ddl-column-mask

Fixes 2 corpus test failures (Databricks).

…TION/PROJECTION/JOIN)

Snowflake security/governance policy definitions share one shape:
  CREATE [OR REPLACE] <KIND> POLICY [IF NOT EXISTS] <name>
    AS ( [<arg> <type>, ...] ) RETURNS <type> -> <body>
    [COMMENT = '...'] [EXEMPT_OTHER_POLICIES = { TRUE | FALSE }]

These previously fell through the generic CREATE skip-until-semicolon fallback,
discarding the masking/row-access condition. A new Statement::CreatePolicy
variant captures kind + name + typed signature + RETURNS type + the `-> body`
expression + trailing options. Parsing the body as a real Expr keeps any
subqueries/table references (e.g. an EXISTS lookup) visible to lineage.

Dispatched via maybe_parse before the generic fallback; the `AS` check makes it
revert for non-Snowflake shapes (BigQuery's `ROW ACCESS POLICY ... ON <table>`),
so those still fall back unchanged pending dedicated handling.
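
The revert-on-mismatch behaviour can be illustrated with a toy backtracking parser. Everything here is hypothetical (whitespace-split tokens, a `Parser` struct, a `classify` helper); only the idea of `maybe_parse` restoring the cursor when the `AS` check fails comes from the description above.

```rust
/// Toy token cursor; a hypothetical stand-in for the real parser.
pub struct Parser {
    tokens: Vec<String>,
    index: usize,
}

impl Parser {
    pub fn new(sql: &str) -> Self {
        let tokens = sql.split_whitespace().map(|t| t.to_uppercase()).collect();
        Parser { tokens, index: 0 }
    }

    /// Consume `word` if it is the next token, otherwise fail.
    fn expect(&mut self, word: &str) -> Result<(), String> {
        if self.tokens.get(self.index).map(String::as_str) == Some(word) {
            self.index += 1;
            Ok(())
        } else {
            Err(format!("expected {word}"))
        }
    }

    /// Consume and return the next token, whatever it is.
    fn next_word(&mut self) -> Result<String, String> {
        let tok = self
            .tokens
            .get(self.index)
            .cloned()
            .ok_or_else(|| "unexpected end of input".to_string())?;
        self.index += 1;
        Ok(tok)
    }

    /// Run `f`; on failure, restore the cursor so another rule can try.
    pub fn maybe_parse<T>(
        &mut self,
        f: impl FnOnce(&mut Self) -> Result<T, String>,
    ) -> Option<T> {
        let start = self.index;
        match f(self) {
            Ok(value) => Some(value),
            Err(_) => {
                self.index = start;
                None
            }
        }
    }

    /// Snowflake-shaped definition head: `ROW ACCESS POLICY <name> AS ...`.
    fn parse_snowflake_policy_head(&mut self) -> Result<String, String> {
        self.expect("ROW")?;
        self.expect("ACCESS")?;
        self.expect("POLICY")?;
        let name = self.next_word()?;
        self.expect("AS")?; // BigQuery's `... ON <table>` shape fails here
        Ok(name)
    }
}

/// Returns which path handled the input and where the cursor ended up.
pub fn classify(sql_after_create: &str) -> (&'static str, usize) {
    let mut p = Parser::new(sql_after_create);
    match p.maybe_parse(|p| p.parse_snowflake_policy_head()) {
        Some(_) => ("snowflake", p.index),
        None => ("fallback", p.index),
    }
}
```

In this sketch the BigQuery shape leaves the cursor at position 0, so a later rule or the generic fallback sees the statement untouched.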

Grammar per Snowflake docs:
https://docs.snowflake.com/en/sql-reference/sql/create-masking-policy
https://docs.snowflake.com/en/sql-reference/sql/create-row-access-policy
https://docs.snowflake.com/en/sql-reference/sql/create-aggregation-policy
https://docs.snowflake.com/en/sql-reference/sql/create-projection-policy
https://docs.snowflake.com/en/sql-reference/sql/create-join-policy

Snowflake tag definitions previously fell through the generic CREATE fallback.
A new Statement::CreateTag captures the tag name, the optional ALLOWED_VALUES
string list, and trailing key=value options (COMMENT, PROPAGATE, ON_CONFLICT),
so tag objects are represented in the AST alongside their applications.

Grammar per Snowflake docs:
https://docs.snowflake.com/en/sql-reference/sql/create-tag

BigQuery row-level security previously fell through the generic CREATE fallback,
discarding the target table and the filter predicate. A new
Statement::CreateRowAccessPolicy captures the policy name, the `ON <table>`
target, the optional `GRANT TO (...)` principal list, and the
`FILTER USING (<predicate>)` expression. Parsing the predicate as a real Expr
keeps any subquery table references (e.g. a lookup-table IN-subquery) visible to
lineage.

Dispatched after the Snowflake policy attempt (whose `AS` check reverts for this
`... ON <table>` shape), so Snowflake definitions are unaffected.

Grammar per BigQuery docs:
https://cloud.google.com/bigquery/docs/reference/standard-sql/data-definition-language
@github-actions

github-actions Bot commented Jun 1, 2026


Corpus Parsing Report

Total: 191317 passed, 1932 failed (99.0% pass rate)

✨ No changes in test results

By Dialect

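| Dialect | Passed | Failed | Total | Pass Rate | Delta |
|---|---|---|---|---|---|
| ansi | 511 | 69 | 580 | 88.1% | - |
| athena | 37 | 1 | 38 | 97.4% | - |
| bigquery | 42295 | 114 | 42409 | 99.7% | - |
| clickhouse | 2488 | 106 | 2594 | 95.9% | - |
| databricks | 2928 | 192 | 3120 | 93.8% | +2 |
| doris | 22 | 18 | 40 | 55.0% | - |
| dremio | 27 | 0 | 27 | 100.0% | - |
| duckdb | 1124 | 45 | 1169 | 96.2% | - |
| exasol | 54 | 7 | 61 | 88.5% | - |
| fabric | 6 | 0 | 6 | 100.0% | - |
| generic | 17 | 38 | 55 | 30.9% | - |
| hive | 35 | 10 | 45 | 77.8% | - |
| materialize | 6 | 14 | 20 | 30.0% | - |
| mssql | 2273 | 410 | 2683 | 84.7% | - |
| mysql | 151 | 37 | 188 | 80.3% | - |
| oracle | 1034 | 364 | 1398 | 74.0% | - |
| postgres | 1180 | 116 | 1296 | 91.0% | - |
| presto | 55 | 8 | 63 | 87.3% | - |
| redshift | 40428 | 60 | 40488 | 99.9% | - |
| singlestore | 141 | 9 | 150 | 94.0% | - |
| snowflake | 94738 | 139 | 94877 | 99.9% | +11 |
| spark | 90 | 20 | 110 | 81.8% | - |
| sqlite | 51 | 16 | 67 | 76.1% | - |
| starrocks | 29 | 4 | 33 | 87.9% | - |
| teradata | 23 | 20 | 43 | 53.5% | - |
| trino | 1409 | 81 | 1490 | 94.6% | - |
| tsql | 165 | 34 | 199 | 82.9% | - |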

… list

Snowflake conditional masking places the masking policy's USING (col, cond_col,
...) clause *after* the column-list parentheses (per the column-security guide),
e.g. `CREATE TABLE t (email STRING MASKING POLICY p) USING (email, visibility)`.
This previously errored because the trailing `USING (` collided with the
Databricks `USING <format>` / Snowflake `USING TEMPLATE` handling.

For CREATE TABLE the columns are attached to the preceding column's masking
policy (canonical form carries USING inline), preserving them for lineage; for
CREATE VIEW the clause is consumed (view column policies aren't represented).
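
One way to picture the CREATE TABLE case is below. The sketch uses hypothetical `Column` and `MaskingPolicy` structs and assumes that "the preceding column" means the last column in the list that carries a masking policy; the actual parser may resolve this differently.

```rust
// Hypothetical, simplified AST pieces for illustration only.
#[derive(Debug, Clone, PartialEq)]
pub struct MaskingPolicy {
    pub name: String,
    /// Columns from USING (...); empty when no USING clause was given.
    pub using: Vec<String>,
}

#[derive(Debug, Clone, PartialEq)]
pub struct Column {
    pub name: String,
    pub masking_policy: Option<MaskingPolicy>,
}

/// Attach a trailing `USING (cols)` list (found after the closing paren of
/// the column list) to the last column that has a masking policy.
/// Returns false when no column carries a masking policy.
pub fn attach_trailing_using(columns: &mut [Column], using: Vec<String>) -> bool {
    match columns
        .iter_mut()
        .rev()
        .find_map(|c| c.masking_policy.as_mut())
    {
        Some(policy) => {
            policy.using = using;
            true
        }
        None => false,
    }
}
```

Storing the list on the policy itself matches the canonical form described above, where USING is rendered inline next to the policy.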

Grammar per Snowflake docs:
https://docs.snowflake.com/en/user-guide/security-column-intro

Fixes 3 corpus test failures (Snowflake).

…y/tag

Adds the ALTER-time governance operations and policy/tag drops that previously
errored or hit the generic fallback:

- ALTER TABLE SET { AGGREGATION | JOIN } POLICY <name> [ENTITY KEY (..) |
  ALLOWED JOIN KEYS (..)] [FORCE] -> AlterTableOperation::SetTablePolicy
  (reuses TablePolicy).
- ALTER TABLE UNSET { AGGREGATION | JOIN | ROW ACCESS } POLICY ->
  AlterTableOperation::UnsetTablePolicy; UNSET TAG <name> [, ...] -> UnsetTag.
- DROP { MASKING | ROW ACCESS | AGGREGATION | PROJECTION | JOIN } POLICY and
  DROP TAG -> Statement::Drop with new ObjectType::Policy(kind) / ObjectType::Tag.

Grammar per Snowflake docs:
https://docs.snowflake.com/en/sql-reference/sql/alter-table

Fixes 1 corpus test failure (Snowflake).

…ized) views

Databricks (materialized) view column lists may carry a column mask after the
type (`<col> <type> MASK <func> [USING COLUMNS (...)]`) and a table-level
`WITH ROW FILTER <func> ON (cols)`. These now parse; view-level policies aren't
represented in the AST (consistent with other view column policies), so they're
consumed while the AS query — and its lineage — is preserved.

Grammar per Databricks docs:
https://docs.databricks.com/aws/en/sql/language-manual/sql-ref-syntax-ddl-column-mask
https://docs.databricks.com/aws/en/sql/language-manual/sql-ref-syntax-ddl-row-filter

Fixes 1 corpus test failure (Databricks).

…uses

FORMAT was disallowed as a column/table alias solely to support the ClickHouse
`SELECT ... FORMAT <fmt>` clause (also honored by GenericDialect); in other
dialects it is a normal identifier and implicit alias (e.g. `CASE ... END
FORMAT`). PRIOR was always parsed as the CONNECT BY hierarchical prefix
operator, so it was rejected as a bare column name in a SELECT list.

FORMAT now parses as an alias for non-ClickHouse/Generic dialects (mirroring the
existing FINAL/TOP carve-outs); PRIOR falls back to an identifier when the
following tokens don't form an operand, while `CONNECT BY PRIOR ...` and the
ClickHouse FORMAT clause are unaffected.
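
The PRIOR fallback amounts to a one-token lookahead. The sketch below is only an approximation under stated assumptions: the terminator list and the names `classify_prior` / `starts_operand` are hypothetical, and the real parser's notion of "forms an operand" is likely richer.

```rust
#[derive(Debug, PartialEq)]
pub enum PriorUse {
    /// `PRIOR <operand>` as in `CONNECT BY PRIOR id = parent_id`.
    PrefixOperator,
    /// A plain column named `prior`.
    Identifier,
}

/// Assumption: a token that ends or continues a select item cannot start
/// an operand; anything else (identifier, literal, `(`) can.
fn starts_operand(tok: &str) -> bool {
    const TERMINATORS: [&str; 4] = [",", ")", "FROM", "AS"];
    !TERMINATORS.contains(&tok.to_uppercase().as_str())
}

/// Decide how to read the keyword PRIOR given the token that follows it.
pub fn classify_prior(next: Option<&str>) -> PriorUse {
    match next {
        Some(tok) if starts_operand(tok) => PriorUse::PrefixOperator,
        _ => PriorUse::Identifier,
    }
}
```

For example, in `SELECT prior, x FROM t` the token after PRIOR is a comma, so this heuristic reads it as an identifier, while `CONNECT BY PRIOR id = parent_id` still takes the operator path.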

Fixes 4 corpus test failures (Snowflake).

Completes BigQuery row-level security DDL: `DROP ROW ACCESS POLICY [IF EXISTS]
<name> ON <table>` and `DROP ALL ROW ACCESS POLICIES ON <table>` ->
Statement::DropRowAccessPolicy { if_exists, all, name, table_name }.

Dispatched (via maybe_parse) ahead of the generic DROP; the required `ON <table>`
makes it revert for the Snowflake `DROP ROW ACCESS POLICY <name>` form, which
still maps to the generic Drop with ObjectType::Policy.
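
A toy recognizer for the two BigQuery forms shows why the required `ON <table>` is enough to tell them apart from the Snowflake form. The struct fields mirror the commit message, but their types (notably `name` as `Option<String>`) and the function `parse_drop_row_access_policy` are assumptions for illustration.

```rust
// Field names follow the commit message; field types are assumptions.
#[derive(Debug, PartialEq)]
pub struct DropRowAccessPolicy {
    pub if_exists: bool,
    pub all: bool,
    pub name: Option<String>, // None for DROP ALL ROW ACCESS POLICIES
    pub table_name: String,
}

/// Returns None (i.e. "revert to the generic DROP") unless the statement
/// has one of the BigQuery shapes, both of which require `ON <table>`.
pub fn parse_drop_row_access_policy(sql: &str) -> Option<DropRowAccessPolicy> {
    let toks: Vec<&str> = sql.split_whitespace().collect();
    let upper: Vec<String> = toks.iter().map(|t| t.to_uppercase()).collect();
    let kw: Vec<&str> = upper.iter().map(String::as_str).collect();
    match kw.as_slice() {
        ["DROP", "ALL", "ROW", "ACCESS", "POLICIES", "ON", _] => Some(DropRowAccessPolicy {
            if_exists: false,
            all: true,
            name: None,
            table_name: toks[6].to_string(),
        }),
        ["DROP", "ROW", "ACCESS", "POLICY", "IF", "EXISTS", _, "ON", _] => {
            Some(DropRowAccessPolicy {
                if_exists: true,
                all: false,
                name: Some(toks[6].to_string()),
                table_name: toks[8].to_string(),
            })
        }
        ["DROP", "ROW", "ACCESS", "POLICY", _, "ON", _] => Some(DropRowAccessPolicy {
            if_exists: false,
            all: false,
            name: Some(toks[4].to_string()),
            table_name: toks[6].to_string(),
        }),
        // Snowflake's `DROP ROW ACCESS POLICY <name>` has no ON clause.
        _ => None,
    }
}
```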

Grammar per BigQuery docs:
https://cloud.google.com/bigquery/docs/reference/standard-sql/data-definition-language

… toggles

PostgreSQL row-security DDL:
- CREATE POLICY <name> ON <table> [AS {PERMISSIVE|RESTRICTIVE}] [FOR <cmd>]
  [TO <role>,...] [USING (expr)] [WITH CHECK (expr)] -> Statement::CreatePostgresPolicy.
- ALTER POLICY ... (RENAME TO / TO / USING / WITH CHECK) -> AlterPostgresPolicy.
- DROP POLICY [IF EXISTS] <name> ON <table> [CASCADE|RESTRICT] -> DropPostgresPolicy.
- ALTER TABLE { ENABLE | DISABLE | FORCE | NO FORCE } ROW LEVEL SECURITY ->
  AlterTableOperation::RowLevelSecurity.

USING / WITH CHECK predicates are parsed as real Expr, so their column/table
references (including subqueries) stay visible to lineage.
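
The four ALTER TABLE toggles lend themselves to a small enum that round-trips through `Display`. This is a sketch: the enum name `RowLevelSecurity` and its `parse` helper are hypothetical, and the payload of the real `AlterTableOperation::RowLevelSecurity` variant is not shown in this PR text.

```rust
use std::fmt;

// Hypothetical payload for the row-level-security toggle.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum RowLevelSecurity {
    Enable,
    Disable,
    Force,
    NoForce,
}

impl RowLevelSecurity {
    /// Parse the clause that follows `ALTER TABLE <name>`, case-insensitively.
    pub fn parse(clause: &str) -> Option<Self> {
        let upper: Vec<String> = clause.split_whitespace().map(str::to_uppercase).collect();
        let words: Vec<&str> = upper.iter().map(String::as_str).collect();
        match words.as_slice() {
            ["ENABLE", "ROW", "LEVEL", "SECURITY"] => Some(Self::Enable),
            ["DISABLE", "ROW", "LEVEL", "SECURITY"] => Some(Self::Disable),
            ["FORCE", "ROW", "LEVEL", "SECURITY"] => Some(Self::Force),
            ["NO", "FORCE", "ROW", "LEVEL", "SECURITY"] => Some(Self::NoForce),
            _ => None,
        }
    }
}

impl fmt::Display for RowLevelSecurity {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        let prefix = match self {
            Self::Enable => "ENABLE",
            Self::Disable => "DISABLE",
            Self::Force => "FORCE",
            Self::NoForce => "NO FORCE",
        };
        write!(f, "{prefix} ROW LEVEL SECURITY")
    }
}
```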

Grammar per PostgreSQL docs:
https://www.postgresql.org/docs/current/sql-createpolicy.html
https://www.postgresql.org/docs/current/sql-altertable.html

SQL Server row-level security and dynamic data masking:
- CREATE/ALTER/DROP SECURITY POLICY with comma-separated
  { ADD | ALTER | DROP } { FILTER | BLOCK } PREDICATE <tvf>(<args>) ON <table>
  [<block_dml>] actions, plus WITH (STATE/SCHEMABINDING) and NOT FOR REPLICATION
  -> Statement::{Create,Alter,Drop}SecurityPolicy + SecurityPolicyPredicate.
- Column dynamic data masking `MASKED WITH (FUNCTION = '<mask>')` ->
  ColumnOption::MaskedWith.

Each predicate keeps its function name and target table for lineage.

Grammar per SQL Server docs:
https://learn.microsoft.com/en-us/sql/t-sql/statements/create-security-policy-transact-sql
https://learn.microsoft.com/en-us/sql/relational-databases/security/dynamic-data-masking

Amazon Redshift row-level security and dynamic data masking:
- CREATE RLS POLICY <name> [WITH (cols) [[AS] alias]] USING (predicate);
  CREATE MASKING POLICY [IF NOT EXISTS] <name> WITH (cols) USING (expr,...) ->
  Statement::CreateRedshiftPolicy.
- ATTACH/DETACH { RLS | MASKING } POLICY ... ON <table>[,...] [(out_cols)]
  [USING (in_cols)] { TO | FROM } grantee [, ...] [PRIORITY n] ->
  Statement::AttachRedshiftPolicy (grantees: user / ROLE role / PUBLIC).
- DROP RLS POLICY [IF EXISTS] <name> [CASCADE|RESTRICT] ->
  Statement::DropRedshiftPolicy (MASKING reuses the generic DROP path).
- ALTER MASKING POLICY <name> USING (expr,...) -> AlterRedshiftMaskingPolicy.

The masking CREATE is dispatched via maybe_parse so other dialects' `MASKING
POLICY` shapes (e.g. ClickHouse `... ON ... UPDATE ...`) revert to the fallback.
USING predicates/expressions are real Expr nodes, preserving column refs.

Grammar per Redshift docs:
https://docs.aws.amazon.com/redshift/latest/dg/r_CREATE_RLS_POLICY.html
https://docs.aws.amazon.com/redshift/latest/dg/r_CREATE_MASKING_POLICY.html

Completes the ALTER-time governance operations:
- Databricks ALTER TABLE DROP ROW FILTER; SET/UNSET TAGS ('k' = 'v', ...);
  ALTER COLUMN SET MASK <func> [USING COLUMNS (...)] / DROP MASK; ALTER COLUMN
  SET/UNSET TAGS. (SET ROW FILTER already reused SetTablePolicy.)
- SQL Server ALTER COLUMN ADD MASKED WITH (FUNCTION = '<mask>') / DROP MASKED.

New AlterTableOperation::{DropRowFilter, SetTags, UnsetTags} and
AlterColumnOperation::{SetMask, DropMask, AddMasked, DropMasked, SetTags,
UnsetTags}.

Grammar per docs:
https://docs.databricks.com/aws/en/sql/language-manual/sql-ref-syntax-ddl-alter-table
https://learn.microsoft.com/en-us/sql/relational-databases/security/dynamic-data-masking

@lustefaniak lustefaniak merged commit 8afd4fc into main Jun 2, 2026
4 checks passed
@lustefaniak lustefaniak deleted the lukasz-capture-security-governance-policies branch June 2, 2026 09:20