feat: capture SQL security/governance policy applications and definitions #87
Merged
Snowflake CREATE TABLE may attach security/governance policies at the table
level: ROW ACCESS, AGGREGATION, JOIN, and (dynamic tables) STORAGE LIFECYCLE.
These were parsed and discarded; this captures them in a new TablePolicy
{kind, with, policy_name, columns} on CreateTable so column-level lineage can
surface which policy guards a table and over which columns.
Each kind selects its column-list keyword: ROW ACCESS / STORAGE LIFECYCLE use
ON (cols), AGGREGATION uses ENTITY KEY (cols), JOIN uses ALLOWED JOIN KEYS
(cols); all column lists are optional (GET_DDL omits ON when the caller lacks
privilege). The optional WITH prefix round-trips.
Grammar per Snowflake docs:
https://docs.snowflake.com/en/sql-reference/sql/create-table
https://docs.snowflake.com/en/sql-reference/sql/create-aggregation-policy
https://docs.snowflake.com/en/sql-reference/sql/create-join-policy
https://docs.snowflake.com/en/sql-reference/sql/create-dynamic-table
Fixes 3 corpus test failures (Snowflake).
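The per-kind column-list keyword described above can be sketched as follows. This is a minimal illustration, not the merged code: only the `{kind, with, policy_name, columns}` shape comes from the commit message, while the field types, the variant names, and the `keyword` / `column_list_keyword` helpers are assumptions.

```rust
use std::fmt;

// Hypothetical sketch: variant names and field types are assumptions; only the
// {kind, with, policy_name, columns} shape is taken from the commit message.
#[derive(Debug, Clone, PartialEq)]
pub enum TablePolicyKind {
    RowAccess,
    Aggregation,
    Join,
    StorageLifecycle,
}

impl TablePolicyKind {
    /// Keyword(s) that name the policy kind.
    fn keyword(&self) -> &'static str {
        match self {
            Self::RowAccess => "ROW ACCESS",
            Self::Aggregation => "AGGREGATION",
            Self::Join => "JOIN",
            Self::StorageLifecycle => "STORAGE LIFECYCLE",
        }
    }

    /// Keyword(s) that introduce the optional column list for this kind.
    fn column_list_keyword(&self) -> &'static str {
        match self {
            Self::RowAccess | Self::StorageLifecycle => "ON",
            Self::Aggregation => "ENTITY KEY",
            Self::Join => "ALLOWED JOIN KEYS",
        }
    }
}

#[derive(Debug, Clone, PartialEq)]
pub struct TablePolicy {
    pub kind: TablePolicyKind,
    /// Whether the optional WITH prefix was present, so it can round-trip.
    pub with: bool,
    pub policy_name: String,
    /// None when the column list is omitted entirely.
    pub columns: Option<Vec<String>>,
}

impl fmt::Display for TablePolicy {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        if self.with {
            write!(f, "WITH ")?;
        }
        write!(f, "{} POLICY {}", self.kind.keyword(), self.policy_name)?;
        if let Some(cols) = &self.columns {
            write!(f, " {} ({})", self.kind.column_list_keyword(), cols.join(", "))?;
        }
        Ok(())
    }
}
```

Keeping `columns` optional lets the GET_DDL output without a column list render back without an empty `ON ()`.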
Snowflake CREATE TABLE may attach governance tags at the table level via
[WITH] TAG (tag_name = 'value', ...). These were consumed and discarded; this
captures them in a new Tag {name, value} list (table_tags on CreateTable) so
lineage can surface which tags are applied to a table.
Tag names may be qualified (db.schema.tag); values are string literals. Both
the WITH-prefixed and bare forms parse and normalize to the canonical
`WITH TAG (k = 'v', ...)` on round-trip. Removed the redundant pre-loop tag
discarder that previously swallowed a leading WITH TAG before the clause loop.
Grammar per Snowflake docs:
https://docs.snowflake.com/en/sql-reference/sql/create-table
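As a rough sketch of the normalization described above, a `Tag {name, value}` list could render to the canonical `WITH TAG (k = 'v', ...)` form like this. The `String` field types, the `render_table_tags` helper, and the doubling of embedded single quotes are assumptions made for illustration, not details confirmed by the commit.

```rust
use std::fmt;

// Hypothetical sketch: field types are assumptions; the commit only names
// the Tag {name, value} shape.
#[derive(Debug, Clone, PartialEq)]
pub struct Tag {
    /// Possibly qualified, e.g. `db.schema.tag`.
    pub name: String,
    /// String-literal value, stored without the surrounding quotes.
    pub value: String,
}

impl fmt::Display for Tag {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // Assumption: embedded single quotes are escaped by doubling them.
        write!(f, "{} = '{}'", self.name, self.value.replace('\'', "''"))
    }
}

/// Render a tag list in the canonical form, whichever spelling
/// (`WITH TAG (...)` or bare `TAG (...)`) the input used.
pub fn render_table_tags(tags: &[Tag]) -> String {
    if tags.is_empty() {
        return String::new();
    }
    let body: Vec<String> = tags.iter().map(|t| t.to_string()).collect();
    format!("WITH TAG ({})", body.join(", "))
}
```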
Column definitions in Snowflake CREATE TABLE may carry [WITH] TAG (k = 'v', ...)
governance tags. These were consumed and discarded; this captures them in a new
`tags: Vec<Tag>` field on ColumnDef so lineage can surface per-column tagging.
Both the WITH-prefixed and bare forms parse and normalize to the canonical
`WITH TAG (...)` rendered after the column options/policy. All existing ColumnDef
constructions across the test suite gain the new (empty) field.
Grammar per Snowflake docs:
https://docs.snowflake.com/en/sql-reference/sql/create-table
Databricks attaches row filters and column masks via plain scalar UDFs:
CREATE TABLE ... WITH ROW FILTER <func> ON (cols)
<col> <type> MASK <func> [USING COLUMNS (<col>|<literal>, ...)]
ROW FILTER is captured as a new TablePolicyKind::RowFilter (uniform with the
Snowflake table policies; ON-list holds the function arguments). Column MASK is
upgraded from a bare ObjectName to a ColumnMask {function, using_columns} so the
USING COLUMNS arguments (other column names and/or constant literals) are
preserved for lineage — previously USING COLUMNS failed to parse.
Grammar per Databricks docs:
https://docs.databricks.com/aws/en/sql/language-manual/sql-ref-syntax-ddl-row-filter
https://docs.databricks.com/aws/en/sql/language-manual/sql-ref-syntax-ddl-column-mask
Fixes 2 corpus test failures (Databricks).
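The `ColumnMask {function, using_columns}` upgrade can be pictured with the sketch below. The `MaskArg` enum, the `String` field types, and the `referenced_columns` helper are hypothetical names introduced here to show why keeping the USING COLUMNS arguments matters for lineage.

```rust
use std::fmt;

// Hypothetical sketch: MaskArg and the String field types are assumptions;
// only ColumnMask {function, using_columns} is named in the commit.
#[derive(Debug, Clone, PartialEq)]
pub enum MaskArg {
    /// Another column of the same table.
    Column(String),
    /// A constant literal, kept as source text (e.g. `'US'` or `42`).
    Literal(String),
}

#[derive(Debug, Clone, PartialEq)]
pub struct ColumnMask {
    pub function: String,
    pub using_columns: Vec<MaskArg>,
}

impl ColumnMask {
    /// Columns the mask function reads besides the masked column itself.
    pub fn referenced_columns(&self) -> Vec<&str> {
        self.using_columns
            .iter()
            .filter_map(|arg| match arg {
                MaskArg::Column(c) => Some(c.as_str()),
                MaskArg::Literal(_) => None,
            })
            .collect()
    }
}

impl fmt::Display for ColumnMask {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "MASK {}", self.function)?;
        if !self.using_columns.is_empty() {
            let args: Vec<&str> = self
                .using_columns
                .iter()
                .map(|arg| match arg {
                    MaskArg::Column(s) | MaskArg::Literal(s) => s.as_str(),
                })
                .collect();
            write!(f, " USING COLUMNS ({})", args.join(", "))?;
        }
        Ok(())
    }
}
```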
…TION/PROJECTION/JOIN)
Snowflake security/governance policy definitions share one shape:
CREATE [OR REPLACE] <KIND> POLICY [IF NOT EXISTS] <name>
AS ( [<arg> <type>, ...] ) RETURNS <type> -> <body>
[COMMENT = '...'] [EXEMPT_OTHER_POLICIES = { TRUE | FALSE }]
These previously fell through the generic CREATE skip-until-semicolon fallback,
discarding the masking/row-access condition. A new Statement::CreatePolicy
variant captures kind + name + typed signature + RETURNS type + the `-> body`
expression + trailing options. Parsing the body as a real Expr keeps any
subqueries/table references (e.g. an EXISTS lookup) visible to lineage.
Dispatched via maybe_parse before the generic fallback; the `AS` check makes it
revert for non-Snowflake shapes (BigQuery's `ROW ACCESS POLICY ... ON <table>`),
so those still fall back unchanged pending dedicated handling.
Grammar per Snowflake docs:
https://docs.snowflake.com/en/sql-reference/sql/create-masking-policy
https://docs.snowflake.com/en/sql-reference/sql/create-row-access-policy
https://docs.snowflake.com/en/sql-reference/sql/create-aggregation-policy
https://docs.snowflake.com/en/sql-reference/sql/create-projection-policy
https://docs.snowflake.com/en/sql-reference/sql/create-join-policy
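The revert-on-`AS` dispatch can be illustrated with a toy backtracking parser. Everything here is a stand-in: the whitespace tokenizer, the `Parser` struct, and `parse_snowflake_policy_head` are hypothetical, and the `maybe_parse` shown is a simplified model of the idea (try a sub-parser, rewind on failure), not the real parser's signature.

```rust
/// Minimal token cursor; a stand-in for the real parser, not its API.
pub struct Parser {
    tokens: Vec<String>,
    index: usize,
}

impl Parser {
    /// Assumption: whitespace splitting is enough for this illustration.
    pub fn new(sql: &str) -> Self {
        Parser {
            tokens: sql.split_whitespace().map(str::to_string).collect(),
            index: 0,
        }
    }

    pub fn position(&self) -> usize {
        self.index
    }

    fn next_token(&mut self) -> Option<String> {
        let tok = self.tokens.get(self.index).cloned();
        if tok.is_some() {
            self.index += 1;
        }
        tok
    }

    fn expect_keyword(&mut self, kw: &str) -> Result<(), String> {
        match self.next_token() {
            Some(t) if t.eq_ignore_ascii_case(kw) => Ok(()),
            other => Err(format!("expected {}, found {:?}", kw, other)),
        }
    }

    /// Run `f`; if it fails, rewind to the starting position and return None
    /// so a later (fallback) parser sees the input unchanged.
    pub fn maybe_parse<T>(
        &mut self,
        f: impl FnOnce(&mut Parser) -> Result<T, String>,
    ) -> Option<T> {
        let start = self.index;
        match f(self) {
            Ok(value) => Some(value),
            Err(_) => {
                self.index = start;
                None
            }
        }
    }
}

/// Snowflake-shaped head: `ROW ACCESS POLICY <name> AS ...`.
/// The required `AS` is what rejects the BigQuery `... ON <table>` shape.
pub fn parse_snowflake_policy_head(p: &mut Parser) -> Result<String, String> {
    p.expect_keyword("ROW")?;
    p.expect_keyword("ACCESS")?;
    p.expect_keyword("POLICY")?;
    let name = p.next_token().ok_or("expected policy name")?;
    p.expect_keyword("AS")?;
    Ok(name)
}
```

With the Snowflake shape the cursor advances past `AS`; with the BigQuery shape the attempt fails and the cursor is back at the start, ready for the next candidate parser.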
Snowflake tag definitions previously fell through the generic CREATE fallback.
A new Statement::CreateTag captures the tag name, the optional ALLOWED_VALUES
string list, and trailing key=value options (COMMENT, PROPAGATE, ON_CONFLICT),
so tag objects are represented in the AST alongside their applications.
Grammar per Snowflake docs:
https://docs.snowflake.com/en/sql-reference/sql/create-tag
BigQuery row-level security previously fell through the generic CREATE fallback,
discarding the target table and the filter predicate. A new
Statement::CreateRowAccessPolicy captures the policy name, the `ON <table>`
target, the optional `GRANT TO (...)` principal list, and the
`FILTER USING (<predicate>)` expression. Parsing the predicate as a real Expr
keeps any subquery table references (e.g. a lookup-table IN-subquery) visible to
lineage. Dispatched after the Snowflake policy attempt (whose `AS` check reverts
for this `... ON <table>` shape), so Snowflake definitions are unaffected.
Grammar per BigQuery docs:
https://cloud.google.com/bigquery/docs/reference/standard-sql/data-definition-language
Corpus Parsing Report
Total: 191317 passed, 1932 failed (99.0% pass rate)
✨ No changes in test results
… list
Snowflake conditional masking places the masking policy's USING (col, cond_col,
...) clause *after* the column-list parentheses (per the column-security guide),
e.g. `CREATE TABLE t (email STRING MASKING POLICY p) USING (email, visibility)`.
This previously errored because the trailing `USING (` collided with the
Databricks `USING <format>` / Snowflake `USING TEMPLATE` handling. For CREATE
TABLE the columns are attached to the preceding column's masking policy
(canonical form carries USING inline), preserving them for lineage; for CREATE
VIEW the clause is consumed (view column policies aren't represented).
Grammar per Snowflake docs:
https://docs.snowflake.com/en/user-guide/security-column-intro
Fixes 3 corpus test failures (Snowflake).
…y/tag
Adds the ALTER-time governance operations and policy/tag drops that previously
errored or hit the generic fallback:
- ALTER TABLE SET { AGGREGATION | JOIN } POLICY <name> [ENTITY KEY (..) |
ALLOWED JOIN KEYS (..)] [FORCE] -> AlterTableOperation::SetTablePolicy
(reuses TablePolicy).
- ALTER TABLE UNSET { AGGREGATION | JOIN | ROW ACCESS } POLICY ->
AlterTableOperation::UnsetTablePolicy; UNSET TAG <name> [, ...] -> UnsetTag.
- DROP { MASKING | ROW ACCESS | AGGREGATION | PROJECTION | JOIN } POLICY and
DROP TAG -> Statement::Drop with new ObjectType::Policy(kind) / ObjectType::Tag.
Grammar per Snowflake docs:
https://docs.snowflake.com/en/sql-reference/sql/alter-table
Fixes 1 corpus test failure (Snowflake).
…ized) views
Databricks (materialized) view column lists may carry a column mask after the
type (`<col> <type> MASK <func> [USING COLUMNS (...)]`) and a table-level
`WITH ROW FILTER <func> ON (cols)`. These now parse; view-level policies aren't
represented in the AST (consistent with other view column policies), so they're
consumed while the AS query, and its lineage, is preserved.
Grammar per Databricks docs:
https://docs.databricks.com/aws/en/sql/language-manual/sql-ref-syntax-ddl-column-mask
https://docs.databricks.com/aws/en/sql/language-manual/sql-ref-syntax-ddl-row-filter
Fixes 1 corpus test failure (Databricks).
…uses
FORMAT was reserved as a column/table alias only for the ClickHouse
`SELECT ... FORMAT <fmt>` clause (also honored by GenericDialect); in other
dialects it is a normal identifier and implicit alias (e.g. `CASE ... END FORMAT`).
PRIOR was always parsed as the CONNECT BY hierarchical prefix operator, which
rejected it as a bare column name in a SELECT list. FORMAT now parses as an alias
for non-ClickHouse/Generic dialects (mirroring the existing FINAL/TOP carve-outs);
PRIOR falls back to an identifier when the following tokens don't form an operand,
while `CONNECT BY PRIOR ...` and the ClickHouse FORMAT clause are unaffected.
Fixes 4 corpus test failures (Snowflake).
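One way to picture the PRIOR fallback is a one-token lookahead, sketched below. This is a simplified heuristic chosen for illustration and is not the mechanism the commit describes in detail: the `PriorRole` enum, the `classify_prior` function, and the terminator list are all assumptions.

```rust
/// How a `PRIOR` token should be treated at a given position.
#[derive(Debug, PartialEq)]
pub enum PriorRole {
    /// `CONNECT BY PRIOR <operand>`: the hierarchical prefix operator.
    PrefixOperator,
    /// A plain column name, e.g. `SELECT prior FROM t`.
    Identifier,
}

/// Hypothetical lookahead: if the token after PRIOR cannot start an operand
/// (end of input, a separator, or a clause keyword), treat PRIOR as an
/// identifier. The terminator list is an assumption and is not exhaustive.
pub fn classify_prior(next: Option<&str>) -> PriorRole {
    const TERMINATORS: [&str; 8] = [",", ")", ";", "FROM", "AS", "WHERE", "GROUP", "ORDER"];
    match next {
        None => PriorRole::Identifier,
        Some(tok) if TERMINATORS.iter().any(|t| t.eq_ignore_ascii_case(tok)) => {
            PriorRole::Identifier
        }
        Some(_) => PriorRole::PrefixOperator,
    }
}
```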
Completes BigQuery row-level security DDL: `DROP ROW ACCESS POLICY [IF EXISTS]
<name> ON <table>` and `DROP ALL ROW ACCESS POLICIES ON <table>` ->
Statement::DropRowAccessPolicy { if_exists, all, name, table_name }.
Dispatched (via maybe_parse) ahead of the generic DROP; the required `ON <table>`
makes it revert for the Snowflake `DROP ROW ACCESS POLICY <name>` form, which
still maps to the generic Drop with ObjectType::Policy.
Grammar per BigQuery docs:
https://cloud.google.com/bigquery/docs/reference/standard-sql/data-definition-language
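A minimal sketch of how `DropRowAccessPolicy { if_exists, all, name, table_name }` might round-trip both BigQuery forms follows. The field types, in particular `name` being optional for the `ALL` form, are assumptions.

```rust
use std::fmt;

// Hypothetical sketch: field types are assumptions; the field names come from
// the commit message.
#[derive(Debug, Clone, PartialEq)]
pub struct DropRowAccessPolicy {
    pub if_exists: bool,
    /// True for `DROP ALL ROW ACCESS POLICIES ON <table>`.
    pub all: bool,
    /// Assumed to be None for the ALL form, which names no single policy.
    pub name: Option<String>,
    pub table_name: String,
}

impl fmt::Display for DropRowAccessPolicy {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        if self.all {
            return write!(f, "DROP ALL ROW ACCESS POLICIES ON {}", self.table_name);
        }
        write!(f, "DROP ROW ACCESS POLICY ")?;
        if self.if_exists {
            write!(f, "IF EXISTS ")?;
        }
        if let Some(name) = &self.name {
            write!(f, "{} ", name)?;
        }
        write!(f, "ON {}", self.table_name)
    }
}
```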
… toggles
PostgreSQL row-security DDL:
- CREATE POLICY <name> ON <table> [AS {PERMISSIVE|RESTRICTIVE}] [FOR <cmd>]
[TO <role>,...] [USING (expr)] [WITH CHECK (expr)] -> Statement::CreatePostgresPolicy.
- ALTER POLICY ... (RENAME TO / TO / USING / WITH CHECK) -> AlterPostgresPolicy.
- DROP POLICY [IF EXISTS] <name> ON <table> [CASCADE|RESTRICT] -> DropPostgresPolicy.
- ALTER TABLE { ENABLE | DISABLE | FORCE | NO FORCE } ROW LEVEL SECURITY ->
AlterTableOperation::RowLevelSecurity.
USING / WITH CHECK predicates are parsed as real Expr, so their column/table
references (including subqueries) stay visible to lineage.
Grammar per PostgreSQL docs:
https://www.postgresql.org/docs/current/sql-createpolicy.html
https://www.postgresql.org/docs/current/sql-altertable.html
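The four ROW LEVEL SECURITY toggles could be modeled as a small enum, as in the sketch below. The `RowLevelSecurity` enum and `parse_rls_toggle` helper are hypothetical names; the commit only states that the operation is `AlterTableOperation::RowLevelSecurity`.

```rust
use std::fmt;

// Hypothetical sketch of the payload carried by the ALTER TABLE operation.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum RowLevelSecurity {
    Enable,
    Disable,
    Force,
    NoForce,
}

impl fmt::Display for RowLevelSecurity {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        let prefix = match self {
            RowLevelSecurity::Enable => "ENABLE",
            RowLevelSecurity::Disable => "DISABLE",
            RowLevelSecurity::Force => "FORCE",
            RowLevelSecurity::NoForce => "NO FORCE",
        };
        write!(f, "{} ROW LEVEL SECURITY", prefix)
    }
}

/// Recognize the clause text that follows `ALTER TABLE <name>`,
/// case-insensitively; returns None for anything else.
pub fn parse_rls_toggle(clause: &str) -> Option<RowLevelSecurity> {
    let upper: Vec<String> = clause
        .split_whitespace()
        .map(|w| w.to_ascii_uppercase())
        .collect();
    let words: Vec<&str> = upper.iter().map(String::as_str).collect();
    match words.as_slice() {
        ["ENABLE", "ROW", "LEVEL", "SECURITY"] => Some(RowLevelSecurity::Enable),
        ["DISABLE", "ROW", "LEVEL", "SECURITY"] => Some(RowLevelSecurity::Disable),
        ["FORCE", "ROW", "LEVEL", "SECURITY"] => Some(RowLevelSecurity::Force),
        ["NO", "FORCE", "ROW", "LEVEL", "SECURITY"] => Some(RowLevelSecurity::NoForce),
        _ => None,
    }
}
```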
SQL Server row-level security and dynamic data masking:
- CREATE/ALTER/DROP SECURITY POLICY with comma-separated
{ ADD | ALTER | DROP } { FILTER | BLOCK } PREDICATE <tvf>(<args>) ON <table>
[<block_dml>] actions, plus WITH (STATE/SCHEMABINDING) and NOT FOR REPLICATION
-> Statement::{Create,Alter,Drop}SecurityPolicy + SecurityPolicyPredicate.
- Column dynamic data masking `MASKED WITH (FUNCTION = '<mask>')` ->
ColumnOption::MaskedWith.
Each predicate keeps its function name and target table for lineage.
Grammar per SQL Server docs:
https://learn.microsoft.com/en-us/sql/t-sql/statements/create-security-policy-transact-sql
https://learn.microsoft.com/en-us/sql/relational-databases/security/dynamic-data-masking
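To show what "keeps its function name and target table for lineage" can mean in practice, here is a reduced sketch of a predicate node covering only the `ADD` action. The field names and types, the `PredicateKind` enum, and the `lineage_edge` helper are assumptions; the real `SecurityPolicyPredicate` also has to handle ALTER/DROP and the optional block-DML suffix.

```rust
use std::fmt;

#[derive(Debug, Clone, Copy, PartialEq)]
pub enum PredicateKind {
    Filter,
    Block,
}

// Hypothetical, reduced shape: ADD predicates only, no <block_dml> suffix.
#[derive(Debug, Clone, PartialEq)]
pub struct SecurityPolicyPredicate {
    pub kind: PredicateKind,
    /// Inline table-valued function that implements the predicate.
    pub function: String,
    /// Column arguments passed to the function.
    pub args: Vec<String>,
    /// Table the predicate is bound to.
    pub table: String,
}

impl SecurityPolicyPredicate {
    /// (function, table) pair a lineage pass could record as an edge.
    pub fn lineage_edge(&self) -> (&str, &str) {
        (self.function.as_str(), self.table.as_str())
    }
}

impl fmt::Display for SecurityPolicyPredicate {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        let kind = match self.kind {
            PredicateKind::Filter => "FILTER",
            PredicateKind::Block => "BLOCK",
        };
        write!(
            f,
            "ADD {} PREDICATE {}({}) ON {}",
            kind,
            self.function,
            self.args.join(", "),
            self.table
        )
    }
}
```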
Amazon Redshift row-level security and dynamic data masking:
- CREATE RLS POLICY <name> [WITH (cols) [[AS] alias]] USING (predicate);
CREATE MASKING POLICY [IF NOT EXISTS] <name> WITH (cols) USING (expr,...) ->
Statement::CreateRedshiftPolicy.
- ATTACH/DETACH { RLS | MASKING } POLICY ... ON <table>[,...] [(out_cols)]
[USING (in_cols)] { TO | FROM } grantee [, ...] [PRIORITY n] ->
Statement::AttachRedshiftPolicy (grantees: user / ROLE role / PUBLIC).
- DROP RLS POLICY [IF EXISTS] <name> [CASCADE|RESTRICT] ->
Statement::DropRedshiftPolicy (MASKING reuses the generic DROP path).
- ALTER MASKING POLICY <name> USING (expr,...) -> AlterRedshiftMaskingPolicy.
The masking CREATE is dispatched via maybe_parse so other dialects' `MASKING
POLICY` shapes (e.g. ClickHouse `... ON ... UPDATE ...`) revert to the fallback.
USING predicates/expressions are real Expr nodes, preserving column refs.
Grammar per Redshift docs:
https://docs.aws.amazon.com/redshift/latest/dg/r_CREATE_RLS_POLICY.html
https://docs.aws.amazon.com/redshift/latest/dg/r_CREATE_MASKING_POLICY.html
Completes the ALTER-time governance operations:
- Databricks ALTER TABLE DROP ROW FILTER; SET/UNSET TAGS ('k' = 'v', ...);
ALTER COLUMN SET MASK <func> [USING COLUMNS (...)] / DROP MASK; ALTER COLUMN
SET/UNSET TAGS. (SET ROW FILTER already reused SetTablePolicy.)
- SQL Server ALTER COLUMN ADD MASKED WITH (FUNCTION = '<mask>') / DROP MASKED.
New AlterTableOperation::{DropRowFilter, SetTags, UnsetTags} and
AlterColumnOperation::{SetMask, DropMask, AddMasked, DropMasked, SetTags,
UnsetTags}.
Grammar per docs:
https://docs.databricks.com/aws/en/sql/language-manual/sql-ref-syntax-ddl-alter-table
https://learn.microsoft.com/en-us/sql/relational-databases/security/dynamic-data-masking
iamjasinski approved these changes Jun 2, 2026
Adds parser/AST support for capturing security & governance policies across all supported dialects: which masking/access/governance policies are applied (and to which columns), plus the policy definitions themselves. Previously these clauses were parsed-and-discarded or skipped by the generic `CREATE`/`ALTER`/`DROP` fallbacks, losing both the policy references and the lineage-bearing condition expressions.

Applications (CREATE TABLE / views)
- Snowflake: `ROW ACCESS`/`AGGREGATION`/`JOIN`/`STORAGE LIFECYCLE` policies → `TablePolicy`; table- and column-level `TAG (...)` → `Tag`; conditional masking `(... MASKING POLICY p) USING (cols)`.
- Databricks: `WITH ROW FILTER f ON (cols)` → `TablePolicyKind::RowFilter`; column `MASK f USING COLUMNS (...)` → `ColumnMask`; masks/filters in (materialized) views.

Policy definitions (bodies parsed as real `Expr`, so subqueries/table refs stay visible)
- Snowflake: `CREATE [OR REPLACE] {MASKING|ROW ACCESS|AGGREGATION|PROJECTION|JOIN} POLICY ... AS (sig) RETURNS type -> body` → `Statement::CreatePolicy`; `CREATE TAG` → `Statement::CreateTag`.
- BigQuery: `CREATE/DROP [ALL] ROW ACCESS POLICY ... ON <table> [GRANT TO (...)] FILTER USING (...)`.
- PostgreSQL: `CREATE/ALTER/DROP POLICY ... ON <table>` (AS PERMISSIVE/RESTRICTIVE, FOR, TO, USING, WITH CHECK).
- SQL Server: `CREATE/ALTER/DROP SECURITY POLICY` with ADD/ALTER/DROP FILTER/BLOCK PREDICATE actions; column `MASKED WITH (FUNCTION=...)`.
- Redshift: `CREATE RLS POLICY` / `CREATE MASKING POLICY` + `ATTACH/DETACH/DROP/ALTER MASKING POLICY`.

ALTER / DROP / toggles
- Snowflake: `ALTER TABLE SET/UNSET {AGGREGATION|JOIN} POLICY`, `UNSET TAG`; `DROP {kind} POLICY` / `DROP TAG`.
- Databricks: `ALTER TABLE SET/DROP ROW FILTER`, `SET/UNSET TAGS`, `ALTER COLUMN SET/DROP MASK`, `ALTER COLUMN SET/UNSET TAGS`.
- SQL Server: `ALTER COLUMN ADD/DROP MASKED`.
- PostgreSQL: `ALTER TABLE { ENABLE | DISABLE | FORCE | NO FORCE } ROW LEVEL SECURITY`.

Every new construct is justified against the vendor grammar (doc links in each commit body). All forms round-trip via `Display`. Dialect-overlapping shapes (e.g. `MASKING POLICY` across Snowflake/Redshift/ClickHouse, `ROW ACCESS POLICY` across Snowflake/BigQuery) are disambiguated with `maybe_parse`/lookahead so each dialect's form is parsed and the others fall through unchanged.

Validation
- Relative to `origin/main`, ~14 previously-failing real files now parse (Snowflake policy applications + conditional masking + ALTER/DROP, Databricks ROW FILTER/MASK incl. materialized views, 4 customer views via the FORMAT/PRIOR keyword fix).
- Definition files that previously only "parsed" via the skip-fallback now produce real AST.

Also fixes two general keyword-handling bugs surfaced by the corpus: `FORMAT` (reserved only for the ClickHouse FORMAT clause) and `PRIOR` (the CONNECT BY prefix operator) are now accepted as ordinary identifiers in other dialects.