
feat(search): field-qualified queries (kind:/lang:/path:/name:) + fuzzy typo fallback#131

Merged
colbymchenry merged 3 commits into
colbymchenry:mainfrom
andreinknv:feat/search-fields-and-fuzzy
May 8, 2026

@andreinknv

Contributor

Summary

Two UX improvements that turn free-text search into something a user can drive precisely.

1. Field-qualified queries

A new query parser splits the raw query into structured filters and a free-text remainder:

kind:function name:auth path:src/api authenticate

becomes:

{ kinds: ['function'], nameFilters: ['auth'],
  pathFilters: ['src/api'], text: 'authenticate' }

Filters are intersected with any filters passed through the SearchOptions argument. Unknown prefixes pass through as plain text, so a query such as "TODO:" keeps working. Quoted values (path:"my dir") may contain whitespace. When the user supplies only filters and no free text, the search runs a filter-only candidate scan instead of bailing out.
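
The classification step described above can be sketched roughly as follows. This is an illustration, not the code in this PR: `parseQuerySketch`, the `languages` field, and the shortened `KNOWN_KINDS` / `KNOWN_LANGS` sets are hypothetical stand-ins, and the sketch splits on plain whitespace, so it does not handle quoted values.

```typescript
// Hypothetical, shortened stand-ins for the project's NodeKind / Language values.
const KNOWN_KINDS = new Set<string>(["function", "method", "class"]);
const KNOWN_LANGS = new Set<string>(["typescript", "go", "python"]);

interface ParsedQuery {
  kinds: string[];
  languages: string[]; // field name assumed for this sketch
  pathFilters: string[];
  nameFilters: string[];
  text: string;
}

// Classify whitespace-separated tokens as filters or free text.
// Quoted values are deliberately out of scope here.
function parseQuerySketch(raw: string): ParsedQuery {
  const out: ParsedQuery = {
    kinds: [], languages: [], pathFilters: [], nameFilters: [], text: "",
  };
  const textParts: string[] = [];
  for (const token of raw.split(/\s+/).filter((t) => t.length > 0)) {
    const colon = token.indexOf(":");
    const field = colon > 0 ? token.slice(0, colon).toLowerCase() : "";
    const value = colon > 0 ? token.slice(colon + 1) : "";
    if (field === "kind" && KNOWN_KINDS.has(value)) {
      out.kinds.push(value);
    } else if ((field === "lang" || field === "language") && KNOWN_LANGS.has(value)) {
      out.languages.push(value);
    } else if (field === "path" && value !== "") {
      out.pathFilters.push(value);
    } else if (field === "name" && value !== "") {
      out.nameFilters.push(value);
    } else {
      // Unknown prefix, unknown value, or empty value: keep as plain text.
      textParts.push(token);
    }
  }
  out.text = textParts.join(" ");
  return out;
}
```

Under these assumptions, `parseQuerySketch("kind:function name:auth path:src/api authenticate")` yields the structure shown earlier (plus an empty `languages` array), and `"TODO:"` comes back unchanged as `text`.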

Recognised fields:

| Prefix | Value |
| --- | --- |
| kind: | any NodeKind value (function, method, class, ...) |
| lang: (alias language:) | any Language value |
| path: | case-insensitive substring of file_path |
| name: | case-insensitive substring of node.name |
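
A minimal sketch of how these field semantics could be applied to a single candidate row. The `NodeRow` and `Filters` shapes and the `matchesFilters` helper are hypothetical. Repeated kind: values are treated as alternatives (OR); requiring every path: and name: filter to match (AND) is an assumption of this sketch.

```typescript
// Hypothetical row and filter shapes, for illustration only.
interface NodeRow {
  kind: string;
  language: string;
  filePath: string;
  name: string;
}

interface Filters {
  kinds: string[];
  languages: string[];
  pathFilters: string[];
  nameFilters: string[];
}

// Case-insensitive substring test, mirroring the path: / name: rows above.
function containsIgnoreCase(haystack: string, needle: string): boolean {
  return haystack.toLowerCase().includes(needle.toLowerCase());
}

function matchesFilters(node: NodeRow, f: Filters): boolean {
  // An empty filter list means "no restriction" for that field.
  if (f.kinds.length > 0 && !f.kinds.includes(node.kind)) return false;
  if (f.languages.length > 0 && !f.languages.includes(node.language)) return false;
  // Assumption: every path filter and every name filter must match.
  if (!f.pathFilters.every((p) => containsIgnoreCase(node.filePath, p))) return false;
  return f.nameFilters.every((n) => containsIgnoreCase(node.name, n));
}
```

For example, a Go function named `HandleRoute` in `Server/routes.go` would pass `lang:go path:server`, because the path comparison ignores case.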

2. Fuzzy typo fallback

When both FTS and LIKE return nothing and the query text is at least 3 characters long, the search scans the distinct-name set using a bounded edit distance (≤2 for queries of 5 or more characters, ≤1 for 4-character queries). The bounded computation exits early once the minimum of the current row exceeds maxDist, so the per-query cost stays at O(distinct-names × avg-name-length) with a very low constant.
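
The early exit can be illustrated with a plain Levenshtein distance. This is a sketch under assumptions, not the PR's implementation: `boundedEditDistanceSketch` is a hypothetical name, the comparison is case-sensitive, and transpositions are not treated specially. It returns `maxDist + 1` as soon as the bound is known to be exceeded.

```typescript
// Levenshtein distance between a and b, capped at maxDist + 1.
function boundedEditDistanceSketch(a: string, b: string, maxDist: number): number {
  // Length-difference shortcut: each extra character costs at least one edit.
  if (Math.abs(a.length - b.length) > maxDist) return maxDist + 1;

  // prev[j] holds the distance between the first i-1 chars of a and the first j chars of b.
  let prev: number[] = [];
  for (let j = 0; j <= b.length; j++) prev.push(j);

  for (let i = 1; i <= a.length; i++) {
    const curr: number[] = [i];
    let rowMin = i;
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      const value = Math.min(
        prev[j] + 1,        // deletion
        curr[j - 1] + 1,    // insertion
        prev[j - 1] + cost, // substitution or match
      );
      curr.push(value);
      if (value < rowMin) rowMin = value;
    }
    // Row minima never decrease, so later rows cannot get back under the bound.
    if (rowMin > maxDist) return maxDist + 1;
    prev = curr;
  }
  return Math.min(prev[b.length], maxDist + 1);
}
```

A caller would keep a name as a fuzzy candidate when the returned value is at most `maxDist`, for example `boundedEditDistanceSketch("getUssr", "getUser", 2) <= 2`.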

Test plan

Verified live against ollama/ollama@v0.22.0:

| Query | Result |
| --- | --- |
| kind:function auth | only function-kind hits |
| lang:go path:server route | Go files under server/ |
| getUssr (typo) | finds getUser, SetUser |
| confg (typo) | finds Config |

  • npx vitest run: 380 passed
  • npx tsc --noEmit: clean
  • npm run build: succeeds

🤖 Generated with Claude Code

…zy typo fallback

Two UX improvements that turn a free-text search into something a
real user can drive precisely.

1) Field-qualified queries.

A new query parser (src/search/query-parser.ts) splits the raw query
into structured filters and a free-text remainder:

  kind:function name:auth path:src/api authenticate

becomes
  { kinds: ['function'], nameFilters: ['auth'],
    pathFilters: ['src/api'], text: 'authenticate' }

Filters compose with the SearchOptions arg (intersection). Unknown
prefixes pass through as plain text so `query "TODO:"` keeps working.
Quoted values (`path:"my dir"`) handle whitespace. When the user
specifies only filters with no text, the search uses a filter-only
candidate scan instead of bailing out.

Recognised today:
  kind:        any NodeKind value
  lang:        any Language value (alias: language:)
  path:        case-insensitive substring of file_path
  name:        case-insensitive substring of node.name

2) Fuzzy fallback.

When BOTH FTS and LIKE return nothing AND the text is at least 3
chars, the resolver scans the distinct-name set with a bounded
Damerau-Levenshtein-style edit distance (≤2 for ≥5 chars, ≤1 for
4-char queries, off for shorter). Bounded edit-distance early-exits
once the row min exceeds maxDist, so this stays O(distinct-names *
avg-name-length) with a very low constant.

Verified live against ollama/ollama@v0.22.0:
  query "kind:function auth"          → only function-kind hits
  query "lang:go path:server route"   → Go files under server/
  query "getUssr"   (typo)            → finds getUser, SetUser
  query "confg"     (typo)            → finds Config

Full test suite: 380 passed.

…fuzzy fan-out cap, larger filter-only over-fetch, unit tests

Fixes from independent review:

- parseQuery tokenizer: quotes that appear mid-token
  (path:"my dir/file") were not recognised; only quotes at the start
  of a token were treated as quoted spans. The fixture path:"my dir"
  parsed as ['path:"my', 'dir"'] instead of ['path:"my dir"'].
  The tokenizer is now a single state machine that scans a token
  until whitespace or a quote, and recognises quotes anywhere within
  the token (skipping to the matching close quote).

- searchNodesFuzzy: cap the per-name follow-up SQL queries at
  Math.max(limit*2, 50) AFTER edit-distance filtering. Without
  this, a project with many similar names (getUser1, getUser2...)
  could fan out far beyond limit queries before the inner-loop
  break kicks in.

- searchAllByFilters (filter-only no-text path): bumped over-fetch
  multiplier from 2× to 5× so a selective post-filter (e.g.
  path:src/very/specific/file.ts) doesn't return fewer than limit
  results despite the DB having matches.

- 23 new unit tests in __tests__/search-query-parser.test.ts:
  parseQuery covers known-field filter, lang/language alias,
  multiple kind: ORs, quoted spans (incl. mid-token), URL
  passthrough, empty-value passthrough, unknown prefix passthrough,
  unknown value passthrough, all-filters-no-text, empty input,
  20k-char input. boundedEditDistance covers identity, single
  insertion/deletion/substitution, length-difference shortcut,
  empty inputs, case-sensitivity, early-exit correctness.
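
A rough sketch of the tokenizer behaviour described in the first bullet above. `tokenizeSketch` is a hypothetical name, and this version uses a single in-quotes flag rather than the exact state machine in the PR. Like the fixture in that bullet, it keeps the quote characters in the token.

```typescript
// Split a raw query on whitespace, treating a double-quoted span as part of
// the current token wherever the opening quote appears.
function tokenizeSketch(raw: string): string[] {
  const tokens: string[] = [];
  let current = "";
  let inQuotes = false;
  for (const ch of raw) {
    if (ch === '"') {
      inQuotes = !inQuotes;
      current += ch; // keep the quote; a later step may strip it
    } else if (!inQuotes && /\s/.test(ch)) {
      // Unquoted whitespace ends the current token, if there is one.
      if (current !== "") tokens.push(current);
      current = "";
    } else {
      current += ch;
    }
  }
  // Flush the last token; this also covers an unterminated quote.
  if (current !== "") tokens.push(current);
  return tokens;
}
```

With this sketch, `tokenizeSketch('path:"my dir" auth')` gives `['path:"my dir"', 'auth']` rather than splitting inside the quotes.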

Full test suite: 853 passed (up from 830).

Convert NodeKind and Language to runtime-iterable as const arrays
(NODE_KINDS, LANGUAGES) so the query parser imports the canonical
list instead of duplicating it. Also fix the path: JSDoc to say
substring (matches the .includes() impl).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@colbymchenry

Owner

Reviewed and merging. Pushed a small polish commit:

  • Derived KIND_VALUES / LANGUAGE_VALUES from new NODE_KINDS / LANGUAGES as const arrays in types.ts so the parser stays in sync if a new kind or language gets added (e.g. lang:unknown now works because the type already had it).
  • Fixed the path: JSDoc — it claimed prefix match but the implementation is substring.
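
For readers unfamiliar with the pattern, deriving both the type and the runtime set from one as const array might look like the sketch below. The array contents are shortened, hypothetical examples, and `isNodeKind` / `isLanguage` are illustrative helpers that are not claimed to exist in types.ts.

```typescript
// Hypothetical, shortened lists; the real ones live in types.ts.
const NODE_KINDS = ["function", "method", "class"] as const;
const LANGUAGES = ["typescript", "go", "unknown"] as const;

// The union types are derived from the arrays, so they cannot drift apart.
type NodeKind = (typeof NODE_KINDS)[number];
type Language = (typeof LANGUAGES)[number];

// Runtime lookup sets built from the same arrays.
const KIND_VALUES: ReadonlySet<string> = new Set<string>(NODE_KINDS);
const LANGUAGE_VALUES: ReadonlySet<string> = new Set<string>(LANGUAGES);

// Type guards the parser could use to validate a filter value.
function isNodeKind(value: string): value is NodeKind {
  return KIND_VALUES.has(value);
}

function isLanguage(value: string): value is Language {
  return LANGUAGE_VALUES.has(value);
}
```

Adding an entry to either array then updates the union type and the lookup set together, which is the in-sync property described in the first bullet.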

The field-qualified syntax and the bounded-edit-distance fuzzy fallback are both clean, well-tested wins. Thanks for the contribution.

@colbymchenry colbymchenry merged commit 56f6b3b into colbymchenry:main May 8, 2026
jorgerobles pushed a commit to jorgerobles/codegraph that referenced this pull request Jun 1, 2026
…zy typo fallback (colbymchenry#131)

* feat(search): field-qualified queries (kind:/lang:/path:/name:) + fuzzy typo fallback

* fix(search): address reviewer findings — tokenizer mid-token quotes, fuzzy fan-out cap, larger filter-only over-fetch, unit tests

* refactor(search): derive parser kind/lang sets from types.ts as const

---------

Co-authored-by: Colby McHenry <me@colbymchenry.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>