Semantic Index Architecture
The semantic index is a project-wide code analysis engine built into @codepol/core. It extracts language-agnostic semantic information from source files using tree-sitter and exposes it through the ProjectIndex query API for plugin rules.
Overview
The semantic index provides:
- Symbol extraction -- functions, classes, variables, types, interfaces, enums, and their attributes
- Scope trees -- lexical/semantic boundaries for name resolution
- Cross-file resolution -- import/export binding, re-export chains, namespace members
- Call graph -- heuristic caller/callee detection
- Module graph -- dependency order, cycle detection, entry points
- Type relations -- extends/implements hierarchy with cross-file resolution
- Control flow graphs -- per-function CFGs with cyclomatic complexity
Data Flow
- Parse -- each source file is parsed into a concrete syntax tree by tree-sitter (WASM grammars, no native deps)
- Adapter extraction -- a language adapter runs query packs against the tree to extract symbols, scopes, relations, and CFGs into a FileIndexDelta
- Store -- deltas are merged into the IndexStore, the central mutable data store
- Cross-file resolution -- after all files are indexed, crossFileResolve links import bindings to their source exports, resolves namespace members, updates type relations, and resolves module specifiers
- ProjectIndex -- a read-only query facade over the store, exposed to plugin rules
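The flow from per-file deltas to a read-only facade can be sketched as follows. This is only an illustration under stated assumptions: `SketchStore`, `SketchProjectIndex`, `buildIndex`, and the record fields are hypothetical stand-ins, not the actual IndexStore or ProjectIndex API.

```typescript
// Hypothetical, simplified shapes; the real FileIndexDelta carries scopes, relations, and CFGs too.
interface SymbolRecord { id: string; name: string; filePath: string; }
interface FileIndexDelta { filePath: string; symbols: SymbolRecord[]; }

// Mutable store: only the builder writes to it.
class SketchStore {
  private byFile = new Map<string, SymbolRecord[]>();

  // Assumption: merging a delta replaces whatever was recorded for that file.
  merge(delta: FileIndexDelta): void {
    this.byFile.set(delta.filePath, delta.symbols);
  }

  symbolsNamed(name: string): SymbolRecord[] {
    const out: SymbolRecord[] = [];
    for (const symbols of this.byFile.values()) {
      for (const s of symbols) if (s.name === name) out.push(s);
    }
    return out;
  }
}

// Read-only facade: exposes queries but not merge().
interface SketchProjectIndex {
  symbolsNamed(name: string): readonly SymbolRecord[];
}

function buildIndex(deltas: FileIndexDelta[]): SketchProjectIndex {
  const store = new SketchStore();
  for (const d of deltas) store.merge(d);
  return { symbolsNamed: (name) => store.symbolsNamed(name) };
}
```

Returning an object that closes over the store, rather than the store itself, is one simple way to keep plugin rules from mutating index state.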
Component Architecture
| Component | File | Purpose |
|---|---|---|
| projectIndexBuild | indexBuilder.ts | Orchestrates per-file indexing, cross-file resolution, and returns ProjectIndex |
| crossFileResolve | indexBuilder.ts | Links imports to exports, resolves namespace members, updates type relations |
| IndexStore | indexStore.ts | Mutable store of all symbols, scopes, and relations with indexed lookups |
| ModuleGraph | moduleGraph.ts | Dependency graph with topological sort (Kahn's) and cycle detection (Tarjan's SCC) |
| ProjectIndex | indexQuery.ts | Read-only query API exposed to plugins |
| IndexAdapter | adapterTypes.ts | Language-specific extraction (tree-sitter queries + kind mappings) |
| moduleResolve | moduleResolver.ts | Node-style module specifier resolution with path alias support |
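The table names Kahn's algorithm for dependency ordering. A minimal sketch of that technique follows, assuming a hypothetical input map from module path to the paths it imports; `dependencyOrder` and `DepMap` are illustrative names, not the ModuleGraph API, and Tarjan's SCC is not shown.

```typescript
// Hypothetical input: module path -> paths it imports.
type DepMap = Map<string, string[]>;

function dependencyOrder(deps: DepMap): { order: string[]; cyclic: string[] } {
  // Collect every node, including modules that only appear as a dependency.
  const nodes = new Set<string>();
  for (const [mod, imports] of deps) {
    nodes.add(mod);
    for (const i of imports) nodes.add(i);
  }

  // remaining: number of not-yet-emitted dependencies per module.
  const remaining = new Map<string, number>();
  const dependents = new Map<string, string[]>();
  for (const n of nodes) {
    remaining.set(n, 0);
    dependents.set(n, []);
  }
  for (const [mod, imports] of deps) {
    for (const i of new Set(imports)) { // ignore duplicate imports of the same module
      remaining.set(mod, remaining.get(mod)! + 1);
      dependents.get(i)!.push(mod);
    }
  }

  // Kahn's algorithm: repeatedly emit modules whose dependencies are all emitted.
  const queue = [...nodes].filter((n) => remaining.get(n) === 0);
  const order: string[] = [];
  while (queue.length > 0) {
    const n = queue.shift()!;
    order.push(n);
    for (const d of dependents.get(n)!) {
      const left = remaining.get(d)! - 1;
      remaining.set(d, left);
      if (left === 0) queue.push(d);
    }
  }

  // Anything never emitted sits on a cycle or depends on one.
  const placed = new Set(order);
  const cyclic = [...nodes].filter((n) => !placed.has(n));
  return { order, cyclic };
}
```

Dependencies come first in `order`, so a module is listed only after everything it imports. A non-empty `cyclic` list signals that a cycle exists but does not isolate its members, which is where a strongly-connected-components pass is useful.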
Index Build Pipeline
projectIndexBuild(options) executes these steps:
Cross-File Resolution Steps
- Export map -- build Map<filePath, Map<exportedName, SymbolId>> from all ExportsRelation entries
- Re-export propagation -- follow sourceModule chains iteratively until stable (handles export * from, export { foo } from, export * as ns from)
- ImportBinding resolution -- match each ImportBindingRelation to its source export via module resolution
- Reference update -- update ReferencesRelation.resolvedSymbolId for references that resolved to import binding symbols
- Namespace member resolution -- resolve dotted references like utils.alpha against the namespace's module export map
- TypeRelation resolution -- update TypeRelation.resolvedTargetId from local import binding to actual exported symbol
- ImportsRelation resolution -- set resolvedModulePath on side-effect and dynamic imports for module graph edges
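The first two steps (export map, then re-export propagation to a fixed point) can be sketched as below. The relation shape is a simplified assumption modeled on the ExportsRelation fields listed in this document: `filePath` is an added hypothetical field, `sourceModule` is assumed to be already resolved to a file path, and `export * from` is assumed to be encoded as `exportedName: "*"`. The `export * as ns from` form is not covered.

```typescript
type SymbolId = string;

// Hypothetical, simplified relation shape (see assumptions above).
interface ExportsRel {
  filePath: string;       // module that contains the export statement
  exportedName: string;   // name visible to importers; "*" stands for `export * from`
  symbolId?: SymbolId;    // present for local exports
  sourceModule?: string;  // present for re-exports
  sourceName?: string;    // name in the source module, if renamed
}

type ExportMap = Map<string, Map<string, SymbolId>>;

function buildExportMap(rels: ExportsRel[]): ExportMap {
  const map: ExportMap = new Map();
  const bucket = (file: string): Map<string, SymbolId> => {
    let m = map.get(file);
    if (!m) {
      m = new Map();
      map.set(file, m);
    }
    return m;
  };

  // Step 1: record local exports.
  for (const r of rels) {
    if (!r.sourceModule && r.symbolId) bucket(r.filePath).set(r.exportedName, r.symbolId);
  }

  // Step 2: propagate re-exports until a full pass adds nothing new.
  let changed = true;
  while (changed) {
    changed = false;
    for (const r of rels) {
      if (!r.sourceModule) continue;
      const source = map.get(r.sourceModule);
      if (!source) continue; // source not populated yet; a later pass may fill it
      const target = bucket(r.filePath);
      if (r.exportedName === "*") {
        for (const [name, id] of source) {
          // `export * from` does not forward the default export.
          if (name !== "default" && !target.has(name)) {
            target.set(name, id);
            changed = true;
          }
        }
      } else {
        const id = source.get(r.sourceName ?? r.exportedName);
        if (id !== undefined && !target.has(r.exportedName)) {
          target.set(r.exportedName, id);
          changed = true;
        }
      }
    }
  }
  return map;
}
```

Because entries are only ever added, the loop terminates even when modules re-export from each other in a cycle.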
Data Model
Core Records
SymbolKind: module, namespace, class, interface, type, function, method, variable, const, field, parameter, enum, enumMember
ScopeKind: file, module, type, function, block, class, namespace
SymbolFlags (bitset): Exported, Async, Generator, Static, Abstract, Readonly, Optional, Private, Protected, Public
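As an illustration of how a bitset like SymbolFlags is typically combined and tested, here is a small sketch. The bit positions and the `hasFlag` helper are assumptions for the example; the actual values in @codepol/core may differ.

```typescript
// Hypothetical bit assignments, one bit per flag.
const SymbolFlags = {
  None: 0,
  Exported: 1 << 0,
  Async: 1 << 1,
  Generator: 1 << 2,
  Static: 1 << 3,
  Abstract: 1 << 4,
  Readonly: 1 << 5,
  Optional: 1 << 6,
  Private: 1 << 7,
  Protected: 1 << 8,
  Public: 1 << 9,
} as const;

// True when every bit of `flag` is set in `flags`.
function hasFlag(flags: number, flag: number): boolean {
  return (flags & flag) === flag;
}

// An exported async function carries both bits.
const exportedAsync = SymbolFlags.Exported | SymbolFlags.Async;
const isAsync = hasFlag(exportedAsync, SymbolFlags.Async);   // true
const isStatic = hasFlag(exportedAsync, SymbolFlags.Static); // false
```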
Relations
Relations are append-only facts extracted by adapters and refined during cross-file resolution.
| Relation | Purpose | Key Fields |
|---|---|---|
| DefinesRelation | Scope declares a symbol | scopeId, symbolId |
| ContainsRelation | Scope contains child scope | scopeId, childScopeId |
| ReferencesRelation | Identifier refers to a symbol | name, byteRange, resolvedSymbolId? |
| CallsRelation | Call expression in a scope | calleeName, byteRange, resolvedSymbolId? |
| ImportsRelation | Scope imports from module specifier | spec, resolvedModulePath? |
| ImportBindingRelation | Links imported name to source module | localSymbolId, importedName, moduleSpec, resolvedExportId?, isDefault, isNamespace |
| ExportsRelation | Symbol exported from module | symbolId, exportedName, isDefault, sourceModule?, sourceName? |
| TypeRelation | Extends/implements hierarchy edge | symbolId, targetName, relationKind, resolvedTargetId? |
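Fields marked with `?` are filled in only when resolution succeeds, so consumers need to tolerate their absence. The sketch below walks TypeRelation edges to collect transitive subtypes and skips unresolved edges. The field names come from the table above; the `"extends" | "implements"` string values and the `subtypesOf` helper are assumptions, not part of the ProjectIndex API.

```typescript
type SymbolId = string;

interface TypeRelation {
  symbolId: SymbolId;
  targetName: string;
  relationKind: "extends" | "implements"; // assumed string encoding
  resolvedTargetId?: SymbolId;
}

// Collect every symbol that directly or transitively extends/implements `root`.
function subtypesOf(root: SymbolId, relations: TypeRelation[]): Set<SymbolId> {
  // Index resolved edges by target: parent -> direct children.
  const children = new Map<SymbolId, SymbolId[]>();
  for (const r of relations) {
    if (r.resolvedTargetId === undefined) continue; // unresolved edge: skip, not an error
    const list = children.get(r.resolvedTargetId) ?? [];
    list.push(r.symbolId);
    children.set(r.resolvedTargetId, list);
  }

  // Depth-first walk; `seen` also guards against malformed cyclic hierarchies.
  const seen = new Set<SymbolId>();
  const stack: SymbolId[] = [root];
  while (stack.length > 0) {
    const current = stack.pop()!;
    for (const child of children.get(current) ?? []) {
      if (!seen.has(child)) {
        seen.add(child);
        stack.push(child);
      }
    }
  }
  return seen;
}
```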
Control Flow Graph
Each function/method scope gets a FlowGraph with:
- FlowNode kinds: entry, exit, statement, branch, merge, loop, return, throw
- FlowEdge labels: true, false, loop-back, unconditional, break, continue, case, default, exception, finally
- Cyclomatic complexity: V(G) = E - N + 2, where E is the number of edges and N the number of nodes in the function's flow graph
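To make the formula concrete, the sketch below computes V(G) for a single if/else. The node kinds match the list above; the `id`, `from`, `to`, and `label` field names are hypothetical and may not match the actual FlowGraph types.

```typescript
type FlowNodeKind =
  | "entry" | "exit" | "statement" | "branch"
  | "merge" | "loop" | "return" | "throw";

// Hypothetical field names for illustration.
interface FlowNode { id: number; kind: FlowNodeKind; }
interface FlowEdge { from: number; to: number; label: string; }
interface FlowGraph { nodes: FlowNode[]; edges: FlowEdge[]; }

// V(G) = E - N + 2 for a single connected flow graph.
function cyclomaticComplexity(g: FlowGraph): number {
  return g.edges.length - g.nodes.length + 2;
}

// if (cond) { a(); } else { b(); }
const ifElse: FlowGraph = {
  nodes: [
    { id: 0, kind: "entry" },
    { id: 1, kind: "branch" },
    { id: 2, kind: "statement" }, // a()
    { id: 3, kind: "statement" }, // b()
    { id: 4, kind: "merge" },
    { id: 5, kind: "exit" },
  ],
  edges: [
    { from: 0, to: 1, label: "unconditional" },
    { from: 1, to: 2, label: "true" },
    { from: 1, to: 3, label: "false" },
    { from: 2, to: 4, label: "unconditional" },
    { from: 3, to: 4, label: "unconditional" },
    { from: 4, to: 5, label: "unconditional" },
  ],
};

const complexity = cyclomaticComplexity(ifElse); // 6 edges - 6 nodes + 2 = 2
```

A result of 2 matches the two independent paths through the function, one per branch.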
Adapter Architecture
Language adapters are the bridge between tree-sitter parse trees and the language-agnostic data model.
Each adapter provides:
- QueryPack -- tree-sitter S-expression patterns with named captures
- Kind mappings -- map capture suffixes / node types to canonical SymbolKind and ScopeKind
- Capture names -- standard convention (@scope, @name, @decl.*, @ref.*, @callee.*, etc.)
- Reference filter -- post-filter function to remove declaration sites, property keys, etc.
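The pieces an adapter provides can be sketched roughly as follows. The query text uses node and field names from the tree-sitter JavaScript/TypeScript grammars; everything else (`symbolsQuery`, `kindForCapture`, `filterReferences`, the `Capture` shape) is a hypothetical illustration rather than the real IndexAdapter interface.

```typescript
type SymbolKind = "function" | "variable";

// Query pack entry: S-expression patterns with named captures.
const symbolsQuery = `
(function_declaration name: (identifier) @name) @decl.function
(variable_declarator name: (identifier) @name) @decl.variable
`;

// Kind mapping: the suffix after "decl." selects the canonical SymbolKind.
const kindBySuffix = new Map<string, SymbolKind>([
  ["function", "function"],
  ["variable", "variable"],
]);

// Capture names are assumed to arrive without the leading "@".
function kindForCapture(captureName: string): SymbolKind | undefined {
  const prefix = "decl.";
  if (!captureName.startsWith(prefix)) return undefined;
  return kindBySuffix.get(captureName.slice(prefix.length));
}

// Hypothetical capture shape: capture name plus start offset of the matched node.
interface Capture { name: string; startByte: number; }

// Reference filter: drop identifier captures that sit at a declaration-name position.
function filterReferences(refs: Capture[], declNames: Capture[]): Capture[] {
  const declStarts = new Set(declNames.map((d) => d.startByte));
  return refs.filter((r) => !declStarts.has(r.startByte));
}
```

Keeping the mapping keyed on the capture suffix means a new declaration form only needs a new query pattern and one map entry, not new extraction code.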
See Creating Language Adapters for a step-by-step guide.
Built-in Language Support
| Language | Adapter | Query Packs | Type Relations |
|---|---|---|---|
| TypeScript (.ts, .mts, .cts) | typescriptConfigCreate | scopes, symbols, refs, calls, imports, exports, typeRelations | Yes |
| TSX (.tsx) | typescriptConfigCreate | same as TypeScript | Yes |
| JavaScript (.js, .mjs, .cjs, .jsx) | Uses TS/TSX adapter | same as TypeScript | Yes |
| Python (.py, .pyw) | pythonConfigCreate | scopes, symbols, refs, calls, imports, exports | No |
Known Limitations
These are intentional design constraints, not bugs:
- No AST exposure -- the index contains semantic primitives, not syntax nodes. Plugins never see tree-sitter trees.
- Best-effort resolution -- unresolved references are valid results (returned with resolvedSymbolId: undefined).
- No type inference -- tree-sitter alone cannot do type analysis. TypeOf relations are not supported.
- Heuristic call detection -- may miss indirect calls (callbacks, dynamic dispatch) and may report false positives.
- Single-threaded indexing -- files are indexed sequentially. Indexing could be parallelized per file in the future.
- In-memory only -- no disk persistence. Large projects re-index on every run.
Related Documentation
- ProjectIndex API Reference -- full API documentation for all query methods
- Creating Language Adapters -- guide for adding new language support
- Cross-File Analysis Rules -- examples of plugin rules using the semantic index
- Creating Custom Plugins -- general plugin authoring guide