PDG Query Language (Cypher)
The Program Dependence Graph (PDG) Query Language uses Cypher, a declarative graph query language, to query dependencies, perform slicing, and verify security policies on program dependence graphs.
Overview
Cypher provides a declarative way to:
Query program dependencies (data and control)
Perform forward and backward slicing
Check information flow properties
Verify security policies
Find shortest paths between program elements
Analyze parameter dependencies
The language supports:
Pattern matching (nodes and relationships)
WHERE clauses for filtering
Path queries with variable-length patterns
Set operations (UNION, intersection via WHERE EXISTS)
Aggregations and ordering
Getting Started
Building the PDG
First, compile your program and build the PDG:
# Compile to LLVM IR
clang -emit-llvm -g -c program.c -o program.bc
# Run PDG query tool
./build/bin/pdg-query program.bc
Interactive Mode
Use -i flag for interactive queries:
./build/bin/pdg-query -i program.bc
PDG> MATCH (n:FUNC_ENTRY) WHERE n.name = 'main' RETURN n
[Results displayed]
PDG> MATCH (input:FUNC_ENTRY)-[:PARAMETER_OUT]->(ret:INST_RET)-[*]->(n)
PDG> WHERE input.name = 'getInput'
PDG> RETURN n
[Slice displayed]
PDG> exit
Single Query Mode
Use -q flag for single query:
./build/bin/pdg-query -q "MATCH (n:FUNC_ENTRY) WHERE n.name = 'main' RETURN n" program.bc
Batch Query Mode
Use -f flag to run queries from file:
./build/bin/pdg-query -f queries.txt program.bc
Language Reference
Basic Node Queries
Get All Nodes
Returns all nodes in the PDG.
Syntax: MATCH (n) RETURN n
Example:
MATCH (n) RETURN n
Returns: All nodes in the PDG.
Query Nodes by Type
Select nodes by their label (type).
Syntax: MATCH (n:LABEL) RETURN n
Node Types (Labels):
INST_FUNCALL- Function call instructionsINST_RET- Return instructionsINST_BR- Branch instructionsINST_LOAD- Load instructionsINST_STORE- Store instructionsINST_ALLOCA- Alloca instructionsINST_GEP- GetElementPtr instructionsINST_OTHER- Other instructionsFUNC_ENTRY- Function entry pointsPARAM_FORMALIN- Formal input parametersPARAM_FORMALOUT- Formal output parameters (return values)PARAM_ACTUALIN- Actual input parametersPARAM_ACTUALOUT- Actual output parametersGLOBAL- Global variablesFUNC- Function nodesCLASS- Class nodes
Example:
MATCH (n:INST_FUNCALL) RETURN n
MATCH (n:INST_BR) RETURN n
Function-Specific Queries
Return Values of a Function
Returns the return values of a function.
Syntax: MATCH (n:FUNC_ENTRY)-[:PARAMETER_OUT]->(ret:INST_RET) WHERE n.name = 'functionName' RETURN ret
Example:
MATCH (n:FUNC_ENTRY)-[:PARAMETER_OUT]->(ret:INST_RET)
WHERE n.name = 'main'
RETURN ret
MATCH (n:FUNC_ENTRY)-[:PARAMETER_OUT]->(ret:INST_RET)
WHERE n.name = 'getInput'
RETURN ret
Returns: Nodes representing return values of the specified function.
Formal Parameters of a Function
Returns the formal parameters of a function.
Syntax: MATCH (n:FUNC_ENTRY)-[:PARAMETER_IN]->(param:PARAM_FORMALIN) WHERE n.name = 'functionName' RETURN param
Example:
MATCH (n:FUNC_ENTRY)-[:PARAMETER_IN]->(param:PARAM_FORMALIN)
WHERE n.name = 'process_data'
RETURN param
Returns: Nodes representing formal input parameters.
Function Entry Points
Returns the entry points of a function.
Syntax: MATCH (n:FUNC_ENTRY) WHERE n.name = 'functionName' RETURN n
Example:
MATCH (n:FUNC_ENTRY)
WHERE n.name = 'main'
RETURN n
Returns: Function entry nodes.
Edge Queries
Query Edges by Type
Select edges by their relationship type.
Syntax: MATCH ()-[r:TYPE]->() RETURN r
Edge Types (Relationship Types):
DATA_DEF_USE- Data definition-use edgesDATA_RAW- Read-after-write edgesDATA_READ- Read edgesDATA_WRITE- Write edgesDATA_ALIAS- Alias edgesCONTROL_DEP- Control dependency edgesCONTROLDEP_BR- Control dependency from branchesCONTROLDEP_ENTRY- Control dependency from entryPARAMETER_IN- Parameter input edgesPARAMETER_OUT- Parameter output edgesPARAMETER_FIELD- Field parameter edgesCALL_DEP- Call dependency edgesGLOBAL_DEP- Global dependency edges
Example:
MATCH ()-[r:DATA_DEF_USE]->() RETURN r
MATCH ()-[r:CONTROLDEP_BR]->() RETURN r
Connected Nodes
Get nodes connected by edges.
Syntax: MATCH (a)-[r]->(b) RETURN a, b
Example:
MATCH (a)-[r:DATA_DEF_USE]->(b) RETURN a, b
MATCH (a)-[r:CONTROLDEP_BR]->(b) RETURN a, b
Slicing Operations
Forward Slice
Compute forward slice from given nodes (all nodes reachable from source).
Syntax: MATCH (source)-[*]->(n) RETURN DISTINCT n
Example:
MATCH (input:FUNC_ENTRY)-[:PARAMETER_OUT]->(inputRet:INST_RET)
WHERE input.name = 'getInput'
MATCH (inputRet)-[*]->(n)
RETURN DISTINCT n
Returns: All nodes reachable from the input nodes.
Use Case: Find what program elements are affected by a source.
Backward Slice
Compute backward slice from given nodes (all nodes that can reach the sink).
Syntax: MATCH (n)-[*]->(sink) RETURN DISTINCT n
Example:
MATCH (output:FUNC_ENTRY)-[:PARAMETER_IN]->(outputParam:PARAM_FORMALIN)
WHERE output.name = 'printOutput'
MATCH (n)-[*]->(outputParam)
RETURN DISTINCT n
Returns: All nodes that can reach the input nodes.
Use Case: Find what program elements influence a sink.
Path Between Nodes
Find nodes on paths between two sets of nodes.
Syntax: MATCH path = (start)-[*]->(end) RETURN path
Example:
MATCH (input:FUNC_ENTRY)-[:PARAMETER_OUT]->(inputRet:INST_RET)
WHERE input.name = 'getInput'
MATCH (output:FUNC_ENTRY)-[:PARAMETER_IN]->(outputParam:PARAM_FORMALIN)
WHERE output.name = 'system'
MATCH path = (inputRet)-[*]->(outputParam)
RETURN path
Returns: Paths from sources to sinks.
Use Case: Information flow analysis.
Shortest Path
Find shortest path between two sets of nodes.
Syntax: MATCH path = shortestPath((start)-[*]->(end)) RETURN path
Example:
MATCH (malloc:FUNC_ENTRY)-[:PARAMETER_OUT]->(mallocRet:INST_RET)
WHERE malloc.name = 'malloc'
MATCH (free:FUNC_ENTRY)-[:PARAMETER_IN]->(freeParam:PARAM_FORMALIN)
WHERE free.name = 'free'
MATCH path = shortestPath((mallocRet)-[*]->(freeParam))
RETURN path
Returns: Shortest path between nodes.
Set Operations
Union
Union of two sets using UNION keyword.
Syntax: MATCH ... RETURN ... UNION MATCH ... RETURN ...
Example:
MATCH (func1:FUNC_ENTRY)-[:PARAMETER_OUT]->(ret1:INST_RET)
WHERE func1.name = 'func1'
RETURN ret1
UNION
MATCH (func2:FUNC_ENTRY)-[:PARAMETER_OUT]->(ret2:INST_RET)
WHERE func2.name = 'func2'
RETURN ret2
Returns: All nodes in either set.
Intersection
Intersection using WHERE EXISTS subquery.
Syntax: MATCH (n) WHERE EXISTS { MATCH (m) WHERE ... } RETURN n
Example:
MATCH (dataNodes:INST_FUNCALL)
WHERE EXISTS {
MATCH (controlNodes:INST_BR)
WHERE dataNodes.id = controlNodes.id
}
RETURN dataNodes
Returns: Nodes in both sets.
Use Case: Find nodes that are both data and control dependent.
Difference
Set difference using WHERE NOT.
Syntax: MATCH (n) WHERE NOT (condition) RETURN n
Example:
MATCH (calls:INST_FUNCALL)
WHERE NOT (calls:INST_BR)
RETURN calls
Returns: Nodes in first set but not matching condition.
Filtering with WHERE
Property Filtering
Filter nodes by properties.
Syntax: MATCH (n) WHERE n.property = value RETURN n
Example:
MATCH (n:FUNC_ENTRY)
WHERE n.name = 'main'
RETURN n
MATCH (n)
WHERE n.line > 100 AND n.line < 200
RETURN n
Multiple Conditions
Combine conditions with AND, OR, NOT.
Syntax: MATCH (n) WHERE condition1 AND condition2 RETURN n
Example:
MATCH (n:INST_FUNCALL)
WHERE n.name = 'malloc' OR n.name = 'calloc'
RETURN n
Security Analysis Examples
Information Flow Analysis
Check if secret data flows to network:
MATCH (secret:FUNC_ENTRY)-[:PARAMETER_OUT]->(secretRet:INST_RET)
WHERE secret.name = 'getPassword'
MATCH (network:FUNC_ENTRY)-[:PARAMETER_IN]->(networkParam:PARAM_FORMALIN)
WHERE network.name = 'sendToNetwork'
MATCH path = (secretRet)-[*]->(networkParam)
RETURN path
If result is non-empty, there’s a potential information leak.
Sanitization Verification
Verify all user inputs are sanitized before use in SQL:
MATCH (sources:FUNC_ENTRY)-[:PARAMETER_OUT]->(sourceRet:INST_RET)
WHERE sources.name = 'getUserInput'
MATCH (sanitizers:FUNC_ENTRY)-[:PARAMETER_OUT]->(sanitizerRet:INST_RET)
WHERE sanitizers.name = 'sanitizeSQL'
MATCH (sinks:FUNC_ENTRY)-[:PARAMETER_IN]->(sinkParam:PARAM_FORMALIN)
WHERE sinks.name = 'executeSQL'
MATCH path = (sourceRet)-[*]->(sinkParam)
WHERE NOT EXISTS {
MATCH (sourceRet)-[*]->(sanitizerRet)-[*]->(sinkParam)
}
RETURN path
If result is empty, all flows are sanitized.
Access Control Policy
Verify sensitive file operations require authorization:
MATCH (auth:FUNC_ENTRY)-[:PARAMETER_OUT]->(authRet:INST_RET)
WHERE auth.name = 'checkPermission'
MATCH (authRet)-[:CONTROLDEP_BR]->(check)
MATCH (fileOps:FUNC_ENTRY)-[:PARAMETER_IN]->(fileParam:PARAM_FORMALIN)
WHERE fileOps.name = 'openFile' OR fileOps.name = 'writeFile'
WHERE NOT EXISTS {
MATCH (check)-[:CONTROLDEP_BR]->(fileParam)
}
RETURN fileParam
Null Pointer Analysis
Find potential null dereferences:
MATCH (nullRet:INST_RET)
WHERE nullRet.function = 'malloc' OR nullRet.function = 'fopen'
MATCH (nullRet)-[*]->(deref)
WHERE deref:INST_LOAD OR deref:INST_STORE
RETURN deref
Control Flow Analysis
Find which conditionals affect a computation:
MATCH (branches:INST_BR)
MATCH (computation:FUNC_ENTRY)-[:PARAMETER_IN]->(compParam:PARAM_FORMALIN)
WHERE computation.name = 'compute'
MATCH (branches)-[*]->(compParam)
RETURN branches
Parameter Dependency
Find dependencies between function parameters:
MATCH (inputs:FUNC_ENTRY)-[:PARAMETER_IN]->(inputParam:PARAM_FORMALIN)
WHERE inputs.name = 'processData'
MATCH (outputs:FUNC_ENTRY)-[:PARAMETER_OUT]->(outputRet:INST_RET)
WHERE outputs.name = 'processData'
MATCH path = (inputParam)-[*]->(outputRet)
RETURN path
Taint Analysis
Complete taint analysis from sources to sinks:
MATCH (sources:FUNC_ENTRY)-[:PARAMETER_OUT]->(sourceRet:INST_RET)
WHERE sources.name = 'read' OR sources.name = 'recv' OR sources.name = 'scanf'
MATCH (sinks:FUNC_ENTRY)-[:PARAMETER_IN]->(sinkParam:PARAM_FORMALIN)
WHERE sinks.name = 'system' OR sinks.name = 'exec' OR sinks.name = 'popen'
MATCH path = (sourceRet)-[*]->(sinkParam)
RETURN path
Query File Format
Batch query files support:
Comments (
#prefix)Multiple queries (one per line or separated by
;)Multi-line queries
Example query file (security_policy.txt):
# Security Policy Verification
# Check for direct flows from sources to sinks
MATCH (sources:FUNC_ENTRY)-[:PARAMETER_OUT]->(sourceRet:INST_RET)
WHERE sources.name = 'read' OR sources.name = 'recv'
MATCH (sinks:FUNC_ENTRY)-[:PARAMETER_IN]->(sinkParam:PARAM_FORMALIN)
WHERE sinks.name = 'system' OR sinks.name = 'exec'
MATCH path = (sourceRet)-[*]->(sinkParam)
RETURN path
# Check sensitive operations are authorized
MATCH (auth:FUNC_ENTRY)-[:PARAMETER_OUT]->(authRet:INST_RET)
WHERE auth.name = 'checkAuth'
MATCH (authRet)-[:CONTROLDEP_BR]->(check)
MATCH (sensitiveOps:FUNC_ENTRY)-[:PARAMETER_IN]->(opParam:PARAM_FORMALIN)
WHERE sensitiveOps.name = 'deleteFile' OR sensitiveOps.name = 'changePassword'
WHERE NOT EXISTS {
MATCH (check)-[:CONTROLDEP_BR]->(opParam)
}
RETURN opParam
Run with:
./build/bin/pdg-query -f security_policy.txt program.bc
Best Practices
Use Descriptive Variable Names:
# Good MATCH (userInput:FUNC_ENTRY)-[:PARAMETER_OUT]->(inputRet:INST_RET) WHERE userInput.name = 'scanf' # Bad MATCH (x:FUNC_ENTRY)-[:PARAMETER_OUT]->(y:INST_RET) WHERE x.name = 'scanf'
Break Complex Queries:
Use multiple MATCH clauses for clarity:
MATCH (sources:FUNC_ENTRY)-[:PARAMETER_OUT]->(sourceRet:INST_RET) WHERE sources.name = 'getInput' MATCH (sanitizers:FUNC_ENTRY)-[:PARAMETER_OUT]->(sanitizerRet:INST_RET) WHERE sanitizers.name = 'sanitize' MATCH (sinks:FUNC_ENTRY)-[:PARAMETER_IN]->(sinkParam:PARAM_FORMALIN) WHERE sinks.name = 'output' MATCH path = (sourceRet)-[*]->(sinkParam) WHERE NOT EXISTS { MATCH (sourceRet)-[*]->(sanitizerRet)-[*]->(sinkParam) } RETURN path
Comment Your Policies:
# Policy: User input must be sanitized before database queries MATCH (input:FUNC_ENTRY)-[:PARAMETER_OUT]->(inputRet:INST_RET) WHERE input.name = 'getUserInput' MATCH (sanitize:FUNC_ENTRY)-[:PARAMETER_OUT]->(sanitizerRet:INST_RET) WHERE sanitize.name = 'sanitizeSQL' MATCH (dbQuery:FUNC_ENTRY)-[:PARAMETER_IN]->(queryParam:PARAM_FORMALIN) WHERE dbQuery.name = 'executeQuery' MATCH path = (inputRet)-[*]->(queryParam) WHERE NOT EXISTS { MATCH (inputRet)-[*]->(sanitizerRet)-[*]->(queryParam) } RETURN path
Test Incrementally:
Start with simple queries and build up:
# Step 1: Verify sources exist MATCH (n:FUNC_ENTRY) WHERE n.name = 'getInput' RETURN n # Step 2: Verify sinks exist MATCH (n:FUNC_ENTRY) WHERE n.name = 'system' RETURN n # Step 3: Check flows MATCH (input:FUNC_ENTRY)-[:PARAMETER_OUT]->(inputRet:INST_RET) WHERE input.name = 'getInput' MATCH (output:FUNC_ENTRY)-[:PARAMETER_IN]->(outputParam:PARAM_FORMALIN) WHERE output.name = 'system' MATCH path = (inputRet)-[*]->(outputParam) RETURN path
Use DISTINCT for Slices:
MATCH (start)-[*]->(n) RETURN DISTINCT n
Limitations
Imprecision: PDG construction uses alias analysis which may be imprecise, leading to spurious dependencies.
Scalability: Large programs with complex dependencies may result in very large PDGs.
Context Sensitivity: PDG is context-insensitive by default. Multiple calls to the same function are merged.
Array Handling: Array dependencies may be approximated.
Pointer Analysis: Depends on underlying pointer analysis precision.
Performance Tips
Limit Path Length: Use bounded variable-length patterns when possible:
MATCH (start)-[*1..5]->(end) RETURN path
Use Specific Labels: Always specify node labels when possible:
# Good MATCH (n:INST_FUNCALL) RETURN n # Less efficient MATCH (n) WHERE n:INST_FUNCALL RETURN n
Filter Early: Apply WHERE clauses as early as possible:
MATCH (n:FUNC_ENTRY) WHERE n.name = 'main' MATCH (n)-[:PARAMETER_OUT]->(ret:INST_RET) RETURN ret
Troubleshooting
Query Returns Empty Set
Check function names are correct (case-sensitive)
Verify PDG was built successfully
Try simpler queries first (
MATCH (n:FUNC_ENTRY) WHERE n.name = 'main' RETURN n)Use
-vverbose flag
Query Too Slow
Reduce scope of query
Use more specific node/edge selection
Limit path length with
[*1..10]Analyze smaller program subset
Syntax Errors
Check quotes around string values:
WHERE n.name = 'main'Ensure proper Cypher syntax
Balance parentheses and brackets
Check relationship types are correct
See Also
ir/pdg - PDG construction details
Tutorials and Examples - PDG usage examples
API Reference - Programmatic PDG access
tools/ir/examples/- In-repo example query cookbookPDG Query Example Cookbook - Rendered documentation for the example queries