Tutorials and Examples

This section provides hands-on tutorials for using Lotus to analyze real programs. For optimization passes that can reshape IR before analysis, see Optimization and the SWPrefetching (Software Prefetching) guide.

Tutorial 1: Basic Alias Analysis

Let’s analyze a simple C program to understand pointer relationships.

Example Program

Create example1.c:

#include <stdlib.h>

int *global_ptr;

void update_value(int *p, int value) {
    *p = value;
}

int main() {
    int x = 10;
    int y = 20;
    int *ptr1 = &x;
    int *ptr2 = &y;

    update_value(ptr1, 30);

    int *heap = (int *)malloc(sizeof(int));
    *heap = 40;

    global_ptr = heap;

    free(heap);
    return 0;
}

Compile to LLVM IR

clang -emit-llvm -c -g example1.c -o example1.bc
clang -emit-llvm -S -g example1.c -o example1.ll

Run Different Alias Analyses

1. SparrowAA (CI mode) Analysis (Fast):

./build/bin/sparrow-aa example1.bc

This performs context-insensitive, inclusion-based pointer analysis. Good for quick scans.

Note: sparrow-aa also has context-sensitive variants.

2. AserPTA with 1-CFA (Balanced):

./build/bin/aser-aa -analysis-mode=1-cfa -solver=wave -dump-stats example1.bc

This adds call-site sensitivity for better precision while maintaining good performance.

3. DyckAA (Precise):

./build/bin/dyck-aa -print-alias-set-info example1.bc

Produces the most precise alias sets using Dyck-CFL reachability.

Understanding Output

The analyses will report:

  • Points-to sets: What each pointer may point to

  • Alias sets: Which pointers may alias each other

  • Call graph: Which functions may be called through function pointers

  • Statistics: Number of pointers, objects, constraints solved

Tutorial 2: Detecting Integer Overflow

Let’s find integer overflow bugs using Kint.

Example Program

Create overflow.c:

#include <stdio.h>
#include <limits.h>

int add_values(int a, int b) {
    return a + b;  // Potential overflow
}

int multiply_values(int a, int b) {
    return a * b;  // Potential overflow
}

int safe_add(int a, int b) {
    if (a > 0 && b > INT_MAX - a) {
        return INT_MAX;  // Saturate
    }
    if (a < 0 && b < INT_MIN - a) {
        return INT_MIN;  // Saturate
    }
    return a + b;  // Safe
}

int main() {
    int x = INT_MAX - 10;
    int y = 20;

    int result1 = add_values(x, y);  // Bug: overflow
    int result2 = multiply_values(1000, 1000000);  // Bug: overflow
    int result3 = safe_add(x, y);  // Safe

    printf("Results: %d, %d, %d\n", result1, result2, result3);
    return 0;
}

Compile and Analyze

clang -emit-llvm -S -g overflow.c -o overflow.ll
./build/bin/lotus-kint -check-int-overflow overflow.ll

Expected Output

The tool will report:

[Integer Overflow] Function: add_values
Line 5: return a + b;
Reason: Operands a and b may cause signed integer overflow

[Integer Overflow] Function: multiply_values
Line 9: return a * b;
Reason: Operands a and b may cause signed integer overflow

Note that safe_add is not reported because the checks prevent overflow.

Advanced: All Checks

./build/bin/lotus-kint -check-all overflow.ll

This enables all checkers: integer overflow, division by zero, bad shift, array bounds, and dead branches.

Tutorial 3: Null Pointer Detection

Let’s detect null pointer dereferences.

Example Program

Create nullpointer.c:

#include <stdlib.h>
#include <string.h>

typedef struct {
    int id;
    char *name;
} Person;

Person *create_person(int id, const char *name) {
    if (id < 0) {
        return NULL;  // Error case
    }

    Person *p = (Person *)malloc(sizeof(Person));
    if (p) {
        p->id = id;
        p->name = strdup(name);
    }
    return p;
}

void print_person(Person *p) {
    printf("Person %d: %s\n", p->id, p->name);  // Bug if p is NULL
}

int main() {
    Person *p1 = create_person(1, "Alice");
    print_person(p1);  // Safe if malloc succeeded

    Person *p2 = create_person(-1, "Bob");
    print_person(p2);  // Bug: p2 is NULL

    Person *p3 = create_person(2, "Charlie");
    if (p3) {
        print_person(p3);  // Safe: null check
    }

    return 0;
}

Compile and Analyze

clang -emit-llvm -c -g nullpointer.c -o nullpointer.bc
./build/bin/lotus-pulse -v nullpointer.bc

Expected Output

The analysis will report:

[Null Pointer Dereference]
Function: print_person
Line 24: printf("Person %d: %s\n", p->id, p->name);
Path: create_person returns NULL → print_person dereferences p

Call site in main (line 32) passes potentially NULL pointer

Tutorial 4: Taint Analysis

Let’s track information flow from untrusted sources to sensitive sinks.

Example Program

Create taint.c:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

void execute_command(const char *cmd) {
    system(cmd);  // Dangerous sink
}

char *sanitize(const char *input) {
    // Simplified sanitization
    char *output = strdup(input);
    for (int i = 0; output[i]; i++) {
        if (output[i] == ';' || output[i] == '|') {
            output[i] = '_';
        }
    }
    return output;
}

int main() {
    char user_input[256];
    printf("Enter command: ");
    scanf("%255s", user_input);  // Tainted source

    // Bug: direct flow from source to sink
    execute_command(user_input);

    // Safe: sanitized before use
    char *safe_cmd = sanitize(user_input);
    execute_command(safe_cmd);
    free(safe_cmd);

    return 0;
}

Compile and Analyze

clang -emit-llvm -c -g taint.c -o taint.bc
./build/bin/lotus-taint -verbose taint.bc

Custom Sources and Sinks

./build/bin/lotus-taint \
    -sources="scanf,gets,read" \
    -sinks="system,exec,popen" \
    taint.bc

Expected Output

[Taint Flow Detected]
Source: scanf (line 23)
Sink: system (line 6) via execute_command
Path: user_input → execute_command(user_input) → system(cmd)

Warning: Unsanitized user input flows to dangerous system call

Tutorial 5: Abstract Interpretation with CLAM

Let’s use abstract interpretation to verify program properties.

Example Program

Create verify.c:

#include <assert.h>

int abs(int x) {
    if (x < 0) {
        return -x;
    }
    return x;
}

int bounded_add(int a, int b) {
    // Pre-condition: -100 <= a, b <= 100
    int result = a + b;
    // Post-condition: -200 <= result <= 200
    assert(result >= -200 && result <= 200);
    return result;
}

int div_safe(int a, int b) {
    if (b == 0) {
        return 0;
    }
    return a / b;
}

int main() {
    int x = abs(-10);
    assert(x >= 0);  // Should be proven

    int y = bounded_add(50, 60);
    // Assertion in bounded_add should hold

    int z = div_safe(100, 5);
    // No division by zero

    return 0;
}

Compile and Preprocess

clang -emit-llvm -c -g verify.c -o verify.bc
./build/bin/clam-pp --crab-lower-unsigned-icmp --crab-lower-select \
    verify.bc -o verify.prep.bc

Run Analysis with Different Domains

1. Interval Domain (Fast):

./build/bin/clam --crab-dom=int --crab-check=assert \
    --crab-print-invariants=true verify.prep.bc

2. Zones Domain (Relational):

./build/bin/clam --crab-dom=zones --crab-check=assert \
    --crab-print-invariants=true verify.prep.bc

3. Polyhedra (Most Precise):

./build/bin/clam --crab-dom=pk --crab-check=assert \
    --crab-print-invariants=true verify.prep.bc

Expected Output

Function: abs
Entry: x ∈ [-∞, +∞]
After branch (x < 0): x ∈ [-∞, -1]
Return: retval ∈ [0, +∞]

Function: bounded_add
Entry: a ∈ [-100, 100], b ∈ [-100, 100]
After add: result ∈ [-200, 200]
Assertion PROVEN: result >= -200 && result <= 200

Function: div_safe
Entry: a ∈ [-∞, +∞], b ∈ [-∞, +∞]
After check: b ∈ [-∞, -1] ∪ [1, +∞] (b ≠ 0)
Division is SAFE: no division by zero

Export to JSON

./build/bin/clam --crab-dom=zones --crab-check=assert \
    -ojson=verify_results.json verify.prep.bc

Tutorial 6: Program Dependence Graph Queries

Let’s use the PDG query language to analyze dependencies.

Example Program

Create security.c:

#include <stdio.h>
#include <string.h>
#include <stdbool.h>

bool authenticate(const char *password) {
    return strcmp(password, "secret123") == 0;
}

char *get_secret_data() {
    return "Confidential Information";
}

void log_message(const char *msg) {
    printf("LOG: %s\n", msg);
}

void send_to_network(const char *data) {
    printf("NETWORK: %s\n", data);
}

int main() {
    char password[100];
    printf("Enter password: ");
    scanf("%99s", password);

    if (authenticate(password)) {
        char *secret = get_secret_data();
        log_message("Access granted");
        // Secret should not flow to network without auth
        send_to_network(secret);
    } else {
        log_message("Access denied");
    }

    return 0;
}

Build PDG and Query

clang -emit-llvm -c -g security.c -o security.bc
./build/bin/pdg-query -i security.bc

Interactive Queries

In the interactive mode, try these Cypher queries:

1. View entire PDG:

> MATCH (n) RETURN n

2. Find returns of get_secret_data:

> MATCH (func:FUNC_ENTRY)-[:PARAMETER_OUT]->(ret:INST_RET)
> WHERE func.name = 'get_secret_data'
> RETURN ret

3. Find parameters of send_to_network:

> MATCH (func:FUNC_ENTRY)-[:PARAMETER_IN]->(param:PARAM_FORMALIN)
> WHERE func.name = 'send_to_network'
> RETURN param

4. Check information flow:

> MATCH (secret:FUNC_ENTRY)-[:PARAMETER_OUT]->(secretRet:INST_RET)
> WHERE secret.name = 'get_secret_data'
> MATCH (network:FUNC_ENTRY)-[:PARAMETER_IN]->(networkParam:PARAM_FORMALIN)
> WHERE network.name = 'send_to_network'
> MATCH path = (secretRet)-[*]->(networkParam)
> RETURN path

5. Security Policy Check:

> MATCH (secret:FUNC_ENTRY)-[:PARAMETER_OUT]->(secretRet:INST_RET)
> WHERE secret.name = 'get_secret_data'
> MATCH (network:FUNC_ENTRY)-[:PARAMETER_IN]->(networkParam:PARAM_FORMALIN)
> WHERE network.name = 'send_to_network'
> MATCH path = (secretRet)-[*]->(networkParam)
> RETURN path

Batch Query File

Create security_policy.txt:

# Check if secret data flows to network
MATCH (secret:FUNC_ENTRY)-[:PARAMETER_OUT]->(secretRet:INST_RET)
WHERE secret.name = 'get_secret_data'
MATCH (network:FUNC_ENTRY)-[:PARAMETER_IN]->(networkParam:PARAM_FORMALIN)
WHERE network.name = 'send_to_network'
MATCH path = (secretRet)-[*]->(networkParam)
RETURN path

# Check if authentication guards sensitive operations
MATCH (auth:FUNC_ENTRY)-[:PARAMETER_OUT]->(authRet:INST_RET)
WHERE auth.name = 'authenticate'
MATCH (authRet)-[:CONTROLDEP_BR]->(check)
MATCH (sensitiveOps:FUNC_ENTRY)-[:PARAMETER_IN]->(opParam:PARAM_FORMALIN)
WHERE sensitiveOps.name = 'send_to_network'
WHERE EXISTS {
  MATCH (check)-[:CONTROLDEP_BR]->(opParam)
}
RETURN opParam

Run batch queries:

./build/bin/pdg-query -f security_policy.txt security.bc

Tutorial 7: Dynamic Validation with DynAA

Let’s validate static analysis results against runtime behavior.

Example Program

Create dynamic.c:

#include <stdio.h>

void process(int *p, int *q) {
    *p = 10;
    *q = 20;
}

int main() {
    int x, y;
    process(&x, &y);
    printf("%d %d\n", x, y);
    return 0;
}

Instrument and Run

# 1. Compile to bitcode
clang -emit-llvm -c -g dynamic.c -o dynamic.bc

# 2. Instrument for dynamic analysis
./build/bin/dynaa-instrument dynamic.bc -o dynamic.inst.bc

# 3. Compile instrumented code
clang dynamic.inst.bc build/libRuntime.a -o dynamic.inst

# 4. Run and collect traces
mkdir logs
LOG_DIR=logs/ ./dynamic.inst

# 5. Check static analysis against traces
./build/bin/dynaa-check dynamic.bc logs/pts.log basic-aa

View Collected Data

./build/bin/dynaa-log-dump logs/pts.log

Expected Output

Dynamic Analysis Report:

Total pointer operations: 142
Unique pointers: 8
Unique objects: 6

Validation against basic-aa:
- True positives: 45
- False positives: 2
- False negatives: 1

False Positive: p and q reported as may-alias, but never alias at runtime

Tutorial 8: Interprocedural Analysis

Let’s analyze a multi-file program.

Example Programs

Create utils.c:

#include <stdlib.h>
#include "utils.h"

Buffer *create_buffer(size_t size) {
    Buffer *buf = (Buffer *)malloc(sizeof(Buffer));
    if (buf) {
        buf->data = (char *)malloc(size);
        buf->size = size;
        buf->used = 0;
    }
    return buf;
}

void destroy_buffer(Buffer *buf) {
    if (buf) {
        free(buf->data);
        free(buf);
    }
}

int append_data(Buffer *buf, const char *data, size_t len) {
    if (!buf || !buf->data) return -1;
    if (buf->used + len > buf->size) return -1;

    memcpy(buf->data + buf->used, data, len);
    buf->used += len;
    return 0;
}

Create utils.h:

#ifndef UTILS_H
#define UTILS_H

#include <stddef.h>

typedef struct {
    char *data;
    size_t size;
    size_t used;
} Buffer;

Buffer *create_buffer(size_t size);
void destroy_buffer(Buffer *buf);
int append_data(Buffer *buf, const char *data, size_t len);

#endif

Create main.c:

#include <stdio.h>
#include "utils.h"

int main() {
    Buffer *buf = create_buffer(1024);

    if (buf) {
        append_data(buf, "Hello", 5);
        append_data(buf, " World", 6);

        // Use buffer data
        printf("%.*s\n", (int)buf->used, buf->data);

        destroy_buffer(buf);
    }

    return 0;
}

Run Interprocedural Analysis

# Interprocedural alias analysis
./build/bin/aser-aa -analysis-mode=1-cfa -dump-stats program.bc

# Interprocedural abstract interpretation
./build/bin/clam --crab-inter --crab-dom=zones \
    --crab-check=null --crab-check=bounds program.bc

The interprocedural analysis will track the buffer through function calls and verify that:

  • create_buffer allocates memory correctly

  • append_data checks bounds before writing

  • destroy_buffer frees allocated memory

  • No null pointer dereferences occur

Tutorial 9: Concurrency Analysis

Let’s analyze a multi-threaded program.

Example Program

Create concurrent.c:

#include <pthread.h>
#include <stdio.h>

int shared_counter = 0;
pthread_mutex_t lock;

void *increment_thread(void *arg) {
    for (int i = 0; i < 1000; i++) {
        pthread_mutex_lock(&lock);
        shared_counter++;  // Protected
        pthread_mutex_unlock(&lock);
    }
    return NULL;
}

void *buggy_thread(void *arg) {
    for (int i = 0; i < 1000; i++) {
        shared_counter++;  // Bug: race condition
    }
    return NULL;
}

int main() {
    pthread_t t1, t2;
    pthread_mutex_init(&lock, NULL);

    pthread_create(&t1, NULL, increment_thread, NULL);
    pthread_create(&t2, NULL, buggy_thread, NULL);

    pthread_join(t1, NULL);
    pthread_join(t2, NULL);

    printf("Counter: %d\n", shared_counter);

    pthread_mutex_destroy(&lock);
    return 0;
}

Compile and Analyze

clang -emit-llvm -c -g concurrent.c -o concurrent.bc
./build/bin/lotus-concur -verbose concurrent.bc

Expected Output

[Data Race Detected]
Variable: shared_counter
Thread 1: increment_thread (line 10) - PROTECTED by lock
Thread 2: buggy_thread (line 17) - UNPROTECTED

Race condition: shared_counter++ in buggy_thread not protected by mutex

Best Practices

  1. Start Simple: Begin with fast analyses to get quick feedback

  2. Incremental Precision: If fast analyses report issues, use more precise analyses to reduce false positives

  3. Combine Analyses: Use multiple analyses together (e.g., alias analysis + taint analysis)

  4. Preprocess When Needed: Use clam-pp or other preprocessors to normalize code

  5. Verify Results: Use dynamic validation (DynAA) to confirm static analysis findings

  6. Use Configuration Files: Create YAML configs for complex analysis workflows

  7. Visualize Results: Generate DOT files to visualize call graphs, PDGs, and alias sets

  8. Batch Processing: Use scripts to analyze multiple files or entire codebases

Next Steps