RL-based Heuristics Learning for Model Checking

Introduction

Model checking is a formal verification technique that exhaustively explores the state space of a system to verify whether it satisfies a given property. However, in practical verification scenarios—such as debugging cycles where errors are fixed and re-verified, or regression testing after system modifications—running full verification from scratch each time is highly inefficient.

While directly reusing previously explored state spaces is challenging due to structural changes in the system, abstract knowledge about “which states are likely to lead to errors” can potentially be transferred across similar systems. Our research addresses this challenge by using reinforcement learning to automatically learn search heuristics based on predicate abstraction. This approach enables the learned heuristics to generalize across different system sizes and configurations, thereby accelerating repeated verification tasks.

In this work, we focus on falsification of safety properties, particularly deadlock detection in concurrent systems.


Approach

Predicate Abstraction for Feature Extraction

Predicate abstraction maps concrete states to fixed-dimensional Boolean vectors based on the satisfaction of predefined predicates. Given n predicates, each concrete state is abstracted to an n-dimensional Boolean vector:

\[\alpha(s) := \langle p_1(s), p_2(s), \ldots, p_n(s) \rangle\]

This abstraction provides two key benefits:

Heuristic Learning with Reinforcement Learning

We train a Q-function Q(s, a) on the abstracted state space to estimate the likelihood of reaching an error state. The reward function assigns 1 for error states and 0 otherwise. The learned Q-values indicate how promising each state-action pair is for finding violations.

We employ two learning approaches:

The learned Q-function is integrated into a best-first search framework. The state value function is defined as:

\[\hat{V}(s) = \max_a \hat{Q}(\alpha(s), a)\]

States with higher V-values are prioritized during exploration, directing the search toward potential error states.


Experiments

Benchmark: Dining Philosophers

The Dining Philosophers problem is a classic concurrency benchmark where N philosophers sit around a circular table with forks between adjacent philosophers. Each philosopher cycles through states: think → hungry → wait → eat. The verification goal is to detect deadlock states where all philosophers hold one fork and wait indefinitely for the other.

Predicate Design

We define 13 predicates based on the system’s transition rules:

This design ensures that instances of any size N are represented as 13-dimensional Boolean vectors.

Experimental Setup
Parameter Value
Training instance N = 3
Test instances N = 4, 5, 6, 7, 8, 9, 10
Training episodes 500
Number of predicates 13
Q-table hyperparameters lr = 0.7, γ = 0.95
DQN hyperparameters lr = 0.02, γ = 0.95, hidden = 64

Baselines:

Results: Search Efficiency

Table. Number of states explored before finding deadlock.

N BFS Random Q-table DQN
4 341 185 70 58
5 1,457 701 132 79
6 6,189 2,713 316 152
7 26,333 9,498 884 308
8 112,309 30,244 2,765 519
9 479,765 61,448 7,174 892
10 1,853,449 93,459 16,783 1,939

Key findings (N=10):

Results: Q-table Hit Ratio
N Hit Ratio
4 38%
7 4%
10 0.2%

The hit ratio—the proportion of encountered abstract states that were seen during training—drops dramatically as N increases. This explains Q-table’s limited generalization: unvisited abstract states have Q-value 0, providing no guidance.

DQN overcomes this limitation through neural network generalization, predicting reasonable Q-values for unseen states based on learned patterns.

Results: Search Time

Table. Time to find deadlock (milliseconds).

N BFS Random Q-table DQN
4 5 ms 6 ms 7 ms 66 ms
7 147 ms 207 ms 111 ms 235 ms
10 560,908 ms 28,191 ms 11,016 ms 691 ms

For small N, DQN is slower due to neural network inference overhead. However, for N ≥ 8, the reduction in explored states outweighs the inference cost. At N=10, DQN completes in under 1 second while Random requires ~28 seconds.


Publications


Contact


Last modified: 2026/02/25 (Hyeyoon Kang)