
Model checking is a formal verification technique that exhaustively explores the state space of a system to verify whether it satisfies a given property. However, in practical verification scenarios—such as debugging cycles where errors are fixed and re-verified, or regression testing after system modifications—running full verification from scratch each time is highly inefficient.
While directly reusing previously explored state spaces is challenging due to structural changes in the system, abstract knowledge about “which states are likely to lead to errors” can potentially be transferred across similar systems. Our research addresses this challenge by using reinforcement learning to automatically learn search heuristics based on predicate abstraction. This approach enables the learned heuristics to generalize across different system sizes and configurations, thereby accelerating repeated verification tasks.
In this work, we focus on falsification of safety properties, particularly deadlock detection in concurrent systems.
Predicate abstraction maps concrete states to fixed-dimensional Boolean vectors based on the satisfaction of predefined predicates. Given n predicates, each concrete state is abstracted to an n-dimensional Boolean vector:
\[\alpha(s) := \langle p_1(s), p_2(s), \ldots, p_n(s) \rangle\]This abstraction provides two key benefits:
We train a Q-function Q(s, a) on the abstracted state space to estimate the likelihood of reaching an error state. The reward function assigns 1 for error states and 0 otherwise. The learned Q-values indicate how promising each state-action pair is for finding violations.
We employ two learning approaches:
The learned Q-function is integrated into a best-first search framework. The state value function is defined as:
\[\hat{V}(s) = \max_a \hat{Q}(\alpha(s), a)\]States with higher V-values are prioritized during exploration, directing the search toward potential error states.
The Dining Philosophers problem is a classic concurrency benchmark where N philosophers sit around a circular table with forks between adjacent philosophers. Each philosopher cycles through states: think → hungry → wait → eat. The verification goal is to detect deadlock states where all philosophers hold one fork and wait indefinitely for the other.
We define 13 predicates based on the system’s transition rules:
This design ensures that instances of any size N are represented as 13-dimensional Boolean vectors.
| Parameter | Value |
|---|---|
| Training instance | N = 3 |
| Test instances | N = 4, 5, 6, 7, 8, 9, 10 |
| Training episodes | 500 |
| Number of predicates | 13 |
| Q-table hyperparameters | lr = 0.7, γ = 0.95 |
| DQN hyperparameters | lr = 0.02, γ = 0.95, hidden = 64 |
Baselines:
Table. Number of states explored before finding deadlock.
| N | BFS | Random | Q-table | DQN |
|---|---|---|---|---|
| 4 | 341 | 185 | 70 | 58 |
| 5 | 1,457 | 701 | 132 | 79 |
| 6 | 6,189 | 2,713 | 316 | 152 |
| 7 | 26,333 | 9,498 | 884 | 308 |
| 8 | 112,309 | 30,244 | 2,765 | 519 |
| 9 | 479,765 | 61,448 | 7,174 | 892 |
| 10 | 1,853,449 | 93,459 | 16,783 | 1,939 |
Key findings (N=10):
| N | Hit Ratio |
|---|---|
| 4 | 38% |
| 7 | 4% |
| 10 | 0.2% |
The hit ratio—the proportion of encountered abstract states that were seen during training—drops dramatically as N increases. This explains Q-table’s limited generalization: unvisited abstract states have Q-value 0, providing no guidance.
DQN overcomes this limitation through neural network generalization, predicting reasonable Q-values for unseen states based on learned patterns.
Table. Time to find deadlock (milliseconds).
| N | BFS | Random | Q-table | DQN |
|---|---|---|---|---|
| 4 | 5 ms | 6 ms | 7 ms | 66 ms |
| 7 | 147 ms | 207 ms | 111 ms | 235 ms |
| 10 | 560,908 ms | 28,191 ms | 11,016 ms | 691 ms |
For small N, DQN is slower due to neural network inference overhead. However, for N ≥ 8, the reduction in explored states outweighs the inference cost. At N=10, DQN completes in under 1 second while Random requires ~28 seconds.
Last modified: 2026/02/25 (Hyeyoon Kang)