A Faster Alternative to Jq

An introduction to jsongrep, a high-performance JSON search tool written in Rust that uses Deterministic Finite Automata (DFA) to achieve superior speed compared to jq.
This article is both an introduction to a tool I have been working on called jsongrep, as well as a technical explanation of the internal search engine it uses. I also discuss the benchmarking strategy used to compare the performance of jsongrep against other JSON path-like query tools and implementations.
In this post I'll first show you the tool, then explain why it's fast (conceptually), then how it's fast (the automata theory), and finally prove it (benchmarks).
Upfront I would like to say that this article is heavily inspired by Andrew Gallant's amazing ripgrep tool, and his associated blog post "ripgrep is faster than {grep, ag, git grep, ucg, pt, sift}".
Table of Contents
You can install jsongrep from crates.io:
cargo install jsongrep
Like ripgrep, jsongrep is cross-platform and written in Rust.
A Whirlwind Tour of jsongrep
jsongrep (jg binary) takes a query and a JSON input and prints every value whose path through the document matches the query. Let's build up the query language piece by piece using this sample document:
sample.json:
{
"name": "Micah",
"favorite_drinks": ["coffee", "Dr. Pepper", "Monster Energy"],
"roommates": [
{
"name": "Alice",
"favorite_food": "pizza"
}
]
}
Dot paths select nested fields by name. Dots (.) between field names denote concatenation-- "match this field, then that field":
$ cat sample.json | jg 'roommates[0].name'
roommates.[0].name:
"Alice"
Wildcards match any single key (*) or any array index ([*]):
$ cat sample.json | jg 'favorite_drinks[*]'
favorite_drinks.[0]:
"coffee"
favorite_drinks.[1]:
"Dr. Pepper"
favorite_drinks.[2]:
"Monster Energy"
Alternation (|) matches either branch, like regex alternation:
$ cat sample.json | jg 'name | roommates'
name:
"Micah"
roommates:
[
{
"name": "Alice",
"favorite_food": "pizza"
}
]
Recursive descent uses * and [*] inside a Kleene star to walk arbitrarily deep into the tree. For example, to find every name field at any depth:
$ cat sample.json | jg '(* | [*])*.name'
name:
"Micah"
roommates.[0].name:
"Alice"
The pattern (* | [*])* means "follow any key or any index, zero or more times", e.g., descend through every possible path. The trailing .name then filters for only those paths that end at a field called name.
Equivalently, jg exposes the -F ("fixed string") flag as a shorthand for these recursive descent queries.
Optional (?) matches zero or one occurrence:
$ cat sample.json | jg 'roommates[0].favorite_food?'
roommates.[0]:
{
"name": "Alice",
"favorite_food": "pizza"
}
roommates.[0].favorite_food:
"pizza"
The jsongrep Pitch
JSON documents are trees: objects and arrays branch, scalars are leaves, and keys and indices label the edges. Querying a JSON document is really about describing paths through this tree. jsongrep takes this observation literally: its query language is a regular language over the alphabet of keys and indices. Think regular expressions, but instead of matching characters in a string, you're matching edges in a tree.
Why does "regular" matter? Because regular languages have a well-known, powerful property: they can be compiled into a deterministic finite automaton (DFA). A DFA processes input in a single pass with $O(1)$ work per input symbol-- no backtracking, no recursion stack, no exponential blowup on pathological queries. The query is paid for once at compile time, then search is essentially free.
This is the key difference from tools like jq, jmespath, or jsonpath-rust. Those tools interpret path expressions: at each node in the JSON tree, they evaluate the query, check predicates, and recursively descend into matching branches. jsongrep does something fundamentally different-- it compiles the query into a DFA before it ever looks at the JSON, then walks the document tree exactly once, taking a single $O(1)$ state transition at each edge.
The jsongrep Anti-Pitch
jsongrep is not as ubiquitous as jq. The query language is deliberately less expressive than jq's. jsongrep is a search tool, not a transformation tool-- it finds values but doesn't compute new ones. There are no filters, no arithmetic, no string interpolation. jsongrep is new and has not been battle-tested in the wild.
jsongrep's DFA-Based Query Engine
The core of the search engine is a five-stage pipeline:
- Parse the JSON document into a tree via
serde_json_borrow(zero-copy). - Parse the user's query into a
QueryAST. - Construct an NFA from the query via Glushkov's construction algorithm.
- Determinize the NFA into a DFA via subset construction.
- Walk the JSON tree, taking DFA transitions at each edge and collecting matches.
The grammar is intentionally simple. Dots denote concatenation, | denotes alternation, postfix * denotes Kleene star, and postfix ? denotes optional. This maps directly to the definition of a regular language.
Source: Hacker News










