json-diff
A lot of tools diff json, but only json-diff’s output is easily processable by a program.
In this post, we’ll explore json-diff (a tool from my json-toolkit), how to use it and how to write programs that use its output.
Using json-diff
json-diff is an easy-to-use CLI tool. It takes exactly two arguments, FILE1 and FILE2, which are json files.
Sample usage
$ echo '{"k": "v"}' > kv.json
$ echo '{"k": "o"}' > ko.json
$ json-diff kv.json ko.json
[{"leftValue":"v","path":["k"],"rightValue":"o"}]
We can also pretty-print the output by piping it to jq.
$ json-diff kv.json ko.json | jq
[
  {
    "leftValue": "v",
    "path": [
      "k"
    ],
    "rightValue": "o"
  }
]
json-diff doesn’t have a pretty printing option built in because jq does that for us.
Explanation of the output
The json-diff output is an array of “difference objects”. Each difference object has a:
path (required) - path is an array of strings and numbers that describes where in the json the difference lies. Strings access object values and numbers access array values.
leftValue (optional) - leftValue is the value of FILE1 at the path. If leftValue is missing, there is no value in the left json at the path.
rightValue (optional) - rightValue is the value of FILE2 at the path. If rightValue is missing, there is no value in the right json at the path.
Similar information can be found in the json-diff --help message.
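To make the path rules concrete, here is a minimal python sketch. value_at is a hypothetical helper, not part of json-toolkit, and the two difference objects are written by hand in the documented shape rather than produced by json-diff. Following a difference's path through each input document should land on its leftValue and rightValue.

```python
import json

def value_at(doc, path):
    # Hypothetical helper (not part of json-toolkit): walk a json-diff style
    # path. String steps index objects; integer steps index arrays.
    current = doc
    for step in path:
        current = current[step]
    return current

left = json.loads('{"k": "v", "tests": [{"result": 1}, {"result": 2}]}')
right = json.loads('{"k": "o", "tests": [{"result": 1}, {"result": 3}]}')

# Hand-written difference objects in the documented shape.
diffs = [
    {"leftValue": "v", "path": ["k"], "rightValue": "o"},
    {"leftValue": 2, "path": ["tests", 1, "result"], "rightValue": 3},
]

for d in diffs:
    # leftValue is what FILE1 holds at path; rightValue is what FILE2 holds.
    assert value_at(left, d["path"]) == d["leftValue"]
    assert value_at(right, d["path"]) == d["rightValue"]
```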
Writing programs with json-diff
json-diff is useful for humans to see differences, but its real power is for computers. json-diff shines because it’s so easy to write programs that process its output. This is because the output:
is json - which means parsing it into a data structure takes one line
has a simple, transparent schema - which means it’s easy to think about and easy to extract data from
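As a taste of how little code extraction takes, this sketch sorts difference objects into added, removed, and changed using only which of leftValue and rightValue are present. The summarize helper and its three categories are illustrative, not part of json-diff.

```python
import json

def summarize(diffs):
    # Illustrative helper, not part of json-diff. Classify each difference
    # by which optional fields are present:
    #   only rightValue -> added in FILE2
    #   only leftValue  -> removed in FILE2
    #   both            -> changed
    counts = {"added": 0, "removed": 0, "changed": 0}
    for d in diffs:
        has_left = "leftValue" in d
        has_right = "rightValue" in d
        if has_left and has_right:
            counts["changed"] += 1
        elif has_right:
            counts["added"] += 1
        elif has_left:
            counts["removed"] += 1
    return counts

sample = json.loads(
    '[{"leftValue": "v", "path": ["k"], "rightValue": "o"},'
    ' {"path": ["new"], "rightValue": 1},'
    ' {"leftValue": [1, 2], "path": ["old"]}]'
)
print(summarize(sample))  # {'added': 1, 'removed': 1, 'changed': 1}
```

Checking for the key with `in` rather than `d.get(...)` matters: a leftValue of json null is still a value, so it should count as present.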
jq
Since the output is json, we can often use jq to process the output of json-diff. This comes up a lot in Wicked Fast Testing when filtering out expected failures. For example, in Wicked Fast Testing, tests that failed with an exception store the exception in an object with string key "e". As such, if we want to filter out differences involving errors, we can write a quick jq script:
$ json-diff expected.json actual.json \
| jq 'map(select(.path[-1] != "e"))'
What’s even more powerful is that the filtered output is still valid json-diff output, so you can chain many of these filters together. Let’s say we know the first test changes and we’ve manually validated those changes are good; then we can add one line of jq to hide any changes in the first test.
$ json-diff expected.json actual.json \
| jq 'map(select(.path[-1] != "e"))' \
| jq 'map(select(.path[0] != 0))'
Although we could combine these into a single jq expression, two jq commands are more readable and easier to copy and paste. Here’s a real example that I used recently:
#!/usr/bin/env bash
jq 'map(select(.path | contains(["result", "backtest_d"]) | not))' | \
jq 'map(select(.path | contains(["result", "parameters"]) | not))' | \
jq 'map(select(.path[0] != 33))' | \
jq 'map(select(.path[0] != 51))' | \
jq
I use a no-op jq on the last line so I can easily copy and paste any real jq line to create a new filter. These programs don’t have to be perfect; they’re quick and dirty throwaway programs.
python
Since these are just json filters, we can also write them in python, or really any language. Here’s the equivalent of filtering out errors and the first test, written in python:
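#!/usr/bin/env python3
import json
import sys

def t(data):
    # Keep differences outside the first test (path[0] != 0) that don't end
    # at an error key "e". Slicing, unlike indexing, won't raise on an empty
    # path, matching jq, where .path[0] on [] is null rather than an error.
    return [d for d in data if d["path"][:1] != [0] and d["path"][-1:] != ["e"]]

print(json.dumps(t(json.load(sys.stdin))))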
Most of this is boilerplate; focus on the list comprehension inside t, which does the heavy lifting. It looks rather similar to the jq because the core logic is exactly the same.
If this program is saved as an executable file called “filter-diffs”, we can invoke it with:
$ json-diff expected.json actual.json | ./filter-diffs | jq
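Because every filter takes a list of difference objects and returns a list of the same shape, python filters compose the same way the jq pipeline does. This sketch splits the two conditions from filter-diffs into separate predicates and applies them in turn; the names not_error, not_first_test, FILTERS and apply_filters are hypothetical, and the sample differences are made up.

```python
import json

# Each predicate returns True for differences we want to keep.
# Hypothetical names; they mirror the two jq filters shown earlier.
def not_error(d):
    # Drop differences whose last path element is the error key "e".
    return d["path"][-1:] != ["e"]

def not_first_test(d):
    # Drop differences inside the first test (path starting with 0).
    return d["path"][:1] != [0]

FILTERS = [not_error, not_first_test]

def apply_filters(diffs, filters):
    # Each pass returns the same shape it receives, so filters can be
    # appended or reordered freely, just like lines in the jq pipeline.
    for keep in filters:
        diffs = [d for d in diffs if keep(d)]
    return diffs

diffs = [
    {"leftValue": 1, "path": [0, "result"], "rightValue": 2},
    {"path": [1, "e"], "rightValue": "boom"},
    {"leftValue": 3, "path": [1, "result"], "rightValue": 4},
]
print(json.dumps(apply_filters(diffs, FILTERS)))
# [{"leftValue": 3, "path": [1, "result"], "rightValue": 4}]
```

As with the jq version, adding a filter is one more entry in the list.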
Why use json-diff?
Currently, the standard technique for comparing json, or any structured data files, is to use a textual diff (commonly git diff). The fact that textual diffs work at all is a testament to the power of plain text, but they are still limited. Textual diffs suffer from two limitations.
Textual diffs do not contain full json paths. A textual diff gives you a few lines of context to understand a change. This works well for code, but for json, where finding the path of a value can require reading many more lines, it doesn’t.
Textual diffs are easy for humans to read, but hard for computers to read. Some languages, like python, have diff parsers, but they’re in beta.
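To see the first limitation in practice, this sketch uses python’s standard difflib to produce a unified diff (the format git diff also uses) of two pretty-printed json documents. The config data is made up. The hunk shows the new port, but with the default three lines of context it never mentions that the port belongs to the "beta" server under "servers". Per the schema described earlier, json-diff would report the same change at path ["servers", 1, "port"].

```python
import difflib
import json

def make_config(beta_port):
    # Made-up config: two servers; only the second one's port varies.
    return {
        "servers": [
            {"host": "alpha", "a": 1, "b": 2, "c": 3, "d": 4, "port": 80},
            {"host": "beta", "a": 1, "b": 2, "c": 3, "d": 4, "port": beta_port},
        ]
    }

old_lines = json.dumps(make_config(80), indent=2).splitlines()
new_lines = json.dumps(make_config(8080), indent=2).splitlines()

# A standard unified diff with the default 3 lines of context.
hunk = list(difflib.unified_diff(old_lines, new_lines, "old.json", "new.json", lineterm=""))
print("\n".join(hunk))

# The changed line is visible, but nothing in the hunk says which server
# (or even which top-level key) it belongs to; that context is too far up.
assert any(line.startswith("+") and '"port": 8080' in line for line in hunk)
assert not any('"servers"' in line or '"beta"' in line for line in hunk)
```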
json-diff overcomes both of these limitations.
Each difference contains a field “path” which contains the fully qualified path of the value which differs between the two files.
The output of json-diff is itself json. This makes writing a program to process the diff a breeze, which, as we saw in “Writing programs with json-diff”, is incredibly valuable.
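Because path is a plain array, turning it into a readable, fully qualified location takes only a few lines. This sketch renders a path as a jq-style string for display; path_to_jq is a hypothetical helper, and it only uses the .key form for simple identifier-like keys, quoting everything else.

```python
import json
import re

# Keys that can be written as .key; anything else gets quoted.
IDENT = re.compile(r"[A-Za-z_][A-Za-z0-9_]*\Z")

def path_to_jq(path):
    # Hypothetical helper: render a json-diff path as a jq-style string.
    out = ""
    for step in path:
        if isinstance(step, int):
            piece = f"[{step}]"               # numbers index arrays
        elif IDENT.match(step):
            piece = f".{step}"                # simple object keys
        else:
            piece = f"[{json.dumps(step)}]"   # quoted keys for anything else
        if not out and piece.startswith("["):
            piece = "." + piece               # a leading index needs a dot
        out += piece
    return out or "."

print(path_to_jq(["servers", 1, "port"]))  # .servers[1].port
```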
What json might we diff?
Data is everywhere in codebases; here are some examples.
Not json!
json-diff, despite the name, isn’t just about diffing “json” in particular. It’s about diffing structured data which can be in any format. CSV. XML. TOML. YAML. The trick is to convert the file into json using a tool like csv-to-json. Even binary files can be compared with json-diff, as long as you have a program which converts them to json.
This sounds hard, but it’s really easy. For example, to compare yaml files, just do:
cat old.yaml | yaml-to-json > old.json
cat new.yaml | yaml-to-json > new.json
json-diff old.json new.json
# or as one line
json-diff <(yaml-to-json < old.yaml) <(yaml-to-json < new.yaml)
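The converters themselves are simple: read one format, print json. As an illustration, here is a minimal stdlib-only CSV converter. It is a hypothetical stand-in, not json-toolkit’s csv-to-json, and its output shape (one object per row, keyed by the header row) is an assumption.

```python
#!/usr/bin/env python3
import csv
import io
import json
import sys

def csv_to_json(text):
    # Assumed output shape: one object per data row, keyed by the header
    # row, so json-diff paths look like [row_index, "column"].
    # All values stay strings, since CSV carries no types.
    return list(csv.DictReader(io.StringIO(text)))

if __name__ == "__main__":
    # Read CSV on stdin, print json on stdout, like the other converters.
    print(json.dumps(csv_to_json(sys.stdin.read())))
```

Saved as csv_sketch.py (a made-up name), it drops into the same pattern: json-diff <(python3 csv_sketch.py < old.csv) <(python3 csv_sketch.py < new.csv). With this shape, a changed cell shows up at a path like [2, "price"].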
Config files
Many codebases have config files which are structured data. They might be yaml or json; it doesn’t matter. If it’s yaml, convert it to json, then use json-diff.
Config-as-code
Config-as-code is taking DevOps by storm, for good reason. It deserves a post of its own because it’s such a great concept, but for our purposes the important part is that config-as-code means system configurations are often stored as structured data. Structured data can be rendered as json, which we can diff.
Wicked Fast Testing
Wicked Fast Testing is centered around tests as data which happen to be json. In particular, test failures can be seen by diffing the expected test results (expected.json) against the actual test results (actual.json).
Installing json-diff
Installing json-diff only takes a second. Just clone the json-toolkit and install it:
git clone https://github.com/tyleradams/json-toolkit
cd json-toolkit
sudo make install
If you have any trouble installing or using json-diff, just message me on twitter (@canardivore) and I’ll be happy to help.
Conclusion
We’ve seen how to use json-diff. We’ve shown how easy it is to process the output to fit our own needs whether it be in jq, python, or our favorite language. We’ve seen why we might use it and how to easily install it.
Next week, we’ll continue our series on the json-toolkit with one of my favorite tools, json-sql. Imagine being able to dump your entire db to json in a single command. Pretty neat. If you don’t want to miss out, just hit Subscribe now below.