Recently, while working on a migration from a Drupal 7 site to a Drupal 8 site, the command line tool `jq` came in quite handy. The migration source was a JSON api containing the field values of nodes in the Drupal 7 site. I was looking for a quick way to get some basic information about the source data. In some cases I needed to verify the format of the data, whether it was an array, what the key was, or how many items there were. Using `jq` made these tasks super easy. This post will demonstrate some tasks that can be done with `jq` that I recently used on the migration project.
If you are not already familiar, “`jq` is a lightweight and flexible command-line JSON processor”. Filters are the foundation of `jq`, they take an input and produce an output. They can be combined and/or piped to each other. There are also built in operators, functions, and conditionals for more advanced use cases. `jq` can be used to change the shape of JSON data to satisfy a specific use case.
To look at some usage examples, we’ll use the following content in a JSON file `posts.json`.
[
{
"userId": 1,
"id": 1,
"title": "sunt aut facere",
"body": "quia et suscipit suscipit recusandae consequuntur"
},
{
"userId": 2,
"id": 2,
"title": "qui est esse",
"body": "est rerum tempore vitae sequi sint nihil"
}
]
`jq` pretty prints all output, so it can be used as a simple way to make viewing JSON output from `curl` nicer. Some other http tools, for instance httpie, pretty print JSON by default.
Let’s say we wanted to see how many items are in `posts.json`, `jq` has a built in `length` available. It’s clear that there are 2 objects in the array when we look at it, so this example would be more useful with larger data sets. Also notice that to read a json file, you can pass the path/filename as the argument after the filter expression. Another way to achieve the same thing is by using `cat`. The following examples are equivalent.
❯ jq 'length' posts.json
2
❯ cat posts.json | jq 'length'
2
If you wanted to get the user id of the posts, you could use the object identifier “.userId”. Before that can happen however, another step is needed since the post objects are inside of an array. The first step would be to use the array/object iterator `.[]` to get to the objects inside. Hint: it is also possible to use an index in the array/object iterator to get a specific item in the array, e.g. `.[0]`.
Using `.[]` will look like this, notice we now only see the 2 objects that are inside the array, without the surrounding `[]` brackets:
❯ jq '.[]' posts.json
{
"userId": 1,
"id": 1,
"title": "sunt aut facere",
"body": "quia et suscipit suscipit recusandae consequuntur"
}
{
"userId": 2,
"id": 2,
"title": "qui est esse",
"body": "est rerum tempore vitae sequi sint nihil"
}
After you get into the array, you can then use `.userId` like this:
❯ jq '.[] | .userId' posts.json
1
2
Now, if you wanted to see how many of the posts had a value of 1 for `userId`, you can use a built in function called `select()`. The `select()` function allows using comparisons to select only the data that you want. To do this, `jq` expects a pipe before the function call, `|`, where in the previous example the pipe was optional.
❯ jq '.[] | select(.userId == 1)' posts.json
{
"userId": 1,
"id": 1,
"title": "sunt aut facere",
"body": "quia et suscipit suscipit recusandae consequuntur"
}
Another cool thing `jq` can do is alter the structure of the JSON. Let’s assume the fictional scenario where the example content from `posts.json` needs to be imported into another system that expects a different JSON structure. `jq` can help with that. Our new system groups the title and body of the post into a content object, so we need to do some transformations. The new object uses the same overall structure and keys, but the title and body fields are now nested inside of an object with the key “content”. This happens by mapping the original key/value pairs as they were with the addition of ` “content”: {“title”: .title, “body”: .body}`. This adds the “content” key and wraps the title and body with the object constructor `{}`.
❯ jq '.[] | {"userId": .userId, "id": .id, "content": {"title": .title, "body": .body}}' posts.json
{
"userId": 1,
"id": 1,
"content": {
"title": "sunt aut facere",
"body": "quia et suscipit suscipit recusandae consequuntur"
}
}
{
"userId": 2,
"id": 2,
"content": {
"title": "qui est esse",
"body": "est rerum tempore vitae sequi sint nihil"
}
}
This overview scratches the surface of what is possible with `jq`, but luckily there are other great resources that go more in depth. Here are some that have been useful.
- The jq tutorial to jump right in:
https://stedolan.github.io/jq/tutorial/ - The `jq` manual for extensive documentation:
https://stedolan.github.io/jq/manual/ - jqplay for live experimentation and a few examples:
https://jqplay.org/ - “JSON on the command line with jq”:
https://shapeshed.com/jq-json/ - “jq is sed for JSON”
https://thoughtbot.com/blog/jq-is-sed-for-json