How to manipulate JSON using jq

May 12, 2020 by Bhavin Gandhi

JavaScript Object Notation, often referred to as JSON, is a data representation format that is human-readable and easy for machines to parse. Personally, I find it hard to comprehend huge JSON files. In this blog post, I will be talking about a tool called jq. It’s a CLI tool to parse and manipulate JSON objects and files.

How I started using jq

Over the past year, I have been writing a lot of shell scripts (mostly for bash). I came across jq while reading data from JSON files. Most of the time I had to extract some values from the output of other commands. Often the choice used to be grep, cut, or some other tool. But for large outputs, extracting values that way started to feel like a hack. Tools usually have a way to produce their output in JSON format, and that’s where I started using jq pretty heavily. REST APIs also often give you the option to get the response in JSON format. While there are libraries for other languages to parse JSON, I found jq to be a good tool when doing it in bash.

When I started using it, I felt it was a bit hard to get things done with jq. But I always wanted to parse things properly, so I decided to stick with it. Answers from Stack Overflow and the jq manual helped me get things done. While the jq manual is detailed and has examples, it might feel a bit overwhelming at first. I will cover the small tricks I learned and a few common tasks where jq and JSON parsing can help.

If you are completely new to jq, please take a look at the tutorial from the official site.

$ jq --help
jq - commandline JSON processor [version 1.6]

Usage:	jq [options] <jq filter> [file...]
…

Command line options

Let’s start with a few of the command line options which jq supports. These are just the flags I use most; there are more of them, which you can find in the Invoking jq section of the jq manual.

The `--raw-output` option

While reading existing shell scripts from a repository, I found jq’s output being piped to tr. It was something like jq … | tr -d '"'. The tr command just removes the double quotes (") from the output of jq. While reading jq’s manual, I found that there is a flag to achieve this. The flag --raw-output or -r prints the result without the surrounding double quotes if the result of the given filter is a string.
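
To see the difference, here is a small sketch using a made-up JSON object (the node_json variable and its content are only for illustration, they are not from a real cluster):

```shell
# Hypothetical sample input, used only for this illustration.
node_json='{"name": "minikube"}'

# Default output: a string result is printed as JSON, quotes included.
quoted="$(echo "${node_json}" | jq '.name')"
echo "${quoted}"    # "minikube"

# With -r, a string result is printed as raw text.
raw="$(echo "${node_json}" | jq -r '.name')"
echo "${raw}"       # minikube
```

The raw form is usually what a shell script wants when the value is going to be stored in a variable or passed to another command.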

The `--arg name value` option

From the jq manual,

This option passes a value to the jq program as a predefined variable. If you run jq with --arg foo bar, then $foo is available in the program and has the value "bar". Note that value will be treated as a string, so --arg foo 123 will bind $foo to "123".

The important thing to note here is that the value is always going to be a string. I will be covering a way to convert it to a number. This option is useful when we want to pass a value from a shell script to the jq filter.

Enclose the filter in single quotes whenever possible. It is convenient to use single quotes, as JSON data often contains strings with double quotes.
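
As a rough sketch of how --arg behaves, the snippet below passes a shell value into a single-quoted filter. The message variable and the greeting key are made up for this example; -n tells jq not to read any input and -c prints compact output.

```shell
# Hypothetical value coming from the shell; note the embedded double quotes.
message='say "hi"'

# --arg binds the shell value to $msg as a string; jq escapes it when
# producing JSON, so the filter itself can stay in single quotes.
greeting="$(jq -n -c --arg msg "${message}" '{greeting: $msg}')"
printf '%s\n' "${greeting}"    # {"greeting":"say \"hi\""}

# Even a numeric-looking value is bound as a string.
arg_type="$(jq -n -r --arg foo 123 '$foo | type')"
echo "${arg_type}"             # string
```

Passing values this way avoids splicing shell variables into the filter text, which tends to break as soon as the value contains quotes.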

Examples

Let’s see some example scenarios where I used jq. Along with each example, I share what I learned about jq.

Find Kubernetes nodes with Ready status

While writing scripts to provision a Kubernetes cluster, I wanted to check the number of nodes which are in the Ready state. This helps to make sure that all the nodes have joined the cluster and are functioning properly. We can wait till the desired number of nodes are in the Ready state.

The kubectl cheat sheet has a similar example which uses grep to find the number. kubectl supports multiple output formats for the get command, and JSON is one of them. For nodes, the output of kubectl get nodes -o json looks something like this (the output is truncated for convenience),

{
  "apiVersion": "v1",
  "kind": "List",
  "items": [
    {
      "apiVersion": "v1",
      "kind": "Node",
      "metadata": {
        "name": "minikube"
      },
      "status": {
        "conditions": [
          {
            "message": "kubelet is posting ready status",
            "reason": "KubeletReady",
            "status": "True",
            "type": "Ready"
          }
        ]
      }
    }
  ]
}

The following command gives the number of nodes which are in the ‘Ready’ state.

$ kubectl get nodes -o json \
  | jq '[
          .items[].status.conditions[]
          | select(.type == "Ready" and .status == "True")
        ]
        | length'

1

The output of the kubectl command is piped to jq.
The single-quoted argument to jq is the actual filter which we are applying.
The .items[] filter gives us all the elements of the items array. The .[] is an iterator which returns all the elements of the given array.
Then we select all the condition objects from those Node elements using .status.conditions[].
Using the select() function, we keep only the objects whose type key has the value Ready and whose status key has the value True.
These selected objects are collected into an array by enclosing the whole filter in [].
Finally, we call the length function on this array.
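
The steps above can be tried without a cluster. The following is a minimal sketch which runs the same filter on a hand-written document; the nodes_json variable and the node names node-a and node-b are invented for the example. The second filter, which uses jq’s any(generator; condition) builtin to print the names of the Ready nodes, is my own variation and not part of the original script.

```shell
# Hypothetical, heavily trimmed stand-in for `kubectl get nodes -o json`.
nodes_json='{
  "items": [
    {"metadata": {"name": "node-a"},
     "status": {"conditions": [
       {"type": "MemoryPressure", "status": "False"},
       {"type": "Ready", "status": "True"}]}},
    {"metadata": {"name": "node-b"},
     "status": {"conditions": [
       {"type": "Ready", "status": "False"}]}}
  ]
}'

# Same filter as above: collect the matching conditions, then count them.
ready_count="$(echo "${nodes_json}" | jq '[
    .items[].status.conditions[]
    | select(.type == "Ready" and .status == "True")
  ]
  | length')"
echo "${ready_count}"    # 1

# Variation: print the names of the nodes that have a Ready=True condition.
ready_names="$(echo "${nodes_json}" | jq -r '
    .items[]
    | select(any(.status.conditions[]; .type == "Ready" and .status == "True"))
    | .metadata.name')"
echo "${ready_names}"    # node-a
```

Only node-a is counted, because node-b reports Ready with status False and the MemoryPressure condition does not match the type check.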

AWS CloudFormation parameters and outputs

CloudFormation is a service by AWS which allows us to write our infrastructure on the AWS cloud as code (IaC). A set of infrastructure components is called a stack. We can write the CloudFormation (CFN) stack definition in YAML or JSON. We can also provide parameters to the stack during creation or update, which act as configuration variables.

Accessing the CloudFormation outputs

Once the stack is created, we can get all the details of the stack in JSON format. This includes all the given parameters and the output values defined in the stack definition. Let’s see how we can get an output from the stack’s details.

$ aws cloudformation describe-stacks \
    --stack-name "bhavin-yugabytedb-test-1" \
    --region "us-east-1" \
    --output json \
    | jq -r '.Stacks[0].Outputs[]
              | select(.OutputKey == "UI")
              | .OutputValue'

http://ec2-1-2-3-4.compute-1.amazonaws.com:7000

The above command selects the object from .Stacks[0].Outputs whose OutputKey is equal to UI and prints the value of OutputValue from that object. The .Stacks[0].Outputs array looks like this,

[
  {
    "OutputKey": "UI",
    "OutputValue": "http://ec2-1-2-3-4.compute-1.amazonaws.com:7000",
    "Description": "URL to access YugabyteDB admin portal"
  }
]
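
To experiment with this filter offline, here is a small sketch that applies it to a hand-written document. The stacks_json variable is a made-up, trimmed stand-in for the describe-stacks response, and the second output (JDBC and its value) is invented just to show that only the matching object is picked.

```shell
# Hypothetical, trimmed stand-in for the describe-stacks JSON response.
stacks_json='{
  "Stacks": [
    {"Outputs": [
      {"OutputKey": "JDBC", "OutputValue": "example-jdbc-value"},
      {"OutputKey": "UI",
       "OutputValue": "http://ec2-1-2-3-4.compute-1.amazonaws.com:7000"}
    ]}
  ]
}'

# Iterate over the outputs, keep the one named UI, print its value raw.
ui_url="$(echo "${stacks_json}" | jq -r '.Stacks[0].Outputs[]
    | select(.OutputKey == "UI")
    | .OutputValue')"
echo "${ui_url}"    # http://ec2-1-2-3-4.compute-1.amazonaws.com:7000
```

If no output has the requested key, select() produces nothing and the variable ends up empty, so a script may want to check for that case.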

Updating the CloudFormation parameters

To update an existing stack, i.e. to change some of its input parameters, we need to have the old parameters as well. The easiest way to get those is to run the describe-stacks command and save .Stacks[0].Parameters.

$ aws cloudformation describe-stacks \
    --stack-name "bhavin-yugabytedb-test-1" \
    --region "us-east-1" \
    --output json \
    | jq -r '.Stacks[0].Parameters' > "stack-parameters.json"

Content of the stack-parameters.json file,

[
  {
    "ParameterKey": "InstanceType",
    "ParameterValue": "c5.xlarge"
  }
]

The next task is to update the stack-parameters.json file with new values. To achieve this, I wrote a shell script which takes three parameters: the name of the key, the new value of the key, and the path to the JSON file.

#!/bin/bash

set -x -o errexit -o pipefail

key_name="$1"
value="$2"
file_path="$3"

if [[ -z "${value}" ]]; then
    echo "Not modifying '${key_name}' as the value is blank."
    exit 0
fi

jq --arg "key" "${key_name}" --arg "val" "${value}" \
  'map(
        if .ParameterKey == $key then
          .ParameterValue = $val
        else
          .
        end
  )' "${file_path}" > "${file_path}.tmp"

mv "${file_path}.tmp" "${file_path}"

The map() function applies the given filter to each element of the array.
Our filter checks if the value of ParameterKey is equal to $key. If it is, it sets the value of ParameterValue to $val. Otherwise, it keeps the object as it is.
The shell variables ${key_name} and ${value} are supplied to jq as $key and $val using the --arg command line option.
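
The effect of the map() filter can be checked on a throwaway file. This is only a sketch: the temporary file, the extra KeyName parameter, and its value are assumptions made up for the example.

```shell
# Hypothetical parameters file with two entries, written to a temp file.
params_file="$(mktemp)"
cat > "${params_file}" <<'EOF'
[
  {"ParameterKey": "InstanceType", "ParameterValue": "c5.xlarge"},
  {"ParameterKey": "KeyName", "ParameterValue": "my-key"}
]
EOF

# Same filter as in the script: change only the entry whose key matches.
updated="$(jq -c --arg key "InstanceType" --arg val "c5.2xlarge" \
  'map(
        if .ParameterKey == $key then
          .ParameterValue = $val
        else
          .
        end
  )' "${params_file}")"
echo "${updated}"

rm -f "${params_file}"
```

Here the InstanceType entry gets the new value, while the KeyName entry passes through the else branch unchanged.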

Let’s change the InstanceType and update the stack.

$ ./update_value_json.sh \
    "InstanceType" "c5.2xlarge" "stack-parameters.json"

$ aws cloudformation update-stack \
   --stack-name "bhavin-yugabytedb-test-1" \
   --region "us-east-1" \
   --use-previous-template \
   --parameters "file://stack-parameters.json"

The above solution is from Jessica Kerr’s blog post about jq.

Finding IDs from Terraform state

We can get the details of the resources created using Terraform by running the terraform show command. This command also supports JSON output. The following snippet is from a script which imports resources into the Terraform state.

interface_id="$(terraform show -json \
  | jq --arg index "${index}" -r \
      '.values.root_module.child_modules[0].resources[]
      | select(.address == "azurerm_network_interface.YugaByte-NIC" and .index == ($index|tonumber))
      | .values.id')"

We have already seen most of the functions used in this filter. Notice the boolean expression in select(), especially .index == ($index|tonumber). Values passed using --arg are always strings, but .index is a number, so $index has to be converted with tonumber before the comparison. To apply a function to a value in place, we can enclose the expression in parentheses.
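
To make the type mismatch concrete, here is a minimal sketch on a hand-written document. The state_json variable and the IDs nic-0 and nic-1 are invented; only the shape of the document follows the snippet above.

```shell
# Hypothetical, trimmed stand-in for the output of `terraform show -json`.
state_json='{"values": {"root_module": {"child_modules": [{"resources": [
  {"address": "azurerm_network_interface.YugaByte-NIC", "index": 0,
   "values": {"id": "nic-0"}},
  {"address": "azurerm_network_interface.YugaByte-NIC", "index": 1,
   "values": {"id": "nic-1"}}
]}]}}}'
index="1"

# Without tonumber: the number 1 is never equal to the string "1",
# so select() matches nothing and the result is empty.
no_match="$(echo "${state_json}" | jq -r --arg index "${index}" \
  '.values.root_module.child_modules[0].resources[]
  | select(.index == $index)
  | .values.id')"
echo "without tonumber: '${no_match}'"

# With tonumber: the string is converted first, so the comparison works.
match="$(echo "${state_json}" | jq -r --arg index "${index}" \
  '.values.root_module.child_modules[0].resources[]
  | select(.index == ($index|tonumber))
  | .values.id')"
echo "with tonumber: '${match}'"    # with tonumber: 'nic-1'
```

jq does not report an error for the first comparison, it simply finds no match, which is why this mistake is easy to miss.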

Reference: json - How do I use jq to convert number to string?

References from jq manual

Identity: .
Object Identifier-Index: .foo, .foo.bar
Array/Object Value Iterator: .[]
Pipe: |
select(boolean_expression) function
Array construction: []
length function
map(x), map_values(x) functions
A playground for jq, written in Go https://jqplay.org

