The grass is rarely greener, but it's always different

Automatically loading JSON files into ElasticSearch

Introduction

Right now at work I am working on the (Big Data Europe) project, a joint effort of several organizations and enterprises across Europe to create a platform that provides a set of software services for building big data pipelines with minimal effort, and far more cost effectively than other stacks.

The idea is to give any company or organization that wants to make sense of its data an easy way to start playing around with the Big Data technologies that make that possible.

One of the pieces I developed is mu-bde-logging, a standalone system that logs HTTP traffic from running Docker containers and posts it to an EL(K) stack in real time for further visualization.

The last requirement was to add the ability to replay old traffic backups and post them to the ElasticSearch instance, so they can be visualized offline in Kibana.

So I wrote a small script that scans a given folder for the transformed .har files (JSON format) and replays them into the ElasticSearch container. Since the ElasticSearch & Kibana containers are part of a docker-compose.yml project, I didn't care much about being generic and hardcoded the names docker-compose gives the containers, but that is easy to change and extend.
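The hardcoded container name used below (mubdelogging_elasticsearch_1) follows the usual <project>_<service>_1 pattern, so it will differ if your project directory has another name. A quick way to check which name docker-compose actually assigned, assuming a service called elasticsearch as in my setup:

# List running containers whose name contains "elasticsearch".
docker ps --filter "name=elasticsearch" --format '{{.Names}}'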

The Code

This is what I came up with:

#!/usr/bin/env bash

#/ Usage: ./backup_replay.sh [-f <backups_folder>]
#/ Description: Run ElasticSearch and Kibana standalone and post every enriched .har file in the backups folder to ElasticSearch.
#/ Examples: ./backup_replay.sh -f backups/
#/ Options:
#/     --help: Display this help message
usage() { grep '^#/' "$0" | cut -c4- ; exit 0 ; }
expr "$*" : ".*--help" > /dev/null && usage

# Default backups folder, can be overridden with -f/--folder.
BACKUP_DIR="../backups/"

# Convenience logging function.
info()    { echo "[INFO]    $@"  ; }

# Cleanup hook for the EXIT trap below; nothing to clean up for now.
cleanup() {
  true;
}

# Poll the ElasticSearch container.
# Returns 0 (keep looping) while ElasticSearch is not answering yet,
# and 1 (stop looping) once it responds with HTTP 200.
poll() {
  local elasticsearch_ip="$1"
  local result
  result=$(curl -XGET "http://${elasticsearch_ip}:9200" -I 2>/dev/null | head -n 1 | awk '{ print $2 }')

  if [[ $result == "200" ]]; then
    return 1 # ElasticSearch is up.
  else
    return 0 # The while loop keeps running as long as the return code is zero.
  fi
}

# Parse parameters.
while [ "$#" -gt 0 ]; do
  key="$1"

  case $key in
    -f|--folder)
      BACKUP_DIR="$2"
      shift # Consume the folder value as well.
      ;;
    --default)
      default=YES
      ;;
    *)
      ;;
  esac
  shift
done


if [[ "${BASH_SOURCE[0]}" = "$0" ]]; then
    trap cleanup EXIT

    # Start ElasticSearch & Kibana with Docker Compose.
    if command -v docker-compose >/dev/null 2>&1; then
      docker-compose up -d elasticsearch kibana
    else
      info "Install docker-compose!"
      exit 1
    fi

    # Poll ElasticSearch until it is up and we can post hars to it.
    elasticsearch_ip=$(docker inspect --format='{{range .NetworkSettings.Networks}}{{.IPAddress}}{{end}}' mubdelogging_elasticsearch_1)

    info "ElasticSearch container ip: ${elasticsearch_ip}"

    while poll "${elasticsearch_ip}"
    do
      info "ElasticSearch is not up yet."
      sleep 2
    done

    # Find all .trans.har files in the specified backups folder.
    # For each one, POST it to ElasticSearch.
    info "Ready to work!"
    info "POST all enriched hars"
    find "${BACKUP_DIR}" -name "*.trans.har" | sed 's/^/@/g' | xargs -I {} /bin/bash -c "sleep 0.5; curl -XPOST 'http://$elasticsearch_ip:9200/hars/har?pretty' --data-binary {}"
fi
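
Running it is then just a matter of pointing the script at the folder that holds the backups. Here is a minimal sketch of a run plus a sanity check, assuming the script is saved as backup_replay.sh next to the docker-compose.yml and that the elasticsearch service publishes port 9200 on the host (otherwise query the container IP, as the script itself does):

# Replay every *.trans.har found under backups/ into ElasticSearch.
./backup_replay.sh -f backups/

# Sanity check: count the documents that landed in the hars index.
curl -XGET 'http://localhost:9200/hars/_count?pretty'

One note: on ElasticSearch 6 or newer the POST in the script also needs an explicit -H 'Content-Type: application/json' header, since those versions enforce strict content-type checking.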

Have fun!

#bash #docker #elasticsearch #kibana