OBSOLETE: This post is about Flume-og. Unless you have a driving reason to still be using Flume-og, I recommend upgrading to flume-ng 1.4.0. The Flume development community has done a fantastic job rewriting Flume and has added many great improvements.

Lately I have been working with and evaluating Flume as a log aggregation engine for our mid-sized web cluster. Flume is relatively simple to get up and running out of the box, and even easier if you take the time to create your own RPM, which I highly recommend if you are going to be deploying to more than a handful of machines.


I think one common misconception is the need to configure your agents from one of the Flume master nodes. That approach makes the admin's job tedious when provisioning and decommissioning nodes that run as Flume agents, throws a curve ball at anyone trying to incorporate automatic node provisioning or configuration with tools such as puppet or CFEngine, and opens things up for inconsistencies (typos, for instance) when you have to manually go to the master for each server and data source you want to set up. It simply did not make sense to me that I bring up a Flume agent on a node and then have to go to the master server to configure the Flume source and sink for that agent. But wait: the Flume shell is available from any node, including the agents, and from there you can connect to any master and configure the node yourself.
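
For example, from any agent you can point the shell at a master and issue configuration commands directly. A quick illustration (the master name "nodeA" and the node, source, and sink names here are placeholders, not the actual setup used later in this post):

    >flume shell -c nodeA -e "exec config agent1 'tail(\"/var/log/messages\")' 'agentBESink'"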

This has been brought up a few times on the Flume user forum, but it has only been covered from a 10,000-foot view, or, as some would say, only as a “walking skeleton” of how to accomplish dynamic configuration of a node. In this post I will completely outline how I have accomplished dynamically configuring my agents, which gives me several benefits over manual configuration from the master nodes:

  1. It allows me to make sure the configurations across nodes are consistent, because we are able to use predefined variables from the agent side when configuring the host and source.
  2. It allows me to keep my Flume configuration clean. Configurations are removed when an agent is brought down, and added back to the master when it is started up.
  3. I can now create firewall rules preventing access to the web console, which in my opinion is a fairly large security hole to begin with.
  4. With the setup described here, I can also simply push a new configuration out to my cluster and have it added to the master configuration, eliminating the need to script out a long page of changes that would have to be applied manually on the master for each agent I had running. (See the example after this list.)
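
For example, once the startup script described later in this post is in place, pushing a new source out to the whole cluster can be as simple as a loop like the following (the "webnodes" host list is a placeholder for however you enumerate your servers):

    >for h in `cat webnodes`; do scp NewSource.flume $h:/opt/flume/conf/conf.d/ && ssh $h /opt/flume/bin/startup.sh NewSource.flume; done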

This cookbook will work in either a multi-master or single-master configuration and is specific to sending the data to HDFS; you can, however, send the data wherever you want in your collector output. For our setup we use the “value” decorator to set an attribute based on the basename of the flume script. This attribute, named “category” in our examples, is then used in the HDFS output path, allowing us to store each data source in a separate directory named after its “category”. For redundancy purposes we have chosen to deploy with a multi-master configuration. NOTE: There are drawbacks and missing features when using multiple masters. See our blog post [LINK TO BLOG HERE].
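
For reference, here is a sketch of what the collector side of this looks like, with the category attribute used as a bucketing escape in the sink path (the collector name, port, namenode host, and directory layout are placeholders for our real values):

    >flume shell -c nodeA -e "exec config collector01 'collectorSource(35853)' 'collectorSink(\"hdfs://namenode/flume/%{category}/\", \"events-\")'"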

There are many pieces that must be put in place to make this configuration work, so I am including all of the scripts and files to give you a complete picture.

  1. Create the following directory if it is not already included in your installation; this is where all flume shell scripts and environment files will live.
    1. $FLUME_CONF_DIR/conf.d/
  2. Even though the master servers are listed in the flume-conf.xml or flume-site.xml file, we list them in a separate file that is easier to parse for this purpose than the XML files. Create the following file and list each master node on a separate line.
    1. $FLUME_CONF_DIR/conf.d/masters
    2. >cat masters
      nodeA
      nodeB
  3. Now we need to create a simple script that will be sourced at the top of all our flume shell scripts. This file simply sets up some consistent variables for use in configuring the agent. For our purposes we define only two variables, but the options are endless.
    1. $FLUME_CONF_DIR/conf.d/flume-env.sh
    2. >cat flume-env.sh
      #!/bin/bash

      # Variables available to every .flume script that sources this file
      FQDN=`hostname -f`
      HOST=`hostname`
  4. Next we need to create the flume shell scripts that will be called from our custom startup and shutdown scripts. Each flume script is written to build its configuration on startup and remove it when the agent is shut down. This is written to be somewhat portable: the sample script here uses a simple “tail” source on the file defined in the FILENAME variable, so we can have as many flume scripts as we want, and on startup we will loop through them all and configure each one as a data source. (A quick way to test one of these scripts by hand is shown after the listing.)
    1. $FLUME_CONF_DIR/conf.d/SysMessages.flume
    2. >cat SysMessages.flume
      #!/bin/bash

      # Pull in the shared environment variables (HOST and FQDN)
      if [ -f /opt/flume/conf/conf.d/flume-env.sh ]; then
          . /opt/flume/conf/conf.d/flume-env.sh
      fi

      CMD=$1

      # The category is derived from this script's file name (SysMessages here)
      CATEGORY=`basename $0 .flume`
      LOGICALNODE="$HOST-$CATEGORY"
      FILENAME="/var/log/messages"

      case $CMD in
      start)
          echo "connect $MASTER"
          # Clear out any stale configuration or mapping first
          echo "exec unconfig $LOGICALNODE"
          echo "exec decommission $LOGICALNODE"
          echo "exec unmap $FQDN $LOGICALNODE"
          echo "exec decommission $FQDN"
          echo "exec purge $LOGICALNODE"
          echo "exec purge $FQDN"
          # Configure the logical node: tail the file, tag each event with our
          # category, and ship it to the collector chain
          echo "exec config $LOGICALNODE 'tail(\"$FILENAME\")' '{ value(\"category\", \"$CATEGORY\") => agentBEChain(\"sand05.test.overstock.com\", \"sand06.test.overstock.com\") }'"
          echo "exec map $FQDN $LOGICALNODE"
          echo "exec refreshAll"
          ;;
      stop)
          echo "connect $MASTER"
          echo "exec unconfig $LOGICALNODE"
          echo "exec decommission $LOGICALNODE"
          echo "exec unmap $FQDN $LOGICALNODE"
          echo "exec decommission $FQDN"
          echo "exec purge $LOGICALNODE"
          echo "exec purge $FQDN"
          ;;
      *)
          echo "USAGE: $0 start|stop"
          ;;
      esac
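
    As a quick sanity check, each .flume script only prints flume shell commands, so you can run one by hand and pipe its output into the shell yourself; this is exactly what the startup script below does. For example, with "nodeA" standing in for one of your masters:

    >export MASTER=nodeA
    >/opt/flume/conf/conf.d/SysMessages.flume start | /opt/flume/bin/flume shell -q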
      
  5. Now for the startup and shutdown scripts. These scripts live in $FLUME_HOME/bin and do several things: they can be used to load or remove all $FLUME_CONF_DIR/conf.d/*.flume configurations and start or stop the agent, or you can pass the name of a single flume script to the startup/shutdown script and it will simply add or remove that configuration on the masters. Here is a breakdown of how these scripts work.

    First we loop through all masters listed in $FLUME_CONF_DIR/conf.d/masters and make sure we can get a successful connection to a master server. If connections to all masters fail, we simply bail and don't proceed. If no flume script is passed on the command line, we then loop through all .flume scripts found in $FLUME_CONF_DIR/conf.d and add/remove each configuration. For each configuration we then check with the master node to make sure it was successfully added to the master's configuration. We don't bail on a failure here; we simply notify the user that adding/removing the configuration appears to have failed. The script then proceeds to start or stop the flume agent on the node.

    When the script is called with a flume script name, like so:

    >startup.sh SysMessages.flume
    or
    >shutdown.sh SysMessages.flume
    

    The startup/shutdown scripts will simply add or remove the configuration from the master nodes without actually starting or stopping the agent on the node. This is useful in the cases where you just want to add or remove a new configuration.

    Startup Script:

    >cat startup.sh
    #!/bin/bash
    
    FLUME_CONF_DIR="/opt/flume/conf"
    FLUME_BIN="/opt/flume/bin"
    
    # Function to check for the configuration of the source
    checkNode ()
    {
    	$FLUME_BIN/flume shell -q -c $MASTER -e getconfigs |grep $LOGICALNODE &>/dev/null
    }
    
    # Are we starting everything or just loading a new configuration
    if [[ ! -z "$1" ]]; then
    	if [[  -f $FLUME_CONF_DIR/conf.d/$1 ]]; then
    	     FLUME_SOURCE="$FLUME_CONF_DIR/conf.d/$1"
    	else
    	     echo "File: $FLUME_CONF_DIR/conf.d/$1 does not exist"
    	     exit 1
    	fi
    else
    	FLUME_SOURCE="$FLUME_CONF_DIR/conf.d/*.flume"
    fi
    
    # Make sure we have the necessary master configuration
    if [[ ! -f "$FLUME_CONF_DIR/conf.d/masters" ]]; then
    	echo "Missing masters or configuration file(s), please ensure all configuration files are present"
    	exit 1
    fi
    
    # Make sure we can actually connect to a master node
    MASTER=""
    for M in `cat $FLUME_CONF_DIR/conf.d/masters`; do
    	$FLUME_BIN/flume shell -q -c $M -e quit &>/dev/null
    	if [ $? -ne 0 ]; then
    		echo "Master $M appears to be down, checking next master"
    	else
    		echo "Found a working master"
    		export MASTER=$M
    		break
    	fi
    done
    
    # Check to make sure a working MASTER was actually found
    if [[ -n "$MASTER" ]]; then
    	echo "Master found: $MASTER"
    else
    	echo "No masters were found alive"
    	exit 1
    fi
    
    # Setup configuration on master node
    for SOURCE in $FLUME_SOURCE; do
    	su hadoop -c "$SOURCE start | $FLUME_BIN/flume shell -q &> /dev/null"
    	HOST=`hostname`
    	CAT=`basename $SOURCE .flume`
    	LOGICALNODE=$HOST-$CAT		
    
    	# Give the master a moment to apply the new configuration
    	sleep 10
    	checkNode
    	RESULT=$?

    	if [[ $RESULT -eq 0 ]]; then
    		echo "SUCCEEDED adding configuration for $LOGICALNODE"
    	elif [[ $RESULT -eq 1 ]]; then
    		echo "FAILED to add configuration for $LOGICALNODE"
    	else
    		echo "Could not determine if configuration was added for $LOGICALNODE"
    	fi
    done
    
    # Start the flume agent now if we are not just setting up a new configuration
    if [[ -z "$1" ]]; then
    	su hadoop -c "$FLUME_BIN/flume-daemon.sh start node"
    	echo "Configuration complete and node started"
    else
    	echo "Configuration complete"
    fi
    

    Shutdown Script:

    >cat shutdown.sh
    #!/bin/bash
    
    FLUME_CONF_DIR="/opt/flume/conf"
    FLUME_BIN="/opt/flume/bin"
    
    # Function to check for the configuration of the source
    function checkNode
    {
    	$FLUME_BIN/flume shell -q -c $MASTER -e getconfigs |grep $LOGICALNODE &>/dev/null
    }
    
    # Are we stopping everything or just unloading a configuration
    if [[ ! -z "$1" ]]; then
    	if [[  -f $FLUME_CONF_DIR/conf.d/$1 ]]; then
    	     FLUME_SOURCE="$FLUME_CONF_DIR/conf.d/$1"
    	else
    	     echo "File: $FLUME_CONF_DIR/conf.d/$1 does not exist"
    	     exit 1
    	fi
    else
    	FLUME_SOURCE="$FLUME_CONF_DIR/conf.d/*.flume"
    fi
    
    # Make sure we have the necessary master configuration
    if [[ ! -f "$FLUME_CONF_DIR/conf.d/masters" ]]; then
    	echo "Missing masters or configuration file(s), please ensure all configuration files are present"
    	exit 1
    fi
    
    # Make sure we can actually connect to a master node
    MASTER=""
    for M in `cat $FLUME_CONF_DIR/conf.d/masters`; do
    	$FLUME_BIN/flume shell -q -c $M -e quit &>/dev/null
    	if [ $? -ne 0 ]; then
    		echo "Master $M appears to be down, checking next master"
    	else
    		echo "Connection to master succeeded"
    		export MASTER=$M
    		break
    	fi
    done
    
    # Check to make sure a working MASTER was actually found
    if [[ -n "$MASTER" ]]; then
    	echo "Master found: $MASTER"
    else
    	echo "No masters were found alive"
    	exit 1
    fi
    
    # Setup configuration on master node
    for SOURCE in $FLUME_SOURCE; do
    	su hadoop -c "$SOURCE stop | $FLUME_BIN/flume shell -q &> /dev/null"
    	HOST=`hostname`
    	CAT=`basename $SOURCE .flume`
    	LOGICALNODE=$HOST-$CAT		
    
    	# Check configuration
    	checkNode
    	RESULT=$?
    	if [[ $RESULT -eq 0 ]]; then
    		echo "FAILED to remove configuration for $LOGICALNODE"
    	elif [[ $RESULT -eq 1 ]]; then
    		echo "SUCCEEDED removing configuration for $LOGICALNODE"
    	else
    		echo "Could not determine if configuration was removed for $LOGICALNODE"
    	fi
    done
    
    # Stop the flume agent now if we are not just removing a configuration
    if [[ -z "$1" ]]; then
    	su hadoop -c "$FLUME_BIN/flume-daemon.sh stop node"
    	echo "Unconfiguration complete and node stopped"
    else
    	echo "Unconfiguration complete"
    fi
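
    Finally, to tie this back to automated provisioning with puppet or CFEngine, the two scripts can be fronted by a plain init script so the agent is managed like any other service. A minimal sketch, assuming the /opt/flume paths used throughout this post:

    >cat flume-node
    #!/bin/bash
    # /etc/init.d/flume-node -- thin wrapper around the startup/shutdown
    # scripts so configuration management can treat the agent as a service.

    case "$1" in
    start)
    	/opt/flume/bin/startup.sh
    	;;
    stop)
    	/opt/flume/bin/shutdown.sh
    	;;
    restart)
    	/opt/flume/bin/shutdown.sh
    	/opt/flume/bin/startup.sh
    	;;
    *)
    	echo "USAGE: $0 start|stop|restart"
    	exit 1
    	;;
    esac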
    
