None of this information is new; it can all be found in the Flume cookbook. However, these are real-world working examples showing how to send a file from a Flume agent to your collector from the command line. My main use case for this is resending a file of events that was rotated while the agent was down, or sending a file that was created before you started monitoring it.

First, we should run the Flume agent with node_nowatch, which starts the node without the watchdog process. This allows us to either pipe data to Flume or read the data in through a normal Flume source.

The command-line options we should use, per the Flume cookbook:

  • -1 – one shot execution. This makes the node instance not use the heartbeating mechanism to get a config.
  • -s – starts the Flume node without starting the node’s http status web server. If the status web server is started, a Flume node’s status server will keep the process alive even if in one-shot mode. If the -s flag is specified along with one-shot mode (-1), the Flume node will exit after all logical nodes complete.
  • -n <tempNodeName> – gives the node the physical name <tempNodeName>. This name is used again in the -c definition below.
  • -c <command> – starts the node with the given configuration definition. NOTE: If not using -1, this will be invalidated upon the first heartbeat to the master. The definition has the form “<tempNodeName>: <flumeSource> | <flumeSink>;”

Here are some working examples. All examples assume you have a collector node set up with the hostname collect.domainname.com and want to send the Apache log file located at /var/log/httpd/access_log. Please pay special attention to the quotes and escaping.

This command simply execs the Linux/Unix cat command on the log file and sends the data to our collector.

 >$FLUME_HOME/bin/flume node_nowatch -1 -s -n testNode -c "testNode:exec(\"cat /var/log/httpd/access_log\") | agentBESink(\"collect.domainname.com\");"
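Since the escaping is the easiest part to get wrong, here is a small sketch showing that the same definition can be written inside single quotes with no backslashes at all. This is plain shell quoting behaviour and nothing Flume-specific; the variable names escaped and single are only for illustration.

```shell
#!/bin/sh
# The same -c definition written two ways.
# Double-quoted: every inner double quote needs a backslash.
escaped="testNode:exec(\"cat /var/log/httpd/access_log\") | agentBESink(\"collect.domainname.com\");"
# Single-quoted: inner double quotes are taken literally.
single='testNode:exec("cat /var/log/httpd/access_log") | agentBESink("collect.domainname.com");'

# The shell hands the identical string to the program in both cases.
[ "$escaped" = "$single" ] && echo "identical"

# The single-quoted form could then be passed along as:
#   $FLUME_HOME/bin/flume node_nowatch -1 -s -n testNode -c "$single"
```

The double-quoted form is still the one to use if you need the shell to expand a variable, such as a file name, inside the definition.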

This accomplishes the same thing as above, but here we pipe the output of cat to Flume and set Flume up to read from the console and send to our collector.

 >cat /var/log/httpd/access_log | $FLUME_HOME/bin/flume node_nowatch -1 -s -n testNode -c "testNode:console | agentBESink(\"collect.domainname.com\");"

In this last example, I also use the value sink decorator, in case your collector expects an attribute on each event to build its output path. Here we set an attribute named category to the name of the file we are sending. It can then be used as the variable %{category} in the collector's output path, where it is replaced with the name of the file.

 >$FLUME_HOME/bin/flume node_nowatch -1 -s -n testNode -c "testNode:exec(\"cat /var/log/httpd/access_log\") | { value(\"category\", \"access_log\") => agentBESink(\"collect.domainname.com\") };"
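For the rotated-file use case, the same command can be generated once per file. The following is only a rough sketch under a few assumptions: the function name resend_commands and the rotated file names access_log.1 and access_log.2 are made up for illustration, and it assumes file names contain no spaces or quotes. It prints the commands rather than running them, so you can check the quoting before sending anything.

```shell
#!/bin/sh
# Collector host taken from the examples in this post.
COLLECTOR="collect.domainname.com"

# resend_commands (hypothetical helper): print one flume command per file,
# using the file's base name as the "category" value.
resend_commands() {
    for f in "$@"; do
        category=$(basename "$f")
        spec="testNode:exec(\"cat $f\") | { value(\"category\", \"$category\") => agentBESink(\"$COLLECTOR\") };"
        # \$FLUME_HOME is printed literally so the output can be pasted into a shell.
        printf '%s\n' "\$FLUME_HOME/bin/flume node_nowatch -1 -s -n testNode -c '$spec'"
    done
}

# Example file names only; substitute whatever your log rotation produced.
resend_commands /var/log/httpd/access_log.1 /var/log/httpd/access_log.2
```

Once the printed commands look right, they can be pasted into a shell and run one at a time.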

That’s all there is to it. Remember, the concepts here are also outlined in the Flume cookbook. These are merely provided as real-world working examples.
