Tips for parsing XML log files in Splunk

Normally, I prefer to send CSV or JSON data to Splunk. But sometimes XML can’t be avoided. I recently needed to ingest an XML file, and through judicious use of ‘MUST_BREAK_AFTER’ and ‘BREAK_ONLY_BEFORE’ in props.conf, I was able to break the file into one event per section. Each event looked like this:

<ReportSection name="foo_node" category="node">  
    <Long name="nodeOid">-94323972016633549</Long>  
    <String name="type">Windows Server</String>  
    <Integer name="passed">9</Integer>  
    <Integer name="failed">1</Integer>  
    <Integer name="errors">0</Integer>  
    <String name="status"></String>  
    <Integer name="statusPercent">90</Integer>  
    <String name="statusRange"></String>  
    <Integer name="noResults">0</Integer>  
    <Timestamp name="lastCheckTime" displayvalue="10/26/15 1:04 AM">1445835867360</Timestamp>  
</ReportSection>
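To make the event-breaking idea concrete, here is a rough Python sketch of the "start a new event before each matching line" behaviour. It is only an illustration, not Splunk's implementation: the pattern `<ReportSection` is my assumption about a sensible break point, and the second section (`bar_node`) is made-up sample data.

```python
import re

# Assumed break pattern: a new event begins at each opening <ReportSection ...> line.
EVENT_START = re.compile(r"^\s*<ReportSection\b")

def split_events(text):
    """Group lines into events, starting a new event before each matching line."""
    events, current = [], []
    for line in text.splitlines():
        if EVENT_START.match(line) and current:
            events.append("\n".join(current))
            current = []
        current.append(line)
    if current:
        events.append("\n".join(current))
    return events

# Hypothetical two-section file; "bar_node" is invented for the example.
sample = """<ReportSection name="foo_node" category="node">
    <Integer name="passed">9</Integer>
</ReportSection>
<ReportSection name="bar_node" category="node">
    <Integer name="passed">4</Integer>
</ReportSection>"""

events = split_events(sample)
for event in events:
    print(event, end="\n---\n")
```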

The problem with this XML is that KV_MODE = xml will cause Splunk to extract the tag name (e.g. “String”) as the event’s field name, rather than extracting the value of the name attribute from the XML. So you end up with an event looking like this: [screenshot: Splunk event example]
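The naming problem is easy to reproduce outside Splunk. The sketch below uses Python's standard `xml.etree.ElementTree` on a shortened copy of the event; it is not how Splunk implements KV_MODE, it just contrasts keying values by tag name with keying them by the name attribute.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the sample event above.
xml_event = """<ReportSection name="foo_node" category="node">
    <String name="type">Windows Server</String>
    <Integer name="passed">9</Integer>
    <Integer name="failed">1</Integer>
</ReportSection>"""

root = ET.fromstring(xml_event)

# Keyed by tag name: "passed" and "failed" both end up under "Integer".
by_tag = {}
for child in root:
    by_tag.setdefault(child.tag, []).append(child.text)

# Keyed by the name attribute: every value gets a meaningful field name.
by_name = {child.get("name"): child.text for child in root}

print(by_tag)
print(by_name)
```

Keyed by tag, the two Integer values are indistinguishable; keyed by name, you get `type`, `passed` and `failed` as separate fields.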

Since I don’t write this blog to show you problems and leave you hopeless, here’s how to extract meaningful fields from this XML:

  1. Don’t put KV_MODE in props.conf
  2. Use search-time extractions instead: a REPORT- setting in props.conf that points to a stanza in transforms.conf. You can use more than one extraction if necessary.

props.conf:

REPORT-xml1 = xml1

transforms.conf:

[xml1]  
REGEX = <\w+ name="(\w+)"(?: displayvalue.*?)*>(.*?)<\/\w+>  
FORMAT = $1::$2  
MV_ADD = true  
REPEAT_MATCH = true
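If you want to sanity-check the regex before deploying it, you can run it against the sample event in Python. This is only an approximation: Splunk uses PCRE and Python uses its own `re` engine, but the two agree on this pattern. The `fields` dictionary stands in for the `$1::$2` field/value pairs.

```python
import re

# Same pattern as the REGEX line in transforms.conf.
PATTERN = re.compile(r'<\w+ name="(\w+)"(?: displayvalue.*?)*>(.*?)<\/\w+>')

sample_event = """<ReportSection name="foo_node" category="node">
    <Long name="nodeOid">-94323972016633549</Long>
    <String name="type">Windows Server</String>
    <Integer name="passed">9</Integer>
    <Integer name="failed">1</Integer>
    <Integer name="errors">0</Integer>
    <String name="status"></String>
    <Integer name="statusPercent">90</Integer>
    <String name="statusRange"></String>
    <Integer name="noResults">0</Integer>
    <Timestamp name="lastCheckTime" displayvalue="10/26/15 1:04 AM">1445835867360</Timestamp>
</ReportSection>"""

# Each match yields (field name, value), mirroring FORMAT = $1::$2.
fields = dict(PATTERN.findall(sample_event))

for name, value in fields.items():
    print(f"{name} = {value!r}")
```

Note that the opening ReportSection tag is skipped, because its name attribute is followed by category rather than a closing bracket, and that empty elements such as status come through as empty strings.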

Originally, I was going to use a second extraction which would match the Timestamp tag and get the value of the displayvalue attribute. However, I decided instead to just grab the content of the Timestamp tag, which is the Unix timestamp (in milliseconds in this case). Splunk’s convert command makes it easy to work with Unix timestamps.
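One thing to watch: the 13-digit value in the sample is in milliseconds, and as far as I know Splunk's time conversion functions work in seconds, so the value most likely needs dividing by 1000 first. The Python sketch below shows the same arithmetic; `epoch_millis_to_utc` is just a helper name I made up for the example.

```python
from datetime import datetime, timezone

def epoch_millis_to_utc(ms):
    """Convert a Unix timestamp in milliseconds to a timezone-aware UTC datetime."""
    # Integer arithmetic avoids floating-point rounding of the millisecond part.
    seconds, millis = divmod(int(ms), 1000)
    return datetime.fromtimestamp(seconds, tz=timezone.utc).replace(
        microsecond=millis * 1000
    )

# Value taken from the lastCheckTime field in the sample event.
last_check = epoch_millis_to_utc("1445835867360")
print(last_check.isoformat())
```

This gives 05:04:27 UTC on 26 October 2015, which lines up with the displayvalue of 1:04 AM if the reporting system was running at UTC-4.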

Here’s a screenshot of the end result in Splunk: [screenshot: Splunk event example with converted fields]