Dr. Splunk-love or: How I learned to start indexing and love the CSV

Categories:

I've been having trouble indexing CSV files. In particular, CSV files from Tripwire. I'll show you the format and how I was able to index the files in Splunk

 

Node Name,Node Type,Policy,Parent Test Group,Test Name,Description,Element,Result Time,Result State,Actual Value
"192.168.1.1",Linux Server,"My Policy Name","My Test Group","My Test Name","My Test Description","Some Element",10/25/15 2:02 AM,passed,"ELEMENT=foo"

Here's my Props.conf stanza:

[te_scm_csv]
DATETIME_CONFIG = NONE
KV_MODE = none
NO_BINARY_CHECK = true
SHOULD_LINEMERGE = false
LINE_BREAKER = (?:")([\r\n]+)
INDEXED_EXTRACTIONS = csv
HEADER_FIELD_LINE_NUMBER = 1

Explanation:

  • DATETIME_CONFIG -- I want the Splunk events' _time field to be the index time (AKA when the CSV file was generated, not the "Result Time")
  • KV_MODE -- I'm not sure if this is necessary, but I disabled it just to be sure
  • NO_BINARY_CHECK -- same
  • SHOULD_LINEMERGE -- some lines in the CSV had line breaks in the field values. Splunk was throwing away these events, so I had to override the default (see LINE BREAKER, next)...
  • LINE_BREAKER -- In combination with SHOULD_LINEMERGE, this causes Splunk to create a new event only when a double quote is followed by a CRLF sequence.
  • INDEXED_EXTRACTIONS -- explicitly tell Splunk this is a CSV
  • HEADER_FIELD_LINE_NUMBER -- by default, Splunk "automatically detects" this. This is not a problem for most CSV files, but for this file format, it was causing the header row to be indexed as an event.

As a verification step, I loaded the same CSV file into Excel and cross-checked my event counts with a Pivot Table.