Dr. Splunk-love or: How I learned to start indexing and love the CSV

I’ve been having trouble indexing CSV files, in particular CSV files from Tripwire. I’ll show you the format and how I was able to index the files in Splunk.

Node Name,Node Type,Policy,Parent Test Group,Test Name,Description,Element,Result Time,Result State,Actual Value
"192.168.1.1",Linux Server,"My Policy Name","My Test Group","My Test Name","My Test Description","Some Element",10/25/15 2:02 AM,passed,"ELEMENT=foo"

Here’s my props.conf stanza:

[te_scm_csv]  
DATETIME_CONFIG = NONE  
KV_MODE = none  
NO_BINARY_CHECK = true  
SHOULD_LINEMERGE = false  
LINE_BREAKER = (?:")([\r\n]+)  
INDEXED_EXTRACTIONS = csv  
HEADER_FIELD_LINE_NUMBER = 1

Explanation:

  • DATETIME_CONFIG – I want the Splunk events’ _time field to be the index time (roughly when the CSV file was generated), not the “Result Time” column, so setting this to NONE turns off timestamp extraction entirely
  • KV_MODE – I’m not sure if this is necessary, but I disabled it just to be sure
  • NO_BINARY_CHECK – same
  • SHOULD_LINEMERGE – some rows in the CSV had line breaks inside quoted field values, and Splunk was throwing those events away, so I had to override the default line merging (see LINE_BREAKER, next)
  • LINE_BREAKER – in combination with SHOULD_LINEMERGE, this causes Splunk to start a new event only when a closing double quote is followed by a run of CR/LF characters, so a newline inside a quoted field value stays within the same event (there’s a quick sketch of this after the list)
  • INDEXED_EXTRACTIONS – explicitly tell Splunk this is a CSV
  • HEADER_FIELD_LINE_NUMBER – by default, Splunk “automatically detects” this. That’s fine for most CSV files, but with this file format the auto-detection was causing the header row to be indexed as an event, so I pinned the header to line 1 explicitly
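
To convince myself the LINE_BREAKER pattern does what I expect, here’s a quick Python sketch that applies the same regex to a made-up two-row sample where one field value contains an embedded newline. This is purely illustrative (Splunk’s own pipeline does the real work); the sample data is invented, and the only thing borrowed from the config above is the pattern itself. Splunk discards whatever the capture group matches, so the split below mimics that:

import re

# Hypothetical sample: two rows in the Tripwire format, the first with a
# newline embedded in a quoted field value. The values are made up.
raw = (
    '"192.168.1.1",Linux Server,"My Policy Name","My Test Group",'
    '"My Test Name","line one\nline two","Some Element",'
    '10/25/15 2:02 AM,passed,"ELEMENT=foo"\r\n'
    '"192.168.1.2",Linux Server,"My Policy Name","My Test Group",'
    '"My Test Name","My Test Description","Some Element",'
    '10/25/15 2:03 AM,passed,"ELEMENT=bar"\r\n'
)

# Same pattern as the LINE_BREAKER setting: break only where a closing
# double quote is followed by a run of CR/LF characters.
LINE_BREAKER = re.compile(r'(?:")([\r\n]+)')

# Splunk throws away the text matched by the capture group, so split on
# the capture group's span rather than the full match (keeping the quote).
events, start = [], 0
for m in LINE_BREAKER.finditer(raw):
    events.append(raw[start:m.start(1)])
    start = m.end(1)
if raw[start:]:
    events.append(raw[start:])

print(len(events))  # prints 2 -- the embedded newline did not start a new event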

As a verification step, I loaded the same CSV file into Excel and cross-checked my event counts with a Pivot Table.
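
If you don’t have Excel handy, the same cross-check can be done with Python’s csv module, which keeps a quoted field with an embedded newline inside a single record, so the count it prints should line up with the number of events Splunk indexes for the sourcetype. The file name below is just a placeholder for the actual Tripwire export:

import csv

# Placeholder file name; point this at the actual Tripwire export.
with open("tripwire_export.csv", newline="") as f:
    reader = csv.reader(f)
    next(reader)  # skip the header row
    # csv.reader treats an embedded newline inside a quoted field as part
    # of the same record, so this counts logical rows, not physical lines.
    row_count = sum(1 for _ in reader)

print(row_count)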