How to onboard a log file with two different line breakers

Recently, a client asked for help onboarding a Mulesoft/Sitecore application log file into Splunk. At first glance, the file looks quite complex.

Sample data

ManagedPoolThread #0 05:27:44 WARN  Failed to create counter 'Sitecore.System\Events | Events Raised / sec'. Sitecore has no necessary permissions for reading/creating counters.

Message: Access to the registry key 'Global' is denied.

12496 05:27:46 INFO  Cache created: 'master[data]' (max size: 100MB, running total: 150MB)

ManagedPoolThread #0 05:27:47 INFO  Trying to load XML configuration /App_Config/Security/Domains.config

ManagedPoolThread #7 00:28:11 INFO  

**********************************************************************

Sitecore.NET 10.1.0 (rev. 005207)

**********************************************************************

C:\inetpub\wwwroot\SC10.sc\bin\antlr3.runtime.dll (Antlr3.Runtime)

ManagedPoolThread #6 00:28:17 INFO  Health.PrivateBytes: 864,923,648

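The client wanted every line that begins with a timestamped prefix to become its own Splunk event. That includes the lines that start with a number and a timestamp (such as 12496 05:27:46), even though they look as if they are nested under the ManagedPoolThread events. Lines without such a prefix, like the Message: line or the version banner, should stay attached to the event above them. Additionally, some of these events were extremely long.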

To onboard this data, I implemented the following sourcetype configuration in props.conf.

props.conf

[mulesoft_sitecore]  
SHOULD_LINEMERGE = false  
LINE_BREAKER = (?m)([\r\n]+)^(?:ManagedPoolThread|\d+ \d+:\d+:\d+)  
TRUNCATE = 0

Explanation

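`SHOULD_LINEMERGE = false` turns off Splunk's line-merging step, so `LINE_BREAKER` alone decides where events begin and end. `TRUNCATE = 0` disables the line-length limit, which would otherwise cut off the extremely long events. The `LINE_BREAKER` property is a regular expression, built from the following parts: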

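  1. (?m) enables multi-line mode for the regex pattern, so that ^ matches at the start of every line rather than only at the start of the input. This is needed because these events span multiple lines.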

  2. ([\r\n]+) specifies a capture group consisting of one or more carriage return or newline characters. The indexing process discards the capture group, so it is not part of any event; it is the space between events.

  3. ^(?:ManagedPoolThread|\d+ \d+:\d+:\d+) creates a non-capturing group, starting at the first character of the line. The non-capturing group must match either ManagedPoolThread, or \d+ \d+:\d+:\d+. The former corresponds to the regular events that start with ManagedPoolThread, and the latter matches the sub-events, which start with a number followed by a timestamp (for example, 12496 05:27:46).

  4. The pipe character `|` is the real magic here, as it allows the pattern on EITHER side of the pipe to match.

  5. The text matched by the non-capturing group is not discarded. It is indexed as normal and becomes the start of the next event.
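
To make the steps above concrete, here is a minimal Python sketch that approximates how the line breaker splits the sample data. It is only an illustration: Splunk uses its own PCRE-based engine, not Python's `re` module, and the `break_events` helper and the shortened `sample` string are hypothetical names invented for this example. The sketch assumes the behavior described above, where the text of the first capture group is thrown away and everything after it starts the next event.

```python
import re

# Same pattern as the LINE_BREAKER setting. Python's re module is used here
# only as a stand-in for Splunk's regex engine (an assumption of this sketch).
LINE_BREAKER = re.compile(r"(?m)([\r\n]+)^(?:ManagedPoolThread|\d+ \d+:\d+:\d+)")


def break_events(raw: str) -> list[str]:
    """Split raw text into events the way the explanation above describes.

    Hypothetical helper: the first capture group (the newlines) is dropped,
    and the text matched after it begins the next event.
    """
    events = []
    start = 0
    for match in LINE_BREAKER.finditer(raw):
        # The current event ends where the captured newlines begin.
        events.append(raw[start:match.start(1)])
        # The next event starts right after the captured newlines.
        start = match.end(1)
    # Whatever follows the last break is the final event.
    events.append(raw[start:])
    return events


# Shortened version of the sample data, using \n line endings.
sample = (
    "ManagedPoolThread #0 05:27:44 WARN  Failed to create counter\n"
    "Message: Access to the registry key 'Global' is denied.\n"
    "12496 05:27:46 INFO  Cache created: 'master[data]'\n"
    "ManagedPoolThread #7 00:28:11 INFO  \n"
    "**********\n"
    "Sitecore.NET 10.1.0 (rev. 005207)\n"
    "**********\n"
    "ManagedPoolThread #6 00:28:17 INFO  Health.PrivateBytes: 864,923,648"
)

events = break_events(sample)

for number, event in enumerate(events, start=1):
    print(f"--- event {number} ---")
    print(event)
```

With this shortened sample, the sketch produces four events. The Message: line stays attached to the first ManagedPoolThread event, the line starting with 12496 becomes its own event, and the version banner stays inside the ManagedPoolThread #7 event.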

Conclusion

At first glance, this log file might seem too complex to onboard. However, with a little bit of regex magic, it’s a piece of cake!