Splunk Use Case Implementation Process

Summary

How are use cases developed and implemented? What does it mean to tune a correlation search? This post explains a disciplined process that focuses on the early and continuous delivery of valuable use cases. You’ll learn time-saving shortcuts and a systematic approach that eliminates frustration and churning.

Introduction

Use case implementation is a phased process requiring the collaborative effort of multiple stakeholders. These phases repeat continuously throughout the process, where the output of the current iteration is used as input to the next iteration.

The use case implementation process begins when stakeholders come together and begin the very first iteration in the Requirements phase, and it never completely ends, because the internal business and external threat environment are continuously changing. The phases do not need to be formally tracked; it’s rather like driving a car: you continuously scan the road, mirrors, and speedometer as you navigate to your destination. It’s best to collaborate on this process rather than to impose the bureaucratic project management that is so common in IT projects.

Phases of Use Case Development

  1. Requirements
    1. Collaborate with stakeholders and get specific on the problem you’re trying to solve
    2. Multiple searches may be required to detect all conditions
  2. Plan
    1. What will the data source be? Do the necessary field extractions exist?
    2. Requirements may drive customizations to add-ons
  3. Design
    1. Will the query use tstats or from? Will it create a notable or a risk modifier?
    2. What enrichment is useful to expedite incident review?
    3. Ideally, a dense search, once, over a short time interval
  4. Develop
    1. Create the correlation search
    2. Adapt the query techniques of other use cases
  5. Release
    1. Turn on the correlation search in the production environment
    2. Requires coordination with SOC, change control, etc.
  6. Track & Monitor
    1. Continuously verify the effectiveness of the use case (frequency, fidelity, business value, etc.)

It is important to have minimal friction in moving between these phases. The use case development process is highly iterative and collaborative, so any friction between iterations will dramatically reduce its velocity.

Also, don’t agonize over the details of these phases. The cycle is continuous, so for example don’t worry if you aren’t certain of the required enrichment data during the first iteration.

Advantages of this approach

  • Accommodates changes to use case requirements 
  • Continuous feedback is provided to all stakeholders
  • Facilitates self-learning in case you’re not an infosec expert

Keep in mind these key ideas: Accelerate development and avoid spinning your wheels. In other words, eliminate friction that slows you down, and don’t get stuck “in the weeds” on any particular detail.

Define the use case

Defining the use case is foundational to the implementation process. During the definition phase, multiple related use cases or correlation searches may be required. There’s no magic search to find all the things. So make a list of different patterns that you’re looking for, and don’t worry about how to craft the search yet.

The definition phase is also where the required data sources are defined. For example, a use case detecting data exfiltration through DNS will require DNS query data.

Define the required enrichment data

The goal of any use case is actionable information for SOC analysts to quickly review and act upon. Context is king for the incident response workflow. Therefore, it is imperative to consider the necessary enrichment data (asset/identity metadata) required for the SOC analyst to act. Any metadata that helps the analyst make a better decision should be included in the asset list, without polluting the notable event with useless information. Although it is tempting to demonstrate SPL wizardry by extracting every possible field from the CMDB, this is not necessarily helpful and can reduce Splunk performance.

Verify the source data

Once the use case is defined, it is very important to verify that the source data actually meets the requirements of the definition. Sometimes a data source seems right but, upon further inspection, won’t work. Other times the data source is correct, but the sourcetype must be modified to meet the requirements established during the definition phase.

Infeasible use cases/data sources

Review the use case definition and data sources to ensure feasibility.

  • Use case: Detect malicious DNS queries in Windows DNS debug log
    Reason: this log is not suitable for DNS query analysis
    Response: Use Stream to collect DNS query traffic
  • Use case: Detect p2p file sharing traffic using Cisco ASA firewall
    Reason: L3 firewalls only see address, port, and encryption protocol metadata. L3 firewalls don’t perform deep packet inspection, so it’s not possible to decrypt traffic for protocol analysis
    Response: use Palo Alto firewalls (next gen firewall) OR Stream AND replace browser certificates

Feasible, with customization

  • Use case: Ignore certain potentially-malicious activity detected by Crowdstrike
    Problem: Crowdstrike TA does not provide a CIM field indicating the specific activity that was performed
    Solution: Modify the Crowdstrike TA to conditionally concatenate certain fields into the “signature” field, which is CIM compliant.

Be aware that Splunkbase add-ons/TAs are not perfect

There are a number of issues with the community-developed add-ons found on Splunkbase. Even common add-ons have issues that will likely require research and customization to fix. To minimize these surprises, review the TA in combination with sample data early on during use case development. Issues that are identified can be prioritized and corrected at the right time during the engagement.

Involve the right stakeholders at the right time

Early during use case development, director-level stakeholders should be very involved with use case selection and definition. Later on, these same stakeholders must again be involved in the testing phase, to ensure that the final deliverables meet their expectations. During the definition phase, you’ll write a lot of ad-hoc queries to explore the data. This exploration is likely to change or narrow the use case definition. Keep a log of these discoveries, along with queries and sample data, as you work. This will prevent time-consuming churn.

Keep a development log for each use case

Throughout the entire use case development process, you will (should) generate artifacts—screenshots, notes, questions, sample data, web links—and you should save all of this together in one place for each use case. This is crucial in order to prevent rework.

Find (create) sample data

This sample data should be small, and yet accurately reflect the target environment in all the ways that matter. In many situations it is expedient to extract sample data to your own Splunk instance. This sample data doesn’t need to contain any PII, so you can redact the values of any sensitive fields. The sample data should contain examples of positive and negative cases. The negatives are just as important as the positives when it comes to query accuracy and performance.
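
One way to redact sample data while keeping it useful for testing is sketched below in Python. It is only an illustration, not a Splunk feature: the function name redact_event, the field names (user, src_ip), and the choice of a truncated SHA-256 token are all assumptions. Replacing each sensitive value with a stable token hides the original value but still lets the same user or host correlate across events.

```python
import hashlib

# Hypothetical list of fields treated as sensitive; adjust for your data.
SENSITIVE_FIELDS = {"user", "src_ip"}

def redact_event(event, salt="sample-data"):
    """Return a copy of event with sensitive values replaced by stable tokens."""
    redacted = {}
    for field, value in event.items():
        if field in SENSITIVE_FIELDS:
            digest = hashlib.sha256((salt + str(value)).encode("utf-8")).hexdigest()
            redacted[field] = f"{field}_{digest[:8]}"
        else:
            redacted[field] = value
    return redacted

events = [
    {"user": "alice", "src_ip": "10.1.2.3", "action": "failure"},  # positive case
    {"user": "alice", "src_ip": "10.1.2.3", "action": "success"},  # negative case
]
sample = [redact_event(e) for e in events]
```

Because the token depends only on the salt and the original value, both events above still share the same user and src_ip tokens. Short hashes of guessable values can be recovered by brute force, so keep the salt private if the sample data leaves the environment.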

Be mindful of performance

Keep in mind the performance implications of the search patterns, but focus on correctness rather than performance. Report acceleration and saved searches can be useful to summarize all-time searches (e.g. find the earliest time of a given value).

Make it work. Make it right. Make it fast.

This is a really useful motto to keep in mind when you’re developing a use case. For example, you could “make it work” with sample data, “make it right” with actual data, and “make it fast” by using tstats.

Be DRY

Don’t Repeat Yourself. Frequently asking the same question of the same data set is an anti-pattern. Consider using lookups, report acceleration, or (as a last resort) summary indexing to improve search performance.
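
The lookup idea can be sketched in Python as follows. This illustrates the pattern only and is not Splunk code: update_first_seen and the event fields value and time are hypothetical names. Instead of scanning all time on every run to find the earliest occurrence of a value, each run scans only the new events and merges them into a small stored table, which is the role a lookup plays in Splunk.

```python
def update_first_seen(first_seen, new_events):
    """Merge a batch of new events into a value -> earliest-time table."""
    updated = dict(first_seen)  # leave the caller's table untouched
    for event in new_events:
        value, time = event["value"], event["time"]
        if value not in updated or time < updated[value]:
            updated[value] = time
    return updated

# Table carried over from earlier runs (epoch seconds).
table = {"evil.example.com": 1700000000}

# Only the most recent interval is searched on each run.
batch = [
    {"value": "evil.example.com", "time": 1700003600},
    {"value": "new.example.net", "time": 1700003700},
]
table = update_first_seen(table, batch)
```

The already-known value keeps its original earliest time, and the new value is added with the time it was first observed.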

Search Scheduling

Don’t focus on scheduling too early. Scheduling can be adjusted once the query logic is fully tested, and it is better to consider the schedule of all searches at the same time rather than looking at one search at a time.

Reserve a serial number for each use case

This number will never change or be reused, and it will be the same number anywhere it is referenced in supporting systems (the saved search itself, Confluence, JIRA, email, etc.). The correlation search name should be a short description of its purpose.
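
A minimal Python sketch of such a numbering scheme follows. The class UseCaseRegistry, its methods, and the "UC-0001" label format are hypothetical; in practice a spreadsheet or wiki page can do the same job. The point it illustrates is that the counter only moves forward, so a retired use case never gives its number back.

```python
class UseCaseRegistry:
    """Hands out serial numbers that are never changed or reused."""

    def __init__(self):
        self._next_serial = 1
        self._names = {}  # serial -> short description

    def reserve(self, name):
        serial = self._next_serial
        self._next_serial += 1  # advance even if the use case is later retired
        self._names[serial] = name
        return serial

    def retire(self, serial):
        # The serial stays recorded, so it can never be handed out again.
        self._names[serial] = self._names[serial] + " (retired)"

    def label(self, serial):
        return f"UC-{serial:04d} {self._names[serial]}"

registry = UseCaseRegistry()
dns = registry.reserve("DNS exfiltration")
p2p = registry.reserve("P2P file sharing")
registry.retire(dns)
brute = registry.reserve("Brute force logins")
```

Here the third use case receives the next unused number even though the first one was retired, and the same label can be pasted into the saved search name, tickets, and documentation.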

Avoid changing the CIM (Common Information Model)

Although it is sometimes unavoidable or even advantageous, the CIM should not be modified in most environments. Consider modifying the fields in the sourcetype. Common changes include: concatenating multiple fields into an existing CIM field, coalescing multiple fields, calculating a field value conditionally.
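
The three common changes can be sketched in Python to show the logic. In Splunk this logic would normally live in the add-on’s field configuration for the sourcetype rather than in Python, and the raw field names used here (tactic, technique, src_host, src_ip, status) are hypothetical.

```python
def normalize(raw):
    """Derive CIM-style fields from a raw event without altering the CIM itself."""
    event = dict(raw)

    # Concatenate multiple fields into an existing CIM field.
    parts = [raw.get("tactic"), raw.get("technique")]
    event["signature"] = " - ".join(p for p in parts if p)

    # Coalesce: take the first field that has a value.
    event["src"] = raw.get("src_host") or raw.get("src_ip")

    # Calculate a field value conditionally.
    event["action"] = "blocked" if raw.get("status") == "denied" else "allowed"
    return event

result = normalize({
    "tactic": "Credential Access",
    "technique": "Brute Force",
    "src_ip": "10.0.0.5",
    "status": "denied",
})
```

In each case the data model is left alone and only the fields produced for the sourcetype change.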

Verify data source configurations are correct and consistent

Although it is usually not the responsibility of the Splunk engineer to configure firewalls, appliances, and applications, it is still wise to verify/discuss these configurations with the appropriate engineers in order to be certain that the configuration meets the needs of the use case and is consistently configured across the environment.

Consider documenting these configurations with screenshots and storing them with the other artifacts for the respective use case. This provides a quick reference for the rest of the use case implementation process, and an immutable record of the known good configuration.

Gather feedback

The reason to implement a use case is to deliver business value. Learn what is useful and what isn’t in order to improve future iterations. For example, work with the SOC analysts to understand what enrichment data (assets and identities) is useful to expedite incident review. Gather feedback from all stakeholders, not just the leadership.