How to Monitor Splunk Index Growth Over Time

Update 2023-12-29

This post is now very old and the Splunk on Splunk app has been deprecated for a long time. I’ve learned quite a lot about Splunk in the last 8 years, and given the popularity of Splunk Cloud, the recommendations below are simply no longer valid.


Original post

Although you can use the Splunk on Splunk app to monitor Splunk index sizes (and many other things!), you might also want to track how those indexes grow over time. I’ll show you how to do that.

To see the kind of data we will collect, run the search below. It uses the rest command to pull the current index metadata from the Splunk REST API. I renamed a few fields purely for readability.

| rest /services/data/indexes/ count=0  
| rename title AS index splunk_server AS indexer currentDBSizeMB AS usage maxTotalDataSizeMB AS size
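The REST endpoint returns many more fields than the four renamed above. As a minimal sketch, the search below keeps only those four and adds a percentage-used column; pct_used is a name introduced here for illustration and is not part of the original search.

```spl
| rest /services/data/indexes/ count=0
| rename title AS index splunk_server AS indexer currentDBSizeMB AS usage maxTotalDataSizeMB AS size
| eval pct_used=round(usage / size * 100, 1)
| table indexer index usage size pct_used
```

The same table command could be placed before outputcsv in the saved search to keep the CSV file small.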

Data Generation

Create a saved search that writes this output to a CSV file. The full query for the saved search is:

| rest /services/data/indexes/ count=0  
| rename title AS index splunk_server AS indexer currentDBSizeMB AS usage maxTotalDataSizeMB AS size  
| outputcsv index_stats.csv

I scheduled the search to run once a day; adjust this to suit your own needs.
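If you prefer to define the schedule in configuration rather than in the UI, a savedsearches.conf stanza along these lines should work. This is only a sketch: the stanza name is hypothetical and the midnight cron schedule is an assumption, so pick whatever fits your environment.

```ini
# savedsearches.conf
# The stanza name below is hypothetical; use any name you like.
[Index Stats - Daily CSV]
search = | rest /services/data/indexes/ count=0 | rename title AS index splunk_server AS indexer currentDBSizeMB AS usage maxTotalDataSizeMB AS size | outputcsv index_stats.csv
enableSched = 1
# Assumed schedule: once a day at midnight.
cron_schedule = 0 0 * * *
```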

Data Collection

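Set up an input that will index this CSV file. By default on a Linux system, the file will be saved in /opt/splunk/var/run/splunk/ (newer Splunk versions write outputcsv results to a csv subdirectory under that path, so check where the file actually lands). You might want to create a new index for this data; the queries below assume it is named metadata.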
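As a rough sketch, a monitor input for the file could look like the stanza below. The path follows the default mentioned above and may differ on your system, the index name metadata matches the queries later in this post, and the built-in csv sourcetype is an assumption about how you want the columns extracted.

```ini
# inputs.conf on the search head that runs the saved search
# Adjust the path if your Splunk version writes the file elsewhere.
[monitor:///opt/splunk/var/run/splunk/index_stats.csv]
# Assumes an index named "metadata" already exists.
index = metadata
# Assumed sourcetype; it extracts the CSV header row as field names.
sourcetype = csv
disabled = false
```

Because outputcsv overwrites the file on every run, it is worth confirming after a few days that each daily snapshot is actually being indexed.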

Reporting

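The last step is to create a dashboard or report with this new data. Here are a couple of queries to get you started. Note that the CSV column named index collides with Splunk’s own index field, so in the indexed events it shows up as extracted_index.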

Top 10 non-internal indexes with the most day-over-day growth:

index=metadata extracted_index!=_*  
| eval _time=relative_time(_time, "@d")  
| join _time extracted_index [search index=metadata  
| eval _time=relative_time(_time, "+d@d")  
| rename usage AS last_usage  
| fields _time last_usage extracted_index]  
| eval growth=usage-last_usage  
| chart sum(growth) as total_growth by extracted_index  
| sort - total_growth  
| head

Time chart of top 5 non-internal indexes with the most day-over-day growth:

index=metadata extracted_index!=_*  
| eval _time=relative_time(_time, "@d")  
| join _time extracted_index [search index=metadata  
| eval _time=relative_time(_time, "+d@d")  
| rename usage AS last_usage  
| fields _time last_usage extracted_index]  
| eval growth=usage-last_usage  
| timechart sum(growth) by extracted_index where sum in top5
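The join above pairs each day’s snapshot with the previous day’s. The same day-over-day difference can also be sketched without a subsearch by using streamstats to carry the previous value forward per index. This is an alternative technique, not the original author’s query. It assumes one snapshot per day, and it sums usage across indexers before computing growth.

```spl
index=metadata extracted_index!=_*
| bin _time span=1d
| stats sum(usage) AS usage BY _time extracted_index
| sort 0 _time
| streamstats current=f window=1 last(usage) AS last_usage BY extracted_index
| eval growth=usage-last_usage
| timechart span=1d sum(growth) BY extracted_index where sum in top5
```

The first day of data has no previous value, so growth is empty for that day. If a daily snapshot is missing, the next value covers the growth for the whole gap.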

That’s all for today… Happy Splunking!

Last updated on Dec 29, 2023