Identifying Splunk forwarders that phone home too frequently


In the many large-scale Splunk environments I've worked on, a common problem is forwarders that phone home to the deployment server (DS) too frequently. A forwarder that phones home more often than necessary wastes resources on the DS and can prevent the DS from deploying apps to forwarders correctly.

By default, a Splunk Universal Forwarder or full Splunk Enterprise instance phones home to the deployment server every 60 seconds. In a Splunk environment of even moderate size, this can easily overwhelm the DS. How quickly do you really need to deploy changes to your forwarders anyway? I normally recommend a phone home interval of at least 600 seconds (10 minutes).
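
As a rough sketch of how that interval could be raised, the setting lives in deploymentclient.conf on the forwarder. The value 600 simply mirrors the recommendation above. Where you place the file (for example, in an app that the DS itself pushes out) is an assumption about your setup, and the forwarder normally needs a restart to pick up the change.

```ini
# deploymentclient.conf on the forwarder
# (location is an assumption: e.g. $SPLUNK_HOME/etc/system/local/ or a deployed app's local/ directory)
[deployment-client]
# Phone home to the deployment server every 10 minutes instead of the 60-second default
phoneHomeIntervalInSecs = 600
```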

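Here's a simple query you can use to find the forwarders that phone home most frequently. It counts the phone home events per host over the search time range and estimates the average number of minutes between them, so a lower minutes_per_phone means a chattier forwarder.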

index=_internal source=*splunkd.log "running phone" 
| stats count min(_time) as min_time max(_time) as max_time by host 
| eval span=max_time-min_time, minutes_per_phone=span/60/count 
| fields host count minutes_per_phone 
| sort - count
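
If you'd rather see only the offenders, here is one possible variation on the query above. It is a sketch, not a tested recipe: the 600-second threshold is just the interval recommended earlier, and seconds_per_phone is a field name I made up for illustration. It divides the time span by the number of gaps between events (count-1) and skips hosts with a single event, since no interval can be estimated for them.

```
index=_internal source=*splunkd.log "running phone"
| stats count min(_time) as min_time max(_time) as max_time by host
| where count > 1
| eval seconds_per_phone=round((max_time-min_time)/(count-1), 1)
| where seconds_per_phone < 600
| sort seconds_per_phone
```

Run it over a window several times longer than the threshold (a few hours, say) so that well-behaved forwarders log enough events to be measured fairly.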

Happy Splunking!