Whether you're just starting out or looking to enhance your skills, this blog will provide the insights and knowledge you need to effectively use Splunk for log analysis in cybersecurity. Let's dive in and explore how to turn data into actionable insights with Splunk.
What is Splunk?
Splunk is a powerful data analysis platform widely used in cybersecurity and defensive security. It enables organizations to collect, index, and analyze vast amounts of machine-generated data from various sources, such as logs from firewalls, intrusion detection systems, and endpoints. By providing real-time insights, Splunk helps detect and respond to security incidents quickly, identifying threats and vulnerabilities before they can cause damage. Its robust querying capabilities allow security professionals to investigate suspicious activities, correlate events, and generate actionable intelligence to fortify defenses.
How does Splunk work?
Splunk collects and processes machine-generated data from various sources, indexes it, and makes it searchable in real time. Users can then query and analyze this data to gain insights, monitor systems, and respond to incidents. Here’s a step-by-step overview of how Splunk works:
- Data Ingestion: Splunk collects data from various sources such as servers, network devices, applications, and security systems. This data can be in different formats like logs, metrics, events, etc.
- Indexing: The ingested data is indexed by Splunk, making it searchable. During indexing, Splunk processes the raw data, extracting important fields and metadata.
- Search and Analysis: Users can search through the indexed data using Splunk's Search Processing Language (SPL). This allows for querying, filtering, and analyzing data to derive insights and detect patterns.
- Visualization: Splunk provides tools for creating dashboards, reports, and visualizations to present the data in an easily understandable format. These visualizations help in monitoring and decision-making.
- Alerting: Based on predefined criteria, Splunk can generate alerts to notify users of specific events or anomalies. This is critical for proactive monitoring and incident response.
- Apps and Add-ons: Splunk has a wide range of apps and add-ons available on Splunkbase, which extend its functionality and integrate with other tools and services.
Main Components of Splunk
- Forwarders: Collect and forward data to the indexers (Agents that are installed on the endpoints/servers).
a. Universal Forwarder (UF): Lightweight, for high-volume data collection.
b. Heavy Forwarder (HF): Can preprocess data before forwarding.
- Indexers: Store and index the data, making it searchable. Handle data replication for high availability.
- Search Heads: Provide the interface for searching and analyzing data. Enable dashboard creation, reporting, and alerting.
- Deployment Server: Manages configuration and deployment of apps to forwarders and other components.
- Cluster Manager (called Cluster Master in older Splunk versions): Manages indexer clusters for data replication and high availability.
- License Manager: Monitors data volume and ensures compliance with Splunk licensing.
What is SPL (Search Processing Language)?
SPL (Search Processing Language) is the query language used in Splunk for searching, filtering, and manipulating data stored within Splunk. SPL is designed to be powerful and flexible, allowing users to extract meaningful insights from large volumes of machine-generated data. It is tailored for log and event data, making it particularly useful for IT operations, security, and business analytics.
Splunk: Fields and Values
Fields in Splunk are key-value pairs extracted from event data. They provide a structured way to search, filter, and analyze data. Each event in Splunk consists of multiple fields: default fields (like _time, host, source, sourcetype), which are assigned when the data is indexed, and custom fields extracted from the event data, which are typically extracted at search time. Fields and their values allow for precise filtering and targeted searches.
Fields
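To make the idea of fields as key-value pairs concrete, here is a minimal Python sketch that pulls `key=value` pairs out of a raw event string. It is only an illustration of the concept, not how Splunk implements field extraction; the regular expression, the `extract_fields` helper, and the sample event are all assumptions made up for this example.

```python
import re

# Matches key=value pairs; a value is either a double-quoted string or a bare token.
KV_PATTERN = re.compile(r'(\w+)=("[^"]*"|\S+)')


def extract_fields(raw_event: str) -> dict:
    """Return the key=value pairs found in one raw event as a dict."""
    fields = {}
    for key, value in KV_PATTERN.findall(raw_event):
        fields[key] = value.strip('"')  # drop surrounding quotes, if any
    return fields


# Hypothetical web event, invented for illustration.
sample_event = 'clientip=10.0.0.5 method=GET uri_path="/login" status=200'
fields = extract_fields(sample_event)
print(fields)
```

Once an event is reduced to a dict like this, filtering on `status` or grouping by `clientip` becomes a simple lookup, which is essentially what fields give you in a search.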
Splunk: Query Structure
A Splunk query typically consists of several components that allow users to search, filter, and analyze data. Here’s a breakdown of the basic structure:
- Search Command: The initial part of a Splunk query that specifies the data source and criteria.
Query: index=web sourcetype=http_logs
Explanation: This part of the query searches within the web index for events with the http_logs sourcetype.
- Pipes (‘|’): Used to chain multiple commands together, passing the output of one command as input to the next.
Query: index=web sourcetype=http_logs | table _time, clientip, uri_path, status
Explanation: The pipe operator connects the search command to the table command, which formats the search results into a table.
- Filtering Commands: Commands that narrow down search results based on specific criteria.
Query: index=web sourcetype=http_logs status=200
Explanation: This filters the search results to include only events with an HTTP status code of 200.
- Transforming Commands: Commands that modify the structure or format of the search results.
Query: | stats count by clientip
Explanation: The stats command calculates the count of events for each unique clientip.
- Statistical Commands: Commands that perform calculations on the search results.
Query: | stats avg(response_time) as AvgResponseTime
Explanation: The stats command calculates the average response_time and renames the result as AvgResponseTime.
- Table Formatting Commands: Commands that format the search results into a table.
Query: | table _time, clientip, uri_path, status
Explanation: The table command organizes the specified fields into a tabular format for easy viewing.
- Boolean Operators: The operators AND, OR, and NOT can be combined to form more complex queries. They must be written in uppercase. Parentheses can be used to group conditions and control the order of evaluation.
Query: index=web sourcetype=http_logs (status=404 OR status=500) AND method=GET
Explanation: This query retrieves events where the HTTP status is either 404 or 500, and the method is GET.
- Combination of all: A single query that combines and demonstrates these components.
Query:
index=network sourcetype=firewall action=allowed (dest_port=22 OR dest_port=23) AND bytes>1000
| stats count by src_ip, dest_ip
| where count > 50
Explanation: This query looks for allowed connections to destination ports 22 or 23 with bytes transferred greater than 1000, counting the occurrences by source and destination IPs, and filtering for counts greater than 50.
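The way each pipe passes its output to the next command can be sketched in plain Python. The snippet below is a rough analogy for the combined query above, not a model of how Splunk executes searches; the helper names (`base_search`, `stats_count_by`, `where_count_gt`) and the firewall events are hypothetical and exist only for this example.

```python
from collections import Counter


def base_search(events):
    """action=allowed (dest_port=22 OR dest_port=23) AND bytes>1000"""
    return [
        e for e in events
        if e["action"] == "allowed"
        and e["dest_port"] in (22, 23)
        and e["bytes"] > 1000
    ]


def stats_count_by(events, *fields):
    """| stats count by <fields>: count events per unique field combination."""
    return Counter(tuple(e[f] for f in fields) for e in events)


def where_count_gt(counts, threshold):
    """| where count > threshold: keep only groups above the threshold."""
    return {key: n for key, n in counts.items() if n > threshold}


# Hypothetical firewall events, invented for illustration.
events = (
    [{"action": "allowed", "dest_port": 22, "bytes": 4096,
      "src_ip": "10.0.0.5", "dest_ip": "10.0.1.9"}] * 60
    + [{"action": "allowed", "dest_port": 23, "bytes": 2048,
        "src_ip": "10.0.0.7", "dest_ip": "10.0.1.9"}] * 10
    + [{"action": "blocked", "dest_port": 22, "bytes": 9000,
        "src_ip": "10.0.0.8", "dest_ip": "10.0.1.9"}] * 80
)

# Each stage consumes the previous stage's output, like commands joined by '|'.
result = where_count_gt(
    stats_count_by(base_search(events), "src_ip", "dest_ip"), 50
)
print(result)
```

Only the first source survives all three stages: the second has too few events, and the third is dropped by the base search because its connections were blocked.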
Understanding SPL Queries and Their Applicability
To understand SPL in practice, we will use the Boss of the SOC v1 (BOTSv1) dataset, which is officially provided by Splunk and open source. The dataset will be loaded into Splunk, which can be downloaded from Splunk’s official page.
About BOTSv1
Boss of the SOC v1 is a realistic dataset that helps individuals apply SPL in various incident response and log analysis scenarios. The dataset contains more than 20 sourcetypes, including Windows Event Logs, Sysmon, Network, Registry, and IIS, which helps in understanding how Splunk can be applied to different log analysis use cases.
Setting up Splunk and BOTSv1
Once the required files (the Splunk setup file and the BOTSv1 dataset) are downloaded, the first step is to install Splunk. In this case, we are using Ubuntu (Linux), and our setup file is a Debian package.
To install Splunk, follow these steps.
Installing Splunk
Splunk setup
Splunk configuration
Splunk setup completed
Installing BOTSv1
Upload BOTSv1
BOTSv1 added
Splunk Use Cases: Practical Log Analysis
The focus here is to showcase different queries and components that can be combined to reach a specific goal or to narrow down to the relevant events in a huge pool of data.
User-Related Activities
User account involved in execution
Identifying which user account was involved in execution activity helps establish the scope of an incident. For this use case, we will use the default Windows event logs and exclude known activity to narrow down the results. For example, Splunk-related events can be excluded at first, since we know Splunk is used for log forwarding. Before closing the investigation, however, run a separate check on the Splunk-related events to confirm that nothing propagated or pivoted through Splunk.
Query: index=botsv1 earliest=0 source="WinEventLog:Security" NOT "*splunk*" | stats count by Account_Name, New_Process_Name
User execution
Account Bruteforce Activity
Accounts can be brute-forced to gain access to the environment or a system. There are many techniques for this, but for this use case, we can assume the attacker is trying multiple combinations of username and password to log in. If a credential is correct, the attacker will successfully log in.
We can make use of the Event Codes that Windows records whenever a specific event occurs. For example, when a user account logs in successfully, it triggers EventCode 4624. If the login fails, it triggers EventCode 4625. Hence, we will use the EventCode field to filter for 4625.
Query: index=botsv1 earliest=0 source="WinEventLog:Security" EventCode=4625
User account bruteforce
In this case, we do not have any logs for this event. Even so, double-check that the query is correct: verify the case sensitivity, the field names, and the logic before concluding that there are no matching events.
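The brute-force reasoning above (many failed logons, then a success) can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the events are hypothetical, the `time` field is a simple counter rather than a real timestamp, and the `min_failures` threshold of 5 is an arbitrary choice, not a recommended value.

```python
from collections import defaultdict


def brute_force_candidates(events, min_failures=5):
    """Return accounts where a successful logon (4624) follows at least
    min_failures failed logons (4625) with no success in between."""
    failures = defaultdict(int)
    flagged = {}
    for event in sorted(events, key=lambda e: e["time"]):
        account = event["Account_Name"]
        if event["EventCode"] == 4625:
            failures[account] += 1
        elif event["EventCode"] == 4624:
            if failures[account] >= min_failures and account not in flagged:
                flagged[account] = failures[account]
            failures[account] = 0  # a success resets the failure streak
    return flagged


# Hypothetical logon events, invented for illustration.
logon_events = (
    [{"time": t, "Account_Name": "alice", "EventCode": 4625} for t in range(6)]
    + [{"time": 6, "Account_Name": "alice", "EventCode": 4624}]
    + [{"time": 7, "Account_Name": "bob", "EventCode": 4625},
       {"time": 8, "Account_Name": "bob", "EventCode": 4624}]
    + [{"time": t, "Account_Name": "carol", "EventCode": 4625}
       for t in range(9, 19)]
)

print(brute_force_candidates(logon_events))
```

In this sample, `carol` has many failures but no successful logon, so the sketch does not flag her. Depending on the investigation, a long run of failures alone may still be worth reviewing.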
System-Related Activities
Parent-Child Process Correlation
Parent-child process correlation is one of the most useful techniques when performing log analysis of a system. To do this, we create a table with Parent Image, Parent Command Line, Child Process, and Child Command Line as the selected fields. This tabular representation gives a much clearer picture of what was executed and what launched it.
In this use case, we will make use of Sysmon logs.
NOTE: Ingested logs are not always parsed properly. When a log entry is unparsed, its fields are not extracted, so you cannot filter on them. Such an entry can look like this:
Unparsed Sysmon log
We can use rex to carve out the values and store them in new fields. To do that, we need to find the marker string in the raw event that surrounds each value of interest.
Query:
index=botsv1 earliest=0 sourcetype="XmlWinEventLog:Microsoft-Windows-Sysmon/Operational"
| rex field=_raw "<Data Name='CommandLine'>\s*(?<command_line_data>.*?)\s*<\/Data>"
| rex field=_raw "<Data Name='ParentImage'>\s*(?<parent_image>.*?)\s*<\/Data>"
| rex field=_raw "<Data Name='ParentCommandLine'>\s*(?<parent_command_line>.*?)\s*<\/Data>"
| where isnotnull(command_line_data) AND len(command_line_data) > 0 AND isnotnull(parent_image) AND len(parent_image) > 0 AND isnotnull(parent_command_line) AND len(parent_command_line) > 0
| stats count by host, parent_image, parent_command_line, command_line_data
Field Carving using rex
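To see what the rex expressions above do, the same carving can be tried in Python with the `re` module. This is a sketch for experimenting with the pattern outside Splunk; the `carve` helper and the raw event below are hypothetical, and the event is heavily shortened compared to a real Sysmon record.

```python
import re
from typing import Optional


def carve(raw: str, name: str) -> Optional[str]:
    """Return the text inside <Data Name='name'>...</Data>, or None if absent."""
    pattern = rf"<Data Name='{re.escape(name)}'>\s*(.*?)\s*</Data>"
    match = re.search(pattern, raw)
    return match.group(1) if match else None


# Hypothetical, shortened Sysmon-style event, invented for illustration.
raw_event = (
    r"<Event><EventData>"
    r"<Data Name='ParentImage'>C:\Windows\System32\cmd.exe</Data>"
    r"<Data Name='ParentCommandLine'>cmd.exe /c whoami /all</Data>"
    r"<Data Name='CommandLine'>whoami /all</Data>"
    r"</EventData></Event>"
)

print(carve(raw_event, "CommandLine"))
print(carve(raw_event, "ParentImage"))
print(carve(raw_event, "ParentCommandLine"))
```

Because the pattern includes the opening `Name='` text, searching for `CommandLine` does not accidentally match the `ParentCommandLine` element that appears earlier in the event.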
Huge Sized Command Execution
When a payload or malicious command is executed, the command line is often far longer than the normal Windows commands you would expect to see, for example because it carries a Base64-encoded command.
Query:
index=botsv1 earliest=0 sourcetype="XmlWinEventLog:Microsoft-Windows-Sysmon/Operational"
| rex field=_raw "<Data Name='CommandLine'>\s*(?<command_line_data>.*?)\s*<\/Data>"
| rex field=_raw "<Data Name='ParentImage'>\s*(?<parent_image>.*?)\s*<\/Data>"
| where len(command_line_data) > 500
| table command_line_data
Huge size command
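Once a long command line has been found, a common next step is to decode any Base64 payload it carries. The Python sketch below applies the same length check and then tries to decode a PowerShell encoded command, which PowerShell expects to be Base64 over UTF-16LE text. The helper names, the list of argument spellings in the regular expression, and the sample command are assumptions for illustration; real command lines vary far more.

```python
import base64
import re
from typing import Optional

# Matches a few common spellings of PowerShell's encoded-command argument.
ENCODED_ARG = re.compile(
    r"-(?:e|enc|encodedcommand)\s+([A-Za-z0-9+/=]+)", re.IGNORECASE
)


def is_long_command(command_line: str, threshold: int = 500) -> bool:
    """Mirror of: | where len(command_line_data) > 500"""
    return len(command_line) > threshold


def decode_powershell(command_line: str) -> Optional[str]:
    """Decode the Base64 argument of an encoded PowerShell command, if present."""
    match = ENCODED_ARG.search(command_line)
    if not match:
        return None
    try:
        return base64.b64decode(match.group(1), validate=True).decode("utf-16-le")
    except ValueError:  # covers invalid Base64 and invalid UTF-16 data
        return None


# Hypothetical payload, invented for illustration.
payload = "Write-Output 'hello'" * 20
encoded = base64.b64encode(payload.encode("utf-16-le")).decode("ascii")
sample_command = f"powershell.exe -NoProfile -enc {encoded}"

print(is_long_command(sample_command))
print(decode_powershell(sample_command)[:20])
```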
Network-Related Activity
Suspicious Domain Lookup
Query:
index=botsv1 earliest=0 sourcetype="stream:dns" "query{}"=*.* NOT "query{}"=*.bing.com AND NOT "query{}"=*.arpa AND NOT "query{}"=*.microsoft.com AND NOT "query{}"=*.msn.com AND NOT "query{}"=*.info AND NOT "query{}"=*.local
| stats count by src_ip, "query{}"
Suspicious domains
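The exclusion logic in the DNS query above can be sketched in Python: keep lookups that contain a dot, drop those matching known wildcard patterns, and count the rest by source and domain. The events are hypothetical, and the sketch treats each event as having a single `query` value, whereas `query{}` in the dataset can hold multiple values.

```python
from collections import Counter
from fnmatch import fnmatchcase

# Wildcard patterns taken from the NOT clauses of the query above.
EXCLUDED_PATTERNS = [
    "*.bing.com", "*.arpa", "*.microsoft.com", "*.msn.com", "*.info", "*.local",
]


def suspicious_lookups(events):
    """Count (src_ip, query) pairs that are not covered by the exclusion list."""
    counts = Counter()
    for event in events:
        query = event["query"].lower()
        if "." not in query:  # mirrors "query{}"=*.*
            continue
        if any(fnmatchcase(query, pattern) for pattern in EXCLUDED_PATTERNS):
            continue
        counts[(event["src_ip"], query)] += 1
    return counts


# Hypothetical DNS events, invented for illustration.
dns_events = [
    {"src_ip": "10.0.0.5", "query": "evil.example.net"},
    {"src_ip": "10.0.0.5", "query": "evil.example.net"},
    {"src_ip": "10.0.0.5", "query": "www.bing.com"},
    {"src_ip": "10.0.0.6", "query": "fileserver.local"},
    {"src_ip": "10.0.0.6", "query": "localhost"},
    {"src_ip": "10.0.0.6", "query": "9.1.0.10.in-addr.arpa"},
]

print(suspicious_lookups(dns_events))
```

An exclusion list like this only removes noise. What remains still has to be reviewed by someone who knows which domains are normal for the environment.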
Suspicious User Agents
Query:
index=botsv1 earliest=0 sourcetype="stream:http"
| search NOT http_user_agent IN ("Mozilla/5.0*", "Mozilla/4.0*", "Opera*")
| stats count by http_user_agent | sort - count
Suspicious user agents
Network Spike / Huge Data Transfer
An important part of reviewing network-related activity is looking for unusually large amounts of data being transferred into or out of the environment. One way to find this is to sum the bytes transferred by each source.
Query:
index=botsv1 earliest=0
| stats sum(bytes) as total_bytes by src_ip
| sort - total_bytes
| head 10
Network spike
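The sum, sort, and head steps of the query above map onto a few lines of Python. This is a minimal sketch with hypothetical events; the `top_talkers` helper name is made up for this example.

```python
from collections import defaultdict


def top_talkers(events, limit=10):
    """Sum bytes per src_ip and return the largest totals first."""
    totals = defaultdict(int)
    for event in events:
        # Events missing either field contribute nothing to the totals.
        if "bytes" in event and "src_ip" in event:
            totals[event["src_ip"]] += int(event["bytes"])
    ranked = sorted(totals.items(), key=lambda item: item[1], reverse=True)
    return ranked[:limit]


# Hypothetical network events, invented for illustration.
net_events = [
    {"src_ip": "10.0.0.5", "bytes": 1200},
    {"src_ip": "10.0.0.5", "bytes": 800},
    {"src_ip": "10.0.0.9", "bytes": 50_000_000},
    {"src_ip": "10.0.0.7", "bytes": 300},
    {"src_ip": "10.0.0.7"},  # no bytes field, so it is skipped
]

print(top_talkers(net_events, limit=2))
```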
Useful Tips
- Be clear about which data sources and fields are available.
- Search or narrow down logs using event IDs.
- Correlate parent and child processes to spot abnormalities.
- Backtrack the timeline using the Parent Process and its ID, the Parent’s Parent Process and its ID, and so on, to get to the initial vector.
- Know what is normal and what is abnormal to see in the logs.
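The backtracking tip can be sketched as a simple walk up a process table in Python. The process records below are hypothetical and keyed by process ID for brevity. In real Sysmon data, process IDs can be reused, so a unique identifier such as Sysmon's ProcessGuid is usually a safer key.

```python
def process_ancestry(processes, pid):
    """Walk from a process up through its parents.

    Returns image names ordered from the starting process to the oldest
    known ancestor. Stops when a parent is unknown or a loop is detected.
    """
    chain = []
    seen = set()
    while pid in processes and pid not in seen:
        seen.add(pid)
        process = processes[pid]
        chain.append(process["image"])
        pid = process["parent_pid"]
    return chain


# Hypothetical process table, invented for illustration.
process_table = {
    900: {"image": "explorer.exe", "parent_pid": 0},
    1000: {"image": "winword.exe", "parent_pid": 900},
    1100: {"image": "cmd.exe", "parent_pid": 1000},
    1200: {"image": "powershell.exe", "parent_pid": 1100},
}

print(" <- ".join(process_ancestry(process_table, 1200)))
```

Reading the chain from right to left gives the order of execution, which in this invented example points to a document opened in Word as the likely initial vector.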
Conclusion
Splunk is a powerful platform for searching, filtering, and analyzing machine-generated data, with its robust Search Processing Language (SPL) enabling complex queries. While Splunk provides the tools to navigate and manipulate datasets, the true differentiator lies in the analytics you develop over time. By practicing and creating logic tailored to specific use cases, you can derive deeper insights and value from your data.
If you found this topic interesting and you don’t have any exposure to Malware Analysis, Reverse Engineering, Digital Forensics, and Incident Response, why not take a look at our Blue Team Labs Online (BTLO) platform? On BTLO, we have a lab called Splunk IT, where you can implement and practice the techniques learned in this blog.
About Security Blue Team
Security Blue Team is a leading online defensive cybersecurity training provider with over 100,000 students worldwide, training security teams across governments, military units, law enforcement agencies, managed security providers, and many more industries.