1. Introduction
Digital forensics is used to identify suspects of malicious crimes or forbidden behaviors on systems or networks, and to track evidence of such actions. Malicious users can attack a target system in many ways, and the technique is not necessarily a technical one. Forensic research tailored to various network environments, such as the Internet of Things (IoT) and cloud computing, is being actively conducted [1,2].
Digital evidence tends to be widely scattered, and because it is intermingled with other traces, identifying the key evidence manually is labor-intensive and tedious. Therefore, the experience of experts is required. Various tools are currently available for conducting digital forensics, including EnCase [3], Forensic Toolkit (FTK) [4], and AXIOM [5]. These tools categorize data in various ways to support digital forensic analysis. However, they cannot support all the devices and applications currently in use. Moreover, although these tools can image the disk containing evidence traces, or gather and categorize data in storage, they cannot provide tracking scenarios of specific events. Furthermore, these tools do not provide visualization; rather, they offer only a text-based user interface in explorer form. It is consequently difficult to see at a glance the correlation among various pieces of evidence.
Microsoft Windows (hereinafter Windows) is the most widely used operating system [6], constituting 54.6% of the global operating system market [7]. It offers the capability of storing various system logs for auditing. We can save events by setting an audit policy, and can check the events using Windows Event Viewer. However, many events occur in Windows for even simple tasks, such as opening a file, and it is therefore difficult to identify the user actions from the events. This is because the stored logs often do not have information about the action that caused the events. Moreover, the events that one action triggers differ from case to case. The filtering structure of Event Viewer is too simple to filter out events of interest when multiple users perform multiple tasks in various ways.
Security is not limited to server or network attacks by professional attackers. Auditing and tracking of computer files are necessary for protecting trade secrets, preventing unauthorized access to information or resources, and preventing data manipulation [8]. Studies to protect the integrity and ownership of files have been conducted [9]. This paper, however, focuses on access to files in shared folders on servers under Active Directory (AD) [10]. Kim et al. [11] proposed a technique for extracting from complex Windows logs the key events that can track specific actions. In the present study, we organized the log classification into three steps by segmenting and extending the classification. This structure enables the analyst to adjust the trace depth in three steps. We additionally suggest dashboard templates using Elastic Stack to visualize the tracking process.
The remainder of this paper is organized as follows. Section 2 discusses the Windows audit system. Section 3 introduces Elastic Stack. Section 4 presents and discusses the proposed event trace model. Section 5 demonstrates event tracking using the proposed model. Section 6 provides conclusions and future work.
2. Windows Audit System
The security audit policy settings under Security Settings\Local Policies\Audit Policy provide broad security audit capabilities for client devices and servers that cannot use advanced security audit policy settings. The basic audit policy settings are audit account logon events, audit account management, audit directory service access, audit logon events, audit object access, audit policy change, audit privilege use, audit process tracking, and audit system events [12].
2.1 Windows Audit Policy
Windows records and manages event logs in six categories: Account Logon, Account Management, Detailed Tracking, DS Access, Logon/Logoff, and Object Access. Our research focuses on the Account Logon, Logon/Logoff, and Object Access categories and their subcategories. Table 1 describes these categories and subcategories [13].
Categories and subcategories of interest
As shown in Fig. 1, we can set policies to log the events that are required for auditing [14]. The figure depicts an example of setting “Audit object access” to “Success, Failure” to record the events for the shared file that is the target of our study.
Windows audit policy setting.
2.2 Windows Event Viewer
Windows provides an Event Viewer, which enables the viewing of event logs stored according to audit policy settings. Event Viewer can list the logs by category, such as Application or Security. Fig. 2 shows an example of selecting the Security category, including the keywords, date and time, event ID, and task for the logs in this category. To view detailed information in the lower window, we can select an item from the list in the upper window. The details vary depending on the event. Because we filter only the file access-related event of interest (event ID: 5145), the lower window shows the subject, file share information, and access mask.
2.3 Analysis of Windows Events
In the Event Viewer shown in Fig. 2, we can filter the logs by event ID. However, this feature supports only very simple filtering; it does not support filtering by different attributes or multiple filtering conditions.
When a user performs one action, the system generates many event logs. In many cases, it is difficult to identify the action that caused the events, because different events may occur for the same action. Table 2 outlines the events that occur when we “overwrote a file in a shared folder” twice. The user actions were the same; however, the first action produced 15 logs and the second action produced 26 logs. Table 3 shows the meaning of each bit of the access masks [11,15].
Different events for the same behaviors
Meaning of each bit of the access masks
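The bit layout of an access mask can be made concrete with a small decoder. The following is a minimal sketch, assuming the standard Windows file access-right constants (for example, 0x2 for WriteData and 0x10000 for DELETE); the function name `decode_access_mask` is hypothetical, and the flag table covers only the file-related bits.

```python
# Standard Windows file access rights (bit value -> name).
FILE_ACCESS_BITS = {
    0x000001: "ReadData",
    0x000002: "WriteData",
    0x000004: "AppendData",
    0x000008: "ReadEA",
    0x000010: "WriteEA",
    0x000020: "Execute",
    0x000040: "DeleteChild",
    0x000080: "ReadAttributes",
    0x000100: "WriteAttributes",
    0x010000: "DELETE",
    0x020000: "READ_CONTROL",
    0x040000: "WRITE_DAC",
    0x080000: "WRITE_OWNER",
    0x100000: "SYNCHRONIZE",
}


def decode_access_mask(mask):
    """Return the names of the access rights set in a mask (int or hex string)."""
    value = int(mask, 16) if isinstance(mask, str) else mask
    # Iterate in ascending bit order so the output is stable.
    return [name for bit, name in sorted(FILE_ACCESS_BITS.items()) if value & bit]


# 0x12019F and 0x17019F are the write-group masks named in Section 4.2.
print(decode_access_mask("0x12019F"))
print(sorted(set(decode_access_mask("0x17019F")) - set(decode_access_mask("0x12019F"))))
```

Under these constants, 0x17019F differs from 0x12019F only by the DELETE and WRITE_DAC bits, which illustrates why two masks from the same write group can be treated alike when the write method is not of interest.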
3. Elastic Stack
Elastic Stack is a tool recently introduced for analysis visualization in various areas [16,17], including the security domain. Elastic Stack consists of Elasticsearch, Logstash, Kibana, and Beats. Elasticsearch is a search and analytics engine. Logstash is a server-side data processing pipeline that simultaneously consumes data from multiple sources, transforms it, and then sends it to a “stash,” such as Elasticsearch. Kibana enables data visualization with charts and graphs in Elasticsearch. Beats is a family of lightweight, single-purpose data shippers in the Elastic Stack equation [18].
Kibana supports a variety of charts, thereby enabling analysts to choose the appropriate chart for the situation. Analysts can set various options on each chart, such as filter and annotation, to obtain a concise result from a considerable amount of information. In addition, dashboards can be organized by consolidating the various charts required for analysis. Hence, analysts can view the analysis process at a glance.
Recently, many studies have applied Elastic Stack to data visualization. In the security area, an increasing number of studies have employed Elastic Stack for security threat detection and analysis. Park and Hyun [19] proposed a service that collects scattered web artifacts and provides visualization using Elastic Stack for digital forensics. Kim and Shon [20] used Elastic Stack to detect cyber threats in industrial control systems. Lee and Yang [21] proposed an Elastic Stack-based security log analysis system. We analyzed Windows logs in [22]. In this paper, we present a method to systematically classify logs based on the analysis results, and a method to support analysts by creating an analysis template using Elastic Stack.
4. Proposed Event Trace Model
4.1 Classification of Event Logs
As noted above, an action does not always generate the same events. It may generate different events, depending on how it is executed. We thus designed three event databases, as shown in Table 4, following the procedure depicted in Fig. 3. First, we recorded the events that occurred when one action was executed more than ten times in the same way. FullLog is the list of events that occurred in common across these executions of one action. ComLog is a list of common events for each group of similar actions, created by selecting the events that the FullLog entries of the group share. Finally, IdLog is a list of events extracted from FullLog; it is the minimum set of events that can distinguish each action from other actions.
Classification of logs based on the user’s action and method.
Table 5 outlines ComLog and IdLog. We divide the actions that cause the Open, Write, and Delete events into groups and record the common events that occur in each group in ComLog. To make IdLog, we list the different actions that trigger the events of each group. Subsequently, we extract the events that distinguish each action. One event consists of a relative target name and an access mask. FullLog contains event logs for various actions, and each action consists of multiple logs. Because of space constraints, they are not included herein. Flist in Fig. 11 in Section 5.2 shows an example used in our experiment.
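The classification can be sketched with set operations. The following is a simplified illustration, not the exact procedure used to build Table 5: it assumes each event is a (relative target name, access mask) pair, approximates IdLog by the events that appear in no other action's FullLog, and uses hypothetical function names, action names, and sample masks.

```python
def build_full_log(runs):
    """FullLog for one action: events observed in every recorded run."""
    return set.intersection(*(set(run) for run in runs))


def build_com_log(full_logs):
    """ComLog for a group: events shared by the FullLog of every action in it."""
    return set.intersection(*(set(fl) for fl in full_logs.values()))


def build_id_log(full_logs):
    """IdLog (approximation): per action, events absent from all other FullLogs."""
    id_log = {}
    for action, events in full_logs.items():
        others = set().union(*(fl for name, fl in full_logs.items() if name != action))
        id_log[action] = set(events) - others
    return id_log


# Hypothetical recordings: two write-type actions, two runs each.
runs = {
    "overwrite": [
        [("a.doc", "0x12019F"), ("a.doc", "0x17019F"), ("a.doc", "0x100080")],
        [("a.doc", "0x12019F"), ("a.doc", "0x17019F")],
    ],
    "edit_in_place": [
        [("a.doc", "0x12019F"), ("a.doc", "0x120089")],
        [("a.doc", "0x12019F"), ("a.doc", "0x120089"), ("a.doc", "0x100080")],
    ],
}
full_logs = {action: build_full_log(r) for action, r in runs.items()}
com_log = build_com_log(full_logs)
id_log = build_id_log(full_logs)
```

In this toy data, the event that occurs in only one run of an action is dropped from FullLog, the event shared by both actions forms ComLog, and the remaining event of each action forms its IdLog entry. Finding a truly minimal distinguishing set may require more than this uniqueness test when no single event is unique to an action.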
4.2 Elastic Stack Template
Illegal file leakage and tampering of shared files are major audit targets. There are various means of copying and modifying files. For example, a copy operation may occur before a modify operation. We experimented with various modification methods and found that the generated logs differed by method. Fig. 4 shows three representative cases. In Case 1, the simplest, we modified the file directly on the server. In Cases 2 and 3, we modified the file on the client’s computer and then overwrote the file in the server’s shared folder, rather than modifying it in the shared folder. Note that the file modification time is not the time the file was written on the server but the time it was modified on the client [22]. In Case 3, we opened one file on the server and one on the client, then copied the content from the server file and pasted it into the client file.
Three methods to modify a shared file.
Event tracking occurs in several steps. As described in Section 2.3, the Windows Event Viewer provides only simple filtering. We thus used Kibana in Elastic Stack to configure a dashboard template that incrementally filters and analyzes the logs. As shown in Fig. 5, the dashboard for the file tampering example consists of two charts (Wchart and Rchart) and two lists (Flist and Rlist). To clarify the description, we define the symbols as follows:
$t_{c}$: Creation time - when the file was created
$t_{m}$: Modification time - when the file was most recently modified
$t_{d}$: Detection time - when an auditor detects an illegal modification
$t_{w}$: Writing time - when the attacker wrote the file to the server
$t_{r}$: Reading/Copying time - when the attacker read or copied the file
S: Suspected user - a user suspected of being the attacker
Dashboard template for file modification.
Wchart is used to identify the time ($t_{w}$) of the most recent event in which the file was written or overwritten. In addition, it is possible to identify the user S who caused that event. To track an attack more specifically, Rchart is used to identify the time ($t_{r}$) when S read or copied the file. Table 6 summarizes the options for constructing these charts. Annotations in the options are the access masks to track. In Wchart, the events in IdLog are used as options to cover the various ways of generating write/overwrite events. If the write method used is not of interest, only 0x12019F and 0x17019F in ComLog need be set for the write group. In Rchart, the start of the time range is set to $t_{c}$; however, it can instead be set to the last file backup time. We call the list of events selected from FullLog for comparison Flist.
Options and objectives of the dashboard charts
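The Wchart options can also be expressed as an Elasticsearch query, which may be useful for checking a chart's filter outside Kibana. The following is a sketch only: the field names (`winlog.event_id`, `winlog.event_data.RelativeTargetName`, `winlog.event_data.AccessMask`) follow the Winlogbeat naming style and are assumptions that may differ with the Winlogbeat version and the Logstash pipeline, the function name `build_wchart_query` is hypothetical, and the time range from $t_{m}$ to $t_{d}$ is assumed from step A1 in Section 4.3.

```python
def build_wchart_query(file_name, t_start, t_end, access_masks):
    """Build a query body that returns the latest matching write event.

    All field names below are assumed, not taken from the paper.
    """
    return {
        "size": 1,  # only the most recent write/overwrite event is needed
        "sort": [{"@timestamp": {"order": "desc"}}],
        "query": {
            "bool": {
                "filter": [
                    {"term": {"winlog.event_id": 5145}},
                    {"term": {"winlog.event_data.RelativeTargetName": file_name}},
                    {"terms": {"winlog.event_data.AccessMask": list(access_masks)}},
                    {"range": {"@timestamp": {"gte": t_start, "lte": t_end}}},
                ]
            }
        },
    }


# Write-group masks from ComLog; file name and times are illustrative.
query = build_wchart_query(
    "Kim\\secret.doc",
    "2020-03-10T16:15:00",
    "2020-03-10T17:30:00",
    ["0x12019F", "0x17019F"],
)
```

The `filter` clauses are combined conjunctively, which corresponds to the multi-condition filtering that Event Viewer lacks. The exact string form of the access mask stored in the index should be checked against the actual documents before the query is used.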
In addition, Rlist lists events extracted from RawLog to identify the exact file copy method by checking detailed events on the suspect’s operation. Unlike writing, it is difficult to distinguish between copying and reading; thus, it is necessary to compare them with FullLog on read/copy operations. Flist is composed of the FullLog to be compared. Table 7 summarizes the options for constructing these lists.
Options and objectives of the dashboard lists
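The comparison between Flist and Rlist can be sketched as an ordered-subsequence check. This is one possible reading, assumed here for illustration: an action is taken to match when every event of its FullLog appears in Rlist in the same relative order, with unrelated events allowed in between. The function names, action names, and sample events are hypothetical.

```python
def matches_in_order(flist, rlist):
    """True if every Flist event appears in Rlist in the same relative order."""
    remaining = iter(rlist)
    # `event in remaining` advances the iterator, so order is enforced.
    return all(event in remaining for event in flist)


def identify_actions(full_log, rlist):
    """Return the names of all actions whose FullLog matches Rlist in order."""
    return sorted(name for name, flist in full_log.items()
                  if matches_in_order(flist, rlist))


# Hypothetical reference events, each a (relative target name, access mask) pair.
full_log = {
    "save_as": [("secret.doc", "0x120089"), ("secret.doc", "0x100080")],
    "drag_and_drop": [("secret.doc", "0x100081"), ("secret.doc", "0x120089")],
}
rlist = [
    ("secret.doc", "0x120089"),
    ("other.txt", "0x100080"),
    ("secret.doc", "0x100080"),
]
matched = identify_actions(full_log, rlist)
```

If several actions match, the analyst still has to inspect the detailed events by eye, as is done with the dashboard lists.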
4.3 Event Tracing Process
Event tracing consists of preparation and analysis phases, as shown in Fig. 6. The preparation phase consists of P1, P2, and P3 steps as follows:
P1: Create RawLog by setting the audit policy, as given in Section 2.1, to save the required logs.
P2: Prepare FullLog, ComLog, and IdLog by analyzing and classifying RawLog according to the method given in Section 4.1.
P3: Prepare dashboard templates for major attacks using ComLog and IdLog.
For example, if at time $t_{d}$ a file is suspected of having been manipulated, the file modification time is confirmed as $t_{m}$, and then the analysis begins. The analysis phase consists of steps A1, A2, and A3 using the dashboard described in Section 4.2.
A1: Display RawLog in Wchart with $t_{d}$, $t_{m}$, the name of the file suspected of having been manipulated, the event ID (5145), and the write-related access masks outlined in Table 5. Find suspect S, file writing time $t_{w}$, and the write method used in the last event in Wchart.
A2: Add S to the Rchart option and change the option to the read-related access masks to determine the read or copy operation that was performed before the modification.
A3: For the actions found in A1 and A2, the estimated suspect and actions are confirmed by comparing the corresponding FullLog and RawLog and by confirming the occurrences and orders of the detailed events.
If only a simple verification procedure is required, the analyst can verify the suspect’s crimes with the results of A1. However, proceeding to A2 and A3, the analyst can obtain detailed evidence of the action. Therefore, the analyst may choose the depth of the evidence trace depending on the situation.
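Steps A1 and A2 can be sketched over an in-memory RawLog. This is a minimal illustration under stated assumptions: the record keys (`event_id`, `file`, `user`, `mask`, `time`), the function names, and the read mask passed in by the caller are hypothetical, and the default write masks are the two ComLog values named in Section 4.2.

```python
from datetime import datetime

WRITE_MASKS = {"0x12019F", "0x17019F"}  # ComLog write group (Section 4.2)


def step_a1(raw_log, file_name, t_m, t_d, write_masks=WRITE_MASKS):
    """A1: return the most recent write event on file_name between t_m and t_d."""
    hits = [e for e in raw_log
            if e["event_id"] == 5145
            and e["file"] == file_name
            and e["mask"] in write_masks
            and t_m <= e["time"] <= t_d]
    # The last write event gives suspect S (user) and writing time t_w (time).
    return max(hits, key=lambda e: e["time"], default=None)


def step_a2(raw_log, file_name, suspect, t_w, read_masks):
    """A2: list the suspect's read-type events on file_name before t_w."""
    hits = (e for e in raw_log
            if e["event_id"] == 5145
            and e["file"] == file_name
            and e["user"] == suspect
            and e["mask"] in read_masks
            and e["time"] < t_w)
    return sorted(hits, key=lambda e: e["time"])
```

A caller would pass the result of `step_a1` into `step_a2`, for example `step_a2(raw_log, name, w["user"], w["time"], read_masks)`, and stop after A1 when only a simple verification is required.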
5. Experiment and Result
5.1 Experimental Environment
To evaluate the proposed model, we constructed the environment as shown in Fig. 7. Table 8 shows the operating system and software version used. We configured the audit policy in the AD server [23] to log events for “object access,” and set up Winlogbeat on the AD server to send the event logs to Logstash on the Elastic Stack server in real time. On the Elastic Stack server, Logstash receives event logs from Winlogbeat and stores them as RawLog. FullLog, IdLog, and ComLog are constructed in table form by analyzing the file access logs in the shared folder. Users Kim, Jo, and Kwak are connected to the AD server as clients.
Experimental environment.
5.2 Case Analysis
We generated a tampering action for a shared file in the AD server that produced complex logs as follows. The file secret.doc in the subfolder Kim of the shared folder is a file whose integrity must be guaranteed. Kwak copied secret.doc to his computer and modified it. He overwrote the modified file onto the server’s original file. Fig. 8 shows the timeline of these actions. The actual unit of time stored in the system is milliseconds; however, for convenience, it is here expressed in minutes.
Timeline of secret.doc modification and detection.
Suppose that on March 10, 2020 at 17:30, it is determined that secret.doc has been manipulated. The last modification time of the file is March 10, 2020 at 16:15. As shown in Fig. 9, using Wchart, we can identify the user who tampered with the file as Kwak. It is also possible to determine that Kwak overwrote the file by using “^C & ^V” or “drag & drop” on “Mar 10, 2020, 4:54 PM” based on the access mask (0x170196) and time in the annotation. Unfortunately, the Windows event log alone cannot determine which of these two overwrite methods was used.
Next, as shown in Fig. 10, we used Rchart to verify that Kwak had read the file several times before overwriting it. In Rchart, it was possible to view only the file that had been read for copying several times. To determine the specific copy method, Flist and Rlist were compared, as shown in Fig. 11. The comparison shows that Kwak opened secret.doc and then used the "Save as" function in the file menu to save the file on his computer.
Identifying the suspect and write time using Wchart.
Identifying the reading action of the suspect using Rchart.
Detailed actions of the confirmed perpetrator.
It can therefore be concluded that Kwak copied secret.doc to his computer by using “File/Save as” on March 10, 2020 at 13:31:57.068, modified and saved it on his computer on March 10, 2020 at 16:15, and then overwrote it on the server on March 10, 2020 at 16:54.
6. Conclusion
In this study, we built databases by analyzing complex Windows logs, extracting events that occurred in common for each action on shared files, identifying the minimum events that distinguish actions, and identifying common events for similar actions. In addition, we designed a dashboard template using Elastic Stack for visual analysis. When an action that requires auditing occurs, the stored event logs can be analyzed by comparing them with the reference databases in the dashboard templates. Evidence of the suspect and action for the events can be selected by adjusting the analysis depth.
In this study, we collected and analyzed logs only from the AD server. However, it is necessary to analyze the logs on the suspect’s computer to produce a complete collection of behavioral evidence. Therefore, further research is needed to extend the model so that client logs can also be sent to Elastic Stack for analysis. In addition, the databases must be extended to analyze the event logs for other actions and event logs on the client side. One limitation of our study is that Windows logs are not sufficient for file tracking. For example, it is not possible to distinguish whether “^C & ^V” or “drag & drop” is used as a file copy method. Windows logs need to record more detail to serve the needs of digital forensics.
Acknowledgement
This research was supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education (No. NRF-2017R1D1A3B03032637).