1. Introduction
Computational intelligence and artificial intelligence techniques represent a new paradigm for managing computer servers in cloud computing environments [1-4]. Without these techniques, human operators must monitor the status of computational resources such as CPU utilization, memory usage, and input/output (I/O) performance. This human intervention is unavoidable because system failures can occur at any time and operators must respond to them [5,6]. Moreover, intensive human intervention is almost inevitable for making the observations needed to manage large-scale servers in cloud computing systems effectively [7,8]. Recent developments in computational intelligence techniques have shed some light on how to deal with these problems effectively [9-11].
Efficient management of cloud computing resources includes the management of connected services for Internet of Things (IoT) devices and big data analytics. IoT devices, which have limited CPUs and batteries, can offload their processing to cloud computing resources [12,13]. For big data analytics, unlimited cloud computing resources offer fast, efficient, low-cost processing services on demand [14]. These benefits can be achieved by efficient resource management of cloud computing environments.
In this paper, we propose an intelligent residual resource monitoring scheme for cloud computing environments. The proposed scheme periodically monitors host machines so that the post-migration performance of a virtual machine is as similar to its pre-migration performance as possible. To this end, we use a novel similarity measure to find the best target host for a given virtual machine. The design of the proposed scheme helps maintain quality of service (QoS) and the service level agreement (SLA) during the migration process.
Our scheme differs from the traditional monitoring schemes described in the literature because we have designed and implemented a novel scheme for monitoring the residual resources of cloud-based host machines. This ensures that the performance of the pre-migration virtual machine is as close as possible to that of the post-migration virtual machine. By leveraging the proposed monitoring scheme, dynamic cloud computing consolidation can be undertaken without human intervention, while preserving the QoS and SLA.
The major contributions of this paper are summarized as follows:
(i) We designed an intelligent residual resource monitoring scheme based on the current status of the host machines in cloud computing environments.
(ii) We developed a novel metric to calculate the similarity of the residual resources of virtual machines for migration and cloud consolidation.
(iii) We formulated a migration problem that takes into account the QoS and SLA, and implemented a dynamic residual resource monitoring algorithm designed for scalability.
(iv) We undertook a comprehensive analysis and performance evaluation to show the effectiveness of the proposed monitoring scheme in a real-world scenario.
The remainder of the paper is organized as follows: After reviewing related work in Section 2, we describe the system model and formally define the problem addressed in Section 3. The proposed intelligent residual resource monitoring scheme for cloud computing environments is presented in Section 4. In Section 5, we present the results of our performance evaluation, with comparisons to previous systems. Finally, in Section 6, we conclude the paper with future research directions.
2. Related Work
The monitoring of computing resources is one of the fundamental problems of distributed and cloud computing environments. Because cloud computing environments generally have no shared memory, these systems are not trivial to monitor. We can detect malware, intrusion, and other types of misbehavior by monitoring computing resources [15,16]. Another reason for monitoring cloud computing environments is to migrate virtual machines in a secure way [17]. More specifically, when a virtual machine is scheduled for migration to an overloaded host machine, the QoS and SLA may not be guaranteed.
There are some centralized monitoring solutions for cloud computing environments [18-20]. The disadvantage of centralized monitoring schemes is their lack of scalability: as the number of nodes in the system increases, the monitoring overhead also increases. The central monitor can then become both a bottleneck and a single point of failure.
The authors of [21] proposed a distributed monitoring architecture for cloud computing systems. However, they did not detail how to map virtual machines and host machines in dynamic environments. Moreover, they require software agents, which are not suitable for our system design. VMDriver [22] is a fine-grained monitoring system for virtualized environments. The design is based on dividing event interception between the virtual machine monitor layer and the virtual machine semantic reconstructions, with implementation of a monitoring driver in Xen [23]. This driver uses a virtual machine monitor to gather the required information.
In [24], the authors focused on analyzing various monitoring architectures to determine their pros and cons. Therefore, cloud computing administrators can make choices based on the requirements imposed by their architectures. Aceto et al. [10] surveyed monitoring architectures for commercial solutions in detail. However, the internal implementations of these architectures are invisible.
The authors of [25] presented a monitoring framework based on a peer-to-peer architecture, similar to our architectural design. However, the design goal differs: our purpose for monitoring cloud resources is to ensure that the post-migration performance of a virtual machine is as similar as possible to its pre-migration performance.
Recently, Wang et al. [26] proposed a monitoring system with self-adaptive properties for cloud computing environments. The parameters of this system are adjusted dynamically based on a reliability metric. However, the aim of this monitoring technique is to detect anomalies.
In this regard, none of the discussed monitoring architectures fulfil our requirements. The basic requirement of our intelligent residual resource monitoring scheme is that it must be easily integrated into cloud computing infrastructures, without human intervention, so that the QoS and SLA are maintained during the migration. In the next section, we describe the system model and explain how our design goal can be achieved intelligently in cloud computing environments without human intervention.
3. System Model and Problem Definition
An informal definition of computational intelligence is an approach based on techniques inspired by nature to solve various problems, to which traditional techniques cannot be applied [27-29]. To monitor the residual resources of host machines in an intelligent way, we consider the hosts as a set of n nodes, where each node is connected by logical communication channels. Because each host machine has distinct resource properties (CPU utilization, memory usage, I/O performance, etc.), the performance of a virtual machine may differ after it is migrated to another host machine.
Fig. 1 shows the performance degradation problem induced by virtual machine migration. The relative performance of host A is 1, while that of host B is 0.6. Before migration, the virtual machine resides on host A. If the virtual machine is scheduled for migration to host B, the performance will not be as good as on host A. In fact, the relative performance of the virtual machine is 0.6. In this case, it is difficult for the virtual machine to guarantee the QoS and SLA on host B.
On the other hand, the opposite case can also be a problem. Suppose a virtual machine that already meets its QoS and SLA on host B is scheduled for migration to host A. Because the number of virtual machines that a host machine can run is limited, another virtual machine that needs host A to meet its QoS and SLA may then be unable to migrate there. The goal of the proposed residual resource monitoring scheme for cloud computing environments is therefore to identify a similar host machine onto which the virtual machine can be migrated. This maintains the pre-migration performance of the virtual machine as much as possible.
Fig. 1. Performance degradation problem after virtual machine migration.
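To make the example in Fig. 1 concrete, the following minimal sketch picks the candidate host whose relative performance index is closest to that of the current host. It assumes that each host is summarized by a single index in [0, 1]; the function name `closest_host` and hosts C and D are hypothetical and do not come from the paper.

```python
def closest_host(current_perf, candidates):
    """Return the candidate whose relative performance is closest to current_perf.

    candidates maps a host name to its relative performance index,
    assumed here to lie in [0, 1].
    """
    return min(candidates, key=lambda name: abs(candidates[name] - current_perf))


# The virtual machine currently runs on host A (relative performance 1.0).
# Host B comes from the example above; hosts C and D are made up.
candidates = {"B": 0.6, "C": 0.95, "D": 0.3}
target = closest_host(1.0, candidates)
gap = abs(candidates[target] - 1.0)
```

Under these assumed values, host C is preferred over host B because its index differs from that of host A by only 0.05 rather than 0.4.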
4. Proposed Methodology and Algorithms
In this section, we describe the proposed residual resource monitoring scheme in cloud computing environments. Two things should be considered for cloud consolidation to effectively migrate a virtual machine. The first is to monitor the unused resources on the potential host (the number of vCPUs, memory, storage, etc.). The migration cannot be allowed if the unused resources do not meet the virtual machine’s minimal requirements. The other requirement is to measure the performance of the host machines in terms of their unused resources. This is important for maintaining the QoS and SLA, as described in the previous section.
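The first requirement, that the target host's unused resources cover the virtual machine's minimal requirements, can be sketched as a simple admission check. The `Resources` fields and the `can_host` helper are illustrative assumptions rather than part of the proposed scheme.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resources:
    """Resource quantities; the field names and units are assumptions."""
    vcpus: int
    memory_mb: int
    storage_gb: int


def can_host(unused: Resources, required: Resources) -> bool:
    """True if every unused resource covers the VM's minimal requirement."""
    return (
        unused.vcpus >= required.vcpus
        and unused.memory_mb >= required.memory_mb
        and unused.storage_gb >= required.storage_gb
    )


vm_minimum = Resources(vcpus=2, memory_mb=4096, storage_gb=40)
roomy_host = Resources(vcpus=4, memory_mb=8192, storage_gb=100)
tight_host = Resources(vcpus=1, memory_mb=8192, storage_gb=100)
```

Here `can_host(roomy_host, vm_minimum)` holds, while `tight_host` is rejected because it lacks a second vCPU; only hosts that pass this check would go on to the performance comparison.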
Fig. 2 shows the monitoring architecture for cloud computing environments. The physical resources (CPU, memory, storage, etc.) are virtualized by the hypervisor or virtual machine monitor. At the hypervisor level, the virtualized resources can be managed for efficient server consolidation (e.g., real-time virtual machine scheduler, QoS monitoring module for services, energy-usage prediction module, service type classification, etc.). At a higher level, the cloud platform performs virtual machine provisioning, efficient and energy-aware virtual machine placement, and server consolidation. The main advantage of our monitoring architecture is that the management of the cloud computing systems can be performed in an intelligent way, without human intervention.
Fig. 2. The monitoring architecture in cloud computing environments.
When measuring the performance of the host machines’ unused resources, we consider the following two metrics: CPU and I/O performance. We expect that memory performance will not significantly affect the performance of the virtual machines. In contrast, the CPU and I/O configurations may affect it because of variation between models, e.g., cache size for CPUs, and hard disk drive (HDD) or solid state drive (SSD) for I/O.
We use the resource cosine similarity between two nodes defined in Eq. (1) to measure the similarities between the host machines’ unused resources:
The ResCos score between two nodes increases when their resource properties are similar. Our preliminary experiments have shown that ResCos outperforms a simple measure that counts the number of common resource properties.
Furthermore, to quantify the similarities, we calculate |ResCos(node1, node2)|, as defined in Eq. (2).
By calculating the metrics defined in Eqs. (1) and (2), the proposed monitoring scheme is able to identify a suitable host machine for migration. In addition, we employ the tuning function to adjust the importance of the properties. This is defined by Eq. (3). The tuning function allows the monitoring scheme to prioritize between Eqs. (1) and (2).
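As a rough illustration of a resource similarity score, the sketch below uses the textbook cosine similarity as a stand-in for ResCos, with optional per-property weights as one possible reading of the tuning function. The exact forms of Eqs. (1)-(3) may differ; the function name, the `weights` parameter, and the sample (CPU, I/O) vectors are assumptions.

```python
import math


def cosine_similarity(a, b, weights=None):
    """Cosine similarity of two equal-length resource vectors.

    weights (optional, non-negative) scales each property's contribution;
    it stands in for the tuning function, whose exact form is assumed.
    """
    if len(a) != len(b):
        raise ValueError("resource vectors must have the same length")
    if weights is None:
        weights = [1.0] * len(a)
    dot = sum(w * x * y for w, x, y in zip(weights, a, b))
    norm_a = math.sqrt(sum(w * x * x for w, x in zip(weights, a)))
    norm_b = math.sqrt(sum(w * y * y for w, y in zip(weights, b)))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # similarity is undefined for an all-zero vector
    return dot / (norm_a * norm_b)


# Hypothetical (CPU, I/O) residual performance vectors.
node1 = [0.8, 0.6]
node2 = [0.4, 0.3]  # same CPU-to-I/O ratio as node1, half the magnitude
node3 = [0.1, 0.9]  # I/O-heavy profile
```

Plain cosine similarity compares direction only, so `node2` scores 1.0 against `node1` despite having half the magnitude. A magnitude-aware term, which may be the role of Eq. (2), would be needed to separate such hosts.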
To calculate the value of ResCos for n nodes, the monitor manager must perform $\frac{n(n-1)}{2}$ operations, so the complexity is $O(n^2)$. We can reduce the complexity by letting each node evaluate the scoring functions. Each node informs the manager when it has finished evaluating its scoring functions. The monitor manager then uses the collected information as required to migrate a virtual machine. This offloading feature enables the cloud computing system to scale the number of nodes.
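The saving from offloading can be illustrated with a little arithmetic and a sketch of the reporting pattern. For n = 1,000 nodes, a central manager that compares every pair would need 1,000 × 999 / 2 = 499,500 evaluations, whereas in the offloaded design each node reports a single score. The `MonitorManager` class and its methods are hypothetical names used only for illustration.

```python
def pairwise_comparisons(n):
    """Number of unordered node pairs a central manager would have to score."""
    return n * (n - 1) // 2


class MonitorManager:
    """Collects one self-reported score per node instead of scoring all pairs."""

    def __init__(self):
        self.scores = {}

    def report(self, node_id, score):
        """Called by a node once it has evaluated its own scoring functions."""
        self.scores[node_id] = score

    def closest(self, wanted, exclude=()):
        """Return the node whose reported score is nearest to wanted, or None."""
        candidates = {k: v for k, v in self.scores.items() if k not in exclude}
        return min(
            candidates,
            key=lambda k: abs(candidates[k] - wanted),
            default=None,
        )


manager = MonitorManager()
for node_id, score in [("PM1", 0.9), ("PM2", 0.55), ("PM3", 0.2)]:
    manager.report(node_id, score)
```

With the scores already collected, a lookup such as `manager.closest(0.5)` is a single pass over the reported values.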
Algorithm 1. The proposed monitoring algorithm.
Algorithm 1 shows the pseudocode of the proposed monitoring algorithm, which is based on the similarity measure. The properties of the virtual machine (VMi) and physical machine (PMj) are taken as input. Note that in the algorithm, we use subscript i for virtual machines and j for physical machines. The output is the mapping information between the virtual machines and the physical machines.
The proposed monitoring scheme is designed to periodically perform the monitoring procedure (lines 1–8). Unlike previous monitoring techniques, we offload the monitoring process to each node (both virtual and physical machines); therefore, the proposed monitoring scheme is scalable in terms of the number of cloud computing nodes. The procedures followed by the VMMonitor() and MonitorResidualResource() functions are detailed on lines 22–27 and 28–32, respectively.
The procedure implemented on lines 9–21 is performed when a virtual machine is scheduled for migration. The input to this procedure is the virtual machine information and the output is the mapping information between the virtual machine and the physical machines. This enables the cloud platform to migrate the virtual machine to the physical machine.
After retrieving the virtual machine information, the procedure calculates the similarity measure to find the best physical machine for migration. At each iteration (lines 12–19), it keeps track of the best physical machine found so far according to our similarity measure. Finally, it returns the mapping information between the virtual machine and the selected physical machine.
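The selection loop described above (lines 12–19 of Algorithm 1) can be sketched as follows. This is an illustrative reading rather than the authors' code: the dictionary layout, the `select_target` name, and the negative squared distance used as a stand-in similarity are all assumptions.

```python
def neg_squared_distance(a, b):
    """Stand-in similarity: larger (closer to 0) means more similar."""
    return -sum((x - y) ** 2 for x, y in zip(a, b))


def select_target(vm, physical_machines, similarity):
    """Return (vm id, best pm id), or None if there are no candidates."""
    best_pm = None
    best_score = float("-inf")
    for pm in physical_machines:
        score = similarity(vm["properties"], pm["residual"])
        if score > best_score:  # remember the best machine seen so far
            best_pm, best_score = pm, score
    if best_pm is None:
        return None
    return (vm["id"], best_pm["id"])


# Hypothetical (CPU, I/O) property vectors.
vm = {"id": "VM1", "properties": [0.5, 0.5]}
pms = [
    {"id": "PM1", "residual": [0.9, 0.9]},
    {"id": "PM2", "residual": [0.5, 0.6]},
    {"id": "PM3", "residual": [0.1, 0.2]},
]
mapping = select_target(vm, pms, neg_squared_distance)
```

Passing the similarity as a parameter keeps the loop independent of the particular measure, so a ResCos implementation could be substituted for the stand-in.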
Since our monitoring algorithm evaluates the residual resources of the physical machines that are closest to those required by the virtual machines, our procedure has the following advantages: First, the post-migration performance of the virtual machine is as similar as possible to the pre-migration performance. Hence, the QoS and SLA will be guaranteed. Second, additional energy savings can be realized through the cloud consolidation enabled by our monitoring scheme.
For instance, suppose that there is a virtual machine on a physical machine (PM_A) and that the virtual machine’s score is 0.3. In this case, our proposed monitoring scheme finds a physical machine (PM_B) that can barely afford the virtual machine (i.e., the residual score is close to 0.3). Therefore, PM_A will be turned off to save energy, while PM_B runs more virtual machines without performance degradation.
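The 0.3 example can be sketched as a best-fit search: among the physical machines whose residual score is at least the virtual machine's score, pick the one with the least slack. The helper name `tightest_fit` and the residual scores assigned to PM_B, PM_C, and PM_D are hypothetical.

```python
def tightest_fit(vm_score, residual_scores):
    """Return the machine that can just afford vm_score, or None if none can.

    residual_scores maps a physical machine name to its residual score.
    """
    feasible = {pm: r for pm, r in residual_scores.items() if r >= vm_score}
    if not feasible:
        return None
    # The smallest surplus over vm_score is the tightest fit.
    return min(feasible, key=lambda pm: feasible[pm] - vm_score)


residuals = {"PM_B": 0.32, "PM_C": 0.75, "PM_D": 0.25}
target = tightest_fit(0.3, residuals)
```

Under these assumed values, PM_D is excluded because its residual score is below 0.3, and PM_B is preferred over PM_C because it leaves less capacity stranded.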
Our monitoring scheme can be applied to various implementation patterns. For example, the cloud coordinator can request and collect monitoring information from each node in the system. This monitoring information can be used when selecting a target machine for live migration. Since the complexity of the algorithm increases linearly with the number of machines, our monitoring scheme can be applied in a scalable system with low overhead.
5. Performance Evaluation
In this section, we present experimental results that demonstrate the performance of the proposed monitoring scheme for maintaining the QoS and SLA. Table 1 shows the experimental settings used for evaluating the performance of the monitoring scheme. These types of cloud tasks are computationally intensive tasks used for multimedia and big data processing. Although we employed an equal number of physical and virtual machines for the proof of concept, different scenarios can also be used.
To verify the scalability of the proposed algorithm, we set the number of physical and virtual machines to 1,000. In other words, there is one virtual machine running on each physical machine in the cloud computing system. We configured hardware settings by taking into account the relative performance of the physical machines and the virtual machines (e.g., memory, bandwidth, the number of CPUs, etc.).
Fig. 3 shows the relative performance indices of the physical machines. Note that the higher the value of the relative performance index, the better. These values are in the range between 0 and 1. This setting helps us to verify that the proposed monitoring scheme identifies a suitable target machine for migration.
Fig. 4 shows the difference in the relative performance index after finding the target migration machine. Note that the results are depicted on a logarithmic scale and confirm that our intelligent residual resource monitoring scheme works well without incurring SLA violations. The average difference in the relative performance index after identifying a suitable target migration machine is 0.000492899.
Fig. 3. Relative performance indices of physical machines.
Fig. 4. Difference in the relative performance indices before versus after finding the target migration machine with the proposed scheme.
Fig. 5 shows how many times each node was selected as a target machine. The selection count ranged from 0 to 2: 256 nodes were never selected, 488 were selected once, and 256 were selected twice. When the count value was greater than or equal to 2, the migration process was managed by the cloud platform and the monitoring information relating to the target physical machine should have been updated. About half of the machines (488) were selected exactly once, and roughly three quarters (744 of 1,000) were selected as migration targets at least once.
Fig. 5. The count used to select the target machine from the available nodes.
We compared migrations with and without our proposed monitoring scheme by measuring the effect of not using the scheme on the relative performance index, as shown in Fig. 6. The maximum and minimum values obtained are 0.988093748 and -0.984641698, respectively. This implies that performance can differ substantially before versus after migration, which results in SLA violations or increased energy consumption.
Fig. 6. Difference in the relative performance indices before versus after finding the target migration machine without the proposed scheme.
Fig. 7 shows the difference in the relative performance indices obtained with versus without the proposed scheme. The average of the difference is 0.337265836 and the standard deviation is 0.240753321. The results of this comparison validate the effectiveness of our intelligent residual resource monitoring scheme.
We investigated how SLA violations occur when the proposed scheme is not used by measuring the number of virtual machines that cause violations, as shown in Fig. 8. We categorized the SLA violations from level 0 to level 5. Level 5 indicates that the performance decreases by more than 50% (level 4 for 40%, and so on). Level 0 includes all post-migration performance degradation. The number of SLA violations without the proposed scheme for levels 5, 4, 3, 2, 1, and 0 was 131, 187, 253, 318, 390, and 485, respectively. The number of level 1 SLA violations was 0 when the proposed scheme was used.
Fig. 7. Difference in the relative performance indices with versus without the proposed scheme.
Fig. 8. Service level agreement violation without the proposed scheme.
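The SLA violation levels can be tallied with a short sketch. It assumes the levels are cumulative thresholds, which is consistent with the reported counts growing from level 5 to level 0: level k (1 to 5) counts virtual machines whose performance drops by more than k × 10%, and level 0 counts any degradation. The function name and the sample drops are hypothetical and are not the experimental data.

```python
def count_sla_violations(drops, max_level=5):
    """Count virtual machines per SLA violation level.

    drops holds each VM's fractional performance decrease after migration
    (0.25 means 25% slower; zero or negative means no degradation).
    Level 0 counts any degradation; level k >= 1 counts drops above k * 10%.
    """
    counts = {}
    for level in range(max_level + 1):
        threshold = level / 10
        counts[level] = sum(1 for d in drops if d > threshold)
    return counts


# Made-up drops for six virtual machines.
drops = [0.0, 0.05, 0.15, 0.45, 0.55, -0.2]
counts = count_sla_violations(drops)
```

Because the thresholds are nested, a virtual machine counted at level 5 is also counted at every lower level.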
6. Conclusion
In this paper, we proposed an intelligent residual resource monitoring scheme for cloud computing environments. The proposed monitoring scheme can effectively find a host machine for migration, so that the post-migration performance of a virtual machine is consistent with its pre-migration performance. By collecting monitoring information from host machines in cloud computing environments, the QoS and SLA can be maintained without violation. We confirmed that the proposed scheme reduces the incidence of post-migration performance degradation by more than 90%. Thus, our monitoring scheme is suitable for SLA-sensitive workloads, including real-time and multimedia applications. In future work, we will conduct an extensive performance evaluation for various scalability settings and investigate the use of artificial intelligence techniques to respond to failures in dynamic environments.
Acknowledgement
This research was supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education (No. NRF-2018R1D1A1B07045838 and NRF-2016R1D1A3B03933370). The author to whom correspondence should be addressed is Joon-Min Gil.