Tomoya Kawakami*A Structured Overlay Network Scheme Based on Multiple Different Time IntervalsAbstract: This paper describes a structured overlay network scheme based on multiple different time intervals. Many types of data (e.g., sensor data) can be requested at specific time intervals that depend on the user and the system. These queries are referred to as "interval queries." A method for constructing an overlay network that efficiently processes interval queries based on multiple different time intervals is proposed herein. The proposed method assumes a ring topology and assigns nodes to a keyspace based on one-dimensional time information. To reduce the number of forwarded messages for queries, each node constructs shortcut links for each interval that users tend to request. This study confirmed that the proposed method reduces the number of messages needed to process interval queries. The contributions of this study include the clarification of interval queries with specific time intervals; establishment of a structured overlay network scheme based on multiple different time intervals; and experimental verification of the scheme in terms of communication load, delay, and maintenance cost. Keywords: Communication Load Reduction , Distributed Data Management , Interval Query , Ring-Shaped Overlay Network , Sensor Data , Temporal Data 1. IntroductionThe number of Internet of Things (IoT) devices is growing rapidly [1,2], leading to an ever larger volume of published data of great variety. One example is sensor information consisting of temperature, power consumption, and camera images. Sensor information contains temporal and spatial data, including geographical positions, areas, and times, which users and systems can search. To accommodate the many devices (nodes), data, and users, a highly scalable mechanism is necessary to search the available data efficiently. Various techniques have been proposed to realize high scalability of overlay networks, such as distributed hash tables (DHTs), skip graphs, and geographical-based data [3-20]. Those techniques construct logical networks between nodes based on the specific information of each node (e.g., a one-dimensional ID or a multi-dimensional attribute). Nodes are discovered by sending a query, referred to as a “key”, with a specific value or range. Each node sends the queries to the destination node according to its routing table. In addition, current techniques assign territories to the nodes for distributed management and searches of sensor data. Users are likely to request sensor data with specific time intervals as “I want to know the data for each one month from the specific day” or “I want to show the data on the same day of other weeks”. However, current techniques cannot efficiently process queries with multiple different time intervals, which are expected to have several patterns simultaneously. Fig. 1 shows an example of such queries for sensor data collection. These queries are referred to as “interval queries” herein. In a previous study, we proposed a method for constructing overlay networks that can efficiently process queries based on multiple different time intervals [21]. This paper presents a method that assumes a ring topology and assigns each node to a keyspace based on one-dimensional time information. In addition, to reduce the number of forwarded messages for queries, each node constructs shortcut links for each interval that users are likely to request. This paper presents an extension of the earlier work and provides new experimental results. The contributions of this study are as follows: clarification of interval queries with specified time intervals; establishment of a structured overlay network scheme based on multiple different time intervals; and experimental verification in terms of communication load, delay, and maintenance cost. The extant technique is essentially based on a Chord network and is described in Section 2. The proposed method is described in Section 3, and its evaluation is summarized in Section 4. The related work is described in Section 5, and the conclusions presented in Section 6. 2. ChordChord [3] is a typical protocol and algorithm for ring-shaped DHTs for which expanded techniques have been studied [4-6]. Chord assigns each node to a specific position on a one-dimensional keyspace based on hashed information such as its ID or IP address. The node that follows a given node in the keyspace is called a “successor”, and the previous node is called a “predecessor”. Each node constructs links to its successor and predecessor when it is inserted into the overlay network. In addition, each node manages a partial space from the predecessor in its position as a territory for data management by Chord. Chord directs each node to construct shortcut links in what is called a “finger table” to reduce the number of hops for node search in the overlay network. The finger table consists of links to the nodes that have [TeX:] $$2^{1}, 2^{2}, 2^{3}, \ldots, 2^{m-1}$$-bit additional keys from each node where m is the bit length of the keyspace. The link to an additional [TeX:] $$2^{0}$$-bit key corresponds to the successor. Fig. 2 shows an example of a finger table when the length of the keyspace is [TeX:] $$2^{6}$$ bits (values from 0 to 63). The numbers in the circles in Fig. 2 refer to the nodes with their keys, and the arrows show the shortcut links of the node assigned to key 1. The upper right table shows part of the linked nodes for each key. In Chord, the finger table enables [TeX:] $$\underline{O}(\log N)$$ hops for node searching using a specified key while keeping the size of the finger table under m + 1 (N denotes the number of nodes). In addition, Chord can be applied to distributed data management, and its expanded techniques have been proposed for multi-dimensional information [4]. Users and systems are expected to request sensor data with specific time intervals such as “I want to know the data for each one month from the specific day” or “I want to show the data on the same day of other weeks”. In addition, the specified time intervals are expected to have several patterns at the same time. However, existing techniques cannot efficiently process queries with multiple different time intervals. 3. Proposed MethodIn this paper, an overlay network construction method that can efficiently process queries based on multiple different time intervals such as a year, a month, a week, a day, or the time is proposed. An overview and details of the proposed method are in the following section. 3.1 OverviewThe proposed method assumes a ring topology similar to that of Chord and assigns nodes to the keyspace based on one-dimensional time information. The one-dimensional information is a bit value that represents the unit of the query (e.g., a year, a month, a week, a day, or the time). In addition, the one-dimensional information has a length as a keyspace similar to coordinated universal time (UTC) or UNIX time. In a Chord finger table, each node constructs shortcut links to the nodes assigned to [TeX:] $$2^{i}$$ bits from its own key ([TeX:] $$i=1,2, \ldots, m-1$$). m denotes the bit length of the keyspace. In contrast, the proposed method has each node construct shortcut links with the specific intervals that users tend to request. In this case, each node constructs shortcut links to the nodes assigned to one year, one month, one week from its own key, and so on. Moreover, each node constructs shortcut links to reduce the hop number (e.g., one month, three months, six months, and 12 months from its own key); the year, month, week, day, and hour were considered in this study. Table 1 shows the types, keys, and number of the constructed shortcut links. The smallest unit in Table 1 is 1 hour, and the bit length of the keyspace is denoted by m. The minimum unit can be set to other elements such as a minute or a second. Fig. 3 shows the assumed flow for forward interval queries. Although Fig. 3 is simplified and does not show the ring topology, the interval queries are recursively forwarded to the related nodes via the shortcut links. If a node that receives the query has matched data, a reply is sent by the node.
3.2 Construction of the Shortcut linksFig. 4 shows an example of the constructed shortcut links described in Section 3.1. In Fig. 4, the minimum unit is a day, started from January 1, 2001, and the length of the keyspace is 64 days (from January 1 to March 5). The numbers in the circles show the nodes with their keys (days), and the arrows show the shortcut links of the node assigned to key 1. The upper right table shows part of the linked nodes for each key. The node assigned to key 1 constructs the shortcut links with their intervals: Month: 1 (31 days) Day: 15, 7 (one week) Neighbors: successor, predecessor Unlike the finger table used in Chord, shortcut links are constructed in the proposed method to the nodes assigned to the keys for the next month and week. This method generates few messages and hops to forward queries such as “I want to know the data for each month from the specific day” or “I want to show the data on the same day of other weeks”. In addition, Fig. 4 shows four shortcut links, which is not significantly different than the Chord technique, which constructs m – 1 shortcut links in the worst case. 4. EvaluationIn this study, a simulator was used to evaluate the proposed method. It was implemented using the Java programming language. Table 2 shows the simulation parameters, the details of which are described as follows. Table 2.
4.1 Simulation EnvironmentThe smallest unit of time in the keyspace in the simulation is 1 hour, and the length of the keyspace is 16 bits (the total is [TeX:] $$2^{16}$$, or 65,536 hours). There are 1000 to 4000 nodes; each node is assigned to a key at random and constructs shortcut links to other nodes. The proposed method was compared to the Chord-like shortcut links (exponential time). Only one query is sent; however, it contains several keys and is forwarded to the assigned nodes. There are six key patterns for the query. The first pattern is 12 keys from a random point at intervals of 720 hours (the total is 8,640 hours, or 360 days). The second pattern is also 12 keys; however, they are determined at random. The third pattern is 52 keys from a random point while those intervals are 168 hours (the total is 8,736 hours, or 52 weeks). The last pattern is 52 keys from a random point, but each key is determined at random. The fifth pattern is 120 keys from a random point while those intervals are 72 hours (the total is 8,736 hours, or 52 weeks). The last pattern is 120 keys from a random point, but each key is determined at random. The simulation was run 20 times under each environment and calculated the average number of messages sent to the assigned nodes as communication loads in a distributed system. 4.2 Results of Query ForwardingFig. 5 shows the number of messages sent to the assigned nodes in the first and second query patterns for 12 keys. The x-axis represents the number of nodes. “Exponential (Random query)” and “Proposed (Random query)” refer to the Chord-like comparison method and the proposed method with random intervals, respectively. In addition, “Exponential (Interval query)” and “Proposed (Interval query)” mean the Chord-like comparison method and the proposed method with a specified interval, respectively. Fig. 5 shows that the proposed method achieved the lowest number of messages when the 12 keys of the query had an interval of 720 hours. However, the proposed method had more messages than the comparison method when the 12 keys of the query were determined at random. In this environment, the comparison method for the interval query had the same result as the proposed method for a random query. In addition, Fig. 6 shows the maximum number of hops to the assigned nodes in the first and second query patterns for 12 keys. Although the proposed method had the worst result when the 12 keys of the query were determined at random, the method also had the lowest hop count when the 12 keys of the query had a specific interval. Figs. 7 and 8 show the number of messages and the maximum number of hops in each query pattern for 52 keys, respectively. In addition, in Fig. 7, the proposed method has the lowest number of messages even if the number of keys in the query is increased to 52. In comparison with the results in Fig. 5, the comparison method for the interval query returns a different result from the proposed method for a random query. Fig. 8 shows that the method had the lowest hop count when the keys of the query have a specific interval, although the proposed method is worse when the keys of the query are determined at random. Figs. 9 and 10 show the number of messages and the maximum number of hops in each query pattern for 120 keys, respectively. Although those curves by the number of nodes are different, they are placed in the same order. These results indicate that the proposed method reduces the number of messages needed to process queries related to the specific time intervals that users tend to request. 4.3 Results of Entries in Routing TableFig. 11 shows the average number of entries in each node’s routing table, and the x-axis represents the number of nodes. The two lines in Fig. 11 show the number of entries by registered keys; however, the other two lines show the number of unique destinations in routing table entries. The number of registered keys is affected by the length of the keyspace but not by the number of nodes. In this simulation, there were 19 registered keys in the comparison method and 16 in the proposed method. In addition, the number of unique destinations in the proposed method is less than that in the comparison method. This result indicates that the proposed method can reduce the maintenance costs to keep the routing table updated for nodes’ join/leave. 4. Related WorkExcept for Chord or DHTs, many overlay network techniques have been researched [6-20]. Skip graphs are overlay networks in which a skip list is applied in the P2P model [9-14]. The nodes in skip graphs are sorted in ascending order of those keys, and bidirectional links are created among the nodes. The “membership vector”, is assigned to each node when the peer joins and is used to create hierarchical (multi-level) links among the nodes. Skip graphs can process range queries that specify the beginning and end of the keys to be searched, and the queries are forwarded to the node whose key is within that range, or less than the end of the range. The number of hops in the key search is represented by [TeX:] $$\underline{\underline{O}(\log n)}$$ where n is the number of nodes. The average number of links on each peer is represented by [TeX:] $$\log n$$. As one of the expanded techniques of the original skip graph, the Ballistic Skip Graph reduces the degree of each node to [TeX:] $$\underline{\underline{O}(1)}$$ by limiting the number of shortcut links to a single level at random [14]. In addition, both one-dimensional and multi-dimensional range queries have been researched in many overlay network techniques [15-19]. Those techniques can be applied to the management of location/area-based data on 2D and 3D maps, some of which are “geographical overlay networks”. Many of the geographical overlay networks are based on geometrical techniques such as the R-tree, quadtree, and Delaunay triangulation (Voronoi diagram). However, these techniques do not assume “interval queries” and cannot forward the queries as efficiently as the proposed method. To enhance the reliability and feasibility of overlay networks, related techniques have been researched [22-32]. First, a variety of replication schemes, such as path replication, have been proposed for unstructured P2P networks [22], where nodes search for content by forwarding queries to publishing nodes via neighboring links. The path replication schemes replicate the replied content on the nodes between the publishing and requesting nodes. Related to path replication schemes are the many methods based on specific factors such as the number of queries, probability to put replicas, and churn situations [23-25]. In addition, replication schemes have also been proposed for structured P2P networks to increase the efficiency of replica maintenance and searches. One example is Scalaris, an Erlang implementation of a distributed key/value store [26]; it uses replication for data availability and majority-based distributed transactions for data consistency. Plover is a proactive low-overhead file replication scheme with replication among physically proximate nodes based on their available capacities [27]. Here, physically proximate nodes are grouped in clusters, each of which has a supernode with high capacity and rapid connections. In RelaxDHT, nodes are divided into data blocks [28], with each block having a root node that manages the metadata of replicas on other nodes in their own data blocks. Although these techniques do not assume interval queries, the proposed method can adapt the same or similar ideas, such as node/data replication, to enhance its reliability. 5. ConclusionThis paper describes a structured overlay network scheme based on multiple different time intervals. The overlay network construction method assumes a ring topology and assigns nodes to a keyspace based on one-dimensional time information. In addition, to reduce the number of forwarded messages for the queries, each node constructs shortcut links with the specific intervals that users tend to request. The proposed method was confirmed to reduce the number of messages needed to process queries related to time intervals. In future studies, virtual node techniques will be employed to reduce the load differences among the nodes because the amount of the registered or requested data differs with the keys that represent time. In addition, node/data replication techniques will be used to enhance the reliability and feasibility of the proposed scheme mentioned in Section 4. Subsequently, the proposed scheme will be implemented for the P2P agent platform, PIAX [33], and testbed systems to evaluate the scheme for practical applications. BiographyTomoya Kawakamihttps://orcid.org/0000-0002-5394-3067He received his B.E. degree from Kinki University in 2005 and his M.I. and Ph.D. degrees from Osaka University in 2007 and 2013, respectively. From 2007 to March 2013 and from July 2014 to March 2015, he was a specially appointed researcher at Osaka University. From April 2013 to June 2014, he was a Ph.D. researcher at Kobe University. From April 2015 to February 2020, he was an assistant professor at Nara Institute of Science and Technology. Since March 2020, he has been a senior assistant professor at the University of Fukui. His research interests include distributed computing, rule-based systems, and stream data processing. He is a member of the Information Processing Society of Japan (IPSJ) and IEEE. References
|