1. Introduction
With the wide application of image acquisition equipment, the number of images has increased rapidly. For large image databases, how to retrieve desirable images efficiently becomes an important issue. Content-based image retrieval (CBIR) is more advantageous than traditional text-based image retrieval in terms of retrieval efficiency and accuracy, especially for large image databases. Therefore, CBIR has received much attention from researchers in the fields of information retrieval and computer vision [1]. In CBIR low-level visual features are automatically extracted to represent images, such as color, texture and shape. As one of the primary visual contents of images, shape has good discrimination power, and is an important cue for indexing images. Moreover, shape feature can represent higher level of the object compared with color or texture [2], and plays an important role in a diverse range of applications.
Since shape information is so important, how to effectively represent and extract shape feature of objects has always been a hot topic in computer vision and image understanding. Many studies focus on shape feature description and matching in computer vision [1-3]. Various shape descriptors have been proposed in these studies and significant progress has been made in shape feature description. Existing shape descriptors can be roughly divided into two categories: contour-based methods and region-based methods [2,3]. In most cases the contour is more important than the inner region of shape, and human beings can distinguish shapes easily based on contour. And contour-based methods are easy to implement, so contourbased shape description and matching techniques are quite popular. Representative methods include curvature scale space (CSS), Fourier descriptor (FD), shape context (SC), inner-distance shape context (IDSC), contour points distribution histogram (CPDH), contour flexibility, and so on [4].
In CSS method the zero-cross points of curvature function defined on shape contour is located at different scales. These zero-cross points form a CSS map, and the maximum of CSS map’ contour is used for shape matching. FD is a classical method which represents two-dimensional (2D) contour using a 1D shape signature. FD is formed using the coefficients after applying Fourier transform on shape signature [5]. However, shape signatures have an important effect on FD’s retrieval performance. SC method uses a 2D histogram to represent the relative spatial distribution of landmark points around contour points. And these histograms are employed for shape matching using statistic measurement. IDSC is an extended version of SC and suitable for complicated shapes. Dynamic programming is used to match IDSC [6]. CPDH describes shapes using the distribution of points on object contour under polar coordinates [7]. Contour flexibility method represents the deformable potential at each point along a contour [4]. Several new shape descriptors make good use of the geometric features of contour points, and achieve good retrieval performances, such as farthest distance function [8], arch-height function [3], triangle-area representation [9], angular function [10] and height function [11]. It is worth noting that retrieval performances can be further improved for some methods by incorporating multiscale framework [12], and multiscale-arch-height descriptor is of the representatives [3].
Angular feature derived from shape contour points has good properties. Angular feature is intrinsically invariant to rotation, translation and scaling, and easy to extend to multiscale description. Hence it is a promising feature for shape matching, and is used in references [1,6,10,13] for shape description and matching. Angular pattern (AP) and binary angular pattern (BAP) are proposed in [10] for shape image retrieval. In this method, AP is calculated for each contour point (the value of AP ranges from 0 to 2π). Then for each point on the contour, denoted by pi, symmetric neighbor points are located, and the APs at these points are compared with that at pi for feature coding. BAP is good at describing convex strands and concave strands of contour by feature coding. In fact, line segment also exists on local shape contour, not just convex strands and concave strands. However, contour point both on line segment and concave strands will be coded with 0 in BAP. Hence, line segment can’t be distinguished from concave strands. As a result, important discrimination information is of no use in BAP. An example is shown in Fig. 1.
The shape contour in Fig. 1 belongs to bone class from MPEG-7 shape database. Contour points pi and pj are on line segment and concave strands, respectively. They play important roles in the description and identification of bone samples. But the 4-bit BAP at these two points both are [0 0 0 0] according to a study of Hu et al. [10]. Thus they cannot be effectively distinguished using BAP. In order to describe shape using angle information effectively, included-angular ternary pattern (IATP) is proposed for shape retrieval in this paper. The main contribution of our work can be summarized as follows.
(1) IATP is proposed to describe contour points well, which can clearly distinguish the difference between the line segments, convex strands and concave strands on shape contours.
(2) IATP is extended in a multiscale framework to form rich descriptors, which can obtain better shape retrieval results.
2. Included-Angular Ternary Pattern
For a given shape, its contour is detected and sampled along anticlockwise with equal interval firstly. The number of sampled points is denoted by N. Hence a shape contour can be represented by an ordered set of points, [TeX:] $$C=\left\{p_{i}=\left(x_{i}, y_{i}\right), i=1,2, \ldots, N\right\},$$ where xi and yi represent the coordinates of point pi. Two definitions are given below:
DEFINITION 1. [TeX:] $$\theta_{i} \text { at } p_{i}$$ is defined as: along contour forward and backward directions, two neighbor points [TeX:] $$p_{i+d} \text { and } p_{i-d}$$ are located respectively on the contour, satisfying the constraint that the number of sampled points between pi and either neighbor point is d. Here d is a positive integer. Points[TeX:] $$p_{i+d} \text { and } p_{i}$$ are connected, and one segment [TeX:] $$p_{i+d} p_{i}$$ is obtained. The same operations can be again applied to [TeX:] $$p_{i-d} \text { and } p_{i}.$$ The two segments form an included-angular [TeX:] $$\theta_{i}$$ at point pi, where [TeX:] $$\theta_{i} \in[0, \pi].$$ The calculation method is as follows:
It can be seen that included-angular feature captures geometrical information of contour points. It is easy to know that included-angular is invariant to rotation, translation. And it is also scale invariant. If the contour is scaled, lengths of [TeX:] $$d_{i l}, d_{i 2} \text { and } d_{i}$$ will changed in the same proportion as the contour, and this has no effect on [TeX:] $$\theta_{i}$$, which can be seen from formula (4).
DEFINITION 2. For each contour C, the IATP at pi is defined as: the included-angular at points [TeX:] $$P_{i+d}$$ and [TeX:] $$p_{i-d}$$ are calculated according to Definition 1, and denoted using [TeX:] $$\theta_{i+d}, \theta_{i-d}$$ respectively. Then [TeX:] $$\theta_{i+d}$$ are compared with [TeX:] $$\theta_{i}$$ for included-angular coding. The coding rules contain three cases for convex strands, line segment and concave strands, respectively. If included-angular at neighbor point is larger than that at pi and the difference exceeds a certain threshold, then it is coded with 1. If included-angular at neighbor point is less than that at pi and the absolute difference exceeds a certain threshold, then it is coded with -1. In other case, it is coded with 0. Take the comparison between [TeX:] $$\theta_{i+d} \text { and } \theta_{i}$$ as an example. The included-angular coding rules for [TeX:] $$\theta_{i+d} \text { at } p_{i}$$ at pi is as follow:
where [TeX:] $$\mathrm{TIA}_{l}$$ is the IATP at [TeX:] $$p_{i} \text { for } p_{i+d,} \text { and } \theta_{\mathrm{th}}$$ is the threshold with a positive value. The IATP at [TeX:] $$p_{i} \text { for } p_{i-d}$$ is calculated in a same way and is denoted by [TeX:] $$\mathrm{TIA}_{r}.$$ And the IATP at pi is coded with [TeX:] $$\left[\mathrm{TIA}_{l} \mathrm{TIA}_{r}\right].$$
Take points pi and pj on the contour in Fig. 1, for example. The IATP at pi for its neighbor points marked with ο is [0 0], while the IATP at pj for its neighbor point marked with ο is [1 1]. IATP values are different at the two points because they belong to line segment or concave strands. Hence IATP is a powerful feature for shape description. The IATP is calculated for each point on contour C and code vector is formed as,
wehere [TeX:] $$\mathrm{TIA}_{l i}$$ is the IATP at pi for neighbor point [TeX:] $$p_{i-d}$$, and [TeX:] $$\mathrm{TIA}_{r i}$$ is the IATP at pi for neighbor point [TeX:] $$P_{i+d}$$. Their values belong to {-1,0,1}.TIA is the IATP for contour C. For each point, the pattern number of IATP is 9, which are [1 1], [1 0], [1 -1], [0 1], [0 0], [0 -1], [-1 1], [-1 0], and [-1 -1]. The pattern is few for large shape dataset because there is no sufficiently discriminative information to be used. So IATP is extended in a multiscale framework to enhance the discriminating power.
From the construction of IATP it can be seen that parameter d has an important impact on the discriminant power of IATP. When the parameter is small, IATP can capture local information for contour points. And IATP can capture globe information if the parameter is large. Hence we extend IATP in a multiscale framework according to parameter , to describe contour from coarse to fine.
We can obtain multi-IATP code with different parameter d. For example, when d is set to [TeX:] $$d_{1}, d_{2}$$ separately [TeX:] $$\left(d_{1}<d_{2}\right),$$ we can find four neighbor points for pi, and 4-bit IATP can be obtained after comparing their included-angular with pi. And the total number of IATP patterns will be 81. Similarly, 6-bit, 8-bit, 10-bit IATP can be constructed, and the total number of IATP pattern will be [TeX:] $$3^{6}, 3^{8}, 3^{10}.$$ So sufficient discrimination information can be provided using these patterns. Because [TeX:] $$3^{8}=6561$$ is much more than the contour point number, IATP will be a vector with high dimension, and not suitable for shape description. Hence, 4-bit and 6-bit IATP are studied here.
To retrieval shape efficiently using IATP, an IATP histogram is extracted for each shape contour. Histogram for 4-bit IATP can be obtained according to the following steps:
Step 1. For each contour, the contour point number N and parameter [TeX:] $$d_{1}, d_{2}$$ are set firstly. Then for each point pi we find four neighbor points [TeX:] $$\left(p_{i-d 2}, p_{i-d l}, p_{i+d l,} p_{i+d 2}\right)$$ and compute their included-angular for comparison. 4-bit IATP at pi is obtained, which is [TeX:] $$\left[\mathrm{TIA}_{\mathrm{i1}}, \mathrm{TIA}_{\mathrm{i2}}, \mathrm{TIA}_{\mathrm{i3}}, \mathrm{TIA}_{\mathrm{i4}}\right].$$
Step 2. Determine the index of the histogram for 4-bit IATP. [TeX:] $$\left[\mathrm{TIA}_{\mathrm{i1}}, \mathrm{TIA}_{\mathrm{i} 2}, \mathrm{TIA}_{\mathrm{i3}}, \mathrm{TIA}_{\mathrm{i4}}\right]$$ belongs to [TeX:] $$k^{\mathrm{th}}$$- bin in histogram [TeX:] $$H(1 \leq k \leq 81),$$ where k can be set as below:
[TeX:] $$\text {Index}_{i}$$ is the index of 4-bit IATP at pi, with a value ranging from -40 to 40. And index of the histogram bin is from 1 to 81, so indexi can be mapped to corresponding histogram bin by Eq. (8).
Step 3. Construct each bin in histogram H. for each [TeX:] $$\text {Indexi}, k^{\text {th }}-\text { bin }$$ in histogram H is uploaded as below:
Example of 4-bit TAP for bone shapes: (c) and (d) are bone shape samples, and their histograms are (a) and (b) separately.
Figs. 2 and 3 show histograms for 4-bit IATP belonging to bone and butterfly classes separately. Shapes in Fig. 2(c) and Fig. 1 are the same one. Horizontal axis shows the index order of histogram bins in Fig. 2(a), 2(b) and Fig. 3(a), 3(b). Vertical axis represents the number of points belonging to the histogram bins.
To retrieval shape images, the distance between each pair of IATP histograms is calculated. Given IATP histogram [TeX:] $$H_{A} \text { and } H_{B}$$ for shape A and B separately, their distance dist(A,B) measured by cosine distance is
where D is the bin number of [TeX:] $$H_{A} \text { and } H_{B},\| \|$$ represents the magnitude of vector. dist(A,B) ranges from 0 to 1.
Example of 4-bit TAP for butterfly shapes: (c) and (d) are butterfly shape samples, and their histograms are (a) and (b) separately.
3. Shape Retrieval Experimental Testing
To evaluate the effectiveness of the proposed IATP, the retrieval tests are conducted on MPEG-7 Set B contour shape database and the Swedish leaf database. The retrieval performance is compared with other methods.
3.1 Shape Databases
There are 70 shape categories with 20 similar shapes for each category in MPEG-7 Set B shape database. Representative images in MPEG-7 Set B are shown in Fig. 4. And it is a commonly used method for testing of similarity-based retrieval. The Swedish leaf database consists of isolated leaves from 15 different Swedish tree categories, with 75 leaves per category. This database is known for high inter-class similarity and large intra-class difference. Samples from the 15 leaf categories are shown in Fig. 5. In our retrieval experiments, the number of the sampled points for each shape contour is 200.
Representative images in MPEG-7 Set B.
Representative images in Swedish leaf.
3.2 Evaluation Measure
To evaluate the retrieval performance of the proposed IATP, the retrieval results are measured using precision and recall curve. Given a query shape, the total number of retrieved shape images is [TeX:] $$m_{1}.$$ And the number of retrieved shapes which belong to the same class with query shape is c. The number of shapes in database which belong to the same class with query shape is [TeX:] $$m_{2}.$$ The recall and precision for this query shape are calculated as
[TeX:] $$m_{2}$$ is 20 for MPEG-7 Set B and 75 for Swedish leaf database. Each shape in the database is taken as a query, and for each database the final precision is the average of all the query results at each level of recall.
The proposed method is compared with other notable commonly used or recently proposed shape descriptors on both of databases. Compared descriptors include farthest point distance (FPD), angular radius function (ARF), BAP, and sequency-ordered complex Hadamard transform based descriptor (SCHD) [14].
FPD is used for comparison here, because it has better retrieval performance than Zernike method and CSS on MPEG-7 database. ARF is chosen because both our method and ARF are based on angular information. BAP is recently proposed in [10] and our method is inspired by BAP. In shape matching stage the sequential forward selection (SFS) scheme is used for BAP, and achieves better retrieval performance than IDSC and CSS. To facilitate performance comparison, same scale parameter was used for both BAP and IATP.
3.3 Results and Discussion
As indicated in Section 2, parameter d has a crucial effect on the performance of IATP histogram. IATP can represent local information of contour with a small d, while capture global feature of contour with a large d. we test the proposed method with different parameter d. In consideration of that a good shape descriptor should have characteristics of compactness and that IATP is extended in multiscale framework, shape retrieval experiments are conducted firstly using IATP with d=1–16 on MPEG-7 database. The average precision-recall on MPEG-7 database are showed in Table 1.
Precision (%) for IATP with d=1–16
As can be seen in Table 1, IATP with d=5 achieves better retrieval performance in most cases than other situations. However, this performance is not good enough because there are no enough patterns to distinguish so many shape samples. Hence IATP in multiscale framework are tested next.
As indicated in Section 2, in multiscale framework the dimensions of 8-bit IATP are too high to be suitable for describing shape features. Hence only 4-bit and 6-bit IATP are tested. Each shape in MPEG- 7 database is taken as a query, and parameter d is set to 5 as indicated in Table 1. The average precisionrecall of multiscale IATP on MPEG-7 database are showed in Table 2.
It can be seen that the performance of multiscale IATP is much better than that of single-scale IATP by comparing Table 2 and Table 1, and 6-bit IATP has better performance than 4-bit.
Precision (%) for multiscale IATP
Besides, performance of 6-bit IATP is not desirable enough for shape retrieval, because IATP just captures the relations of size between the adjacent included-angular, and discards their more precise values. To make effective use of included-angle for better retrieval performance, included-angular histogram are constructed and integrated with multiscale IATP together. Included-angular histogram has 24 bins considering Included-angular ranges [0, π], and the construction is same as IATP histogram. Fig. 6 shows the precision-recall plots of proposed method and the compared methods on the MPEG-7 database. And Fig. 6 shows the precision-recall plots of proposed method and the compared methods on the Swedish leaf database. The precision-recall information of SCHD in Fig. 6 comes from a study of Wang et al. [14]. And precision-recall information of FPD and ARF come from a study of Xu et al. [15].
Fig. 7 demonstrates that among the competing methods, the proposed method has slightly better performance than BAP, FPD, and SCHD on the MPEG-7 Set B database, and they all exceed ARF. Fig. 7 shows that the proposed method reaches the highest precision with the same recall value compared with the others on the Swedish leaf database. These retrieval results demonstrate that integrated IATP can capture shape features well and shows good performance.
Precision-recall comparisons on MPEG-7 database.
Precision-recall comparisons on Swedish leaf database.
4. Conclusion
This paper proposed IATP based shape descriptor for efficient image retrieval. Shape image retrieval experiments were conducted to test key parameters of IATP on MPEG-7 Set B database and Swedish leaf database. Good shape retrieval performance was achieved by multiscale IATP integrated includedangular histogram.
In the future, we are interested in further improving the performance of the proposed shape descriptor. Since the proposed method works with angular feature derived from shape contour points, it is also suitable for describing partially observed object shapes. It is possible to match shapes in part to part way with IATP. Moreover, the multiscale method can be optimized using feature selection algorithms or granularity computing to obtain higher shape retrieval performance.
Also, the proposed shape descriptor can be implemented in several applications. For example, plant leaf recognition is very important to agricultural information and automatic plant recognition systems, and leaf retrieval experiment on the Swedish Leaf database proved the potential performance of the proposed shape descriptor method for plant recognition. In the next, it is possible to implement shapebased plant leaf recognition.
Acknowledgement
This paper is supported partially by the National Natural Science Foundation of China (No. 61503005), the National Key Research and Development Program (No. 2017YFB0802305), and Universities Key Scientific Research Project of Henan (No. 19A520029, 17A520046, and 19B520017).