1. Introduction
The accumulation of multimedia information, and specifically of images, on web servers, due to large-scale uploading from different sources and for different purposes, has made the development of efficient algorithms for exploring and searching images an important and mandatory task. The first generation of such techniques associated a set of descriptive words with each image. Accordingly, searching for images similar to a query in this paradigm is guided by textual information (keywords). This type of search is referred to by the acronym TBIR (text-based image retrieval). Although this technique is widely used by existing search engines, it suffers from a set of disadvantages. Firstly, annotation, which is mostly performed manually on large databases, is a hard and time-consuming task. Secondly, it is largely admitted that manual annotation is subjective and relative to the human perception of image content, as well as to human feelings, culture, and so on. To overcome these limitations, a new image search paradigm based on the visual content of the images was proposed. This technique is designated in the literature by the acronym CBIR (content-based image retrieval), and it has been the subject of extensive research activity since the 1990s [1]. It mainly consists in searching for pertinent images based on visual attributes extracted from the images, essentially color, texture, and shape. The color attribute has been exploited successfully since the early years of CBIR. For instance, it was used by Swain and Ballard [2] through color histograms for indexing large image databases. Despite their simplicity and robustness against affine transformations (such as rotation, translation, and scaling), histograms are a coarse characterization of the image and lead to a loss of location information. For the sake of higher robustness, Stricker and Orengo [3] proposed cumulative color histograms. The use of the major features of the color distributions is another approach proposed in [3] to increase the robustness of retrieval. Pass and Zabih [4] proposed the color coherence vector (CCV), a color histogram refinement that partially incorporates local information about colors.
From another perspective, texture is a ubiquitous property of objects and images. Indeed, many objects can be distinguished mainly by their texture. However, texture features are generally hard to extract and exploit. There is even a difficulty in coming up with a formal and precise definition of texture [5]. A modern and distinguished texture feature extraction method, introduced by Ojala et al. [6], is the local binary pattern (LBP). Due to its strong ties with our approach, the LBP will be described briefly in the next section. The combination of features from several attributes is an alternative followed by some authors for boosting the effectiveness of their retrieval systems. For instance, Murala et al. [7] combined color and texture features extracted by a histogram and Gabor wavelets, respectively. Instead of color histograms, color moments were combined with Gabor wavelets by Singh and Hemachandran [8]. In [9], the authors combined the same two properties (color and texture), but used the Haar wavelet for extracting texture features. In [10] and [11], the authors combined all three properties. Another alternative for boosting the effectiveness of retrieval systems is the use of machine learning techniques. For instance, Wan et al. [12] tackled the problem of CBIR with deep learning techniques and obtained encouraging results. In the same context, Xu et al. [13] investigated different deep learning methods for image retrieval with region and global features, and concluded that the former are more effective.
Another interesting descriptor, which has proved its effectiveness in indexing large databases, is the scale invariant feature transform (SIFT) [14]. This descriptor is calculated within a square grid (neighborhood) centered at each point of interest, where the points of interest are detected through a staged filtering approach. To handle the location information lacking in the SIFT descriptor, Mikolajczyk and Schmid [15] proposed the gradient location-orientation histogram (GLOH), which uses a log-polar grid. Square, log-polar, and other grid forms were compared by Winder and Brown [16]. Among the tested grid forms, one outperformed the others and bears some resemblance to our framework; however, it uses circular shapes. In fact, a deeper look reveals many differences between that framework and ours. For instance, in that paper, the forms are centered at each interest point, whereas our proposed framework is centered in the middle of the image. Furthermore, we establish a histogram of LBP codes, whereas Winder and Brown [16] are interested in the gradient around the interest points. Even the design itself is different. The Daisy method [17] is another technique that uses a framework resembling ours, but here also there are many significant differences. For instance, and as mentioned by its authors, the Daisy method is designed for efficient dense computation (i.e., at each pixel location), which is used for dense wide-baseline matching. In contrast, our method centers the framework in the middle of the image and establishes an LBP histogram for each elliptic region, as will be explained below.
The technique proposed in this paper addresses the limitations engendered by global histograms of the LBP. It uses a distinctive design, inspired by the Gabor filter bank, to determine regions of elliptic form, organized in a circular-concentric manner of increasing size. Compared to circle-shaped forms, the elliptic ones are more flexible in the sense that they allow directionality to be targeted, since among the various ellipses some possess the same direction as a specific texture. The aim is to determine one local histogram for each so-selected region. The method thereby inherits several advantages. Firstly, it is able to capture local features at both high and low locality (within regions). Secondly, the gradual arrangement of these regions and the selective manner of matching give the method the ability to deal with spatial information (specifically rotation). Thirdly, the manner of generating the regions of the same class (scale) by rotation gives the method the ability to take oriented textured regions into account. Indeed, the experiments demonstrate the effectiveness of the proposed method and its superiority over some highly competitive published methods.
The main contributions of this paper are then as follows:
1. The current work is a substantial maturation of the work published in [18].
2. An enhanced design, which yields a novel, more effective, and more efficient version (a technical comparison is given in Subsection 3.2).
3. An extensive comparison with additional significant published works. In particular, GLIBP is compared to the recent and very effective DLEP method of Murala et al. [19].
The remainder of this paper is organized as follows. In Section 2, we overview related works, including the original LBP. In Section 3, we describe the proposal: gradual locality integration of binary patterns (GLIBP). The experiments and results are reported and discussed in Section 4. Finally, the work is concluded in Section 5.
2. Related Works
2.1 Texture-Based Approaches for CBIR
Existing texture methods can be categorized according to the domain in which they were developed: the spatial domain, the frequency domain, or the spatial-frequency domain. For instance, the co-occurrence matrices of Haralick et al. [20] constitute one of the earliest, yet most significant, methods operating in the spatial domain. Each computed matrix represents the frequency of occurrence of pairs of gray levels at a specified direction and distance. Although this method was introduced in the 1970s, it is still considered one of the most important methods for local information integration. On the other hand, some methods operate in the frequency domain, such as the Fourier transform [21,22] and the discrete cosine transform (DCT) [23]. The spatial and frequency approaches each lack the power of the other: frequency methods lack spatial information, and spatial approaches lack frequency information. Wavelets are the basis of another class of methods, which operate in the spatial-frequency domain. Many works in the literature have exploited the power of these methods and investigated the effectiveness of different wavelet types, e.g., the Haar wavelet [24], the Daubechies wavelet [25], and the Gabor wavelet [26].
2.2 Local Binary Patterns
The LBP is a very simple, yet very effective, method for texture characterization. This statistical method was originally introduced by Ojala et al. [6]. Later, it was extended by Ojala et al. [27] to yield the rotation-invariant LBP and other derivatives. The basic version of the LBP uses a neighborhood of 8 pixels (P=8) in a 3×3 block to capture texture information. The central pixel is used as a threshold value (Fig. 1(a)). The values of neighbors less than that of the central pixel are set to 0, whereas those greater than or equal to it are set to 1 (Fig. 1(b)). The LBP code of the central pixel is then the sum of the thresholded neighbor values weighted by powers of 2, as illustrated in Fig. 1(c)–(d).
Original LBP: an illustration of the texture information coding at the central pixel (153).
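For concreteness, the following is a minimal Python sketch of this basic 3×3 operator. The neighbor ordering (i.e., which neighbor receives which power-of-2 weight) is a convention chosen here for illustration; any fixed ordering yields a valid LBP code.

```python
import numpy as np

def lbp_basic(img):
    """Basic 3x3 LBP sketch: img is a 2D grayscale array.
    Border pixels are skipped for simplicity."""
    # Offsets of the 8 neighbors; bit k gets weight 2**k.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    h, w = img.shape
    out = np.zeros((h, w), dtype=np.uint8)
    for i in range(1, h - 1):
        for j in range(1, w - 1):
            center = img[i, j]
            code = 0
            for k, (di, dj) in enumerate(offsets):
                # Neighbors >= center are encoded as 1, others as 0
                if img[i + di, j + dj] >= center:
                    code |= 1 << k
            out[i, j] = code
    return out
```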
2.3 Extensions and Variants of LBP
The LBP operator has seen successful application in many fields, for which many enhanced versions have been proposed. These methods include the improved LBP (ILBP) [28] for face detection. In this method, in addition to the 8 neighbors (3×3 patch) considered by LBP, the central pixel is also considered and is given the largest weight; the mean of all pixels (9 pixels for a 3×3 patch) is taken as the threshold. Local ternary patterns (LTP) [29] is another variant, which associates with each patch a 3-valued code (-1, 0, 1) instead of the binary code of LBP (0, 1). To this end, values within pc±t (t is some threshold and pc is the central pixel value) are quantized to 0, values above this interval are quantized to 1, and values below it are quantized to -1. Another variant is the local derivative patterns (LDP), applied to face recognition [30] and to content-based image retrieval [31]. In contrast to LBP, which captures only the non-directional first-order local pattern, LDP attempts to capture higher-order derivative information. Recently, the local spatial binary patterns (LSBP) were proposed by Xia et al. [32] for content-based image retrieval. This method encodes the relationship between the referenced pixel and its neighbors using gray-level variation patterns (GVP) for each of the four directions considered. After that, the spatial information between these variations and those of the 4 neighbors is calculated using the LSBP. Additionally, the magnitudes of these variations are calculated using the mean and the standard deviation.
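As a minimal illustration of the three-valued LTP quantization just described (this sketches only the quantization step, not the full LTP descriptor):

```python
def ltp_quantize(neighbor, center, t):
    """LTP quantization: values within center±t map to 0,
    values above to +1, values below to -1 (t is a user-chosen
    threshold)."""
    if neighbor > center + t:
        return 1
    if neighbor < center - t:
        return -1
    return 0
```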
2.4 LBP and Local Histograms
Generally, works that use the LBP or one of its derivatives or enhanced versions have used global histograms as descriptors. However, global histograms lack two types of information: local information and spatial information. One alternative to address these two shortcomings is to use segmentation to delimit regions; however, segmentation is itself a problematic and difficult task. Another alternative, followed by some authors, is to divide the image into blocks (sub-images). To the best of our knowledge, the work of Takala et al. [33] is the only one that handles the texture features extracted by the LBP with local histograms. In their interesting work, Takala et al. [33] match image blocks of a given size in order to integrate spatial information.
In the following, we present an enhanced version of LBP, called GLIBP (gradual locality integration of binary patterns). GLIBP is a novel attempt to capture as much local information as possible, in a gradual fashion. Indeed, GLIBP aggregates the texture features present in grayscale images, extracted by LBP, through a complex structure comprising a multitude of ellipse-shaped regions arranged in circular-concentric forms of increasing size. This complex framework of ellipses is in fact derived through a simple parameterized generator.
Because GLIBP calculates histograms within regions delimited by ellipses, it exploits local information (at high and low locality). Moreover, because comparison is allowed only between histograms calculated within regions delimited by ellipses of the same scale, it also exploits some spatial information.
3. Gradual Locality Integration of Binary Patterns
Let I(l,m) be an LBP map. The generation of the series of ellipses used in GLIBP is performed as follows. A pixel p(i,j), i=1..l, j=1..m, belongs to the region delimited by an (axis-aligned) ellipse e of center c(c_x, c_y) if it satisfies the following inequality:

$$\frac{(j - c_x)^2}{a^2} + \frac{(i - c_y)^2}{b^2} \leq 1 \qquad (1)$$

where a and b are the semi-major and semi-minor axes of the ellipse e, respectively.
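As an illustration, a minimal membership test implementing Eq. (1) for an axis-aligned ellipse (rotated ellipses are handled in Subsection 3.1) could look like the following sketch:

```python
def in_ellipse(i, j, cx, cy, a, b):
    """Membership test of Eq. (1): pixel p(i, j) belongs to the region
    delimited by the axis-aligned ellipse of center (cx, cy), with
    semi-major axis a (along columns j) and semi-minor axis b."""
    return (j - cx) ** 2 / a ** 2 + (i - cy) ** 2 / b ** 2 <= 1.0
```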
Ellipses positioning (S=3, N=7).
Let e_{s,n} be an ellipse of scale s, where s = 0, ..., S-1, and orientation n, where n = 0, ..., 2N-1.
The internal ellipses e_{0,n} (Fig. 2) are generated by rotating the ellipse e_{0,0}, first with respect to the image center and then with respect to the ellipse center. The external ellipses e_{s,n} (s>0) are created by dilation of the ellipses e_{0,n}. For each region, one normalized histogram is calculated, as shown in Fig. 3. For comparison, each normalized histogram of a region delimited by an ellipse e'_{s,n} of the query image is compared with all histograms of regions delimited by ellipses of the same scale s, e''_{s,j}, of the target image, where j = 0, ..., 2N-1. Only the minimum distance is kept. The final distance between the query image Q and a database image B is the sum of these minimum distances. The method is further explained in the following.
Illustration of elliptic regions on LBP grayscale image (S=3, N=7). (a) Original image and (b) elliptic regions on LBP image.
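For illustration, the descriptor construction can be sketched as follows, assuming the boolean region masks have already been generated (e.g., via the membership test of Eq. (1) combined with the rotations described in Subsection 3.1 below); the 256 bins correspond to basic LBP codes:

```python
import numpy as np

def region_histograms(lbp_map, regions, bins=256):
    """One normalized LBP histogram per elliptic region (a sketch).

    lbp_map: 2D array of LBP codes; regions[s][n] is a boolean mask
    of the pixels belonging to ellipse e_{s,n}.
    Returns an array of shape (S, 2N, bins)."""
    S, twoN = len(regions), len(regions[0])
    H = np.zeros((S, twoN, bins))
    for s in range(S):
        for n in range(twoN):
            values = lbp_map[regions[s][n]]
            hist, _ = np.histogram(values, bins=bins, range=(0, bins))
            H[s, n] = hist / max(values.size, 1)  # normalized histogram
    return H
```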
3.1 Ellipses Rotation
To calculate the internal ellipses e_{0,n}, with parameters a_0 and b_0, we proceed as follows. Firstly, we rotate the center c_0(c_{x0}, c_{y0}) of e_{0,0} with respect to the image center (i_0, j_0), using the two equations below. Secondly, we determine the pixels belonging to the ellipse with this rotated center using Eq. (1). Finally, the ellipse's pixels are rotated with respect to its center.

$$c'_x = i_0 + (c_{x0} - i_0)\cos\Theta - (c_{y0} - j_0)\sin\Theta \qquad (2)$$

$$c'_y = j_0 + (c_{x0} - i_0)\sin\Theta + (c_{y0} - j_0)\cos\Theta \qquad (3)$$

In these equations, Θ represents the ellipse orientation.
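The following sketch illustrates this rotation step. The assumption that the 2N orientations are evenly spaced over the full circle (Θ_n = nπ/N) is ours, made for illustration:

```python
import math

def rotate_point(x, y, cx, cy, theta):
    """Rotate point (x, y) by angle theta about center (cx, cy),
    as in Eqs. (2)-(3)."""
    dx, dy = x - cx, y - cy
    return (cx + dx * math.cos(theta) - dy * math.sin(theta),
            cy + dx * math.sin(theta) + dy * math.cos(theta))

def ellipse_centers(c0, image_center, n_orient):
    """Centers of the 2N internal ellipses e_{0,n}, obtained by
    rotating the center of e_{0,0} about the image center.
    Assumes 2N evenly spaced orientations (theta_n = n*pi/N)."""
    cx0, cy0 = c0
    i0, j0 = image_center
    return [rotate_point(cx0, cy0, i0, j0, n * math.pi / n_orient)
            for n in range(2 * n_orient)]
```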
3.2 Scale and Ellipses Chaining
To generate the ellipses of the other scales, which have bigger sizes, we must first calculate their parameters a_s and b_s, where s>0. For this, we used the following equations:

$$a_s = U^{s}\, a_0 \qquad (4)$$

$$b_s = U^{s}\, b_0 \qquad (5)$$

where U is an empirical dilation factor.
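A minimal sketch of this dilation step, assuming the per-scale form of Eqs. (4) and (5):

```python
def scale_params(a0, b0, U, S):
    """Semi-axes (a_s, b_s) for each scale s = 0..S-1, assuming each
    scale dilates the base ellipse by the empirical factor U."""
    return [(a0 * U ** s, b0 * U ** s) for s in range(S)]
```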
Fig. 4 shows the chaining of the ellipses. The center c_{s+1,0} of an ellipse e_{s+1,0} is calculated from the center of e_{s,0}, as given by Eq. (8).
The proposed method presents the following differences with respect to ElLBP [18]:
1. The chaining of ellipses in GLIBP is done by their centers rather than by the linear eccentricity as in ElLBP (i.e., greater simplicity). Therefore, Eq. (8) in this paper replaces the fifth equation of ElLBP.
2. In ElLBP, plain histograms are established, whereas in the current work we use normalized histograms. This provides more stability against rounding.
3. The internal regions are bigger than those of ElLBP, and thus better capture the texture of local regions.
3.3 The Comparison Phase
The distance D between two images, a query Q and a target image B, is calculated as follows:

$$D(Q, B) = \sum_{s=0}^{S-1} \sum_{n=0}^{2N-1} \min_{k=0,\dots,2N-1} d\left(H^{q}_{s,n}, H^{b}_{s,k}\right) \qquad (10)$$

where H^q_{s,n} (respectively, H^b_{s,k}) is the normalized histogram of the region delimited by the ellipse e_{s,n} (respectively, e_{s,k}) in image Q (respectively, B), and d is a distance metric. For this work, we used the Manhattan and d1 distances, defined by the following equations, respectively:

$$d_{Manhattan}(H^{q}, H^{b}) = \sum_{i} \left| H^{q}(i) - H^{b}(i) \right| \qquad (11)$$

$$d_{1}(H^{q}, H^{b}) = \sum_{i} \frac{\left| H^{q}(i) - H^{b}(i) \right|}{1 + H^{q}(i) + H^{b}(i)} \qquad (12)$$
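The matching scheme of Eqs. (10)-(12) can be sketched as follows; HQ and HB are assumed to be arrays of shape (S, 2N, bins) holding the region histograms of the query and target images (e.g., as produced by the region_histograms() sketch above):

```python
import numpy as np

def manhattan(h1, h2):
    """Manhattan (L1) distance between two histograms, Eq. (11)."""
    return np.sum(np.abs(h1 - h2))

def d1(h1, h2):
    """d1 distance between two normalized histograms, Eq. (12)."""
    return np.sum(np.abs(h1 - h2) / (1.0 + h1 + h2))

def glibp_distance(HQ, HB, dist=d1):
    """Distance D(Q, B) of Eq. (10): for each region histogram of the
    query, keep the minimum distance to the histograms of the same
    scale in the target, and sum these minima."""
    total = 0.0
    S, twoN, _ = HQ.shape
    for s in range(S):
        for n in range(twoN):
            total += min(dist(HQ[s, n], HB[s, k]) for k in range(twoN))
    return total
```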
The different steps of our method are illustrated by the descriptive organigram of Fig. 5.
Fig. 6 shows an illustrative comparison between two images from the Corel dataset (subset, http://vision.stanford.edu/resources_links.html). The region delimited by the red ellipse in the query image is matched with all regions of its class (the regions delimited by the green ellipses) in the target image. Accordingly, a distance is calculated at each comparison, as shown in the figure (note that the distances indicated are the actual computed distances). Finally, the smallest distance is kept: the red distance, corresponding to the region delimited by the red ellipse in the target image, is the smallest among all the distances (only some of the distances are displayed). The procedure is repeated for all regions delimited by the green ellipses in the query image, and likewise for the regions delimited by the brown ellipses in the query image, which of course are compared with the regions of the same class in the target image (i.e., with the regions delimited by the brown ellipses in the target image).
Descriptive organigram of the general algorithm.
An illustrative comparison with (a, b, U, N, S) = (50, 25, 1.4, 10, 2).
4. Experimentation
For testing our method (GLIBP), we used the Corel 1k database (Wang) [34]. This dataset is composed of 1,000 images, classified into 10 categories: Africans, beaches, buildings, buses, dinosaurs, elephants, flowers, horses, mountains and food. All the images are of size 256×384 or 384×256. One sample from each class is shown in Fig. 7.
Some samples from the Corel database.
For the evaluation phase, we used visual inspection of individual queries and the precision criterion, defined by:

$$P = \frac{\text{number of relevant retrieved images}}{\text{total number of retrieved images}} \qquad (13)$$

Note that the number of retrieved images depends on the window size (w). The average precision AP over all images of a class I comprising m images is defined as:

$$AP(I) = \frac{1}{m} \sum_{q \in I} P(q) \qquad (14)$$

The average precision over the w first retrieved images, as defined by Eq. (14), is also called the "weighted precision" [35]. Therefore, the average weighted precision (AWP) over the C classes of the database is:

$$AWP = \frac{1}{C} \sum_{c=1}^{C} AP(I_c) \qquad (15)$$
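A hypothetical evaluation loop computing Eqs. (13)-(15) over a set of ranked retrieval results might look like the following sketch; ranked_results and labels are assumed data structures, not part of the original method:

```python
from collections import defaultdict

def evaluate(ranked_results, labels, w=20):
    """Sketch of Eqs. (13)-(15).

    ranked_results[q]: ranked list of database image ids returned
    for query q (best first); labels[i]: class of image i."""
    per_class = defaultdict(list)
    for q, ranking in ranked_results.items():
        # Eq. (13): precision within the window of w retrieved images
        p = sum(labels[i] == labels[q] for i in ranking[:w]) / w
        per_class[labels[q]].append(p)
    # Eq. (14): average precision (AP) per class
    aps = {c: sum(v) / len(v) for c, v in per_class.items()}
    # Eq. (15): AWP as the mean of the per-class APs
    return sum(aps.values()) / len(aps)
```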
We implemented the proposed method using the C# programming language. For the LBP, we used the function available at: http://www.cse.oulu.fi/CMV/Downloads/LBPMatlab.
We performed the experiments on a personal laptop with an Intel Core 2 Duo at 2 GHz and 3 GB of RAM, running the Windows Vista SP1 operating system.
We conducted two experiments. In the first, we fixed the parameters (a, b, U, S) to (50, 25, 1.4, 2), then to (30, 15, 2, 3) with different values for N for each, using Manhattan and d1 distance metrics (Tables 1–3). In the second, we investigated the contribution of regions delimited by ellipses of each scale separately (Table 4).
The results of the GLIBP with the parameters (a, b, U, S) = (50, 25, 1.4, 2) and different values of N, using the Manhattan distance metric, in terms of average weighted precision (w=20)
The results of the GLIBP with the parameters (a, b, U, S) = (50, 25, 1.4, 2) and different values of N, using the d1 distance metric, in terms of average weighted precision (w=20)
The results of the GLIBP with the parameters (a, b, U, S) = (30, 15, 2, 3) and different values of N, using the Manhattan and d1 distance metrics, in terms of average weighted precision (w=20)
The contribution of regions delimited by ellipses of each scale separately, using the d1 distance metric, in terms of average weighted precision (w=20)
4.1 Results and Discussion
Improvements in terms of AWP are observed in Table 1 (+0.13%, +0.22%, and +0.22% for N equal to 9, 10, and 11, respectively) compared to N=8. For the set of parameters (a, b, U, S) = (30, 15, 2, 3) in Table 3, improvements of +0.15% and +0.32% are obtained with N equal to 8 and 9, respectively, compared to N=7, using the results obtained with the d1 metric. Class-level analysis shows that the classes "Dinosaurs," "Buses," "Horses," and "Flowers" registered an AWP above 90%. On the other hand, the classes "Africans," "Mountains," and "Foods" registered an AWP below 65%. The lowest AWP, about 48%, is registered by the class "Mountains."
Table 4 shows the contribution of the regions delimited by ellipses of each scale separately. For instance, when using the regions delimited by ellipses of scales s=0 and s=1 for the set (a, b, U, N, S) = (30, 15, 2, 9, 3), we can observe classes that maintain high precision (over 90%). Typically, these classes are "Buses" and "Dinosaurs." These classes share one property, namely the existence of one central object (bus or dinosaur). However, different types of background distinguish them: while all images of the class "Dinosaurs" have the same simple background, the images of the class "Buses" have varied backgrounds. This explains why "Dinosaurs" keeps a high precision while the precision of "Buses" drops to 91% when using the regions delimited by ellipses of scale s=2. An example of this point is reported in Fig. 8. In that figure, we see that 11 of the 12 retrieved images are relevant when using the scales s=0 and s=1. Also, and as expected, this precision drops to 7/12, but one can remark that the retrieved images share a common characteristic, namely the background (grass).
The impact on retrieval using some scales separately, with the parameters (a, b, U, N, S) = (30, 15, 2, 9, 3). (a) Using only the scales s=0 and s=1 and (b) using only the scale s=2.
Note also the significant precision improvement (+12%) for "Horses" using s=2, owing to all images sharing a common background (grass). The other classes generally showed higher (or comparable) precision when using only s=2 than when using s=0 and s=1. This can be explained by the external regions being bigger than the internal ones and accordingly capturing more information. The same remarks hold for the set (a, b, U, N, S) = (50, 25, 1.4, 10, 2), but with different precisions.
The improvements obtained using s=2 and s=1 for the sets (30, 15, 2, 9, 3) and (50, 25, 1.4, 10, 2), compared with using (s=0 and s=1 only) and (s=0 only), respectively, are not negligible for the classes "Africans" and "Food." This can be attributed to two reasons. Firstly, the images of these classes do not contain a specific central object like the images of the buses class. Secondly, the images of these two classes do not contain distinct regions with different textures like beach images do (sand and water). Furthermore, these two reasons clarify why the results of the LBP, which uses global histograms, gave better or comparable AWP compared to GLIBP (Table 5) for these two classes only.
The effectiveness of our method is clearly shown in Table 5, where it is compared with the ElLBP of Bougueroua and Boucheham [18], the LBP of Ojala et al. [6], and the improved LBP of Jin et al. [28] in terms of AWP.
In Tables 6 and 7, we compared the proposal with ElLBP [18], DLEP [19], the DCT-based method (with different filters) [23], LDP16,2 [31], and the results of the block-based (BLK-based) method [33] as reported by Murala et al. [19]. These comparisons also clearly show the effectiveness of our method in terms of average precision (AP). As an exception, the class "Africans" obtains a better result with LBP than with our method; this can be explained by the fact that the class "Africans" consists of images of people, in which there is no variety of textured regions, so that global histograms are more suitable than local ones.
Screenshots of the results obtained using the proposed GLIBP method and the LBP method are shown in Figs. 9–12.
Comparison of the proposed method (a, b, U, N, S) = (50, 25, 1.4, 10, 2) with the results of some existing methods using the Manhattan distance metric in terms of average weighted precision
Comparison of the proposed method (a, b, U, N, S) = (50, 25, 1.4, 10, 2) with the results of some existing methods in terms of average precision (w=9)
4.2 GLIBP Efficiency vs. ElLBP [18]
To demonstrate the higher efficiency of the proposed GLIBP method with respect to the ElLBP [18] method, we take the number of regions as a metric. The best average precision obtained by ElLBP in [18] was for the parameter set (a, b, U, N, S) = (50, 25, 20, 10, 4). The number of histograms used is thus 2×N×S = 2×10×4 = 80. The proposed GLIBP method, however, reached its best average precision with the set (a, b, U, N, S) = (50, 25, 1.4, 10, 2), i.e., with only 40 histograms. In other words, GLIBP uses only half the number of histograms used by ElLBP. Furthermore, the AWP of the proposal is better than that of ElLBP (GLIBP 76.36% vs. ElLBP 74.64%).
Comparison of the proposed method (a, b, U, N, S) = (50, 25, 1.4, 10, 2) with the results of some existing methods in terms of average precision
Results obtained by querying with image #127 (Class: Beaches). (a) GLIBP (precision=12/12) and (b) LBP (precision=9/12).
Results obtained by querying with image #232 (Class: Building). (a) GLIBP (precision=12/12) and (b) LBP (precision=7/12).
Results obtained by querying with image #548 (Class: Elephants). (a) GLIBP (precision=9/12) and (b) LBP (precision=4/12).
Results obtained by querying with image #787 (Class: Horses). (a) GLIBP (precision=12/12) and (b) LBP (precision=9/12).
5. Conclusions
In this work, we have proposed an improved version of the LBP. The improvement is achieved by better exploiting the locality property, spatial information, and the orientation of regions in an original fashion. The effectiveness of the proposal is clearly shown through the precisions obtained on the Corel dataset, using the two distance metrics Manhattan and d1. The obtained results, in terms of average weighted precision and average precision, have shown significantly higher or comparable performance of GLIBP with regard to some published methods, which qualifies it as a good tool for scene image retrieval.
In future work, we will further investigate the contribution of the regions of each scale separately. This may enable improving the effectiveness of the proposed method through an appropriate weighting scheme over the different scales. In addition, we plan to generalize the presented approach to other histogram-based methods for texture and color feature extraction. Advanced tools such as machine learning and deep learning are also interesting enough to be considered in the future.