1. Introduction
Digital images remain one of the most important carriers of information. The saying that an image is worth innumerable words reflects how much information a single image can convey. However, with the increasing use of sophisticated image editing software, such as Adobe Photoshop and the GNU Image Manipulation Program (GIMP), digital images can be manipulated easily without leaving visible clues, and even trained observers may be unable to judge the credibility of image content. Maliciously tampered images may cause severe social problems, especially in medical diagnosis, court sentencing, patent infringement, and insurance claims. One of the best-known forged images is that of the Iranian missile test in July 2008 [1], which was published on the front pages of several major news websites, including The New York Times, The Chicago Tribune, and The Los Angeles Times. The tampered photo was obtained from the website of Iran’s Sepah News, as shown in Fig. 1. The genuine photo is shown in Fig. 1(a), while Fig. 1(b) and (c) show the published version, in which the third missile from the left was digitally added to the original photo to cover up the fact that it did not fire. A day later, the Associated Press news agency published the original photo (Fig. 1(a)), which further proved that the published picture was synthetic. Similar events triggered by counterfeit images occur every day. Image forensics technology is therefore urgently needed to validate the credibility of digital image content and to avoid large social losses.
Fig. 1. The Iranian missile incident: (a) genuine Iranian missile photo, (b) forged Iranian missile photo published on BBC NEWS, and (c) forged Iranian missile photo with the duplicated regions marked.
Since the advent of synthetic images, many researchers have devoted themselves to image forensics, targeting different types of image forgery, such as copy-move [2], splicing [3], resampling [4], filtering [5], and double JPEG (Joint Photographic Experts Group) compression [6]. A classification of image forensics technologies is given in [7], as shown in Fig. 2. Image forensics technologies are divided into two categories: active forensics and passive forensics. Active forensics verifies the integrity of auxiliary information, for instance a digital signature [8] or a digital watermark [9], to decide whether the image has been tampered with. However, this type of technology requires special software or hardware to insert the authentication information into images before they are distributed or to extract it from them. Passive (blind) forensics verifies the authenticity of an image by analyzing its content and structure. In this survey, we focus on passive image forensics technologies for copy-move forgery. Copy-move is one of the most common ways of altering an image; it is usually used to hide objects behind flat regions or to add objects within the same image. Copy-move forgery detection (CMFD) technologies are mainly divided into two classes, block-based methods and keypoint-based methods, which are discussed in detail in the following sections.
Fig. 2. Classification of image forensics technologies.
The rest of this survey is organized as follows. Section 2 briefly reviews two models of copy-move forgery and then describes two frameworks of CMFD methods. Block-based CMFD technologies are presented in Section 3, and keypoint-based CMFD technologies in Section 4. Section 5 collects the performance evaluation criteria and the datasets of copy-move forged images used to evaluate CMFD technologies. Finally, Section 6 gives future directions for CMFD and the conclusion.
2. Models of Copy-Move Forgery and Frameworks of CMFD
In this section, two models of copy-move forgery are reviewed. Two frameworks that cover the diverse CMFD technologies are also presented; most CMFD schemes follow one of them.
2.1 Models of Copy-Move Forgery
In [10], after analyzing 100 natural images, the authors found it impossible for a single image to contain two similar areas larger than 0.85% of the image area. The goal is therefore to look for two large similar areas in a suspicious image, as shown in Fig. 3(a). They formulated the model as follows:
Given an image $I$, the forged image $I'$ must satisfy the following: there exist areas $D_1$ and $D_2$ that are subsets of $D$ and a shift vector $d = (dx, dy)$ (they assumed that $|D_1| = |D_2| > |D| \times 0.85\%$ and $|d| > L$) such that $I'(x, y) = I(x, y)$ if $(x, y) \notin D_2$ and $I'(x, y) = I(x - dx, y - dy)$ if $(x, y) \in D_2$, where $D_1$ is the source region and $D_2$ is the pasted region, $D_2 = D_1 + d$. Nevertheless, Luo’s model cannot describe forgeries in which a copied region is pasted to two or more places or is rotated before being pasted. Plain copy-move forgery is shown in Fig. 3(a).
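The plain copy-move operation described by this model can be illustrated with a minimal Python sketch. It assumes a grayscale image stored as a list of rows indexed as `image[y][x]`, a rectangular source region, and a pasted region that stays inside the image; the names `copy_move`, `region`, and `shift` are hypothetical and do not come from [10].

```python
def copy_move(image, region, shift):
    """Return a forged copy of `image` following the plain copy-move model.

    image  : list of rows, indexed as image[y][x] (grayscale values)
    region : (x0, y0, width, height) of the source region D1 (assumed rectangular)
    shift  : (dx, dy), the shift vector d, so that D2 = D1 + d
    """
    x0, y0, w, h = region
    dx, dy = shift
    forged = [row[:] for row in image]            # I'(x, y) = I(x, y) outside D2
    for y in range(y0 + dy, y0 + dy + h):         # (x, y) runs over D2
        for x in range(x0 + dx, x0 + dx + w):
            forged[y][x] = image[y - dy][x - dx]  # I'(x, y) = I(x - dx, y - dy)
    return forged


# 6x6 test image whose pixel value encodes its position: 10 * y + x
image = [[10 * y + x for x in range(6)] for y in range(6)]
forged = copy_move(image, region=(0, 0, 2, 2), shift=(3, 3))
```

In this toy example the 2x2 block at the top-left corner reappears at position (3, 3), while every pixel outside the pasted region keeps its original value.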
Fig. 3. Two models of copy-move forgery: (a) Luo’s model and (b) Liu’s model.
To remedy the defects of Luo’s model, Liu et al. [11] presented a more comprehensive copy-move forgery model, as shown in Fig. 3(b). They assumed that the shift vector threshold is $\boldsymbol{V}_{\mathrm{T}} = [V_{\mathrm{tx}}, V_{\mathrm{ty}}]$ and that the copied region threshold (the ratio of the copied area to the whole image area) is $A_{\mathrm{T}}$. An image $I$ is forged to $I'$ via copy-move manipulation if
1) The copied region $C_i$, $i \in \{1, 2, \cdots, n\}$, is simply connected and has no hole inside, and its area is greater than $A_{\mathrm{T}}\, a(I)$, where $a(I)$ denotes the area of $I$.
2) Supposing the pasted region of copied region $C_i$ is $M_i$, there might be many duplicated region pairs $\{C_1 \| M_1, C_1 \| M_2, \cdots, C_n \| M_n\}$, $C_i, M_i \in I'$, which satisfy $C_i \neq C_j$, $\forall i \neq j$, $i, j \in \{1, 2, \cdots, n\}$, and $C_i \cap C_j = \phi$. For any pair $C_i \| M_i$, taking the origin of the reference system as the center of rotation, the copy-move forgery can be considered as a translation after a rotation, described by
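$$\begin{bmatrix} x' \\ y' \end{bmatrix} = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} + \begin{bmatrix} \Delta x \\ \Delta y \end{bmatrix}, \quad f(x', y') = f(x, y), \qquad (1)$$
where $f$ denotes the pixel value at position $(x, y)$; $\Delta x$ and $\Delta y$ are the shift distances along the $x$ and $y$ axes, respectively; and $\theta$ is the rotation angle. However, the pasted region may be altered by other operations, such as scaling, that break the one-to-one mapping. Therefore, a more generalized copy-move model needs to be proposed. A good CMFD scheme should detect the duplicated regions even if the pasted region is distorted by blurring, rotation, noise contamination, scaling, or JPEG compression.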
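The translation-after-rotation mapping in this model can be sketched in a few lines of Python. The sketch assumes a counter-clockwise rotation about the origin followed by the shift; the sign convention and the function name `rotate_then_translate` are assumptions for illustration and are not taken from [11].

```python
import math


def rotate_then_translate(x, y, theta, dx, dy):
    """Map a source coordinate (x, y) to its pasted coordinate (x', y').

    The point is first rotated by `theta` radians about the origin
    (counter-clockwise, by assumption) and then shifted by (dx, dy).
    """
    xr = x * math.cos(theta) - y * math.sin(theta)
    yr = x * math.sin(theta) + y * math.cos(theta)
    return xr + dx, yr + dy


# Rotate the point (1, 0) by 90 degrees, then shift it by (2, 3).
x2, y2 = rotate_then_translate(1.0, 0.0, math.pi / 2, 2.0, 3.0)
```

Under this convention the point (1, 0) is rotated to approximately (0, 1) and then shifted to approximately (2, 4).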
2.2 Frameworks of CMFD
In this subsection, two frameworks of CMFD methods are presented; most CMFD schemes adhere to one of them.
Block-based and keypoint-based CMFD methods generally follow the framework shown in Fig. 4(a).
1) Pre-processing: The suspicious image is processed by a series of operations, for example, applying a Wiener filter [12] or the dyadic wavelet transform (DyWT) [13] for de-noising; converting the RGB (red, green, blue) color space into grayscale [14,15], the YCrCb color space [16], the HSV (hue, saturation, and value) space [17], or the color local binary pattern (LBP) space [18]; or performing the discrete wavelet transform (DWT) [19] or Gaussian pyramid decomposition [11] to obtain a reduced-dimension representation of the image.
2) Feature extraction: Image segmentation is used for feature extraction in block-based CMFD methods. The image is divided into overlapping square image blocks [20], non-overlapping square image blocks [21], or overlapping circle image blocks [22]. Besides, simple linear iterative clustering (SLIC) method [23] is also used for image segmentation [24]. A great deal of features are extracted for CMFD, such as discrete cosine transform (DCT) [14], Fourier-Mellin transform (FMT) [25], 2D-Fourier transform [26], polar harmonic transform (PHT) [27], singular value decomposition (SVD) [28], LBP [22], Zernike moment [29], Hu moment [11], scale-invariant feature transform (SIFT) [15], speeded up robust features (SURF) [30], Harris corner features [31], DAISY [32], etc.
3) Feature matching: Feature matching is the procedure of finding similar feature vectors. To narrow the search range, a sorting algorithm, such as lexicographic sorting [19] or radix sorting [33], places similar features next to each other. Besides, a k-d tree [18] or locality-sensitive hashing (LSH) [34] greatly speeds up the search for similar feature vectors. In addition, there are many measures of the similarity between feature vectors, such as the Euclidean distance and the Manhattan distance, given in Eqs. (2) and (3), respectively:
$$d_{\mathrm{E}}(v_1, v_2) = \sqrt{\sum_{i=1}^{n} \left( v_1(i) - v_2(i) \right)^2}, \qquad (2)$$
$$d_{\mathrm{M}}(v_1, v_2) = \sum_{i=1}^{n} \left| v_1(i) - v_2(i) \right|, \qquad (3)$$
where $v_1$ and $v_2$ are $n$-dimensional feature vectors.
4) Localization and post-processing: If the regions determined by feature matching are shown directly in a map, there will be many isolated points; morphological operations [35], filtering [33], or the random sample consensus (RANSAC) algorithm [36] are usually used to refine the detected regions.
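The four steps above can be sketched as a toy block-based detector in Python. This is a minimal illustration under stated assumptions, not any published method: the feature is simply the raw pixel values of each overlapping block, matching uses lexicographic sorting with the Euclidean distance, and localization is reduced to counting shift vectors. The function names and the parameters `block`, `max_dist`, and `min_shift` are hypothetical.

```python
import math
from collections import Counter


def euclidean(v1, v2):
    """Euclidean distance between two feature vectors of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(v1, v2)))


def manhattan(v1, v2):
    """Manhattan distance between two feature vectors of equal length."""
    return sum(abs(a - b) for a, b in zip(v1, v2))


def detect_copy_move(image, block=2, max_dist=0.0, min_shift=2.0):
    """Return (shift vector, vote count) of the most frequent block shift, or None."""
    h, w = len(image), len(image[0])
    # Feature extraction: one vector per overlapping block (raw pixels here).
    feats = []
    for y in range(h - block + 1):
        for x in range(w - block + 1):
            vec = tuple(image[y + j][x + i]
                        for j in range(block) for i in range(block))
            feats.append((vec, (x, y)))
    # Feature matching: lexicographic sorting puts similar vectors side by side.
    feats.sort()
    shifts = Counter()
    for (v1, p1), (v2, p2) in zip(feats, feats[1:]):
        dx, dy = p2[0] - p1[0], p2[1] - p1[1]
        if euclidean(v1, v2) <= max_dist and math.hypot(dx, dy) >= min_shift:
            if (dx, dy) < (0, 0):          # count d and -d as the same shift
                dx, dy = -dx, -dy
            shifts[(dx, dy)] += 1
    # Localization (simplified): the most frequent shift is the best candidate.
    return shifts.most_common(1)[0] if shifts else None


# Toy forged image: a 3x3 region at the origin is pasted at offset (4, 4).
image = [[10 * y + x for x in range(8)] for y in range(8)]
for y in range(4, 7):
    for x in range(4, 7):
        image[y][x] = image[y - 4][x - 4]
result = detect_copy_move(image)
```

A practical detector would replace the raw-pixel feature with a robust one (DCT, Zernike moments, etc.), compare more than the immediate neighbor in the sorted list, and refine the result with morphological operations or RANSAC.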
The other framework is based on machine learning, as shown in Fig. 4(b). A classifier, such as a support vector machine (SVM) [16], is trained on a labeled image set, and the trained classifier determines whether a test image has been tampered with. However, CMFD methods based on machine learning only decide whether the test image has been forged; locating the tampered region remains a challenging problem for this type of method.
Fig. 4. Two frameworks of CMFD methods: (a) framework of block-based CMFD and keypoint-based CMFD and (b) framework of CMFD based on machine learning. Adapted from G. K. Birajdar and V. H. Mankar. Digital image forgery detection using passive techniques: a survey. Digital Investigation 2013;10(3):226-245, with the permission of Elsevier [37].
3. Block-Based CMFD Methods
Diverse block-based CMFD methods are briefly described in this section, including DCT-based, wavelet transform-based, PHT-based, LBP-based, Zernike moment-based, and SVM-based methods. A table comparing the CMFD methods from various aspects is given at the end of this section.
3.1 DCT Based Algorithms
Fridrich et al. [14] first proposed a DCT-based method for CMFD. The image is divided into fixed-size overlapping blocks in raster-scan order, and DCT is performed on each block. The quantized feature vector is obtained by zigzag scanning the quantized DCT coefficient matrix. The feature matrix is lexicographically sorted, and the Euclidean distance is used to judge similarity. However, this method has a high computational cost. To reduce computational complexity, Huang et al. [38] truncated the feature vector by using a constant to reduce its dimensionality and presented a scheme to judge the similarity between feature vectors. Mahmood et al. [39] used a Gaussian radial basis function (RBF) and kernel principal component analysis (KPCA) to reduce the dimensionality of the feature vector, which improved the efficiency of feature matching. Cao et al. [40] divided the inscribed circle of a square image block into four non-overlapping parts and extracted the mean of the coefficients in each part as the feature to detect duplicated regions. Fadl and Semary [41] divided feature vectors into several groups by using a fast k-means algorithm and searched for duplicated regions in each group. In [42], the authors classified the image as smooth or complex by using edge detection information; a smooth image is divided into big blocks and a complex image into small blocks. DCT coefficients were set to 0 or 1 according to the rules in [43] for CMFD. Alkawaz et al. [44] studied the effect of different block sizes on the performance of a CMFD method. In [45], a package clustering algorithm is used to divide the DCT feature vectors and coordinates into different packages, and similar feature vectors are then searched for in each package. Zhao and Guo [46] presented a scheme combining DCT and SVD for CMFD. Doyoddorj and Rhee [47] detected copy-move regions by using quantized DCT coefficients obtained by performing DCT in the Radon space of each image block. Ustubioglu et al. [48] proposed a CMFD algorithm based on LBP and DCT.
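The DCT feature extraction step shared by these methods can be sketched as follows. The sketch computes an orthonormal 2D DCT-II of a square block, scans the coefficients in zigzag order, and applies a uniform quantization. The single quantization step `q` and the number of retained coefficients `keep` are illustrative assumptions; [14] and the later works use their own quantization rules.

```python
import math


def dct2(block):
    """Orthonormal 2D DCT-II of a square block given as a list of rows."""
    n = len(block)

    def alpha(k):
        return math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)

    out = [[0.0] * n for _ in range(n)]
    for u in range(n):
        for v in range(n):
            s = 0.0
            for y in range(n):
                for x in range(n):
                    s += (block[y][x]
                          * math.cos((2 * y + 1) * u * math.pi / (2 * n))
                          * math.cos((2 * x + 1) * v * math.pi / (2 * n)))
            out[u][v] = alpha(u) * alpha(v) * s
    return out


def zigzag(matrix):
    """Read a square matrix along its anti-diagonals in JPEG zigzag order."""
    n = len(matrix)
    order = sorted(((r, c) for r in range(n) for c in range(n)),
                   key=lambda p: (p[0] + p[1],
                                  p[0] if (p[0] + p[1]) % 2 else p[1]))
    return [matrix[r][c] for r, c in order]


def dct_feature(block, q=4.0, keep=4):
    """Quantized low-frequency DCT coefficients used as the block feature."""
    coeffs = zigzag(dct2(block))
    return [round(c / q) for c in coeffs[:keep]]


# A flat 4x4 block has energy only in the DC coefficient.
flat_feature = dct_feature([[8] * 4 for _ in range(4)])
```

Keeping only the first few zigzag coefficients retains the low frequencies, which is the usual motivation for truncating the feature vector to reduce its dimensionality.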
3.2 Wavelet Transform Based Algorithms
In this subsection, CMFD algorithms based on different wavelet transforms are collected. DWT decomposes the image into four parts: the approximation sub-band LL, the horizontal detail sub-band LH, the vertical detail sub-band HL, and the diagonal detail sub-band HH. In [19], LL is divided into overlapping image blocks, and the singular value vector obtained by performing SVD on each block is used as the feature vector. Kashyap and Joshi [49] extracted blur moment invariants from each block and performed PCA on the blur moment invariant matrix to reduce the dimensionality. DyWT [50] is shift invariant and captures image structure better than DWT. In [51,52], the authors presented a CMFD scheme based on the observation that the copied and pasted regions should be similar in LL, whereas their noise patterns in HH should be highly dissimilar. A similar approach in [53] utilized singular value vectors obtained from overlapping blocks in LL and HH to detect duplicated regions; in that approach, however, LL and HH are obtained by the stationary wavelet transform (SWT) [54].
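The four-sub-band decomposition can be illustrated with a one-level 2D Haar transform. The Haar wavelet, the averaging normalization, and the function name `haar_dwt2` are assumptions made for brevity; the cited works may use other wavelets, and the naming of LH and HL varies between references.

```python
def haar_dwt2(image):
    """One-level 2D Haar decomposition of an image with even width and height.

    Returns a dict with the sub-bands 'LL', 'LH', 'HL', and 'HH', each half
    the size of the input. An averaging normalization (division by 4) is used.
    """
    h, w = len(image), len(image[0])
    bands = {"LL": [], "LH": [], "HL": [], "HH": []}
    for y in range(0, h, 2):
        ll, lh, hl, hh = [], [], [], []
        for x in range(0, w, 2):
            a, b = image[y][x], image[y][x + 1]
            c, d = image[y + 1][x], image[y + 1][x + 1]
            ll.append((a + b + c + d) / 4.0)  # approximation
            lh.append((a + b - c - d) / 4.0)  # difference between rows (horizontal detail)
            hl.append((a - b + c - d) / 4.0)  # difference between columns (vertical detail)
            hh.append((a - b - c + d) / 4.0)  # diagonal detail
        bands["LL"].append(ll)
        bands["LH"].append(lh)
        bands["HL"].append(hl)
        bands["HH"].append(hh)
    return bands


bands = haar_dwt2([[1, 2], [3, 4]])
```

Block features are then typically extracted from LL, which is a quarter of the original size, so the number of blocks to match drops accordingly.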
3.3 Other Transforms Based Algorithms
In [55], the authors offered a CMFD scheme that mapped image block features to log-polar coordinates and used phase correlation to search for duplicated regions. Bravo-Solorio and Nandi [56] computed a 1D descriptor invariant to rotation and reflection by summing a log-polar map along the log-radius axis to make the localization of duplicated regions more precise. In [57], they demonstrated the effectiveness of their scheme by various experiments and comparisons with other CMFD methods. However, interpolation from Cartesian coordinates to a log-polar grid reduces precision and results in considerable errors for low image resolutions or small block sizes. By using rotation and scale invariant features, Wu et al. [58] proposed a CMFD scheme that detects region duplication in a forged image by using the log-polar fast Fourier transform (LPFFT), while Park et al. [59] presented a scheme that detects region duplication by utilizing features extracted from the up-sampled log-polar Fourier (ULPF) descriptor. By introducing an adaptive phase correlation method in the log-polar coordinate system and utilizing the information extracted from the band limitation, Yuan et al. [60] presented a robust CMFD method that can handle large scaling operations.
Yap et al. [61] proposed the rotation-invariant PHTs, which include the polar sine transform (PST), polar cosine transform (PCT), and polar complex exponential transform (PCET). PCET performs relatively better than the other two transforms. In [27,62], the authors divided the image into overlapping circular blocks and performed PST on each circular block to extract features. After filtering and morphological processing, the duplicated regions were found. Li [34] extracted the PST coefficients of each block as the feature and searched for duplicated regions by approximate nearest neighbor searching with LSH. Ganty and Kousalya [63] realized a PCT-based CMFD algorithm using spectral hashing. In [64], the authors proposed a PCET-based CMFD scheme in which LSH was used to identify potentially similar image blocks. Bi et al. [65] extracted a color texture descriptor and an invariant moment descriptor calculated from the PCET moments to search for duplicated regions. Wo et al. [66] presented a CMFD method based on multi-radius PCET that can detect pasted regions subjected to large-scale scaling and rotation. In [67], the authors proposed an efficient discrete Radon polar complex exponential transform (DRPCET)-based scheme for extracting scaling and rotation invariant features for CMFD. It is worth mentioning that they introduced an auxiliary circular template to construct the invariant feature, as shown in Fig. 5. Zhong et al. [68] extracted the discrete radial harmonic Fourier moments (DRHFMs) from each circular block with the help of the circular template (Fig. 5).
Fig. 5. A circular template in Cartesian space. Adapted from J. Zhong et al. A new block-based method for copy move forgery detection under image geometric transforms. Multimedia Tools and Applications 2017;76(13):14887-14903, with the permission of Springer [68].
In [25], the authors proposed an FMT-based CMFD scheme in which counting Bloom filters, rather than lexicographic sorting, were used to save computational time. Li and Yu [69] improved Bayram’s scheme by clustering distance vectors with a vector erosion filter that is robust to rotation and scaling. After introducing the analytical Fourier-Mellin transform (AFMT), Zhong and Gan [70] proposed a discrete analytical Fourier-Mellin transform (DAFMT)-based CMFD scheme. However, as pointed out in [67], AFMT has the drawback of being too complicated, especially in the construction of its invariant moments.
Ketenci and Ulutas [26] applied the 2D Fourier transform to each overlapping square block to extract features for CMFD. After performing the Fourier transform of the polar expansion on pairs of overlapping windows and implementing an adaptive band limitation to construct a correlation matrix, Shao et al. [71] offered a CMFD scheme that estimates the rotation angle of the forged region and uses a search algorithm to locate the duplicated regions. In [72], the authors utilized four features extracted from the Fourier transform coefficients of each circular block for CMFD. After extracting an electromagnetism-like (EMag) mechanism descriptor from each non-overlapping block, Dadkhah et al. [73] applied the discrete Fourier transform (DFT) to the EMag features to achieve CMFD. By using a city block filter, a horizontal filter, a vertical filter, and a frequency filter, Huang et al. [74] offered a threshold-free CMFD scheme that combines features including the fast Fourier transform (FFT), SVD, and PCA.
3.4 LBP and Moment Invariant Based Algorithms
Li et al. [22] divided the image into overlapping circular blocks and extracted features using rotation invariant uniform LBP to detect duplicated regions. Davarzani et al. [75] presented an efficient scheme for CMFD using multi-resolution LBP (MLBP), in which a k-d tree is used to save time and RANSAC is used to remove possible false matches. In [76], objects are detected by normalized cut segmentation, local interest points are then localized with the help of the Hessian method, and duplicated regions are found by using center-symmetric LBP (CSLBP). Yang et al. [77] also used uniform LBP to detect duplicated regions; unlike [22], the authors used a shift-vector counter instead of block matching. Tralic et al. [78] combined cellular automata (CA) and LBP to extract feature vectors for CMFD. In [79], the authors proposed a CMFD method using binary gradient contours (BGC) and showed that their scheme outperforms many LBP-based methods.
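The basic LBP operator underlying these methods can be sketched as below. The sketch computes the 8-bit code of a pixel from its 3x3 neighborhood and a rotation-invariant variant obtained by taking the smallest circular bit rotation. The neighbor ordering, the `>=` comparison, and the function names are assumptions for illustration; the uniform, multi-resolution, and center-symmetric variants used in [22,75,76] are not implemented here.

```python
def lbp_code(image, x, y):
    """8-bit LBP code of the pixel at (x, y) from its 3x3 neighborhood."""
    center = image[y][x]
    # Neighbors are visited in circular order, starting at the top-left pixel.
    offsets = [(-1, -1), (0, -1), (1, -1), (1, 0),
               (1, 1), (0, 1), (-1, 1), (-1, 0)]
    code = 0
    for bit, (dx, dy) in enumerate(offsets):
        if image[y + dy][x + dx] >= center:
            code |= 1 << bit
    return code


def rotation_invariant(code, bits=8):
    """Smallest value among all circular bit rotations of `code`."""
    mask = (1 << bits) - 1
    return min(((code >> k) | (code << (bits - k))) & mask
               for k in range(bits))


patch = [[9, 1, 1], [1, 5, 1], [1, 1, 1]]      # bright pixel at the top-left
rotated = [[1, 1, 9], [1, 5, 1], [1, 1, 1]]    # same patch rotated by 90 degrees
code_a = lbp_code(patch, 1, 1)
code_b = lbp_code(rotated, 1, 1)
```

The two raw codes differ, but their rotation-invariant forms coincide, which is why such codes remain comparable when the pasted region has been rotated by a multiple of 45 degrees on this 8-neighbor ring.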
Ryu et al. [29] presented a CMFD algorithm that uses the magnitude of the Zernike moments, which is invariant to rotation. In [80], the authors combined LSH and RANSAC to improve the accuracy and efficiency of the CMFD scheme based on Zernike moments. Al-Qershi and Khoo [81] adopted a grouping method [82] for block matching to improve detection accuracy. Thuong et al. [83] extracted the foreground of the image by morphological operations, applied the wavelet transform to the foreground to extract the approximation component, and used Zernike moments for CMFD. Mahmoud and Abu-Alrukab [84] proposed a pseudo-Zernike moment (PZM)-based scheme for CMFD that improved on the Zernike moment-based method.
Mahdian and Saic [85] proposed a CMFD method based on blur invariant moments constructed by applying the algorithm in [86]. Du et al. [87] combined the 1D moment, the 2D moment, and the Markov feature to present a CMFD algorithm based on multiple features, where the 1D moment is a feature of the 1D histogram and the 2D moment is a feature of the 2D histogram in the horizontal and vertical directions. Imamoglu et al. [88] extracted Krawtchouk moments to detect the duplicated regions in a forged image. Liu et al. [11] extracted Hu moments for CMFD, and Kushol et al. [89] combined Hu moments with a Lab color space-based feature for CMFD.
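As a small illustration of why moment invariants suit CMFD, the first Hu invariant, $\phi_1 = \eta_{20} + \eta_{02}$, can be computed for a block and for its rotated copy. This is a minimal sketch assuming a grayscale block with non-zero total intensity; the function name `hu_phi1` is hypothetical, and the cited methods use more moments than this single one.

```python
def hu_phi1(block):
    """First Hu moment invariant of a grayscale block (list of rows)."""
    m00 = sum(sum(row) for row in block)  # total intensity, assumed non-zero
    cx = sum(x * v for row in block for x, v in enumerate(row)) / m00
    cy = sum(y * v for y, row in enumerate(block) for v in row) / m00
    mu20 = sum((x - cx) ** 2 * v for row in block for x, v in enumerate(row))
    mu02 = sum((y - cy) ** 2 * v for y, row in enumerate(block) for v in row)
    # Normalized central moments: eta_pq = mu_pq / m00 ** (1 + (p + q) / 2),
    # so the exponent is 2 for second-order moments.
    return (mu20 + mu02) / m00 ** 2


block = [[1, 2, 0], [0, 3, 0], [0, 0, 4]]
rotated = [list(row) for row in zip(*block[::-1])]  # 90-degree clockwise rotation
phi_original = hu_phi1(block)
phi_rotated = hu_phi1(rotated)
```

Both values agree up to floating-point error, so a block and its rotated copy map to nearly identical features.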
3.5 Other Algorithms
Popescu and Farid [90] utilized PCA to reduce the dimensionality of block features and proposed a scheme to detect duplicated regions in forged images. Kakar and Sudha [91] extracted features by using the MPEG-7 image signature tools and presented a novel technique for CMFD. Malviya and Ladhake [92] employed the auto color correlogram (ACC), a feature used in image retrieval, to obtain the feature vector and detect duplicated regions. In [93], the authors presented a CMFD method that is robust to scaling. Vladimirovich and Valerievich [94] proposed a plain CMFD algorithm using a structure pattern and the 2D Rabin-Karp rolling hash, which achieves zero false negative error and fast execution on high-resolution images. On the basis of the method in [94], Kuznetsov and Myasnikov [95] presented a CMFD scheme that calculates hash values in a sliding-window mode. Kashyap et al. [96] combined SVD with the cuckoo search algorithm, which can automatically generate suitable parameter values for each image.
PatchMatch [97] is a fast approximate nearest-neighbor search algorithm that has been used for block matching [98-100]. In [98], the authors modified the basic PatchMatch algorithm and demonstrated its efficiency with CMFD based on Zernike moments and CMFD based on RGB values, respectively. In [99], they presented two detectors that can detect forged regions in spliced images and copy-move images, respectively. By utilizing invariant features and a suitably modified version of PatchMatch, Cozzolino et al. [100] achieved a CMFD scheme that is robust to various types of geometrical distortion.
In [16], the authors extracted multi-resolution Weber law descriptors (WLD) as the feature and trained an SVM model, with which the accuracy reached up to 91%. After training SVM models with MLBP and multi-resolution WLD, respectively, Hussain et al. [101] found that multi-resolution WLD performs better than multi-resolution LBP in detecting splicing and copy-move forgeries. In [102], the steerable pyramid transform (SPT) is performed on the chrominance channels Cr and Cb to obtain multi-scale and multi-oriented sub-bands. The feature vector is produced by concatenating the histograms from each sub-band, and an SVM uses the feature vectors to classify images as forged or authentic. Rao and Ni [103] presented a CMFD scheme based on deep learning that combines a convolutional neural network (CNN) with an SVM. A 10-layer CNN is used to automatically learn hierarchical representations from the RGB images. Dense features are extracted from the test image by using the pre-trained CNN, and a feature fusion technique is designed to obtain discriminative features for SVM classification.
State-of-the-art block-based CMFD algorithms and some of the classical schemes are described in Table 1. Table 1 describes the methods from several aspects including pre-processing, feature extraction, method for searching similar blocks, post-processing, performance, and dataset.
Several aspects of Table 1 need to be explained. The collected data are taken from the corresponding literature, where the detailed information can be found. In the ‘Feature extraction’ column, GLCM is the abbreviation of gray-level co-occurrence matrix; CLD is the abbreviation of color layout descriptors; CHT is the abbreviation of circular harmonic transforms; and feature1 consists of the averages of the red, green, and blue components of the pixels and the entropy. In ‘Performance’, single/multiple indicates whether the method can detect a single forged region or multiple forged regions. AWGN means additive white Gaussian noise; [max_value,min_value] means the range of the relevant processing with its maximum and minimum values; and min_value:step:max_value means that the value ranges from the minimum value to the maximum value with the given step. Because researchers define these parameters differently, readers should consult the relevant literature to interpret the parameters in ‘Performance’. In ‘Dataset’, the datasets for CMFD are those listed in Section 5.2; the basic datasets used by researchers to create their own datasets for CMFD are also listed in this survey, such as UCID [104], National Geographic [105], ImageNet [106], Kodak [107], DOCR [108], PIMPRCG [109], USC-SIPI [110], KSU [51], and Caltech-256 [111].
Table 1. Block-based CMFD methods comparison
4. Keypoint-Based CMFD Methods
Typical keypoint-based CMFD methods are presented in this section, such as those based on SIFT, dense scale-invariant feature transform (DSIFT), affine-scale-invariant feature transform (ASIFT), SURF, Harris corner features, DAISY, mirror reflection invariant feature transform (MIFT) [112], and multi-support region order-based gradient histogram (MROGH) [113,114].
In [115], the authors extracted SIFT descriptors as the feature and used the best-bin-first (BBF) search method to match similar features. Pan and Lyu [15] estimated the geometric transform between matched SIFT keypoints and found the duplicated regions. Amerini et al. [116-118] proposed SIFT-based CMFD methods. In [116], maximum likelihood estimation (MLE) of the homography and the RANSAC algorithm are used for geometric transformation estimation. Amerini et al. [117] proposed a generalized 2NN test for localizing multiple duplicated regions and used agglomerative hierarchical clustering to identify the possible cloned regions. In [118], they introduced the J-Linkage algorithm to improve their work. Jin and Wan [119] used non-maximum suppression and an optimized J-Linkage to improve the performance of SIFT-based CMFD methods. In [120], the authors proposed a SIFT-based CMFD scheme that uses the SIFT keypoints extracted from the actual part obtained by applying DyWT to the image. Unlike methods that convert the color image into a gray image in pre-processing, Gong and Guo [121] extracted the color gradient from the suspicious image and took the gradient as the only input for SIFT extraction. In [122], the authors converted the color image into HSV. To address the setting of parameter values, Zhao and his colleagues [123,124] proposed a CMFD method based on SIFT with particle swarm optimization (PSO). ASIFT [125] and DSIFT [126] are also used in CMFD. In [127], the authors used the expectation maximization (EM) algorithm to estimate the transform matrix. Warif et al. [128] presented a CMFD method that combines a SIFT-based CMFD scheme with symmetry-based matching.
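Keypoint matching within a single image is often done with a nearest-neighbor ratio test. The sketch below shows the standard 2NN ratio test, not the generalized 2NN procedure of [117]: a descriptor is matched to its nearest neighbor only if that neighbor is clearly closer than the second nearest one. The threshold `ratio=0.6` and the function names are assumptions for illustration, and the toy 2D descriptors stand in for 128-dimensional SIFT descriptors.

```python
import math


def distance(v1, v2):
    """Euclidean distance between two descriptors of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(v1, v2)))


def two_nn_matches(descriptors, ratio=0.6):
    """Return index pairs (i, j) where j passes the 2NN ratio test for i."""
    matches = []
    for i, d in enumerate(descriptors):
        # Distances from descriptor i to every other descriptor, sorted.
        dists = sorted((distance(d, other), j)
                       for j, other in enumerate(descriptors) if j != i)
        if len(dists) >= 2 and dists[0][0] < ratio * dists[1][0]:
            matches.append((i, dists[0][1]))
    return matches


# Two pairs of near-duplicate descriptors and one isolated descriptor.
descriptors = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.0, 5.1), (10.0, 0.0)]
matches = two_nn_matches(descriptors)
```

The isolated descriptor is rejected because its two nearest neighbors are almost equally far away. In a real detector, matches between keypoints that are spatially too close would also be discarded before estimating the geometric transform.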
Shivakumar and Baboo [129] proposed a CMFD scheme based on SURF, in which a k-d tree was used for feature matching. Mishra et al. [130] presented a CMFD method that combines SURF and hierarchical agglomerative clustering (HAC). Silva et al. [131] presented a CMFD scheme that uses SURF as the feature, followed by multi-scale analysis and voting processes. By combining adaptive minimal-maximal suppression (AMMS) and SURF, Yang et al. [132] presented a CMFD method that solves the problem of insufficient keypoints in uniform areas. In [24], SLIC was used for image segmentation, and SURF was used as the feature to find the duplicated regions.
Chen et al. [133] proposed a CMFD scheme based on Harris corner points and step sector statistics, in which the BBF algorithm was used to find duplicated regions. By combining Harris corner points and LBP, Zhao and Zhao [134] presented a scheme to detect region duplication in images. Wang et al. [135] used the statistical features of the neighborhoods of Harris corner keypoints as the forensic feature, and a new feature matching method was used to improve detection accuracy. Combining angular radial partitioning and Harris keypoints, Uliyan et al. [136] presented a CMFD scheme.
In recent years, many CMFD schemes based on hybrid keypoints have been proposed and implemented, such as SIFT, SURF, and Harris corner [137]; SURF and SIFT [138]; SURF and binary robust invariant scalable keypoints (BRISK) [139]; Harris corner points and BRISK [140]; SURF, SIFT, and histogram of oriented gradients (HOG) [141,142]; MROGH and Harris corner points [143]; and KAZE and SIFT [144].
State-of-the-art keypoint-based CMFD algorithms and some classical schemes are compared in Table 2 in terms of the following aspects: feature, performance, and dataset. In ‘Performance’, the second item is the visualization form. In ‘Dataset’, SATA-130 is included in FAU.
Table 2. Keypoint-based CMFD methods comparison
5. Performance Evaluation Criteria and Datasets
5.1 Performance Evaluation Criteria
The performance of CMFD methods is usually evaluated at two levels: the image level and the pixel level. The most frequently used measures are precision $p$, recall $r$, and the $F_1$ score [145], given in Eqs. (4)–(6), respectively:
$$p = \frac{TP}{TP + FP}, \qquad (4)$$
$$r = \frac{TP}{TP + FN}, \qquad (5)$$
$$F_1 = \frac{2pr}{p + r}, \qquad (6)$$
where, at the image level, TP denotes the number of doctored images correctly detected as doctored; FP denotes the number of authentic images erroneously detected as doctored; and FN denotes the number of doctored images falsely detected as authentic. At the pixel level, TP denotes the number of doctored pixels correctly detected as doctored; FP denotes the number of authentic pixels falsely detected as doctored; and FN denotes the number of doctored pixels falsely detected as authentic. The larger $p$, $r$, and $F_1$ are, the higher the accuracy of the CMFD scheme is.
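These three measures are straightforward to compute from the counts TP, FP, and FN. The following minimal sketch returns 0.0 when a denominator is zero, which is a convention assumed here rather than one stated in [145]; the function name is hypothetical.

```python
def precision_recall_f1(tp, fp, fn):
    """Precision, recall, and F1 score from true/false positive and false negative counts."""
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1


# Example: 8 forged images detected, 2 false alarms, 8 forged images missed.
p, r, f1 = precision_recall_f1(tp=8, fp=2, fn=8)
```

The same function applies at the pixel level when the counts are taken over pixels of the detection map and the ground truth map instead of over images.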
Zhao and Guo [46] presented another pair of evaluation criteria at the pixel level, the detection accuracy rate $R_{DA}$ and the false positive rate $R_{FP}$, given in Eqs. (7) and (8), respectively:
$$R_{DA} = \frac{|\psi_C \cap \psi_{DC}| + |\psi_P \cap \psi_{DP}|}{|\psi_C| + |\psi_P|}, \qquad (7)$$
$$R_{FP} = \frac{|\psi_{DC} - \psi_C| + |\psi_{DP} - \psi_P|}{|\psi_{DC}| + |\psi_{DP}|}, \qquad (8)$$
where $|\cdot|$ denotes the area of a region, $\cap$ denotes the intersection of two regions, $-$ denotes the difference between two regions, $\psi_C$ denotes the pixels of the copied region, $\psi_P$ denotes the pixels of the pasted region, $\psi_{DC}$ denotes the pixels of the detected copied region, and $\psi_{DP}$ denotes the pixels of the detected pasted region. The closer $R_{FP}$ is to 0 and $R_{DA}$ is to 1, the higher the accuracy of the CMFD method is.
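With regions represented as sets of pixel coordinates, the two rates map directly onto set operations. The sketch below assumes that the detection accuracy rate is the fraction of true copied and pasted pixels that are detected, and that the false positive rate is the fraction of detected pixels lying outside the true regions; the function and argument names are hypothetical.

```python
def detection_rates(copied, pasted, det_copied, det_pasted):
    """Return (R_DA, R_FP) for regions given as sets of (x, y) pixel coordinates."""
    r_da = ((len(copied & det_copied) + len(pasted & det_pasted))
            / (len(copied) + len(pasted)))
    r_fp = ((len(det_copied - copied) + len(det_pasted - pasted))
            / (len(det_copied) + len(det_pasted)))
    return r_da, r_fp


copied = {(0, 0), (1, 0), (0, 1), (1, 1)}
pasted = {(4, 4), (5, 4), (4, 5), (5, 5)}
det_copied = {(0, 0), (1, 0), (0, 1), (2, 2)}   # one miss and one false alarm
det_pasted = set(pasted)                        # perfectly detected
r_da, r_fp = detection_rates(copied, pasted, det_copied, det_pasted)
```

Here seven of the eight true pixels are recovered and one of the eight detected pixels is a false alarm.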
5.2 Datasets
Diverse datasets for CMFD are listed in this subsection. A good dataset for CMFD should contain the original images, the forged images, the distorted forged images, and their corresponding ground truth maps, as shown in Fig. 6, whose examples are taken from the CoMoFoD dataset [146]. Some commonly used datasets for the evaluation of CMFD methods are collected in Table 3, and their corresponding links are given in the References.
Besides the datasets mentioned above, many authors created their own datasets by using images from the Internet and from other datasets, which are also collected in this survey [104-111,151].
Fig. 6. Example of the CoMoFoD dataset: (a) original image, (b) forged image, (c) forged image with image blurring, and (d) ground truth map.
6. Future Direction and Conclusion
6.1 Future Direction
On the basis of the problems that remain in current research, several future directions for CMFD research are suggested in this subsection.
• Benchmark dataset. A dataset is indispensable for evaluating the performance of CMFD methods. A dataset for CMFD evaluation should include original images and the corresponding forged images at different resolutions; diverse forged images whose duplicated regions (smooth or textured) differ in size and undergo various geometric transformations (rotation, scaling, etc.); the forged regions saved individually as images; and distorted images produced by post-processing (JPEG compression, AWGN, noise contamination, blurring, etc.). Besides, the corresponding ground truth maps and open-source code (MATLAB, OpenCV) for the post-processing methods should also be included in the dataset.
• Effectiveness and robustness. CMFD methods should effectively detect the forged regions in distorted doctored images such as those described under the benchmark dataset. Efficient extraction of local invariant features and descriptors, high-speed feature matching, and accurate localization are all worth exploring.
• Deep learning. Relatively few CMFD methods are based on deep learning. Deep learning has so far been applied only to classifying images as authentic or forged, and it is hard to determine the forged regions accurately. Another difficulty is that these methods are hard to reproduce and compare because of differences in the training and testing sets or the complexity of the experiments. Researchers may study this topic in the future by using deep learning technologies such as deep Boltzmann machines [152] and CNN [153].
6.2 Conclusion
Passive forensics of digital images is a rapidly growing field of research. Our brief review of image CMFD technologies indicates that the research is still in a phase of vigorous development and has great potential for future research and applications. Two classical models of copy-move forgery and two frameworks of CMFD technologies were presented first. Then, block-based and keypoint-based CMFD methods were reviewed from different aspects, including the classical CMFD technologies and the state-of-the-art algorithms of recent years. The performance evaluation criteria and frequently used datasets for evaluating CMFD schemes were collected. Finally, future directions for this topic were given. With the help of advanced technologies, some high-performance CMFD schemes are expected to become standard tools in the future. We also hope that this survey will provide useful information to scientists, researchers, and relevant research communities in this field. Research on image forensics is a continual process, and it will keep exploring forensics technologies with high accuracy and robustness.
Acknowledgement
This work was supported by the National Natural Science Foundation of China (No. 61702303, No. 61201371); the Natural Science Foundation of Shandong Province, China (No. ZR2017MF020, No. ZR2015PF004); and the Research Award Fund for Outstanding Young and Middle-Aged Scientists of Shandong Province, China (No. BS2013DX022). The authors thank Surong Zhang, Xiuhong Wei, and Chi Wang for their kind help and valuable suggestions in revising this paper.