Younghoon Jung and Daewon Kim*

Feature Extraction of Non-proliferative Diabetic Retinopathy Using Faster R-CNN and Automatic Severity Classification System Using Random Forest Method

Abstract: Non-proliferative diabetic retinopathy is a representative complication of diabetes and is known to be a major cause of impaired vision and blindness. Research on the automatic detection of diabetic retinopathy is ongoing; however, there is also a growing need for research on automatic severity classification. This study proposes an automatic detection system for pathological symptoms of diabetic retinopathy such as microaneurysms, retinal hemorrhage, and hard exudate by applying the Faster R-CNN technique. An automatic severity classification system was devised by training and testing a Random Forest classifier on the data obtained through preprocessing of the detected features. In an experiment classifying 228 test fundus images, the proposed classification system achieved 97.8% accuracy.

Keywords: Faster R-CNN, Classification, Machine Learning, Non-proliferative Diabetic Retinopathy, Random Forest

1. Introduction

Diabetic retinopathy (DR) is a representative microvascular complication in diabetic patients and a major cause of impaired vision and blindness worldwide. The disease occurs in approximately 60% of diabetic patients [1]. Early treatment through fundus examination can lower its prevalence, but discovery and treatment tend to be delayed because the patient perceives no symptoms in the early stage. In non-proliferative diabetic retinopathy (NPDR), which is an early stage of DR, pathological symptoms such as microaneurysms, hard or soft exudate, and retinal hemorrhage can be observed. Microaneurysms are the first clinical finding observable in DR and expand abnormally as DR progresses. In severe cases, NPDR may develop into proliferative diabetic retinopathy (PDR) [2].
In the case of retinal hemorrhage, blood vessels in the retina can burst, causing blurred or impaired vision. In the case of hard exudate, fluid leaks from the retinal blood vessels and leaves lipid deposits. The blood cholesterol level can be estimated through the hard exudate. The Early Treatment Diabetic Retinopathy Study (ETDRS) [3] divided DR into NPDR and PDR and graded severity as no apparent retinopathy, mild NPDR, moderate NPDR, severe NPDR, and PDR. In ophthalmology, different treatments are performed according to the grade under this classification system, so the severity of the patient's retinal condition must be classified [4]. There is a growing need for research to improve the specificity and sensitivity of early diagnosis and classification of DR. Many studies have detected and classified the pathological symptoms of DR using various medical image processing and machine learning techniques. Representative examples are studies on medical image analysis using the support vector machine (SVM) [5] and the convolutional neural network (CNN), a deep learning technique that relies on large amounts of data. Recently, Google conducted a study on DR based on deep learning [6], and there have been many studies on the severity classification of DR using the CNN. Pratt et al. [7] used more than 80,000 images of about 6 megapixels each provided by Kaggle [8]. Studies on the diagnosis and classification of DR are being conducted continuously, and more research and investment are required in the development of an automatic classification system for the disease. Das et al. [9] proposed a deep learning architecture based on segmented fundus image features for the classification of DR; it was evaluated on the DIARETDB1 dataset and reached a maximum accuracy of 98.7%.
Padmanayana and Anoop [10] studied binary classification of DR using a CNN on color fundus images, the same type of data used in this research, and reported a testing accuracy of 94.6%. Unlike the features used in this study, texture features have also been used to classify the severity grades of DR through deep learning [11]; that work used ResNet, DenseNet, and DetNet and reported a highest accuracy of 96.35%. Hathwar and Srinivasa [12] studied a method for automatically grading the DR status present in retinal fundus images using deep learning and reported a sensitivity of 94.3% and a specificity of 95.5%, which are very similar to the results of this study. Another study [13] classified the stages of DR using the same Messidor [14] data used in this study. It applied several enhanced deep learning methods, mainly AlexNet, VGG, GoogleNet, and ResNet, and reported an accuracy of 99.66%. A further study [15] graded DR automatically using ResNet; across the networks tested, the accuracy reached up to 86.67% on the Indian Diabetic Retinopathy Image Dataset (IDRiD). In a study [16] that classified DR using the R-CNN, a deep learning method, the accuracy was 93%, which was 7.4% and 37.83% higher than the performance of the SVM and k-nearest neighbor (KNN), respectively. Another study [17] classified the severity of DR using the Kaggle dataset; it proposed a binary CNN and, after comparing its performance with various existing deep learning networks, reported an accuracy of 91.04%. Many other studies have classified the severity of DR using deep learning, and survey papers [18-21] collect the architectures and results of these methods in one place.
This study detected microaneurysms, retinal hemorrhage, and hard exudate, which are the early-stage features of NPDR, from fundus images using Faster R-CNN [22], and diagnosed the fundus condition immediately by analyzing the pathological features with a classifier. The proposed system operates in two main steps: a detection step and a classification step with diagnosis. A Random Forest classifier [23] was designed to classify severity by analyzing the ratio of pathological symptoms such as microaneurysms and retinal hemorrhage.

2. Diagnosis of Diabetic Retinopathy using Faster R-CNN and Random Forest Method

This study designed a system that can quickly detect the pathological features of DR and classify severity grades regardless of the brightness, contrast, and color tone of the fundus imaging system, using the Faster R-CNN and Random Forest methods. In the first step, the pathological features of DR such as microaneurysms, retinal hemorrhage, and hard exudate are detected using the Faster R-CNN method. In the second step, the retinal condition is diagnosed using a Random Forest algorithm, which yields the severity grade classification. Fig. 1 shows the overall structure of the DR detection and automatic classification system.

2.1 Detecting the Features of Diabetic Retinopathy using Faster R-CNN

Labeling work is performed to detect the features of DR. Microaneurysms and retinal hemorrhage are classified as Label 0 and hard exudate is classified as Label 1. The DR data for learning are in XML file format, which consists of four variables: the path of the image file, the total size of the image, and the names and the coordinates of the objects to learn. Examples of microaneurysms, retinal hemorrhage, and hard exudate are shown in Fig. 2. For the pre-trained network model, the Inception-ResNet-v2 [24] was used, which consists of multiple layers in the convolution step of the CNN and has the advantages of reducing the amount of calculation and improving the accuracy of complex convolutions.
Then the learning procedure for detecting the pathological symptoms of DR with Faster R-CNN is performed. Anchors were created randomly with a mini-batch size of 16, and the region proposal network (RPN) was trained for the set number of iterations. Based on the target regions produced by the trained RPN, Faster R-CNN is trained for the set number of iterations using the Inception-ResNet-v2 as the pre-trained network and learning structure. Then, the weights of the convolution layers of the trained Faster R-CNN are shared and the RPN is trained again for the set number of iterations. The training is completed by fine-tuning the shared convolution layers so that they are better suited to detecting objects. The detection method extracts a feature map from the pre-trained model and then passes it to the RPN and the region-of-interest (RoI) pooling layer. In the RoI pooling layer, the class scores and box coordinates for the detected objects are obtained. The structure and procedure of the Faster R-CNN used to detect the features of DR are shown in Fig. 3. Detected images were created by recognizing the learned pathological symptoms and drawing white boxes in the original fundus images at the coordinates where pathological symptoms were detected. The results produced by the Faster R-CNN consist of the Pixel Values of the corresponding region based on the Coordinates of the region, the Class Type determined through the labeling work, and the Class Score, which is the probability of belonging to the class.

2.2 Feature Information Preprocessing Step

The data refining and preprocessing steps are performed to detect microaneurysms, retinal hemorrhage, and hard exudate, which are the features of DR, from the input fundus images and to transmit the feature information to the Random Forest classifier. For the microaneurysms and retinal hemorrhage, the pixel values of the corresponding objects are transformed to grayscale as in Eq. (1).
This is done so that the pixel values can be expressed as brightness values in 256 steps. Then, the pixels of each region are reverse-transformed using Eq. (2) and the histogram equalization process of Eq. (3) is applied. Once the brightness values are evenly distributed and the boundaries are clearly distinguished in this way, the background region and the microaneurysm and retinal hemorrhage regions are separated. Next, the pixels are extracted as binary data using a basic thresholding method, as shown in Eq. (4), and the number of pixels in the hemorrhage region is derived. Fig. 4 shows the overall preprocessing steps, starting from the detection of pathological symptoms.
$$h(v)=R\left(\frac{\operatorname{cdf}(v)-\operatorname{cdf}_{\min }}{(M \times N)-\operatorname{cdf}_{\min }} \times(L-1)\right) \tag{3}$$
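The mapping in Eq. (3) can be illustrated with a minimal sketch in plain Python. It assumes that the region is given as a 2D list of integer gray values in the range 0 to L-1, that cdf_min is the smallest non-zero value of the cumulative histogram, and that R denotes rounding to the nearest integer; the function name `equalize` is hypothetical and is not taken from the authors' implementation.

```python
def equalize(gray, levels=256):
    """Histogram-equalize a 2D list of integer gray values following Eq. (3).

    Assumes every value lies in the range 0 .. levels - 1.
    """
    rows, cols = len(gray), len(gray[0])
    total = rows * cols  # M x N in Eq. (3)

    # Histogram of the gray values.
    hist = [0] * levels
    for row in gray:
        for v in row:
            hist[v] += 1

    # Cumulative distribution function cdf(v).
    cdf = []
    running = 0
    for count in hist:
        running += count
        cdf.append(running)

    # cdf_min: smallest non-zero value of the cumulative histogram (assumption).
    cdf_min = next(c for c in cdf if c > 0)
    if total == cdf_min:
        # All pixels share one value, so there is nothing to equalize.
        return [row[:] for row in gray]

    def h(v):
        # R(...) in Eq. (3) is taken to be rounding to the nearest integer.
        return round((cdf[v] - cdf_min) / (total - cdf_min) * (levels - 1))

    return [[h(v) for v in row] for row in gray]


if __name__ == "__main__":
    patch = [[52, 55], [61, 59]]
    print(equalize(patch))
```

For the four-pixel patch in the example, the darkest value maps to 0 and the brightest to 255, which is the spreading of brightness values that the preprocessing step relies on before thresholding.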
To analyze the distribution of retinal hemorrhage and microaneurysms in the fundus images, the X and Y coordinates are derived based on the boxes of the extracted objects and the sum of the distances between objects is calculated as shown in Eq. (5). If one or fewer objects are detected, it is indicated as 1.
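A minimal sketch of this distance feature is given below. Since Eq. (5) is not reproduced here, several details are assumptions: boxes are taken to be `(x_min, y_min, x_max, y_max)` tuples, each object is represented by its box center, and the distance is Euclidean. The helper names are hypothetical. The function returns both the sum and the maximum of the pairwise distances, and falls back to 1 when fewer than two objects are detected, as described above.

```python
from itertools import combinations
from math import hypot


def box_center(box):
    """Center (x, y) of a box given as (x_min, y_min, x_max, y_max)."""
    x_min, y_min, x_max, y_max = box
    return ((x_min + x_max) / 2, (y_min + y_max) / 2)


def pairwise_distances(boxes):
    """Euclidean distances between the centers of every pair of boxes."""
    centers = [box_center(b) for b in boxes]
    return [
        hypot(ax - bx, ay - by)
        for (ax, ay), (bx, by) in combinations(centers, 2)
    ]


def distance_features(boxes):
    """Return (sum, max) of pairwise center distances.

    With one or zero detected objects there is no pair to measure,
    so both values are set to 1, following the rule stated in the text.
    """
    if len(boxes) < 2:
        return 1.0, 1.0
    distances = pairwise_distances(boxes)
    return sum(distances), max(distances)


if __name__ == "__main__":
    detected = [(0, 0, 2, 2), (3, 4, 5, 6), (0, 4, 2, 6)]
    print(distance_features(detected))
```

Using box centers keeps the feature independent of lesion size; a different reference point (for example a box corner) would shift the values but not the overall idea.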
Table 1 shows the feature data transmitted to the Random Forest classifier after the preprocessing step; these data are used to classify the severity of DR. The features input to the classifier consist of the number of pixels occupied by microaneurysms and retinal hemorrhage, the maximum distance between disease objects, and the class type and class scores obtained from the Faster R-CNN.

Table 1.
2.3 Diabetic Retinopathy Diagnosis and Classification System

The severity grades were classified by analyzing the ratios of the regions occupied by microaneurysms, retinal hemorrhage, and hard exudate in the entire retina for the input fundus images, using the DR classification criteria of ETDRS. First, to calculate the area of the regions of pathological symptoms, the background region is removed from the input fundus images. Based on the information transmitted from the Faster R-CNN algorithm, the ratio of pathological symptoms to the total fundus region (C) in the retina is calculated with Eq. (6) using the pixel information of the detected microaneurysms and retinal hemorrhage (A) and hard exudate (B).
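The ratio described above can be sketched as follows. Eq. (6) is not reproduced in this text, so the form used here, (A + B) / C, is an assumption based on the description of A, B, and C; in particular, whether the published ratio is expressed as a fraction or as a percentage is not confirmed. The masks are hypothetical 2D lists in which non-zero entries mark pixels that belong to a region.

```python
def count_pixels(mask):
    """Number of non-zero entries in a 2D list used as a binary mask."""
    return sum(1 for row in mask for v in row if v)


def lesion_ratio(red_lesion_mask, exudate_mask, fundus_mask):
    """Assumed form of the ratio: (A + B) / C.

    A: pixels of microaneurysms and retinal hemorrhage,
    B: pixels of hard exudate,
    C: pixels of the fundus region after background removal.
    """
    a = count_pixels(red_lesion_mask)
    b = count_pixels(exudate_mask)
    c = count_pixels(fundus_mask)
    if c == 0:
        raise ValueError("fundus mask is empty")
    return (a + b) / c


if __name__ == "__main__":
    # Toy 4x4 example: 12 fundus pixels, 1 red-lesion pixel, 2 exudate pixels.
    fundus = [[0, 1, 1, 0], [1, 1, 1, 1], [1, 1, 1, 1], [0, 1, 1, 0]]
    red = [[0, 0, 0, 0], [0, 1, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]
    exudate = [[0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 1, 1], [0, 0, 0, 0]]
    print(lesion_ratio(red, exudate, fundus))
```

Counting the fundus pixels from a mask rather than from the full image size reflects the background-removal step, so the dark border around the retina does not dilute the ratio.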
If the ratio α in the retina is lower than 0.258, the image is classified as mild grade, and if it is higher than 0.258, the condition is classified as moderate or severe grade. The α value of 0.258 was derived by analyzing the distribution of the ratios of pathological symptoms by severity grade for 175 fundus images that DR specialists had determined to be of moderate or severe grade. This value serves as the criterion for separating the mild grade from the moderate and severe grades. The whole algorithm, which classifies the severity grades of DR by detecting pathological symptoms in the input images and calculating the ratios of pathological symptoms, is shown in Fig. 5.

This research also used the Random Forest technique, a representative supervised machine learning algorithm. The Random Forest algorithm is designed to avoid the over-fitting and under-fitting of a single decision tree. It trains each decision tree on a random subset of the feature variables in the dataset and combines the trees into one classifier, which makes it an ensemble learning method that combines multiple models. Fig. 6 shows the result of the data importance analysis, obtained by comparing the average information gain of each feature over the decision trees that make up the Random Forest. The figure shows that the Random Forest classifier assigned importances of 45%, 24%, 21%, and 10% to the number of pixels representing the disease area, the maximum distance between diseased objects, the class type, and the class scores, respectively. Fig. 7 shows the flowchart of the severity grade classification method for DR using the Faster R-CNN and the Random Forest classifier. The overall sequence consists of detecting the pathological symptoms of DR using Faster R-CNN and classifying the severity grade using the Random Forest classifier.
The probability of pathological symptoms and the coordinates of the box are determined by applying Faster R-CNN to the input fundus images. Then, the exact number of hemorrhage pixels inside the box, obtained through the data preprocessing step, and the distance values describing the distribution of pathological symptoms are transmitted to the Random Forest classifier. Finally, the classifier analyzes the refined data and classifies the severity grade of DR.

3. Experiment and Results

The experiment was conducted using CUDA toolkit 10.2 in the CAFFE environment of TensorFlow installed on the Windows 10 operating system, and OpenCV 4.0 was used for data preprocessing. Table 2 shows the hardware configuration, programming language, tools, and development environment used for this experiment. The data used for training and evaluating the algorithm proposed in this study were the Messidor dataset [14] and fundus images provided by the Dankook University Medical Center (DUMC).

Table 2.
The Messidor dataset consisted of 653 fundus images that had been classified in terms of severity. The data received from DUMC consisted of 228 fundus images that ophthalmologists had classified for severity. Fig. 8 shows sample images of Messidor and the DUMC dataset and the data were classified by the severity grade of DR as shown in Table 3. For the Messidor data, 153 mild images, 247 moderate images, and 253 severe images were used as the learning data set for Faster R-CNN. Furthermore, 1,760 learning data were used that consisted of 767 data for microaneurysms, 649 data for retinal hemorrhage, and 344 data for hard exudate. To evaluate the classification system, 228 image data received from DUMC were used, which consisted of 53 mild images, 62 moderate images, and 113 severe images. Table 3.
3.2 Experiment using the Classification System

Every experiment used the Inception-ResNet-v2 model of the Faster R-CNN with the same dataset. Experiments were performed for extracting the features of DR and for classifying severity. Two classification algorithms were used: a method based on the area ratio of the pathological symptoms and the Random Forest classifier. Fig. 9 shows the network model structure in the block diagram of the Faster R-CNN method. After the DR fundus image is input to the Inception-ResNet-v2 network, it is transferred to the RPN. At this stage, the object regions in the input image are identified. Then the object classifier and the bounding box regressor select the appropriate box candidates from the RPN output. After that, the candidates go through RoI pooling and the fully connected layers. Finally, the bounding box and category label of each target object are exported. The initial learning rate was 0.0001 and a total of 12,500 training epochs were run. Training was stopped when the value of the multi-task loss function was minimized. The model corresponding to the minimum loss value was used in the testing stage. To verify the performance, both a plain CNN model and the SVM were used as control methods. For evaluation, true positive (TP) was defined as classifying mild fundus images as mild; true negative (TN) was defined as classifying moderate or severe fundus images as moderate or severe; false positive (FP) was defined as classifying moderate or severe fundus images as mild; false negative (FN) was defined as classifying mild fundus images as moderate or severe. Sensitivity, specificity, and accuracy were defined by Eqs. (7), (8), and (9), respectively, and used to analyze and evaluate the classification results.
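The three evaluation indices can be computed from the confusion-matrix counts as in the sketch below, which uses the standard definitions of sensitivity, specificity, and accuracy. The function name is hypothetical. The example counts are chosen to be consistent with the figures reported for the area-ratio experiment in Section 3.2.1: TN = 156 and FN = 15 are stated in the text, whereas TP = 54 and FP = 3 are inferred here so that the total is 228 images and are therefore an assumption.

```python
def classification_metrics(tp, tn, fp, fn):
    """Return (sensitivity, specificity, accuracy) from confusion-matrix counts."""
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("each class needs at least one sample")
    sensitivity = tp / (tp + fn)  # share of positives that were found
    specificity = tn / (tn + fp)  # share of negatives that were found
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    return sensitivity, specificity, accuracy


if __name__ == "__main__":
    # TN and FN come from the text; TP and FP are inferred (assumption).
    sens, spec, acc = classification_metrics(tp=54, tn=156, fp=3, fn=15)
    print(f"sensitivity = {sens:.2%}")
    print(f"specificity = {spec:.2%}")
    print(f"accuracy    = {acc:.1%}")
```

With these counts the sketch reproduces the reported 78.26% sensitivity, 98.11% specificity, and 92.1% accuracy, which is a quick way to check that a confusion matrix and its summary figures agree.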
3.2.1 Classification experiment based on the area ratio of pathological symptoms

In this experiment, the pathological symptoms of DR were detected and their severity grades were classified based on the area ratio of the pathological symptoms in the retina. Fig. 10 shows the detection result of pathological symptoms using the Faster R-CNN on fundus images of moderate DR. In addition, the results are shown for the areas of hard exudate and retinal hemorrhage.

Table 4.
Table 4 shows the ratios of pathological symptoms detected in part of the 228 fundus images, from which the correlation between the fundus grade and the ratio of pathological symptoms can be confirmed. The ratios of hard exudate and retinal hemorrhage are low in the early stage of DR but increase as the disease worsens. The severity grades were classified based on the ratios of pathological symptoms. Table 5 shows the sensitivity, specificity, and accuracy calculated by inserting the classification results into the confusion matrix. As shown in Table 5, 156 TN images were classified accurately. However, classifying TP images was difficult because the data included fundus images that could not be classified using the ratio of pathological symptoms. The classification accuracy was 92.1%, which is a good result. However, the low sensitivity of 78.26% compared to the specificity of 98.11% suggests that classifying the severity grades of DR using only the area ratio of pathological symptoms in the fundus images has limitations.

Table 5.
3.2.2 Classification using the Random Forest method

In this experiment, the severity grade of DR was classified using the Random Forest method based on the data extracted through the Faster R-CNN. First, the pathological symptoms were detected using the Faster R-CNN, as shown in Fig. 11(a). The data preprocessing step was then performed to extract the area of the pathological symptoms and red lesions that were detected. Fig. 11(b) shows the image that results from preprocessing the pathological symptoms in Fig. 11(a). The preprocessed data were used to calculate the number of pixels and the area of the hemorrhage lesions. Then the maximum distance, class, and score of each object obtained from the Faster R-CNN were transmitted to the Random Forest classifier. The feature data were scaled to values in the range of zero to one. The total size of the decision trees was adjusted in consideration of the limited memory, and it was further reduced during the training stage, while observing performance changes, to prevent overfitting and obtain a stable model. Table 6 shows a confusion matrix that summarizes the classification results. Out of the 228 test images, 58 and 165 images were classified as TP and TN, respectively. Compared to the classification based on the area ratio of pathological symptoms in the previous experiment, the number of FN results decreased from 15 to 3. This demonstrates that a classifier based on the Random Forest method, combined with preprocessing of the data detected through Faster R-CNN, can produce excellent results.

Table 6.
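The scaling of the feature data to the range of zero to one, mentioned in this section, can be sketched as follows. The paper does not state which scaling method was used; column-wise min-max scaling is assumed here, and the function name and the sample feature rows (pixel count, maximum distance, class type, class score) are hypothetical.

```python
def min_max_scale(rows):
    """Scale every feature column of a list of rows to the range [0, 1].

    A column whose values are all equal is mapped to 0.0 to avoid
    division by zero.
    """
    columns = list(zip(*rows))
    lows = [min(col) for col in columns]
    highs = [max(col) for col in columns]
    return [
        [
            (value - low) / (high - low) if high > low else 0.0
            for value, low, high in zip(row, lows, highs)
        ]
        for row in rows
    ]


if __name__ == "__main__":
    # Hypothetical rows: pixel count, maximum distance, class type, class score.
    features = [
        [100, 10.0, 0, 0.5],
        [300, 30.0, 1, 1.0],
        [200, 20.0, 1, 0.75],
    ]
    for row in min_max_scale(features):
        print(row)
```

Tree-based classifiers such as Random Forest do not depend on feature scale for their splits, so this step mainly keeps the features comparable when the same preprocessed data are reused by scale-sensitive methods such as the SVM.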
3.2.3 Comparison of classification methods

To evaluate the performance of the classifier proposed in this study, the SVM and the CNN were selected for comparison and evaluated using the same dataset. The SVM experiment was performed with a linear kernel. After preprocessing with the same data as those used for the Random Forest classifier, the kernel size was adjusted and the model with the highest accuracy was used for the evaluation. For the CNN used in this comparison, the number of convolution layers was increased to allow the network to learn deeper features. The network starts with convolution blocks in which each convolution layer is followed by activation and then batch normalization. All max-pooling is performed with a kernel size of 3×3 and a stride of 2×2. The ReLU was used as the activation function and L2 regularization was applied to the weights and biases. The network was initialized with Gaussian initialization to reduce the initial training time. The loss function used for optimization was the widely used categorical cross-entropy function. The Messidor data were used to train the CNN and the experiment was performed on the 228 images from the DUMC. Table 7 summarizes the comparison and analysis of the results.

Table 7.
The SVM produced better TP, TN, and FP results than the method based on the area ratio of pathological symptoms in the retina. However, it classified four images as FP, suggesting that further improvement of the SVM algorithm should be considered. In terms of accuracy, the SVM outperformed the area-ratio method by 1.3 percentage points. The CNN achieved a specificity of 96.95% and a sensitivity of 90.62%, and its accuracy was 1.75 percentage points higher than that of the SVM. The proposed method, which used the Faster R-CNN and the Random Forest classifier, showed the best accuracy of 97.8%. Furthermore, an examination of the mild fundus images that the SVM had failed to classify revealed that the retinal hemorrhage in one patient's fundus image was similar to that of a patient in whom retinal hemorrhage rarely occurred. Failure to classify such data appropriately therefore seems to be one of the reasons for the misclassifications. The plain CNN also showed good results even though it did not analyze the pathological features of DR explicitly, which suggests that a sufficiently large amount of training data can compensate for this. Table 8 compares the performance of the proposed method with that of other existing classifiers. The other methods are evaluated with various indicators, including accuracy. Sudarmadji et al. [13] showed the highest performance with 99.66%. The datasets used by the research groups include Messidor [14], Kaggle [8], and IDRiD [15]. The study of Hathwar and Srinivasa [12] reported a kappa value of 0.88, indicating almost perfect agreement in the classification result. The study of Ghan et al. [16] reported an F1-score of 92% with an accuracy of 93%.
Most of the studies classified DR as normal or abnormal and then classified the severity grades of the abnormal cases. Each study performed classification into two to five classes. As Table 8 shows, the studies were evaluated under different conditions in terms of dataset, deep learning network architecture, and evaluation index; even so, the accuracy of this research compares well with those methods.

Table 8.
4. Conclusion

This study proposed a DR detection and severity grade classification system with high accuracy using the Faster R-CNN and the Random Forest method. An experiment on the correlation between pathological symptoms and the severity grade of DR found that a data preprocessing step was necessary for efficient classification. The Faster R-CNN algorithm applied in this study was trained to extract features of the pathological symptoms of DR from the given data. Appropriate classification results could then be derived by training the Random Forest classifier on these features. The features consist of the number of pixels for microaneurysms and retinal hemorrhage, the maximum distance between objects, the class type, and the class scores. The proposed classification method based on the Random Forest classifier was compared with existing classification methods such as the SVM and the CNN. The results confirmed that the proposed method achieved better results than the other methods in terms of the performance evaluation indices, including an accuracy of 97.8%. The similar studies introduced in Section 1 reported accuracies of 98.7% [9], 94.6% [10], 96.35% [11], 99.66% [13], 86.67% [15], 93% [16], and 91.04% [17], respectively, so the results of this study are comparable. If a large body of meaningful data can be acquired for training, it will greatly help the classification of moderate and severe grades. In the future, we plan to extend the proposed method and to design and develop a system that can automatically classify detailed severity grades while supporting fast and objective judgment by examiners through real-time classification in conjunction with a fundus imaging system.
Therefore, future work will focus on classifying the severity of DR more accurately by improving and strengthening the internal structure and network of the deep learning algorithms, after obtaining more clinical data and applying more effective image preprocessing.

Biography

Younghoon Jung
https://orcid.org/0000-0001-9161-657X
He received the B.S. degree (2017) from Dankook University, Yongin, Korea, and is currently pursuing the M.S. degree in the Department of Computers in the graduate school of Dankook University. He worked as a graduate student researcher at the Next Generation Terminals in Multimedia & SW Lab. His research interests include R-CNNs, deep learning, and neural networks.

Biography

Daewon Kim
https://orcid.org/0000-0001-6964-9535
He received the M.S. degree (1996) from the University of Southern California, Los Angeles, CA, USA, and the Ph.D. degree (2002) in Electrical and Computer Engineering from Iowa State University, Ames, IA, USA. He is currently a professor in the Department of Applied Computer Engineering at Dankook University, Republic of Korea. His research interests include image and signal processing, deep learning, mobile applications, and nondestructive evaluation.

References