1. Introduction
In the 1980s, several countries, such as the United States and Canada, pioneered precision agriculture, which uses remote sensing, geographic information, global satellite positioning, and automatic computer control technologies to monitor and manage agriculture in real time. These technologies bring many conveniences to agriculture. For example, the optimal amount of fertilizer to apply can be determined.
Furthermore, production can be increased and cost reduced while also reducing pollution. Early detection and prevention can effectively slow the spread of crop diseases. At the same time, fewer pesticides are needed when crop diseases are prevented or controlled at an early stage, which reduces environmental pollution. Timely and accurate crop information is therefore of great significance for society, the economy, and the environment.
Precision farming has opened a new research field in agriculture, and it has also brought many problems and challenges related to environmental effects, plant diseases, crop yield, food safety, and health. Meanwhile, with the emergence of big data technology, machine learning (ML) has been used to address these challenges, because it can boost farm production while reducing environmental impact and maintaining sustainability. Moreover, because of its strong computing ability, ML can make better use of quantitative and qualitative data analysis in smart farming operations. ML has been applied to many areas, such as agriculture [1-3], medicine [4,5], and human-robot interaction [6-8]. ML is the practice of having computers simulate human learning: acquiring new knowledge and continuously improving their own performance. However, designing the feature extractor for ML requires careful engineering and considerable domain expertise, which is time-consuming and demanding in human, material, and financial resources. Conventional ML has not yet overcome this limitation.
In 2006, Geoffrey Hinton showed that deep belief networks (DBN) can be trained with an unsupervised, greedy layer-by-layer algorithm [9], which brought hope for training deep neural networks. Since then, deep learning (DL) has spread widely. The most typical and representative DL models are the restricted Boltzmann machine (RBM) [10], autoencoder (AE) [11], convolutional neural network (CNN) [12], and recurrent neural network (RNN) [13]. DL learns data representations with multiple levels of abstraction through a computational model composed of multiple processing layers. Feature extraction from the input data reduces the dimension of the data and creates a concise description. All the samples are labeled step by step. In other words, a DL network trains on all of the sample data. Compared with traditional ML, DL has attracted wide attention from researchers because of its advantages in various application fields. Many scholars have made remarkable achievements in image classification [14,15], speech recognition [16], and image recognition [17,18] by using DL networks.
DL can effectively extract various features from image and structured data. Hence, DL may be combined with agricultural machinery to support the development of intelligent agricultural equipment. In recent years, DL-based research results have kept emerging in the field of agriculture. This paper investigates the applications and techniques of DL in agriculture and aims to provide agricultural researchers with a reference to DL methods, so that they can retrieve the literature related to their research problems quickly and accurately.
2. Scope
In recent years, the applications of DL in agriculture have achieved remarkable results. First, we searched for papers using the keywords agriculture, deep learning, farming, and convolutional neural network in databases such as Web of Science, IEEE Xplore, Google Scholar, and Baidu Scholar. We then filtered out articles that involved DL but did not apply it to agriculture, which left 54 papers. From these 54, we selected 32 papers published in the last 5 years, ranked by citation index from high to low. This ranking ignores some areas that receive less attention but are still meaningful. The papers are studied in terms of research problems, proposed solutions, data sources, and results. This paper aims to survey DL techniques and their applications in agriculture to provide a guideline and timely reference for the research communities. The thorough use and technical analysis of each paper is the main difference between this paper and other surveys.
Several interesting surveys have been published on the subject of DL in agriculture, including a survey of DL in agriculture [19] and a review of crop yield prediction and nitrogen status estimation using ML in agriculture [20]. Studies close to the topic of this paper are listed in Table 1. Among the relevant review articles of the last 5 years listed in Table 1, the largest number were published in 2018. This indirectly suggests that, since 2018, researchers have been paying more and more attention to the applications of DL in agriculture.
The structure of this paper is as follows. Section 3 gives a brief introduction to DL. Section 4 presents a survey of the selected documents. Section 5 discusses the technical analysis of the papers, and Section 6 discusses the survey. Section 7 finally concludes the paper.
Table 1. Studies close to the topic of the paper
3. A Brief Introduction to Deep Learning
DL discovers distributed feature representations of data by combining low-level features to form more abstract high-level representations of attribute categories or features. Its motivation is to build neural networks that simulate the human brain for analytical learning. DL interprets data (e.g., text, images, video, and sound) by mimicking the way the human brain works.
3.1 Artificial Neural Network
The concept of DL was derived from the study of the artificial neural network (ANN). A DL structure consists of a multi-layer perceptron with multiple hidden layers. An ANN is a collection of neurons connected in an acyclic graph. During ANN training, the gradient becomes increasingly sparse, and training tends to converge to a local minimum. Back-propagation (BP), the typical algorithm for training traditional multi-layer networks, is therefore not ideal even for networks with only a few layers. A simple neural network consists of three parts: an input layer (i), a hidden layer (j), and an output layer (k), as shown in Fig. 1.
Fig. 2. Process of BP neural network model.
At the input layer, variables are input. Computation is performed in the hidden layer, and output is produced at the output layer. The hidden layer contains neurons which rely on activation functions to execute operations.
The transfer function f of each node must be differentiable everywhere; the most common choice is the sigmoid. If the desired output of the kth neuron is $y_{k}^{*}$, the squared error function of the network is as follows:
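Assuming that $y_{k}$ denotes the actual output of the kth output neuron (a symbol not defined in the text) and that the conventional factor of 1/2 is used, the squared error in its standard form can be written as:

```latex
E = \frac{1}{2} \sum_{k} \left( y_{k}^{*} - y_{k} \right)^{2} \tag{1}
```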
The BP algorithm modifies the weights along the negative gradient of the error function in (1). The weight update formula is expressed as follows, where l denotes the layer number:
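In the usual gradient-descent form, and assuming a learning rate $\eta$ and the notation $w_{ij}^{(l)}$ for the weight from node i to node j in layer l (both are notational assumptions, not taken from the text), the update reads:

```latex
w_{ij}^{(l)} \leftarrow w_{ij}^{(l)} - \eta \, \frac{\partial E}{\partial w_{ij}^{(l)}}
```

Here E is the squared error of (1), so each weight moves a small step in the direction that reduces the error.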
The BP neural network has good nonlinear mapping ability and can automatically learn features from training datasets. The training process of BP includes forward propagation of the signal and backpropagation of the error. The output error is calculated in the direction from input to output, while the weights and thresholds are adjusted in the direction from output to input. The training process of the BP neural network model is shown in Fig. 2.
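The two passes described above can be illustrated with a minimal sketch in plain Python. It is not taken from any of the surveyed papers: the function names, the XOR toy data, the hidden-layer size, the learning rate, and the number of epochs are all illustrative assumptions, and per-sample updates are used for brevity.

```python
import math
import random

def sigmoid(z):
    """Logistic sigmoid, which is differentiable everywhere."""
    return 1.0 / (1.0 + math.exp(-z))

def forward(x, W1, b1, W2, b2):
    """Forward propagation: input layer (i) -> hidden layer (j) -> output layer (k)."""
    h = [sigmoid(sum(x[i] * W1[i][j] for i in range(len(x))) + b1[j])
         for j in range(len(b1))]
    y = [sigmoid(sum(h[j] * W2[j][k] for j in range(len(h))) + b2[k])
         for k in range(len(b2))]
    return h, y

def train_bp(samples, n_hidden=4, lr=0.5, epochs=5000, seed=0):
    """Train a one-hidden-layer network; returns the parameters and the loss per epoch."""
    rng = random.Random(seed)
    n_in, n_out = len(samples[0][0]), len(samples[0][1])
    W1 = [[rng.uniform(-1, 1) for _ in range(n_hidden)] for _ in range(n_in)]
    b1 = [0.0] * n_hidden
    W2 = [[rng.uniform(-1, 1) for _ in range(n_out)] for _ in range(n_hidden)]
    b2 = [0.0] * n_out
    history = []
    for _ in range(epochs):
        total = 0.0
        for x, target in samples:
            h, y = forward(x, W1, b1, W2, b2)
            # Squared error with the conventional factor of 1/2.
            total += 0.5 * sum((t - o) ** 2 for t, o in zip(target, y))
            # Error terms, computed from the output layer back to the hidden layer.
            d_out = [(y[k] - target[k]) * y[k] * (1 - y[k]) for k in range(n_out)]
            d_hid = [sum(d_out[k] * W2[j][k] for k in range(n_out)) * h[j] * (1 - h[j])
                     for j in range(n_hidden)]
            # Negative-gradient updates of weights and thresholds (biases).
            for j in range(n_hidden):
                for k in range(n_out):
                    W2[j][k] -= lr * d_out[k] * h[j]
            for k in range(n_out):
                b2[k] -= lr * d_out[k]
            for i in range(n_in):
                for j in range(n_hidden):
                    W1[i][j] -= lr * d_hid[j] * x[i]
            for j in range(n_hidden):
                b1[j] -= lr * d_hid[j]
        history.append(total)
    return (W1, b1, W2, b2), history

# Toy nonlinear mapping (XOR), chosen only for illustration.
XOR = [([0, 0], [0]), ([0, 1], [1]), ([1, 0], [1]), ([1, 1], [0])]
params, history = train_bp(XOR)
print(f"loss: first epoch {history[0]:.4f}, last epoch {history[-1]:.4f}")
```

The error is accumulated during the forward pass, and the weights and thresholds are adjusted from the output layer back toward the input layer, which mirrors the order described in the text.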
3.2 Convolutional Neural Network
CNN is a DL algorithm built on a deep feedforward ANN. With weight sharing, a deep layered structure, and strong learning ability, CNN is capable of solving more complex problems with a larger model and producing gratifying results. CNN has also made significant breakthroughs in a large number of applications, such as speech recognition [28-30], language translation [31-33], image recognition [34-37], and information retrieval [38-41]. Nevertheless, it is not comparable with ANN in solving large-scale problems.
CNN is usually made up of three parts: the convolutional layer, the pooling layer, and the fully connected layer, as shown in Fig. 3. The items from left to right in Fig. 3 are the input, convolutional layer, rectified linear unit (ReLU) layer, pooling layer, convolutional layer, ReLU layer, pooling layer, fully connected layer, and softmax layer. The convolutional layer forms a set of filters to extract various features from an image. Several pooling methods exist, such as MaxPooling and AveragePooling. MaxPooling, which is widely used, reduces the size of the feature map while retaining the corresponding features. Therefore, it is mainly used for dimension reduction. The convolutional layer and the pooling layer are usually used together for feature extraction from input images. After multiple convolutions, the highly abstracted features are integrated in the fully connected layer and can be normalized to output a probability for each class. The input can then be classified according to the probabilities obtained from the fully connected layer. In general, low-level convolutions describe simple structures such as lines and textures, whereas high-level convolutions represent detailed features; high-level features are obtained from combinations of low-level ones. With weight sharing and no difficulty in processing high-dimensional data, CNN can extract features automatically and performs exceptionally well on classification and prediction. However, CNN employs a gradient descent algorithm, which can become trapped in a local minimum, and the network is prone to overfitting. The pooling layer also loses much valuable information.
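The layer sequence described above can be sketched in a few lines of plain Python. This is only an illustrative forward pass under stated assumptions: the function names, the 6×6 toy image, the vertical-edge kernel, and the fully connected weights `fc_weights` are hypothetical and are not taken from any surveyed model.

```python
import math

def conv2d(image, kernel):
    """Valid convolution with stride 1 (computed as cross-correlation, as most DL libraries do)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(image) - kh + 1, len(image[0]) - kw + 1
    return [[sum(image[r + a][c + b] * kernel[a][b]
                 for a in range(kh) for b in range(kw))
             for c in range(out_w)]
            for r in range(out_h)]

def relu(fmap):
    """Rectified linear unit applied element-wise."""
    return [[max(0.0, v) for v in row] for row in fmap]

def max_pool(fmap, size=2):
    """Non-overlapping max pooling: shrinks the feature map, not the kernel."""
    return [[max(fmap[r + a][c + b] for a in range(size) for b in range(size))
             for c in range(0, len(fmap[0]) - size + 1, size)]
            for r in range(0, len(fmap) - size + 1, size)]

def softmax(scores):
    """Normalize scores into probabilities that sum to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Toy 6x6 image with a vertical edge between columns 2 and 3.
image = [[0, 0, 0, 1, 1, 1] for _ in range(6)]
# Hypothetical low-level filter that responds to vertical edges.
edge_kernel = [[-1, 0, 1], [-1, 0, 1], [-1, 0, 1]]

conv_out = conv2d(image, edge_kernel)   # 4x4 feature map
pooled = max_pool(relu(conv_out))       # 2x2 feature map
features = [v for row in pooled for v in row]

# Hypothetical fully connected weights for two classes ("edge" vs. "no edge").
fc_weights = [[0.1, 0.1, 0.1, 0.1], [-0.1, -0.1, -0.1, -0.1]]
scores = [sum(w * f for w, f in zip(ws, features)) for ws in fc_weights]
probs = softmax(scores)
print(probs)
```

A real CNN learns the filter and the fully connected weights by back-propagation instead of fixing them by hand; the sketch only shows how data flows through the layers.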
Before training starts, some hyperparameters of the CNN must be set, such as the number and size of filters, the stride of the pooling layer, the amount of zero padding, the batch size, and the learning rate. Once the hyperparameters are set, they do not change during training. Training images are fed into the CNN in batches. After training, a new image is input into the CNN, and the network performs the forward propagation process again to calculate the probability that the image belongs to each category. The training process of CNN is shown in Fig. 4.
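To see how these hyperparameters interact, the spatial size of a feature map after a convolution or pooling layer can be computed with the standard formula (n - f + 2p) / s + 1. The sketch below assumes square inputs and filters; the dictionary `hyperparams` and all of its values are hypothetical.

```python
def output_size(n, f, stride=1, pad=0):
    """Spatial size after sliding an f x f window over an n x n input."""
    return (n - f + 2 * pad) // stride + 1

# Hypothetical hyperparameters, fixed before training starts.
hyperparams = {
    "filter_size": 5,
    "num_filters": 16,
    "pool_size": 2,
    "pool_stride": 2,
    "zero_padding": 0,
    "batch_size": 32,
    "learning_rate": 0.01,
}

size = 32  # e.g., a 32x32 input image
after_conv = output_size(size, hyperparams["filter_size"],
                         stride=1, pad=hyperparams["zero_padding"])
after_pool = output_size(after_conv, hyperparams["pool_size"],
                         stride=hyperparams["pool_stride"])
print(size, "->", after_conv, "->", after_pool)  # 32 -> 28 -> 14
```

Choosing the filter size, stride, and padding therefore fixes the shape of every later layer, which is one reason these values are set once and kept constant during training.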
Fig. 4. The training process of CNN.
As can be seen by comparison with Fig. 2, the idea of CNN originated from the BP neural network. The BP neural network is a multi-layer feedforward neural network trained by the error back-propagation algorithm. It has strong nonlinear mapping ability and a flexible network structure. However, CNN is additionally capable of representation learning and, owing to its hierarchical structure, can classify input information in a translation-invariant manner.
4. Applications of Deep Learning in Agriculture
This section describes the surveyed papers related to the applications of deep learning in agriculture, and Table 2 summarizes the relevant papers.
4.1 Plant Domain
With the development of agricultural modernization, the area under large-scale cultivation keeps increasing. DL has a wide range of applications in crop planting, such as the detection of plant diseases, species classification, and prediction of crop yield.
In agricultural production, crop diseases in particular need to be detected to improve productivity. There are many plant species to inspect and many types of disease. Relying on professionals to visually observe the disease situation of crops in the planting area demands a large amount of human labor and is inefficient and imprecise. Therefore, automated computer vision technologies are desired to help solve the problem of disease identification in agricultural production. Several works have applied DL to crop disease classification or detection. Ha et al. [42] proposed a highly accurate system to detect radish disease (Fusarium wilt), in which radishes were classified as diseased or healthy by a deep convolutional neural network (DCNN). Ma et al. [43] developed a DCNN to recognize four types of cucumber diseases. Compared with conventional methods (e.g., RF, SVM, and AlexNet), the DCNN detected cucumber diseases better, with 93.41% accuracy. Similar to [43], Lu et al. [44] used CNNs to identify ten types of rice diseases with 95.48% accuracy, which demonstrated the superiority of CNN-based models over DCNN in identifying rice diseases. Liu et al. [45] presented a novel AlexNet-based model to detect four types of common apple leaf diseases. The approach achieved 97.6% accuracy and improved the robustness of the CNN model in experiments. Considering food security issues, Mohanty et al. [46] used a CNN model to identify 26 types of diseases across 14 crop species. The model demonstrated excellent performance, which proved it feasible and robust for disease detection. Tran et al. [47] presented a system for monitoring the growth process and increasing tomato production. It classified nutrient deficiencies and pathology during growth. Based on the output of the system, agriculture experts decide on corresponding measures to resolve the symptoms. Fuentes et al. [48] used three DL meta-architectures: faster region-based convolutional neural network (Faster R-CNN), region-based fully convolutional network (R-FCN), and single shot multibox detector (SSD). Each of them was combined with a feature extractor, such as VGG or ResNet, to detect plant diseases and pests. The work showed that the developed models can effectively detect nine types of diseases and pests in complex surroundings. Wang et al. [49] diagnosed disease severity by fine-tuning CNNs with transfer learning on the PlantVillage dataset; the best model achieved 90.4% accuracy.
Crop classification and identification are the critical initial stages of an agricultural monitoring system. Precise identification of crop types not only allows accurate estimation of crop planting area, structure, and spatial distribution but also provides the input parameters of crop yield estimation models. Zhong et al. [50] presented a Conv1D-based classification framework for identifying crop growth patterns and crop types using DL applied to time-series remotely sensed data. Their work showed that the framework was effective in representing time series in multi-temporal classification tasks. Milioto et al. [51] presented a system to detect and classify sugar beets and weeds with outstanding performance. Ghazi et al. [52] combined transfer learning with popular CNN architectures, including VGGNet, AlexNet, and GoogLeNet, to recognize plant types. They analyzed the parameters of the networks and adjusted them to improve performance. Their model placed third in PlantCLEF2016. Zhu et al. [53] used an improved Inception V2 architecture to identify plant species. Experiments with real scenes showed that the proposed method was more accurate than Faster R-CNN in identifying leaf species in a complex environment. Finally, to boost fruit production and quality, Dias et al. [54] developed a robust system to recognize apple flowers using CNN.
Crop yield prediction, which estimates production before harvest, is another area of study in planting. It provides forecast data based on region, crop, and multiple forecast surveys at different growth stages. To observe the growth of apples at every stage, Tian et al. [55] put forward a YOLOV3-dense model to detect apple growth and estimate yield, using data augmentation to avoid overfitting. The orchard in their study involved fluctuating illumination, complex backgrounds, and overlapping fruits. Their approach was found to be valid for real-time application in apple orchards. Rahnemoonfar and Sheppard [56] used an improved Inception-ResNet model to accurately estimate fruit yield in terms of the number of fruits. The model was efficient even under complex conditions for the fruits.
4.2 Animal Domain
As concern for animals grows, DL technologies have been adopted in the animal domain for monitoring and improving the animal breeding environment and the quality of meat products. DL-based face recognition and behavior analysis of pigs and cows is a very active topic in applied research. To develop an automatic method for recognizing nursing interactions on animal farms, Yang et al. [57] showed that a fully convolutional network combining spatial and temporal information was able to detect nursing behaviors, which was a significant step forward in identifying nursing behaviors on pig farms. Qiao et al. [58] presented a Mask R-CNN architecture for cattle contour extraction and instance segmentation in a complex feedlot environment. The method was trained and tested on a challenging dataset. Kumar et al. [59] used DL techniques based on nose pattern characteristics to identify cattle, in order to address the loss or exchange of animals and inaccurate insurance claims. Inspired by work on face recognition, Hansen et al. [60] proposed a CNN-based model to recognize pigs. To predict the commercial value of sheep, Jwade et al. [61] built an automatic system to recognize sheep types in a sheep environment and reached 95.8% accuracy. Tian et al. [62] proposed a counting CNN to count pigs and obtained a mean absolute error (MAE) of 1.67 per image.
4.3 Land Cover
Land cover change is an active area of research in global change. Land cover changes affect not only the natural basis of human survival and development, such as climate, soil, vegetation, water resources, and biodiversity, but also the structure and function of the Earth's biogeochemical cycles as well as the energy and material circulation of the Earth system. A fundamental task in land cover change research is cover classification. Kussul et al. [63] presented a multi-level DL technique that classified crop types and land cover from Landsat-8 and Sentinel-1A RS satellite imagery with nineteen multitemporal scenes. Gaetano et al. [64] proposed a two-branch end-to-end model called MultiResoLCC. The model extracted characteristics of land covers and classified them by combining their attributes at the PAN resolution. Scott et al. [65] trained a DCNN model and used transfer learning and data augmentation to classify land covers in remote sensing imagery. Xing et al. [66] used improved architectures based on VGG16, ResNet-50, and AlexNet to validate land cover, and the results showed that the proposed method was effective, with an accuracy of 83.80%. Mahdianpari et al. [67] presented a survey of DL tools for the classification of wetland classes and examined the power of seven deep networks using multispectral remote sensing imagery.
4.4 Other Domains
The development of smart agriculture inevitably requires automated machines. To operate safely without supervision, such machines should be able to detect and avoid obstacles. Christiansen et al. [68] detected unusual surrounding areas or unknown target types, including distant and occluded targets, using DeepAnomaly, which combined DL algorithms. Compared with Faster R-CNN and most CNN models, DeepAnomaly had better performance and accuracy and required less computation and fewer parameters for image processing, which made it suitable for real-time systems. In contrast to [68], the work by Steen et al. [69] can detect an obstacle with high accuracy in row crops and grass mowing. However, it cannot recognize people and other distant objects. Khan et al. [70] used popular DL networks to estimate a vegetation index from RGB images. They used a modified AlexNet deep CNN with Caffe as the base framework for implementation. Kaneda et al. [71] presented a novel prediction system for plant water stress to reproduce tomato cultivation. Song et al. [72] combined DBN and MCA to predict soil moisture in the Zhangye oasis, Northwest China. Wang et al. [73] used CNN, ResNet, and the modified architecture ResNeXt to detect damaged blueberries. Mandeep et al. [74] employed an H2O model to estimate evapotranspiration in Northern India and obtained better performance than four learning methods, including DL, generalized linear model (GLM), random forest (RF), and gradient boosting machine (GBM).
Table 2. Applications of deep learning in agriculture
5. Techniques of Deep Learning in Agriculture
The surveyed papers used numerous DL techniques to address their respective problems. CNN was adopted most often as the backbone [42-44,54,60,62-64,68,70], especially AlexNet and VGG. A number of the applied methods were improved versions of these networks [45,49,69,70], and a few works combined CNN with other approaches [54,62,68,71,72] to achieve better experimental results. Table 3 classifies the papers into several groups according to DL techniques.
Caffe is a clear and efficient framework that is widely used in DL due to its expressiveness, high speed, and openness. Caffe was employed in 80% of the surveyed works, including [42,52,54,65,69,70], followed by TensorFlow [56,64,65]; MATLAB [68] is also commonly used. Table 4 shows the classification of the papers according to the frameworks.
Another important step is data preprocessing, which includes data cleaning, data conversion, and dimensionality reduction. Data cleaning is mainly used to ensure the integrity of the specific characteristics of the data. Data conversion transforms data from one format or structure to another so that it meets the requirements of the DL model. Dimensionality reduction removes irrelevant and redundant variables, reduces the complexity of analysis and model generation, and improves modeling efficiency. The most common preprocessing method is image resizing, accompanied by image segmentation, scaling, and normalization. In the papers under study, each image was resized to a particular size, such as 256×256 [49], 32×32 [73], or 200×200 [42].
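As a rough illustration of resizing and normalization, the sketch below resizes a grayscale image stored as nested lists using nearest-neighbor sampling and rescales pixel values to the range [0, 1]. The function names, the interpolation method, and the toy image are assumptions made for illustration; the surveyed papers do not specify their resizing procedures here.

```python
def resize_nearest(image, new_h, new_w):
    """Resize a 2-D image (list of rows) by nearest-neighbor sampling."""
    old_h, old_w = len(image), len(image[0])
    return [[image[r * old_h // new_h][c * old_w // new_w]
             for c in range(new_w)]
            for r in range(new_h)]

def normalize(image, max_value=255.0):
    """Scale pixel intensities from [0, max_value] to [0, 1]."""
    return [[p / max_value for p in row] for row in image]

# Toy 4x4 grayscale image with 8-bit intensities.
image = [
    [0, 64, 128, 255],
    [10, 20, 30, 40],
    [255, 128, 64, 0],
    [1, 2, 3, 4],
]

small = resize_nearest(image, 2, 2)    # downscale to 2x2
large = resize_nearest(image, 8, 8)    # upscale to 8x8
model_input = normalize(small)
print(small)
```

In practice an image library would be used with a target size such as 256×256, but the principle is the same: every image is mapped to one fixed shape and value range before it enters the network.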
A DL model with a relatively complex architecture is generally composed of multiple layers of nonlinear learners, and the data to be analyzed come from the natural environment. For the DL model to generalize better, the training sample size should be increased as much as possible. The most widely used data augmentation techniques include random image rotation, cropping, translation, and horizontal and vertical flipping. Data augmentation was used to improve model performance in [43,54,55,58,60,65,73]. Another technique to avoid overfitting is dropout, which sets the activation values of randomly selected neurons to zero during training. This is a very efficient way of performing model averaging with neural networks [50,51].
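Both ideas can be sketched in plain Python. The helper names below are hypothetical, the image is a small nested list, and the dropout shown is the common "inverted" variant, which additionally rescales the surviving activations; that rescaling is an assumption and is not described in the text.

```python
import random

def hflip(image):
    """Horizontal flip: reverse each row."""
    return [row[::-1] for row in image]

def vflip(image):
    """Vertical flip: reverse the order of the rows."""
    return [row[:] for row in image[::-1]]

def rotate90(image):
    """Rotate the image 90 degrees clockwise."""
    return [list(row) for row in zip(*image[::-1])]

def augment(image):
    """Return the original image plus three transformed copies."""
    return [image, hflip(image), vflip(image), rotate90(image)]

def dropout(activations, rate, rng):
    """Inverted dropout: zero each activation with probability `rate`
    and divide the survivors by (1 - rate). Used during training only."""
    keep = 1.0 - rate
    return [a / keep if rng.random() < keep else 0.0 for a in activations]

image = [[1, 2], [3, 4]]
variants = augment(image)

rng = random.Random(0)
dropped = dropout([1.0] * 10, rate=0.5, rng=rng)
print(len(variants), dropped)
```

Each augmented copy keeps the label of the original image, so one labeled sample yields several training samples, while dropout leaves the data unchanged and regularizes the network itself.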
To evaluate the effectiveness of DL, accuracy [46,50,51,55,64,73], recall [54], root mean square error (RMSE) [70,72,74], F1 score [46,55,68,73], and other evaluation indexes were adopted, as shown in Table 1. These analysis results indicate that the DL-based methods are superior to other implementation mechanisms. In plant disease and insect pest detection and in plant identification or classification, DL has shown outstanding performance in terms of identification accuracy, identification speed, robustness, and generalization. In particular, DL achieved more than 95% identification accuracy in several works.
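For reference, the evaluation indexes named above can be computed as in the following sketch for a binary classification task and a regression task. The function names and the toy labels are illustrative assumptions; the mean absolute error (MAE) reported in Section 4.2 is included for completeness.

```python
import math

def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, fn, tn) for a binary task."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p != positive)
    return tp, fp, fn, tn

def accuracy(y_true, y_pred):
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)

def precision(y_true, y_pred):
    tp, fp, _, _ = confusion_counts(y_true, y_pred)
    return tp / (tp + fp) if tp + fp else 0.0

def recall(y_true, y_pred):
    tp, _, fn, _ = confusion_counts(y_true, y_pred)
    return tp / (tp + fn) if tp + fn else 0.0

def f1_score(y_true, y_pred):
    """Harmonic mean of precision and recall."""
    p, r = precision(y_true, y_pred), recall(y_true, y_pred)
    return 2 * p * r / (p + r) if p + r else 0.0

def rmse(y_true, y_pred):
    """Root mean square error for regression outputs."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def mae(y_true, y_pred):
    """Mean absolute error, e.g., for counting tasks."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

# Toy labels: 1 = diseased, 0 = healthy.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]
print(accuracy(y_true, y_pred), recall(y_true, y_pred), f1_score(y_true, y_pred))
```

Accuracy alone can be misleading when classes are imbalanced, which is one reason recall and the F1 score are often reported alongside it.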
Transfer learning transfers a trained model, together with its parameters, to another model so that the model can be reused. Its purpose is to address the difficulty of data acquisition. Many researchers [49,52,54,65] incorporated transfer learning techniques.
The learning rate is an essential hyperparameter in supervised learning and DL. It determines whether and how quickly the objective function converges to a local minimum. Different learning rates were used in the surveyed papers, such as 0.01 [42] and 0.001 [43]. Choosing an appropriate learning rate is vital for the objective function to converge to a local minimum in a reasonable time.
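The effect of the learning rate can be seen on a toy problem. The sketch below runs gradient descent on the hypothetical objective f(w) = (w - 3)^2, whose minimum is at w = 3; the objective, the starting point, and the step count are assumptions chosen only for illustration, while 0.01 and 0.001 are the rates mentioned above.

```python
def gradient_descent(lr, steps=100, w0=0.0):
    """Minimize f(w) = (w - 3)^2 with a fixed learning rate."""
    w = w0
    for _ in range(steps):
        grad = 2.0 * (w - 3.0)   # derivative of (w - 3)^2
        w -= lr * grad
    return w

for lr in (0.001, 0.01, 1.1):
    w = gradient_descent(lr)
    print(f"lr={lr}: w={w:.4g}, distance from minimum={abs(w - 3.0):.4g}")
```

With the same number of steps, the rate 0.01 ends closer to the minimum than 0.001, whereas an overly large rate such as 1.1 overshoots on every step and diverges. Real DL objectives are far less regular, so the appropriate rate is usually found empirically.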
6. Discussion
6.1 Advantages/Disadvantages of Deep Learning
Manual feature engineering is still not an easy task, and traditional methods of feature extraction require significant human effort. DL, however, can not only improve performance in classification and detection, as shown in Table 2, but also reduce the effort spent on feature engineering. Besides, to deal with real-world issues, DL may stimulate the creation of more databases for training networks [42,47,48,50-62,69-74]. Meanwhile, DL also has good generalization performance [46,50,51,56,61,62,65,67,68,70,72]. However, DL cannot estimate the underlying patterns of data without bias. Therefore, to achieve higher accuracy, a large amount of data is needed. Although the data augmentation techniques mentioned in Section 5 can increase the size of the dataset, a significant number of images is still required. Another notable disadvantage is data annotation, which requires expertise to annotate accurately enough to improve performance. In some areas, experts or labeling volunteers are scarce. Moreover, training is time-consuming in DL, especially when the input image size is large.
6.2 Future of Deep Learning in Agriculture
In this paper, the applications of DL in agriculture are grouped into crop type classification, disease detection, weed detection, fruit counting, yield prediction, land cover classification, water stress estimation, and others. From the above analysis, we observe that CNN performs better in terms of precision. A fair comparison between DL-based methods and other techniques in the literature presupposes the same experimental environment. In practice, however, this is very difficult because each paper employed different datasets, techniques, models, and metrics. Nevertheless, DL is observed to outperform traditional methods such as ANN, SVM, RF, and others. Automatic feature extraction using DL models is more efficient than conventional feature extraction. To improve performance in classification and prediction, more techniques will be adopted to solve practical agricultural problems in the future. Long short-term memory (LSTM) [75] and RNN models [76] can capture the time dimension and retain memory. Thus, they can be used to estimate plant and animal growth based on previously recorded data and to assess fruit yield or water needs. The two models can also be applied to the environment, for instance, to predict climate change and other phenomena. Using infrared thermal imaging and hyperspectral imaging technologies [77] to provide data is a promising direction for the early detection of crop diseases.
7. Conclusion
In this paper, we have surveyed deep learning-based research efforts in the agriculture domain over the last 5 years. We have analyzed 32 works on the applications of deep learning and the technical details of their implementation. Each work was compared with existing techniques in terms of performance. We found that deep learning performed better than other technologies. Moreover, with the advances in computer hardware, we think that deep learning will receive more attention and broader application in future research. This paper aims to encourage more researchers to study deep learning to solve agricultural problems such as recognition, classification or prediction, relevant image analysis and data analysis, or more general computer vision tasks.
Acknowledgement
This work was supported by the Ministry of Education of the Republic of Korea (No. 2019R1I1A3A01060826).