Dong-Gun Lee* and Yeong-Seok Seo*
Systematic Review of Bug Report Processing Techniques to Improve Software Management Performance
Abstract: Bug report processing is a key element of bug fixing in modern software maintenance. Bug reports are not processed immediately after submission; they pass through several steps, such as bug report deduplication and bug report triage, before bug fixing begins. This approach is very inefficient because all these steps are performed manually. Software engineers have persistently highlighted the need to automate these steps, and many automation techniques have consequently been proposed for bug report processing; however, the accuracy of the existing methods is not satisfactory. Therefore, this study surveys existing bug report processing techniques with the aim of improving their accuracy. The review of each method consists of a description, the techniques used, the experiments, and the comparison results. The results indicate that research on bug report deduplication is still lacking and that more studies integrating clustering and natural language processing are required. They further indicate that, although all studies in the field of triage are based on machine learning, studies on deep learning remain insufficient.
Keywords: Bug Report, Clustering, Duplication Detection, Information Retrieval, Machine Learning, Priority, Severity, Software Developer Assignment, Software Management, Triage
1. Introduction
As modern software development grows constantly more complex, developers are unable to avoid bugs. A bug is any unexpected result, and bugs significantly reduce the quality of software if left unfixed. Thus, bug fixing is an extremely important task for developers.
In modern development, developers primarily address bugs through bug reports. A bug report is a description of what a quality assurance team or users think is a discrepancy between the actual and the promised outcomes of a software product. Addressing bug reports is indispensable in software maintenance, and thus, software engineers expect a bug tracking system to be efficient [1,2].
Bug report processing covers all steps from submitting the bug report to fixing the bug [2-5]. Fig. 1 shows how bugs are addressed at each stage of processing. As seen from Fig. 1, many stages are involved before the “Fixed” phase is reached. The “Assigned” stage involves the most important task, the allocation of bug reports to appropriate developers, and thus requires more effort than the other stages [7-9]. There are two main reasons for this.
First, a duplicate bug report is a report on a bug that has already been solved. Duplicate bug reports account for up to 30% of all bug reports in Firefox. Reassigning already solved bug reports to developers is therefore wasted effort, and it is desirable to bundle these duplicate bug reports as one set. However, automatic duplicate bug report identification techniques generally do not work perfectly, and therefore, duplicate bug report identification is performed manually.
Second, bug report triage is the classification of bug reports by their meta fields, such as priority or severity. Numerous bug reports are submitted daily, and it is impossible for developers to process all of them. Thus, developers need to first address the bugs that are fatal to systems or that inconvenience users. This indicates that bug report triage is essential for software maintenance. Because the duplicate bug report identification process is included in bug report triage, bug report triage is also performed manually. In the case of Eclipse, two man-hours are required every day. To compensate for this, many triage automation techniques have been proposed, but they do not show satisfactory accuracy.
In this study, we investigate bug report processing techniques in two fields (duplicate bug report identification and triage accuracy improvement) based on three factors: the techniques used, the experiment target, and the comparison target, with the aim of improving software maintenance performance. In the case of duplicate bug report identification, we focus on topic modeling, natural language processing, information retrieval, and clustering based on similarity. Most deduplication studies use Mozilla (or software developed by Mozilla, such as Firefox), Eclipse, and Open Office to configure their testing sets. A few of them have proposed methods that integrate 2–3 techniques. In the case of improving bug report triage accuracy, we focus on classification techniques such as k-nearest neighbors (KNN), naïve Bayes (NB), and support vector machine (SVM).
The rest of this paper is organized as follows: Section 2 introduces the form of bug reports in a modern development environment. Section 3 reviews studies on identifying duplicate bug reports. Section 4 presents studies on improving the accuracy of bug report triage according to meta fields such as priority or severity. Finally, Section 5 concludes the paper and discusses some possible future issues.
2. Bug Report Features in Modern Software Development Environment
2.1 Form of Bug Reports
A bug report is a document, written in a specific format, that users or a QA team use to communicate bugs that cause program failures to developers. Figs. 2–4 show bug reports in JIRA, Mantis, and Bugzilla. The bug report in Fig. 2 is for Block Chain. It was submitted by the reporter using “Mekia Edwards” as the identifier. It is classified as having a “Highest” priority, and its status is Unresolved. The bug report in Fig. 3 is for the MantisBT project. It was submitted by the reporter using “shashi1” as an identifier. It is classified as having an “emergency” priority and a severity of “important”, and its status is Opened. The bug report in Fig. 4 is for Android. It was submitted by the reporter using “philipp” as an identifier.
The metadata includes the following components:
1) Product: Describes the environment in which a bug occurs.
2) Component: Classifies the items based on their closeness to meta fields such as priority and severity. These metadata can be used as criteria for classifying bug reports.
Priority ranks a bug relative to others based on its importance, and severity indicates how critical a bug is; thus, priority and severity are the most important criteria in bug classification, which is simply called “Triage”. The accuracy of triage is directly related to the quality of the software used for classification. In the case of Bugzilla, priority is on a scale of p1 to p5, with p1 being the highest priority. In the same case, severity is classified as Blocker (fatal bug rendering development or patching of the software impossible), Critical (fatal bug affecting program execution), Major (important functional flaw), Normal (common error), Minor (not an important functional flaw, or an error that can be easily fixed), Trivial (typographical error), and Enhancement (function and performance improvements).
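The priority and severity scales described above can be modeled as ordered values, which makes the triage ordering explicit. The following is a minimal sketch, not part of any bug tracker's API: the names `Severity`, `BugReport`, and `triage_order` are hypothetical, and sorting by priority first and severity second is an assumed policy.

```python
from dataclasses import dataclass
from enum import IntEnum


class Severity(IntEnum):
    """Bugzilla-style severity levels; a lower value means a more severe bug."""
    BLOCKER = 0
    CRITICAL = 1
    MAJOR = 2
    NORMAL = 3
    MINOR = 4
    TRIVIAL = 5
    ENHANCEMENT = 6


@dataclass(frozen=True)
class BugReport:
    bug_id: int
    summary: str
    priority: int  # 1 (p1, highest) to 5 (p5, lowest)
    severity: Severity


def triage_order(reports):
    """Sort reports by priority, then by severity (an assumed policy)."""
    return sorted(reports, key=lambda r: (r.priority, r.severity))
```

With this ordering, a p1 Blocker report is handled before a p1 Critical report, and both come before any p3 report.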
2.2 Bug-Handling Process
Once a bug appears, it is processed according to the bug report processing cycle, as presented in Fig. 1. A bug is addressed at each stage as follows:
· New: A new bug is first logged or found. A new bug is usually found by a user or the QA team. It is then moved to the “Assigned” stage where it will be classified to determine the type of handling.
· Assigned: After a tester logs a bug, this stage executes the triage. Triaged bugs are classified by their priority and/or severity. If the triage determines that a bug should be rejected, it is moved to the “Rejected” phase. If the triage accepts a bug, it is assigned to the appropriate developer based on its priority and severity.
· Open: This stage analyzes and corrects the bugs assigned to developers. When the bug is fixed, it is moved to the “Fixed” stage.
· Fixed: In this stage, the developer sends the bug to the test team after fixing it. Based on the results of the test, the bug may be moved to the “Reopen” or “Retest” stages.
· Reopen: In this stage, if a bug fixed by a developer still exists, it is sent back to the “Assigned” stage and the succeeding stages follow.
· Rejected: If the developer feels the bug is not genuine, the bug is rejected. Then, the status of the bug is changed to “Rejected”.
· Deferred: A bug assigned the “Deferred” status is expected to be fixed in a later release. There are several reasons for assigning the “Deferred” status to a bug, such as the low priority of the bug, lack of time before the release, and the absence of any major effect of the bug on the software.
· Duplicate: If a bug is reported twice or if two bugs are found to cause the same problem, then the status of one of them is changed to “Duplicate”.
· Retest: In this stage, the tester retests the modified code provided by the developer to check if the defect has been fixed.
· Verified: The tester retests the bug after it is fixed by the developer. If the bug is no longer detected in the software, then the status of the bug is changed to “Verified”.
· Closed: If the tester feels that the bug no longer exists in the software, the status of the bug is changed to “Closed”. This state means that the bug has been fixed, tested, and approved.
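The stages listed above can be read as a small state machine. The sketch below is illustrative only: the transitions New to Assigned, Assigned to Rejected, Open to Fixed, Fixed to Retest or Reopen, and Reopen to Assigned follow the descriptions above, whereas the remaining edges (for example, that “Deferred” and “Duplicate” are reached from “Assigned”, or that “Verified” leads to “Closed”) are assumptions, since the list does not state them. The names `TRANSITIONS` and `advance` are hypothetical.

```python
# Allowed status changes; edges marked "assumed" are not stated in the list above.
TRANSITIONS = {
    "New": {"Assigned"},
    "Assigned": {"Open", "Rejected", "Deferred", "Duplicate"},  # last two assumed
    "Open": {"Fixed"},
    "Fixed": {"Retest", "Reopen"},
    "Retest": {"Verified", "Reopen"},  # assumed
    "Reopen": {"Assigned"},
    "Verified": {"Closed"},  # assumed
    "Rejected": set(),
    "Deferred": set(),
    "Duplicate": set(),
    "Closed": set(),
}


def advance(current, target):
    """Return the new status, or raise ValueError if the change is not allowed."""
    if target not in TRANSITIONS.get(current, set()):
        raise ValueError(f"cannot move a bug from {current!r} to {target!r}")
    return target
```

A bug that is fixed on the first attempt would pass through New, Assigned, Open, Fixed, Retest, Verified, and Closed in this model.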
2.3 Issues in Processing Previous Bug Reports
Typical bug report processing systems generally focus on identifying the types of bug reports. Through this identification, they attempt to automatically classify bug reports and assign the software developers responsible for resolving the reported issues. Thus, the most important point is to identify the type of a bug report accurately. However, several issues hinder accurate type identification. For example, a bug report could be classified as “Trivial” merely because fonts are not rendered on a website. Conversely, a bug report could be classified as “Critical” because a system crash occurred. Even though this classification scheme is not entirely incorrect, it can lead to the wrong classification of bug reports. Suppose there is a font that is not rendered correctly. If the problem is a typo, or if the font appears only on pop-up pages that will disappear in a couple of days, the bug could be trivial. In contrast, incorrectly rendered fonts in the graphical user interface (GUI) that are always displayed on the web page, or fonts that carry important content, can lead to a misunderstanding by the user. In these cases, the bug could be critical even though the problem concerns only a font. In practice, because software developers need advanced techniques to improve bug report processing efficiency, many studies focus on reducing duplicate bug reports and improving bug triage performance by correctly identifying the types of bug reports.
3. Techniques for Reducing Duplicate Bug Reports
When a bug occurs, many users report it to developers. Consequently, developers receive many bug reports on the same bug. Duplicate bug reports create tedious work for developers and decrease their efficiency. Thus, duplicate bug reports should be grouped as one set and processed together. There are many techniques for reducing the number of duplicate bug reports. In this study, we introduce methods for deduplication in three important areas: natural language processing (NLP), information retrieval (IR), and similarity-based classification.
3.1 Natural Language Processing
A common way to identify duplicate bug reports is to use NLP [15,16]. Because bug reports are written in natural language, a wide variety of word forms is generally used. Therefore, for greater accuracy, preprocessing methods such as stemming, lemmatization, and stopword removal are applied [17,18]. Among the NLP techniques, topic modeling is the most widely used; it processes the words in the summary or text of bug reports and selects the criterion words. The topic selected in this process represents a few features of the bug report, and a bug report that has the same topic is regarded as a duplicate bug report [19,20].
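As a rough illustration of the preprocessing mentioned above, the following sketch lowercases a report, removes stopwords, and strips a few suffixes. It is a deliberately crude stand-in: the stopword list and the suffix rules are assumptions made for this example, and a real study would use an established stemmer or lemmatizer instead of `crude_stem`.

```python
import re

# A tiny illustrative stopword list (assumed; real lists are much longer).
STOPWORDS = {"a", "an", "the", "is", "are", "was", "in", "on", "of", "to", "and", "when", "it", "with"}
# Suffixes removed by the crude stemmer, checked in this order.
SUFFIXES = ("ing", "ed", "es", "s")


def crude_stem(token):
    """Strip one common suffix if at least three characters remain."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text):
    """Lowercase, tokenize, drop stopwords, and stem the remaining tokens."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [crude_stem(t) for t in tokens if t not in STOPWORDS]
```

After this step, “The editor crashed” and “editor crashes” reduce to the same token list, which is what makes later similarity measures less sensitive to wording.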
The most frequently used technique for topic modeling is latent Dirichlet allocation (LDA). LDA addresses the drawbacks of the probabilistic model known as probabilistic latent semantic analysis (PLSA). LDA is similar to PLSA in that it obtains the probability of words in a document, but the distribution of topics is assumed to follow the Dirichlet distribution.
Baek et al. identified duplicate bug reports using LDA, NB, and NB multinomial. They compared the obtained results with the conventional machine learning method using Eclipse bug reports. As a result, duplicate bug reports were identified with an accuracy of approximately 80%. In addition, they indicated significant differences in using statistical methods.
Zou et al. proposed the LDA and N-gram (LNG) technique, which consists of two parts. One is topic modeling with the existing LDA method, and the other is a linearly coupled, weight-based N-gram similarity. A new measure, the exact-accuracy (EA) rate, was introduced to verify redundancy. Verification using approximately 230,000 Eclipse bug reports showed that the LNG technique improved the recall rate, precision rate, and EA rate compared with the existing techniques for detecting bug report duplication. In particular, the recall rate was improved by 2.96% to 10.53% compared with DBTM, which is a state-of-the-art approach used in detecting bug report duplication.
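To make the idea of a linearly weighted N-gram similarity concrete, the sketch below combines Jaccard similarities over word unigrams, bigrams, and trigrams. This is not the LNG formulation itself: the use of Jaccard overlap, the weights, and the function names are assumptions chosen for illustration.

```python
def word_ngrams(tokens, n):
    """Return the set of word n-grams in a token list."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def jaccard(a, b):
    """Jaccard overlap of two sets; defined as 0.0 when both are empty."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def ngram_similarity(tokens_a, tokens_b, weights=(0.5, 0.3, 0.2)):
    """Linear combination of n-gram overlaps; weights[k] applies to (k+1)-grams."""
    return sum(
        w * jaccard(word_ngrams(tokens_a, n), word_ngrams(tokens_b, n))
        for n, w in enumerate(weights, start=1)
    )
```

Higher-order N-grams reward reports that share whole phrases rather than isolated words, which is the motivation for weighting several orders together.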
3.2 Information Retrieval
Identification of duplicate bug reports based on information retrieval uses the metadata in bug reports. A bug report contains various metadata, such as bug type, operating system, priority, and severity, which help identify similar bug reports [25-27]. However, because even bug reports from the same environment can sometimes have different solutions, duplicate bug report identification techniques based on information retrieval are often used in conjunction with other methods.
Alipour et al. focused on contextual information for bug report deduplication. They extracted the context implied by bug reports by applying BM25F, a scoring algorithm used in the IR field, to pre-built software dictionaries (word lists). Their study also found that the context extracted from a bug report covers non-functional requirements (which relate closely to the quality of the software rather than its functionality) as well as the subject of the bug report. Alipour et al. constructed the word lists from Android layered architectural words, software non-functional requirement words, Android topic words obtained using LDA, Android topic words obtained using labeled-LDA, and random words from an English dictionary, and evaluated them using 37,236 Android bug reports. The result was an improvement of more than 11.55% in detecting bug report redundancy compared with the result of Sun et al.
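For readers unfamiliar with the BM25 family of scoring functions, the sketch below implements plain Okapi BM25 over tokenized documents. It is a simplification: BM25F additionally weights separate fields (such as summary and description), which is omitted here. The parameter values `k1=1.2` and `b=0.75` are common defaults assumed for the example, and the document list is assumed to be non-empty.

```python
import math
from collections import Counter


def bm25_scores(query, documents, k1=1.2, b=0.75):
    """Score each tokenized document against a tokenized query with Okapi BM25."""
    n_docs = len(documents)
    avg_len = sum(len(d) for d in documents) / n_docs
    # Number of documents that contain each term.
    doc_freq = Counter(term for d in documents for term in set(d))
    scores = []
    for doc in documents:
        tf = Counter(doc)
        score = 0.0
        for term in set(query):
            if term not in tf:
                continue
            idf = math.log(1 + (n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5))
            length_norm = k1 * (1 - b + b * len(doc) / avg_len)
            score += idf * tf[term] * (k1 + 1) / (tf[term] + length_norm)
        scores.append(score)
    return scores
```

In a deduplication setting, the query would be the new bug report and the documents the previously submitted reports, with the highest-scoring reports presented as duplicate candidates.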
Aggarwal et al. supplemented the study of Alipour et al. using a method called software literature context. This method uses words extracted from software engineering literature. Compared with the approach of Alipour et al., which extracts words from the bug reports themselves, this method reduces the manual effort required for deduplication at the cost of a slight loss of accuracy. The authors further validated their method on bug reports from Eclipse, Mozilla, and Open Office in addition to the Android bug reports used by Alipour et al. Their method was also shown to be more generally applicable than that of Alipour et al.
Sun et al. proposed REP, a technique that measures the similarity between two bug reports. REP is an extension of BM25F and uses non-textual fields (component, product, and version) as well as textual content such as the summary and description of bug reports. Using these measures, duplicate bug reports were identified and evaluated on the bug reports of Mozilla, Eclipse, and Open Office, resulting in a 10%–27% increase in recall rate compared with the previous version and a 17%–23% improvement in mean average precision.
Sun et al. proposed a discriminative model for searching similar bug reports in bug tracking systems. The model uses information retrieval techniques to determine the similarity between two bug reports based on 54 features. They applied the model to the bug repositories of three large software projects, Firefox, Eclipse, and Open Office, demonstrating relative improvements of 17%–31%, 22%–26%, and 35%–43% over the state-of-the-art techniques.
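The combination of textual and non-textual fields can be pictured as a weighted sum of a text similarity and field-match indicators. The sketch below is only an illustration of that idea and not the REP formula: the weights, the dictionary keys, and the simple `token_jaccard` text similarity are all assumptions.

```python
def token_jaccard(text_a, text_b):
    """Illustrative text similarity: Jaccard overlap of lowercase word sets."""
    a, b = set(text_a.lower().split()), set(text_b.lower().split())
    return len(a & b) / len(a | b) if a | b else 0.0


def field_aware_similarity(report_a, report_b, text_sim=token_jaccard, weights=None):
    """Weighted sum of summary similarity and exact matches on categorical fields."""
    weights = weights or {"text": 1.0, "product": 0.5, "component": 0.5, "version": 0.2}
    score = weights["text"] * text_sim(report_a["summary"], report_b["summary"])
    for field in ("product", "component", "version"):
        value = report_a.get(field)
        if value is not None and value == report_b.get(field):
            score += weights[field]
    return score
```

Two reports with identical summaries then score higher when they also share a product and component than when they come from different products.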
3.3 Clustering Based on Similarity
The method of identifying duplicate bug reports based on similarity is highly reliable because each bug report is directly compared with the others. This technique can also be used to classify bug reports on a similarity scale and is often used together with clustering because of their high affinity.
Hiew introduced an approach based on topic detection and tracking techniques that had been used for clustering news articles. This approach was applied to the Firefox, Eclipse, Fedora, and Apache projects and achieved a 29% accuracy and a 50% recall rate as the best result, for Firefox.
Gopalan and Krishna proposed a clustering-based technique to identify redundancy in a large set of bug reports. This technique maintains a low false positive rate, i.e., the rate at which normal bug reports are considered redundant. Eclipse, Mozilla, and Open Office were used to evaluate this technique.
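A simple way to see how similarity and clustering fit together is a greedy scheme in which each report joins the first cluster whose representative is similar enough. The sketch below uses cosine similarity over bag-of-words counts; the threshold value, the greedy strategy, and the function names are assumptions for illustration and do not reproduce any of the surveyed techniques.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity of two term-count vectors (Counter objects)."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def cluster_reports(token_lists, threshold=0.5):
    """Greedy clustering; each cluster is a list of report indices, first one is the representative."""
    vectors = [Counter(tokens) for tokens in token_lists]
    clusters = []
    for i, vec in enumerate(vectors):
        for cluster in clusters:
            if cosine(vec, vectors[cluster[0]]) >= threshold:
                cluster.append(i)
                break
        else:
            clusters.append([i])  # no similar cluster found: start a new one
    return clusters
```

Raising the threshold lowers the chance that a normal report is wrongly grouped with others, at the cost of missing some true duplicates.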
3.4 Hybrid Techniques
Achieving satisfactory identification accuracy with one technique is difficult. Therefore, many researchers have begun to use more than one technique in combination. In particular, information retrieval techniques are mainly supplemented by LDA, which is a topic modeling technique from machine learning.
Nguyen et al. proposed DBTM, a technique that has the merits of both topic-based and IR-based features. DBTM uses BM25F and T-Model, an extension of the topic modeling technique LDA. Evaluation of the DBTM technique using bug reports from Eclipse, Open Office, and Mozilla indicated that duplicate bug report identification improved by approximately 20% compared with the Relational Topic Model (RTM) and REP.
Lin et al. proposed the SVM-SBCTC method, an SVM that considers the semantic correlation of text, based on the SVM discriminative scheme (SVM-54). SVM-SBCTC applies all the correlations among NLP (Word2vec), information retrieval (BM25), and clustering-based features. They verified the method on three large open source projects: Apache, ArgoUML, and SVN. Compared with the SVM-54 scheme, the detection performance of SVM-SBCTC improved by 2.79%–28.97% in the top-5 recall rates on the three projects.
Tian et al. extended the study of Jalbert and Weimer. Duplicate bug reports were identified using relative similarity, which includes text similarity, surface features, and clustering. They considered several factors for this extension. First, they used an extension of BM25 rather than the term appearance count as the similarity criterion. The information retrieval community widely uses BM25, and it is more familiar than the term frequency (TF)-based similarity measure. BM25 is also known as the most accurate measure for technical text searches. Second, they applied the “product” meta field of the bug reports; bug reports classified under different products are less likely to be duplicates. Third, they built a hybrid function that examines the top N most similar bug reports rather than only the single most similar one. Using the Mozilla project, Tian et al. demonstrated better performance than Jalbert and Weimer, improving the true positive rate (from 8% to 24%) while keeping the false positive rate low (at 9%). In terms of the harmonic mean of the true positive rate and the true negative rate, the accuracy improved over the previous approach (from 14.8% to 38.6%).
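The harmonic mean used in the comparison above is straightforward to compute. In the sketch below, the function name is hypothetical, and the true negative rate is assumed to be one minus the false positive rate.

```python
def harmonic_mean(true_positive_rate, true_negative_rate):
    """Harmonic mean of two rates in [0, 1]; 0.0 if both are zero."""
    total = true_positive_rate + true_negative_rate
    if total == 0:
        return 0.0
    return 2 * true_positive_rate * true_negative_rate / total
```

Under that assumption, a 24% true positive rate and a 9% false positive rate give `harmonic_mean(0.24, 0.91)`, which is about 0.38. This is close to, but not exactly, the reported 38.6%, presumably because the rates quoted in the text are rounded.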
3.5 Comparative Analysis
Fig. 5 shows the distribution of bug report deduplication studies based on the techniques. Table 1 shows the bug report deduplication study introduced in this study.
Most studies use the same experiment targets, such as the Mozilla, Eclipse, Firefox (a part of Mozilla), and Open Office projects. All studies, except that of Hiew, are improvements over prior studies or have become a comparison target. Research on the combined use of clustering and NLP techniques is lacking. Thus, such studies could be one of the open challenges in this field. In particular, most clustering-based studies are compared only with a representative work or are proposed without any comparative analysis with existing techniques. Thus, a variety of verification schemes need to be considered. In addition to the technical aspects, bug report processing processes or frameworks could also be a research issue. There is also a need for studies on methodologies that effectively build and utilize these frameworks across the enterprise, using the various existing techniques mentioned above.
4. Techniques for Improving Bug Triage Performance
It is generally impossible for a limited number of developers to process all the bug reports submitted every day, which are usually great in number. Developers usually first fix the bugs that are critical to the operation of the software or that have a higher priority; this classification of bug reports by priority or severity is referred to as triage. Techniques to improve the performance of bug report triage rely mainly on machine learning.
4.1 Classification Algorithm
Numerous bug tracking systems use machine learning-based classification techniques such as SVM [38,39] and KNN. However, these techniques are rarely used alone; they are mostly integrated with NB-based classification [41-44] or with techniques based on heterogeneity.
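Because NB appears in most of the studies reviewed in this section, a compact sketch of a multinomial NB text classifier may help. It is a minimal illustration with add-one (Laplace) smoothing; the class name `TinyMultinomialNB` is hypothetical, and the surveyed studies typically relied on existing machine learning toolkits rather than code like this.

```python
import math
from collections import Counter, defaultdict


class TinyMultinomialNB:
    """Multinomial naive Bayes over token lists with add-one smoothing."""

    def fit(self, documents, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for tokens, label in zip(documents, labels):
            self.word_counts[label].update(tokens)
        self.vocab = {t for counts in self.word_counts.values() for t in counts}
        return self

    def predict(self, tokens):
        total_docs = sum(self.class_counts.values())
        best_label, best_log_prob = None, -math.inf
        for label, count in self.class_counts.items():
            log_prob = math.log(count / total_docs)  # class prior
            denom = sum(self.word_counts[label].values()) + len(self.vocab)
            for t in tokens:
                if t in self.vocab:  # ignore words never seen in training
                    log_prob += math.log((self.word_counts[label][t] + 1) / denom)
            if log_prob > best_log_prob:
                best_label, best_log_prob = label, log_prob
        return best_label
```

For triage, the labels could be severity levels, priorities, or developer names, with the tokens taken from the title and description of each bug report.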
Anvik and Murphy proposed a technique to create recommenders that assist with a variety of decisions aimed at streamlining the development process using machine learning techniques such as NB, expectation maximization (EM) [46,47], SVM, C4.5 (decision trees), nearest neighbor, and conjunctive rules. This technique extracts the normalized TF from the title and description. To evaluate this technique, they used bug reports from the Eclipse and Firefox projects. This technique showed 60% precision and 3% recall in Eclipse and 51% precision and 24% recall in Firefox.
Bhattacharya and Neamtiu proposed a method using refined classification to improve bug report triage accuracy and reduce the length of the tossing paths. It extracts features such as TF-IDF or bag-of-words (BOW) and uses NB together with a tossing graph. It uses various types of bug report metadata, such as the title, description, keywords, product, component, and the last developer activities. They used the bug reports of Eclipse and Mozilla to evaluate this technique and achieved 77.43% accuracy for Eclipse and 77.87% accuracy for Mozilla.
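A tossing graph records how bug reports are passed (“tossed”) from one developer to another before they are fixed. The sketch below shows one common construction, assumed here for illustration: every developer on a tossing path is linked to the developer who finally fixed the bug, and edge weights are normalized into probabilities. The function name and the input format are hypothetical.

```python
from collections import Counter, defaultdict


def build_tossing_graph(tossing_paths):
    """Map each developer to {final fixer: probability}, given paths ending at the fixer."""
    counts = defaultdict(Counter)
    for path in tossing_paths:
        fixer = path[-1]
        for developer in path[:-1]:
            counts[developer][fixer] += 1
    return {
        developer: {fixer: n / sum(c.values()) for fixer, n in c.items()}
        for developer, c in counts.items()
    }
```

Given such a graph, a triage tool could forward a report directly to the most probable fixer and thereby shorten the tossing path.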
Kanwal and Maqbool  defined the features of bug reports and proposed a prioritized recommender based on the classifier. They compared the SVM with the NB algorithm, the most commonly used classification algorithm, to indicate the differences in performance by the classifier used in this technique. As a result of evaluating Eclipse projects, the SVM was found to be superior to the NB algorithm for text features, whereas for categorical features, the performance of NB was found to be better than that of the SVM. The highest accuracy is achieved with SVM when categorical and text features are combined for training.
Peng et al. proposed a method to build a developer recommendation system that assigns bugs based on a relevant search scheme. This method builds an index of bugs and searches the index for bugs relevant to a new bug. It then analyzes the relevant bugs and recommends the new bug to an appropriate developer. They evaluated this method in the Mozilla and Eclipse environments and showed that it was better than machine learning algorithms such as NB and SVM.
Xuan et al. proposed techniques that use SVM and NB for various tasks such as developer prioritization and severity identification. These techniques extract TF-IDF and developer priorities using metadata such as titles and descriptions. The evaluation was conducted on the Eclipse and Mozilla bug reports, and the technique was found to be superior to SVM or NB alone.
Xuan et al. addressed data reduction for bug triage, improving the quality of bug data while reducing its scale. This method combines two selection techniques: instance selection and feature selection. It reduces the bug dimension and the word dimension at the same time by using the NB algorithm. This method also extracts TF from bug reports using the title and description as metadata. It was evaluated using the bug reports of Eclipse and Mozilla and was found to have an improved accuracy compared with SVM, KNN, and NB.
Yang and Lee proposed a method to predict the severity of a newly submitted bug report. When a new bug report is submitted, the method finds a similar topic and uses the meta fields of the bug report to narrow the set of candidate bug reports. It predicts the severity of the new bug report by training on the extracted bug reports with the NB multinomial technique. They showed that the method was more effective in predicting bug severity than NB, NB Multinomial, and KNN in the Eclipse and Mozilla open-source projects. Its performance depends on the quality of the bug report. Generally, it is difficult to predict bug reports with the Blocker label because only a small number of bug reports carry this label and many factors must be considered to classify bug reports written for various programming languages. However, this study showed good predictions for bug reports with the Blocker severity level.
Yang et al. proposed a new approach to predict the severity of bug reports based on emotional similarity. This approach uses a unigram model to identify emotional words and searches for bug reports with emotional words using the Kullback–Leibler divergence. They proposed emotion similarity (ES)-Multinomial, a new algorithm that replaces the NB Multinomial. To compare ES-Multinomial with the existing NB Multinomial, they applied ES-Multinomial to open-source projects from Eclipse, GNU, JBoss, Mozilla, and WireShark and demonstrated that it was more efficient than the NB Multinomial.
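The Kullback–Leibler divergence compares two probability distributions, here unigram distributions over words. The following sketch is a generic illustration and not the ES-Multinomial algorithm: the add-one smoothing, the shared vocabulary argument, and the function names are assumptions.

```python
import math
from collections import Counter


def unigram_distribution(tokens, vocabulary, smoothing=1.0):
    """Smoothed unigram probabilities of `tokens` over a fixed vocabulary."""
    counts = Counter(tokens)
    total = len(tokens) + smoothing * len(vocabulary)
    return {t: (counts[t] + smoothing) / total for t in vocabulary}


def kl_divergence(p, q):
    """KL(p || q) for two distributions defined over the same vocabulary."""
    return sum(p[t] * math.log(p[t] / q[t]) for t in p if p[t] > 0)
```

A divergence of zero means the two reports use words with identical smoothed frequencies, and larger values indicate increasingly different word usage. Smoothing keeps every probability in `q` positive, so the logarithm is always defined.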
Zhang et al. proposed a technique called KNN search and heterogeneous proximity (KSAP) that uses the heterogeneous network of the bug repository and historical bug reports to improve the auto-allocation of bug reports. KSAP is a two-step process. The first step is to search for similar historically fixed bug reports, and the second is to rank the developers who contributed to similar bug reports by heterogeneous proximity. They used projects from Mozilla, Eclipse, Apache Ant, and Apache Tomcat6 for evaluation, and there was a 7.5%–32.25% recall improvement compared with ML-KNN [56,57], DREX, DRETOM, Bugzie, DevRec, and the developer prioritization (DP) method. They showed that KSAP was better than other modern techniques when the developer works less. Adjusting the number of similar historically fixed bug reports (K) and developers (Qs) maintains the superiority of KSAP.
Zhang et al. proposed a method that improves severity prediction accuracy so that automated techniques can replace the manual assignment of fixers. This method uses the REP algorithm and KNN classification to find historical bug reports that have features similar to those of the input bug. Then, it extracts the features to estimate the severity and recommend fixers. The method was applied to the open-source projects GCC, Open Office, Eclipse, NetBeans, and Mozilla for evaluation, and improvements in precision, recall, and F-measure were demonstrated compared with DRETOM, DREX, and DevRec.
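The two-step structure described above (first retrieve the K most similar fixed reports, then rank the developers behind them) can be sketched as follows. This is a simplified illustration: Jaccard overlap stands in for the similarity search, and summing similarities per developer stands in for heterogeneous proximity, which is not reproduced here. All names are hypothetical.

```python
from collections import defaultdict


def jaccard(a, b):
    """Jaccard overlap of two token collections."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0


def recommend_developers(new_tokens, history, k=3, top=2):
    """history: list of (tokens, fixer). Returns up to `top` developers, best first."""
    # Step 1: find the k most similar historically fixed reports.
    neighbors = sorted(
        ((jaccard(new_tokens, tokens), fixer) for tokens, fixer in history),
        key=lambda pair: pair[0],
        reverse=True,
    )[:k]
    # Step 2: rank developers by their accumulated similarity (a stand-in score).
    votes = defaultdict(float)
    for similarity, fixer in neighbors:
        votes[fixer] += similarity
    ranked = sorted(votes.items(), key=lambda pair: pair[1], reverse=True)
    return [developer for developer, _ in ranked[:top]]
```

The parameters `k` and `top` play roles loosely analogous to the number of similar reports and the number of recommended developers discussed above.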
4.2 Specialized Algorithm
Artificial neural networks, also used for deep learning, are often used in bug report triage. Conventional classification models such as BOW [62-64] are also replaced by techniques such as CNN and RNN.
Mani et al. proposed a bug report representation algorithm that learns syntactic and semantic features in an unsupervised way using DBRNN-A (a deep bidirectional recurrent neural network with attention). They compared DBRNN-A with cosine distance, NB, SVM, a softmax classifier, and a BOW model and showed that DBRNN-A provides a higher rank-10 average accuracy.
Most of the studies that use topic modeling and information retrieval techniques for triaging focus on the metadata in bug reports. They use the NB Multinomial to supplement the technique or use information such as time and emotion to allocate bug reports more effectively.
Badashian et al. proposed a new method that uses community platforms such as GitHub and Stack Overflow as a source of developer expertise for bug triage. This method extracts keywords from bug reports using the titles and descriptions as well as metadata keywords, project languages, and tags from GitHub or Stack Overflow. These keywords are then matched against social expertise. They evaluated this method using bug reports from 20 GitHub projects and showed an 89.43% accuracy.
Jonsson et al. proposed a method using stacked generalization (SG) as an ensemble learner to improve predictive accuracy in automatic bug allocation. SG is a state-of-the-art method that combines the outputs of various classifiers and is used in various applications. One notable example is that SG-based solutions outperformed the competition in predicting movie ratings. In the field of software engineering, SG has been applied to the prediction of the number of residual defects in black-box testing and to the detection of malware in smartphones. The titles and descriptions were used for TF extraction, and approximately 35,000 bug reports from industrial projects were used for evaluation. The results showed an 89% accuracy.
Shokripour et al. presented the ABA-time-TF-IDF method, an auto-assignment technique based on TF-IDF and time metadata. A corpus is constructed using nouns, and the specialized knowledge of developers is identified and used to recommend developers. Shokripour et al. compared ABA-time-TF-IDF with ABA-TF-IDF, NB, VSM, SUM, and SVM on the Eclipse, NetBeans, and ArgoUML projects. As a result, accuracy and mean reciprocal rank (MRR) improved by up to 11.8% and 8.94%, respectively.
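The mean reciprocal rank used in this evaluation averages, over all bug reports, the reciprocal of the position at which the actual fixer appears in the recommendation list. A minimal sketch follows; the function name and the convention of scoring 0 when the fixer is absent from the list are assumptions.

```python
def mean_reciprocal_rank(ranked_lists, actual_fixers):
    """Average of 1/rank of the actual fixer; contributes 0 when the fixer is not listed."""
    total = 0.0
    for ranking, actual in zip(ranked_lists, actual_fixers):
        if actual in ranking:
            total += 1.0 / (ranking.index(actual) + 1)
    return total / len(actual_fixers)
```

For example, if the actual fixer is ranked first for one report, second for another, and missing for a third, the MRR is (1 + 0.5 + 0) / 3 = 0.5.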
Another study by Shokripour et al. proposed a two-phase method that makes assignment recommendations based on the predicted location of the bug. It addressed several problems of the activity-based method. This method uses source code information as well as the title and description, from which meaningful words are extracted. They used bug reports from Eclipse and Firefox, achieving accuracies of 89.41% and 59.76% in the evaluation. However, the number of bug reports is smaller than that of other studies.
Tamrawi et al. proposed Bugzie, which recommends bug reports to developers by building a fuzzy set based on words extracted from titles and descriptions. Bugzie greatly improved both accuracy and time efficiency on the Eclipse bug reports compared with most conventional machine learning techniques, such as NB, Bayesian Network, C4.5, SVM, incremental NB, and incremental Bayesian Network.
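The fuzzy-set idea can be illustrated with a small sketch in which each developer has a degree of membership for each term, and the memberships of the terms in a new report are combined with a fuzzy union. The formulas below are one plausible fuzzy-set formulation assumed for this example; they are not claimed to be Bugzie's exact definition, and the function names are hypothetical.

```python
from collections import Counter, defaultdict


def build_memberships(fixed_reports):
    """fixed_reports: iterable of (tokens, developer). Returns {developer: {term: membership}}."""
    n_dev = Counter()            # reports fixed by each developer
    n_term = Counter()           # reports containing each term
    n_both = defaultdict(Counter)  # reports fixed by developer that contain term
    for tokens, developer in fixed_reports:
        n_dev[developer] += 1
        for term in set(tokens):
            n_term[term] += 1
            n_both[developer][term] += 1
    return {
        developer: {
            term: both / (n_dev[developer] + n_term[term] - both)
            for term, both in terms.items()
        }
        for developer, terms in n_both.items()
    }


def developer_score(memberships, developer, tokens):
    """Fuzzy union of term memberships: 1 minus the product of (1 - membership)."""
    miss = 1.0
    for term in set(tokens):
        miss *= 1.0 - memberships.get(developer, {}).get(term, 0.0)
    return 1.0 - miss
```

Developers are then ranked by `developer_score` for the terms of a new bug report, so a developer who fixed every earlier report mentioning a term receives the highest possible membership for it.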
Wang et al. proposed FixerCache, an unsupervised bug triage technique. FixerCache overcomes the problems of supervised classification based on the activeness of developers in product components. FixerCache uses active developer caches and extracts TF from the title and description of the bug. They evaluated FixerCache on bug reports from Eclipse and Mozilla, showing better accuracy than SVM and NB.
Wen et al. proposed the Configuration Bug Learner Uncovers Approved options (CoLUA), a two-step automation technique that integrates NLP, IR, and machine learning, to address communication problems between bug reporters and developers. In the first step, CoLUA selects features from the textual information of the bug report and applies machine learning techniques to create a triage model. The second step identifies the configuration options that are included in the labeled bug report. CoLUA was applied to open-source projects from Mozilla, Apache, and MySQL. As a result, the average F-measure for the ZeroR classifier was found to be 0.33, whereas the average F-measure for CoLUA was 0.73 across all three projects.
4.3 Comparative Analysis
Table 2 lists the bug report triage studies introduced in this study. Most studies use the same experiment targets, such as the Mozilla, Eclipse, Firefox (a part of Mozilla), and Open Office projects. In addition, the NB and SVM methods dominate bug triage research: 13 out of 17 studies used NB or SVM. Thus, to improve the performance of bug report triage, it is necessary to carry out studies using more varied techniques, such as clustering, deep learning, and graph theory. In particular, deep learning-based approaches could be a key option because of their potential synergy with existing techniques. In addition to the technical aspects, enterprise-wide frameworks for improving the efficiency of bug report triage must be studied.
This study systematically reviewed bug report processing techniques for improving software management. The studies were discussed in two groups, identification of duplicate bug reports and bug report triage, and were classified by the techniques used, the experiment targets, and the comparison targets. The results indicate future research directions for bug report processing techniques.
Software bugs are inevitable during the development and maintenance phases. Because they are specified and managed in the form of software bug reports, it is necessary to study more efficient bug report processing techniques. Based on this survey, we see further issues in the two mainstreams of bug report studies: reducing duplicate bug reports and improving bug triage performance. From a technical point of view, one of the core techniques in this research field is identifying the similarity between bug reports. Thus, it is necessary to study this technique in detail to build elaborate models for the two mainstream issues mentioned above. In particular, deep learning-based and graph-based approaches [75-77] are in their early stages and could be one option. There are still many deep learning algorithms and graph theories that can be applied to bug report processing systems. As more elaborate models are proposed in this field, the efficiency of bug processing systems can be increased, and consequently software development effort can be reduced. Thus, researchers and practitioners must focus on the progress of similarity studies between bug reports. From a research methodological point of view, validating the superiority of a proposed technique requires comparing it with various existing techniques for efficient bug report processing. Instead of analyzing only the proposed technique itself, studies should provide meaningful and practical results through comparative analysis with the various techniques proposed in the past. Such studies would be of great significance if they can guide software developers on which techniques to use for reducing duplicate bug reports or improving bug triage performance in their software development environments.
He received the B.S. degree in computer engineering from Yeungnam University, Korea, in 2019. He is currently an M.S. student in the Department of Computer Engineering, Yeungnam University, Korea. His research interests include software engineering, open source software, software defect prediction, and edge computing.
He received the B.S. degree in computer science from Soongsil University, Korea, in 2006, and the M.S. and Ph.D. degrees in computer science from Korea Advanced Institute of Science and Technology (KAIST), Korea, in 2008 and 2012, respectively. From September 2012 to December 2013, he was a postdoctoral researcher in KAIST Institute for Information and Electronics. From January 2014 to August 2016, he was a senior researcher in Korea Testing Laboratory (KTL), Korea. He is currently an assistant professor in the Department of Computer Engineering, Yeungnam University, Korea. His research interests include software engineering, artificial intelligence, Internet of Things (IoT), and data mining.