One of the inherent properties of software is that it evolves with time. As software adapts to a new environment, it moves away from its original design. Modifications and enhancements make the code more complex and thus lower the quality of the software. All of these factors make the software maintenance phase costly and time-consuming. Maintenance of the source code is one of the most expensive activities performed during the development and evaluation phase of an application, since software must inevitably change to remain up to date, useful, and of high quality.
Bad smells develop in a software system when the design of a code component is based on incorrect assumptions or the solution chosen by the developers is improperly designed. A code smell is a surface symptom indicating a deeper issue in the system. Code smells are generally not bugs, and the code can be technically correct; rather, they indicate a weakness in the design of the software that may lead to failure in the future. Bad smells in code can be detected and corrected using refactoring techniques. According to Fowler, refactoring is a process that changes a software system to improve its internal structure without modifying the external behavior of the program code. It is a systematic way of cleaning up code that minimizes the chances of introducing bugs. Through refactoring, bad designs can be reworked into well-structured code. In addition, refactoring can significantly improve some of the software’s external qualities such as reusability, maintainability, and readability.
In the past, several surveys have been conducted in the fields of software refactoring, code smells, and software metrics. Improving software quality requires detailed knowledge of the most frequently occurring code smells, the most used refactoring techniques, and the software metrics that directly influence them. The review studies conducted by Mens and Tourwe, Zhang et al., and Xenos et al. did not provide a holistic view that brings the three perspectives together. In the current study, an attempt has been made to collectively review studies from all three perspectives (bad smells, refactoring, and software metrics) so as to give developers and other readers better insight into them. Also, the tools used in the shortlisted studies are categorized based on their usage in software refactoring, code smell detection, and software metric calculation. This will further help users to select tools as per their requirements.
In the current study, 68 papers on the three perspectives and published between the years 2001 and 2019 were investigated. This current review aims to summarize, analyze and comprehend the studies based on the following aspects:
Various bad smells identified, analyzed and/or corrected.
Various refactoring techniques used to correct different kinds of code smell.
Framework/platform used to apply refactoring techniques.
Identification of various software metrics used to identify code smells.
Comparison of various refactoring techniques in terms of their effects on software metrics.
To achieve this aim, eight digital libraries were extensively searched, and 68 studies were identified and incorporated in this review to answer the Research Questions (RQs) that are discussed later in this article. The remainder of the article is structured as follows: Section 2 presents the review methodology. Section 3 discusses the planning phase of the systematic literature review (SLR) process. Section 4 describes the conducting phase, which includes the formulation of selection criteria, the removal of duplicate studies, etc. Section 5 discusses the results and answers the RQs that were raised while assessing the studies. Section 6 provides the conclusion of the paper, and Section 7 outlines future directions.
2. Review Methodology
The process followed in conducting the review is presented in this section. The SLR identifies, assesses, and interprets all available research applicable to refactoring techniques and code smells. The review methodology adopted is illustrated in Fig. 1; it is broadly divided into three stages: planning, conducting, and reporting. In the first stage, i.e., the planning stage, the search databases were identified, the RQs were formed, and the relevant papers in the field of study were extracted. In the second stage, i.e., the conducting stage, papers were shortlisted after the removal of duplicate papers and irrelevant papers that did not conform to the inclusion criteria. The information provided in each selected study was also integrated in this second stage. In the final stage, i.e., the reporting stage, the RQs were answered.
3. Planning Phase for Review
3.1 Search Process
The initial search began with exploring various online databases, electronic journals, and search engines. The search process was divided into two steps. In the first step, databases such as Scopus, Google Scholar, Science Direct, IEEE Xplore, and Springer were identified. Internet resources provided by Guru Gobind Singh Indraprastha University (GGSIPU) were also used. Appropriate search strings such as “code refactoring”, “bad smell”, “software metrices”, “refactoring for bad smells”, “refactoring techniques”, and “code smell detection” were used to extract relevant studies. The search string that was formed using the search terms is given below:
(SoftwareRefactoring OR CodeRefactoring OR RefactoringTechniques) AND (BadSmells or CodeSmells) AND (SoftwareMetrices OR MetricSuites OR Maintainability) OR (RefactoringTools OR BadSmell Tools).
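How the search string groups its terms can be sketched as a small predicate over a record's keywords. This is only an illustrative sketch: the keyword phrases, the group names, and the `matches` function are hypothetical, and it assumes the conventional precedence in which AND binds tighter than OR, so the trailing tools clause acts as an independent alternative. Individual databases may interpret the string differently.

```python
# Hypothetical keyword groups mirroring the clauses of the search string.
REFACTORING = {"software refactoring", "code refactoring", "refactoring techniques"}
SMELLS = {"bad smells", "code smells"}
METRICS = {"software metrics", "metric suites", "maintainability"}
TOOLS = {"refactoring tools", "bad smell tools"}


def matches(keywords: set[str]) -> bool:
    """Return True if a record's keyword set satisfies the search string.

    Assumption: AND binds tighter than OR, so the string is read as
    (refactoring AND smells AND metrics) OR tools.
    """
    def has(group: set[str]) -> bool:
        # A clause is satisfied when at least one of its terms is present.
        return bool(group & keywords)

    return (has(REFACTORING) and has(SMELLS) and has(METRICS)) or has(TOOLS)
```

Under this reading, a record tagged only with a tools keyword is retrieved even if it mentions neither smells nor metrics, which may partly explain the large number of initial results.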
Thousands of results were obtained in the initial step using the above terms. In the second step, publications from renowned journals and conferences were selected. In addition, the search was restricted to the years 2001 to 2019.
3.2 Formulation of Research Questions
A number of RQs were raised while assessing the shortlisted studies. RQs provide an insight into the past and current trends in the field of code refactoring. The RQs were precisely formulated in order to maintain a proper flow of the study and to avoid deviation while going through the vast amount of information obtained from the selected papers. Table 1 provides the list of RQs for the current review.
4. Conducting Phase for Review
This section describes the activities performed during the conducting phase of the SLR. It presents the selection criteria used to filter out irrelevant studies. In addition, a questionnaire was formed to score the selected studies as per the RQs mentioned in Table 1. Pie charts and line graphs were used to show the distribution of the shortlisted studies by year and source of publication.
4.1 Inclusion/Exclusion Criteria (Removal of Extraneous Studies)
Studies were organized systematically, and duplicate and irrelevant studies were dropped from the review procedure. Based on the RQs listed in Table 1, inclusion criteria were set to select studies that addressed some aspect of software refactoring or code smells. Papers met the inclusion criteria if they focused on refactoring techniques to remove bad smells, improving software quality with the help of refactoring, detection of bad smells, the effect of code refactoring on software maintenance, the use of software metrics to identify code smells, the effect of bad smells on quality attributes, or prediction models that identify opportunities for refactoring.
Inclusion criteria:
Empirical studies that identified different refactoring techniques.
Empirical studies that identified/corrected code smells.
Empirical studies that made use of software metrics to indicate the presence of bad smells.
Empirical studies that proposed refactoring techniques to improve software quality.
Exclusion criteria:
Studies that did not focus on software refactoring or bad smells.
Studies that could not answer the RQs as specified in Table 1.
Studies published outside the period 2001 to 2019.
Studies that did not provide any experimental results.
4.2 Quality Assessment Criteria
A questionnaire was formulated to filter out irrelevant papers and to further refine the assessment and selection of relevant studies. Initially, 106 studies were extracted on the basis of the search strings using different search engines, of which 76 studies satisfied the inclusion criteria mentioned in Section 4.1. Table 2 presents the list of questions used as the checklist for quality assessment, which scored the studies on a scale of 0 to 1.
Quality assessment questions
4.3 Data Extraction
Data that answered all the RQs were extracted in the conducting phase. A database consisting of different features such as publication year, name of the study, author’s name, publication source, objective of the study, results, tools used, techniques applied, etc., was prepared. Five main attributes that answered the RQs were selected; they are listed in Table 3.
All the selected studies were analyzed from various perspectives, and an attempt was made to identify any association between the studies. The data were illustrated using pie charts, line graphs, bar charts, etc. These visualizations help the reader to absorb and interpret the data easily.
Quality assessment attributes
4.4 Distribution of Papers
The selected studies were divided into three categories depending upon their source of publication. The sources considered are journals, conferences, and others such as symposiums, book chapters, workshops, etc. The distribution of studies is demonstrated using a pie chart in Fig. 2. The majority of studies, i.e., 32 studies, belonged to journal publications, followed by 31 studies from conferences, and 5 studies from various other sources.
Distribution of papers as per the source of publication.
4.5 Distribution of Papers as per Year of Publication
The PhD dissertation titled “Program restructuring as an aid to software maintenance” by William Griswold, published in 1991, is considered to be the first significant work in the field of refactoring. Since then, multiple studies have covered the concepts of refactoring and bad smells and proposed various techniques in this field. The shortlisted studies lie between the years 2001 and 2019. Fig. 3 shows the distribution of studies by year.
Year-wise bifurcation of studies.
5. Reporting Phase
This section provides the results obtained from the SLR. Firstly, the quality assessment questions are presented along with the scores given to the selected studies. Then, the answers to the RQs are provided.
5.1 Quality Assessment Questions along with Their Scores
The quality assessment of the selected studies was performed by a team of two members, comprising one Assistant Professor and one Research Scholar from GGSIPU, Delhi, India. The selected studies were scored as follows:
1 mark for the complete qualification of the paper
0.5 mark for partial qualification.
0 mark for no qualification.
Each study was scored on a scale of 0 to 1 against each quality assessment question, and the total score was calculated by summing the marks across all the questions. The maximum and minimum total marks obtainable are 12 and 0, respectively. After calculating the final score, studies that scored below 8 were discarded; thus, 8 papers were not considered for the review process. Finally, 68 studies passed the complete selection process and were chosen for the review. The complete list of scores obtained by the selected studies for the questions presented in Section 4.2, along with their paper references, is given in Table 4.
Scores for quality assessment questions
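The scoring rule can be sketched as follows. This is a minimal illustration, not the reviewers' actual procedure: the function names are hypothetical, and the number of questions (12) is inferred from the stated maximum total of 12 with at most 1 mark per question.

```python
ALLOWED_MARKS = {0.0, 0.5, 1.0}  # no / partial / complete qualification
NUM_QUESTIONS = 12               # assumption: inferred from the maximum total of 12
CUTOFF = 8                       # studies scoring below this total are discarded


def total_score(marks: list[float]) -> float:
    """Sum the per-question marks after checking that they are well formed."""
    if len(marks) != NUM_QUESTIONS:
        raise ValueError(f"expected {NUM_QUESTIONS} marks, got {len(marks)}")
    if any(mark not in ALLOWED_MARKS for mark in marks):
        raise ValueError("each mark must be 0, 0.5, or 1")
    return float(sum(marks))


def is_retained(marks: list[float]) -> bool:
    """A study is kept when its total score is at least the cutoff."""
    return total_score(marks) >= CUTOFF
```

For example, a study with eight complete and four failed questions totals exactly 8 and is retained, whereas one with seven complete, one partial, and four failed questions totals 7.5 and is discarded.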
5.2 Reporting of Research Questions
The shortlisted studies were analyzed thoroughly on the basis of the RQs raised. Answering the RQs helped us to view these studies from various perspectives. The solutions to the problem statements and the implementation results were readily analyzed with the help of the information collected while answering the RQs. Answers to the RQs and brief facts about them are presented in the following subsections.
RQ1. Various bad smells identified/corrected
A code smell is a characteristic of program code that points to an opportunity for refactoring. Some code smells do not disturb the working of the software system and thus need not be removed. However, certain smells decrease the quality of the software, which in turn increases the cost of maintenance. To maintain the standards of the software and prevent failure of the system, harmful code smells must be detected and removed on a regular basis during software development activities. Refactoring is one way to remove code smells. Smells can be identified manually or by using one of the several tools available for smell detection and correction, such as JDeodorant, JSpIRIT, etc. Table 5 lists all the code smells detected along with their paper references and total count.
Feature envy, data class, and long method are the most recognized bad smells, detected in 22, 19, and 17 studies, respectively. The “data class” bad smell occurs if a class does not perform any function and contains only data or attributes. Such classes cannot operate by themselves on the data they own. The “feature envy” smell occurs in a method if it makes more use of the attributes (fields, methods) of some other class than of its own class. This indicates that the method is wrongly placed, given the low usage of its own data. The “long method” code smell occurs if the method becomes too long. Generally, a method whose lines of code (LOC) exceed 15 is placed in the category of the long method bad smell. Apart from the bad smells listed in Table 5, several other code smells were identified in the shortlisted studies. Those smells have a count of 1 each and are not included in the table due to lack of space.
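The 15-LOC rule of thumb for the long method smell can be sketched as a small detector. The sketch below is illustrative only: it analyzes Python source with the standard `ast` module rather than the Java code examined in the reviewed studies, the function name `find_long_functions` is hypothetical, and it counts physical lines (including blank lines and comments), which is only one of several possible definitions of LOC.

```python
import ast

LONG_METHOD_LOC = 15  # threshold quoted in the text


def find_long_functions(source: str, limit: int = LONG_METHOD_LOC) -> list[tuple[str, int]]:
    """Return (name, line_count) for every function longer than `limit` lines."""
    tree = ast.parse(source)
    long_ones = []
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            # Physical lines from the `def` line to the last line of the body.
            loc = node.end_lineno - node.lineno + 1
            if loc > limit:
                long_ones.append((node.name, loc))
    return long_ones
```

Dedicated smell detectors typically combine several metrics rather than relying on a single size threshold, so a check like this is best treated as a first filter.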
RQ1.1 Technical aspect of most detected code smells
1) Feature envy: It is a class-level code smell that is categorized under “couplers”. In general, it occurs when some data fields are moved to a particular class, and the operations performed on the data are left behind. Feature envy breaks encapsulation and makes unit testing difficult.
- How can it be resolved: If a method uses more attributes and functions of another class than of its own to perform an action, move the logic of that action into the class whose attributes it uses.
- Refactoring suggestions: Move method, extract method, and extract class.
2) Long method: It is a method-level code smell that is categorized under “bloaters”. This bad smell reduces readability and understandability. Duplicate code may go unnoticed if the method is too long. Thus, it becomes necessary to resolve the long method smell.
- How can it be resolved: If a method needs description or comments then that section must be placed in another newly created method. Any part of the method that needs explanation must be split into another method in order to decrease the complexity.
- Refactoring suggestions: Replace temp with query, introduce parameter object, decompose conditionals, extract method, and replace method with method object.
3) Data class: It is a class-level code smell that is categorized under “dispensables”. Generally, data classes are not very harmful, but as the size of the project increases, internal qualities such as coupling and cohesion are affected by the data class smell. The increased dependencies raise the coupling between classes.
- How can it be resolved: Operations on the data or the required methods must be moved within the data class in order to decrease the dependencies.
- Refactoring suggestions: Move method, encapsulate field, hide method, remove setting method, extract method, and encapsulate collections.
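The feature envy and data class smells often appear together, and the resolution described above (placing the behaviour next to the data it uses) can be sketched in a few lines. All class and method names here are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


# --- Before: a data class plus an envious method ---------------------------
@dataclass
class Address:
    """Data class smell: holds fields but has no behaviour of its own."""
    street: str
    city: str
    postcode: str


class InvoicePrinter:
    def address_label(self, address: Address) -> str:
        # Feature envy: the method reads only Address fields and none of
        # InvoicePrinter's own state.
        return f"{address.street}, {address.city} {address.postcode}"


# --- After: the behaviour is moved into the class that owns the data -------
@dataclass
class AddressWithLabel:
    street: str
    city: str
    postcode: str

    def label(self) -> str:
        # The formatting logic now lives beside the fields it depends on.
        return f"{self.street}, {self.city} {self.postcode}"
```

After the move, callers no longer need to know the internal fields of the address, which reduces the dependency between the two classes.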
RQ2. Refactoring techniques used or identified in the studies
Refactoring applies transformations to the source code that restructure it while preserving its behavior. Refactoring removes code smells, improving the software quality of a project and reducing its maintenance cost. Refactoring must not be confused with rewriting, since refactoring does not modify the functionality of the code. Developers must refactor their code on a regular basis to maintain the quality and standard of the software project. Refactoring can be performed either manually or automatically, using one of the several available tools such as JDeodorant, IntelliJ, etc.
Fowler introduced 72 different types of refactoring methods that can be used to improve software quality. Table 6 lists the various refactoring techniques, along with their paper references. The table also gives the total count of papers in which each technique was identified or applied.
After analyzing Table 6, we found that extract class, move method, and extract method are the three most used refactoring techniques, appearing in 19, 17, and 15 studies, respectively. Extract method and move method were also among the most frequently identified refactoring techniques. Extract class refactoring creates a new class to which methods or data of the original class are transferred. This technique is applied to a class when its scope becomes vague or it is overloaded with responsibilities. Move method refactoring relocates a method from its existing class to another class, namely the one where the method is used most often. Extract method refactoring is considered one of the essential building blocks of the refactoring process. In this approach, a chunk of code is moved from the current method to a new method whose name explains its functionality. Apart from the refactoring techniques listed in Table 6, several other techniques were used or identified in the shortlisted studies. However, since these refactoring methods have a count of 1 each, they are not included in the table due to lack of space.
RQ2.1 Technical aspects of most used refactoring techniques
1) Move method: It is used when a method is used more frequently by some other class than by its own class. It is categorized under “moving features between objects”.
- How to perform: The method is removed from its original class and placed, along with its dependent data, in the class where it is used more frequently.
- Benefit: It improves the cohesion within a class and decreases the inter-class dependencies.
- Bad smells eliminated: Switch statements, parallel inheritance hierarchy, shotgun surgery, feature envy, inappropriate intimacy, data class, and message chains.
2) Extract method: It is used when a method contains many lines that can be fragmented into other methods. It is categorized under “composing methods”.
- How to perform: A new method is created and named after the function that it performs. The relevant code is moved from the original method into the new method and replaced by a call to it. The dependent fields are passed as parameters to the new method.
- Benefit: It improves code readability and decreases duplication of code.
- Bad smells eliminated: Duplicate code, message chains, long method, switch statements, feature envy, comments, and data class.
3) Extract class: It is used when a class performs more operations than required. It is categorized under “moving features between objects”.
- How to perform: A new class is created, and the relevant methods and fields are placed in it as per its functionality. A relationship, preferably unidirectional, between the old and new class is created, and the classes are renamed as per their jobs.
- Benefit: The code will become more understandable, and the Single Responsibility principle of a class will be maintained. This will further improve the reliability of the class.
- Bad smells eliminated: Large class, duplicate code, data clumps, divergent change, primitive obsession, inappropriate intimacy, temporary field.
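The extract method and extract class steps described above can be sketched as follows. The example is a minimal illustration with hypothetical names, not code taken from any of the reviewed studies.

```python
from dataclasses import dataclass


# Extract method, before: one function both computes and formats the total.
def invoice_before(prices: list[float], tax_rate: float) -> str:
    subtotal = 0.0
    for price in prices:
        subtotal += price
    total = subtotal * (1 + tax_rate)
    return f"Total due: {total:.2f}"


# Extract method, after: the computation moves into a method named for what
# it does, and the values it depends on are passed in as parameters.
def total_with_tax(prices: list[float], tax_rate: float) -> float:
    return sum(prices) * (1 + tax_rate)


def invoice_after(prices: list[float], tax_rate: float) -> str:
    return f"Total due: {total_with_tax(prices, tax_rate):.2f}"


# Extract class: phone details are moved out of Person into their own class,
# with a one-way reference from Person to PhoneNumber.
@dataclass
class PhoneNumber:
    area_code: str
    number: str

    def formatted(self) -> str:
        return f"({self.area_code}) {self.number}"


@dataclass
class Person:
    name: str
    phone: PhoneNumber

    def contact_line(self) -> str:
        return f"{self.name}: {self.phone.formatted()}"
```

In both cases the externally visible behaviour is unchanged: `invoice_before` and `invoice_after` return the same string for the same inputs, which is the property that distinguishes refactoring from rewriting.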
RQ3. Software metrics used
Measuring software quality with the help of software metrics is of vital importance because metrics make the properties of the source code easier to understand. Fig. 4 shows the categorization of the software metrics used in the studies on the basis of the internal quality attribute they measure. As Fig. 4 illustrates, complexity and coupling metrics are the most frequently used, i.e., in 29 studies each, followed by cohesion and size metrics, which are used in 28 studies each.
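As an illustration of what a complexity metric measures, the sketch below approximates McCabe's cyclomatic complexity as one plus the number of decision points. It is a simplified, hypothetical implementation built on Python's standard `ast` module; the tools used in the reviewed studies target other languages and may count decision points differently.

```python
import ast

# Node types counted as decision points in this simplified sketch.
_DECISION_NODES = (
    ast.If, ast.For, ast.While, ast.IfExp, ast.ExceptHandler, ast.comprehension,
)


def cyclomatic_complexity(source: str) -> int:
    """Approximate cyclomatic complexity: 1 + number of decision points."""
    tree = ast.parse(source)
    complexity = 1
    for node in ast.walk(tree):
        if isinstance(node, _DECISION_NODES):
            complexity += 1
        elif isinstance(node, ast.BoolOp):
            # `a and b and c` contributes two extra branches.
            complexity += len(node.values) - 1
    return complexity
```

Straight-line code scores 1, and each additional branch, loop, or short-circuit operator raises the score by one.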
Categories of software metrics used in the studies.
RQ4. Data collection procedure
A dataset is a collection of logically related data that can be used for research. The datasets used in the studies are categorized as open-source, sample, and proprietary datasets. Fig. 5 presents the count of shortlisted studies for each dataset type. The chart shows that open-source datasets were used in the majority of studies, i.e., 55, followed by sample datasets, which were used in 9 studies. Very few studies used a proprietary dataset for their research.
RQ4.1. Most frequently used datasets
After analyzing the studies, we identified the five most frequently used datasets. Table 7 presents the names of these datasets along with their paper references and counts. All five datasets are open source, large, and written in the Java programming language.
Most used datasets in the studies
RQ4.2. Size of datasets
The datasets are divided into three groups (small, medium, and large) on the basis of their size, as follows: (1) small: fewer than 100 classes, fewer than 1,000 methods, or fewer than 5,000 lines of code; (2) medium: 100–500 classes, 1,000–5,000 methods, or 5,000–50,000 lines of code; and (3) large: more than 500 classes, more than 5,000 methods, or more than 50,000 lines of code. Fig. 6 illustrates the distribution of dataset sizes and shows that large datasets are preferred for research purposes, being used in 41% of the studies, followed by medium-sized datasets in 31% of the studies.
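The size thresholds can be sketched as a simple classifier. Because the criteria are joined by "or", a dataset whose measures fall into different groups is ambiguous; the sketch below assumes, purely for illustration, that the largest group indicated by any single measure wins. The function name `size_category` is hypothetical.

```python
def size_category(classes: int, methods: int, loc: int) -> str:
    """Classify a dataset as small, medium, or large.

    Assumption: when the three measures disagree, the largest category
    indicated by any one of them is used.
    """
    if classes > 500 or methods > 5_000 or loc > 50_000:
        return "large"
    if classes >= 100 or methods >= 1_000 or loc >= 5_000:
        return "medium"
    return "small"
```

For example, `size_category(50, 400, 3000)` returns `"small"`, while `size_category(600, 200, 1000)` returns `"large"` because the class count alone exceeds 500.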
Size-wise distribution of dataset.
RQ 4.3. Programming languages
The datasets obtained from the studies were written in C++, C, C#, or Java. Table 8 shows the count of datasets based on the programming language in which they were written. Java is the most used language, with 415 datasets written in it. The other languages make up less than 5% of the total usage in the studies.
Count of datasets on the basis of programming language
RQ5. Usage of tools
Various tools were used in the shortlisted studies to perform refactoring, identify bad smells, and extract the values of software metrics. Table 9 shows the total count and paper references of the selected studies that performed the process as mentioned above, either manually or with the help of tools. The results of Table 9 show that the usage of tools is preferred over performing the task manually. A total of 50 studies used tools for their research; whereas manual work was done in 18 studies. The complete list of tools used in the shortlisted studies is provided in Table 10.
List of the number of times specific tools were used for refactoring techniques, bad smell identification, and software metrics
RQ5.1. Various tools used for refactoring, bad smell and software metrics
In the shortlisted studies, tools were majorly used for applying refactoring, detecting/correcting bad smells, and obtaining the values of software metrics. Table 10 shows the list of tools that were used along with their paper reference and count.
From Table 10, we can conclude that refactoring was performed automatically in 25 studies. In addition, 37 studies showed automatic detection/correction of bad smells, and the values of software metrics were calculated with the help of tools in 20 studies. The most used tools for software metrics are SourceMeter and Metrics. JDeodorant and Eclipse are most frequently used for refactoring, whereas PMD and JDeodorant are the most used tools for bad smell detection/correction.
6. Conclusion
In this study, a comprehensive literature review was performed to analyze refactoring techniques, code smells, and software metrics. After an exhaustive search of eight digital libraries, 106 studies published between 2001 and 2019 were selected. On further filtration, 68 studies were shortlisted and analyzed to answer the RQs. The main findings of the SLR are:
Extract method, extract class, and move method are the most used refactoring techniques in the selected studies.
Feature envy, data class, and long method code smells are detected in most of the studies.
Refactoring and bad smell detection are mostly performed automatically, i.e., with the help of tools.
Complexity, coupling, cohesion, and size metrics are the most frequently used object-oriented software metrics.
Open-source datasets are used in most of the selected studies. ArgoUML, Apache Xerces, Apache Ant, Gantt Project, and JHotDraw are the five most used datasets.
Large datasets, mostly written in the Java programming language, were preferred for research work.
A large number of studies made use of tools such as SourceMeter, JDeodorant, and Metrics for their research.
7. Future Directions
Code smells, refactoring techniques, and software metrics share a relationship which, when analyzed further, can help developers to improve software quality. The current work may help readers to explore the three fields from a broader perspective. Based on the analysis of the studies and the conclusions drawn, some directions for further research in this field are as follows:
It is observed that most of the case studies or projects selected in conducting research in the field of refactoring and bad smells are Java-based. Therefore, a detailed analysis can be conducted to generalize the results for all object-oriented language-based projects including C, C++, C#, etc.
It is also observed that a single class of the software can contain numerous bad smells, which increases the need to apply different types of refactoring techniques. As a future direction, research can be conducted on the prioritization of classes and the identification of the optimum refactoring sequence in order to reduce the effort required in the maintenance phase.
There are only a few free tools available for code smell detection and the application of refactoring techniques. Therefore, in future, a tool can be proposed that supports these tasks while simultaneously helping to enhance software quality.
A systematic review can be conducted to analyze the impact of refactoring on different software attributes such as internal and external quality features.
An in-depth study can be conducted on the different types of code smells identified till date, along with their respective refactoring techniques.
A detailed analysis can be conducted to explore the opportunities of search-based refactoring, which treats refactoring as an optimization problem in which the best refactoring sequence is found using a search algorithm.
A further investigation can be done to analyze the relationship between refactoring and frequently used object-oriented software metrics, namely complexity, coupling, cohesion, and size metrics, in this research domain. Studies can also be analyzed on the basis of external quality attributes such as reliability, efficiency, usability, etc.
A study can be conducted to investigate the impact of refactoring on development time and ease of locating errors in the source code.
The effects of code refactoring techniques on agile software development and on the improvement of database quality can also be explored in the future.
Code refactoring has become an essential discipline of software development. A further investigation can be done to explore the opportunities of refactoring in user interfaces, the detection and correction of anti-patterns, etc.
A study can be conducted for prediction of code smells using various machine learning methods. The impact of software metrics on code smell prediction can also be analyzed.
Further, prediction rules based on object-oriented software metrics can be generated using machine learning classifiers to detect code smells.
In the near future, empirical studies can be conducted to explore the opportunities of various other fields such as web application refactoring, big data refactoring, cloud refactoring, spreadsheet refactoring, etc.