Chaoqun Li, Zhigang Chen, Tongrui Yu and Xinxia Song
Contact Tracking Development Trend Using Bibliometric Analysis
Abstract: Coronavirus disease 2019 (COVID-19) has become a global epidemic. The disease has spread to most countries and poses a challenge to healthcare systems. Contact tracing is an effective way for public health authorities to deal with infectious diseases. Many experts have studied traditional contact tracing and developed digital contact tracking. To better understand this field, it is necessary to analyze the development of contact tracking in computer science by means of bibliometrics. The purpose of this research is to use literature statistics and topic analysis to characterize the computer science literature on contact tracking, and to gain an in-depth understanding of its development status and of the trend of hot topics over the past decade. To achieve these goals, we conducted a bibliometric study. The study uses data collected from the Scopus database, comprising more than 10,000 articles, including more than 2,000 in the field of computer science. For popular trends, we use VOSviewer for visual analysis. The number of contact tracking documents published annually in computer science is increasing. At present, 200 to 300 papers are published in the field each year, and the number of uncited papers is relatively small. Through visual analysis of the papers, we found that the hot topics of contact tracking have changed from "mathematical model," "biological model," and "algorithm" in the past to "digital contact tracking," "privacy," and "mobile application" at present. Contact tracking is currently a hot research topic. By selecting the most cited papers, we can present high-quality literature on contact tracking and characterize the development trend of the entire field through topic analysis.
This is useful for students and researchers who are new to the field of contact tracking, as well as for presenting our results to other disciplines. Our analysis is especially valuable when comprehensive research cannot be conducted due to time constraints or a lack of precise research questions.
Keywords: Bibliometrics, Citation Analysis, Contact Tracking, Development Trend
1. Introduction
In the past 30 years, epidemiological research has flourished, especially after the outbreak of AIDS (acquired immune deficiency syndrome). During this period, some epidemiological problems were partially solved or mathematically modeled under certain circumstances, such as the optimal age for childhood vaccination. Modeling was an effective means of understanding epidemiology and predicting the course of a disease at that time. However, in terms of control strategy, contact tracking has been considered for most diseases, including the severe acute respiratory syndrome (SARS) pandemic in 2003, the foot-and-mouth disease (FMD) epidemic in the UK in 2001 [3,4], and sexually transmitted infections [5,6].
Traditional contact tracing requires a lot of human effort to collect virus-related data, which is time-consuming and laborious. Furthermore, manually collecting data from potentially infected individuals puts staff at risk of infection. Another shortcoming of manual contact tracking is its inability to collect complete information about the people involved. These problems are very serious in the COVID-19 outbreak, and the traditional contact tracing mechanism is not sufficient to deal with a rapidly spreading and highly contagious virus. In response, digital contact tracking systems based on smart devices such as mobile phones have been developed. They make it possible to track contacts, detect their infection, treat the infected, and track their contacts in turn.
China, Singapore, South Korea, and other countries were among the first to require people to download contact tracking applications. China requires people nationwide to fill in their travel history, body temperature, and other health information. Alipay, WeChat, and other apps are used to generate a health code, and users must show the health code when entering and exiting public areas. Hong Kong implemented the StayHomeSafe application: users arriving in Hong Kong need to download the application and wear an electronic bracelet with a QR code. South Korea runs a coronavirus disease 2019 (COVID-19) smart management system that tracks people infected with COVID-19 by using data from 28 organizations, including the National Police Agency, three smartphone companies, and 22 credit card companies. The Australian government launched the COVIDSafe application, which encrypts the user’s identity information on the smartphone. When someone is diagnosed with COVID-19, the government asks who they have been in contact with.
In addition to the contact tracking applications launched by governments, private organizations are also actively developing contact tracking applications. PHBC is an alliance of health stakeholders (e.g., universities and medical institutions) that has developed a blockchain-based contact tracking system. The system employs artificial intelligence (AI) and geographic information system (GIS) technologies, and obtains information in real time from institutions that provide the latest virus infection information. In addition, IT researchers have proposed a variety of methods to control and keep track of the spread of the virus. Nguyen et al. proposed a system framework based on blockchain and AI, which can obtain data about COVID-19 from mobile phones and from data sources such as hospitals, research institutions, and wireless network operators. They propose to use blockchain to protect the privacy of such data. The AI model then uses the data to calculate exposure risks, assist in vaccine development, and predict future outbreaks of similar viruses. However, this is only a conceptual framework without any implementation details. Torky and Hassanein proposed a blockchain-based framework to mitigate the spread of COVID-19. Their framework includes four subsystems: an infection verifier subsystem, a P2P mobile application, a blockchain platform, and a quality monitoring system. They use blockchain to store the infection pattern and all infection cases based on the pattern.
This article surveys the development trend of the contact tracking literature in computer science using Scopus, analyzes the hot topics of that literature, and examines how the research focus has changed over time. It aims to help people who study contact tracking and related fields or who need scientometric statistics.
2. Research Methods and Data Extraction
2.1 Goals and Research Questions
The purpose of this research is to conduct quantitative statistics on the contact tracking literature in the field of computer science, and the key analysis objects are the citations and themes of the papers. Based on the above objectives, the following research questions (RQ) are proposed. The objectives and RQs of the research are exploratory and descriptive in nature.
RQ1: Since 1983, what has been the trend in the number of papers and how many have been published each year?
RQ2: How are contact tracking papers in the field of computer science cited? This RQ is divided into three sub-RQs:
RQ2.1 Citation distribution. What percentage of papers have not been cited? What is the proportion of papers with one citation?
RQ2.2 What are the highly cited papers in contact tracking?
RQ2.3 What are the citation trends of different paper types? For example, do journal papers get more citations on average than conference papers?
RQ3: Topic analysis. How has the focus of the papers changed over the years?
RQ4: How did different countries perform in the publication of contact tracing papers?
2.2 Data Source and Data Extraction
2.2.1 Select search database
In order to identify suitable publication databases, we reviewed a large number of bibliometric papers. The publication databases mentioned in high-quality papers are as follows: PubMed, Scopus, Web of Science, and Google Scholar.
Google Scholar: The database allows readers to find a link to an article (usually on a journal website), and allows users to search by their own keywords and author names, as in an ordinary Google web search. Google Scholar is part of the most popular search engine.
PubMed: It supports not only quick search and advanced search but also other search tools developed by the NLM (the United States National Library of Medicine), and it provides free access to important documents from more than 1 million journals. PubMed allows users to use more keywords when searching. However, it does not provide citation analysis; it is the only one of the four systems that lacks this functionality.
Web of Science: It includes fast subject search, advanced search, general search and cited reference search. In the cited reference search, the search results can be limited to the cited author, cited work, and cited year. If the user needs it, the cited author index and cited work index can also be displayed. The citation report is expressed in the form of a histogram, and the search results can be further refined.
Scopus: It combines the features of PubMed and Web of Science, and it indexes more journals than the other three databases, but access requires a paid subscription. Scopus provides quick search, basic search, advanced search, author search, and source search. The Scopus citation analysis system displays the number of cited articles in each year as a graph, and also displays the total number of cited references over all years. In addition, Scopus has search prompts written in 10 languages, which is convenient for users. Detailed database characteristics are shown in Table 1.
Comparing the basic characteristics of the four databases, we propose three important criteria for selecting a suitable database:
(1) In terms of its coverage of the contact tracking literature, the database should be of high quality and reliable.
(2) The database should have all the citation information of the paper.
(3) The database should provide a convenient interface to retrieve and extract citation data.
Since PubMed does not provide citation analysis, it is excluded from the comparison. In Table 2, we discuss the differences between the other three publication databases under the above selection criteria.
Regarding criterion 1, Scopus supports searching by source name, and its journals cover a wide range of countries. In contrast, the journals covered by Web of Science are concentrated in countries such as Britain, the United States, and Germany. Moreover, Web of Science statistics do not include conference papers, and its statistics for interdisciplinary subjects are inaccurate.
Regarding criterion 2, all three candidate publication databases include citation data.
Regarding criterion 3, Google Scholar cannot easily save all the retrieved papers. Web of Science only allows saving the retrieved papers page by page, so if the number of retrieved papers is large, the export process is very cumbersome. Scopus allows all the retrieved papers to be saved to a CSV file.
Based on this comparison against the selection criteria, the Scopus publication database was selected for retrieval. In addition, the paper titled “Top 100” published in the journal Nature also used Scopus for its research, and Web of Science and Scopus have been compared in terms of performance and coverage in various fields, e.g., social sciences [16-18].
2.2.2 Extract all contact tracking papers from Scopus
After choosing Scopus as the publication database, the next step is to search for papers.
In [19-21], the authors searched the literature by using phrases in the “source title” field and found that the search function is reliable; it has also been used in other disciplines such as physics. As shown in Figs. 1-3, we searched by “source title” in the Scopus search interface and obtained the related papers by selecting the contact tracking sources with the highest number of indexed articles.
Applying the above method resulted in an initial data set of 2,251 papers; after software deduplication, 2,049 papers remained. Scopus stores the following 12 categories of documents: articles, conference papers, book chapters, books, editorials, errata, letters, memos, comments, brief surveys, retracted, and undefined. We are concerned only with scientific papers, so when analyzing document types we include only articles, conference papers, books, reviews, and book chapters, and exclude the rest.
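The deduplication and type filtering steps can be sketched as follows. This is a minimal illustration rather than the procedure actually used: the paper does not state the deduplication rule, so matching on a normalized title plus publication year is an assumption, and the field names ("Title", "Year", "Document Type") and sample records are hypothetical stand-ins for rows of a Scopus CSV export.

```python
# Sketch only: the record fields and the matching rule are assumptions.
KEPT_TYPES = {"Article", "Conference Paper", "Book", "Review", "Book Chapter"}

def normalize_title(title):
    """Lowercase and collapse whitespace so trivially different titles match."""
    return " ".join(title.lower().split())

def deduplicate(records):
    """Keep the first record seen for each (normalized title, year) pair."""
    seen = set()
    unique = []
    for rec in records:
        key = (normalize_title(rec["Title"]), rec["Year"])
        if key not in seen:
            seen.add(key)
            unique.append(rec)
    return unique

def keep_scientific(records):
    """Drop editorials, errata, letters and other excluded document types."""
    return [rec for rec in records if rec["Document Type"] in KEPT_TYPES]

# Hypothetical sample rows, not taken from the study's data set.
sample = [
    {"Title": "Digital contact tracing", "Year": 2020, "Document Type": "Article"},
    {"Title": "digital  contact tracing", "Year": 2020, "Document Type": "Article"},
    {"Title": "Erratum to a model", "Year": 2019, "Document Type": "Erratum"},
    {"Title": "Epidemic models", "Year": 2007, "Document Type": "Review"},
]
unique = deduplicate(sample)
selected = keep_scientific(unique)
print(len(sample), len(unique), len(selected))
```

A stricter key such as the DOI would avoid merging distinct papers that share a title and year, at the cost of missing duplicates that lack a DOI.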
3.1 RQ1: Documents in the Field of Contact Tracing over the Years
RQ1.1 Total document volume of contact tracking
Fig. 4 depicts the top 20 disciplines by number of papers in the contact tracking literature. According to the search results, existing contact tracking research involves 27 disciplines. Mathematics, agricultural and biological sciences, and medicine are the top three in terms of the number of published papers. There are 4,562 documents in mathematics, accounting for 33% of the total, and 4,125 in agricultural and biological sciences, accounting for about 30% of the total. Contact tracking is an infectious disease control strategy. It is not a new technique; it was used in past outbreaks such as SARS in 2003 and H1N1 in 2009.
RQ1.2 Papers over the years
Fig. 5 shows the number of contact tracking papers in the computer science field in Scopus by year. The earliest publication year is 1989, with one paper included in Scopus. In the early years the number of papers was small, but it increased sharply after 2006, reaching a peak of 412 papers in 2019. The figure shows that the number of papers has grown rapidly in recent years. The reason is that manual contact tracking cannot collect complete information about the relevant people in the face of rapidly spreading and highly infectious viruses. To allow contact tracking to collect data quickly and autonomously, experts have used computer technology to develop digital contact tracking. The focus of this article is accordingly the development of contact tracking in the computer science literature.
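The yearly counts behind a figure of this kind can be derived from the publication years of the retrieved records. The following is a minimal sketch under the assumption that each record carries a publication year; the sample years are hypothetical and do not reproduce the study's data.

```python
from collections import Counter

def papers_per_year(years):
    """Return {year: number of papers}, ordered by year."""
    return dict(sorted(Counter(years).items()))

def peak_year(counts):
    """Return the year with the most papers (the earliest year wins a tie)."""
    return max(sorted(counts), key=lambda year: counts[year])

# Hypothetical publication years, one entry per paper.
sample_years = [1989, 2006, 2006, 2019, 2019, 2019, 2020, 2020]
counts = papers_per_year(sample_years)
print(counts, peak_year(counts))
```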
3.2 RQ2: Citation Analysis
RQ2.1 Citation status
In any research, citations are crucial. It is generally believed that a high number of citations indicates the quality and influence of a paper.
According to the data retrieved from Scopus, Fig. 6 shows the citation overview of contact tracking in the field of computer science. Citations surged between 2006 and 2012, and the maximum number of citations, 2,915, was reached in 2007. After 2010, the number of citations declined roughly linearly, because papers published after 2010 have had fewer years in which to accumulate citations than earlier papers, whereas the number of papers continued to grow roughly linearly. In addition, the number of citations in 2020 showed an abnormal surge, which was caused by the global outbreak of COVID-19. To deal with the disease, experts began to develop digital contact tracking systems, and the number of computer science papers increased dramatically. In summary, about 93% of the papers were published between 2001 and 2021, and about 7% were published between 1989 and 2000.
Among the 2,049 contact tracking papers retrieved, around 27% have never been cited, 10.7% were cited only once, and 62.3% were cited more than once. The total number of citations of these papers is 31,139, and the average number of citations per paper is 15.2. The most cited paper has been cited 1,083 times. The citation data are shown in Fig. 7.
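The summary statistics reported here are simple functions of the per-paper citation counts. The sketch below shows one way to compute them; the function name and the sample counts are hypothetical. The final line only checks that the reported total (31,139) and paper count (2,049) are consistent with the reported average.

```python
def citation_distribution(citations):
    """Summarize a list of per-paper citation counts."""
    n = len(citations)
    uncited = sum(1 for c in citations if c == 0)
    once = sum(1 for c in citations if c == 1)
    more = n - uncited - once
    return {
        "uncited_pct": 100 * uncited / n,
        "once_pct": 100 * once / n,
        "more_pct": 100 * more / n,
        "total": sum(citations),
        "mean": sum(citations) / n,
        "max": max(citations),
    }

# Hypothetical citation counts for ten papers.
sample_citations = [0, 0, 1, 2, 5, 12, 0, 1, 40, 39]
summary = citation_distribution(sample_citations)
print(summary)

# Consistency check on the figures reported in the text.
reported_mean = round(31139 / 2049, 1)
print(reported_mean)
```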
Bibliometrics experts have studied the unbalanced distribution of citation data since as early as the 1960s. According to “Little Science, Big Science,” about 6% of publishing scientists produced half of all published papers [22,23]. A 2014 paper used the well-known Gini coefficient to quantitatively measure inequality among academic institutions and scientific journals. This research shows that inequality in citations is universal.
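For readers unfamiliar with the Gini coefficient, the following sketch computes it for a list of citation counts using the standard rank-based formula G = 2 * sum(i * x_i) / (n * sum(x)) - (n + 1) / n over values sorted in ascending order. It illustrates the measure in general and is not the procedure of the cited 2014 study; the sample values are hypothetical.

```python
def gini(values):
    """Gini coefficient of non-negative values: 0 means perfectly equal."""
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        return 0.0  # no papers or no citations: treat as perfectly equal
    weighted = sum(rank * x for rank, x in enumerate(xs, start=1))
    return 2 * weighted / (n * total) - (n + 1) / n

equal = gini([5, 5, 5, 5])          # every paper cited equally often
concentrated = gini([0, 0, 0, 10])  # one paper receives every citation
print(equal, concentrated)
```

For n papers the coefficient can reach at most (n - 1) / n, which is the value obtained when a single paper receives all citations.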
RQ2.2 The most cited papers
To determine the most cited papers, we employed two indicators: total citations and average annual citations. The average annual number of citations normalizes the total number of citations by the number of years since publication, so that the age of a paper has less influence on its ranking. This indicator is used in multiple bibliometric studies. Tables 3 and 4 list the top five papers under these two indicators.
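The two rankings can be illustrated with a short sketch. The exact definition of average annual citations is not given in the text, so dividing by the number of calendar years from publication up to an assumed census year of 2021 (inclusive) is an assumption; the paper titles and counts below are hypothetical.

```python
def average_annual_citations(total_citations, publication_year, census_year):
    """Total citations divided by the years a paper has been available."""
    years_in_print = max(census_year - publication_year + 1, 1)
    return total_citations / years_in_print

def rank(papers, score, k=5):
    """Return the titles of the k highest-scoring papers."""
    return [p["title"] for p in sorted(papers, key=score, reverse=True)[:k]]

CENSUS_YEAR = 2021  # assumed year in which citations were counted

# Hypothetical papers, not entries from Tables 3 and 4.
papers = [
    {"title": "Paper A", "year": 2007, "citations": 900},
    {"title": "Paper B", "year": 2020, "citations": 300},
    {"title": "Paper C", "year": 2015, "citations": 70},
]
by_total = rank(papers, lambda p: p["citations"])
by_annual = rank(
    papers,
    lambda p: average_annual_citations(p["citations"], p["year"], CENSUS_YEAR),
)
print(by_total, by_annual)
```

In this toy example the older paper leads on total citations while the recent paper leads on the annual average, which is the kind of reordering the second indicator is meant to expose.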
The paper titled “The dynamics of viral marketing,” published in 2007, appears in both rankings. Table 3 shows that it has the highest total number of citations, but its average annual number of citations is comparatively low. Papers published in recent years occupy the top four positions in the average annual citation ranking. This is mainly due to the outbreak of COVID-19 at the end of 2019, which prompted a stream of new papers on contact tracking, and the high-quality papers among them have been cited frequently.
Analyses of highly cited papers are common. In its October 2014 issue, Nature’s cover paper “Top 100 papers” ranked the 100 most cited papers across all scientific fields. The report stated that of the 58 million items indexed in Thomson Reuters’ Web of Science, only 14,499 had been cited more than 1,000 times.
RQ2.3 Number and citation statistics of different publication types
The Scopus database stores 12 categories of documents. Because we focus on scientific research, the following analysis covers only five types (articles, book chapters, books, conference papers, and reviews) and excludes the others.
Table 5 shows the total number of papers and the average number of citations for each type of paper. In terms of the number of papers, journal articles and conference papers have the highest proportions, accounting for 62.75% and 31.82%, respectively. In terms of the average number of citations, reviews and books stand out, with averages of 26.25 and 109, respectively. This is mainly because there are few papers of these types and each is cited many times. The table shows that researchers mostly submit journal articles and conference papers, and these types of paper have a large audience.
Compared with all other paper types, books are highly valued and have a high citation rate. In terms of the proportion of uncited documents, conference papers and book chapters have the highest proportions, at about 39.69% and 58.54%, respectively.
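The per-type figures discussed above (share of papers, average citations, and share of uncited papers) can be computed by grouping records by document type. The sketch below is illustrative only; the input format of (document type, citation count) pairs and the sample values are assumptions and do not reproduce Table 5.

```python
from collections import defaultdict

def stats_by_type(records):
    """records: iterable of (document_type, citation_count) pairs."""
    groups = defaultdict(list)
    for doc_type, cites in records:
        groups[doc_type].append(cites)
    total_papers = sum(len(cites) for cites in groups.values())
    result = {}
    for doc_type, cites in groups.items():
        result[doc_type] = {
            "share_pct": 100 * len(cites) / total_papers,
            "mean_citations": sum(cites) / len(cites),
            "uncited_pct": 100 * sum(1 for c in cites if c == 0) / len(cites),
        }
    return result

# Hypothetical records.
sample_records = [
    ("Article", 10), ("Article", 0), ("Article", 2), ("Article", 4),
    ("Conference Paper", 0), ("Conference Paper", 4),
    ("Book", 30), ("Book", 50),
]
type_stats = stats_by_type(sample_records)
for doc_type, row in type_stats.items():
    print(doc_type, row)
```

As the toy data suggest, a type with very few items can show a high average simply because a handful of well-cited works dominate it.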
3.3 RQ3: Analysis of the Key Areas of the Paper
To understand the thematic areas of the contact tracking field, we conducted a visual analysis. As shown in Fig. 8, setting the minimum number of keyword occurrences to 15 yields 310 keywords that meet the threshold. To make the visual analysis accurate, irrelevant words such as “human,” “article,” and “animal” were deleted, and finally a cluster view and a label view were generated. According to the VOSviewer software, the data yield a total of 224 keywords, 3 clusters, and 8,287 links.
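The thresholding and link counting described above can be approximated outside VOSviewer as follows. This is a rough sketch of the idea, not VOSviewer's implementation: it counts each keyword once per paper, keeps keywords that reach a minimum number of occurrences and are not on an exclusion list, and counts how often each pair of kept keywords appears in the same paper. The sample keyword lists and the threshold of 2 are hypothetical.

```python
from collections import Counter
from itertools import combinations

def clean(keywords):
    """Case-insensitive, whitespace-trimmed set of one paper's keywords."""
    return {k.strip().lower() for k in keywords}

def frequent_keywords(keyword_lists, min_occurrences, excluded):
    """Keywords occurring in at least min_occurrences papers, minus excluded ones."""
    counts = Counter()
    for keywords in keyword_lists:
        counts.update(clean(keywords))
    return {k: n for k, n in counts.items()
            if n >= min_occurrences and k not in excluded}

def cooccurrence_links(keyword_lists, kept):
    """Count, for each pair of kept keywords, the papers containing both."""
    links = Counter()
    for keywords in keyword_lists:
        present = sorted(clean(keywords) & set(kept))
        links.update(combinations(present, 2))
    return links

# Hypothetical keyword lists, one per paper.
papers_keywords = [
    ["Contact tracing", "Human", "Privacy"],
    ["contact tracing", "Privacy"],
    ["Contact Tracing", "Algorithm", "human"],
]
EXCLUDED = {"human", "article", "animal"}
kept = frequent_keywords(papers_keywords, min_occurrences=2, excluded=EXCLUDED)
links = cooccurrence_links(papers_keywords, kept)
print(kept, dict(links))
```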
In VOSviewer, each color represents a cluster, and each node represents a keyword. The size of a node depends on the number of links and the number of citations. Fig. 9 shows that the most important keywords are contact tracing, disease control, computer simulation, biological models, etc., which are linked to the other topics. The blue cluster mainly concerns research on digital contact tracking systems, the red cluster mainly concerns research on disease transmission and control, and the green cluster mainly concerns computer simulation and model calculation.
In the label view, keywords are color-mapped by the average year in which they appear. Fig. 10 shows that in the earlier decade, keywords such as “mathematical model,” “biological model,” and “algorithm” were the most common, after which the focus shifted to topics such as “disease control,” “random systems,” “social networks,” and “exposure detection.” In recent years, the main topics are “digital contact tracking,” “privacy,” “mobile applications,” “digital health,” and “monitoring.”
3.4 RQ4: Ranking of Papers Submitted by Countries
Scopus provides statistics on the countries of each paper’s authors, so our data set contains this information. As shown in Fig. 11, we use the indicators provided by Scopus to rank all papers published in the computer science field by country. The contributions of different countries are very uneven: the top-ranked United States, China, and the United Kingdom contribute almost half of the research literature.
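A country ranking of this kind, together with the share of papers covered by the leading countries, can be sketched as follows. The counting rule is an assumption (a country is counted once per paper regardless of how many of its authors appear), and the sample affiliations are hypothetical rather than taken from Fig. 11.

```python
from collections import Counter

def rank_countries(paper_countries):
    """Return (country, paper count) pairs, most productive first."""
    counts = Counter()
    for countries in paper_countries:
        counts.update(set(countries))  # count each country once per paper
    # Sort by descending count, then alphabetically to break ties.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))

def share_of_top(paper_countries, k):
    """Percentage of papers with at least one author from the top-k countries."""
    top = {country for country, _ in rank_countries(paper_countries)[:k]}
    covered = sum(1 for countries in paper_countries if top & set(countries))
    return 100 * covered / len(paper_countries)

# Hypothetical author countries, one list per paper.
sample_papers = [
    ["United States"], ["United States", "China"], ["China"],
    ["United Kingdom"], ["Germany"], ["United States"],
    ["Italy"], ["India"],
]
ranking = rank_countries(sample_papers)
top2_share = share_of_top(sample_papers, k=2)
print(ranking[:2], top2_share)
```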
This paper presents a bibliometric evaluation of contact tracking. As the research shows, contact tracking has been an active research area in recent years, and the number of papers has been increasing since 1983. In the 1990s, fewer than 100 papers were published every year; in the 21st century, this number is between 200 and 400 per year. It is unclear whether this indicator really represents growth in academic and technological progress. However, it clearly shows that a bibliometric analysis is advisable; otherwise, the research community lacks an objective and comprehensive view.
In the computer science field, 1,275 contact tracking papers have more than one citation, accounting for about 62.22%. This suggests that the quality of the literature in this field is not low and that the research community is very interested in these research topics.
By analyzing the most cited papers, we found that papers from earlier years occupy four of the top positions by total citations, whereas papers from 2020 occupy four of the top positions by average annual citations. The main reason is that the outbreak of the epidemic in 2019 prompted the development of contact tracking technology, and the related literature followed. Citation indicators have also drawn considerable criticism. For example, a recent column in the journal Nature severely criticized citation indicators: they motivate people to work in popular sub-fields instead of publishing in-depth research papers in targeted fields [26,27].
The analysis of paper types shows that journal papers account for the largest proportion, and the other types together account for only about one-third. This is typical in the natural sciences, where researchers prefer to publish in journals. In terms of the total number of citations, journal papers also have more citations than conference papers.
The hot spot analysis shows how the topics of the papers have changed: contact tracking covers a wide range of topics, from past research on virus transmission and mathematical model calculation to current digital contact tracking technology. Finally, a comparison of the publications on contact tracing from different countries shows that the United States, China, and the United Kingdom account for about half of all published papers.
However, so far, about 27.18% of the papers in this field have not been cited. Why is the proportion of uncited papers so large? Is it related to the quality of the paper or the journal? As discussed in the bibliometric study titled “Characteristics of Highly Cited Papers,” the quality of a paper and the cutting-edge nature of its research content increase its number of citations.
The data set we use can also be used for other thematic analyses of contact tracking and its subdomains. For example, we show the hot topics in this field. Similarly, this bibliometric method can be applied on a regular basis to analyze the progress of the field over the next few years and to understand its prospects.
The work was supported by grants from the National Natural Science Foundation of China (No. 61873026), the National Defense Science and Technology Key Laboratory Fund (No. JZX7Y201911SY001101), General scientific research projects of Zhejiang Provincial Department of Education (No. Y202045430), and Zhejiang Province Public Welfare Technology Application Research (No. GF22F026173).
He received the M.Sc. degree in computer software and theory from Northwest University in 2004 and the Ph.D. degree from Nanjing University of Aeronautics and Astronautics in 2015. From 2013 to 2014, he was an academic visitor at the Information Security Group of Royal Holloway, University of London. He is a professor at Zhejiang Wanli University and is currently also a visiting researcher at the State Key Laboratory of Information Security, Institute of Information Engineering, Chinese Academy of Sciences. His research currently focuses on fully homomorphic encryption, lattice-based cryptography, and blockchain.