1. Introduction
Elliott [1] defines health information as all information related to medical or health care, including medical knowledge, health knowledge, and patient health service information. Wireless networks have gradually become an important channel through which people obtain health information. As a result, health information on wireless networks is growing explosively, and useful information is becoming difficult to find [1]. Health information content on wireless networks has also diversified: it contains a large amount of text, audio, video, graphics, images, and other elements, and its display forms vary. At the same time, people's attention to health issues has expanded from disease treatment to disease prevention and self-care, and the demand for health information has increased [2].
Retrieval behavior is the intermediary between individual users and wireless network health information. In-depth research on it helps to identify user needs accurately and to provide health information that meets those needs [3].
A data mining algorithm is a set of heuristics and calculations that creates a data mining model from data. To create a model, a data mining algorithm first needs to analyze the data [4]. Data mining algorithms can extract valuable, applicable information and knowledge hidden in large amounts of incomplete, random data [5,6].
2. Data Mining Algorithm
The data mining algorithm comprises 10 sub-algorithms. This paper describes in detail the two that can be applied to wireless network health information retrieval: the K-means clustering algorithm and the BP neural network algorithm. The data mining algorithm flow based on these two algorithms is shown in Fig. 1 [7,8].
Fig. 1. Data mining algorithm flow.
3. Wireless Network Health Information Retrieval Method based on Data Mining Algorithm
In this paper, we design a wireless network health information retrieval method based on a data mining algorithm. The method obtains a large amount of health information data from the wireless network and transforms it into well-structured data [9,10].
3.1 Filtering Wireless Network Health Information
During wireless network health information retrieval, a large amount of invalid information inevitably interferes with the retrieval results, so the wireless network health information must be filtered in advance [11]. After the vector is obtained, a vector mapping operation is performed to filter out invalid information. The mapping process of wireless network health information is shown in Fig. 2.
According to Fig. 2, the inputs to the mapping are the original data stored in the wireless network and the mapping vector calculated by the master node, and the output is the mapped data [12]. The mapping vector acts as a switch: only columns with a value of 1 are output, and all other columns are filtered out. The row filter is equivalent to a WHERE clause: whenever a row of data meets the search conditions, a logic high signal is sent; otherwise, a logic low signal is sent [13].
Fig. 2. Wireless network health information mapping.
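The mapping step in Section 3.1 can be illustrated with a minimal sketch. It assumes the data is held as rows of equal length, the mapping vector is a list of 0/1 flags with one flag per column, and the search condition is an ordinary predicate. The names `apply_mapping`, `mapping_vector`, and `predicate`, and the sample rows, are hypothetical and are not taken from the original method.

```python
from typing import Callable, List, Sequence, Tuple


def apply_mapping(
    rows: Sequence[Sequence[object]],
    mapping_vector: Sequence[int],
    predicate: Callable[[Sequence[object]], bool],
) -> Tuple[List[List[object]], List[int]]:
    """Filter rows with a WHERE-like predicate and columns with a 0/1 mask.

    Returns the mapped rows and one signal per input row
    (1 = logic high, the row met the condition; 0 = logic low).
    """
    mapped: List[List[object]] = []
    signals: List[int] = []
    for row in rows:
        if len(row) != len(mapping_vector):
            raise ValueError("row length must match mapping vector length")
        hit = bool(predicate(row))
        signals.append(1 if hit else 0)
        if hit:
            # Keep only the columns whose switch in the mapping vector is 1.
            mapped.append([v for v, keep in zip(row, mapping_vector) if keep == 1])
    return mapped, signals


if __name__ == "__main__":
    # Hypothetical records: (topic, media type, relevance score).
    rows = [["flu", "text", 120], ["ad", "video", 3], ["diabetes", "text", 87]]
    mapped, signals = apply_mapping(rows, [1, 0, 1], lambda r: r[2] >= 50)
    print(mapped)
    print(signals)
```

In this sketch the row predicate plays the role of the WHERE condition, and the mask plays the role of the column switch.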
3.2 Clustering Analysis of Wireless Network Health Information based on Data Mining Algorithm
Because the filtered wireless network health information is relatively scattered, it is difficult to retrieve [14-16]. Therefore, this paper uses the K-means algorithm to cluster the wireless network health information [17]. First, the K-means error-minimization function is used to classify the filtered wireless network health information [18].
In the first step, K centers are randomly selected from the wireless network health information, and the corresponding preprocessing is performed to obtain purer health information. This is described as follows:
In the formula, $D_k$ represents the constant for a given data set, and n represents the dimension of the wireless network health information.
In the second step, each wireless network health information data point is assigned to its nearest center, which divides the data into sample clusters: every point joins the cluster represented by the center closest to it. This step uses the distance formula:
$$d(x,y)=\sqrt{\sum_{i=1}^{n}(x_i-y_i)^2} \qquad (1)$$
In formula (1), d(x,y) is the Euclidean distance, n represents the dimension of the wireless network health information, and x and y represent two different wireless network health information data points [19]. The mean of the objects in each cluster is taken as the center of that cluster. The distance between each object and each center is calculated, and the objects are re-divided according to the minimum distance, so that each health information item is allocated to its nearest center. The average distance from the points in each class to the center of that class is then recalculated, and the mean of the data points in each sample cluster is used as the center of that cluster [20]. The cluster centers are recalculated in this way in every pass, and the minimum values calculated in each pass make up the matrix D, which gives formula (2):
In the formula, x is the set of minimum values. The corresponding wireless network health information is re-divided according to the minimum distance.
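The assignment in the second step can be sketched as follows. The sketch assumes each health information item has already been encoded as a numeric vector of dimension n. The function names `euclidean` and `assign_to_nearest` are hypothetical, and reading the per-point minimum distances as the entries collected in D is an interpretation, not something the source states.

```python
import math
from typing import List, Sequence, Tuple


def euclidean(x: Sequence[float], y: Sequence[float]) -> float:
    """Euclidean distance between two n-dimensional points."""
    if len(x) != len(y):
        raise ValueError("points must have the same dimension")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def assign_to_nearest(
    points: Sequence[Sequence[float]],
    centers: Sequence[Sequence[float]],
) -> Tuple[List[int], List[float]]:
    """Assign each point to its nearest center.

    Returns the index of the chosen center for each point and the
    corresponding minimum distance.
    """
    labels: List[int] = []
    min_dists: List[float] = []
    for p in points:
        dists = [euclidean(p, c) for c in centers]
        j = min(range(len(centers)), key=dists.__getitem__)
        labels.append(j)
        min_dists.append(dists[j])
    return labels, min_dists


if __name__ == "__main__":
    labels, min_dists = assign_to_nearest(
        [(0, 0), (1, 0), (10, 10)], [(0, 0), (10, 10)]
    )
    print(labels, min_dists)
```

When a point is equally close to two centers, this sketch picks the center with the lower index; the source does not say how ties are handled.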
The third step is to decide whether to continue iterating: the calculation is repeated until no data values are reassigned or the maximum number of iterations has been reached. The fourth step is to form the K clusters of health information data points in the wireless network space. This is described as follows:
In the formula, m represents the number of clusters, $z_j$ represents the size of cluster j, z represents the total number of data points, and $E_j$ represents the entropy of cluster j.
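The four steps can be put together in a short sketch. It rests on two assumptions that the source does not confirm. First, the symbols m, $z_j$, z, and $E_j$ are read as the usual size-weighted clustering entropy, $E=\sum_{j=1}^{m}\frac{z_j}{z}E_j$. Second, $E_j$ is taken to be the Shannon entropy of reference class labels inside cluster j. The names `kmeans`, `cluster_entropy`, and `weighted_entropy`, and the parameters `max_iter` and `seed`, are hypothetical.

```python
import math
import random
from collections import Counter
from typing import List, Sequence, Tuple


def kmeans(
    points: Sequence[Sequence[float]], k: int, max_iter: int = 100, seed: int = 0
) -> Tuple[List[List[float]], List[int], int]:
    """Plain K-means: random initial centers, assign, update, repeat.

    Stops when no point changes cluster or after max_iter passes.
    Returns the centers, the cluster label of each point, and the
    number of passes performed.
    """
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(list(points), k)]  # step 1
    labels = [-1] * len(points)
    for iteration in range(1, max_iter + 1):
        # Step 2: assign every point to its nearest center.
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centers[j])) for p in points
        ]
        # Step 3: stop when the assignment no longer changes.
        if new_labels == labels:
            return centers, labels, iteration
        labels = new_labels
        # Recompute each center as the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return centers, labels, max_iter


def cluster_entropy(class_labels: Sequence[str]) -> float:
    """Shannon entropy (base 2) of the class labels in one cluster."""
    total = len(class_labels)
    return -sum(
        (c / total) * math.log2(c / total) for c in Counter(class_labels).values()
    )


def weighted_entropy(labels: Sequence[int], classes: Sequence[str]) -> float:
    """Assumed formula: E = sum_j (z_j / z) * E_j over all clusters j."""
    z = len(labels)
    clusters = {}
    for lab, cls in zip(labels, classes):
        clusters.setdefault(lab, []).append(cls)
    return sum(len(m) / z * cluster_entropy(m) for m in clusters.values())
```

Under this reading, a lower value of E means that each cluster is dominated by a single class, so the clustering is purer.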
3.3 High Frequency Word Frequency Classification of Wireless Network Health Information
After the clustering analysis of wireless network health information, the actual retrieval process still involves too many terms for a correlation analysis of every search term to be realistic. To improve accuracy, it is therefore necessary to count the frequency of the high-frequency words of wireless network health [17]. The statistical results are given in the table below.
Table 1 lists the top 10 high-frequency words of wireless network health. In the retrieval process, each retrieval instruction returns a list of thousands of result URLs, sorted by relevance to the search words. Each result URL has a serial number, which represents the position of the URL in the returned list [18]. Every 10 URLs and their summaries constitute one page of results, which yields the URL click location frequency distribution shown in Table 2.
Table 1. Statistical results of high-frequency words of wireless network health
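Counting high-frequency words of the kind reported in Table 1 can be sketched as below. The sketch assumes the queries are available as plain strings and can be tokenized on whitespace; queries in languages without spaces would need a word segmenter instead. The function name `top_terms` and the sample queries are hypothetical and do not reproduce the data behind Table 1.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def top_terms(queries: Iterable[str], n: int = 10) -> List[Tuple[str, int]]:
    """Return the n most frequent terms across all queries with their counts."""
    counts: Counter = Counter()
    for query in queries:
        # Lower-case and split on whitespace (a simplifying assumption).
        counts.update(query.lower().split())
    return counts.most_common(n)


if __name__ == "__main__":
    sample = ["diabetes diet", "diabetes symptoms", "flu symptoms", "diabetes"]
    print(top_terms(sample, 2))
```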
Table 2. URL click location frequency distribution
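The grouping of clicked URLs into result pages can be sketched as follows. It assumes serial numbers start at 1 and that each page holds 10 results, as described in the text. The names `page_of` and `click_page_distribution` and the sample positions are hypothetical and are not the values in Table 2.

```python
from collections import Counter
from typing import Dict, Iterable


def page_of(position: int, page_size: int = 10) -> int:
    """Map a 1-based result serial number to its 1-based result page."""
    if position < 1:
        raise ValueError("serial numbers start at 1")
    return (position - 1) // page_size + 1


def click_page_distribution(
    positions: Iterable[int], page_size: int = 10
) -> Dict[int, float]:
    """Relative click frequency per result page, ordered by page number."""
    counts = Counter(page_of(p, page_size) for p in positions)
    total = sum(counts.values())
    return {page: counts[page] / total for page in sorted(counts)}


if __name__ == "__main__":
    # Clicks on results 1, 3 and 10 fall on page 1, 11 on page 2, 25 on page 3.
    print(click_page_distribution([1, 3, 10, 11, 25]))
```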
The high-frequency words of wireless network health are then classified according to the frequency distribution of URL click locations. Table 3 shows the resulting classification.
Table 3. Classification of high-frequency words of wireless network health
3.4 Realizing Wireless Network Health Information Retrieval
Using the classification of high-frequency words above, wireless network health information retrieval is implemented according to specific types of retrieval behavior [19].
Table 4 shows the types and descriptions of health information retrieval behavior. The retrieval is implemented as follows. First, the wireless network health information is input into the BP neural network. According to the health information retrieval behavior types described, the weights and thresholds of the neural network are adjusted continually so that the output gradually approaches the required result and the output error is minimized [20].
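The weight and threshold adjustment can be illustrated with a very small backpropagation sketch. It assumes one hidden layer, sigmoid activations, a squared-error loss, and a single output that scores one behavior type; none of these details are given in the source. Thresholds are represented as bias terms. The class `TinyBP`, the helpers `mean_error` and `train`, and the toy samples are hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple

Sample = Tuple[Sequence[float], float]


def sigmoid(v: float) -> float:
    return 1.0 / (1.0 + math.exp(-v))


class TinyBP:
    """One-hidden-layer network trained by gradient descent (illustrative only)."""

    def __init__(self, n_in: int, n_hidden: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.w1 = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden  # hidden thresholds, stored as biases
        self.w2 = [rng.uniform(-1, 1) for _ in range(n_hidden)]
        self.b2 = 0.0  # output threshold, stored as a bias

    def forward(self, x: Sequence[float]) -> Tuple[List[float], float]:
        h = [
            sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
            for row, b in zip(self.w1, self.b1)
        ]
        y = sigmoid(sum(w * hi for w, hi in zip(self.w2, h)) + self.b2)
        return h, y

    def train_step(self, x: Sequence[float], target: float, lr: float = 0.5) -> None:
        h, y = self.forward(x)
        # Error terms for the loss 0.5 * (y - target) ** 2.
        delta_out = (y - target) * y * (1 - y)
        delta_hid = [delta_out * w * hi * (1 - hi) for w, hi in zip(self.w2, h)]
        # Propagate the error backwards: adjust weights and thresholds.
        for j, hi in enumerate(h):
            self.w2[j] -= lr * delta_out * hi
        self.b2 -= lr * delta_out
        for j, d in enumerate(delta_hid):
            for i, xi in enumerate(x):
                self.w1[j][i] -= lr * d * xi
            self.b1[j] -= lr * d


def mean_error(net: TinyBP, samples: Sequence[Sample]) -> float:
    return sum(0.5 * (net.forward(x)[1] - t) ** 2 for x, t in samples) / len(samples)


def train(net: TinyBP, samples: Sequence[Sample], epochs: int = 2000) -> float:
    """Run repeated passes over the samples and return the final mean error."""
    for _ in range(epochs):
        for x, t in samples:
            net.train_step(x, t)
    return mean_error(net, samples)
```

A network with several outputs, one per behavior type in Table 4, would follow the same update rule with one error term per output.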
At this stage, users judge and evaluate the health information, decide which health information they need, select it, and may then go on to share it. In this way, wireless network health information retrieval is realized [21].
Table 4. Types and descriptions of health information retrieval behavior
4. Experimental Results
4.1 Analysis of Experimental Results
Table 5 compares the experimental data. According to the experimental results, the data mining algorithm retrieves wireless network health information accurately, and its accuracy is clearly higher than that of the control group. These results suggest that the wireless network health information retrieval method designed in this paper has practical significance and is worth promoting.
Table 5. Comparison of experimental data
5. Conclusion
In this paper, a data mining algorithm is used to classify and process big data. Throughout the mining and calculation process, the method improves data processing capability and increases calculation speed. It also lets users obtain more accurate results from the clustered big data and grasp the patterns and meaning of the data in a vast database, so that wireless network health information can be retrieved quickly and accurately. To date, no comparable study has been carried out in China. Mining real data, as in this study, can reveal users' health information retrieval behavior patterns from a macro perspective. Because the research is still at an initial stage, it has many deficiencies. In terms of research method, the log method cannot connect a query with a specific user, so the relationship between demographic characteristics and retrieval behavior cannot be measured. In terms of implementation, we cannot guarantee that the high-frequency word classification captures all query items related to health information. Follow-up research will focus on how the health information retrieval behavior of Internet users changes over time, in order to predict future trends in health information retrieval and to help improve search engine retrieval systems and support the designers of related websites.