How can we avoid being submerged by information, discover useful knowledge in time, and improve the utilization of information? Faced with this challenge, data mining came into being and has developed rapidly, showing strong vitality. Data mining is the process of extracting, from large amounts of incomplete, noisy, fuzzy, and random raw data, information and knowledge that is implicit in the data, not known in advance, yet potentially useful.

Data mining technology is the result of long-term research and development on databases, and its development has in turn pushed database technology into a more advanced stage. The traditional data environment is essentially operational: a traditional information system is only responsible for adding, deleting, and modifying data, and the work that can be done on such a database is OLTP (On-Line Transaction Processing). As data kept accumulating, people came to need an analytical data environment, so the data warehouse was derived from the database, and on that basis OLAP (On-Line Analytical Processing) became possible. Now that massive data collection is feasible, computer processing power has increased, and advanced data mining algorithms have been proposed, data mining technology can not only query and traverse past data but also identify potentially valuable relationships among past data and present them in a certain form, which goes a long way toward meeting the urgent need for knowledge.

The raw data is the source from which data mining forms knowledge. It can be structured, such as data in a relational database; it can be semi-structured, such as text, graphics, and image data; it can even be heterogeneous, distributed data. This article focuses on one kind of semi-structured data mining, Web-based data mining. It introduces the basic concepts and the commonly used techniques, and ends with a brief explanation of how XML is applied in it.

I. Key concepts of Web-based data mining

1. What is Web-based data mining?
With the rapid development of the Internet, websites of every kind abound. In an increasingly competitive Internet economy, however, only those who win customers ultimately gain a competitive advantage. A site administrator or owner should know what users do on the site: which parts of the site most users like, which parts tire them, where security vulnerabilities have appeared, which changes brought a marked improvement in customer satisfaction, and which changes drove users away. Know yourself and know your counterpart. Web-based data mining is a technology that can meet these needs.

So far there is no clear, authoritative statement of the exact definition of Web-based data mining. One view from abroad holds that Web-based data mining is the process of using data mining techniques to automatically discover and extract information from Web documents and services. In Taiwan opinions differ; one view regards it as the process of purposefully extracting information from the Web on the basis of the inherent characteristics of data objects, learned from a large number of known data samples. Some scholars also count information retrieval in the network environment and Web content development as data mining. In short, Web-based data mining (Web Mining) takes raw data obtained from the World Wide Web, mines the hidden, potentially useful knowledge in it, and ultimately applies that knowledge to business operations to meet the needs of managers.

2. Classification of Web-based data mining

According to the object being mined, Web-based data mining can be divided into three categories:
Web content mining (Web Content Mining)
Web structure mining (Web Structure Mining)
Web usage mining (Web Usage Mining)

(1) Web content mining
Web content mining is the process of obtaining knowledge from Web documents and their descriptions. Mining Web documents, and resource discovery based on concept indexes or on agent technology, also belong to this category. Web information resources are of many types. WWW resources now make up the bulk of network information resources, but besides the large volume of resources that can be crawled directly from web pages, indexed, and served through queries, a considerable share of the information is hidden data (for example, results generated dynamically in response to user queries, data held in database systems, or certain private data). This data cannot be indexed, so no effective way of retrieving it can be offered, which forces us to mine this content. In terms of the form of the information resources, Web content consists of data in various forms such as text, images, audio, video, and metadata, so what we call Web content mining is also a form of multimedia data mining.
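Text is the most common form of Web content, and a bag-of-words representation weighted by TF-IDF is one widely used way to prepare it for mining. The following is a minimal sketch under stated assumptions rather than a definitive implementation: the function names `tokenize` and `tf_idf` and the three sample pages are hypothetical, and the weighting shown (relative term frequency times the logarithm of inverse document frequency) is only one of several common TF-IDF variants.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def tf_idf(documents):
    """Return one {term: weight} dict per document.

    tf  = occurrences of the term / number of tokens in the document
    idf = log(number of documents / documents containing the term)
    """
    bags = [Counter(tokenize(doc)) for doc in documents]  # bag of words per page
    doc_freq = Counter()
    for bag in bags:
        doc_freq.update(bag.keys())  # count each term once per document
    n_docs = len(documents)
    weights = []
    for bag in bags:
        total = sum(bag.values())
        weights.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in bag.items()
        })
    return weights


# Hypothetical page texts, used only for illustration.
pages = [
    "web mining extracts knowledge from web pages",
    "data mining extracts knowledge from databases",
    "web logs record user visits",
]
scores = tf_idf(pages)

# Print the terms of the third page, highest weight first.
for term, weight in sorted(scores[2].items(), key=lambda kv: -kv[1]):
    print(f"{term}: {weight:.3f}")
```

Terms that occur in every document receive a weight of zero under this variant, so the terms that distinguish one page from the others rise to the top.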
(2) Web structure mining
This type of mining is the process of discovering knowledge from the overall organization of the World Wide Web and from the links between web pages; it mainly mines the latent link-structure patterns of the Web. The idea comes from citation analysis: by analyzing the links on a web page, the number of links, and the objects they point to, a model of the Web's own link structure is built. This model can be used to classify web pages, and it yields information about the similarity and degree of association between different pages. Web structure mining helps users find authoritative sites on related topics, and it is very significant for ranking network resources in search results.
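The text above does not name a specific link-analysis algorithm. PageRank is one well-known algorithm in this family, and it is used here purely as an illustration of how link structure can be turned into an authority score. This is a minimal sketch, assuming the link graph is already available as a dictionary; the function name `pagerank`, the damping factor of 0.85, the iteration count, and the four-page site are all assumptions made for the example.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Score pages by link structure using power iteration.

    links maps each page to the list of pages it links to.
    Returns {page: score}; the scores sum to 1.
    """
    pages = set(links)
    for targets in links.values():
        pages.update(targets)  # include pages that only appear as link targets
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its rank on in equal shares to the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page without outgoing links spreads its rank over all pages.
                share = damping * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical site: every other page links back to "home".
site_links = {
    "home": ["docs", "blog"],
    "docs": ["home"],
    "blog": ["home", "docs"],
    "orphan": ["home"],
}
ranks = pagerank(site_links)
for page, score in sorted(ranks.items(), key=lambda kv: -kv[1]):
    print(f"{page}: {score:.3f}")
```

In this toy graph the page that receives the most links from other pages ends up with the highest score, which is the same intuition as counting citations in citation analysis.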
(3) Web usage mining
Web usage mining is also known as Web log mining (Web Log Mining). The first two approaches mine the original, on-line data; Web usage mining, by contrast, works on second-hand data extracted from the interaction between users and the network. This data includes Web server access logs, proxy server logs, user registration information, users' behavior and actions while visiting the site, and so on. Web usage mining records this data in log files one by one and then mines the accumulated log files to understand users' Web behavior and the meaning in the data. The example given earlier belongs to this type.

The three types of mining are compared below on five aspects; the specifics are described further later.

Web content mining
Data form: unstructured, semi-structured
Main data: text documents, hypertext documents
Representation: bag of words, n-grams, words or phrases, concepts or entities, relational data
Methods: TF-IDF and its variants, statistical machine learning (including natural language processing)
Applications: categorization, clustering, discovering extraction rules, discovering patterns in text, building models

Web structure mining
Data form: semi-structured, Web site as a database, link structure
Main data: hypertext documents, links
Representation: edge-labeled graph (OEM), relational data, graph
Methods: proprietary algorithms, ILP, (modified) association rules
Applications: discovering frequent substructures, discovering site structure, categorization, clustering

Web usage mining
Data form: interaction
Main data: server log records, browser log records
Representation: relational table, graph
Methods: proprietary algorithms, statistical machine learning, (modified) association rules
Applications: site construction and management, improved marketing, building user models
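Web usage mining starts from accumulated log records, so a first step is usually to parse the log and aggregate it. The following is a minimal sketch, assuming the server writes the widely used Common Log Format; the source does not say which log format is involved, and the names `LOG_PATTERN`, `parse_line`, `summarize`, and the four sample lines are hypothetical.

```python
import re
from collections import Counter, defaultdict

# One Common Log Format entry:
# host ident authuser [timestamp] "method path protocol" status size
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) (?P<size>\S+)'
)


def parse_line(line):
    """Return the fields of one log line as a dict, or None if it does not match."""
    match = LOG_PATTERN.match(line)
    return match.groupdict() if match else None


def summarize(lines):
    """Count successful requests per page and list the pages each host requested."""
    hits = Counter()
    visited = defaultdict(list)
    for line in lines:
        record = parse_line(line)
        if record is None or not record["status"].startswith("2"):
            continue  # skip malformed lines and non-2xx responses
        hits[record["path"]] += 1
        visited[record["host"]].append(record["path"])
    return hits, dict(visited)


# Hypothetical log lines, used only for illustration.
sample_log = [
    '192.0.2.1 - - [10/Oct/2000:13:55:36 -0700] "GET /index.html HTTP/1.0" 200 2326',
    '192.0.2.1 - - [10/Oct/2000:13:56:01 -0700] "GET /products.html HTTP/1.0" 200 1840',
    '192.0.2.7 - - [10/Oct/2000:14:02:11 -0700] "GET /index.html HTTP/1.0" 200 2326',
    '192.0.2.7 - - [10/Oct/2000:14:02:40 -0700] "GET /missing.html HTTP/1.0" 404 209',
]
hits, visited = summarize(sample_log)
print(hits.most_common())
print(visited)
```

The per-page counts suggest which parts of a site users like most, and the per-host page sequences are the kind of input on which association rules or user models could later be built.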
3. Characteristics of Web-based data mining

(1) What is semi-structured data?
Semi-structured data is defined in contrast to structured and unstructured data. We call the data in a traditional database fully structured, while some data, such as a book or a picture, has no structure at all and is unstructured. Semi-structured data lies somewhere in between: its schema is implicit, its information structure is irregular, and its type constraints are not strict. A semi-structured data model has the following characteristics:
The data comes first and the model afterwards;
The model describes the structure of the data rather than imposing constraints on it;
The model is imprecise: it may describe only part of the data's structure, and it may differ depending on the stage of data processing from which it is viewed;
The model may be very large, even larger than the source data, and it is updated continuously because the data keeps changing.

(2) Characteristics of Web data
The most important feature of data on the Web is that it is semi-structured. Data on the Web differs from data in a traditional database. A traditional database has a definite data model by which specific data can be described, and the data is stored, centrally or in distributed form, according to definite organizational rules, so it is strongly structured. On the Web, data is very complex and no single model describes it: each site's data is designed independently, and the data itself is self-describing and changes dynamically, so data on the Web is not strongly structured. At the same time, web pages have descriptive levels, and each site is built according to its own architecture, which gives the data some structure. Therefore, we believe that data on the Web is semi-structured.
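The idea that the data comes first and the model afterwards can be illustrated by deriving a descriptive schema from records that were never forced into a fixed layout. This is a minimal sketch, not a definitive method: the function name `infer_schema`, the shape of the summary it returns, and the three sample records are assumptions made for the example.

```python
from collections import defaultdict


def infer_schema(records):
    """Describe, after the fact, which fields occur and with which value types.

    The result only summarizes the data; it does not constrain it.
    """
    field_types = defaultdict(set)
    field_counts = defaultdict(int)
    for record in records:
        for key, value in record.items():
            field_types[key].add(type(value).__name__)
            field_counts[key] += 1
    total = len(records)
    return {
        key: {
            "types": sorted(field_types[key]),       # a field may hold several types
            "optional": field_counts[key] < total,   # absent from at least one record
        }
        for key in field_types
    }


# Hypothetical page records with irregular fields and loosely typed values.
records = [
    {"title": "Home", "links": ["/a", "/b"]},
    {"title": "About", "author": "webmaster"},
    {"title": 404, "links": "/a"},
]
schema = infer_schema(records)
for field, description in schema.items():
    print(field, description)
```

Adding a record with a new field changes the inferred schema, which mirrors the point that a semi-structured model is updated as the data changes.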