1) Component search engine selection
页面提取
2) Web page text extraction
网页正文提取
1.
To improve the performance of the Lucene system when searching Chinese web pages, a statistics-based web page text extraction technique, a Chinese word segmentation module, and an index document preprocessing module are added to the system, based on an analysis of Lucene's structure.
通过分析Lucene的系统结构,系统采用了基于统计的网页正文提取技术,并且加入了中文分词模块和索引文档预处理模块来提高检索系统的效率和精度。
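The example sentence above does not say which statistic is used, so the following is only a minimal sketch of one common statistics-based heuristic: keep a line when the amount of visible text in the small block of lines around it is high. The function name `extract_main_text`, the parameters `radius` and `threshold`, and the sample page are all assumptions made for illustration, not details from the cited work.

```python
import re


def extract_main_text(html: str, radius: int = 1, threshold: int = 40) -> str:
    """Keep lines that sit in a text-dense block (hypothetical heuristic)."""
    # Remove script/style blocks and HTML comments before measuring text.
    html = re.sub(r"(?is)<(script|style)\b.*?</\1\s*>", "", html)
    html = re.sub(r"(?s)<!--.*?-->", "", html)

    # Strip the remaining tags line by line and record visible text lengths.
    lines = [re.sub(r"<[^>]+>", "", ln).strip() for ln in html.splitlines()]
    lengths = [len(ln) for ln in lines]

    kept = []
    for i, line in enumerate(lines):
        lo = max(0, i - radius)
        hi = min(len(lines), i + radius + 1)
        # Block score: total visible characters in the window centred on line i.
        if line and sum(lengths[lo:hi]) >= threshold:
            kept.append(line)
    return "\n".join(kept)


SAMPLE_HTML = """<html><head><title>T</title>
<script>var x = 1;</script></head>
<body>
<div><a href="/">Home</a></div>
<div><a href="/a">About</a></div>
<div></div>
<div></div>
<p>Lucene is a full-text search library written in Java.</p>
<p>It builds an inverted index over the documents it is given.</p>
<div></div>
<div></div>
<div>Copyright</div>
</body></html>"""

if __name__ == "__main__":
    print(extract_main_text(SAMPLE_HTML))
```

Short navigation and footer lines score low because their neighbouring lines carry little text, while consecutive body paragraphs reinforce each other. A short line directly adjacent to a long paragraph would also be kept, so the window size and threshold would need tuning on real pages.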
5) Profile extraction
提取剖面
6) Interface extraction
界面提取
1.
It describes the model of information integration in the deep web, analyzes the advantages and disadvantages of existing approaches to interface extraction, schema matching, and result compilation, and presents corresponding improved approaches.
介绍了隐蔽网络信息集成的模型,分析了目前界面提取、模板匹配、结果组合技术的特点和不足,并提出了相应的改进方法。
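Supplementary material: Heyang yemian (合阳页面)
Buckwheat flour is made into dough, cooked on a griddle into thin cakes, and then cut into strips. It is served with a variety of seasonings and condiments. The noodles are soft yet chewy, with a rich sour and spicy aroma.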