After five years of work, the project "Research on Automatic Abstraction Based on Statistics and Semantic Analysis for Chinese and English Texts" has been completed. The project has two prominent characteristics, one in its techniques and one in its methods. First, the system introduces semantic concept hierarchies on top of traditional word-frequency statistics. It uses an extended Dictionary of Synonymy Words for Chinese, and WordNet with the related theory of concept hierarchies for English. This yields a better Vector Space Model (VSM) and more precise statistical information. To analyze and identify multi-topic texts, the system examines the distribution of several kinds of title words and keywords, a first successful step toward resolving the unbalanced distribution of abstract sentences. Second, to make the abstract more readable, several readability-oriented processes are applied to the raw abstract. These mainly include sentence-pattern analysis for Chinese, link grammar analysis for English, removal of redundant repetition among the extracted sentences, rearrangement and transformation of sentence patterns, and template matching to resolve dangling conjunctions. Building on this research, we completed a more general and technically more satisfactory automatic abstracting system for Chinese and English texts.
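The idea of counting concepts instead of surface words can be illustrated with a minimal sketch. It is not the project's implementation: the `CONCEPT_OF` table, the function names, and the cosine-based sentence ranking are all assumptions made for illustration, and a real system would draw its concepts from a thesaurus or a WordNet-style hierarchy rather than a hand-written dictionary.

```python
import math
import re
from collections import Counter

# Hypothetical toy synonym table: surface word -> shared concept label.
# A real system would take these from a thesaurus or concept hierarchy.
CONCEPT_OF = {
    "car": "VEHICLE", "automobile": "VEHICLE", "truck": "VEHICLE",
    "fast": "SPEED", "quick": "SPEED", "rapid": "SPEED",
}

def concept_vector(text):
    """Count concepts rather than surface words, so synonyms share a dimension."""
    words = re.findall(r"[a-z]+", text.lower())
    return Counter(CONCEPT_OF.get(w, w) for w in words)

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def extract(sentences, k=1):
    """Keep the k sentences closest to the whole-document concept vector,
    returned in their original order."""
    doc = concept_vector(" ".join(sentences))
    ranked = sorted(sentences,
                    key=lambda s: cosine(concept_vector(s), doc),
                    reverse=True)
    chosen = set(ranked[:k])
    return [s for s in sentences if s in chosen]

sentences = [
    "The car is fast.",
    "An automobile can be quick.",
    "Rain fell yesterday.",
]
print(extract(sentences, k=2))
```

With plain word counts, "car" and "automobile" fall on different dimensions and have zero similarity; after mapping to concepts they coincide, which is why the two vehicle sentences reinforce each other and outrank the unrelated one.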
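With the rapid development of science and technology, humanity now lives in a vast ocean of information. Obtaining the most useful information quickly and effectively is vital to today's economic and technological development. This project makes full use of the results and experience of the research group's large-scale corpus system and of its system for automatic analysis and distribution statistics of Chinese sentence patterns. Focusing mainly on Chinese, it combines statistical information with semantic analysis to build a high-quality, broad-coverage automatic abstracting system for Chinese and English. It is bound to have broad application prospects and substantial social and economic benefits.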
Data last updated: 2023-05-31
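Research on Query Expansion Based on Statistical Machine Translation and Automatic Abstracting
Research on Automatic Subject Indexing Based on Semantic Analysis and Statistics
Multi-Document Automatic Abstracting Technology Based on Information Reorganization
Multi-Document Automatic Abstracting Technology Based on a Logical Framework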