Entity relation extraction is an important research direction in natural language processing and information extraction. However, traditional supervised entity relation extraction based on ACE (Automatic Content Extraction) requires large-scale manually annotated corpora as training data, which makes it difficult to meet the needs of real-world applications. Recently, with the emergence of knowledge graphs, distantly supervised entity relation extraction based on large-scale knowledge graphs has become a hot research topic and attracted considerable attention from researchers in related fields. This project aims to complete and effectively exploit large-scale Chinese knowledge graphs at three levels: data resources, feature representations, and theory and methods. Specifically, it focuses on the following key techniques: (1) large-scale Chinese knowledge graph completion and integration based on bilingual deep learning and semantic graphs; (2) distributed word embeddings guided by collaborative semantics and lexical semantics; (3) multi-instance distantly supervised entity relation extraction based on collaborative modeling and a dependency assumption; (4) distantly supervised relation extraction based on joint instance modeling and feature saliency ranking. These results can be applied directly to entity relation extraction, and they are also expected to have a significant impact on information extraction techniques in general.
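The sketch below illustrates what "distant supervision based on a knowledge graph" and "multi-instance" usually mean in this line of work. It is a generic textbook heuristic, not this project's method. Everything in it is a hypothetical illustration: the toy triples, the sentences, and the helper names `build_bags`, `bag_labels` and `at_least_one`. Plain substring matching stands in for real entity linking. The 0.5 threshold and the sentence scores are assumed placeholders for a trained sentence-level classifier.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Triple = Tuple[str, str, str]  # (head, relation, tail)

# Hypothetical toy knowledge graph.
KG: List[Triple] = [
    ("Beijing", "capital_of", "China"),
    ("Harbin", "located_in", "China"),
]

# Hypothetical unlabeled corpus.
SENTENCES: List[str] = [
    "Beijing is the capital of China.",
    "Beijing hosted the 2008 Olympics, where China won many medals.",
    "Harbin is famous for its winter ice festival.",
]


def build_bags(kg: List[Triple], sentences: List[str]) -> Dict[Tuple[str, str], List[str]]:
    """Collect every sentence that mentions both entities of a KG pair into
    that pair's bag. Substring matching is a simplifying assumption that
    stands in for proper entity linking."""
    pairs = {(h, t) for h, _, t in kg}
    bags: Dict[Tuple[str, str], List[str]] = defaultdict(list)
    for sent in sentences:
        for head, tail in pairs:
            if head in sent and tail in sent:
                bags[(head, tail)].append(sent)
    return dict(bags)


def bag_labels(kg: List[Triple]) -> Dict[Tuple[str, str], Set[str]]:
    """Distant labels: each bag inherits every KG relation between its pair."""
    labels: Dict[Tuple[str, str], Set[str]] = defaultdict(set)
    for head, rel, tail in kg:
        labels[(head, tail)].add(rel)
    return dict(labels)


def at_least_one(scores: List[float], threshold: float = 0.5) -> bool:
    """Multi-instance 'at-least-one' assumption: a bag expresses the relation
    if its best-scoring sentence does. Scores are assumed to come from a
    sentence-level classifier that is not shown here."""
    return bool(scores) and max(scores) >= threshold


if __name__ == "__main__":
    bags = build_bags(KG, SENTENCES)
    labels = bag_labels(KG)
    for pair, sents in bags.items():
        print(pair, sorted(labels[pair]), len(sents))
    print(at_least_one([0.9, 0.2]))
```

Both Beijing sentences end up in the same bag labelled `capital_of`, but only the first actually expresses that relation. This label noise is why multi-instance formulations score the bag as a whole instead of trusting every sentence in it.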
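Entity relation extraction is an important research topic in natural language processing and information extraction. However, traditional supervised entity relation extraction for ACE requires large-scale manually annotated corpora as training data, so it generalizes poorly and cannot meet the needs of practical applications. In recent years, with the emergence of large-scale knowledge graphs, weakly supervised entity relation extraction based on knowledge graphs has received close attention from researchers in related fields and has become a new research hotspot. This proposed project takes the completion and effective use of large-scale Chinese knowledge graphs as its overall goal. It approaches the problem at three levels (data resources, feature representations, and theory and methods) and studies four key techniques of weakly supervised entity relation extraction: (1) large-scale Chinese knowledge graph completion and integration based on bilingual deep learning and semantic graphs; (2) distributed word embeddings guided by collaborative semantics and lexical semantics; (3) multi-instance weakly supervised entity relation extraction based on collaborative modeling and a dependency assumption; (4) weakly supervised entity relation extraction based on joint instance modeling and feature saliency ranking. On the one hand, these results can be applied directly to entity relation extraction to improve its performance; on the other hand, they will also have an important influence on the development of information extraction technology.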
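Entity relation extraction is an important research topic in natural language processing and information extraction, and one of the core technologies for building knowledge graphs. This project studies weakly supervised Chinese entity relation extraction for large-scale knowledge graphs. Its main research content covers deep-learning-based topic entity extraction, knowledge-graph-based natural language question answering, unstructured information extraction from document collections, open-domain dialogue generation with emotion supervision, and emotional dialogue generation that incorporates topic information. The main outcomes are 9 high-quality academic papers and 5 granted national invention patents. The related results have been recognized by peers in China and abroad and have advanced the field.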
Data last updated: 2023-05-31
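Theme park evaluation based on public sentiment: a case study of the Volga Manor in Harbin
Named entity recognition based on fine-grained word representations
In the service-economy era, new growth drivers will be powered jointly by technology and services
The knowledge contribution of Castells' "network society" theory to human geography: an analysis and comparison based on Chinese and international citation content
A survey of zero-shot learning
Distant supervision methods for entity relation extraction from patent texts
Entity semantic relation extraction methods for open-domain knowledge networks
Key issues in weakly supervised relation extraction for the open-domain Internet
Entity discovery and semantic relation mining for an oracle-bone studies knowledge graph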