Research on Extracting Events and Their Related Temporal Information from Chinese Financial News on the Internet

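Basic Information
Grant number: 69975008
Project category: General Program
Funding amount: 12.00
Principal investigator: 苑春法
Discipline classification:
Host institution: Tsinghua University
Approval year: 1999
Completion year: 2002
Project period: 2000-01-01 - 2002-12-31
Project status: Completed
Project participants: Kam-Fai Wong, 黎利, 陈刚, 朱晓丹, 赵强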
Keywords:
information extraction; event extraction; temporal relations
Final Report Abstract

Extracting events and their associated temporal information from Chinese financial news on the Internet can be viewed as a task of structuring the data that appear in text. Two key problems must be solved: the analysis of Chinese in the financial domain, and the analysis of the Chinese temporal system. Throughout the study we paid close attention to the particular characteristics of Chinese and of the financial domain and made use of them. Statistical language learning was our primary approach, complemented by methods grounded in linguistic theory.

As a foundation, we first constructed a financial corpus of 1,100,000 tokens. From a statistical analysis of this corpus we built seven knowledge bases for the company name identification system. Using these knowledge bases together with a set of hand-crafted rules, the system identifies company names with a two-pass scanning method. Experiments show F1-measure (β = 1) scores of 94.00% on the closed test and 82.50% on the open test.

In statistical language learning, the relative scarcity of training data (data sparseness) is a persistent problem. The maximum entropy model is known to estimate well when prior knowledge is lacking. After study and careful experimental comparison, we selected the maximum entropy model for the main language analysis tasks, and introduced structural risk theory to address its feature selection problem. We consider this combination novel and significant. A maximum entropy model trained to recognize maximal noun phrases (NPs) in sentences achieved F1-measure scores of 93.79% on the closed test and 91.84% on the open test. For part-of-speech (POS) tagging, the maximum entropy model achieved accuracies of 97.77% on the closed test and 96.29% on the open test.
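
The abstract names the maximum entropy model as the main analysis tool but gives no implementation detail. As an illustration only, the sketch below trains a tiny conditional maximum entropy (multinomial logistic) classifier for POS tagging by gradient ascent on the log-likelihood. The feature templates (`word=`, `prev=`), the two-tag set, the toy samples, and all function names are assumptions made for this example, not the project's actual features, data, or training procedure. Feature selection via structural risk is not shown.

```python
import math
from collections import defaultdict


def predict_proba(weights, feats, labels):
    """Return p(label | feats) under the model: softmax of summed feature weights."""
    scores = {y: sum(weights.get((f, y), 0.0) for f in feats) for y in labels}
    top = max(scores.values())  # subtract the max for numerical stability
    exps = {y: math.exp(s - top) for y, s in scores.items()}
    z = sum(exps.values())
    return {y: e / z for y, e in exps.items()}


def train_maxent(samples, labels, epochs=200, lr=0.5):
    """Fit weights[(feature, label)] by batch gradient ascent on the log-likelihood."""
    weights = defaultdict(float)
    for _ in range(epochs):
        grad = defaultdict(float)
        for feats, gold in samples:
            probs = predict_proba(weights, feats, labels)
            for f in feats:
                grad[(f, gold)] += 1.0        # empirical feature count
                for y in labels:
                    grad[(f, y)] -= probs[y]  # expected count under the model
        for key, g in grad.items():
            weights[key] += lr * g / len(samples)
    return weights


def classify(weights, feats, labels):
    """Pick the most probable label for a feature set."""
    probs = predict_proba(weights, feats, labels)
    return max(labels, key=lambda y: probs[y])


# Hypothetical toy data: each sample is (active features, gold tag).
LABELS = ["N", "V"]
SAMPLES = [
    ({"word=公司", "prev=该"}, "N"),
    ({"word=收购", "prev=公司"}, "V"),
    ({"word=股票", "prev=收购"}, "N"),
    ({"word=上涨", "prev=股票"}, "V"),
    ({"word=银行", "prev=该"}, "N"),
    ({"word=宣布", "prev=银行"}, "V"),
]

weights = train_maxent(SAMPLES, LABELS)
# A word never seen in training is still tagged from its context feature.
print(classify(weights, {"word=企业", "prev=该"}, LABELS))
```

Because every prediction is a sum over whatever features fire, an unseen word can still be tagged from its context features, which is one reason this model family copes with sparse data.
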
Building on this work, we integrated a system for extracting financial events. Preliminary experiments show F1-measure scores of 88.99% on the closed test and 74.06% on the open test.

For the temporal analysis of Chinese, we annotated 3.25 Ms of financial news text, amounting to 2,000 files. From statistical analysis and summarization of this material we identified fifteen patterns for expressing temporal information. Because Chinese verbs carry no inflectional marking, we proposed a dedicated method for analyzing the Chinese temporal system. The method starts from the situation type of the main verb of a sentence and then combines Chinese temporal noun phrases, temporal auxiliary words, and temporal adverbs to recognize the situation of the sentence. Experiments indicate that the method is sound and feasible: the precision and recall of the test system are both 91.1%. We regard this as original work. Its results can inform financial activity and provide a foundation for predicting financial events, which gives it both theoretical and practical value.
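
The scores above are reported as F-measures with β = 1, that is, F = (1 + β²)·P·R / (β²·P + R) for precision P and recall R. The short sketch below computes this value. The function names and the example counts are chosen here for illustration and do not come from the project.

```python
def precision_recall(tp, fp, fn):
    """Precision and recall from true-positive, false-positive, false-negative counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


def f_measure(precision, recall, beta=1.0):
    """Weighted harmonic mean of precision and recall; beta=1 gives F1."""
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta * beta
    return (1.0 + b2) * precision * recall / (b2 * precision + recall)


# When precision equals recall, as for the temporal test system (both 91.1%),
# F1 takes the same value.
print(round(f_measure(0.911, 0.911), 3))

# Hypothetical counts: 8 correct extractions, 2 spurious, 2 missed.
p, r = precision_recall(tp=8, fp=2, fn=2)
print(round(f_measure(p, r), 3))
```
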

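Project Abstract

This project takes Chinese financial news on the Internet as its object of study and investigates how to extract events and their related temporal information from it on the basis of partial language analysis. The temporal information of an event refers mainly to what is expressed through the phase (aspectual) structure of the sentence, for example whether the sentence describes the occurrence of an instantaneous event or the beginning or end of a durative activity. From the absolute temporal relations of isolated events, the relative temporal relations among multiple events can then be derived. The results of this kind of information extraction help users trace the history of an economic entity and understand the causes and consequences linking financial events, and they can serve as an important basis for economic decision-making.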
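
The project abstract states that relative temporal relations among events can be derived from the absolute times of individual events. A minimal sketch of that idea follows, assuming each extracted event has already been anchored to a start and end date, with an instantaneous event having equal start and end. The `Event` class, the three-way relation set (`before`, `after`, `overlaps`), and the sample events are invented for illustration and are not taken from the project.

```python
from dataclasses import dataclass
from datetime import date
from itertools import combinations


@dataclass(frozen=True)
class Event:
    name: str
    start: date
    end: date  # equal to start for an instantaneous event


def relation(a, b):
    """Relative relation of event a to event b, from their absolute dates."""
    if a.end < b.start:
        return "before"
    if b.end < a.start:
        return "after"
    return "overlaps"  # the two intervals share at least one day


def pairwise_relations(events):
    """Map each unordered pair (in list order) to its relative relation."""
    return {(a.name, b.name): relation(a, b) for a, b in combinations(events, 2)}


# Hypothetical events, for illustration only.
EVENTS = [
    Event("merger_announced", date(2001, 3, 1), date(2001, 3, 1)),
    Event("trading_suspended", date(2001, 3, 2), date(2001, 3, 9)),
    Event("trading_resumed", date(2001, 3, 12), date(2001, 3, 12)),
    Event("price_rally", date(2001, 3, 8), date(2001, 3, 15)),
]

for pair, rel in pairwise_relations(EVENTS).items():
    print(pair, rel)
```

A fuller treatment would distinguish more interval relations (such as one event falling entirely within another); three are kept here to keep the sketch short.
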

Project Outputs

No outputs are recorded for this project.

Data last updated: 2023-05-31

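Other Related Literature

1

A New Inductive Method for Weibo Rumor Detection Based on Graph Convolutional Networks

DOI:10.3785/j.issn.1008-973x.2022.05.013
Year published: 2022
2

Application of Time Series Analysis and Machine Learning Methods in Predicting Tuberculosis Incidence Trends

DOI:
Year published: 2020
3

Does Environmental Information Disclosure Affect Analysts' Earnings Forecasts?

DOI:
Year published: 2017
4

Establishment of Commonly Used Animal Models of Asthma

DOI:10.3969/j.issn.1671-7856.
Year published: 2020
5

Abstract Meaning Representation and Analysis of Chinese Function Words Based on Relation Alignment

DOI:
Year published: 2020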

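Other Grants Held by 苑春法

Grant number: 69375017
Approval year: 1993
Funding amount: 6.00
Project category: General Program

Grant number: 60573186
Approval year: 2005
Funding amount: 22.00
Project category: General Program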

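Similar NSFC Grants

1

Research on Key Technologies for Vietnamese News Event Information Extraction Based on Discourse Features

Grant number: 61562049
Approval year: 2015
Principal investigator: 周枫
Discipline classification: F0211
Funding amount: 38.00
Project category: Fund for Less Developed Regions
2

Research on Unsupervised Semantic Extraction of News Events Based on Narrative Pattern Analysis

Grant number: 61202233
Approval year: 2012
Principal investigator: 冯岩松
Discipline classification: F0211
Funding amount: 25.00
Project category: Young Scientists Fund
3

Research on Chinese Event Extraction and Prediction for Social Networks

Grant number: 61806137
Approval year: 2018
Principal investigator: 王中卿
Discipline classification: F0606
Funding amount: 26.00
Project category: Young Scientists Fund
4

Research on Chinese Event Extraction and Credibility Computation for Public Opinion

Grant number: 61472265
Approval year: 2014
Principal investigator: 李培峰
Discipline classification: F0211
Funding amount: 82.00
Project category: General Program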