Extracting events and their associated temporal information from Chinese financial news on the Internet can be viewed as a task of structuring the data that appear in text. It requires solving two key problems: the analysis of Chinese in the financial domain, and the analysis of the Chinese temporal system. In this study we paid close attention to the particular characteristics of Chinese and of the financial domain, and exploited them. Our research relies throughout on statistical language learning, complemented by methods based on linguistic theory. As a foundation, we first constructed a financial corpus of 1,100,000 tokens. From a statistical analysis of this corpus we built seven knowledge bases for the company name identification system. Using these seven knowledge bases together with a set of hand-crafted rules, the system identifies company names with a two-pass scanning method. Experiments show F1-measure (β = 1) rates of 94.00% in the closed test and 82.50% in the open test. In statistical language learning, the relative shortage of training data (data sparseness) is always a crucial problem. The maximum entropy model is known to offer a better estimation strategy when prior knowledge is lacking. After study and careful experimental comparison, we selected the maximum entropy model for the main language analysis tasks. We also introduced structural risk theory to address the feature selection problem of the maximum entropy model; this part of the work is novel. A maximum entropy model was trained to recognize maximal noun phrases (NPs) in sentences, achieving F1-measure rates of 93.79% in the closed test and 91.84% in the open test. For part-of-speech (POS) tagging, the maximum entropy model achieved an accuracy of 97.77% in the closed test and 96.29% in the open test.
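For readers less familiar with the metric, the F-measure quoted in these results combines precision P and recall R as F_β = (1 + β²)·P·R / (β²·P + R); with β = 1 it is their harmonic mean. The following is a minimal Python sketch of that formula. The function name `f_measure` and the example counts are illustrative assumptions, not code or data from the study.

```python
def f_measure(true_positives, false_positives, false_negatives, beta=1.0):
    """Return (precision, recall, F_beta) computed from raw counts.

    Illustrative helper only; the name and signature are assumptions.
    """
    predicted = true_positives + false_positives   # items the system proposed
    actual = true_positives + false_negatives      # items in the gold standard
    precision = true_positives / predicted if predicted else 0.0
    recall = true_positives / actual if actual else 0.0
    if precision == 0.0 and recall == 0.0:
        return precision, recall, 0.0              # avoid dividing by zero
    b2 = beta * beta
    f = (1 + b2) * precision * recall / (b2 * precision + recall)
    return precision, recall, f


# Hypothetical counts: 8 correct names, 2 spurious, 2 missed.
precision, recall, f1 = f_measure(8, 2, 2)
```

Whenever precision and recall are equal, F1 takes that same value, so a single reported figure suffices in that case.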
Building on the above work, we integrated a system for extracting financial events. Preliminary experiments show F1-measure rates of 88.99% in the closed test and 74.06% in the open test. For the analysis of Chinese temporal information, we annotated 3.25 MB of financial news text, amounting to 2,000 files. From statistical analysis and summarization of this data we derived fifteen patterns for expressing temporal information. Because Chinese verbs do not inflect, we proposed a dedicated method for analyzing the Chinese temporal system. The method starts from the situation type of the main verb of a sentence and then combines it with Chinese temporal noun phrases, temporal auxiliary words, and temporal adverbs to recognize the situation of the sentence. Experiments indicate that the method is feasible: the precision and recall of the test system are both 91.1%. The research can serve as a reference for financial activity and lay a foundation for predicting financial events, and it is of significance for both theory and application.
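The idea of combining the main verb's situation type with temporal markers can be illustrated with a toy Python sketch. Everything in it is an assumption made for illustration: the two situation types, the tiny lexicons, the output labels, and the function name `analyze` are hypothetical and are not the study's fifteen patterns or its actual rules. Temporal noun phrases are omitted for brevity, and the input is assumed to be already word-segmented.

```python
# Hypothetical toy lexicons (not taken from the study).
VERB_SITUATION = {
    "上市": "instantaneous",  # to go public
    "收购": "durative",       # to acquire
    "增长": "durative",       # to grow
}
ASPECT_PARTICLES = {"了": "completed", "着": "ongoing"}
TIME_ADVERBS = {"已经": "completed", "正在": "ongoing", "将": "future"}
MARKERS = {**ASPECT_PARTICLES, **TIME_ADVERBS}


def analyze(tokens, main_verb):
    """Return (situation type, aspect cue, reading) for a segmented sentence."""
    situation = VERB_SITUATION.get(main_verb, "unknown")
    # Use the first temporal marker found in the sentence, if any.
    aspect = next((MARKERS[t] for t in tokens if t in MARKERS), "unmarked")
    if aspect == "future":
        reading = "event expected"
    elif situation == "instantaneous" and aspect == "completed":
        reading = "instantaneous event occurred"
    elif situation == "durative" and aspect == "ongoing":
        reading = "activity in progress"
    elif situation == "durative" and aspect == "completed":
        reading = "activity ended"
    else:
        reading = "undetermined"
    return situation, aspect, reading


result = analyze(["公司", "已经", "上市"], "上市")
```

A real system would need a much larger verb lexicon and would have to resolve conflicting cues; this sketch simply takes the first marker it encounters.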
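This project takes Chinese financial news on the Internet as its object and studies how to extract events and their associated temporal information from it on the basis of partial language analysis. The temporal information of an event refers mainly to the temporal information expressed by the situation-type (aspectual) structure of the sentence, for example whether it denotes the occurrence of an instantaneous event or the beginning or end of a continuing activity. From the absolute temporal relations of isolated events, the relative temporal relations among multiple events can be further inferred. The results of this information extraction will help users grasp the history of an economic entity and understand the causes and effects linking financial events, and can serve as an important basis for economic decisions.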
Data last updated: 2023-05-31
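A New Inductive Method for Weibo Rumor Detection Based on Graph Convolutional Networks
Application of Time Series Analysis and Machine Learning Methods in Predicting Pulmonary Tuberculosis Incidence Trends
Does Environmental Information Disclosure Affect Analysts' Earnings Forecasts?
Establishment of Commonly Used Animal Models of Asthma
Abstract Meaning Representation and Analysis of Chinese Function Words Based on Relation Alignment
Research on Key Technologies for Vietnamese News Event Information Extraction Based on Discourse Features
Research on Unsupervised Semantic Extraction of News Events Based on Narrative Pattern Analysis
Research on Chinese Event Extraction and Prediction for Social Networks
Research on Chinese Event Extraction and Credibility Computation for Public Opinion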