Research on the Structural Information Approach to Network Search and Big Data Analysis

Basic Information

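Grant No.: 61772503

Project Type: General Program

Funding Amount (10,000 CNY): 64.00

Principal Investigator: 李昂生 (Angsheng Li)

Discipline Classification:

Host Institution: Institute of Software, Chinese Academy of Sciences

Year Approved: 2017

Year Concluded: 2021

Project Period: 2018-01-01 to 2021-12-31

Project Status: Concluded

Participants: 彭攀, 王少将, 胡奇夫, 刘俊, 殷宪晨, 聂浩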

Keywords:

structural entropy minimization principle; smart search; structural information; big data analysis; networks

Concluding Abstract

Searching in a network means identifying the natural module of a query input. Current-generation search engines based on PageRank became a new technology of the 21st century and gave rise to a number of new phenomena. However, a theoretical foundation for search is largely missing from the existing literature, and establishing one has become a grand challenge both for network algorithms and for the next-generation search engine. A closely related new phenomenon is the availability of many kinds of big data, which are usually unstructured and noisy; discovering laws and knowledge in such data is another grand challenge. Angsheng Li and Yicheng Pan defined the K-dimensional structural information of a physical system G as the minimum number of bits required to determine and decode the K-dimensional code of the node reached by a random walk in the system. Structural information therefore measures the uncertainty in a system and can be measured directly wherever noise occurs; it also distinguishes order from disorder in a system and extracts the knowledge tree of a complex system. This project brings the problems of search and data analysis together and will establish an information-theoretic foundation for both network search and big data analysis. We will show that one-dimensional structural information minimization is the principle for structuring unstructured data and for cleaning noisy data, and that two-dimensional structural information minimization is the principle for personalized smart search and for discovering the knowledge and laws of big data.
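For an undirected, unweighted graph G with m edges and vertex degrees d_i, Li and Pan's one-dimensional structural information is the Shannon entropy of the random walk's stationary distribution, H^1(G) = -Σ_i (d_i/2m) log2(d_i/2m). For a partition P of the vertices into modules X_j with volume V_j (sum of degrees) and g_j edges leaving X_j, the two-dimensional version is H^P(G) = -Σ_j Σ_{i∈X_j} (d_i/2m) log2(d_i/V_j) - Σ_j (g_j/2m) log2(V_j/2m), and the two-dimensional structural information is its minimum over all partitions. The Python sketch below computes both quantities and finds the minimizing partition by exhaustive search. The toy graph, the function names, and the brute-force search are illustrative assumptions (exhaustive search is only feasible for a handful of vertices); it is not the project's own algorithm.

```python
import math
from collections import defaultdict


def degrees(edges):
    """Degree of every vertex of an undirected, unweighted graph given as an edge list."""
    deg = defaultdict(int)
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return dict(deg)


def one_dim_entropy(edges):
    """H^1(G): entropy (bits) of the stationary distribution d_i / 2m of a random walk."""
    deg = degrees(edges)
    two_m = 2 * len(edges)
    return -sum((d / two_m) * math.log2(d / two_m) for d in deg.values())


def two_dim_entropy(edges, partition):
    """H^P(G) for a partition P given as a list of vertex lists (the modules X_j)."""
    deg = degrees(edges)
    two_m = 2 * len(edges)
    module_of = {v: j for j, module in enumerate(partition) for v in module}
    vol = [sum(deg[v] for v in module) for module in partition]  # V_j
    cut = [0] * len(partition)  # g_j: edges with exactly one endpoint in X_j
    for u, v in edges:
        if module_of[u] != module_of[v]:
            cut[module_of[u]] += 1
            cut[module_of[v]] += 1
    h = 0.0
    for j, module in enumerate(partition):
        # bits to locate a vertex inside its own module
        h -= sum((deg[v] / two_m) * math.log2(deg[v] / vol[j]) for v in module)
        # bits to enter module j from outside (zero when no edge leaves X_j)
        h -= (cut[j] / two_m) * math.log2(vol[j] / two_m)
    return h


def set_partitions(items):
    """Yield every partition of the list `items` as a list of lists."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for smaller in set_partitions(rest):
        for k in range(len(smaller)):  # add `first` to an existing block ...
            yield smaller[:k] + [[first] + smaller[k]] + smaller[k + 1:]
        yield [[first]] + smaller  # ... or give it a block of its own


def minimize_two_dim_entropy(edges):
    """Exhaustive 2D structural entropy minimization (toy graphs only)."""
    vertices = sorted(degrees(edges))
    best = min(set_partitions(vertices), key=lambda p: two_dim_entropy(edges, p))
    return best, two_dim_entropy(edges, best)


if __name__ == "__main__":
    # hypothetical toy graph: triangles {0, 1, 2} and {3, 4, 5} joined by edge (2, 3)
    edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
    print(f"H^1(G) = {one_dim_entropy(edges):.4f} bits")
    modules, h = minimize_two_dim_entropy(edges)
    print("optimal modules:", sorted(sorted(m) for m in modules), f"H^P = {h:.4f} bits")
```

Run as a script, it should report H^1(G) ≈ 2.5567 bits and recover the two triangles as the optimal modules with H^P ≈ 1.6995 bits, so minimizing two-dimensional structural entropy picks out the natural modules of this graph. The two trivial partitions (one module for everything, or all singletons) both give back H^1(G).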

Project Abstract

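Structural information theory realizes the concept of "hierarchical abstraction" mathematically and thereby treats hierarchical abstraction as a strategy, namely an encoding strategy. It measures the amount of uncertainty embedded in a complex system that a hierarchical abstraction strategy eliminates, that is, the decoding information of a strategy with respect to an information system. It defines the accompanying concepts of the system's structural entropy, compression information, and decoding information as new measures, and establishes principles and a theory for how these concepts relate to one another. Structural information thus allows us to measure the information embedded in a system, to decode it through a strategy (a hierarchical abstraction strategy), which amounts to eliminating the uncertainty embedded in the system, and to study the theory of what this decoding information does. As a new information theory, structural information supports many information processing problems. This project applied structural information theory to the prediction of topological domains of the genome, to privacy protection, and to new algorithmic problems, and obtained the following results:

(1) Developed deDoc, a tool for predicting DNA folding. It is the first principled and systematic treatment of the hierarchical structure of DNA folding and gives a fairly successful solution to DNA folding, a major problem in the life sciences. (Publication: Nature Communications, 2018)

(2) Developed a privacy protection method that lays the foundation for structural-information principles and metrics for privacy protection. (Publication: NeurIPS 2019)

(3) Proved the algorithmic small-world phenomenon of networks. (Publication: Proceedings of the Royal Society A, 2019)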
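A hierarchical abstraction strategy of the kind described above corresponds to an encoding tree T whose leaves are the vertices of G. The Python sketch below assumes Li and Pan's structural entropy of an encoding tree, H^T(G) = -Σ_{α≠λ} (g_α/2m) log2(V_α/V_{α^-}), where λ is the root, V_α is the volume of the vertices below tree node α, g_α is the number of edges leaving that vertex set, and α^- is the parent of α. A one-level tree with every vertex directly under the root gives back H^1(G). Reading the decoding information of T as H^1(G) - H^T(G), the uncertainty T removes relative to that flat tree, is an assumption made here for illustration; the dictionary encoding of the tree and all function and node names are hypothetical.

```python
import math
from collections import defaultdict


def tree_structural_entropy(edges, children, root):
    """H^T(G) = -sum over non-root tree nodes a of (g_a / 2m) * log2(V_a / V_parent(a)).

    `children` maps each internal tree node to its children; any node absent from
    `children` is a leaf and must be a vertex of the graph.
    """
    deg = defaultdict(int)
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    two_m = 2 * len(edges)

    def leaves(a):
        # graph vertices below tree node a
        kids = children.get(a, [])
        if not kids:
            return {a}
        return set().union(*(leaves(k) for k in kids))

    def volume(vertices):
        return sum(deg[v] for v in vertices)

    def cut(vertices):
        # number of edges with exactly one endpoint in `vertices`
        return sum(1 for u, v in edges if (u in vertices) != (v in vertices))

    h = 0.0
    stack = [root]
    while stack:
        parent = stack.pop()
        parent_vol = volume(leaves(parent))
        for a in children.get(parent, []):
            below = leaves(a)
            h -= (cut(below) / two_m) * math.log2(volume(below) / parent_vol)
            if a in children:  # only internal nodes have children to visit
                stack.append(a)
    return h


def decoding_information(edges, children, root):
    """Assumed here: uncertainty removed by T relative to the flat tree (entropy H^1(G))."""
    vertices = sorted({v for edge in edges for v in edge})
    flat = {root: vertices}
    return tree_structural_entropy(edges, flat, root) - tree_structural_entropy(edges, children, root)


if __name__ == "__main__":
    edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
    # hypothetical two-level encoding tree: root -> two modules -> vertices
    tree = {"root": ["A", "B"], "A": [0, 1, 2], "B": [3, 4, 5]}
    print(f"H^T(G) = {tree_structural_entropy(edges, tree, 'root'):.4f} bits")
    print(f"decoding information = {decoding_information(edges, tree, 'root'):.4f} bits")
```

For two triangles joined by one edge, this two-level tree should give H^T(G) ≈ 1.6995 bits and a decoding information of exactly 6/7 ≈ 0.8571 bits. Deeper hierarchies are expressed the same way, by nesting further internal nodes in `children`.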

Project Outputs

No outputs are listed for this project.

Data last updated: 2023-05-31

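Other Grants Held by 李昂生 (Angsheng Li)

Research on Computation, Enumeration, and Effective Approximation

Grant No.: 69973048

Year Approved: 1999

Funding Amount (10,000 CNY): 10.00

Project Type: General Program

Similar NSFC Grants

Research on Social-Network-Based Image Search Technology in the Big Data Environment

Grant No.: 61502155

Year Approved: 2015

Principal Investigator: 严灵毓

Discipline Classification: F06

Funding Amount (10,000 CNY): 20.00

Project Type: Young Scientists Fund

Research on Search and Recommendation over Spatiotemporal Cross-Media Big Data on Enterprise Innovation and User Preferences in Social Networks

Grant No.: 61902037

Year Approved: 2019

Principal Investigator: 李雅文

Discipline Classification: F0211

Funding Amount (10,000 CNY): 30.00

Project Type: Young Scientists Fund

Information-Theoretic Analysis of Event Importance and Its Application to Big Data Analysis

Grant No.: 61771283

Year Approved: 2017

Principal Investigator: 樊平毅

Discipline Classification: F0113

Funding Amount (10,000 CNY): 16.00

Project Type: General Program

Research on Optimization Theory and Methods Supporting Big Data Analysis

Grant No.: 11631013

Year Approved: 2016

Principal Investigator: 戴彧虹

Discipline Classification: A0405

Funding Amount (10,000 CNY): 230.00

Project Type: Key Program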