With the explosive growth of data in biomedicine, health care services, and related fields, many large observational databases containing large numbers of medical variables have emerged. Inferring local causal relationships among variables from such data has become a fundamental goal and a primary interest of medical research. Causal graphical models display the associations among variables intuitively as a graph and, using probabilistic expressions together with specific causal assumptions, estimate the causal effects between variables, thereby identifying local causal relationships. However, existing learning algorithms for causal graphical models cannot be reliably extended to settings with a large number of variables within a reasonable time, nor can they fully estimate the causal effects among many variables. Building on existing causal graph learning algorithms, this project therefore improves the constraint-based and score-based learning strategies to develop an optimal graph learning algorithm suited to large numbers of variables, adopts a minimum causal effect value strategy, and improves the causal effect estimation method to increase the accuracy of causal effect estimates. Using several open, large observational medical databases, automatically and accurately constructing causal structures from large numbers of medical variables, building causal graphical models, and applying them to the causal exploration of practical medical problems has significant theoretical and practical value.
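This work addresses problems in the two stages of the MMHC algorithm, a learning algorithm for directed acyclic graphs (DAGs) in causal graphical models: the conditional independence constraint stage and the score-based search stage. In the constraint-based structure construction stage of MMHC, a Monte Carlo permutation test (Monte Carlo sampling without replacement) is used to test conditional independence. In the score-based search stage, random-restart hill climbing and tabu search are used in place of plain hill climbing. The effect of these modifications on DAG learning was studied, and the best-performing algorithm was selected for DAG learning. Experiments show that the improved algorithm raises both the accuracy of the estimated causal relationship hypotheses and the quality of the constructed structure. In the context of post-marketing safety re-evaluation of Chinese patent medicines, and given that hospital information system (HIS) data are non-randomized and subject to substantial confounding bias, this work uses the improved algorithm to learn DAGs from extracted HIS observational data and, based on the causal theory of causal probabilistic models, analyzes the causal effect of Danhong injection use on abnormal changes in liver function indicators. The application results show that the proposed method can provide valid and more credible causal effect estimates.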
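The Monte Carlo permutation test for conditional independence mentioned above can be sketched as follows. This is a minimal illustration rather than the project's implementation: the source does not state which test statistic or shuffling scheme was used, so the G (log-likelihood ratio) statistic, the shuffling of X within each stratum of Z, and all function names (`g_statistic`, `conditional_g`, `permutation_ci_test`) are assumptions made for this example, which handles discrete variables only.

```python
import math
import random
from collections import Counter, defaultdict


def g_statistic(xs, ys):
    """G (log-likelihood ratio) statistic for two discrete sequences."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px = Counter(xs)
    py = Counter(ys)
    # Only observed (x, y) cells are summed, so log(0) never occurs.
    return 2.0 * sum(
        c * math.log(c * n / (px[x] * py[y])) for (x, y), c in joint.items()
    )


def conditional_g(xs, ys, zs):
    """Sum of per-stratum G statistics, one stratum per value of z."""
    strata = defaultdict(lambda: ([], []))
    for x, y, z in zip(xs, ys, zs):
        strata[z][0].append(x)
        strata[z][1].append(y)
    return sum(g_statistic(sx, sy) for sx, sy in strata.values())


def permutation_ci_test(xs, ys, zs, n_perm=500, seed=0):
    """Monte Carlo permutation p-value for "X independent of Y given Z".

    Assumed scheme: X is shuffled within each stratum of Z, which keeps
    the X-Z and Y-Z relationships intact while breaking any X-Y
    dependence inside a stratum.
    """
    rng = random.Random(seed)
    observed = conditional_g(xs, ys, zs)
    index_by_z = defaultdict(list)
    for i, z in enumerate(zs):
        index_by_z[z].append(i)
    exceed = 0
    for _ in range(n_perm):
        permuted = list(xs)
        for idx in index_by_z.values():
            vals = [xs[i] for i in idx]
            rng.shuffle(vals)
            for i, v in zip(idx, vals):
                permuted[i] = v
        # Small tolerance guards against floating-point summation noise.
        if conditional_g(permuted, ys, zs) >= observed - 1e-12:
            exceed += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    return (exceed + 1) / (n_perm + 1)
```

In a constraint-based phase, a small p-value from such a test would count as evidence against conditional independence, so the candidate edge between X and Y would be kept. The significance threshold and the number of permutations are tuning choices that the source does not specify.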
Data last updated: 2023-05-31
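Genome-wide association analysis of leaf orientation value in maize
On the influence of the big data environment on the development of information science
Regulatory asymmetry, choice of earnings management mode, and the enforcement efficiency of the CSRC?
Nonlinear analysis and calculation method for the coefficient of earth pressure at rest of coarse-grained soil
Effects of vegetation restoration patterns on major soil enzyme activities, microbial diversity, and soil nutrients in the mountainous area of southern Ningxia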
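Causal inference on high-dimensional incompletely observed data and its applications
Structure learning and causal inference for graphical models
Research on causal inference models with post-hoc/external auxiliary variables for confounding control in observational studies
Causal analysis of time series and graphical model methods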