Hypothesis testing for mean vectors is one of the most fundamental problems in high-dimensional data analysis, with wide applications in economics and finance, biomedicine, and genetics. Existing research is mostly limited to specific problems such as the two-sample problem and analysis of variance, and most methods apply only to high-dimensional data whose variables are weakly correlated. This project focuses on testing the general linear hypothesis for high-dimensional data, under which the mean vectors of several populations satisfy a set of linear constraints. This hypothesis is more general than the specific problems above, and the expected results can be applied to them directly. The proposed testing procedure uses a moment-matched chi-squared approximation to the null distribution of the test statistic, so it is suitable for both low-dimensional and high-dimensional data and performs well whether the variables are weakly or strongly correlated. In addition, the new method does not rely on overly strong assumptions and is of practical value because it works well for non-normal, heteroscedastic, and ultra-high-dimensional data. The project will also consider computational optimization for the test statistic and the related estimators, as well as the design of an algorithm for updating the test result after samples or variables are added or deleted. This not only improves the efficiency of the testing procedure but also offers new ideas for using high-dimensional tests with big data.
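Hypothesis testing for means is one of the most fundamental problems in high-dimensional data analysis, with wide applications in economics and finance, biomedicine, and genetics. Existing research has mostly been limited to specific problems such as the two-sample test of equal means and analysis of variance, and most methods apply only to high-dimensional data whose variables are weakly correlated. The general linear hypothesis testing problem studied in this project asks whether the mean vectors of several high-dimensional populations satisfy a set of linear constraints. It is more general than the specific problems above, and the methods developed can be applied to them directly. The new test proposed in this project uses a moment-matched chi-squared approximation to the null distribution of the test statistic. It applies to both low-dimensional and high-dimensional data and performs well whether the variables are weakly or strongly correlated. Moreover, the new method does not rely on overly strong assumptions, performs well on complex data that are non-normal, heteroscedastic, or ultra-high-dimensional, and is therefore highly practical. The project also studied theoretical error bounds for the chi-squared approximation based on second-order moment matching, and established chi-squared approximations based on third-order moment matching for several high-dimensional mean testing problems.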
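To make the idea of moment matching concrete, the following is a minimal sketch and not the project's actual procedure. It assumes the null distribution of the statistic behaves like a weighted sum T = sum_i w_i * A_i of independent chi-squared(1) variables with known positive weights w_i; in practice such weights are unknown and would have to be estimated from the data. The function names `cumulants`, `match_two_moments`, and `match_three_moments` are hypothetical. The two-moment version fits T ≈ beta * chi2(d); the three-moment version fits T ≈ beta0 + beta1 * chi2(d).

```python
from typing import Sequence, Tuple


def cumulants(weights: Sequence[float]) -> Tuple[float, float, float]:
    """First three cumulants of T = sum_i w_i * A_i, A_i iid chi-squared(1)."""
    k1 = sum(w for w in weights)             # mean
    k2 = 2.0 * sum(w ** 2 for w in weights)  # variance
    k3 = 8.0 * sum(w ** 3 for w in weights)  # third cumulant
    return k1, k2, k3


def match_two_moments(weights: Sequence[float]) -> Tuple[float, float]:
    """Fit T ~ beta * chi2(d) by matching the mean and the variance."""
    k1, k2, _ = cumulants(weights)
    beta = k2 / (2.0 * k1)    # scale
    d = 2.0 * k1 ** 2 / k2    # approximate degrees of freedom
    return beta, d


def match_three_moments(weights: Sequence[float]) -> Tuple[float, float, float]:
    """Fit T ~ beta0 + beta1 * chi2(d) by matching the first three cumulants."""
    k1, k2, k3 = cumulants(weights)
    beta1 = k3 / (4.0 * k2)              # scale
    d = 8.0 * k2 ** 3 / k3 ** 2          # approximate degrees of freedom
    beta0 = k1 - 2.0 * k2 ** 2 / k3      # shift
    return beta0, beta1, d
```

With equal weights, for example `[1.0, 1.0, 1.0, 1.0]`, T is exactly chi-squared with 4 degrees of freedom, and both fits recover it (scale 1, 4 degrees of freedom, shift 0). With unequal weights the fitted degrees of freedom are generally non-integer, and a critical value would be taken from the fitted chi-squared distribution.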
Data last updated: 2023-05-31
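Hypothesis testing for high-dimensional data
High-dimensional multi-sample mean tests under heteroscedastic covariance matrices
Research on testing high-dimensional covariance matrices under non-normality
Research on hypothesis testing for high-dimensional data based on sufficient dimension reduction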