Testing whether two independent random samples come from the same distribution is one of the most important hypothesis testing problems in statistics. Classical nonparametric tests, such as the Kolmogorov-Smirnov, Cramér-von Mises and Anderson-Darling tests, are built on empirical distribution functions. In the one-dimensional two-sample setting these tests have many desirable properties, including robustness and distribution-freeness, but they are far less popular in high dimensions, mainly because their theoretical properties are lacking there and because of the curse of dimensionality. This project aims to address these problems using projection pursuit, a well-known idea in statistics. The research has three main parts. First, we will use projections to construct nonparametric measures that characterize the difference between the distributions of two populations, and study the population-level properties of these measures. Second, we will use the measures to build test statistics and study their asymptotic behavior under the "large sample size, fixed dimension" paradigm. Third, we will study the asymptotic behavior of the proposed test statistics under the "fixed sample size, large dimension" paradigm and consider how to correct the new tests in high-dimensional settings to improve their power.
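To make the first research item concrete, one generic form that a projection-based measure of distributional difference can take is written out here. This is an illustrative assumption, not the measure defined by the project: the symbols $\Delta$, $F_u$, $G_u$, the weight $w$ and the direction measure $\mu$ are all introduced only for this sketch.

```latex
% X ~ F and Y ~ G are random vectors in R^d; u ranges over the unit sphere S^{d-1}.
% F_u and G_u denote the distribution functions of the projections u^T X and u^T Y.
\[
  \Delta(F, G)
  = \int_{S^{d-1}} \int_{\mathbb{R}}
      \bigl\{ F_u(t) - G_u(t) \bigr\}^{2} \, \mathrm{d}w(t) \, \mathrm{d}\mu(u),
  \qquad
  F_u(t) = \Pr\bigl(u^{\top} X \le t\bigr), \quad
  G_u(t) = \Pr\bigl(u^{\top} Y \le t\bigr).
\]
% If w has a positive density on R and mu is the uniform measure on the sphere,
% then Delta(F, G) = 0 forces F_u = G_u for almost every direction u, and the
% Cramer-Wold device (with continuity of characteristic functions) gives F = G.
```

Under these assumptions the measure is zero only when the two distributions coincide, which is the property a homogeneity test needs from its population-level target.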
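In many hypothesis testing problems, increasing the dimension of the variables often leaves the test statistic unable to control the type I error well and also reduces the power of the test. To counter this adverse effect of dimension, the project proposed test statistics based on random projections and integral transforms. Projecting a multivariate random variable onto a one-dimensional one sidesteps the dimensionality problem to some extent. The project studied the comparison of two distributions from unpaired data. With a suitable weight function, the test statistic based on random projections and integral transforms has a simple form and is easy to compute. Using the theory of U- and V-statistics, the project established large-sample properties of the proposed statistic, including consistency and weak convergence, and showed that its asymptotic rate of convergence does not depend on the dimension. The project further applied the statistic to the high-dimensional two-sample testing problem and derived its asymptotic distribution when the sample size is fixed and the dimension diverges. This work effectively addresses the dimensionality problem in two-sample testing, and the approach is fairly general: it offers a feasible route for other related problems such as independence testing, goodness-of-fit testing and testing for heteroscedasticity in models. With the support of this youth project, the principal investigator's results were published in journals in China and abroad, including Journal of the American Statistical Association, Statistica Sinica, Journal of Multivariate Analysis, Computational Statistics & Data Analysis, Journal of Statistical Planning and Inference, and Science China Mathematics.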
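The idea of reducing a multivariate two-sample problem to one-dimensional projections can be sketched as follows. This is a minimal illustration under stated assumptions, not the project's statistic: it averages a two-sample Cramér-von Mises-type distance over a finite set of random directions (a Monte Carlo stand-in for an integral over directions) and calibrates the test by permutation rather than by an asymptotic distribution. The function names `random_directions`, `projected_cvm` and `permutation_test` are hypothetical.

```python
import numpy as np


def random_directions(k, d, rng):
    """Draw k directions uniformly on the unit sphere in R^d."""
    g = rng.standard_normal((k, d))
    return g / np.linalg.norm(g, axis=1, keepdims=True)


def projected_cvm(x, y, directions):
    """Cramér-von Mises-type distance between projected samples,
    averaged over the given directions (illustrative statistic)."""
    n, m = len(x), len(y)
    px = x @ directions.T            # projected first sample, shape (n, k)
    py = y @ directions.T            # projected second sample, shape (m, k)
    pooled = np.vstack([px, py])     # evaluation points, shape (n + m, k)
    # Empirical CDF of each projected sample at every pooled point.
    fx = (px[None, :, :] <= pooled[:, None, :]).mean(axis=1)
    fy = (py[None, :, :] <= pooled[:, None, :]).mean(axis=1)
    # Mean over pooled points and directions, scaled by nm / (n + m).
    return (n * m / (n + m)) * np.mean((fx - fy) ** 2)


def permutation_test(x, y, n_dirs=50, n_perm=200, seed=0):
    """Permutation p-value for the projected statistic.
    Permutation calibration is an assumption of this sketch."""
    rng = np.random.default_rng(seed)
    dirs = random_directions(n_dirs, x.shape[1], rng)
    observed = projected_cvm(x, y, dirs)
    pooled = np.vstack([x, y])
    n = len(x)
    exceed = 0
    for _ in range(n_perm):
        idx = rng.permutation(len(pooled))
        stat = projected_cvm(pooled[idx[:n]], pooled[idx[n:]], dirs)
        exceed += int(stat >= observed)
    return observed, (1 + exceed) / (1 + n_perm)


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    x = rng.standard_normal((40, 5))
    y = rng.standard_normal((40, 5)) + 1.0   # mean-shifted second sample
    stat, p_value = permutation_test(x, y)
    print(f"statistic = {stat:.4f}, permutation p-value = {p_value:.4f}")
```

Each direction costs one pass over one-dimensional projections, so the work grows with the number of directions and the sample sizes; the dimension enters only through the matrix products that form the projections.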
Data last updated: 2023-05-31
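An approximate high-dimensional optimization method based on a multi-level design space reduction strategy
Rapid near-infrared spectroscopic detection of soluble sugar in fragrant pears based on LS-SVM
A study of Chinese medicine academic teams working on hypertension in China, based on bibliometrics and social network analysis
Simultaneous fault detection and control for two-dimensional FM systems
Paleoenvironmental significance of Skolithos trace fossils after the end-Permian mass extinction: a case study of the Heshanggou Formation in western Henan
Testing the equality of two-population covariance matrices in high dimensions and its detection boundary
Multi-sample mean testing for high-dimensional data
High-dimensional multi-sample mean tests under heteroscedastic covariance matrices
Multiple change-point tests and two-sample tests for Ornstein-Uhlenbeck type processes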