Model selection is both the bottleneck and the key to theoretical research on and practical applications of large-scale kernel methods. Lacking a systematic, in-depth model selection theory and constrained by the computational cost of large-scale data and models, most current large-scale kernel methods choose kernel functions and set model parameters by rule of thumb or according to the available computational resources. To address these issues, we propose a stochastic approach to model selection for large-scale kernel methods. The approach randomly subsamples large-scale datasets and applies a divide-and-conquer strategy to the sampled subsets, maps the original problems into randomized hypothesis spaces, uses statistical analysis and randomized optimization to select kernel functions and set model parameters, and yields kernel functions and model parameters consistent with those chosen by classical model selection methods on the whole dataset. We establish a stochastic operator theory and randomized hypothesis spaces, deepening the foundations of large-scale kernel methods. Building on this, we develop a stochastic model selection theory covering the generalization ability of randomized hypotheses and the consistency and stability of stochastic model selection. Finally, we design and implement stochastic model selection criteria and algorithms that are statistically unbiased and computationally efficient. The project will deliver systematic and in-depth theory, statistically unbiased methods, and computationally efficient tools for model selection of large-scale kernel methods, advancing the learning theory of large-scale kernel methods and providing learning methods applicable to both batch and streaming data.
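As a concrete illustration of the subsample-and-aggregate idea described above, the following is a minimal Python sketch, not the project's actual implementation: it selects an RBF kernel bandwidth by running cross-validation on small random subsets and aggregating the per-subset choices by majority vote. The function and parameter names (select_gamma_stochastic, n_subsets, subset_size) are assumptions introduced here purely for illustration.

import numpy as np
from sklearn.kernel_ridge import KernelRidge
from sklearn.model_selection import cross_val_score

def select_gamma_stochastic(X, y, gammas, n_subsets=10, subset_size=500, seed=0):
    # Divide-and-conquer sketch (illustrative): each random subset votes for its
    # best RBF bandwidth, and the votes are aggregated by majority.
    rng = np.random.default_rng(seed)
    votes = np.zeros(len(gammas))
    for _ in range(n_subsets):
        idx = rng.choice(len(X), size=min(subset_size, len(X)), replace=False)
        Xs, ys = X[idx], y[idx]
        scores = [cross_val_score(KernelRidge(kernel="rbf", alpha=1e-3, gamma=g),
                                  Xs, ys, cv=3).mean() for g in gammas]
        votes[int(np.argmax(scores))] += 1
    return gammas[int(np.argmax(votes))]

# Toy usage on synthetic data
X = np.random.randn(5000, 10)
y = np.sin(X[:, 0]) + 0.1 * np.random.randn(5000)
print("selected gamma:", select_gamma_stochastic(X, y, gammas=[0.01, 0.1, 1.0, 10.0]))

In this sketch, each subset only pays the cost of kernel ridge regression on subset_size points, so the total cost stays far below that of tuning on the full dataset; the project's theory is concerned with when such subset-level selections agree with classical model selection on all the data.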
There is as yet no solid theoretical route to model selection for large-scale kernel methods, and data-driven or experimental approaches have computational complexity at least quadratic in the data size. To address this problem, we proposed stochastic approaches to model selection for large-scale kernel methods. We studied a random-mapping method for offline model selection and an incremental-sketching method for online model selection; both effectively resolve the above problems, lay a reliable theoretical foundation for large-scale kernel methods, machine learning, and model selection, and provide algorithms of log-linear complexity for learning, model selection, and evaluation with large-scale kernel methods, thereby advancing the theory, methods, and algorithms of large-scale kernel methods.
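The two directions named in this summary, a random-mapping method for offline model selection and an incremental-sketching method for online model selection, can be illustrated by the minimal Python sketch below. It assumes random Fourier features play the role of the random mapping and that streaming updates of the sufficient statistics Z^T Z and Z^T y play the role of the incremental sketch; the names (rff, StreamingSketch, D) are illustrative and not the project's actual algorithms.

import numpy as np

def rff(X, W, b):
    # Random Fourier features approximating an RBF kernel exp(-gamma * ||x - x'||^2).
    return np.sqrt(2.0 / W.shape[1]) * np.cos(X @ W + b)

class StreamingSketch:
    # Maintains Z^T Z and Z^T y incrementally, so regularized least-squares
    # model selection criteria can be evaluated on streaming data.
    def __init__(self, d, D, gamma, seed=0):
        rng = np.random.default_rng(seed)
        self.W = rng.normal(scale=np.sqrt(2.0 * gamma), size=(d, D))
        self.b = rng.uniform(0.0, 2.0 * np.pi, size=D)
        self.ZtZ = np.zeros((D, D))
        self.Zty = np.zeros(D)

    def update(self, X_batch, y_batch):
        Z = rff(X_batch, self.W, self.b)
        self.ZtZ += Z.T @ Z
        self.Zty += Z.T @ y_batch

    def solve(self, lam):
        D = self.ZtZ.shape[0]
        return np.linalg.solve(self.ZtZ + lam * np.eye(D), self.Zty)

# Toy streaming usage: maintain one sketch per candidate bandwidth.
sketches = {g: StreamingSketch(d=5, D=200, gamma=g, seed=1) for g in (0.1, 1.0)}
for _ in range(50):
    Xb = np.random.randn(100, 5)
    yb = np.sin(Xb[:, 0]) + 0.1 * np.random.randn(100)
    for s in sketches.values():
        s.update(Xb, yb)

X_val = np.random.randn(500, 5)
y_val = np.sin(X_val[:, 0])
for g, s in sketches.items():
    w = s.solve(lam=1e-3)
    mse = np.mean((rff(X_val, s.W, s.b) @ w - y_val) ** 2)
    print(f"gamma={g}: validation MSE={mse:.4f}")

Each streaming update in this sketch costs time proportional to the batch size times D^2, independent of the total number of examples seen, which is the kind of cost profile the summary attributes to the proposed online model selection method.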