Statistics
An elementary analysis of ridge regression with random design
Comptes Rendus. Mathématique, Volume 360 (2022) no. G9, pp. 1055-1063.

See the article record from the Numdam source

In this note, we provide an elementary analysis of the prediction error of ridge regression with random design. The proof is short and self-contained. In particular, it bypasses the use of Rudelson’s deviation inequality for covariance matrices, through a combination of exchangeability arguments, matrix perturbation and operator convexity.
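As a complement to the abstract, the following is a minimal numerical sketch of the setting studied in the paper: ridge regression with a random (here, anisotropic Gaussian) design, where the excess prediction risk is computed against the known population covariance. The dimensions, regularization level lam, and noise scale below are illustrative choices, not values from the paper, and the simulation illustrates the object of study rather than the proof technique.

import numpy as np

# Minimal sketch: ridge regression with random design (illustrative parameters).
rng = np.random.default_rng(0)
d, n, lam, noise = 20, 100, 1.0, 0.5

theta_star = rng.normal(size=d) / np.sqrt(d)   # true regression vector

# Random design: rows X_i drawn i.i.d. from an anisotropic Gaussian
# with diagonal covariance Sigma = diag(1 / (1 + j)).
cov_sqrt = np.diag(1.0 / np.sqrt(1.0 + np.arange(d)))
X = rng.normal(size=(n, d)) @ cov_sqrt
y = X @ theta_star + noise * rng.normal(size=n)

# Ridge estimator: (Sigma_hat + lam I)^{-1} (X^T y / n),
# where Sigma_hat = X^T X / n is the empirical covariance.
Sigma_hat = X.T @ X / n
theta_hat = np.linalg.solve(Sigma_hat + lam * np.eye(d), X.T @ y / n)

# Excess prediction risk E[(x^T theta_hat - x^T theta_star)^2]
# = (theta_hat - theta_star)^T Sigma (theta_hat - theta_star),
# computed exactly here because the population covariance Sigma is known.
Sigma = cov_sqrt @ cov_sqrt
delta = theta_hat - theta_star
print("excess prediction risk:", float(delta @ Sigma @ delta))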

DOI: 10.5802/crmath.367
Classification: 62J07, 62G08, 60B20

Mourtada, Jaouad 1; Rosasco, Lorenzo 2, 3

1 CREST, ENSAE, Palaiseau, France
2 Università di Genova & Istituto Italiano di Tecnologia, Genova, Italy
3 Center for Brains, Minds and Machines, MIT, Cambridge, United States
License: CC-BY 4.0
Copyright: The authors retain their rights
@article{CRMATH_2022__360_G9_1055_0,
     author = {Mourtada, Jaouad and Rosasco, Lorenzo},
     title = {An elementary analysis of ridge regression with random design},
     journal = {Comptes Rendus. Math\'ematique},
     pages = {1055--1063},
     publisher = {Acad\'emie des sciences, Paris},
     volume = {360},
     number = {G9},
     year = {2022},
     doi = {10.5802/crmath.367},
     language = {en},
     url = {https://geodesic-test.mathdoc.fr/articles/10.5802/crmath.367/}
}
Mourtada, Jaouad; Rosasco, Lorenzo. An elementary analysis of ridge regression with random design. Comptes Rendus. Mathématique, Volume 360 (2022) no. G9, pp. 1055-1063. doi: 10.5802/crmath.367. https://geodesic-test.mathdoc.fr/articles/10.5802/crmath.367/

[1] Ahlswede, Rudolf; Winter, Andreas Strong converse for identification via quantum channels, IEEE Trans. Inf. Theory, Volume 48 (2002) no. 3, pp. 569-579 | DOI | Zbl | MR

[2] Aronszajn, Nachman Theory of reproducing kernels, Trans. Am. Math. Soc., Volume 68 (1950) no. 3, pp. 337-404 | Zbl | MR | DOI

[3] Bach, Francis; Moulines, Éric Non-strongly-convex smooth stochastic approximation with convergence rate O(1/n), Advances in Neural Information Processing Systems 26 (2013), pp. 773-781

[4] Bartlett, Peter L.; Bousquet, Olivier; Mendelson, Shahar Local Rademacher complexities, Ann. Stat., Volume 33 (2005) no. 4, pp. 1497-1537 | Zbl | MR

[5] Bartlett, Peter L.; Long, Philip M.; Lugosi, Gábor; Tsigler, Alexander Benign overfitting in linear regression, Proc. Natl. Acad. Sci. USA, Volume 117 (2020) no. 48, pp. 30063-30070 | Zbl | MR | DOI

[6] Blanchard, Gilles; Mücke, Nicole Optimal Rates for Regularization of Statistical Inverse Learning Problems, Found. Comput. Math., Volume 18 (2018) no. 4, pp. 971-1013 | Zbl | MR | DOI

[7] Caponnetto, Andrea; De Vito, Ernesto Optimal rates for the regularized least-squares algorithm, Found. Comput. Math., Volume 7 (2007) no. 3, pp. 331-368 | Zbl | MR | DOI

[8] Carlen, Eric Trace inequalities and quantum entropy: an introductory course, Entropy and the quantum (Contemporary Mathematics), Volume 529, American Mathematical Society, 2010, pp. 73-140 | MR | DOI | Zbl

[9] Catoni, Olivier PAC-Bayesian bounds for the Gram matrix and least squares regression with a random design (2016) (https://arxiv.org/abs/1603.05229)

[10] De Vito, Ernesto; Caponnetto, Andrea; Rosasco, Lorenzo Model selection for regularized least-squares algorithm in learning theory, Found. Comput. Math., Volume 5 (2005) no. 1, pp. 59-85 | MR | DOI

[11] Dieuleveut, Aymeric; Bach, Francis Nonparametric stochastic approximation with large step-sizes, Ann. Stat., Volume 44 (2016) no. 4, pp. 1363-1399 | Zbl | MR

[12] van de Geer, Sara Empirical Processes in M-estimation, Cambridge University Press, 1999

[13] Gonen, Alon; Shalev-Shwartz, Shai Average stability is invariant to data preconditioning. Implications to exp-concave empirical risk minimization, J. Mach. Learn. Res., Volume 18 (2018) no. 222, pp. 1-13 | Zbl | MR

[14] Hoerl, Arthur E. Application of Ridge Analysis to Regression Problems, Chem. Eng. Prog., Volume 58 (1962), pp. 54-59

[15] Hsu, Daniel; Kakade, Sham M.; Zhang, Tong Random design analysis of Ridge regression, Found. Comput. Math., Volume 14 (2014) no. 3, pp. 569-600 | Zbl | MR

[16] Jain, Prateek; Kakade, Sham M.; Kidambi, Rahul; Netrapalli, Praneeth; Sidford, Aaron Parallelizing stochastic gradient descent for least squares regression: Mini-batching, averaging, and model misspecification, J. Mach. Learn. Res., Volume 18 (2018) no. 223, pp. 1-42 | Zbl | MR

[17] Koltchinskii, Vladimir Local Rademacher complexities and oracle inequalities in risk minimization, Ann. Stat., Volume 34 (2006) no. 6, pp. 2593-2656 | Zbl | MR

[18] Koltchinskii, Vladimir Oracle Inequalities in Empirical Risk Minimization and Sparse Recovery Problems. École d’Été de Probabilités de Saint-Flour, Lecture Notes in Mathematics, 2033, Springer, 2011

[19] Koltchinskii, Vladimir; Lounici, Karim Concentration inequalities and moment bounds for sample covariance operators, Bernoulli, Volume 23 (2017) no. 1, pp. 110-133 | Zbl | MR

[20] Koren, Tomer; Levy, Kfir Fast rates for exp-concave empirical risk minimization, Advances in Neural Information Processing Systems 28 (2015), pp. 1477-1485

[21] Lecué, Guillaume; Mendelson, Shahar Performance of empirical risk minimization in linear aggregation, Bernoulli, Volume 22 (2016) no. 3, pp. 1520-1534 | Zbl | MR

[22] Lust-Piquard, Françoise; Pisier, Gilles Non commutative Khintchine and Paley inequalities, Ark. Mat., Volume 29 (1991) no. 1, pp. 241-260 | Zbl | DOI

[23] Massart, Pascal Some applications of concentration inequalities to statistics, Ann. Fac. Sci. Toulouse, Math., Volume 9 (2000) no. 2, pp. 245-303 | Zbl | MR | DOI

[24] Mendelson, Shahar On the performance of kernel classes, J. Mach. Learn. Res., Volume 4 (2003), pp. 759-771 | MR

[25] Mourtada, Jaouad Exact minimax risk for linear least squares, and the lower tail of sample covariance matrices, Ann. Stat., Volume 50 (2022) no. 4, pp. 2157-2178 | MR

[26] Mourtada, Jaouad; Gaïffas, Stéphane An improper estimator with optimal excess risk in misspecified density estimation and logistic regression, J. Mach. Learn. Res., Volume 23 (2022) no. 31, pp. 1-49 | MR

[27] Oliveira, Roberto I. Sums of random Hermitian matrices and an inequality by Rudelson, Electron. Commun. Probab., Volume 15 (2010), pp. 203-212 | Zbl | MR

[28] Oliveira, Roberto I. The lower tail of random quadratic forms with applications to ordinary least squares, Probab. Theory Relat. Fields, Volume 166 (2016) no. 3, pp. 1175-1194 | Zbl | MR | DOI

[29] Rosasco, Lorenzo; Villa, Silvia Learning with incremental iterative regularization, Advances in Neural Information Processing Systems 28 (2015), pp. 1630-1638

[30] Rudelson, Mark Random vectors in the isotropic position, J. Funct. Anal., Volume 164 (1999) no. 1, pp. 60-72 | Zbl | MR | DOI

[31] Smale, Steve; Zhou, Ding-Xuan Shannon sampling II: Connections to learning theory, Appl. Comput. Harmon. Anal., Volume 19 (2005) no. 3, pp. 285-302 | Zbl | MR | DOI

[32] Smale, Steve; Zhou, Ding-Xuan Learning theory estimates via integral operators and their approximations, Constr. Approx., Volume 26 (2007) no. 2, pp. 153-172 | Zbl | MR | DOI

[33] Steinwart, Ingo; Christmann, Andreas Support Vector Machines, Springer, 2008

[34] Steinwart, Ingo; Hush, Don; Scovel, Clint Optimal rates for regularized least squares regression, Proc. 22nd Conference on Learning Theory (2009), pp. 79-93

[35] Tikhonov, Andrey N. Solution of incorrectly formulated problems and the regularization method, Sov. Math., Dokl., Volume 4 (1963), pp. 1035-1038 | Zbl

[36] Tropp, Joel A. User-friendly tail bounds for sums of random matrices, Found. Comput. Math., Volume 12 (2012) no. 4, pp. 389-434 | Zbl | MR | DOI

[37] Tsigler, Alexander; Bartlett, Peter L. Benign overfitting in ridge regression (2020) (https://arxiv.org/abs/2009.14286)

[38] Vaškevičius, Tomas; Zhivotovskiy, Nikita Suboptimality of constrained least squares and improvements via non-linear predictors (2021) (https://arxiv.org/abs/2009.09304)

[39] Vershynin, Roman Introduction to the non-asymptotic analysis of random matrices, Compressed Sensing: Theory and Applications, Cambridge University Press, 2012, pp. 210-268

[40] De Vito, Ernesto; Rosasco, Lorenzo; Caponnetto, Andrea; De Giovannini, Umberto; Odone, Francesca Learning from examples as an inverse problem, J. Mach. Learn. Res., Volume 6 (2005) no. 5, pp. 883-904 | Zbl | MR

[41] Wahba, Grace Spline Models for Observational Data, CBMS-NSF Regional Conference Series in Applied Mathematics, 59, Society for Industrial and Applied Mathematics, 1990 | DOI

[42] Yao, Yuan; Rosasco, Lorenzo; Caponnetto, Andrea On early stopping in gradient descent learning, Constr. Approx., Volume 26 (2007) no. 2, pp. 289-315 | Zbl | MR

[43] Zhivotovskiy, Nikita Dimension-free bounds for sums of independent matrices and simple tensors via the variational principle (2021) (https://arxiv.org/abs/2108.08198)
