Modeling Moscow Institute of Physics and Technology

# Conference

Workshop “Frontiers of High Dimensional Statistics, Optimization, and Econometrics”

# Описание мероприятия

Workshop is organized by LSA HSE and PreMoLab MIPT.

Moscow February 26-27

**HSE, Shabolovskaya 26, 3211**

**Please register by email: erichkova@hse.ru**

Organizers: V. Konakov, E. Mammen, V. Spokoiny

Local organizers: M. Chuyashkin, Yu. Dorn, V.Panov, E. Richkova

Program: Thursday Feb. 26

**09:50 – 10:00 Opening**

**10:00 – 11:00 Michael Neumann**

**Bootstrap for degenerate U - and V -statistics**

Many important test statistics can be rewritten as or approximated by *U*- or *V*-statistics. Exmaples are the Cramer-von Mises statistic or numerous other statistics of $L_2$-type. The case of degenerate $V$-statistics is of particular interest since this corresponds to the above test statistics under usual null hypotheses.

In a first part, I briefly review the asymptotic theory of *U*- and *V*-statistics of independent and weakly dependent random variables. Then I introduce two methods of bootstrapping degenerate $V$-statistics and sketch proofs of consistency.

**11:00 – 11:30 Break**

**11:30 – 12:00 Mayya Zhilova **

*Simultaneous likelihood-based bootstrap confidence sets for misspecified models*

We consider a multiplier bootstrap procedure for construction of likelihood-based confidence sets for finite samples and a possible model misspecification. The approach is also generalized for the case of simultaneous confidence sets, when the number of parametric models is allowed to be exponentially large w.r.t. the sample size. Theoretical results justify the bootstrap consistency for a fixed sample size. Good numerical performance of the bootstrap procedure is demonstrated with experiments for misspecified regression models.

**12:00 – 12:30 Andzhey Kozuk **

*Linear hypothesis testing in the problem with weak instrumental variables.*

In this work a new method for testing linear hypothesis on the target parameter is developed. To compute critical values for a log-likelihood ratio statistics we use miltiplier bootstrap procedure. We justify this method and then use it for a regression model with instrumental varibles included. Numerical experiments show good power properties of proposed method under both weak and strong instruments identification.

**12:30 – 13:00 Denis Belomesty **

*SPECTRAL ESTIMATION FOR SPARSE MULTI-DIMENSIONAL LÉVY PROCESSES*

In this talk we present some recent results on the estimation of multidimensional Levy processes observed at random times.

In particular, the problem of statistical inference for sparse diffusion matrices under the presence of infinite active small jumps well be considered.

We show how to estimate such matrices in a robust way and derive convergence rates which turn out to be optimal in minimax sense.

**13:00 – 14:30 Lunch**

**14:30 – 15:00 Karl Gregori **

**A Smooth Block Bootstrap for Statistical Functionals and Time Series**

Smooth bootstrap methods have received considerable attention in the case of independent identically distributed data, while smooth bootstrap methods for time series have received comparatively little. In a smooth bootstrap, the resampled values are perturbed by independent realizations from some kernel density. This has the effect of smoothing the empirical distribution from which the bootstrap samples are drawn, which improves the estimation of sampling distributions of statistics with distributions that depend on the population density. A smooth block bootstrap procedure is proposed and its consistency is established for a variety of statistical functionals on the marginal distribution of a stationary time series. (This is a joint work with Soumendra Lahiri and Dan Nordman)

**15:00 – 15:30 Roland Hildebrand **

**Rank 1 generated spectrahedral cones**

Many nonconvex optimization problems can be written as conic programs over a linear section of the semi-definite matrix cone with an additional rank 1 constraint. By dropping this rank 1 constraint, one obtains a semi-definite relaxation of the original problem. If the linear section is such that every extreme ray is a rank 1 matrix, then the relaxation will be exact. We shall call such spectrahedral cones "rank 1 generated". We describe the basic properties of rank 1 generated spectrahedral cones and provide some ways to construct such cones.

**15:30 – 16:00 Stefan Richter **

**Cross validation for linear locally stationary processes**

Locally stationary processes behave in an approximately stationary way over short periods of time.

For specific models such as the time varying autoregressive process

it is possible to describe the evolution of the entire process with a finite set of parameter curves.

We assume that the locally stationary time series model is known, but the parameter curves are not.

For estimation of the curves we use nonparametric kernel-type maximum likelihood estimates which depend on a smoothing parameter (bandwidth).

To the best of our knowledge the theoretical behaviour of data adaptive bandwidth choice methods for such estimates

has not been considered in the literature. We propose an adaptive bandwidth choice via cross validation,

and show that it is asymptotically optimal in a specific sense with respect to a Kullback-Leibler-type distance measure.

**16:00 – 16:30 Break**

**16:30 – 17:00 Maxim Panov **

**High-dimensional Bernstein - von Mises phenomenon**

The classical results of an asymptotic normality of the posterior distribution (Bernstein - von Mises theorem) will be reconsidered in a non-classical setup allowing finite samples, high parameter dimension and model misspecification. In the case of a finite dimensional nuisance parameter we obtain an upper bound on the error of Gaussian approximation of the posterior distribution for the target parameter which is explicit in the dimension of the nuisance and target parameters. This helps to identify the so called critical dimension of the full parameter for which the BvM result is applicable. The results are extended to the case of infinite dimensional parameter families from a Sobolev class.

**17:00 – 17:30 Yuri Maximov **

**Fast rates for multi-class classification.**

In the talk we propose a new generalization error bounds for multi-class classification. A special attention is devoted to semi-supervised learning setup. The talk is based on joint work with M. R. Amini and Z. Harchaoui.

**17:30 – 18:00 Alexandra Suvorikova/ Nasar Buzun **

**Multi-scale change point detection**

The general change-point problem arises in many fields of research, e.g bioinformatics, econometrics, computer science and may others. It is the cornerstone of detection of homogeneous regions in observed data. This work presents a new approach of change-point detection, based on the idea of data analysis in a multi-scale rolling window. Decision about existence of a change point is made using critical values computed from the data: threshold tuning is carried out using \textit{multiplier bootstrap} procedure for multiple testing.

This allows algorithm to be applied even if the nature of random process under consideration is not known.

**19:00 Dinner**

Program: Friday Feb. 27

**10:00 – 10:45 Yuriy Nesterov **

**Complexity bounds for primal-dual methods minimizing the model of objective function**

We provide Frank-Wolfe (Conditional Gradients) method with a convergence analysis allowing to approach a primal-dual solution of convex optimization problem with composite objective function. Additional properties of complementary part of the objective (strong convexity) significantly accelerate the scheme. We also justify a new variant of this method, which can be seen as a trust-region scheme applying the linear model of objective function. Our analysis works also for a quadratic model, allowing to justify the global rate of convergence for a new second-order method. To the best of our knowledge, this is the first trust-region scheme supported by the worst-case complexity bound.

**10:45 – 11:30 Boris Polyak **

**Quadratic transformations: convexity vs nonconvexity**

Quadratic problems are common in optimization, uncertainty analysis, physical applications. In general they are nonconvex, nevertheless sometimes they have hidden convexity structure. There are several results where this structure can be discovered; examples are Brickman theorem on convexity of 2D image of a sphere or the theorem on convexity of nonlinear image of a small ball. We address slightly different problem formulation: given a quadratic transformation, recognize convexity or nonconvexity of the image of the unit ball under this transformation. Some convexity/nonconvexity specifications are provided; algorithms for sampling boundary points of the image are developed.

**11:30 – 12:00 Break**

**12:00 – 12:30 Ekaterina Krymova **

**Stochastic online gradient-free methods with inexact oracle (convex case vs strictly convex case; one point oracle vs two point oracle)**

In the talk we plan to lead a survey on stochastic gradient-free methods in online context and formulate some new results. The main contribution is a consideration of proper randomization depending on the prox-structure of the problem. This is a joint work with Alexander Gasnikov , Ilnura Usmanova and Fedorenko Fedor.

Literature

1. John C. Duchi, Michael I. Jordan, Martin J. Wainwright, Andre Wibisono Optimal rates for zero-order convex optimization: the power of two function evaluations // arXiv:1312.2139 item 2.2, 3

2. Abraham D. Flaxman, Adam Tauman Kalai, H. Brendan McMahan Online convex optimization in the bandit setting: gradient descent without a gradient // arXiv:cs/0408007 item 3.1

3. Alexander Gasnikov, Pavel Dvurechensky, Yurii Nesterov Stochastic gradient methods with inexact oracle // arXiv:1411.4218 item 4

4. Alexander Gasnikov, Anastasia Lagunovskaya, Ilnura Usmanova, Fedor Fedorenko Gradient-free prox-methods with inexact oracle for stochastic convex optimization problems on a simplex // arXiv:1412.3890

5. Alekh Agarwal, Ofer Dekel, and Lin Xiao. Optimal algorithms for online convex optimization with multi-point bandit feedback. In Proceedings of the Twenty-Third Annual Conference on Learning Theory, pages 28-40, 2010. http://research.microsoft.com/en-us/um/people/oferd/papers/AgarwalDeXi10.pdf

6. Elad Hazan Introduction to Online Convex Optimization Graduate text in machine learning and optimization // http://ocobook.cs.princeton.edu/OCObook.pdf

**12:30 – 13:00 Pavel Dvurechenski **

**Random gradient-free methods for random walk based web page ranking functions learning.**

In this talk we consider a problem of web page relevance to a search query. We are working in the framework called Semi-Supervised PageRank which can account for some properties which are not considered by classical approaches such as PageRank and BrowseRank algorithms. We introduce a graphical parametric model for web pages ranking. The goal is to identify the unknown parameters using the information about page relevance to a number of queries given by some experts (assessors). The resulting problem is formulated as an optimization one. Due to hidden huge dimension of the last problem we develop random gradient-free methods with oracle error to solve it. We prove the convergence theorem and give the number of arithmetic operations which is needed to solve it with a given accuracy. This is a joint work with A. Gasnikov and M. Zhukovskii.

**13:00 – 14:30 Lunch**

**14:30 – 15:00 Evgeny Burnaev**

*Efficiency of regression conformalization*

Conformal prediction is a method of producing prediction sets that can be applied on top of a wide

range of prediction algorithms. The method has a guaranteed coverage probability under the stan-

dard IID assumption regardless of whether the assumptions (often considerably more restrictive)

of the underlying algorithm are satisfied. However, for the method to be really useful it is desir-

able that in the case where the assumptions of the underlying algorithm are satisfied, the conformal

predictor loses little in efficiency as compared with the underlying algorithm (whereas being a con-

formal predictor, it has the stronger guarantee of validity). In this work we explore the degree to

which this additional requirement of efficiency is satisfied in the case of Bayesian ridge regression;

we find that asymptotically conformal prediction sets differ little from ridge regression prediction

intervals when the standard Bayesian assumptions are satisfied.

**15:00 – 15:30 Anna Kozhina **

**Stability of Ito diffusions transition densities with respect to coefficient perturbations**

We consider two Ito diffusions with «close» coefficients. Using the parametrix series expansions

we prove a non-uniform Gaussian type upper bound for the difference between their transition densities .

We also give an example showing that our conditions can not be weaken.

**15:30 – 16:00 Vladimir Panov **

**Statistical inference for generalized Ornstein - Uhlenbeck processes**

In this talk, we consider a special case of the generalized Ornstein-Uhlenbeck process with invariant distribution given by the exponential functional of a Levy process. We propose a new approach for estimation of the characteristics of the Levy process, which is based on some new results concerning the Mellin transform. Taking into account mixing properties of this process, we show that the rates of convergence are optimal. This is joint work with Denis Belomestny.

**16:00 – 16:30 Break**

**16:30 – 17:00 Yuri Dorn**

**First order methods for transportation problems. Complexity analysis.**

We consider few algorithms for equilibrium transportation problems and give full complexity analysis. We also propose new algorithm for this problem.

**17:00 – 17:30 Kirill Efimov **

**Smallest accepted method for subset selection**

Within the penalized model selection approach the selected model is defined by minimization of penalized empirical risk. Such procedures enjoy nice algorithmic properties especially if both the empirical risk and the penalty function are convex functions of the parameter. A number of “oracle” risk bounds for such methods are available. However, the choice of penalty is critical and there is no unified approach for fixing this penalty. This talk presents another method of model selection based on a smallest accepted rule: a model is accepted if it is not rejected against any larger model . The final choice is the simplest accepted model. Model comparison is being done by multiple testing with critical values tuned by a bootstrap procedure. We present some oracle results on subset and parameter estimation for the proposed method.

**17:30 – 18:00 Egor Klochkov**

**Semi-parametric estimation in errors-in-variables regression**

To estimate the target parameter we use semi-parametric approach with unknown design considered as nuisance parameter. We consider approach that allows inhomogeneous errors with unknown distribution. The method works under some design smoothness assumptions.

Дата | Дополнительная информация |
---|---|

26.02.2015 |