Seminar overview


Spring Semester 2020

Date & Time Speaker Title Location
Wed 22.01.2020
13:15-14:15
Uri Shalit
Technion, Haifa
Abstract
We will present three recent projects where ideas from causal inference have inspired us to find new approaches to problems in machine learning. First, we will see how the idea of negative controls led us to find a way to perform off-policy evaluation in a partially observable Markov decision process (POMDP). We will then present how using the idea of independence of cause and mechanism (ICM) can be used to help learn predictive models that are stable against a-priori unknown distributional shifts. Finally, we will see how thinking in terms of causal graphs led us to a new method for learning computer vision models that can better generalize to unseen object-attribute compositions in images.
ETH-FDS seminar
Causality-inspired machine learning
HG E 1.1
Wed 26.02.2020
16:15-17:15
Julie Josse
CMAP, Ecole Polytechnique, Paris
Abstract
In many application settings, the data have missing features, which makes data analysis challenging. An abundant literature addresses missing data in an inferential framework: estimating parameters and their variance from incomplete tables. Here, we consider supervised-learning settings: predicting a target when missing values appear in both training and testing data. I will present consistency results for different prediction approaches, for general and linear models, including the use of very simple imputation methods. We further analyze decision trees, which can naturally tackle empirical risk minimization with missing values thanks to their ability to handle the half-discrete nature of incomplete variables. Reference papers: https://arxiv.org/abs/1902.06931 and http://juliejosse.com/wp-content/uploads/2020/01/aistats-2.pdf
ZüKoSt Zürcher Kolloquium über Statistik
Supervised learning with missing values
HG G 19.1
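Below is a minimal sketch, not from the talk, of the simple-imputation baseline the abstract above refers to: impute missing entries with the training mean, then fit an ordinary predictor. The data and parameter choices are illustrative.

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = X @ np.array([1.0, -2.0, 0.5, 0.0, 3.0]) + rng.normal(size=200)
X[rng.random(X.shape) < 0.2] = np.nan          # 20% of entries missing at random

# Mean imputation followed by a linear model: one of the "very simple"
# strategies whose predictive consistency is discussed in the talk.
model = make_pipeline(SimpleImputer(strategy="mean"), LinearRegression())
model.fit(X[:150], y[:150])                     # missing values in train and test
print(model.score(X[150:], y[150:]))
```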
Fri 28.02.2020
16:15-17:00
Kaspar Rufibach
Methods, Collaboration, and Outreach Group
Abstract
In this talk, I will illustrate how critical decisions in drug development are typically based on only a tiny fraction of the collected data. As an example, in early development oncology clinical trials, the decision whether to move a molecule to Phase 3 is typically based on response proportions and duration of response in those that respond, while in Phase 3 the primary endpoint will be a long-term endpoint such as progression-free survival (PFS) or overall survival (OS). Effects on response-based short-term endpoints seldom translate into effects on these relevant endpoints. We propose to base decisions not on intermediate endpoints, but on a prediction of the OS hazard ratio (HR), comparing data on the new molecule collected in the early-phase trial with historical data on the control treatment. This HR prediction uses a multistate model based on the various disease states a patient may pass through until death. This yields a gating strategy with improved operating characteristics compared to traditional decision rules in the context of early phase clinical trials. If time permits, I will further discuss how the joint distribution of PFS and OS can be derived as a function of transition probabilities in a multistate model. No assumptions on copulae or latent event times are needed and the model is allowed to be non-Markov. From the joint distribution, statistics of interest can then readily be computed. As an example, we provide closed formulas and statistical inference for Pearson's correlation coefficient between PFS and OS. Our proposal complements existing approaches by providing methods of statistical inference while at the same time working within a much more parsimonious modelling framework. The main conclusion of this talk is that multistate models are a useful and underutilized tool in the analysis of clinical trial data. This is joint work with Ulrich Beyer, Jan Beyersmann, Matthias Meller, David Dejardin, and Uli Burger.
ZüKoSt Zürcher Kolloquium über Statistik
Use of multistate models to improve decision-making in clinical trials
HG G 19.1
Mon 02.03.2020
12:15-13:15
Mário A. T. Figueiredo
Instituto Superior Técnico, Universidade de Lisboa, Portugal
Abstract
Linear (and generalized linear) regression (LR) is an old but still essential tool in statistical data science: its goal is to learn to predict a (response) variable as a linear combination of other (explanatory) variables. A central problem in LR is the selection of relevant variables, an important task for several reasons: using fewer variables tends to yield better generalization, and identifying the relevant variables may be meaningful in itself (e.g., identifying which gene expressions are relevant to predict a certain disease). In the past quarter-century, variable selection based on sparsity-inducing regularizers has been a central paradigm, the most famous of these regularizers being the LASSO, which has been intensively studied, extended, and applied. In high-dimensional problems, it is natural to have highly correlated variables. For example, it is common that several genes are strongly correlated (co-regulated), and thus simultaneously relevant as predictors of some response. In this case, sparsity-inducing regularization may fail: it may select an arbitrary subset of the correlated variables, and it is unstable (the selected variables may change drastically with only small changes in the data). However, in many applications it is desirable to identify all the relevant variables, rather than an arbitrary subset thereof, a desideratum for which several approaches have been proposed. This talk will be devoted to a recent class of regularizers designed for this goal, called ordered weighted l1 (OWL). The key feature of OWL is that it explicitly identifies (i.e., clusters) sufficiently correlated features, without the need to actually compute these correlations. Several theoretical results characterizing OWL will be presented, including connections to the mathematics of economic inequality and to “robustified” regression. Computational and optimization aspects will also be addressed, as well as recent applications in subspace clustering, learning Gaussian graphical models, and deep neural networks.
ETH-FDS seminar
Selection and Clustering of Correlated Variables
HG D 7.1
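For reference, the ordered weighted l1 (OWL) regularizer mentioned in the abstract above can be written, in standard notation (not quoted from the talk), as

\[
\Omega_w(\beta) \;=\; \sum_{i=1}^{p} w_i\, |\beta|_{(i)}, \qquad w_1 \ge w_2 \ge \dots \ge w_p \ge 0,
\]

where \( |\beta|_{(1)} \ge \dots \ge |\beta|_{(p)} \) are the absolute values of the coefficients sorted in decreasing order. The LASSO (all weights equal) and the l-infinity norm (only w_1 positive) are special cases.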
Mon 09.03.2020
14:15-15:00
Cancelled! Shahar Mendelson
Mathematical Sciences Institute, Australian National University
Abstract
Let X be an isotropic random vector in R^n and let X_1,...,X_N be independent copies of X for N > cn. A well-known question in Asymptotic Geometric Analysis, studied extensively over the last 30 years, is whether (and under what conditions) the symmetric convex hull of X_1,...,X_N, absconv(X_1,...,X_N), contains a large canonical convex body. The first breakthrough came in the late 1980s, when Gluskin showed that if X is the standard Gaussian vector, then with high probability absconv(X_1,...,X_N) contains c\sqrt{log(N/n)} B_2^n. Results of a similar flavour (and what "similar flavour" means here will be explained in the talk) are known, for example, when X has iid subgaussian coordinates and when X is log-concave. All these results rely on X exhibiting enough concentration, and the arguments break down when X is no longer (very) light-tailed. We present a general approach to the problem that is based on the small-ball method and show that, under minimal conditions on X, absconv(X_1,...,X_N) contains the dual of a natural floating body associated with X. This leads to a unified proof of all the previous results and allows one to address the problem when X is heavy-tailed. At the heart of the proof is an idea that is used frequently in the analysis of many statistical recovery procedures: obtaining a high-probability lower bound on the infimum of a nonnegative random process - in this case, on \inf_{t \in T} \|\Gamma t\|_\infty, where T is an appropriate subset of R^n and \Gamma is the random matrix whose rows are X_1,...,X_N. This is joint work with O. Guedon, F. Krahmer, Christian Kummerle and Holger Rauhut.
Research Seminar in Statistics
On the geometry of random polytopes and the small-ball method (CANCELLED)
HG G 19.1
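Restated in display form, the Gaussian benchmark result cited in the abstract above (Gluskin) is that, with high probability,

\[
\operatorname{absconv}(X_1,\dots,X_N) \;\supseteq\; c\,\sqrt{\log(N/n)}\; B_2^n ,
\]

when X_1,...,X_N are independent standard Gaussian vectors in R^n, N > cn, and B_2^n denotes the Euclidean unit ball; the result described in the talk replaces the ball on the right-hand side by the dual of a floating body associated with X.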
Fri 13.03.2020
15:15-16:00
Cancelled! Karsten Borgwardt
Department of Biosystems Science and Engineering, ETH
HG G 19.1
Fri 27.03.2020
14:00-14:45
Cancelled! Sven Wang
University of Cambridge
HG G 19.2
Fri 27.03.2020
15:15-16:15
Cancelled! Jakob Runge
Climate Informatics Group
Abstract
Cancelled! The heart of the scientific enterprise is a rational effort to understand the causes behind the phenomena we observe. In disciplines dealing with complex dynamical systems, such as the Earth system, replicated real experiments are rarely feasible. However, a rapidly increasing amount of observational and simulated time series data opens up the use of observational causal inference methods beyond the commonly adopted correlation techniques. Observational causal inference is a rapidly growing field with enormous potential to help answer long-standing scientific questions. Unfortunately, many methods are still little known and therefore rarely adopted in Earth system sciences. In this talk I will present a Perspective Paper in Nature Communications which identifies key generic problems and major challenges where causal methods have the potential to advance the state-of-the-art in Earth system sciences. I will also present a novel causal inference benchmark platform that aims to assess the performance of causal inference methods and to help practitioners choose the right method for a particular problem. Some recent methods that address particular challenges of Earth system data will be discussed and illustrated by application examples where causal methods have already led to novel insights in Earth sciences.
ZüKoSt Zürcher Kolloquium über Statistik
Inferring causation from time series with perspectives in Earth system sciences (CANCELLED)
HG G 19.2
Fri 03.04.2020
15:15-16:00
Cancelled! Ernst Wit
Università della Svizzera italiana, Lugano
HG G 19.2
Fri 24.04.2020
15:15-16:00
Cancelled! Ming Yuan
ETH-ITS and Columbia University
Research Seminar in Statistics
tba (CANCELLED)
HG G 19.2
Fri 15.05.2020
15:15-16:00
Noel Gorelick
Google/UZH
HG G 19.1
Mon 18.05.2020
10:00-11:00
Ming Yuan
Institute for Theoretical Studies, ETH Zurich and Columbia University
Abstract
We investigate the optimal sample complexity of recovering a general high-dimensional smooth and sparse function based on point queries. Our result provides a precise characterization of the potential loss of information, or lack thereof, when restricting to point queries as opposed to the more general linear queries, as well as the benefit of randomization and adaptation. In addition, we develop a general framework for function approximation to mitigate the curse of dimensionality, one that can easily be adapted to incorporate further structure such as lower-order interactions, leading to sample complexities better than those obtained earlier in the literature.
ETH-FDS seminar
e-Seminar: Information Based Complexity of High Dimensional Sparse Functions
Zoom Lecture
Fri 22.05.2020
15:00-16:00
Thomas Berrett
ENSAE, Paris
Abstract
In this talk I will discuss the problem of testing conditional independence. In standard independence testing problems, given a test statistic measuring the strength of dependence, a simple and practical approach to find critical values and calibrate our tests is to permute the data uniformly at random and recalculate the test statistic in order to simulate its behaviour under the null hypothesis of independence. Unfortunately, this is no longer effective when testing conditional independence, and may result in misleading conclusions. We propose a general new method, the conditional permutation test, for testing the conditional independence of variables X and Y given a potentially high-dimensional random vector Z that may contain confounding factors. The proposed test permutes entries of X non-uniformly, so as to respect the existing dependence between X and Z and thus account for the presence of these confounders. Like the conditional randomization test of Candès et al., our test is useful in the 'Model-X' framework, in which an approximation to the distribution of X|Z is available. While Candès et al.'s test uses this estimate to draw new X values, our test uses this approximation to design an appropriate non-uniform distribution on permutations of the X values already seen in the true data. We provide an efficient Markov chain Monte Carlo sampler for the implementation of our method, and establish bounds on the Type I error in terms of the error in the approximation of the conditional distribution of X|Z, finding that, for the worst-case test statistic, the inflation in Type I error of the conditional permutation test is no larger than that of the conditional randomization test. We validate these theoretical results with experiments on simulated data and on the Capital Bikeshare data set. This talk is based on joint work with Yi Wang, Rina Foygel Barber and Richard Samworth.
Young Data Science Researcher Seminar Zurich
The conditional permutation test for independence while controlling for confounders
Zoom
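A rough illustrative sketch of the permutation idea in the abstract above. It bins a scalar Z and permutes X within bins, which only crudely approximates the conditional permutation test (the actual method permutes X non-uniformly via an MCMC sampler built from a model of X | Z); names and parameters are illustrative.

```python
import numpy as np

def within_bin_permutation_test(x, y, z, stat, n_perm=999, n_bins=5, seed=0):
    """Crude conditional-independence test: permute X within quantile bins of Z.

    Only a stand-in for the conditional permutation test, which permutes X
    non-uniformly so as to respect the dependence between X and Z.
    """
    rng = np.random.default_rng(seed)
    edges = np.quantile(z, np.linspace(0, 1, n_bins + 1)[1:-1])
    groups = np.digitize(z, edges)
    t_obs = stat(x, y)
    count = 0
    for _ in range(n_perm):
        x_perm = x.copy()
        for g in np.unique(groups):
            idx = np.where(groups == g)[0]
            x_perm[idx] = rng.permutation(x[idx])
        count += stat(x_perm, y) >= t_obs
    return (1 + count) / (1 + n_perm)           # permutation p-value

# Example: X and Y are both driven by the confounder Z, so X is conditionally
# independent of Y given Z and the p-value should be roughly uniform.
rng = np.random.default_rng(1)
z = rng.normal(size=300)
x = z + rng.normal(size=300)
y = z + rng.normal(size=300)
print(within_bin_permutation_test(x, y, z, lambda a, b: abs(np.corrcoef(a, b)[0, 1])))
```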
Fri 29.05.2020
15:00-16:00
Wooseok Ha
University of California, Berkeley
Abstract
Two common approaches in low-rank optimization problems are either working directly with a rank constraint on the matrix variable, or optimizing over a low-rank factorization so that the rank constraint is implicitly ensured. In this talk, we show the natural connection between the rank-constrained and factorized approaches. In particular, we show that all second-order stationary points of the factorized objective function correspond to fixed points of projected gradient descent run on the original problem (where the projection step enforces the rank constraint). This result allows us to unify many existing optimization guarantees that have been proved specifically in either the rank-constrained or the factorized setting, and leads to new results for certain settings of the problem. A major tool for handling the low-rank constraint is the local concavity coefficient, which aims to measure the concavity of a rank-constrained space. We demonstrate the application of our results to several concrete low-rank optimization problems arising in matrix inverse problems.
Young Data Science Researcher Seminar Zurich
An equivalence between critical points for rank constraints versus low-rank factorizations
Zoom Call
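A minimal sketch, not the speaker's code, of projected gradient descent on a rank-constrained problem, where the projection onto the rank constraint is a truncated SVD; the toy objective and all parameters are illustrative.

```python
import numpy as np

def rank_projection(M, r):
    """Project M onto the set of matrices of rank at most r (truncated SVD)."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return (U[:, :r] * s[:r]) @ Vt[:r, :]

def projected_gradient_descent(Y, r, step=0.5, n_iter=200):
    """Minimize ||X - Y||_F^2 subject to rank(X) <= r, a toy instance of the
    rank-constrained formulation discussed in the abstract."""
    X = np.zeros_like(Y)
    for _ in range(n_iter):
        grad = X - Y                              # gradient of the quadratic loss
        X = rank_projection(X - step * grad, r)   # gradient step, then projection
    return X

rng = np.random.default_rng(0)
truth = rng.normal(size=(30, 3)) @ rng.normal(size=(3, 20))   # rank-3 ground truth
Y = truth + 0.1 * rng.normal(size=truth.shape)                # noisy observation
X_hat = projected_gradient_descent(Y, r=3)
print(np.linalg.norm(X_hat - truth) / np.linalg.norm(truth))  # small relative error
```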
Fri 05.06.2020
15:00-16:00
Cong Ma
Princeton University
Abstract
This talk is concerned with noisy matrix completion: given partial and corrupted entries of a large low-rank matrix, how to estimate and infer the underlying matrix? Arguably one of the most popular paradigms to tackle this problem is convex relaxation, which achieves remarkable efficacy in practice. However, the statistical stability guarantees of this approach are still far from optimal in the noisy setting, falling short of explaining its empirical success. Moreover, it is generally very challenging to pin down the distributions of the convex estimator, which presents a major roadblock in assessing the uncertainty, or “confidence”, of the obtained estimates. Our recent work makes progress towards understanding stability and uncertainty quantification for noisy matrix completion. When the rank of the unknown matrix is a constant: (1) we demonstrate that the convex estimator achieves near-optimal estimation errors vis-à-vis random noise; (2) we develop a de-biased estimator that admits accurate distributional characterizations, thus enabling asymptotically optimal inference of the low-rank factors and the entries of the matrix. All of this is enabled by bridging convex relaxation with the nonconvex Burer-Monteiro approach, a seemingly distinct algorithmic paradigm that is provably stable against noise.
Young Data Science Researcher Seminar Zurich
Bridging convex and nonconvex optimization in noisy matrix completion: Stability and uncertainty quantification
Zoom Call
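For context, the convex relaxation referred to in the abstract above is usually the nuclear-norm-penalized least-squares estimator (standard formulation, not quoted from the talk): given noisy observations Y_{ij} = M_{ij} + E_{ij} for (i,j) in the observed set \Omega,

\[
\widehat{M} \;=\; \arg\min_{Z}\; \frac{1}{2}\sum_{(i,j)\in\Omega} \bigl(Z_{ij}-Y_{ij}\bigr)^{2} \;+\; \lambda\,\|Z\|_{*},
\]

where \|Z\|_{*} is the nuclear norm (the sum of the singular values of Z) and \lambda > 0 is a regularization parameter.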
Fri 12.06.2020
16:00-17:00
Mohamed Ndaoud
University of Southern California
Abstract
Datasets with a large number of features are becoming increasingly available and important in every field of research and innovation, urging practitioners to develop scalable algorithms with fast convergence. The question of fast convergence in the classical problem of high-dimensional linear regression has been extensively studied. Arguably, one of the fastest procedures in practice is Iterative Hard Thresholding (IHT). Still, IHT relies strongly on knowledge of the true sparsity parameter s. In this talk, we present a novel fast procedure for estimation in the high-dimensional linear regression model. Taking advantage of the interplay between estimation, support recovery and optimization, we achieve both optimal statistical accuracy and fast convergence. The main advantage of our procedure is that it is fully adaptive, making it more practical than state-of-the-art IHT methods. Our procedure achieves optimal statistical accuracy faster than, for instance, classical algorithms for the Lasso. As a consequence, we present a new iterative hard thresholding algorithm for high-dimensional linear regression that is minimax optimal, fast and adaptive.
Young Data Science Researcher Seminar Zurich
Fast and adaptive iterative hard thresholding in high dimensional linear regression: A non-convex algorithmic regularization approach
Zoom Call
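A minimal sketch of classical iterative hard thresholding (IHT), the baseline the abstract above starts from. Note that this plain version needs the sparsity level s as an input, which is exactly the dependence the proposed adaptive procedure removes; data and step size are illustrative.

```python
import numpy as np

def hard_threshold(beta, s):
    """Keep the s largest entries of beta in absolute value, zero out the rest."""
    out = np.zeros_like(beta)
    keep = np.argsort(np.abs(beta))[-s:]
    out[keep] = beta[keep]
    return out

def iht(X, y, s, n_iter=300):
    """Classical IHT: gradient step on the least-squares loss, then hard
    thresholding. Unlike the adaptive procedure in the talk, it needs s."""
    n, p = X.shape
    step = 1.0 / np.linalg.norm(X, 2) ** 2        # conservative step size
    beta = np.zeros(p)
    for _ in range(n_iter):
        grad = X.T @ (X @ beta - y)
        beta = hard_threshold(beta - step * grad, s)
    return beta

rng = np.random.default_rng(0)
n, p, s = 200, 500, 5
X = rng.normal(size=(n, p)) / np.sqrt(n)
beta_true = np.zeros(p)
beta_true[:s] = 3.0
y = X @ beta_true + 0.1 * rng.normal(size=n)
print(np.linalg.norm(iht(X, y, s) - beta_true))   # should be small
```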
Fri 19.06.2020
16:30-17:30
Lucy Gao
University of Washington
Abstract
In the multi-view data setting, multiple data sets are collected on a single, common set of observations. For example, we might perform genomic and proteomic assays on a single set of tumour samples, or we might collect relationship data from two online social networks for a single set of users. It is tempting to cluster the observations using all of the data views, in order to fully exploit the available information. However, clustering the observations using all of the data views implicitly assumes that a single underlying clustering of the observations is shared across all data views. If this assumption does not hold, then clustering the observations using all data views may lead to spurious results. We seek to evaluate the assumption that there is some underlying relationship among the clusterings from the different data views, by asking the question: are the clusters within each data view dependent or independent? We develop new tests for answering this question based on multivariate and/or network data views. This is joint work with Jacob Bien (University of Southern California) and Daniela Witten (University of Washington).
Young Data Science Researcher Seminar Zurich
Statistical Inference for Multi-View Clustering
Zoom Call
Fri 26.06.2020
15:00-16:00
Pragya Sur
Harvard University
Abstract
This talk will introduce a precise high-dimensional asymptotic theory for Boosting on separable data, taking both statistical and computational perspectives. We will consider the common modern setting where the number of features p and the sample size n are both large and comparable, and in particular, look at scenarios where the data is separable in an asymptotic sense. On the statistical front, we will provide an exact analysis of the generalization error of Boosting, when the algorithm interpolates the training data and maximizes an empirical L1 margin. The angle between the Boosting solution and the ground truth can be explicitly characterized. On the computational front, we will provide a sharp analysis of the stopping time when Boosting approximately maximizes the empirical L1 margin. Our theory provides several insights into properties of Boosting, for instance, we discover that the larger the margin, the smaller the proportion of active features (with zero initialization). At the heart of our theory lies a detailed study of the maximum L1 margin, using tools from stochastic convex geometry. This is based on joint work with Tengyuan Liang.
Young Data Science Researcher Seminar Zurich
A precise high-dimensional theory for boosting
Zoom Call
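For reference, the empirical l1 margin maximized by boosting on separable data, as referred to in the abstract above, is (standard definition, paraphrased)

\[
\gamma_n \;=\; \max_{\|\theta\|_1 \le 1} \;\min_{1 \le i \le n}\; y_i\, x_i^{\top}\theta ,
\]

and the results described above characterize both the generalization error of the interpolating solution attaining this margin and the number of boosting iterations needed to approximately maximize it.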
Fri 03.07.2020
15:00-16:00
Chi Jin
Princeton University
Abstract
Self-play, where the algorithm learns by playing against itself without requiring any direct supervision, has become the new weapon in modern Reinforcement Learning (RL) for achieving superhuman performance in practice. However, the majority of existing theory in reinforcement learning only applies to the setting where a single agent plays against a fixed environment. It remains largely open how to design efficient self-play algorithms in two-player sequential games, especially when it is necessary to manage the exploration/exploitation tradeoff. In this talk, we present the first line of provably efficient self-play algorithms in a basic setting of tabular episodic Markov games. Our algorithms further feature the near-optimal sample complexity---the number of samples required by our algorithms matches the information-theoretic lower bound up to a polynomial factor of the length of each episode.
Young Data Science Researcher Seminar Zurich
Near-Optimal Reinforcement Learning with Self-Play
Zoom Call
Fri 10.07.2020
15:00-16:00
Sven Wang
University of Cambridge
Abstract
The problem of generating random samples of high-dimensional posterior distributions arising from Gaussian process priors is considered. The main results consist of non-asymptotic computational guarantees for Langevin-type MCMC algorithms which scale polynomially in key quantities such as the dimension of the model, the desired precision level, and the number of available statistical measurements. As a direct consequence, it is shown that posterior mean vectors as well as maximum a posteriori (MAP) estimates are computable in polynomial time, with high probability under the distribution of the data. These results are derived in a general high-dimensional non-linear regression setting where posterior measures are not necessarily log-concave, employing a set of local `geometric' assumptions on the parameter space. The theory is illustrated in a representative example from PDEs involving a non-linear inverse problem for the steady-state Schrödinger equation.
Young Data Science Researcher Seminar Zurich
On polynomial-time computation of high-dimensional posterior measures by Langevin-type algorithms
Zoom Call
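A minimal sketch of the unadjusted Langevin algorithm, the basic template behind the Langevin-type MCMC schemes analyzed above; the Gaussian target, step size, and chain length are illustrative.

```python
import numpy as np

def unadjusted_langevin(grad_log_post, theta0, step=1e-2, n_steps=5000, seed=0):
    """Unadjusted Langevin algorithm:
    theta <- theta + step * grad log pi(theta) + sqrt(2 * step) * N(0, I)."""
    rng = np.random.default_rng(seed)
    theta = np.asarray(theta0, dtype=float)
    chain = np.empty((n_steps, theta.size))
    for k in range(n_steps):
        noise = rng.normal(size=theta.size)
        theta = theta + step * grad_log_post(theta) + np.sqrt(2 * step) * noise
        chain[k] = theta
    return chain

# Toy target: standard Gaussian posterior, for which grad log pi(theta) = -theta.
chain = unadjusted_langevin(lambda t: -t, theta0=np.ones(3))
print(chain[1000:].mean(axis=0))   # close to 0
print(chain[1000:].var(axis=0))    # close to 1
```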
Fri 17.07.2020
17:00-18:00
Nhat Ho
University of California, Berkeley
Abstract
The growth in scope and complexity of modern data sets presents the field of statistics and data science with numerous inferential and computational challenges, among them how to deal with various forms of heterogeneity. Latent variable models provide a principled approach to modeling heterogeneous collections of data. However, due to the over-parameterization, it has been observed that parameter estimation and latent structures of these models have non-standard statistical and computational behaviors. In this talk, we provide new insights into these behaviors under mixture models, a building block of latent variable models. From the statistical viewpoint, we propose a general framework for studying the convergence rates of parameter estimation in mixture models based on Wasserstein distance. Our study makes explicit the links between model singularities, parameter estimation convergence rates, and the algebraic geometry of the parameter space for mixtures of continuous distributions. Based on the convergence rates of parameter estimation under the Wasserstein distance, we propose a novel Merge-Truncate-Merge procedure to consistently estimate the true number of components in mixture models. From the computational side, we study the non-asymptotic behavior of the Expectation-Maximization (EM) algorithm, which is a stable method, and some high-order and unstable algorithms, including Newton’s method, under the over-specified settings of mixture models in which the likelihood need not be strongly concave, or, equivalently, the Fisher information matrix might be singular. Focusing on the simple setting of a two-component mixture fit with equal mixture weights to a multivariate Gaussian distribution, we demonstrate that EM updates converge to a fixed point at Euclidean distance O((d/n)^{1/4}) from the true parameter after O((n/d)^{1/2}) steps where d is the dimension. On the other hand, Newton’s method can achieve the same statistical accuracy as the EM algorithm in exponentially fewer steps - namely, with the number of iterations being reduced from O((n/d)^{1/2}) to O(log(n/d)). Our result shows that unstable algorithms are preferred to their stable counterparts under this simple setting of mixture models. As a consequence, it rules out the belief that the stability of the algorithm is a necessity for obtaining efficient statistical estimators.
Young Data Science Researcher Seminar Zurich
Statistical and computational perspectives on latent variable models
Zoom Call
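A minimal sketch of the EM iteration for the over-specified setting described in the abstract above: a two-component Gaussian mixture with equal weights, 0.5 N(theta, I) + 0.5 N(-theta, I), fit to data whose true parameter is theta* = 0. The closed-form update below uses the unit-variance simplification; dimensions and sample size are illustrative.

```python
import numpy as np

def em_symmetric_mixture(X, theta0, n_iter=300):
    """EM for the symmetric mixture 0.5*N(theta, I) + 0.5*N(-theta, I).
    The E- and M-steps collapse to: theta <- mean_i tanh(x_i . theta) * x_i."""
    theta = np.asarray(theta0, dtype=float)
    for _ in range(n_iter):
        w = np.tanh(X @ theta)                 # 2 * P(component +1 | x_i) - 1
        theta = (w[:, None] * X).mean(axis=0)
    return theta

# Over-specified regime: data from N(0, I), so the true theta is 0 and EM ends
# up at distance of order (d/n)^{1/4} from it, as described in the abstract.
rng = np.random.default_rng(0)
d, n = 2, 10_000
X = rng.normal(size=(n, d))
theta_hat = em_symmetric_mixture(X, theta0=np.ones(d))
print(np.linalg.norm(theta_hat), (d / n) ** 0.25)
```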
Fri 24.07.2020
15:00-16:00
Haoyang Liu
University of Chicago
Abstract
One challenge for high-dimensional data is to analyze the behavior of different algorithms facilitating sparsity, since the statistical task is usually formulated as an optimization problem with a non-convex sparsity constraint. In this talk, we will address the question of algorithmic optimality for the class of iterative thresholding algorithms under the framework of restricted optimality.
Young Data Science Researcher Seminar Zurich
Algorithmic optimality for iterative thresholding algorithms
Zoom Call