Seminar overview

Autumn Semester 2019

Date & Time Speaker Title Location
Fri 06.09.2019
15:15-16:00
Stefan Wagner
Stanford University
Abstract
Classical approaches to experimental design assume that intervening on one unit does not affect other units. Recently, however, there has been considerable interest in settings where this non-interference assumption does not hold, e.g., when running experiments on supply-side incentives on a ride-sharing platform or subsidies in an energy marketplace. In this paper, we introduce a new approach to experimental design in large-scale stochastic systems with considerable cross-unit interference, under an assumption that the interference is structured enough that it can be captured using mean-field asymptotics. Our approach enables us to accurately estimate the effect of small changes to system parameters by combining unobtrusive randomization with lightweight modeling, all while remaining in equilibrium. We can then use these estimates to optimize the system by gradient descent. Concretely, we focus on the problem of a platform that seeks to optimize supply-side payments p in a centralized marketplace where different suppliers interact via their effects on the overall supply-demand equilibrium, and show that our approach enables the platform to optimize p based on perturbations whose magnitude can get vanishingly small in large systems.
Research Seminar in Statistics
Experimenting in Equilibrium
HG G 19.1
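
The abstract above describes estimating the effect of small payment changes from unobtrusive randomization and then optimizing by gradient descent. The sketch below only illustrates the general flavour of that idea, zeroth-order gradient ascent on a made-up marketplace simulator; the supply model, step sizes and objective are illustrative assumptions of ours, not the method of the paper.

    # Hypothetical sketch: tuning a supply-side payment p via small perturbations,
    # in the spirit of (but not identical to) the approach described in the abstract.
    import numpy as np

    rng = np.random.default_rng(0)

    def platform_profit(p, n_suppliers=10_000, demand=6_000, fare=10.0):
        """Toy marketplace simulator (an illustrative assumption, not the paper's model).
        Suppliers participate with a probability increasing in the payment p; the
        platform serves min(supply, demand) rides and keeps fare - p per ride."""
        participation_prob = 1.0 / (1.0 + np.exp(-(p - 5.0)))   # logistic supply response
        supply = rng.binomial(n_suppliers, participation_prob)
        rides = min(supply, demand)
        return rides * (fare - p)

    def gradient_estimate(p, delta=0.05, n_rep=200):
        """Two-sided finite-difference estimate of d(profit)/dp from small perturbations."""
        plus = np.mean([platform_profit(p + delta) for _ in range(n_rep)])
        minus = np.mean([platform_profit(p - delta) for _ in range(n_rep)])
        return (plus - minus) / (2 * delta)

    p = 3.0                                       # initial payment level
    for step in range(50):                        # noisy gradient ascent, decaying step size
        p += (1e-4 / (step + 1)) * gradient_estimate(p)
    print(f"payment after optimization: {p:.2f}")
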
Fri 20.09.2019
15:15-16:00
Guillaume Obozinski
Swiss Data Science Center
Abstract
Influence Diagrams (ID) provide a flexible framework to represent discrete stochastic optimization problems, including Markov Decision Process (MDP) and Partially Observable MDP as standard examples. In Influence Diagrams, the random variables are associated with a probabilistic graphical model whose vertices are partitioned into three types: chance, decision and utility vertices. The user has to choose the distribution of the decision vertices conditionally on their parents in order to maximize the expected utility. Leveraging a notion of rooted junction tree that we introduced with collaborators, I will show how the maximum expected utility problem on an influence diagram can be reformulated advantageously as a mixed integer linear problem on the marginal polytope of this junction tree. Then I will propose a way to obtain a good LP relaxation by identifying maximal sets that are invariant under the choice of the policy in the sense of the literature on causality. These LP relaxations allow for more efficient branch-and-bound algorithms but could also have other applications.
Research Seminar in Statistics
Integer programming and linear programming relaxation on the junction tree polytope for Influence Diagrams
HG G 19.1
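
For reference, the maximum expected utility (MEU) problem mentioned in the abstract above can be written, in standard influence-diagram notation (ours, not necessarily the speaker's), as an optimization over the policies attached to the decision vertices:

    \max_{\delta}\; \sum_{x} \Big( \prod_{v \in V_C} p\big(x_v \mid x_{\mathrm{pa}(v)}\big) \Big)
                    \Big( \prod_{d \in V_D} \delta_d\big(x_d \mid x_{\mathrm{pa}(d)}\big) \Big)
                    \sum_{u \in V_U} r_u\big(x_{\mathrm{pa}(u)}\big),

where V_C, V_D and V_U are the chance, decision and utility vertices, pa(v) denotes the parents of vertex v, and each policy \delta_d is a conditional distribution of x_d given its parents. The reformulation discussed in the talk optimizes instead over marginal vectors on a rooted junction tree, which turns the problem into a mixed integer linear program.
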
Fri 27.09.2019
15:15-16:00
Alexandra Carpentier
Universität Magdeburg
Abstract
Adaptive inference - namely adaptive estimation and adaptive confidence statements - is particularly important in high- or infinite-dimensional models in statistics. Indeed, whenever the dimension becomes high or infinite, it is important to adapt to the underlying structure of the problem. While adaptive estimation is often possible, it is often the case that adaptive and honest confidence sets do not exist. This is known as the adaptive inference paradox, and it has consequences for sequential decision making. In this talk, I will present some classical results of adaptive inference and discuss how they impact sequential decision making. (Based on joint work with Andrea Locatelli, Matthias Loeffler, Olga Klopp and Richard Nickl.)
Research Seminar in Statistics
Adaptive inference and its relations to sequential decision making
HG G 19.1
Mon 07.10.2019
16:15-17:15
Gitta Kutyniok
TU Berlin
Abstract
Speaker invited by: Christoph Schwab
Inverse problems in imaging such as denoising, recovery of missing data, or the inverse scattering problem appear in numerous applications. However, due to their increasing complexity, model-based methods alone are today often no longer sufficient. At the same time, we witness the tremendous success of data-based methodologies, in particular deep neural networks, for such problems. However, pure deep learning approaches often neglect known and valuable information from the modeling world and also currently still lack a profound theoretical understanding. In this talk, we will provide an introduction to this class of problems and then focus on the inverse problem of computed tomography, where one of the key issues is the limited-angle problem. For this problem, we will demonstrate the success of hybrid approaches. We will develop a solver for this severely ill-posed inverse problem by combining the model-based method of sparse regularization by shearlets with the data-driven method of deep learning. Our approach is faithful in the sense that we only learn the part which cannot be handled by model-based methods, while applying the theoretically controllable sparse regularization technique to all other parts. We further show that our algorithm significantly outperforms previous methodologies, including methods entirely based on deep learning. Finally, we will discuss how similar ideas can also be used to solve related problems such as detection of wavefront sets.
ETH-FDS seminar
Deep Learning meets Modeling: Taking the Best out of Both Worlds
HG G 19.2
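
As context for the abstract above, shearlet-based sparse regularization for limited-angle tomography is typically posed as a variational problem of roughly the following form (the notation here is ours):

    \min_{f}\; \tfrac{1}{2}\, \big\| \mathcal{R}_{\Omega} f - g \big\|_2^2 \;+\; \lambda\, \big\| \mathrm{SH}(f) \big\|_1,

where \mathcal{R}_{\Omega} is the Radon transform restricted to the available angular range \Omega, g is the measured sinogram, \mathrm{SH} is a shearlet transform and \lambda > 0 is a regularization weight. In the hybrid approach described in the abstract, only the shearlet coefficients that the limited-angle data cannot determine are predicted by a trained neural network, while the remaining coefficients come from the model-based reconstruction.
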
Fri 11.10.2019
15:15-16:00
Ashia Wilson
Microsoft Research
Abstract
Cross-validation (CV) is the de facto standard for selecting accurate predictive models and assessing model performance. However, CV suffers from a need to repeatedly refit a learning procedure on a large number of training datasets. To reduce the computational burden, a number of works have introduced approximate CV procedures that simultaneously reduce runtime and provide model assessments comparable to CV when the prediction problem is sufficiently smooth. An open question, however, is whether these procedures are suitable for model selection. In this talk, I’ll describe (i) broad conditions under which the model selection performance of approximate CV nearly matches that of CV, (ii) examples of prediction problems where approximate CV selection fails to mimic CV selection, and (iii) an extension of these results and the approximate CV framework more broadly to non-smooth prediction problems like L1-regularized empirical risk minimization. This is joint work with Lester Mackey and Maximilian Kasy.
Research Seminar in Statistics
The risk of approximate cross validation
HG G 19.1
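
One common flavour of the approximate CV procedures discussed above replaces each leave-one-out refit with a single Newton step from the full-data fit. The sketch below implements that idea for L2-regularized logistic regression on synthetic data; it illustrates the general construction under our own assumptions (data, loss, hyperparameters) rather than the specific framework of the talk.

    # Approximate leave-one-out CV via one Newton step, compared with exact refits.
    import numpy as np
    from scipy.optimize import minimize

    rng = np.random.default_rng(1)
    n, d, lam = 200, 5, 1.0
    X = rng.normal(size=(n, d))
    y = np.sign(X @ rng.normal(size=d) + 0.5 * rng.normal(size=n))    # labels in {-1, +1}

    def objective(theta, X, y):
        return np.sum(np.logaddexp(0.0, -y * (X @ theta))) + 0.5 * lam * theta @ theta

    def fit(X, y):
        return minimize(objective, np.zeros(d), args=(X, y), method="BFGS").x

    theta_hat = fit(X, y)                          # full-data fit

    # Per-example gradients and the full Hessian at the full-data fit.
    m = y * (X @ theta_hat)
    sigma = 1.0 / (1.0 + np.exp(m))                # sigmoid(-m_i)
    grads = -(y * sigma)[:, None] * X              # gradient of each loss term
    w = sigma * (1.0 - sigma)                      # per-example Hessian weights
    H = (X * w[:, None]).T @ X + lam * np.eye(d)   # Hessian of the full objective

    exact_loo, approx_loo = [], []
    for i in range(n):
        mask = np.arange(n) != i
        theta_exact = fit(X[mask], y[mask])        # what plain CV does: refit without i
        H_i = H - w[i] * np.outer(X[i], X[i])      # leave-one-out Hessian
        theta_approx = theta_hat + np.linalg.solve(H_i, grads[i])   # one Newton step
        exact_loo.append(np.logaddexp(0.0, -y[i] * (X[i] @ theta_exact)))
        approx_loo.append(np.logaddexp(0.0, -y[i] * (X[i] @ theta_approx)))

    print("mean LOO loss, exact refits:      ", np.mean(exact_loo))
    print("mean LOO loss, Newton-step approx:", np.mean(approx_loo))
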
Thu 07.11.2019
16:15-17:00
Anders Kock
Oxford University
Abstract
Consider a setting in which a policy maker assigns subjects to treatments, observing each outcome before the next subject arrives. Initially, it is unknown which treatment is best, but the sequential nature of the problem permits learning about the effectiveness of the treatments. While the multi-armed-bandit literature has shed much light on the situation when the policy maker compares the effectiveness of the treatments through their mean, economic decision making often requires targeting purpose-specific characteristics of the outcome distribution, such as its inherent degree of inequality, welfare or poverty. In the present paper we introduce and study sequential learning algorithms when the distributional characteristic of interest is a general functional of the outcome distribution. In particular, it turns out that intuitively reasonable approaches, such as first conducting an experiment on an initial group of subjects and then rolling out the inferred best treatment to the population, are dominated by the policies we develop, which we show to be optimal.
Research Seminar in Statistics
Functional Sequential Treatment Allocation
HG G 19.1
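
To make the idea of targeting a distributional functional concrete, the toy sketch below assigns subjects sequentially based on the lower quartile of each treatment's observed outcomes rather than the mean, with a simple UCB-style exploration bonus. The outcome distributions, the choice of functional and the bonus are illustrative assumptions of ours; the policies developed in the paper are different and are the ones shown there to be optimal.

    # Sequential treatment allocation targeting a distributional functional
    # (here the lower quartile) instead of the mean -- an illustrative heuristic.
    import numpy as np

    rng = np.random.default_rng(2)
    arms = [lambda: rng.normal(1.0, 1.0),     # treatment 0: higher mean, heavier left tail
            lambda: rng.normal(0.8, 0.3)]     # treatment 1: lower mean, better lower quartile

    def functional(samples):
        return np.quantile(samples, 0.25)     # the targeted characteristic of the outcomes

    T, K = 2000, len(arms)
    outcomes = [[arms[k]()] for k in range(K)]            # one forced draw per treatment
    for t in range(K, T):
        scores = [functional(outcomes[k]) + np.sqrt(2 * np.log(t) / len(outcomes[k]))
                  for k in range(K)]
        k = int(np.argmax(scores))                        # assign the next subject
        outcomes[k].append(arms[k]())                     # observe the outcome

    print("allocations per treatment: ", [len(o) for o in outcomes])
    print("empirical lower quartiles: ", [round(functional(o), 3) for o in outcomes])
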
Fri 08.11.2019
17:15-18:15
David Donoho
Stanford University
Abstract
Machine learning became a remarkable media story of the 2010s largely owing to its ability to focus researcher energy on attacking prediction challenges like ImageNet. Media extrapolation of complete transformation of human existence has (predictably) ensued. Unfortunately, machine learning has a troubled relationship with understanding the foundation of its achievements well enough to face demanding real-world requirements outside the challenge setting. For example, its literature is admittedly corrupted by anti-intellectual and anti-scholarly tendencies. It is beyond irresponsible to build a revolutionary transformation on such a shaky pseudo-foundation. In contrast, more traditional subdisciplines of data science like numerical linear algebra, applied probability, and theoretical statistics provide time-tested tools for designing reliable processes with understandable performance. Moreover, positive improvements in human well-being have repeatedly been constructed using these foundations. To illustrate these points, we will review a recent boomlet in the ML literature on the study of eigenvalues of deepnet Hessians. A variety of intriguing patterns in eigenvalues were observed and speculated about in ML conference papers. We describe work of Vardan Papyan showing that the traditional subdisciplines, properly deployed, can offer insights about these objects that ML researchers had been seeking.
ETH-FDS Stiefel Lectures
Deepnet Spectra and the two cultures of data science
HG F 1
Tue 10.12.2019
16:15-17:15
Yue Lu
John A. Paulson School of Engineering and Applied Sciences, Harvard University
Abstract
The massive datasets being compiled by our society present new challenges and opportunities to the field of signal and information processing. The increasing dimensionality of modern datasets offers many benefits. In particular, the very high-dimensional settings allow one to develop and use powerful asymptotic methods in probability theory and statistical physics to obtain precise characterizations that would otherwise be intractable in moderate dimensions. In this talk, I will present recent work where such blessings of dimensionality are exploited. In particular, I will show (1) the exact characterization of a widely-used spectral method for nonconvex statistical estimation; (2) the fundamental limits of solving the phase retrieval problem via linear programming; and (3) how to use scaling and mean-field limits to analyze nonconvex optimization algorithms for high-dimensional inference and learning. In these problems, asymptotic methods not only clarify some of the fascinating phenomena that emerge with high-dimensional data, they also lead to optimal designs that significantly outperform heuristic choices commonly used in practice.
ETH-FDS seminar
Exploiting the Blessings of Dimensionality in Big Data
HG E 1.2
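
As a small numerical companion to item (1) in the abstract above, the sketch below runs a standard spectral method for phase retrieval: from phaseless measurements y_i = |<a_i, x>|^2 it forms the data-weighted matrix (1/n) * sum_i y_i a_i a_i^T and takes its leading eigenvector as the estimate. The problem sizes and the plain (untrimmed) weighting are our own illustrative choices, not those analyzed in the talk.

    # Spectral estimate for phase retrieval on synthetic Gaussian data.
    import numpy as np

    rng = np.random.default_rng(3)
    d, n = 100, 2000                               # signal dimension, number of measurements
    x = rng.normal(size=d)
    x /= np.linalg.norm(x)                         # ground-truth signal on the unit sphere
    A = rng.normal(size=(n, d))                    # Gaussian sensing vectors a_i
    y = (A @ x) ** 2                               # phaseless measurements |<a_i, x>|^2

    D = (A * y[:, None]).T @ A / n                 # (1/n) * sum_i y_i a_i a_i^T
    eigvals, eigvecs = np.linalg.eigh(D)
    x_hat = eigvecs[:, -1]                         # leading eigenvector = spectral estimate

    print("|correlation with the truth|:", abs(x_hat @ x))   # approaches 1 as n/d grows
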
Fri 13.12.2019
13:00-18:00
Annette Kopp-Schneider
DKFZ
HIT E 03
Uni Zürich, Hirschgraben
Tue 07.01.2020
11:15-12:00
Abraham Wyner
Wharton, University of Pennsylvania
Abstract
AdaBoost, random forests and deep neural networks are the present day workhorses of the machine learning universe. We introduce a novel perspective on AdaBoost and random forests that proposes that the two algorithms work for similar reasons. While both classifiers achieve similar predictive accuracy, random forests cannot be conceived as a direct optimization procedure. Rather, random forests is a self-averaging, "interpolating" algorithm which creates what we denote as a “spiked-smooth” classifier, and we view AdaBoost in the same light. We conjecture that both AdaBoost and random forests succeed because of this mechanism. We provide a number of examples to support this explanation. We conclude with a brief mention of new research that suggests that deep neural nets are effective (at least in part and in some contexts) for the same reasons.
Research Seminar in Statistics
Explaining the Success of AdaBoost, Random Forests and Deep Neural Nets as Interpolating Classifiers
HG G 19.2
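
As a quick empirical companion to the abstract above (our own toy experiment, not taken from the talk), the sketch below fits a random forest and AdaBoost on data with 10% flipped labels and prints training and test accuracy. The fully grown trees of the random forest (nearly) interpolate the noisy training labels while still generalizing; scikit-learn's default AdaBoost uses decision stumps, so swapping in deeper base learners would be closer to the interpolating regime discussed in the abstract.

    # Train/test accuracy of random forests and AdaBoost on noisy labels.
    from sklearn.datasets import make_classification
    from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier
    from sklearn.model_selection import train_test_split

    X, y = make_classification(n_samples=2000, n_features=20, flip_y=0.1, random_state=0)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.5, random_state=0)

    for clf in [RandomForestClassifier(n_estimators=500, random_state=0),
                AdaBoostClassifier(n_estimators=500, random_state=0)]:
        clf.fit(X_tr, y_tr)
        print(type(clf).__name__,
              "train acc:", round(clf.score(X_tr, y_tr), 3),
              "test acc: ", round(clf.score(X_te, y_te), 3))
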