Seminar overview

Spring Semester 2023

Date & Time Speaker Title Location
Thu 02.03.2023
15:00-16:00
Felix Krahmer
TU München
Abstract
The problem of recovering a high-dimensional low-rank matrix from a limited set of random measurements has enjoyed various applications and gained a detailed theoretical foundation over the last 15 years. An instance of particular interest is the matrix completion problem, where the measurements are entry observations. The first rigorous recovery guarantees for this problem were derived for the nuclear norm minimization approach, a convex proxy for the NP-hard problem of constrained rank minimization. For matrices whose entries are "spread out" well enough, this convex problem admits a unique solution which corresponds to the ground truth. In the presence of random measurement noise, the reconstruction performance is also well-studied, but the performance for adversarial noise remains less understood. While some error bounds have been derived for both convex and nonconvex approaches, these bounds exhibit a gap to information-theoretic lower bounds and to the provable performance for Gaussian measurements. However, a recent analysis of the problem suggests that under small-scale adversarial noise, the reconstruction error can be significantly amplified. In this talk, we investigate this amplification quantitatively and provide new reconstruction bounds for both small and large noise levels that suggest a quadratic dependence between the reconstruction error and the noise level. This is joint work with Julia Kostin (TUM/ETH) and Dominik Stöger (KU Eichstätt-Ingolstadt).
ETH-FDS seminar
Robust low-rank matrix completion with adversarial noise
HG E 1.1
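For readers unfamiliar with the nuclear norm minimization approach mentioned in the abstract: a common way to solve its regularized (Lagrangian) form is proximal gradient descent with singular value thresholding. The following is a minimal, illustrative numpy sketch on hypothetical data; it is not the method or analysis from the talk.

import numpy as np

def svt(Z, tau):
    # Proximal operator of tau * nuclear norm: soft-threshold the singular values
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    return U @ np.diag(np.maximum(s - tau, 0.0)) @ Vt

def complete(M_obs, mask, lam=0.1, step=1.0, iters=500):
    # Proximal gradient on 0.5*||mask*(X - M_obs)||_F^2 + lam*||X||_*
    X = np.zeros_like(M_obs)
    for _ in range(iters):
        grad = mask * (X - M_obs)               # gradient of the data-fit term
        X = svt(X - step * grad, step * lam)    # proximal step on the nuclear norm
    return X

# Hypothetical rank-4 ground truth with roughly half of the entries observed
rng = np.random.default_rng(0)
M = rng.standard_normal((60, 4)) @ rng.standard_normal((4, 60))
mask = (rng.random(M.shape) < 0.5).astype(float)
X_hat = complete(mask * M, mask)
print(np.linalg.norm(X_hat - M) / np.linalg.norm(M))   # relative recovery error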
Thu 09.03.2023
16:00-17:00
Yixin Wang
University of Michigan
Abstract
Representation learning constructs low-dimensional representations to summarize essential features of high-dimensional data like images and texts. Ideally, such a representation should efficiently capture non-spurious features of the data. It should also be disentangled so that we can interpret what feature each of its dimensions captures. However, these desiderata are often intuitively defined and challenging to quantify or enforce. In this talk, we take on a causal perspective of representation learning. We show how desiderata of representation learning can be formalized using counterfactual notions, enabling metrics and algorithms that target efficient, non-spurious, and disentangled representations of data. We discuss the theoretical underpinnings of the algorithm and illustrate its empirical performance in both supervised and unsupervised representation learning. This is joint work with Michael Jordan: https://arxiv.org/abs/2109.03795
Young Data Science Researcher Seminar Zurich
Representation Learning: A Causal Perspective
Zoom Call
Fri 17.03.2023
15:15-16:15
Sebastian Lerch
Karlsruhe Institute of Technology
Abstract
Ensemble weather forecasts based on multiple runs of numerical weather prediction models typically show systematic errors and require post-processing to obtain reliable forecasts. Accurately modeling multivariate dependencies is crucial in many practical applications, and various approaches to multivariate post-processing have been proposed where ensemble predictions are first post-processed separately in each margin and multivariate dependencies are then restored via copulas. These two-step methods share common key limitations, in particular the difficulty to include additional predictors in modeling the dependencies. We propose a novel multivariate post-processing method based on generative machine learning to address these challenges. In this new class of nonparametric data-driven distributional regression models, samples from the multivariate forecast distribution are directly obtained as output of a generative neural network. The generative model is trained by optimizing a proper scoring rule which measures the discrepancy between the generated and observed data, conditional on exogenous input variables. Our method does not require parametric assumptions on univariate distributions or multivariate dependencies and allows for incorporating arbitrary predictors. In two case studies on multivariate temperature and wind speed forecasting at weather stations over Germany, our generative model shows significant improvements over state-of-the-art methods and particularly improves the representation of spatial dependencies. A preprint is available at https://arxiv.org/abs/2211.01345.
Research Seminar in Statistics
Generative machine learning methods for multivariate ensemble post-processing
HG G 19.1
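The training objective described in the abstract is a proper scoring rule evaluated on samples from the generative network; one scoring rule commonly used for multivariate forecasts is the energy score. Whether the preprint uses exactly this score is not stated above, so the following numpy sketch of a sample-based energy score is illustrative only.

import numpy as np

def energy_score(samples, obs):
    # Sample-based energy score for one multivariate observation.
    # samples: (m, d) draws from the forecast distribution; obs: (d,) observed vector.
    # ES = E||X - y|| - 0.5 * E||X - X'||; lower is better, and the score is proper.
    m = samples.shape[0]
    term1 = np.mean(np.linalg.norm(samples - obs, axis=1))
    diffs = samples[:, None, :] - samples[None, :, :]
    term2 = np.sum(np.linalg.norm(diffs, axis=2)) / (2 * m * m)
    return term1 - term2

# Hypothetical example: 100 samples of a 5-dimensional forecast
rng = np.random.default_rng(1)
print(energy_score(rng.standard_normal((100, 5)), np.zeros(5)))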
Mon 20.03.2023
15:00-17:00
Sara Magliacane
University of Amsterdam and MIT-IBM Watson AI Lab
Zijian Guo
Rutgers University
Abstract
Zijian Guo (Rutgers University)
Title: Statistical Inference for Maximin Effects: Identifying Stable Associations Across Multiple Studies
Abstract: Integrative analysis of data from multiple sources is critical to making generalizable discoveries. Associations that are consistently observed across multiple source populations are more likely to be generalized to target populations with possible distributional shifts. In this paper, we model the heterogeneous multi-source data with multiple high-dimensional regressions and make inferences for the maximin effect (Meinshausen and Bühlmann, Annals of Statistics, 43(4), 1801-1830). The maximin effect provides a measure of stable associations across multi-source data. A significant maximin effect indicates that a variable has commonly shared effects across multiple source populations, and these shared effects may be generalized to a broader set of target populations. Inferring maximin effects is challenging because the point estimator can have a non-standard limiting distribution. We devise a novel sampling method to construct valid confidence intervals for maximin effects. The proposed confidence interval attains a parametric length. This sampling procedure and the related theoretical analysis are of independent interest for solving other non-standard inference problems. Using genetic data on yeast growth in multiple environments, we demonstrate that the genetic variants with significant maximin effects have generalizable effects under new environments.

Sara Magliacane (University of Amsterdam and MIT-IBM Watson AI Lab)
Title: Causality-inspired ML: what can causality do for ML?
Abstract: Applying machine learning to real-world cases often requires methods that are robust w.r.t. heterogeneity, data that are missing not at random or corrupt, selection bias, non-i.i.d. data, etc., and that can generalize across different domains. Moreover, many tasks inherently try to answer causal questions and gather actionable insights, a task for which correlations are usually not enough. Several of these issues are addressed in the rich causal inference literature. On the other hand, classical causal inference methods often require either complete knowledge of a causal graph or enough experimental data (interventions) to estimate it accurately. Recently, a new line of research has focused on causality-inspired machine learning, i.e. on applying ideas from causal inference to machine learning methods without necessarily knowing or even trying to estimate the complete causal graph. In this talk, I will present an example of this line of research in the unsupervised domain adaptation case, in which we have labelled data in a set of source domains and unlabelled data in a target domain ("zero-shot"), for which we want to predict the labels. In particular, given certain assumptions, our approach is able to select a set of provably "stable" features (a separating set), for which the generalization error can be bounded, even in the case of arbitrarily large distribution shifts. As opposed to other works, it also exploits the information in the unlabelled target data, allowing for some unseen shifts w.r.t. the source domains. While using ideas from causal inference, our method never aims at reconstructing the causal graph or even the Markov equivalence class, showing that causal inference ideas can help machine learning even in this more relaxed setting.
Young Data Science Researcher Seminar Zurich
MS New Researchers Group, Young Data Science Researcher Seminar Zürich, and the YoungStatS Project: Distribution generalization and causal inference
I
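As background for the first abstract above, the maximin effect of Meinshausen and Bühlmann can be written (in notation of my own choosing, which may differ from the talk's) as

b_maximin = argmax_b min_g V_g(b), with V_g(b) = 2 b^T \Sigma b_g - b^T \Sigma b,

where b_g is the regression vector in source population g and \Sigma is the covariance of the covariates; the maximin effect thus maximizes the explained variance that is guaranteed across all source populations.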
Tue 28.03.2023
13:15-14:15
Boaz Nadler
The Weizmann Institute of Science, Israel
Abstract
Consider the sparse approximation or best subset selection problem: Given a vector y and a matrix A, find a k-sparse vector x that minimizes the residual ||Ax-y||. This sparse linear regression problem, and related variants, plays a key role in high dimensional statistics, compressed sensing, and more. In this talk we focus on the trimmed lasso penalty, defined as the L_1 norm of x minus the L_1 norm of its top k entries in absolute value. We advocate using this penalty by deriving sparse recovery guarantees for it, and by presenting a practical approach to optimize it. Our computational approach is based on the generalized soft-min penalty, a smooth surrogate that takes into account all possible k-sparse patterns. We derive a polynomial time algorithm to compute it, which in turn yields a novel method for the best subset selection problem. Numerical simulations illustrate its competitive performance compared to current state of the art.
Research Seminar in Statistics
The Trimmed Lasso: Sparse Recovery Guarantees And Practical Optimization
HG G 19.2
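The trimmed lasso penalty from the abstract has a simple closed form: it equals the sum of the d-k smallest entries of |x|, and therefore vanishes exactly when x is k-sparse. A minimal numpy sketch of the penalty itself (the generalized soft-min surrogate and the optimization method from the talk are not reproduced here):

import numpy as np

def trimmed_lasso(x, k):
    # tau_k(x) = ||x||_1 - (sum of the k largest |x_i|)
    #          = sum of the (len(x) - k) smallest |x_i|
    a = np.sort(np.abs(x))                    # ascending order
    return a[:len(x) - k].sum()

x = np.array([5.0, -3.0, 0.2, 0.0, -0.1])
print(trimmed_lasso(x, 2))                                       # 0.3: only small entries are penalized
print(trimmed_lasso(np.array([5.0, -3.0, 0.0, 0.0, 0.0]), 2))    # 0.0: exactly 2-sparse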
Tue 04.04.2023
15:15-16:15
Boaz Nadler
The Weizmann Institute of Science, Israel
Abstract
Tree graphical models are common statistical models for data in a wide variety of applications. Tree models are particularly popular in phylogenetics, where an important task is to infer the evolutionary history of current species. Given observations at the leaves of the tree, a common problem is to reconstruct the tree's latent structure. We present two simple spectral-based methods for tree recovery:
(i) A bottom up spectral neighbor joining method (SNJ); and
(ii) STDR - a spectral based top down method.
We prove that under suitable assumptions, both methods are consistent and derive finite sample recovery guarantees. We illustrate the competitive performance of our algorithms in comparison with popular tree recovery methods.
Research Seminar in Statistics
Spectral Methods for Reconstructing Trees
HG G 19.2
Tue 18.04.2023
14:15-15:05
Courtney Paquette
McGill University, Canada
Abstract
Random matrices frequently appear in many different fields: physics, computer science, applied and pure mathematics. Oftentimes the random matrix of interest will have non-trivial structure: entries that are dependent and have potentially different means and variances (e.g. sparse Wigner matrices, matrices corresponding to adjacencies of random graphs, sample covariance matrices). However, current understanding of such complex random matrices remains lacking. In this talk, I will discuss recent results concerning the spectrum of sums of independent random matrices with a.s. bounded operator norms. In particular, I will demonstrate that under some fairly general conditions, such sums will exhibit the following universality phenomenon: their spectrum will lie close to that of a Gaussian random matrix with the same mean and covariance. No prior background in random matrix theory is required; basic knowledge of probability and linear algebra is sufficient. (Joint work with Ramon van Handel.) Preprint: https://web.math.princeton.edu/~rvan/tuniv220113.pdf
ETH-FDS seminar
DACO-FDS: Stochastic Algorithms in the Large: Batch Size Saturation, Stepsize Criticality, Generalization Performance, and Exact Dynamics (Part I)
HG G 19.1
Tue 18.04.2023
15:10-16:00
Elliot Paquette
McGill University, Canada
Abstract
In this talk, we will present a framework for analyzing dynamics of stochastic optimization algorithms (e.g., stochastic gradient descent (SGD) and SGD with momentum (SGD+M)) when both the number of samples and dimensions are large. For the analysis, we will introduce a stochastic differential equation, called homogenized SGD. We show that homogenized SGD is the high-dimensional equivalent of SGD -- for any quadratic statistic (e.g., population risk with quadratic loss), the statistic under the iterates of SGD converges to the statistic under homogenized SGD when the number of samples n and number of features d are polynomially related. By analyzing homogenized SGD, we provide exact non-asymptotic high-dimensional expressions for the training dynamics and generalization performance of SGD in terms of a solution of a Volterra integral equation. The analysis is formulated for data matrices and target vectors that satisfy a family of resolvent conditions, which can roughly be viewed as a weak form of delocalization of sample-side singular vectors of the data. By analyzing these limiting dynamics, we can provide insights into learning rate, momentum parameter, and batch size selection. For instance, we identify a stability measurement, the implicit conditioning ratio (ICR), which regulates the ability of SGD+M to accelerate the algorithm. When the batch size exceeds this ICR, SGD+M converges linearly at a rate of $O(1/\kappa)$, matching optimal full-batch momentum (in particular, performing as well as full-batch momentum while using only a fraction of the batch size). For batch sizes smaller than the ICR, in contrast, SGD+M has rates that scale like a multiple of the single-batch SGD rate. We give explicit choices for the learning rate and momentum parameter in terms of the Hessian spectra that achieve this performance. Finally, we show that this model matches performance on real data sets.
ETH-FDS seminar
DACO-FDS: Stochastic Algorithms in the Large: Batch Size Saturation, Stepsize Criticality, Generalization Performance, and Exact Dynamics (Part II)
HG G 19.1
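For orientation, the algorithm whose high-dimensional dynamics the talk analyzes is mini-batch SGD with (heavy-ball) momentum on a least-squares problem. The sketch below only illustrates that algorithm on hypothetical data; the learning rate, momentum, and batch size are arbitrary, and the homogenized-SGD/Volterra analysis itself is not reproduced.

import numpy as np

rng = np.random.default_rng(0)
n, d = 2000, 500                               # samples and features, polynomially related
A = rng.standard_normal((n, d)) / np.sqrt(d)   # hypothetical data matrix
x_star = rng.standard_normal(d)
b = A @ x_star                                 # noiseless targets (interpolation setting)

def sgd_momentum(lr=0.5, beta=0.9, batch=64, iters=4000):
    x, v = np.zeros(d), np.zeros(d)
    for _ in range(iters):
        idx = rng.integers(0, n, batch)                    # sample a mini-batch
        grad = A[idx].T @ (A[idx] @ x - b[idx]) / batch    # stochastic gradient
        v = beta * v + grad                                # heavy-ball momentum buffer
        x = x - lr * v
    return x

x_hat = sgd_momentum()
print(np.linalg.norm(A @ x_hat - b) / np.linalg.norm(b))   # relative training residual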
Fri 28.04.2023
15:15-16:15
Benedikt Herwerth
SwissRe
Abstract
The topic of potential discrimination in statistical and machine learning (ML) models is increasingly being discussed, both by the scientific community and the wider public. In the insurance industry specifically, customers and regulators demand that individuals be treated fairly. Regulations of the European Union, for example, mandate that gender not be used as a factor in determining the prices of policies.
This talk is based on the method for "discrimination-free insurance pricing" introduced in a series of papers by Lindholm et al., which specifically addresses the issue of potential indirect discrimination [1, 2, 3].
In the first part of the talk, we outline the solution by Lindholm et al. and discuss why indirect discrimination is a subtle topic that can be difficult for decision makers to understand. In the second part, we present a toolbox built internally at Swiss Re that implements the methodology. In the third part, we apply the methodology to model human mortality. We use public data of the German Association of Actuaries, which we interpret in terms of a Bayesian network describing the relation between age, gender, smoker status and the mortality of individuals.
[1] M. Lindholm, R. Richman, A. Tsanakas and M. V. Wüthrich, "Discrimination-free insurance pricing," ASTIN Bulletin: The Journal of the IAA, vol. 52, pp. 55-89, 2022.
[2] M. Lindholm, R. Richman, A. Tsanakas and M. V. Wüthrich, "A Multi-Task Network Approach for Calculating Discrimination-Free Insurance Prices," 2 11 2022. [Online]. Available: https://ssrn.com/abstract=4155585.
[3] M. Lindholm, R. Richman, A. Tsanakas and M. V. Wüthrich, "A Discussion of Discrimination and Fairness in Insurance Pricing," 2 09 2022. [Online]. Available: https://ssrn.com/abstract=4207310.
ZüKoSt Zürcher Kolloquium über Statistik
Avoiding indirect discrimination in modeling mortality
HG G 19.1
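As a rough sketch of the construction in [1] (my paraphrase; notation may differ from the talk): with covariates X and protected attributes D, the best-estimate price is mu(x, d) = E[Y | X = x, D = d]. Simply dropping D gives the "unawareness" price E[Y | X = x] = sum_d mu(x, d) P(D = d | X = x), which can still discriminate indirectly because X carries information about D. The discrimination-free price instead averages with the marginal weights,

h*(x) = sum_d mu(x, d) P(D = d),

which removes this indirect channel.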
Thu 11.05.2023
16:15-17:15
Stephan Mandt
University of California, Irvine
Abstract
Latent variable models have been an integral part of probabilistic machine learning, ranging from simple mixture models to variational autoencoders to powerful diffusion probabilistic models at the center of recent media attention. Perhaps less well-appreciated is the intimate connection between latent variable models and data compression, and the potential of these models for advancing natural science. This talk will explore these topics. I will begin by showcasing connections between variational methods and the theory and practice of neural data compression. On the applied side, variational methods lead to machine-learned compressors of data such as images and videos and offer principled techniques for enhancing their compression performance, as well as reducing their decoding complexity. On the theory side, variational methods also provide scalable bounds on the fundamental compressibility of real-world data, such as images and particle physics data. Lastly, I will also delve into climate science projects, where a combination of deep latent variable modeling and vector quantization enables assessing distribution shifts induced by varying climate models and the effects of global warming. Short Bio: Stephan Mandt is an Associate Professor of Computer Science and Statistics at the University of California, Irvine. From 2016 until 2018, he was a Senior Researcher and Head of the statistical machine learning group at Disney Research in Pittsburgh and Los Angeles. He held previous postdoctoral positions at Columbia University and Princeton University. Stephan holds a Ph.D. in Theoretical Physics from the University of Cologne in Germany, where he received the National Merit Scholarship. He received the NSF CAREER Award, a Kavli Fellowship of the U.S. National Academy of Sciences, the German Research Foundation's Mercator Fellowship, and the UCI ICS Mid-Career Excellence in Research Award. He is a member of the ELLIS Society and a former visiting researcher at Google Brain. Stephan will serve as Program Chair of the AISTATS 2024 conference, currently serves as an Action Editor for JMLR and TMLR, and frequently serves as Area Chair for NeurIPS, ICML, AAAI, and ICLR.
ETH-FDS seminar
Deep Latent Variable Models for Compression and Natural Science
HG D 1.2
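One way to read the claim that variational methods bound the compressibility of data: for a latent variable model p(x, z) with approximate posterior q(z | x), the negative evidence lower bound

E_q[ -log p(x, z) + log q(z | x) ] >= -log p(x)

upper-bounds the ideal codelength -log p(x) (in nats), and bits-back coding schemes can approach this bound in practice. This is a standard identity, added here for context rather than taken from the talk.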
Fri 26.05.2023
15:15-16:15
Tim Vaughan
ETH Zürich, Department of Biosystems Science and Engineering
Abstract
The start of 2020 saw a then-novel coronavirus, SARS-CoV-2, spread rapidly across the globe. A stand-out characteristic of the resulting pandemic has been the incredible level of genomic surveillance applied, which has resulted in over 15 million publicly available SARS-CoV-2 genomes to date. In this talk I will provide a brief introduction to a statistical inference framework known as Bayesian phylodynamics. This framework combines ideas from computational phylogenetics and population genetics to produce model-based inferences of key epidemiological parameters using pathogen sequences and other epidemiological data. I will then go on to discuss some recent applications of this framework to the inference of basic SARS-CoV-2 reproductive numbers and case count dynamics, as well as the quantification of the genetic evidence for the effectiveness of Swiss contact tracing efforts.
ZüKoSt Zürcher Kolloquium über Statistik
Phylodynamics in Action: Using genomes and computers to understand COVID-19 outbreaks
HG G 19.1
Wed 31.05.2023
17:15-18:15
Sara van de Geer
ETH Zürich
HG F 30
Thu 01.06.2023
16:15-17:15
Yurii Nesterov
UCLouvain
Abstract
In recent years, the most important developments in optimization have been related to clarifying the abilities of higher-order methods. These schemes have a potentially much higher rate of convergence compared to lower-order methods. However, the possibility of implementing them as practically efficient algorithms was questionable for decades. In this talk, we discuss different possibilities for advancing in this direction, which avoid the standard concerns about tensor methods (memory requirements, complexity of computing the tensor components, etc.). Moreover, in this way we obtain new second-order methods with memory, which converge provably faster than the conventional upper limits provided by complexity theory.
ETH-FDS seminar
New perspectives for higher-order methods in Convex Optimization
HG D 1.2
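For context on what "higher-order methods" refers to: the p-th order tensor step minimizes a regularized Taylor model,

x_{k+1} = argmin_y [ \Phi_{x_k, p}(y) + (M / (p+1)!) ||y - x_k||^{p+1} ],

where \Phi_{x_k, p} is the p-th order Taylor polynomial of the objective at x_k and M is a regularization constant tied to the Lipschitz constant of the p-th derivative; p = 2 recovers cubic regularization of Newton's method. This is a generic formulation added for orientation and is not meant to reproduce the specific schemes discussed in the talk.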
Thu 15.06.2023
15:15-16:15
Sylvain Robert
EPFL, Lausanne
HG G 19.1