Seminar overview


Spring Semester 2022

Date & Time Speaker Title Location
Thr 10.02.2022
16:00-17:00
Sohom Bhattacharya
Stanford University
Abstract
We consider the problem of detecting whether or not, in a given network, there is a cluster of nodes which exhibit unusual behavior. When the nodes correspond to independent Bernoulli random variables, such detection problems are well studied in the literature. However, a fundamental question in this field is how dependence, characterized by a network, modulates the behavior of such problems. Formally, we address the detection question when the nodes of a network correspond to Bernoulli variables with dependence modeled by graphical models (Ising models). Our results not only provide sharp constants of detection in these cases, thereby pinpointing the precise relationship of the detection problem with the underlying dependence, but also demonstrate how to be agnostic over the strength of dependence present in the respective models. This is based on joint work with Rajarshi Mukherjee and Gourab Ray.
Young Data Science Researcher Seminar Zurich
Global Testing under dependent Bernoullis
Zoom Call
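To fix ideas, here is a minimal simulation of the independent-Bernoulli baseline mentioned in the abstract: global testing for a small cluster of nodes with elevated means, using a simple sum test. The Ising-dependent setting of the talk and its sharp detection constants are not reproduced here, and all parameter values are illustrative choices.

```python
# Independent-Bernoulli global testing sketch (not the talk's Ising setting):
# H0: every node is Bernoulli(1/2); H1: an unknown set of s nodes has mean
# 1/2 + delta.  Parameter values are arbitrary illustrative choices.
import numpy as np

rng = np.random.default_rng(0)
n, s, delta, reps = 10_000, 500, 0.2, 2_000

def sum_test(x):
    """One-sided z-test based on the total sum of the Bernoulli nodes."""
    z = (x.sum() - len(x) / 2) / np.sqrt(len(x) / 4)
    return z > 1.645  # approximate 5% critical value

size = np.mean([sum_test(rng.binomial(1, 0.5, size=n)) for _ in range(reps)])
power = np.mean([
    sum_test(np.concatenate([
        rng.binomial(1, 0.5 + delta, size=s),   # elevated cluster
        rng.binomial(1, 0.5, size=n - s),       # null nodes
    ]))
    for _ in range(reps)
])
print(f"empirical size ≈ {size:.3f}, empirical power ≈ {power:.3f}")
```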
Thr 17.02.2022
16:00-17:00
Richard Guo
University of Cambridge
Abstract
We study efficient estimation of an intervention mean associated with a point exposure treatment under a causal graphical model represented by a directed acyclic graph without hidden variables. Under such a model, it may happen that a subset of the variables is uninformative, in that failure to measure them neither precludes identification of the intervention mean nor changes the semiparametric variance bound for regular estimators of it. Identifying such uninformative variables is particularly useful at the stage of designing a planned observational or randomized study, in that measurements of such variables can be avoided without sacrificing efficiency. We develop a set of graphical criteria that are sound and complete for eliminating all uninformative variables. In addition, we construct a reduced directed acyclic graph that exactly represents the induced marginal model over the informative variables. We show that the interventional mean is identified by the g-formula (Robins, 1986) according to this graph. This g-formula is the irreducible, efficient identifying formula: its nonparametric plug-in achieves the semiparametric efficiency bound of the original graphical model.
Young Data Science Researcher Seminar Zurich
Variable elimination, graph reduction and efficient g-formula
Zoom Call
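As a small complement to the abstract, the g-formula named above (Robins, 1986) can be illustrated with a plug-in estimate on a toy three-variable DAG. The variable names and data-generating mechanism below are invented for illustration, and the talk's criteria for eliminating uninformative variables are not implemented.

```python
# Toy plug-in of the g-formula E[Y | do(A=a)] = sum_l P(L=l) E[Y | A=a, L=l]
# for the simple DAG L -> A, L -> Y, A -> Y with binary variables.
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
n = 50_000
L = rng.binomial(1, 0.4, n)                      # confounder
A = rng.binomial(1, 0.2 + 0.6 * L)               # treatment depends on L
Y = rng.binomial(1, 0.1 + 0.3 * A + 0.4 * L)     # outcome depends on A and L
df = pd.DataFrame({"L": L, "A": A, "Y": Y})

def g_formula(df, a):
    """Plug-in g-formula estimate of E[Y | do(A=a)]."""
    p_l = df["L"].value_counts(normalize=True)
    cond = df[df["A"] == a].groupby("L")["Y"].mean()
    return float((p_l * cond).sum())

print("E[Y|do(A=1)] - E[Y|do(A=0)] ≈", g_formula(df, 1) - g_formula(df, 0))
# The true effect in this simulation is 0.3 (the coefficient of A above).
```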
Thr 24.02.2022
15:00-16:00
Anna Ma
University of California
Abstract
Signed measurements of the form $y_i = \mathrm{sign}(\langle a_i, x \rangle)$ for $i \in [M]$ are ubiquitous in large-scale machine learning problems where the overarching task is to recover the unknown, unit-norm signal $x \in \mathbb{R}^d$. Oftentimes, measurements can be queried adaptively, for example based on a current approximation of $x$, so that only a subset of the $M$ measurements is needed. Geometrically, these measurements induce a spherical hyperplane tessellation of $\mathbb{R}^{d}$ in which one of the cells contains the unknown vector $x$. Motivated by this problem, in this talk we will present a geometric property of spherical hyperplane tessellations in $\mathbb{R}^{d}$. Under the assumption that the $a_i$ are Gaussian random vectors, we will show that with high probability there exists a subset of the hyperplanes whose cardinality is on the order of $d\log(d)\log(M)$ such that the radius of the cell containing $x$ induced by these hyperplanes is bounded above, up to constants, by $d\log(d)\log(M)/M$. The work presented is joint work with Rayan Saab and Eric Lybrand.
Young Data Science Researcher Seminar Zurich
Gaussian Spherical Tessellations and Learning Adaptively
Zoom Call
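A short sketch of the measurement model from the abstract, together with a simple non-adaptive baseline estimator (the normalized average of $y_i a_i$). The adaptive hyperplane-selection scheme and the $d\log(d)\log(M)$ guarantee from the talk are not implemented, and all problem sizes are illustrative.

```python
# One-bit measurements y_i = sign(<a_i, x>) with Gaussian a_i, and a simple
# NON-adaptive baseline: for standard Gaussian a_i, E[y a] is proportional to
# the unit-norm signal x, so the normalized average of y_i * a_i estimates x.
import numpy as np

rng = np.random.default_rng(2)
d, M = 50, 20_000
x = rng.standard_normal(d)
x /= np.linalg.norm(x)                      # unknown unit-norm signal

A = rng.standard_normal((M, d))             # Gaussian measurement vectors a_i
y = np.sign(A @ x)                          # one-bit (signed) measurements

x_hat = (y[:, None] * A).mean(axis=0)       # average of y_i * a_i
x_hat /= np.linalg.norm(x_hat)

print("estimation error ||x_hat - x|| ≈", np.linalg.norm(x_hat - x))
```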
Thr 03.03.2022
15:00-16:00
Eliza O'Reilly
California Institute of Technology
Abstract
The Mondrian process in machine learning is a recursive partition of space with random axis-aligned cuts, used to build random forests and Laplace kernel approximations. The construction allows for efficient online algorithms, but the restriction to axis-aligned cuts does not capture dependencies between features. By viewing the Mondrian as a special case of the stable under iteration (STIT) process in stochastic geometry, we resolve open questions about generalizing the cut directions. We utilize the theory of stationary random tessellations to show that STIT processes approximate a large class of stationary kernels and achieve minimax rates for Lipschitz and C^2 functions. This work opens many new questions at the intersection of stochastic geometry and machine learning. Based on joint work with Ngoc Mai Tran.
Young Data Science Researcher Seminar Zurich
Random Tessellation Features and Forests
Zoom Call
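For concreteness, below is a minimal recursive sampler of the Mondrian process (axis-aligned cuts only), following the standard construction with exponential cut times; the STIT generalization to other cut directions discussed in the talk is not implemented.

```python
# Minimal Mondrian process sampler on a box: the time to the next cut is
# exponential with rate equal to the box's linear dimension (sum of side
# lengths); the cut dimension is chosen proportionally to side length and the
# cut location uniformly, then both halves are recursed with the leftover budget.
import numpy as np

rng = np.random.default_rng(3)

def mondrian(lows, highs, budget):
    """Return the leaf boxes of a Mondrian partition of the box [lows, highs]."""
    lows, highs = np.asarray(lows, float), np.asarray(highs, float)
    lengths = highs - lows
    t = rng.exponential(1.0 / lengths.sum())        # time until the next cut
    if t > budget:                                  # budget exhausted: leaf cell
        return [(lows.copy(), highs.copy())]
    dim = rng.choice(len(lows), p=lengths / lengths.sum())
    cut = rng.uniform(lows[dim], highs[dim])        # uniform cut location
    left_high, right_low = highs.copy(), lows.copy()
    left_high[dim], right_low[dim] = cut, cut
    return (mondrian(lows, left_high, budget - t)
            + mondrian(right_low, highs, budget - t))

leaves = mondrian([0, 0], [1, 1], budget=5.0)
print(f"{len(leaves)} cells in the sampled Mondrian partition of [0,1]^2")
```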
Fri 04.03.2022
15:15-16:15
Magali Champion
ETH Zürich
Abstract
Detecting cluster structure is a fundamental task for understanding and visualizing the functional characteristics of a graph. Among the different clustering methods available, spectral clustering is one of the most widely used due to its speed and simplicity, although it remains sensitive to strong perturbations of the graph. In this work, we present a variant of spectral clustering, called l_1-spectral clustering, based on Lasso regularization and adapted to perturbed graph models. By promoting sparse eigenbases as solutions of specific l_1-minimization problems, it detects the hidden natural cluster structure of the graph. Its effectiveness and robustness to noise perturbations are confirmed on a collection of simulated and real biological data. Joint work with C. Champion, M. Blazère, R. Burcelin and JM. Loubes.
ZüKoSt Zürcher Kolloquium über Statistik
l_1-spectral clustering algorithm: a spectral clustering method using l_1-regularization
HG G 19.1
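For reference, a plain spectral clustering baseline (graph Laplacian eigenvectors followed by k-means) is sketched below on a toy two-block graph; the l_1-regularized sparse eigenbasis step that defines the talk's l_1-spectral clustering is not reproduced here.

```python
# Standard (unnormalized) spectral clustering: k smallest Laplacian
# eigenvectors, then k-means on the embedded nodes.
import numpy as np
from sklearn.cluster import KMeans

def spectral_clustering(W, k, seed=0):
    """Cluster the nodes of a graph with symmetric adjacency matrix W."""
    d = W.sum(axis=1)
    L = np.diag(d) - W                      # unnormalized graph Laplacian
    _, vecs = np.linalg.eigh(L)             # eigenvalues in ascending order
    U = vecs[:, :k]                         # k smallest eigenvectors
    return KMeans(n_clusters=k, n_init=10, random_state=seed).fit_predict(U)

# Toy example: a noisy two-block (stochastic block model style) graph.
rng = np.random.default_rng(4)
B = np.kron(np.eye(2), np.ones((20, 20))) * 0.8 + 0.05   # edge probabilities
W = (rng.uniform(size=(40, 40)) < B).astype(float)
W = np.triu(W, 1); W = W + W.T             # symmetric, no self-loops
print(spectral_clustering(W, k=2))
```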
Thr 10.03.2022
15:00-16:00
Tomas Vaškevičius
University of Oxford
Abstract
The local Rademacher complexity framework is one of the most successful toolboxes for establishing sharp excess risk bounds for statistical estimators based on empirical risk minimization. However, the applicability of this toolbox hinges on the so-called Bernstein condition, often limiting direct application domains to proper and convex problem settings. In this talk, we will show how to obtain exponential-tail local Rademacher complexity excess risk bounds under an alternative condition. This alternative condition, leading to a more recent notion of localization via offset Rademacher complexities, is known to hold for some estimators in non-convex and improper settings. We will discuss applications of this theory to model selection aggregation and iterative regularization problems.
Young Data Science Researcher Seminar Zurich
Exponential-tail excess risk bounds without Bernstein condition
Zoom Call
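For orientation, the two notions named in the abstract are recalled below in commonly used textbook forms; exact constants and formulations vary across the literature, and these statements are not taken from the talk.

```latex
% Commonly used forms of the two notions named in the abstract (constants and
% exact formulations vary across the literature).
\begin{align*}
  \text{Bernstein condition:}\quad
    & \mathbb{E}\bigl[(\ell_f - \ell_{f^*})^2\bigr]
      \;\le\; B\,\mathbb{E}\bigl[\ell_f - \ell_{f^*}\bigr]
      \quad\text{for all } f \in \mathcal{F}, \\
  \text{offset Rademacher complexity:}\quad
    & \mathcal{R}_n^{\mathrm{off}}(\mathcal{F}, c)
      \;=\; \mathbb{E}_{\sigma}\,\sup_{f \in \mathcal{F}}
            \frac{1}{n}\sum_{i=1}^{n}
            \bigl[\sigma_i f(x_i) - c\, f(x_i)^2\bigr].
\end{align*}
```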
Thr 17.03.2022
15:00-16:00
Alden Green
Carnegie Mellon University
Abstract
Graph-based learning refers to a family of conceptually simple and scalable approaches which can be applied across many tasks and domains. We study graph-based learning in a relatively classical setting: nonparametric regression with point-cloud data lying on a (possibly) low-dimensional data manifold. In this setting, many graph-based methods can be interpreted as discrete approximations of “continuous-time methods”, meaning methods defined with respect to continuous-time differential operators, which serve as some of the traditional workhorses for nonparametric regression. Motivated by this connection, we develop theoretical guarantees for a pair of graph-based methods, Laplacian eigenmaps and Laplacian smoothing, which show that they achieve optimal rates of convergence over Sobolev smoothness classes. Indeed, perhaps surprisingly, these results imply that graph-based methods actually have better properties than are suggested by tying them to standard continuous-time tools.
Young Data Science Researcher Seminar Zurich
Statistical Theory for Nonparametric Regression with Graph Laplacians
Zoom Call
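To make the connection concrete, a minimal Laplacian smoothing sketch on an epsilon-neighborhood graph built from a point cloud is given below. The graph construction, bandwidth, and penalty are arbitrary illustrative choices; the manifold-adaptive rates from the talk are of course not visible in such a toy example.

```python
# Laplacian smoothing on a neighborhood graph built from a point cloud:
#   f_hat = argmin_f ||y - f||^2 + lam * f' L f  =  (I + lam * L)^{-1} y.
import numpy as np

rng = np.random.default_rng(5)
n, eps, lam = 400, 0.15, 5.0

X = rng.uniform(size=(n, 2))                              # point cloud in [0,1]^2
y = np.sin(4 * X[:, 0]) + 0.3 * rng.standard_normal(n)    # noisy responses

D2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)       # pairwise squared dists
W = (D2 < eps ** 2).astype(float)
np.fill_diagonal(W, 0.0)                                  # epsilon-neighborhood graph
L = np.diag(W.sum(axis=1)) - W                            # unnormalized Laplacian

f_hat = np.linalg.solve(np.eye(n) + lam * L, y)           # Laplacian smoothing
print("MSE to the noiseless signal:",
      float(np.mean((f_hat - np.sin(4 * X[:, 0])) ** 2)))
```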
Thr 24.03.2022
16:15-17:15
Emmanuel Abbé
EPF Lausanne
Abstract
It is currently known how to characterize functions that neural networks can learn with SGD for two extremal parametrizations: neural networks in the linear/kernel regime, and neural networks with no structural constraints. However, for the main parametrization of interest ---non-linear but regular networks--- no tight characterization has yet been achieved, despite significant developments. In this talk, we take a step in this direction by considering depth-2 neural networks trained by SGD in the mean-field regime. We consider functions on binary inputs that depend on a latent low-dimensional subspace, since this provides a challenging framework for linear models (curse of dimensionality) but not for neural networks, which routinely tackle high-dimensional data. Accordingly, we study learning of such functions with a linear sample complexity. In this setting, we establish a necessary and nearly sufficient condition for learning, namely the merged-staircase property (MSP). Joint work with E. Boix (MIT) and T. Misiakiewicz (Stanford).
ETH-FDS seminar
Towards a characterization of when neural networks can learn
HG D 1.1
Thr 31.03.2022
16:15-17:15
Tom Goldstein
University of Maryland
Abstract
This talk will have two parts. In the first half of the talk, I'll survey the basics of adversarial machine learning, and discuss whether adversarial attacks and dataset poisoning can scale up to work on industrial systems. I'll also present applications where adversarial methods provide benefits for domain shift robustness, dataset privacy, and data augmentation. In the second half of the talk, I'll present my recent work on "thinking systems." These systems use recurrent networks to emulate a human-like thinking process, in which problems are represented in memory and then iteratively manipulated and simplified over time until a solution is found. When these models are trained only on "easy" problem instances, they can then solve "hard" problem instances without having ever seen one, provided the model is allowed to "think" for longer at test time. Bio: Tom Goldstein is the Perotto Associate Professor of Computer Science at the University of Maryland. His research lies at the intersection of machine learning and optimization, and targets applications in computer vision and signal processing. Before joining the faculty at Maryland, Tom completed his PhD in Mathematics at UCLA, and was a research scientist at Rice University and Stanford University. Professor Goldstein has been the recipient of several awards, including SIAM’s DiPrima Prize, a DARPA Young Faculty Award, a JP Morgan Faculty award, and a Sloan Fellowship.
ETH-FDS seminar
End-to-end algorithm synthesis with "thinking" networks
HG D 1.2
Thr 07.04.2022
16:15-17:15
Dominik Rothenhäusler
Stanford University
Abstract
During data analysis, analysts often have to make seemingly arbitrary decisions. For example, during data pre-processing there are a variety of options for dealing with outliers or inferring missing data. Similarly, many specifications and methods can be reasonable to address a certain domain question. This may be seen as a hindrance to reliable inference, since conclusions can change depending on the analyst's choices. In this talk, I argue that this situation is an opportunity to construct confidence intervals that account not only for sampling uncertainty but also for some type of distributional uncertainty. Distributional uncertainty is closely related to other issues in data analysis, ranging from dependence between observations to selection bias and confounding. We demonstrate the utility of the approach on simulated and real-world data. This is joint work with Yujin Jeong.
Research Seminar in Statistics
Calibrated inference: statistical inference that accounts for both sampling uncertainty and distributional uncertainty
HG D 7.1
Thr 14.04.2022
16:00-17:00
Yiqun Chen
University of Washington
Abstract
We consider the problem of testing for a difference in means between clusters of observations identified via k-means clustering. In this setting, classical hypothesis tests lead to an inflated Type I error rate, because the clusters were obtained on the same data used for testing. To overcome this problem, we take a selective inference approach. We propose a finite-sample p-value that controls the selective Type I error for testing the difference in means between a pair of clusters obtained using k-means clustering, and show that it can be efficiently computed. We apply our proposal in simulation, and on hand-written digits data and single-cell RNA-sequencing data. This is joint work with Daniela Witten.
Young Data Science Researcher Seminar Zurich
Selective inference for k-means clustering
Zoom Call
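The problem the talk addresses is easy to reproduce numerically: on pure noise, a naive two-sample t-test comparing the two clusters found by k-means rejects far more often than the nominal 5% level. The sketch below shows only this inflation; the selective p-value proposed in the talk is not implemented.

```python
# Naive "double dipping": cluster with k-means, then t-test the difference in
# means between the two clusters on the SAME data.  On pure noise the rejection
# rate is far above the nominal 5% level.
import numpy as np
from scipy.stats import ttest_ind
from sklearn.cluster import KMeans

rng = np.random.default_rng(6)
reps, n = 500, 100
rejections = 0
for _ in range(reps):
    x = rng.standard_normal((n, 1))                # no true cluster structure
    labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(x)
    _, p = ttest_ind(x[labels == 0, 0], x[labels == 1, 0])
    rejections += p < 0.05
print("naive Type I error rate ≈", rejections / reps)   # close to 1, not 0.05
```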
Thr 28.04.2022
15:00-16:00
Anish Agarwal
UC Berkeley
Abstract
What will happen to Y if we do A? A variety of meaningful social and engineering questions can be formulated this way: What will happen to a patient’s health if they are given a new therapy? What will happen to a country’s economy if policy-makers legislate a new tax? What will happen to a data center’s latency if a new congestion control protocol is used? We explore how to answer such counterfactual questions using observational data---which is increasingly available due to digitization and pervasive sensors---and/or very limited experimental data. The two key challenges are: (i) counterfactual prediction in the presence of latent confounders; (ii) estimation with modern datasets which are high-dimensional, noisy, and sparse. The key framework we introduce is connecting causal inference with tensor completion. In particular, we represent the various potential outcomes (i.e., counterfactuals) of interest through an order-3 tensor. The key theoretical results presented are: (i) Formal identification results establishing under what patterns of missingness, latent confounding, and structure on the tensor recovery of unobserved potential outcomes is possible. (ii) Introducing novel estimators to recover these unobserved potential outcomes and proving they are finite-sample consistent and asymptotically normal. The efficacy of our framework is shown on high-impact applications. These include working with: (i) TauRx Therapeutics to identify patient sub-populations where their therapy was effective. (ii) Uber Technologies on evaluating the impact of driver engagement policies without running an A/B test. (iii) The Poverty Action Lab at MIT to make personalized policy recommendations to improve childhood immunization rates across villages in Haryana, India. Finally, we discuss connections between causal inference, tensor completion, and offline reinforcement learning. Brief bio: Anish is currently a postdoctoral fellow at the Simons Institute at UC Berkeley. He did his PhD at MIT in EECS where he was advised by Alberto Abadie, Munther Dahleh, and Devavrat Shah. His research focuses on designing and analyzing methods for causal machine learning, and applying it to critical problems in social and engineering systems. He currently serves as a technical consultant to TauRx Therapeutics and Uber Technologies on questions related to experiment design and causal inference. Prior to the PhD, he was a management consultant at Boston Consulting Group. He received his BSc and MSc at Caltech.
Young Data Science Researcher Seminar Zurich
Causal Inference for social and engineering systems
Zoom Call
Thr 05.05.2022
17:15-18:15
Roman Vershynin
University of California, Irvine
Abstract
An emerging way to protect data privacy is to replace true data by synthetic data whenever possible. Medical records of artificial patients, for example, could retain meaningful statistical information while preserving the privacy of the true patients. But is it possible to make synthetic data that is both private and useful? Is it possible, in particular, to deploy statistical sampling privately? These and other questions about privacy inspire new problems in high-dimensional probability theory about random walks, covariance estimation, and private measures. The talk is based on a research program that is joint with March Boedihardjo and Thomas Strohmer.
ETH-FDS Stiefel Lectures
Synthetic data and its privacy
HG F 30
Fri 06.05.2022
15:15-16:15
Björn Menze
Universität Zürich
Abstract
Biomedical image data offers quantitative information about health, disease, and disease progression under treatment - both at the patient and at the population level. Computational routines are instrumental in extracting this information in a structured fashion, typically following a succession of image segmentation, 'radiomic' feature extraction, and predictive modeling with respect to a given image marker or disease-related outcome. This pipeline can also be complemented by a functional and patient-specific modeling of the features or processes underlying the given image observations, for example, the tumor growth underlying a set of magnetic resonance scans acquired prior to and after treatment. I will talk about this biomedical image data processing pipeline, focusing on two aspects of our work in Zurich: the analysis of tumor images using patient-adapted tumor growth models, and the extraction of whole-brain vascular networks from 3D image data. I will demonstrate how to extract PDE model parameters from image observables using CNNs and show how we extract sparse physical networks from noisy image volumes using different learning strategies. I will also comment on data that has been made publicly available for both applications.
ZüKoSt Zürcher Kolloquium über Statistik
On tumors and vessels in medical image data
HG G 19.1
Thr 12.05.2022
15:00-16:00
Arshak Minasyan
CREST-ENSAE, Paris
Abstract
We propose a robust-to-outliers estimator of the mean of a multivariate Gaussian distribution that enjoys the following properties: polynomial computational complexity, high breakdown point, minimax rate optimality (up to a logarithmic factor) and asymptotic efficiency. The non-asymptotic risk bound for the expected error of the proposed estimator is dimension-free and involves only the effective rank of the covariance matrix. Moreover, we show that the obtained results can be extended to sub-Gaussian distributions, as well as to the cases of unknown rate of contamination or unknown covariance matrix. Joint work with Arnak Dalalyan (https://arxiv.org/abs/2002.01432).
Young Data Science Researcher Seminar Zurich
All-In-One Robust Estimator of the Gaussian Mean
Zoom Call
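As a point of comparison only, the sketch below contrasts the sample mean with a classical robust baseline (the geometric median, computed with Weiszfeld's iteration) under contamination of a Gaussian sample; it is not the estimator with the dimension-free guarantees described in the talk.

```python
# Sample mean vs. geometric median under 10% adversarial contamination of a
# Gaussian sample with true mean 0 (so the estimation error is just the norm
# of the estimate).
import numpy as np

rng = np.random.default_rng(7)
d, n, n_out = 20, 1_000, 100                    # 10% contamination

X = rng.standard_normal((n, d))                 # clean N(0, I_d) sample
X[:n_out] = 50.0                                # adversarial outliers

def geometric_median(X, iters=100, tol=1e-8):
    """Weiszfeld's iteration for the geometric median."""
    m = np.median(X, axis=0)
    for _ in range(iters):
        dist = np.linalg.norm(X - m, axis=1)
        w = 1.0 / np.maximum(dist, tol)
        m_new = (w[:, None] * X).sum(axis=0) / w.sum()
        if np.linalg.norm(m_new - m) < tol:
            break
        m = m_new
    return m

print("error of sample mean      :", np.linalg.norm(X.mean(axis=0)))
print("error of geometric median :", np.linalg.norm(geometric_median(X)))
```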
Thr 12.05.2022
17:15-18:15
Song Mei
UC Berkeley
Abstract
Recent empirical work has shown that hierarchical convolutional kernels inspired by convolutional neural networks (CNNs) significantly improve the performance of kernel methods in image classification tasks. A widely accepted explanation for the success of these architectures is that they encode hypothesis classes that are suitable for natural images. However, understanding the precise interplay between approximation and generalization in convolutional architectures remains a challenge. In this talk, we consider the stylized setting of covariates (image pixels), and fully characterize the RKHS of kernels composed of single layers of convolution, pooling, and downsampling operations. We then study the gain in sample efficiency of kernel methods using these kernels over standard inner-product kernels. In particular, we show that 1) the convolution layer breaks the curse of dimensionality by restricting the RKHS to `local' functions; 2) global average pooling enforces the learned function to be translation invariant; 3) local pooling biases learning towards low-frequency functions. Notably, our results quantify how choosing an architecture adapted to the target function leads to a large improvement in the sample complexity.
ETH-FDS seminar
A theoretical framework of convolutional kernels on image tasks
Zoom
Fri 13.05.2022
15:15-16:15
Matthias Templ
ZHAW School of Engineering, Zürich
Abstract
This talk is a practical presentation that aims to give an overview and ontology of different concepts for handling confidential data. It is motivated by the "fact" that different communities have different views and opinions on anonymization, likely without knowing and understanding each other. To put it bluntly, a computer scientist will likely propose a very different solution to an anonymization problem than a survey statistician; some scientists (and companies) believe that synthetic data is the sanctuary and solution par excellence, while others simply promote privacy-preserving data processing, and national statistical offices generally tend to reject these concepts, etc. Given the various methodological developments in the field of sensitive data protection, a conceptual classification and comparison between different methods from different domains is missing. Specifically, the goal is thus to provide guidance to practitioners who do not have an overview of appropriate approaches for specific scenarios, whether it is differential privacy for interactive queries, $k$-anonymity methods and synthetic data generation for publishing data, or secure federated analytics for multi-party computations without sharing the data itself. After a brief introduction of the most important anonymization concepts, an overview and ontology of methods is provided, based on key criteria that describe a context for handling data in a privacy-compliant manner and enable informed decisions in the face of many alternatives. Throughout this presentation, it is emphasized that there is no panacea and that, as always, context matters.
ZüKoSt Zürcher Kolloquium über Statistik
An Ontology on Data Anonymization and Privacy Computing Approaches
HG G 19.1
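As one concrete example of the concepts surveyed, a toy k-anonymity check is sketched below: a table is k-anonymous with respect to a set of quasi-identifiers if every combination of quasi-identifier values occurs at least k times. Column names and data are invented for illustration.

```python
# Toy k-anonymity check on a small table with two quasi-identifiers and one
# sensitive attribute.
import pandas as pd

df = pd.DataFrame({
    "age_group": ["30-39", "30-39", "30-39", "40-49", "40-49"],
    "zip3":      ["802",   "802",   "802",   "803",   "803"],
    "diagnosis": ["A",     "B",     "A",     "C",     "A"],   # sensitive
})

def is_k_anonymous(df, quasi_identifiers, k):
    """True if every quasi-identifier combination appears at least k times."""
    return df.groupby(quasi_identifiers).size().min() >= k

print(is_k_anonymous(df, ["age_group", "zip3"], k=2))   # True
print(is_k_anonymous(df, ["age_group", "zip3"], k=3))   # False
```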
Thr 19.05.2022
15:00-16:00
Stefan Perko
University of Jena
Abstract
Stochastic gradient descent without replacement or reshuffling (SGDo) is predominantly used to train machine learning models in practice. However, the mathematical theory of this algorithm remains underexplored compared to its "with replacement" and "infinite data" counterparts. We propose a stochastic, continuous-time approximation to SGDo based on a family of stochastic differential equations driven by a stochastic process we call epoched Brownian motion, which encapsulates the behavior of reusing the same data points in subsequent epochs. We investigate this diffusion approximation by considering an application of SGDo to linear regression. Explicit convergence results are derived for constant learning rates and for a sequence of learning rates satisfying the Robbins-Monro conditions. Finally, the validity of the continuous-time dynamics is further substantiated by numerical experiments.
Young Data Science Researcher Seminar Zurich
Towards diffusion approximations for stochastic gradient descent without replacement
Zoom Call
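A minimal sketch of the algorithm under study: SGD without replacement (random reshuffling, each data point used exactly once per epoch) versus SGD with replacement on a least-squares problem. The learning rate and problem sizes are illustrative, and the epoched-Brownian-motion diffusion approximation itself is not implemented.

```python
# SGD with replacement vs. SGD without replacement (random reshuffling) on
# least squares with a constant learning rate.
import numpy as np

rng = np.random.default_rng(8)
n, d, epochs, lr = 200, 10, 50, 0.01
A = rng.standard_normal((n, d))
x_star = rng.standard_normal(d)
b = A @ x_star + 0.1 * rng.standard_normal(n)

def sgd(with_replacement):
    x = np.zeros(d)
    for _ in range(epochs):
        idx = (rng.integers(0, n, size=n) if with_replacement
               else rng.permutation(n))            # each point used once per epoch
        for i in idx:
            grad = (A[i] @ x - b[i]) * A[i]        # gradient of 0.5*(a_i.x - b_i)^2
            x -= lr * grad
    return x

for flag, name in [(True, "with replacement"), (False, "without replacement")]:
    err = np.linalg.norm(sgd(flag) - x_star)
    print(f"{name:>20}: ||x - x*|| ≈ {err:.4f}")
```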
Tue 24.05.2022
16:15-17:15
Daniel Roy
University of Toronto
Abstract
In this talk, I will advocate for rethinking the role of key data assumptions in statistical analysis. In place of assumptions, I will suggest we aim for adaptivity, much like in nonparametric regression, where we seek methods that adapt to, say, the smoothness of the unknown regression function. Not all assumptions are created equal, however. I'll discuss two examples where dropping key assumptions forces us to reconsider also what promises we make to users about our statistical methods. In the first example, we drop the i.i.d. assumption when performing sequential prediction. In the second, we drop the no-unmeasured-confounders assumption when attempting to identify the best intervention. In both cases, we must redefine our goal to arrive at a well-defined problem. This talk will be based on results described in https://arxiv.org/abs/2007.06552, https://arxiv.org/abs/2110.14804, and https://arxiv.org/abs/2202.05100, joint work with Blair Bilodeau, Nicolò Campolongo, Jeffrey Negrea, Francesco Orabona, and Linbo Wang.
ETH-FDS seminar
Replacing assumptions in statistical analysis with adaptivity
HG D 1.2
Thr 02.06.2022
16:00-17:00
Pedro Abdalla
ETH Zurich
Abstract
In this talk we introduce a new estimator of the covariance matrix that achieves the optimal rate of convergence (up to constant factors) in the operator norm under two standard notions of data contamination: we allow the adversary to corrupt an $\eta$-fraction of the sample arbitrarily, while the distribution of the remaining data points only satisfies that the $L_p$-marginal moment, for some $p \ge 4$, is equivalent to the corresponding $L_2$-marginal moment. Despite requiring the existence of only a few moments, our estimator achieves the same tail estimates as if the underlying distribution were Gaussian. We also discuss a dimension-free Bai-Yin type theorem in the regime $p > 4$.
Young Data Science Researcher Seminar Zurich
Covariance Estimation: Optimal Dimension-free Guarantees for Adversarial Corruption and Heavy Tails
HG G 19.2
Zoom Call
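To illustrate the setting (not the estimator) from the abstract, the sketch below generates heavy-tailed data, corrupts an $\eta$-fraction of it adversarially, and reports the operator-norm error of the plain sample covariance, which is what the talk's robust estimator is designed to improve on.

```python
# Contamination model sketch: heavy-tailed data (Student-t, finite 4th moment,
# rescaled so the true covariance is the identity), with an eta-fraction of
# adversarial points.  Only the plain sample covariance is computed.
import numpy as np

rng = np.random.default_rng(9)
n, d, eta = 2_000, 20, 0.05
n_bad = int(eta * n)

X = rng.standard_t(df=5, size=(n, d))            # heavy tails, finite 4th moment
X /= np.sqrt(5 / 3)                              # rescale to unit variance
X[:n_bad] = 100.0                                # adversarial corruption

S = (X.T @ X) / n                                # plain sample covariance
op_err = np.linalg.norm(S - np.eye(d), ord=2)    # operator-norm error
print("operator-norm error of the sample covariance ≈", op_err)
```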
Thr 16.06.2022
15:00-16:00
Denny Wu
University of Toronto
Abstract
We study the first gradient descent step on the first-layer weights $W$ in a two-layer neural network, where the parameters are randomly initialized, and the training objective is the empirical MSE loss. In the proportional asymptotic limit (where the training set size $n$, the number of input features $d$, and the width of the neural network $N$ all diverge at the same rate), and under an idealized student-teacher setting, we show that the first gradient update contains a rank-1 "spike", which results in an alignment between the first-layer weights and the linear component of the teacher model $f^*$. To characterize the impact of this alignment, we compute the prediction risk of ridge regression on the conjugate kernel after one gradient step on $W$ with learning rate $\eta$. We consider two scalings of the first-step learning rate $\eta$. For small $\eta$, we establish a Gaussian equivalence property for the trained feature map, and prove that the learned kernel improves upon the initial random feature model, but cannot defeat the best linear model on the input. For sufficiently large $\eta$, we prove that for certain $f^*$, the same ridge estimator on trained features can go beyond this "linear regime" and outperform a wide range of (fixed) kernels. Our results demonstrate that even one gradient step can lead to a considerable advantage over random features, and highlight the role of learning rate scaling in the initial phase of training.
Young Data Science Researcher Seminar Zurich
High-dimensional Asymptotics of Feature Learning: How One Gradient Step Improves the Representation
HG G 19.1
Zoom Call
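A rough numerical sketch of the experiment described above: a two-layer ReLU network at random initialization, one full-batch gradient step on the first-layer weights $W$ under the MSE loss, then ridge regression on the resulting first-layer features. The teacher, scalings, and learning rate below are ad hoc choices, so whether the one-step features beat the random-feature baseline depends on these choices, exactly as the scaling discussion in the talk suggests.

```python
import numpy as np

rng = np.random.default_rng(10)
n, d, N, eta, lam = 4_000, 30, 400, 10.0, 1e-3

def teacher(X):                                    # assumed toy target f*
    return X[:, 0] + X[:, 0] * X[:, 1]             # linear + quadratic part

X, Xt = rng.standard_normal((n, d)), rng.standard_normal((n, d))
y, yt = teacher(X), teacher(Xt)

W = rng.standard_normal((N, d)) / np.sqrt(d)       # first-layer initialization
a = rng.choice([-1.0, 1.0], size=N)                # fixed second-layer weights

def features(X, W):
    return np.maximum(X @ W.T, 0.0)                # ReLU first-layer features

def ridge_test_mse(W):
    F, Ft = features(X, W), features(Xt, W)
    beta = np.linalg.solve(F.T @ F + lam * np.eye(N), F.T @ y)
    return float(np.mean((Ft @ beta - yt) ** 2))

# One full-batch gradient step on W for the MSE loss (second layer kept fixed).
P = X @ W.T
resid = features(X, W) @ a / np.sqrt(N) - y
grad_W = (2.0 / (n * np.sqrt(N))) * ((resid[:, None] * (P > 0) * a[None, :]).T @ X)
W1 = W - eta * grad_W

print("ridge test MSE on random features  :", ridge_test_mse(W))
print("ridge test MSE after one step on W :", ridge_test_mse(W1))
```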
Wed 22.06.2022
15:15-16:15
Rainer von Sachs
UC Louvain
Abstract
In this talk we treat statistical inference for an intrinsic wavelet estimator of curves of symmetric positive definite (SPD) matrices in a log-Euclidean manifold. Examples of these arise in diffusion tensor imaging and related medical imaging problems, as well as in computer vision and neuroscience. Our proposed wavelet (kernel) estimator preserves positive-definiteness and enjoys permutation-equivariance, which is particularly relevant for covariance matrices. Our second-generation wavelet estimator is based on average-interpolation and enjoys the same powerful properties, including fast algorithms, known from nonparametric curve estimation with wavelets in standard Euclidean set-ups. The core of our work is the construction of confidence sets for our high-level wavelet estimator in a non-Euclidean geometry. We derive asymptotic normality of this estimator, including explicit expressions for its asymptotic variance. This opens the door to constructing asymptotic confidence regions, which we compare with our proposed bootstrap scheme for inference. Detailed numerical simulations confirm the appropriateness of our suggested inference schemes. This is joint work with Johannes Krebs, Eichstätt, and Daniel Rademacher, Heidelberg.
Research Seminar in Statistics
Statistical inference for intrinsic wavelet estimators of covariance matrices in a log-Euclidean manifold
HG G 19.1
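A small sketch of the log-Euclidean geometry underlying the talk: SPD matrices are mapped by the matrix logarithm to the vector space of symmetric matrices, averaged (or, more generally, smoothed) there, and mapped back with the matrix exponential. The intrinsic wavelet estimator and its confidence sets are not reproduced.

```python
# Log-Euclidean mean of SPD matrices: exp of the average of the matrix logs.
# The result is again SPD, unlike a naive entrywise average in some settings
# where positive-definiteness must be enforced.
import numpy as np
from scipy.linalg import expm, logm

rng = np.random.default_rng(11)

def random_spd(d):
    A = rng.standard_normal((d, d))
    return A @ A.T + d * np.eye(d)              # well-conditioned SPD matrix

def log_euclidean_mean(mats):
    """Log-Euclidean mean: exp of the average of the matrix logarithms."""
    logs = [np.real(logm(S)) for S in mats]     # discard negligible imaginary parts
    return expm(np.mean(logs, axis=0))

mats = [random_spd(3) for _ in range(10)]
M = log_euclidean_mean(mats)
print("mean is SPD:", bool(np.all(np.linalg.eigvalsh((M + M.T) / 2) > 0)))
```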
Tue 19.07.2022
11:15-12:15
Andreas Buja
Flatiron Institute
Abstract
Autism, now called "Autism Spectrum Disorder" (ASD), is a neuro-developmental condition that is diagnosed in early childhood. It is heavily gender-biased, as it affects by today's criteria about 1% of boys and 1/4% of girls. It also has a strong genetic basis, as evidenced by studies of identical twins. Unfortunately, what we have learned to date is discouraging: the number of genes causally related to ASD is in the hundreds, of which about 150 have been identified, each accounting for only a tiny fraction of ASD variability. While the search for causally linked genes is ongoing, we also have to ask more global questions: How can we think about the relative protection from ASD enjoyed by females? How can the gender bias be reconciled with known inheritance mechanisms? To answer such questions, Wigler et al. (2007) proposed a "Unified Theory" according to which females are the stores of damaging genetic variants from which they have relative protection, but which cause ASD in their sons who lack this protection. To capture Wigler et al.'s theory and combine it with today's knowledge of the "polygenic" nature of ASD, we developed a scatter-shot model of "damaging alleles" which have "lower penetrance" in females than in males. In this model we are able to match the known "prevalences" of 1% in boys and 1/4% in girls, as well as other known global features such as the existence of high-risk families. Most importantly, we are able to prove mathematically a prediction of Wigler et al.'s theory: genetic sharing among autistic male siblings is greater with the mother than with the father. Surprisingly, the latest empirical evidence from Wigler's lab seems to indicate that genetic sharing among autistic male siblings is greater with the father than with the mother. If this evidence can be firmed up, it refutes the Unified Theory and requires new ideas. One such idea involves the existence of "protecting alleles", which we are currently incorporating into our model.
Research Seminar in Statistics
Genetic Modeling of Autism
HG G 19.1