Seminar overview


Autumn Semester 2017

Date & Time Speaker Title Location
Thu 05.10.2017
16:15-17:00
Christoph Stadtfeld
Department of Humanities, Social and Political Sciences
Abstract
Important questions in the social sciences are concerned with the circumstances under which individuals, organizations, or states mutually agree to form social network ties. Examples of these coordination ties are found in such diverse domains as scientific collaboration, international treaties, and romantic relationships and marriage. This article introduces dynamic network actor models (DyNAM) for the statistical analysis of coordination networks through time. The strength of the models is that they explicitly address five aspects of coordination networks that empirical researchers will typically want to take into account: (1) that observations are dependent, (2) that ties reflect the opportunities and preferences of both actors involved, (3) that the creation of coordination ties is a two-sided process, (4) that data might be available in a time-stamped format, and (5) that processes typically differ between tie creation and dissolution (signed processes), shorter and longer time windows (windowed processes), and initial and repeated creation of ties (weighted processes). Two empirical case studies demonstrate the potential impact of DyNAM models: The first is concerned with the formation of romantic relationships in a high school over 18 months, and the second investigates the formation of international fisheries treaties from 1947 to 2010. Keywords: social networks, coordination ties, time-stamped data, stochastic actor-oriented models, goldfish, longitudinal network models, relational event models, international cooperation, romantic ties, DyNAM
ZüKoSt Zürcher Kolloquium über Statistik
DYNAMIC NETWORK ACTOR MODELS: INVESTIGATING COORDINATION TIES THROUGH TIME
HG G 19.1
Thu 19.10.2017
16:15-17:00
Juan Nieto
Department of Mechanical and Process Engineering, ETHZ
Abstract
Robot localization is a key capability needed to enable truly autonomous mobile robots. In this talk we will describe the SLAM (Simultaneous Localization and Mapping) problem, which consists of the concurrent construction of a model of the environment (the map) and the estimation of the state of the robot (localization) moving within it. We will first present the necessary components of a prototypical SLAM system, from the sensor data through data association and loop closure to the state estimator. We will then discuss some of the classical approaches to SLAM and show what is now the de facto standard formulation. During the talk we will cover a broad set of topics, including robustness and scalability in long-term mapping as well as metric and semantic representations for mapping. Furthermore, we will delineate open challenges and new research issues that deserve careful scientific investigation.
ZüKoSt Zürcher Kolloquium über Statistik
A review on SLAM: Past, Present and Open Problems
HG G 19.2
Fri 20.10.2017
15:15-16:00
Giuseppe Cavaliere
University of Bologna
Abstract
Asymptotic bootstrap validity is usually understood as consistency of the distribution of a bootstrap statistic, conditional on the data, for the unconditional limit distribution of a statistic of interest. From this perspective, randomness of the limit bootstrap measure is regarded as a failure of the bootstrap. Nevertheless, apart from an unconditional limit distribution, a statistic of interest may possess a host of (random) conditional limit distributions. This allows the understanding of bootstrap validity to be widened, while maintaining the requirement of asymptotic control over the frequency of correct inferences. First, we provide conditions for the bootstrap to be asymptotically valid as a tool for conditional inference, in cases where a bootstrap distribution estimates consistently, in a sense weaker than the standard weak convergence in probability, a conditional limit distribution of a statistic. Second, we prove asymptotic bootstrap validity in a more basic, on-average sense, in cases where the unconditional limit distribution of a statistic can be obtained by averaging a (random) limiting bootstrap distribution. As an application, we establish rigorously the validity of fixed-regressor bootstrap tests of parameter constancy in linear regression models.
Research Seminar in Statistics
Bootstrap Inference under Random Distributional Limits
HG G 19.2
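The fixed-regressor idea mentioned in the last sentence of the abstract can be made concrete with a toy parameter-constancy test. The sketch below is an illustration only, not the speaker's implementation: the Chow-type break statistic, the midpoint break, and the sign-flip (wild) residual draws are all my assumptions. The key feature is that the regressor matrix `X` is held fixed across bootstrap replications; only the residuals are resampled.

```python
import numpy as np

rng = np.random.default_rng(0)

def chow_stat(y, X, split):
    """Chow-type F statistic for a coefficient break at index `split`."""
    def ssr(yy, XX):
        beta, *_ = np.linalg.lstsq(XX, yy, rcond=None)
        resid = yy - XX @ beta
        return resid @ resid
    n, k = X.shape
    ssr_full = ssr(y, X)
    ssr_split = ssr(y[:split], X[:split]) + ssr(y[split:], X[split:])
    return ((ssr_full - ssr_split) / k) / (ssr_split / (n - 2 * k))

def fixed_regressor_bootstrap(y, X, split, B=499):
    """Bootstrap the null distribution of the statistic with X held fixed:
    only the residuals are resampled (wild bootstrap with sign flips)."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    stats = np.empty(B)
    for b in range(B):
        y_star = X @ beta + resid * rng.choice([-1.0, 1.0], size=len(y))
        stats[b] = chow_stat(y_star, X, split)
    obs = chow_stat(y, X, split)
    pval = (1 + np.sum(stats >= obs)) / (B + 1)
    return obs, pval
```

On data simulated with stable coefficients, the bootstrap p-value should be roughly uniform; under a genuine break it concentrates near zero.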
Thu 26.10.2017
16:15-17:00
Anthony Unwin
Universität Augsburg, Institut für Interdisziplinäre Informatik
Abstract
Outliers may be important, in error, or irrelevant, but they are tricky to identify and deal with. Whether a case is identified as an outlier depends on the other cases in the dataset, on the variables available, and on the criteria used. A case can stand out as unusual on one or two variables, while appearing middling on the others. If a case is identified as an outlier, it is useful to find out why. This talk discusses the O3 plot (Overview Of Outliers) for supporting outlier analyses. O3 plots show which cases are often identified as outliers, which are identified in single dimensions, and which are only identified in higher dimensions. They highlight which variables and combinations of variables may be affected by possible outliers. Applications include a demographic dataset for the Bundestag constituencies in Germany and a university ranking dataset.
ZüKoSt Zürcher Kolloquium über Statistik
Understanding Potential Outliers using the O3 Plot
HG G 19.2
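The counting idea behind the O3 plot can be sketched independently of the plot itself (the actual tool is Unwin's R work; everything below is my own minimal stand-in, with the boxplot rule and a Mahalanobis cutoff as assumed outlier criteria). Each case is checked under every single variable and every variable pair, and the per-combination flags form the table an O3 plot would display.

```python
import numpy as np
from itertools import combinations

def flag_1d(x):
    """Boxplot rule: flag points beyond 1.5 * IQR from the quartiles."""
    q1, q3 = np.percentile(x, [25, 75])
    iqr = q3 - q1
    return (x < q1 - 1.5 * iqr) | (x > q3 + 1.5 * iqr)

def flag_2d(xy):
    """Flag points whose squared Mahalanobis distance exceeds the
    chi-squared(2) 99% quantile (available in closed form as -2 ln 0.01)."""
    diff = xy - xy.mean(axis=0)
    m2 = np.einsum('ij,jk,ik->i', diff,
                   np.linalg.inv(np.cov(xy, rowvar=False)), diff)
    return m2 > -2.0 * np.log(0.01)

def outlier_table(data, names):
    """Boolean flags per case, for each variable and each variable pair."""
    table = {}
    for j, name in enumerate(names):
        table[(name,)] = flag_1d(data[:, j])
    for j, k in combinations(range(len(names)), 2):
        table[(names[j], names[k])] = flag_2d(data[:, [j, k]])
    return table
```

Summing the flags per case shows which cases are identified repeatedly and which only appear in higher dimensions, mirroring the reading of an O3 plot.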
Fri 03.11.2017
15:15-16:00
Zoltan Szabo
Université Paris-Saclay
Abstract
Maximum mean discrepancy (MMD) and Hilbert-Schmidt independence criterion (HSIC) are among the most popular and successful approaches in applied mathematics to measure the difference and the independence of random variables, respectively. Thanks to their kernel-based foundations, MMD and HSIC are applicable to a large variety of domains such as documents, images, trees, graphs, time series, dynamical systems, sets or permutations. Despite their tremendous practical success, little is known about when HSIC characterizes independence and when MMD with a tensor kernel can discriminate probability distributions, in terms of the contributing kernel components. In this talk, I am going to provide a complete answer to this question, with conditions which are often easy to verify in practice. [Joint work with Bharath K. Sriperumbudur (PSU). Preprint: https://arxiv.org/abs/1708.08157, ITE toolbox (estimators): https://bitbucket.org/szzoli/ite-in-python/]
Research Seminar in Statistics
Tensor Product Kernels: Characteristic Property and Universality
HG G 19.1
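For readers unfamiliar with the central quantity of the talk: the squared MMD has a simple unbiased plug-in estimator. The sketch below uses a plain Gaussian kernel on vector data; it is generic background, not the tensor-product-kernel setting analysed in the talk, and the bandwidth choice is an assumption.

```python
import numpy as np

def gaussian_gram(X, Y, sigma=1.0):
    """Gram matrix of the Gaussian (RBF) kernel between rows of X and Y."""
    sq = (X**2).sum(1)[:, None] + (Y**2).sum(1)[None, :] - 2.0 * X @ Y.T
    return np.exp(-sq / (2.0 * sigma**2))

def mmd2_unbiased(X, Y, sigma=1.0):
    """Unbiased estimate of the squared MMD between samples X and Y:
    within-sample kernel means (diagonals removed) minus twice the
    cross-sample kernel mean."""
    m, n = len(X), len(Y)
    Kxx = gaussian_gram(X, X, sigma)
    Kyy = gaussian_gram(Y, Y, sigma)
    Kxy = gaussian_gram(X, Y, sigma)
    return ((Kxx.sum() - np.trace(Kxx)) / (m * (m - 1))
            + (Kyy.sum() - np.trace(Kyy)) / (n * (n - 1))
            - 2.0 * Kxy.mean())
```

For samples from the same distribution the estimate fluctuates around zero; for a shifted distribution it is clearly positive, which is exactly the discrimination property whose kernel-level characterization the talk addresses.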
Thu 09.11.2017
16:15-17:00
Chris Field
Dalhousie University
Abstract
This talk is motivated by issues arising from microbial oceanic data that biological researchers have been collecting to understand variation in response to environmental changes. The data typically consist of counts of OTUs (operational taxonomic units) from ocean samples varying in time, treatment, and/or space. At any given observation there will typically be at least 1000 OTUs of potential interest. In particular, I will describe some data arising from an experimental study of microbial organisms in the oceans to assess the effect of enhanced carbon loading. I will indicate briefly our approach to modelling the data to take into account both the experimental and temporal variation. In our view, an essential first step is to carry out dimension reduction via clustering based on the results of Poisson generalized linear models. Then we can carry out the tests for any significant experimental effect. In these data, it is likely that only a small subset of the OTUs may show a response to the carbon loading. During the talk, it will be clear that there are still some unresolved statistical issues on which we welcome feedback. I should note that although the data I have are from the ocean, similar issues will arise with microbial data coming from many other environments such as the human gut.
ZüKoSt Zürcher Kolloquium über Statistik
Modelling Microbial Data with Time, Treatment and/or Space Variation
HG G 19.1
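The dimension-reduction step described in the abstract — per-OTU Poisson GLMs followed by clustering of the fitted coefficients — can be sketched as follows. This is my illustration of the general idea, not the speakers' pipeline: the IRLS fitter, the simple treatment design, and plain k-means on the coefficient vectors are all stand-in choices.

```python
import numpy as np

def poisson_irls(X, y, iters=25):
    """Poisson log-linear model fitted by iteratively reweighted least
    squares, initialised from a log-scale least-squares fit."""
    beta, *_ = np.linalg.lstsq(X, np.log(y + 0.5), rcond=None)
    for _ in range(iters):
        mu = np.exp(X @ beta)
        z = X @ beta + (y - mu) / mu  # IRLS working response
        beta = np.linalg.solve(X.T @ (mu[:, None] * X), X.T @ (mu * z))
    return beta

def kmeans(points, k=2, iters=50, seed=0):
    """Plain k-means, used here to cluster per-OTU coefficient vectors."""
    rng = np.random.default_rng(seed)
    centers = points[rng.choice(len(points), size=k, replace=False)]
    labels = np.zeros(len(points), dtype=int)
    for _ in range(iters):
        labels = ((points[:, None, :] - centers[None, :, :])**2
                  ).sum(-1).argmin(1)
        for c in range(k):
            if np.any(labels == c):
                centers[c] = points[labels == c].mean(axis=0)
    return labels
```

Fitting one GLM per OTU and clustering the treatment coefficients separates OTUs that respond to the treatment from those that do not, after which tests for an experimental effect can be run per cluster rather than per OTU.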
Tue 14.11.2017
15:15-16:00
Patrik Guggenberger
Penn State University
Abstract
We study subvector inference in the linear instrumental variables model assuming homoskedasticity but allowing for weak instruments. The subvector Anderson and Rubin (1949) test that uses chi-squared critical values with degrees of freedom reduced by the number of parameters not under test, proposed by Guggenberger et al. (2012), controls size but is generally conservative. We propose a conditional subvector Anderson and Rubin test that uses data-dependent critical values that adapt to the strength of identification of the parameters not under test. This test has correct size and strictly higher power than the subvector Anderson and Rubin test of Guggenberger et al. (2012). We provide tables with conditional critical values so that the new test is quick and easy to use.
Research Seminar in Statistics
A more powerful subvector Anderson-Rubin test in linear instrumental variable regressions
HG G 19.2
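As background for the subvector variant discussed in the talk, here is the classical full-vector Anderson-Rubin statistic in a just-specified sketch (my own illustration; the talk's conditional subvector test is considerably more involved). Its defining property is that the null distribution does not depend on instrument strength.

```python
import numpy as np

def anderson_rubin_stat(y, Yend, Z, beta0):
    """Homoskedastic Anderson-Rubin statistic for H0: beta = beta0 in
    y = Yend @ beta + u with instruments Z. Under H0 (with normal errors)
    it is F(k, n - k) distributed regardless of instrument strength."""
    n, k = Z.shape
    u0 = y - Yend @ beta0
    g, *_ = np.linalg.lstsq(Z, u0, rcond=None)
    fitted = Z @ g                # projection of u0 onto the span of Z
    resid = u0 - fitted
    return (fitted @ fitted / k) / (resid @ resid / (n - k))
```

At the true parameter the statistic behaves like an F(k, n-k) draw; at a false value the instruments pick up the unexplained systematic part of u0 and the statistic explodes, which is what gives the test its power.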
Fri 17.11.2017
15:15-16:00
Peter Orbanz
Columbia University, New York
Abstract
Consider a very large graph---say, the link graph of a large social network. Now invent a randomized algorithm that extracts a smaller subgraph. If we use the subgraph as sample data and perform statistical analysis on this sample, what can we learn about the underlying network? Clearly, that should depend on the algorithm. We approach the problem by considering what distributional symmetries are satisfied by the algorithm. There is a specific algorithm for which the induced symmetry is precisely exchangeability. In this case, the appropriate statistical models are so-called graphon models, but things change drastically if seemingly minor modifications are made to the subsampler. I will discuss two types of results: (1) How symmetry properties explain what we can learn from a single sample. (2) Convergence properties of symmetric random variables: Laws of large numbers, central limit theorems and Berry-Esseen type bounds, which hold whether or not the symmetry property is derived from subsampling.
Research Seminar in Statistics
Subsampling large graphs and symmetry in networks
HG G 19.1
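One simple instance of the randomized subgraph-extraction algorithms the abstract refers to is uniform vertex sampling with induced edges; in the exchangeable (graphon) regime this sampler leaves edge density unbiased. The sketch below is an illustrative example of such a sampler, not the specific algorithms analysed in the talk.

```python
import numpy as np

def induced_subsample(adj, k, rng):
    """Sample k vertices uniformly without replacement and return the
    adjacency matrix of the induced subgraph."""
    idx = rng.choice(adj.shape[0], size=k, replace=False)
    return adj[np.ix_(idx, idx)]

def edge_density(adj):
    """Fraction of present (ordered) vertex pairs in a simple graph."""
    n = adj.shape[0]
    return adj.sum() / (n * (n - 1))
```

Averaging the edge density of many small induced subsamples recovers the density of the full graph, a minimal example of learning about the underlying network from subsampled data.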
Fri 15.12.2017
11:15-12:00
Preetam Nandy
University of Pennsylvania
Abstract
We consider the problem of identifying intermediate variables (or mediators) that regulate the effect of a treatment on an outcome. While there has been significant research on this topic, little work has been done when the set of potential mediators is high-dimensional. A further complication arises when the potential mediators are interrelated. In particular, we assume that the causal structure of the treatment, potential mediators and outcome is a directed acyclic graph. In this setting, we propose novel methods for estimating and testing the influence of a mediator on the outcome for high-dimensional linear structural equation models (linear SEMs). For the estimation of individual mediation effects, we propose a modification of the IDA algorithm that was developed for estimating causal effects from observational data. While most of the approaches for estimating the influence of potential mediators ignore the causal relationships among the mediators, our IDA-based approach estimates the underlying causal graph from data. We derive a high-dimensional consistency result for the IDA-based estimators when the data are generated from a linear SEM with sub-Gaussian errors. Further, we propose the first asymptotically valid testing framework in such a setting, leading to a principled FDR control approach for the identification of the set of true mediators. Finally, we empirically demonstrate the importance of using an estimated causal graph in high-dimensional mediation analysis.
Research Seminar in Statistics
Estimating and testing individual mediation effects in high-dimensional settings
HG G 19.1
Tue 19.12.2017
15:15-16:00
Sören Künzel
University of California, Berkeley
Abstract
There is growing interest in estimating and analyzing heterogeneous treatment effects in experimental and observational studies. We describe a number of meta-algorithms that can take advantage of any machine learning or regression method to estimate the conditional average treatment effect (CATE) function. Meta-algorithms build on base algorithms --- such as OLS, the Nadaraya-Watson estimator, Random Forests (RF), Bayesian Additive Regression Trees (BART) or neural networks --- to estimate the CATE, a function that the base algorithms are not designed to estimate directly. We introduce a new meta-algorithm, the X-learner, that is provably efficient when the number of units in one treatment group is much larger than in the other, and it can exploit structural properties of the CATE function. For example, if the CATE function is parametrically linear and the response functions in treatment and control are Lipschitz continuous, the X-learner can still achieve the parametric rate under regularity conditions. We then introduce versions of the X-learner that use RF and BART as base learners. In our extensive simulation studies, the X-learner performs favorably, although none of the meta-learners is uniformly the best. We also analyze two real data applications and provide a software package that implements our methods.
Research Seminar in Statistics
Meta-learners for Estimating Heterogeneous Treatment Effects using Machine Learning
HG G 19.2
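The X-learner's two-stage structure is easy to see in code. The sketch below uses OLS as the base learner purely for illustration (the talk's versions use RF and BART), and takes the combining weight g as a constant propensity rather than an estimated function: stage 1 fits an outcome model per group, stage 2 regresses imputed individual effects on covariates, and the two CATE estimates are blended.

```python
import numpy as np

def ols_fit(X, y):
    """Least-squares fit with an intercept column prepended."""
    Xa = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(Xa, y, rcond=None)
    return beta

def ols_predict(beta, X):
    return np.column_stack([np.ones(len(X)), X]) @ beta

def x_learner(X, y, w, g=0.5):
    """X-learner with OLS base learners (a stand-in for the RF/BART
    versions). w is a binary treatment indicator; g is an assumed
    constant propensity weight for combining the two CATE estimates."""
    X0, y0 = X[w == 0], y[w == 0]
    X1, y1 = X[w == 1], y[w == 1]
    mu0, mu1 = ols_fit(X0, y0), ols_fit(X1, y1)    # stage 1: outcome models
    d1 = y1 - ols_predict(mu0, X1)                 # imputed effects, treated
    d0 = ols_predict(mu1, X0) - y0                 # imputed effects, control
    tau1, tau0 = ols_fit(X1, d1), ols_fit(X0, d0)  # stage 2: CATE models
    def cate(Xnew):
        return (g * ols_predict(tau0, Xnew)
                + (1.0 - g) * ols_predict(tau1, Xnew))
    return cate
```

With few treated units, the imputed effects for the treated group lean on the well-estimated control model, which is exactly the unbalanced-design situation where the X-learner's efficiency advantage appears.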