Seminar overview
Autumn Semester 2017
Date & Time | Speaker | Title | Location |
---|---|---|---|
Thu 05.10.2017 16:15-17:00 |
Christoph Stadtfeld, Department of Humanities, Social and Political Sciences |
Abstract
Important questions in the social sciences are concerned with the circumstances
under which individuals, organizations, or states mutually agree to
form social network ties. Examples of these coordination ties are found in
such diverse domains as scientific collaboration, international treaties, and
romantic relationships and marriage. This article introduces dynamic network
actor models (DyNAM) for the statistical analysis of coordination networks
through time. The strength of the models is that they explicitly address
five aspects of coordination networks that empirical researchers will
typically want to take into account: (1) that observations are dependent, (2)
that ties reflect the opportunities and preferences of both actors involved, (3)
that the creation of coordination ties is a two-sided process, (4) that data
might be available in a time-stamped format, and (5) that processes typically
differ between tie creation and dissolution (signed processes), shorter and longer time windows (windowed processes), and initial and repeated creation
of ties (weighted processes). Two empirical case studies demonstrate
the potential impact of DyNAM models: The first is concerned with the formation
of romantic relationships in a high school over 18 months, and the
second investigates the formation of international fisheries treaties from
1947 to 2010.
Keywords
social networks, coordination ties, time-stamped data, stochastic actor-oriented
models, goldfish, longitudinal network models, relational event models,
international cooperation, romantic ties, DyNAM
ZüKoSt Zürcher Kolloquium über Statistik: Dynamic Network Actor Models: Investigating Coordination Ties Through Time |
HG G 19.1 |
Thu 19.10.2017 16:15-17:00 |
Juan Nieto, Department of Mechanical and Process Engineering, ETHZ |
Abstract
Robot localization is a key capability needed to enable truly autonomous mobile robots. In this talk we will describe the SLAM (Simultaneous Localization and Mapping) problem, which consists of the concurrent construction of a model of the environment (the map) and the estimation of the state of the robot (localization) moving within it. We will first present the necessary components of a prototypical SLAM system, from the sensor data through data association and loop closure to the state estimator. We will then discuss some of the classical approaches to SLAM and show what is now the de facto standard formulation. During the talk we will cover a broad set of topics, including robustness and scalability in long-term mapping as well as metric and semantic representations for mapping. Furthermore, we will delineate open challenges and new research issues that deserve careful scientific investigation.
ZüKoSt Zürcher Kolloquium über Statistik: A review on SLAM: Past, Present and Open Problems |
HG G 19.2 |
Fri 20.10.2017 15:15-16:00 |
Giuseppe Cavaliere, University of Bologna |
Abstract
Asymptotic bootstrap validity is usually understood as consistency of the distribution of a bootstrap statistic, conditional on the data, for the unconditional limit distribution of a statistic of interest. From this perspective, randomness of the limit bootstrap measure is regarded as a failure of the bootstrap. Nevertheless, apart from an unconditional limit distribution, a statistic of interest may possess a host of (random) conditional limit distributions. This allows the understanding of bootstrap validity to be widened, while maintaining the requirement of asymptotic control over the frequency of correct inferences. First, we provide conditions for the bootstrap to be asymptotically valid as a tool for conditional inference, in cases where a bootstrap distribution estimates consistently, in a sense weaker than the standard weak convergence in probability, a conditional limit distribution of a statistic. Second, we prove asymptotic bootstrap validity in a more basic, on-average sense, in cases where the unconditional limit distribution of a statistic can be obtained by averaging a (random) limiting bootstrap distribution. As an application, we establish rigorously the validity of fixed-regressor bootstrap tests of parameter constancy in linear regression models.
Research Seminar in Statistics: Bootstrap Inference under Random Distributional Limits |
HG G 19.2 |
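As a point of reference for the classical, unconditional notion of bootstrap validity that the talk generalizes, a minimal percentile bootstrap for the mean might look as follows. This is a generic textbook sketch, not the fixed-regressor bootstrap of the application, and `bootstrap_ci` is a hypothetical helper name.

```python
import random
import statistics

def bootstrap_ci(data, stat=statistics.mean, reps=2000, alpha=0.05, seed=1):
    """Percentile bootstrap confidence interval for a statistic.

    Resamples the data with replacement, recomputes the statistic on
    each resample, and reads the CI off the empirical quantiles of the
    bootstrap distribution.
    """
    rng = random.Random(seed)
    n = len(data)
    boot = sorted(stat([rng.choice(data) for _ in range(n)])
                  for _ in range(reps))
    lo = boot[int((alpha / 2) * reps)]
    hi = boot[int((1 - alpha / 2) * reps) - 1]
    return lo, hi
```

Conditional on the data, the bootstrap distribution built here is itself a (random) estimate of a sampling distribution; the talk asks when such random distributional limits can still support valid inference.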
Thu 26.10.2017 16:15-17:00 |
Anthony Unwin, Universität Augsburg, Institut für Interdisziplinäre Informatik |
Abstract
Outliers may be important, in error, or irrelevant, but they are tricky to identify and deal with. Whether a case is identified as an outlier depends on the other cases in the dataset, on the variables available, and on the criteria used. A case can stand out as unusual on one or two variables, while appearing middling on the others. If a case is identified as an outlier, it is useful to find out why. This talk discusses the O3 plot (Overview Of Outliers) for supporting outlier analyses. O3 plots show which cases are often identified as outliers, which are identified in single dimensions, and which are only identified in higher dimensions. They highlight which variables and combinations of variables may be affected by possible outliers. Applications include a demographic dataset for the Bundestag constituencies in Germany and a university ranking dataset.
ZüKoSt Zürcher Kolloquium über Statistik: Understanding Potential Outliers using the O3 Plot |
HG G 19.2 |
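To make the setting concrete, the first ingredient of such an analysis, flagging cases variable by variable, can be sketched with Tukey's fences. This is a generic illustration, not Unwin's implementation: the actual O3 plot also scans combinations of variables and alternative identification methods, and `o3_summary` is a hypothetical helper name.

```python
def tukey_outliers(values, k=1.5):
    """Indices of cases outside Tukey's fences [Q1 - k*IQR, Q3 + k*IQR].

    Quartiles are taken crudely from the sorted sample; good enough
    for a sketch.
    """
    s = sorted(values)
    n = len(s)
    q1, q3 = s[n // 4], s[(3 * n) // 4]
    iqr = q3 - q1
    lo, hi = q1 - k * iqr, q3 + k * iqr
    return {i for i, v in enumerate(values) if v < lo or v > hi}

def o3_summary(data):
    """data: dict mapping variable name -> list of values (same case order).

    Returns {case index: [variables that flag it]}, i.e. the raw
    information an O3 plot would display row by row.
    """
    flags = {}
    for var, values in data.items():
        for i in tukey_outliers(values):
            flags.setdefault(i, []).append(var)
    return flags
```

This already shows the point made in the abstract: whether a case is flagged depends on which variable, and which flagging rule, you look at.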
Fri 03.11.2017 15:15-16:00 |
Zoltan Szabo, Université Paris-Saclay |
Abstract
Maximum mean discrepancy (MMD) and Hilbert-Schmidt independence
criterion (HSIC) are among the most popular and successful approaches in
applied mathematics to measure the difference and the independence of
random variables, respectively. Thanks to their kernel-based
foundations, MMD and HSIC are applicable to a large variety of domains
such as documents, images, trees, graphs, time series, dynamical
systems, sets or permutations. Despite their tremendous practical
success, quite little is known about when HSIC characterizes
independence and when MMD with a tensor kernel can discriminate
probability distributions, in terms of the contributing kernel components. In this
talk, I am going to provide a complete answer to this question, with
conditions which are often easy to verify in practice.
[Joint work with Bharath K. Sriperumbudur (PSU). Preprint:
https://arxiv.org/abs/1708.08157, ITE toolbox (estimators):
https://bitbucket.org/szzoli/ite-in-python/]
Research Seminar in Statistics: Tensor Product Kernels: Characteristic Property and Universality |
HG G 19.1 |
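For readers new to these tools: the squared MMD between two samples has a simple unbiased estimator. The sketch below uses scalar data and a Gaussian kernel, a standard textbook construction, and is unrelated to the talk's tensor-kernel characterization.

```python
import math

def gaussian_kernel(x, y, sigma=1.0):
    # Gaussian (RBF) kernel on scalars
    return math.exp(-((x - y) ** 2) / (2 * sigma ** 2))

def mmd2_unbiased(xs, ys, kernel=gaussian_kernel):
    """Unbiased estimate of the squared maximum mean discrepancy:

        MMD^2 = E[k(X,X')] + E[k(Y,Y')] - 2 E[k(X,Y)],

    with the diagonal terms removed from the within-sample averages.
    """
    m, n = len(xs), len(ys)
    k_xx = sum(kernel(a, b) for i, a in enumerate(xs)
               for j, b in enumerate(xs) if i != j) / (m * (m - 1))
    k_yy = sum(kernel(a, b) for i, a in enumerate(ys)
               for j, b in enumerate(ys) if i != j) / (n * (n - 1))
    k_xy = sum(kernel(a, b) for a in xs for b in ys) / (m * n)
    return k_xx + k_yy - 2 * k_xy
```

Samples from the same distribution give values near zero (the unbiased estimate can be slightly negative), while well-separated samples give clearly positive values. Whether a kernel makes MMD able to distinguish *all* distributions is exactly the characteristic-kernel question the talk answers for tensor products.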
Thu 09.11.2017 16:15-17:00 |
Chris Field, Dalhousie University |
Abstract
This talk is motivated by issues arising from microbial oceanic data that biological researchers have been collecting to understand variation in response to environmental changes. The data typically consist of counts of OTUs (operational taxonomic units) from ocean samples varying in time, treatment and/or space. At any given observation there will typically be at least 1000 OTUs of potential interest. In particular, I will describe some data arising from an experimental study of microbial organisms in the oceans to assess the effect of enhanced carbon loading. I will indicate briefly our approach to modelling the data to take into account both the experimental and temporal variation. In our view, an essential first step is to carry out dimension reduction via clustering based on the results of Poisson generalized linear models. Then we can carry out tests for any significant experimental effect. In this data, it is likely that only a small subset of the OTUs may show a response to the carbon loading. During the talk, it will be clear that there are still some unresolved statistical issues on which we welcome feedback. I should note that although the data I have is from the ocean, similar issues will arise with microbial data coming from many other environments, such as the human gut.
ZüKoSt Zürcher Kolloquium über Statistik: Modelling Microbial Data with Time, Treatment and/or Space Variation |
HG G 19.1 |
Tue 14.11.2017 15:15-16:00 |
Patrik Guggenberger, Penn State University |
Abstract
We study subvector inference in the linear instrumental variables model assuming homoskedasticity but allowing for weak instruments. The subvector Anderson and Rubin (1949) test that uses chi-square critical values with degrees of freedom reduced by the number of parameters not under test, proposed by Guggenberger et al. (2012), controls size but is generally conservative. We propose a conditional subvector Anderson and Rubin test that uses data-dependent critical values that adapt to the strength of identification of the parameters not under test. This test has correct size and strictly higher power than the subvector Anderson and Rubin test of Guggenberger et al. (2012). We provide tables with conditional critical values so that the new test is quick and easy to use.
Research Seminar in Statistics: A more powerful subvector Anderson-Rubin test in linear instrumental variable regressions |
HG G 19.2 |
Fri 17.11.2017 15:15-16:00 |
Peter Orbanz, Columbia University, New York |
Abstract
Consider a very large graph---say, the link graph of a large
social network. Now invent a randomized algorithm that extracts a
smaller subgraph. If we use the subgraph as sample data and perform
statistical analysis on this sample, what can we learn about the
underlying network? Clearly, that should depend on the algorithm. We
approach the problem by considering what distributional symmetries are
satisfied by the algorithm. There is a specific algorithm for which
the induced symmetry is precisely exchangeability. In this case, the
appropriate statistical models are so-called graphon models, but
things change drastically if seemingly minor modifications are made to
the subsampler. I will discuss two types of results: (1) How symmetry
properties explain what we can learn from a single sample. (2)
Convergence properties of symmetric random variables: Laws of large
numbers, central limit theorems and Berry-Esseen type bounds, which
hold whether or not the symmetry property is derived from subsampling.
Research Seminar in Statistics: Subsampling large graphs and symmetry in networks |
HG G 19.1 |
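A minimal example of the kind of subsampling algorithm in question is uniform vertex sampling followed by taking the induced subgraph. This is a sketch for illustration only; the talk's point is precisely that seemingly similar samplers induce different symmetries and hence call for different model classes.

```python
import random

def induced_subgraph_sample(nodes, edges, k, seed=0):
    """Sample k vertices uniformly without replacement and return the
    subgraph they induce (kept vertices plus all edges between them)."""
    rng = random.Random(seed)
    keep = set(rng.sample(sorted(nodes), k))
    sub_edges = [(u, v) for (u, v) in edges if u in keep and v in keep]
    return keep, sub_edges
```

On a complete graph every sampled vertex set induces a complete subgraph, while on a sparse graph this sampler tends to return very sparse subgraphs; such behavior is one reason the choice of subsampler determines what a sample can tell us about the underlying network.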
Fri 15.12.2017 11:15-12:00 |
Preetam Nandy, University of Pennsylvania |
Abstract
We consider the problem of identifying intermediate variables (or mediators) that regulate the
effect of a treatment on an outcome. While there has been significant research on this topic,
little work has been done when the set of potential mediators is high-dimensional. A further
complication arises when the potential mediators are interrelated. In particular, we assume
that the causal structure of the treatment, potential mediators and outcome is a directed acyclic
graph. In this setting, we propose novel methods for estimating and testing the influence of a
mediator on the outcome for high-dimensional linear structural equation models (linear SEMs).
For the estimation of individual mediation effects, we propose a modification of the IDA algorithm
that was developed for estimating causal effects from observational data. While most of the
approaches for estimating the influence of potential mediators ignore the causal relationship
among the mediators, our IDA-based approach estimates the underlying causal graph from
data. We derive a high-dimensional consistency result for the IDA-based estimators when the
data are generated from a linear SEM with sub-Gaussian errors. Further, we propose the first
asymptotically valid testing framework in such a setting, leading to a principled FDR control
approach for the identification of the set of true mediators. Finally, we empirically demonstrate
the importance of using an estimated causal graph in high-dimensional mediation analysis.
Research Seminar in Statistics: Estimating and testing individual mediation effects in high-dimensional settings |
HG G 19.1 |
Tue 19.12.2017 15:15-16:00 |
Sören Künzel, University of California, Berkeley |
Abstract
There is growing interest in estimating and analyzing heterogeneous treatment effects in experimental and observational studies. We describe a number of meta-algorithms that can take advantage of any machine learning or regression method to estimate the conditional average treatment effect (CATE) function. Meta-algorithms build on base algorithms --- such as OLS, the Nadaraya-Watson estimator, Random Forests (RF), Bayesian Additive Regression Trees (BART) or neural networks --- to estimate the CATE, a function that the base algorithms are not designed to estimate directly. We introduce a new meta-algorithm, the X-learner, that is provably efficient when the number of units in one treatment group is much larger than in the other, and that can exploit structural properties of the CATE function. For example, if the CATE function is parametrically linear and the response functions in treatment and control are Lipschitz continuous, the X-learner can still achieve the parametric rate under regularity conditions. We then introduce versions of the X-learner that use RF and BART as base learners. In our extensive simulation studies, the X-learner performs favorably, although none of the meta-learners is uniformly the best. We also analyze two real data applications and provide a software package that implements our methods.
Research Seminar in Statistics: Meta-learners for Estimating Heterogeneous Treatment Effects using Machine Learning |
HG G 19.2 |
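The X-learner's three stages can be sketched end to end with a deliberately simple base learner: one-covariate least squares stands in for the RF/BART learners of the paper, and the propensity weight is replaced by the overall treated share. Both choices are illustrative assumptions, not the authors' implementation.

```python
def fit_linear(xs, ys):
    """Least-squares fit of y = a + b*x; a toy stand-in base learner."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx if sxx else 0.0
    a = my - b * mx
    return lambda x: a + b * x

def x_learner(x0, y0, x1, y1):
    """Minimal X-learner for a scalar covariate.

    x0, y0: covariates and outcomes of control units;
    x1, y1: covariates and outcomes of treated units.
    Returns an estimate of the CATE function tau(x).
    """
    # Stage 1: fit an outcome model in each arm.
    mu0, mu1 = fit_linear(x0, y0), fit_linear(x1, y1)
    # Stage 2: impute individual treatment effects, regress them on x.
    d1 = [y - mu0(x) for x, y in zip(x1, y1)]  # treated: observed - predicted control
    d0 = [mu1(x) - y for x, y in zip(x0, y0)]  # control: predicted treated - observed
    tau1, tau0 = fit_linear(x1, d1), fit_linear(x0, d0)
    # Stage 3: combine the two CATE estimates with a weight g;
    # here g is the treated share (a crude propensity stand-in).
    g = len(x1) / (len(x0) + len(x1))
    return lambda x: g * tau0(x) + (1 - g) * tau1(x)
```

With a constant treatment effect the imputed effects d0 and d1 are constant, so the second-stage regressions recover it exactly even when one treatment group is tiny; this robustness to group imbalance is what the abstract's efficiency claim formalizes.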