Seminar overview


Spring Semester 2015

Date & Time Speaker Title Location
Thu 05.03.2015
16:15-17:00
Maria-Pia Victoria-Feser
Research Center for Statistics, Université de Genève
Abstract
The estimation of complex time series or state-space models via maximum likelihood can often be extremely complicated and burdensome. In addition, existing estimation procedures can become highly biased if the true process is affected by contamination which is unrelated to the process itself. Recently, however, Guerrier et al. (2013) proposed a new methodology which employs the Wavelet Variance (WV), a measure which quantifies the amount of variation present in each of the sub-processes resulting from a wavelet decomposition. This methodology, called the Generalized Method of Wavelet Moments (GMWM), takes advantage of the unique matching that exists between the WV and a stochastic process Pθ: the parameters θ are estimated by minimizing the distance between the observed WV and the WV implied by the model Pθ (a toy numerical sketch of this matching idea follows this entry). Moreover, the GMWM is often the only viable method to estimate the parameters of processes which are composed of an ensemble of underlying processes operating at different scales (hereinafter composite processes). Nonetheless, many of the domains in which the GMWM can be employed often suffer from different sources of data contamination which can severely bias the parameter estimates. It is therefore necessary to employ robust estimation methods which are able to limit the bias under different contamination settings. By using a robust estimator of the WV based on Huber's Proposal 2 or on the approach proposed by Mondal and Percival (2012), it is possible to deliver a robust version of the GMWM (RGMWM) which can robustly estimate simple time series models as well as complex state-space models and composite processes.
References:
S. Guerrier, Y. Stebler, J. Skaloud, and M.-P. Victoria-Feser (2013). Wavelet variance based estimation for composite stochastic processes. Journal of the American Statistical Association, 108(503), 1021-1030.
D. Mondal and D.B. Percival (2012). M-estimation of wavelet variance. Annals of the Institute of Statistical Mathematics, 64(1), 27-53.
ZüKoSt Zürcher Kolloquium über Statistik
Robust Generalised Method of Wavelet Moments
HG G 19.1
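A minimal R sketch of the matching idea behind the GMWM described above, assuming a simple white noise plus random walk composite model. The WV formulas for these two components are the standard ones from the literature, the weighting matrix is taken as the identity, and the robust (RGMWM) estimators of the talk are omitted; this is not the authors' implementation.

```r
# Toy GMWM sketch: match the empirical Haar wavelet variance (WV) to the WV
# implied by a white noise + random walk model. Illustrative only; the scale
# grid, weighting (identity) and optimiser are simplistic choices.

haar_wv <- function(x, J) {
  sapply(1:J, function(j) {
    tau <- 2^j; half <- tau / 2
    w <- sapply(1:(length(x) - tau + 1), function(t) {
      0.5 * (mean(x[t:(t + half - 1)]) - mean(x[(t + half):(t + tau - 1)]))
    })
    mean(w^2)                          # empirical WV at scale tau_j = 2^j
  })
}

model_wv <- function(theta, scales) {
  sigma2 <- exp(theta[1])              # white-noise variance (log-parametrised)
  gamma2 <- exp(theta[2])              # random-walk innovation variance
  sigma2 / scales + gamma2 * (scales^2 + 2) / (12 * scales)
}

gmwm_fit <- function(x) {
  J <- floor(log2(length(x))) - 1
  scales <- 2^(1:J)
  nu_hat <- haar_wv(x, J)
  # minimise the distance between observed and model-implied WV (log scale)
  obj <- function(theta) sum((log(nu_hat) - log(model_wv(theta, scales)))^2)
  exp(optim(c(0, 0), obj)$par)         # returns c(sigma2, gamma2)
}

set.seed(1)
x <- rnorm(2^12, sd = 1) + cumsum(rnorm(2^12, sd = 0.05))
gmwm_fit(x)                            # should roughly recover c(1, 0.0025)
```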
Wed 15.04.2015
16:15-17:00
Friedrich Leisch
Universität Wien
Abstract
Model diagnostics for cluster analysis are still a developing field because of its exploratory nature. Numerous indices have been proposed in the literature to evaluate goodness-of-fit, but no clear winner that works in all situations has been found yet. Derivation of (asymptotic) distributional properties is not possible in most cases. Resampling schemes provide an elegant framework to computationally derive the distribution of interesting quantities describing the quality of a partition. Special emphasis will be given to the stability of a partition, i.e., given a new sample from the same population, how likely is it to obtain a similar clustering? (A small sketch of this bootstrap-stability idea follows this entry.) This framework has been implemented in R with automatic support for parallel processing on multiple cores or compute clusters. An example from market segmentation is used to illustrate the procedures.
ZüKoSt Zürcher Kolloquium über Statistik
Flexible Implementation of Resampling Schemes for Cluster Validation
HG G 19.1
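A brief R sketch of the bootstrap-stability idea mentioned in the abstract above: resample the data, re-cluster, and compare the induced partition of the original observations with the original one. This is a hand-rolled illustration using k-means and the plain Rand index, not the speaker's implementation or the interface of his packages.

```r
# Bootstrap stability of a k-means partition: how similar is the clustering of
# the original data to the clusterings induced by bootstrap resamples?

rand_index <- function(a, b) {
  # proportion of observation pairs on which the two partitions agree
  same_a <- outer(a, a, "==")
  same_b <- outer(b, b, "==")
  mean((same_a == same_b)[upper.tri(same_a)])
}

cluster_stability <- function(x, k, B = 50) {
  x <- as.matrix(x)
  base <- kmeans(x, centers = k, nstart = 10)
  replicate(B, {
    boot <- kmeans(x[sample(nrow(x), replace = TRUE), ], centers = k, nstart = 10)
    # assign every original observation to its nearest bootstrap centroid
    d <- as.matrix(dist(rbind(boot$centers, x)))[-(1:k), 1:k]
    rand_index(base$cluster, max.col(-d))
  })
}

stab <- cluster_stability(iris[, 1:4], k = 3)
summary(stab)   # values near 1 indicate a stable partition
```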
Thu 23.04.2015
16:15-17:00
Diego Kuonen
Statoo Consulting, Bern
Abstract
There is no question that big data have hit the business, government and scientific sectors. The demand for skills in data science is unprecedented in sectors where value, competitiveness and efficiency are driven by data. However, there is plenty of misleading hype around the terms 'big data' and 'data science'. This presentation gives a professional statistician's 'big tent' view on these terms, illustrates the connection between data science and statistics, and highlights some challenges and opportunities from a statistical perspective.
ZüKoSt Zürcher Kolloquium über Statistik
A Statistician's 'Big Tent' View on Big Data and Data Science
HG G 19.1
Fri 15.05.2015
15:15-16:00
Anders Kock
Aarhus University
Abstract
In this paper we consider the conservative Lasso which, we argue, penalizes more correctly than the Lasso, and show how it may be desparsified in the sense of van de Geer et al. (2014) in order to construct asymptotically honest (uniform) confidence bands. In particular, we develop an oracle inequality for the conservative Lasso assuming only the existence of a certain number of moments. This is done by means of the Marcinkiewicz-Zygmund inequality, which in our context provides sharper bounds than Nemirovski's inequality. We allow for heteroskedastic, non-subgaussian error terms and covariates. Next, we desparsify the conservative Lasso estimator and derive the asymptotic distribution of tests involving an increasing number of parameters. As a stepping stone towards this, we also provide a feasible, uniformly consistent estimator of the asymptotic covariance matrix of an increasing number of parameters which is robust against conditional heteroskedasticity; to our knowledge we are the first to do so. Next, we show that our confidence bands are honest over sparse high-dimensional subvectors of the parameter space and that they contract at the optimal rate. All our results are valid in high-dimensional models. Our simulations reveal that the desparsified conservative Lasso estimates the parameters more precisely than the desparsified Lasso, has better size properties, and produces confidence bands with superior coverage rates. (A hedged sketch of the desparsification step follows this entry.)
Research Seminar in Statistics
Asymptotically Honest Confidence Regions for High Dimensional Parameters by the Desparsified Conservative Lasso
HG G 19.1
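A hedged R sketch of the desparsification step described above, in the style of van de Geer et al. (2014): an initial Lasso fit is bias-corrected using an approximate inverse covariance matrix built from nodewise Lasso regressions, giving coordinate-wise confidence intervals. A plain cross-validated Lasso (via glmnet) stands in for the conservative Lasso, and homoskedastic errors are assumed, so this does not reproduce the talk's estimator or its heteroskedasticity-robust variance estimator.

```r
library(glmnet)

desparsified_lasso <- function(X, y, level = 0.95) {
  n <- nrow(X); p <- ncol(X)
  # initial Lasso fit (the talk's conservative Lasso would reweight the penalty)
  init <- cv.glmnet(X, y, intercept = FALSE)
  beta <- as.numeric(as.matrix(coef(init, s = "lambda.min")))[-1]

  # nodewise Lasso regressions: approximate inverse Theta of Sigma_hat = X'X / n
  Theta <- matrix(0, p, p)
  for (j in 1:p) {
    nw <- cv.glmnet(X[, -j], X[, j], intercept = FALSE)
    gamma <- as.numeric(as.matrix(coef(nw, s = "lambda.min")))[-1]
    tau2 <- sum(X[, j] * (X[, j] - X[, -j] %*% gamma)) / n
    Theta[j, -j] <- -gamma / tau2
    Theta[j, j] <- 1 / tau2
  }

  # one-step bias correction and (homoskedastic) asymptotic standard errors
  resid <- as.numeric(y - X %*% beta)
  b  <- beta + as.numeric(Theta %*% crossprod(X, resid)) / n
  se <- sqrt(mean(resid^2) * diag(Theta %*% (crossprod(X) / n) %*% t(Theta)) / n)
  z  <- qnorm(1 - (1 - level) / 2)
  cbind(estimate = b, lower = b - z * se, upper = b + z * se)
}

set.seed(1)
X <- matrix(rnorm(200 * 50), 200, 50)
y <- 2 * X[, 1] - 1.5 * X[, 2] + rnorm(200)
head(desparsified_lasso(X, y), 3)   # intervals for the first three coefficients
```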
Fri 29.05.2015
15:15-16:00
Anastasios Magdalinos
University of Southampton
Abstract
A new econometric methodology of inference is developed in systems of cointegrating and predictive regressions with unknown and potentially multiple persistence degrees along equations. It is well known that conventional approaches to estimating cointegrating regressions fail to produce even asymptotically valid inference procedures when the regressors are nearly integrated, and substantial size distortions can occur in econometric testing. The new framework developed here enables a general approach to inference that resolves this difficulty and is robust to the persistence characteristics of the regressors, making it suitable for general practical application. Estimation of systems of time series with mixed I(0), I(1) and all intermediate near I(1) behavior is achieved by means of constructing mildly integrated "IVX" instruments by filtering the system regressors. A mixed Gaussian limit theory is established for the IVX estimator of the full system and a standard chi-squared limit theory is established for the corresponding IVX based Wald test statistic. This new IVX technique eliminates the endogeneity problems of conventional cointegration methods with near integrated regressors, accommodates the presence of stationary regressors and robustifies inference to uncertainty over the nature of the (potentially multiple) integration orders present in the system. The methods are easily implemented, widely applicable and help to alleviate practical concerns about the use of cointegration methodology.
Research Seminar in Statistics
Robust Econometric Inference in Cointegrated Systems with Multiple Persistence Degrees
HG G 19.1
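A toy R sketch of the IVX construction described in the abstract above, for a single predictive regression y_t = beta * x_{t-1} + u_t: the instrument is obtained by filtering the regressor's increments with a mildly integrated autoregressive coefficient. The tuning constants c_z and delta are conventional illustrative choices rather than values prescribed in the talk, the intercept is ignored, and the full-system estimator and Wald test are not reproduced.

```r
# IVX instrument: z_t = sum_{j <= t} rho_z^(t-j) * dx_j with rho_z = 1 - c_z / n^delta,
# i.e. the regressor's increments filtered into a mildly integrated process.
ivx_estimate <- function(y, x, c_z = 1, delta = 0.95) {
  n <- length(y)
  rho_z <- 1 - c_z / n^delta
  z <- as.numeric(stats::filter(diff(x), rho_z, method = "recursive"))
  t_idx <- 3:n                        # drop the first observations
  zi <- z[t_idx - 2]                  # instrument aligned with x_{t-1}
  # IV estimator: instrument the (possibly near-integrated) lagged regressor with z
  sum(zi * y[t_idx]) / sum(zi * x[t_idx - 1])
}

set.seed(1)
n <- 500
x <- cumsum(rnorm(n))                 # (near) unit-root regressor
y <- 0.5 * c(0, x[-n]) + rnorm(n)     # y_t = 0.5 * x_{t-1} + u_t
ivx_estimate(y, x)                    # close to 0.5
```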
Fri 12.06.2015
15:15-16:00
Ioannis Tsamardinos
Computer Science Department, University of Crete
Abstract
Scientific practice typically involves studying a system over a series of studies and data collection, each time trying to unravel a different aspect. In each study, the scientist may take measurements under different experimental conditions and measure different sets of quantities (variables). The result is a collection of heterogeneous data sets coming from different distributions. Even so, these are generated by the same causal mechanism. The general idea in Integrative Causal Analysis (INCA) is to identify the set of causal models that simultaneously fit (are consistent) with all sources of data and prior knowledge and reason with this set of models. Integrative Causal Analysis allows more discoveries than what is possible by independent analysis of datasets. In this talk, we’ll present advances in this direction that lead to algorithms that can handle more types of heterogeneity, and aim at increasing efficiency or robustness of discoveries. Specifically, we’ll present general INCA algorithms for causal discovery from heterogeneous data and proof-of-concept applications and massive evaluation on real data of the main concepts. We'll briefly mention advances for converting the results of tests to posterior probabilities and allow conflict resolution and identification of the confidence network regions, extensions that can deal with prior causal knowledge, and extensions that handle case-control data.
Research Seminar in Statistics
Advances in Integrative Causal Analysis
HG G 19.1
Thu 25.06.2015
16:15-17:00
Hadley Wickham
Rice University
Abstract
A fluent interface lets you easily express yourself in code. Over time a fluent interface retreats to your subconscious: you don't need to bring it to mind; the code just flows out of your fingers. I strive for this fluency in all the packages I write, and while I don't always succeed, I think I've learned some valuable lessons along the way. In this talk, I'll discuss three guidelines that make it easier to develop fluent interfaces (a short illustrative snippet follows this entry):

* __Pure functions__. A pure function only interacts with the world through its inputs and outputs; it has no side-effects. Pure functions make great building blocks because they're easy to reason about and can be easily composed.
* __Predictable interfaces__. It's easier to learn a function if it's consistent, because you can learn the behaviour of a whole group of functions at once. I'll highlight the benefits of predictability with some of my favourite R "WAT"s (including `c()`, `sapply()` and `sample()`).
* __Pipes__. Pure, predictable functions are nice in isolation but are most powerful in combination. The pipe, `%>%`, is particularly important when combining many functions because it turns function composition on its head so you can read it from left to right. I'll show you how this has helped me build dplyr, rvest, ggvis, lowliner, stringr and more.

This talk will help you make the best use of my recent packages, and teach you how to apply the same principles to make your own code easier to use.
ZüKoSt Zürcher Kolloquium über Statistik
Pure, predictable, pipeable: creating fluent interfaces with R.
HG E 1.1
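A short illustrative snippet (mine, not taken from the talk) showing the three guidelines with dplyr: a pure function, the same pipeline written nested and with `%>%`, and one of the `sample()` surprises alluded to above.

```r
library(dplyr)

# Pure function: the result depends only on the input, with no side effects.
standardise <- function(x) (x - mean(x)) / sd(x)

# Without the pipe, the composition reads inside-out ...
head(arrange(summarise(group_by(mtcars, cyl), mpg = mean(mpg)), desc(mpg)), 3)

# ... with %>% the same pipeline reads left to right.
mtcars %>%
  group_by(cyl) %>%
  summarise(mpg = mean(mpg)) %>%
  arrange(desc(mpg)) %>%
  head(3)

# A predictability "WAT": sample() changes behaviour with the input's length.
sample(5:6)   # permutes the two values 5 and 6
sample(5)     # a single number is treated as 1:5, so this permutes 1:5
```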