ZüKoSt: Seminar on Applied Statistics
Spring Semester 2017
Date / Time: 22 February 2017, 16:15–17:00
Speaker: Stef van Buuren (Utrecht University) and Gerko Vink (Utrecht University)
Title: A quick tour with the mice package for imputing missing data
Location: HG G 19.1

Abstract: Nearly all data-analytic procedures in R are designed for complete data and fail if the data contain NAs. Most procedures simply ignore incomplete rows in the data, or use ad hoc fixes such as replacing each NA with a "best value". However, such fixes can introduce serious biases into the ensuing statistical analysis. Multiple imputation is a principled solution to this problem and is implemented in the R package mice. In this talk we will give a compact overview of the capabilities of mice for R experts, followed by a discussion.
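The impute–analyze–pool workflow mentioned in the abstract can be sketched with the package's standard interface (a minimal sketch using the nhanes demo data bundled with mice; the model formula is an illustrative choice, not from the talk):

```r
library(mice)

# Create m = 5 completed copies of the incomplete nhanes data
imp <- mice(nhanes, m = 5, seed = 123, printFlag = FALSE)

# Fit the same linear model on each completed data set ...
fit <- with(imp, lm(chl ~ bmi + age))

# ... and pool the estimates across imputations (Rubin's rules)
summary(pool(fit))
```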
Date / Time: 2 March 2017, 16:15–17:00
Speaker: Ben Marwick (University of Washington, Seattle)
Title: Reproducible Research Compendia via R packages
Location: HG G 19.1

Abstract: Long considered an axiom of science, the reproducibility of scientific research has recently come under scrutiny after some highly publicized failures to reproduce results. This has often been linked to the failure of the current model of journal publishing to provide enough details for reviewers to adequately assess the correctness of papers submitted for publication. One early proposal for ameliorating this situation is to bundle the different files that make up a research result into a publicly available 'compendium'. At the time it was originally proposed, creating a compendium was a complex process. In this talk I show how modern software tools and services have substantially lightened the burden of making compendia. I describe current approaches to making these compendia to accompany journal articles. Several recent projects of varying sizes are briefly presented to show how my colleagues and I are using R and related tools (e.g. version control, continuous integration, containers, repositories) to make compendia for our publications. I explain how these approaches, which we believe to be widely applicable to many types of research work, subvert the constraints of the typical journal article, and improve the efficiency and reproducibility of our research.
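One concrete toolchain in this spirit is the speaker's rrtools package (not named in the abstract, so an assumption here; function names as documented in that package, to be checked against its README):

```r
library(rrtools)

# Create a research compendium skeleton as an R package
# ("mycompendium" is a hypothetical name)
rrtools::use_compendium("mycompendium")

# Add an analysis/ directory for the paper, figures and data
rrtools::use_analysis()

# Add a Dockerfile and CI configuration so the compendium
# can be rebuilt automatically in a container
rrtools::use_dockerfile()
rrtools::use_travis()
```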
Date / Time: 6 April 2017, 16:15–17:00
Speaker: Sebastian Engelke (EPFL Lausanne)
Title: Models for extremes on graphs
Location: HG G 19.1

Abstract: Max-stable processes are suitable models for extreme events that exhibit spatial dependencies. The dependence measure is usually a function of Euclidean distance between two locations. In this talk, we explore two models for extreme events on an underlying graphical structure. Dependence is more complex in this case, as it can no longer be explained by classical geostatistical tools. The first model concentrates on river discharges on a network in the upper Danube catchment, where flooding regularly causes huge damage. Inspired by the work of Ver Hoef and Peterson (2010) for non-extreme data, we introduce a max-stable process on the river network that allows flexible modeling of flood events and enables risk assessment even at locations without a gauging station. The second approach studies conditional independence structures for threshold exceedances, which result in a factorization of the likelihoods of extreme events. This allows for the construction of parsimonious dependence models that respect the underlying graph.
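As background for readers unfamiliar with the terminology (standard material, not part of the abstract): a simple max-stable process admits de Haan's spectral representation

$$
Z(s) = \max_{i \ge 1} \xi_i \, W_i(s), \qquad s \in \mathcal{S},
$$

where $\{\xi_i\}_{i \ge 1}$ are the points of a Poisson process on $(0,\infty)$ with intensity $\xi^{-2}\,\mathrm{d}\xi$ and $W_1, W_2, \dots$ are i.i.d. nonnegative processes with $\mathbb{E}[W(s)] = 1$ for all $s$. The dependence between locations is governed by the law of $W$, which is where the graph or river-network structure enters.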
Date / Time: 27 April 2017, 16:15–17:00
Speaker: Marjolein Fokkema (Department of Methodology and Statistics, Leiden University, NL)
Title: Prediction rule ensembles, or a Japanese gardening approach to random forests
Location: HG G 19.1

Abstract: Most statistical prediction methods provide a trade-off between accuracy and interpretability. For example, single classification trees may be easy to interpret, but likely provide lower predictive accuracy than many other methods. Random forests, on the other hand, may provide much better accuracy, but are more difficult to interpret, sometimes even being termed black boxes. Prediction rule ensembles (PREs) aim to strike a balance between accuracy and interpretability. They generally consist of only a small set of prediction rules, which in turn can be depicted as very simple decision trees, which are easy to interpret and apply. Friedman and Popescu (2008) proposed an algorithm for deriving PREs, which derives a large initial ensemble of prediction rules from the nodes of CART trees and selects a sparse final ensemble by regularized regression of the outcome variable on the prediction rules. The R package 'pre' takes a similar approach to deriving PREs and offers several additional advantages. For example, it employs an unbiased tree induction algorithm, allows for a random-forest-type approach to deriving prediction rules, and allows for plotting of the final ensemble. In this talk, I will introduce PRE methodology and the package 'pre', illustrate with examples based on psychological research data, and discuss some future directions.
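The approach described above can be sketched with the 'pre' package's formula interface (a minimal sketch; the airquality example is an illustrative assumption, not from the talk):

```r
library(pre)

# Use complete cases of the built-in airquality data
airq <- airquality[complete.cases(airquality), ]

# Derive a prediction rule ensemble for Ozone:
# rules are generated from tree nodes, then a sparse final
# ensemble is selected by penalized regression
set.seed(42)
airq.ens <- pre(Ozone ~ ., data = airq)

# Inspect the selected rules and their coefficients
print(airq.ens)

# Variable importances in the final ensemble
importance(airq.ens)
```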
Date / Time: 11 May 2017, 16:15–17:00
Speaker: Alexandre Pintore (Winton Capital Management)
Title: T.B.A.
Location: HG G 19.2
Date / Time: 18 May 2017, 16:15–17:00
Speaker: Philip O'Neill (University of Nottingham)
Title: T.B.A.
Location: HG G 19.1

Abstract: T.B.A.