Sessions

Forty years of S

6 months ago
Bell Labs in the 1970s was a hotbed of research in computing, statistics and many other fields. The conditions there encouraged the growth of the S language and influenced its content. The 40th anniversary of S is an appropriate time to relate a personal view of that scene and reflect on why S (and R) […]
Wrapping Your R tools to Analyze National-Scale Cancer Genomics in the Cloud

5 months ago
The Cancer Genomics Cloud (CGC), built by Seven Bridges and funded by the National Cancer Institute, hosts The Cancer Genome Atlas (TCGA), one of the world’s largest cancer genomics data collections. Computational resources and optimized, portable bioinformatics tools are provided to analyze the cancer data at any scale immediately, collaboratively, and reproducibly. Seven […]
Profvis: Profiling tools for faster R code

5 months ago
As programming languages go, R has a bit of a reputation for being slow. This reputation is mostly undeserved, and it hinges on the fact that R’s copy-on-modify semantics make its performance characteristics different from those of many other languages. That said, even the most expert R programmers often write code that could be faster. The […]
OPERA: Online Prediction by ExpeRts Aggregation

6 months ago
We present an R package for prediction of time series based on online robust aggregation of a finite set of forecasts (machine learning method, statistical model, physical model, human expertise, …). More formally, we consider a sequence of observations y(1), …, y(t), to be predicted element by element. At each time instance t, a finite […]
Predicting individual treatment effects

6 months ago
Treatments for complicated diseases often help some patients but not all, and predicting the treatment effect for new patients is important to make sure every patient gets the best possible treatment. We propose model-based random forests as a method to detect similarities between patients with respect to their treatment effect and on this […]
Notebooks with R Markdown

6 months ago
Notebook interfaces for data analysis have compelling advantages including the close association of code and output and the ability to intersperse narrative with computation. Notebooks are also an excellent tool for teaching and a convenient way to share analyses. As an authoring format, R Markdown bears many similarities to traditional notebooks like Jupyter and Beaker, […]
Importing modern data into R

6 months ago
This talk explores modern trends in data storage formats and the tools, packages and best practices to import this data into R. We will start with a quick recap of the existing tools and packages for importing data into R: readr, readxl, haven, jsonlite, xml2, odbc and jdbc. Afterwards, we will discuss modern data formats […]
edeaR: Extracting knowledge from process data

6 months ago
During the last decades, the logging of events in a business context has increased massively. Information concerning activities within a broad range of business processes is recorded in so-called event logs. Connecting the domains of business process management and data mining, process mining aims at extracting process-related knowledge from these event logs, in order to […]
R in machine learning competitions

6 months ago
Kaggle is a community of almost 450,000 data scientists who have built almost 2 million machine learning models to participate in our competitions. Data scientists come to Kaggle to learn, collaborate and develop the state of the art in machine learning. This talk will cover some of the lessons from winning techniques, with a particular emphasis […]
bamdit: An R Package for Bayesian meta-analysis of diagnostic test data

6 months ago
In this work we present the R package bamdit, whose name stands for "Bayesian meta-analysis of diagnostic test data". bamdit was developed with the aim of simplifying the use of meta-analysis models that up to now have demanded great statistical expertise in Bayesian meta-analysis. The package implements a series of innovative statistical techniques including: the […]
ETL for medium data

6 months ago
Packages provide users with software that extends the core functionality of R, as well as data that illustrates the use of that functionality. However, by design the type of data that can be contained in an R package on CRAN is limited. First, packages are designed to be small, so that the amount of data […]
Meta-Analysis of Epidemiological Dose-Response Studies with the dosresmeta R package

6 months ago
Quantitative exposures (e.g. smoking, alcohol consumption) in predicting binary health outcomes (e.g. mortality, incidence of a disease) are frequently categorized and modeled with indicator variables. Results are expressed as relative risks for the levels of exposure using one category as referent. Dose-response meta-analysis is an increasingly popular statistical technique that aims to estimate and characterize […]