Modern scientific workflows face common challenges, including accommodating the growing volume and complexity of data and updating analyses as new data become available or project needs change. Automated data analysis pipelines can help overcome these challenges and translate open data into actionable scientific insights more efficiently. These data pipelines are transparent, reproducible, and robust to changes in the data or analysis, and therefore promote efficient, open science. In this workshop, participants will learn what makes a data pipeline reproducible, what differentiates a pipeline from a workflow, and the key organizational concepts for effective pipeline development. Participants will gain hands-on experience with the basics of building pipelines and with the R-based pipeline tool “targets”. They will also receive a brief introduction to a pipeline that queries data from the Water Quality Portal (WQP). By sharing our approach and template in this session, we hope to demonstrate the value of reproducible pipelines and to provide participants with a reusable template pipeline that can be customized to meet the needs of individual projects.
The attached "presentation" (Module_00_Getting_Started.zip) contains an R project and an R script that installs all of the packages needed to follow along with the guided walkthroughs.
Participants do not need to have these packages installed to understand the topics of the session, but they will need to run the installation script if they want to follow along on their own laptops.
Agenda:
- An introduction to reproducible research pipelines
- Skills needed to get started with pipelines in R
- An introduction to targets
- An overview of the WQP pipeline template