Name: Reproducible Data Pipelines in R: what they are, how to use them, and a hands-on example using dataRetrieval and targets
Start: 2023-05-05T08:30:00-0400
End: 2023-05-05T10:00:00-0400

CDI Workshop Links Page (sharepoint.com)

Modern scientific workflows face common challenges, including accommodating growing volumes and complexity of data and the need to update analyses as new data become available or project needs change. The use of automated data analysis pipelines can help overcome these challenges and more efficiently translate open data into actionable scientific insights. These data pipelines are transparent, reproducible, and robust to changes in the data or analysis, and therefore promote efficient, open science. In this workshop, participants will learn what makes a reproducible data pipeline and what differentiates it from a workflow, as well as the key organizational concepts for effective pipeline development. Participants will gain hands-on experience with the basics of building pipelines and the R-based pipeline tool “targets”. They will also receive a brief introduction to a pipeline that queries data from the Water Quality Portal (WQP). By sharing our approach and template in this session, we hope to demonstrate the value of reproducible pipelines and to provide participants with a reusable template pipeline that can be customized to meet the needs of individual projects.

The attached "presentation" (Module_00_Getting_Started.zip) contains an R project and an R script that installs all of the packages needed to follow along with the guided walkthroughs.

Participants do not need to have these packages installed to understand the topics of the session, but they will need to run the installation script if they want to follow along on their own laptops.
Agenda:

An introduction to reproducible research pipelines
Skills needed to get started with pipelines in R
An introduction to targets
An overview of the WQP pipeline template

Speakers

Lindsay Platt

Water Data Scientist, U.S. Geological Survey

I ❤ R, data visualization, data pipelining, reproducible science

Julie Padilla

Data Scientist, U.S. Geological Survey

Let's talk about using reproducible data pipelines and workflows to help your project save money and preserve scientific integrity. A little about my background: Data Scientist at U.S. Geological Survey, Data Science Branch, Integrated Information and Dissemination Division (IIDD). Reproducible...

Lauren Koenig

Data Scientist, U.S. Geological Survey

data pipelining, reproducible workflows, surface water quality, biogeochemistry

Module 00 Getting Started zip

Friday May 5, 2023 8:30am - 10:00am EDT
Instructional East (Turner) 201

Breakout Session

Target Audience: Individuals who are looking for new methods to query data in a reproducible way, and R users who are familiar with scripting but are new to pipelines.
Session level: Some prior knowledge would benefit participants, and a quick review of supporting materials before the session will be sufficient orientation for novices to feel more comfortable in the session.
Ways to Prepare: Participants should have R and RStudio installed and be familiar with running scripts. They should also have installed the packages provided in the installation script, including ggplot2, tidyverse, sbtools, dataRetrieval, and targets. Familiarity with the Water Quality Portal is a bonus.

2023 CDI Workshop

Attendees (40)
