Friday, May 5 • 8:30am - 10:00am
Reproducible Data Pipelines in R: what are they, how to use them, and a hands-on example using dataRetrieval and targets

CDI Workshop Links Page (sharepoint.com)

Modern scientific workflows face common challenges, including growing volumes and complexity of data and the need to update analyses as new data become available or project needs change. Automated data analysis pipelines can help overcome these challenges and more efficiently translate open data into actionable scientific insights. These pipelines are transparent, reproducible, and robust to changes in the data or analysis, and therefore promote efficient, open science. In this workshop, participants will learn what makes a data pipeline reproducible, what differentiates a pipeline from a workflow, and the key organizational concepts for effective pipeline development. Participants will gain hands-on experience with the basics of building pipelines and with the R-based pipeline tool “targets”. They will also receive a brief introduction to a pipeline that queries data from the Water Quality Portal (WQP). By sharing our approach and template in this session, we hope to demonstrate the value of reproducible pipelines and to provide participants with a reusable template pipeline that can be customized to meet the needs of individual projects.

The attached "presentation" (Module_00_Getting_Started.zip) contains an R project and an R script that installs all of the packages needed to follow along with the guided walkthroughs.

Participants do not need to have these packages installed to understand the topics of the session, but they will need to run the installation script if they want to follow along on their own laptops.
Agenda:
  • An introduction to reproducible research pipelines
  • Skills needed to get started with pipelines in R
  • An introduction to targets
  • An overview of the WQP pipeline template



Speakers

Lindsay Platt

Water Data Scientist, U.S. Geological Survey
I ❤ R, data visualization, data pipelining, reproducible science

Julie Padilla

Data Scientist, U.S. Geological Survey
Let's talk about using reproducible data pipelines and workflows to help your project save money and preserve scientific integrity. A little about my background: Data Scientist at U.S. Geological Survey, Data Science Branch, Integrated Information and Dissemination Division (IIDD). Reproducible...

Lauren Koenig

Data Scientist, U.S. Geological Survey
data pipelining, reproducible workflows, surface water quality, biogeochemistry



Friday May 5, 2023 8:30am - 10:00am EDT
Instructional East (Turner) 201
  Breakout Session
  • Target Audience: Individuals looking for new methods to query data reproducibly, and R users who are familiar with scripting but new to pipelines.
  • Session level: Some prior knowledge would benefit participants, and a quick review of the supporting materials before the session should give novices sufficient orientation to feel comfortable in the session.
  • Ways to Prepare: Participants should have R and RStudio installed, be familiar with running scripts, and have installed the packages provided in the installation script, including ggplot2, tidyverse, sbtools, dataRetrieval, and targets. Familiarity with the Water Quality Portal is a bonus.