Accessing and Using Data in webR

Author

James Joseph Balamuta

Published

January 14, 2024

Modified

June 16, 2024

Overview

Because webR runs R inside the web browser, data cannot be read from disk the way it is in a desktop R session. This documentation entry walks through the changes needed to access data from a webR-powered document.

Background: Virtual File System

Given the browser-based nature of webR, accessing local files is restricted. To overcome this limitation, webR establishes a virtual file system inside your browser that is separate from your local file system. Consequently, webR is not aware of the local file system or its paths. To use data, we therefore need to download it into the virtual file system through one of: an R package, an HTTPS URL, or a web API.

By default, the home directory and initial working directory of the webR virtual file system are both /home/web_user. This location can be changed with the extension’s document-level option home-dir.
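To make the separation concrete, the following minimal sketch reports where the current session lives. The helper name describe_session() is hypothetical, and the check on R.version$os assumes that webR builds of R report "emscripten" as their operating system. In a desktop R session the paths will differ from /home/web_user.

```r
# Hypothetical helper: summarize where the current R session is running.
describe_session <- function() {
  list(
    # Assumption: webR builds of R report "emscripten" as the operating system.
    in_webr     = identical(R.version$os, "emscripten"),
    working_dir = getwd(),          # current working directory of the session
    home_dir    = path.expand("~")  # home directory as R sees it
  )
}

info <- describe_session()
str(info)
```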

Aside

While webR can mount pre-built file system images through its Mounting Filesystem mechanism, the quarto-webr extension does not support this option at the moment.

Accessing Data through R Data Packages

The quickest approach for accessing data is to store it inside an R data package. This kind of R package consists solely of data in an R-ready format, with the added benefit of help documentation. If the data package is available on CRAN, there’s a good chance a version exists for webR on the main webR package repository (warning: not a mobile-data-friendly link) and, thus, can be installed using install.packages("pkg") or added to the document’s packages key.
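As a rough illustration of the data-package route, the sketch below installs a package only when it is missing and then returns one of its data sets. The helper load_package_data() is a hypothetical name, not part of webR or quarto-webr, and the demonstration uses the datasets package that ships with R so that nothing has to be downloaded.

```r
# Hypothetical helper: install a data package if needed, then return one of
# its data sets as an R object.
load_package_data <- function(dataset, pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)
  }
  env <- new.env()
  utils::data(list = dataset, package = pkg, envir = env)
  if (!exists(dataset, envir = env, inherits = FALSE)) {
    stop("Data set '", dataset, "' not found in package '", pkg, "'.",
         call. = FALSE)
  }
  get(dataset, envir = env)
}

# The datasets package ships with R, so this call triggers no download.
cars_data <- load_package_data("mtcars", "datasets")
head(cars_data)
```

For a CRAN data package such as nycflights13, the equivalent call would be load_package_data("flights", "nycflights13"), assuming a webR build of that package is available in the repository.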

If the R package is not available on CRAN, then it will need to be compiled for webR, deployed, and accessed through GitHub Pages or r-universe.dev by following the advice on creating a custom webR/R WASM package repository.

Retrieving Data from the Web

Important

Before proceeding, take note of the following considerations when working with remote data:

  1. Security Protocol: webR requires that data be retrieved over HyperText Transfer Protocol Secure (HTTPS), and the server hosting the data must have Cross-Origin Resource Sharing (CORS) enabled.
  2. Package Compatibility: In the absence of websockets within webR, packages that rely on {curl} methods may require adaptation or alternative solutions.
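The HTTPS requirement can be checked before any download is attempted. The sketch below uses a hypothetical helper, require_https(), that only inspects the URL scheme. It cannot tell whether the server has CORS enabled, which still has to be confirmed on the hosting side.

```r
# Hypothetical helper: stop early if a data URL does not use HTTPS.
require_https <- function(url) {
  if (!grepl("^https://", url, ignore.case = TRUE)) {
    stop("Data must be served over HTTPS: ", url, call. = FALSE)
  }
  invisible(url)
}

# An HTTPS URL passes silently.
require_https("https://coatless.github.io/raw-data/flights.csv")

# A plain HTTP URL is rejected; capture the message instead of halting.
result <- tryCatch(
  require_https("http://example.com/my-data.csv"),
  error = function(e) conditionMessage(e)
)
print(result)
```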

Hosting Data

Standalone Repository

We suggest creating a GitHub repository that uses GitHub Pages to host the data. By default, GitHub Pages serves data files with CORS enabled and can quickly be set up to enforce HTTPS URLs by checking a box.

You can see an example raw data repository here:

https://github.com/coatless/raw-data

The corresponding site deployment of the main branch can be seen here:

https://coatless.github.io/raw-data/

Alongside the document

There may be times when it is not feasible to create a standalone repository to host the data. In cases like this, you may wish to host the data alongside the document through Quarto’s publishing system. To do so, add the resources key to the YAML header at the top of the document or to the project’s _quarto.yml.

For my-document.qmd, this would be:

---
title: "quarto-webr document with data"
format: 
  html:
    resources:
      - my-data.csv         # Include just the CSV
      - my-data-directory/* # Include all files
engine: knitr
filters:
  - webr
---

For _quarto.yml, this would be:

---
project:
  type: website
  resources:
    - my-data.csv         # Include just the CSV
    - my-data-directory/* # Include all files
---

Subsequently, reference the data using the URL of the location where the document is published. For example, if the document is at:

https://example.com/folder/my-document.html

Then, the data should be accessed using:

https://example.com/folder/my-data.csv

Note

You may need to publish the document before using the URL. Also, be mindful of data version mismatches, as the data will be fetched from the HTTPS URL instead of being available locally.
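The URL rule for data hosted alongside a document can be sketched as a small function. The name data_url_beside() is hypothetical, and the sketch assumes the data file is published in the same folder as the rendered document.

```r
# Hypothetical helper: build the URL of a data file published next to a document.
data_url_beside <- function(document_url, data_file) {
  # Drop everything after the final "/" to get the document's folder URL.
  folder_url <- sub("[^/]*$", "", document_url)
  paste0(folder_url, data_file)
}

data_url_beside("https://example.com/folder/my-document.html", "my-data.csv")
# [1] "https://example.com/folder/my-data.csv"
```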

Obtain Data

We can retrieve data from an HTTPS URL using the download.file() function and then read it into R using a relative path. The latter can be done using either Base R or Tidyverse functions.

For example, if we wanted to work with flights.csv from the nycflights13 R package, we would specify:

url <- "https://coatless.github.io/raw-data/flights.csv"
download.file(url, "flights.csv")

This action saves the file into webR’s virtual file system to be read into R’s analysis environment. Replace "https://coatless.github.io/raw-data/flights.csv" and "flights.csv" with the actual URL of your desired data source and desired local file name.
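Downloads can fail when the URL is wrong or the server is unreachable, so it may help to wrap the call. The following is a minimal sketch, not part of webR or quarto-webr. The helper name fetch_data() is hypothetical, and it relies only on download.file() returning 0 on success.

```r
# Hypothetical helper: download a file and fail with a clear message
# if the transfer does not succeed.
fetch_data <- function(url, destfile = basename(url)) {
  status <- tryCatch(
    download.file(url, destfile, quiet = TRUE),
    error = function(e) {
      stop("Could not download ", url, ": ", conditionMessage(e),
           call. = FALSE)
    }
  )
  # download.file() returns 0 on success; also confirm the file exists.
  if (!identical(as.integer(status), 0L) || !file.exists(destfile)) {
    stop("Download of ", url, " did not complete.", call. = FALSE)
  }
  invisible(destfile)
}
```

With the example data used here, the call would be fetch_data("https://coatless.github.io/raw-data/flights.csv"), which saves flights.csv into the current working directory.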

Base R

To avoid extra package downloads, use base R’s read.*() functions, which require no additional package dependencies.

data <- read.csv("flights.csv")

Tidyverse

Alternatively, you can use tidyverse-based functions like readr::read_*().

Note

Using tidyverse or readr functions requires additional package downloads at the start of the session or immediately preceding the function call.

install.packages("readr")
data <- readr::read_csv("flights.csv")
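One way to balance the two approaches is to use readr only when it is already installed and fall back to base R otherwise, which avoids triggering a package download just to read a file. This is a sketch under that assumption. The helper read_csv_any() is a hypothetical name, and the demonstration file is a small temporary CSV rather than real data.

```r
# Hypothetical helper: prefer readr when it is installed, otherwise use base R.
read_csv_any <- function(path) {
  if (requireNamespace("readr", quietly = TRUE)) {
    readr::read_csv(path, show_col_types = FALSE)
  } else {
    read.csv(path)
  }
}

# Demonstration with a small temporary CSV built from a built-in data set.
demo_file <- tempfile(fileext = ".csv")
write.csv(head(mtcars), demo_file, row.names = FALSE)

demo_data <- read_csv_any(demo_file)
nrow(demo_data)
```

Note that the two readers return slightly different objects: readr produces a tibble, while read.csv() produces a plain data frame.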

Try it!

We’ve set up the above example inside an interactive cell for you to explore below.

```{webr-r}
#| autorun: true
# See where we are in the file system:
cat("We're currently at:\n")
getwd()

# View a list of files for the working directory.
cat("We have the following files present:\n")
list.files()

# Specify the data URL using HTTPS
url <- "https://coatless.github.io/raw-data/flights.csv"

# Download the data file from the HTTPS URL and save it as
# flights.csv
cat("Download the data ...\n")
download.file(url, "flights.csv")

# Check for the data.
cat("After downloading the data, we now have:\n")
list.files()

# Read the flights data into R
flights_from_csv <- read.csv("flights.csv")

# See the first few rows of the flights_from_csv data frame.
cat("Let's view the first 6 observations of data:\n")
head(flights_from_csv)
```