Software

rvest

Simple web scraping for R

tidyverse/rvest

rvest.tidyverse.org

1513 stars

352 forks

Tidyverse

rvest is an R package for web scraping that extracts data from HTML web pages. It uses a pipe-friendly syntax inspired by libraries like Beautiful Soup to make common scraping tasks straightforward.

The package provides functions to parse HTML, select elements using CSS selectors or XPath, extract text and attributes, and convert HTML tables directly to data frames. It integrates well with tidyverse workflows and supports both single-element and multi-element extraction. For scraping multiple pages, it works alongside the polite package to respect robots.txt and avoid overwhelming servers.
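As a minimal sketch of the parse, select, extract, and tabulate steps described above, the following uses an inline HTML snippet so that it runs without network access. The page content and variable names are made up for illustration; only the rvest functions (`minimal_html()`, `html_elements()`, `html_element()`, `html_text2()`, `html_attr()`, `html_table()`) come from the package.

```r
library(rvest)

# Hypothetical page content, built inline so no network access is needed.
page <- minimal_html('
  <h1>Fruit prices</h1>
  <ul>
    <li><a href="/apple">Apple</a></li>
    <li><a href="/pear">Pear</a></li>
  </ul>
  <table>
    <tr><th>fruit</th><th>price</th></tr>
    <tr><td>Apple</td><td>1.20</td></tr>
    <tr><td>Pear</td><td>0.90</td></tr>
  </table>
')

# Multi-element extraction with a CSS selector: every link inside a list item.
links <- html_elements(page, "li a")
link_text <- html_text2(links)         # visible text of each link
link_href <- html_attr(links, "href")  # value of each href attribute

# Single-element extraction with XPath: the first <h1> on the page.
heading <- html_text2(html_element(page, xpath = "//h1"))

# Convert the HTML table to a data frame (a tibble).
prices <- html_table(html_element(page, "table"))
```

For a real site, `read_html()` with a URL would take the place of `minimal_html()`; the selectors would depend on that page's markup.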

Contributors

Hadley Wickham

Chief Scientific Officer

Jeroen Ooms

Software Engineer

Charlie Gao

Senior Software Engineer

Charlotte Wickham

Senior Developer Advocate

Resources featuring rvest

video

Inspecting websites to find JSON data APIs | Marcos Huerta | Data Science Lab

The Data Science Lab is a live weekly call. Register at pos.it/dslab! Discord invites go out each week on live calls. We’d love to have you!

The Lab is an open, messy space for learning and asking questions. Think of it like pair coding with a friend or two. Learn something new, and share what you know to help others grow.

On this call, Libby Heeren is joined by Marcos Huerta, a Data Science Manager at CarMax, who walks us through the guts of websites looking for data we can play with. He shows how to find hidden REST/JSON APIs using the web inspector in Safari/Firefox, and then how to get what’s necessary to pull the same data programmatically in Python or R.
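Once a JSON endpoint has been found in the browser's Network tab, the R side can be quite small. The sketch below assumes the jsonlite package (mentioned in the call) and uses a made-up inline payload rather than a real response; the field names `game`, `plays`, `inning`, and `result` are hypothetical.

```r
library(jsonlite)

# Hypothetical JSON payload standing in for a response copied from the
# browser's Network tab. The field names are invented for illustration.
payload <- '{
  "game": {
    "id": 1,
    "plays": [
      {"inning": 1, "result": "single"},
      {"inning": 2, "result": "strikeout"}
    ]
  }
}'

# fromJSON() simplifies an array of records into a data frame by default.
parsed <- fromJSON(payload)
plays <- parsed$game$plays
```

With a live endpoint, `fromJSON()` can also be given a URL in place of the string, subject to the site's rate limits and terms of use.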

Hosting crew from Posit: Libby Heeren, Isabella Velasquez, Daniel Chen

Marcos’s urls: Website: https://marcoshuerta.com GitHub: https://github.com/astrowonk/

Resources from the hosts and from participants in the Discord chat:

• Postman: https://www.postman.com/
• Insomnia (open source alternative to Postman): https://insomnia.rest/
• Baseball Savant website Marcos is using: https://baseballsavant.mlb.com/gamefeed/?gamePk=777076
• Isabella Velasquez’s blog on using the {polite} R package to help scrape Wikipedia: https://ivelasq.rbind.io/blog/politely-scraping/
• Festivitas Mac app Marcos used to add the lights to his desktop: https://festivitas.app/
• Ted Laderas blog post on parsing JSON in R: https://laderast.github.io/intro_apis_json_cascadia/#/how-does-r-translate-json
• New rvest read_html_live() function: https://rvest.tidyverse.org/reference/read_html_live.html
• yyjsonr R package: https://github.com/coolbutuseless/yyjsonr
• tuber R package: https://github.com/gojiplus/tuber
• WikipediaR R package: https://www.quantargo.com/help/r/latest/packages/WikipediaR/1.1/WikipediaR-package
• rookiepy Python package: https://pypi.org/project/rookiepy/

► Subscribe to Our Channel Here: https://bit.ly/2TzgcOu

Follow Us Here:
• Website: https://www.posit.co
• The Lab: https://pos.it/dslab
• Hangout: https://pos.it/dsh
• LinkedIn: https://www.linkedin.com/company/posit-software
• Bluesky: https://bsky.app/profile/posit.co

Thanks for learning with us!

Timestamps
00:00 Introduction
03:05 Web scraping vs. API calls
04:12 Server-side rendering vs. client-side JSON
06:12 Warning: Rate limits and business ethics (ahem)
08:39 Demo: Baseball Savant website
08:57 Using browser Developer Tools and the Network tab
12:15 “What is curl?”
13:30 Importing curl into Postman
16:03 Generating Python code from Postman
16:50 “Are there open source alternatives to Postman?”
17:50 Using the generated code in Python/Jupyter
22:28 R packages for JSON (jsonlite, yyjsonr)
25:09 Demo: Massachusetts Lottery website
28:17 Example: scripts Marcos automated with cron jobs
30:17 Handling logins and cookies with rookiepy
32:19 Demo: CNN Election Data
34:26 Inspecting ESPN’s website
36:58 “Can you scrape YouTube?”
38:19 Finding hidden JSON in CardsMania history
45:00 Benefits of API inspection over Beautiful Soup
46:59 New rvest function: read_html_live
50:40 Inspecting LinkedIn and finding GraphQL
53:58 Encouragement on handling API pagination

Jan 26, 2026

54 min

927 views

rvest tidyverse tidyverse.org

video

Ben Matheson | How Anchorage Built Alaska’s Vaccine Finder with R | Posit (2022)

In January 2021, Alaska residents seeking a COVID-19 vaccine appointment faced a convoluted maze of websites. The software was made for providers—not for residents.

The Anchorage Innovation Team built a fast, mobile-friendly vaccine finder website for Alaska using R. What started as a web scraping prototype launched statewide one week later and ultimately connected tens of thousands of Alaskans to a vaccine.

This talk covers how we used R to build Alaska’s vaccine finder, including:

• Scraping and HTTP packages (rvest & httr)
• Using Heroku and S3 to run R jobs 24/7
• Creating a flexible data service with R

Session: R be nimble, R be quick, R help me plan my vaccine stick: Rapidly responding to world events with R

Posts about rvest

blog post

rvest 1.0.0

The latest version of rvest brings new tools for extracting text, a radically improved html_table(), and a bunch of interface changes to better align rvest with the rest of the tidyverse.

Hadley Wickham

Mar 10, 2021

tidyverse rvest Data Wrangling Tidyverse Packages
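One of the text-extraction tools from around this release is `html_text2()`, which tries to return text closer to how a browser would display it. The following is a small sketch of how it differs from `html_text()` on a line break; the HTML snippet is invented for illustration.

```r
library(rvest)

# Hypothetical snippet: a single paragraph containing a <br> line break.
page <- minimal_html("<p>Line one<br>Line two</p>")
para <- html_element(page, "p")

# html_text() returns the raw text nodes, so the <br> leaves no trace.
raw_text <- html_text(para)

# html_text2() treats <br> as a line break, as a browser would render it.
rendered_text <- html_text2(para)
```

Here `raw_text` runs the two lines together, while `rendered_text` separates them with a newline.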

blog post

rvest: easy web scraping with R

Introducing rvest: scrape web data with CSS selectors or XPath, extract text/tables, and navigate sites with sessions

Hadley Wickham

Nov 24, 2014

rvest Data Wrangling Packages Rstudio