vroom is a delimited file reader for R that achieves speeds up to 1.23 GB/sec by indexing where each value is located in the file rather than reading all of the data immediately. It uses R’s ALTREP framework to load data lazily, only when it is accessed, which avoids the cost of parsing columns or rows that are never used.
The package supports nearly all readr parsing features while adding multi-file reading, multi-byte Unicode delimiters, and column selection. It uses multiple threads for indexing and parsing to further improve performance, delivering speeds 50x faster than base R and 10x faster than data.table on large datasets. vroom handles all standard delimited file complexities including quoted fields, custom delimiters, type guessing, and embedded newlines.
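A minimal sketch of the multi-file reading and column selection described above. The file names and the use of the built-in mtcars data are illustrative assumptions, not part of the original text, and `show_col_types` assumes a reasonably recent vroom release.

```r
library(vroom)

# Write two small CSV files to a temporary directory (illustrative data only).
tmp <- tempfile("vroom-demo-")
dir.create(tmp)
files <- file.path(tmp, c("part1.csv", "part2.csv"))
vroom_write(head(mtcars, 5), files[1], delim = ",")
vroom_write(tail(mtcars, 5), files[2], delim = ",")

# Read both files in a single call and keep only two columns.
# Passing a character vector of paths combines the files row-wise;
# the files are expected to share the same column layout.
cars <- vroom(
  files,
  delim = ",",
  col_select = c(mpg, cyl),
  show_col_types = FALSE
)

print(dim(cars))
print(names(cars))
```

Selecting columns at read time is where the lazy design should matter most: columns that are never requested do not need to be parsed.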
Resources featuring vroom
Open source development practices | Isabel Zimmerman & Davis Vaughan | Data Science Hangout
ADD THE DATA SCIENCE HANGOUT TO YOUR CALENDAR HERE: https://pos.it/dsh - All are welcome! We’d love to see you!
We were recently joined by Isabel Zimmerman and Davis Vaughan, Software Engineers at Posit, to chat about the life of an open source developer, strategies for navigating complex codebases, and how to leverage AI in data science workflows. Plus, NERDY BOOKS!
In this Hangout, we explore the differences between maintaining established ecosystems like the Tidyverse and building new tools like the Positron IDE. Davis and Isabel (and sometimes Libby) share practical advice for developers, such as the utility of AI for writing tests and “rubber ducking”, and their various approaches to writing accessible documentation that bridges the expert-novice gap.
Resources mentioned in the video and Zoom chat:
Positron IDE → https://posit.co/positron/
Air (R formatter) → https://posit-dev.github.io/air/
Python Packages Book (free) → https://py-pkgs.org/
R Packages Book (free) → https://r-pkgs.org/
DeepWiki (AI tool mentioned for docs) → https://deepwiki.com/tidyverse/vroom
If you didn’t join live, one great discussion you missed from the zoom chat was about Brandon Sanderson’s Cosmere books and the debate between starting with Mistborn vs. The Stormlight Archive. Are you a Cosmere fan?! Which book did you start with? (Libby started with Elantris years before picking up Mistborn Era 1 book 1, but she’d now recommend maybe starting with Warbreaker!)
Timestamps:
00:00 Introduction
04:41 “What does a day in the life of an open source dev look like?”
09:43 “What got you into building your own R packages?”
13:00 “Personal tips for working with code bases you’re not familiar with?”
16:35 “How much of what you build is in R/Python vs. lower-level languages?”
19:57 “Does Air work inside code chunks in Positron?”
20:12 “Changing the Python Quarto formatter in Positron without an extension”
22:56 “What do your side projects look like?”
26:40 “How do you approach writing documentation?”
30:55 “What interesting trends in data science are you noticing?”
33:38 “How do you leverage AI in your work?”
37:30 “What are the hexes on Davis’s back wall?”
38:50 “What career advice would you give to someone in a similar position?”
43:45 “How can I be more resilient when things go wrong?”
47:59 “Do you have keyboard preferences?”
49:25 “What is the best way to report bugs in packages?”
50:56 “Open source dev work vs. in-house dev work”
51:50 “Tips for getting started with Positron”

