ragnar

RAG in R

tidyverse/ragnar

ragnar.tidyverse.org

169 stars

22 forks

ragnar is an R package for building Retrieval-Augmented Generation (RAG) workflows that provide LLMs with relevant context from document collections. It handles the complete pipeline: converting documents to markdown, chunking text while preserving semantic structure, generating embeddings, storing data in DuckDB, and retrieving relevant chunks based on similarity search or keyword matching.
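The retrieval step at the end of that pipeline can be illustrated with a small, self-contained sketch. This is not ragnar's API and not R: it is a Python illustration in which `embed`, `cosine`, and `retrieve` are hypothetical names, and a simple word-count vector stands in for a real embedding model. It only shows the idea of ranking stored chunks by similarity to a query.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding: lowercase word counts (assumption: stands in for a real model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, chunks: list[str], top_k: int = 2) -> list[str]:
    """Return the top_k chunks most similar to the query."""
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:top_k]


chunks = [
    "DuckDB stores the chunks and their embeddings.",
    "Chunking splits markdown while keeping headings.",
    "Embeddings map text to numeric vectors.",
]
top = retrieve("where does DuckDB store embeddings", chunks, top_k=1)
print(top)
```

In a real workflow the word counts would be replaced by vectors from an embedding provider, and the ranking would be done by the store's similarity index rather than an in-memory sort.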

The package emphasizes transparency and control at each step rather than black-box automation. It supports multiple document formats through MarkItDown, offers configurable chunking strategies that preserve document structure like headings, integrates with popular embedding providers (OpenAI, Ollama, Bedrock, Databricks, Google Vertex), and uses DuckDB’s vector similarity search and full-text search for efficient retrieval. ragnar can also equip ellmer Chat objects with retrieval tools, letting LLMs automatically pull relevant information from knowledge stores during conversations.
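To make "chunking that preserves headings" concrete, here is a minimal sketch of one way it could work. It is an illustration under stated assumptions, not ragnar's implementation: `chunk_markdown` and `max_chars` are hypothetical names, and the heading detection is deliberately naive (any line starting with `#` counts as a heading, so fenced code containing `#` lines would be mishandled).

```python
def chunk_markdown(markdown: str, max_chars: int = 200) -> list[dict]:
    """Split markdown into chunks, tagging each with the headings above it."""
    chunks: list[dict] = []
    headings: list[str] = []  # current heading path, e.g. ["# Guide", "## Install"]
    buffer: list[str] = []    # body lines collected for the current chunk

    def flush() -> None:
        text = "\n".join(buffer).strip()
        if text:
            chunks.append({"context": list(headings), "text": text})
        buffer.clear()

    for line in markdown.splitlines():
        if line.startswith("#"):
            flush()  # a heading always closes the current chunk
            level = len(line) - len(line.lstrip("#"))
            del headings[level - 1:]  # drop headings at this level or deeper
            headings.append(line.strip())
        else:
            size = sum(len(l) + 1 for l in buffer)
            if buffer and size + len(line) > max_chars:
                flush()  # keep chunks under the size limit
            buffer.append(line)
    flush()
    return chunks


doc = "# Guide\nIntro text.\n## Install\nRun the installer.\n## Usage\nCall the function."
result = chunk_markdown(doc)
for chunk in result:
    print(chunk["context"], "->", chunk["text"])
```

Carrying the heading path alongside each chunk means a retrieved passage still says which section it came from, which is the kind of context a model needs to interpret it.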

Resources featuring ragnar

Should you build or buy AI? | Jay Timmerman | Data Science Hangout

To join future Data Science Hangouts, add them to your calendar here: https://pos.it/dsh. All are welcome, and we’d love to see you!

We were recently joined by Jay Timmerman, Head of Data Science & AI Platforms at Biogen, to chat about practical applications of generative AI in the pharmaceutical industry, navigating the challenges of AI like hallucinations and data privacy, the evolving role of data science teams in the age of AI, and the strategic considerations for evaluating AI vendors and building in-house solutions using platforms like Posit Connect.

In this Hangout, we explore the strategic considerations for evaluating AI vendors and the concept of “AI grift”. Jay emphasized the importance of having in-house expertise to discern genuine value from vendor offerings and suggested that organizations can often build equivalent capabilities themselves by properly framing generative AI problems. This discussion highlighted a pragmatic approach to adopting AI and the need to be discerning when considering vendor solutions versus developing internal tools.

Resources mentioned in the video and Zoom chat:
Abigail Haddad’s Blog on Unstructured Text Data with LLMs → https://presentofcoding.substack.com/
Hard Fork Podcast episode on vibe coding → https://podcasts.apple.com/us/podcast/is-google-search-cooked-were-getting-a-u-s-crypto/id1528594034?i=1000698254448
Posit Blog on Deploying a Streamlit Application with Posit Connect → https://posit.co/blog/deploying-a-streamlit-application-with-posit-connect/
Posit Blog on Deploying a Dash Application to Posit Connect → https://posit.co/blog/deploying-a-dash-application-to-posit-connect/
ragnar package for R users (mentioned in chat for RAG) → https://github.com/t-kalinowski/ragnar

If you didn’t join live, one great discussion you missed from the Zoom chat was about “vibe coding”: some attendees shared humorous takes, and one mentioned a Hard Fork podcast episode that featured examples of such projects. Let us know below if you’d like to hear more about this topic!