BETA v2.0.0 — WINDOWS

Documentation, ripped offline.

Turn any docs site into a clean, searchable, offline Markdown archive. Ready to grep, version, embed, or hand to an LLM.

DOWNLOAD CLIENT → EXPLORE FEATURES

docurip — session_monitor

8 threads

visitor@docurip:~$ docurip --url

[READY] Enter a target URL and hit RUN_CRAWL to mirror documentation offline.
Try: react.dev/docs, tailwindcss.com/docs, nextjs.org/docs

LOCAL_ARCHIVE: react-docs/
📁 react-docs/
    📁 reference/
        📄 hooks.md (12.4 KB)
        📄 context.md (8.2 KB)
    📁 guide/
        📄 getting-started.md (18.6 KB)
        📄 rendering-modes.md (22.1 KB)
    📄 index.md (4.5 KB)


01 // TOKEN ECONOMICS

Save 31× on LLM tokens.

Strip boilerplate HTML. Cache with Anthropic Prompt Caching. Pay micro-cents instead of dollars.


RAW WEB FETCH

Every query refetches the full HTML, including nav, sidebar, and scripts.

DOCURIP + CACHE (RECOMMENDED)

Clean Markdown is loaded once. Prompt caching cuts the cost of subsequent reads by about 90%.

Comparison basis: 10 queries on the same corpus.

01 / HTML → MARKDOWN

5–10× fewer tokens per page

A typical docs page is 70–90% boilerplate (nav, sidebar, footer, scripts). Docurip strips it all: ~80–150 KB raw HTML becomes ~10–25 KB clean Markdown.

02 / NO REFETCHING

Load once, query forever

Live-fetch tools crawl per question, paying the full HTML price each time. Docurip's merged MD loads once into context — no redundant requests.

03 / PROMPT CACHING

90% cost reduction on repeats

Anthropic's cache reads cost ~10% of normal input tokens. Cache the merged MD once and subsequent queries drop to micro-cents.
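To see how the three effects above combine, here is a rough cost model. It is a sketch under stated assumptions, not Docurip's own calculator: costs are in relative units, each query is assumed to send only the corpus as input (question and output tokens are ignored), and the 1.25 cache-write surcharge is an assumption about pricing that you should check against your provider's current rates. The function name `savings_multiplier` is illustrative.

```python
# Relative cost model. All multipliers are assumptions, not Docurip measurements.
BASE_INPUT = 1.0    # cost of one uncached input token (relative unit)
CACHE_WRITE = 1.25  # assumed surcharge for the first, cache-writing request
CACHE_READ = 0.10   # cache reads at ~10% of normal input cost (figure from the text)


def savings_multiplier(html_to_md_ratio: float, queries: int) -> float:
    """Return raw-fetch cost divided by cached-Markdown cost for `queries` questions."""
    if queries < 1 or html_to_md_ratio <= 0:
        raise ValueError("queries must be >= 1 and the ratio must be positive")
    markdown_tokens = 1.0  # normalise the Markdown corpus to one unit
    html_tokens = html_to_md_ratio * markdown_tokens
    # Raw fetch: the full HTML is paid for again on every query.
    raw_cost = queries * html_tokens * BASE_INPUT
    # Cached Markdown: one cache write, then cheap cache reads.
    cached_cost = markdown_tokens * (CACHE_WRITE + (queries - 1) * CACHE_READ)
    return raw_cost / cached_cost


if __name__ == "__main__":
    for ratio in (5.0, 10.0):
        multiplier = savings_multiplier(ratio, 10)
        print(f"{ratio:.0f}x fewer tokens, 10 queries -> {multiplier:.1f}x cheaper")
```

Under these assumptions, a 5× to 10× token reduction over 10 queries works out to roughly 23× to 46.5× cheaper. The model also assumes every follow-up query arrives while the cache entry is still alive; a cache miss is billed as a fresh write.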

02 // CORE CAPABILITIES

Built with Rust + Tauri.

A documentation harvester optimized for speed, cleanliness, and developer experience.

01 / EXTRACTION ENGINE

Never lose context to the upstream.

Set a start URL, crawl depth, and page limit. Docurip walks the site in parallel, respecting robots.txt directives. For JS-rendered sites, it spins up headless Chrome on demand.

→ robots.txt compliance built-in
→ headless Chrome for CSR apps
→ domain-locked: never wanders off-site
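The crawl rules above (start URL, depth, page limit, robots.txt, domain lock) can be sketched as a breadth-first walk. This is an illustration only, not Docurip's Rust implementation: it runs sequentially rather than in parallel, `fetch_links` is a hypothetical callback that returns the hrefs found on a page, and the user-agent string is made up.

```python
from collections import deque
from typing import Callable, Iterable
from urllib.parse import urldefrag, urljoin, urlsplit
from urllib.robotparser import RobotFileParser


def plan_crawl(
    start_url: str,
    fetch_links: Callable[[str], Iterable[str]],  # hypothetical: hrefs found on a page
    robots: RobotFileParser,
    max_depth: int = 2,
    max_pages: int = 100,
    user_agent: str = "docurip-sketch",  # hypothetical user-agent string
) -> list[str]:
    """Breadth-first walk that stays on the start host and honours robots.txt."""
    host = urlsplit(start_url).netloc.lower()
    queue = deque([(start_url, 0)])
    seen = {start_url}
    visited: list[str] = []

    while queue and len(visited) < max_pages:
        url, depth = queue.popleft()
        if not robots.can_fetch(user_agent, url):
            continue  # disallowed by robots.txt
        visited.append(url)
        if depth == max_depth:
            continue  # do not expand links beyond the depth limit
        for href in fetch_links(url):
            # Resolve relative links and drop #fragments before comparing.
            link, _fragment = urldefrag(urljoin(url, href))
            if urlsplit(link).netloc.lower() != host:
                continue  # domain lock: skip off-site links
            if link not in seen:
                seen.add(link)
                queue.append((link, depth + 1))
    return visited
```

Passing the link extractor in as a callback keeps the traversal rules testable without network access; a real crawler would plug in an HTTP client or a headless browser at that point.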


02 / REAL-TIME MONITORING

Real-time metrics & velocity tracking.

Watch crawls execute live. A speedometer animation tracks pages per minute alongside success and failure rates and total download size. A console drawer streams every fetch as it happens.

→ live velocity speedometer
→ streaming console logger
→ pause, resume, retry, cancel
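For a sense of how metrics like these can be derived, here is a minimal sketch of a rolling counter. The class and field names are illustrative and not taken from Docurip; velocity is assumed to mean fetches recorded in the last 60 seconds.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class CrawlStats:
    """Rolling crawl metrics. Names are illustrative, not Docurip's own."""
    window_seconds: float = 60.0
    ok: int = 0
    failed: int = 0
    total_bytes: int = 0
    _recent: deque = field(default_factory=deque)  # timestamps of recent fetches

    def record(self, timestamp: float, size_bytes: int, success: bool) -> None:
        """Register one finished fetch (timestamp in seconds)."""
        if success:
            self.ok += 1
            self.total_bytes += size_bytes
        else:
            self.failed += 1
        self._recent.append(timestamp)
        # Drop timestamps that have fallen out of the rolling window.
        while self._recent and timestamp - self._recent[0] > self.window_seconds:
            self._recent.popleft()

    @property
    def pages_per_minute(self) -> float:
        """Fetches inside the window as of the last record, scaled to one minute."""
        return len(self._recent) * 60.0 / self.window_seconds

    @property
    def fail_rate(self) -> float:
        total = self.ok + self.failed
        return self.failed / total if total else 0.0
```

A rolling window reacts to slowdowns faster than a lifetime average, which is the behaviour a live speedometer would usually want.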

03 / RESULT BROWSER

Built-in tree view & full-text search.

Explore your mirrored archive instantly, with debounced full-text search and syntax-highlighted previews. Preview HTML is sanitized with DOMPurify and rendered in a sandbox to reduce XSS risk.

→ instant full-text search
→ sandboxed markdown preview
→ hierarchical folder navigation


04 / EXPORT PIPELINE

Stitch, compress, and ship anywhere.

Export as separate Markdown files, a merged handbook, PDFs, or a ZIP. Link rewriting ensures offline navigation works. Asset hashing deduplicates images and fonts.

→ automatic link rewriting
→ asset deduplication & hashing
→ RAG-ready structured output
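Link rewriting and asset deduplication can be pictured with the sketch below. It is an assumption-laden illustration, not the exporter's actual logic: the URL-to-path mapping (`/reference/hooks` becomes `reference/hooks.md`), the `assets/` folder, and all function names are hypothetical, and links carrying a `#fragment` are left untouched for simplicity.

```python
import hashlib
import posixpath
import re
from urllib.parse import urlsplit

# Matches the "(https://...)" target of a Markdown link or image.
LINK_RE = re.compile(r"\]\((https?://[^)\s]+)\)")


def local_path(url: str) -> str:
    """Map a page URL to an archive path (assumed layout)."""
    path = urlsplit(url).path.strip("/")
    return f"{path}.md" if path else "index.md"


def rewrite_links(markdown: str, page_url: str, mirrored: set[str]) -> str:
    """Point links at mirrored pages to relative .md files; leave other links alone."""
    here = posixpath.dirname(local_path(page_url)) or "."

    def replace(match: re.Match) -> str:
        target = match.group(1)
        if target not in mirrored:
            return match.group(0)  # external or unmirrored link stays as-is
        relative = posixpath.relpath(local_path(target), start=here)
        return f"]({relative})"

    return LINK_RE.sub(replace, markdown)


def store_asset(data: bytes, extension: str, store: dict[str, bytes]) -> str:
    """Content-address an asset so identical bytes are kept only once."""
    name = f"assets/{hashlib.sha256(data).hexdigest()[:16]}{extension}"
    store.setdefault(name, data)
    return name
```

Naming assets by a hash of their contents means the same logo or font referenced from many pages lands in the archive under a single file name.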

03 // SAFETY SYSTEMS

Built to behave.

Engineered to protect your network, respect targets, and safeguard local disk space.

Domain Lock

Locks crawling to your target host. Never drifts into external ad networks, vendor blogs, or partner sites.

SSRF Protection

Blocks localhost, RFC 1918 private ranges, and link-local addresses at launch. No accidental internal hits.
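A pre-flight check along these lines might look like the sketch below. It uses Python's standard `ipaddress` module and is only an illustration of the idea; the function names are hypothetical, and a production guard would also need to re-check addresses after redirects and at connection time.

```python
import ipaddress
import socket
from urllib.parse import urlsplit


def is_blocked_address(address: str) -> bool:
    """True for loopback, private (including RFC 1918), and link-local addresses."""
    ip = ipaddress.ip_address(address)
    return ip.is_loopback or ip.is_private or ip.is_link_local


def is_safe_target(url: str) -> bool:
    """Resolve the URL's host and reject it if any resolved address is blocked."""
    host = urlsplit(url).hostname
    if not host:
        return False
    try:
        infos = socket.getaddrinfo(host, None)
    except socket.gaierror:
        return False  # unresolvable hosts are treated as unsafe in this sketch
    # Each entry's sockaddr tuple starts with the resolved IP string.
    return all(not is_blocked_address(info[4][0]) for info in infos)
```

Checking every resolved address, rather than only the first, guards against hostnames that return a mix of public and internal records.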

Hardened CSP

No unsafe-inline scripts. HTML sanitized through DOMPurify. Preview pane sandboxed with strict capabilities.

Disk Guard

Pauses on disk full, permission denied, or read-only errors. Fix the issue, hit Resume, keep your progress.
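One way to picture this behaviour is to classify write errors by their `errno` and pause instead of aborting. The sketch below is an assumption about the general approach, not Docurip's code; the names `should_pause` and `write_page` and the choice of error codes are illustrative.

```python
import errno

# Error codes treated as "pause and let the user fix it" in this sketch:
# disk full, permission denied (two variants), and read-only file system.
PAUSE_ERRNOS = {errno.ENOSPC, errno.EACCES, errno.EPERM, errno.EROFS}


def should_pause(error: OSError) -> bool:
    """True when the error is one the user can fix before resuming."""
    return error.errno in PAUSE_ERRNOS


def write_page(path: str, content: str) -> str:
    """Write one page; return "ok" or "paused" instead of crashing the crawl."""
    try:
        with open(path, "w", encoding="utf-8") as handle:
            handle.write(content)
    except OSError as exc:
        if should_pause(exc):
            return "paused"  # caller keeps its queue and retries after Resume
        raise  # anything else is unexpected and should surface
    return "ok"
```

Returning a status rather than raising lets the caller keep its crawl queue intact, so progress survives until the disk problem is fixed.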

RELEASE CHANNEL: BETA v2.0.0

Ready to rip?

Download the desktop client for Windows, or explore the roadmap and changelogs.

