
just-scrape

Made with love by the ScrapeGraphAI team 💜

Demo Video

Command-line interface for ScrapeGraphAI: AI-powered web scraping, data extraction, search, and crawling.

Project Structure

just-scrape/
├── src/
│   ├── cli.ts                       # Entry point, citty main command + subcommands
│   ├── lib/
│   │   ├── env.ts                   # Zod-parsed env config (API key, debug, timeout)
│   │   ├── folders.ts               # API key resolution + interactive prompt
│   │   ├── scrapegraphai.ts         # SDK layer: all API functions
│   │   ├── schemas.ts               # Zod validation schemas
│   │   └── log.ts                   # Logger factory + syntax-highlighted JSON output
│   ├── types/
│   │   └── index.ts                 # Zod-derived types + ApiResult
│   ├── commands/
│   │   ├── smart-scraper.ts
│   │   ├── search-scraper.ts
│   │   ├── markdownify.ts
│   │   ├── crawl.ts
│   │   ├── sitemap.ts
│   │   ├── scrape.ts
│   │   ├── agentic-scraper.ts
│   │   ├── generate-schema.ts
│   │   ├── history.ts
│   │   ├── credits.ts
│   │   └── validate.ts
│   └── utils/
│       └── banner.ts                # ASCII banner + version from package.json
├── tests/
│   └── scrapegraphai.test.ts        # SDK layer tests (mocked fetch)
├── dist/                            # Build output (git-ignored)
│   └── cli.mjs                      # Bundled ESM with shebang
├── package.json
├── tsconfig.json
├── tsup.config.ts
├── biome.json
└── .gitignore

Installation

npm install -g just-scrape           # npm (recommended)
pnpm add -g just-scrape              # pnpm
yarn global add just-scrape           # yarn
bun add -g just-scrape               # bun
npx just-scrape --help               # or run without installing

Package: just-scrape on npm.

Coding Agent Skill

Use just-scrape as a skill for AI coding agents via Vercel's skills.sh:

bunx skills add https://github.com/ScrapeGraphAI/just-scrape

Browse the skill: skills.sh/scrapegraphai/just-scrape/just-scrape

Configuration

The CLI needs a ScrapeGraph API key. Get one at dashboard.scrapegraphai.com.

Four ways to provide it (checked in order):

  1. Environment variable: export SGAI_API_KEY="sgai-..."
  2. .env file: SGAI_API_KEY=sgai-... in project root
  3. Config file: ~/.scrapegraphai/config.json
  4. Interactive prompt: the CLI asks and saves to config
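
A minimal sketch of ~/.scrapegraphai/config.json (the "apiKey" field name here is an assumption; the interactive prompt writes this file for you, so you rarely need to create it by hand):

{
  "apiKey": "sgai-..."
}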

Optional environment variables:

export JUST_SCRAPE_TIMEOUT_S=300     # Request/polling timeout in seconds (default: 120)
JUST_SCRAPE_DEBUG=1 just-scrape ...  # Debug logging to stderr

JSON Mode (--json)

All commands support --json for machine-readable output. When set, the banner, spinners, and interactive prompts are suppressed; only raw JSON is written to stdout.

just-scrape credits --json | jq '.remaining_credits'
just-scrape smart-scraper https://example.com -p "Extract data" --json > result.json
just-scrape history smartscraper --json | jq '.requests[].status'
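
The JSON output is as easy to consume from a script as from a pipe. A minimal TypeScript sketch, assuming just-scrape is on your PATH and an API key is configured (the remaining_credits field matches the jq example above):

import { execFile } from "node:child_process";
import { promisify } from "node:util";

const run = promisify(execFile);

// --json suppresses the banner and spinners, so stdout is pure JSON.
const { stdout } = await run("just-scrape", ["credits", "--json"]);
const { remaining_credits } = JSON.parse(stdout) as { remaining_credits: number };
console.log(`Credits left: ${remaining_credits}`);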

Smart Scraper

Extract structured data from any URL using AI. docs

Usage

just-scrape smart-scraper <url> -p <prompt>                # Extract data with AI
just-scrape smart-scraper <url> -p <prompt> --schema <json> # Enforce output schema
just-scrape smart-scraper <url> -p <prompt> --scrolls <n>   # Infinite scroll (0-100)
just-scrape smart-scraper <url> -p <prompt> --pages <n>    # Multi-page (1-100)
just-scrape smart-scraper <url> -p <prompt> --stealth      # Anti-bot bypass (+4 credits)
just-scrape smart-scraper <url> -p <prompt> --cookies <json> --headers <json>
just-scrape smart-scraper <url> -p <prompt> --plain-text   # Plain text instead of JSON

Examples

# Extract product listings from an e-commerce page
just-scrape smart-scraper https://store.example.com/shoes -p "Extract all product names, prices, and ratings"

# Extract with a strict schema, scrolling to load more content
just-scrape smart-scraper https://news.example.com -p "Get all article headlines and dates" \
  --schema '{"type":"object","properties":{"articles":{"type":"array","items":{"type":"object","properties":{"title":{"type":"string"},"date":{"type":"string"}}}}}}' \
  --scrolls 5

# Scrape a JS-heavy SPA behind anti-bot protection
just-scrape smart-scraper https://app.example.com/dashboard -p "Extract user stats" \
  --stealth

Search Scraper

Search the web and extract structured data from results. docs

Usage

just-scrape search-scraper <prompt>                        # AI-powered web search
just-scrape search-scraper <prompt> --num-results <n>      # Sources to scrape (3-20, default 3)
just-scrape search-scraper <prompt> --no-extraction        # Markdown only (2 credits vs 10)
just-scrape search-scraper <prompt> --schema <json>        # Enforce output schema
just-scrape search-scraper <prompt> --stealth --headers <json>

Examples

# Research a topic across multiple sources
just-scrape search-scraper "What are the best Python web frameworks in 2025?" --num-results 10

# Get raw markdown from search results (cheaper)
just-scrape search-scraper "React vs Vue comparison" --no-extraction --num-results 5

# Structured output with schema
just-scrape search-scraper "Top 5 cloud providers pricing" \
  --schema '{"type":"object","properties":{"providers":{"type":"array","items":{"type":"object","properties":{"name":{"type":"string"},"free_tier":{"type":"string"}}}}}}'

Markdownify

Convert any webpage to clean markdown. docs

Usage

just-scrape markdownify <url>                              # Convert to markdown
just-scrape markdownify <url> --stealth                    # Anti-bot bypass (+4 credits)
just-scrape markdownify <url> --headers <json>             # Custom headers

Examples

# Convert a blog post to markdown
just-scrape markdownify https://blog.example.com/my-article

# Convert a JS-rendered page behind Cloudflare
just-scrape markdownify https://protected.example.com --stealth

# Pipe markdown to a file
just-scrape markdownify https://docs.example.com/api --json | jq -r '.result' > api-docs.md

Crawl

Crawl multiple pages and extract data from each. docs

Usage

just-scrape crawl <url> -p <prompt>                        # Crawl + extract
just-scrape crawl <url> -p <prompt> --max-pages <n>        # Max pages (default 10)
just-scrape crawl <url> -p <prompt> --depth <n>            # Crawl depth (default 1)
just-scrape crawl <url> --no-extraction --max-pages <n>    # Markdown only (2 credits/page)
just-scrape crawl <url> -p <prompt> --schema <json>        # Enforce output schema
just-scrape crawl <url> -p <prompt> --rules <json>         # Crawl rules (include_paths, same_domain)
just-scrape crawl <url> -p <prompt> --no-sitemap           # Skip sitemap discovery
just-scrape crawl <url> -p <prompt> --stealth               # Anti-bot bypass

Examples

# Crawl a docs site and extract all code examples
just-scrape crawl https://docs.example.com -p "Extract all code snippets with their language" \
  --max-pages 20 --depth 3

# Crawl only blog pages, skip everything else
just-scrape crawl https://example.com -p "Extract article titles and summaries" \
  --rules '{"include_paths":["/blog/*"],"same_domain":true}' --max-pages 50

# Get raw markdown from all pages (no AI extraction, cheaper)
just-scrape crawl https://example.com --no-extraction --max-pages 10

Sitemap

Get all URLs from a website's sitemap. docs

Usage

just-scrape sitemap <url>

Examples

# List all pages on a site
just-scrape sitemap https://example.com

# Pipe URLs to another tool
just-scrape sitemap https://example.com --json | jq -r '.urls[]'
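
Because every command supports --json, sitemap composes well with the others. A hedged TypeScript sketch that feeds the first few sitemap URLs through markdownify (the urls and result field names are taken from the examples in this README):

import { execFile } from "node:child_process";
import { promisify } from "node:util";

const run = promisify(execFile);

// Discover the site's pages via its sitemap.
const { stdout } = await run("just-scrape", ["sitemap", "https://example.com", "--json"]);
const { urls } = JSON.parse(stdout) as { urls: string[] };

// Convert the first three pages to markdown, one request at a time.
for (const url of urls.slice(0, 3)) {
  const page = await run("just-scrape", ["markdownify", url, "--json"]);
  console.log((JSON.parse(page.stdout) as { result: string }).result);
}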

Scrape

Get raw HTML content from a URL. docs

Usage

just-scrape scrape <url>                                   # Raw HTML
just-scrape scrape <url> --stealth                         # Anti-bot bypass (+4 credits)
just-scrape scrape <url> --branding                        # Extract branding (+2 credits)
just-scrape scrape <url> --country-code <iso>              # Geo-targeting

Examples

# Get raw HTML of a page
just-scrape scrape https://example.com

# Scrape a geo-restricted page with anti-bot bypass
just-scrape scrape https://store.example.com --stealth --country-code DE

# Extract branding info (logos, colors, fonts)
just-scrape scrape https://example.com --branding

Agentic Scraper

Browser automation with AI: log in, click, navigate, fill forms. docs

Usage

just-scrape agentic-scraper <url> -s <steps>               # Run browser steps
just-scrape agentic-scraper <url> -s <steps> --ai-extraction -p <prompt>
just-scrape agentic-scraper <url> -s <steps> --schema <json>
just-scrape agentic-scraper <url> -s <steps> --use-session # Persist browser session

Examples

# Log in and extract dashboard data
just-scrape agentic-scraper https://app.example.com/login \
  -s "Fill email with user@test.com,Fill password with secret,Click Sign In" \
  --ai-extraction -p "Extract all dashboard metrics"

# Navigate through a multi-step form
just-scrape agentic-scraper https://example.com/wizard \
  -s "Click Next,Select Premium plan,Fill name with John,Click Submit"

# Persistent session across multiple runs
just-scrape agentic-scraper https://app.example.com \
  -s "Click Settings" --use-session

Generate Schema

Generate a JSON schema from a natural language description.

Usage

just-scrape generate-schema <prompt>                       # AI generates a schema
just-scrape generate-schema <prompt> --existing-schema <json>

Examples

# Generate a schema for product data
just-scrape generate-schema "E-commerce product with name, price, ratings, and reviews array"

# Refine an existing schema
just-scrape generate-schema "Add an availability field" \
  --existing-schema '{"type":"object","properties":{"name":{"type":"string"},"price":{"type":"number"}}}'

History

Browse request history for any service. Interactive by default: use the arrow keys to navigate, select a request to view details, and "Load more" for infinite scroll.

Usage

just-scrape history <service>                              # Interactive browser
just-scrape history <service> <request-id>                 # Fetch specific request
just-scrape history <service> --page <n>                   # Start from page (default 1)
just-scrape history <service> --page-size <n>              # Results per page (default 10, max 100)
just-scrape history <service> --json                       # Raw JSON (pipeable)

Services: markdownify, smartscraper, searchscraper, scrape, crawl, agentic-scraper, sitemap

Examples

# Browse your smart-scraper history interactively
just-scrape history smartscraper

# Jump to a specific request by ID
just-scrape history smartscraper abc123-def456-7890

# Export crawl history as JSON
just-scrape history crawl --json --page-size 100 | jq '.requests[] | {id: .request_id, status}'

Credits

Check your credit balance.

just-scrape credits
just-scrape credits --json | jq '.remaining_credits'

Validate

Validate your API key (health check).

just-scrape validate

Contributing

From Source

Requires Bun and Node.js 22+.

git clone https://github.com/ScrapeGraphAI/just-scrape.git
cd just-scrape
bun install
bun run dev --help

Tech Stack

Concern        Tool
Language       TypeScript 5.8
Dev Runtime    Bun
Build          tsup (esbuild)
CLI Framework  citty (unjs)
Prompts        @clack/prompts
Styling        chalk v5 (ESM)
Validation     zod v4
Env            dotenv
Lint / Format  Biome
Testing        Bun test (built-in)
Target         Node.js 22+, ESM-only

Scripts

bun run dev                          # Run CLI from TS source
bun run build                        # Bundle ESM to dist/cli.mjs
bun run lint                         # Lint + format check
bun run format                       # Auto-format
bun test                             # Run tests
bun run check                        # Type-check + lint

Testing

Tests mock all API calls via spyOn(globalThis, "fetch"); no network, no API key needed.

Covers: success paths, polling, HTTP error mapping (401/402/422/429/500), Zod validation, timeouts, and network failures.
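
A minimal sketch of that pattern, not taken from the repository's test file; the endpoint URL and response shape are illustrative only:

import { expect, spyOn, test } from "bun:test";

test("credits request parses the mocked response", async () => {
  // Swap out global fetch so no real network request is made.
  const fetchSpy = spyOn(globalThis, "fetch").mockResolvedValue(
    new Response(JSON.stringify({ remaining_credits: 42 }), { status: 200 }),
  );

  // The URL is irrelevant here: every fetch call hits the mock.
  const res = await fetch("https://example.invalid/credits");
  const body = (await res.json()) as { remaining_credits: number };
  expect(body.remaining_credits).toBe(42);

  fetchSpy.mockRestore(); // hand the real fetch back to other tests
});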

License

ISC


Made with love by the ScrapeGraphAI team 💜
