Open
Labels: feature, t-tooling
Description
Motivation
Crawlee Python already ships a Redis storage backend (RedisStorageClient) using redis[hiredis]. It covers all three storage types — Dataset, KeyValueStore, and RequestQueue — with Lua scripts for atomic operations and support for both exact (set-based) and probabilistic (Bloom filter) request deduplication.
A Redis backend in JS Crawlee would enable distributed crawling, shared state across processes/machines, and better scalability for high-throughput workloads.
Scope
- Implement a Redis-based storage client for JS Crawlee covering Dataset, KeyValueStore, and RequestQueue.
- Port the Lua scripts for atomic fetch, add, and stale request reclaim operations.
- Support configurable deduplication strategies (Redis sets vs Bloom filters).
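To make the deduplication item above concrete, here is a minimal sketch of what a pluggable strategy could look like. Everything in it is an assumption for illustration: the `Deduplicator` interface, the class names, and `createDeduplicator` are hypothetical and not part of Crawlee or the Python implementation, and both strategies are in-memory stand-ins (a `Set` in place of a Redis set, a hand-rolled bit array in place of a Redis Bloom filter) so the example runs without a Redis server.

```typescript
// Hypothetical interface, not part of Crawlee's API.
interface Deduplicator {
    /** Records the key and returns true if it had not been seen before. */
    addIfNew(uniqueKey: string): boolean;
}

// Exact strategy: in-memory stand-in for a Redis set.
class SetDeduplicator implements Deduplicator {
    private readonly seen = new Set<string>();

    addIfNew(uniqueKey: string): boolean {
        if (this.seen.has(uniqueKey)) return false;
        this.seen.add(uniqueKey);
        return true;
    }
}

// Probabilistic strategy: in-memory stand-in for a Redis Bloom filter.
// It never reports a seen key as new, but it may occasionally report
// a new key as already seen (false positive).
class BloomDeduplicator implements Deduplicator {
    private readonly bits: Uint8Array;
    private readonly bitCount: number;
    private readonly hashCount: number;

    constructor(bitCount: number = 1 << 16, hashCount: number = 4) {
        this.bitCount = bitCount;
        this.hashCount = hashCount;
        this.bits = new Uint8Array(Math.ceil(bitCount / 8));
    }

    // Seeded FNV-1a-style 32-bit hash over UTF-16 code units,
    // reduced to a bit position.
    private hash(key: string, seed: number): number {
        let h = (0x811c9dc5 ^ seed) >>> 0;
        for (let i = 0; i < key.length; i++) {
            h ^= key.charCodeAt(i);
            h = Math.imul(h, 0x01000193) >>> 0;
        }
        return h % this.bitCount;
    }

    addIfNew(uniqueKey: string): boolean {
        let isNew = false;
        for (let seed = 0; seed < this.hashCount; seed++) {
            const bit = this.hash(uniqueKey, seed);
            const byteIndex = bit >> 3;
            const mask = 1 << (bit & 7);
            // Any unset bit means the key was definitely not added before.
            if (((this.bits[byteIndex] ?? 0) & mask) === 0) {
                isNew = true;
                this.bits[byteIndex] = (this.bits[byteIndex] ?? 0) | mask;
            }
        }
        return isNew;
    }
}

// Hypothetical factory showing how the strategy could be made configurable.
function createDeduplicator(strategy: 'set' | 'bloom'): Deduplicator {
    return strategy === 'set' ? new SetDeduplicator() : new BloomDeduplicator();
}
```

The trade-off this sketch illustrates is memory versus accuracy: the set grows with every unique key but is exact, while the Bloom filter uses a fixed amount of memory and may skip a small fraction of genuinely new requests. In a real backend the check-and-add step would presumably need to be atomic on the Redis side, which is where the Lua scripts come in.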
Blockers
- Rework the storage client system #3075: this rework needs to land first, because it simplifies the interface that new backends must implement.
Python reference
- Source: src/crawlee/storage_clients/_redis/
- Lua scripts: lua_scripts/
- Storage client rework PR: refactor!: Introduce new storage client system crawlee-python#1194