Why I'm Planning to Migrate from Redis Queue to Temporal in My Homelab
My RQ setup works, but Temporal's workflow orchestration promises better handling of complex multi-step jobs. Here's my migration plan and the real tradeoffs.
The Current State: RQ Actually Works Pretty Well
Let me be honest upfront: my Redis Queue (RQ) setup isn’t broken. It’s running a media data ETL pipeline right now, processing TMDB API calls, populating MongoDB, and reporting job status in real-time via SocketIO. I’ve built something reasonably sophisticated:
# My actual CustomWorker implementation
from rq import Worker
from rq.job import JobStatus

class CustomWorker(Worker):
    def execute_job(self, job, queue):
        print(f"Starting job execution: {job.id}")
        result = super().execute_job(job, queue)
        if job.get_status() == JobStatus.FINISHED:
            report_to_server({
                "job_id": job.id,
                "status": "completed",
                "result": job.return_value()
            })
        elif job.get_status() == JobStatus.FAILED:
            report_to_server({
                "job_id": job.id,
                "status": "failed",
                "error": job.latest_result()
            })
        return result
The architecture separates concerns cleanly:
- Redis DB 6: API response caching (14-day TTL)
- Redis DB 7: RQ job queue for background tasks
- Redis DB 10: SocketIO pub/sub for real-time updates
Workers connect via SocketIO client and push status updates that broadcast to all connected web clients. It’s not bad.
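For reference, the wiring behind that split is just a few clients pointed at different logical databases. A minimal sketch (the host, port, and SocketIO manager URL are assumptions; the DB numbers and 14-day TTL are my real config):

# Rough sketch of the Redis split described above
import redis
import socketio
from rq import Queue

CACHE_TTL = 14 * 24 * 3600  # 14-day TTL for cached TMDB responses

cache = redis.Redis(host="localhost", port=6379, db=6)                    # DB 6: API response cache
queue = Queue(connection=redis.Redis(host="localhost", port=6379, db=7))  # DB 7: RQ job queue
sio_manager = socketio.RedisManager("redis://localhost:6379/10")          # DB 10: SocketIO pub/sub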
So Why Consider Temporal?
The problems emerge when workflows get complex. My ETL pipeline has dependencies:
1. Fetch movie lists from TMDB (500+ pages)
2. For each movie, fetch detailed info (cast, crew, videos)
3. Process and denormalize data
4. Update search indexes
5. Notify frontend of new content
In RQ, I’m coordinating this manually:
# Current approach - manual orchestration
def run_all_tasks(refresh: bool = False):
    if refresh:
        cache.flush()
    create_indexes()
    for i in range(1, TOTAL_PAGES + 1):
        queue.enqueue(
            fetch_build_tmdb,
            "movie_popular",
            "/movie/popular",
            i,
            region="US"
        )
This works for simple fan-out, but when I need:
- Step 2 to wait for step 1 to complete
- Parallel execution of independent tasks
- Retry logic that picks up from failure points
- Audit trails of what ran when
…the code gets ugly fast.
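For the record, RQ does have a depends_on parameter, so chaining is possible; it's the fan-out-then-fan-in shape that turns into bookkeeping. A rough sketch of where it heads (fetch_movie_list, process_movie_details, and update_search_index are hypothetical task names, not my actual code):

# Manual fan-out/fan-in with RQ's depends_on: every downstream step
# has to carry the full list of upstream job handles
list_jobs = [queue.enqueue(fetch_movie_list, page) for page in range(1, TOTAL_PAGES + 1)]
details_job = queue.enqueue(process_movie_details, depends_on=list_jobs)
index_job = queue.enqueue(update_search_index, depends_on=details_job)
# If one list_jobs entry fails, there's no built-in way to resume just that branch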
What RQ Does Well
Credit where it’s due. My current setup handles:
Real-time visibility via SocketIO:
# Workers report status to web UI in real-time
import socketio

sio = socketio.Client()
sio.connect("http://localhost:7010")

def report_to_server(data):
    sio.emit("report", data)
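On the server side there's a matching handler that rebroadcasts whatever a worker reports to every connected browser. A minimal sketch with python-socketio's async server (the outgoing event name is an assumption):

# Relay worker "report" events to all connected web clients
import socketio

sio = socketio.AsyncServer(async_mode="asgi", cors_allowed_origins="*")
app = socketio.ASGIApp(sio)

@sio.on("report")
async def handle_report(sid, data):
    await sio.emit("job_update", data)  # broadcast to every client ("job_update" is illustrative)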
Worker management via REST API:
@router.post("/start")
async def start_workers(num_workers: int = 1):
    for _ in range(num_workers):
        subprocess.Popen(["gnome-terminal", "--", "bash", "-c",
                          f"python {worker_script}; exec bash"])
    return {"message": f"Started {num_workers} worker(s)"}
Hand-rolled retry with backoff for TMDB rate limits:
for attempt in range(MAX_RETRIES):
    try:
        response = httpx.get(url, headers=headers, params=params)
        response.raise_for_status()
        # ... process data
        break
    except httpx.HTTPStatusError as e:
        if e.response.status_code == 429 and attempt < MAX_RETRIES - 1:
            time.sleep(RATE_LIMIT_DELAY * (attempt + 1))
            continue
        raise
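RQ also ships a Retry helper for whole-job retries, which I could lean on instead of the loop above for anything that's safe to re-run end to end. A sketch reusing the earlier enqueue call:

# RQ's built-in per-job retry: re-runs the whole job with increasing delays
from rq import Retry

queue.enqueue(
    fetch_build_tmdb,
    "movie_popular",
    "/movie/popular",
    1,
    region="US",
    retry=Retry(max=3, interval=[10, 30, 60]),  # seconds to wait between attempts
)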
For simple fire-and-forget tasks with manual retry logic, RQ is genuinely good enough.
Where RQ Falls Short
The Durability Problem
Redis is in-memory. Yes, RDB/AOF persistence exists, but RQ doesn’t checkpoint job progress. If a worker crashes mid-execution:
- Simple jobs? Re-run from scratch
- Multi-step jobs? Hope your code handles partial state
- Jobs calling external APIs? Pray for idempotency
I haven’t lost data yet, but I’ve come close.
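My main defense today is making the MongoDB writes idempotent, so a from-scratch re-run just overwrites the same documents. The pattern, sketched with illustrative collection and field names:

# Idempotent write: keyed on the TMDB id, so a re-run upserts instead of duplicating
from pymongo import MongoClient

movies = MongoClient("mongodb://localhost:27017")["media"]["movies"]

def save_movie(doc: dict) -> None:
    movies.update_one({"tmdb_id": doc["tmdb_id"]}, {"$set": doc}, upsert=True)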
The Orchestration Problem
My current UNIFIED_BACKEND_PLAN.md describes the target architecture:
backend/
├── etl/
│   ├── orchestrators/           # ETL orchestration scripts
│   │   ├── movies.py            # Movie list orchestration
│   │   ├── movie_details.py     # Movie details processing
“Orchestrators” sounds fancy, but it’s really just Python scripts calling queue.enqueue() in loops. Real workflow orchestration - dependencies, parallelization, error recovery - requires Temporal or similar.
The Visibility Problem
My SocketIO dashboard shows job status, but:
- No workflow-level view (what percentage complete?)
- No historical analysis (how long did step 3 take last week?)
- No easy replay of failed workflows
The Temporal Proposition
Temporal reframes background jobs as durable workflows. The mental model shift:
| RQ | Temporal |
|---|---|
| “Enqueue a job” | “Start a workflow” |
| “Job failed, retry from scratch” | “Activity failed, retry from checkpoint” |
| “Poll for status” | “Subscribe to workflow events” |
| “Manual dependency management” | “Workflow code is the dependency graph” |
Here’s what my ETL could look like:
import asyncio
from datetime import timedelta

from temporalio import workflow
from temporalio.common import RetryPolicy

@workflow.defn
class MovieETLWorkflow:
    @workflow.run
    async def run(self, region: str = "US") -> dict:
        # Step 1: Create indexes
        await workflow.execute_activity(
            create_indexes,
            start_to_close_timeout=timedelta(minutes=5),
        )
        # Step 2: Fetch all movie lists in parallel
        movie_tasks = []
        for page in range(1, MAX_PAGES + 1):
            movie_tasks.append(
                workflow.execute_activity(
                    fetch_movie_list,
                    args=[page, region],
                    start_to_close_timeout=timedelta(minutes=30),
                    retry_policy=RetryPolicy(maximum_attempts=3),
                )
            )
        movie_ids = await asyncio.gather(*movie_tasks)
        # Step 3: Fetch details for each movie
        detail_tasks = []
        for movie_id in flatten(movie_ids):
            detail_tasks.append(
                workflow.execute_activity(
                    fetch_movie_details,
                    args=[movie_id],
                    start_to_close_timeout=timedelta(minutes=5),
                )
            )
        await asyncio.gather(*detail_tasks)
        return {"movies_processed": len(detail_tasks)}
If a worker dies mid-execution? Temporal replays the workflow from the last completed activity. Not “re-runs everything” - literally continues from where it stopped.
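For that workflow to actually run, the activities and a worker process have to exist too. Roughly like this (a sketch: the task queue name and activity bodies are placeholders, not my real ETL code, and MovieETLWorkflow is the class defined above):

# Sketch: activity definitions plus the worker process that executes them
import asyncio

from temporalio import activity
from temporalio.client import Client
from temporalio.worker import Worker

@activity.defn
async def create_indexes() -> None:
    ...  # placeholder: create MongoDB indexes

@activity.defn
async def fetch_movie_list(page: int, region: str) -> list[int]:
    return []  # placeholder: call TMDB, store the raw page, return movie ids

@activity.defn
async def fetch_movie_details(movie_id: int) -> None:
    ...  # placeholder: fetch cast/crew/videos and upsert the document

async def main():
    client = await Client.connect("localhost:7233")
    worker = Worker(
        client,
        task_queue="media-etl",  # assumed name; workflows must be started on the same queue
        workflows=[MovieETLWorkflow],
        activities=[create_indexes, fetch_movie_list, fetch_movie_details],
    )
    await worker.run()

if __name__ == "__main__":
    asyncio.run(main())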
The Comparison That Actually Matters
| Feature | My Current RQ Setup | Temporal |
|---|---|---|
| Setup complexity | Already done | 2-3 hours |
| State durability | No job-level checkpoints (Redis in-memory) | Complete (persisted in PostgreSQL) |
| Workflow visibility | Custom SocketIO dashboard | Full workflow viewer built-in |
| Retry from failure point | No | Yes |
| Long-running workflows | Works but fragile | Native support |
| Multi-step orchestration | Manual coordination | Built into workflow code |
| Resource overhead | ~150MB (Redis + workers) | ~800MB (Temporal stack) |
The overhead is significant. Temporal runs multiple services:
- Frontend service
- History service
- Matching service
- PostgreSQL (or MySQL/Cassandra)
- Web UI
For a homelab, Docker Compose makes this manageable, but it’s not trivial.
My Migration Plan
I’m not ripping out RQ tomorrow. The plan:
Phase 1: Parallel Testing (Current)
- Keep RQ running for existing workloads
- Deploy Temporal stack alongside
- Port one workflow (media processing) as proof of concept
Phase 2: Incremental Migration
- Move complex multi-step workflows to Temporal
- Keep simple fire-and-forget tasks on RQ
- Run both systems for 4-6 weeks
Phase 3: Full Migration
- Move all workflows to Temporal
- Decommission RQ
- Repurpose Redis for pure caching
Docker Compose for Temporal
version: "3.8"

services:
  postgresql:
    image: postgres:15
    environment:
      POSTGRES_USER: temporal
      POSTGRES_PASSWORD: temporal
      POSTGRES_DB: temporal
    volumes:
      - temporal_db:/var/lib/postgresql/data
    networks:
      - temporal

  temporal:
    image: temporalio/auto-setup:latest
    environment:
      - DB=postgres12
      - DB_PORT=5432
      - POSTGRES_USER=temporal
      - POSTGRES_PWD=temporal
      - POSTGRES_SEEDS=postgresql
    depends_on:
      - postgresql
    ports:
      - "7233:7233"
    networks:
      - temporal

  temporal-ui:
    image: temporalio/ui:latest
    environment:
      - TEMPORAL_ADDRESS=temporal:7233
    depends_on:
      - temporal
    ports:
      - "8088:8080"
    networks:
      - temporal

volumes:
  temporal_db:

networks:
  temporal:
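Once the stack is up, kicking off a run is a short client script against localhost:7233. A sketch (the workflow id and task queue are assumptions that just need to match the worker; MovieETLWorkflow is the class from earlier):

# Start the ETL workflow against the compose stack above
import asyncio

from temporalio.client import Client

async def main():
    client = await Client.connect("localhost:7233")
    result = await client.execute_workflow(
        MovieETLWorkflow.run,
        "US",                    # region argument
        id="movie-etl-run",      # assumed workflow id
        task_queue="media-etl",  # must match the worker's task queue
    )
    print(result)

asyncio.run(main())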
When to Actually Make the Switch
Stick with RQ if:
- Tasks are independent and stateless
- You’ve already built monitoring (like I have)
- Failures just mean “retry the whole thing”
- You don’t need workflow-level visibility
- Resource constraints matter
Consider Temporal if:
- Workflows have multiple dependent steps
- Long-running jobs must survive crashes
- You need audit trails and workflow history
- Business logic requires exactly-once execution
- You’re orchestrating external services with complex failure modes
Honest Assessment
My RQ setup works. The SocketIO real-time reporting, the worker management API, the retry logic - it’s production-ready for what it does. I’ve spent considerable time making it robust.
Temporal promises to solve problems I occasionally have, not problems I constantly have. The migration is about future-proofing and reducing the custom code I maintain, not fixing something broken.
Is the 5x resource overhead worth it? For complex media pipelines and backup orchestration, probably yes. For simple cron-style jobs, absolutely not.
The migration starts next month. I’ll report back on whether Temporal lives up to the promise, or whether I’m just trading one set of problems for another.
Currently running: RQ with custom SocketIO monitoring. Planning: Temporal migration. The homelab never stops evolving.