by Adam · 6 min read

Why I'm Planning to Migrate from Redis Queue to Temporal in My Homelab

My RQ setup works, but Temporal's workflow orchestration promises better handling of complex multi-step jobs. Here's my migration plan and the real tradeoffs.

Tags: homelab · temporal · redis · workflow-orchestration · self-hosting · python · rq

The Current State: RQ Actually Works Pretty Well

Let me be honest upfront: my Redis Queue (RQ) setup isn’t broken. It’s running a media data ETL pipeline right now, processing TMDB API calls, populating MongoDB, and reporting job status in real-time via SocketIO. I’ve built something reasonably sophisticated:

# My actual CustomWorker implementation
from rq.job import JobStatus
from rq.worker import Worker

class CustomWorker(Worker):
    def execute_job(self, job, queue):
        print(f"Starting job execution: {job.id}")
        result = super().execute_job(job, queue)

        if job.get_status() == JobStatus.FINISHED:
            report_to_server({
                "job_id": job.id,
                "status": "completed",
                "result": job.return_value()
            })
        elif job.get_status() == JobStatus.FAILED:
            report_to_server({
                "job_id": job.id,
                "status": "failed",
                # latest_result() returns a Result object; exc_string is the
                # serializable traceback, which is what SocketIO can actually emit
                "error": job.latest_result().exc_string
            })

        return result

The architecture separates concerns cleanly:

  • Redis DB 6: API response caching (14-day TTL)
  • Redis DB 7: RQ job queue for background tasks
  • Redis DB 10: SocketIO pub/sub for real-time updates

Workers connect via SocketIO client and push status updates that broadcast to all connected web clients. It’s not bad.
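
Wired up, the separation looks roughly like this (a sketch assuming redis-py, rq, and python-socketio; the key names are mine):

# Three logical Redis databases, one Redis instance
import socketio
from redis import Redis
from rq import Queue

# DB 6: API response cache with a 14-day TTL
cache = Redis(db=6)
cache.set("tmdb:/movie/popular:p1", '{"results": []}', ex=14 * 24 * 3600)

# DB 7: RQ job queue for background tasks
queue = Queue(connection=Redis(db=7))

# DB 10: SocketIO pub/sub, so every server process shares one event bus
sio = socketio.Server(
    client_manager=socketio.RedisManager("redis://localhost:6379/10")
)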

So Why Consider Temporal?

The problems emerge when workflows get complex. My ETL pipeline has dependencies:

  1. Fetch movie lists from TMDB (500+ pages)
  2. For each movie, fetch detailed info (cast, crew, videos)
  3. Process and denormalize data
  4. Update search indexes
  5. Notify frontend of new content

In RQ, I’m coordinating this manually:

# Current approach - manual orchestration
def run_all_tasks(refresh: bool = False):
    if refresh:
        cache.flush()
    create_indexes()

    for i in range(1, TOTAL_PAGES + 1):
        queue.enqueue(
            fetch_build_tmdb,
            "movie_popular",
            "/movie/popular",
            i,
            region="US"
        )

This works for simple fan-out, but when I need:

  • Step 2 to wait for step 1 to complete
  • Parallel execution of independent tasks
  • Retry logic that picks up from failure points
  • Audit trails of what ran when

…the code gets ugly fast.
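
RQ does give you depends_on, and here's roughly what those requirements look like wired by hand (a sketch; fetch_all_details and update_search_indexes are hypothetical stand-ins for steps 3-4):

# Manual dependency wiring with RQ's depends_on (sketch)
page_jobs = [
    queue.enqueue(fetch_build_tmdb, "movie_popular", "/movie/popular", i,
                  region="US")
    for i in range(1, TOTAL_PAGES + 1)
]

# Step 2 waits on *all* 500+ page jobs; depends_on accepts a list, but if
# one dependency fails the downstream job simply never runs - resuming
# from the failure point is entirely on you.
details_job = queue.enqueue(fetch_all_details, depends_on=page_jobs)
index_job = queue.enqueue(update_search_indexes, depends_on=details_job)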

What RQ Does Well

Credit where it’s due. My current setup handles:

Real-time visibility via SocketIO:

# Workers report status to web UI in real-time
import socketio

sio = socketio.Client()
sio.connect("http://localhost:7010")

def report_to_server(data):
    sio.emit("report", data)
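
The server side is just a relay: it listens for worker reports and rebroadcasts them to browsers. A minimal sketch of that half, assuming python-socketio (event names beyond "report" are mine):

# Server half: receive worker reports, broadcast to all web clients
import socketio

sio = socketio.Server(async_mode="threading")

@sio.on("report")
def handle_report(sid, data):
    sio.emit("job_update", data)  # fan out to every connected browser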

Worker management via REST API:

# Worker management endpoint (FastAPI; worker_script is defined elsewhere)
import subprocess

from fastapi import APIRouter

router = APIRouter()

@router.post("/start")
async def start_workers(num_workers: int = 1):
    for _ in range(num_workers):
        subprocess.Popen(["gnome-terminal", "--", "bash", "-c",
                          f"python {worker_script}; exec bash"])
    return {"message": f"Started {num_workers} worker(s)"}

Hand-rolled rate-limit retry with backoff:

for attempt in range(MAX_RETRIES):
    try:
        response = httpx.get(url, headers=headers, params=params)
        response.raise_for_status()
        # ... process data
        break  # success - stop retrying
    except httpx.HTTPStatusError as e:
        if e.response.status_code == 429 and attempt < MAX_RETRIES - 1:
            time.sleep(RATE_LIMIT_DELAY * (attempt + 1))  # linear backoff
            continue
        raise  # non-429 errors (or the final attempt) still surface

For simple fire-and-forget tasks with manual retry logic, RQ is genuinely good enough.
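
In fairness, modern RQ also ships a first-class retry mechanism, so some of that hand-rolled loop is legacy on my part. A sketch of the native API (parameters per the RQ docs):

# RQ's built-in per-job retry with spaced attempts
from rq import Retry

queue.enqueue(
    fetch_build_tmdb,
    "movie_popular", "/movie/popular", 1,
    region="US",
    retry=Retry(max=3, interval=[10, 30, 60]),  # seconds between attempts
)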

Where RQ Falls Short

The Durability Problem

Redis is in-memory. Yes, RDB/AOF persistence exists, but RQ doesn’t checkpoint job progress. If a worker crashes mid-execution:

  • Simple jobs? Re-run from scratch
  • Multi-step jobs? Hope your code handles partial state
  • Jobs calling external APIs? Pray for idempotency

I haven’t lost data yet, but I’ve come close.
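
My main defense today is making load steps idempotent, so a from-scratch re-run is wasteful rather than harmful. The pattern, sketched with pymongo (collection and field names are illustrative):

# Idempotent load: upsert by TMDB ID so crashed jobs can safely re-run
from pymongo import MongoClient, UpdateOne

movies = MongoClient()["media"]["movies"]

def upsert_movies(batch: list[dict]) -> None:
    ops = [UpdateOne({"tmdb_id": m["id"]}, {"$set": m}, upsert=True)
           for m in batch]
    if ops:
        movies.bulk_write(ops)  # re-running rewrites the same documents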

The Orchestration Problem

My current UNIFIED_BACKEND_PLAN.md describes the target architecture:

backend/
├── etl/
│   ├── orchestrators/        # ETL orchestration scripts
│   │   ├── movies.py         # Movie list orchestration
│   │   ├── movie_details.py  # Movie details processing

“Orchestrators” sounds fancy, but they’re really just Python scripts calling queue.enqueue() in loops. Real workflow orchestration - dependencies, parallelization, error recovery - requires Temporal or something like it.

The Visibility Problem

My SocketIO dashboard shows job status, but:

  • No workflow-level view (what percentage complete?)
  • No historical analysis (how long did step 3 take last week?)
  • No easy replay of failed workflows

The Temporal Proposition

Temporal reframes background jobs as durable workflows. The mental model shift:

RQ                                    Temporal
“Enqueue a job”                       “Start a workflow”
“Job failed, retry from scratch”      “Activity failed, retry from checkpoint”
“Poll for status”                     “Subscribe to workflow events”
“Manual dependency management”        “Workflow code is the dependency graph”
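
That last row is concrete, not aspirational: the Python SDK gives you a handle to any running workflow that you can describe or await from a separate process (a sketch; the workflow ID is one I use for illustration later in this post):

from temporalio.client import Client

async def check_etl() -> None:
    client = await Client.connect("localhost:7233")
    handle = client.get_workflow_handle("movie-etl")

    desc = await handle.describe()
    print(desc.status)              # e.g. WorkflowExecutionStatus.RUNNING

    print(await handle.result())    # or block until the workflow finishes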

Here’s what my ETL could look like:

import asyncio
from datetime import timedelta

from temporalio import workflow
from temporalio.common import RetryPolicy

@workflow.defn
class MovieETLWorkflow:
    @workflow.run
    async def run(self, region: str = "US") -> dict:
        # Step 1: Create indexes
        await workflow.execute_activity(
            create_indexes,
            start_to_close_timeout=timedelta(minutes=5),
        )

        # Step 2: Fetch all movie lists in parallel
        movie_tasks = []
        for page in range(1, MAX_PAGES + 1):
            movie_tasks.append(
                workflow.execute_activity(
                    fetch_movie_list,
                    args=[page, region],
                    start_to_close_timeout=timedelta(minutes=30),
                    retry_policy=RetryPolicy(maximum_attempts=3),
                )
            )
        page_results = await asyncio.gather(*movie_tasks)

        # Step 3: Fetch details for each movie (flatten pages of IDs)
        movie_ids = [mid for page in page_results for mid in page]
        detail_tasks = []
        for movie_id in movie_ids:
            detail_tasks.append(
                workflow.execute_activity(
                    fetch_movie_details,
                    args=[movie_id],
                    start_to_close_timeout=timedelta(minutes=5),
                )
            )
        await asyncio.gather(*detail_tasks)

        return {"movies_processed": len(detail_tasks)}

If a worker dies mid-execution? Temporal replays the workflow code against its recorded event history, skipping activities that already completed. Not “re-runs everything” - execution effectively picks up where it stopped.
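
Hosting and kicking it off is a handful of lines with the Python SDK - a sketch, assuming the activities above exist and a task queue name I picked for illustration:

import asyncio

from temporalio.client import Client
from temporalio.worker import Worker

async def main() -> None:
    client = await Client.connect("localhost:7233")

    # Host the workflow and its activities on this process
    worker = Worker(
        client,
        task_queue="etl-tasks",
        workflows=[MovieETLWorkflow],
        activities=[create_indexes, fetch_movie_list, fetch_movie_details],
    )
    async with worker:
        # Start the workflow and wait for its result
        result = await client.execute_workflow(
            MovieETLWorkflow.run,
            "US",
            id="movie-etl",
            task_queue="etl-tasks",
        )
        print(result)

asyncio.run(main())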

The Comparison That Actually Matters

Feature                      My Current RQ Setup           Temporal
Setup complexity             Already done                  2-3 hours
State durability             None (Redis in-memory)        Complete (PostgreSQL)
Workflow visibility          Custom SocketIO dashboard     Full workflow viewer built-in
Retry from failure point     No                            Yes
Long-running workflows       Works but fragile             Native support
Multi-step orchestration     Manual coordination           Built into workflow code
Resource overhead            ~150MB (Redis + workers)      ~800MB (Temporal stack)

The overhead is significant. Temporal runs multiple services:

  • Frontend service
  • History service
  • Matching service
  • PostgreSQL (or MySQL/Cassandra)
  • Web UI

For a homelab, Docker Compose makes this manageable, but it’s not trivial.

My Migration Plan

I’m not ripping out RQ tomorrow. The plan:

Phase 1: Parallel Testing (Current)

  • Keep RQ running for existing workloads
  • Deploy Temporal stack alongside
  • Port one workflow (media processing) as proof of concept

Phase 2: Incremental Migration

  • Move complex multi-step workflows to Temporal
  • Keep simple fire-and-forget tasks on RQ
  • Run both systems for 4-6 weeks

Phase 3: Full Migration

  • Move all workflows to Temporal
  • Decommission RQ
  • Repurpose Redis for pure caching

Docker Compose for Temporal

version: "3.8"
services:
  postgresql:
    image: postgres:15
    environment:
      POSTGRES_USER: temporal
      POSTGRES_PASSWORD: temporal
      POSTGRES_DB: temporal
    volumes:
      - temporal_db:/var/lib/postgresql/data
    networks:
      - temporal

  temporal:
    image: temporalio/auto-setup:latest
    environment:
      - DB=postgres12
      - DB_PORT=5432
      - POSTGRES_USER=temporal
      - POSTGRES_PWD=temporal
      - POSTGRES_SEEDS=postgresql
    depends_on:
      - postgresql
    ports:
      - "7233:7233"
    networks:
      - temporal

  temporal-ui:
    image: temporalio/ui:latest
    environment:
      - TEMPORAL_ADDRESS=temporal:7233
    depends_on:
      - temporal
    ports:
      - "8088:8080"
    networks:
      - temporal

volumes:
  temporal_db:

networks:
  temporal:
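
Saved as docker-compose.yml, docker compose up -d brings the stack online: the Web UI comes up at http://localhost:8088 and workers/clients talk gRPC to localhost:7233. The :latest tags are fine for this experiment; I'll pin versions before depending on it.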

When to Actually Make the Switch

Stick with RQ if:

  • Tasks are independent and stateless
  • You’ve already built monitoring (like I have)
  • Failures just mean “retry the whole thing”
  • You don’t need workflow-level visibility
  • Resource constraints matter

Consider Temporal if:

  • Workflows have multiple dependent steps
  • Long-running jobs must survive crashes
  • You need audit trails and workflow history
  • Business logic requires exactly-once execution
  • You’re orchestrating external services with complex failure modes

Honest Assessment

My RQ setup works. The SocketIO real-time reporting, the worker management API, the retry logic - it’s production-ready for what it does. I’ve spent considerable time making it robust.

Temporal promises to solve problems I occasionally have, not problems I constantly have. The migration is about future-proofing and reducing the custom code I maintain, not fixing something broken.

Is the 5x resource overhead worth it? For complex media pipelines and backup orchestration, probably yes. For simple cron-style jobs, absolutely not.

The migration starts next month. I’ll report back on whether Temporal lives up to the promise, or whether I’m just trading one set of problems for another.


Currently running: RQ with custom SocketIO monitoring. Planning: Temporal migration. The homelab never stops evolving.