Routing Engine — Active

Describe the Work. We Pick the GPU.

Submit your workload type and model size. Jungle Grid classifies it, scores every eligible node, and dispatches to the right GPU automatically.

jungle-grid — terminal
The Problem

You Are the Scheduler

Right now, you pick GPUs by hand, guess at VRAM, and discover mismatches at runtime. That is the job Jungle Grid replaces.

Without Jungle Grid
✗ Manually specifying gpu_type on every job. Guessing whether it fits.
✗ OOM at runtime because a 70B model got sent to a 16 GB T4. The job is dead. Start over.
✗ A100s sitting idle while T4s queue 12 deep. No load awareness. No rebalancing.
✗ Node goes offline. Job goes with it. No heartbeat check. No automatic requeue.
✗ Locked to one GPU type per pool. Consumer and data center cards can’t share work.
With Jungle Grid
✓ Submit workload_type=inference, model_size=7. Classifier maps to tier. Matcher resolves hardware. Done.
✓ VRAM-fit pre-filter blocks impossible placements before dispatch. [shipping next]
✓ 4-signal scorer evaluates every node on price, reliability, latency, and performance. Best node wins.
✓ Failover manager detects stale heartbeats, marks nodes offline, requeues affected jobs automatically.
✓ T4, L4, A10G, A100, H100, RTX 3090, RTX 4090 — all in one pool. Tiers decide eligibility.
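The stale-heartbeat failover described above can be sketched as a periodic sweep. All names here (Node, sweep, the 30-second timeout) are illustrative, not Jungle Grid's internals:

```python
import time

HEARTBEAT_TIMEOUT = 30.0  # seconds of silence before a node is marked offline (assumed value)

class Node:
    def __init__(self, node_id):
        self.node_id = node_id
        self.last_heartbeat = time.monotonic()
        self.online = True
        self.jobs = []

    def beat(self):
        # Called whenever the node agent reports in.
        self.last_heartbeat = time.monotonic()

def sweep(nodes, queue, now=None):
    """Mark nodes with stale heartbeats offline and requeue their jobs."""
    now = time.monotonic() if now is None else now
    for node in nodes:
        if node.online and now - node.last_heartbeat > HEARTBEAT_TIMEOUT:
            node.online = False
            queue.extend(node.jobs)  # automatic requeue: the job survives the node
            node.jobs.clear()
```

A scheduler would run this sweep on a timer; jobs pushed back onto the queue re-enter the normal classify/match/score path.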
How It Works

Intent In. GPU Out.

Every job moves through four stages. You control the first. The system handles the rest.

01

Declare Intent

Submit workload_type and model_size. No GPU names. The API accepts inference, training, fine-tuning, or batch — and resolves the rest.
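A submission under this scheme might look like the sketch below. The two field names mirror the copy above; the transport and endpoint are deliberately omitted, since no public SDK is documented here:

```python
import json

# Hypothetical job submission payload. Only intent is declared —
# workload_type and model_size — never a GPU name.
job = {
    "workload_type": "inference",  # inference | training | fine-tuning | batch
    "model_size": 7,               # model footprint in GB
}

payload = json.dumps(job)  # what a client would send to the orchestrator
```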

02

Classify + Match

Classifier maps workload intent to one of 7 GPU tiers. Matcher resolves each tier to eligible hardware. A 5 GB inference job hits T4/RTX3090/RTX4090. A training job hits A100/H100. No config.
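As a rough sketch, the classify and match steps could look like the function and table below. Tier names and hardware lists are taken from the routing map on this page; the thresholds and code structure are illustrative, not Jungle Grid's actual implementation:

```python
def classify(workload_type, model_size_gb):
    """Map workload intent + model size to one of the 7 routing tiers."""
    if workload_type == "inference":
        if model_size_gb <= 7:
            return "inference-small"
        if model_size_gb <= 30:
            return "inference-medium"
        return "inference-realtime"
    if workload_type in ("training", "fine-tuning"):
        return workload_type
    if workload_type == "batch":
        return "batch-heavy"
    return "general"  # unknown intents fall back to the full pool

# Matcher: tier -> eligible hardware, per the routing map below.
TIER_HARDWARE = {
    "inference-small":    ["T4", "RTX 3090", "RTX 4090"],
    "inference-medium":   ["L4", "A10G"],
    "inference-realtime": ["A100", "H100"],
    "training":           ["A100", "H100"],
    "fine-tuning":        ["L4", "A10G", "A100", "H100"],
    "batch-heavy":        ["A100", "H100", "RTX 4090"],
    "general":            ["T4", "L4", "A10G", "A100", "H100", "RTX 3090", "RTX 4090"],
}
```

So a 5 GB inference job classifies to inference-small and matches T4/RTX 3090/RTX 4090, exactly as described above.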

03

Mixed Hardware Pools

Consumer GPUs and data center cards in the same cluster. Tiers define eligibility. The scheduler routes a fine-tuning job to an L4, A10G, A100, or H100 — whichever scores best.

04

4-Signal Scoring

Every eligible node is scored on price, reliability, latency, and performance with configurable weights. Ties are broken deterministically. Best score wins. VRAM-fit and queue depth signals shipping next.
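A minimal version of this weighted scorer, with a deterministic tie-break on node id, might look like the following. The equal default weights and the higher-is-better normalization are assumptions; the actual formula is not published:

```python
# Assumed default: equal weight per signal, each signal normalized to [0, 1],
# higher is better (so "price" here means price attractiveness, not raw cost).
DEFAULT_WEIGHTS = {"price": 0.25, "reliability": 0.25, "latency": 0.25, "performance": 0.25}

def score(signals, weights=DEFAULT_WEIGHTS):
    """Weighted sum of the four signals for one node."""
    return sum(weights[s] * signals[s] for s in weights)

def pick_best(nodes, weights=DEFAULT_WEIGHTS):
    """nodes: {node_id: signals}. Ties broken deterministically by node id."""
    return max(sorted(nodes), key=lambda nid: score(nodes[nid], weights))
```

Sorting the ids before taking the max makes the tie-break reproducible: equal scores always resolve to the lexicographically first node id.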

Routing Session

What Happens When You Submit a Job

Classifier, matcher, scorer, dispatch. Every stage logged. Nothing hidden.

jungle-grid orchestrator — routing trace
System Profile

What Ships Today

7
GPU Types
T4, L4, A10G, A100, H100, RTX 3090, RTX 4090 — one unified pool
7
Routing Tiers
inference-small, inference-medium, inference-realtime, training, fine-tuning, batch-heavy, general
4
Scoring Signals
Each node scored on price, reliability, latency, performance. Weights are configurable per deployment.
40+
Tests Passing
Classifier and matcher covered by 40+ table-driven tests. Boundaries, unknowns, negatives.
GPU Tiers

Hardware-Aware, Human-Invisible

Users never see GPU names. The matcher resolves tiers to hardware behind the scenes. Here is what the routing map looks like.

Inference (3 sub-tiers)
Small (0-7 GB)
T4 · 16 GB
RTX 3090 · 24 GB
RTX 4090 · 24 GB
Medium (7-30 GB)
L4 · 24 GB
A10G · 24 GB
Realtime (30+ GB)
A100 · 80 GB
H100 · 80 GB
Training (High VRAM)
Full Training
A100 · 80 GB
H100 · 80 GB
Fine-Tuning
L4 · 24 GB
A10G · 24 GB
A100 · 80 GB
H100 · 80 GB
Eligible Workloads: training, fine-tuning
Batch Processing (Throughput)
Batch Heavy
A100 · 80 GB
H100 · 80 GB
RTX 4090 · 24 GB
Eligible Workloads: batch, offline processing, high throughput
General Fallback

Unknown workload types route to the full 7-GPU pool. The scorer decides placement — no job is ever dropped.

Architecture

The Routing Pipeline

From job submission to GPU execution — every stage is observable, scorable, and fault-tolerant.

User / API / CLI / SDK
Orchestrator
Classifier
Matcher
VRAM Filter
Scorer
Scheduler Engine
Node Agent 1
T4 · 16 GB
Node Agent 2
A100 · 80 GB
Node Agent 3
H100 · 80 GB
Node Agent N
RTX 4090 · 24 GB
Heartbeat · Failover · Queue Depth · Thermal State
Roadmap

What We Are Building

Real milestones from the engineering roadmap. No vapor. Every item maps to code.

Workload Intent API

Classifier + Matcher shipped. 7 tiers, 40+ tests passing. Users submit intent, not hardware.

Memory Fit Checker

VRAM-fit pre-filter ensures no model is dispatched to a GPU with insufficient memory.
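One plausible sketch of such a pre-filter, assuming a simple headroom multiplier for runtime overhead beyond the weights (the real checker's rules are not published):

```python
# VRAM per GPU type, from the routing map on this page.
GPU_VRAM_GB = {"T4": 16, "L4": 24, "A10G": 24, "RTX 3090": 24,
               "RTX 4090": 24, "A100": 80, "H100": 80}

def vram_fit(eligible_gpus, model_size_gb, headroom=1.2):
    """Keep only GPUs whose VRAM covers the model plus runtime headroom.

    The 1.2x headroom factor is an assumption: activations and caches need
    memory beyond the weights themselves.
    """
    need = model_size_gb * headroom
    return [g for g in eligible_gpus if GPU_VRAM_GB[g] >= need]
```

Running before dispatch, an empty result would reject the job up front instead of letting it OOM on a node at runtime.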

Queue Depth Scoring

Wire QueueDepth into scheduler scoring so jobs avoid overloaded nodes automatically.

Provider Abstraction

GPUProvider interface so adding new GPU hardware is a single registration, not a code change.

Per-Job Optimization

optimize_for: "cost" | "speed" | "balanced" — per-job weight profiles for the scorer.
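The roadmap names only the three profile values; the weight numbers below are illustrative guesses at how such profiles might reweight the four scoring signals:

```python
# Hypothetical per-job weight profiles for the planned optimize_for option.
# Each profile redistributes the same four scoring signals; values sum to 1.0.
PROFILES = {
    "cost":     {"price": 0.55, "reliability": 0.15, "latency": 0.15, "performance": 0.15},
    "speed":    {"price": 0.10, "reliability": 0.15, "latency": 0.35, "performance": 0.40},
    "balanced": {"price": 0.25, "reliability": 0.25, "latency": 0.25, "performance": 0.25},
}

def weights_for(optimize_for="balanced"):
    # Unknown values fall back to the balanced profile rather than failing.
    return PROFILES.get(optimize_for, PROFILES["balanced"])
```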

Thermal Throttling

Node agents report thermal state. Scheduler excludes or deprioritizes throttled hardware.

Ready to Stop Fighting GPUs?

Jungle Grid is in private beta and under active development. Follow along as we ship the routing engine, or request early access.