Routing Engine — Active

Describe the Work. We Pick the GPU.

Submit your workload type and model size. Jungle Grid classifies it, scores every eligible node, and dispatches to the right GPU automatically.

jungle-grid — terminal
The Problem

You Are the Scheduler

Right now, you pick GPUs by hand, guess at VRAM, and discover mismatches at runtime. That is the job Jungle Grid replaces.

Without Jungle Grid
✗ Manually specifying gpu_type on every job. Guessing whether it fits.
✗ OOM at runtime because a 70B model got sent to a 16 GB T4. The job is dead. Start over.
✗ A100s sitting idle while T4s queue 12 deep. No load awareness. No rebalancing.
✗ Node goes offline. Job goes with it. No heartbeat check. No automatic requeue.
✗ Locked to one GPU type per pool. Consumer and data center cards can’t share work.
With Jungle Grid
✓ Submit workload_type=inference, model_size=7. Classifier maps to tier. Matcher resolves hardware. Done.
✓ VRAM-fit pre-filter blocks impossible placements before dispatch. [shipping next]
✓ 4-signal scorer evaluates every node on price, reliability, latency, and performance. Best node wins.
✓ Failover manager detects stale heartbeats, marks nodes offline, requeues affected jobs automatically.
✓ T4, L4, A10G, A100, H100, RTX 3090, RTX 4090 — all in one pool. Tiers decide eligibility.
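The stale-heartbeat failover described above can be sketched as a periodic sweep. All names here (Node, sweep, the 30-second timeout) are illustrative, not Jungle Grid's internals:

```python
import time

HEARTBEAT_TIMEOUT = 30.0  # seconds of silence before a node is marked offline (assumed value)

class Node:
    def __init__(self, node_id):
        self.node_id = node_id
        self.last_heartbeat = time.monotonic()
        self.online = True
        self.jobs = []

    def beat(self):
        # Called whenever the node agent reports in.
        self.last_heartbeat = time.monotonic()

def sweep(nodes, queue, now=None):
    """Mark nodes with stale heartbeats offline and requeue their jobs."""
    now = time.monotonic() if now is None else now
    for node in nodes:
        if node.online and now - node.last_heartbeat > HEARTBEAT_TIMEOUT:
            node.online = False
            queue.extend(node.jobs)  # automatic requeue: the job survives the node
            node.jobs.clear()
```

A scheduler would run this sweep on a timer; jobs pushed back onto the queue re-enter the normal classify/match/score path.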
How It Works

Intent In. GPU Out.

Every job moves through four stages. You control the first. The system handles the rest.

01

Declare Intent

Submit workload_type and model_size. No GPU names. The API accepts inference, training, fine-tuning, or batch — and resolves the rest.
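A submission under this scheme might look like the sketch below. The two field names mirror the copy above; the transport and endpoint are deliberately omitted, since no public SDK is documented here:

```python
import json

# Hypothetical job submission payload. Only intent is declared —
# workload_type and model_size — never a GPU name.
job = {
    "workload_type": "inference",  # inference | training | fine-tuning | batch
    "model_size": 7,               # model footprint in GB
}

payload = json.dumps(job)  # what a client would send to the orchestrator
```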

02

Classify + Match

Classifier maps workload intent to one of 7 GPU tiers. Matcher resolves each tier to eligible hardware. A 5 GB inference job hits T4/RTX3090/RTX4090. A training job hits A100/H100. No config.
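As a rough sketch, the classify and match steps could look like the function and table below. Tier names and hardware lists are taken from the routing map on this page; the thresholds and code structure are illustrative, not Jungle Grid's actual implementation:

```python
def classify(workload_type, model_size_gb):
    """Map workload intent + model size to one of the 7 routing tiers."""
    if workload_type == "inference":
        if model_size_gb <= 7:
            return "inference-small"
        if model_size_gb <= 30:
            return "inference-medium"
        return "inference-realtime"
    if workload_type in ("training", "fine-tuning"):
        return workload_type
    if workload_type == "batch":
        return "batch-heavy"
    return "general"  # unknown intents fall back to the full pool

# Matcher: tier -> eligible hardware, per the routing map below.
TIER_HARDWARE = {
    "inference-small":    ["T4", "RTX 3090", "RTX 4090"],
    "inference-medium":   ["L4", "A10G"],
    "inference-realtime": ["A100", "H100"],
    "training":           ["A100", "H100"],
    "fine-tuning":        ["L4", "A10G", "A100", "H100"],
    "batch-heavy":        ["A100", "H100", "RTX 4090"],
    "general":            ["T4", "L4", "A10G", "A100", "H100", "RTX 3090", "RTX 4090"],
}
```

So a 5 GB inference job classifies to inference-small and matches T4/RTX 3090/RTX 4090, exactly as described above.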

03

Mixed Hardware Pools

Consumer GPUs and data center cards in the same cluster. Tiers define eligibility. The scheduler routes a fine-tuning job to an L4, A10G, A100, or H100 — whichever scores best.

04

4-Signal Scoring

Every eligible node is scored on price, reliability, latency, and performance with configurable weights. Ties are broken deterministically. Best score wins. VRAM-fit and queue depth signals shipping next.
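A minimal version of this weighted scorer, with a deterministic tie-break on node id, might look like the following. The equal default weights and the higher-is-better normalization are assumptions; the actual formula is not published:

```python
# Assumed default: equal weight per signal, each signal normalized to [0, 1],
# higher is better (so "price" here means price attractiveness, not raw cost).
DEFAULT_WEIGHTS = {"price": 0.25, "reliability": 0.25, "latency": 0.25, "performance": 0.25}

def score(signals, weights=DEFAULT_WEIGHTS):
    """Weighted sum of the four signals for one node."""
    return sum(weights[s] * signals[s] for s in weights)

def pick_best(nodes, weights=DEFAULT_WEIGHTS):
    """nodes: {node_id: signals}. Ties broken deterministically by node id."""
    return max(sorted(nodes), key=lambda nid: score(nodes[nid], weights))
```

Sorting the ids before taking the max makes the tie-break reproducible: equal scores always resolve to the lexicographically first node id.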

Routing Session

What Happens When You Submit a Job

Classifier, matcher, scorer, dispatch. Every stage logged. Nothing hidden.

jungle-grid orchestrator — routing trace
System Profile

What Ships Today

7
GPU Types
T4, L4, A10G, A100, H100, RTX 3090, RTX 4090 — one unified pool
7
Routing Tiers
inference-small, inference-medium, inference-realtime, training, fine-tuning, batch-heavy, general
4
Scoring Signals
Each node scored on price, reliability, latency, performance. Weights are configurable per deployment.
40+
Tests Passing
Classifier and matcher covered by 40+ table-driven tests. Boundaries, unknowns, negatives.
GPU Tiers

Hardware-Aware, Human-Invisible

Users never see GPU names. The matcher resolves tiers to hardware behind the scenes. Here is what the routing map looks like.

Inference (3 sub-tiers)
Small (0-7 GB)
T4 · 16 GB
RTX 3090 · 24 GB
RTX 4090 · 24 GB
Medium (7-30 GB)
L4 · 24 GB
A10G · 24 GB
Realtime (30+ GB)
A100 · 80 GB
H100 · 80 GB
Training (High VRAM)
Full Training
A100 · 80 GB
H100 · 80 GB
Fine-Tuning
L4 · 24 GB
A10G · 24 GB
A100 · 80 GB
H100 · 80 GB
Eligible Workloads: training, fine-tuning
Batch Processing (Throughput)
Batch Heavy
A100 · 80 GB
H100 · 80 GB
RTX 4090 · 24 GB
Eligible Workloads: batch, offline processing, high throughput
General Fallback

Unknown workload types route to the full 7-GPU pool. The scorer decides placement — no job is ever dropped.

Architecture

The Routing Pipeline

From job submission to GPU execution — every stage is observable, scorable, and fault-tolerant.

User / API / CLI / SDK
Orchestrator
Classifier
Matcher
VRAM Filter
Scorer
Scheduler Engine
Node Agent 1
T4 · 16 GB
Node Agent 2
A100 · 80 GB
Node Agent 3
H100 · 80 GB
Node Agent N
RTX 4090 · 24 GB
Heartbeat · Failover · Queue Depth · Thermal State
Roadmap

What We Are Building

Real milestones from the engineering roadmap. No vapor. Every item maps to code.

Workload Intent API

Classifier + Matcher shipped. 7 tiers, 40+ tests passing. Users submit intent, not hardware.

Memory Fit Checker

VRAM-fit pre-filter ensures no model is dispatched to a GPU with insufficient memory.
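One plausible sketch of such a pre-filter, assuming a simple headroom multiplier for runtime overhead beyond the weights (the real checker's rules are not published):

```python
# VRAM per GPU type, from the routing map on this page.
GPU_VRAM_GB = {"T4": 16, "L4": 24, "A10G": 24, "RTX 3090": 24,
               "RTX 4090": 24, "A100": 80, "H100": 80}

def vram_fit(eligible_gpus, model_size_gb, headroom=1.2):
    """Keep only GPUs whose VRAM covers the model plus runtime headroom.

    The 1.2x headroom factor is an assumption: activations and caches need
    memory beyond the weights themselves.
    """
    need = model_size_gb * headroom
    return [g for g in eligible_gpus if GPU_VRAM_GB[g] >= need]
```

Running before dispatch, an empty result would reject the job up front instead of letting it OOM on a node at runtime.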

Queue Depth Scoring

Wire QueueDepth into scheduler scoring so jobs avoid overloaded nodes automatically.

Provider Abstraction

GPUProvider interface so adding new GPU hardware is a single registration, not a code change.

Per-Job Optimization

optimize_for: "cost" | "speed" | "balanced" — per-job weight profiles for the scorer.
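The roadmap names only the three profile values; the weight numbers below are illustrative guesses at how such profiles might reweight the four scoring signals:

```python
# Hypothetical per-job weight profiles for the planned optimize_for option.
# Each profile redistributes the same four scoring signals; values sum to 1.0.
PROFILES = {
    "cost":     {"price": 0.55, "reliability": 0.15, "latency": 0.15, "performance": 0.15},
    "speed":    {"price": 0.10, "reliability": 0.15, "latency": 0.35, "performance": 0.40},
    "balanced": {"price": 0.25, "reliability": 0.25, "latency": 0.25, "performance": 0.25},
}

def weights_for(optimize_for="balanced"):
    # Unknown values fall back to the balanced profile rather than failing.
    return PROFILES.get(optimize_for, PROFILES["balanced"])
```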

Thermal Throttling

Node agents report thermal state. Scheduler excludes or deprioritizes throttled hardware.

Ready to Stop Fighting GPUs?

Jungle Grid is in private beta and under active development. Follow along as we ship the routing engine, or request early access.