Let Your GPUs Work For You, Not Wait on I/O

Feeding GPU clusters at wire rate with RDMA + GPUDirect Storage to lift utilization and reduce cost.

NorthFS builds GPU‑native distributed file systems and a distributed KV cache so training and inference aren’t bottlenecked by storage. We remove slow TCP data paths, kernel copies, and object‑store latency with an RDMA‑first design and zero‑copy reads into GPU memory.

View Products
Business

What We Do & Who We Serve

Problem

Modern GPU jobs frequently sit idle waiting for data. Typical data paths introduce latency, tail stalls, and poor cache locality. At scale, this burns budget and pushes training and inference past their SLAs.

  • Object‑store round‑trips and small‑file storms throttle throughput.
  • Network jitter and head‑of‑line blocking create long tail I/O.

Solution

NorthFS is an RDMA‑first, GDS‑enabled file system with asynchronous prefetching and admission control. We bypass kernel copies, stripe data across NVMe, and read directly into GPU memory buffers; the GDS read path is shown in the example after the list below.

  • RDMA (RoCEv2/InfiniBand) kernel‑bypass data plane.
  • GPUDirect Storage (GDS) zero‑copy path from NVMe into GPU memory, bypassing CPU bounce buffers.
  • Adaptive prefetch + multi‑rail striping for high throughput.
  • Strong, globally consistent metadata via FoundationDB.
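
Here is a minimal sketch of that zero‑copy read using NVIDIA's KvikIO Python bindings for cuFile; the file path and buffer size are illustrative, and the NorthFS client itself is not shown:

    # Illustrative GPUDirect Storage read via NVIDIA's KvikIO bindings for cuFile.
    # The path and buffer size are placeholders for this sketch.
    import cupy
    import kvikio

    buf = cupy.empty(256 * 1024 * 1024, dtype=cupy.uint8)  # destination buffer in GPU memory

    f = kvikio.CuFile("/mnt/data/shard-000.bin", "r")  # opened through cuFile, not the page cache
    n = f.read(buf)                                    # DMA from NVMe straight into GPU memory
    f.close()

    print(f"read {n} bytes with no CPU bounce buffer")

The data lands in the CuPy buffer without being staged in host memory, which is the property the NorthFS read path is built around.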

Target Customers

Enterprise AI/ML platform teams and HPC orgs operating 10–1000+ GPUs in cloud, on‑prem, or hybrid environments. Drop‑in for training, inference, and data preprocessing pipelines.

  • AI platform teams running LLMs and vision models.
  • Data engineering teams struggling with small‑file I/O.
  • HPC workloads that need consistently low tail latency from storage.

Products

Solutions

NorthFS (GPU‑Native Distributed File System)

An RDMA‑ and GDS‑optimized file system with a POSIX‑style interface. Runs across cloud and on‑prem clusters, with S3/HDFS connectors and Python/CLI SDKs; a usage example follows below.

  • Client: user‑space FS with kernel‑bypass I/O.
  • Storage: NVMe striping, erasure coding, tiering.
  • Control plane: FoundationDB‑backed metadata + leases.
  • Integrations: PyTorch, TensorFlow, Spark, Ray.

Stage: Private Beta (design partners onboarding).
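
Because the client exposes a POSIX‑style namespace, a stock PyTorch input pipeline can point at a NorthFS mount with no changes beyond the path. A minimal sketch, assuming a hypothetical mount point /mnt/northfs and a .npy shard layout:

    # Hypothetical example: a plain PyTorch dataset reading training shards from a
    # NorthFS mount. The mount point and shard layout are assumptions for illustration.
    import glob

    import numpy as np
    import torch
    from torch.utils.data import DataLoader, Dataset

    class ShardDataset(Dataset):
        def __init__(self, root: str):
            self.paths = sorted(glob.glob(f"{root}/shards/*.npy"))

        def __len__(self) -> int:
            return len(self.paths)

        def __getitem__(self, idx: int) -> torch.Tensor:
            # An ordinary POSIX read; striping and prefetch happen underneath the mount.
            return torch.from_numpy(np.load(self.paths[idx]))

    loader = DataLoader(ShardDataset("/mnt/northfs/datasets/imagenet"),
                        batch_size=8, num_workers=4, pin_memory=True)

    for batch in loader:
        pass  # training step goes here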

Blackbird (Distributed KV Cache for GPU Inference)

Low‑latency, multi‑node KV cache that serves attention KV blocks across GPUs with near‑local access. Built for vLLM/SGLang‑style inference; an illustrative client flow appears below.

  • RDMA transport, rack‑aware sharding, background compaction.
  • Admission control & prefetch for long‑context requests.
  • Metrics & tracing for tail‑latency SLOs.

Stage: Alpha (prototype under active development).
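
This sketch assumes a hypothetical Python package and API; blackbird, KVCacheClient, prefill, and decode are illustrative names, not a published interface:

    # Hypothetical Blackbird client sketch: every module, class, and function name
    # here is a placeholder that illustrates the intended flow, not a real API.
    import hashlib

    from blackbird import KVCacheClient          # hypothetical package and class

    cache = KVCacheClient(endpoints=["node-a:7000", "node-b:7000"])  # hypothetical endpoints

    prompt = "You are a support agent for Acme. Answer the customer's question."
    prefix_key = hashlib.sha256(prompt.encode()).hexdigest()

    blocks = cache.get(prefix_key)               # RDMA fetch of remote KV blocks, if cached
    if blocks is None:
        blocks = prefill(prompt)                 # hypothetical: the serving engine's prefill step
        cache.put(prefix_key, blocks)            # publish so other replicas can reuse the prefix
    output = decode(prompt, blocks)              # hypothetical: continue decoding with warm KV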

Tooling & SDKs

Python & CLI integration for quick adoption in training/inference pipelines with minimal code change; an illustrative SDK sketch appears below.

  • Dataset ingest, warmup, and cache‑management tools.
  • Observability hooks (I/O, network, cache hit‑rates).

Stage: Beta.
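
A sketch of what warmup and cache inspection could look like from the Python SDK. Every name here (the northfs package, Client, warmup, cache_stats) is a hypothetical placeholder for illustration:

    # Hypothetical SDK sketch: the package, class, and method names are placeholders
    # that illustrate the intended ergonomics of the ingest/warmup/observability tools.
    from northfs import Client                        # hypothetical package and class

    client = Client("northfs://cluster-a")            # hypothetical connection string

    client.warmup("/datasets/imagenet", replicas=2)   # pre-stage a dataset into the cache tier
    stats = client.cache_stats("/datasets/imagenet")  # observability hook
    print(stats.hit_rate, stats.bytes_served)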

Architecture

Core Capabilities

NorthFS

RDMA + GDS optimized file system purpose‑built for GPUs. Deployable across cloud, on‑prem, or hybrid.

Blackbird KV Cache

Serve KV blocks across GPU nodes with near‑local latency. Works with frameworks like vLLM and SGLang.

Metadata via FoundationDB

Globally consistent metadata layer for coordination, leasing, and scale‑out.
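
As a concrete illustration of the leasing pattern, here is a minimal sketch using the standard FoundationDB Python bindings. The key layout, value encoding, and TTL are assumptions for the sketch, not NorthFS's actual metadata schema:

    # Illustrative lease acquisition on top of FoundationDB's Python bindings.
    # The "lease/<path>" key layout and "owner|expiry" value encoding are assumptions.
    import time

    import fdb

    fdb.api_version(710)   # pick the version matching the installed client
    db = fdb.open()        # uses the default cluster file

    @fdb.transactional
    def acquire_lease(tr, path: str, owner: str, ttl_s: float = 10.0) -> bool:
        # One serializable transaction: take the lease only if it is absent or expired.
        key = b"lease/" + path.encode()
        cur = tr[key]
        now = time.time()
        if cur.present():
            holder, expires = cur.decode("utf-8").split("|")
            if float(expires) > now and holder != owner:
                return False   # someone else still holds a live lease
        tr[key] = f"{owner}|{now + ttl_s}".encode()
        return True

    if acquire_lease(db, "/datasets/imagenet/part-0001", "client-42"):
        print("lease held; safe to mutate this file's metadata")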

Python & CLI Integration

Plug into training/inference pipelines with minimal code change.

Use Cases

  • LLM Inference
  • Distributed Training
  • Multi‑Node RL
  • Feature & ETL Pipelines

Team

Founders

Arnav Balyan

Founder, Systems Engineering

Ex-Uber; distributed systems & AI. Contributor to Apache Spark, Velox, Gluten, and Pinot. Author of peer‑reviewed papers in systems, big data, and AI.

Aditya Sohini

Founder, Distributed Systems

Ex-Uber distributed systems engineer who builds on Hive/Spark and cloud platforms. Focused on large‑scale data infrastructure and reliability.

Contact

Get In Touch

Interested in the private beta or a design partnership?

hello@northstar.run