Predator - Release Notes

Version 1.0.0

Release Date: June 2025
Status: General Availability (GA)

First stable release of Predator, a scalable model-inference service built around NVIDIA Triton Inference Server and part of BharatMLStack. It serves deep-learning and tree-based models with low latency on Kubernetes, integrates with OnFS and Interflow, and is accessed by clients through the Helix client over gRPC.
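
Triton loads each model from a model repository with a fixed, versioned directory layout (in this deployment, pulled into the pod by the Init Container). A minimal sketch — the model and file names here are hypothetical:

```text
model_repository/
└── demand_model/           # hypothetical model name
    ├── config.pbtxt        # per-model configuration (backend, batching, etc.)
    ├── 1/
    │   └── model.onnx      # version 1 artifact
    └── 2/
        └── model.onnx      # version 2; versions served per version_policy
```

Each numbered subdirectory is one model version; which versions Triton serves is controlled by the model's version policy in `config.pbtxt`.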

What's New

  • Triton inference engine: Unified runtime for DL and tree-based models on CPU/GPU; model repository via Init Container from GCS; gRPC API via Helix client.
  • Multi-backend support: TensorRT, PyTorch, ONNX Runtime, TensorFlow, Python, FIL, DALI, Custom.
  • Dynamic batching & concurrency: Configurable via config.pbtxt; model versioning and ensembles.
  • Kubernetes deployment: Helm-based; Init Container + Triton container; custom Triton images from Artifact Registry; health probes; CPU/GPU autoscaling.
  • Observability: Prometheus metrics, Grafana; warmup requests for cold-start avoidance.
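
Dynamic batching, versioning, and warmup are all driven by each model's `config.pbtxt`. An illustrative sketch — the model name, tensor shapes, and tuning values below are hypothetical examples, not defaults:

```protobuf
name: "demand_model"            # hypothetical model name
platform: "onnxruntime_onnx"    # one of the supported backends
max_batch_size: 32

input [
  { name: "features", data_type: TYPE_FP32, dims: [ 64 ] }
]
output [
  { name: "score", data_type: TYPE_FP32, dims: [ 1 ] }
]

# Batch concurrent requests server-side, waiting at most 500 us to fill a batch
dynamic_batching {
  preferred_batch_size: [ 8, 16, 32 ]
  max_queue_delay_microseconds: 500
}

# Two CPU instances of this model serve requests concurrently
instance_group [
  { count: 2, kind: KIND_CPU }
]

# Keep the two most recent versions live
version_policy: { latest: { num_versions: 2 } }

# Synthetic request issued at model load time to avoid cold-start latency
model_warmup [
  {
    name: "warmup_sample"
    batch_size: 1
    inputs {
      key: "features"
      value: { data_type: TYPE_FP32, dims: [ 64 ], zero_data: true }
    }
  }
]
```

`preferred_batch_size` and `max_queue_delay_microseconds` trade a small amount of per-request latency for higher throughput; `model_warmup` runs at load time so the first real request does not pay backend initialization cost.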