Inferflow - Release Notes

Version 1.0.0

Release Date: June 2025
Status: General Availability (GA)

We're excited to announce the first stable release of Inferflow — a graph-driven feature retrieval and model inference orchestration engine, part of BharatMLStack.


What's New

Config-Driven DAG Executor

  • No-code feature retrieval: Onboard new models with config changes only — no custom code required
  • DAG topology execution: Define component dependency graphs; Kahn's algorithm orders them topologically, and components with no pending dependencies run concurrently
  • Hot reload: Model configurations stored in etcd are watched and reloaded live — no redeployment needed
  • DAG caching: Topologies are cached using Murmur3 hashing with Ristretto for minimal overhead
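
As an illustration of the DAG topology execution described above, here is a minimal sketch of Kahn's algorithm that groups components into levels. The function name `topoLevels`, the map-based graph representation, and the component names in `main` are assumptions made for this example; the release notes do not show Inferflow's internal data structures.

```go
package main

import (
	"errors"
	"fmt"
	"sort"
)

// topoLevels groups the nodes of a dependency graph into levels using Kahn's
// algorithm. deps maps each component name to the components it depends on.
// Every node in a level depends only on nodes in earlier levels, so nodes in
// the same level could be executed concurrently.
func topoLevels(deps map[string][]string) ([][]string, error) {
	inDegree := make(map[string]int)
	dependents := make(map[string][]string)
	for node, parents := range deps {
		if _, ok := inDegree[node]; !ok {
			inDegree[node] = 0
		}
		for _, p := range parents {
			if _, ok := inDegree[p]; !ok {
				inDegree[p] = 0
			}
			inDegree[node]++
			dependents[p] = append(dependents[p], node)
		}
	}

	// Start with every node that has no dependencies.
	var current []string
	for node, d := range inDegree {
		if d == 0 {
			current = append(current, node)
		}
	}

	var levels [][]string
	visited := 0
	for len(current) > 0 {
		sort.Strings(current) // deterministic order, only for readability
		levels = append(levels, current)
		visited += len(current)
		var next []string
		for _, node := range current {
			for _, child := range dependents[node] {
				inDegree[child]--
				if inDegree[child] == 0 {
					next = append(next, child)
				}
			}
		}
		current = next
	}
	// Nodes left unvisited still have unmet dependencies, which means a cycle.
	if visited != len(inDegree) {
		return nil, errors.New("dependency graph contains a cycle")
	}
	return levels, nil
}

func main() {
	// Hypothetical component names, loosely modeled on the built-in types.
	levels, err := topoLevels(map[string][]string{
		"feature_init": {},
		"user_features": {"feature_init"},
		"item_features": {"feature_init"},
		"predator":      {"user_features", "item_features"},
	})
	fmt.Println(levels, err)
}
```

In this sketch the two feature components land in the same level, so an executor could fetch them in parallel before the scoring step runs.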

Multi-Pattern Inference APIs

The Predict API exposes three structured inference patterns:

| API | Pattern | Use Case |
|---|---|---|
| InferPointWise | Score each target independently | CTR prediction, fraud scoring |
| InferPairWise | Score pairs of targets | Preference learning, comparison ranking |
| InferSlateWise | Score groups of targets together | Whole-page optimization, diversity-aware ranking |

The entity-based RetrieveModelScore API is also available for direct feature retrieval and scoring.

Component System

Four built-in component types:

  • FeatureInitComponent — Initializes the shared ComponentMatrix
  • FeatureComponent — Fetches features from the Online Feature Store (OnFS)
  • PredatorComponent — Calls model serving endpoints with percentage-based traffic routing
  • NumerixComponent — Calls compute engine for operations like reranking
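
To make the percentage-based traffic routing mentioned for PredatorComponent more concrete, the following sketch maps a random roll onto cumulative percentage ranges. The `endpoint` type, the `pickEndpoint` function, and the endpoint names are hypothetical; the actual routing configuration format is not described in these notes.

```go
package main

import (
	"fmt"
	"math/rand"
)

// endpoint is a hypothetical routing entry: a model-serving target and the
// percentage of traffic it should receive.
type endpoint struct {
	Name    string
	Percent int
}

// pickEndpoint maps a roll in [0, 100) onto the cumulative percentage ranges
// of the endpoints. The percentages are assumed to sum to 100.
func pickEndpoint(endpoints []endpoint, roll int) string {
	cumulative := 0
	for _, e := range endpoints {
		cumulative += e.Percent
		if roll < cumulative {
			return e.Name
		}
	}
	return "" // reached only if the percentages do not cover the roll
}

func main() {
	routes := []endpoint{{"model-v1", 90}, {"model-v2", 10}}
	counts := map[string]int{}
	for i := 0; i < 10000; i++ {
		counts[pickEndpoint(routes, rand.Intn(100))]++
	}
	// Roughly 90% of the calls should go to model-v1.
	fmt.Println(counts)
}
```

Taking the roll as a parameter keeps the selection logic deterministic and easy to test; only the caller decides where the randomness comes from.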

Online Feature Store Integration

  • gRPC-based feature retrieval via FeatureService.RetrieveFeatures
  • Batched retrieval with configurable batch size and deadline
  • Token-based authentication
  • Dynamic key resolution from the ComponentMatrix
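
Batched retrieval with a configurable batch size and deadline could look roughly like the sketch below. `fetchFunc` is a hypothetical stand-in for one `FeatureService.RetrieveFeatures` call, and the result type is simplified to a map of floats; neither reflects the real OnFS client or its message types.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// chunk splits keys into consecutive batches of at most size elements.
func chunk(keys []string, size int) [][]string {
	if size <= 0 {
		return nil
	}
	var batches [][]string
	for start := 0; start < len(keys); start += size {
		end := start + size
		if end > len(keys) {
			end = len(keys)
		}
		batches = append(batches, keys[start:end])
	}
	return batches
}

// fetchFunc is a hypothetical stand-in for a single feature store call.
type fetchFunc func(ctx context.Context, keys []string) (map[string]float64, error)

// retrieveInBatches calls fetch once per batch, giving each call its own
// deadline, and merges the results.
func retrieveInBatches(ctx context.Context, keys []string, batchSize int,
	deadline time.Duration, fetch fetchFunc) (map[string]float64, error) {
	out := make(map[string]float64, len(keys))
	for _, batch := range chunk(keys, batchSize) {
		batchCtx, cancel := context.WithTimeout(ctx, deadline)
		res, err := fetch(batchCtx, batch)
		cancel() // release the timer as soon as the call returns
		if err != nil {
			return nil, err
		}
		for k, v := range res {
			out[k] = v
		}
	}
	return out, nil
}

func main() {
	// Fake fetch that returns the key length as the "feature value".
	fake := func(_ context.Context, keys []string) (map[string]float64, error) {
		res := make(map[string]float64, len(keys))
		for _, k := range keys {
			res[k] = float64(len(k))
		}
		return res, nil
	}
	keys := []string{"u1", "u2", "u3", "u4", "u5"}
	out, err := retrieveInBatches(context.Background(), keys, 2, 50*time.Millisecond, fake)
	fmt.Println(len(out), err)
}
```

The batches run sequentially here to keep the example short; a real client might issue them in parallel.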

In-Memory Feature Caching

  • Optional per-component caching to reduce OnFS load
  • Configurable TTL per component
  • Zero-GC-overhead cache option (freecache)
  • Cache hit/miss metrics
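
The idea of a per-component cache with a TTL and hit/miss counters can be sketched with the standard library alone. This is not freecache and makes no zero-GC claims; `ttlCache` and its methods are hypothetical names used only to illustrate the behavior.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

type entry struct {
	value     []byte
	expiresAt time.Time
}

// ttlCache is a hypothetical, mutex-guarded cache with one TTL for all
// entries and simple hit/miss counters.
type ttlCache struct {
	mu     sync.Mutex
	ttl    time.Duration
	items  map[string]entry
	hits   int
	misses int
}

func newTTLCache(ttl time.Duration) *ttlCache {
	return &ttlCache{ttl: ttl, items: make(map[string]entry)}
}

// Set stores value under key and stamps it with the expiry time.
func (c *ttlCache) Set(key string, value []byte) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.items[key] = entry{value: value, expiresAt: time.Now().Add(c.ttl)}
}

// Get returns the cached value if it exists and has not expired.
// Expired entries are evicted lazily and counted as misses.
func (c *ttlCache) Get(key string) ([]byte, bool) {
	c.mu.Lock()
	defer c.mu.Unlock()
	e, ok := c.items[key]
	if !ok || !time.Now().Before(e.expiresAt) {
		delete(c.items, key)
		c.misses++
		return nil, false
	}
	c.hits++
	return e.value, true
}

func main() {
	cache := newTTLCache(time.Minute)
	cache.Set("user:42", []byte{0x01, 0x02})
	_, hit := cache.Get("user:42")
	_, miss := cache.Get("user:43")
	fmt.Println(hit, miss, cache.hits, cache.misses)
}
```

A component would consult such a cache before calling OnFS and emit the two counters as metrics.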

Inference Logging

  • Async logging to Kafka for model monitoring and debugging
  • Three serialization formats: Proto, Arrow, Parquet
  • Configurable sampling rate and feature selection
  • Batched log message grouping

Performance

Built in Go

Inferflow is written entirely in Go, delivering:

  • ~80% lower memory usage compared to equivalent Java services
  • Lower CPU utilization
  • Faster, more efficient deployments

Concurrency

  • DAG components at the same level execute concurrently in goroutines
  • Feature retrieval parallelized across entity types
  • Connection pooling for all external gRPC calls
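
Level-by-level concurrent execution can be sketched with goroutines and a `sync.WaitGroup`. The `component` type and the `runLevel` and `runLevels` functions are hypothetical; they only illustrate the pattern of running one level in parallel and waiting before the next level starts.

```go
package main

import (
	"fmt"
	"sync"
)

// component is a hypothetical unit of work within one DAG level.
type component func() error

// runLevel starts every component of a level in its own goroutine, waits for
// all of them to finish, and returns the first error observed, if any.
func runLevel(level []component) error {
	var (
		wg       sync.WaitGroup
		once     sync.Once
		firstErr error
	)
	for _, c := range level {
		wg.Add(1)
		go func(run component) {
			defer wg.Done()
			if err := run(); err != nil {
				once.Do(func() { firstErr = err })
			}
		}(c)
	}
	wg.Wait()
	return firstErr
}

// runLevels executes the levels in order; a level starts only after every
// component of the previous level has finished.
func runLevels(levels [][]component) error {
	for _, level := range levels {
		if err := runLevel(level); err != nil {
			return err
		}
	}
	return nil
}

func main() {
	say := func(name string) component {
		return func() error {
			fmt.Println("running", name)
			return nil
		}
	}
	err := runLevels([][]component{
		{say("user_features"), say("item_features")}, // may print in either order
		{say("predator")},                            // always prints last
	})
	fmt.Println("done:", err)
}
```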

Serialization

  • gRPC with Proto3 for all APIs
  • Binary feature encoding in the ComponentMatrix
  • Configurable compression for Kafka logging (ZSTD support)

APIs & Protocols

gRPC API

Inferflow Service:

service Inferflow {
  rpc RetrieveModelScore(InferflowRequestProto) returns (InferflowResponseProto);
}

Predict Service:

service PredictService {
  rpc InferPointWise(PredictRequest) returns (PredictResponse);
  rpc InferPairWise(PredictRequest) returns (PredictResponse);
  rpc InferSlateWise(PredictRequest) returns (PredictResponse);
}

Data Types Supported

| Type | Variants |
|---|---|
| Integers | int8, int16, int32, int64 |
| Floats | float8 (e4m3, e5m2), float16, float32, float64 |
| Strings | Variable length |
| Booleans | Bit-packed |
| Vectors | All scalar types |
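
For readers unfamiliar with bit-packed booleans, the sketch below stores eight values per byte. The bit order (least significant bit first) and the function names are assumptions for illustration; the actual wire layout used by the ComponentMatrix is not specified in these notes.

```go
package main

import "fmt"

// packBools stores 8 booleans per byte, least significant bit first.
// The bit order is an assumption made for this example.
func packBools(values []bool) []byte {
	packed := make([]byte, (len(values)+7)/8)
	for i, v := range values {
		if v {
			packed[i/8] |= 1 << (uint(i) % 8)
		}
	}
	return packed
}

// unpackBools reads n booleans back out of packed.
// The caller must ensure that packed holds at least n bits.
func unpackBools(packed []byte, n int) []bool {
	values := make([]bool, n)
	for i := range values {
		values[i] = packed[i/8]&(1<<(uint(i)%8)) != 0
	}
	return values
}

func main() {
	flags := []bool{true, false, true, true, false, false, false, false, true}
	packed := packBools(flags)
	fmt.Printf("%d bools -> %d bytes: %08b\n", len(flags), len(packed), packed)
	fmt.Println(unpackBools(packed, len(flags)))
}
```

Nine booleans fit in two bytes here, compared with nine bytes for a plain `[]bool`.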

Enterprise Features

Production Readiness

  • Health checks: HTTP health endpoints via cmux
  • Graceful shutdown: Clean resource cleanup
  • Structured logging: JSON-formatted logs via zerolog
  • Signal handling: SIGTERM/SIGINT support for container environments
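
Signal handling and graceful shutdown in a Go service commonly follow the pattern sketched below, using `signal.NotifyContext` from the standard library. The `shutdown` helper and the cleanup steps are hypothetical and do not reflect Inferflow's actual shutdown sequence; the two-second timeout exists only so the demo exits without a signal.

```go
package main

import (
	"context"
	"fmt"
	"os/signal"
	"syscall"
	"time"
)

// shutdown blocks until done is closed, then runs the cleanup steps in
// reverse registration order, similar to deferred calls.
func shutdown(done <-chan struct{}, cleanups ...func()) {
	<-done
	for i := len(cleanups) - 1; i >= 0; i-- {
		cleanups[i]()
	}
}

func main() {
	// The context is cancelled when SIGTERM or SIGINT arrives.
	ctx, stop := signal.NotifyContext(context.Background(), syscall.SIGTERM, syscall.SIGINT)
	defer stop()

	// Demo only: also cancel after two seconds so the example terminates.
	ctx, cancel := context.WithTimeout(ctx, 2*time.Second)
	defer cancel()

	fmt.Println("serving; press Ctrl+C or wait two seconds")
	shutdown(ctx.Done(),
		func() { fmt.Println("closing gRPC connections") }, // runs second
		func() { fmt.Println("flushing log producer") },    // runs first
	)
	fmt.Println("shutdown complete")
}
```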

Monitoring & Observability

  • StatsD / Telegraf integration: Request rates, latencies, error rates
  • Per-component metrics: Execution time, feature counts, cache hit rates
  • External API metrics: OnFS, Predator, Numerix call tracking
  • Kafka logging metrics: Messages sent, errors

Configuration Management

  • etcd-based: All model configs stored in etcd
  • Watch & reload: Live config updates without restart
  • Multi-model support: Multiple model_config_id entries served concurrently
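
One common way to apply live config updates without blocking request handlers is to swap an immutable snapshot atomically. The sketch below assumes a watch callback that delivers the full set of configs; `modelConfig`, `configStore`, and `Reload` are hypothetical names, and the etcd client itself is left out. It uses `atomic.Pointer`, which is available from Go 1.19.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// modelConfig is a hypothetical, heavily simplified model configuration.
type modelConfig struct {
	ID      string
	Version int
}

// configStore holds an immutable snapshot of all model configs, keyed by
// model_config_id. Readers never take a lock; a reload replaces the snapshot.
type configStore struct {
	snapshot atomic.Pointer[map[string]modelConfig]
}

func newConfigStore() *configStore {
	s := &configStore{}
	empty := map[string]modelConfig{}
	s.snapshot.Store(&empty)
	return s
}

// Reload would be invoked from the config watch callback. It copies the
// input so that later changes by the caller cannot affect readers.
func (s *configStore) Reload(configs map[string]modelConfig) {
	copied := make(map[string]modelConfig, len(configs))
	for id, c := range configs {
		copied[id] = c
	}
	s.snapshot.Store(&copied)
}

// Get looks up one model config in the current snapshot.
func (s *configStore) Get(id string) (modelConfig, bool) {
	c, ok := (*s.snapshot.Load())[id]
	return c, ok
}

func main() {
	store := newConfigStore()
	store.Reload(map[string]modelConfig{"ranker": {ID: "ranker", Version: 1}})
	fmt.Println(store.Get("ranker"))

	// Simulated watch event: a new version arrives and is swapped in.
	store.Reload(map[string]modelConfig{"ranker": {ID: "ranker", Version: 2}})
	fmt.Println(store.Get("ranker"))
}
```

Requests that already loaded the old snapshot finish with it, while new requests see the updated configs.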

Deployment

Container Support

  • Docker image: Multi-stage build (Go Alpine builder + Debian runtime)
  • Optional Kafka: librdkafka support via build flag
  • Static binary: Single binary deployment

Supported Environments

  • Kubernetes (K8s)
  • Google Kubernetes Engine (GKE)
  • Amazon EKS

Compatibility

Supported Go Versions

  • Minimum: Go 1.19
  • Recommended: Go 1.24+

External Dependencies

| Service | Version | Protocol |
|---|---|---|
| etcd | 3.5+ | gRPC |
| Online Feature Store (OnFS) | 1.0+ | gRPC |
| Predator (Helix) | 1.0+ | gRPC |
| Numerix | 1.0+ | gRPC |
| Kafka | 2.0+ | TCP |

Download & Installation

Source Code

git clone https://github.com/Meesho/BharatMLStack.git
cd BharatMLStack/inferflow

Build

go build -o inferflow-server cmd/inferflow/main.go

Docker

docker build -t inferflow:latest .

Contributing

We welcome contributions from the community! Please see our Contributing Guide for details on how to get started.

License

BharatMLStack is open-source software licensed under the BharatMLStack Business Source License 1.1.


Built with ❤️ for the ML community from Meesho
If you find this useful, ⭐️ the repo — your support means the world to us!