Inferflow - Configuration Guide

Inferflow is fully config-driven. All model onboarding, feature retrieval logic, DAG topology, and inference behavior are controlled through configuration stored in etcd — with zero code changes required.


Configuration Overview

Inferflow configuration is organized into two layers:

  1. Static config — Environment variables loaded at startup (via Viper)
  2. Dynamic config — Model configurations stored in etcd, hot-reloaded on change

Static Configuration (Environment Variables)

These are set at deployment time and require a restart to change.

Server

| Variable | Description | Example |
|---|---|---|
| APP_PORT | gRPC/HTTP server port | 50051 |
| APP_ENV | Environment name | production |

etcd

| Variable | Description | Example |
|---|---|---|
| ETCD_ENDPOINTS | Comma-separated etcd endpoints | etcd-0:2379,etcd-1:2379 |
| ETCD_DIAL_TIMEOUT | Connection timeout | 5s |
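
Several static settings are comma-separated lists or durations such as 5s and 200ms. The sketch below shows one way such values could be parsed. It is an illustration only: Inferflow loads these through Viper, and the helper names `parse_duration` and `load_etcd_settings` as well as the set of accepted units are assumptions, not part of Inferflow.

```python
import os
import re

# Assumed unit table: only the units that appear in this guide, plus minutes.
_UNIT_SECONDS = {"ms": 0.001, "s": 1.0, "m": 60.0}


def parse_duration(text: str) -> float:
    """Convert a duration string such as '5s' or '200ms' to seconds."""
    match = re.fullmatch(r"(\d+(?:\.\d+)?)(ms|s|m)", text.strip())
    if match is None:
        raise ValueError(f"unrecognized duration: {text!r}")
    value, unit = match.groups()
    return float(value) * _UNIT_SECONDS[unit]


def load_etcd_settings(env=os.environ) -> dict:
    """Read the etcd variables from a mapping (the process environment by default)."""
    endpoints = [e.strip() for e in env["ETCD_ENDPOINTS"].split(",") if e.strip()]
    if not endpoints:
        raise ValueError("ETCD_ENDPOINTS must list at least one endpoint")
    return {
        "endpoints": endpoints,
        "dial_timeout_s": parse_duration(env["ETCD_DIAL_TIMEOUT"]),
    }
```

Passing a plain dict instead of `os.environ` makes the parsing easy to check without touching the real environment.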

Online Feature Store (OnFS)

| Variable | Description | Example |
|---|---|---|
| externalServiceOnFs_host | OnFS gRPC host | onfs-api:50051 |
| externalServiceOnFs_callerId | Caller ID for auth | inferflow |
| externalServiceOnFs_callerToken | Caller token for auth | <token> |
| externalServiceOnFs_batchSize | Batch size for feature retrieval | 100 |
| externalServiceOnFs_deadline | Request deadline | 200ms |

Predator (Model Serving)

| Variable | Description | Example |
|---|---|---|
| externalServicePredator_defaultDeadline | Default inference deadline | 100ms |

Numerix (Compute Engine)

| Variable | Description | Example |
|---|---|---|
| numerixClientV1_host | Numerix gRPC host | numerix:50052 |
| numerixClientV1_deadline | Request deadline | 100ms |

Kafka (Inference Logging)

| Variable | Description | Example |
|---|---|---|
| KafkaBootstrapServers | Kafka broker addresses | kafka-0:9092,kafka-1:9092 |
| KafkaLoggingTopic | Topic for inference logs | inferflow-logs |

Metrics (StatsD / Telegraf)

| Variable | Description | Example |
|---|---|---|
| TELEGRAF_HOST | StatsD host | telegraf |
| TELEGRAF_PORT | StatsD port | 8125 |

In-Memory Cache

| Variable | Description | Example |
|---|---|---|
| CACHE_SIZE_MB | Cache size in MB | 512 |
| CACHE_TYPE | Cache implementation | freecache |

Dynamic Configuration (etcd Model Config)

Model configurations are stored in etcd and hot-reloaded. Each model is identified by a model_config_id.

Config Structure

{
  "model_config_id_example": {
    "dag_execution_config": {
      "component_dependency": {
        "feature_initializer": ["fs_user", "fs_product"],
        "fs_user": ["ranker_model"],
        "fs_product": ["ranker_model"],
        "ranker_model": []
      }
    },
    "component_config": {
      "feature_component_config": {
        "fs_user": { ... },
        "fs_product": { ... }
      },
      "predator_component_config": {
        "ranker_model": { ... }
      },
      "numerix_component_config": {},
      "cache_enabled": true,
      "cache_version": "v1",
      "cache_ttl": 300,
      "error_logging_percent": 10
    },
    "response_config": {
      "features": ["ranker_model:score"],
      "model_schema_perc": 100,
      "logging_perc": 5,
      "log_features": ["fs_user:profile:age", "ranker_model:score"],
      "log_batch_size": 100
    }
  }
}
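
To make the hot-reload idea concrete, here is a minimal sketch of an in-process holder that validates an updated document and then swaps it in. It is not Inferflow's implementation: the class `ModelConfigStore`, its methods, and the rule that all three top-level sections must be present are assumptions for illustration. A real watcher callback would pass in the value read from etcd.

```python
import json
import threading

# Assumed required sections, taken from the example structure above.
REQUIRED_SECTIONS = {"dag_execution_config", "component_config", "response_config"}


class ModelConfigStore:
    """Thread-safe holder for parsed model configs, keyed by model_config_id."""

    def __init__(self):
        self._lock = threading.Lock()
        self._configs = {}

    def apply_update(self, raw_json: str) -> list:
        """Parse and validate a document, then publish it in one step."""
        parsed = json.loads(raw_json)
        for model_config_id, config in parsed.items():
            missing = REQUIRED_SECTIONS - set(config)
            if missing:
                raise ValueError(f"{model_config_id}: missing {sorted(missing)}")
        with self._lock:
            # Build a new dict so readers never see a half-applied update.
            self._configs = {**self._configs, **parsed}
        return sorted(parsed)

    def get(self, model_config_id: str) -> dict:
        with self._lock:
            return self._configs[model_config_id]
```

Validating before the swap means a malformed update is rejected and the previously loaded configuration keeps serving.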

DAG Execution Config

Defines the component dependency graph. Each key is a component, and its array lists the child components that run after it.

{
  "component_dependency": {
    "<parent_component>": ["<child_1>", "<child_2>"],
    "<child_1>": ["<grandchild>"],
    "<child_2>": ["<grandchild>"],
    "<grandchild>": []
  }
}

Rules:

  • The graph must be a valid DAG (no cycles)
  • Components with no parents (zero in-degree) execute first
  • Components with empty dependency arrays [] are leaf nodes
  • All component names must match registered components in the ComponentConfig
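
The rules above can be checked mechanically. The following sketch uses Kahn's algorithm to reject cycles and to produce one valid execution order. The function name `execution_order` is hypothetical, and the check that every child also appears as a key in `component_dependency` is an assumption standing in for the registered-component rule.

```python
from collections import deque


def execution_order(component_dependency: dict) -> list:
    """Return a parent-before-child order, or raise ValueError on a bad graph."""
    # Count incoming edges (parents) for every declared component.
    in_degree = {name: 0 for name in component_dependency}
    for parent, children in component_dependency.items():
        for child in children:
            if child not in in_degree:
                raise ValueError(f"{parent!r} references unknown component {child!r}")
            in_degree[child] += 1

    # Components with zero in-degree have no parents and can start first.
    ready = deque(name for name, degree in in_degree.items() if degree == 0)
    order = []
    while ready:
        name = ready.popleft()
        order.append(name)
        for child in component_dependency[name]:
            in_degree[child] -= 1
            if in_degree[child] == 0:
                ready.append(child)

    # Any component never released still waits on a parent inside a cycle.
    if len(order) != len(in_degree):
        raise ValueError("component_dependency contains a cycle")
    return order
```

For the example graph in Config Structure, this yields feature_initializer, fs_user, fs_product, ranker_model.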

Feature Component Config

Configures how features are fetched from the Online Feature Store.

{
  "fs_user": {
    "fs_keys": {
      "schema": ["user_id"],
      "col": "context:user:user_id"
    },
    "fs_request": {
      "entity_label": "user",
      "feature_groups": [
        {
          "label": "demographics",
          "feature_labels": ["age", "location", "income_bracket"]
        },
        {
          "label": "behavior",
          "feature_labels": ["click_rate", "purchase_freq"]
        }
      ]
    },
    "fs_flatten_resp_keys": ["user_id"],
    "col_name_prefix": "user",
    "comp_cache_enabled": true,
    "comp_cache_ttl": 600,
    "composite_id": false
  }
}

| Field | Description |
|---|---|
| fs_keys | How to extract lookup keys from the matrix. schema defines key column names; col references a matrix column |
| fs_request | OnFS query: entity label + feature groups with specific features |
| fs_flatten_resp_keys | Keys to flatten in response mapping |
| col_name_prefix | Prefix for matrix column names (e.g., user:demographics:age) |
| comp_cache_enabled | Enable in-memory caching for this component |
| comp_cache_ttl | Cache TTL in seconds |
| composite_id | Whether entity keys are composite |
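
The example column user:demographics:age suggests that matrix column names follow a prefix:group:feature pattern. Assuming that pattern holds, the sketch below lists the columns a feature component would add to the matrix. The function name `feature_columns` is hypothetical, and the naming rule is inferred from the example rather than confirmed.

```python
def feature_columns(component: dict) -> list:
    """List matrix column names as '<col_name_prefix>:<group label>:<feature label>'."""
    prefix = component["col_name_prefix"]
    columns = []
    for group in component["fs_request"]["feature_groups"]:
        for feature in group["feature_labels"]:
            columns.append(f"{prefix}:{group['label']}:{feature}")
    return columns


# The fs_user component from this section, reduced to the fields used here.
fs_user = {
    "col_name_prefix": "user",
    "fs_request": {
        "entity_label": "user",
        "feature_groups": [
            {"label": "demographics", "feature_labels": ["age", "location", "income_bracket"]},
            {"label": "behavior", "feature_labels": ["click_rate", "purchase_freq"]},
        ],
    },
}

for column in feature_columns(fs_user):
    print(column)
```

A list like this is handy for checking that every key in a model's inputs.feature_map refers to a column that some feature component actually produces.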

Predator Component Config

Configures model inference endpoints.

{
  "ranker_model": {
    "model_name": "product_ranker_v3",
    "model_endpoint": "predator-ranker:8080",
    "model_end_points": {
      "predator-ranker-v3:8080": 80,
      "predator-ranker-v4:8080": 20
    },
    "deadline": 100,
    "batch_size": 50,
    "calibration": {
      "enabled": false
    },
    "inputs": {
      "feature_map": {
        "user:demographics:age": "INT32",
        "user:behavior:click_rate": "FP32",
        "product:attributes:category_id": "INT32"
      }
    },
    "outputs": {
      "score_columns": ["score", "confidence"]
    },
    "slate_component": false
  }
}

| Field | Description |
|---|---|
| model_name | Model identifier on the serving platform |
| model_endpoint | Primary model serving endpoint |
| model_end_points | Multiple endpoints with percentage-based traffic routing |
| deadline | Inference timeout in milliseconds |
| batch_size | Max items per inference batch |
| calibration | Score calibration settings |
| inputs.feature_map | Map of matrix column → data type for model input |
| outputs.score_columns | Column names for model output scores |
| slate_component | If true, runs per-slate inference |
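
Percentage-based routing over model_end_points can be pictured as a weighted choice. The sketch below is one simple way to do it and is not taken from Inferflow's source: the function `pick_endpoint` is hypothetical, and it treats the configured numbers as relative weights, so they do not have to sum to exactly 100.

```python
import random


def pick_endpoint(model_end_points: dict, u: float) -> str:
    """Choose an endpoint for a draw u in [0, 1), proportionally to its weight."""
    if not 0.0 <= u < 1.0:
        raise ValueError("u must be in [0, 1)")
    total = sum(model_end_points.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    threshold = u * total
    cumulative = 0
    for endpoint, weight in model_end_points.items():
        cumulative += weight
        if threshold < cumulative:
            return endpoint
    # Unreachable for valid input; kept as a guard against rounding surprises.
    return endpoint


weights = {"predator-ranker-v3:8080": 80, "predator-ranker-v4:8080": 20}
chosen = pick_endpoint(weights, random.random())
```

Taking the random draw as a parameter keeps the function deterministic, which makes the traffic split straightforward to test.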

Numerix Component Config

Configures compute operations (e.g., reranking).

{
  "reranker": {
    "score_column": "final_score",
    "data_type": "FP32",
    "score_mapping": {
      "ranker_model:score": "FP32",
      "user:behavior:click_rate": "FP32"
    },
    "compute_id": "diversity_rerank_v1",
    "slate_component": false
  }
}

| Field | Description |
|---|---|
| score_column | Output column name for the computed score |
| data_type | Output data type |
| score_mapping | Map of matrix columns to include as compute inputs |
| compute_id | Identifies the compute operation on Numerix |
| slate_component | If true, runs per-slate compute |

Response Config

Controls what data is returned to the client and what is logged.

{
  "features": ["ranker_model:score", "reranker:final_score"],
  "model_schema_perc": 100,
  "logging_perc": 5,
  "log_features": [
    "user:demographics:age",
    "ranker_model:score",
    "reranker:final_score"
  ],
  "log_batch_size": 100
}

| Field | Description |
|---|---|
| features | Matrix columns to include in the gRPC response |
| model_schema_perc | Percentage of requests that include full schema in response |
| logging_perc | Percentage of requests to send to Kafka for logging |
| log_features | Specific features to include in log messages |
| log_batch_size | Batch size for grouped log messages |
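
The percentage fields and log_batch_size can be read as per-request sampling followed by chunking. The sketch below illustrates that reading under stated assumptions: `should_sample` and `batch_rows` are hypothetical helpers, and Inferflow may sample and group log rows differently.

```python
import random


def should_sample(percent: float, rng: random.Random) -> bool:
    """Return True for roughly `percent` percent of calls (0 never, 100 always)."""
    return rng.random() < percent / 100.0


def batch_rows(rows: list, log_batch_size: int) -> list:
    """Split rows into consecutive chunks of at most log_batch_size items."""
    if log_batch_size <= 0:
        raise ValueError("log_batch_size must be positive")
    return [rows[i:i + log_batch_size] for i in range(0, len(rows), log_batch_size)]


rng = random.Random(7)  # fixed seed so the example is repeatable
response_config = {"logging_perc": 5, "log_batch_size": 100}

if should_sample(response_config["logging_perc"], rng):
    rows = [{"row": i} for i in range(250)]
    batches = batch_rows(rows, response_config["log_batch_size"])
    # Each batch would become one grouped log message.
```

With logging_perc set to 5, about one request in twenty is logged, and a 250-row request would be split into batches of 100, 100, and 50 rows.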

Service-Level Config

Global settings that apply across all models.

{
  "v2_logging_type": "proto",
  "compression_enabled": false
}

| Field | Values | Description |
|---|---|---|
| v2_logging_type | proto, arrow, parquet | Serialization format for Kafka inference logs |
| compression_enabled | true, false | Enable compression for log messages |

Example: Onboarding a New Model

To onboard a new ranking model, update the etcd config:

Step 1: Define the feature retrieval graph

"component_dependency": {
  "feature_initializer": ["fs_user", "fs_product", "fs_user_x_category"],
  "fs_product": ["fs_user_x_category"],
  "fs_user": ["new_ranker"],
  "fs_user_x_category": ["new_ranker"],
  "new_ranker": []
}

Here fs_user_x_category depends on fs_product because it needs the category ID extracted from the product entity to resolve the user x category key.
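
To see what the extra fs_product edge does, the graph can be grouped into stages in which every component's parents have already finished. This is a sketch for reasoning about the topology only. The function `execution_stages` is hypothetical, it assumes the graph has already been checked for cycles, and Inferflow's actual scheduler may start a component as soon as its own parents complete rather than waiting for a whole stage.

```python
def execution_stages(component_dependency: dict) -> list:
    """Group components into stages; each stage depends only on earlier stages."""
    # Count parents for every component, including ones that appear only as children.
    in_degree = {name: 0 for name in component_dependency}
    for children in component_dependency.values():
        for child in children:
            in_degree[child] = in_degree.get(child, 0) + 1

    stage = [name for name, degree in in_degree.items() if degree == 0]
    stages = []
    while stage:
        stages.append(sorted(stage))
        next_stage = []
        for name in stage:
            for child in component_dependency.get(name, []):
                in_degree[child] -= 1
                if in_degree[child] == 0:
                    next_stage.append(child)
        stage = next_stage
    return stages


onboarding_graph = {
    "feature_initializer": ["fs_user", "fs_product", "fs_user_x_category"],
    "fs_product": ["fs_user_x_category"],
    "fs_user": ["new_ranker"],
    "fs_user_x_category": ["new_ranker"],
    "new_ranker": [],
}

for index, names in enumerate(execution_stages(onboarding_graph)):
    print(index, names)
```

For this graph, fs_user and fs_product share a stage, fs_user_x_category lands one stage later because it waits on fs_product, and new_ranker runs last.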

Step 2: Configure each component (feature groups, model endpoints, etc.)

Step 3: Push the config to etcd — Inferflow picks it up automatically via watchers.

No code changes. No redeployment. The new model is live.


Contributing

We welcome contributions from the community! Please see our Contributing Guide for details on how to get started.

License

BharatMLStack is open-source software licensed under the BharatMLStack Business Source License 1.1.


Built with ❤️ for the ML community from Meesho