
Inference Store

The Inference Store captures the production data that flows through your models and supports rich metadata queries over that data and its associated inferences.

How It Works

A core component of the Chariot platform is the model. Chariot supports models of various task types spanning computer vision, natural language processing, and tabular data. To serve a model and expose an endpoint for it, Chariot lets users create Inference Servers with a variety of settings. One of these settings is the ability to store inferences.

In the simplest form of model serving, a request to a model will look like:

Roughly: client -> inference request -> inference server -> inference response -> client

When inference storage is turned on, an intermediate proxy is inserted to further process and save inference request inputs and outputs.

Roughly: client -> inference request -> inference-proxy -> inference server -> inference-proxy -> inference response -> client

The inference proxy service is the component that ultimately informs the Inference Store that an inference event has taken place.

Inference Request

A standard inference request in Chariot resembles the following:

{
  "inputs": [
    {
      "data": [
        "<base64 encoded data>"
      ],
      "datatype": "BYTES",
      "name": "try-it-out",
      "parameters": {
        "action": "predict"
      },
      "shape": [height, width, channels]
    }
  ]
}
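
For illustration, this request can be sent with any HTTP client. The sketch below uses Python's requests library; the endpoint URL, input file, and shape values are placeholder assumptions, not values prescribed by Chariot.

import base64
import requests

# Assumption: substitute the inference endpoint shown for your model in Chariot.
INFERENCE_URL = "https://example-chariot-host/api/infer/my-model"

# Base64-encode the raw input, matching the "<base64 encoded data>" field above.
with open("frame.jpg", "rb") as f:
    encoded = base64.b64encode(f.read()).decode("ascii")

payload = {
    "inputs": [
        {
            "data": [encoded],
            "datatype": "BYTES",
            "name": "try-it-out",
            "parameters": {"action": "predict"},
            "shape": [1080, 1920, 3],  # assumption: a 1080x1920 RGB image
        }
    ]
}

response = requests.post(INFERENCE_URL, json=payload, timeout=30)
response.raise_for_status()
print(response.json())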

The data field holds the data that the Inference Server will run inference on. The parameters field holds the action that the Inference Server should perform on the data (usually dependent on task type). The parameters field can also carry optional user-defined metadata, passed as a JSON-encoded string of key/type/value entries, that flows through to the Inference Store. An example is shown below (the line breaks inside the metadata string are for readability only):

{
  "inputs": [
    {
      "data": [
        "<base64 encoded data>"
      ],
      "datatype": "BYTES",
      "name": "try-it-out",
      "parameters": {
        "action": "predict",
        "metadata": "[
          {\"key\": \"latitude\", \"type\": \"float\", \"value\": \"-32.1\"},
          {\"key\": \"longitude\", \"type\": \"float\", \"value\": \"-43.2\"},
          {\"key\": \"data_source\", \"type\": \"string\", \"value\": \"hallway-camera-feed\"},
          {\"key\": \"organization\", \"type\": \"string\", \"value\": \"striveworks\"},
          {\"key\": \"group\", \"type\": \"string\", \"value\": \"team-remediation\"},
          {\"key\": \"frame_number\", \"type\": \"int\", \"value\": \"987\"},
          {\"key\": \"camera_position\", \"type\": \"json\", \"value\": \"{\\\"angle\\\": \\\"10.5\\\", \\\"tilt\\\": \\\"1.6\\\"}\"}
        ]"
      },
      "shape": [height, width, channels]
    }
  ]
}
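
Because the metadata travels as a JSON-encoded string inside parameters, it is easiest to build it as native Python objects and serialize it once. Here is a minimal sketch using the field names from the example above; note the double encoding of the json-typed value:

import json

# Each entry is a key/type/value triple; values are strings, matching the example.
metadata = [
    {"key": "latitude", "type": "float", "value": "-32.1"},
    {"key": "longitude", "type": "float", "value": "-43.2"},
    {"key": "data_source", "type": "string", "value": "hallway-camera-feed"},
    {"key": "frame_number", "type": "int", "value": "987"},
    # A json-typed value is itself a JSON-encoded string.
    {"key": "camera_position", "type": "json",
     "value": json.dumps({"angle": "10.5", "tilt": "1.6"})},
]

parameters = {
    "action": "predict",
    # The whole metadata list is serialized into a single string field.
    "metadata": json.dumps(metadata),
}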

This inference will then be queryable by any of the metadata key-value pairs that have been specified.
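
For example, the stored inference above could later be retrieved by filtering on any of those keys. The endpoint path and filter payload below are hypothetical, shown only to convey the shape of such a query; consult the querying documentation linked under Next Steps for the actual API.

import requests

# Hypothetical endpoint and payload; see the Inference Store Swagger docs
# for the real query API.
QUERY_URL = "https://example-chariot-host/api/inference-store/v1/query"

filters = {
    "metadata_filters": [
        {"key": "data_source", "type": "string", "value": "hallway-camera-feed"},
        {"key": "frame_number", "type": "int", "value": "987"},
    ]
}

matches = requests.post(QUERY_URL, json=filters, timeout=30)
print(matches.json())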

Inference Proxy

Inference requests are directed to the inference proxy when inference storage is turned on. The inference proxy introduces three core stages: preprocessing, predict, and postprocessing. A minimal sketch of the full flow follows the stage descriptions below.

The preprocessing stage includes the following steps:

  • The inference request is accepted from the client
  • The inference data (image, text, tabular row) is stripped from the request and placed in a temporary buffer
  • User-defined metadata is stripped from the request and placed in a temporary buffer

The predict stage includes the following steps:

  • The inference request is forwarded to the Inference Server and a response is awaited

The postprocessing stage includes the following steps:

  • The data hash is computed and attached
  • The data, metadata, and inference response are stored in the Inference Store
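
To make the three stages concrete, here is a minimal sketch of the proxy's flow in Python. The SHA-256 hash and the store_inference helper are assumptions for illustration; the real proxy runs as a standalone service, and its hashing and storage details are not specified here.

import hashlib
import json
import requests

def store_inference(**record) -> None:
    # Stand-in for the Inference Store write; a real deployment persists
    # the record instead of printing it.
    print("stored:", sorted(record))

def proxy_inference(request_body: dict, inference_url: str) -> dict:
    # Preprocessing: pull the data and user-defined metadata out of the
    # request and hold them in temporary buffers.
    inputs = request_body["inputs"][0]
    buffered_data = inputs["data"]
    buffered_metadata = inputs["parameters"].get("metadata", "[]")

    # Predict: forward the request to the Inference Server and await the response.
    response = requests.post(inference_url, json=request_body, timeout=30)
    response.raise_for_status()
    inference_response = response.json()

    # Postprocessing: compute the data hash (SHA-256 is an assumption) and
    # store the data, metadata, and inference response.
    data_hash = hashlib.sha256("".join(buffered_data).encode()).hexdigest()
    store_inference(
        data=buffered_data,
        metadata=json.loads(buffered_metadata),
        response=inference_response,
        data_hash=data_hash,
    )
    return inference_response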

Next Steps

  • Enable Inference Store: Learn how to turn on inference storage for your models
  • Understanding Metadata: Work with metadata in the Inference Store
  • Querying the Inference Store: Filter and retrieve inferences
  • Dataset Curation: Create and curate datasets from inference data
  • Check out additional documentation:
    • The Chariot SDK's Inference Store documentation can be found at https://%%CHARIOT-HOST%%/docs/sdk_api_docs/chariot.inference_store.html
    • The Inference Store Swagger API documentation can be found at https://%%CHARIOT-HOST%%/api/inference-store/v1/swagger/index.html