# Inference Store
The inference store is a collection of services that listen for inference events, index their data, enable querying of inferences, and manage inference retention policies. It stores production data flowing through your models and supports rich, metadata-driven querying of that data and its associated inferences.
## How It Works
A core component of the Chariot platform is a model. Chariot supports models of various task types spanning computer vision, natural language processing, and tabular data. To serve these models, that is, to provide an endpoint for accessing them, Chariot lets users create inference servers with a variety of settings. One of these settings is the ability to store inferences.
In the simplest form of model serving, a request to a model will look like:
Roughly: client -> inference request -> inference server -> inference response -> client
When inference storage is turned on, an intermediate proxy is inserted to further process and save inference request inputs and outputs.
Roughly: client -> inference request -> inference-proxy -> inference server -> inference-proxy -> inference response -> client
The inference proxy service is the component that ultimately informs the inference store that an inference event has taken place.
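The proxy itself is internal to Chariot, but the interposition pattern is easy to picture. The sketch below is purely illustrative, not the platform's actual implementation or configuration (the listen address, upstream URL, and route are made-up placeholders): a reverse proxy wraps the inference server and gains hooks before the request is forwarded and after the response comes back.
```go
package main

import (
    "log"
    "net/http"
    "net/http/httputil"
    "net/url"
)

func main() {
    // Placeholder address for the backing inference server; the real routing
    // is handled inside the platform and is not configured like this.
    upstream, err := url.Parse("http://inference-server:8080")
    if err != nil {
        log.Fatal(err)
    }

    proxy := httputil.NewSingleHostReverseProxy(upstream)

    // Runs after the inference server responds: this is where a proxy of this
    // kind can capture the inference response before returning it to the client.
    proxy.ModifyResponse = func(resp *http.Response) error {
        log.Printf("inference server responded: %s", resp.Status)
        return nil
    }

    http.HandleFunc("/infer", func(w http.ResponseWriter, r *http.Request) {
        // Runs before forwarding: the request body (data and metadata) can be
        // inspected or buffered here.
        log.Printf("received inference request: %s %s", r.Method, r.URL.Path)
        proxy.ServeHTTP(w, r)
        // After ServeHTTP returns, artifacts can be uploaded and an inference
        // event published (the post-processing stage described later).
    })

    log.Fatal(http.ListenAndServe(":9090", nil))
}
```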
## Inference Request
A standard inference request in Chariot resembles the following:
```json
{
  "inputs": [
    {
      "data": [
        "<base64 encoded data>"
      ],
      "datatype": "BYTES",
      "name": "try-it-out",
      "parameters": {
        "action": "predict"
      },
      "shape": [height, width, channels]
    }
  ]
}
```
The `data` field holds the inference data that the inference server will run inference on. The `parameters` field holds the action the inference server should perform on the data (usually dependent on task type). The `parameters` field can also carry optional user-defined metadata that flows through to the inference store. An example is shown below:
```json
{
  "inputs": [
    {
      "data": [
        "<base64 encoded data>"
      ],
      "datatype": "BYTES",
      "name": "try-it-out",
      "parameters": {
        "action": "predict",
        "metadata": "[
          {\"key\": \"latitude\", \"type\": \"float\", \"value\": \"-32.1\"},
          {\"key\": \"longitude\", \"type\": \"float\", \"value\": \"-43.2\"},
          {\"key\": \"data_source\", \"type\": \"string\", \"value\": \"hallway-camera-feed\"},
          {\"key\": \"organization\", \"type\": \"string\", \"value\": \"striveworks\"},
          {\"key\": \"group\", \"type\": \"string\", \"value\": \"team-remediation\"},
          {\"key\": \"frame_number\", \"type\": \"int\", \"value\": \"987\"},
          {\"key\": \"camera_position\", \"type\": \"json\", \"value\": \"{\\\"angle\\\": \\\"10.5\\\", \\\"tilt\\\": \\\"1.6\\\"}\"}
        ]"
      },
      "shape": [height, width, channels]
    }
  ]
}
```
This inference will then be queryable by any of the metadata key-value pairs that have been specified.
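For illustration only, a client could assemble such a request as in the sketch below. This is not an official Chariot client: the endpoint URL, file name, and shape values are placeholders, and it simply mirrors the request body shown above, including encoding the metadata list as a JSON string under `parameters`.
```go
package main

import (
    "bytes"
    "encoding/base64"
    "encoding/json"
    "fmt"
    "net/http"
    "os"
)

// Input mirrors a single entry of the "inputs" array shown above.
type Input struct {
    Data       []string          `json:"data"`
    Datatype   string            `json:"datatype"`
    Name       string            `json:"name"`
    Parameters map[string]string `json:"parameters"`
    Shape      []int             `json:"shape"`
}

// InferenceRequest mirrors the top-level request body.
type InferenceRequest struct {
    Inputs []Input `json:"inputs"`
}

func main() {
    // Load and base64-encode the raw data (placeholder file name).
    raw, err := os.ReadFile("frame.jpg")
    if err != nil {
        panic(err)
    }

    // User-defined metadata, sent as a JSON-encoded string under parameters.metadata.
    metadata, err := json.Marshal([]map[string]string{
        {"key": "latitude", "type": "float", "value": "-32.1"},
        {"key": "data_source", "type": "string", "value": "hallway-camera-feed"},
    })
    if err != nil {
        panic(err)
    }

    body, err := json.Marshal(InferenceRequest{
        Inputs: []Input{{
            Data:     []string{base64.StdEncoding.EncodeToString(raw)},
            Datatype: "BYTES",
            Name:     "try-it-out",
            Parameters: map[string]string{
                "action":   "predict",
                "metadata": string(metadata),
            },
            Shape: []int{1080, 1920, 3}, // height, width, channels (example values)
        }},
    })
    if err != nil {
        panic(err)
    }

    // Placeholder endpoint; the real inference server URL comes from Chariot.
    resp, err := http.Post("https://chariot.example.com/infer", "application/json", bytes.NewReader(body))
    if err != nil {
        panic(err)
    }
    defer resp.Body.Close()
    fmt.Println("status:", resp.Status)
}
```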
## Inference Proxy
Inference requests are directed to the inference proxy when inference storage is turned on. The inference proxy introduces three core stages: pre-processing, predict, and post-processing.
The pre-processing stage includes the following steps:
- The inference request is accepted from the client
- The inference data (image, text, tabular row) is stripped from the request and placed in a temporary buffer
- User-defined metadata is stripped from the request and placed in a temporary buffer
The predict stage includes the following steps:
- Forward the inference request to the inference server and await a response
The post-processing stage, sketched in code after this list, includes the following steps:
- Compute and attach the data hash
- Upload the data to blob storage
- Upload the metadata to blob storage
- Upload the inference response to blob storage
- Publish an inference event
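The hashing, key layout, and publishing details are internal to the platform. Purely as an illustration of the post-processing steps listed above, a sketch might look like the following, where `BlobStore`, the key format, and the choice of SHA-256 are assumptions rather than documented behavior.
```go
package proxy

import (
    "context"
    "crypto/sha256"
    "encoding/hex"
    "fmt"
)

// BlobStore is a hypothetical stand-in for the platform's internal blob storage client.
type BlobStore interface {
    Upload(ctx context.Context, key string, body []byte) error
}

// postProcess sketches the post-processing steps listed above: hash the inference
// data, upload the data, metadata, and inference response, and return the storage
// keys that would later appear on the published inference event.
func postProcess(ctx context.Context, store BlobStore, inferenceID string, data, metadata, response []byte) (string, string, string, error) {
    // Compute a content hash of the inference data (SHA-256 here is illustrative,
    // not necessarily the algorithm the platform uses).
    sum := sha256.Sum256(data)
    hash := hex.EncodeToString(sum[:])

    // The key layout below is made up for this sketch.
    dataKey := fmt.Sprintf("inferences/%s/data-%s", inferenceID, hash)
    metadataKey := fmt.Sprintf("inferences/%s/metadata.json", inferenceID)
    inferenceKey := fmt.Sprintf("inferences/%s/response.json", inferenceID)

    uploads := map[string][]byte{
        dataKey:      data,
        metadataKey:  metadata,
        inferenceKey: response,
    }
    for key, body := range uploads {
        if err := store.Upload(ctx, key, body); err != nil {
            return "", "", "", fmt.Errorf("upload %s: %w", key, err)
        }
    }
    return dataKey, metadataKey, inferenceKey, nil
}
```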
### Inference Event
The structure of an inference event is detailed below:
```golang
type NewInferenceStorageRequest struct {
    // The id of the model that produced the inference
    ModelID string `json:"model_id"`
    // The id returned by the inference server/process
    InferenceID string `json:"inference_id"`
    // When the inference proxy accepted the inference request
    RequestReceivedAt time.Time `json:"request_received_at"`
    // When the inference proxy forwarded the inference request to the inference server
    RequestForwardedAt time.Time `json:"request_forwarded_at"`
    // When the inference response was returned from the inference server back to the inference proxy
    RequestRespondedAt time.Time `json:"request_responded_at"`
    // When the inference proxy published the inference event
    EventPublishedAt time.Time `json:"event_published_at"`
    // The internal storage url of the data file
    DataStorageKey string `json:"data_storage_key"`
    // The internal storage url of the inference file
    InferenceStorageKey string `json:"inference_storage_key"`
    // The internal storage url of the metadata file
    MetadataStorageKey string `json:"metadata_storage_key"`
}
```
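For reference, marshaling this struct with Go's `encoding/json` produces the field names given in the tags, with the `time.Time` fields rendered in RFC 3339 format. The values in the sketch below are placeholders, and it assumes the struct definition above is in scope.
```go
package main

import (
    "encoding/json"
    "fmt"
    "time"
)

func main() {
    now := time.Now().UTC()

    // Placeholder values; assumes NewInferenceStorageRequest (above) is in scope.
    event := NewInferenceStorageRequest{
        ModelID:             "example-model-id",
        InferenceID:         "example-inference-id",
        RequestReceivedAt:   now.Add(-30 * time.Millisecond),
        RequestForwardedAt:  now.Add(-28 * time.Millisecond),
        RequestRespondedAt:  now.Add(-2 * time.Millisecond),
        EventPublishedAt:    now,
        DataStorageKey:      "inferences/example/data",
        InferenceStorageKey: "inferences/example/response.json",
        MetadataStorageKey:  "inferences/example/metadata.json",
    }

    out, err := json.MarshalIndent(event, "", "  ")
    if err != nil {
        panic(err)
    }
    fmt.Println(string(out))
}
```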