Oracle Data Science Model Deployment

Oracle Data Science Model Deployments are a managed resource in the OCI Data Science service for deploying machine learning models as HTTP endpoints.

Depending on how you intend to consume the predictions, you can deploy models for batch consumption or real-time consumption.

  • For batch consumption, the predictions can be generated on a schedule (e.g., every hour or every day).
  • For real-time consumption, a trigger initiates the process of using the persisted model to serve a prediction. For example, deciding whether a transaction is fraudulent when a payment is initiated requires a real-time prediction.

Deploying machine learning models as HTTP web endpoints, serving predictions in real time, is the most common way that models are productionized.

The client application requests predictions from the model deployed as an HTTP endpoint using API calls. Deployed models come from the list of candidate models in the model catalog. The service supports models running in a Python runtime environment, and their dependencies can be packaged in a conda environment. Model deployments can also be integrated with the OCI Logging service; this optional integration allows you to emit logs from a model and then inspect those logs.

Key Components

  • Load Balancer: Distributes traffic from one entry point to multiple model servers.
  • VM Instances Pool: Each instance hosts a copy of the model server for handling concurrent inference requests.
  • Model Artifact: The actual model file to load and the prediction code used for inference.
  • Conda Environment: Encapsulates all the third-party Python dependencies that the model requires.
  • Logs: Emit logs from the inference code to OCI Logging.

Create and Invoke Model Deployment

A model entry in the model catalog has two components, and a model deployment adds its own configuration (a minimal score.py sketch follows this list).

  1. score.py: Python file with the code to load the model and call it to perform predictions on the request data. It provides the instructions for using the model for inference.
  2. runtime.yaml: Defines the required conda environment to use for the deployment. This YAML file documents the runtime environment of the model and has information such as the model deployment parameters defined in it.
  3. Deployment configuration: Other deployment settings, such as compute shape, number of instances, logs, and so on.
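
As a rough illustration, here is a minimal score.py sketch. The load_model and predict function names follow the standard OCI model artifact template; the pickle file name and the {"data": [...]} payload shape are assumptions for this sketch.

import os
import pickle

MODEL_FILE = "model.pkl"  # assumed artifact file name for this sketch

def load_model():
    """Load the serialized model from the model artifact directory."""
    model_dir = os.path.dirname(os.path.realpath(__file__))
    with open(os.path.join(model_dir, MODEL_FILE), "rb") as f:
        return pickle.load(f)

def predict(data, model=load_model()):
    """Return predictions for the incoming request payload."""
    # The {"data": [...]} shape is an assumption; it must match what
    # clients actually send to the /predict endpoint.
    features = data.get("data")
    return {"prediction": model.predict(features).tolist()}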

Creating a Model Deployment from the UI Console

Creating a Model Deployment from ADS

You can also deploy a model from the model catalog using ADS. The .deploy method of the model deployment class is used to create a model deployment. There are two ways to use the .deploy method: you can create a model deployment properties object and pass it in, or you can define the model deployment properties through the .deploy method's arguments.

from ads.model.deployment import ModelDeployer

deployer = ModelDeployer()
deployment = deployer.deploy(
    model_id="<MODEL_OCID>",
    display_name="Model Deployment Demo using ADS",
    instance_shape="VM.Standard2.1",
    instance_count=1,
    project_id="<PROJECT_OCID>",
    compartment_id="<COMPARTMENT_OCID>",

    # The following are optional
    access_log_group_id="<ACCESS_LOG_GROUP_OCID>",
    access_log_id="<ACCESS_LOG_OCID>",
    predict_log_group_id="<PREDICT_LOG_GROUP_OCID>",
    predict_log_id="<PREDICT_LOG_OCID>"
)
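
Once created, the returned deployment object can be used to exercise the endpoint. This is a hedged sketch: the model_deployment_id, url, and predict members assume a recent ADS release, and the payload shape must match your score.py.

print(deployment.model_deployment_id)  # OCID of the new model deployment
print(deployment.url)                  # HTTP endpoint serving predictions

# Send a test payload once the deployment reaches the active state.
result = deployment.predict(json_input={"data": [[5.1, 3.5, 1.4, 0.2]]})
print(result)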

Creating a Model Deployment from the CLI

You can use the OCI CLI to create a model deployment with the oci data-science model-deployment create command.

oci data-science model-deployment create \
--compartment-id <MODEL_DEPLOYMENT_COMPARTMENT_OCID> \
--model-deployment-configuration-details file://<MODEL_DEPLOYMENT_CONFIGURATION_FILE> \
--project-id <PROJECT_OCID> \
--category-log-details file://<OPTIONAL_LOGGING_CONFIGURATION_FILE> \
--display-name <MODEL_DEPLOYMENT_NAME>

This takes configuration JSON files as shown here, where you define the deployment type and model configuration details.

{
    "deploymentType": "SINGLE_MODEL",
    "modelConfigurationDetails": {
        "bandwidthMbps": <YOUR_BANDWIDTH_SELECTION>,
        "instanceConfiguration": {
            "instanceShapeName": "<YOUR_VM_SHAPE>"
        },
        "modelId": "<YOUR_MODEL_OCID>",
        "scalingPolicy": {
            "instanceCount": <YOUR_INSTANCE_COUNT>,
            "policyType": "FIXED_SIZE"
        }
    }
}

An optional log configuration JSON file can also be provided to enable access and predict logs.

{
    "access": {
        "logGroupId": "<YOUR_LOG_GROUP_OCID>",
        "logId": "<YOUR_LOG_OCID>"
    },
    "predict": {
        "logGroupId": "<YOUR_LOG_GROUP_OCID>",
        "logId": "<YOUR_LOG_OCID>"
    }
}

Generating Predictions with the Deployed Model

Once the deployment is completed and in an active state, it can be invoked to generate predictions on new data. This is done by sending HTTP requests to the endpoint, and the model deployment returns an HTTP response with the predictions.

Invoking Your Model

Invoking a model deployment means that you can pass feature vectors or data samples to the predict endpoint, and the model returns predictions for those data samples.

You can use the sample code from the model deployment details page, which enables you to invoke the model endpoint using the OCI CLI, as shown below.
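
For example, a signed request can be sent with the general-purpose oci raw-request command. The endpoint below is a placeholder; copy the real invoke URI from the deployment details page, and match the request body to what your score.py expects.

oci raw-request \
--http-method POST \
--target-uri <MODEL_DEPLOYMENT_URL>/predict \
--request-body '{"data": [[5.1, 3.5, 1.4, 0.2]]}'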

Alternatively, you can use the OCI Python SDK or the Java SDK to invoke the model with the provided code samples. You can also check the detailed instructions on the model deployment page in the OCI Console itself. A Python SDK sketch follows.
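
As a sketch with the OCI Python SDK, you can sign a plain HTTPS request to the /predict endpoint. The endpoint URI and payload shape here are placeholders and assumptions, not fixed values.

import oci
import requests

# Placeholder; copy the real invoke URI from the model deployment page.
endpoint = "<MODEL_DEPLOYMENT_URL>/predict"

# Build a request signer from the default ~/.oci/config profile.
config = oci.config.from_file()
signer = oci.signer.Signer(
    tenancy=config["tenancy"],
    user=config["user"],
    fingerprint=config["fingerprint"],
    private_key_file_location=config["key_file"],
)

# The payload shape must match what the deployed score.py expects (assumed).
payload = {"data": [[5.1, 3.5, 1.4, 0.2]]}
response = requests.post(endpoint, json=payload, auth=signer)
print(response.json())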

Managing a Model Deployment

You can view, edit, and manage your model deployments.

  • View deployment details: OCID, compute configuration, who created the deployment, and when the model was deployed
  • Logs: Links to the prediction and access logs
  • Work Requests: List and status of all the operations applied on the deployment
  • Invoking your model: Details and instructions on how to invoke the endpoint for predictions
  • Edit: You can update the Name, Description, Model, VM compute shape, VM compute instances count, Logs, Load Balancer bandwidth
    • Active: When the deployment is active, only one change at a time can be applied. There is zero downtime for the endpoint during the update.
    • Inactive: When the deployment is inactive, all changes can be made at once.

Deactivating or Reactivating

Model deployments can also be deactivated and reactivated. Deactivating a model deployment shuts down the instances associated with your deployment. Metering and billing of the model deployment instances and load balancer stop when a model deployment is deactivated.

A deactivated model deployment can be reactivated. The same model HTTP endpoint is available upon reactivation, and requests can be made to that endpoint again. You can also delete a model deployment entirely, as in the SDK sketch below.
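
A minimal sketch of these lifecycle operations with the OCI Python SDK, assuming the profile in ~/.oci/config has permissions on the deployment:

import oci

config = oci.config.from_file()
ds_client = oci.data_science.DataScienceClient(config)

# Shut down the instances; metering and billing for them stops.
ds_client.deactivate_model_deployment(
    model_deployment_id="<MODEL_DEPLOYMENT_OCID>")

# Later, bring the same endpoint back up.
ds_client.activate_model_deployment(
    model_deployment_id="<MODEL_DEPLOYMENT_OCID>")

# Or remove the deployment entirely.
# ds_client.delete_model_deployment(
#     model_deployment_id="<MODEL_DEPLOYMENT_OCID>")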

Monitoring a Model Deployment (Logs)

Logs: You can optionally use OCI Logging to log important information. This is also helpful for debugging.

Access Logs: Custom log that captures detailed information about requests sent to the model endpoint.

Predict Logs: Originate from logging (stdout and stderr) calls made in the score.py code.

Monitoring a Model Deployment (Metrics)

Monitor the health, capacity, and performance of Model Deployments with the built-in metrics using OCI Monitoring.

Model Deployment has metrics for CPU utilization, memory utilization, and network utilization (request count, latency, bandwidth).

From the console, open the Metrics page to view all the built-in metrics. You can also do the following with OCI Monitoring from the “Options” menu:

  • Dive deeper into each metric by opening the metric's query in the Metrics Explorer (a programmatic sketch follows this list).
  • Create an alarm based on a metric crossing a threshold.
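
A hedged sketch of pulling one of these metrics programmatically with the OCI Python SDK; the namespace and metric query below are assumptions, so copy the real ones from the metric's query in the Metrics Explorer.

import datetime
import oci

config = oci.config.from_file()
monitoring = oci.monitoring.MonitoringClient(config)

now = datetime.datetime.now(datetime.timezone.utc)
response = monitoring.summarize_metrics_data(
    compartment_id="<COMPARTMENT_OCID>",
    summarize_metrics_data_details=oci.monitoring.models.SummarizeMetricsDataDetails(
        namespace="oci_datascience_modeldeployment",  # assumed; verify in Metrics Explorer
        query="CpuUtilization[1m].mean()",            # assumed metric query
        start_time=now - datetime.timedelta(hours=1),
        end_time=now,
    ),
)

# Print the latest datapoint of each returned metric stream.
for item in response.data:
    if item.aggregated_datapoints:
        print(item.name, item.aggregated_datapoints[-1].value)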