Sagemaker InvokeEndpoint API.
Parse the byte stream input.
The output of the model will be in the following format:
b'{"outputs": [" a"]}
' b'{"outputs": [" challenging"]} ' b'{"outputs": [" problem"]} ' ...
While usually each PayloadPart event from the event stream will
contain a byte array with a full json, this is not guaranteed
and some of the json objects may be split acrossPayloadPart events.
For example:
{'PayloadPart': {'Bytes': b'{"outputs": '}}
{'PayloadPart': {'Bytes': b'[" problem"]}
'}}
This class accounts for this by concatenating bytes written via the 'write' function
and then exposing a method which will return lines (ending with a '
' character) within the buffer via the 'scan_lines' function. It maintains the position of the last read position to ensure that previous bytes are not exposed again.
For more details see:
https://aws.amazon.com/blogs/machine-learning/elevating-the-generative-ai-experience-introducing-streaming-support-in-amazon-sagemaker-hosting/
Handler class to transform input from LLM to a format that SageMaker endpoint expects.
Similarly, the class handles transforming output from the SageMaker endpoint to a format that LLM class expects.
Content handler for LLM class.
Sagemaker Inference Endpoint models.
To use, you must supply the endpoint name from your deployed Sagemaker model & the region where it is deployed.
To authenticate, the AWS client uses the following methods to automatically load credentials: https://boto3.amazonaws.com/v1/documentation/api/latest/guide/credentials.html
If a specific credential profile should be used, you must pass the name of the profile from the ~/.aws/credentials file that is to be used.
Make sure the credentials / roles used have the required policies to access the Sagemaker endpoint. See: https://docs.aws.amazon.com/IAM/latest/UserGuide/access_policies.html