ChatBedrock

Amazon Bedrock is a fully managed service that offers a choice of high-performing foundation models (FMs) from leading AI companies like AI21 Labs, Anthropic, Cohere, Meta, Stability AI, and Amazon via a single API, along with a broad set of capabilities you need to build generative AI applications with security, privacy, and responsible AI. Using Amazon Bedrock, you can easily experiment with and evaluate top FMs for your use case, privately customize them with your data using techniques such as fine-tuning and Retrieval Augmented Generation (RAG), and build agents that execute tasks using your enterprise systems and data sources. Since Amazon Bedrock is serverless, you don't have to manage any infrastructure, and you can securely integrate and deploy generative AI capabilities into your applications using the AWS services you are already familiar with.

%pip install --upgrade --quiet langchain-aws

Note: you may need to restart the kernel to use updated packages.

from langchain_aws import ChatBedrock
from langchain_core.messages import HumanMessage


chat = ChatBedrock(
    model_id="anthropic.claude-3-sonnet-20240229-v1:0",
    model_kwargs={"temperature": 0.1},
)

messages = [
    HumanMessage(
        content="Translate this sentence from English to French. I love programming."
    )
]
chat.invoke(messages)

AIMessage(content="Voici la traduction en français :\n\nJ'aime la programmation.", additional_kwargs={'usage': {'prompt_tokens': 20, 'completion_tokens': 21, 'total_tokens': 41}}, response_metadata={'model_id': 'anthropic.claude-3-sonnet-20240229-v1:0', 'usage': {'prompt_tokens': 20, 'completion_tokens': 21, 'total_tokens': 41}}, id='run-994f0362-0e50-4524-afad-3c4f5bb11328-0')
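The returned AIMessage carries the generated text in content and token counts in response_metadata. Below is a small sketch that formats those counts. It works on a plain dict copied from the sample output above, so it runs without AWS credentials. The key layout ("usage", "prompt_tokens", and so on) is taken from that sample and may differ between langchain-aws versions. summarize_usage is a hypothetical helper, not part of the library.

```python
# response_metadata copied from the sample output above. With a live model you
# would read it from the returned message: chat.invoke(messages).response_metadata
response_metadata = {
    "model_id": "anthropic.claude-3-sonnet-20240229-v1:0",
    "usage": {"prompt_tokens": 20, "completion_tokens": 21, "total_tokens": 41},
}


def summarize_usage(metadata: dict) -> str:
    """Hypothetical helper: format the token counts stored under metadata["usage"].

    The key names are assumed from the sample output; missing keys default to 0.
    """
    usage = metadata.get("usage", {})
    prompt = usage.get("prompt_tokens", 0)
    completion = usage.get("completion_tokens", 0)
    total = usage.get("total_tokens", prompt + completion)
    model = metadata.get("model_id", "unknown model")
    return f"{model}: {prompt} in + {completion} out = {total} tokens"


print(summarize_usage(response_metadata))
# anthropic.claude-3-sonnet-20240229-v1:0: 20 in + 21 out = 41 tokens
```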

Streaming

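To stream responses, use the .stream() method that every Runnable provides. It yields the response in chunks as the model generates them, so you can display text before the full answer is ready.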

for chunk in chat.stream(messages):
    print(chunk.content, end="", flush=True)

Voici la traduction en français :

J'aime la programmation.
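If you also need the complete text once streaming finishes, collect the pieces while printing them. The sketch below replaces chat.stream(messages) with a hypothetical fake_stream generator so it runs without AWS credentials; the hypothetical FakeChunk class only mimics the content attribute of a real message chunk.

```python
from dataclasses import dataclass
from typing import Iterator


@dataclass
class FakeChunk:
    """Hypothetical stand-in for a streamed message chunk; only .content is modelled."""
    content: str


def fake_stream(text: str, size: int = 8) -> Iterator[FakeChunk]:
    """Hypothetical stand-in for chat.stream(messages): yields text in small pieces."""
    for start in range(0, len(text), size):
        yield FakeChunk(text[start:start + size])


def stream_and_collect(chunks: Iterator[FakeChunk]) -> str:
    """Print each chunk as it arrives and return the full concatenated text."""
    parts = []
    for chunk in chunks:
        print(chunk.content, end="", flush=True)
        parts.append(chunk.content)
    print()  # end the line after the last chunk
    return "".join(parts)


full_text = stream_and_collect(
    fake_stream("Voici la traduction en français :\n\nJ'aime la programmation.")
)
```

With real LangChain chunks, AIMessageChunk objects can also be combined with +, which merges their metadata as well as their text.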

LLM Caching with OpenSearch Semantic Cache

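You can use OpenSearch as a semantic cache: prompts and their responses are stored together with an embedding of each prompt, and a new prompt counts as a cache hit when it is semantically similar to a stored one, not only when it matches it exactly.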
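To make "hits based on semantic similarity" concrete, here is a toy in-memory sketch of the idea. It is not how OpenSearchSemanticCache is implemented. It assumes a crude bag-of-words vector (toy_embed, hypothetical) in place of a real embedding model such as BedrockEmbeddings, and a similarity threshold of 0.4 chosen arbitrarily for this example. The lookup/update method shape loosely mirrors LangChain's cache interface, where entries are also keyed by the model configuration (llm_string).

```python
import math
import re
from collections import Counter
from typing import Optional


def toy_embed(text: str) -> Counter:
    """Hypothetical stand-in for an embedding model: a bag-of-words vector.

    Real embeddings capture meaning; this only counts shared words.
    """
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class ToySemanticCache:
    """In-memory sketch of a semantic cache (hypothetical, not the OpenSearch backend)."""

    def __init__(self, threshold: float = 0.4):
        self.threshold = threshold
        # llm_string -> list of (prompt vector, cached response)
        self._entries: dict = {}

    def update(self, prompt: str, llm_string: str, response: str) -> None:
        self._entries.setdefault(llm_string, []).append((toy_embed(prompt), response))

    def lookup(self, prompt: str, llm_string: str) -> Optional[str]:
        """Return the most similar cached response if it clears the threshold."""
        query = toy_embed(prompt)
        best_score, best_response = 0.0, None
        for vector, response in self._entries.get(llm_string, []):
            score = cosine(query, vector)
            if score > best_score:
                best_score, best_response = score, response
        return best_response if best_score >= self.threshold else None


cache = ToySemanticCache()
model = "anthropic.claude-3-haiku-20240307-v1:0"
cache.update("tell me about Amazon Bedrock", model, "(cached answer about Amazon Bedrock)")
print(cache.lookup("what is amazon bedrock", model))         # similar enough: cache hit
print(cache.lookup("what is the capital of France", model))  # no overlap: None
```

A production cache replaces the linear scan with a vector index and the word counts with model embeddings, which is the role OpenSearch and BedrockEmbeddings play in the setup below.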

from langchain.globals import set_llm_cache
from langchain_aws import BedrockEmbeddings, ChatBedrock
from langchain_community.cache import OpenSearchSemanticCache
from langchain_core.messages import HumanMessage

# Embedding model used to compare prompts for semantic similarity
bedrock_embeddings = BedrockEmbeddings(
    model_id="amazon.titan-embed-text-v1",
    region_name="us-east-1",
)

chat = ChatBedrock(
    model_id="anthropic.claude-3-haiku-20240307-v1:0",
    model_kwargs={"temperature": 0.5},
)

# Enable the LLM cache. Make sure OpenSearch is set up and running, and update the URL accordingly.
set_llm_cache(
    OpenSearchSemanticCache(
        opensearch_url="http://localhost:9200",
        embedding=bedrock_embeddings,
    )
)


%%time
# First call: the prompt is not in the cache yet, so the model is invoked and this takes longer
messages = [HumanMessage(content="tell me about Amazon Bedrock")]
response = chat.invoke(messages)

print(response.content)

%%time
# Second call: the prompt is not an exact match, but it is semantically similar
# to the first one, so the cached response is returned instead of calling the model
messages = [HumanMessage(content="what is amazon bedrock")]
response = chat.invoke(messages)

print(response.content)
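%%time is a Jupyter cell magic, so it is not available in a plain Python script. A minimal equivalent is to wrap each call in time.perf_counter measurements. In the sketch below, slow_answer is a hypothetical stand-in for the model call, and functools.lru_cache is only an exact-match cache, unlike the semantic cache configured above; it is used here just to show the timing difference between a miss and a hit.

```python
import time
from contextlib import contextmanager
from functools import lru_cache


@contextmanager
def timed(label: str, results: dict):
    """Record the wall-clock seconds spent in the enclosed block under results[label]."""
    start = time.perf_counter()
    try:
        yield
    finally:
        results[label] = time.perf_counter() - start


@lru_cache(maxsize=None)
def slow_answer(prompt: str) -> str:
    """Hypothetical stand-in for a model call, cached on the exact prompt string."""
    time.sleep(0.2)  # simulated network and generation latency
    return f"answer to: {prompt}"


timings = {}
with timed("first call", timings):
    slow_answer("tell me about Amazon Bedrock")
with timed("second call", timings):
    slow_answer("tell me about Amazon Bedrock")  # identical prompt, served from the cache

for label, seconds in timings.items():
    print(f"{label}: {seconds:.3f} s")
```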
