DopikAI - Your Trusted AI Development Partner
DopikAI - Your Trusted AI Development Partner
  • About
  • Services
    • AlaaS
    • AI development
  • Case Study
  • Blogs
Contact us
Anthropic’s new Claude prompt caching will save developers a fortune
By ML Experts | August 18th, 2024 |  
2,996
 views

Anthropic introduced prompt caching on its API, which remembers the context between API calls and allows developers to avoid repeating prompts.

The prompt caching feature is available in public beta on Claude 3.5 Sonnet and Claude 3 Haiku, but support for the largest Claude model, Opus, is still coming soon.

Prompt caching, described in this 2023 paper, lets users keep frequently used contexts in their sessions. As the models remember these prompts, users can add additional background information without increasing costs. This is helpful in instances where someone wants to send a large amount of context in a prompt and then refer back to it in different conversations with the model. It also lets developers and other users better fine-tune model responses.

Anthropic said early users “have seen substantial speed and cost improvements with prompt caching for a variety of use cases — from including a full knowledge base to 100-shot examples to including each turn of a conversation in their prompt.”

The company said potential use cases include reducing costs and latency for long instructions and uploaded documents for conversational agents, faster autocompletion of codes, providing multiple instructions to agentic search tools and embedding entire documents in a prompt.

Pricing cached prompts

One advantage of caching prompts is lower prices per token, and Anthropic said using cached prompts “is significantly cheaper” than the base input token price.

For Claude 3.5 Sonnet, writing a prompt to be cached will cost $3.75 per 1 million tokens (MTok), but using a cached prompt will cost $0.30 per MTok. The base price of an input to the Claude 3.5 Sonnet model is $3/MTok, so by paying a little more upfront, you can expect to get a 10x savings increase if you use the cached prompt the next time.

Claude 3 Haiku users will pay $0.30/MTok to cache and $0.03/MTok when using stored prompts.

While prompt caching is not yet available for Claude 3 Opus, Anthropic already published its prices. Writing to cache will cost $18.75/MTok, but accessing the cached prompt will cost $1.50/MTok.

However, as AI influencer Simon Willison noted on X, Anthropic’s cache only has a 5-minute lifetime and is refreshed upon each use.

Of course, this is not the first time Anthropic has tried to compete against other AI platforms through pricing. Before the release of the Claude 3 family of models, Anthropic slashed the prices of its tokens.

It’s now in something of a “race to the bottom” against rivals including Google and OpenAI when it comes to offering low-priced options for third-party developers building atop its platform.

Highly requested feature

Other platforms offer a version of prompt caching. Lamina, an LLM inference system, utilizes KV caching to lower the cost of GPUs. A cursory look through OpenAI’s developer forums or GitHub will bring up questions about how to cache prompts.

Caching prompts are not the same as those of large language model memory. OpenAI’s GPT-4o, for example, offers a memory where the model remembers preferences or details. However, it does not store the actual prompts and responses like prompt caching.

Most popular

How to use ChatGPT’s new memory feature, temporary chats, and chat history
ChatGPT’s memory can now reference all past conversations, not just what you tell it to
Blockchain network provider Horizen launches no-code tokenization platform
Related
5,000 vibe-coded apps just proved shadow AI is the new S3 bucket crisis
Intercom, now called Fin, launches an AI agent whose only job is managing another AI agent
Salesforce launches Agentforce Operations to fix the workflows breaking enterprise AI
Microsoft launches 3 new AI models in direct shot at OpenAI and Google
The three disciplines separating AI agent demos from real-world deployment
DopikAI - Your Trusted AI Development Partner
  • Home
  • Blog
  • About DopikAi
  • Contact us
  • Our Services
  • Case Study
  • Privacy Policy
Address: No.41 Lane 99 Ai Mo street, Bo De Ward, Long Bien District, Hanoi, Vietnam Email: [email protected]
Contact Us
Fill out the form below and we will get in touch with you shortly.

    © Copyright DopikAI 2022 | All Rights Reserved.