LLM Streaming Cache Test
What does this worker do?
This Cloudflare Worker provides an intelligent caching layer for Large Language Model (LLM) API requests:
- Request Deduplication: When multiple identical requests are made, the worker ensures only one API call goes to the LLM provider
- Live Streaming: The first request streams the response in real time from the LLM
- Smart Broadcasting: Subsequent identical requests receive the same stream without making additional API calls
- Response Caching: Completed responses are cached in Cloudflare KV for instant retrieval
- Cost Optimization: Reduces API costs by preventing duplicate LLM requests
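The broadcasting idea above can be sketched with the standard ReadableStream API: the single live upstream stream is tee'd so a second identical request reads the same bytes without a second API call. All names here are illustrative assumptions, not the worker's actual code; a real worker would key streams by a request hash and persist the finished text to KV.

```javascript
// Hypothetical sketch: one upstream stream, two readers via tee().
// makeUpstreamStream stands in for the LLM provider's streaming response.
function makeUpstreamStream(chunks) {
  return new ReadableStream({
    start(controller) {
      for (const c of chunks) controller.enqueue(c); // emit each token
      controller.close();
    },
  });
}

// Drain a stream into a single string.
async function readAll(stream) {
  const reader = stream.getReader();
  let out = "";
  for (;;) {
    const { value, done } = await reader.read();
    if (done) return out;
    out += value;
  }
}

const live = makeUpstreamStream(["Hello", ", ", "world"]);
const [forFirst, forSecond] = live.tee(); // one upstream call, two consumers
const [a, b] = await Promise.all([readAll(forFirst), readAll(forSecond)]);
console.log(a === b, a); // true "Hello, world"
```

`tee()` buffers internally, so the second request can attach mid-stream and still receive every chunk from the beginning, which is what makes mid-flight joining safe.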
This test demonstrates the behavior with three staggered requests: one sent immediately, one after a 1.5-second delay, and one after a 5-second delay.
Request 1 (Immediate): Ready
Request 2 (1.5s delay): Ready
Request 3 (5s delay): Ready
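A scaled-down simulation of the three test requests shows the expected outcomes: the immediate request triggers the one upstream call, the delayed request joins it in flight, and the last request hits the cache. Delays of 15 ms and 30 ms stand in for the page's 1.5 s and 5 s, the 30 ms fake LLM call stands in for the provider, and every name here is an assumption for illustration.

```javascript
// Scaled-down model: a Map stands in for Cloudflare KV, another Map
// tracks in-flight calls, and `log` records how each request was served.
const sleep = (ms) => new Promise((r) => setTimeout(r, ms));

const cache = new Map();
const inFlight = new Map();
const log = [];

async function cachedCall(key) {
  if (cache.has(key)) { log.push("cache-hit"); return cache.get(key); }
  if (inFlight.has(key)) { log.push("joined-stream"); return inFlight.get(key); }
  log.push("upstream-call");
  const p = sleep(30).then(() => {        // fake LLM taking ~30 ms
    const answer = "response";
    cache.set(key, answer);               // persist completed response
    inFlight.delete(key);
    return answer;
  });
  inFlight.set(key, p);                   // expose the live call for joiners
  return p;
}

const results = await Promise.all([
  cachedCall("same-prompt"),                        // Request 1: immediate
  sleep(15).then(() => cachedCall("same-prompt")),  // Request 2: mid-stream
  sleep(60).then(() => cachedCall("same-prompt")),  // Request 3: after completion
]);
console.log(log); // ["upstream-call", "joined-stream", "cache-hit"]
```

All three requests receive the same response text, but only the first one is billed by the provider, which is the cost-optimization claim the test is meant to exercise.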