Azure OpenAI “On your data”: Stream only one JSON field to the UI while processing the rest in the background (without extra latency)

Kaja Sherief 115 Reputation points
2025-08-11T07:22:07+00:00

I'm running into performance issues while trying to optimize streaming responses from the Azure OpenAI "on your data" feature for selective JSON property extraction.
Current Setup:

  • Using Azure OpenAI "on your data" feature
  • Implementing streaming responses (chunk-by-chunk delivery)
  • UI correctly receives and displays streaming data
  • Azure "on your data" doesn't natively support JSON object responses, so I'm converting string responses to JSON programmatically

Current Challenge: With streaming enabled, I receive the same JSON format as in the non-streaming case, but my UI needs only one specific property from the response; the remaining properties have to be processed for background tasks.
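To make the shape concrete, here is roughly what the converted response looks like (field names such as `answer`, `citations`, and `metadata` are placeholders for illustration, not the actual service output):

```python
import json

# Illustrative shape after converting the streamed string to JSON;
# only "answer" is shown in the UI, the rest feeds background tasks.
example = (
    '{"answer": "The device supports up to four displays.",'
    ' "citations": ["specs.pdf"], "metadata": {"confidence": 0.92}}'
)
payload = json.loads(example)
ui_text = payload["answer"]  # the only field the UI needs immediately
```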

Performance Issue:

  • Direct streaming response: 3.5-4.5 seconds ✅
  • Post-processing approach (wait for complete response → clean up → re-chunk): 6-7 seconds ❌

What I've Tried (the flow is sketched after this list):

  1. Waiting for the complete streaming response to finish
  2. Processing/cleaning the full JSON response
  3. Re-sending the processed data chunk-by-chunk via API
  4. Various optimization attempts on the post-processing pipeline
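A minimal sketch of that post-processing flow (function and field names are illustrative):

```python
import json

def buffered_pipeline(stream):
    """Current slow path: buffer everything, parse, then re-chunk.

    `stream` yields text deltas from the completion API; the extra
    buffer -> parse -> re-chunk pass is what adds the ~2-3 seconds.
    """
    full_text = "".join(stream)          # 1. wait for the complete response
    payload = json.loads(full_text)      # 2. clean up / parse the JSON
    answer = payload["answer"]           # assumed field name
    for i in range(0, len(answer), 20):  # 3. re-send chunk-by-chunk
        yield answer[i : i + 20]
```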

Questions

  1. Is there a supported way in Azure OpenAI “On your data” streaming to separate fields (e.g., stream answer tokens as plain text while delivering citations/metadata as a distinct channel or event type) so that I can render the answer immediately and handle the rest in the background?
  2. If not, what’s the recommended pattern to keep latency near pure streaming (~3.5–4.5s) while still obtaining structured data for background tasks?
  3. What are the best practices for handling mixed streaming scenarios where UI needs immediate data but background processing requires the full response?
  4. Can I implement real-time JSON parsing during streaming to extract the required property without waiting for the complete response? (A sketch of what I have in mind follows this list.)
  5. Are there examples for incremental/streaming JSON parsing that work well with token-level deltas (so I can extract only the answer path without waiting for the full object)?
  6. Would Microsoft recommend a JSON Lines (JSONL) or framed output approach for streaming (e.g., model first emits {"type":"answer","delta":"..."} lines, then {"type":"meta",...}), or is there a better practice for On your data?
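For question 4, this is the kind of incremental extraction I have in mind: a hand-rolled sketch that pulls the `answer` string value out of the deltas as they arrive (the field name is assumed, escape handling is naive, and a production version would want a proper streaming JSON parser):

```python
def stream_answer_field(deltas, field="answer"):
    """Yield pieces of `field`'s string value while chunks arrive,
    without waiting for the complete JSON object.

    Minimal sketch, not a full parser: assumes the field appears once
    and holds a plain string; a backslash simply passes the next
    character through verbatim.
    """
    key = f'"{field}"'
    seen = ""       # text scanned so far while searching for the key
    state = "seek"  # seek -> value-start -> in-value -> done
    escaped = False
    for delta in deltas:
        out = []
        for ch in delta:
            if state == "seek":
                seen += ch
                if seen.endswith(key):
                    state = "value-start"
            elif state == "value-start":
                if ch == '"':    # opening quote of the value
                    state = "in-value"
                # otherwise skip the ':' and whitespace after the key
            elif state == "in-value":
                if escaped:
                    out.append(ch)
                    escaped = False
                elif ch == "\\":
                    escaped = True
                elif ch == '"':  # unescaped quote ends the value
                    state = "done"
                else:
                    out.append(ch)
        if out:
            yield "".join(out)   # emit what this delta contributed
        if state == "done":
            break
```

With simulated deltas, `list(stream_answer_field(['{"ans', 'wer": "Hel', 'lo", "citations": []}']))` returns `['Hel', 'lo']`, so the UI could render the answer while later chunks (citations, metadata) are still in flight.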

Any guidance on optimizing this workflow while maintaining the 3.5-4.5 second response time would be greatly appreciated.

Azure OpenAI Service
