Add live audio transcription streaming support to Foundry Local Python SDK by rui-ren · Pull Request #612 · microsoft/Foundry-Local

rui-ren · 2026-04-08T18:48:55Z

Description

Adds real-time audio streaming support to the Foundry Local Python SDK, enabling live microphone-to-text transcription via ONNX Runtime GenAI's StreamingProcessor API (Nemotron ASR).

This is the Python port of C# PR #485, with full feature parity. The existing AudioClient supports only file-based transcription. This PR introduces LiveAudioTranscriptionSession, which accepts continuous PCM audio chunks (e.g., from a microphone) and returns partial and final transcription results through a synchronous generator.

What's included

New files

src/openai/live_audio_transcription_client.py — Streaming session with start(), append(), get_transcription_stream(), stop()
src/openai/live_audio_transcription_types.py — LiveAudioTranscriptionResponse (ConversationItem-shaped), LiveAudioTranscriptionOptions, CoreErrorResponse, TranscriptionContentPart
test/openai/test_live_audio_transcription.py — 22 unit tests for deserialization, settings, state guards, streaming pipeline
test/openai/test_live_audio_transcription_e2e.py — E2E test with real native DLLs and nemotron model
test/openai/conftest.py — DLL preload for E2E tests
samples/python/live-audio-transcription/src/app.py — Live microphone transcription demo

Modified files

src/openai/audio_client.py — Added create_live_transcription_session() factory method
src/detail/core_interop.py — Added StreamingRequestBuffer struct, execute_command_with_binary(), start_audio_stream, push_audio_data, stop_audio_stream methods, and _load_dll_win() for robust DLL loading on Windows
src/openai/__init__.py — Exported new live transcription types
test/conftest.py — Pre-load ORT/GenAI DLLs before brotli import to avoid Windows DLL search conflicts
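
The Windows loading change can be pictured with a short ctypes sketch. The helper name `load_dll_altered_search` is hypothetical and the PR's `_load_dll_win()` may be written differently (for example, by calling `LoadLibraryExW` directly). The sketch relies on the `winmode` argument of `ctypes.CDLL` (Python 3.8+), which is passed as the flags value to `LoadLibraryEx`. Combined with an absolute path, `LOAD_WITH_ALTERED_SEARCH_PATH` makes the loader look for dependent DLLs next to the loaded DLL first.

```python
import ctypes
import os
import sys

# Win32 flag value defined in libloaderapi.h.
LOAD_WITH_ALTERED_SEARCH_PATH = 0x00000008


def load_dll_altered_search(path: str) -> "ctypes.CDLL":
    """Load a DLL by absolute path so its dependencies resolve from its own folder.

    Hypothetical helper for illustration; not the PR's _load_dll_win().
    """
    if sys.platform != "win32":
        raise OSError("LOAD_WITH_ALTERED_SEARCH_PATH is only meaningful on Windows")
    if not os.path.isabs(path):
        # The flag has documented behaviour only when the path is absolute.
        raise ValueError(f"expected an absolute path, got {path!r}")
    # winmode is forwarded to LoadLibraryEx as its flags argument.
    return ctypes.CDLL(path, winmode=LOAD_WITH_ALTERED_SEARCH_PATH)
```

Loading the ORT and GenAI DLLs this way before anything else imports them is one way to keep a stale copy elsewhere on the search path from being picked up.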

API surface

audio_client = model.get_audio_client()
session = audio_client.create_live_transcription_session()

session.settings.sample_rate = 16000
session.settings.channels = 1
session.settings.language = "en"

session.start()

# Push audio from microphone callback (thread-safe)
session.append(pcm_bytes)

# Read results as a synchronous generator
for result in session.get_transcription_stream():
    print(result.content[0].text)

session.stop()

C# parity

C# API	Python API	Notes
`CreateLiveTranscriptionSession()`	`create_live_transcription_session()`	✅
`StartAsync(ct)`	`start()`	Sync (matches Python SDK convention)
`AppendAsync(ReadOnlyMemory<byte>, ct)`	`append(bytes)`	Thread-safe, copies data
`GetTranscriptionStream()`	`get_transcription_stream()`	Generator (sync equivalent of IAsyncEnumerable)
`StopAsync(ct)`	`stop()`	Drains push queue, sends native stop, surfaces final result
`IAsyncDisposable`	Context manager (`with`)	Idiomatic Python equivalent
`LiveAudioTranscriptionOptions`	`LiveAudioTranscriptionOptions`	Same fields: sample_rate, channels, bits_per_sample, language, push_queue_capacity
`LiveAudioTranscriptionResponse`	`LiveAudioTranscriptionResponse`	ConversationItem-shaped: content[0].text/transcript, is_final, start_time, end_time
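
To make the ConversationItem-shaped response concrete, here is a minimal parsing sketch. The class names `ContentPart` and `SketchResponse` and the JSON key names are assumptions chosen to mirror the fields listed in the table; the PR's actual types and wire format may differ.

```python
import json
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ContentPart:
    text: str = ""
    transcript: str = ""


@dataclass
class SketchResponse:
    """Hypothetical stand-in for LiveAudioTranscriptionResponse."""
    content: List[ContentPart] = field(default_factory=list)
    is_final: bool = False
    start_time: Optional[float] = None
    end_time: Optional[float] = None

    @classmethod
    def from_json(cls, raw: str) -> "SketchResponse":
        data = json.loads(raw)
        # Missing keys fall back to defaults instead of raising.
        parts = [
            ContentPart(text=p.get("text", ""), transcript=p.get("transcript", ""))
            for p in data.get("content", [])
        ]
        return cls(
            content=parts,
            is_final=bool(data.get("is_final", False)),
            start_time=data.get("start_time"),
            end_time=data.get("end_time"),
        )


raw = (
    '{"content": [{"text": "hello", "transcript": "hello"}], '
    '"is_final": true, "start_time": 0.0, "end_time": 0.8}'
)
result = SketchResponse.from_json(raw)
print(result.content[0].text, result.is_final)
```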

Design highlights

Output type alignment — LiveAudioTranscriptionResponse uses the OpenAI Realtime ConversationItem shape (content[0].text/transcript) for forward compatibility
Internal push queue — Bounded queue.Queue serializes audio pushes from any thread (safe for mic callbacks) with backpressure
Fail-fast on errors — Push loop terminates immediately on any native error (no retry logic)
Settings freeze — Audio format settings are snapshot-copied at start() and immutable during the session
Buffer copy — append() copies input data to avoid issues with callers reusing buffers (e.g., PyAudio)
Routes through existing exports — start_audio_stream and stop_audio_stream route through execute_command; push_audio_data routes through execute_command_with_binary — no new native entry points required
DLL loading fix — Uses LoadLibraryExW with LOAD_WITH_ALTERED_SEARCH_PATH on Windows to prevent conflicts with stale system-level ORT DLLs
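
The push-queue, fail-fast, and buffer-copy points above can be sketched together in a few lines. This is an illustration under assumptions, not the PR's implementation: `PushQueueSketch`, `push_fn`, the `_STOP` sentinel, and the default capacity are all hypothetical names and values, and `push_fn` stands in for the native `push_audio_data` call.

```python
import queue
import threading
from typing import Callable, Optional, Union

_STOP = object()  # sentinel that tells the push loop to exit


class PushQueueSketch:
    """Hypothetical stand-in for the session's internal push queue."""

    def __init__(self, push_fn: Callable[[bytes], None], capacity: int = 4) -> None:
        self._push_fn = push_fn
        self._queue: "queue.Queue[object]" = queue.Queue(maxsize=capacity)
        self._error: Optional[BaseException] = None
        self._thread = threading.Thread(target=self._push_loop, daemon=True)
        self._thread.start()

    def append(self, chunk: Union[bytes, bytearray, memoryview]) -> None:
        data = bytes(chunk)  # copy, so the caller may reuse its buffer
        while True:
            if self._error is not None:
                raise RuntimeError("push loop has failed") from self._error
            try:
                # Blocks while the queue is full: this is the backpressure.
                self._queue.put(data, timeout=0.1)
                return
            except queue.Full:
                continue

    def stop(self) -> None:
        # Ask the loop to exit after it has drained everything queued so far.
        while self._thread.is_alive():
            try:
                self._queue.put(_STOP, timeout=0.1)
                break
            except queue.Full:
                continue
        self._thread.join()
        if self._error is not None:
            raise RuntimeError("push loop has failed") from self._error

    def _push_loop(self) -> None:
        while True:
            item = self._queue.get()
            if item is _STOP:
                return
            try:
                self._push_fn(item)
            except Exception as exc:
                self._error = exc  # fail fast: record the error, no retry
                return


received = []
pq = PushQueueSketch(received.append, capacity=2)
buf = bytearray(b"\x01\x02")
pq.append(buf)
buf[0] = 0xFF  # mutating the caller's buffer does not affect the queued copy
pq.append(b"\x03\x04")
pq.stop()
print(received)
```

A single consumer thread is what serializes the pushes: producers on any thread only touch the queue, and only the loop thread calls into the native layer.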

Verified working

✅ 22 unit tests passing (deserialization, settings, state guards, streaming pipeline with mocked core)
✅ E2E test passing (SDK → Core.dll → onnxruntime-genai.dll → onnxruntime.dll with nemotron model)
✅ Full session lifecycle: start → push synthetic PCM → stop → verify results
✅ Existing tests unaffected
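
For readers who want to reproduce the "push synthetic PCM" step without a microphone, the following sketch generates a mono 16-bit little-endian sine tone at 16 kHz and splits it into fixed-size chunks. The function names, the 440 Hz tone, and the 100 ms chunk size are illustrative assumptions; the PR's E2E test may build its input differently.

```python
import math
import struct
from typing import Iterator


def synthetic_pcm16(duration_s: float, sample_rate: int = 16000,
                    freq_hz: float = 440.0, amplitude: float = 0.3) -> bytes:
    """Return mono 16-bit little-endian PCM for a sine tone."""
    n = int(duration_s * sample_rate)
    samples = (
        int(amplitude * 32767 * math.sin(2 * math.pi * freq_hz * i / sample_rate))
        for i in range(n)
    )
    return struct.pack(f"<{n}h", *samples)


def chunks(pcm: bytes, chunk_ms: int = 100, sample_rate: int = 16000,
           bytes_per_sample: int = 2) -> Iterator[bytes]:
    """Yield consecutive slices of pcm, each covering chunk_ms of audio."""
    step = sample_rate * chunk_ms // 1000 * bytes_per_sample
    for offset in range(0, len(pcm), step):
        yield pcm[offset:offset + step]


pcm = synthetic_pcm16(1.0)
parts = list(chunks(pcm))
print(len(pcm), len(parts), len(parts[0]))  # prints: 32000 10 3200
```

Each chunk could then be passed to `session.append(...)` between `start()` and `stop()`.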

vercel · 2026-04-08T18:49:03Z

The latest updates on your projects.

Project	Deployment	Actions	Updated (UTC)
foundry-local	Ready	Preview, Comment	Apr 8, 2026 11:25pm

Copilot

Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

Adds end-to-end live audio (PCM chunk) streaming transcription to the Foundry Local Python SDK, including session lifecycle management, native interop support for binary payloads, and tests/samples to validate Windows DLL loading and Nemotron ASR streaming.

Changes:

Introduces LiveAudioTranscriptionSession + supporting response/options/error types for streaming microphone-style PCM input.
Extends CoreInterop with a StreamingRequestBuffer and execute_command_with_binary() to push raw audio to native core.
Adds unit + E2E coverage and a sample app, including Windows DLL preload workarounds for brotli/LoadLibrary behavior.
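
As a rough picture of what a binary-carrying request could look like on the Python side, the sketch below declares a ctypes structure and fills it from a command name, a JSON parameter string, and a raw audio payload. The field names and layout are assumptions made for illustration only; the real StreamingRequestBuffer in `core_interop.py` and the command strings understood by the native core may differ.

```python
import ctypes


class StreamingRequestBufferSketch(ctypes.Structure):
    """Hypothetical layout; the PR's StreamingRequestBuffer may use other fields."""
    _fields_ = [
        ("command", ctypes.c_char_p),
        ("command_length", ctypes.c_int32),
        ("params", ctypes.c_char_p),
        ("params_length", ctypes.c_int32),
        ("binary", ctypes.POINTER(ctypes.c_ubyte)),
        ("binary_length", ctypes.c_int32),
    ]


def build_request(command: str, params_json: str, payload: bytes):
    cmd = command.encode("utf-8")
    params = params_json.encode("utf-8")
    # Copy the payload into a ctypes-owned array; the caller must keep the
    # returned array referenced for as long as the native side reads from it.
    binary = (ctypes.c_ubyte * len(payload)).from_buffer_copy(payload)
    req = StreamingRequestBufferSketch(
        cmd, len(cmd),
        params, len(params),
        ctypes.cast(binary, ctypes.POINTER(ctypes.c_ubyte)), len(payload),
    )
    return req, binary


# "push_audio_data" is used here as an assumed command string.
req, keepalive = build_request("push_audio_data", "{}", b"\x00\x01\x02\x03")
print(req.command, req.binary_length, bytes(req.binary[:req.binary_length]))
```

Passing explicit lengths alongside each pointer matters for the audio field in particular, because PCM data routinely contains zero bytes and cannot be treated as a NUL-terminated string.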

Reviewed changes

Copilot reviewed 10 out of 10 changed files in this pull request and generated 7 comments.


File	Description
sdk/python/src/openai/live_audio_transcription_client.py	Implements the streaming session (start/append/stream/stop) and background push loop.
sdk/python/src/openai/live_audio_transcription_types.py	Adds response/options/error DTOs and JSON parsing helpers.
sdk/python/src/detail/core_interop.py	Adds binary-command execution path and Windows DLL loading hardening for ORT/GenAI.
sdk/python/src/openai/audio_client.py	Adds factory method to create the live transcription session.
sdk/python/src/openai/__init__.py	Exports new session and types from the openai package surface.
sdk/python/test/openai/test_live_audio_transcription.py	Unit tests for parsing/options/state guards and mocked streaming behavior.
sdk/python/test/openai/test_live_audio_transcription_e2e.py	Windows-only E2E test exercising real native DLLs and nemotron model pipeline.
sdk/python/test/openai/conftest.py	Preloads ORT/GenAI DLLs for E2E to avoid brotli-related DLL search changes.
sdk/python/test/conftest.py	Preloads ORT/GenAI DLLs early in all tests to avoid Windows DLL search conflicts.
samples/python/live-audio-transcription/src/app.py	Demonstration app using PyAudio to stream microphone PCM into the session.



support python sdk e2e live audio transcription

e91b014

Copilot AI review requested due to automatic review settings, April 8, 2026 18:48

vercel bot deployed to Preview, April 8, 2026 18:49

Copilot AI reviewed, Apr 8, 2026

Copilot started reviewing on behalf of rui-ren, April 8, 2026 19:09

fix copilot review

f6b7857

vercel bot deployed to Preview, April 8, 2026 19:27

'fix'

f6e91b8

vercel bot deployed to Preview, April 8, 2026 21:47

Merge branch 'main' into ruiren/live-audio-stream-python

7f58113

vercel bot deployed to Preview, April 8, 2026 23:25

rui-ren wants to merge 4 commits into main from ruiren/live-audio-stream-python
