Add live audio transcription streaming support to Foundry Local Rust SDK by rui-ren · Pull Request #613 · microsoft/Foundry-Local

rui-ren · 2026-04-08T19:45:05Z

Add live audio transcription streaming support to Foundry Local Rust SDK

Description

Ports the C# live audio transcription feature (PR #485) to the Rust SDK with full API parity.

The existing AudioClient only supports file-based transcription. This PR introduces LiveAudioTranscriptionSession that accepts continuous PCM audio chunks (e.g., from a microphone) and returns partial/final transcription results as an async stream.

What's included

New files

sdk/rust/src/openai/live_audio_client.rs — Streaming session with start(), append(), get_transcription_stream(), stop(), plus types, cancellation support, and unit tests
sdk/rust/tests/integration/live_audio_test.rs — E2E integration test with synthetic PCM audio
samples/rust/live-audio-transcription-example/ — Full sample with real microphone capture (cpal) and resampling

Modified files

sdk/rust/src/detail/core_interop.rs — Added StreamingRequestBuffer FFI struct and execute_command_with_binary() for binary audio data
sdk/rust/src/openai/audio_client.rs — Added create_live_transcription_session() factory method
sdk/rust/src/detail/model.rs, model_variant.rs — Wired factory method to Model
sdk/rust/src/openai/mod.rs, src/lib.rs — Module registration and public exports
sdk/rust/Cargo.toml — Added tokio-util dependency for CancellationToken

API surface

```rust
let audio_client = model.create_audio_client();
// `settings` is mutated below, so the binding must be `mut`.
let mut session = audio_client.create_live_transcription_session();

session.settings.sample_rate = 16000;
session.settings.channels = 1;
session.settings.language = Some("en".into());

session.start(None).await?;

// Push audio from microphone callback
session.append(&pcm_bytes, None).await?;

// Read results as an async stream. The stream completes only after stop()
// has delivered the final result, so a real application reads it
// concurrently with append() (for example, on a separate task).
use tokio_stream::StreamExt;
let mut stream = session.get_transcription_stream()?;
while let Some(result) = stream.next().await {
    let result = result?;
    println!("{}", result.content[0].text);
}

session.stop(None).await?;
```

C# API parity

| C# | Rust | Status |
|---|---|---|
| `CreateLiveTranscriptionSession()` | `create_live_transcription_session()` | ✅ |
| `StartAsync(CancellationToken)` | `start(Option<CancellationToken>)` | ✅ |
| `AppendAsync(ReadOnlyMemory<byte>, CancellationToken)` | `append(&[u8], Option<CancellationToken>)` | ✅ |
| `GetTranscriptionStream(CancellationToken)` | `get_transcription_stream()` | ✅ |
| `StopAsync(CancellationToken)` + cancel-safe cleanup | `stop(Option<CancellationToken>)` + cancel-safe cleanup | ✅ |
| `IAsyncDisposable.DisposeAsync()` | `Drop` with best-effort native stop | ✅ |
| `LiveAudioTranscriptionResponse.Content[0].Text` | `response.content[0].text` | ✅ |
| `LiveAudioTranscriptionResponse.Content[0].Transcript` | `response.content[0].transcript` | ✅ |
| `LiveAudioTranscriptionResponse.IsFinal` | `response.is_final` | ✅ |
| `LiveAudioTranscriptionResponse.StartTime/EndTime` | `response.start_time` / `response.end_time` | ✅ |
| `LiveAudioTranscriptionOptions` (SampleRate, Channels, BitsPerSample, Language, PushQueueCapacity) | `LiveAudioTranscriptionOptions` (sample_rate, channels, bits_per_sample, language, push_queue_capacity) | ✅ |
| `CoreErrorResponse.TryParse()` | `CoreErrorResponse::try_parse()` | ✅ |
| Native commands: `audio_stream_start`, `audio_stream_push`, `audio_stream_stop` | Same commands via `execute_command` / `execute_command_with_binary` | ✅ |
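To make the partial/final distinction concrete, here is a minimal std-only sketch of how a caller might fold a stream of responses into a running transcript. The structs are hypothetical mirrors that keep only `content[].text` and `is_final`; the SDK's real types also carry `transcript`, `start_time`, and `end_time`. The rule that a final result replaces the pending partial is an assumption about the model's output, not something this PR states.

```rust
/// Hypothetical mirror of the SDK's ContentPart (only `text` is modeled).
#[derive(Debug, Clone)]
struct ContentPart {
    text: String,
}

/// Hypothetical mirror of LiveAudioTranscriptionResponse (fields used here only).
#[derive(Debug, Clone)]
struct LiveAudioTranscriptionResponse {
    content: Vec<ContentPart>,
    is_final: bool,
}

/// Accumulates committed (final) segments and tracks the latest partial hypothesis.
#[derive(Default)]
struct TranscriptBuilder {
    committed: Vec<String>,
    partial: Option<String>,
}

impl TranscriptBuilder {
    fn on_result(&mut self, resp: &LiveAudioTranscriptionResponse) {
        // Use .first() rather than content[0] so an empty envelope cannot panic.
        let Some(part) = resp.content.first() else { return };
        if resp.is_final {
            self.committed.push(part.text.clone());
            self.partial = None; // assumption: the final result supersedes the partial
        } else {
            self.partial = Some(part.text.clone());
        }
    }

    /// Committed segments followed by the current partial, space-separated.
    fn render(&self) -> String {
        let mut pieces: Vec<&str> = self.committed.iter().map(String::as_str).collect();
        if let Some(p) = self.partial.as_deref() {
            pieces.push(p);
        }
        pieces.join(" ")
    }
}
```

In a UI, `committed` would be printed permanently while `partial` is redrawn in place on every non-final result.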

Design highlights

CancellationToken support — start/append/stop accept Option<CancellationToken> via tokio_util::sync::CancellationToken
Cancel-safe stop — stop() always performs native audio_stream_stop even if token fires, preventing native session leaks (matches C# StopAsync pattern)
Response envelope — LiveAudioTranscriptionResponse uses content: Vec<ContentPart> matching C#'s ConversationItem.Content[0].Text/Transcript
Bounded push queue — Backpressure via bounded channel (capacity=100); prevents unbounded memory growth
Push loop on blocking thread — execute_command_with_binary FFI calls run on spawn_blocking, keeping async runtime free
Settings freeze — Audio format settings are cloned at start() and immutable during the session
Drop safety — Best-effort synchronous audio_stream_stop in Drop to prevent native session leaks
FFI null pointer safety — Empty binary slices are passed as std::ptr::null() to avoid handing a dangling pointer across the FFI boundary
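The bounded queue and blocking push loop can be approximated with the standard library alone. This is an analogue under stated assumptions, not the SDK code: the PR uses an async bounded channel plus `spawn_blocking`, whereas this sketch uses `std::sync::mpsc::sync_channel` and a plain thread, and the worker only counts bytes where the real loop would call `execute_command_with_binary`. `PushPipeline` is a hypothetical name.

```rust
use std::sync::mpsc::{sync_channel, SyncSender};
use std::thread::{self, JoinHandle};

/// Capacity quoted in the PR description.
const PUSH_QUEUE_CAPACITY: usize = 100;

/// Hypothetical push pipeline: append() enqueues, a dedicated thread drains.
struct PushPipeline {
    tx: Option<SyncSender<Vec<u8>>>,
    worker: Option<JoinHandle<usize>>,
}

impl PushPipeline {
    fn start(capacity: usize) -> Self {
        let (tx, rx) = sync_channel::<Vec<u8>>(capacity);
        let worker = thread::spawn(move || {
            let mut pushed_bytes = 0usize;
            // Stand-in for the blocking native audio_stream_push call per chunk.
            for chunk in rx {
                pushed_bytes += chunk.len();
            }
            pushed_bytes
        });
        PushPipeline { tx: Some(tx), worker: Some(worker) }
    }

    /// Blocks the caller while the queue is full: this is the backpressure.
    fn append(&self, pcm: &[u8]) -> Result<(), String> {
        let tx = self.tx.as_ref().ok_or("session not started")?;
        tx.send(pcm.to_vec()).map_err(|_| "push loop has exited".to_string())
    }

    /// Closes the queue, lets the worker drain it, and returns the bytes pushed.
    fn stop(&mut self) -> usize {
        drop(self.tx.take()); // dropping the sender ends the worker's `for` loop
        self.worker
            .take()
            .map(|w| w.join().expect("push thread panicked"))
            .unwrap_or(0)
    }
}
```

Because `send` blocks once 100 chunks are queued, a capture callback that must never block could use `SyncSender::try_send` instead and drop or count overruns; the PR does not say which trade-off the SDK prefers.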

Verified working

✅ SDK build succeeds (0 errors, 0 clippy warnings)
✅ 13 unit tests passing (JSON deserialization, settings defaults, error parsing, content envelope)
✅ E2E pipeline: Microphone (48kHz/2ch/F32) → Resample (16kHz/mono/16-bit) → SDK → Core.dll → onnxruntime-genai.dll → nemotron model
✅ Synthetic audio test: 30 chunks (96KB PCM) pushed with clean session lifecycle
✅ Live microphone test: real-time capture, session start/stop, no native errors
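A quick check on the synthetic-audio numbers: 96,000 bytes across 30 chunks is 3,200 bytes per chunk. Assuming the 16 kHz, mono, 16-bit format from the API example and equal-sized chunks (the PR gives only the totals), each chunk holds 1,600 samples, or 100 ms, for 3 s of audio overall. The helpers `sine_pcm16` and `chunk_ms` below are illustrative, not taken from the sample.

```rust
use std::f32::consts::PI;

const SAMPLE_RATE: u32 = 16_000; // matches session.settings.sample_rate above
const BYTES_PER_SAMPLE: usize = 2; // 16-bit mono PCM
const CHUNK_BYTES: usize = 3_200; // 96,000 bytes / 30 chunks

/// Generate `seconds` of a sine tone as 16-bit little-endian mono PCM.
fn sine_pcm16(freq_hz: f32, seconds: f32) -> Vec<u8> {
    let n = (SAMPLE_RATE as f32 * seconds) as usize;
    let mut out = Vec::with_capacity(n * BYTES_PER_SAMPLE);
    for i in 0..n {
        let t = i as f32 / SAMPLE_RATE as f32;
        let s = (2.0 * PI * freq_hz * t).sin() * 0.5; // half amplitude
        out.extend_from_slice(&((s * i16::MAX as f32) as i16).to_le_bytes());
    }
    out
}

/// Duration of one chunk in milliseconds at SAMPLE_RATE.
fn chunk_ms(chunk_bytes: usize) -> f32 {
    let samples = chunk_bytes / BYTES_PER_SAMPLE;
    samples as f32 * 1000.0 / SAMPLE_RATE as f32
}
```

Splitting the result with `pcm.chunks(CHUNK_BYTES)` yields the 30 chunks that would be passed one at a time to `append`.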

Stats

14 files changed, 1,329 additions, 2 deletions

vercel · 2026-04-08T19:45:15Z


| Project | Deployment | Updated (UTC) |
|---|---|---|
| foundry-local | Ready | Apr 8, 2026 11:30pm |

Copilot

Pull request overview

This PR adds live (chunked) PCM audio transcription streaming to the Foundry Local Rust SDK, aligning the Rust API with the existing C# live audio transcription session feature and extending the SDK beyond file-based transcription.

Changes:

Introduces LiveAudioTranscriptionSession + associated response/options/types and stream wrapper in the Rust SDK.
Extends the Rust FFI bridge with execute_command_with_binary() to send JSON params + binary PCM payloads.
Adds integration test coverage and a new Rust sample demonstrating microphone capture (cpal) and streaming transcription.

Reviewed changes

Copilot reviewed 14 out of 14 changed files in this pull request and generated 6 comments.


| File | Description |
|---|---|
| sdk/rust/src/openai/live_audio_client.rs | New streaming session implementation, types, cancellation support, and unit tests |
| sdk/rust/src/detail/core_interop.rs | Adds `StreamingRequestBuffer` + optional `execute_command_with_binary` symbol support |
| sdk/rust/src/openai/audio_client.rs | Adds `create_live_transcription_session()` factory method |
| sdk/rust/src/detail/model.rs | Exposes `Model::create_live_transcription_session()` |
| sdk/rust/src/detail/model_variant.rs | Wires variant factory for live transcription sessions |
| sdk/rust/src/openai/mod.rs | Registers and re-exports live audio transcription module/types |
| sdk/rust/src/lib.rs | Public re-exports for the new live transcription session/types |
| sdk/rust/Cargo.toml | Adds `tokio-util` for `CancellationToken` |
| sdk/rust/tests/integration/main.rs | Registers the new integration test module |
| sdk/rust/tests/integration/live_audio_test.rs | New E2E-ish integration test using synthetic PCM audio |
| samples/rust/live-audio-transcription-example/src/main.rs | New microphone/synthetic streaming transcription sample |
| samples/rust/live-audio-transcription-example/Cargo.toml | Declares sample dependencies (cpal, tokio, sdk path dep) |
| samples/rust/Cargo.toml | Adds the new sample crate to the workspace |
| codex-feedback.md | Adds review/validation notes for the feature |


sdk/rust/src/openai/live_audio_client.rs

Copilot · 2026-04-08T19:49:41Z


```diff
+                _ = token.cancelled() => {
+                    return Err(FoundryLocalError::CommandExecution {
+                        reason: "Start cancelled".into(),
+                    });
+                }
```


When the cancellation token fires during start(), this branch returns without awaiting start_future. The spawn_blocking task will continue running; if it ends up creating a native session successfully, the session handle is dropped and the native session may leak. To keep the “clean (not-started) state” guarantee, ensure you await start_future and, if a handle is produced after cancellation, issue a best-effort audio_stream_stop cleanup (possibly in a detached background task) before returning.
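One way to realize the suggested fix is sketched below, with std threads standing in for `spawn_blocking` and an `AtomicBool` standing in for `CancellationToken`. On cancellation, the in-flight start is handed to a detached cleanup thread that stops any handle it eventually produces, so `start` can return immediately without leaking a native session. `native_start`, `native_stop`, and `NativeHandle` are hypothetical stand-ins for the native commands, and the 1 ms polling loop is a simplification of `tokio::select!`.

```rust
use std::sync::atomic::{AtomicBool, AtomicUsize, Ordering};
use std::thread;
use std::time::Duration;

/// Hypothetical stand-in for the handle returned by audio_stream_start.
struct NativeHandle;

/// Number of native sessions currently open, so leaks are observable.
static OPEN_SESSIONS: AtomicUsize = AtomicUsize::new(0);

fn native_start() -> NativeHandle {
    thread::sleep(Duration::from_millis(100)); // simulate a slow native call
    OPEN_SESSIONS.fetch_add(1, Ordering::SeqCst);
    NativeHandle
}

fn native_stop(_handle: NativeHandle) {
    OPEN_SESSIONS.fetch_sub(1, Ordering::SeqCst);
}

/// Cancel-safe start: never abandons the in-flight native call.
fn start(cancel: &AtomicBool) -> Result<NativeHandle, String> {
    let worker = thread::spawn(native_start);
    // Wait for whichever comes first: completion or cancellation.
    while !worker.is_finished() {
        if cancel.load(Ordering::SeqCst) {
            // Detached cleanup: wait for the handle, then stop it.
            thread::spawn(move || {
                if let Ok(handle) = worker.join() {
                    native_stop(handle);
                }
            });
            return Err("Start cancelled".into());
        }
        thread::sleep(Duration::from_millis(1));
    }
    worker.join().map_err(|_| "native start panicked".to_string())
}
```

If the native call finishes before the token is observed, the handle is returned normally, which mirrors a `select!` where the completion branch wins.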


Port the C# live audio transcription feature (PR #485) to the Rust SDK with full API parity.

New files:
- src/openai/live_audio_client.rs: LiveAudioTranscriptionSession with start/append/get_transcription_stream/stop lifecycle, response types, CoreErrorResponse, and unit tests
- tests/integration/live_audio_test.rs: E2E test with synthetic PCM audio

Modified files:
- src/detail/core_interop.rs: StreamingRequestBuffer FFI struct and execute_command_with_binary method for binary audio data
- src/openai/audio_client.rs: create_live_transcription_session() factory
- src/detail/model.rs, model_variant.rs: create_live_transcription_session()
- src/openai/mod.rs, src/lib.rs: Module and public type exports

API surface:

    let audio_client = model.create_audio_client();
    let session = audio_client.create_live_transcription_session();
    session.settings.sample_rate = 16000;
    session.start().await?;
    session.append(&pcm_bytes).await?;
    let mut stream = session.get_transcription_stream()?;
    // use tokio_stream::StreamExt;
    while let Some(result) = stream.next().await { ... }
    session.stop().await?;

Design highlights:
- Bounded push channel with backpressure (capacity=100)
- Push loop runs on blocking thread via spawn_blocking
- Fail-fast on native errors (no retry logic)
- Settings frozen at start() via clone snapshot
- Output channel completed on stop() after final result

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Adds samples/rust/live-audio-transcription-example/, which demonstrates the full pipeline: SDK → Core.dll → onnxruntime-genai.dll → nemotron.

Tested E2E with synthetic 440Hz PCM audio (30 chunks, 96000 bytes):
- FoundryLocalManager initialized
- nemotron model loaded
- audio_stream_start (session handle)
- audio_stream_push called 30 times (execute_command_with_binary)
- audio_stream_stop (clean shutdown)
- No errors from native core / onnxruntime-genai

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Uses cpal for cross-platform microphone capture with automatic format adaptation:
- Queries device default config (e.g. 48kHz/2ch/F32)
- Resamples to 16kHz mono via linear interpolation
- Converts f32 → 16-bit PCM little-endian for the SDK

Two modes:

    cargo run             # Live microphone (press ENTER to stop)
    cargo run -- --synth  # Synthetic 440Hz sine wave

Tested E2E: Microphone → SDK → Core.dll → onnxruntime-genai.dll

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
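The format adaptation described in this commit (downmix, linear-interpolation resampling, f32 to 16-bit little-endian) fits in a few std-only functions. This is a sketch under stated assumptions, not the sample's actual code: the function names are hypothetical, and averaging channels is one plausible downmix choice.

```rust
/// Downmix interleaved multi-channel f32 samples to mono by averaging each frame.
fn downmix_to_mono(interleaved: &[f32], channels: usize) -> Vec<f32> {
    interleaved
        .chunks_exact(channels)
        .map(|frame| frame.iter().sum::<f32>() / channels as f32)
        .collect()
}

/// Resample mono audio by linearly interpolating between neighbouring input samples.
fn resample_linear(input: &[f32], from_hz: u32, to_hz: u32) -> Vec<f32> {
    if input.is_empty() || from_hz == to_hz {
        return input.to_vec();
    }
    let ratio = from_hz as f64 / to_hz as f64; // input samples per output sample
    let out_len = (input.len() as f64 / ratio).floor() as usize;
    (0..out_len)
        .map(|i| {
            let pos = i as f64 * ratio;
            let idx = pos.floor() as usize;
            let frac = (pos - idx as f64) as f32;
            let a = input[idx];
            let b = *input.get(idx + 1).unwrap_or(&a); // hold the last sample at the end
            a + (b - a) * frac
        })
        .collect()
}

/// Convert f32 in [-1.0, 1.0] to 16-bit little-endian PCM, clamping out-of-range input.
fn f32_to_pcm16_le(samples: &[f32]) -> Vec<u8> {
    samples
        .iter()
        .flat_map(|&s| ((s.clamp(-1.0, 1.0) * i16::MAX as f32) as i16).to_le_bytes())
        .collect()
}
```

Linear interpolation without a low-pass filter can alias content above the 8 kHz Nyquist limit of 16 kHz output. That is often tolerable for speech, but a windowed-sinc or polyphase resampler is the more careful choice.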

- core_interop.rs: Use std::ptr::null() for empty binary_data slices to avoid passing a dangling pointer across the FFI boundary
- live_audio_client.rs: Call native audio_stream_stop synchronously in Drop to prevent native session leaks when stop() is not called

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
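The null-pointer fix is easy to illustrate: a Rust slice's `as_ptr()` is never null, but for an empty slice it need not point at any allocation. The `#[repr(C)]` struct below is hypothetical; its name and field layout are assumptions, since the PR does not show `StreamingRequestBuffer`'s definition.

```rust
use std::ptr;

/// Hypothetical C-compatible view of a binary payload (field names assumed).
/// The caller must keep the source slice alive for the duration of the FFI call.
#[repr(C)]
struct BinaryView {
    data: *const u8,
    len: usize,
}

impl BinaryView {
    fn from_slice(bytes: &[u8]) -> Self {
        // Empty input becomes an explicit null so native code never receives
        // a non-null pointer that it might dereference.
        let data = if bytes.is_empty() { ptr::null() } else { bytes.as_ptr() };
        BinaryView { data, len: bytes.len() }
    }
}
```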

Address codex-feedback.md parity gaps:
1. CancellationToken support: start/append/stop now accept Option<CancellationToken> (via tokio_util::sync::CancellationToken). stop() uses a cancel-safe pattern matching C# StopAsync: the native session stop is always performed, even if the token fires.
2. Response envelope matches C#: LiveAudioTranscriptionResponse now has content: Vec<ContentPart> with text/transcript fields, so callers use result.content[0].text (identical to C# Content[0].Text).
3. Added tokio-util dependency for CancellationToken.

Updated E2E sample and integration test to use the new API shape.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
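The cancel-safe stop and the `Drop` fallback described in these commits can be modeled together. In this std-only sketch, `Session`, `native_stop`, and the `AtomicBool` token are hypothetical stand-ins, and reporting cancellation as an error after cleanup is an assumption about the return value. Taking the handle out of the `Option` guarantees that `Drop` never issues a second stop.

```rust
use std::sync::atomic::{AtomicBool, AtomicUsize, Ordering};

/// Counts native stop calls so double stops would be visible.
static STOP_CALLS: AtomicUsize = AtomicUsize::new(0);

/// Stand-in for the native audio_stream_stop command.
fn native_stop(_handle: u64) {
    STOP_CALLS.fetch_add(1, Ordering::SeqCst);
}

/// Hypothetical session owning an optional native handle.
struct Session {
    handle: Option<u64>,
}

impl Session {
    /// Cancel-safe stop: the native stop runs even if the token already fired.
    fn stop(&mut self, cancel: &AtomicBool) -> Result<(), String> {
        // Assumption: a fuller version would first wait for the final result
        // here and skip that wait once `cancel` has fired.
        if let Some(handle) = self.handle.take() {
            native_stop(handle);
        }
        if cancel.load(Ordering::SeqCst) {
            return Err("Stop cancelled".into());
        }
        Ok(())
    }
}

impl Drop for Session {
    /// Best-effort cleanup when stop() was never called.
    fn drop(&mut self) {
        if let Some(handle) = self.handle.take() {
            native_stop(handle);
        }
    }
}
```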

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

- Sample: update download progress callback from &str to f64 to match upstream API change (PR #608)
- Apply cargo fmt to all SDK and sample files for CI compliance

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

The function and its AudioClient import triggered -D warnings (dead_code) in the CI build. The E2E test creates the session directly via model.create_audio_client() and doesn't use this helper.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Copilot AI review requested due to automatic review settings April 8, 2026 19:45

Copilot started reviewing on behalf of rui-ren April 8, 2026 19:46

Copilot AI reviewed Apr 8, 2026


ruiren_microsoft and others added 7 commits April 8, 2026 14:28

Update codex-feedback.md: mark parity gaps as resolved

d8459b2

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

rui-ren force-pushed the ruiren/live-audio-stream-rust branch from 3422045 to 0f8ae7a April 8, 2026 21:33

vercel bot deployed to Preview April 8, 2026 21:34

vercel bot deployed to Preview April 8, 2026 23:30

rui-ren wants to merge 8 commits into main from ruiren/live-audio-stream-rust
