Add live audio transcription streaming support to Foundry Local Rust SDK #613
Pull request overview
This PR adds live (chunked) PCM audio transcription streaming to the Foundry Local Rust SDK, aligning the Rust API with the existing C# live audio transcription session feature and extending the SDK beyond file-based transcription.
Changes:
- Introduces `LiveAudioTranscriptionSession` + associated response/options/types and stream wrapper in the Rust SDK.
- Extends the Rust FFI bridge with `execute_command_with_binary()` to send JSON params + binary PCM payloads.
- Adds integration test coverage and a new Rust sample demonstrating microphone capture (cpal) and streaming transcription.
Reviewed changes
Copilot reviewed 14 out of 14 changed files in this pull request and generated 6 comments.
| File | Description |
|---|---|
| sdk/rust/src/openai/live_audio_client.rs | New streaming session implementation, types, cancellation support, and unit tests |
| sdk/rust/src/detail/core_interop.rs | Adds StreamingRequestBuffer + optional execute_command_with_binary symbol support |
| sdk/rust/src/openai/audio_client.rs | Adds create_live_transcription_session() factory method |
| sdk/rust/src/detail/model.rs | Exposes Model::create_live_transcription_session() |
| sdk/rust/src/detail/model_variant.rs | Wires variant factory for live transcription sessions |
| sdk/rust/src/openai/mod.rs | Registers and re-exports live audio transcription module/types |
| sdk/rust/src/lib.rs | Public re-exports for the new live transcription session/types |
| sdk/rust/Cargo.toml | Adds tokio-util for CancellationToken |
| sdk/rust/tests/integration/main.rs | Registers the new integration test module |
| sdk/rust/tests/integration/live_audio_test.rs | New E2E-ish integration test using synthetic PCM audio |
| samples/rust/live-audio-transcription-example/src/main.rs | New microphone/synthetic streaming transcription sample |
| samples/rust/live-audio-transcription-example/Cargo.toml | Declares sample dependencies (cpal, tokio, sdk path dep) |
| samples/rust/Cargo.toml | Adds the new sample crate to the workspace |
| codex-feedback.md | Adds review/validation notes for the feature |
    _ = token.cancelled() => {
        return Err(FoundryLocalError::CommandExecution {
            reason: "Start cancelled".into(),
        });
    }
When the cancellation token fires during start(), this branch returns without awaiting start_future. The spawn_blocking task will continue running; if it ends up creating a native session successfully, the session handle is dropped and the native session may leak. To keep the “clean (not-started) state” guarantee, ensure you await start_future and, if a handle is produced after cancellation, issue a best-effort audio_stream_stop cleanup (possibly in a detached background task) before returning.
Port the C# live audio transcription feature (PR #485) to the Rust SDK with full API parity.

New files:
- src/openai/live_audio_client.rs: LiveAudioTranscriptionSession with start/append/get_transcription_stream/stop lifecycle, response types, CoreErrorResponse, and unit tests
- tests/integration/live_audio_test.rs: E2E test with synthetic PCM audio

Modified files:
- src/detail/core_interop.rs: StreamingRequestBuffer FFI struct and execute_command_with_binary method for binary audio data
- src/openai/audio_client.rs: create_live_transcription_session() factory
- src/detail/model.rs, model_variant.rs: create_live_transcription_session()
- src/openai/mod.rs, src/lib.rs: Module and public type exports

API surface:

    let audio_client = model.create_audio_client();
    let session = audio_client.create_live_transcription_session();
    session.settings.sample_rate = 16000;
    session.start().await?;
    session.append(&pcm_bytes).await?;
    let mut stream = session.get_transcription_stream()?;
    // use tokio_stream::StreamExt;
    while let Some(result) = stream.next().await { ... }
    session.stop().await?;

Design highlights:
- Bounded push channel with backpressure (capacity=100)
- Push loop runs on blocking thread via spawn_blocking
- Fail-fast on native errors (no retry logic)
- Settings frozen at start() via clone snapshot
- Output channel completed on stop() after final result

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
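The "bounded push channel with backpressure" design point can be pictured with a small std-only sketch. This is not the SDK's implementation (which presumably uses an async channel); `pump_chunks`, `queue_is_bounded`, and `PUSH_QUEUE_CAPACITY` are hypothetical names, and `std::sync::mpsc::sync_channel` stands in for whatever channel type the session actually uses.

```rust
use std::sync::mpsc::{sync_channel, TrySendError};
use std::thread;

/// Hypothetical constant mirroring the "capacity=100" note in the commit message.
const PUSH_QUEUE_CAPACITY: usize = 100;

/// Sends every chunk through a bounded queue to a consumer thread and
/// returns the total number of bytes the consumer received.
/// `send` blocks while the queue is full, which is the backpressure.
fn pump_chunks(chunks: Vec<Vec<u8>>) -> usize {
    let (tx, rx) = sync_channel::<Vec<u8>>(PUSH_QUEUE_CAPACITY);

    // Stand-in for the push loop that would forward chunks to the native core.
    let consumer = thread::spawn(move || {
        let mut total = 0usize;
        for chunk in rx {
            total += chunk.len();
        }
        total
    });

    for chunk in chunks {
        tx.send(chunk).expect("consumer hung up");
    }
    drop(tx); // closing the sender lets the consumer loop finish

    consumer.join().expect("consumer panicked")
}

/// Shows that a full bounded queue rejects a non-blocking send.
fn queue_is_bounded() -> bool {
    let (tx, _rx) = sync_channel::<u8>(2);
    tx.try_send(1).is_ok()
        && tx.try_send(2).is_ok()
        && matches!(tx.try_send(3), Err(TrySendError::Full(3)))
}
```

With a bounded queue, a producer that outpaces the native core is slowed down rather than allowed to grow memory without limit.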
Adds samples/rust/live-audio-transcription-example/ that demonstrates the full pipeline: SDK → Core.dll → onnxruntime-genai.dll → nemotron.

Tested E2E with synthetic 440Hz PCM audio (30 chunks, 96000 bytes):
- FoundryLocalManager initialized
- nemotron model loaded
- audio_stream_start (session handle)
- audio_stream_push ×30 (execute_command_with_binary)
- audio_stream_stop (clean shutdown)
- No errors from native core / onnxruntime-genai

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
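For readers who want to reproduce a similar test signal, here is a minimal sketch of generating synthetic 440 Hz PCM. It assumes 16 kHz, mono, 16-bit little-endian audio and 100 ms chunks; the chunk size is inferred from 96000 bytes / 30 chunks = 3200 bytes and is not stated in the commit. `synth_sine_chunks` is a hypothetical helper, not a function from the sample.

```rust
use std::f32::consts::PI;

/// Assumed sample rate (16 kHz mono, 16-bit).
const SAMPLE_RATE: u32 = 16_000;
/// Assumed chunk length: 100 ms = 1600 samples = 3200 bytes.
const CHUNK_SAMPLES: usize = 1_600;

/// Generates `chunk_count` chunks of a sine wave as 16-bit little-endian PCM.
fn synth_sine_chunks(freq_hz: f32, chunk_count: usize) -> Vec<Vec<u8>> {
    let mut chunks = Vec::with_capacity(chunk_count);
    let mut n: u64 = 0; // running sample index keeps the phase continuous across chunks

    for _ in 0..chunk_count {
        let mut bytes = Vec::with_capacity(CHUNK_SAMPLES * 2);
        for _ in 0..CHUNK_SAMPLES {
            let t = n as f32 / SAMPLE_RATE as f32;
            // Half amplitude leaves headroom below full scale.
            let sample = (2.0 * PI * freq_hz * t).sin() * 0.5;
            let pcm = (sample * i16::MAX as f32) as i16;
            bytes.extend_from_slice(&pcm.to_le_bytes());
            n += 1;
        }
        chunks.push(bytes);
    }
    chunks
}
```

Calling `synth_sine_chunks(440.0, 30)` yields 30 chunks of 3200 bytes each, matching the totals quoted above under these assumptions.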
Uses cpal for cross-platform microphone capture with automatic format adaptation:
- Queries device default config (e.g. 48kHz/2ch/F32)
- Resamples to 16kHz mono via linear interpolation
- Converts f32 → 16-bit PCM little-endian for the SDK

Two modes:

    cargo run             # Live microphone (press ENTER to stop)
    cargo run -- --synth  # Synthetic 440Hz sine wave

Tested E2E: Microphone → SDK → Core.dll → onnxruntime-genai.dll

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
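The format adaptation described above (downmix, linear-interpolation resampling, f32 to 16-bit PCM) can be sketched as three small pure functions. This is an illustrative sketch, not the sample's actual code: the function names are hypothetical, the cpal capture callback is omitted, and no anti-aliasing filter is applied before downsampling.

```rust
/// Averages interleaved frames (e.g. stereo L/R pairs) into mono samples.
fn downmix_to_mono(interleaved: &[f32], channels: usize) -> Vec<f32> {
    if channels == 0 {
        return Vec::new();
    }
    interleaved
        .chunks_exact(channels)
        .map(|frame| frame.iter().sum::<f32>() / channels as f32)
        .collect()
}

/// Resamples mono audio by linear interpolation between neighbouring samples.
fn resample_linear(input: &[f32], from_hz: u32, to_hz: u32) -> Vec<f32> {
    if input.is_empty() || from_hz == 0 || to_hz == 0 {
        return Vec::new();
    }
    let out_len = (input.len() as u64 * to_hz as u64 / from_hz as u64) as usize;
    let step = from_hz as f64 / to_hz as f64; // input samples per output sample

    (0..out_len)
        .map(|i| {
            let pos = i as f64 * step;
            let idx = pos.floor() as usize;
            let frac = (pos - idx as f64) as f32;
            let a = input[idx];
            // Clamp the right neighbour at the end of the buffer.
            let b = input[(idx + 1).min(input.len() - 1)];
            a + (b - a) * frac
        })
        .collect()
}

/// Converts f32 samples in [-1.0, 1.0] to 16-bit little-endian PCM bytes.
fn f32_to_pcm16_le(samples: &[f32]) -> Vec<u8> {
    let mut out = Vec::with_capacity(samples.len() * 2);
    for &s in samples {
        // Clamp first so out-of-range input saturates instead of wrapping.
        let v = (s.clamp(-1.0, 1.0) * i16::MAX as f32).round() as i16;
        out.extend_from_slice(&v.to_le_bytes());
    }
    out
}
```

A capture callback would typically chain these as `f32_to_pcm16_le(&resample_linear(&downmix_to_mono(buf, 2), 48_000, 16_000))` and hand the resulting bytes to the session. Linear interpolation is cheap but can alias high frequencies; a production pipeline may prefer a filtered resampler.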
- core_interop.rs: Use std::ptr::null() for empty binary_data slices to avoid passing dangling pointer across FFI boundary
- live_audio_client.rs: Call native audio_stream_stop synchronously in Drop to prevent native session leaks when stop() is not called

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
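The empty-slice fix can be illustrated with a tiny helper. In Rust, an empty slice's `as_ptr()` is non-null but may be dangling, which is fine inside Rust yet risky to hand to native code that might check for null or dereference it. The sketch below uses a hypothetical `binary_ptr_and_len` helper; the real `core_interop.rs` code and its `StreamingRequestBuffer` layout are not shown in this PR excerpt.

```rust
use std::ptr;

/// Returns the (pointer, length) pair to pass across an FFI boundary.
/// Empty input maps to a null pointer with length 0 so the native side
/// never receives a dangling address.
fn binary_ptr_and_len(data: &[u8]) -> (*const u8, usize) {
    if data.is_empty() {
        (ptr::null(), 0)
    } else {
        (data.as_ptr(), data.len())
    }
}
```

The pointer is only valid while `data` is borrowed, so the FFI call must complete before the slice is dropped.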
Address codex-feedback.md parity gaps:
1. CancellationToken support: start/append/stop now accept Option<CancellationToken> (via tokio_util::sync::CancellationToken). stop() uses a cancel-safe pattern matching C# StopAsync: the native session stop is always performed even if the token fires.
2. Response envelope matches C#: LiveAudioTranscriptionResponse now has content: Vec<ContentPart> with text/transcript fields, so callers use result.content[0].text (identical to C# Content[0].Text).
3. Added tokio-util dependency for CancellationToken.

Updated E2E sample and integration test to use new API shape.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
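The response envelope from item 2 can be pictured with stand-in types. The struct and field names follow the commit message and the parity notes, but the field types (`String`, `Option<f64>`), the derives, and the `first_text` helper are assumptions made for illustration and may differ from the SDK's definitions.

```rust
/// Stand-in for the SDK's content part; field types are assumed.
#[derive(Debug, Clone, Default, PartialEq)]
pub struct ContentPart {
    pub text: String,
    pub transcript: String,
}

/// Stand-in for the SDK's response envelope; field types are assumed.
#[derive(Debug, Clone, Default, PartialEq)]
pub struct LiveAudioTranscriptionResponse {
    pub content: Vec<ContentPart>,
    pub is_final: bool,
    pub start_time: Option<f64>,
    pub end_time: Option<f64>,
}

/// Hypothetical helper: reads the first content part's text without
/// panicking when `content` is empty (unlike `content[0].text`).
fn first_text(response: &LiveAudioTranscriptionResponse) -> Option<&str> {
    response.content.first().map(|part| part.text.as_str())
}
```

Indexing with `content[0]` mirrors the C# access pattern, but callers that cannot rule out an empty `content` vector may prefer `first()` as above.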
- Sample: update download progress callback from &str to f64 to match upstream API change (PR #608)
- Apply cargo fmt to all SDK and sample files for CI compliance

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The function and its AudioClient import triggered -D warnings (dead_code) in the CI build. The E2E test creates the session directly via model.create_audio_client() and doesn't use this helper.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Add live audio transcription streaming support to Foundry Local Rust SDK
Description
Ports the C# live audio transcription feature (PR #485) to the Rust SDK with full API parity.
The existing `AudioClient` only supports file-based transcription. This PR introduces `LiveAudioTranscriptionSession` that accepts continuous PCM audio chunks (e.g., from a microphone) and returns partial/final transcription results as an async stream.

What's included
New files
- `sdk/rust/src/openai/live_audio_client.rs`: Streaming session with `start()`, `append()`, `get_transcription_stream()`, `stop()`, plus types, cancellation support, and unit tests
- `sdk/rust/tests/integration/live_audio_test.rs`: E2E integration test with synthetic PCM audio
- `samples/rust/live-audio-transcription-example/`: Full sample with real microphone capture (cpal) and resampling

Modified files
- `sdk/rust/src/detail/core_interop.rs`: Added `StreamingRequestBuffer` FFI struct and `execute_command_with_binary()` for binary audio data
- `sdk/rust/src/openai/audio_client.rs`: Added `create_live_transcription_session()` factory method
- `sdk/rust/src/detail/model.rs`, `model_variant.rs`: Wired factory method to `Model`
- `sdk/rust/src/openai/mod.rs`, `src/lib.rs`: Module registration and public exports
- `sdk/rust/Cargo.toml`: Added `tokio-util` dependency for `CancellationToken`

API surface
C# API parity
| C# | Rust |
|---|---|
| `CreateLiveTranscriptionSession()` | `create_live_transcription_session()` |
| `StartAsync(CancellationToken)` | `start(Option<CancellationToken>)` |
| `AppendAsync(ReadOnlyMemory<byte>, CancellationToken)` | `append(&[u8], Option<CancellationToken>)` |
| `GetTranscriptionStream(CancellationToken)` | `get_transcription_stream()` |
| `StopAsync(CancellationToken)` + cancel-safe cleanup | `stop(Option<CancellationToken>)` + cancel-safe cleanup |
| `IAsyncDisposable.DisposeAsync()` | `Drop` with best-effort native stop |
| `LiveAudioTranscriptionResponse.Content[0].Text` | `response.content[0].text` |
| `LiveAudioTranscriptionResponse.Content[0].Transcript` | `response.content[0].transcript` |
| `LiveAudioTranscriptionResponse.IsFinal` | `response.is_final` |
| `LiveAudioTranscriptionResponse.StartTime`/`EndTime` | `response.start_time`/`response.end_time` |
| `LiveAudioTranscriptionOptions` (SampleRate, Channels, BitsPerSample, Language, PushQueueCapacity) | `LiveAudioTranscriptionOptions` (sample_rate, channels, bits_per_sample, language, push_queue_capacity) |
| `CoreErrorResponse.TryParse()` | `CoreErrorResponse::try_parse()` |
| `audio_stream_start`, `audio_stream_push`, `audio_stream_stop` | `execute_command`/`execute_command_with_binary` |

Design highlights
- `start`/`append`/`stop` accept `Option<CancellationToken>` via `tokio_util::sync::CancellationToken`
- `stop()` always performs native `audio_stream_stop` even if the token fires, preventing native session leaks (matches C# `StopAsync` pattern)
- `LiveAudioTranscriptionResponse` uses `content: Vec<ContentPart>`, matching C#'s `ConversationItem.Content[0].Text`/`Transcript`
- `execute_command_with_binary` FFI calls run on `spawn_blocking`, keeping the async runtime free
- Settings are frozen at `start()` and immutable during the session
- Native `audio_stream_stop` is called in `Drop` to prevent native session leaks
- Empty binary slices are passed as `std::ptr::null()` to avoid a dangling pointer across the FFI boundary

Verified working
Stats