Fully configure frame processors when they are used directly on an audio stream#679

Open
1egoman wants to merge 20 commits into main from frame-processor-on-audio-stream

@1egoman 1egoman commented May 20, 2026

Updates the Python SDK so that FrameProcessor-based noise cancellation providers can be used directly on an AudioStream, without having to go through the agent's RoomIO to initialize themselves with credentials.

For example, with this change, something like the below becomes possible:

stream = rtc.AudioStream.from_track(
    track=track,
    sample_rate=SAMPLE_RATE,
    num_channels=CHANNELS,
    noise_cancellation=ai_coustics.audio_enhancement(model=ai_coustics.EnhancerModel.QUAIL_VF_L),
)

How this works: tracks now keep track of which room they are part of (holding a weakref to it). When the room a track is in changes, the track computes new frame processor options and sends them to any AudioStreams associated with it.
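
As a rough illustration of that mechanism (this is not the code in this PR; SketchRoom, SketchTrack, SketchStream, and SketchProcessorOptions are hypothetical names made up for the example, and the real options carry different fields), the room tracking could look something like this:

```python
import weakref
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SketchRoom:
    """Hypothetical stand-in for a room; only carries a name and a token."""
    name: str
    token: str


@dataclass(frozen=True)
class SketchProcessorOptions:
    """Hypothetical options a frame processor might need (not the SDK's type)."""
    room_name: Optional[str]
    token: Optional[str]


class SketchStream:
    """Hypothetical stand-in for an audio stream that receives option updates."""

    def __init__(self) -> None:
        self.received: List[SketchProcessorOptions] = []

    def update_processor_options(self, options: SketchProcessorOptions) -> None:
        self.received.append(options)


class SketchTrack:
    def __init__(self) -> None:
        # Weak reference: the track must not keep its room alive.
        self._room_ref: Optional[weakref.ReferenceType] = None
        # WeakSet: the track must not keep its streams alive either.
        self._streams: "weakref.WeakSet[SketchStream]" = weakref.WeakSet()

    @property
    def room(self) -> Optional[SketchRoom]:
        return self._room_ref() if self._room_ref is not None else None

    def set_room(self, room: Optional[SketchRoom]) -> None:
        # Called when the track is published to, moved between, or removed from a room.
        self._room_ref = weakref.ref(room) if room is not None else None
        options = self._compute_options()
        for stream in list(self._streams):
            stream.update_processor_options(options)

    def attach_stream(self, stream: SketchStream) -> None:
        # A newly attached stream immediately gets the current options.
        self._streams.add(stream)
        stream.update_processor_options(self._compute_options())

    def _compute_options(self) -> SketchProcessorOptions:
        room = self.room
        if room is None:
            return SketchProcessorOptions(room_name=None, token=None)
        return SketchProcessorOptions(room_name=room.name, token=room.token)


room = SketchRoom(name="demo", token="secret")
track = SketchTrack()
stream = SketchStream()
track.attach_stream(stream)   # no room yet, so the options are empty
track.set_room(room)          # the stream is notified with room-derived options
```

The point of the weak reference is only that a track does not extend the lifetime of the room it belongs to; the notification fan-out is an ordinary loop over the attached streams.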

The noise_cancellation_leave_open parameter allows the agents SDK to call this from_track method with a frame processor that remains open across the whole session and is not auto-closed when the track is closed.
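
To make the ownership rule concrete, here is a minimal sketch of the intended semantics. SketchProcessor and SketchAudioStream are hypothetical stand-ins rather than the SDK classes, and the leave_open argument only mirrors the flag described above:

```python
class SketchProcessor:
    """Hypothetical frame processor that only records whether it was closed."""

    def __init__(self) -> None:
        self.closed = False

    def close(self) -> None:
        self.closed = True


class SketchAudioStream:
    """Hypothetical stream; `leave_open` mirrors the flag described above."""

    def __init__(self, processor: SketchProcessor, leave_open: bool = False) -> None:
        self._processor = processor
        self._leave_open = leave_open

    def close(self) -> None:
        # The stream closes the processor unless the caller opted out.
        if not self._leave_open:
            self._processor.close()


# Default: the stream owns the processor and closes it.
owned = SketchProcessor()
SketchAudioStream(owned).close()

# Opt-out: the caller keeps responsibility for closing the processor.
borrowed = SketchProcessor()
SketchAudioStream(borrowed, leave_open=True).close()
```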

This goes along with livekit/agents#5867, which removes the relevant event handling logic in the agents SDK. I will follow up with a Node version of this once the Python one is in a good state.

Todo

  • Add some tests for this newly added behavior

@1egoman 1egoman force-pushed the frame-processor-on-audio-stream branch from 3e5a9ab to f62c247 Compare May 26, 2026 15:15
Comment thread livekit-rtc/livekit/rtc/track.py Outdated
@1egoman 1egoman marked this pull request as ready for review May 26, 2026 21:25
devin-ai-integration[bot]

This comment was marked as resolved.

1egoman added 2 commits May 27, 2026 11:28
@1egoman 1egoman force-pushed the frame-processor-on-audio-stream branch from 564b2c7 to 8d3f4fe Compare May 27, 2026 17:02
1egoman added 2 commits May 27, 2026 13:26
These tests exercise all the frame processor track reparenting under
room / etc paths.
num_channels: int = 1,
frame_size_ms: int | None = None,
noise_cancellation: Optional[NoiseCancellationOptions | FrameProcessor[AudioFrame]] = None,
noise_cancellation_leave_open: bool = False,
Member

Suggested change
noise_cancellation_leave_open: bool = False,

Can we move that inside NoiseCancellationOptions?

Contributor Author

@1egoman 1egoman May 28, 2026

Unfortunately, no - this is important to the FrameProcessor[AudioFrame] side of that noise_cancellation union. Open to putting it somewhere else but it needs to be settable in the FrameProcessor path.

Contributor

hmm, not sure if it's a good idea, but could it be a field on the FrameProcessor interface instead?

Then we could add it to NoiseCancellationOptions and new FrameProcessors would be able to set it on the processor itself

Contributor Author

@1egoman 1egoman May 29, 2026

It's not a setting that a frame processor would always want to have set or not have set, so I'm not sure that would really make sense either.

For context, the reason this is here is so the agents SDK can reuse a single FrameProcessor across multiple underlying tracks. Previously this wasn't a problem, because the agents SDK was responsible for closing the FrameProcessor and could easily do so at room disconnection time. But to support using FrameProcessors directly on an AudioStream, the call to close needs to be pushed down deeper than the agents SDK layer. This flag allows the caller to explicitly tell AudioStream that they will manage cleaning up the FrameProcessor, so that both use cases continue to work.
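
A sketch of the reuse case, to show why the stream cannot always be the one to close the processor. Everything here is hypothetical (SketchSession, SketchAudioStream, and SketchProcessor are invented for the example and are not the agents SDK or rtc classes); it only models one processor shared by streams that come and go:

```python
from typing import List


class SketchProcessor:
    """Hypothetical processor that refuses to run once closed."""

    def __init__(self) -> None:
        self.closed = False
        self.frames_seen = 0

    def process(self, frame: bytes) -> bytes:
        if self.closed:
            raise RuntimeError("processor used after close")
        self.frames_seen += 1
        return frame

    def close(self) -> None:
        self.closed = True


class SketchAudioStream:
    """Hypothetical stream that closes its processor unless told to leave it open."""

    def __init__(self, processor: SketchProcessor, leave_open: bool = False) -> None:
        self._processor = processor
        self._leave_open = leave_open

    def push(self, frame: bytes) -> bytes:
        return self._processor.process(frame)

    def close(self) -> None:
        if not self._leave_open:
            self._processor.close()


class SketchSession:
    """Hypothetical session that shares one processor across all of its streams."""

    def __init__(self) -> None:
        self.processor = SketchProcessor()
        self._streams: List[SketchAudioStream] = []

    def open_stream(self) -> SketchAudioStream:
        # leave_open=True: the session, not the stream, owns the processor.
        stream = SketchAudioStream(self.processor, leave_open=True)
        self._streams.append(stream)
        return stream

    def disconnect(self) -> None:
        for stream in self._streams:
            stream.close()
        self._streams.clear()
        # The processor is closed exactly once, at the end of the session.
        self.processor.close()


session = SketchSession()
first = session.open_stream()
first.push(b"\x00\x00")
first.close()               # the first track goes away mid-session

second = session.open_stream()
second.push(b"\x00\x00")    # the shared processor is still usable
session.disconnect()
```

If the streams were created with the default leave_open=False in this sketch, first.close() would close the shared processor and second.push() would raise.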

Contributor

I think this flag is not really configuring the noise suppression behavior, but how AudioStream deals with its own noise suppression. Maybe the name noise_cancellation_leave_open is a bit confusing?

How about close_noise_cancellation_on_stream_close or manage_noise_cancellation_processor?

4 participants