[Discord] Audio Flow Diagrams #694

Open
AbdurrehmanSubhani opened this issue Nov 27, 2024 · 0 comments

AbdurrehmanSubhani commented Nov 27, 2024

Documenting the Discord audio flow for both the client side and the server side here:

Server Side

Connecting to voice channels on Discord

  • Connecting to voice channels
  • Receiving incoming voice streams from Discord voice channels
```mermaid
sequenceDiagram
    box rgba(128, 128, 128, 0.1) Discord.js Components
        participant VR as Voice Receiver<br>(@discordjs/voice)
        participant AP as Audio Player<br>(@discordjs/voice)
        participant VC as Voice Channel<br>(discord.js)
    end
    participant DC as Our Discord Client
    participant WS as Our WebSocket Server

    Note over DC,WS: Connection Establishment
    DC->>VC: joinVoiceChannel()
    VC-->>DC: connection established
    DC->>VR: Create Voice Receiver

    par Input Flow (Discord → WebSocket)
        Note over VR,WS: Voice Detection & Input Stream
        VR->>VR: speaking.on('start')
        VR->>DC: User started speaking

        rect rgba(128, 128, 128, 0.1)
            Note over DC,WS: createListeningStream() Flow
            DC->>DC: Generate streamId
            DC->>VR: Subscribe to user's audio
            Note over VR,DC: Configure EndBehavior:<br>AfterSilence: 1000ms
            DC->>WS: Emit 'voicestart' event

            loop Until voice ends
                VR->>DC: opusStream data
                DC->>WS: voicedata event (zbencoded)
            end
        end

        VR->>DC: opusStream end
        DC->>WS: voiceend event
    and Output Flow (WebSocket → Discord)
        Note over WS,VC: Playback Stream
        WS->>DC: playVoiceStart {streamId}
        DC->>AP: Create Audio Player

        loop Until playback ends
            WS->>DC: playVoiceData (zbencoded)
            DC->>AP: Write to inputStream
            AP->>VC: Audio playback
        end

        WS->>DC: playVoiceEnd
        AP->>DC: State: idle
        DC->>WS: voiceidle event
    end
```
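The input half of the server diagram amounts to a small per-utterance state machine: announce a new `streamId` with `voicestart`, forward each Opus packet as `voicedata`, and send `voiceend` once 1000 ms pass with no packets. The sketch below models only that bookkeeping in plain TypeScript, assuming packets arrive with millisecond timestamps and that some timer calls `tick()`. `ListeningStream`, `ServerEvent`, and the counter-based `generateStreamId` are hypothetical names; the `zbencoded` framing and the WebSocket itself are left out. In the real client the silence cutoff comes from the receiver subscription's `EndBehaviorType.AfterSilence` option in @discordjs/voice rather than from hand-written timing code.

```typescript
// Hypothetical event shapes for the Discord -> WebSocket direction.
type ServerEvent =
  | { type: "voicestart"; userId: string; streamId: string }
  | { type: "voicedata"; streamId: string; data: Uint8Array }
  | { type: "voiceend"; streamId: string };

const SILENCE_MS = 1000; // mirrors "AfterSilence: 1000ms" in the diagram

let nextStreamNumber = 0;

// Hypothetical ID scheme; the real client may generate IDs differently.
function generateStreamId(): string {
  nextStreamNumber += 1;
  return `in-${nextStreamNumber}`;
}

class ListeningStream {
  readonly streamId: string;
  private readonly emit: (event: ServerEvent) => void;
  private lastPacketAt: number;
  private ended = false;

  constructor(userId: string, emit: (event: ServerEvent) => void, startedAt: number) {
    this.streamId = generateStreamId();
    this.emit = emit;
    this.lastPacketAt = startedAt;
    // Announce the stream before any audio is forwarded.
    this.emit({ type: "voicestart", userId, streamId: this.streamId });
  }

  // Forward one Opus packet that arrived at time `now` (ms).
  push(packet: Uint8Array, now: number): void {
    this.tick(now); // close first if the silence window already elapsed
    if (this.ended) return;
    this.lastPacketAt = now;
    this.emit({ type: "voicedata", streamId: this.streamId, data: packet });
  }

  // Called periodically; ends the stream after SILENCE_MS without packets.
  tick(now: number): void {
    if (!this.ended && now - this.lastPacketAt >= SILENCE_MS) {
      this.ended = true;
      this.emit({ type: "voiceend", streamId: this.streamId });
    }
  }

  get isEnded(): boolean {
    return this.ended;
  }
}
```

Passing the clock in as a parameter instead of calling `setTimeout` keeps the silence rule easy to test deterministically.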

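On the output side, each `playVoiceStart` opens a playback session keyed by `streamId`, data is queued into that session, and `voiceidle` is reported only after `playVoiceEnd` has arrived and the player has drained its queue, which is what the `State: idle` step in the diagram signals. The following sketch is a hypothetical model of that ordering: `PlaybackRouter`, `PlaybackSession`, and `playNext()` are illustrative names, and the queue stands in for the @discordjs/voice audio player and its input stream.

```typescript
// Hypothetical message shapes for the WebSocket -> Discord direction.
type PlayMessage =
  | { type: "playVoiceStart"; streamId: string }
  | { type: "playVoiceData"; streamId: string; data: Uint8Array }
  | { type: "playVoiceEnd"; streamId: string };

// Stand-in for the audio player + inputStream pair created per stream.
interface PlaybackSession {
  queued: Uint8Array[];
  inputEnded: boolean;
}

class PlaybackRouter {
  private readonly sessions = new Map<string, PlaybackSession>();
  private readonly sendIdle: (streamId: string) => void;

  constructor(sendIdle: (streamId: string) => void) {
    this.sendIdle = sendIdle;
  }

  handle(msg: PlayMessage): void {
    switch (msg.type) {
      case "playVoiceStart":
        this.sessions.set(msg.streamId, { queued: [], inputEnded: false });
        break;
      case "playVoiceData": {
        const session = this.sessions.get(msg.streamId);
        if (session && !session.inputEnded) session.queued.push(msg.data);
        break;
      }
      case "playVoiceEnd": {
        const session = this.sessions.get(msg.streamId);
        if (session) session.inputEnded = true;
        this.maybeIdle(msg.streamId);
        break;
      }
    }
  }

  // Simulates the player consuming one queued chunk ("Audio playback").
  playNext(streamId: string): Uint8Array | undefined {
    const chunk = this.sessions.get(streamId)?.queued.shift();
    this.maybeIdle(streamId);
    return chunk;
  }

  // Idle = input has ended and nothing is left to play.
  private maybeIdle(streamId: string): void {
    const session = this.sessions.get(streamId);
    if (session && session.inputEnded && session.queued.length === 0) {
      this.sessions.delete(streamId);
      this.sendIdle(streamId); // reported back to the server as 'voiceidle'
    }
  }
}
```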

Client Side

Receiving users' audio streams and sending audio streams to Discord

```mermaid
sequenceDiagram
    box rgba(128, 128, 128, 0.1) Client Components
        participant AG as Agent
        participant DI as DiscordInput
        participant DO as DiscordOutput
        participant AM as AudioMerger
        participant TV as TranscribedVoiceInput
    end
    participant WS as WebSocket

    Note over AG,WS: Bidirectional Voice Flow

    rect rgba(128, 128, 128, 0.1)
        Note over AG,WS: Input Flow (From Agent → To WebSocket Server)
        AG->>DI: pushStream(audioStream)
        DI->>DI: Generate streamId
        DI->>WS: playVoiceStart {streamId}

        loop Until stream ends
            AG->>DI: Audio chunks
            DI->>DI: Package with streamId
            DI->>WS: playVoiceData {zbencoded}
        end

        DI->>WS: playVoiceEnd {streamId}
    end

    rect rgba(128, 128, 128, 0.1)
        Note over WS,AG: Output Flow (From WebSocket Server → To Agent)
        WS->>DO: voicestart {userId, streamId}
        DO->>DO: Check existing userStream
        alt No existing userStream
            DO->>AM: Create new AudioMerger
            DO->>TV: Create TranscribedVoiceInput
            DO->>DO: Setup opus decoder stream
        end

        loop Until voice ends
            WS->>DO: voicedata {zbencoded}
            DO->>DO: Get stream by streamId
            DO->>AM: Write decoded audio
            AM->>TV: Process merged audio
            TV-->>AG: Emit transcription
        end

        WS->>DO: voiceend {streamId}
        DO->>DO: Close writer
        DO->>DO: Cleanup stream
    end

    rect rgba(128, 128, 128, 0.1)
        Note over AG,WS: Cleanup Flows
        WS-->>DI: voiceidle {streamId}
        DI->>DI: Cancel stream
        DI->>DI: Remove from streamSpecs

        AM->>DO: Timeout after 2000ms
        DO->>DO: Remove userStream
        DO->>DO: Cleanup resources
    end
```
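On the client, `DiscordInput` turns an agent's audio stream into a `playVoiceStart` / `playVoiceData` / `playVoiceEnd` sequence sharing one `streamId`, and forgets the stream only when the server answers with `voiceidle`. A minimal sketch, assuming the audio arrives as an array of already-encoded chunks: `DiscordInputSketch`, `OutgoingMessage`, `start()`/`write()`/`end()`, and the `out-N` ID format are hypothetical, while `pushStream` and `streamSpecs` borrow their names from the diagram.

```typescript
// Hypothetical message shapes for the agent -> Discord direction.
type OutgoingMessage =
  | { type: "playVoiceStart"; streamId: string }
  | { type: "playVoiceData"; streamId: string; data: Uint8Array }
  | { type: "playVoiceEnd"; streamId: string };

interface StreamSpec {
  streamId: string;
  ended: boolean; // true once playVoiceEnd has been sent
}

class DiscordInputSketch {
  // Streams stay registered until the server reports 'voiceidle'.
  readonly streamSpecs = new Map<string, StreamSpec>();
  private counter = 0;
  private readonly send: (msg: OutgoingMessage) => void;

  constructor(send: (msg: OutgoingMessage) => void) {
    this.send = send;
  }

  // Open a stream: generate an ID and announce it with playVoiceStart.
  start(): string {
    this.counter += 1;
    const streamId = `out-${this.counter}`; // hypothetical ID format
    this.streamSpecs.set(streamId, { streamId, ended: false });
    this.send({ type: "playVoiceStart", streamId });
    return streamId;
  }

  // Forward one chunk; dropped if the stream has ended or was cancelled.
  write(streamId: string, data: Uint8Array): void {
    const spec = this.streamSpecs.get(streamId);
    if (spec && !spec.ended) {
      this.send({ type: "playVoiceData", streamId, data });
    }
  }

  end(streamId: string): void {
    const spec = this.streamSpecs.get(streamId);
    if (spec && !spec.ended) {
      spec.ended = true;
      this.send({ type: "playVoiceEnd", streamId });
    }
  }

  // Convenience wrapper matching pushStream(audioStream) in the diagram.
  pushStream(chunks: Uint8Array[]): string {
    const streamId = this.start();
    for (const chunk of chunks) this.write(streamId, chunk);
    this.end(streamId);
    return streamId;
  }

  // Server finished playback: cancel (later writes are dropped) and forget it.
  onVoiceIdle(streamId: string): void {
    this.streamSpecs.delete(streamId);
  }
}
```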

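`DiscordOutput` keeps one user stream (merger plus transcriber) per Discord user, routes each incoming `streamId` to its owner, and tears the user stream down after 2000 ms. The sketch below assumes the 2000 ms timeout counts from the user's last `voicestart`/`voicedata`/`voiceend` event and only fires when none of that user's streams is still open. `DiscordOutputSketch`, `UserStream`, and `sweep()` are hypothetical; Opus decoding and transcription are omitted, so `merged` simply collects chunks that are assumed to be already decoded.

```typescript
// Hypothetical stand-in for the AudioMerger + TranscribedVoiceInput pair.
interface UserStream {
  userId: string;
  merged: Uint8Array[]; // audio that would be handed to the transcriber
  openStreams: Set<string>; // streamIds currently writing into this merger
  lastActivity: number; // ms timestamp used for the 2000 ms timeout
}

const MERGER_TIMEOUT_MS = 2000;

class DiscordOutputSketch {
  readonly userStreams = new Map<string, UserStream>();
  private readonly streamOwners = new Map<string, string>(); // streamId -> userId

  onVoiceStart(userId: string, streamId: string, now: number): void {
    let us = this.userStreams.get(userId);
    if (!us) {
      // "No existing userStream": create the merger/transcriber pair once.
      us = { userId, merged: [], openStreams: new Set<string>(), lastActivity: now };
      this.userStreams.set(userId, us);
    }
    us.openStreams.add(streamId);
    us.lastActivity = now;
    this.streamOwners.set(streamId, userId);
  }

  // "Get stream by streamId", then write the decoded audio into the merger.
  onVoiceData(streamId: string, decoded: Uint8Array, now: number): void {
    const userId = this.streamOwners.get(streamId);
    const us = userId === undefined ? undefined : this.userStreams.get(userId);
    if (!us) return; // unknown or already cleaned-up stream
    us.merged.push(decoded);
    us.lastActivity = now;
  }

  // "Close writer" / "Cleanup stream": forget the streamId, keep the merger.
  onVoiceEnd(streamId: string, now: number): void {
    const userId = this.streamOwners.get(streamId);
    this.streamOwners.delete(streamId);
    const us = userId === undefined ? undefined : this.userStreams.get(userId);
    if (us) {
      us.openStreams.delete(streamId);
      us.lastActivity = now;
    }
  }

  // Remove user streams with no open input and no activity for 2000 ms.
  sweep(now: number): string[] {
    const removed: string[] = [];
    this.userStreams.forEach((us, userId) => {
      if (us.openStreams.size === 0 && now - us.lastActivity >= MERGER_TIMEOUT_MS) {
        this.userStreams.delete(userId);
        removed.push(userId);
      }
    });
    return removed;
  }
}
```

Reusing the same user stream across successive `streamId`s mirrors the `alt No existing userStream` branch: a new merger is created only for a user's first speaking burst, not for every burst.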

AbdurrehmanSubhani self-assigned this Nov 27, 2024