PairwiseTurnGPT: a multi-stream turn prediction model for spoken dialogue

Sean Leishman, Peter Bell, Sarenne Wallbridge

August 2024


Abstract

Spoken conversation is characterised by rapid turn transitions and frequent speaker overlaps. However, existing models of turn-taking treat dialogue as a series of incremental turns. We propose PairwiseTurnGPT, a language model that captures the temporal dynamics of lexical content by modelling dialogue as two aligned speaker streams. PairwiseTurnGPT provides a much more nuanced understanding of how lexical content contributes to predicting turn-taking behaviour in speech. By training the model with data configurations containing different turn-taking behaviours, we demonstrate the relative contributions of partial, complete, and backchannel overlaps for accurately predicting the variety of turn ends that occur in spoken dialogue. We also show that PairwiseTurnGPT improves on serialised models of dialogue for predicting turn ends and the more difficult task of predicting when a turn will start.

Type

Conference paper

Publication

In SemDial 2024; Rovereto, Italy

