Abstract: Speech-driven facial animation is a challenging problem because each audio input can map to multiple plausible facial outputs, a one-to-many ambiguity that tends to produce overly smooth results. Although the two-stage framework ...