Zero-shot Cross-lingual Voice Cloning

Both the reference voice and the generated voice can be in languages that are not covered by the massive-speaker multilingual dataset. In the examples below, "(U)" marks these unseen languages.

Reference - English
Generated - Mixed Lingual (U)
Generated - Japanese
Generated - Spanish (U)
Generated - German (U)
Generated - Russian (U)

Reference - English
Generated - Japanese
Generated - Spanish (U)
Generated - German (U)
Generated - French (U)
Generated - Chinese

Reference - Dutch (U)
Generated - Chinese
Generated - Chinese

Reference - Spanish (U)
Generated - Japanese
Generated - Japanese

Reference - French (U)
Generated - Russian (U)
Generated - Russian (U)

Reference - Norwegian (U)
Generated - Spanish (U)
Generated - Spanish (U)