InferGrad: Improving Diffusion Models For Vocoder By Considering Inference In Training

Authors: Zehua Chen, Xu Tan, Ke Wang, Shifeng Pan, Danilo Mandic, Lei He, Sheng Zhao (Submitted to ICASSP 2022)
Note: Audio samples are generated with the optimal schedule found by search for WaveGrad, rather than for InferGrad, in order to maximize the performance of the baseline model. Audio samples for three utterances, LJ001-0001, LJ001-0002, and LJ001-0003, are presented below.

6-step generation

“Printing, in the only sense with which we are at present concerned, differs from most if not from all the arts and crafts represented in the Exhibition”
Ground Truth: WaveGrad Baseline: InferGrad General Model: InferGrad Specified Model:
“in being comparatively modern.”
Ground Truth: WaveGrad Baseline: InferGrad General Model: InferGrad Specified Model:
“For although the Chinese took impressions from wood blocks engraved in relief for centuries before the woodcutters of the Netherlands, by a similar process”
Ground Truth: WaveGrad Baseline: InferGrad General Model: InferGrad Specified Model:

3-step generation

“Printing, in the only sense with which we are at present concerned, differs from most if not from all the arts and crafts represented in the Exhibition”
Ground Truth: WaveGrad Baseline: InferGrad General Model: InferGrad Specified Model:
“in being comparatively modern.”
Ground Truth: WaveGrad Baseline: InferGrad General Model: InferGrad Specified Model:
“For although the Chinese took impressions from wood blocks engraved in relief for centuries before the woodcutters of the Netherlands, by a similar process”
Ground Truth: WaveGrad Baseline: InferGrad General Model: InferGrad Specified Model:

2-step generation

“Printing, in the only sense with which we are at present concerned, differs from most if not from all the arts and crafts represented in the Exhibition”
Ground Truth: WaveGrad Baseline: InferGrad General Model: InferGrad Specified Model:
“in being comparatively modern.”
Ground Truth: WaveGrad Baseline: InferGrad General Model: InferGrad Specified Model:
“For although the Chinese took impressions from wood blocks engraved in relief for centuries before the woodcutters of the Netherlands, by a similar process”
Ground Truth: WaveGrad Baseline: InferGrad General Model: InferGrad Specified Model:

Ablation study for InferGrad in 2-step generation

“Printing, in the only sense with which we are at present concerned, differs from most if not from all the arts and crafts represented in the Exhibition”
InferGrad Without Phase Loss: InferGrad Specified Model:
“in being comparatively modern.”
InferGrad Without Phase Loss: InferGrad Specified Model:
“For although the Chinese took impressions from wood blocks engraved in relief for centuries before the woodcutters of the Netherlands, by a similar process”
InferGrad Without Phase Loss: InferGrad Specified Model: