[babysor/MockingBird] How long should a vocoder usually be trained? Which metrics should I watch?

2024-06-27 238 views
3

How long should a vocoder usually be trained, and which metrics should I watch? My current training log:

Epoch: 95
Steps : 3670, Gen Loss Total : 25.942, Mel-Spec. Error : 0.381, s/b : 1.108
Steps : 3675, Gen Loss Total : 27.207, Mel-Spec. Error : 0.382, s/b : 1.116
Steps : 3680, Gen Loss Total : 26.531, Mel-Spec. Error : 0.383, s/b : 1.109
Steps : 3685, Gen Loss Total : 27.694, Mel-Spec. Error : 0.389, s/b : 1.118
Steps : 3690, Gen Loss Total : 25.060, Mel-Spec. Error : 0.368, s/b : 1.112
Steps : 3695, Gen Loss Total : 25.569, Mel-Spec. Error : 0.383, s/b : 1.111
Steps : 3700, Gen Loss Total : 25.829, Mel-Spec. Error : 0.376, s/b : 1.102
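To judge progress from a log like the one above, it can help to look at a trend rather than at single lines. The following is a minimal sketch that parses the printed lines and summarizes the Mel-Spec. Error. The regular expression assumes the exact line format shown in the question, and the helper names `parse_log` and `summarize` are made up for this illustration; they are not part of MockingBird.

```python
import re
from statistics import mean

# Log lines copied from the question above.
LOG = """\
Steps : 3670, Gen Loss Total : 25.942, Mel-Spec. Error : 0.381, s/b : 1.108
Steps : 3675, Gen Loss Total : 27.207, Mel-Spec. Error : 0.382, s/b : 1.116
Steps : 3680, Gen Loss Total : 26.531, Mel-Spec. Error : 0.383, s/b : 1.109
Steps : 3685, Gen Loss Total : 27.694, Mel-Spec. Error : 0.389, s/b : 1.118
Steps : 3690, Gen Loss Total : 25.060, Mel-Spec. Error : 0.368, s/b : 1.112
Steps : 3695, Gen Loss Total : 25.569, Mel-Spec. Error : 0.383, s/b : 1.111
Steps : 3700, Gen Loss Total : 25.829, Mel-Spec. Error : 0.376, s/b : 1.102
"""

# Assumed line format: "Steps : N, Gen Loss Total : X, Mel-Spec. Error : Y, ..."
PATTERN = re.compile(
    r"Steps\s*:\s*(\d+),\s*Gen Loss Total\s*:\s*([\d.]+),"
    r"\s*Mel-Spec\. Error\s*:\s*([\d.]+)"
)

def parse_log(text):
    """Return a list of (step, gen_loss_total, mel_error) tuples."""
    return [(int(s), float(g), float(m)) for s, g, m in PATTERN.findall(text)]

def summarize(rows):
    """Summarize the mel-spectrogram error over the parsed rows."""
    mel = [m for _, _, m in rows]
    return {
        "first_step": rows[0][0],
        "last_step": rows[-1][0],
        "mean_mel_error": mean(mel),
        "min_mel_error": min(mel),
    }

rows = parse_log(LOG)
print(summarize(rows))
```

Comparing such summaries across windows that are many steps apart shows whether the error is still falling. The sketch does not say what value counts as good enough; listening to generated samples is still needed for that.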

Answers

4

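I trained for a few hours and then tested the result: the robotic, metallic artifacts were even stronger than with the original model. I was training HiFi-GAN. It looks like it is better not to train it at all.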

2

If your dataset is only of mediocre quality, I would not recommend training.

4

Training this probably requires a professional dataset-processing pipeline.

2

I ran into a problem while training the HiFi-GAN vocoder. After this call:

spec = torch.stft(y, n_fft, hop_length=hop_size, win_length=win_size, window=hann_window[str(y.device)], center=center, pad_mode='reflect', normalized=False, onesided=True, return_complex=False)
# spec = torch.sqrt(spec.pow(2).sum(-1)+(1e-9))

the output spec is already [nan][nan][nan], so every loss computed afterwards is NaN as well.
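When `torch.stft` already returns NaN in the forward pass, one likely cause is that the input waveform `y` itself contains NaN or Inf values, for example from a corrupt audio file. The sketch below checks the input first and then computes the magnitude with `return_complex=True`, since `return_complex=False` is deprecated in recent PyTorch releases. The function name `magnitude_spectrogram`, the default sizes, and `center=True` are assumptions for this illustration and are not taken from the MockingBird code.

```python
import torch

def magnitude_spectrogram(y, n_fft=1024, hop_size=256, win_size=1024, eps=1e-9):
    """Magnitude STFT of a (batch, samples) waveform, with an input sanity check.

    Default sizes are illustrative assumptions, not MockingBird's settings.
    """
    # Fail early if the audio itself is broken; the STFT would only propagate it.
    if not torch.isfinite(y).all():
        raise ValueError("input waveform contains NaN or Inf")

    window = torch.hann_window(win_size, device=y.device)
    spec = torch.stft(
        y, n_fft,
        hop_length=hop_size, win_length=win_size, window=window,
        center=True,  # assumption; the original passes a `center` variable
        pad_mode="reflect", normalized=False, onesided=True,
        return_complex=True,
    )
    # eps keeps sqrt away from zero, where its gradient would be infinite.
    return torch.sqrt(spec.real.pow(2) + spec.imag.pow(2) + eps)

y = torch.randn(2, 8192)          # stand-in for a batch of audio
mag = magnitude_spectrogram(y)
print(mag.shape, torch.isfinite(mag).all().item())
```

If the check raises, the fix belongs in the data loading or preprocessing step rather than in the loss. If the input is clean and NaN still appears, `torch.autograd.set_detect_anomaly(True)` can help locate the operation that first produces it during the backward pass.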