[2noise/ChatTTS] Following the README produces only electrical buzzing noise

2024-08-19

commit id: e58fe48d2ee99310ce2066005c5108ac86942ad4

Steps to reproduce:

git clone https://github.com/2noise/ChatTTS
cd ChatTTS
conda create -n chattts
conda activate chattts
pip install -r requirements.txt
python examples/cmd/run.py "chat T T S is a text to speech model designed for dialogue applications."
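
One thing worth ruling out, offered as an assumption rather than a diagnosis: `conda create -n chattts` with no package list creates an environment that has no Python of its own, so `pip` and `python` may resolve to a different environment than the one just activated. The sketch below checks whether the running interpreter lives inside the active conda environment. The helper name `interpreter_in_active_env` is hypothetical and is not part of ChatTTS.

```python
import os
import sys


def interpreter_in_active_env(executable=None, conda_prefix=None):
    """Return True if the Python executable lives under the active conda env.

    Hypothetical helper for illustration; both arguments default to the
    current process (sys.executable and the CONDA_PREFIX variable).
    """
    if executable is None:
        executable = sys.executable
    if conda_prefix is None:
        conda_prefix = os.environ.get("CONDA_PREFIX", "")
    if not conda_prefix:
        # No conda environment is active (or none was supplied).
        return False
    exe = os.path.realpath(executable)
    prefix = os.path.realpath(conda_prefix)
    try:
        return os.path.commonpath([exe, prefix]) == prefix
    except ValueError:
        # Raised on Windows when the two paths are on different drives.
        return False


if __name__ == "__main__":
    print("interpreter:", sys.executable)
    print("CONDA_PREFIX:", os.environ.get("CONDA_PREFIX"))
    print("inside active env:", interpreter_in_active_env())
```

If this prints `False` after `conda activate chattts`, the requirements were probably installed into, and the script run from, some other Python installation.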

The generated output_audio_0.wav is attached: output_audio_0.zip

Answers


However, the following code works fine for me:

import ChatTTS
from IPython.display import Audio
import torch
import torchaudio

from dotenv import load_dotenv
load_dotenv()

chat = ChatTTS.Chat()
chat.load_models(compile=False) # Set to True for better performance

###################################
# Sample a speaker from Gaussian.

rand_spk = chat.sample_random_speaker()

params_infer_code = {
  'spk_emb': rand_spk, # add sampled speaker 
  'temperature': .3, # using custom temperature
  'top_P': 0.7, # top P decode
  'top_K': 20, # top K decode
}

inputs_en = """
chat T T S is a text to speech model designed for dialogue applications. 
[uv_break]it supports mixed language input [uv_break]and offers multi speaker 
capabilities with precise control over prosodic elements [laugh]like like 
[uv_break]laughter[laugh], [uv_break]pauses, [uv_break]and intonation. 
[uv_break]it delivers natural and expressive speech,[uv_break]so please
[uv_break] use the project responsibly at your own risk.[uv_break]
""".replace('\n', '') # English is still experimental.

params_refine_text = {
  'prompt': '[oral_2][laugh_0][break_4]'
} 
# audio_array_cn = chat.infer(inputs_cn, params_refine_text=params_refine_text)
audio_array_en = chat.infer(inputs_en, params_refine_text=params_refine_text)
torchaudio.save("output3.wav", torch.from_numpy(audio_array_en[0]), 24000)

Cannot reproduce. Please provide more details, such as your OS version, Python version, torch version, GPU model, and CUDA version.
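
A minimal sketch for gathering the details requested above in one go. The function name `collect_env_info` is hypothetical; the torch calls used (`torch.__version__`, `torch.version.cuda`, `torch.cuda.is_available()`, `torch.cuda.get_device_name()`) are standard PyTorch attributes, and the script falls back gracefully if torch is not importable.

```python
import platform
import sys


def collect_env_info():
    """Return a dict with OS, Python, torch, CUDA, and GPU details."""
    info = {
        "os": platform.platform(),
        "python": sys.version.split()[0],
    }
    try:
        import torch
    except ImportError:
        info["torch"] = "not installed"
        return info
    info["torch"] = torch.__version__
    # CUDA version torch was built against; None for CPU-only builds.
    info["cuda_build"] = torch.version.cuda
    info["cuda_available"] = torch.cuda.is_available()
    if info["cuda_available"]:
        info["gpu"] = torch.cuda.get_device_name(0)
    return info


if __name__ == "__main__":
    for key, value in collect_env_info().items():
        print(f"{key}: {value}")
```

Run it inside the same environment used for `examples/cmd/run.py` and paste the output into the thread.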