[babysor/MockingBird] How to stop re-uploading and re-preprocessing the dataset (and blowing up the disk) every time you train on Colab

2024-07-15

Many of you, stuck with weak hardware at home and poor training results, end up training on Colab, the world's biggest gathering place for freeloading model trainers. But for those who can't expand their Google Drive or upgrade Colab, uploading the dataset is pure hell: the connection is slow, the space isn't enough, and every time the runtime resets you have to upload and preprocess everything all over again. It took me 9 days to solve this problem, and here is my solution.

First, register an account on Kaggle and get an API token. I've already uploaded the preprocessed dataset (aidatatang_200zh) there, but downloading it requires a token, and the token requires an account. Please look up how to get the token yourself; I won't go into detail here.

Then open Colab, go to Edit -> Notebook settings, change the hardware accelerator from None to GPU, and run the following code:

!pip install kaggle
import json

# Fill in your own Kaggle username and the API token you obtained earlier
token = {"username": "your-username", "key": "your-api-token"}
with open('/content/kaggle.json', 'w') as file:
    json.dump(token, file)

# Put the credentials where the kaggle CLI expects them
!mkdir -p ~/.kaggle
!cp /content/kaggle.json ~/.kaggle/
!chmod 600 ~/.kaggle/kaggle.json

# Make the CLI download datasets under /content
!kaggle config set -n path -v /content

Fill in the token dict with your own username and the token you obtained earlier. This step sets up the kaggle command line.
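Optionally, you can check that the credentials work before downloading anything; kaggle datasets list with the -s search flag is a standard kaggle CLI command:

!kaggle datasets list -s sv2tts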

Then download the datasets and unzip them:

!kaggle datasets download -d bjorndido/sv2ttspart1
!unzip "/content/datasets/bjorndido/sv2ttspart1/sv2ttspart1.zip" -d "/content/aidatatang_200zh"
# Delete the zip right away to free disk space
!rm -rf /content/datasets
!kaggle datasets download -d bjorndido/sv2ttspart2
!unzip "/content/datasets/bjorndido/sv2ttspart2/sv2ttspart2.zip" -d "/content/aidatatang_200zh"
!rm -rf /content/datasets

Since some of you are on the free tier like me, and the disk would blow up if you started from the raw, unprocessed dataset, I uploaded the preprocessed dataset to Kaggle instead. The zips get deleted right after unpacking, which is very considerate. In my tests the download speed reaches 200 MB/s, and even on a slow connection you still get 50 MB/s, so this whole step takes less than 10 minutes.
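If you're worried about space, you can check how much is left at any point with ordinary shell commands:

!df -h /content                      # free disk space on the Colab VM
!du -sh /content/aidatatang_200zh    # size of the extracted dataset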

!git clone https://github.com/babysor/MockingBird.git
!pip install -r /content/MockingBird/requirements.txt
Clone the repo and install the dependencies; nothing much to say about this step.

Then modify the hparams (this cell overwrites synthesizer/hparams.py):

%%writefile /content/MockingBird/synthesizer/hparams.py
import ast
import pprint
import json

class HParams(object):
    def __init__(self, **kwargs): self.__dict__.update(kwargs)
    def __setitem__(self, key, value): setattr(self, key, value)
    def __getitem__(self, key): return getattr(self, key)
    def __repr__(self): return pprint.pformat(self.__dict__)

    def parse(self, string):
        # Overrides hparams from a comma-separated string of name=value pairs
        if len(string) > 0:
            overrides = [s.split("=") for s in string.split(",")]
            keys, values = zip(*overrides)
            keys = list(map(str.strip, keys))
            values = list(map(str.strip, values))
            for k in keys:
                self.__dict__[k] = ast.literal_eval(values[keys.index(k)])
        return self

    def loadJson(self, dict):
        print("\nLoading the json with %s\n" % dict)
        for k in dict.keys():
            if k not in ["tts_schedule", "tts_finetune_layers"]:
                self.__dict__[k] = dict[k]
        return self

    def dumpJson(self, fp):
        print("\nSaving the json to %s\n" % fp)
        with fp.open("w", encoding="utf-8") as f:
            json.dump(self.__dict__, f)
        return self

hparams = HParams(
        ### Signal Processing (used in both synthesizer and vocoder)
        sample_rate = 16000,
        n_fft = 800,
        num_mels = 80,
        hop_size = 200,                             # Tacotron uses 12.5 ms frame shift (set to sample_rate * 0.0125)
        win_size = 800,                             # Tacotron uses 50 ms frame length (set to sample_rate * 0.050)
        fmin = 55,
        min_level_db = -100,
        ref_level_db = 20,
        max_abs_value = 4.,                         # Gradient explodes if too big, premature convergence if too small.
        preemphasis = 0.97,                         # Filter coefficient to use if preemphasize is True
        preemphasize = True,

        ### Tacotron Text-to-Speech (TTS)
        tts_embed_dims = 512,                       # Embedding dimension for the graphemes/phoneme inputs
        tts_encoder_dims = 256,
        tts_decoder_dims = 128,
        tts_postnet_dims = 512,
        tts_encoder_K = 5,
        tts_lstm_dims = 1024,
        tts_postnet_K = 5,
        tts_num_highways = 4,
        tts_dropout = 0.5,
        tts_cleaner_names = ["basic_cleaners"],
        tts_stop_threshold = -3.4,                  # Value below which audio generation ends.
                                                    # For example, for a range of [-4, 4], this
                                                    # will terminate the sequence at the first
                                                    # frame that has all values < -3.4

        ### Tacotron Training
        tts_schedule = [(2,  1e-3,  10_000,  32),   # Progressive training schedule
                        (2,  5e-4,  15_000,  32),   # (r, lr, step, batch_size)
                        (2,  2e-4,  20_000,  32),   #
                        (2,  1e-4,  30_000,  32),   #
                        (2,  5e-5,  40_000,  32),   #
                        (2,  1e-5,  60_000,  32),   #
                        (2,  5e-6, 160_000,  32),   # r = reduction factor (# of mel frames
                        (2,  3e-6, 320_000,  32),   #     synthesized for each decoder iteration)
                        (2,  1e-6, 640_000,  32)],  # lr = learning rate

        tts_clip_grad_norm = 1.0,                   # clips the gradient norm to prevent explosion - set to None if not needed
        tts_eval_interval = 500,                    # Number of steps between model evaluation (sample generation)
                                                    # Set to -1 to generate after completing epoch, or 0 to disable
        tts_eval_num_samples = 1,                   # Makes this number of samples

        ## For finetune usage, if set, only selected layers will be trained, available: encoder,encoder_proj,gst,decoder,postnet,post_proj
        tts_finetune_layers = [], 

        ### Data Preprocessing
        max_mel_frames = 900,
        rescale = True,
        rescaling_max = 0.9,
        synthesis_batch_size = 16,                  # For vocoder preprocessing and inference.

        ### Mel Visualization and Griffin-Lim
        signal_normalization = True,
        power = 1.5,
        griffin_lim_iters = 60,

        ### Audio processing options
        fmax = 7600,                                # Should not exceed (sample_rate // 2)
        allow_clipping_in_normalization = True,     # Used when signal_normalization = True
        clip_mels_length = True,                    # If true, discards samples exceeding max_mel_frames
        use_lws = False,                            # "Fast spectrogram phase recovery using local weighted sums"
        symmetric_mels = True,                      # Sets mel range to [-max_abs_value, max_abs_value] if True,
                                                    #               and [0, max_abs_value] if False
        trim_silence = True,                        # Use with sample_rate of 16000 for best results

        ### SV2TTS
        speaker_embedding_size = 256,               # Dimension for the speaker embedding
        silence_min_duration_split = 0.4,           # Duration in seconds of a silence for an utterance to be split
        utterance_min_duration = 1.6,               # Duration in seconds below which utterances are discarded
        use_gst = True,                             # Whether to use global style token    
        use_ser_for_gst = True,                     # Whether to use speaker embedding referenced for global style token  
        )

I used a batch size of 32; feel free to change it to suit your setup.
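For reference, the batch size is the fourth field of each tts_schedule tuple. A minimal sketch of bumping it across the whole schedule (new_bs is a value you pick yourself, and the actual edit has to go into the %%writefile cell above, since synthesizer_train.py reads hparams.py from disk):

# Only the first three schedule rows are shown here as an example
new_bs = 40
old_schedule = [(2, 1e-3, 10_000, 32),
                (2, 5e-4, 15_000, 32),
                (2, 2e-4, 20_000, 32)]
new_schedule = [(r, lr, step, new_bs) for (r, lr, step, _) in old_schedule]
print(new_schedule)  # [(2, 0.001, 10000, 40), (2, 0.0005, 15000, 40), (2, 0.0002, 20000, 40)]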

Start training:

%cd "/content/MockingBird/"
!python synthesizer_train.py train "/content/aidatatang_200zh" -m /content/drive/MyDrive/

Note: mount your Google Drive before this step, or change the path after -m if you'd rather not mount it. I chose Drive because the next session can pick up the saved checkpoints and continue training. Then it's happy freeloading time. Paying users can run !nvidia-smi to see what GPU they got; on the free tier it's always a Tesla T4 with 16 GB of VRAM. In my run the attention alignment started to appear around step 9k, with a loss of 0.45.

Warning: on the free tier, Colab disconnects if you leave the machine idle for too long, and when you reopen the environment it resets to a clean slate. This is where saving to Drive pays off: you don't have to worry about the model getting wiped by a reset.
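For reference, mounting Drive is the standard Colab call; run it in a cell before the training command:

from google.colab import drive
drive.mount('/content/drive')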

This is my first write-up, so please forgive any rough edges. I hope this tutorial can help you.

Replies

Very helpful, thumbs up!

The free Colab tier keeps disconnecting in the browser, it's kind of annoying...

Is the training better than on a 3060? Also: what batch size are you using?

I'm running 40 now. Big VRAM is great, haha.

> I'm running 40 now. Big VRAM is great, haha.

If you push it, you can go as high as 60; I always use 50.

Featuring this post.

Wow, I was only sharing my solution and didn't expect it to get featured. I hope it can help more people.

Colab keeps running out of GPU memory. How should I configure PyTorch's memory allocation to make it behave better?

CUDA out of memory. Tried to allocate 3.51 GiB (GPU 0; 14.75 GiB total capacity; 10.81 GiB already allocated; 1.65 GiB free; 11.99 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
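For anyone hitting the same error: the message's own suggestion can be tried by setting the allocator config before anything touches CUDA; 128 here is an arbitrary example value, not a recommendation:

import os
# Must be set before the first CUDA allocation (i.e. before torch initializes the GPU)
os.environ['PYTORCH_CUDA_ALLOC_CONF'] = 'max_split_size_mb:128'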

Could your batch size be too large? You might also have been unlucky and been assigned a GPU with less memory. I haven't logged into Colab for a while, so I don't know whether they've changed how GPUs are allocated.

Thanks, adjusting the batch size fixed it, but Colab keeps disconnecting.