[PaddlePaddle/PaddleOCR] Training ch_PP-OCRv4_rec_distill.yml fails with several errors: 1. KeyError: 'NRTRLabelDecode'; 2. KeyError: 'valid_ratio'; 3. The input of Op(Conv) should be a 4-D or 5-D Tensor. But received: input's dimension is 3, input's shape is [128, 240, 256]

Please provide the following information to quickly locate the problem

System Environment: docker ubuntu 20.04 (the officially provided image)
Version: Paddle: 2.5.1, PaddleOCR: release/2.7 or dygraph
Related components: train.py
Command Code: python tools/train.py -c configs/rec/PP-OCRv4/ch_PP-OCRv4_rec_distill.yml
Complete Error Message:

Traceback (most recent call last):
  File "/workspace3/xlg/paddle-ocr/tools/train.py", line 227, in <module>
    main(config, device, logger, vdl_writer)
  File "/workspace3/xlg/paddle-ocr/tools/train.py", line 135, in main
    model = build_model(config['Architecture'])
  File "/workspace3/xlg/paddle-ocr/ppocr/modeling/architectures/__init__.py", line 34, in build_model
    arch = getattr(mod, name)(config)
  File "/workspace3/xlg/paddle-ocr/ppocr/modeling/architectures/distillation_model.py", line 47, in __init__
    model = BaseModel(model_config)
  File "/workspace3/xlg/paddle-ocr/ppocr/modeling/architectures/base_model.py", line 76, in __init__
    self.head = build_head(config["Head"])
  File "/workspace3/xlg/paddle-ocr/ppocr/modeling/heads/__init__.py", line 71, in build_head
    module_class = eval(module_name)(**config)
  File "/workspace3/xlg/paddle-ocr/ppocr/modeling/heads/rec_multi_head.py", line 74, in __init__
    out_channels=out_channels_list['NRTRLabelDecode'])
KeyError: 'NRTRLabelDecode'
LAUNCH INFO 2023-08-22 13:38:33,426 Exit code -9
[2023-08-22 13:38:33,426] [ INFO] controller.py:149 - Exit code -9

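The problem appears to be in rec_multi_head.py (line 74 in the traceback above), where out_channels_list['NRTRLabelDecode'] is looked up.
The yml file is also missing an image_shape setting; without it, training fails with KeyError: 'valid_ratio'.
Even after working around both, training still fails in the end: ValueError: (InvalidArgument) The input of Op(Conv) should be a 4-D or 5-D Tensor. But received: input's dimension is 3, input's shape is [128, 240, 256].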

[2023-08-22 15:14:38,216] [ INFO] controller.py:117 - ------------------------- ERROR LOG DETAIL -------------------------
ls/train.py", line 227, in <module>
    main(config, device, logger, vdl_writer)
  File "/workspace3/xlg/paddle-ocr/tools/train.py", line 198, in main
    program.train(config, train_dataloader, valid_dataloader, device, model,
  File "/workspace3/xlg/paddle-ocr/tools/program.py", line 301, in train
    preds = model(images, data=batch[1:])
  File "/usr/local/lib/python3.10/dist-packages/paddle/nn/layer/layers.py", line 1254, in __call__
    return self.forward(*inputs, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/paddle/distributed/parallel.py", line 531, in forward
    outputs = self._layers(*inputs, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/paddle/nn/layer/layers.py", line 1254, in __call__
    return self.forward(*inputs, **kwargs)
  File "/workspace3/xlg/paddle-ocr/ppocr/modeling/architectures/distillation_model.py", line 59, in forward
    result_dict[model_name] = self.model_list[idx](x, data)
  File "/usr/local/lib/python3.10/dist-packages/paddle/nn/layer/layers.py", line 1254, in __call__
    return self.forward(*inputs, **kwargs)
  File "/workspace3/xlg/paddle-ocr/ppocr/modeling/architectures/base_model.py", line 100, in forward
    x = self.head(x, targets=data)
  File "/usr/local/lib/python3.10/dist-packages/paddle/nn/layer/layers.py", line 1254, in __call__
    return self.forward(*inputs, **kwargs)
  File "/workspace3/xlg/paddle-ocr/ppocr/modeling/heads/rec_multi_head.py", line 92, in forward
    ctc_encoder = self.ctc_encoder(x)
  File "/usr/local/lib/python3.10/dist-packages/paddle/nn/layer/layers.py", line 1254, in __call__
    return self.forward(*inputs, **kwargs)
  File "/workspace3/xlg/paddle-ocr/ppocr/modeling/necks/rnn.py", line 261, in forward
    x = self.encoder(x)
  File "/usr/local/lib/python3.10/dist-packages/paddle/nn/layer/layers.py", line 1254, in __call__
    return self.forward(*inputs, **kwargs)
  File "/workspace3/xlg/paddle-ocr/ppocr/modeling/necks/rnn.py", line 208, in forward
    z = self.conv1(z)
  File "/usr/local/lib/python3.10/dist-packages/paddle/nn/layer/layers.py", line 1254, in __call__
    return self.forward(*inputs, **kwargs)
  File "/workspace3/xlg/paddle-ocr/ppocr/modeling/backbones/rec_svtrnet.py", line 68, in forward
    out = self.conv(inputs)
  File "/usr/local/lib/python3.10/dist-packages/paddle/nn/layer/layers.py", line 1254, in __call__
    return self.forward(*inputs, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/paddle/nn/layer/conv.py", line 710, in forward
    out = F.conv._conv_nd(
  File "/usr/local/lib/python3.10/dist-packages/paddle/nn/functional/conv.py", line 133, in _conv_nd
    pre_bias = _C_ops.conv2d(
ValueError: (InvalidArgument) The input of Op(Conv) should be a 4-D or 5-D Tensor. But received: input's dimension is 3, input's shape is [128, 240, 256].
  [Hint: Expected in_dims.size() == 4 || in_dims.size() == 5 == true, but received in_dims.size() == 4 || in_dims.size() == 5:0 != true:1.] (at ../paddle/phi/infermeta/binary.cc:468)

xlg-go

For out_channels_list['NRTRLabelDecode'], I used out_channels_list['NRTRLabelDecode'] = out_channels_list['CTCLabelDecode'] + 3; it looks like a fixed offset from the CTC value.
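A minimal sketch of that workaround, written as a standalone helper rather than as a patch to PaddleOCR. The function name `fill_nrtr_out_channels`, the `extra_tokens` parameter, and the channel counts are hypothetical and only illustrate the idea; the `+ 3` offset is the commenter's guess, not a documented rule.

```python
def fill_nrtr_out_channels(out_channels_list, extra_tokens=3):
    """Return a copy with 'NRTRLabelDecode' derived from 'CTCLabelDecode' if it is missing.

    extra_tokens=3 follows the offset suggested in the thread (an assumption).
    """
    patched = dict(out_channels_list)  # copy so the caller's dict is not modified
    if "NRTRLabelDecode" not in patched:
        patched["NRTRLabelDecode"] = patched["CTCLabelDecode"] + extra_tokens
    return patched


# Illustrative channel counts only; real values depend on the character dictionary.
channels = {"CTCLabelDecode": 6625, "SARLabelDecode": 6627}
patched = fill_nrtr_out_channels(channels)
print(patched["NRTRLabelDecode"])  # 6628
```

An existing 'NRTRLabelDecode' entry is left untouched, so the helper is safe to apply even when the config already provides one.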

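Without RecResizeImg it indeed does not run; otherwise you get an error that the batch cannot be collated because the images have different sizes. According to the docs, v4 trains with inputs at multiple scales, so I suspect this config was not adapted properly. A new multi-scale sampler was also added.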
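As a rough sketch of the config change, the transform list from the yml can be modeled as a list of single-key dicts, and a RecResizeImg step added before KeepKeys when none is present. The helper name `ensure_rec_resize`, the sample transform list, and the `image_shape` value of [3, 48, 320] are assumptions for illustration; check them against your own yml.

```python
import copy


def ensure_rec_resize(transforms, image_shape=(3, 48, 320)):
    """Insert a RecResizeImg step before KeepKeys if the list has none.

    image_shape is an assumed value; use whatever your config requires.
    """
    patched = copy.deepcopy(transforms)  # leave the original list untouched
    if any("RecResizeImg" in op for op in patched):
        return patched
    resize_op = {"RecResizeImg": {"image_shape": list(image_shape)}}
    for i, op in enumerate(patched):
        if "KeepKeys" in op:
            patched.insert(i, resize_op)
            break
    else:
        # No KeepKeys step found: append at the end instead.
        patched.append(resize_op)
    return patched


# Hypothetical transform list shaped like the yml's `transforms` section.
transforms = [
    {"DecodeImage": {"img_mode": "BGR", "channel_first": False}},
    {"MultiLabelEncode": None},
    {"KeepKeys": {"keep_keys": ["image", "label_ctc", "label_sar", "length", "valid_ratio"]}},
]
patched = ensure_rec_resize(transforms)
print([next(iter(op)) for op in patched])
```

The same edit can of course be made by hand in the yml; the point is only that the resize step has to come before KeepKeys asks for 'valid_ratio'.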

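The last error looks like a problem with the SVTR teacher model. Normally the CTC head of an SVTR model does not need an extra SVTR module; only the LCNet version does. My guess is that the SVTR part of the CTC head has to be changed. However, the teacher model weights were not released, so even to get training running you would first have to train a teacher model yourself to provide supervision. In the yml, the teacher model's gradients are not updated.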

wangz315

The maintainers should really take some time to fix this; a lot of people have run into the same problem.

dengmingD

Replying to wangz315's comment above (the NRTRLabelDecode workaround, RecResizeImg, and the SVTR teacher model):

Just change that index from [-1] to [-2]; that part is not really a problem. The conv2d error is the hard one to deal with. Let's wait for the maintainers.

xlg-go

Replying to dengmingD's comment above:

Yes, a lot of people have hit this. Waiting for the maintainers.

xlg-go

ValueError: (InvalidArgument) The input of Op(Conv) should be a 4-D or 5-D Tensor. But received: input's dimension is 3, input's shape is [128, 240, 256]. [Hint: Expected in_dims.size() == 4 || in_dims.size() == 5 == true, but received in_dims.size() == 4 || in_dims.size() == 5:0 != true:1.] (at ../paddle/phi/infermeta/binary.cc:475)

I ran into the same problem. Has anyone solved it?

crjxixixi

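Answer from Baidu ERNIE Bot (文心一言): Based on the error message you provided, the problem occurs when using a PaddlePaddle convolution layer (conv layer): the input tensor has the wrong number of dimensions.

According to the error message, the convolution expects a 4-D or 5-D input, but the actual input is 3-D. This means the input shape may not have been set correctly when the model was built, or the input has the wrong number of dimensions.

According to the stack trace, the problem may be at line 710 of "D:\work\apps\Anaconda3\envs\paddle\lib\site-packages\paddle\nn\layer\conv.py". This indicates the failure occurs in the forward method of the convolution layer.

To solve this problem, you can try the following steps:

1. Check the shape and dimensions of the input tensor. Make sure the input tensor has the shape and dimensions the convolution layer requires. Typically, the input to a convolution layer should be (batch size, channels, height, width), i.e. (N, C, H, W).
2. Check the model structure. Check that the convolution layers in your model are configured correctly, and make sure each convolution layer's input dimensions match the previous layer or the model input.
3. Check the data preprocessing. If the input data is changed during preprocessing, for example by scaling or cropping, make sure these operations do not change the number of dimensions of the data.
4. Update PaddlePaddle. According to the error message, you are using an older version of PaddlePaddle. If possible, try updating to the latest version to pick up fixes for possible bugs.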

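If the steps above do not solve the problem, you can provide more of your code and model structure so the problem can be analyzed in more detail.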
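The first step in that list (checking the input shape) can be sketched as a small guard that works on plain shape tuples. The helper name `require_nchw` is hypothetical and not part of Paddle or PaddleOCR; the shape is the one reported in the error above.

```python
def require_nchw(shape, name="input"):
    """Raise a readable error unless `shape` describes a 4-D (N, C, H, W) tensor."""
    shape = tuple(shape)
    if len(shape) != 4:
        raise ValueError(
            f"{name} must be 4-D (N, C, H, W), got {len(shape)}-D with shape {list(shape)}"
        )
    return shape


# The shape from the error message is 3-D, so the guard rejects it.
try:
    require_nchw((128, 240, 256), name="conv1 input")
    message = ""
except ValueError as err:
    message = str(err)
print(message)
```

Placing such a check just before the failing conv call (for example on `tuple(z.shape)`) only confirms where the 3-D tensor first appears; it does not by itself fix the mismatch between the backbone output and the head.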

dengmingD

Right, it just paraphrased the error message!!!!

xlg-go

The input of Op(Conv) should be a 4-D or 5-D Tensor

Has this been solved?

0-Maxwei-0

Replying to 0-Maxwei-0's question above:

I have only fixed some of the bugs; I have not had time to fix the rest yet!

xlg-go

Just leave the neck unchanged; for SVTR it should still use reshape.
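To illustrate what a reshape-style neck does to tensor shapes, here is a shape-only sketch. It assumes, as my reading of the comment and not something confirmed in this thread, that the neck flattens a (N, C, 1, W) feature map into a (N, W, C) sequence; the helper name `im2seq_shape` and the example sizes are hypothetical.

```python
def im2seq_shape(shape):
    """Shape after flattening a (N, C, 1, W) feature map into a (N, W, C) sequence."""
    n, c, h, w = shape
    if h != 1:
        raise ValueError(f"expected height 1 before flattening, got {h}")
    return (n, w, c)


# Illustrative sizes only.
seq_shape = im2seq_shape((128, 256, 1, 80))
print(seq_shape)  # (128, 80, 256)
```

The result is 3-D, which suits a sequence head but not a conv2d layer; this matches the symptom in the traceback, where a 3-D tensor reaches `self.conv1`.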

tohsakask

@xlg-go were you able to fine-tune the V4 rec model? Can you share the steps?

asif-ca

@asif-ca Not yet; I haven't had time.

xlg-go

Hello, did you manage to get fine-tuning of the v4 rec model working? Could you share the specific steps?

iPengXPro

I ran into the same problem as well.

wakejay