[PaddlePaddle/Paddle] Inference after converting a Paddle dynamic graph to a static graph: scatter.cu.h assertion fails after conversion

2024-03-22 996 views
3
Please ask your question

Error message:

Error: /paddle/paddle/phi/kernels/funcs/scatter.cu.h:101 Assertion index_value >= 0 && index_value < output_dims[j] failed. The index is out of bounds, please check whether the dimensions of index and input meet the requirements. It should be less than [1] and greater or equal to 0, but received [0]
Error: /paddle/paddle/phi/kernels/funcs/scatter.cu.h:101 Assertion index_value >= 0 && index_value < output_dims[j] failed. The index is out of bounds, please check whether the dimensions of index and input meet the requirements. It should be less than [21] and greater or equal to 0, but received [0]

Code that triggers the error:

paddle.scatter_nd(
        index=sparse_tensor_indices,
        updates=point_indices + 1,
        shape=output_shape) - 1

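The error says "It should be less than [1] and greater or equal to 0, but received [0]". The received value is 0, so why does it fail the condition? If I replace the failing code with a manually constructed Tensor that has the same shape as the paddle.scatter_nd result, the error does not occur.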

Environment: paddlepaddle-gpu=2.4.1.post112, cuda=11.1, python3.8, cudnn=8.0.5

Answers

6

Could you provide your Paddle version, or try the latest Paddle dev build?

6

paddlepaddle-gpu=2.4.1.post112

5

@LokeZhou, with the Paddle dev build I get a new error. The output is as follows:

2023-05-09 16:49:37,945 -  WARNING - No custom op iou3d_nms_cuda found, try JIT build
Compiling user custom op, it will cost a few seconds.....
2023-05-09 16:49:39,816 - INFO - using custom operator only
2023-05-09 16:49:39,821 -     INFO - iou3d_nms_cuda builded success!
2023-05-09 16:49:40,324 -  WARNING - No custom op voxelize found, try JIT build
Compiling user custom op, it will cost a few seconds.....
W0509 16:49:42.137914 11248 custom_operator.cc:1210] Operator (nms_normal_gpu) has been registered.
W0509 16:49:42.137989 11248 custom_operator.cc:1210] Operator (nms_gpu) has been registered.
W0509 16:49:42.138005 11248 custom_operator.cc:1210] Operator (boxes_overlap_bev_gpu) has been registered.
W0509 16:49:42.138546 11248 custom_operator.cc:1210] Operator (boxes_iou_bev_gpu) has been registered.
W0509 16:49:42.138568 11248 custom_operator.cc:1210] Operator (boxes_iou_bev_cpu) has been registered.
2023-05-09 16:49:42,162 - INFO - using custom operator only
2023-05-09 16:49:42,165 -     INFO - voxelize builded success!
2023-05-09 16:49:42,166 -  WARNING - No custom op pointnet2_ops found, try JIT build
Compiling user custom op, it will cost a few seconds.....
W0509 16:49:44.024470 11248 custom_operator.cc:1210] Operator (nms_normal_gpu) has been registered.
W0509 16:49:44.024618 11248 custom_operator.cc:1210] Operator (nms_gpu) has been registered.
W0509 16:49:44.024639 11248 custom_operator.cc:1210] Operator (boxes_overlap_bev_gpu) has been registered.
W0509 16:49:44.024652 11248 custom_operator.cc:1210] Operator (hard_voxelize) has been registered.
W0509 16:49:44.024664 11248 custom_operator.cc:1210] Operator (boxes_iou_bev_gpu) has been registered.
W0509 16:49:44.024675 11248 custom_operator.cc:1210] Operator (boxes_iou_bev_cpu) has been registered.
2023-05-09 16:49:44,047 - INFO - using custom operator only
2023-05-09 16:49:44,052 -     INFO - pointnet2_ops builded success!
I0509 16:49:44.054957 11248 helper.cc:56] The operator `farthest_point_sample` has been registered. Therefore, we will not repeat the registration here.
I0509 16:49:44.054980 11248 helper.cc:56] The operator `grouping_operation_stack` has been registered. Therefore, we will not repeat the registration here.
I0509 16:49:44.054986 11248 helper.cc:56] The operator `voxel_query_wrapper` has been registered. Therefore, we will not repeat the registration here.
I0509 16:49:44.054992 11248 helper.cc:56] The operator `grouping_operation_batch` has been registered. Therefore, we will not repeat the registration here.
I0509 16:49:44.054998 11248 helper.cc:56] The operator `ball_query_batch` has been registered. Therefore, we will not repeat the registration here.
I0509 16:49:44.055004 11248 helper.cc:56] The operator `gather_operation` has been registered. Therefore, we will not repeat the registration here.
I0509 16:49:44.055009 11248 helper.cc:56] The operator `nms_normal_gpu` has been registered. Therefore, we will not repeat the registration here.
I0509 16:49:44.055016 11248 helper.cc:56] The operator `ball_query_stack` has been registered. Therefore, we will not repeat the registration here.
I0509 16:49:44.055022 11248 helper.cc:56] The operator `nms_gpu` has been registered. Therefore, we will not repeat the registration here.
I0509 16:49:44.055027 11248 helper.cc:56] The operator `boxes_overlap_bev_gpu` has been registered. Therefore, we will not repeat the registration here.
I0509 16:49:44.055033 11248 helper.cc:56] The operator `hard_voxelize` has been registered. Therefore, we will not repeat the registration here.
I0509 16:49:44.055039 11248 helper.cc:56] The operator `boxes_iou_bev_gpu` has been registered. Therefore, we will not repeat the registration here.
I0509 16:49:44.055045 11248 helper.cc:56] The operator `boxes_iou_bev_cpu` has been registered. Therefore, we will not repeat the registration here.
--- Running analysis [ir_graph_build_pass]
I0509 16:49:48.664840 11248 executor.cc:186] Old Executor is Running.
--- Running analysis [ir_analysis_pass]
--- Running IR pass [map_op_to_another_pass]
--- Running IR pass [identity_scale_op_clean_pass]
I0509 16:49:48.869513 11248 fuse_pass_base.cc:59] ---  detected 11 subgraphs
--- Running IR pass [is_test_pass]
--- Running IR pass [simplify_with_basic_ops_pass]
--- Running IR pass [delete_quant_dequant_linear_op_pass]
--- Running IR pass [delete_weight_dequant_linear_op_pass]
--- Running IR pass [constant_folding_pass]
--- Running IR pass [silu_fuse_pass]
--- Running IR pass [conv_bn_fuse_pass]
I0509 16:49:50.090797 11248 fuse_pass_base.cc:59] ---  detected 14 subgraphs
--- Running IR pass [conv_eltwiseadd_bn_fuse_pass]
--- Running IR pass [embedding_eltwise_layernorm_fuse_pass]
--- Running IR pass [multihead_matmul_fuse_pass_v2]
--- Running IR pass [vit_attention_fuse_pass]
--- Running IR pass [fused_multi_transformer_encoder_pass]
--- Running IR pass [fused_multi_transformer_decoder_pass]
--- Running IR pass [fused_multi_transformer_encoder_fuse_qkv_pass]
--- Running IR pass [fused_multi_transformer_decoder_fuse_qkv_pass]
--- Running IR pass [multi_devices_fused_multi_transformer_encoder_pass]
--- Running IR pass [multi_devices_fused_multi_transformer_encoder_fuse_qkv_pass]
--- Running IR pass [multi_devices_fused_multi_transformer_decoder_fuse_qkv_pass]
--- Running IR pass [fuse_multi_transformer_layer_pass]
--- Running IR pass [gpu_cpu_squeeze2_matmul_fuse_pass]
--- Running IR pass [gpu_cpu_reshape2_matmul_fuse_pass]
--- Running IR pass [gpu_cpu_flatten2_matmul_fuse_pass]
--- Running IR pass [gpu_cpu_map_matmul_v2_to_mul_pass]
I0509 16:49:53.852699 11248 fuse_pass_base.cc:59] ---  detected 10 subgraphs
--- Running IR pass [gpu_cpu_map_matmul_v2_to_matmul_pass]
I0509 16:49:53.861145 11248 fuse_pass_base.cc:59] ---  detected 5 subgraphs
--- Running IR pass [matmul_scale_fuse_pass]
--- Running IR pass [multihead_matmul_fuse_pass_v3]
--- Running IR pass [gpu_cpu_map_matmul_to_mul_pass]
I0509 16:49:53.998447 11248 fuse_pass_base.cc:59] ---  detected 1 subgraphs
--- Running IR pass [fc_fuse_pass]
I0509 16:49:54.034281 11248 fuse_pass_base.cc:59] ---  detected 2 subgraphs
--- Running IR pass [fc_elementwise_layernorm_fuse_pass]
--- Running IR pass [conv_elementwise_add_act_fuse_pass]
--- Running IR pass [conv_elementwise_add2_act_fuse_pass]
--- Running IR pass [conv_elementwise_add_fuse_pass]
W0509 16:49:54.193147 11248 op_compat_sensible_pass.cc:232]  Check the Attr(axis) of Op(elementwise_add) in pass(conv_elementwise_add_fuse_pass) failed!
W0509 16:49:54.193181 11248 conv_elementwise_add_fuse_pass.cc:94] Pass in op compat failed.
W0509 16:49:54.193186 11248 op_compat_sensible_pass.cc:232]  Check the Attr(axis) of Op(elementwise_add) in pass(conv_elementwise_add_fuse_pass) failed!
W0509 16:49:54.193192 11248 conv_elementwise_add_fuse_pass.cc:94] Pass in op compat failed.
W0509 16:49:54.193197 11248 op_compat_sensible_pass.cc:232]  Check the Attr(axis) of Op(elementwise_add) in pass(conv_elementwise_add_fuse_pass) failed!
W0509 16:49:54.193202 11248 conv_elementwise_add_fuse_pass.cc:94] Pass in op compat failed.
--- Running IR pass [transpose_flatten_concat_fuse_pass]
--- Running IR pass [conv2d_fusion_layout_transfer_pass]
--- Running IR pass [auto_mixed_precision_pass]
--- Running IR pass [delete_cast_op_pass]
free(): double free detected in tcache 2

--------------------------------------
C++ Traceback (most recent call last):
--------------------------------------
0   paddle_infer::Predictor::Predictor(paddle::AnalysisConfig const&)
1   std::unique_ptr<paddle::PaddlePredictor, std::default_delete<paddle::PaddlePredictor> > paddle::CreatePaddlePredictor<paddle::AnalysisConfig, (paddle::PaddleEngineKind)2>(paddle::AnalysisConfig const&)
2   paddle::AnalysisPredictor::Init(std::shared_ptr<paddle::framework::Scope> const&, std::shared_ptr<paddle::framework::ProgramDesc> const&)
3   paddle::AnalysisPredictor::PrepareProgram(std::shared_ptr<paddle::framework::ProgramDesc> const&)
4   paddle::AnalysisPredictor::OptimizeInferenceProgram()
5   paddle::inference::analysis::Analyzer::RunAnalysis(paddle::inference::analysis::Argument*)
6   paddle::inference::analysis::IrAnalysisPass::RunImpl(paddle::inference::analysis::Argument*)
7   paddle::inference::analysis::IRPassManager::Apply(std::unique_ptr<paddle::framework::ir::Graph, std::default_delete<paddle::framework::ir::Graph> >)
8   paddle::framework::ir::Pass::Apply(paddle::framework::ir::Graph*) const
9   paddle::framework::ir::DeleteCastOpPass::ApplyImpl(paddle::framework::ir::Graph*) const
10  paddle::framework::ir::DeleteCastOpPass::ApplyCastPass(paddle::framework::ir::Graph*) const
11  paddle::framework::ir::GraphPatternDetector::operator()(paddle::framework::ir::Graph*, std::function<void (std::map<paddle::framework::ir::PDNode*, paddle::framework::ir::Node*, paddle::framework::ir::GraphPatternDetector::PDNodeCompare, std::allocator<std::pair<paddle::framework::ir::PDNode* const, paddle::framework::ir::Node*> > > const&, paddle::framework::ir::Graph*)>)
12  void std::vector<paddle::framework::ir::Node*, std::allocator<paddle::framework::ir::Node*> >::_M_realloc_insert<paddle::framework::ir::Node* const&>(__gnu_cxx::__normal_iterator<paddle::framework::ir::Node**, std::vector<paddle::framework::ir::Node*, std::allocator<paddle::framework::ir::Node*> > >, paddle::framework::ir::Node* const&)

----------------------
Error Message Summary:
----------------------
FatalError: `Process abort signal` is detected by the operating system.
  [TimeInfo: *** Aborted at 1683622194 (unix time) try "date -d @1683622194" if you are using GNU date ***]
  [SignalInfo: *** SIGABRT (@0x3e900002bf0) received by PID 11248 (TID 0x7f16fa20b080) from PID 11248 ***]

Aborted (core dumped)
2

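@wayyeah Hello, could you check the following:

  • Whether inference on CPU also fails, to rule out a problem in the CUDA kernel of the relevant op
  • If the problem also occurs on CPU, try loading the model and running prediction with the paddle.static.load_inference_model API, to rule out a problem in the inference engine (Predictor)

I checked the source code that the error stack points to, and index == 0 should be valid there, so something odd is going on. Please confirm the results of the two experiments above first.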
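
For reference, the condition quoted in the assertion can be mirrored in plain Python. This is only a sketch and not Paddle's kernel: scatter_nd_reference is a hypothetical helper that assumes every index row addresses exactly one element and that duplicate indices accumulate, and the input data is made up. It also shows what the "+ 1 ... - 1" pattern from the question does: positions that receive no update end up as -1.

```python
def scatter_nd_reference(index, updates, shape):
    """Plain-Python stand-in (not Paddle's kernel) for scatter_nd.

    Assumes each row of `index` has len(shape) entries, so it addresses
    exactly one element of a zero-initialised output of the given shape.
    Returns the output flattened in row-major order.
    """
    total = 1
    for dim in shape:
        total *= dim
    flat = [0] * total
    for row, value in zip(index, updates):
        offset = 0
        for j, index_value in enumerate(row):
            # Same condition as the assertion in the error message.
            if not (0 <= index_value < shape[j]):
                raise IndexError(
                    f"index {index_value} out of bounds for dim {j} of size {shape[j]}"
                )
            offset = offset * shape[j] + index_value
        flat[offset] += value  # duplicate indices accumulate
    return flat


# Made-up data standing in for sparse_tensor_indices / point_indices.
sparse_tensor_indices = [[0, 1], [1, 0], [1, 2]]
point_indices = [0, 1, 2]
output_shape = [2, 3]

shifted = scatter_nd_reference(
    sparse_tensor_indices, [p + 1 for p in point_indices], output_shape
)
dense = [v - 1 for v in shifted]
print(dense)  # [-1, 0, -1, 1, -1, 2]

# Index 0 into a dimension of size 1 satisfies the condition.
print(scatter_nd_reference([[0]], [5], [1]))  # [5]
```

Under this reading, a received value of 0 passes for both a dimension of size 1 and one of size 21, which is why the reported failure looks unexpected.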

6

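@Aurelius84 Hello, and thanks for your reply. The model is a modified version of the Paddle3D voxelRCNN model and contains many GPU-only operations, so inference cannot be run on CPU. Loading and predicting with the paddle.static.load_inference_model API gives the following error: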

Traceback (most recent call last):
  File "/home/aistudio/work/Paddle3D/deploy/ted/python/load.py", line 139, in <module>
    main(args)
  File "/home/aistudio/work/Paddle3D/deploy/ted/python/load.py", line 132, in main
    results = exe.run(inference_program,
  File "/opt/conda/envs/python35-paddle120-env/lib/python3.9/site-packages/paddle/fluid/executor.py", line 1463, in run
    six.reraise(*sys.exc_info())
  File "/opt/conda/envs/python35-paddle120-env/lib/python3.9/site-packages/six.py", line 719, in reraise
    raise value
  File "/opt/conda/envs/python35-paddle120-env/lib/python3.9/site-packages/paddle/fluid/executor.py", line 1450, in run
    res = self._run_impl(program=program,
  File "/opt/conda/envs/python35-paddle120-env/lib/python3.9/site-packages/paddle/fluid/executor.py", line 1661, in _run_impl
    return new_exe.run(scope, list(feed.keys()), fetch_list,
  File "/opt/conda/envs/python35-paddle120-env/lib/python3.9/site-packages/paddle/fluid/executor.py", line 631, in run
    tensors = self._new_exe.run(scope, feed_names,
ValueError: (InvalidArgument) Axis should be less than 1, but received axis is 1.
  [Hint: Expected axis < max_dim, but received axis:1 >= max_dim:1.] (at /paddle/paddle/phi/kernels/funcs/common_shape.h:53)
  [operator < elementwise_add > error]
4

@wayyeah Could you share your exported model files together with the Python script that loads the model with paddle.static.load_inference_model and runs prediction, packaged as a tar archive?

4

@wayyeah I can run your load.sh script without errors, and the output is shown below. Is this what you expect?

image
4

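@Aurelius84 Yes, that is as expected. So what could be causing the error on my side? I have tried paddlepaddle 2.4.1 and 2.4.2, both on AI Studio online and in my local environment, and all of them fail.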

8

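I loaded your model with a whl package built from the develop branch, and I am not sure whether this was a known issue that has since been fixed. Could you try installing the pre-release version with pip install paddlepaddle-gpu==2.5.0rc? Alternatively, follow the official documentation to install a nightly build whl and try that.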

0

Thanks for your help, I will try it later.

6

@Aurelius84, with paddlepaddle-gpu==2.5.0rc, loading and predicting with the paddle.static.load_inference_model API indeed no longer raises an error. However, I tried several different inputs and the output is always empty, which looks wrong. In addition, the inference engine Predictor still reports: Error: ../paddle/phi/kernels/funcs/scatter.cu.h:107 Assertion index_value >= 0 && index_value < output_dims[j] failed. The index is out of bounds, please check whether the dimensions of index and input meet the requirements. It should be less than [21] and greater or equal to 0, but received [0]. Enabling TensorRT acceleration gives the same error.

4

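@wayyeah Here are some ideas for troubleshooting:

  1. Fix the randomness with paddle.seed(2023), prepare one fixed input, and make sure the dynamic-graph model produces output in eval mode. Then run the same input through the paddle.static.load_inference_model API and check whether the output is identical.
  2. If the outputs differ, the model export may be at fault. You can bisect in dynamic-graph mode to find which part of the code breaks under dynamic-to-static export, as follows: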
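from paddle.jit import to_static
from paddle.nn import Layer


class Net(Layer):
    def __init__(self, xxx):
        super().__init__()
        # ... (omitted)

    # Comment out @to_static to get the dynamic-graph result;
    # keep it to get the dynamic-to-static result.
    @to_static
    def forward(self, xxx):
        out1 = self.func1(xxx)
        # return out1    # 2nd bisection step: return early here
        out2 = self.func2(xxx)
        # return out2    # 1st bisection step: return early here
        out3 = self.func3(xxx)
        # return out3    # 3rd bisection step: return early here
        out4 = self.func(xxx)

        return out4


x = prepare_data()    # placeholder for your own data loading

net = Net(xxx)        # placeholder constructor arguments
out = net(x)
print(out)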

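Inserting return xxx triggers an early return in the dynamic-to-static path, so you can bisect down to the code segment where the results first diverge. From there the analysis is much easier.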
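
The comparison step can be sketched in plain Python once the early-return outputs from both modes have been recorded and converted to nested lists (for example with Tensor.numpy().tolist()). The helper names, the stage names and the values below are all hypothetical. The sketch does a simple linear scan over the recorded stages; the early-return bisection above is what keeps the number of stages to record small.

```python
def max_abs_diff(a, b):
    """Largest absolute element-wise difference between two nested lists of numbers."""
    a_is_seq = isinstance(a, (list, tuple))
    b_is_seq = isinstance(b, (list, tuple))
    if a_is_seq != b_is_seq:
        return float("inf")  # structure mismatch counts as a difference
    if a_is_seq:
        if len(a) != len(b):
            return float("inf")  # shape mismatch counts as a difference
        return max((max_abs_diff(x, y) for x, y in zip(a, b)), default=0.0)
    return abs(a - b)


def first_divergence(dynamic_outputs, static_outputs, atol=1e-5):
    """Return the first stage name whose outputs differ by more than atol, else None."""
    for name, dyn in dynamic_outputs.items():
        if max_abs_diff(dyn, static_outputs[name]) > atol:
            return name
    return None


# Made-up recordings: the static run starts returning empty results at out3.
dynamic = {"out1": [1.0, 2.0], "out2": [[0.5, 0.25]], "out3": [3.0], "out4": [4.0]}
static = {"out1": [1.0, 2.0], "out2": [[0.5, 0.25]], "out3": [], "out4": []}

print(first_divergence(dynamic, static))  # out3
```

Here the first mismatch is reported at out3, so the code between the out2 and out3 return points would be the place to inspect.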

8

Thanks for your help, I will keep investigating on my own.

6

@Aurelius84 Problem solved. Some parts of the code used NumPy operations, which caused the static-graph inference to go wrong.
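
For anyone hitting the same symptom, one way to locate such code is to scan the model source for NumPy calls before exporting. The thread does not say which calls were involved, so the sketch below is only an illustration: find_numpy_usage and the sample forward are hypothetical, and it assumes NumPy is imported as np or numpy. It uses only the standard-library ast module and flags both module-level calls such as np.unique(...) and Tensor.numpy() conversions.

```python
import ast


def find_numpy_usage(source, aliases=("np", "numpy")):
    """Return sorted (line, name) pairs for NumPy module calls and .numpy() calls."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Call):
            continue
        func = node.func
        if not isinstance(func, ast.Attribute):
            continue
        if func.attr == "numpy":
            # Method call such as x.numpy()
            hits.append((node.lineno, ".numpy()"))
            continue
        # Walk down an attribute chain such as np.linalg.norm to its root name.
        parts = [func.attr]
        root = func.value
        while isinstance(root, ast.Attribute):
            parts.append(root.attr)
            root = root.value
        if isinstance(root, ast.Name) and root.id in aliases:
            parts.append(root.id)
            hits.append((node.lineno, ".".join(reversed(parts))))
    return sorted(hits)


# Hypothetical forward method used only to exercise the scanner.
SAMPLE = '''
def forward(self, x):
    idx = np.unique(x.numpy())
    y = paddle.to_tensor(idx)
    return y
'''

for line, name in find_numpy_usage(SAMPLE):
    print(line, name)
```

Each reported line is a candidate to rewrite with the equivalent paddle op so that it is captured in the exported program.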