[PaddlePaddle/PaddleOCR] OCR on a Chinese research paper: text is detected and boxes are drawn, but almost no text is recognized on the page

Please provide the following information to quickly locate the problem

System Environment: Windows
Version: Paddle: 2.6.0
PaddleOCR:
Related components: CUDA 11.7, cuDNN 8.4
Command Code:
python ppstructure/predict_system.py \
    --image_dir=./doc/doc_jc/sarcopenia.pdf \
    --det_model_dir=ppstructure/inference/ch_PP-OCRv4_det_server_infer \
    --rec_model_dir=ppstructure/inference/ch_PP-OCRv4_rec_server_infer \
    --rec_char_dict_path=ppocr/utils/ppocr_keys_v1.txt \
    --table_model_dir=ppstructure/inference/ch_ppstructure_mobile_v2.0_SLANet_infer \
    --table_char_dict_path=ppocr/utils/dict/table_structure_dict.txt \
    --layout_model_dir=ppstructure/inference/picodet_lcnet_x1_0_fgd_layout_cdla_infer \
    --layout_dict_path=ppocr/utils/dict/layout_dict/layout_cdla_dict.txt \
    --vis_font_path=doc/fonts/simfang.ttf \
    --table=False \
    --recovery=False \
    --output=saves/
Complete Error Message:

We provide AceIssueSolver to help solve issues. Do you want to use it? (Please write yes/no): yes

Please try not to include images in the issue.

Thunderltx

As shown in the screenshot: show_1

Thunderltx

Even with det_db_thresh set as low as 0.00005, nothing changed.

Thunderltx
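For context on the det_db_thresh attempt above: in PaddleOCR's DB text detector, `det_db_thresh` (default 0.3) is the cutoff for marking pixels of the detector's probability map as text, and `det_db_box_thresh` (default 0.6) discards candidate boxes whose mean score is too low. Both act at the detection stage, so once boxes are already being drawn, lowering them further is unlikely on its own to make the recognizer output text. The sketch below is a simplified, hypothetical illustration in plain Python, assuming axis-aligned boxes; the names `binarize`, `box_score` and `keep_box` and the toy probability map are invented for this example and are not PaddleOCR's actual `DBPostProcess` code, which works on contours and also expands boxes by an unclip ratio.

```python
from typing import List, Tuple

ProbMap = List[List[float]]
# Hypothetical simplification: (x0, y0, x1, y1) with exclusive end coordinates.
Box = Tuple[int, int, int, int]


def binarize(prob_map: ProbMap, thresh: float) -> List[List[int]]:
    """Mark pixels whose text probability is strictly above `thresh` (the role det_db_thresh plays)."""
    return [[1 if p > thresh else 0 for p in row] for row in prob_map]


def box_score(prob_map: ProbMap, box: Box) -> float:
    """Mean probability inside an axis-aligned box."""
    x0, y0, x1, y1 = box
    values = [prob_map[y][x] for y in range(y0, y1) for x in range(x0, x1)]
    return sum(values) / len(values)


def keep_box(prob_map: ProbMap, box: Box, box_thresh: float) -> bool:
    """Keep a candidate box only if its mean score reaches `box_thresh` (the role det_db_box_thresh plays)."""
    return box_score(prob_map, box) >= box_thresh


if __name__ == "__main__":
    # Toy 2 x 4 probability map, invented for illustration.
    prob = [
        [0.05, 0.40, 0.90, 0.85],
        [0.02, 0.35, 0.95, 0.80],
    ]
    print(binarize(prob, 0.3))      # default-like threshold
    print(binarize(prob, 0.00005))  # the very low value tried in this thread
    print(keep_box(prob, (2, 0, 4, 2), 0.6))
```

With the very low threshold every pixel counts as text, which only changes which regions become boxes; it does not change how the recognizer reads the cropped regions.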

Could you share the original image? We'd like to test it ourselves.

tink2123


OK, I've uploaded the PDF file: sarcopenia.pdf

Thunderltx

Hello, has this issue been resolved?

yanchujian


Not yet, and I still don't know why it happens.

Thunderltx


Any update on your test results?

Thunderltx

Sorry, I couldn't reproduce your problem. Test command:

python ppstructure/predict_system.py --image_dir=sarcopenia.pdf \
    --det_model_dir=inference/ch_PP-OCRv4_det_server_infer \
    --rec_model_dir=inference/ch_PP-OCRv4_rec_server_infer \
    --rec_char_dict_path=ppocr/utils/ppocr_keys_v1.txt \
    --layout_model_dir=inference/picodet_lcnet_x1_0_fgd_layout_cdla_infer \
    --layout_dict_path=ppocr/utils/dict/layout_dict/layout_cdla_dict.txt \
    --vis_font_path=doc/fonts/simfang.ttf \
    --table=False \
    --recovery=False \
    --output=saves/

Prediction result (image):

You could try setting recovery=True, which may improve the visualization to some extent.

tink2123


What value did you set the "det_limit_side_len" parameter to?

Thunderltx


2000

tink2123


OK, that parameter is probably what made the difference. Mine is still at the default of 960.

Thunderltx


OK~ Since the PDF has a fairly high resolution, you can increase the long-side limit accordingly. Closing this issue for now.

tink2123
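The resolution comes down to how much the page is shrunk before detection. As a minimal sketch of the resizing rule as we understand PaddleOCR's `DetResizeForTest` preprocessing, assuming the default `max` limit type (`--det_limit_type=max`): if the longest side exceeds `det_limit_side_len`, the image is scaled so that side equals the limit, and each side is then rounded to a multiple of 32. Exact behaviour may differ between versions. The function name `det_resize_shape` and the 2480 x 3508 page size (A4 at 300 dpi) are hypothetical illustration values, not taken from this issue.

```python
from typing import Tuple


def det_resize_shape(h: int, w: int, limit_side_len: int = 960) -> Tuple[int, int]:
    """Approximate detector input size (height, width) under an assumed 'max' side limit."""
    longest = max(h, w)
    # Downscale only when the longest side exceeds the limit; smaller images keep their size.
    ratio = limit_side_len / longest if longest > limit_side_len else 1.0
    resize_h = int(h * ratio)
    resize_w = int(w * ratio)
    # Round each side to a multiple of 32 (the detector's stride), with a floor of 32.
    resize_h = max(int(round(resize_h / 32) * 32), 32)
    resize_w = max(int(round(resize_w / 32) * 32), 32)
    return resize_h, resize_w


if __name__ == "__main__":
    page_h, page_w = 3508, 2480  # hypothetical A4 page at 300 dpi
    print(det_resize_shape(page_h, page_w, 960))   # default limit used by the reporter
    print(det_resize_shape(page_h, page_w, 2000))  # limit used by the maintainer
```

Under these assumptions the default 960 shrinks such a page to about 960 x 672 pixels, while 2000 keeps about 1984 x 1408, so small characters plausibly retain much more detail. To try this on the original command, `--det_limit_side_len=2000` can be added to it.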
