Abnormal results when redacting PDF files on Linux

Hello,
I built the jar on Windows and tested it with the same content on both Linux and Windows. The result is wrong on Linux and correct on Windows. Could you please take a look? Thanks!

The OCR recognize method is implemented as follows:
public final RecognizedImage recognize(InputStream imageStream)
{
    // Stub: no OCR is performed, so the list of recognized lines stays empty.
    ArrayList<TextLine> lines = new ArrayList<>();
    return new RecognizedImage(lines.toArray(new TextLine[0]));
}

Test document:
OCR_sample.pdf (175.5 KB)

Test case: redact '1234 5678'

Result on Linux: the sensitive content in the text portion is redacted (but no color is applied), and "card" is redacted by mistake.
image.png (68.4 KB)

Test case: redact "12". The entire document content disappears and only a single redaction box remains, as shown in the image:
image.png (6.6 KB)

Environment details for each system:
Linux machine:

java -version
java version "1.8.0_281"
Java(TM) SE Runtime Environment (build 1.8.0_281-b09)
Java HotSpot(TM) 64-Bit Server VM (build 25.281-b09, mixed mode)

uname -a
Linux 8eb05a372686 5.4.0-152-generic #169-Ubuntu SMP Tue Jun 6 22:25:45 UTC 2023 aarch64 aarch64 aarch64 GNU/Linux

cat /etc/issue
Ubuntu 20.04.3 LTS \n \l

cat /proc/version
Linux version 5.4.0-152-generic (buildd@bos02-arm64-053) (gcc version 9.4.0 (Ubuntu 9.4.0-1ubuntu1~20.04.1)) #169-Ubuntu SMP Tue Jun 6 22:25:45 UTC 2023

Startup log:
:: Spring Boot :: (v2.4.3)

2023-10-20 18:07:55.015 INFO 1473191 --- [ main] com.skyguard.redaction.DemoApplication : Starting DemoApplication v2.0 using Java 1.8.0_281 on 8eb05a372686 with PID 1473191 (/opt/skyguard/redaction/java-engine/redaction-2.0.jar started by root in /opt/skyguard/redaction/java-engine)
2023-10-20 18:07:55.024 INFO 1473191 --- [ main] com.skyguard.redaction.DemoApplication : No active profile set, falling back to default profiles: default
2023-10-20 18:07:58.097 INFO 1473191 --- [ main] o.s.b.w.embedded.tomcat.TomcatWebServer : Tomcat initialized with port(s): 8080 (http)
2023-10-20 18:07:58.134 INFO 1473191 --- [ main] o.apache.catalina.core.StandardService : Starting service [Tomcat]
2023-10-20 18:07:58.136 INFO 1473191 --- [ main] org.apache.catalina.core.StandardEngine : Starting Servlet engine: [Apache Tomcat/9.0.43]
2023-10-20 18:07:58.272 INFO 1473191 --- [ main] o.a.c.c.C.[Tomcat].[localhost].[/] : Initializing Spring embedded WebApplicationContext
2023-10-20 18:07:58.272 INFO 1473191 --- [ main] w.s.c.ServletWebServerApplicationContext : Root WebApplicationContext: initialization completed in 3014 ms
2023-10-20 18:08:06.626 INFO 1473191 --- [ main] o.s.s.concurrent.ThreadPoolTaskExecutor : Initializing ExecutorService 'applicationTaskExecutor'
2023-10-20 18:08:07.257 INFO 1473191 --- [ main] o.s.b.w.embedded.tomcat.TomcatWebServer : Tomcat started on port(s): 8080 (http) with context path ''
2023-10-20 18:08:07.285 INFO 1473191 --- [ main] com.skyguard.redaction.DemoApplication : Started DemoApplication in 13.637 seconds (JVM running for 15.064)

Windows machine:
Maven 3.3.3
Java version 1.8.0_45
Windows 7, 64-bit

Startup log:
:: Spring Boot :: (v2.4.3)

2023-10-20 18:05:53.782 INFO 24744 --- [ main] com.skyguard.redaction.DemoApplication : Starting DemoApplication using Java 1.8.0_45 on zhaojun-256G4 with PID 24744 (D:\workspace\java-engine\target\classes started by zhaojun in D:\workspace\java-engine)
2023-10-20 18:05:53.790 INFO 24744 --- [ main] com.skyguard.redaction.DemoApplication : No active profile set, falling back to default profiles: default
2023-10-20 18:06:03.758 INFO 24744 --- [ main] o.s.b.w.embedded.tomcat.TomcatWebServer : Tomcat initialized with port(s): 8080 (http)
2023-10-20 18:06:03.807 INFO 24744 --- [ main] o.apache.catalina.core.StandardService : Starting service [Tomcat]
2023-10-20 18:06:03.810 INFO 24744 --- [ main] org.apache.catalina.core.StandardEngine : Starting Servlet engine: [Apache Tomcat/9.0.43]
2023-10-20 18:06:05.025 INFO 24744 --- [ main] o.a.c.c.C.[Tomcat].[localhost].[/] : Initializing Spring embedded WebApplicationContext
2023-10-20 18:06:05.025 INFO 24744 --- [ main] w.s.c.ServletWebServerApplicationContext : Root WebApplicationContext: initialization completed in 11084 ms
2023-10-20 18:06:17.090 INFO 24744 --- [ main] o.s.s.concurrent.ThreadPoolTaskExecutor : Initializing ExecutorService 'applicationTaskExecutor'
2023-10-20 18:06:17.748 INFO 24744 --- [ main] o.s.b.w.embedded.tomcat.TomcatWebServer : Tomcat started on port(s): 8080 (http) with context path ''
2023-10-20 18:06:17.787 INFO 24744 --- [ main] com.skyguard.redaction.DemoApplication : Started DemoApplication in 25.826 seconds (JVM running for 31.51)

Version of the Java library used:
groupId: com.groupdocs
artifactId: groupdocs-redaction
version: 23.9
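
Since the same jar behaves differently on the two machines, it may help to compare what the JVM itself reports on each of them, beyond the startup logs. The following is only a minimal diagnostic sketch: the class name EnvReport and the choice of properties are illustrative, and the idea that fonts, encoding, or architecture are involved is an assumption that nothing in this thread confirms.

```java
import java.awt.GraphicsEnvironment;
import java.nio.charset.Charset;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper (not part of GroupDocs.Redaction): prints JVM-level
// settings so the output can be diffed between the Linux and Windows hosts.
public class EnvReport {

    // Collects a few standard system properties plus charset, locale and font info.
    static Map<String, String> collect() {
        Map<String, String> info = new LinkedHashMap<>();
        String[] keys = {"os.name", "os.arch", "java.version", "java.vendor", "file.encoding"};
        for (String key : keys) {
            info.put(key, String.valueOf(System.getProperty(key)));
        }
        info.put("default.charset", Charset.defaultCharset().name());
        info.put("default.locale", Locale.getDefault().toString());
        info.put("headless", String.valueOf(GraphicsEnvironment.isHeadless()));
        try {
            // Font lookup can fail on minimal server images, so it is guarded.
            String[] fonts = GraphicsEnvironment.getLocalGraphicsEnvironment()
                    .getAvailableFontFamilyNames();
            info.put("font.families", String.valueOf(fonts.length));
        } catch (Throwable t) {
            info.put("font.families", "unavailable: " + t);
        }
        return info;
    }

    public static void main(String[] args) {
        for (Map.Entry<String, String> e : collect().entrySet()) {
            System.out.println(e.getKey() + " = " + e.getValue());
        }
    }
}
```

Running it on both machines and comparing the two outputs side by side shows whether the hosts differ in architecture, default charset, or the number of installed font families.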

@weilin
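We have opened the following new ticket in our internal issue tracking system and will deliver a fix according to the terms mentioned in Free Support Policies.

Issue ID: REDACTIONJAVA-181

If you need priority support, along with direct access to our paid support management team, you can purchase Paid Support Services.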

@weilin

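According to the public documentation, GroupDocs.Redaction does not include OCR as part of its distribution, so you have to integrate a paid or free OCR solution yourself by overriding the RecognizedImage recognize(InputStream imageStream) method. The implementation you provided returns an empty array of lines, as if no text had been recognized, so no replacement can be made. For example, you can look at the Aspose.OCR for Cloud implementation, for which a sample is provided.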
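
To illustrate why an empty recognizer leaves nothing to redact, here is a small self-contained sketch. It does not use the GroupDocs classes: StubLine, StubOcr, and countMatches are hypothetical stand-ins invented for this example, and the substring matching is a simplification, not the library's actual behavior.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for one recognized line of text (NOT the GroupDocs TextLine).
final class StubLine {
    final String text;
    StubLine(String text) { this.text = text; }
}

// Hypothetical stand-in for an OCR connector with a recognize(InputStream) method.
interface StubOcr {
    List<StubLine> recognize(InputStream imageStream);
}

public class OcrStubDemo {
    // Mirrors the implementation from the question: always reports no text.
    static final StubOcr EMPTY = in -> new ArrayList<>();

    // Pretends the image held one line; a real connector would call an OCR engine here.
    static final StubOcr FIXED = in -> Arrays.asList(new StubLine("card 1234 5678"));

    // Counts the recognized lines that contain the phrase to redact.
    static int countMatches(StubOcr ocr, InputStream image, String phrase) {
        int hits = 0;
        for (StubLine line : ocr.recognize(image)) {
            if (line.text.contains(phrase)) {
                hits++;
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        InputStream image = new ByteArrayInputStream(new byte[0]);
        System.out.println(countMatches(EMPTY, image, "1234 5678")); // prints 0
        System.out.println(countMatches(FIXED, image, "1234 5678")); // prints 1
    }
}
```

With the empty connector there are no lines to search, so no match can ever be found in image content. Only a connector that returns real recognized text gives the redaction step something to work with.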

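As for the second case ("Test case: redact 12, the entire document content disappears and only a redaction box remains"), could you share more details, such as the redaction type, mode, and settings?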