Hello,
I’m experiencing significant thread contention when reading metadata using GroupDocs.Metadata for Java in a multithreaded processing pipeline.
Environment
- Java: 21
- GroupDocs.Metadata for Java
- Processing files concurrently using multiple worker threads
Problem description
When multiple threads create Metadata objects and read metadata from different files concurrently, several threads become BLOCKED waiting for a monitor inside GroupDocs internal classes. This effectively serializes metadata processing and prevents efficient parallelization.
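To illustrate what I mean by "serializes", here is a minimal sketch that does not use GroupDocs at all. GLOBAL_LOCK and detectFormat() are hypothetical stand-ins for whatever the library does internally, which I cannot see because the classes are obfuscated. The sketch only shows the pattern I suspect: every worker passes through one process-wide monitor, so at most one thread makes progress at a time regardless of pool size.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

final class GlobalLockDemo {
    // Hypothetical stand-in for a process-wide monitor; NOT a GroupDocs class or field.
    private static final Object GLOBAL_LOCK = new Object();

    private static final AtomicInteger inside = new AtomicInteger();
    private static final AtomicInteger maxInside = new AtomicInteger();
    private static volatile long sink; // keeps the busy loop from being optimized away

    // Hypothetical stand-in for the format detection step seen in the thread dump.
    static void detectFormat() {
        synchronized (GLOBAL_LOCK) {
            int now = inside.incrementAndGet();
            maxInside.accumulateAndGet(now, Math::max); // record peak concurrency
            long x = 0;
            for (int i = 0; i < 1_000_000; i++) {
                x += i * 31L; // simulated CPU-bound work done under the lock
            }
            sink = x;
            inside.decrementAndGet();
        }
    }

    // Runs the given number of tasks on a fixed pool and returns the peak
    // number of threads that were inside the locked section at the same time.
    static int runWorkers(int threads, int tasks) {
        maxInside.set(0);
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int i = 0; i < tasks; i++) {
            pool.execute(GlobalLockDemo::detectFormat);
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(1, TimeUnit.MINUTES)) {
                throw new IllegalStateException("workers did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return maxInside.get();
    }

    public static void main(String[] args) {
        System.out.println("peak threads inside locked section: " + runWorkers(8, 100));
    }
}
```

With a shared monitor the reported peak is always 1, even with 8 worker threads. That matches what I see in production: adding workers does not increase throughput for this step.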
In my code I create the metadata reader like this:
// 'is' is the InputStream of the file handled by the current worker thread
try (Metadata metadata = new Metadata(is)) {
    // metadata processing
}
However, during execution I observe many threads blocked on the same internal monitor:
"category-classifier-worker-0" #152 daemon prio=5 os_prio=0 cpu=262771.68ms elapsed=2586.51s tid=0x00007fa21d026c70 nid=198 waiting for monitor entry
java.lang.Thread.State: BLOCKED (on object monitor)
at com.groupdocs.metadata.internal.c.a.i.X.a(Unknown Source)
- waiting to lock <0x0000000708e58330> (a java.lang.Object)
at com.groupdocs.metadata.internal.c.a.i.T.a(Unknown Source)
at com.groupdocs.metadata.internal.c.a.i.T.i(Unknown Source)
at com.groupdocs.metadata.internal.c.a.i.T.getFileFormat(Unknown Source)
at com.groupdocs.metadata.core.nD.a(Unknown Source)
at com.groupdocs.metadata.core.eF.b(Unknown Source)
at com.groupdocs.metadata.core.eh.blm(Unknown Source)
at com.groupdocs.metadata.core.ip.a(Unknown Source)
at com.groupdocs.metadata.core.ip.<init>(Unknown Source)
at com.groupdocs.metadata.Metadata.<init>(Unknown Source)
at com.groupdocs.metadata.Metadata.<init>(Unknown Source)
at com.foo.pipeline.classification.GroupDocsMetadataCategoryReader.accept(GroupDocsMetadataCategoryReader.java:92)
"category-classifier-worker-1" #153 daemon prio=5 os_prio=0 cpu=266452.47ms elapsed=2586.51s tid=0x00007fa21d028520 nid=199 waiting for monitor entry
java.lang.Thread.State: BLOCKED (on object monitor)
at com.groupdocs.metadata.internal.c.a.i.X.a(Unknown Source)
- waiting to lock <0x0000000708e58330> (a java.lang.Object)
at com.groupdocs.metadata.internal.c.a.i.T.a(Unknown Source)
at com.groupdocs.metadata.internal.c.a.i.T.i(Unknown Source)
at com.groupdocs.metadata.internal.c.a.i.T.getFileFormat(Unknown Source)
at com.groupdocs.metadata.core.cW.a(Unknown Source)
at com.groupdocs.metadata.core.eF.b(Unknown Source)
at com.groupdocs.metadata.core.eh.blm(Unknown Source)
at com.groupdocs.metadata.core.ip.a(Unknown Source)
at com.groupdocs.metadata.core.ip.<init>(Unknown Source)
at com.groupdocs.metadata.Metadata.<init>(Unknown Source)
at com.groupdocs.metadata.Metadata.<init>(Unknown Source)
at com.foo.pipeline.classification.GroupDocsMetadataCategoryReader.accept(GroupDocsMetadataCategoryReader.java:92)
"category-classifier-worker-2" #154 daemon prio=5 os_prio=0 cpu=263077.62ms elapsed=2586.51s tid=0x00007fa21d029ab0 nid=200 runnable
java.lang.Thread.State: RUNNABLE
at com.groupdocs.metadata.internal.c.a.i.internal.lg.h.j(Unknown Source)
at com.groupdocs.metadata.internal.c.a.i.internal.lg.p.b(Unknown Source)
at com.groupdocs.metadata.internal.c.a.i.internal.lg.p.a(Unknown Source)
at com.groupdocs.metadata.internal.c.a.i.internal.lg.p.a(Unknown Source)
at com.groupdocs.metadata.internal.c.a.i.internal.lg.p.d(Unknown Source)
at com.groupdocs.metadata.internal.c.a.i.internal.kl.a.<init>(Unknown Source)
at com.groupdocs.metadata.internal.c.a.i.internal.fW.c.a(Unknown Source)
at com.groupdocs.metadata.internal.c.a.i.internal.jb.f.b(Unknown Source)
at com.groupdocs.metadata.internal.c.a.i.X.a(Unknown Source)
- locked <0x00000007afda9ca0> (a java.lang.Object)
- locked <0x00000007afda9ca0> (a java.lang.Object)
- locked <0x0000000708e58330> (a java.lang.Object)
at com.groupdocs.metadata.internal.c.a.i.T.a(Unknown Source)
at com.groupdocs.metadata.internal.c.a.i.T.i(Unknown Source)
at com.groupdocs.metadata.internal.c.a.i.T.getFileFormat(Unknown Source)
at com.groupdocs.metadata.core.bv.a(Unknown Source)
at com.groupdocs.metadata.core.eF.b(Unknown Source)
at com.groupdocs.metadata.core.eh.blm(Unknown Source)
at com.groupdocs.metadata.core.ip.a(Unknown Source)
at com.groupdocs.metadata.core.ip.<init>(Unknown Source)
at com.groupdocs.metadata.Metadata.<init>(Unknown Source)
at com.groupdocs.metadata.Metadata.<init>(Unknown Source)
at com.foo.pipeline.classification.GroupDocsMetadataCategoryReader.accept(GroupDocsMetadataCategoryReader.java:92)
"metadata-extractor-worker-0" #156 daemon prio=5 os_prio=0 cpu=122496.34ms elapsed=2586.51s tid=0x00007fa21d02bdb0 nid=202 waiting for monitor entry
java.lang.Thread.State: BLOCKED (on object monitor)
at com.groupdocs.metadata.internal.c.a.i.X.a(Unknown Source)
- waiting to lock <0x0000000708e58330> (a java.lang.Object)
at com.groupdocs.metadata.internal.c.a.i.T.a(Unknown Source)
at com.groupdocs.metadata.internal.c.a.i.T.i(Unknown Source)
at com.groupdocs.metadata.internal.c.a.i.T.getFileFormat(Unknown Source)
at com.groupdocs.metadata.core.dc.a(Unknown Source)
at com.groupdocs.metadata.core.eF.b(Unknown Source)
at com.groupdocs.metadata.core.eh.blm(Unknown Source)
at com.groupdocs.metadata.core.ip.a(Unknown Source)
at com.groupdocs.metadata.core.ip.<init>(Unknown Source)
at com.groupdocs.metadata.Metadata.<init>(Unknown Source)
at com.groupdocs.metadata.Metadata.<init>(Unknown Source)
at com.foo.pipeline.metadata.GroupDocsMetadataExtractor.extract(GroupDocsMetadataExtractor.java:156)
Observations
- Many threads are blocked waiting for the same internal monitor (<0x0000000708e58330>).
- The blocking happens during getFileFormat() while constructing Metadata.
- This effectively serializes metadata reading across threads.
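For anyone who wants to quantify the same thing without reading raw thread dumps, the observations above can be checked with a small helper along these lines. It uses only the standard java.lang.management API and groups BLOCKED threads by the lock they are waiting for. The class and method names are my own, not part of GroupDocs.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.Map;
import java.util.TreeMap;

final class MonitorContentionReport {

    // Returns the number of BLOCKED threads per contended lock.
    // Keys use the ThreadInfo.getLockName() format: "<class name>@<identity hash in hex>".
    static Map<String, Integer> blockedThreadsByLock() {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        Map<String, Integer> counts = new TreeMap<>();
        for (ThreadInfo info : threads.dumpAllThreads(false, false)) {
            if (info == null) {
                continue; // defensive: skip threads that ended during the dump
            }
            if (info.getThreadState() == Thread.State.BLOCKED && info.getLockName() != null) {
                counts.merge(info.getLockName(), 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // In a real pipeline this would be called periodically from a monitoring thread.
        blockedThreadsByLock().forEach((lock, n) ->
                System.out.println(n + " thread(s) blocked on " + lock));
    }
}
```

If one lock name consistently accounts for most of the worker threads, that points to a single shared monitor rather than per-file contention.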
Questions
- Is GroupDocs.Metadata thread-safe when multiple Metadata instances are created concurrently?
- Is there some global synchronization or shared cache inside the library that could cause this contention?
- Is this expected behavior (e.g., during file format detection)?
- Are there recommended practices to avoid this contention when processing files in parallel?
- Would using a different initialization pattern or configuration help?
My goal is to process many files concurrently, but currently the internal synchronization prevents scaling with multiple threads.
Any clarification or recommendations would be greatly appreciated.
Thank you.