Thread synchronization - performance degradation

Hello,

I’m experiencing significant thread contention when reading metadata using GroupDocs.Metadata for Java in a multithreaded processing pipeline.

Environment

  • Java: 21
  • GroupDocs.Metadata for Java
  • Processing files concurrently using multiple worker threads

Problem description

When multiple threads create Metadata objects and read metadata from different files concurrently, several threads become BLOCKED waiting for a monitor inside GroupDocs internal classes. This effectively serializes metadata processing and prevents efficient parallelization.

In my code I create the metadata reader like this:

try (Metadata metadata = new Metadata(is)) {
    // metadata processing
}

However, during execution I observe many threads blocked on the same internal monitor:

"category-classifier-worker-0" #152 daemon prio=5 os_prio=0 cpu=262771.68ms elapsed=2586.51s tid=0x00007fa21d026c70 nid=198 waiting for monitor entry
   java.lang.Thread.State: BLOCKED (on object monitor)
	at com.groupdocs.metadata.internal.c.a.i.X.a(Unknown Source)
	- waiting to lock <0x0000000708e58330> (a java.lang.Object)
	at com.groupdocs.metadata.internal.c.a.i.T.a(Unknown Source)
	at com.groupdocs.metadata.internal.c.a.i.T.i(Unknown Source)
	at com.groupdocs.metadata.internal.c.a.i.T.getFileFormat(Unknown Source)
	at com.groupdocs.metadata.core.nD.a(Unknown Source)
	at com.groupdocs.metadata.core.eF.b(Unknown Source)
	at com.groupdocs.metadata.core.eh.blm(Unknown Source)
	at com.groupdocs.metadata.core.ip.a(Unknown Source)
	at com.groupdocs.metadata.core.ip.<init>(Unknown Source)
	at com.groupdocs.metadata.Metadata.<init>(Unknown Source)
	at com.groupdocs.metadata.Metadata.<init>(Unknown Source)
	at com.foo.pipeline.classification.GroupDocsMetadataCategoryReader.accept(GroupDocsMetadataCategoryReader.java:92)

"category-classifier-worker-1" #153 daemon prio=5 os_prio=0 cpu=266452.47ms elapsed=2586.51s tid=0x00007fa21d028520 nid=199 waiting for monitor entry
   java.lang.Thread.State: BLOCKED (on object monitor)
	at com.groupdocs.metadata.internal.c.a.i.X.a(Unknown Source)
	- waiting to lock <0x0000000708e58330> (a java.lang.Object)
	at com.groupdocs.metadata.internal.c.a.i.T.a(Unknown Source)
	at com.groupdocs.metadata.internal.c.a.i.T.i(Unknown Source)
	at com.groupdocs.metadata.internal.c.a.i.T.getFileFormat(Unknown Source)
	at com.groupdocs.metadata.core.cW.a(Unknown Source)
	at com.groupdocs.metadata.core.eF.b(Unknown Source)
	at com.groupdocs.metadata.core.eh.blm(Unknown Source)
	at com.groupdocs.metadata.core.ip.a(Unknown Source)
	at com.groupdocs.metadata.core.ip.<init>(Unknown Source)
	at com.groupdocs.metadata.Metadata.<init>(Unknown Source)
	at com.groupdocs.metadata.Metadata.<init>(Unknown Source)
	at com.foo.pipeline.classification.GroupDocsMetadataCategoryReader.accept(GroupDocsMetadataCategoryReader.java:92)

"category-classifier-worker-2" #154 daemon prio=5 os_prio=0 cpu=263077.62ms elapsed=2586.51s tid=0x00007fa21d029ab0 nid=200 runnable
   java.lang.Thread.State: RUNNABLE
	at com.groupdocs.metadata.internal.c.a.i.internal.lg.h.j(Unknown Source)
	at com.groupdocs.metadata.internal.c.a.i.internal.lg.p.b(Unknown Source)
	at com.groupdocs.metadata.internal.c.a.i.internal.lg.p.a(Unknown Source)
	at com.groupdocs.metadata.internal.c.a.i.internal.lg.p.a(Unknown Source)
	at com.groupdocs.metadata.internal.c.a.i.internal.lg.p.d(Unknown Source)
	at com.groupdocs.metadata.internal.c.a.i.internal.kl.a.<init>(Unknown Source)
	at com.groupdocs.metadata.internal.c.a.i.internal.fW.c.a(Unknown Source)
	at com.groupdocs.metadata.internal.c.a.i.internal.jb.f.b(Unknown Source)
	at com.groupdocs.metadata.internal.c.a.i.X.a(Unknown Source)
	- locked <0x00000007afda9ca0> (a java.lang.Object)
	- locked <0x00000007afda9ca0> (a java.lang.Object)
	- locked <0x0000000708e58330> (a java.lang.Object)
	at com.groupdocs.metadata.internal.c.a.i.T.a(Unknown Source)
	at com.groupdocs.metadata.internal.c.a.i.T.i(Unknown Source)
	at com.groupdocs.metadata.internal.c.a.i.T.getFileFormat(Unknown Source)
	at com.groupdocs.metadata.core.bv.a(Unknown Source)
	at com.groupdocs.metadata.core.eF.b(Unknown Source)
	at com.groupdocs.metadata.core.eh.blm(Unknown Source)
	at com.groupdocs.metadata.core.ip.a(Unknown Source)
	at com.groupdocs.metadata.core.ip.<init>(Unknown Source)
	at com.groupdocs.metadata.Metadata.<init>(Unknown Source)
	at com.groupdocs.metadata.Metadata.<init>(Unknown Source)
	at com.foo.pipeline.classification.GroupDocsMetadataCategoryReader.accept(GroupDocsMetadataCategoryReader.java:92)

"metadata-extractor-worker-0" #156 daemon prio=5 os_prio=0 cpu=122496.34ms elapsed=2586.51s tid=0x00007fa21d02bdb0 nid=202 waiting for monitor entry
   java.lang.Thread.State: BLOCKED (on object monitor)
	at com.groupdocs.metadata.internal.c.a.i.X.a(Unknown Source)
	- waiting to lock <0x0000000708e58330> (a java.lang.Object)
	at com.groupdocs.metadata.internal.c.a.i.T.a(Unknown Source)
	at com.groupdocs.metadata.internal.c.a.i.T.i(Unknown Source)
	at com.groupdocs.metadata.internal.c.a.i.T.getFileFormat(Unknown Source)
	at com.groupdocs.metadata.core.dc.a(Unknown Source)
	at com.groupdocs.metadata.core.eF.b(Unknown Source)
	at com.groupdocs.metadata.core.eh.blm(Unknown Source)
	at com.groupdocs.metadata.core.ip.a(Unknown Source)
	at com.groupdocs.metadata.core.ip.<init>(Unknown Source)
	at com.groupdocs.metadata.Metadata.<init>(Unknown Source)
	at com.groupdocs.metadata.Metadata.<init>(Unknown Source)
	at com.foo.pipeline.metadata.GroupDocsMetadataExtractor.extract(GroupDocsMetadataExtractor.java:156)

Observations

  • Many threads are blocked waiting for the same internal monitor (<0x0000000708e58330>).
  • The blocking happens during getFileFormat() while constructing Metadata.
  • This effectively serializes metadata reading across threads.
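
For reference, the first observation can be checked against a full dump by grouping blocked threads by the monitor address they are waiting on. This is a minimal sketch that assumes jstack-style text like the excerpts above; the class and method names are illustrative only.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Groups the threads of a jstack-style dump by the monitor they are waiting to lock. */
final class MonitorContention {
    // Thread header line, e.g.: "category-classifier-worker-0" #152 daemon ...
    private static final Pattern THREAD_HEADER = Pattern.compile("^\"([^\"]+)\"");
    // Frame annotation, e.g.: - waiting to lock <0x0000000708e58330> (a java.lang.Object)
    private static final Pattern WAITING =
            Pattern.compile("- waiting to lock <(0x[0-9a-fA-F]+)>");

    private MonitorContention() {}

    /** Returns monitor address -> names of threads waiting on it, in dump order. */
    static Map<String, List<String>> waitersByMonitor(String dump) {
        Map<String, List<String>> result = new LinkedHashMap<>();
        String currentThread = null;
        for (String line : dump.split("\\R")) {
            Matcher header = THREAD_HEADER.matcher(line);
            if (header.find()) {
                currentThread = header.group(1); // a new thread section starts here
                continue;
            }
            Matcher waiting = WAITING.matcher(line);
            if (currentThread != null && waiting.find()) {
                result.computeIfAbsent(waiting.group(1), k -> new ArrayList<>())
                      .add(currentThread);
            }
        }
        return result;
    }
}
```

A monitor with many waiters, held by a single RUNNABLE thread (as with worker-2 above), is the signature of the contention described here.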

Questions

  1. Is GroupDocs.Metadata thread-safe when multiple Metadata instances are created concurrently?
  2. Is there some global synchronization or shared cache inside the library that could cause this contention?
  3. Is this expected behavior (e.g., during file format detection)?
  4. Are there recommended practices to avoid this contention when processing files in parallel?
  5. Would using a different initialization pattern or configuration help?

My goal is to process many files concurrently, but currently the internal synchronization prevents scaling with multiple threads.

Any clarification or recommendations would be greatly appreciated.

Thank you.

Hello,

You’re seeing global synchronization during file format detection when constructing Metadata. Here’s what’s going on and how to reduce it.

What’s happening

  • All blocked threads are waiting on the same monitor (0x0000000708e58330) inside com.groupdocs.metadata.internal.*, in the path that leads to getFileFormat() during Metadata construction.

  • In the open-source part of the flow, each new Metadata(stream) calls FileFormatChecker.createRootPackage(). If you don’t pass a known format via LoadOptions, the library has to detect the format (e.g. by trying loaders and possibly calling into dependencies such as Aspose.Imaging for some formats). That detection path uses shared/static state or a global lock in the internal/dependency code, so many threads end up blocked on the same monitor and metadata reading is effectively serialized.
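
To illustrate why a single shared lock is enough to serialize otherwise independent work, here is a toy model. It is only a sketch: GLOBAL_LOCK and detectFormat() are hypothetical stand-ins, not GroupDocs classes, and the real internals are obfuscated and may differ.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

/** Toy model: independent callers that all funnel through one static lock. */
final class GlobalLockDemo {
    // Hypothetical stand-in for a library-wide lock (not a GroupDocs class).
    private static final Object GLOBAL_LOCK = new Object();
    private static final AtomicInteger IN_SECTION = new AtomicInteger();
    private static final AtomicInteger PEAK = new AtomicInteger();

    private GlobalLockDemo() {}

    /** Simulated "format detection" that runs entirely under the shared lock. */
    static void detectFormat() throws InterruptedException {
        synchronized (GLOBAL_LOCK) {
            int now = IN_SECTION.incrementAndGet();
            PEAK.accumulateAndGet(now, Math::max);
            Thread.sleep(5); // pretend to do work while holding the lock
            IN_SECTION.decrementAndGet();
        }
    }

    /** Runs detectFormat() on `threads` workers; returns the peak number inside the section. */
    static int peakConcurrency(int threads) {
        IN_SECTION.set(0);
        PEAK.set(0);
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        CountDownLatch start = new CountDownLatch(1);
        for (int i = 0; i < threads; i++) {
            pool.execute(() -> {
                try {
                    start.await(); // release all workers at the same moment
                    detectFormat();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }
        start.countDown();
        pool.shutdown();
        try {
            if (!pool.awaitTermination(30, TimeUnit.SECONDS)) {
                throw new IllegalStateException("workers did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return PEAK.get();
    }
}
```

However many workers are started, the peak number of threads inside the guarded section stays at one, which matches the pattern in your dump: one RUNNABLE thread holding the monitor and the rest BLOCKED.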

Answers to your questions

  1. Is GroupDocs.Metadata thread-safe when multiple Metadata instances are created concurrently?

Each Metadata instance is independent and safe to use from a single thread. However, creating many Metadata instances in parallel is not fully parallel: there is global synchronization (and/or shared caches) used during format detection, so you see the contention you described.

  2. Is there global synchronization or shared cache?

Yes. In the library code there is a singleton ServiceLocator (with synchronized access) and an ApplicationInitializer with a static mutex. The blocking you see is in com.groupdocs.metadata.internal.* (dependency/internal code), which almost certainly uses a global lock (or similar) around format detection. So format detection is a shared, serialized section when you don’t specify the format.

  3. Is this expected?

With the current design, yes: format detection is implemented in a way that uses shared synchronization, so parallel construction of Metadata without a known format will hit that bottleneck.

  4. Recommended practices to avoid contention

  • Pass the format when you know it (e.g. from file extension or content-type). If you use the constructor that accepts LoadOptions and set the file format there, the library can skip automatic format detection and call the right loader directly, avoiding the contended internal path:

LoadOptions loadOptions = new LoadOptions(FileFormat.Pdf); // or FileFormat.Jpeg, etc.
try (Metadata metadata = new Metadata(stream, loadOptions)) {
    // metadata processing
}

  • Prefer a small number of worker threads for metadata extraction (e.g. a bounded pool) so that fewer threads compete for that internal lock.

  • If your pipeline allows, pre-classify by format (e.g. by extension) and use the appropriate LoadOptions(FileFormat) per file so that most or all Metadata constructions skip the heavy detection path.
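
The bounded-pool suggestion in the list above could look roughly like this. It is a sketch using only java.util.concurrent; BoundedExtraction, runAll and the `extract` callback are hypothetical names, and the callback is where your own Metadata handling would go.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

/** Runs an extraction callback over many inputs with a capped number of threads. */
final class BoundedExtraction {
    private BoundedExtraction() {}

    /**
     * Applies `extract` to every input using at most `maxWorkers` threads
     * and returns the results in input order.
     */
    static <I, O> List<O> runAll(List<I> inputs, int maxWorkers, Function<I, O> extract) {
        ExecutorService pool = Executors.newFixedThreadPool(maxWorkers);
        try {
            List<Future<O>> futures = new ArrayList<>();
            for (I input : inputs) {
                Callable<O> task = () -> extract.apply(input);
                futures.add(pool.submit(task));
            }
            List<O> results = new ArrayList<>();
            for (Future<O> future : futures) {
                results.add(future.get()); // waits for each task in submission order
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for extraction", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("extraction failed", e.getCause());
        } finally {
            pool.shutdownNow(); // always release the worker threads
        }
    }
}
```

The right value for maxWorkers is something to measure in your pipeline. While the internal lock is contended, threads beyond a small number will mostly sit in BLOCKED state rather than add throughput.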

  5. Would a different initialization pattern or configuration help?

Yes. Using Metadata(InputStream, LoadOptions) with LoadOptions set to the correct FileFormat (when you know it) is the main way to avoid the format-detection lock and get better parallelism. There is no public configuration in this library to disable or parallelize that internal synchronization; the practical approach is to avoid triggering it by specifying the format.
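
One way to "know" the format up front is a simple lookup on the file extension. The sketch below is library-independent: the Hint enum and the extension table are hypothetical, and mapping each Hint to the corresponding FileFormat constant is left to your code.

```java
import java.util.Locale;
import java.util.Map;
import java.util.Optional;

/** Guesses a format hint from a file name so that auto-detection can be skipped. */
final class FormatHints {
    /** Hypothetical labels; translate each one to the matching FileFormat value yourself. */
    enum Hint { PDF, JPEG, PNG, WORD_PROCESSING, SPREADSHEET, PRESENTATION }

    // Illustrative table only; extend it with the extensions your pipeline actually sees.
    private static final Map<String, Hint> BY_EXTENSION = Map.ofEntries(
            Map.entry("pdf", Hint.PDF),
            Map.entry("jpg", Hint.JPEG),
            Map.entry("jpeg", Hint.JPEG),
            Map.entry("png", Hint.PNG),
            Map.entry("docx", Hint.WORD_PROCESSING),
            Map.entry("xlsx", Hint.SPREADSHEET),
            Map.entry("pptx", Hint.PRESENTATION));

    private FormatHints() {}

    /** Returns a hint for known extensions, or empty when the format must be detected. */
    static Optional<Hint> fromFileName(String fileName) {
        int dot = fileName.lastIndexOf('.');
        if (dot < 0 || dot == fileName.length() - 1) {
            return Optional.empty(); // no extension, or a trailing dot
        }
        String ext = fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
        return Optional.ofNullable(BY_EXTENSION.get(ext));
    }
}
```

When the result is empty, fall back to new Metadata(stream) and accept the detection cost for that file. An extension can also be wrong, so treat the hint as a guess and decide how your pipeline should react if loading with the hinted format fails.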

Summary: the bottleneck is global synchronization during format detection when the format is not specified. Specifying the format via LoadOptions when possible is the recommended way to reduce contention and scale better with multiple threads.

Thank you.