Annotation API format and saving issues in Java

Mariusz_Pala · March 3, 2015, 11:23am

Hi,

I’m trying to understand how the annotations are generated, saved and loaded within GroupDocs.

Let me explain how this is supposed to work and how it works in all other annotation tools.

By other tools I mean Brava, Daeja ViewOne Pro, Acrobat, Acrobat plugins and PDF Annotation Services: basically all the tools that are used with ECM systems like Documentum, Alfresco, Oracle WebCenter and SharePoint. All those tools use what I’d call a standard way to store annotations. Every annotation/stamp/watermark is a single object that can be saved or exported in a different format (XML, FDF, XFDF) depending on the tool, and it can easily be converted from one format to another. Such an object contains the whole annotation definition plus the owner name and creation date. That’s it. It’s either linked to the document via a relation (ECM system) or embedded, as in PDF, where it can be exported/imported as FDF or XFDF.

Perfect and simple solution.

Document <–> Relation (who, when) <–> Annotation

Within GroupDocs I can see a completely different approach/model that I’m not able to understand.

There is an annotation that has to reference a session (??) and a userId (??). A session is volatile by definition, so I can’t understand how a session ID can be saved with the annotation. The user ID is also GD-specific, whereas in all other systems the user ID is simply a String that can be used and understood by other systems. Then there are ID and GUID. I’m not sure why there are two identifiers for a single object.

Then there is a Reply, another object that references the annotation and also has two identifiers and a userId reference.

To sum up: two or more objects per annotation, containing references to internal GD objects.

In other words - no way to save the annotation within the ECM system, is that correct?

I’d need a way to combine the annotation with all the replies into some single XML file. That’s doable I guess, although with two different DAOs that handle those objects separately it might be quite complicated.
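
A minimal sketch of what such a combined file could look like. Everything here is an assumption for illustration: `AnnotationRecord`, `ReplyRecord` and `AnnotationXmlWriter` are hypothetical classes (not GroupDocs types), and the XML element names are invented. It only shows that one annotation and its replies fit in a single document without any session, document or user IDs.

```java
import java.io.StringWriter;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical plain data holders; these are NOT GroupDocs classes.
class ReplyRecord {
    final String author;
    final String text;
    ReplyRecord(String author, String text) {
        this.author = author;
        this.text = text;
    }
}

class AnnotationRecord {
    final String type;
    final int page;
    final String text;
    final List<ReplyRecord> replies;
    AnnotationRecord(String type, int page, String text, List<ReplyRecord> replies) {
        this.type = type;
        this.page = page;
        this.text = text;
        this.replies = replies;
    }
}

class AnnotationXmlWriter {
    // Serializes one annotation and all of its replies into a single XML string.
    // No session, document or GD user IDs are written; the ECM relation carries who/when.
    static String toXml(AnnotationRecord a) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        Element root = doc.createElement("annotation");
        root.setAttribute("type", a.type);
        root.setAttribute("page", Integer.toString(a.page));
        doc.appendChild(root);

        Element text = doc.createElement("text");
        text.setTextContent(a.text);
        root.appendChild(text);

        Element replies = doc.createElement("replies");
        root.appendChild(replies);
        for (ReplyRecord r : a.replies) {
            Element reply = doc.createElement("reply");
            reply.setAttribute("author", r.author);
            reply.setTextContent(r.text);
            replies.appendChild(reply);
        }

        Transformer t = TransformerFactory.newInstance().newTransformer();
        t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        t.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }
}
```

The resulting string could then be written to a file and attached to the document through the ECM relation.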

Anyway, let’s assume that I’m able to store the annotation with the replies as a single file within the ECM system. I lose ID, annotationSessionID, userId and GUID, as those parameters are irrelevant. Within the ECM system I have a document with the related annotation and some metadata that identifies who created it and when.

That’s the expected output that allows the data to be migrated to another ECM system or another repository within the same system (very common situation).

Now I need a way to read it back into the GD format. And… I don’t see a way to do it. How do I populate annotationSessionID, which requires a session, which in turn requires a document with an INT identifier? How do I populate ID, GUID and userId? How do I populate all those relations on the fly when there are seven different DAOs depending on one another? It looks like I would have to copy the ECM database, which includes millions of documents and tens of millions of annotations, into the GD internal database and keep the two synchronized. Otherwise there is literally no way to make them work together.

I encourage you to take a look at the existing ECM systems, because that’s your target. There are millions of people working with them and using the annotation tools every day. GroupDocs seems a great alternative to those tools. But unfortunately I don’t see a way to make them work together.

We’ve been trying to do the integration for half a year now. Each version fixed a dozen bugs and introduced a bunch of new ones. Nevertheless, I hope that at some point it will be an amazing tool.

Unfortunately I have reached the point where I don’t see light at the end of the tunnel. The connector design and approach seem so unreasonable that I don’t think there is any way to make this work in the real world.

Keep in mind that I represent 500k+ of potential users.

Please let me know if there is anything that you plan to change or anything that we can do to make the ECM integration work.

Kind Regards,

Mariusz Pala

ihor.mykhalevych1 · March 4, 2015, 11:26am

Hello Mariusz,

Thank you for all these precise questions. Let me try to explain our approach and how your situation can be resolved.

First of all, this is the entire metadata structure handled by the GroupDocs.Annotation for Java library. It is what it is for historical reasons. Some fields are not used and can be freely skipped. We continue to work on it. The two different ID types exist for exactly this reason: we are trying to make the model more convenient and are in a transitional state.

That said, it should not prevent you from creating a connector to an ECM. The provided APIs (entities, Dao interfaces, connector, etc.) expose functionality at a high level. This opens up almost all possibilities but at the same time offers few examples or specific ways to solve particular situations. We exposed the APIs in this way to make it possible to save data to any database without redundant effort.

On the other hand, it means that the AnnotationHandler does not care where the ICustomConnector implementation (in particular the IAnnotationDao, ICollaboratorDao, etc. implementations) takes its data from. The main point is that the data must be valid and provided in the form of ITable entities.

In your case, one solution is to implement the whole connector and the supporting Daos so that only “live” data (annotations and replies with some document identifiers) is read from and saved to the database. Any other data can be created on demand during the workflow, or skipped if it’s not used.

Let’s consider the example of saving annotations. We want to save the annotation linked to the document through the document ID. In the IAnnotationDao.insert method we receive the annotation object with all its data, bound to the session (annotation session) object because of the metadata schema. You can use this picture to better understand the structure and connections (https://www.dropbox.com/). Now we can get the document through the session the annotation is linked to:

// Look up the session the annotation is bound to.
ISessionDao sessionDao = DaoFactory.create().getSessionDao();

ISession session = sessionDao.selectBy(Arrays.asList(ISession.ID), annotation.getAnnotationSessionId());

// documentDao is assumed to have been obtained in the same way as sessionDao.
IDocument document = documentDao.selectBy(Arrays.asList(IDocument.ID), session.getDocumentId());

Now, having the annotation and document objects, we can save the data to the ECM in the form we need.

Reading the annotation back requires searching for the session object by the document ID (retrieved from the ECM annotation data) in order to fill in the session-linking data of the IAnnotation object. This search can be implemented in the ISessionDao.selectBy method, and if no such session exists we can create a new one.

Since annotation sessions are not needed in the ECM and are temporary data required only for processing by the GroupDocs.Annotation for Java library, we can keep them in RAM (through a cache implemented as a map or some other structure).
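
A minimal sketch of such an in-memory cache, under stated assumptions: `VolatileSession` and `SessionCache` are hypothetical names and do not implement the real ISession or ISessionDao interfaces. It only illustrates the "find the session by document ID, create it if absent" idea, with integer session IDs generated on the fly.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for a session entity; NOT the GroupDocs ISession type.
class VolatileSession {
    final int id;
    final String documentId;
    VolatileSession(int id, String documentId) {
        this.id = id;
        this.documentId = documentId;
    }
}

// Keeps sessions in RAM only; nothing here is ever written to the ECM.
class SessionCache {
    private final AtomicInteger nextId = new AtomicInteger(1);
    private final ConcurrentMap<String, VolatileSession> byDocument = new ConcurrentHashMap<>();
    private final ConcurrentMap<Integer, VolatileSession> byId = new ConcurrentHashMap<>();

    // Returns the session for the document, creating one if none exists yet.
    VolatileSession findOrCreate(String documentId) {
        return byDocument.computeIfAbsent(documentId, docId -> {
            VolatileSession s = new VolatileSession(nextId.getAndIncrement(), docId);
            byId.put(s.id, s);
            return s;
        });
    }

    // Reverse lookup, e.g. when only the session ID stored on an annotation is known.
    // Returns null if no such session has been created.
    VolatileSession findById(int sessionId) {
        return byId.get(sessionId);
    }
}
```

A real ISessionDao implementation would presumably delegate its selectBy logic to something like this cache; how the fields map onto the actual interface is not shown here.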

To sum things up, we need to:

Make the Dao implementations “permanent” (to avoid recreating them on every request)
Implement virtual entity Daos to manipulate objects in memory
Implement “live” entity Daos to manipulate objects in the ECM

Here are additional images that visualize the metadata structure and connections; they may help you understand it better:
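
The three points above can be sketched roughly as follows. This is an illustration only: `SimpleDao`, `InMemoryDao` and `DaoRegistry` are hypothetical names, and the real GroupDocs Dao interfaces have more methods and different signatures.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical minimal DAO contract; the real GroupDocs Dao interfaces are larger.
interface SimpleDao<T> {
    int insert(T entity);
    T selectById(int id);
    List<T> selectAll();
}

// "Virtual" entity DAO: objects live only in memory and get generated integer IDs.
class InMemoryDao<T> implements SimpleDao<T> {
    private final AtomicInteger nextId = new AtomicInteger(1);
    private final Map<Integer, T> rows = new ConcurrentHashMap<>();

    public int insert(T entity) {
        int id = nextId.getAndIncrement();
        rows.put(id, entity);
        return id;
    }

    public T selectById(int id) {
        return rows.get(id);
    }

    public List<T> selectAll() {
        return new ArrayList<>(rows.values());
    }
}

// "Permanent" holder: one DAO instance per entity type, reused across requests
// instead of being recreated each time.
class DaoRegistry {
    private static final Map<Class<?>, SimpleDao<?>> DAOS = new ConcurrentHashMap<>();

    @SuppressWarnings("unchecked")
    static <T> SimpleDao<T> daoFor(Class<T> type) {
        return (SimpleDao<T>) DAOS.computeIfAbsent(type, t -> new InMemoryDao<T>());
    }
}
```

A “live” Dao would implement the same contract but delegate insert and select calls to the ECM instead of a map. If the connector has to depend on user credentials, the registry could be held per user session rather than in a static field.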

https://www.dropbox.com/

Mariusz_Pala · March 4, 2015, 1:04pm

Hi,

I’m not able to implement it. As I explained, I need to have a connector per user session, as it depends on the user credentials.

Additionally I need to save the annotation with all the replies in a single XML file.

Are you able to provide an Alfresco sample or a CMIS sample that can do that?

Let’s imagine that within the session you get the object ID, repository name, user name and password. A custom InputHandler and Connector may reuse that data to authenticate the user and initialize a session to the repository. That session is passed to the Connector and InputHandler.

Then in the connector I have access to the following API:

objectId - 16-digit string, document ID

EcmObject ecmSession.getObject(String objectId) - returns the ECM object,

ecmObject.getId() - returns object/document ID

ecmObject.getName() -> returns name with extension

ecmObject.getContent() -> returns object

List ecmSession.getObject(String objectId).getAnnotations() -> returns a list of annotations; each XML contains one annotation and its replies.

ecmAnnotation.getId() - ID of the annotation as 16-digit string

ecmAnnotation.getCreator() - userName of the author

ecmAnnotation.getCreationDate() - annotation creation date

ecmAnnotation.getModifyDate() - annotation modify date

ecmAnnotation.getContent() - returns the XML content as an InputStream. The XML should contain the annotation and all the replies, but no session or document ID

ecmObject.addAnnotation(File file) -> creates an annotation with the current user name as the author

ecmObject.removeAnnotation(String annotationId) - used to remove the annotation with all the replies

ecmAnnotation.setContent(File file) - used to update the annotation
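
For illustration, the read side of the API listed above could be restated as Java interfaces plus a small loader, roughly as below. This is a sketch under assumptions: only the methods needed for reading are declared, `LoadedAnnotation` and `EcmAnnotationLoader` are hypothetical helper names, and the XML is assumed to be UTF-8 encoded.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// The ECM-side API from the list above, reduced to the read methods.
interface EcmAnnotation {
    String getId();
    String getCreator();
    InputStream getContent();
}

interface EcmObject {
    String getId();
    List<EcmAnnotation> getAnnotations();
}

interface EcmSession {
    EcmObject getObject(String objectId);
}

// Hypothetical in-memory form of one annotation file (annotation plus replies).
class LoadedAnnotation {
    final String ecmId;
    final String creator;
    final String xml;
    LoadedAnnotation(String ecmId, String creator, String xml) {
        this.ecmId = ecmId;
        this.creator = creator;
        this.xml = xml;
    }
}

class EcmAnnotationLoader {
    // Reads every annotation XML attached to the given document into memory.
    static List<LoadedAnnotation> loadAll(EcmSession session, String objectId) throws IOException {
        List<LoadedAnnotation> result = new ArrayList<>();
        for (EcmAnnotation a : session.getObject(objectId).getAnnotations()) {
            try (InputStream in = a.getContent()) {
                result.add(new LoadedAnnotation(a.getId(), a.getCreator(), readFully(in)));
            }
        }
        return result;
    }

    // Assumes the XML content is UTF-8 encoded.
    private static String readFully(InputStream in) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        byte[] chunk = new byte[4096];
        int n;
        while ((n = in.read(chunk)) != -1) {
            buf.write(chunk, 0, n);
        }
        return new String(buf.toByteArray(), StandardCharsets.UTF_8);
    }
}
```

Everything loaded this way stays keyed by the ECM string IDs; any integer IDs the annotation library needs would have to be generated and kept in memory only.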

Can you prepare just a draft sample that operates only on that API, with everything else volatile/kept in memory?

Note that the connector must be initialized per HTTP session; I’m not sure how that can work with Atmosphere and collaboration.

Thanks in advance,

Mariusz

ihor.mykhalevych1 · March 17, 2015, 9:54am

Hi Mariusz, please refer to this thread: https://forum.aspose.com/t/1554.