How to get rid of: WARN (Periodic Recovery) com.arjuna.ats.jta ARJUNA016027?

marielle.spiteri's picture
marielle.spiteri
Blog Categories: 

In this article, you'll learn about the error and how to remove it from your log file.

What is the WARN you see in your Bonita log file?

Depending on your RDBMS database server, the trace will vary a little. For example with Oracle database, you'll see:

2020-03-30 22:00:07.887 +0200 WARN (Periodic Recovery) com.arjuna.ats.jta ARJUNA016027: Local XARecoveryModule.xaRecovery got XA exception XAException.XAER_RMERR
javax.transaction.xa.XAException
at oracle.jdbc.xa.OracleXAResource.recover(OracleXAResource.java:732)
at com.arjuna.ats.internal.jta.recovery.arjunacore.XARecoveryModule.xaRecoveryFirstPass(XARecoveryModule.java:634)
at com.arjuna.ats.internal.jta.recovery.arjunacore.XARecoveryModule.periodicWorkFirstPass(XARecoveryModule.java:226)
at com.arjuna.ats.internal.jta.recovery.arjunacore.XARecoveryModule.periodicWorkFirstPass(XARecoveryModule.java:171)
at com.arjuna.ats.internal.arjuna.recovery.PeriodicRecovery.doWorkInternal(PeriodicRecovery.java:770)
at com.arjuna.ats.internal.arjuna.recovery.PeriodicRecovery.run(PeriodicRecovery.java:382)

2020-03-30 22:02:17.888 +0200 WARN (Periodic Recovery) com.arjuna.ats.jta ARJUNA016027: Local XARecoveryModule.xaRecovery got XA exception XAException.XAER_RMERR
javax.transaction.xa.XAException
at oracle.jdbc.xa.OracleXAResource.recover(OracleXAResource.java:732)
at com.arjuna.ats.internal.jta.recovery.arjunacore.XARecoveryModule.xaRecoveryFirstPass(XARecoveryModule.java:634)
at com.arjuna.ats.internal.jta.recovery.arjunacore.XARecoveryModule.periodicWorkFirstPass(XARecoveryModule.java:226)
at com.arjuna.ats.internal.jta.recovery.arjunacore.XARecoveryModule.periodicWorkFirstPass(XARecoveryModule.java:171)
at com.arjuna.ats.internal.arjuna.recovery.PeriodicRecovery.doWorkInternal(PeriodicRecovery.java:770)
at com.arjuna.ats.internal.arjuna.recovery.PeriodicRecovery.run(PeriodicRecovery.java:382)

Why do you see this error in your Bonita log file?

These messages are the result of some transactions having been caught half-way by the database restart and thus not having fully completed.

The transactions are seemingly still outstanding in the transaction manager's transaction log (what with Tomcat not having received the transaction completion message from the database). Hence they are detected by the recovery mechanism (that triggers automatically every 2 minutes by default), but for some reason the recovery mechanism can not proceed.

For instance, it could be that the connection pool did not flush the old connections and therefore the recovery mechanism can not get hold of a "clean" connection.

Bonita server depends closely on the database where it stocks all the elements necessary for the execution of the process flows. You must understand that Bonita can not function correctly without access to its database.

What should you do to fix that error for now?

Restarting the Bonita server should help, in that:

  • the connection pool will be created anew, on the one hand,
  • and the engine will regenerate the work queue again, on the other hand.

What should you do to prevent that error from happening?

Whenever the database has to be restarted, the recommended procedure is as follows:

  1. Stop the Bonita Tomcat server(s).
  2. Stop the Database.
  3. Start the Database.
  4. Start the Bonita Tomcat server(s).

What can you do to prevent this in the future?

Until this error is fixed in the product, you will need to go further with the tuning of your platform: Bonita does have a mechanism to get over unexpected temporary database shutdown/crash/lack of availability (without having to restart the server): it is the work retry mechanism.

These are the configuration parameters for that mechanism (you can find them in the following file: bonita-tenant-sp-custom.properties):

#Works execution

#number of times the works will be retried in case of issues that can be retried

#bonita.tenant.work.maxRetry=10

#bonita.tenant.work.retry.delay=1000

#bonita.tenant.work.retry.factor=2

By default, when a work fails because the database could not be reached, it will be retried:

  • up to 10 times if necessary
  • the interval between each retry starting at 1 second (1000 ms) and doubling up (factor 2) as the retries go (2 s, 4 s, 8 s, 16 s, ....)

Thus, you may uncomment set new values for these parameters based on the maximum length of downtime you wish to make your server resilient to. If needed, please follow the usual procedure to set these new values:

  1. Run: <TOMCAT_HOME>/setup/setup.sh pull
  2. Edit the file for your tenant 1: <TOMCAT_HOME>/setup/platform_conf/current/tenants/1/tenant_engine/bonita-tenant-sp-custom.properties
  3. Set the appropriate values.
  4. Run: <TOMCAT_HOME>/setup/setup.sh push
  5. Restart the Tomcat server.

Please be aware, though, that in your current version (7.10.+), if the connection to the database is lost some works might still be lost despite the retry mechanism.

What about the "Periodic Recovery" warnings?

Good question! As for the "Periodic Recovery" warnings:

  • these warnings are logged by Narayana's (the XA transaction manager) XA recovery process
  • the XA recovery manager is triggered (by default) every two minutes (this period can be changed by configuring the property: RecoveryEnvironmentBean.periodicRecoveryPeriod)
  • the XA recovery manager checks the transaction log (in the Tomcat) and launches the recovery process for the XA transactions that may be "orphaned"
  • as soon as a transaction is completed, it is removed from the transaction log
  • an "orphan" transaction is a transaction that has been lying in the transaction log for longer than the value 'com.arjuna.ats.jta.orphanSafetyInterval' property: beyond this time, the recovery manager considers the transaction has failed and tries to recover it
  • the database connection loss is most likely responsible for having transactions still outstanding in the transaction log
  • but if the orphanSafetyInterval value (10 s by default) is shorter than the duration of the connection loss, the recovery process will also fail as it won't be able to reach the database either
  • furthermore, when database is shut down the Tomcat's connection pool could sometimes stay with "old" connections, so, even after the connection to the database is back, the recovery process might still fail in some cases unless the "old" connections are flushed and the pool is filled with renewed connections.

But what can you do about it in Bonita?

You may avoid these by increasing the value of the 'com.arjuna.ats.jta.orphanSafetyInterval' property (whose default value is 10000ms). To do so:

  1. Edit the file:
  2. if you run your Tomcat with "start-bonita.sh": <TOMCAT_HOME>/setup/tomcat-templates/setenv.sh
  3. else: <TOMCAT_HOME>/server/bin/setenv.sh
  4. Add the following property in the CATALINA_OPTS (e.g. 50s): -Dcom.arjuna.ats.jta.common.orphanSafetyInterval=50000
  5. Restart the Tomcat server.

For further information, put your glasses on, here's some reading!

Narayana Periodic Recovery: https://jbossts.blogspot.com/2018/01/narayana-periodic-recovery-of-xa.html Narayana Failure Recovery: https://narayana.io//docs/product/index.html#d0e823 https://stackoverflow.com/questions/7752593/jboss-strange-exception-after-auto-reconnecting-to-xa-datasource https://jbossts.blogspot.com/2018/01/narayana-periodic-recovery-of-xa.html https://narayana.io//docs/product/index.html#d0e823

Whew, you did it! good job! Have fun with Bonita!

Notifications