[SOLVED] ActiveMQ Issues - Out of Memory - Server crashing

We're in final testing of an 8.1 application we are about to roll out.

It's basically a lead-tracking system.
Each lead has 7 possible ps_xxx references (owner of the lead, referrer, 2nd_owner, source, etc.) + a couple of om_xxx lists.

All has worked fine with lots of Leads loaded and Queries to search & browse worked fine.
Then, for one set of data (about 200 leads out of 50,000), all we did was fully attach/point all the ps_xxx fields. (It's not mandatory that they all be defined.)

The first thing we noticed was a significant delay in paging during Queries.
Screens that were almost instant to go to the next page now took 1 second, for example... long enough to see the spinner flipping.
(Going to another tenant WITHOUT these references filled out was still fast. That was the only difference.)
Almost simultaneously came out-of-memory errors... which sent me on a search of the forum, to no avail.

Getting these primary errors:
java.lang.OutOfMemoryError: Java heap space
Out of memory error encountered in the source OS

2018-06-19 14:13:49,714 org.apache.activemq.broker.region.Topic  -Usage(Main:memory:topic://defaultoutputtopic:memory) percentUsage=222%, usage=149466463, limit=67108864, percentUsageMinDelta=1%;Parent:Usage(Main:memory) percentUsage=222%, usage=149466463, limit=67108864, percentUsageMinDelta=1%, 
Usage Manager memory limit reached for topic://defaultoutputtopic. 

Producers will be throttled to the rate at which messages are removed from this destination to prevent flooding it. 
See http://activemq.apache.org/producer-flow-control.html for more info

2018-06-19 14:25:31,039 org.apache.activemq.broker.region.Queue  -Usage Manager Memory Limit (67108864) reached on queue://defaultinputqueue. 
Producers will be throttled to the rate at which messages are removed from this destination to prevent flooding it. 

See http://activemq.apache.org/producer-flow-control.html for more info

Out of memory error encountered in the source OS
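For what it's worth, the numbers in that broker log line add up: a limit of 67108864 bytes is exactly 64 MB (which matches the stock 64mb memoryUsage limit shipped in many sample activemq.xml files - an assumption on my part about where it came from), and 149466463 bytes of usage against that limit is the reported 222%. A quick arithmetic sanity check:

```java
public class BrokerUsageCheck {
    public static void main(String[] args) {
        long limit = 67_108_864L;    // "limit" from the ActiveMQ log line
        long usage = 149_466_463L;   // "usage" from the same line
        System.out.println(limit / (1024 * 1024) + " MB limit");  // prints: 64 MB limit
        System.out.println(usage * 100 / limit + "% used");       // prints: 222% used
    }
}
```

So the broker really was sitting at its small default limit while the destination held more than twice that much.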

I started to read about this on the Web.
One thing I found was a critical bug in ActiveMQ 5.8, per this forum thread: Scaling Challenges (comment).
I downloaded 8.12.1, renamed the .jar file, copied it into place, and relaunched Aware. The query still hung with the out-of-memory error, PLUS this:

2018-06-19 14:56:07,384 org.apache.activemq.broker.TransportConnection.Transport  -Transport Connection to: tcp://127.0.0.1:49812 failed: java.io.EOFException

which totally fubars the TCP transport, and you have to relaunch Aware again.

So the system works fine on queries of the Lead table that don't involve as many system resources.
The more in-depth queries (i.e. those with more joins to return simple reference values, like a Name to display in a pulldown) generate the out-of-memory error, and the user cannot do anything.

I'll keep researching.
Anyone else seen this?

Posts online reference their activemq.ini file, where there are memory settings.

I don't see this file in AIM's install.
I wonder how you adjust these parms?

<pendingMessageLimitStrategy> 
   <constantPendingMessageLimitStrategy limit="1000"/> 
</pendingMessageLimitStrategy> 

System usage is: 

<systemUsage> 
   <systemUsage> 
       <memoryUsage> 
           <memoryUsage percentOfJvmHeap="70" /> 
       </memoryUsage> 
       <storeUsage> 
           <storeUsage limit="100 gb"/> 
       </storeUsage> 
       <tempUsage> 
           <tempUsage limit="50 gb"/> 
       </tempUsage> 
   </systemUsage> 
</systemUsage> 

bin/startAwareIM.bat
start javaw -Xms512m -Xmx1024m -XX:PermSize=128m -XX:MaxPermSize=256m -Xss512k -cp %CLASSPATH% com.bas.controlcenter.ControlCenterApp

bin/startupOptions.props
AWAREIM_SERVER_STARTUP=java -Xmx1024m -Xms512m -classpath ../Tomcat/lib/;../CustomJars/ com.bas.newcp.ServerStarterECP

bin/startupOptions.props
TOMCAT_STARTUP=java -Xmx512m -Djava.endorsed.dirs\=../Tomcat/common/endorsed -classpath ../Tomcat/lib/tools.jar;../Tomcat/bin/bootstrap.jar;../Tomcat/bin/tomcat-juli.jar -Dcatalina.base\=../Tomcat -Dcatalina.home\=../Tomcat -Djava.io.tmpdir\=../Tomcat/temp org.apache.catalina.startup.Bootstrap start

ConfigTool/eclipse/ConfigTool.ini
-Xmx384m
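With flags scattered across startAwareIM.bat, startupOptions.props and ConfigTool.ini, it's easy to edit a file that the process you care about never reads. A minimal sketch (not an Aware IM feature, just a standalone class you could run or drop in as a plug-in) to print the heap ceiling the running JVM actually got:

```java
public class HeapCheck {
    public static void main(String[] args) {
        // maxMemory() reflects the effective -Xmx of the JVM that is actually running,
        // so it tells you which startup file's setting really took effect
        long maxMb = Runtime.getRuntime().maxMemory() / (1024 * 1024);
        System.out.println("Effective max heap: ~" + maxMb + " MB");
    }
}
```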

I am not on 8.1 yet, but in 7.1 there are heap settings in the awareim.CONF file.
Have you tried increasing them?

No Joy

I saw 2 parms; 1 was 20Mb, I forget the other.
Made both 1024Mb.


same results:

Exception thrown com.bas.connectionserver.server.BASServerException Unknown system error
com.bas.connectionserver.server.BASServerException: Unknown system error
	at com.bas.connectionserver.server.ConnectionFactory.sendMessage(ConnectionFactory.java:271)
	at com.bas.connectionserver.server.ConnectionFactory.sendMessage(ConnectionFactory.java:229)
	at com.bas.webapp.common.WebAppUtils.sendMessageToExecutionEngine(Unknown Source)
	at com.bas.webapp.common.WebAppUtils.sendMessageToExecutionEngine(Unknown Source)
	at com.bas.webapp.thin.handlers.F.K.A(Unknown Source)
	at com.bas.webapp.thin.handlers.F.K.A(Unknown Source)
	at com.bas.webapp.thin.handlers.F.K.A(Unknown Source)
	at com.bas.webapp.thin.handlers.XMLRequestHandler.A(Unknown Source)
	at com.bas.webapp.thin.handlers.XMLRequestHandler.handleURLRequest(Unknown Source)
	at com.bas.webapp.thin.servlets.WebInterfaceURLServlet.doPost(Unknown Source)
	at com.bas.webapp.thin.servlets.WebInterfaceURLServlet.doGet(Unknown Source)
	at javax.servlet.http.HttpServlet.service(HttpServlet.java:635)
	at javax.servlet.http.HttpServlet.service(HttpServlet.java:742)
	at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:231)
	at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:166)
	at org.apache.tomcat.websocket.server.WsFilter.doFilter(WsFilter.java:52)
	at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:193)
	at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:166)
	at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:198)
	at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:96)
	at org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:621)
	at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:140)
	at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:81)
	at org.apache.catalina.valves.AbstractAccessLogValve.invoke(AbstractAccessLogValve.java:650)
	at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:87)
	at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:342)
	at org.apache.coyote.http11.Http11Processor.service(Http11Processor.java:803)
	at org.apache.coyote.AbstractProcessorLight.process(AbstractProcessorLight.java:66)
	at org.apache.coyote.AbstractProtocol$ConnectionHandler.process(AbstractProtocol.java:790)
	at org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.doRun(NioEndpoint.java:1459)
	at org.apache.tomcat.util.net.SocketProcessorBase.run(SocketProcessorBase.java:49)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61)
	at java.lang.Thread.run(Thread.java:745)
Web server returned error: Unknown system error

You don't need to change ActiveMQ settings; you need to give more memory to the Aware IM Server - the more you give it the better. I don't know what you are doing in your application, but if you are doing something crazy (like allowing the system to return a thousand huge records), give it 16Gb if your machine has it.

Thx for reply Vlad

Not doing anything weird.
Only returning 40-50 records per Query.

Odd thing is, my "All" query finds 186 records but only brings 50 into the grid per page.
You can go next/prev through the pages, but every now and then it runs out of memory.

But another menu button finds only 31 of the 186 - with a VERY similar query, just one extra condition to narrow the list down.
That one always hangs - yet the recs can be found easily/quickly in a SQL tool.
Another Query finds 16 recs and usually works, but still sometimes just hung.

That's it. No other users. No activity.
Another Tenant that doesn't have all 8 reference fields populated works fine bringing hundreds back to the grid (still only 40-50 per page).

If you are sure that it's a query that returns a few records and nothing else, then your memory settings must be very low - maybe you are using default values.

I don't know if this could be related to your problem, but keep in mind if you have a reference on an object that is set to "Fetch All Records at once" it loads everything into memory

So for example, if you have 50,000 product items and the product reference on the invoice line item is set to "Fetch All Records at once", every time a user selects a product item it will load 50,000 into memory, potentially consuming all available memory. (Multiply this by more than one user and things could get very nasty.)
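A rough back-of-envelope sketch of that scenario - the 2 KB-per-row figure is purely an assumption for illustration, not a measured Aware IM number:

```java
public class FetchAllEstimate {
    public static void main(String[] args) {
        long records = 50_000;        // product items in the example above
        long bytesPerRecord = 2_048;  // ASSUMPTION: ~2 KB held in memory per row
        int users = 20;
        long perUserMb = records * bytesPerRecord / (1024 * 1024);
        System.out.println(perUserMb + " MB per user, ~" + perUserMb * users
                + " MB if " + users + " users each open the pulldown");
        // prints: 97 MB per user, ~1940 MB if 20 users each open the pulldown
    }
}
```

Even at a modest row size, "Fetch All Records at once" can eat a 1 GB heap with a handful of concurrent users.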


Same issues here....
Production server:

  1. Having 20 concurrent users max... (RAM goes up to 8GB)... In PHP I could have 300 users with half the memory...
    1.b. DB size: a table with no more than 8000 rows takes a few seconds to load (25 records per page). I'm seeing the spinner... spinner... spinner

  2. Same VPS which worked fine with 7.1 (24GB RAM, Intel XEON 16 vCPU) => is crashing with 8.1

  3. Configurator blocks randomly ("not responding" messages in Task Manager)... screen freezes...


IF we are doing wrong, please tell us how to do it right!

Thank you.

We have a similar issue. Found that you need to ensure that all log outputs are closed and put the following in the cp.ini at the end

-vmargs
-Xmx1024M

"IF we are doing wrong, please tell us how to do it right!"

Your memory allocation seems very high for a 24GB server. If it's a Windows server, keep in mind you have to leave some space for the OS services plus a chunk for MySQL. Try halving them - and surely you shouldn't need anywhere near that much for the Configurator?

I'm very interested in learning more about this thread - we have some very slow searches, we noticed a difference moving from V7 to V8 and it sounds worse in 8.1 (we haven't deployed any prod servers to 8.1 yet).

George - for a 24 GB server, we would set:

Config tool: 1 GB or less (assuming this is a production server and you don't do config on it, you don't need to allocate it much memory; all our prod servers run with the Config tool at 256mb - no issues, and we publish through the browser anyway)

Tomcat: We were told to set Tomcat to 1gb. We had a Java expert review our setup and tell us anything more than this is wasted and not needed. I didn't entirely believe him, so we set it to 4GB. Tomcat never falls over on our prod servers anymore.

Server: We give this 8GB or 16GB depending on the server size and the user numbers we have. I have no proof that the 16GB runs any better/faster than the 8GB - there are too many variables at play (BSVs, user numbers, etc).

This leaves 4GB for the OS, and we have the DB running on a different server. Hope that helps.

Hey Chris (hope you're well mate)... what does this -vmargs thing do? And what do you mean by "all log outputs are closed" - is that the Aware logging on the control panel?

We have a similar issue. Found that you need to ensure that all log outputs are closed and put the following in the cp.ini at the end

-vmargs
-Xmx1024M

Hey there, yes all good thanks

Anything after -vmargs is passed as arguments to the JVM in Eclipse. We found that if you don't set max memory on the CP and the logging windows are open, you will eventually run out of server memory.

Since moving from 7.1/8 to 8.1 we have had to increase memory settings. In V8 we reverted to the old control panel.

This thread brought to light several issues.
NOTE: I'm not blaming Aware/Vlad for anything. I'm not criticizing the product in any way. I'm describing what I see going on because it will help me (and hopefully others) design Aware BOs to work more efficiently once I know what's going on behind the scenes. The Aware "concept" - that you don't need to be a professional programmer to write these apps - still holds true. You don't really need to know what's going on behind the scenes, which is why some ideas in the next post, "Food for Thought", might help you avoid unexpected results.

1) I had previously posted about higher CPU time when a process ran AND you had a server Output window open. Under v8+, it was possible to open a grid and then watch the Tomcat Output still scrolling for many seconds after the grid had completed on the client PC. Especially when you have conditional formatting/buttons on each row in the returned grid, each grid row may "expand" into 3, 5 or more actual lines in the Tomcat window. And if the user does a next page, or a sort, there are even more rows that still have to be displayed in the Tomcat window.

I believe this is one reason CHRIS29 discovered those Output windows need to be closed. A habit I've tried to get into myself, but it can still bite me, as it did yesterday when I had them open and got a user issue with a grid "hanging" because it had already reported an out-of-memory error.
The Tomcat window is the biggest offender, I think; you may be able to leave the Server Output open.

2) This process also got me looking more in-depth at MS SQL Server Profiler traces... to see why [it appeared] one grid was responsible for causing this OOM issue.
WHAT I DISCOVERED (which others may already know) is that when you have a BO with 50 fields, for example, and you make a grid call to show 25 records:
a) a SELECT * from BO is issued, returning ALL fields, regardless of what you are displaying in the Grid
b) a SELECT * FROM PS_BO is issued for each reference in that row, regardless of whether you display linked fields in the Grid

EXAMPLE: Lead system. Each Lead has a ps_EnteredBy, ps_AssignedTo, ps_ClosedBy, ps_Branch, ps_Tenant
Just displaying the first 25 Leads causes:
SELECT TOP 25 Lead.*
SELECT * FROM REGULARUSER for EnteredBy
SELECT * FROM REGULARUSER for AssignedTo
SELECT * FROM BRANCH for 1 branch
SELECT * FROM TENANT for 1 Tenant

Clicking NEXT PAGE reissues similar statements, except the TOP xxx grows. For the 3rd page of 25 rows shown, the statement is:
SELECT TOP 75 Lead.*
Internally, Vlad knows he's only interested in rows 51-75, but the underlying database still returns more and more rows as the user pages through a grid.
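If paging really works by reissuing SELECT TOP (pageSize * page) each time, as the trace suggests, the total rows the database returns grow quadratically as a user pages forward. A small sketch of the cumulative cost (pure arithmetic, not Aware IM code):

```java
public class PagingCost {
    public static void main(String[] args) {
        int pageSize = 25, pages = 8;            // 8 pages covers 200 rows
        long rowsFetched = 0;
        for (int p = 1; p <= pages; p++) {
            rowsFetched += (long) pageSize * p;  // page p issues SELECT TOP (pageSize * p)
        }
        System.out.println(rowsFetched + " rows fetched to display " + pageSize * pages);
        // prints: 900 rows fetched to display 200
    }
}
```

Paging through just 200 rows makes the database hand back 900, and the ratio gets worse the deeper the user goes.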

THE FULL MONTY and the SOLUTION
3) Taking all this into account, I started reviewing the unused fields and references in my app.
This particular app was started using the CRM as a framework.
The BO Group COMMUNICATION contains 4 BOs (email out, email in, sms, ???). We were only using Email in this list. But Queries, procedures, etc. had been built on "Communication", so you can't easily remove tables from the group and still Publish, because you get a rash of integrity issues. That's why it hadn't been done in the past. So I bit the bullet and dug in to start cleaning up this BO.
This wasn't a big deal, but it came to light when examining the SQL trace and seeing an overcomplicated query (for what I needed) for showing Emails - since the Query was built on Communication (not just OutgoingEmail), it had a join of all the tables in the group.

OK, here's the solution, promise
4) I started looking at DOCDATA fields in the app, especially in the Lead table. There WAS one - from the CRM - we were not using it, but I deleted it anyway.
THEN I realized a PHOTO field was in RegularUser.
We were not using it either, but it WAS in the UserEdit form, and some users had uploaded photos.

I ran a SQL Query using DATALENGTH([Photo_DOCDATA]) and found most photos were around 100k, but THE PHOTO of the user where we saw ALL THE SLOWNESS AND OutOfMemory issues was...
:oops: 12 Megabytes :oops:

Yes, take a picture of yourself at a portrait studio, have them give you a CD because the image is so big, and use THAT DAMN HUGE IMAGE as your tiny little picture in the CRM - makes sense to me!!!! :roll:

And, referring back to #2 above, she was the sales rep assigned to all those leads.
So for every read of every grid line, her "Select * from LoggedInRegularUser" dataset included a 12MB image that wasn't even needed.
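To put that in perspective - assuming, worst case, the photo came back once per grid row via the per-row user lookups described in #2 (an assumption about how often the blob travels, not a measurement):

```java
public class BlobCostEstimate {
    public static void main(String[] args) {
        long photoMb = 12;    // the 12 MB portrait found with DATALENGTH
        int rowsPerPage = 25; // rows shown per grid page
        // Worst case: the blob rides along on every per-row user lookup
        System.out.println(photoMb * rowsPerPage
                + " MB transferred to render a single 25-row page");
        // prints: 300 MB transferred to render a single 25-row page
    }
}
```

Against a 1 GB heap, one grid page like that explains the OOM errors on its own.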

A quick
Update [R3].[dbo].[REGULARUSER]
set Photo_DOCDATA=NULL, Photo_DOCTYPE=Null
wiped all that crap out and performance IMMEDIATELY returned to normal.

Food for thought:

If you have a photo in a Tenant master, to allow for a nice attached Logo, then it "appears" that's going to be transmitted in just about every grid for BOs that point directly to the Tenant.
Making sure you use "Stored in FS" (instead of in the DB) would seem to solve this.

Make sure you resize or limit the upload size of an image

Not only photos: Documents stored in the DB inside an Owner or Parent record may affect performance of child-related activities that point UP to that parent rec.

ALSO
I'm NOT saying that the way Vlad is doing things is wrong.
Let's say you make a grid with only 6 fields. How is Vlad to know that you don't need the 7th field in a formula to show/hide a button in Row Operations (or in the Applicable When box)? He can't know that, so he returns all the fields.

One thing frequently said is NOT to put lots of temp fields in LIRU - I don't recall the stated reason for that. Maybe in some designs people have run across a similar binary-data issue in LIRU that hindered performance and gave birth to this idea. Seeing the trace files, and how many times RegularUser is hit (as described in the "Solution" post above), one more read of LIRU to get some saved parms hardly seems like it would be any issue at all.
However, as an example, in a design where every user must have a signed Waiver and the designer said "Let's just stick this in the User file", so there's now a huge scanned image stuck in RegularUser, then YES, certainly, there would be a massive performance hit from reading LIRU many times.

chris29 wrote

We have a similar issue. Found that you need to ensure that all log outputs are closed and put the following in the cp.ini at the end

-vmargs
-Xmx1024M

Chris, are you sure that works?

I needed to pass an argument to the JVM, did some googling, and found that the above is how you pass arguments to the Eclipse JVM.
AwareIM is built on the Eclipse framework, but it is NOT Eclipse.
When I put my argument in cp.ini like you did above, I did not get the result I was hoping for.
To get a better idea of what was being passed to the JVM, I wrote a small Java plug-in to display the arguments in use, and my argument was not among them.
I also tried adding my argument to startAwareIM.bat. After restarting AwareIM with that old batch file, my argument was displayed.
After some experimentation, I found that by adding my argument to startupOptions.props and starting AwareIM with cp.exe, my plug-in displayed my argument and I got the results I needed.
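For anyone who wants to replicate that check, the standard JDK way to list the arguments a running JVM actually received is the RuntimeMXBean. This is a generic sketch, not the exact plug-in described above:

```java
import java.lang.management.ManagementFactory;
import java.util.List;

public class JvmArgsDump {
    public static void main(String[] args) {
        // Lists the -X / -XX / -D arguments the JVM was actually started with,
        // so you can confirm whether a flag from cp.ini or startupOptions.props took effect
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        jvmArgs.forEach(System.out::println);
    }
}
```

If your -Xmx or custom flag doesn't appear in the output, the startup file you edited isn't the one launching that JVM.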

making sure you use Stored in FS (instead of in db) would seem to solve this.

Great detailed post, Jaymer. Very thankful for your time investment to share what you learned in such a detailed way.

I'm not sure I understand the mechanics of "Stored in the FS (instead of in db)" and the impact of efficiency.

Thanks Tom,
I'm not sure you were actually asking a question about "stored in DB vs. FS", but let me add this:

Consider MySQL vs. SQL Server.

MySQL stores physical files on the filesystem for each BO.
MSSQL & Oracle do not - all their data is commingled in database pages inside one big physical file for the entire DB (there are exceptions).

So, I always questioned the validity of moving Documents (let's use this term for large binary blobs of any kind) out of the DB and into the FS, for MySQL, because they were really already in the FS. What's the diff between 1000 separate images in C:\wherever vs. 1000 images stored as binary inside c:\ProgramData\MYSQL\Data\Documents.MYD?

With MSSQL it's different (IMHO) because all those images are bloating the single database file. Someone else would have to weigh in on the pros and cons of this structure.

OK, but back to the issue at hand.
Maybe some of the speed issues (and systems dragging performance-wise) seen over the past few years in many posts on the Forum were not just because of images in the DB, but because of the issues I posted in the "Solution".
From what I see, images stored in a child table of RegUser wouldn't impact performance... because queries from other transactional files that point to a User record are never going to see those large fields stored in subordinate related rows.
But one could easily see a system that allowed a high-res image of an employee or customer or contact - and one might believe "Sure, it's in the User record, but I'm only displaying it on ONE screen, so its presence is irrelevant to my overall system." Au contraire.
Remember, this issue isn't related just to Pictures - Documents too.
It might be that if my BSV had stored that image in the FS, this would never have become an issue.
Let Aware do a "Select * from BO" all day long; since the image isn't being transmitted (only a string that's the PATH to the Doc), it avoids all this query overhead.