I have a good problem: sometimes my app gets too many concurrent users and it just dies. For context: 64-bit JDK, 2GB allocated to AwareIM and 2GB to Tomcat, AwareIM 5.9. When it dies, the server simply stops responding - the webpage does not load. On the server, CPU and memory usage look fine and the two Java processes are not that high, around 1.5GB each. So I'm not sure what the factor is that kills things, or why. I would appreciate a conversation about experiences and practices for keeping it going. Note that for the most part, it runs like heck. But too many users and things go south. I do have another AwareIM license and have considered setting up a load-balancing server on AWS. That would help. But let's discuss what works.
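For reference, the 2GB figures are the JVM heap caps (-Xmx) on each process. A minimal sketch of where they're set, assuming a standard setenv-style startup script - exact file names differ between AwareIM versions, so treat this as illustrative:

    # Tomcat heap cap, e.g. in Tomcat/bin/setenv.sh (setenv.bat on Windows)
    JAVA_OPTS="$JAVA_OPTS -Xms512m -Xmx2048m"
    # the AwareIM server process takes an equivalent -Xmx flag in its own startup script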

Do you run the latest Tomcat regardless of what is packaged with AIM? I will certainly move to AIM 6.0, and I know a newer Tomcat version is packaged with it. Are you proposing that an AwareIM developer should always install and run the latest Tomcat and JDK? If so, do you have procedures and caveats for doing so?

Yes, I overwrite the packaged Tomcat with the latest downloaded version on every AIM update.
I then replace the ActiveMQ_x-x-x.jar, Tools.jar and mysql_connector with the latest versions (in Tomcat/lib).
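Roughly, the swap looks like this - a sketch with illustrative paths, assuming a Linux-style install; adjust the jar versions and layout to yours:

    # stop AwareIM and Tomcat first
    mv AwareIM/Tomcat AwareIM/Tomcat.old                         # keep the packaged copy as a fallback
    unzip apache-tomcat-latest.zip && mv apache-tomcat-* AwareIM/Tomcat
    cp -r AwareIM/Tomcat.old/webapps/* AwareIM/Tomcat/webapps/   # carry over the AIM webapps
    cp activemq-x-x-x.jar tools.jar mysql-connector.jar AwareIM/Tomcat/lib/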

With AIM 5.9 I ran the latest 64-bit Tomcat 6.x.x, and now I do the same with AIM 6.
A 64-bit JVM can also address more memory - a 32-bit one tops out at roughly 1.5-2GB of heap.

So I'm running whatever Tomcat is in the package, along with a 64-bit JDK. Do you think Tomcat is my weak link?

That makes sense - the packaged version is 32-bit (and not even the latest one). It's probably buggy.
Installation steps are in the link above.

If so, do you have procedures and caveats for doing so?

No caveats.

If your Java processes are not as high as the memory settings in AIM, then there must be something else at play. How many concurrent users does it take to die?

It's hard to tell - I think between 50 and 100.

Maybe the problem lies in the MySQL settings. Have you looked at the MySQL settings for concurrent users and the transactional setup?
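As a starting point, these are the settings I'd check first - placeholder values only, not recommendations; tune them for your box:

    # my.cnf / my.ini (illustrative values)
    [mysqld]
    max_connections         = 200   # ceiling on concurrent client connections
    innodb_buffer_pool_size = 1G    # the main InnoDB memory/caching allocation
    wait_timeout            = 600   # reap idle connections sooner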

I am no expert on MySQL, but I would get a MySQL expert to review your setup to rule out the DB before trying to scale out. It's always better to scale up before you scale out.

I don't suspect the DB server at all - it's practically snoozing... When the app dies, I just restart the control panel and all is well.

What is the memory usage of your MySQL server?

It's on a separate machine, currently using 630K. The server machine has about 1GB of memory headroom.

That seems very low. I am running MySQL configured for 100 users and it's up at 1.5GB - throw some more RAM at it and see if the problem goes away.

You continue to focus on the MySQL server. But service is restored after restarting the control panel and the AwareIM stack. I don't follow the logic.

My logic: the database is number 3 on the list of priorities. If the two Java processes are running well within their memory allocation, then the DB needs to be looked at.

When you restart the control panel, it resets the connections to the DB, and all is well until the concurrency threshold is reached again.
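An easy way to test that theory is to watch the MySQL connection counters while load builds - a sketch using standard status variables:

    # compare peak usage against the configured ceiling
    mysql -u root -p -e "SHOW VARIABLES LIKE 'max_connections';
                         SHOW STATUS LIKE 'Max_used_connections';
                         SHOW STATUS LIKE 'Threads_connected';"

If Max_used_connections is bumping against max_connections just before a death, connections are your threshold.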

MySQL has no single explicit memory setting. It derives a memory ceiling from its configuration, so if it's not configured correctly it won't use the available memory and will run very inefficiently.
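Back-of-the-envelope, that ceiling works out roughly like this (an illustrative formula - the exact buffer list varies by version and storage engine):

    effective ceiling ≈ global buffers (innodb_buffer_pool_size + key_buffer_size + ...)
                      + max_connections x per-connection buffers
                        (sort_buffer_size + join_buffer_size + read_buffer_size + ...)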

The number of users you have versus the amount of memory being used by the DB seems odd to me, unless your app is very small. Hence the focus on the DB.

Kklosson,
If you figure this out and find potential bottlenecks or configuration gaps in the default Aware installation, please share a comment or two.

One thing I have thought of looking into is more advanced monitoring tools that could watch the server (CPU, memory), MySQL, the Java processes, etc. There are tools that monitor all of these in one place, so you can see performance over time. If you or anyone else looks into this, or already knows of tools we could use, please share the knowledge.
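Even without a dedicated tool, a crude poller gives you trend data - a sketch assuming shell access and known PIDs; the specific commands are just examples:

    # log a snapshot every minute
    while true; do
      date
      jstat -gcutil <tomcat_pid>    # JVM heap occupancy and GC activity
      mysqladmin status             # MySQL uptime, threads, queries/sec
      sleep 60
    done >> monitor.log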

The most important thing is to collect the output of Tomcat and the Aware IM server AS SOON AS the system dies. The post-mortem analysis depends on what's in this output. The rest is pure speculation and guessing in the dark.
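In practice that means grabbing Tomcat's stdout/catalina log and the Aware IM server log, and ideally a thread dump of each JVM at the moment of the hang. A minimal sketch, assuming you can still reach the box and know the process IDs:

    # thread dumps show exactly where each JVM is stuck
    jstack -l <tomcat_pid>  > tomcat_threads.txt
    jstack -l <awareim_pid> > awareim_threads.txt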

By the way, one of the latest builds of 6.0 introduced more robust handling of situations where Tomcat loses its connection to the server for whatever reason. Previously, if this happened, the entire system would go down. Now it doesn't - it just reconnects and carries on. It's quite possible that you are experiencing this same issue, and updating to the latest version should fix the problem.

Good news. I'm about to update the production system to 6.0 and I'll report back on this thread.