CF911 - High CPU in ColdFusion? Some common but perhaps unexpected causes

Posted At : June 24, 2014 12:15 PM | Posted By : Charlie Arehart
Related Categories: admin, cf911, fusionreactor, seefusion, railo, monitoring, troubleshooting, CF Server Monitor, lucee
Comments: (37)

Note: This blog post is from 2014. Some content may be outdated--though not necessarily. Same with links and subsequent comments from myself or others. Corrections are welcome, in the comments. And I may revise the content as necessary.

I often help people who are reporting that CF is "running hot on the CPU", maybe reaching 80 or even 100% of the CPU, whether in spikes or for extended periods. What might you propose people look at, when you've heard that? I've heard all kinds of things over the years, often focused on coding, or perhaps jvm tuning.

But as is often the case in the CF server troubleshooting consulting I do, I find the real causes are rarely what most people seem to suspect. So what would I look for when someone reported high CPU in ColdFusion (or Lucee or Railo)? Read on.

(BTW, from here out I will just mention CF for the sake of convenience, but what I offer applies just as readily to Lucee as CF, except for one variation below.)

Here are several things to consider as perhaps-unexpected causes of high CPU in ColdFusion, which I elaborate on below:

Make sure it's really CF using the high CPU
Running out of heap/memory within CF/excessive garbage collection
Traffic of an unexpected volume or nature
Excessive Client Variable processing (don't skip this!)
Yes, possibly expensive code in your app
Possibly rogue code from a hacker
CFML image processing and its default "interpolation"
New: Consider entropy issues on Linux
New: Consider VM issues
New: Consider possible monitoring/profiling configuration issues

Make sure it's really CF using the high CPU

Let me say first, of course, that it's vital for folks reporting this problem to be really clear that the problem is indeed an issue of high CPU within CF itself. I've had more than a few occasions where people said the CPU was high, and it turned out it was NOT CF after all. It was something else on the box, but they assumed that "all the box did was CF", so they didn't bother to confirm if it really was CF.

I've seen virus scanners, security scanners, backup tools, and all kinds of other things steal CPU. (Of course, a database running on the same box as CF could be a killer of CPU, but nobody does that these days, right? Trust me, people do, all the time!)
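A quick way to check which process is really using the CPU is to rank the running processes before assuming it is CF. Here is a minimal Python sketch, assuming a *nix box where `ps -eo pid,pcpu,comm` is available (on Windows, Task Manager or Resource Monitor shows the same thing). The function names and the sample output are made up for illustration.

```python
import subprocess


def top_cpu_processes(ps_output, limit=5):
    """Parse `ps -eo pid,pcpu,comm` text and return the busiest processes.

    Returns a list of (pcpu, pid, command) tuples, highest CPU first.
    """
    rows = []
    for line in ps_output.strip().splitlines()[1:]:  # skip the header row
        parts = line.split(None, 2)  # pid, %cpu, command (may contain spaces)
        if len(parts) < 3:
            continue
        pid, pcpu, command = parts
        rows.append((float(pcpu), int(pid), command.strip()))
    rows.sort(reverse=True)
    return rows[:limit]


def current_top_cpu(limit=5):
    """Run ps on this machine and rank its output (assumes a *nix host)."""
    out = subprocess.run(
        ["ps", "-eo", "pid,pcpu,comm"],
        capture_output=True, text=True, check=True,
    ).stdout
    return top_cpu_processes(out, limit)


# Hypothetical sample: here a backup agent, not the CF JVM, is the culprit.
SAMPLE = """\
  PID %CPU COMMAND
  812 91.5 backup-agent
 1440  6.2 java
    1  0.0 systemd
"""
```

Note that on Linux `ps` reports CPU averaged over the life of each process, so for short spikes an interactive tool like `top` may be a better guide. Either way, the point is to look at the whole box and not only the CF process.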

That said, there is one possible cause where the high CPU shows up in some OTHER process and yet is still related to CF: when your server has been hacked. More on that later.

So what could cause high CPU? A number of things, first and foremost...

Running out of heap/memory within CF/excessive garbage collection

This may not seem obvious: why would high memory use contribute to CPU problems? But CPU use can indeed be very high when the heap within CF reaches its upper limit (your maxheap size in the CF Admin) and the JVM (underneath CF) is thrashing about, doing excessive garbage collection (especially "oldgen" GCs) and perhaps even taking heap dumps, as it suffers from outofmemory ("oom") errors.

If you have FusionReactor or any sort of JVM monitor, you may be able to see how many GCs are taking place, as well as how long they're taking. Beware that most modern garbage collectors split the heap into generations and may do frequent "minor" GCs, which are normal. What we'd be concerned about is frequent MAJOR GCs (what might be identified as "old gen" GCs). If there are many within a minute or a few minutes, that would be trouble. And it's usually caused by the heap running out. Of course the next challenge is to figure out why... but that's a topic for another post.
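To put a number on "many within a minute", you could count how many major GCs fall inside any one-minute window. The following is only a sketch: it assumes you have already collected the old-gen GC times from your monitor or GC log (the formats vary too much to parse here), and the timestamps shown are invented.

```python
from datetime import datetime, timedelta


def max_events_in_window(timestamps, window=timedelta(minutes=1)):
    """Return the largest number of events falling inside any one window."""
    times = sorted(timestamps)
    best = 0
    start = 0
    for end, current in enumerate(times):
        # Slide the left edge forward until the span is shorter than `window`.
        while current - times[start] >= window:
            start += 1
        best = max(best, end - start + 1)
    return best


# Hypothetical old-gen GC times, as read off a monitor or a GC log.
major_gcs = [
    datetime(2014, 6, 24, 12, 0, 5),
    datetime(2014, 6, 24, 12, 0, 20),
    datetime(2014, 6, 24, 12, 0, 41),
    datetime(2014, 6, 24, 12, 0, 58),
    datetime(2014, 6, 24, 12, 7, 30),
]
```

With this sample, `max_events_in_window(major_gcs)` finds four major GCs inside a single minute, which is the kind of pattern that would suggest the heap is running out.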

Now, do you HAVE to have some Java monitor to see if this problem of high heap use is happening? Not really. And in fact, by the time any monitor finds that there is an OOM condition, it will often be too late to do anything about it except restart CF. But you can watch CF's logs, either while the problem is happening or once you recover/restart.

To find this sort of evidence, look in the ColdFusion-out.log and/or ColdFusion-error.log (in the [cf11/10]\cfusion\logs folder, or the [cf11/10]\[instance]\logs folder if using an instance in CF 10/11 Enterprise, or the [cf9]\runtime\logs folder, or [jrun4]\logs, or if on *nix, see the cfserver.log in the CF logs folder). You want to find the log for the instance in question, and you want to focus especially on any errors in the log BEFORE CF crashes or you restart it.

Here's a trick: do a search for the phrase "ColdFusion Started". That's among the last lines CF shows as it's coming up. (If you need to search across many of the logs, there are OS tools to search files, but on Windows I recommend the nifty free tool, File Locator Lite, which I've written about before.)

Then, having found that spot in the log where CF is starting up (after a crash), look at the lines BEFORE CF started to come up (it may be a page or pages above that line) to see if you find any "outofmemory" error lines in either of those two logs, in the seconds or minutes (or longer) before CF crashed or was restarted. (You can also look for hs_err_pid*.log files in the same directory as the jvm.config file. These are HotSpot crash logs and often are related to memory problems.)
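The search described above can be scripted if you have many logs to check. This is a minimal sketch, not a definitive tool: the function name and the `lookback` default are my own choices, the match is a simple case-insensitive substring test, and the sample lines are deliberately simplified stand-ins for real CF log lines.

```python
def oom_lines_before_startups(lines, lookback=200):
    """For each "ColdFusion Started" line, collect earlier OutOfMemory lines.

    Returns a list of (startup_line_number, [oom_line_numbers]) using
    1-based line numbers, looking back at most `lookback` lines.
    """
    findings = []
    for index, line in enumerate(lines):
        if "coldfusion started" not in line.lower():
            continue
        first = max(0, index - lookback)
        ooms = [
            number + 1
            for number in range(first, index)
            if "outofmemory" in lines[number].lower()
        ]
        findings.append((index + 1, ooms))
    return findings


# Simplified, made-up excerpt; real log lines carry timestamps and more detail.
sample_log = [
    "request 1 ok",
    "java.lang.OutOfMemoryError: Java heap space",
    "java.lang.OutOfMemoryError: GC overhead limit exceeded",
    "ColdFusion started",
]
```

To run it against a real file, read the log into a list first, for example with `open(path, errors="replace").read().splitlines()`, and pass that list in. An empty list of OOM line numbers for a given startup suggests that restart had some other cause.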

So why might memory be high in the first place? Well that's really a subject for an entirely different blog entry.

I'll share at least one thing to consider: you could also have high memory/CF heap use due to your (your application's) use of CF query caching or CF's enhanced ehcache-based caching features added in CF 9, to name just a couple of things. And note that in CF10, each of these is now stored per application, whereas previously they were cached instance-wide.

Traffic of an unexpected volume or nature

Another surprising reason for high CPU is when folks discover unexpected traffic coming into their CF instance (whether in its volume or its nature).

That traffic could be from spiders or bots (including search engines, and as an update in 2026, AI spiders).

At a minimum the unexpected traffic may cause more load than you ever dreamed your site would have. (And some people are misled by watching "google analytics" and such, whose counts specifically would NOT include requests from spiders/bots and other clients that don't run the javascript of the GA code.)
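Since analytics scripts won't count them, the web server access log is the place to see how much of your traffic is automated. Here is a rough sketch that tallies requests by the trailing user-agent field. It assumes the common "combined" log layout where the user agent is the last quoted field (IIS logs are laid out differently), and the marker list is only a heuristic, since many bots do not identify themselves. The sample lines are invented.

```python
import re
from collections import Counter

# Heuristic markers only; real bots do not always identify themselves.
BOT_MARKERS = ("bot", "spider", "crawl")
LAST_QUOTED_FIELD = re.compile(r'"([^"]*)"\s*$')


def count_bot_hits(access_log_lines):
    """Tally requests as 'bot' or 'other' by the trailing user-agent field."""
    counts = Counter()
    for line in access_log_lines:
        match = LAST_QUOTED_FIELD.search(line)
        agent = match.group(1).lower() if match else ""
        kind = "bot" if any(m in agent for m in BOT_MARKERS) else "other"
        counts[kind] += 1
    return counts


# Hypothetical lines in the "combined" access log layout.
sample = [
    '203.0.113.5 - - [24/Jun/2014:12:15:00 -0400] "GET / HTTP/1.1" 200 512 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '203.0.113.6 - - [24/Jun/2014:12:15:01 -0400] "GET /a.cfm HTTP/1.1" 200 512 "-" "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"',
    '198.51.100.7 - - [24/Jun/2014:12:15:02 -0400] "GET /a.cfm HTTP/1.1" 200 512 "-" "Mozilla/5.0 (Windows NT 6.1) Firefox/30.0"',
]
```

Comparing the "bot" count here against what your analytics tool reports for the same period gives a feel for how much load is invisible to it.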

Another unexpected consequence of this traffic ties back to the memory discussion above: you could find you have exorbitantly high session counts (because every hit from a spider or bot acts like it's never been to your server before and does not present any session cookies), which can contribute to creating a large count of sessions and thus high memory use, especially the longer your session timeout is, and the more things you might store in your sessions. I blogged more on how to find a current count of sessions within CF (without any code changes) here: Tracking number of CF sessions per application easily, and why you should care.
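A bit of arithmetic shows how quickly this adds up. If every cookie-less hit creates a session that lives until the timeout, the steady-state session count is roughly the hit rate times the timeout. The figures below (300 bot hits per minute, a 60-minute timeout, about 10 KB per session) are assumptions chosen purely for illustration, not measurements.

```python
def estimated_session_load(hits_per_minute, timeout_minutes, bytes_per_session):
    """Estimate live sessions and their heap cost for cookie-less traffic.

    Each hit without session cookies starts a new session that lives until
    the timeout, so the steady-state count is rate x timeout.
    """
    sessions = hits_per_minute * timeout_minutes
    return sessions, sessions * bytes_per_session


# Assumed figures for illustration: 300 bot hits/minute, a 60-minute session
# timeout, and roughly 10 KB stored per session.
sessions, heap_bytes = estimated_session_load(300, 60, 10 * 1024)
print(sessions, round(heap_bytes / (1024 * 1024), 1), "MB")
```

Under those assumptions that is 18,000 live sessions holding about 176 MB of heap, all from visitors who will never come back to use them. Halving the timeout halves both numbers.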

Excessive Client Variable processing (don't skip this!)

Moving on to the second most common reason I see for high CPU in CF, I'd suspect excessive client variable processing, especially during the frequently recurring "purge" process, which by default happens in ColdFusion every 67 minutes (after the previous one), as CF purges long-unused entries from any of the repositories listed in the CF Admin Client Variables page (other than cookies, so meaning the registry and any listed datasources).

Now, before you skip this thinking, "we don't use client variables", think again. Or better, take my word for it and do the following check to find out if it's impacting you. I have helped MANY folks who SWORE they did not use client variables, only to find out not only that they did use them (unknowingly) but that it was KILLING their server.

You can tell whether this purge processing is a problem by looking in that same ColdFusion-out.log. Look for a line saying "Run Client Storage Purge", and note the time. It should happen about every 67 minutes (or whatever the CF admin client variable page's purge interval may have been changed to, from its default of 1 hour and 7 minutes). If each purge log line starts exactly 1:07:00 after the previous one, then client variable purging is not a problem for you. But if the gap from one purge to the next is any longer than 67 minutes and 0 seconds, then the bigger the difference, the greater the impact of purging.
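That comparison is easy to automate once you have the purge times. In this sketch the timestamps are hypothetical and are assumed to have been pulled from the "Run Client Storage Purge" lines already (I have not tried to parse the log's timestamp format here). It reports how far each gap exceeds the default 1:07:00.

```python
from datetime import datetime, timedelta

EXPECTED = timedelta(hours=1, minutes=7)  # CF's default purge interval


def purge_overruns(purge_times, expected=EXPECTED):
    """Return how far each purge-to-purge gap exceeds the expected interval."""
    ordered = sorted(purge_times)
    return [
        (later - earlier) - expected
        for earlier, later in zip(ordered, ordered[1:])
    ]


# Hypothetical times taken from "Run Client Storage Purge" log lines.
purges = [
    datetime(2014, 6, 24, 1, 0, 0),
    datetime(2014, 6, 24, 2, 7, 0),    # exactly 1:07:00 later
    datetime(2014, 6, 24, 3, 18, 30),  # 1:11:30 later, so 4.5 minutes over
]
overruns = purge_overruns(purges)
```

Here the first gap has no overrun and the second is 4 minutes 30 seconds over. If the interval is counted from the end of the previous purge, as described above, that overrun is roughly how long the purge itself ran. If you changed the purge interval in the CF Admin, pass your own value as `expected`.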

Of course, the purging could be in the database, which could cause a negative impact but not high CPU.

But if you are storing client variables in the registry, that's where you WOULD see high CPU (and other performance impact) due to the purging of that, especially so often, and this is indeed a frequent cause of high CPU. (And besides the CF Admin setting for the default client storage, note that your own code can set its own preference for client variable storage with the clientstorage attribute of cfapplication, or the this.clientstorage property in application.cfc.)

Worse, if you are on *nix, and you have CF set to store client variables in the "registry", CF will NOT store client variables in the registry (because of course there is none on *nix), but instead CF will store (and update) them (all of them) in a single FILE (horrors) called cf.registry, stored in a "registry" folder in your coldfusion install directory. (And in fact, in Lucee, even if you set it to use "registry" in the Admin or code, Lucee will also itself still store client variables in the file system instead.) All this could be just as bad as storing in the registry, if you are using client variables a lot.

And that's the point I was making above: you may never in your code "use" client variables (by setting a variable like client.name="bob"), but that DOES NOT MATTER. If you enable clientmanagement (in cfapplication or as a property of application.cfc), then you WILL cause CF/Lucee to be using (reading and writing) client variables on EVERY page request (unless you use the option to "disable global variable updates" in the Admin setting for each repository).

But you can't just go turning that off, nor can you just raise the purge interval, or change the default repository from one value to another, as any of those could impact possible legitimate use of client variables, if any. Sadly, it's a rather complicated subject to understand and resolve (which is why you don't hear about it often), but I have talked about it to a considerable degree here: Suffering CPU, DB, or memory problems in CF? Spiders could be killing you in ways you'd never dream.

And you can see how this topic of client variables is closely tied to the memory problem above, where I said that high memory due to large numbers of sessions can also be due to spiders and bots.

Yes, possibly expensive code in your app

In my experience, those two above (CF running out of memory, or client variable purge processing) are far more common explanations of high CPU usage in CF than anything else, but we shouldn't rule out the possibility that it's simply due to some rogue CFML code. I really doubt it would ever be about any one tag or function, or the choice between one and another, but of course if you had an infinite loop (by mistake, of course), or perhaps even some tight loop over some large number of items that was then doing some cpu-intensive activity, then we could expect to see high CPU in CF.

So how would you find this happening? With any of the traditional CF monitors, like the CF Enterprise Server Monitor, or FusionReactor or SeeFusion. These could help you readily spot a long-running task as that's one of their primary tasks. So sure, if you saw some one (or a few) long-running requests and CPU was high, you may want to suspect them as possible causes and use one of these tools to investigate. And FusionReactor in particular will show you the CPU thread time used for any request, both while it's running and in its request history details and request log.

And as for finding WHAT line of code a current request is running at a given point in time, that's where the "stack trace" feature (available in all 3 monitors but especially helpful in FR) comes in, whether obtained in the interface or generated automatically as an alert. I've blogged on stack tracing in the past. See CF911: Easier thread dumps and stack traces in CF: how and why, as well as a presentation I did at cfobjective in 2010 and recorded later on the CFMeetup: CF911: Stack Tracing CFML Requests to Solve Problems.

On the other hand, sometimes you may have a problem of high CPU within a particular running request, but caused by some code you might never have suspected...

Possibly rogue code from a hacker

Even if it's not "your" code causing the problem, another possible explanation for high CPU could be rogue code that a hacker may have placed on your server.

You may have heard about the bitcoin mining exploit that has hit some CF servers, for instance. When this sort of rogue app gets on your server, you may not see high CPU in CF itself but rather in a process implemented by the hack. The point is that you should definitely keep in mind the possibility that high CPU "on the box" may well be connected to CF even if not showing up as "in" CF.

You'll want to be sure to have applied all needed CF security hotfixes and protections, as discussed in the ColdFusion lockdown guide. For more on the lockdown guide, and other CF security resources, see #ColdFusion Lockdown/Security guides: there are several, and some you may have missed. And for more on applying hotfixes effectively and carefully, see Applying hotfixes to #ColdFusion 9 and earlier? A guide to getting it right.

CFML image processing and its default "interpolation"

Related to the above, here is something you could do in code with no consideration at all that there could be a CPU impact: I've talked before about performance problems due to the default "interpolation" setting for CFML image processing. Check out Could CF image processing be killing your #ColdFusion server? Explanation and solutions, which also discusses how to address the issue.

New: Consider entropy issues on Linux

This is an update since I first posted this entry: consider also that if you're on Linux, the cpu problem could be due to an issue with configuration regarding "entropy". For a bit more, see my comment below when I first added it to this post.
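If you want a quick look at the kernel's entropy pool on a Linux box, it is exposed as a plain number in a file under /proc. The sketch below reads it. The file path is the standard Linux one, but the threshold of 200 is only an assumed rule of thumb of mine, and recent kernels have changed how this value is reported, so treat the result as a hint and not a diagnosis.

```python
from pathlib import Path

ENTROPY_FILE = "/proc/sys/kernel/random/entropy_avail"  # standard Linux path


def entropy_available(path=ENTROPY_FILE):
    """Return the kernel's current entropy estimate, in bits."""
    return int(Path(path).read_text().strip())


def looks_starved(bits, threshold=200):
    """Flag a low pool; the threshold of 200 is an assumed rule of thumb."""
    return bits < threshold
```

Calling `looks_starved(entropy_available())` a few times while the problem is happening would show whether the pool is staying low.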

New: Consider VM issues

This is another update since I first posted this entry: if your CF instance is running on a VM (especially one where the hypervisor is managed within your org as opposed to "in the cloud"), consider that the VM host (hypervisor) may well be over-allocated. Some managers of such VM environments could create too many VMs (for a given host and its resources).

Also, they may have configured the VM (the instance running CF) to have "dynamically allocated" resources like CPU or memory, such that when it gets stressed it tries to obtain that level of resources--only to find that it's not available or needs to be taken from other VMs. (This could also be how CF could suffer because ANOTHER VM is "stealing" resources from CF.)
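On a Linux guest, one hint of this kind of contention is "steal" time, which the kernel reports as the eighth counter on the aggregate "cpu" line of /proc/stat. The sketch below computes the steal percentage between two samples of that line. It applies only to Linux guests, the sample values are invented, and the hypervisor's own counters (which your VM admins can see) remain the more authoritative source.

```python
def steal_percent(before, after):
    """Percent of CPU time stolen by the hypervisor between two samples.

    Each argument is the aggregate "cpu" line from /proc/stat on a Linux
    guest: user nice system idle iowait irq softirq steal [guest ...].
    """
    def fields(line):
        # Keep the first eight counters; guest time is already inside user.
        return [int(value) for value in line.split()[1:9]]

    first, second = fields(before), fields(after)
    deltas = [b - a for a, b in zip(first, second)]
    total = sum(deltas)
    return 100.0 * deltas[7] / total if total else 0.0


# Hypothetical samples taken a few seconds apart.
t0 = "cpu  1000 0 500 8000 100 0 50 350 0 0"
t1 = "cpu  1400 0 700 8300 100 0 50 450 0 0"
```

For these two samples `steal_percent(t0, t1)` works out to 10 percent, meaning a tenth of the time the guest wanted CPU, the host was giving it to someone else. A value that stays well above zero is worth raising with whoever manages the hypervisor.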

New: Consider possible monitoring/profiling configuration issues

This is another update since I first posted this entry: consider also that if your CF instance is being monitored by any kind of tool that might "profile" each request (or the entire JVM), that could add CPU time to each request. For those on CF2016 or earlier, you could have enabled the CF Server Monitor's various "start" features. For those on CF2018 or later, you may have enabled the profiling features in the CF PMT.

Those on any CF (or Lucee) instance might be using FusionReactor and could have tweaked how it profiles requests such that it's causing impact. By default it doesn't profile requests until they are at least 3 seconds old, and for no more than 60 seconds (with a 200 millisecond sample rate), all of which have proven over years to have little negative consequence. But if someone changed those (under Profiler>Settings), that may be an issue.

Conclusion

Hope some of that helps you find and resolve your CF or Lucee CPU problems. If not, and you think you could use a hand, again this is the kind of troubleshooting I do with people every day. Let me know if I can help, whether directly (remote, no minimum, satisfaction guaranteed) or perhaps with a quick question here in the blog entry. I welcome feedback as well, including if you think I may have missed another key cause of high CF CPU.


Comments (37)


I put one more common reason on the table. The Adobe / DataDirect database driver for MsSQL has serious bugs. See http://www.hass.de/c...

# Posted By Marc B | 8/25/14 9:59 AM

@Marc, thanks for trying to be helpful, but I must say I have some doubts about the veracity of the assertion on that blog. Sadly, they offer no means to make comments, so I'll share my thoughts here.

First, I would note that I have seen CF10 and 11 used in production by several dozen shops running SQL Server who have not seen that memory profile at all. If this was such an egregious bug, we'd expect to have heard many people complain of it, but I've not heard one until you point this out.

Second, I appreciate that they say they eventually run out of memory, but I'd really like if in their tests they confirmed that a GC done when the memory was high did not recover memory. I see many people claim a memory leak when it's just the JVM using memory and being lazy about GCing it (until it absolutely has to).

In fact, that graph appears to cover 2 days, with each entry at the bottom covering a 3-hour span (since there are 8 per day). If memory is rising to 90+% and staying there for upwards of 3 hours, without crashing, I'd guess there's no memory problem but rather just the JVM being lazy about doing a major GC.

Of course, I'm just speculating, too. I appreciate that the blogger has shared lots of details to help people use the Java MAT tool to "demonstrate" the problem. I've just seen too many cases in the past where such heap dump analysis has been misconstrued to see a problem that was not there.

Sometimes it's that people find objects in the heap that are simply no longer in use and can be GCed (I'm not 100% positive whether the MAT tool hides such no longer used objects). Other times it's that people claim a leak when in fact it turns out that something they are (or have CF on their behalf) doing is what's "holding memory", which then isn't necessarily a leak.

For instance, I find it curious that he also claims there's a memory leak in web services in cf10, which he sees disappear if he clears the template cache. (http://www.hass.de/c...). I'd argue that if clearing the template cache makes the held memory now able to be GC'ed, then that's not a "leak". The question instead is why whatever he's doing causes creation of so many templates in the template cache, whose memory is released when they are cleared.

Again, I'd argue that's not a "leak", because technically a leak usually can't be "cleared": objects are being created in an unexpected manner that can't be GCed. Since these can be cleared, it seems less a "leak", though it could well be a bug.

This could be a semantic distinction. He and others may say "if CF holds memory unexpectedly, I'm going to call it a leak".

I just would like to see more info and clarification, myself, before I'd agree to the conclusion. Again, sadly, there's no way to reply on the blog with comments (or if I'm missing the means, I welcome suggestions). And while it could get cumbersome to have an extended discussion with the blog author here, if you know him and can reach him I'd welcome his comments.

I realize I could be sounding condescending here. I'm just saying that for many years I have helped people look at what were asserted to be "memory leaks" in CF and 9 times out of 10 it was not a leak but something with an explanation, or that was being misconstrued. But if further clarification demonstrates the issue I could be persuaded. :-)

As always, just trying to help.

# Posted By charlie arehart | 8/25/14 11:13 PM

Charlie, we have been trying to nail down these bugs for about 9 months!!! Adobe Support is a mess. Never seen such bad support. Currently DataDirect is debugging these SQL driver bugs. We are running everything with cfqueryparam and a lot of large and dynamic requests, tons of CFCs and so on. Very many applications. We simply replaced the database driver and memory usage is fine after 2 weeks run time, no more than 2 GB. No line of CFM code has changed nor any JVM server setting has changed! Zero!

The machine runs out of memory. It runs against the max memory heap wall and code starts failing. After several hours it gets closer to the wall of death, and then CF *dies* and gets restarted by monitoring as it no longer responds. It never recovers from the max heap memory limit with GC'ing. I suspect this is not related at all to JVM. Always keep in mind - replace the JDBC driver and the issue is gone. We heard the same from a friend that also used these drivers with SQL2008R2 that shows the same picture like our SQL2012.

The SOAP webservice bug was also reproducible on our side and after 9 months of instability we simply moved to JSON, we have not changed any logic code, only the output has been changed and after a week of runtime with JSON output only we have ~600MB memory usage on this instance. If I think about the out of the box broken CF10 axis server XML file I get more pickax. We have read about the same nightmare on Ben Nadel's and other websites. They all recommend not creating SOAP webservices with CF. We can only confirm this. I'm not talking about requests to webservices. This is about running a webservice for your customers.

I have no clue why the template cache was filled up, but Adobe found that they lose track of the webservice instantiation and start it again and again in parallel. Adobe and we have not been able to find why the template cache fills up. The SOAP application uses very complex objects, structs and arrays, and structs in structs deeply nested. This is all well running code. The application itself is only ~3MB CFM code in size. I have no idea how 3 MB CFM code can fill a trusted cache of 6GB if this is not a bug in Axis/CF10. The CF integration in Axis is very buggy.

Feel free to email me directly, I guess you see my email.

# Posted By A Hass | 9/1/14 11:28 AM

@A Hass, thanks for replying. So it's definitely an interesting set of problems you've found. I'd say there must be something rather unique about your situation, though, because this is not a problem being reported by all users of CF10 and 11. Again, I've not even seen it once before, and all I do all day is help troubleshoot issues with CF (though granted, people are just as often using CF9 or earlier as CF10 or later). But still, I'd think if this was such a severe issue for more people we'd all know about it.

So again, that's not to dismiss your experience but rather to propose that it may help you, Adobe, or DataDirect to perhaps find what is unique about your setup that stresses the driver (or the way CF loads and uses it) that differs so much when you change out to another driver.

I can't think of anything more I'd add, in terms of further follow-up with you, but good on ya' for identifying the problem, and a workaround, and I hope you may ultimately get resolution from DataDirect (in which case it seems unfortunate to class it as if it's a CF or Adobe bug. They may be the victim here. Only time will tell).

If you do find and post a solution (to that or the web services issue), I certainly wouldn't mind if you or @Marc were to post a follow-up here for the benefit of readers.

That said, the discussion is quite a bit far afield of my original post here (possible reasons for CPU problems in CF), so I'd hope any further discussion of it (including by others interested in it) could take place on your blog. To that end, do you offer a means to comment? If so, how does one do it?

# Posted By charlie arehart | 9/1/14 6:06 PM

@hass and @charlie,

I'd be interested in knowing specifically what conditions he's referring to. We use the DD drivers everywhere including on very high volume clusters with very complex code. We've never seen an issue like he is describing. Perhaps there is a specific setting (in the cf admin or on the DB server itself maybe?) that is causing the issue. I think it would be valuable to know for sure.

# Posted By Mark Kruger | 10/14/14 12:22 PM

I'll throw my 2 cents in here. I've been working on this same issue for many weeks since upgrading to ColdFusion 11. We're seeing the same memory heap problems where garbage collection can't keep up and eventually we have to restart the ColdFusion service. It's crashed so hard it takes roughly 4-5 minutes for it to stop. That's how bad it is.

Just recently I found that blog post mentioned by Marc B on the first comment. I also found new JVM arguments over at CF Whisperer by Mike Brunt.

http://www.hass.de/c...

http://www.cfwhisper...

I implemented the JVM changes 2 days ago and I implemented the new Microsoft database driver (version 4.1) about 16 hours ago.

Let me tell you the difference is massive. The JVM arguments helped a lot but heap was still very high after 24 hours. After implementing the new Microsoft database driver the heap has not crossed 2-1/2 GB. I am very impressed so far with this change.

This is a very large system doing (15-20 websites sharing one datasource) roughly 30 million queries in any 24 hour span so I'm thinking that's the link between the Adobe direct driver performing poorly in a large system like this. Only time will tell but this new driver is looking really good right now. I am VERY surprised this memory leak is not all over the Adobe community.

# Posted By Dave Cordes | 12/7/14 10:39 PM

So Dave, please do be clear: was the benefit from the new driver or the changed JVM args? And if the latter, which ones? If you did both, it would seem important to know how things went WITHOUT the jvm changes and only the changed driver. That would tell us if the problem was indeed the driver.

BTW, clarify for folks exactly what version of CF this is, because that affects the driver. BTW, it's not an Adobe driver but rather a DataDirect one that they license. (Maybe that's what you meant by "Adobe direct".) Older versions of CF had older drivers, and some updates to later versions of CF do update the drivers (that said, CF10 Update 14 updated only the Postgres driver, but CF11 Update 3 says it adds support for SQL Server 2014 as well as updates for Postgres, Sybase, and DB2.)

Anyway, glad to hear that you feel you've gotten to the bottom of your challenges. As for why others may not experience them, it could be that there's an environmental difference unique to you (not solely related to load but also perhaps CF configuration, sql coding, db configuration, and more).

If you do learn more, I'm sure everyone reading along would want to hear it. (Then again, the blog entry was on high CPU. You refer to heap filling, but I'll assume that also led to high CPU as a side-effect. But by "this same issue" do you mean that raised by "A Hass"? If so, ok. Just clarifying.)

And note that Mark Kruger had replied to him as I had saying that he'd not seen this as a general issue, even in very high load, and he too wondered what may be unique about his setup to have caused his problem. We'd feel the same about yours if you may ever learn more.

Until then (or in case you may never get to it), I do thank you for sharing your experience and your workaround. Hope it may help others.

BTW, I'll assume that you were not finding queries to be slow, but rather just that you feel that the drivers had a leak causing high memory usage. And to that I'll note that I've been meaning to create a blog entry just like this one but about "common but perhaps unexpected causes for high memory". Several of them are NOT (not, not) about memory leaks, but rather simply things holding memory beyond the life of a request that many never seem to consider (like the session, application, and server scopes, as well as query and ehcache caching). But I'd also acknowledge in conclusion that yes sometimes there ARE true leaks--not expected, and hopefully fixed by updates, whether from Adobe or 3rd parties as you found. But I never presume the first explanation to be a leak.

Only when all other more common possibilities have been considered first would I then consider a true leak as a possible explanation. It's just not how most approach such problems, so they may miss the far more common real explanation for high memory use. More in the later blog entry! :) (Could be a while, as I have some pressures that may preclude it for now.)

Hope that's helpful, Dave and others.

# Posted By charlie arehart | 12/8/14 12:45 PM

Charlie,

The benefit was from changing the database driver. I did see some improvement with the new JVM arguments I applied but memory still climbed a lot like before they were applied.

This is ColdFusion 11 Enterprise with update 2 applied on Windows Server 2008 with a SQL Server 2008 database.

I believe this is the answer I was looking for but only time will tell. It's only been 30 hours since I applied the database driver update so I'll need more time to assess. But so far this is looking really good.

I reached out to Adobe and it looks like they want to work with me so they can send my findings to the DataDirect people. I'll report back if we can come to any sort of conclusion.

# Posted By Dave Cordes | 12/8/14 1:26 PM

@Dave, cool and thanks for the update. Glad to hear that clarification. So I'd be curious what improvement you may ultimately discern came from the JVM tweaks, especially since you say memory is still high.

And I'll say again (as I have elsewhere and did in previous conversations with you), for you as well as readers here: high memory (high heap use in CF) is not itself bad. If either it falls on a GC (in the CF Server Monitor, FusionReactor, or SeeFusion), or it stays high but you see no impact, then it's interesting but not a problem.

High heap is a problem when either you get outofmemory errors or experience slowness you can't otherwise explain (which may be related to the high heap, but may not).

And to that point, I'd wonder: what was the "problem" that was "solved" by the change in the driver? Was it that requests were slow? And did FR (or the other tools) show you that queries were slow? or was it that they were slow but NOT due to queries?

If it was just overall slowness that couldn't be explained, and especially if there was high CPU, then as I noted in the blog entry above, it COULD have been that there was high heap and problems with it not being able to recover unused memory by even forced GCs, and that COULD contribute to high CPU. But there could be other explanations for high CPU.

But back to your case, I'm still wanting to be clear (for readers) what your issue was (which you said was "the same issue" in your first comment).

Again, not hassling you. Just trying to help readers who may see this whether today or months from now.

As you may recall, I don't like to contribute to people going off and "trying things". I want to help them know a) what the problem is, b) what to change, and c) if the problem is not gone, how to watch diagnostically for what may be amiss, so that the real problem can then be solved. :-)

But anyway, thanks again for sharing that whatever your problem was it was fixed by a change of the driver. That can happen sometimes, and yes I'd be as interested as Adobe to know what was amiss and what got corrected in the updated driver. :-)

Until then, for my readers, take note of Dave's experience. Forewarned is forearmed. :-)

# Posted By charlie arehart | 12/8/14 1:44 PM
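
For readers who want to see the distinction drawn above (heap that is high but falls on a GC, versus heap that stays high even after a GC), here is a minimal sketch using the standard java.lang.management API. The class name HeapCheck and its method names are made up for illustration. Note that this measures the JVM it runs in, so to learn anything about CF it would need to run inside CF's own JVM; otherwise the monitors named above (or JDK tools) are the way to watch it.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;

// Hypothetical helper, not part of ColdFusion or any monitoring tool.
final class HeapCheck {

    /** Heap bytes currently in use, as reported by this JVM. */
    static long usedHeapBytes() {
        MemoryMXBean memory = ManagementFactory.getMemoryMXBean();
        return memory.getHeapMemoryUsage().getUsed();
    }

    /** Milliseconds spent in GC since JVM start, summed over all collectors. */
    static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long millis = gc.getCollectionTime(); // -1 when the collector does not report it
            if (millis > 0) {
                total += millis;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        long before = usedHeapBytes();
        System.gc(); // only a request; the JVM may ignore it
        long after = usedHeapBytes();
        long max = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage().getMax(); // -1 if undefined

        System.out.println("Used heap before GC request: " + before + " bytes");
        System.out.println("Used heap after GC request:  " + after + " bytes");
        System.out.println("Max heap:                    " + max + " bytes");
        System.out.println("Cumulative GC time:          " + totalGcMillis() + " ms");
    }
}
```

If the "after" figure stays close to the max across repeated checks while the cumulative GC time keeps climbing, that would be consistent with the "GC can't recover memory, and CPU goes high" scenario; a high figure that drops after a GC is the "interesting but not a problem" case.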

Charlie,

It was the high heap that we couldn't explain. Neither could you when you helped me last week. So I have been playing whack a mole here for the last few days and over the weekend. That's when I came across the blog post explaining that there could be a memory leak in the DataDirect driver. That's when I decided to try it. Since it made a big difference so far I would categorize it as solved but I'll need more time to make absolutely sure.

So the problem was high heap due to a memory leak in the DataDirect driver which was solved when it was replaced with the Microsoft driver. That's basically it in a nutshell.

# Posted By Dave Cordes | 12/8/14 2:12 PM

@Dave, ok, but I'll say again: high heap (in and of itself) is not necessarily a problem. And you've even said that after these changes, the heap is still high, right? So that's not "the issue", right? :-)

So please do clarify (for readers, not relying on our previous conversations, please) what the nature of the problem was that "was there" and is "now gone". I'll assume it was poor performance of pages, but I've listed some specifics to help guide you to offer a quick answer. You only started with "the same issue" but did not clarify. Again, high heap itself is not a problem, if it doesn't lead to OOM errors or poor performance.

Since you're seeing high heap now, I gather that some poor performance has stopped. Any details will really help future readers know better if your solution would suit them. That's my goal here. Not meaning to badger you. :-)

# Posted By charlie arehart | 12/8/14 2:29 PM

I agree that high heap is not a problem but this was a case where the heap was high and garbage collection was not bringing it back down. If you look at the screenshots on the following link you can get an idea what was happening.

http://www.hass.de/c...

# Posted By Dave Cordes | 12/8/14 2:32 PM

@Dave, ok, but I'm going to ask (only) one more time. :-) What (other than high heap) was the "problem"? Slow requests? High CPU? You never say, and all he says in his first comment above is that "After several hours it goes clashes closer to the wall of death and the CF *dies* and gets restarted by monitoring as it no longer respongs" (sic).

Is that what you experienced? And for both of you, what else happens besides "cf *dies*", and what does that mean? It crashes? It stops responding? Do you view running requests with a monitor and see many piling up? What are they doing (when you stack trace them), etc.?

I just am seeking clarification on behalf of readers to know how to connect the dots (other than high heap) between your solution and whatever your "problem" was. I don't want people thinking "oh my gosh I need to update my db driver" without knowing clearly how that related to a problem (they may or may not have).

Thanks. As always, just trying to help.

# Posted By charlie arehart | 12/8/14 3:06 PM

It was high heap combined with high CPU. Once the JVM couldn't do any more garbage collecting, ColdFusion requests would start slowing down and queries would begin to time out.

# Posted By Dave Cordes | 12/8/14 3:13 PM

@Dave, ok. Thanks for the clarification. :-) Hope it helps others reading along in the future.

# Posted By charlie arehart | 12/8/14 3:43 PM

@Dave Cordes: I'm happy that the MS driver helped you and that you saw the same issue. May I ask what setting you are using for "Max Pooled Statements"? I used 1000 in the past and reduced it to 100 (the default). I'm not yet sure if this may be the source, as we have not investigated further yet.

# Posted By A Hass | 12/9/14 12:19 PM

@A Hass - It was set at 100. I didn't ever change it on the old datasource.

# Posted By Dave Cordes | 12/9/14 12:21 PM

Ok. I thought there was a change by this max pooled statement reduction, but this is a confirmation that this is not the root cause.

Unicode and clob enabled? We have...

# Posted By A Hass | 12/9/14 12:36 PM

String Format, CLOB and BLOB are all unchecked so they are not enabled.

# Posted By Dave Cordes | 12/9/14 12:43 PM

@A, I'm not surprised myself to hear that changing that made no difference. I've never myself seen it matter (but I recall at least one case where someone felt confident/proved for himself that changing it did help).

Of course, I do understand you're just trying to help Dave find out what may be unique in his environment/setup/code/load that may make his change of the driver help so much. We shall see. :-)

# Posted By charlie arehart | 12/9/14 12:44 PM

@Dave: That makes clear that the issue is not settings related. Can you compare your settings with the ones I listed in http://www.hass.de/c... and document your differences? I have not changed some advanced settings, but they may be unrelated.

DataDirect said the Spy trace that I provided lists about 1850 prepared statements with resultsetType=java.sql.ResultSet.TYPE_SCROLL_INSENSITIVE. These prepared statements execute 2 statements:
- select data from CGLOBAL where cfid = ?
- select cfid,app,data from CDATA where cfid = ? and app = ?

These are client variables. I have no clue why there are 1850 prepared statements, as there should be only two from my point of view. Otherwise a prepared statement makes no sense.

Are your applications using client variables for real load balancing? I need to follow up on this again with Adobe as I have not received an answer.

# Posted By A Hass | 12/10/14 4:38 AM

I mean - I have changed some advanced settings.

What Updater are you running?

# Posted By A Hass | 12/10/14 5:00 AM

Hi Alex,

I am running ColdFusion 11, update 2 at the moment. My database settings look exactly the same as yours after adding the datasource for the new Microsoft driver. I am not using client variables at all on my server.

# Posted By Dave Cordes | 12/10/14 10:30 AM

I hoped we might find something that may bring us nearer to the possible root cause, but this sounds more and more like we are both running into the same issue with totally different applications, on CF defaults and also with some advanced settings. Sounds like we cannot identify the real cause ourselves. Is Anit from Adobe working on your issue or any other? We could try pushing Adobe together... DataDirect support has not really shed any new light on the issue either. I feel lost...

# Posted By A Hass | 12/10/14 4:19 PM

Alex,

Anit called me a couple days ago and said he was contacting DataDirect to see how they wanted to proceed. I have not heard back from him yet. I really feel like Adobe should follow through with this but that's just my opinion.

# Posted By Dave Cordes | 12/10/14 4:36 PM

@Alex, on your observation of the client variable queries, why would you think you'd see only 2? Each query will be (to CF) a "prepared statement" but one which will pass in different values (for your users' different cfid/cftoken values). You'd be right that *in the database* there should theoretically be only one prepared statement CREATED and executed over and over (with those different passed in values).

The spy log tracks what leaves CF to go to the db, not what executes *inside* the DB, technically. Hope that's helpful.

Also, Alex, that high rate of calls to the cglobals and cdata tables may well indicate an opportunity for improvement for you (and an explanation of possible performance problems). If you want to find out more on your own, check out http://www.carehart.... See the discussion of client variables and the "global client variable updates" (in two places there). If you'd like help investigating/remediating things, I can help in a consulting session (which should be brief).

But this is all getting well beyond the scope of this blog entry, as is the back and forth on the issue you guys are having. Can I recommend you consider opening either a bug entry (bugbase.adobe.com) or a forum thread on the Adobe CF forums, and then perhaps a still-wider audience (possibly to include more Adobe folks than see this) could both see it and also contribute?

If one of you does it, feel free to point a link here to it. The salient point of your past discussion, with respect to this blog entry, is that you each feel you had a memory leak caused by the built-in CF driver for SQL Server, and changing to the MS-provided one solved it. That suffices for what needs to be said here.

Of course, if you do ultimately resolve things and think to come back here and share that observation, I'd welcome it. So I'm not saying "get off my lawn" (not at all) but rather that I think you'll get more value holding your protest in front of town hall rather than "on my lawn". :-)

(And Alex, if you have a response about the client var issue, I'd welcome that since I responded to your having brought it up. As always, just trying to help.)

# Posted By charlie arehart | 12/10/14 5:33 PM

@Charlie: If I'm not totally wrong, a SQL statement is prepared (columns validated) and cached for further reuse. If a second request comes in, the statement is not prepared again and is just reused. There is no need to prepare 2000 SQL requests, as preparation makes no sense then. You may have seen how this works if you change the database column type, but not the CF script that is already active. In such a case you run into SQL errors, as the prepared statement is still executed without verifying the database again. I therefore think there must be only one prepared statement of the same SQL statement cached, or something is wrong in code. Values are not cached... they are only bound to the "?" placeholders in the cached and prepared statement.

> You'd be right that *in the database* there should theoretically be only prepared statement CREATED and executed over and over (with those different passed in values).
It is also in CF... try changing the column type and you see that a CF script starts failing.

> The spy log tracks what leaves CF to go to the db, not what executes *inside* the DB, technically. Hope that's helpful.
Yes, but the memory issues we see are in Java memory... not in SQL memory. I should share a heap dump with you if you are interested to take a look. Maybe you get an idea... but note - data direct has no idea yet :-(

>"global client variable updates"
We have disabled the global updates. As far as I know, timestamps are not updated with every save; data is read on page start, un-wddx'ed and wddx'ed onRequestEnd and written back to the database. That is the best you can do. We do not access every single variable on its own. The client variable cleanup job is disabled in CF and run as a manual job on SQL Server level once a day. We are not that crazy to use the slow built-in stuff. :-)

# Posted By A Hass | 12/11/14 4:17 AM
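
The "prepare once, then reuse with different bound values" model described in the comment above can be sketched as a small cache keyed by SQL text and capped at a "max pooled statements" size. This is only an illustration of that model under simplifying assumptions: the class StatementCacheSketch and its methods are hypothetical, it does not talk to a database, and it is not how the DataDirect driver or CF actually implement statement pooling (a real driver's cache key may include more than the SQL text, such as the result set type).

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration only; no real JDBC driver is involved.
final class StatementCacheSketch {
    private int prepares = 0;
    private final LinkedHashMap<String, String> cache;

    StatementCacheSketch(final int maxPooledStatements) {
        // Access-ordered map that evicts the least recently used entry once the cap is exceeded.
        this.cache = new LinkedHashMap<String, String>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, String> eldest) {
                return size() > maxPooledStatements;
            }
        };
    }

    /** "Executes" a statement: prepares only on a cache miss; the bound value is not part of the key. */
    String execute(String sql, String boundValue) {
        String handle = cache.get(sql);
        if (handle == null) {
            prepares++;
            handle = "stmt#" + prepares;
            cache.put(sql, handle);
        }
        return handle + " <- " + boundValue;
    }

    int prepareCount() { return prepares; }

    int cachedCount() { return cache.size(); }

    public static void main(String[] args) {
        StatementCacheSketch pool = new StatementCacheSketch(100);
        String global = "select data from CGLOBAL where cfid = ?";
        String data = "select cfid,app,data from CDATA where cfid = ? and app = ?";

        // 1850 executions with different cfid values, alternating between the two SQL texts.
        for (int i = 0; i < 1850; i++) {
            pool.execute(i % 2 == 0 ? global : data, "cfid-" + i);
        }
        System.out.println("prepares=" + pool.prepareCount() + ", cached=" + pool.cachedCount());
    }
}
```

Under this model the 1850 executions result in only two prepares and two cached entries, which is the expectation expressed above. Whether the 1850 entries in the Spy trace represent separate prepares or simply separate executions is exactly the open question in this thread, and this sketch does not settle it.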

@Alex,

I think what you are describing in your first paragraph is the process of SQL Server execution planning and caching. SQL Server caches the execution plan and compares the incoming statement against plans in the cache, using a cached plan (a cache "hit") if one exists. Cache hits are much more likely when using CFQUERYPARAM because the binding makes the variable passed irrelevant to the plan, whereas passing the variable without the binding makes it "look" more like a constant to the SQL planning engine. It's also why passing column names is crucial rather than just an asterisk, so it can validate columns.

Having said that I'm confused - doesn't the statement still need to be prepared by the DRIVER before it is forwarded to the SQL engine? I thought there was a prep/execute process happening on the client side - no?

@charlie, I realize this is straying, but it seems like we started with CF CPU usage (the "client side" of this equation).

# Posted By Mark Kruger | 12/11/14 9:34 AM

@Mark: How do you explain that a cache clear in ColdFusion administrator clears the prepared statements if these are not in CF?

# Posted By A Hass | 12/11/14 10:47 AM

Using SQL 2008 workgroup which uses 2 CPU's on a VM it would be nice to restrict CF from using CPU 0 & CPU 1 which often peg, and then add processors for CF to utilize.

# Posted By Craig Baker | 5/28/15 10:31 AM

Back to this post (from 2014) about possible reasons for high CPU, I'll add another: if you're in a VM environment, and the VM host is over-allocated, that could cause what seems to be CPU problems in CF.

Over-allocation means that a finite VM host resource (memory, CPU, etc.) is split among multiple VMs such that the sum of the allocated resource (let's say CPU) across the VM's is greater than the max amount available in the host.

Or it may be that the resource is dynamically allocated by the VMs, and more than one is contending for that available resource, and so the VM is "slow" to get what it's been told it could.

It may seem a stretch to some readers, but I have seen it and had it reported as the solution for some folks. Hope it's helpful.

Anyone found new reasons for high CPU, or know of ones not listed above? I'm sure I may have others that I am just not thinking of now, and didn't think of in 2014. Glad to keep adding to this list.

# Posted By Charlie Arehart | 5/17/16 4:21 PM
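
The over-allocation arithmetic described above can be made concrete with a tiny calculation. This is a sketch with made-up numbers: the class CpuOvercommit, the VM sizes, and the host CPU count are all hypothetical, and you would substitute figures from your own hypervisor.

```java
// Hypothetical illustration of the over-allocation arithmetic; the numbers are invented.
final class CpuOvercommit {

    /** Ratio of CPUs allocated across all VMs to the CPUs actually available on the host. */
    static double ratio(int[] vcpusPerVm, int hostCpus) {
        if (hostCpus <= 0) {
            throw new IllegalArgumentException("hostCpus must be positive");
        }
        int allocated = 0;
        for (int vcpus : vcpusPerVm) {
            allocated += vcpus;
        }
        return (double) allocated / hostCpus;
    }

    public static void main(String[] args) {
        int[] vcpusPerVm = {4, 4, 8, 8}; // four VMs on one host (assumed values)
        int hostCpus = 16;               // CPUs available on the host (assumed value)
        double r = ratio(vcpusPerVm, hostCpus);
        System.out.println("Allocated-to-available CPU ratio: " + r);
        System.out.println(r > 1.0 ? "Host is over-allocated on CPU" : "Host is not over-allocated on CPU");
    }
}
```

With these numbers, 24 CPUs are allocated against 16 available, a ratio of 1.5. A ratio above 1 does not by itself prove a problem; it only means the VMs can contend for CPU when they are busy at the same time, which is when CF may appear "slow" for no reason visible inside the VM.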

The things I've done that seem to have helped the most are
1. Assign SQL Server to CPUs 3 & 4
2. Re-Tune the SQL Indexes
3. Run CFX_imageCR instead of CFIMAGE for photo uploads
4. Deliver static content whenever possible

# Posted By craig baker | 5/17/16 4:33 PM

"67 Seconds" or "67 Minutes"?
You wrote both. Which one is it?

# Posted By Alan Holden | 6/28/17 2:59 PM

OK, Alan, thanks and I have corrected it.

FWIW, I did indeed say "67 minutes" twice, and also said "1:07:00 after the previous one". And even if there was remaining doubt, the CF admin page I pointed to for the client var purge setting would also have confirmed things.

Anyway, I do always want to correct any mistakes I have, so again thanks.

# Posted By Charlie Arehart | 6/29/17 11:34 PM

You're right, I found the Adobe reference after my knee-jerk reaction.

# Posted By Alan Holden | 6/30/17 11:31 AM

Here's yet another potential cause of high CPU in CF...not about CF requests necessarily but about an underlying jetty web server (put in CF in CF 9.0.1 as an alternative way to get to the CF Enterprise Server monitor...but impacting even those in CF Standard, even in 2018).

For more, see:

https://community.adobe.com/questions-582/coldfusion-launcher-application-maxing-out-cpu-272322#post1979680

(formerly https://forums.adobe.com/thread/2347245)

# Posted By Charlie Arehart | 7/6/18 9:59 AM

Here's yet another cause of high CPU in CF, especially on Linux servers. I forgot I'd not added it to these comments about still more causes.

Your server may be "running out of entropy", and some have found they've solved the problem by changing (or implementing) the JVM argument:

-Djava.security.egd=file:/dev/./urandom

(as opposed to the similar-looking -Djava.security.egd=/dev/urandom). I'll leave this for folks to consider. There is indeed more to the topic that you can find with some google searching, but I've had many clients see high CPU problems go away instantly with this, and with no subsequent negative impact that I ever heard about.

# Posted By charlie arehart | 10/1/18 4:09 AM
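
For anyone wanting to check what a given JVM is configured to use before changing that argument, here is a minimal sketch. The class EntropyCheck and its methods are hypothetical names; the two settings it reads (the java.security.egd system property and the securerandom.source security property) are standard JVM settings. It reports on the JVM it runs in, so to reflect CF it would need to run with the same JVM and the same arguments as CF.

```java
import java.security.SecureRandom;
import java.security.Security;

// Hypothetical diagnostic helper, not part of ColdFusion.
final class EntropyCheck {

    /** Reports the two settings that influence where the JVM gets its random seed data. */
    static String describeSources() {
        String egd = System.getProperty("java.security.egd");          // null if the -D argument is not set
        String source = Security.getProperty("securerandom.source");   // value from the java.security file
        return "java.security.egd=" + egd + ", securerandom.source=" + source;
    }

    /** Times creating a SecureRandom and drawing the requested number of bytes from it. */
    static long timeRandomBytesMillis(int count) {
        long start = System.nanoTime();
        SecureRandom random = new SecureRandom();
        byte[] buffer = new byte[count];
        random.nextBytes(buffer);
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) {
        System.out.println(describeSources());
        System.out.println("Time to obtain 32 random bytes: " + timeRandomBytesMillis(32) + " ms");
    }
}
```

A timing that is consistently large here would be a hint (not proof) that the entropy source is a bottleneck on that machine; a fast result does not rule the issue out, since the behavior can depend on load and on which SecureRandom operations the application performs.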
