[Looking for Charlie's main web site?]

CF911: Lies, damned lies, and when memory problems not be at all what they seem, Part 1

Note: This blog post is from 2010. Some content may be outdated--though not necessarily. Same with links and subsequent comments from myself or others. Corrections are welcome, in the comments. And I may revise the content as necessary.
Following on my earlier entry, CF911: Lies, Damned Lies, and CF Request Timeouts...What You May Not Realize, another common source of confusion and misunderstanding for people is when they think their server is "running out of memory", when in fact the problem is often not at all what they think. In this entry, I want to apply the same "cranky" tone :-) and extended explanation to this equally controversial/confusing topic.

I hear people raise concerns with memory problems quite often, whether in my CF Server Troubleshooting practice, or just in my participating in many mailing lists. Indeed, addressing this issue more than a few times the past couple of weeks has motivated me to create this, which will be a series of blog entries.

The series parts are expected to be:

  • Step 1: Determine if indeed you are getting "outofmemory" errors (this entry)
  • Step 2: Realize that having high memory usage is not necessarily a problem (entry to come)
  • Step 3: Realize that OutOfMemory does not necessarily mean "out of heap" (entry to come)
  • Step 4: Diagnose why you really are running out of heap (if you are) (entry to come)
  • Step 5: Realize that CF is maybe suffering because you set the heap too large (entry to come)
  • Step 6: If CF is hanging up but NOT due to memory, what could it be? (entry to come)

Let's get started and see how far we get...

Common refrains about memory issues

The common complaints about memory issues (and my quick responses, to give you a sense of where I'll be going in this series) are:

  • "CF is crashing. Is it running out of memory?" (there's a log that could/should prove that, and that's what we'll discuss in this part)
  • "CF's use of memory is high" (which may not be a problem. If you're looking at memory from the OS perspective, it may not matter as much as heap use within CF. This will be covered in part 2)
  • "CF's use of heap memory is high" (to which you'd think, "ah, well that's got to be a problem, right?", but no, not necessarily, as I'll explain in part 3)
  • "CF has a memory leak" (to which I'd retort, no, generally, it does not. There's nearly always some other explanation. We'll cover that among things in part 4)
  • "CF is running at 100% CPU before it crashes" (this could be related to memory problems, and more a consequence rather than a cause, or it could be entirely unrelated to memory issues. See part 5)
  • "CF is crashing all the time" (well, is it really crashing, or just hanging up and not responding? That's not a good thing, but it's very different from it crashing on its own. See part 6)

So what if we really are suffering a problem?

I'm not saying there's never a real problem of really "running out of memory". It's just that often things are not at all what they seem (or what most presume them to be, from my experience helping people), and that's going to be the bulk of what I'll talk about in this series. But what if your server is really crashing (or simply not responding), and you think/swear/know that it's a memory problem....

What should you do? Increase the heap size? Increase the permspace? Change the GC algorithm?

Sacrifice a chicken?

I'd say, none of them (though if you're in a rural setting, then perhaps cooking and eating the chicken might help settle your blood sugar so you can stay calm). Really, I know that goes against conventional wisdom, which seems always to suggest diving into the JVM settings. I'd say "hold on there, pardner."

Step 1: Determine if indeed you are getting "outofmemory" errors

This is one that surprisingly few people consider when faced with their server crashing or not responding. They go with whatever conveys to them a sense of there being a memory problem, perhaps adding their own experience or what they read, and they start chasing solutions.

I can't tell you how often I hear people lament that they've googled and found all manner of conflicting and confusing recommendations. And it doesn't help at all that they may be running on CF 8 or 9 (with Java 1.6) while reading about a "solution" written in the time of CF 6 or 7, when it ran on Java 1.4. Of course, the writer often won't have thought ahead to clarify that.

Instead, I'm saying, "stop, drop, and roll".

"Stop" the tail-chasing, "drop" into the pertinent logs directory in CF, and "roll" through them looking for an occurrence of "outofmemory".

Look first in the Console/Runtime/JRun logs

Let me be more explicit: the logs you want to look at for these outofmemory errors are NOT (necessarily) the ones you see in the CF Admin Log Files page. Those are the [cf]\logs directory (or are buried deep within an instance on Multiserver).

Instead, you want to see the "console" or "runtime" logs. Where those are depends on how you are running CF:

  • If you're running CF from the console, then look at the console display of logging info. (And if you started CF within CFBuilder, and did not set it to start as a Windows Service, then look in the console view of CFB for this info.)
  • If on Linux, look at the cfserver.log in the main CF logs directory.
  • If on Windows, running CF as a service, look instead at the -out.logs, found in the [cf]\runtime\logs directory (or [jrun]\logs on Multiserver, in which case there will be a prefix for each instance name in the log file names). You can generally ignore the -event logs there, as they typically just have a subset of what's in the -out logs. (Update for CF10: these logs ARE in fact now in the same directory as other CF logs.)

Some refer to these as the runtime logs, or the jrun logs, or perhaps the jvm or console logs. Whatever you may call them, their location is above and the explanation of their value follow will become clear n this series of posts.

Bonus topic: you can increase the max log size

(I will note that the way these -out.log files work, by default, in CF9 and before is that they fill at 200k increments. Yep, not 200mb, but 200kb!, and you may blow through dozens of them in a few minutes if things are going nuts. That size is configurable, but not through means you'd normally expect. See the blog entry I recently published, CF911: How to control the size of CF's -out.logs.)

Look next in the hotspot/pid/jvm abort logs

Separately, there are some other potentially important logs that may relate info concerning memory problems: what some call the "pid", "hotspot", or jvm abort logs. The filename is in a form like hs_err_pidnnnn.log, with some number in place of the n's.

These logs are found in a very unexpected place (for logs): in the directory where CF stores the jvm.config. So on Standard/Server deployments, that's [cf]\runtime\bin. For Multiserver, that's [jrun]\bin. (As of CF10, it's in the [cf10]\cfusion\bin, or the [cf10]\instance\bin for multiple instances.)

Look in that folder for any .log files. There would be one such "pid" log for each time that the jvm "crashes" due to certain kinds of problems. It could be a crash in the hotspot compiler, in hotspot compiled code, or in native code. (To be clear, many "crashes" of CF are not of the sort that will create such logs, so again, it's only certain kinds of crashes that will lead to them.)

What matters most, for this post, is that again you look at least look for such logs in this folder (occurring around the time of some crash you're interested), and if there look in the logs for any reference to the phrase "outofmemory". Of course, there may be such PID logs that don't refer to "outofmemory", but such crashes are beyond the scope of this post. And while the pid logs have lots of information in then, explaining all that is also beyond the scope of this entry.

The point here (of this section of this post) is that when you have a crash and you suspect it's a memory issue (or if you don't know the cause and want to learn more about what it may have been), you want to look in these two log directories mentioned above. Many never do, and this is part of why they end up chasing their tails, going instead on gut feelings or trying out various alternative "solutions". I say instead: find the diagnostic info, and act on it.

Searching through the log files (console or pid logs), the easy way

But rather than "look" at all these logs in these directories, one at a time, I suggest instead that you automate the process and search them (I was tempted to say "stop, drop, and mole" in my quip above, since you're "ferreting" through the logs, but that seemed a stretch.)

If you're on *nix, I don't need to give you any more info on how to search the files. Just grep it and rip it. :-)

If you're on Windows, though, you'll perhaps be tempted to think I'm referring to the built-in Windows Search tool to search the directory. I am not. Indeed, let me plead, for the sake of all things decent, please use something better. It's just not a completely reliable (or fast) tool.

I have blogged about a wonderful free alternative, called FileLocator Lite. Use that instead. (If you have another tool or editor you favor, that's fine. No need to bring that up here. I recognize those other options in the other blog entry.)

The beauty of FLL is that once installed (which is fast, itself), if you right-click on the log directory (or any directory) and choose "FileLocator Lite" from the menu, which will open the FLL UI. You can then just put the string outofmemory in the search box and hit enter (or click the "start" button in the UI). You could use *.log to the files searched, though since this directory is so small it's not critical.

In just moments it will show any found files in the lower left pane. You can sort the list by the "modified" column, to focus on files from around the time of your crash.

Then, here's the real beauty of this file search tool over others: to look inside the found file(s), you don't need to double-click the files to open them. Just single-click each file, and in the right pane it shows any lines in the found file which had the string that was searched. Brilliant, and again, a really fast way to find things.

So we're using this to find if any files in either of the folders above has any outofmemory (oom) errors. And if there are any such files, then we're looking at the occurrences of the oom errors within each log.

Don't stop at the last outofmemory error before a crash

This feature of the File Locator Lite tool, to see all the lines in the file with the given string, is especially useful in this case, because when searching for outofmemory errors, you also want to be able to quickly see the time for *all* the error messages you may find.

And you *do not* want to focus solely on the last error prior to the crash (or the slowdown, that made you want to restart CF).

(I should add that when it comes to the console logs (as opposed to the PID logs), sometimes those found oom error lines may not have a date/time stamp, to help you readily assess if the error is occurring at the time you were interested in. You may need to go ahead and open the files with an editor and search for the oom string, and then look at other nearby lines to find a timestamp.)

Once you find one (or more) preceding the time of the crash, you want to look for any occurring prior to it. It may be that the problem started several minutes before the crash (or your restarting CF). Further, it may be that the outofmemory error just prior to the crash is different from the one that started things out.

The point here is that you want to find out a) if there WERE any outofmemory errors around the time of your crash of interest, and then b) what information if any appears in the logs around that time. We'll then use that to proceed in evaluating the nature of your situation.

And of course it could be that there WERE no outofmemory errors in the logs, which would indicate that your problem was NOT likely an outofmemory error (or not a heap error after all). We would need to proceed to doing more investigation and assessment of whatever diagnostics you DO have.

Step 1 down, 5 more to go

OK, that's step 1 in determining whether memory problems are really at all what they seem. As I mentioned at the outset, the planned parts in the series are:

  • Step 1: Determine if indeed you are getting "outofmemory" errors (this entry)
  • Step 2: Realize that having high memory usage is not necessarily a problem (entry to come)
  • Step 3: Realize that OutOfMemory does not necessarily mean "out of heap" (entry to come)
  • Step 4: Diagnose why you really are running out of heap (if you are) (entry to come)
  • Step 5: Realize that CF is maybe suffering because you set the heap too large (entry to come)
  • Step 6: If CF is hanging up but NOT due to memory, what could it be? (entry to come)

After I publish them, I'll update the lists here to link to them.

As always, I look forward to your feedback (pro, con, or indifferent).

For more content like this from Charlie Arehart: Need more help with problems?
  • If you may prefer direct help, rather than digging around here/elsewhere or via comments, he can help via his online consulting services
  • See that page for more on how he can help a) over the web, safely and securely, b) usually very quickly, c) teaching you along the way, and d) with satisfaction guaranteed
Comments
very nice
Eagerly waiting for the next three parts, Charlie! Thanks, as always, for your great writing on this great piece of software.
# Posted By Sung | 11/30/11 6:51 PM
We've consulted with you before and benefitted greatly. This post is really helpful in remembering what we talked about. I hope you'll finish the series soon!
# Posted By gkl | 10/10/12 9:01 AM
We are also eagerly awaiting the next three parts. Please keep us updated and as always GREAT JOB!!!
# Posted By hopsguy | 11/29/12 11:14 AM
Great article as always. Eagerly waiting for part 2,3,4.
# Posted By Ajas Mohammed | 1/7/13 12:20 PM
Thanks, @Ajas, and others. Sorry for my dropping the ball on the planned follow up on the "memory lies" blog entry series. Just been slammed. Still hoping to get to them, for sure. Occasional nags are welcomed. :-)
Here's a nag. Fighting a nullpointer problem that is being blamed for performance issues and can't track it down.
# Posted By Smoke | 7/10/13 1:59 PM
By "a nag", do you mean you are wondering if I might be able to create a blog entry to address that? I'm afraid you've not given enough info for me to perceive what may be your issue.

If you mean you'd like help with it, perhaps I could, as part of my CF server troubleshooting services, which can be offered remotely (without you granting me "remote access", creating an account for me, opening any firewall holes) and often very quickly. Please check out my consulting page for more info on my rates, approach, and satisfaction guarantee, and more: http://www.carehart....
You said " Occasional nags are welcomed. :-)" So I was nagging you. I'm rapidly closing in on this particular error, I think, so I'll continue to go it alone. Seriously would like to see the remaining articles though.
# Posted By Smoke | 7/10/13 6:57 PM
Oh, ok. Sorry. The comment you were referring to was from 6 months ago, so I had forgotten it, and I didn't think to check the past comments before replying.

So you are thinking that the nullpointer problem you're facing has something to do with memory? which is why you'd hoped I'd finish those remaining entries I'd planned?

Well, I'll say that in my experience, I've never seen a nullpointer problem to be related to memory, not once. Of course, I could be wrong and missed if it happened to others, but I really would not expect it to be related to memory.

So otherwise, I wish you well on your venture to solve the problem you do have on your own, but again keep my services in mind should it remain unresolved.

As for the remaining entries, I just have to find the motivation/muse to get back into them. As readers can see from other recent posts, I certainly have no problem writing once I can get started on a topic. :-)
Charlie, just curious if you ever got around to finishing up this series? Or if you can recommend any other articles or blog posts that might help me learn how to go through and fine tune my server and check on memory issues and such? Thank you in advance.
# Posted By Justin Cook | 2/21/19 8:31 PM
Sorry, Justin. I did not. Just busy.

No good single resource, as there's so much to cover (and why this was going to be multiple parts).

I will say that if you present to me a single problem or question, I may have a more specific answer.

I can also point you to an FR webinar I did on solving memory problems with FusionReactor. See https://www.fusion-r...

Finally, I will say that if you have a problem you just need to solve, and it may not be a simple single question or issue, I will say that of course I help people with such problems on a consulting basis all the time, at carehart.org/consulting. Some problems can be solved in as little as an hour, even if it may have plagued you for hours, days, or weeks.
Copyright ©2024 Charlie Arehart
Carehart Logo
BlogCFC was created by Raymond Camden. This blog is running version 5.005.
(Want to validate the html in this page?)

Managed Hosting Services provided by
Managed Dedicated Hosting