
Solving metaspace errors, once and for all

Note: This blog post is from 2020. Some content may be outdated--though not necessarily. Same with links and subsequent comments from myself or others. Corrections are welcome, in the comments. And I may revise the content as necessary.
I have a really simple solution to offer here, for a problem that has been nagging people running ColdFusion for the past few years. This post may also benefit those NOT running CF, especially if they have found confusing/conflicting information about the Java metaspace error and jvm argument that relates to it.

Perhaps you're getting errors referring to "metaspace" or "OutOfMemoryError: Metaspace", whether in your web sites, error logs, or even the CF Admin, and you wonder "what to do". Or you may be getting odd occurrences of blank pages, and if you look in your coldfusion-error.log you are finding such metaspace errors.

TL;DR: In all these cases, the solution is simple (and may seem contrarian to some ears): REMOVE the maxmetaspace element from your JVM arguments. Indeed, I would go so far as to say everyone should simply remove it, even BEFORE getting any errors.

In the post that follows, I will explain how to remove it, including how you need to be VERY careful when doing that. You may also wonder why I recommend removing it, versus raising it. I cover that below, as well as a bug report I filed with Adobe related to this (which was fixed as of CF2021).

I also created an abbreviated version of this post, on the Adobe CF portal, if that may interest some readers.

How to know if you are having the problem

I mentioned how the problem may exhibit itself (as pages coming back blank or failing), but of course the real proof will be that you will see errors in your logs. You may find them in your coldfusion-out.log or application.log, perhaps looking like this:

Feb 23, 2020 17:06:46 PM Error ... - Metaspace The specific sequence of files included or processed is...

or more specifically, in your coldfusion-error.log, you may see lines like this:

SEVERE: Servlet.service() for servlet [CfmServlet] in context with path [/] threw exception [ROOT CAUSE:
java.lang.OutOfMemoryError: Metaspace

That's the absolute proof that this is your problem (you're running out of metaspace), especially if you find them happening around the same time you experienced odd behavior from CF or your app server. (Those on other app servers should look to their own logs for this info.)

Do beware that the CF logs are set to rotate when they fill, so even if you may not find these errors in the specific logs I named, look at their archived versions, which would have a -1 and so on, reflecting when they filled and were archived.
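
For those who would rather not open each log by hand, here is a minimal Python sketch of that search. It is only an illustration: it assumes the logs are plain text and that the archived copies sit in the same folder as the current ones. The helper names (find_metaspace_errors, scan_log_folder) are my own placeholders and not anything CF provides.

```python
from pathlib import Path

# Both log examples above mention "Metaspace", so match on that word.
MARKER = "metaspace"


def find_metaspace_errors(lines):
    """Return (line_number, text) pairs for lines that mention the metaspace."""
    return [
        (number, line.rstrip("\n"))
        for number, line in enumerate(lines, start=1)
        if MARKER in line.lower()
    ]


def scan_log_folder(folder):
    """Scan every log file in a folder, archived copies included."""
    hits = {}
    # "*.log*" is a loose pattern meant to catch current and archived names.
    for path in sorted(Path(folder).glob("*.log*")):
        with open(path, errors="replace") as handle:
            found = find_metaspace_errors(handle)
        if found:
            hits[path.name] = found
    return hits
```

You would call scan_log_folder with the path to your own instance's logs folder, and it returns a dictionary keyed by file name, with the matching line numbers and text for each.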

The solution

And if the CF settings for maxmetaspacesize are indeed the default (192m), or even if you have tried to "raise" it, I'm proposing here that for most people, you just need to REMOVE that argument and stop "chasing that rabbit".

More below. First, some precautions.

Be careful about modifying JVM arguments

First and foremost, whether or not you're experienced in modifying the JVM arguments underlying CF (or other Java app servers), you do need to be VERY cautious about making this modification. If you make certain easy mistakes, the configuration will be broken and CF (or your other app server) will not be able to start (heck, it may not even be able to STOP, as the same JVM args are used in the process that STOPS CF as starts it).

Before proceeding, you should back up the file that holds the arguments (before making any changes), and then be very careful about editing it, as I detail below.

Where to find the JVM args in CF

In the case of ColdFusion at least, this and other JVM args will be found in the CF Admin, in its "Java and JVM" page, and then its "JVM arguments" field.

Changing that page ends up changing the underlying jvm.config file, where you can also find the java arguments appearing on its java.args line (the args on that line may differ slightly from those shown in the "JVM arguments" field of the Admin).

If you have only one CF instance, it's in your cfusion/bin folder, while if you have multiple instances, it will be found in a sibling folder to cfusion with the name of your instance, and then in its bin folder.

Backing up the JVM config file before changing things

Whether you will be modifying the Admin or that file, you should make a backup of this underlying jvm.config file.

You can call the backup jvm.config.bak, for instance. (Note that if you change the CF admin JVM page, it will automatically make its own backup as jvm.bak, and you may be well-served to name your backup something else, as I have proposed.)
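
If you would like the backup step scripted, here is a minimal Python sketch of it. It assumes only that you know the path to your jvm.config. The function name backup_config and the timestamped naming scheme are my own choices for illustration; the timestamp simply keeps the copy from colliding with the jvm.bak that the Admin creates, or with an earlier backup of your own.

```python
import shutil
from datetime import datetime
from pathlib import Path


def backup_config(config_path):
    """Copy the config file alongside itself with a timestamped name."""
    source = Path(config_path)
    stamp = datetime.now().strftime("%Y%m%d-%H%M%S")
    # For jvm.config this produces a name like jvm.config.<timestamp>.bak
    target = source.with_name(f"{source.name}.{stamp}.bak")
    shutil.copy2(source, target)  # copy2 also preserves the file's timestamps
    return target
```

Calling backup_config on the jvm.config in your instance's bin folder returns the path of the copy, which you can restore from if CF will not start after your edit.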

How to remove the argument, carefully

Finally, whether changing the args in the CF Admin or in jvm.config, you will find the argument looking like this:

-XX:MaxMetaspaceSize=192m
listed among other java arguments.

You may also find an -XX:MetaspaceSize argument, which represents the "initial" size for the metaspace. CF does not set one by default, but you or someone else may have set one. I would argue for removing that as well.

When removing either, you want to be sure to remove the opening dash before that XX: prefix. Be sure also to leave a space between the args currently found before and after this one.

If you make either mistake (leave the dash, or fail to leave a space between args), CF (or your Java app) may not start or stop. Notice in the screenshot, where I point to the leading dash, how it can appear at the end of the line before the argument, so it could easily be missed and "left" by mistake.
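
To show what a careful removal looks like, here is a minimal Python sketch that works on the argument text alone (as it appears in the "JVM arguments" field, without the java.args= prefix). It is an illustration rather than a tool to trust blindly: it assumes the arguments are separated by spaces and that none of them contains a quoted value with spaces inside it. The function name and the sample arguments are made up for the example.

```python
# Arguments this sketch removes, matched by their name before the "=" sign.
UNWANTED = ("-XX:MaxMetaspaceSize", "-XX:MetaspaceSize")


def remove_metaspace_args(args):
    """Drop the metaspace size arguments from a space-separated arg string."""
    kept = [
        token
        for token in args.split()
        if token.split("=", 1)[0] not in UNWANTED
    ]
    # Rejoining with single spaces leaves no stray dash and no fused args.
    return " ".join(kept)


# Illustrative arguments only, not CF's actual defaults.
before = "-server -Xms256m -Xmx1024m -XX:MaxMetaspaceSize=192m -XX:+UseParallelGC"
print(remove_metaspace_args(before))
```

Because whole tokens are dropped and the rest are rejoined, the leading dash goes with the argument and the neighbors stay separated by one space, which are the two mistakes described above.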

Restart CF, after making the change

After removing the argument, be sure to restart CF. (And don't make the change unless you are prepared to restart CF right away, lest you be gone when CF is restarted unexpectedly, and perhaps won't start because of a mistake you made.)

If somehow CF won't start, be prepared to restore the file you changed, and try again. If you had opened an editor to make the change to the file, and it's still open, look closely at what you changed (and if your editor supports an undo feature, see what happens if you undo your last change).

With this maxmetaspace argument removed, you should no longer ever get any outofmemory errors about the metaspace.

You don't need to change anything about the "failing" pages

And to be clear, there's nothing about whatever pages GOT the error that needs to change. They did not CAUSE the error: they were merely victims of it. Any CF restart would "resolve" the problem, at least temporarily... until CF ran out of metaspace again--in which case some OTHER page would likely fail. Thus the need to solve the REAL problem, by removing this artificially low maxmetaspace setting.

Still more, including the related bug report I have just filed

Update: As of the release of CF2021, Adobe has in fact removed the maxmetaspacesize setting (assuming you do not set it or import it from another instance.) So I have stricken out the remaining paragraphs in this section.

Again, I have filed a bug report today asking Adobe to STOP setting this maxmetaspacesize value for us. If you agree, please add your vote here: https://tracker.adobe.com/#/view/CF-4207269.

Note that it offers some additional perspective on how it came to be that Adobe does set it, and why I think they should not (and why I recommend removing it vs raising the argument's value).

This problem has plagued CF shops for a few years now. Let's stop the madness!

And again, as the above demonstrates, I think everyone should remove the argument. It's simply no longer needed, in the way that the older maxpermsize was. (I have added this sentence and a similar one at the top, prompted by a helpful comment from "Wouter". And it also led me to think to add these following two additional sections.)

Then why does Java even offer the argument?

Fair question. It's there if you ever wanted to keep the use of the metaspace from growing WILDLY large. Again, metaspace is taken from available memory on the box (not from within the heap), so it's less of a concern for most to bother limiting it. But maybe you may want to say, "ok, feel free to grow large. I won't set an arbitrarily low limit--like CF's default 192m setting--but don't go using more than 1gb". In that case, feel free to set the maxmetaspace to 1g (and you can say 1024m or 1g; Java treats them the same. Do include the unit, as a bare 1024 would be read as bytes).
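
As a small aside on those size values, the following Python sketch mimics how the JVM reads the common k, m, and g suffixes on a size argument. A number with no suffix at all is taken as bytes, which is why the unit matters. The function name is my own, and this only imitates the JVM's behavior for illustration; it is not part of Java or CF.

```python
# Multipliers for the common suffixes the JVM accepts on size arguments.
UNITS = {"k": 1024, "m": 1024 ** 2, "g": 1024 ** 3}


def jvm_size_to_bytes(value):
    """Convert a JVM size string such as '192m' or '1g' to bytes."""
    text = value.strip().lower()
    if text and text[-1] in UNITS:
        return int(text[:-1]) * UNITS[text[-1]]
    return int(text)  # no suffix: the number is read as bytes


print(jvm_size_to_bytes("1g") == jvm_size_to_bytes("1024m"))  # prints True
print(jvm_size_to_bytes("192m"))  # prints 201326592
```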

But I'm saying that I've never seen anyone NEED to. But that leads to another question...

How to watch the use of metaspace within CF/the JVM

Another reasonable question. If you have CF2018 or above (Standard or Enterprise), you can view it within the PMT (Performance Monitoring Toolset), if you set that up. One of the things it can track is the use of memory in various JVM memory spaces, including the Metaspace. See the "jvm" page, and the "non-heap" graph there, which breaks down the memory spaces that are NOT in the heap. An article including screenshots is here.

And the same can be viewed with FusionReactor, which tracks memory spaces via the Resources menu on the left, then "Memory spaces", and then you can select a desired memory space in the top right corner of that graph, as shown here. FR also tracks such memory space usage in its logs (written every 5 seconds and kept for 30 days), viewable in the Metrics>Archive Metrics page, and the "Memory" section of logs it shows. (To be clear, this also allows you to look at values BEFORE a restart.)

Finally, of course, various jvm tools (command line or gui's, or GC logging) can show JVM memory spaces, including the metaspace.
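
As one example of the command-line route, the JDK's jstat tool can print the metaspace figures for a running JVM with `jstat -gc <pid>`. The Python sketch below parses that output. It assumes a JDK 8 or later, where the output has MC (metaspace capacity) and MU (metaspace used) columns reported in KB, and it assumes numbers are printed with a dot as the decimal separator. The function name and the sample figures are invented for illustration.

```python
def metaspace_from_jstat(output):
    """Return (capacity_kb, used_kb) for the metaspace from `jstat -gc` text."""
    lines = [line for line in output.splitlines() if line.strip()]
    header, values = lines[0].split(), lines[1].split()
    row = dict(zip(header, values))  # pair each column name with its value
    return float(row["MC"]), float(row["MU"])


# Made-up sample in the shape of `jstat -gc` output, trimmed to a few columns.
sample = """
 S0C    S1C      EC       OC        MC       MU     YGC   FGC
1024.0 1024.0  8192.0  20480.0  122368.0 119000.0   10     2
"""
capacity_kb, used_kb = metaspace_from_jstat(sample)
print(f"metaspace used: {used_kb / 1024:.1f} MB of {capacity_kb / 1024:.1f} MB committed")
```

To feed it real data, you could capture the tool's output with subprocess.run(["jstat", "-gc", str(pid)], capture_output=True, text=True).stdout, where pid is the process id of your CF or other Java instance.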

Indeed, if you were to watch the metaspace and see it rising substantially and over time, you would want to really address WHY it is rising (rather than merely "limit" it)--as well as understanding then why and when it is that SOME but not other pages experience the error. I will say that classically, unexpected growth in the metaspace (as well as formerly in the permgen space) has been due to excessive class-loading, which can have any of many explanations. But that's getting well beyond the scope of this post, of how to readily solve the maxmetaspace problem itself. Perhaps I'll do another on that broader, background topic.

Conclusion

In the meantime, I do hope that my suggestion at the top (for YOU to remove the offending argument if you get these errors) will help many folks, and I do realize that this post may lead to debate and questions, so fire away.

And as always, if you may prefer help either implementing this change (or somehow further assessing it), or recovering from problems if you try it yourself, I am available for online remote consulting, with satisfaction guaranteed. More at carehart.org/consulting.

Comments
Charlie, if I understand your reasoning correctly, then you don't really need to see error messages about the MetaSpace in order to benefit from the deletion of the MaxMetaSpace parameter - every CF runtime would benefit from it (every CF runtime after CF8 that is). Correct?
# Posted By Wouter | 2/25/20 9:40 AM
Yes. That's why I also say adobe should not even set it.
But you make a good point that I don't specifically assert that. I will tweak the post to make that very point. Thanks.
I'll go update my 6 servers then! Thanks for pointing this out!
# Posted By Wouter | 2/25/20 11:11 AM
OK, I added a mention of that at the top and the bottom (that really, there's no reason for most people to leave that maxmetaspace arg at all).

And then I went on to add two new sections: "Then why does Java even offer the argument?", and "How to watch the use of metaspace within CF/the JVM".

Hope those may be as helpful to those who already read the post, as to future readers. :-)
The question is this: absurdly not setting it, the memory could grow more than the physical one by crashing?
# Posted By Paolo | 2/25/20 11:35 PM
Should we still set the -XX:MetaspaceSize? Or can we remove that too?
# Posted By Dave | 2/26/20 12:50 AM
Paolo, please read the final sections. My suggestion is not "absurd". There's reasoned logic behind it.

Dave, it's usually not necessary to set that min level, no. CF does not set it, so I did not address it. Bottom line, unless you have specific requirements to have set either, don't bother
Thank you for this info, Charlie.
# Posted By pmascari | 2/26/20 7:20 AM
Thanks for the support and encouragement, Paul, here and on my CF portal post.
I should have added a mention of this the other day: I posted also a far more brief version of this post (the "tldr version") at the Adobe CF Portal. In case folks reading this may prefer to share that with some folks, it's here:

https://coldfusion.a...

And of course it points here for the additional "detail" that I offer. :-)
Oddly, CF2018 was crashing straight out of the box on a brand new MS Server 2016.

Removing this did the trick:
-XX:MaxMetaspaceSize=192m

Thank you for all your great blogs.
# Posted By G. Melanson | 3/11/20 8:56 AM
Thanks, g. That is indeed what motivated me to write. Glad to have helped.
Hey, Charlie! This was the exact error I was getting on a new client machine. I'm removing the MaxMetaspaceSize, which will hopefully take care of the issues we have been seeing!
# Posted By Kyle Shiflett | 3/11/20 9:30 AM
Great to hear, Kyle. If you've been getting that error, this change will indeed fix it. Glad to have helped.
Thanks, Charlie. CF2016 wouldn't restart for us after Update 15. This solved it. Thanks!
# Posted By Chris Simmons | 4/16/20 8:11 AM
Great to hear. I am surprised that the update would have any impact, but what matters is your server is running.
Awesome explanations and directions. Worked!
# Posted By Marion Andrin | 6/8/20 2:16 PM
Glad to help, Marion.
Thank you Charlie.
Just what I needed after an upgrade from CF 9.
# Posted By Gerry | 9/13/21 8:51 AM
And thank you, Gerry.
Thanks for this information. It seems to match our issue and I've made the change. I couldn't find anything close to the issue on the Adobe site except this one.
Thanks for sharing :)
# Posted By David | 12/8/21 12:46 PM
Glad to have helped, David. Again, good news is that cf2021 no longer enables it by default (unless imported in a migration of prior settings), so over time this problem will fade away like so many others...though it will remain for years as many are slow to upgrade. As always, I just want to help. :-)
Does this post apply to Lucee? I removed the setting and within a day our server crashed with the
lucee.runtime.exp.NativeException: Metaspace
Caused by: java.lang.OutOfMemoryError: Metaspace
lucee.runtime.exp.NativeException: Metaspace
error.

I also came across this thread here
https://luceeserver....
and the follow-up one here
https://luceeserver....
where I believe we may be a victim of the latest 5.x release issue reported there for sites that build and update the file templates regularly. It causes classes to recompile, but apparently they don't get released
# Posted By Scott Conklin | 8/3/22 5:13 PM
Sure, Scott. All this should apply.

First, to clarify, that's why I said in the first paragraph, "This post may also benefit those NOT running CF, especially if they have found confusing/conflicting information about the Java metaspace error and jvm argument that relates to it." :-)

Second, you say you removed the setting yet you still get the error. Well, you're running on Lucee which may mean you're running on Tomcat. And setting/changing the JVM settings in Tomcat is not as clear as some expect. Indeed, many docs and resources talk about editing a file, when instead if you run Tomcat as a Windows service, note that there is a separate app (tomcatw.exe or a similar name) which offers a UI to control the settings, and which saves them as registry entries. So how did you make the change?

Some good news is that the Tomcat logs do also echo the jvm startup values. See either the catalina or lucee-stderr log, which may be found in your lucee\tomcat\logs (different deployment types offer differing logs and folder names). In either log, one of the lines tracked during the instance startup should look something like this:
16-Jul-2022 15:46:37.131 INFO [main] org.apache.catalina.startup.VersionLoggerListener.log Command line argument: -Xmx256m

And there's a line for each JVM arg. Can you confirm that none refers to maxmetaspacesize? If any do, then no, you did not properly remove it and you could be hitting that limit.

If you confirm it's NOT there, then note that you can still hit "a limit", but it's not THAT size but your available memory, since the metaspace is taken from available OS memory. Do you have anything that logs your total memory usage over time?

And finally, I will say that if you get FusionReactor (even the free 14-day trial), that tracks the metaspace (and indeed all jvm memory spaces), logging it every 5 seconds so that you can see it also over a restart. And if this is about ephemeral instances or docker containers which "go away" on restart, then the FR Cloud feature tracks the info OFF the server.

Either way, you can see at the time of your error what FR tracked the metaspace size to be (again, every 5 seconds). What is its size when you get the error?

I realize that your focus for now is what may be CAUSING high metaspace use, and that's a noble goal. And indeed you could be hitting a Lucee issue. But let's start first with confirming what value you're hitting when the error fires. It may not be "that high" after all. :-)

And I can help you with implementing FR or using it or the FR Cloud UI, if interested.

Let us know what you find in the logs or FR, or otherwise.
Hi Charlie

answers to your questions follow:

1) I am familiar with the Tomcat applet which is where I made the changes
https://shareimg.io/...

2) I have confirmed that it is properly reading the java arguments when I make a change in the Tomcat applet.
see the catalina file here https://shareimg.io/...

(note: the MaxMetaSpace=2048m that you see in this screenshot is because we put the setting back
in as when we removed it the server crashed within 24 hours with an example metaspace error like the one below.)

07-Aug-2022 11:24:58.677 SEVERE [Thread-211253] org.apache.catalina.core.ContainerBase.stopInternal A child container failed during stop
   java.util.concurrent.ExecutionException: java.lang.OutOfMemoryError: Metaspace

3) We do have Fusion Reactor and over the last week, I do see two graphs that seem to be problematic, but do not know what to do about it.

Firstly some background.

a) the server was seemingly running fine as far as I know. I mean every once in a while, in a 3-month period we might have to cycle the services
when we get an unexpected amount of traffic usually from a bot or hacker hammering the server.

b) I saw this article and removed the MaxMetaSpace setting per your recommendation but did not cycle the service immediately as I made the change during regular business hours.
I forgot about it and at some point, the hosting company applied windows update and rebooted the server, but I don't recall when.
Bottom line is I have no idea how much time passed from when the new java argument was applied up until we got the metaspace error, but it did come.

c)Since, the server was seemingly running ok prior to making the change, the hosting company (Hostek) recommended putting it back in, but neither of us could remember
what the setting was before, so we started with MaxMetaSpace=512m

d) within 36 hours it crashed with java.lang.OutOfMemoryError: Metaspace.

e) I put it up to MaxMetaSpace=1024m and it crashed after 4 days. I now have it at MaxMetaSpace=2048m which is what you see in the screenshot above. I expect it to crash again,
just prolonging when

It seems what you are saying, that setting this setting is only going to make it reach the Metaspace error sooner and is not a solution to the problem? What confuses me is how the server seemed to work fine before removing the setting, but maybe it was not. maybe it was just dying a slower death.
What I do know is that I get the error with and without the argument.


4) The graph for the Metaspace in FR was indeed at 1024M when it crashed today before I bumped it up to 2048.
in cycling the service, I, of course, lost the very visually telling upward trend graph showing how the metaspace was being slowly but surely eaten up.
This screenshot is one of newly started service running only 6 hours. you can't tell but the metaspace memory has ticked up from 119 MB to 200MB in 6 hours.
Hitting the garbage collection button seems to do nothing. I assume this will slowly climb until all 2048 MB are consumed and then it will crash again?
https://shareimg.io/...

I should also point out the only other graph that seems to correlate to this upward slope trend is the classes graph. (Heap and non-heap seemed to look normal with GC happening as they should, as best as I can tell.)

When the server was restarted 4 days ago the classes graph showed about 10,000 classes were loaded and when the crash occurred the classes were at 88,000. Since today's restart now 6 hours ago, the classes have already gone from 10k to more than 20k.
https://shareimg.io/...

It was seeing this that made me find and follow what is going on in this thread:
https://luceeserver....

When I saw that this could have something to do with systems that make heavy use of files that are cfincluded and/or updated often, I took notice.
We use an admin building tool to build out application files in all the web roots of our client's content management powered websites.
We don't do this often but can do it on occasion when we alter the code in our base template. We do make heavy use of cfinclude in this architecture.
This system has been in operation for 22 years without many changes and has run on ADCF 5.0 on up to CF 11 before we moved over to Lucee 4.0 around 2015. We have only recently experienced this issue. We are now on the latest version of Lucee.

If the climbing classes is the issue, then why now and never before and more importantly how can I further troubleshoot it and solve it?


Thanks for any help or suggestions you might have
Scott, this forum thread comment area really isn't the best place to hash all this out. Many who are subscribed to comments may not care for all the back and forth. :-) I'll offer some thoughts in reply, but I'd recommend you reach out to me directly. More on this at the end of my comment.

1) First yes, the high rate of class loading you show is likely the cause of the high metaspace. I was sure I'd have clarified that in the post, as it's nearly always been the cause of high metaspace use (or before that permgen space), since what those hold primarily are "metadata" about class loads. But I see somehow I did not indicate this here. I have updated the post to clarify that--and also updated it to reflect that Adobe DID agree with this Mar 2020 post, such that starting with cf2021 (released Nov 2020) they now no longer set a maxmetaspacesize by default.

As for why it's rising for you, if not due to a change in your app or how it's used, then it would most likely be due to a change in Lucee, which may be sensitive to how your app does this pre-loading you refer to.

You referred in your previous comment to a known issue in recent Lucee updates. Did that prove somehow to not be the culprit?

2) As for the problem arising only when you removed the max, again the metaspace should be using available system memory. I can only fathom that you may have less now than before (when things "worked", if it's NOT due to some change in your app, lucee, or your load, etc).

To be clear, no, you will not cause it to "be able to use MORE metaspace by setting a max than not". I sense you wonder if that may be possible. I'd bet money it's not.

3) You refer to how you lost track of fr watching the rise upon lucee's restart. It's so sad to find people still laboring under that old understanding of using fr. You're not alone.

FR 7.2 introduced the wonderful "archived metrics" feature some years ago: with it you CAN see graphs of nearly everything fr tracks, even across restarts--because it also logs nearly all it tracks, and the archived metrics let you view that logged data graphically. See metrics>archived metrics. Find the time of your crash. I have a YouTube video on postcrash troubleshooting with fr, with more specifics.

Then confirm you see the metaspace rising over time. Yes, I realize you can see it now BEFORE it causes a crash. I'm just saying this will let you see the rise (and max it reached) in the seconds, minutes, or hours prior to the crash happening--after it happened. In this case, I grant it may only confirm the max you know you'll hit.

But there's also a graph of the class loads. It may be interesting to see what that reached before the crash.

Also, if you REMOVE the maxmetaspacesize, this would let you see what ITS size got to before a crash. Right now, you don't know that.

This is what I meant about how fr could help.

4) Again, for now, please don't reply here but to me directly. My contact info is offered here on my site. Let's hash more out directly, and then since you have a problem that should be solved ultimately (whether on your own or together), you can then report in the end what did work.

This is all just going SO MUCH deeper and wider than is warranted for this post's original topic. I appreciate you also want to help others who may hit the specific scenario you did. And you can do that with a follow up conclusion post, I hope. :-)
Copyright ©2024 Charlie Arehart