CF911: Lies, Damned Lies, and CF Request Timeouts...What You May Not Realize
Note: This blog post is from 2010. Some content may be outdated--though not necessarily. Same with links and subsequent comments from myself or others. Corrections are welcome, in the comments. And I may revise the content as necessary.How often have you seen (or seen others complain of getting) a CF page running longer than it's "supposed to" by a timeout you have set. Maybe you've set the CF Admin "request timeout" (first setting on first page of the Admin), or used the cfsetting requesttimeout tag or the timeout attribute on some specific tag if it's available, trying to get the request to "end" in 60 seconds, and yet you see a request running for 3 minutes, 3 hours, or 3 days! How can that happen?
Or same with if you've set the request to timeout using an alerting feature in a CF monitor like CF Enterprise server monitor, FusionReactor, or SeeFusion.
And perhaps you've seen this error from ColdFusion, in your logs or on-screen:
Do you know what this means? It's usually not what you think, and it may appear as I said 3 hours after a request was "supposed to timeout" in 60 seconds. I've even seen experienced CF developers who get thrown by this challenge. It's not new (and for those reading this even in the CF2016 era, it still happens). And it's not so much a "bug" (in either CF or the monitor tools) but just a situation that you need to understand, and there can be some ways to resolve things.
In this entry I'll try to help explain this surprisingly common problem and I hope to correct some equally common misconceptions. I'll even contend that the info in this error message is often useless and indeed misleading (and therefore the feature producing it ought not be relied upon completely, and should perhaps even be turned off for many). More important, again, there may be a way to "really" kill such a long-running request. Along the way, I'll share some things that I've not seen documented elsewhere.
I also share a solution that may work for query processing but it's NOT about a tag attribute but rather a CF Admin setting in the datasource "advanced settings" to set a "query timeout". This was added in CF 9, but many never noticed. If that's your problem and you want to skip to more on that here, feel free. but you may want to come back and read the rest as it is STILL not a perfect solution.
Strap on your seatbelts. We're going for a bit of a ride (if this situation was easy to understood in the length of a tweet, then perhaps everyone would already understand it and not find it challenging!) As always, I welcome feedback.
A) What the error does and does not mean, typically
So people are often mystified by an error like the above, wondering, "Why in the heck would a CFOUTPUT take a long time?"
Or perhaps they're a little more savvy as to what's happening, and they assume, "No, it's just that the CF timeout time was reached when it got to the CFOUTPUT". That could be.
Sadly though, in most cases, neither is what has happened. CF is usually NOT reporting that "here where the app timed out".
"OK, smarty-pants. What does the error really mean? Are you saying that CF is lying to me?" Well, often, yes, I'm afraid so, but it's not something nefarious.
Rather, what's more typically the explanation is that some previous activity in the page/request, such as a CFQUERY, CFHTTP, invocation of a web service, or the like is what really took a "long time".
If it was this which caused the request to exceed the timeout (either as defined in the CF Admin Settings page, or using CFSETTING RequestTimeout, or a Timeout attribute on a tag), you'd of course expect CF to report it then and there. The problem is that, often, it cannot report it "right then". And it's not its fault.
B) There are simply some operations which CF/the JVM cannot interrupt
So the crux of the problem is that CF (and the JVM) cannot interrupt a request while it's processing what's called a "native method". That is quite typically the mode that a request is in while it's waiting for a reply from a CFQUERY, CFHTTP, and so on. These operations talk to something outside of CF (like a database with CFQUERY, or another server with CFHTTP or web service call--which could even be requesting a page from the same CF instance, but technically the underlying Java httpclient process doesn't know that.) It could also happen with file or network operations.
So the request will wait for this long-running operation to finish. It can't stop it, not with the CF Admin request timeout, not with CFSETTING RequestTimeout, not with the kill features in the CF Server Monitor, FusionReactor, and SeeFusion. Nothing. It's like the Anti-Terminator: "it absolutely will not stop" (can't be terminated) until its task is completed.
C) So what happens when the long-running operation finishes? Is that when the request times out? An example, and more
"Ok, I got it. The long-running operation (CFQUERY, CFHTTP, whatever) will not stop until it's finished. What happens then?"
Well, you see, that's where the confusion comes in. Let's use an example to make things crystal clear.
Say that the CF admin timeout is 60 seconds (not at all uncommon), or perhaps you set the timeout to 60 for a given template using CFSETTING. Anyway, let's say that the request in question gets 2 seconds into processing when it starts running a long-running query (for example). Let's say that query then takes 75 seconds. When the query is done, the request has now run for 77 seconds, which is 17 seconds beyond the timeout time.
We already know that it won't stop the CFQUERY itself (at least until the query is finished). But guess what: it also will NOT report that the timeout has been exceeded on that line (whatever it was, CFQUERY, CFHTTP, etc.) From my experience, CF doesn't check the time against the timeout at the end of operations, but rather at the beginning.
So instead, it will proceed to the next line of code. You'd think, "ok, then, it will stop on whatever is the next line of code and give you the error there, right?" Sadly, not necessarily, and it only adds to the confusion of the timeout message.
CF checks the time at the start of the next operation, but sadly only on SOME tags
So it's bad enough that it won't report the error on the tag that DID run long. Instead, we saw that it will proceed to the next tag/function. But curiously (tragically), CF will often NOT stop on THE next line of code.
Instead, I've observed that it only seems to check the time (against the timeout) at the beginning of CERTAIN tags, such as CFOUTPUT, CFLOOP, CFQUERY, and so on. Yes, I'm saying that I've confirmed that it will skip over various other tags (such as CFSET, or CFSCRIPT code, and more). I've not yet found any documentation as to the details of this.
So this is where the error gets confusing
So the bottom line is that not only does the the request NOT stop on the tag/function that took a long time, it doesn't even stop on "the next line" after that, which can make things all the more confusing/challenging to resolve.
Indeed, this is why you often see the error reporting as having occurred on a tag other than what was really the problem, and why you also can't just look at whatever was *the* line of code preceding that.
So what can you do with this information?
I don't mean to paint an entirely bleak picture. All is not lost. It's just a little more challenging than it should be.
At least first of all you can now know that when you see this error, you should NOT assume that it's reporting the line that really caused the problem. You can and should consider whether some earlier operation in the code could have taken a long time. In my experience, this is usually the situation.
I'll talk in a moment about some other tools that can help you understand where the time is really being taken. First, I do want to offer a clarification, lest anyone read my meaning too literally.
Are you saying the error message is always lying?
Well, no. You'll notice that I peppered my opening paragraphs with "usually", because it's certainly possible that a request could indeed be stopped on the very line that DID exceed the timeout.
Consider in our example that if the long-running query had run for only 57 seconds. Now, since it had taken 2 seconds before that, it now is one second short of timing out. Let's say the request then proceeds to loop over a query resultset or do some other operations that might take it a couple more seconds. When it does finally exceed the timeout, it may well happen right on the very tag that CF Reports as having "crossed" the timeout time.
But given the problem of how it only reports that on some tags (and not all), it could still be in this situation that it reports the wrong line of code. Just consider all the above as you evaluate what to make of the situation.
D) So how can I know what tag did take a long time? either while the request is running or after the fact
So we know we have a long-running request. We understand now that its some tag or function or statement or other feature that is somehow itself hung up (typically).
So how can you know what specifically CF is waiting for? or perhaps why it DID take a long time, if it finished in the past? This is a bit more challenging. The good news is that there are tools that can help, including the CF Enterprise Server Monitor, FusionReactor, and SeeFusion.
Let's focus first on using these tools to catch requests while they're still running, which could be valuable if your server is hanging up because of some long-running requests. Then we'll talk about using the tools to capture the same information and make it available by email to review later.
FusionReactor and SeeFusion might easily show what a request is waiting for
Both FusionReactor and SeeFusion let you drill into a given request while it's running to see some details about it.
(Update: In the case of FusionReactor, it automatically monitors each CF datasource as of FR 5. SeeFusion requires configuration of each datasource to be monitored, even as of SF 5.1)
And with each, you can show you clearly if a given request is running a query. Since that's about the most common cause of hung requests, just having that information alone is valuable. And both tools also keep a history of recent slow requests, so you can also use these to see why some recent slow requests were slow.
[Update: later releases of FusionReactor also track separately the time spent waiting for a CFHTTP as well as some other things. See the "relations" tab within a request's details, whether while it's running or afterward.)
While the CF Server Monitor can also show if a request is running a query, it only does that if you turn on "start profiling". I will talk later here why that's generally not a good idea for busy sites, and it's indeed off by default. The FR and SF means of doing such query monitoring is far less impactful by default.
Shine a spotlight on what the request is doing AT A GIVEN MOMENT, via stack tracing
But sometimes these tools will help you see that the reason a request is running slow is NOT because of a query (or cfhttp either in the case of FR).
In that case, and in all three tools, it IS possible to identify exactly what a request is doing at a point in time, by "stack tracing" that request.
This is actually a feature built-into the JVM, which is exposed easily by these tools, but missed entirely by many. More than that, some misunderstand stack tracing as something only shown at the bottom of error pages. (That is indeed a stack trace, but it's not nearly as useful as what I'm referring to here, which is for getting information on request while they're running, not when they have had an error.)
Stack tracing a running request will allow you to see exactly what line of CFML (if any) is running at that very moment, which again can be vital for resolving problems of long-running requests.
Stack tracing in CF Enterprise Server Monitor
First, if you run CF Enterprise (8 or 9), you can use the CF Server Monitor to watch requests while running and see more details about them. If you use the available "start monitoring" button, you can see what requests are running in "Active Requests". Further, if you enable "start profiling", then if you double-click a running request, you can see in the middle of the next page a "stack trace", which shows the exact line of code that was executing at the time you double-clicked the request.
(Again, yes, I'm aware of the potential overhead of using various features the Server Monitor, though some people do over-state it in my experience. I'll point to other resources I've done on the Monitor, where I discuss its pros and cons, in a moment.)
Of course, viewing that stack trace at a random point in time during the life of a request could well mean simply that you'd see it executing just any random line, where perhaps a millisecond later CF will have moved on to another. The key is to refresh the stack trace, to see if CF indeed HAS moved on to a new line. If not, that line would be a smoking gun to investigate. Sadly the refresh icon in the Monitor doesn't update the details while viewing a running request. You need to go back to the list of active requests, open the request again, and repeat your observation.
Stack tracing with FusionReactor and SeeFusion
Fortunately, tools like FusionReactor and SeeFusion make that refresh a lot easier, to obtain a stack trace while request is running. Each offers a button to take a stack trace of a running request. From the page they show you can usually determine the line of CFML code that's running (they each offer a little more stack trace detail than the CF Server Monitor does, but I'll point you soon to a resource to help you better understand them.)
More important, each of these tools offer a refresh button to refresh the stack trace, so that you can properly determine if the line that's executing has changed while you're refreshing.
That said, I will note that FusionReactor offers an important advantage with respect to that refresh operation: it ties the stack trace display to the specific CFML page that was being viewed (in Running Requests page) when you selected it. So if that request ends while you're looking at its stack trace, and you refresh it, FusionReactor will report that it's finished.
SeeFusion, on the other hand, would not. It knows only the thread id on which the requested page was running, so that if the request ends and you refresh the stack trace, it only knows to refresh the stack trace for whatever request is running on that thread. It can't (and won't) tell you if the given request has in fact ended, so you could now be looking at a new (and different) request, which could be quite confusing in this situation. It's incumbent upon you to notice (when using SeeFusion) that the stack trace you see in indeed for the same request you started with. (FR gives each request its own internal request id, which is how it avoids that problem.)
Catching running requests details when you're not watching the monitor tools
Of course, it's only possible to use the stack tracing features above if you can be on the server running the monitor tools when the problem occurs, right?
Well, not exactly: all three tools offer features to watch for a long-running request which can then send you notification by email of the details that would include a thread dump, which is a stack trace of all running requests.
In the CF Server Monitor, these are called Alerts. FusionReactor refers to them as Crash Protection notifications, and SeeFusion refers to this as "Active Monitoring Rules". See the documentation for each tool to find more information.
Learning more about stack tracking and the monitor tools
For more on all this, I discuss the idea of taking stack traces and thread dumps (which is a list of all stack traces for all current threads) in another blog entry.
I also discuss the CF Server Monitor, FusionReactor, and SeeFusion in several blog entries. The links just used are to the respective categories about each here in my blog. I've also discussed these topics (monitoring, stack tracing, and more) in various articles and presentations I've done.
The step debugger
Some may point out that you can also get an idea of the time spent on any tag/function within a request if you use the interactive Step Debugger (whether that built into CFBuilder or the commercial FusionDebug alternative [update:, or as available via the web UI in FR 6 Ultimate). As you step through the code, it would be clear if you got "hung up" on a line, though I don't know that I'd favor this as a solution here. Still, I've discussed these also in various blog entries, articles, and presentations.
(Sadly, you can't rely on the typical end-of-page debugging output, as enabled in the CF Admin, because that output is only shown if the request completes. We're referring here to pages that end in error.)
E) Can't I force CF to timeout some specific tags?
Again, in my experience (as I focus on CF server troubleshooting as a consultant), the root cause of problems in most "long-running" requests in CF is that some one tag or function is running long.
So could we perhaps force CF to timeout that specific tag? Well, yes and no.
You can, in fact, (and should) consider whether the tag in question might have its own TIMEOUT attribute or feature (and whether it will really help, as I'll explain.) Let's look at each of them.
Setting Timeout on CFHTTP, CFINVOKE, and others
There is indeed a TIMEOUT attribute on CFHTTP. Unfortunately, it won't ALWAYS keep the operation from exceeding that timeout. I've not quite put my finger on it (just haven't experimented completely), but if I had to guess, I'd say that it could be that if the operation is in the midst of returning data (from the server to CF), then it could perhaps time it out, whereas if it's waiting for the output then it may not be able to. Anyone know for sure?
There is also a TIMEOUT on CFINVOKE (for use when calling web services). Curiously, though, there is no TIMEOUT for use with CFOBJECT when calling web services (try it, it won't work, and none is documented). More curious still is that there IS a timeout available for use with createObject() (when calling a web service), though only by way of an argstruct argument that's new in CF8, which I have blogged about. Note as well that, according to the docs, that only times out the process of obtaining the WSDL, not the execution of any method in the web service.
There are also timeouts on various other operations that talk to something outside of CF (cfmail, cfftp on open/close operations, cfldap, cfpop, cffeed), though again it seems reasonable to expect that these may not always honor the timeout at the exact time given, as discussed above.
Setting Timeout on CFQuery
What about the elephant in the room, CFQUERY? Well, yes, it does have a TIMEOUT attribute, but many have found that it often does not timeout the query. Like the CFHTTP, I wonder if it may be a question of whether it's waiting for output (which likely can't be interrupted) or starting to receive it (which likely can be).
I will note that there's some promise in this regard, though for now not from CFML itself, but rather from the updated JDBC drivers in CF9.
And it is enabled via the addition of a new timeout option in the CF Admin Datasource Advanced Settings page. You'll see that there is a new "query timeout" option that was not in CF before 9. I have blogged about it in more detail.
It's not perfect: first, note that it's only available for the CF-provided db drivers (which are themselves licensed from DataDirect, which added the option, and the CF admin exposes it as setting.) Second, people are reporting different experiences with it (see the comments in the blog entry), and note (more important) that for now there seems no corresponding connection between this and the CFQUERY TIMEOUT attribute. (As I note there, I have raised a bug about this.) Still, it may be better than nothing and could help many, if you're on CF9 or above.
F) So is there really nothing I can do for the hung requests?
OK, so we've explained why the requests don't timeout, often because they're talking to some remote process that is not responding. But what CAN you do when you're in this boat? Well, other than trying to add timeouts to the code as discussed above, generally nothing, at least for the requests that are already running.
And certainly a restart of CF will kill them off, or at least stop CF trying to talk to the remote process. (Of course, it's possible that upon restart, new requests will come in and try to connect to the same non-responsive or slow-responding remote process, so it could come right back.)
Stop the request on the remote server
But while you can't do much from WITHIN CF for these hung requests, there's one other way you may be able to stop the madness: stop the request on the remote server.
Once you can determine exactly what tag it is that's hung up (with the stack tracing tools above), you could then target whatever it was waiting for: the database server, a remote page called via CFHTTP, an exchange server using CFLDAP, etc.
Since the tools that let you stack trace the running request also show you the time the request started, you could use that info to go to the administrator of whatever service you're calling and ask if THEY may be able to kill the request. As soon as what you're waiting for stops, the CF request will continue. (Of course, it may only continue for a few milliseconds before it will be timed out by CF, as I discussed above, which is why I'm no fan of the CF request timeout feature, and think it should be turned off. More on that in a moment.)
Beware: you may not always find the remote server still "hung up"
Back to this issue of finding and killing the remote process that CF may be waiting for (that's causing your hung request), I should note that there may be times when you would go to the remote administrator and say, "look, I have this long-running CF requests that's waiting for this process (query, ldap request, web page, etc.) that is waiting forever for something that is running long on your server". And they may look and see nothing on their ends that's running long. Doh!
Yep, it can happen, for various reasons, so just be sensitive to this. You may really then have no way at all to kill the hung request. But note, again, that you may be able to use this observation to do something more to prevent the problem in the future, perhaps on the remote server side.
For instance, I've heard some describe problems where CFQUERY processing has hung talking to an Oracle database and (if I've got it right) the problem is an inconsistency between the CF datasource connection timeout and Oracle's "session" timeout. If anyone has more details on that, please do share.
But my point is simply that CF may be "waiting" for a call that will never be answered and can't be terminated from the other end. Again, in such cases, you can only kill them by restarting CF, and then you need to investigate how/why the call to the remote server are getting hung up in the first place. That's where logging information for diagnostic purposes may really come in handy, as is discussed next.
Logging what CF is getting hung up, to show to the remote administrator
If this problem (of calls to remote servers that take too long or get hung up) is happening often, and/or you can't always be logged in to see when it's happening using the tools above, another idea is to log for yourself whenever you make such a call to a remote server (that you know tends to hang up), such as putting a CFLOG statement before and after the CFHTTP, CFLDAP, CFQUERY, etc.
At least then you'll be able to see when it does and doesn't take a long time. The log would also help you by showing when it logs a start but no stop.
You could also code it so that it only logs when it's slow, but being able to confirm that it's generally fast and only sometimes slow may be itself useful diagnostic info.
Note that CF 9.0.1 by default adds new logging that does automatically log the start and end of calls to cfhttp, cffeed, and more to corresponding new logs (http.log, feed.log, etc.), which could also help.
Finally, as for logging the queries, you can get that from FusionReactor and SeeFusion automatically, as their "jdbc wrapper" features allow you to log every query (or optionally only those slower than a certain time). There is also a new "log activity" feature in the "advanced settings" of a CF datasource definition that could also log DB activity, though it is quite verbose and a tad unwieldy (not one line/row per query like the other two tools).
G) Bottom line: I'm no fan of request timeout features
So all that said, I'll repeat and clarify that I'm no fan of request timeout features, not that in the CF Admin, nor that offered in CF monitoring tools that offer to "kill requests" automatically, like the CF Server Monitor Alerts, FusionReactor's crash protection, and SeeFusion's active monitoring rules. I don't think they should be used, personally.
Let me be clear: I do love those tools and use them and help people use them daily. And I do love and highly recommend the features in those tools for sending you *alerts* when requests exceed a given time. What I don't like is them trying to kill them automatically, for all the reasons I outlined above. So I tell clients to turn off the "timeout requests" feature (though it does still make sense to use TIMEOUT attributes on certain tags, or may make sense to implement the CFSETTING RequestTimeOut on some page where you know that the reason it runs long is not one of these things that can't be killed anyway.)
Instead, I recommend (and help my clients daily) to use the alert info (from the CF Server Monitor, or FR or SF) to be notified if/when requests ARE taking too long--and NOT to kill them. Note that these tools all send the notice *as soon as* the request takes too long (whatever time you set), whereas CF's "log slow requests" feature only logs when requests end--and that's only IF they do end without failing.
So yes, get notified that requests are taking too long. Use the info in the alerts, which includes the stack trace info I discuss above. Do find and resolve the problem. Don't rely on (or in my opinion even use) auto-kill features, when in fact they nearly never are able to kill really problematic requests anyway.
Yes, yes, I do realize that there are some requests that CAN be interrupted by these timeout/kill features, but I'll assert that such requests are far less commonly the cause of any serious problems. Your mileage may vary, of course. But I make my statement based on several hundred instances of helping folks solve typical CF server problems.
So why is the "timeout requests" setting there?
One last thought worth considering: someone might reasonably ask, "Charlie, why are you such a hater of the setting? If Adobe has it there, it must be for a good reason."
Here's what I'd say to that: sure, when CF originally ran on C++ (prior to CF 6), perhaps this setting could be reasonably relied upon to ensure that requests would not run any longer than the set time. (I don't recall, but perhaps even then there may have been at least SOME tags that it couldn't interrupt.) But clearly since CF 6, in the Java model, this is no longer the case.
And yet if you read the Admin page, or its help, or the docs, or the comments from nearly anyone who considers the setting, the presumption is that this WILL stop requests from running longer than the x number of seconds indicated.
Why am I so impassioned/manic about this?
I hope I've made clear in this entry why I think that's not only wrong to conclude (in nearly all cases), but worse it sets up a tragic misconception of how CF works. If you think this should and will stop long requests (or that the alert features of the monitors will kill them), then you're going to be in for a shock when requests do hang up for an extended period of time. What are the implications?
- You may totally under-estimate how many simultaneous request threads you should enable.
- You may never pay attention to tools like CFSTAT or jrun metrics (to observe at least "how many requests are running" at any given time), which will help you see if/when requests are hung.
- You may never bother to learn how to use the CF Server Monitor (or FR or SF), all of which can go still further and show not just how many requests are running (possibly hung) but a) how long, b) what the URL is, c) what the IP address is, and so much more, which can help you find and resolve problems.
- You may never bother to learn how to do the stack tracing that I discuss above, which is often vital to understanding where and why any given request is hung (or was at the time an alert was thrown)
- You may never bother to analyze logs that show the activity patterns (how many requests are running at periodic intervals, such as the FusionReactor "resource log" reports.) It's really THAT information that is vital to your understanding what to set for your simultaneous requests setting.
- and so on
All of this info (and understanding) is VITAL to a very important and common class of CF server troubleshooting: why is CF up but not responding? It may be that requests are hung.
But if you assume, "well, they can't be running any more than x seconds", then you'll start to think "so it must be something else", and you figure you may as well just restart CF. Or you start reading about how someone suggests you change your JVM settings (which may have NOTHING TO DO with this problem, and not only not solve it but could cause new ones), and so on.
Again, I see this all the time. I hope by this entry to have helped avoid some of the very common misunderstandings on this subject that I frequently see either on lists, or in emails to me, or in my consulting engagements. If I seem passionate about it, it's because I am. Same with the memory issues I discuss in the related and similarly titled entry, CF911: Lies, damned lies, and when memory problems not be at all what they seem, Part 1.
H) Need More Help?
I mentioned above that I provide CF Server Troubleshooting consulting. If you need some help understanding how to apply the information above to your specific problem (or need help with any CF server, or CFBuilder, problem), I'm happy to help.
I don't need to come on-site, nor do you need to give me remote access. Instead, we can work easily and securely right over the web using Adobe Connect.
And I don't have any minimum time-block requirement--and I even offer a satisfaction guarantee. To learn more, including rate plans, see my consulting page. (I hope some will forgive this brief commercial here. I don't generally mention it, but since some say that they didn't know I offer such services, it seemed an appropriate point to mention it.)
So phew, another really long blog entry. But I hope it may help some people (and help some who help others).
As always, I welcome your feedback, corrections, additions, etc. Really, I ask for your feedback. If it helped, please say so. My blog doesn't get the traffic of many others. I often see that hundreds of people have read things, but few ever comment. I can't know if it's that I've answered every question (I can hope so), or that you weren't impressed. Like the guy said in Dirty Harry, "I gots to know". :-) Sometimes, all it takes is a few people to "prime the pump" and start commenting to lead others to do so. Why not grab the handle? :-) And if you think this would be helpful info for others, please do share it (tweet about it, mention it on mailing lists/forums when you see the problem raised, etc.)
I'm planning to better organize and package CF server troubleshooting resources (mine and others). We have a lot of great info out there for those solving CF problems. It can just be a challenge to sort through it all. I hope to help solve that. Look for more news to come on that front in time.
For more content like this from Charlie Arehart:
Need more help with problems?
- Signup to get his blog posts by email:
- Follow his blog RSS feed
- View the rest of his blog posts
- View his blog posts on the Adobe CF portal
- If you may prefer direct help, rather than digging around here/elsewhere or via comments, he can help via his online consulting services
- See that page for more on how he can help a) over the web, safely and securely, b) usually very quickly, c) teaching you along the way, and d) with satisfaction guaranteed
I'm one of those people who read all the time but rarely comment :)
@Matt, again, I'd say that it MAY be possible that a TIMEOUT could stop it (whether at the admin, page, or tag level), but again none are guaranteed to work, from my experience, depending on what state the tag hangs in.
As you note, the best "solution" then is again to determine what's causing the slow response from the thing you're calling. In your case of a web service, then yes it may be possible to get the provider to offer their own timeout on their end. Sometimes we have to think outside the (server) box. :-)
I told you in person Tuesday, but let me get this in text. I really appreciate that you wrote this, it is an amazing read and I really learned something out of it, something that has puzzled me since about 2002. Great job, thank you!
(timeout fog has lifted)
Hope I might get the same reaction from you (and others) from my new series, on when memory errors may not be what they seem. :-)
I keep coming back to this post and its follow-up.
These are essential, precious, edifying pieces of writing, and the CF community owes you a big deal for your exceptional contributions.
There, I said it.
<cfsetting requesttimeout="#(ceiling(request.errorTickCount / 1000) + 3)#" />
Thanks again, this article was a life saver. My apologies if I reiterated what others have expressed elsewhere.
Personally, I don't like using request timeouts at all, preferring instead to have an alert mechanism tell me when requests run unexpectedly long, to include a stack trace to understand why it's running long, as I talked about in the entry. :-)
But sure, that idea may help some, so thanks for sharing, and thanks for the kind regards. It's always great to know if/when something I write helps someone. Cheers.
See the discussion there for more, including more clarifications that may be posted after I write this comment, which may offer better explanations for what was observed:
i have thru this post quite a few times! Thanks for sharing it!
I also face timeout issues on my cfhttp. The funny thing is the first request takes a long time to process while subsequent ones are faster. Any ideas, why it shud be so?
@Shailaja, sorry that I somehow missed your question there. Did you ever resolve it? If so, or if you still need help, write back and we'll try again.
The issue with coldfusion, this is how I understand it, is that cf does not know when a third party request is going to respond at all. Plus it is not timing the request so therefore it will never determine if it needs to give up on the request or close (however you want to put it).
My environment at work is hurting big time right now because our sql server is hanging up on third party requests (flat file db). Its horrible, but the jist of this is that when sql hangs up and coldfusion cfquery requests hang up I can't kill those request. So those requests continue to run until I restart cf service. Horrible, right?
Very interesting read. Very useful. You're the closest so far to help me debug the situation I'm in.
I'm stuck with an issue that's been baffling me for months. I have 6 servers (2 x double core CPUs each), 3 CF nodes on each. When I monitor using FR, all is well. Traffic is quite high every day, though I could not tell you exact numbers. External requests are handled by a load balancer making checks on every nodes, secured and unsecured, every second - these requests take usually 16 ms.
Some basic settings, same for all nodes:
* Maximum number of simultaneous Template requests = 10
* Timeout Requests after ( seconds ) = 120
* Caching = active
Normally, everything works just fine. The sky is blue, no rain.
But every once in a while, I see in FR that requests pile up, nodes go red, all of them, and traffic gets to a standstill if responsive at all. CPU and Memory per node, however, are fine.
I checked the scheduled tasks on the site's "admin" node (an additional server with two CF nodes that is used by the site admin to run recurrent, back end processes, like uploading images for the front end nodes to use, run data imports and exports, etc.). These nodes run tasks quite frequently, but I was unable to link particular outage times with particular tasks. We expect slowness and potential downtimes at night during the peak of these admin tasks, like inventory refresh, catalog updates, order processing, etc, but the outages happen during day time, when only the most trivial of these tasks happen. This is what concerns me.
I have not yet ruled these recurrent CF tasks (of varying DB-intensity impact) out, but I want to explore other possibilities, since mapping the outages with the tasks is inconclusive.
I've checked the single DB that supplies these nodes, and it exhibits no issues (so far as what my DBAs tell me). Plus, after the tide has receded, I check the longest requests in FR and I see that for some pages taking >1 min (my timeout is 120 sec, so the requests DO proceed), the JDBC (MS SQL) shows that it took only 1 sec.
So where is the time waste? The files are located on a NAS server, so I'm also pursuing that path to see if I have network or I/O latencies. But I wish there'd be a way that FR of the CFAdmin can put a time on (almost) all operations, so that I can zoom in on the culprit. If there's a way, please assist.
What is puzzling me is that when things work fine, the very same pages can load below 2 secs. When hell breaks loose, same page takes >1 min.
What would help me is a rundown list of suspects so I can investigate every one of them and dispatch some to my colleages to check
* DB - for long running requests
* Network - for latencies connecting to DB or NAS
* NAS - I/O issues, responsiveness
* Thread limit reached - and therefore, queued
I apologize for the long post. We can communicated via email if you desire, but I don't mind it to be open so that it could benefit others stuck with a similar issue as me.
I do find it odd that you say you have and use FusionReactor, but you say you can't tell the "exact numbers" for your traffic. That' should be pretty apparent in FR and especially in its logs, if you need to know the numbers over time. I could help you find and understand that info within FR better, if needed.
More important (and sad to hear), you say also that you see requests pileup in FR, with "nodes going red" (by which I assume you mean the FR Enterprise Dashboard feature), but then you say you do not see how to link outages/hangups to "particular tasks", by which I assume you mean requests.
I say "sad to hear", because the info should really be very apparent to you.
So from what you say, I suspect you are maybe *only* using the FR Enterprise Dashboard (ED). But note that is just one part of what FR does for you. Did you know that if you click on any of the many links offered within that dashboard for an instance in question, you will be taken to the "real" FR interface for that instance?
(And many people who use FR might never even use the ED but just launch the FR interface for their monitored CF instance directly, typically as port 8088.)
Now, don't get me wrong: the ED is indeed useful, especially for watching many instances at once. But there's nothing it shows that can't be seen within the specific FR interface for an instance. And the FR interface for an instance can in fact show LOTs more info.
To your issue, note that specifically you can go to its requests>activity page and see EXACTLY what requests are running: what URL, for how long, from what IP address. And you can then get details like how much time is query time, total, then for each query (the sql, its duration, etc.) It's amazing, magical stuff if you've never seen it before.
I really think your biggest challenge (and opportunity) is simply leveraging the great info you have in FR, which is indeed a wonderful tool, and about which I have written many times here and presented for years (see other links here on my site to find such resources).
I also mentioned above the FR logs. Those too are fabulous, to see what's going on especially BEFORE a crash or hangup that leads you to restart, because as you know, once CF restarts, FR cannot show you anything about the instance before that.
Of course, no one wants to sit around watching any monitor all day, so the next most important thing is to have FR setup to have its Crash Protection alerts send you an email whenever there is a significant problem. The best alert, in my mind, is the "request quantity alert", which can be set, for instance, so that the next time you have 10 requests running for 10 seconds, you get an email again listing all the details for all running requests: their url, the ip address, duration, and more.
Finally, whether you are viewing a currently running request in the FR interface, or viewing an email with details of requests running at the time of a hangup, the last and perhaps most vital feature to leverage is the stack trace feature, which shows for each request EXACTLY what it's doing at a point in time. With it, you can know, for instance, that all the hung requests are waiting for a database query, or a cfhttp call, or a cflock, and so on.
Bottom line: you need not suffer along, guessing (or googling) about what might be the problem when CF has issues. With the right tool and techniques, and connecting the right dots, you (or especially I, as I do it daily with folks) can generally solve any problem in less than an hour (once we have the needed diagnostics).
I hope that's helpful. Let me know if you have follow-up questions, though if they're not in the guise of this particular blog entry, it may be best to send me an email directly. My contact info is offered via a link on the right of my blog and main site.
I am encountering a different sort of "lie".
CF 9.02 Enterprise. I have a cfc function (remotely called via an Ajax request) to upload a file, parse it and insert in the db.
On one server it times out at 60 seconds.
I have updated the timeout setting sin CFAdmin to 90 seconds, restarted the server, no luck it still times out at 60.
I have updated the time out in Application.cfc same result.
I have added <cfsetting requesttimeout="90"> at the top of the function that is doing the process and still times out after 60.
What gives? Is there a hidden place that I'm missing that overwrites the time out setting?
First, can you confirm that the timeout is specifically the one from CF, which says "The request has exceeded the allowable time limit Tag: xxx"? If not, then the timeout may be from something other than CF.
Second, if you DO get that message from CF, then it could be that you have a timeout attribute on a particular tag (which would override the page, application, or server-level timeout).
In the above CF timeout message (assuming you are getting it), what is the tag reported? As I noted in the blog entry above, that doesn't tell you it WAS that tag, but then you could look to see what preceded that tag (unless there are many of them) and perhaps be able to tell what was taking a long time. It could be THAT thing (whatever it is) that has some timeout on it.
Third, if that approach doesn't work (and it could be a challenge if you have lots of code or perhaps even include files), you could just search the codebase (all cfm and cfc files) to see if you have the word timeout anywhere, and then maybe you could find one set that you did not expect.
Fourth and finally, if all that failed or was too challenging, you could watch the request running in a tool like FusionReactor (available as a free 14-day trial) where you could do a stack trace of the request while it's running. That would tell you the line of code running at that point. If you did that a few times while the request was running, you'd likely find it was running the same tag for many seconds, and then you could see if THAT tag had a timeout on it.
Sounds like an interesting challenge. Let us know what you find (or if you may like consulting assistance to solve this, which shouldn't take very long at all. See the consulting page on carehart.org).
Thank you for the quick response. I will investigate more based on the info you have provided and will let you know how it goes.
If the problem may be that the remote service you're calling is just not responding quickly in just a simple attempt to reach it via TCP/IP, he points out how you could use a java-based tcp socket connection test (which you could call easily within CFML) with a short timeout (since java supports it for that particular method), so you can see if the remote service is at least responding rather quickly initially.
It doesn't help in situations where, though it connects quickly, the thing you're asking for then takes a long time *on that server* (as could happen with a CFHTTP to a long-running page, or a CFQUERY to a long-running query).
Still, in case it may help someone, I wanted to point it out here:
Update: Since I posted that comment, the blog is gone but you can still find the post in the web archive: