Changes for page Troubleshooting Frozen Deployed Instances
Last modified by David Avendasora on 2010/11/30 06:43
From version 2.1
edited by smmccraw
on 2007/07/08 10:33
Change comment:
There is no comment for this version
To version 7.1
edited by Jonathan 'Wolf' Rentzsch
on 2009/02/03 16:54
Change comment:
There is no comment for this version
Summary
-
Page properties (3 modified, 0 added, 0 removed)
Details
- Page properties
-
- Title
-
... ... @@ -1,1 +1,1 @@
1 - Programming__WebObjects-WebApplications-Deployment-Debugging Frozen Deployed Instances
1 +Web Applications-Deployment-Debugging Frozen Deployed Instances
- Author
-
... ... @@ -1,1 +1,1 @@
1 -XWiki.smmccraw
1 +XWiki.rentzsch
- Content
-
... ... @@ -1,10 +1,13 @@
1 -This article was written by Andrew Lindesay (http: ~/~/www.lindesay.co.nz) around February 2005. It first appeared as LaTeX PDF and has been transcribed into this Wiki. You use the information contained in this document at your own risk. Please contact the author if you feel there may have been an error in the conversion to Wiki markup.
1 +This article was written by Andrew Lindesay ([[http://www.lindesay.co.nz]]) around February 2005. It first appeared as LaTeX PDF and has been transcribed into this Wiki. You use the information contained in this document at your own risk. Please contact the author if you feel there may have been an error in the conversion to Wiki markup.
2 2
3 -= Applicability =
3 +|= Contents
4 +| {{toc style="disc"}}{{/toc}}
4 4
6 += Applicability =
7 +
5 5 The material discussed here has been used with WebObjects 5 and Java 1.4 on MacOS-X Server. It may or may not work on older or newer versions of WebObjects or Java. It is strongly suggested that you test this on a non-production server first. Note that there may be security issues with this technique if your system is exposed on the internet.
6 6
7 -= Introduction
10 += Introduction =
8 8
9 9 Java provides the substrate upon which WebObjects 5 applications operate. Java has its good and bad points. One good point is the ease with which one can achieve threaded operation of a software system. This can often be used to make the most of your hardware assets, but also means you need to ensure that things are locked to prevent two threads getting at the same thing at the same time. Locking failures can lead to data getting damaged or, when one thread won't let a lock go for some reason, other threads can be left waiting indefinitely.
10 10
... ... @@ -14,27 +14,26 @@
14 14
15 15 This article focuses on a technique that can be used to ascertain what is going on inside a frozen instance in a production environment.
In particular this technique will obtain for you a stack-trace of the instances' threads. Armed with this information, you are much better placed to diagnose the issue and fix it quickly.
16 16
17 -= Before you Start
20 += Before you Start =
18 18
19 -A WebObjects system which is deployed in the traditional manner consists of a number of copies of the program running separately, each carrying some of the inbound load from users. Each of these 'copies' is termed an instance. The ## computer code|SiteConfig.xml## file defines the instances. This configuration file is located at the following place in your MacOS-X Server's file system.
22 +A WebObjects system which is deployed in the traditional manner consists of a number of copies of the program running separately, each carrying some of the inbound load from users. Each of these 'copies' is termed an instance. The ##SiteConfig.xml## file defines the instances. This configuration file is located at the following place in your MacOS-X Server's file system.
20 20
21 -{{ code}}
24 +{{noformat}}
22 22
23 23 /Library/WebObjects/Configuration/SiteConfig.xml
24 24
25 -{{/ code}}
28 +{{/noformat}}
26 26
27 -Before you modify it, make a backup of the ## computer code|SiteConfig.xml## file in case anything goes wrong.
30 +Before you modify it, make a backup of the ##SiteConfig.xml## file in case anything goes wrong.
28 28
29 -= Setup
32 += Setup =
30 30
31 -The instances are modified such that they are able to be connected to remotely using the ## computer code|jdb## debugging tool. Some "additional arguments" need to be inserted into the configuration in the ##computer code|SiteConfig.xml## file for each instance in order to achieve this. These additional arguments are inserted as shown below in the text of the element ##computer code|additionalArgs##. You will need to choose a different address for each instance~-~-choose addresses from 8000 - 8999. This is a TCP/IP port.
Note that the additional arguments should all appear on one continuous line. The author has split this up here to improve readability.
34 +The instances are modified such that they are able to be connected to remotely using the ##jdb## debugging tool. Some "additional arguments" need to be inserted into the configuration in the ##SiteConfig.xml## file for each instance in order to achieve this. These additional arguments are inserted as shown below in the text of the element ##additionalArgs##. You will need to choose a different address for each instance - choose addresses from 8000 - 8999. This is a TCP/IP port. Note that the additional arguments should all appear on one continuous line. The author has split this up here to improve readability.
32 32
33 -{{code}}
36 +{{code value="xml"}}
34 34
35 35 ...
36 36 <instanceArray type="NSArray">
37 -{panel}
38 38 <element type="NSDictionary">
39 39 <id type="NSNumber">1</id>
40 40 <port type="NSNumber">2001</port>
... ... @@ -43,22 +43,27 @@
43 43 -Xdebug
44 44 -Xrunjdwp:transport=dt_socket,address=8121,server=y,suspend=n
45 45 </additionalArgs>
46 -{panel}
47 47 ...
48 -{panel}
49 49 </element>
50 -{panel}
51 51
52 52 {{/code}}
53 53
54 -The ## computer code|element## tag will repeat here for all the instances inside the ##computer code|instanceArray## tag.
53 +The ##element## tag will repeat here for all the instances inside the ##instanceArray## tag.
55 55
56 -The ## computer code|id## tag gives the instance number and you need to remember the mapping from the instance number to the ##computer code|address## in the additional arguments. Jot this information down on a piece of paper. For example, one can see above that the instance 1 is mapped to address 8121.
55 +The ##id## tag gives the instance number and you need to remember the mapping from the instance number to the ##address## in the additional arguments. Jot this information down on a piece of paper.
For example, one can see above that the instance 1 is mapped to address 8121.
57 57
58 58 Now restart your instances.
59 59
60 - =WhenSomethingGoesWrong...=
59 +Note that some JVMs on other platforms expect the "jdwp" to be specified as follows:
61 61
61 +{{noformat}}
62 +
63 +-agentlib:jdwp=transport=dt_socket,address=8121,server=y,suspend
64 +
65 +{{/noformat}}
66 +
67 += When Something Goes Wrong... =
68 +
62 62 The instances are listed in the JavaMonitor. A screenshot is shown below with the instance number circled in red. You need to first identify which instance has frozen.
63 63
64 64 [[image:Monitor-with-instances.gif]]
... ... @@ -65,39 +65,53 @@
65 65
66 66 Once you have the instance number, using your instance to address mapping from the setup, identify the address you want to connect to.
67 67
68 -Use the ## computer code|jdb## command line tool that comes with the java environment to connect to the instance and debug it. To do this, enter a command of the following form on the application server.
75 +Use the ##jdb## command line tool that comes with the java environment to connect to the instance and debug it. To do this, enter a command of the following form on the application server.
69 69
70 -{{ code}}
77 +{{noformat}}
71 71
72 72 fooserver$ jdb -attach 8121
73 73
74 -{{/ code}}
81 +{{/noformat}}
75 75
76 76 If you want to debug a remote machine you can use the following command.
77 77
78 -{{ code}}
85 +{{noformat}}
79 79
80 80 foodev$ jdb -attach woserverhost:8121
81 81
82 -{{/ code}}
89 +{{/noformat}}
83 83
84 -This may be useful in situations where your WebObjects application server does not have the ## computer code|jdb## tool installed and so you need to run the jdb tool from a host where the ##computer code|jdb## tool is installed.
91 +This may be useful in situations where your WebObjects application server does not have the ##jdb## tool installed and so you need to run the jdb tool from a host where the ##jdb## tool is installed.
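The instance-to-address mapping and the two forms of the attach command can also be kept in a small script rather than on paper. The following is a minimal sketch only: the table name `DEBUG_ADDRESSES`, the function `jdb_attach_command`, and the second table entry (instance 2 on 8122) are assumptions made for illustration. Only the `jdb -attach` form, the host name `woserverhost`, and the mapping of instance 1 to address 8121 come from the article.

```python
# Hypothetical instance-number -> debug address table, mirroring the mapping
# the article suggests jotting down. Only 1 -> 8121 comes from the article;
# the second entry is made up for illustration.
DEBUG_ADDRESSES = {1: 8121, 2: 8122}


def jdb_attach_command(instance_id, host=None):
    """Build the `jdb -attach` command line for a given instance number.

    Leave `host` out when running jdb on the application server itself,
    or pass a host name to attach from another machine.
    """
    if instance_id not in DEBUG_ADDRESSES:
        raise KeyError(f"no debug address recorded for instance {instance_id}")
    address = DEBUG_ADDRESSES[instance_id]
    target = f"{host}:{address}" if host else str(address)
    return ["jdb", "-attach", target]


if __name__ == "__main__":
    # Local attach, then remote attach, for instance 1.
    print(" ".join(jdb_attach_command(1)))
    print(" ".join(jdb_attach_command(1, host="woserverhost")))
```

The command is returned as an argument list so that it can be handed to a process launcher without any shell quoting.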
85 85
86 -You will now be using the java debugger. There are a slew of commands that can help you work with the debugged java system, but this article is just going to focus on getting the thread stack traces. Issue the command ## computer code|suspend## to freeze all of the threads so they can be dumped and then the command ##computer code|where all## in order to get all the stack traces of the threads. Finally when you wish to resume the threads again, issue the command ##computercode|resume##. You're advised to actually quit the ##computer code|jdb## environment as soon as you have the information you need.
93 +You will now be using the java debugger. There are a slew of commands that can help you work with the debugged java system, but this article is just going to focus on getting the thread stack traces. Issue the command ##suspend## to freeze all of the threads so they can be dumped and then the command ##where all## in order to get all the stack traces of the threads. Finally when you wish to resume the threads again, issue the command ##resume##. You're advised to actually quit the ##jdb## environment as soon as you have the information you need.
87 87
88 88 = What to Look For =
89 89
97 +##jdb##'s ##where all## command will give you a stack trace for all threads in your app. But your process first needs to be suspended in order to get a coherent stack trace. Use it like so:
98 +
99 +{{noformat}}
100 +
101 +> suspend
102 +All threads suspended.
103 +> where all
104 +...
105 +> resume
106 +All threads resumed.
107 +>
108 +
109 +{{/noformat}}
110 +
90 90 An example of a stack trace is shown below. You'll notice the java class name and source-code line number at the end of a particular entry in the stack. Here we can see that the thread called "WorkerThread103" is stuck trying to get a session from the session store. In this situation another thread will most likely have the session store locked and is not releasing the lock.
91 91
92 -{{ code}}
113 +{{noformat}}
93 93
94 94 WorkerThread103:
95 -{panel}
96 96 [1] java.lang.Object.wait (native method)
97 97 [2] java.lang.Object.wait (Object.java:429)
98 98 [3] com.webobjects.appserver.WOSessionStore.checkOutSessionWithID (WOSessionStore.java:207)
99 99 [4] com.webobjects.appserver.WOApplication.restoreSessionWithID (WOApplication.java:1,546)
100 - [5] com.webobjects.appserver._private.WOComponentRequestHandler._dispatchWithPreparedApplication (WOComponentRequestHandler.java:314)
120 + [5] com.webobjects.appserver._private.WOComponentRequestHandler._dispatchWithPreparedApplication
121 +(WOComponentRequestHandler.java:314)
101 101 [6] com.webobjects.appserver._private.WOComponentRequestHandler._handleRequest (WOComponentRequestHandler.java:358)
102 102 [7] com.webobjects.appserver._private.WOComponentRequestHandler.handleRequest (WOComponentRequestHandler.java:432)
103 103 [8] com.webobjects.appserver.WOApplication.dispatchRequest (WOApplication.java:1,306)
... ... @@ -105,18 +105,16 @@
105 105 [10] com.webobjects.appserver._private.WOWorkerThread.runOnce (WOWorkerThread.java:173)
106 106 [11] com.webobjects.appserver._private.WOWorkerThread.run (WOWorkerThread.java:254)
107 107 [12] java.lang.Thread.run (Thread.java:552)
108 - {panel}
129 +
109 109 WorkerThread101:
110 -{panel}
111 111 [1] java.net.PlainSocketImpl.accept (PlainSocketImpl.java:351)
112 112 [2] java.net.ServerSocket.implAccept (ServerSocket.java:448)
113 113 [3] java.net.ServerSocket.accept (ServerSocket.java:419)
114 114 [4] com.webobjects.appserver._private.WOWorkerThread.run (WOWorkerThread.java:238)
115 115 [5] java.lang.Thread.run (Thread.java:552)
116 -{panel}
117 117 ...
118 118
119 -{{/ code}}
138 +{{/noformat}}
120 120
121 121 If you have a look at what the other threads are doing at the same time, it is hopefully possible to ascertain what area might be at fault. At the very least, one can figure out what part of the application is at fault.
122 122
... ...
@@ -123,3 +123,7 @@
123 123 = Conclusion =
124 124
125 125 Despite the simplicity of this approach, it provides for a means by which you can find out what is going on inside frozen instances rather than playing laborious guessing games.
145 +
146 += Alternative Approaches =
147 +
148 +[[http://www.gvcsitemaker.com/gvc.webobjects/faq&mode=single&recordID=41413]]
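
When an instance has many worker threads, scanning a saved ##where all## dump by eye for the pattern described under "What to Look For" can be tedious. The following is a rough sketch of how such a dump might be searched. It assumes the dump has been saved as text in exactly the layout quoted in the article (a thread name followed by a colon, then numbered frames); the function names `parse_where_all` and `threads_blocked_in` are invented for this illustration and are not part of jdb or WebObjects.

```python
import re

# Layout assumed from the trace quoted in the article:
#   "WorkerThread103:"                      -> thread header
#   "  [3] pkg.Class.method (File.java:207)" -> numbered stack frame
THREAD_HEADER = re.compile(r"^(\S.*):\s*$")
FRAME = re.compile(r"^\s*\[(\d+)\]\s+(\S+)")


def parse_where_all(text):
    """Return {thread name: [method of frame 1, method of frame 2, ...]}."""
    threads = {}
    current = None
    for line in text.splitlines():
        frame = FRAME.match(line)
        if frame and current is not None:
            threads[current].append(frame.group(2))
            continue
        header = THREAD_HEADER.match(line)
        if header:
            current = header.group(1)
            threads[current] = []
    return threads


def threads_blocked_in(threads, method):
    """Names of threads whose top frame is Object.wait and whose stack contains `method`."""
    return sorted(
        name
        for name, frames in threads.items()
        if frames and frames[0] == "java.lang.Object.wait" and method in frames
    )


SAMPLE = """\
WorkerThread103:
  [1] java.lang.Object.wait (native method)
  [2] java.lang.Object.wait (Object.java:429)
  [3] com.webobjects.appserver.WOSessionStore.checkOutSessionWithID (WOSessionStore.java:207)
WorkerThread101:
  [1] java.net.PlainSocketImpl.accept (PlainSocketImpl.java:351)
  [2] java.net.ServerSocket.implAccept (ServerSocket.java:448)
"""

if __name__ == "__main__":
    stuck = threads_blocked_in(
        parse_where_all(SAMPLE),
        "com.webobjects.appserver.WOSessionStore.checkOutSessionWithID",
    )
    print(stuck)
```

A thread reported here is only a candidate: as the article notes, the thread actually holding the session store lock is usually a different one, so the remaining stacks still need to be read.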