Chuck Hill: [0:03] The next section is WebObjects optimization, EOF and beyond. For those of us [inaudible 00:10] marketplace tomorrow, I'm Chuck Hill, Vice President of Development at Global Village Consulting, based out in Vancouver, BC, Canada. [0:19] A couple of words about optimization. I've been asked several times to come onto projects that have run into performance problems and help them try to troubleshoot it. With only a couple of exceptions, it's been pretty easy to fix. Usually, I take a look at it and after an hour or two I'm thinking, "Well, of course, it's slow. What did you do that for? How can you expect it not to be slow?" [0:43] Most of what I'm going to talk about today, it's not magic, it's not rocket science. It's just things to think about, things to be aware of, ways to approach the optimization. [silence] Chuck: [1:03] Where is it now? [laughs] Man 1: [inaudible 01:41] Chuck: [1:47] I hope. So the way I'm going to look through this is to follow the architecture, looking from the browser, the web server, the WO adaptor, your application code itself, and the database, and see what opportunities there are along the way to, A, screw things up and, B, make things faster. [2:15] There are basically three kinds of optimization that I see. Ones that are pretty easy to do and have a big payback. Ones that are pretty difficult to do and have a small payback — those are really only something you'll want to do if you have a really heavily loaded site and you're trying to squeeze every last bit of performance out of it. The last ones are application specific. [2:35] I'm just going to talk about one at the end of the presentation to give you an idea. These are things that are really domain specific; it's really something that only your app needs to do to optimize. You take a look at something like the iTunes music store.
I'm sure they've had to do all kinds of crazy things to make that as fast as it is, things that we don't do in our applications day to day because we don't have that load on it. [2:58] At least I don't have that load in my applications. A good thing to keep in mind here is most of this stuff is pretty easy. You just have to remember to actually do it. A word of advice in this: the most important thing to do when you're optimizing is to make sure that you measure. Don't guess. [3:18] If you just go in there and say, "My app's slow, I've got to do this, I've got to do this, I've got to rewrite this," you're going to end up spending a lot of time for not very much benefit. You really have to measure the app. Go in there, take a look at it. Which pages are slow? Which actions are slow? Which situations are slow? [3:36] Then focus in on that and say, OK, what part of that is slow? Where is the problem here? Measure it to see exactly how big that problem is. Figure out what you're going to do about it, do something, measure it again. Now, how much improvement did you make? If you didn't make any improvement, maybe you should get rid of that change that didn't help. If you did make an improvement, is that enough? Or do you have to go in and do something else? [4:03] When you go through your app, you'll probably find a lot of things that aren't quite as fast as you'd like them to be. You can spend an almost infinite amount of time optimizing your application, and in the process, waste a lot of money and time. [4:16] You really have to look at seeking a high return on investment. What are the critical slow parts? If you have something in your application that's slow, an administration function, and people use it once a month, who cares if it's slow? Is it worth spending a week of effort making it fast? Probably not. If there's something that the users are using all the time that's critical to your application, then, yeah, that's it.
[4:40] When you go through here, make a list of things that are slow and figure out which ones are the most valuable. Then the second thing you have to do is go through and take a look at them and say, "Well, how hard is it to optimize this?" For some things you might realize, I'd really like to optimize that, but that's two months of work and I just can't justify it. Some things you're not going to optimize. [5:00] Go through and start picking the stuff that's the most value for the least effort, and keep going through it till you reach a point where you say, "OK, it's fast enough, I don't want to spend any more money on this." [5:14] A word about avoiding premature optimization: there's a big tendency for engineers, I find, especially, to say, "OK, we can have this, we can do this, I'm going to make all this really fast." Oftentimes, that's not the problem. You end up finishing the application, and the performance problem was something else entirely. [5:31] Try to avoid spending money on it until, A, you know it's a problem, and B, you've measured it, and C, you've figured out that it's probably worth the money to address it. [5:41] There are a few things I'll talk about that you can do just as part of your standard development process, as part of your standard coding, that don't really cost anything to do. It's equal effort — you can do it badly or you can do it well. Same amount of effort, but the result is different. Those are the kinds of things that you should be doing all the time. [6:02] We talked quite a bit before about needing to measure the performance. An important thing about this is to use a realistic set of data. If your application is expecting to be processing 50,000 rows or 500,000 rows in production, and you're testing it with 500 rows, there's a really good chance you're not going to notice that it's slow. [6:25] For a lot of things, with experience, maybe you don't need that much data.
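One cheap way to build up that realistic set of data is a small generator. This is a hedged sketch — the class, method, and value names here are invented for illustration — showing how to get both volume and a realistic value distribution (one dominant value plus a long tail) rather than uniformly random rows:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Hypothetical helper: generates a column of test values where one value
// dominates, roughly matching the skewed spread you expect in production.
public class SkewedTestData {
    public static List<String> statusColumn(int rows, long seed) {
        String[] tail = {"B", "C", "D", "E", "F", "G", "H", "I", "J"};
        Random rand = new Random(seed);
        List<String> values = new ArrayList<>(rows);
        for (int i = 0; i < rows; i++) {
            if (rand.nextInt(100) < 80) {
                values.add("A");                              // ~80% one value
            } else {
                values.add(tail[rand.nextInt(tail.length)]); // rest spread over the other nine
            }
        }
        return values;
    }
}
```

Feeding tens of thousands of rows like this into a development database is what exposes the queries that "only take a tenth of a second" with 500 rows.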
Often, now, I can take a look at a query, I can look at the query plan, and I can say, "OK, that's just never going to go." It's only taking a tenth of a second now in development, but that's just never going to go in production. [6:40] But your best bet is to have a completely realistic set of data. Build it up, try and get a good spread of data. Think about what the real values are going to be. If you have a column that can have 10 different values in it, and in reality 80 percent of the rows are going to be one value and the other 20 percent are going to be spread across the other nine, then model your data like that. Then you're really testing with what's actually going to happen. [7:06] When you start doing performance measuring, be careful of the first request the application makes, because in that time, it's loading up the object cache, it's building up your page templates, it's doing a whole lot of work. You'll find if you do the same action the second time, it's a lot quicker. [7:24] When you're going through, if you're doing load testing, performance testing, discard the first set of request data, because those are outliers. That's not really how slow your application's going to be. [7:35] For measuring, there's a range of different options. There's JPerf, there's JMeter. I think Shark is still in OS X. There are still built-in tools in Java. There are a lot of different tools out there that you can use to do it, and it doesn't have to be that complicated. You can just create an NSTimestamp at the start of the thing you want to measure, create another NSTimestamp at the end, subtract them — that's how long it took. [8:01] You don't need to spend a whole lot of time building up a big performance testing suite. You can just do it very simply with NSTimestamp and logging. Wonder also has some interesting functionality for this, and we'll talk about that in just a minute. [8:15] Other things you can take a look at are the event systems. The WOEvents and the EOEvents can be useful for measuring some types of activity.
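The timestamp-subtraction trick works the same in plain Java as with NSTimestamp. A minimal sketch — the helper name is invented — using `System.nanoTime`, which is better suited for elapsed-time measurement than wall-clock timestamps:

```java
// Minimal timing helper: wrap the code you want to measure and
// get back how long it took in milliseconds.
public class SimpleTimer {
    public static long timeMillis(Runnable work) {
        long start = System.nanoTime();   // timestamp at the start
        work.run();                       // the thing you want to measure
        long elapsed = System.nanoTime() - start;
        return elapsed / 1_000_000;       // subtract, convert to milliseconds
    }

    public static void main(String[] args) {
        long ms = timeMillis(() -> {
            try { Thread.sleep(50); } catch (InterruptedException e) { }
        });
        System.out.println("took " + ms + " ms");
    }
}
```

Logging the result per action is usually all the "performance testing suite" a first investigation needs.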
[8:28] This is the thing from Wonder. It's another gift from Mike [inaudible 08:32] at Apple. It was developed internally at Apple for something in the music store, I'm not sure what. It's one of the few tools out there that's really WebObjects specific. The other tools just give you generic information: OK, this much took this time, this SQL took this much time, this much in memory. [8:48] This is really WebObjects focused, and it really helps you understand how your application is functioning and understand why it's slow. I didn't actually test this recently, so I'm hoping it's still working. It worked last time I used it. [background chatter] Chuck: [9:08] It works? OK, good, thanks. I'm always worried about that with Wonder. [background chatter] Chuck: [9:15] Yeah, there's a good presentation on this, and inside the framework itself, in the Wonder source, there's a package.html that explains how it works and gives you a lot of detail. I'm not going to do a whole lot of presentation on this. It's been heard before. But it's a really good thing to look at. It's maybe not super-suited for REST; it's really aimed at page based optimization, for people who have page based applications still. [9:42] It looks at the statistics, so what we can see over here is that the profiler says this page took 194.71 milliseconds to render. Of that, there were 1,228 events, or 39 percent of them, that took less than one millisecond. You've got a whole lot of things that are being done over and over and over and over and over and over again, so that may be an opportunity to do some caching in your page. [10:16] 24 percent of them took less than 10 milliseconds, and there were two things that took more than 10 milliseconds and less than 100. For those, I might start looking for slow SQL or some other calculations in my page. See if that's something you can cache, or something you can move out. Happily, there was nothing that took more than 100 milliseconds.
[10:40] It shows things like the SQL that was executed. There were four statements that took seven percent of the time. That's a pretty low percentage, so I might not be too worried at that point about the SQL. You can click on that. You can drill into the SQL that was generated, where the time penalty was taken. [10:59] For Direct to Web rules, you can click this. It will show you the rule evaluation. I'm not a big Direct to Web user, but I think that's also a good thing for debugging as well. These three show the percentage of each part of the request response loop: take values, invoke action, and append to response. [11:20] This wasn't a form submission. It took nothing to do that. Invoke action didn't take anything. Everything was in the drawing of the page: 92 percent. Now you notice that doesn't add up to 100 percent. There's another eight percent that's outside the page thing but still part of the request. Those things won't usually add up to 100 percent. [11:42] You can click all three to combine them all together. The other really neat thing about this, if you're having a hard time figuring out what's wrong, is the heat map. This is an odd link because clicking that turns it on, not off. That always confuses me. But, anyway, if you click that and you load a page, it will analyze what's in the page. [12:01] It will put divs in it. You'll get colored areas. Red's really slow, orange is a bit less slow, and yellow's not too slow. So it actually color codes your page for you, what parts of it are slow. You can take a look at it and say, wow, my nav bar is taking up 80 percent of my time. What are they doing in my nav bar? [12:20] For page based stuff, it's pretty easy to just drop it in and turn it on. Like I said, there's a presentation from a couple years ago and also the documentation in the package. Here's a look at the end to end architecture that I want to take a look at.
Let's start addressing what you can do client side to optimize your application. What can you do in the web server? What can you do in the WO adaptor? [12:37] The big thing is usually going to be either in your [inaudible 12:55] itself or something in the database. But it's worth looking at all the things. Some of us do have very heavily loaded sites or very tight machine constraints. Browser considerations. The first one's halfway between the browser and the web server, but if you compress the response going down to the browser, then you've got less network traffic to deal with. Not a huge issue, but it's pretty easy to turn on. [13:06] You can do it in your application just by setting this [inaudible 13:27] application response compression enabled property to true. You can also do it in Apache. [13:32] Other things you can do are aimed at the same thing, reducing the amount of data that's going back and forth. Minify your JavaScript. Combine all your CSS files into one. Instead of going back and forth getting a bunch of them, it all just pulls back at once. [13:49] If you want to get really fancy, you can combine images into sprites and slice them up in your pages in JavaScript. Again, that's out there for something that's using a lot of small images. It's not really an easy thing to do; I'm just mentioning it for completeness. You can also minify your HTML and CSS, get rid of all the excess white space. [14:09] Again, [inaudible 14:11] things aimed at reducing the amount of data you're sending to the browser and reducing the amount of trips back and forth. On the web server side, there are some things you can look at. [14:22] [inaudible 14:25] are a couple of ways of turning the compression on if you're not using Wonder. [inaudible 14:33] Expires is something that's given me trouble over the past few years because it's often not set up the way that you want it to be set up.
What you want to do when you start running your application is open up the developer tools in Safari — assuming most of us probably use Safari. [14:50] Take a look and see what's being cached and what's not getting cached. Because what I've seen a lot of the time is that the JavaScript and the CSS and all the other stuff that you don't want to have downloaded each time isn't actually getting cached, because Apache's telling it not to — just cache for two seconds, that'll be enough. The other thing I've had is where it will actually cache HTML, and you don't want that happening, especially if people are moving back and forth quickly. I've seen some really odd problems with that. [15:07] So [inaudible 15:19] some basic settings. [inaudible 15:20] by default. [inaudible 15:23] second default. Again, you'll have to decide for yourself how often you want it to go back for the JavaScript and the CSS, how long you think it's going to last for. But at least let it be cached for an hour. [15:37] For text/HTML, one second or less. There's a whole lot of settings there that you can play with in Apache. I am not an Apache guru, I do not play an Apache guru on the Internet. Please don't ask me Apache questions. These are mostly from my partner, Sasha. [15:57] In general, the higher these numbers are, the better your performance is going to be. The other side of that is the higher the numbers are, the more resources you're going to use on your server. So you have to balance that out with the resources you have available on your web server. There are a couple of other things that you can look at doing in Apache, like reducing the logging, because that can be slow. [16:22] If you take a look on the Internet, there's an endless source of articles on Apache tuning. I'm not going to try to tell you any more than just some brief suggestions on what to look at here.
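The kind of Apache settings being described — static resources cacheable for an hour, HTML going stale almost immediately — might look roughly like this with mod_expires. The exact durations are illustrative, not a recommendation; tune them to how often your JavaScript and CSS actually change:

```apache
# Illustrative mod_expires configuration (Apache 2.x).
# Requires: LoadModule expires_module modules/mod_expires.so
ExpiresActive On

# Let the browser cache JavaScript and CSS for an hour
ExpiresByType application/javascript "access plus 1 hour"
ExpiresByType text/css "access plus 1 hour"

# HTML should go stale almost immediately
ExpiresByType text/html "access plus 1 second"
```

Checking the response headers in the browser's developer tools, as described above, is how you confirm these directives are actually taking effect.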
Another thing that can sometimes be useful is to split your site up so you have the app going through one instance and you have another instance that's just dedicated to serving the static resources. [16:49] There are a few things you can do in the WO adaptor. Not a whole lot. One thing you might want to take a look at is the FastCGI adaptor in Wonder. I don't recall what the timing difference is between this and the basic Wonder adaptor. But if you're running into problems where you think that might be an issue, that's a good thing to take a look at. Man 2: [17:08] Could you use a [inaudible 17:12]? Chuck: [17:11] Yes. Maybe you could talk about that in the question and answer. I don't know a whole lot about that side of things. Man 2: [17:16] I'm just saying the [inaudible 17:18] CGI itself is a separate process. So you can connect it to any web server. Chuck: [17:21] Yes. Man 2: [17:22] Whereas the [inaudible 17:22] that we usually use in Apache [inaudible 17:25]. Chuck: [17:28] Yeah, the base [inaudible 17:31] objects adaptor is very much Apache specific, whereas FastCGI is generic. I'm going to talk a little bit about the worker threads and the [inaudible 17:43] size. The defaults for these are huge. [17:48] If you're using the defaults, your application is open to having a request storm. What will happen is that because these numbers are so big, the adaptor will start sending requests. Something will happen in your app — the database has a hiccup, you hit a slow process — and then requests start backing up. But these numbers are high, so your app's saying to the adaptor, sure, give me more work. Bring it on, bring it on. [18:14] After the event is over, you still have this huge backlog of work. So even though you don't have a problem anymore, and your app's processing things normally, from the user's point of view, it's not, because they submitted their request 45 seconds ago and they still haven't got a response. What do they do? They hit reload.
Now, you've got more work in the queue and your app still hasn't got to it. [18:36] 30 seconds later, you still haven't got a response, so they hit it again. They hit it again and they hit it again, and your queue just builds up and up and up. Eventually, you get no instance available. It's far better to keep these numbers low because you avoid three things. One, you avoid overloading any one instance. Two, you avoid overloading all of them. And three, you avoid getting to the point where you can't recover. [18:59] You can get to the point where it's so bad the only thing you can do is shut down the web server, shut down the web applications, and bring everything back up again. That's the only way to recover. The minimum adaptor threads and the maximum adaptor threads are the only things most of us have to worry about. [19:21] Adaptor threads are for people way back in 4.5. Is anybody still using 4.5? Not here. One person? There are a couple of people on the list still using 4.5, so you've got a different setting to worry about. The same principle applies. Most of us will be concerned with the minimum and maximum. I would say keep it in the two to six, two to eight range. [19:42] The listen queue is used when all the worker threads are full. With this setting here, if you get eight concurrently executing worker threads — so eight worker threads processing eight user requests actively, not waiting, actively processing them — and your application gets more requests, it'll stick them in the listen queue. [20:02] Once you have eight actively processing worker threads and four requests waiting in the listen queue, then the app will tell the adaptor, no more, no more. That's a reasonable number. If your app is doing a bunch of really small, quick, concurrent requests, you might want to have more adaptor threads. If your app's requests tend to take longer and they're slower, you might want to have a lower number.
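As a concrete starting point, the worker thread and listen queue numbers being suggested could go in the application's launch Properties file roughly like this. The two-to-eight range is the speaker's rule of thumb, not a universal answer; tune it for your own request profile:

```properties
# Keep the worker pool small: roughly 2-8 threads, per the advice above
WOWorkerThreadCountMin=2
WOWorkerThreadCountMax=8
# Small listen queue so a backlog is refused instead of piling up forever
WOListenQueueSize=4
```

The point of the small listen queue is exactly the request storm scenario above: it is better to refuse work early than to accumulate a backlog the app can never drain.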
[20:29] Really, you don't want to go too high because you're going to hit the point of no return. [silence] Chuck: [20:39] As far as I know — and tell me if I'm wrong, please — the only load balancing that works is round robin. [off mic question] Chuck: [21:03] I know some people have been asking questions and trying to do some work on it. Man 3: [21:11] For the load balancing methods, the load averaging method should work, but you'll need a recent mod_WebObjects adaptor that was compiled in the last month or so. If you know how to get to it, there's the status page for the adaptor and it'll tell you which load balancing methods are installed. If it lists load average, then that means it should work. Chuck: [21:40] Previous to that, the options were round robin and buggy. The thing to be aware of with round robin is it doesn't care about your machines. It just goes, OK, instance one, instance two, instance three, instance four, instance five, instance six. It just goes down the list from top to bottom. [22:00] If you've got four servers and you've got instances one to 10 on one server and instances 11 to 16, like that, it's going to load up one server completely and put nothing on the other three. [22:10] When you define them in JavaMonitor, don't just throw in [inaudible 22:16]. You have to go through, and the first instance goes to machine one, the second instance to machine two, the third instance to machine three, the fourth instance to machine four, and then five back to machine one. When the requests come in, it will distribute them more evenly. [22:31] It's particularly important when you're starting up because otherwise you can just slam one machine and the other ones aren't going to help users at all. Now, we've gotten out of the client, we've gotten away from the web server. We're back looking at our application. There are a few things you can do. You really should be setting WOAllowsConcurrentRequestHandling to true.
[22:57] The only reason to have this false is if you have some really awfully bad code in your application, and I'd hope nobody has any really awfully bad code. [23:07] This is a default from a long time ago. Again, I don't know why the default is like that. I suspect the setting probably didn't exist in WebObjects before, so when it was added, the default was the old behavior. There's really no reason not to be dispatching requests concurrently. Man 4: [23:29] In the past, before, I think, six months ago, that line was not in the [inaudible 23:36], but now it's in the template, so any new project should have it set by default unless you remove the line. Chuck: [23:45] For new projects, this should be set. For old ones, make sure it's set. Next one, caching. This is talking about caching of the templates for components. For development, obviously, you don't want this cached because you want to be able to make changes, quickly save, and rerun your application. [24:00] For deployment, you don't want it to have to keep going back to the file system and saying, has it been saved, every time you touch a component, so turn it to true. I don't think there's a code setting for the next one, debugging enabled. Set it to false. I think most of us are using Log4j now anyway to do our logging. [24:20] The session timeout is something that you need to think about tuning a bit, mainly for reasons of memory usage. The more sessions that you have in memory that are alive but that users aren't using anymore, that's just taking up your heap space and it's not doing you any good. [24:42] Unless you know that your users are going to log out — and I've never seen that happen; once in a while, they'll remember to, but usually they don't — you should set the session timeout to something less than what the default is. I would say about 10 minutes on average. This is really application specific. You have to figure, how long are my users going to be in the application not doing anything?
[25:06] That's really what the session timeout is talking about. How long can a user sit there with a page open, not do anything, and then go back and expect to do something? Are your users just going to get up and go out for lunch and come back later? Do you need to have it open for 90 minutes? I usually try and keep it pretty small. [25:26] There are things in the Ajax framework, like AjaxPing, that you can stick on a page, and as long as the browser's open this will keep pinging the back-end and keep your session alive. After they close the window, 10 minutes later their session goes away, because after they close the window, they're not coming back. Again, this is something you really have to tune based on what you expect to happen in your own application. [25:51] Page cache size: the default is 30. I see users use the back button quite a bit. They still do it. I've never seen them hit it 30 times in a row. Even the most aggressive one is not going to sit there 30 times in a row backing up trying to find something, so having 30 pages in your page cache is too much. It's just a waste of memory. [26:12] You have to tune this to how much of the back button you can tolerate, how much you're going to support, how much memory you want to allocate to it. I usually give them about five. [26:23] Again, the permanent page cache isn't something that gets used very much; unless you know your application is using it and needs more, set it down to five as well. The idea here is to reduce the memory usage over time so that you have more free memory for people who are actually doing something in your applications and less memory dedicated to users that have gone off to do something else. [26:47] I'm skipping over session. You might notice there's really nothing performance-wise to do in session. It's all in application. We'll talk a bit now about WO components. This is another place where you can get some benefit without doing too much effort.
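The session timeout and page cache numbers just discussed could be collected in one place. WOSessionTimeOut is a standard launch property; whether the page cache sizes are settable as properties or only via setPageCacheSize()/setPermanentPageCacheSize() in your Application constructor depends on your WebObjects version, so treat these keys as illustrative rather than definitive:

```properties
# ~10 minutes of inactivity instead of a long default (value in seconds)
WOSessionTimeOut=600
# Support a handful of back-button presses, not 30
# (may need setPageCacheSize() in code instead of a property)
WOPageCacheSize=5
WOPermanentPageCacheSize=5
```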
Stateless components have been covered in past sessions. I'm not really going to cover them now. [27:03] The advantage of them is they use a lot less memory, because you have one or two or three instances and they're shared across the whole application, because they don't handle any state. They're set to be shared across the application. That really reduces the amount of memory you use because you're not constantly creating new instances of them. [27:21] The downside of stateless components is they're a little bit more effort to write. Not a whole lot, but a little bit. Automatic synchronization. Automatic binding synchronization is amazing. It's great. You can do all kinds of things with no effort at all, but the point is, if you're not putting out the effort, your application is putting out the effort. It's moving data back and forth. Even if there's been no change, it's moving data back and forth and back and forth. [27:45] That actually can use quite a bit of processing time, so if you're at the point where you're trying to make your page faster, turn it off and do manual synchronization, which means you're doing valueForBinding on your page instead, and setValueForBinding, pushing and pulling when you know in your code you need to get or set the value. Again, it's a trade-off between ease of use and slow, versus more work and fast. [28:13] The next one's a simple thing you should do as part of your coding practice. It's not really an optimization. It's equal effort to do it both ways; one's faster than the other. Return context().page(), not null. If you return null, it's still going to go through all the rest of the page trying to see if somebody else is going to return something that's not null. [28:29] As soon as you return context().page(), it stops — no more processing, and you're on to the next action.
[28:38] Lazy creation is more of a Java pattern than a component thing, but a problem that I see a lot in components is that people have methods that are expensive and they get called a lot — several bindings in the WOD file refer to them. Doing something like this, you can defer the calculation. That's one thing. If conditionals or whatever in your page mean it's not getting used, it doesn't get calculated. [29:09] If it does get calculated, you've cached the value, so you don't calculate it again. If it's something you don't want cached too long, then in your awake method set everything to null so it gets recalculated on the next round through the page. Man 5: [29:23] Actually, I just wanted to point out that _someValue has the same name as someValue, so egregious use of underbars is really not necessary. Just FYI. Man 6: [29:41] For stateless components, I suggest that you watch the Mark Ritchie presentation from 2009 about stateless component design. He made some stats about it. Chuck: [29:58] Actually, I have a note here to mention that. Mark Ritchie's presentation, 2009. About 40 minutes into the session, he starts talking about that, and he does a really good job of it. It's really well worth going back and looking at that if you're not using this. [30:17] Next, we move from WO components on to Java. Again, these are simple, not amazing, things you can do that will have some performance benefit. The first three are really aimed at either not allocating memory or reclaiming memory quickly. [30:36] When somebody wants to create an integer, they'll do new Integer(8). There is, however, the valueOf method, and what that does is hold a cache of Integer instances.
If the value you want is in the cache, then it just returns it to you without creating a new one. So everywhere in your program you can use the same instance of Integer 1, rather than creating a new instance all over the place that you don't need anymore, then the garbage collector collects it, then you create more, then the garbage collector collects it. [31:00] Integer, Long — there are several other math-related classes that have that, and it costs you nothing to write that compared to writing it the slow way, and it saves you all the memory allocation problems. StringBuilder is faster than StringBuffer. StringBuffer is synchronized, so it's safe to use across threads. I don't think I've ever seen anybody use a StringBuffer across threads. [31:23] You can use StringBuilder — exactly the same API, a drop-in replacement, not synchronized, not thread safe, much faster, and, again, it costs you nothing. Man 7: [31:34] I've seen previously where there's ERXConstant dot either integerOf or valueOf. Is that now out of favor? It's the same construct as Integer dot valueOf. We started using the ERXConstant dot zero integer, dot one integer, and it has the same benefit that it caches whatever you ask for. Chuck: [32:00] I don't recall the implementation of that, but it sounds like it's equivalent. Just avoid new is what you want to do. Whenever possible, avoid new; reuse caching. Especially for a lot of the Java values, which are immutable anyway. Obviously, null out references when they're not needed. As soon as all the references go null, the garbage collector can pick things up. Not a big thing, but it can help in some situations. [32:27] Give your app enough heap space. You might notice some of your applications run fine for a while and then they get slower and slower and slower. Probably what's happened is you've hit memory starvation.
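Both points can be seen directly in plain Java: `valueOf` hands back cached instances for small values (the Integer spec guarantees the cache at least for -128 to 127), and StringBuilder is the unsynchronized drop-in for StringBuffer:

```java
public class AllocationTips {
    public static void main(String[] args) {
        // valueOf draws from a cache, so no new object is allocated
        // for common small values. new Integer(...) always allocates,
        // defeating the cache (and is deprecated in modern Java).
        Integer a = Integer.valueOf(1);
        Integer b = Integer.valueOf(1);
        System.out.println(a == b);        // same cached instance: true

        // StringBuilder: same API as StringBuffer, minus the synchronization.
        StringBuilder sb = new StringBuilder();
        for (String part : new String[] {"fast", "er"}) {
            sb.append(part);
        }
        System.out.println(sb.toString()); // "faster"
    }
}
```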
Instead of your app spending 90 percent of the time running your app and 10 percent doing garbage collection, it's spending 90 percent of the time trying to find enough memory to do 10 percent of the processing. It's important to give it enough memory. [32:52] It depends on how many instances you're running, what's available on your server, what else is running on your server. For any good sized app, between 256 megs and half a gig of memory is probably a good place to start. I don't want to go into too much heap tuning because, again, I'm not a heap tuning expert. I never really found the need to do some of the fancy, crazy Java heap tuning tricks that people have out on the net. [33:22] If you're interested, you can read up on them. Mostly I find, just give it enough memory. If you make the lower limit higher rather than lower, then it does all that allocation at once and you don't have to do it bit by bit by bit by bit as your application starts up, so it'll make it start up faster. [33:43] Now, we can talk a little bit about EOF and the snapshot cache. This is the one thing that really differentiates WebObjects from most of the other systems out there: it's not just SQL going to the database and pulling rows back. There's a snapshot cache in between that will be used. Whenever you do a fetch specification, whenever you go across a relationship, whenever EOF goes to the database and pulls back enterprise objects, it takes the row data and it stores it in the snapshot cache in the EODatabase objects. [34:17] Snapshots are a collection of, number one, the data. Obviously, we need the data from the row. Two, a global ID. The global ID is what ties that particular snapshot to the EO as it may live in different editing contexts. Even if you have the same object in different editing contexts, they're all sharing the same single snapshot. It has a retain count, and this is to keep track of what's in use and what it can discard.
[34:44] Each time you create an instance of an EO in an editing context, it increments the retain count in the snapshot. Each time you discard the editing context or the object goes out of scope, the retain count goes down. When it goes down to zero, the snapshot goes away and the memory is reclaimed. [35:03] You'll see problems once in a while. I think there was a bug fairly recently on the list about child editing contexts, where the retain count would incorrectly go down to zero, leaving you with an error message when you try to save, saying there's no snapshot for the object you're trying to save. That would be a bug. [35:24] There's also a fetch timestamp on it, and that allows EOF to keep track of how old the data is. The check happens not when you do a fetch but when you try to get an object from the global ID. The editing context will say, I want data, but not older than this. If the fetch timestamp is before that, it'll go back to the database and get an updated snapshot. [35:52] The most important thing to know about the snapshot cache is it's fast. Going back to the database is slow. This is fast. Whenever possible, you want to use the snapshot cache. How do you use it? Following relationships is the most common way that we use it. Every time you follow a relationship, it doesn't go back to the database and select that row. [36:11] First of all, it goes to the snapshot cache: do you have a snapshot for this row? No? OK, I'll go to the database. Do you have a snapshot for this? Yes, you do. OK, give it to me. [36:20] The second way you can do it is by calling objectForGlobalID or faultForGlobalID. That does pretty much the same thing that happens when you follow a relationship. It takes the global ID, looks it up, asks the database layer whether there's a snapshot for it, and returns it.
Between the two, I think objectForGlobalID will give you a null, and faultForGlobalID will actually create a fault for you. [36:49] Using the snapshot cache, you do have to think about object freshness, and that, again, goes back to what you need in your application. Which objects change frequently, which objects don't change frequently. Something else I was supposed to say about that, I know. [37:13] There have been a couple of past presentations on freshness as well, so you can go back and look those up. I'm not going to repeat myself. A couple things to know: if you're doing a fetch spec, you're always going back to the database. Even if you're fetching data and all of the snapshots are in memory and all of the snapshots are current, it's still going to go back to the database. [37:32] It's going to pull back all the data and it might just throw it away. It goes, yeah, that's current, that's current, never mind, I'll throw all this stuff away. You get the snapshots in the end anyway. [37:44] You really have to be aware in your application: do I really need to fetch? In a lot of cases, you can do something else, like caching the objects. Put them in a cache. I'll talk about an object in a little bit that you can use for caching, and get them out with a global ID or a key value instead of incurring the cost of going back to the database for data you already have. [38:04] Raw row fetches have no snapshots. They will always go back to the database. [38:14] I want to talk about some ways of avoiding the fetch specs going back to the database. There are two possible classes you can use here. EOSharedEditingContext, which may now be mostly bug free. It's had a very troubled history. Some people like it, some people hate it. I don't want to get into the religious aspects of it. [laughs] Tastes great, less filling.
[38:44] Both of these things address the problem of data that is read very frequently by your application and modified very infrequently. That's the data that you don't want to be going back to the database for. That's the data you want the snapshot cache for. Both of them also have the advantage that they prevent the retain count from going down. They prevent the snapshots from getting discarded. [39:12] The advantage of the EOSharedEditingContext is that it doesn't require any changes. It's the same API as a regular editing context. You'd have to do very little in your application to make use of it. The downside is it might have a few bugs in it, maybe. Your mileage may vary. I don't use it myself. That's mostly paranoia at this point, I think. [39:36] ERXEnterpriseObjectCache is from Wonder, and it requires more work. It doesn't look like an editing context, it doesn't act like an editing context. You have to write code to make it work in your application. What it does is allow you to build up a cache of objects that you can identify by a unique key. You can, if you want, use the global ID, but for most things, it's a lot more useful to have a meaningful key. [40:00] Maybe it's a cache of department objects and you use the department code to pull them out. Really useful for things like lookup lists, all kinds of things you don't want to go back to the database for. It might have bugs, but you have the source and it's commonly used, so it probably has fewer bugs than EOSharedEditingContext, which is a very complex piece of code. ERXEnterpriseObjectCache is not. [40:29] The final thing you can take a look at: in the EOModel, there's a "cache in memory" option you can turn on. When you fetch an entity, it'll read in all the instances, cache them in memory, and never go back to the database again. The problem with this one, that you don't get with the other two, is that it never refreshes.
With the shared editing context or the enterprise object cache, you can go back and get fresh data. With this one, you just click it in the EOModel and you're done. [40:58] It's useful for things like sexes. There's male, female, won't tell you, whatever. They don't really invent new sexes every day. The months of the year: pretty static. They're not going to change very much. Countries in Europe, maybe not so much. You've got to take a look at what you're caching. If it's something that is really static and you never want to go back, just flick it on in the EOModel and forget about it. There you go, it's optimized, it's done. [41:29] We'll take a quick look at how to use the ERXEnterpriseObjectCache. It was written over a couple of years with a couple of different purposes in mind, so there are a couple of different constructors. This is the one that really uses the snapshot cache I'm talking about. It starts out with the entity name, matching what's in the model. The keypath is into the entity, and the intention is that it returns a unique value for each object. [41:58] For example, a department object would have a department code, or a person object would have, I don't know, something unique. A restricting qualifier means that you can cache just a subset of the data. You can have a cache for men and a cache for women, or, say I have a catalog of items for sale and I only want to cache the ones that I'm actively selling in the store. You can put a restricting qualifier on that to filter what goes into it. [42:30] There's a timeout which indicates how long an object should stay in the cache before it goes back to the database and gets a fresh copy. Should retain objects: I'm not really too sure why you would want this to be false, but it's an option. Maybe if you have a bunch of things and you're using it like a most-recently-used cache.
As long as you have some reference in memory, it'll keep it around; otherwise it'll turn back into a fault and it'll fetch it again. [43:11] Should fetch initial values, again, depends on the model. If you know you're going to need all the data for that entity, set this true and it does one select-star at the start. If you're only going to need to cache maybe 15 or 20 percent of the objects in an entity, set it to false and it'll just cache the ones that you refer to. The first time you refer to one it'll go get a copy, and after that it won't. [43:35] It's a tradeoff between "can I do one big fetch and get everything I need now" and "is it more efficient to fetch just the few things that I do need?" Should return unsaved objects: objects that are created and match the restricting qualifier automatically go into the cache, so you have to decide in your application's logic whether those are valid values to return from the cache or not. [44:03] To give you an example of this, I've used this in some things where I was importing a lot of data, and the data had interconnections with each other, and I only saved when I had a full, complete object reference. I wanted that true because once I put an object into it, say an order, and then I got an order item, I wanted to be able to find that unsaved order again, so I set that to true. For other cases, maybe you don't want to see any unsaved objects. [44:30] A quick example of the usage: entity name is BranchOffice. Branch code is the keypath into it, giving the unique value. No restricting qualifier. No timeout, because we're pretty sure if they add a new branch office we'll be restarting our applications before that happens. I want all the branches, so I set should fetch initial values to true. Should retain objects, because I want to keep the snapshots around, and should return unsaved objects, because I don't have any reason not to.
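To make the keyed-cache idea concrete without the WebObjects frameworks, here is a generic, framework-free analogue in plain Java. The class and method names are made up for illustration; this is not the Wonder API, just the same pattern of "look objects up by a meaningful unique key, fetch only on a miss":

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Toy keyed object cache, loosely analogous to the idea behind
// ERXEnterpriseObjectCache. All names here are hypothetical.
class KeyedCache<K, V> {
    private final Map<K, V> cache = new ConcurrentHashMap<>();
    private final Function<K, V> loader; // stands in for "go to the database"

    KeyedCache(Function<K, V> loader) { this.loader = loader; }

    V objectForKey(K key) {
        // computeIfAbsent only invokes the loader on a miss,
        // mirroring "fetch the first time, then serve from the cache".
        return cache.computeIfAbsent(key, loader);
    }
}

public class CacheDemo {
    public static void main(String[] args) {
        KeyedCache<String, String> branches =
            new KeyedCache<>(code -> "BranchOffice[" + code + "]"); // fake fetch
        System.out.println(branches.objectForKey("MTL")); // loads once
        System.out.println(branches.objectForKey("MTL")); // served from cache
    }
}
```

The real ERXEnterpriseObjectCache layers onto this the constructor options described above: a restricting qualifier, a timeout, and the retain/initial-values/unsaved-objects flags.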
[45:03] Then you can write code into your application, like a branch-office-for-code method, and simply pass in the editing context and the code. It'll fetch it out of the cache for you. There's an example at the bottom fetching the branch office for Montreal with my made-up Montreal code. [45:27] EOF is really useful. It saves us from doing a lot of work, but sometimes EOF is not your best friend. A good example of this is bulk deletions, when you're cascading deletes, because EOF is not going to go, yeah, you want to delete all those, I'll just put the SQL out there. It's going to fetch each one of those and go, is it OK to delete this one? Yeah. OK. Is it OK to delete this one? Yeah. OK. It can take an incredibly long time if you're deleting a big chain of objects. [46:02] Rather than do that, bypass EOF and just kill them off yourself. ERXEOAccessUtilities probably has the easiest methods to do this: delete rows described by a qualifier if you just want to delete everything matching it, or update if you want to update everything. Say all your objects that match a certain thing, you want to set visible to not visible. You can even do bulk inserts if you want. [46:30] The thing to be aware of, though, is that when you do this, EOF's not going to know. If you have the object you fetched in memory and you go behind EOF's back and delete it, it doesn't know that. If you try to update that object, you're going to get an error. You might want to consider, after you do this, calling invalidateAllObjects, just so everything in your cache is tossed away and brought back again. [46:54] Again, it depends on how much you're deleting. Do you think it's in the cache or not? A lot of things. If you're doing a bulk delete, you're probably deleting data from a month ago that you don't need any more. It's a pretty good guess data from a month ago is not in your application. He keeps his data around longer than a month.
Man 7: [47:17] Everybody should be aware that invalidateAllObjects will kill your app immediately if you're still going to use any objects that you had before in your app, because every object is now a fault, and every fault will be a single SQL statement to be re-fetched. Never call that method. Never, ever use it. Chuck: [47:42] It depends on the situation. If you know your cache is full of a bunch of objects you just deleted, that might be worth it. Or update and restart the app, that's the other option. Man 7: [47:57] You can invalidate objects with global IDs, right? Chuck: [48:01] Yeah, but you don't have them in memory, because if you had them in memory, you wouldn't have to do this. Anyway, the takeaway from this is: if you do this, EOF's not going to know, and it can make your snapshot cache out of sync with what's actually in the database. Man 8: [48:17] Invalidating objects by global ID will invalidate objects in the current editing context, so you can localize the EOs you want invalidated and invalidate just that editing context, so you don't invalidate absolutely everything in your application. Chuck: [48:39] It's really mostly a problem with delete, and any time I've ever had to do this I've pretty much known the objects weren't going to be in memory. That's the reason I was deleting them. They were old. They were obsolete. For update and insert, you can just do a fetch, fetch in all the new data, and repopulate your cache like that. [49:00] If you don't want to do things by qualifier, you just want to do raw SQL, sometimes hand-coded raw SQL is, in fact, the answer. It's usually what we try to avoid with EOF, but sometimes, for reasons of performance and optimization, hand-coded SQL is the right answer: ERXEOAccessUtilities evaluateSQLWithEntityNamed. Man 9: [49:26] Actually, I've gotten to where I don't make myself feel guilty for using SQL, but I do mark it so that I can always find it, because it's a pain to find. You're constructing strings and things.
You can write it in such a way that it's hard to find later. Chuck: [49:44] I try to minimize it. Custom SQL does make it vendor specific, which some of you may care about and some of you won't. A couple words on using SQL. Sometimes there just is no other way. If you want your app to perform within certain time bounds, you have to use custom SQL. Sometimes, it's easier than writing a custom EOQualifier. [50:13] There are quite a few advanced qualifiers out there in Wonder. There are also some in the Houdah frameworks. Those will generate a lot of kinds of SQL for you. I often find I get into situations where it's not what I need for the optimization, and writing an EOQualifier is hard. It's really hard. It's a lot easier just to write an SQL string and be done with it. [50:36] To use custom SQL, EOUtilities and ERXEOAccessUtilities both have methods that will use custom SQL to update, insert, delete. When you need it, you need it. [50:56] This next component is aimed at addressing memory usage. A lot of times, for whatever reason, users want to fetch in a ton of data. You keep going back to them saying, no, put some qualifiers on it. Which data? I don't know which data I want. I want to see it all. Try not to let them do that, but sometimes they will. What this does is reduce the amount of memory you use. By default it goes out there, fetches all the primary keys, and then it just fetches a batch of objects. [51:31] Maybe you're showing 20 in each batch, so each time you click next, it just pulls in 20 rather than pulling in 200,000 at once. Kieran, I believe, is working on a limit function for it, as well. Both of them. Man 9: [51:52] The limit has been implemented in some plugins for a long time for SQL and MySQL.
The limit is just "limit, fetch 100," but to do ranging for SQL: I found with some REST apps that, with REST, every request was stateless, we were fetching maybe a million primary keys just to get 20, and for every request we created a new EC to fetch. My goal eventually is to get that to SQL batching only. [52:21] It's a limit option with the two things, the offset and the limit. There's a pull request for that option in ERXFetchSpecification, and it's implemented in the MySQL plugin. It's going to be very easy to implement it in the other plugins, too. It's pretty simple. Chuck: [52:39] The difference between using a range and fetching the primary keys is twofold. One, if you're fetching the primary keys, and there are two million rows, that's still a lot of data, just to get two million primary keys. By using a range fetch, you avoid that entirely. [52:58] The downside is, if you fetch all the primary keys, then you have a consistent state of the data at a point in time that you can show to the user as they move backwards and forwards between the pages. If you're doing it by ranges and your data is changing as the users are paging, rows are going to be moving backwards and forwards between the batches that they see. Maybe that's something you can live with, maybe it's something you can't live with. [53:26] For navigating between the batches, there are a couple of components: there's ERXBatchNavigationBar, and if you want a more Ajax way of showing it, there's the AjaxGrid and the AjaxGridNavBar. The AjaxGrid uses an ERXBatchingDisplayGroup under the hood to pull in the data. [53:50] This is an easy thing to do. This is the thing that I would do as a premature optimization. I wouldn't leave this until I have a problem. [54:01] What it does is optimistically fault in objects. EOF says, "Well, if you're going to need this one, you're probably going to need these ones, too.
I'm just going to pull it all in, in a single select, instead of letting you do 10 individual ones." [54:14] You can set this in two places. You can set it on the entity, so any time an instance of that entity is referenced, it'll use that batch size. You can also fine-tune it by setting it on relationships to the entities and say, this relationship, I know whenever I go through this, I want to pull in 100. When I go through this relationship, I only want to pull in 10. [54:34] The big question here is, how big should a batch be? For a lot of optimization things, something's better than nothing. Even if you use two, you're doing half the number of fetches you were doing before. Use something. [54:50] For most applications, 10 to 20 is probably a pretty good guess to start with. That's going to get you a fair way down the road of optimization. If you run into problems later when you're measuring, you can look at bumping it up or doing something else. Man 10: [55:07] Am I right in noticing that the relationship batch faulting stuff isn't in the UI, or have I just missed it? You have to do it in code? Is that right? Chuck: [55:16] No, there...That should be the one there. I just took the screenshots the other day out of Entity Modeler. [55:33] Zero. One at a time. Just as slow as you can possibly be. That's why I said two is better than nothing. I guess, sorry, the default would be one, not zero. [silence] Chuck: [55:54] This is an alternative to batch faulting, or an extension of batch faulting. It allows you a lot more control over when it happens, where it happens, which ones you pull in. This is something that you do in code, rather than something you set up in the model. It's very much a situational thing, whereas with batch faulting, if you set it to 10 and you need 100 instances of that object, it'll pull in 10, then another 10, then another 10. [56:22] With pre-fetching, it'll pull in all 100 at once. There's no piecemeal with this. It just pulls in everything.
It gives you a bit more precision, but it's only useful if you need all or most of the objects. Because if your app's situation is that maybe the user's going to page through a couple and you only need 10 or 20, don't pull in 2,000 with this. [56:45] It's set on your fetch specification: setPrefetchingRelationshipKeyPaths. The keypaths are based on the root entity, the root entity being the one that you're fetching. It'll only follow class-property relationships. What it will do is add one additional fetch to your fetch spec for each keypath, unless you're using inheritance, in which case it'll add more. But we'll talk about that later. [57:14] What it does is migrate the qualifier from the root object through the relationships to the other objects it's fetching. It's only fetching the ones that match what was in the root fetch spec. It's not just blindly fetching things. [57:28] Again, this isn't optimal if most of your objects are already in the snapshot cache. If you've got something that's in the shared editing context or your ERXEnterpriseObjectCache, don't pre-fetch them, because you've already got them. This is just going to waste time going to the database pulling in data you don't need. You already have it. Man 11: [57:52] I was just going to say, I've worked with people that are really religious about this one. We all extend WOComponent, right? What they do in their extensions of it is have a method where they pass the relationships they want to batch fetch, pre-fetch. They'll look. They'll look at how many SQL statements they have. [58:10] They have it built right there where it's just very easy to add a line, because it can be a big performance gain. Chuck: [58:19] Yeah, for the common paths, I usually put them in an array on the enterprise object, so that I can say whenever I fetch this one in this situation, I want these 40 keypaths.
In this situation, I want these five keypaths. [58:36] It saves you typing them over and over again. Here's another alternative, which is ERXBatchFetchUtilities. Again, all of these are aimed at avoiding one row at a time coming back from the database; pull back a bunch of things. It's the third tool. It's very focused: batching the fetches. [silence] Man 12: [59:07] I think this is the best opportunity you have to get control over the data you fetch, because if you go through a loop and you see in your logging output that for each run through the loop, you fetch some relationships of the objects you're working with in the loop, this is the way to go. Before you go into the loop, just fetch all the relationships at once. This is the most controlled and, I think, effective way. [59:38] I'm not sure if that fetch does it by itself. You could leave out the objects you already have in the cache so that they don't get into the final SQL. At least that's what we do. We filter out objects that are already in memory, and we don't batch-fetch them as they're already there. Chuck: [59:58] Yes, there is an option when you do this. I didn't include it in the method there, but you can skip objects that are in the snapshot cache, so it only goes back and pulls in the rows that you actually need. So then the choice of when to use this one versus prefetching with keypaths: if you're doing a fetch spec, and you know most of the objects aren't in the cache, that may be the easier one to use. [60:24] If you're not doing a fetch spec but you just have an object and you know, as I said, you're going down a page -- you're going to be referencing a bunch of the relationships -- a simple call to this before you start and you've got way, way, way less SQL happening. As I said when I started the talk, I often go into places for optimization, and, how did you ever expect this to be fast?
[60:47] This is usually the thing that I end up doing, because you can't go back to the database 10,000 times on a page and expect it's going to be fast. It efficiently fetches arbitrarily deep keypaths, so you can have a really long keypath. It'll navigate through the model and move the qualifiers around to figure out what the fetches are. It's got a gazillion options to control exactly what you want to do. [61:19] Sometimes, you don't want enterprise objects. Sometimes you just want the data, so with raw rows, that's all you get. You only get the data. You don't get any Java code to run. You don't get any support. You don't get any logic. You get nothing. You get a dictionary of data. It's useful sometimes when you want to show data from a lot of EOs, but you really only want one or two of the EOs. [61:45] For example, if you have a list of data. I was talking before about using the batching display group and the batch navigation. The user wants to see all the data, but really they're just going to select one of them. "Oh, yeah. That's the one I want to see." With this one you avoid the overhead of bringing in the row: [62:04] "OK, is it in the snapshot cache already? No. OK, make the snapshot out of it." Put it in there. Register it. Send out the notifications. Fetching in enterprise objects is slow. It takes up a lot of resources. If you end up fetching 10,000 rows and you actually want one enterprise object, you've wasted a lot of time. That's when you can use raw rows. [62:27] There are a couple of different ways of doing it. EOFetchSpecification will fetch them. EOUtilities has a method or two to fetch them, and ERXEOAccessUtilities also has methods to return these dictionaries. Then you promote them, for lack of a better term, from a row of data in the dictionary to an actual enterprise object by using EOUtilities objectFromRawRow, or ERXEOControlUtilities faults for raw rows if you have an array of them and want more than one.
[62:59] The thing to be aware of is this is still going to go back to the database. Even though you have the raw row with all the data that EOF needs for that row, it's still going to go back to the database. It is really of use when you want to pull in a lot of data, maybe to show a subset of the data in the EO, and you only actually need the full object for a few of them. [63:24] On the fetch specification there's a limit, as Kieran mentioned a minute ago. Depending on your database plugin, this might not do what you expect. The original implementation in EOF would fetch everything you wanted, and it would do the limit in memory. [63:46] If you were fetching everything with a limit of one, it would happily fetch in 100,000 rows, take the first row, and garbage collect the rest of them, which can be really, really slow. Some of the plugins in Wonder have now been updated so that it does the limit in the database, which is what you wanted to begin with, so you just get back one row or a few rows. [64:09] The other thing to know is that if you're using prefetching relationship keypaths with a limit, the prefetching keypaths don't respect the limit. They will prefetch everything, which is almost certainly not what you want. If you're going to use limits in a fetch spec, really take a look at the SQL coming out of your application. [64:31] Is that what you want or not? If it's not, then you can look at adding a patch to Wonder to fix it. I was thinking about it. I don't think there's any way to make the prefetching relationship keypaths work that isn't an insane amount of work. For something like this, if you're doing a limit at the database, do that, then use the batch fetch utilities to pull in the rest of the data. And that's it for that. If you're using a database that doesn't support this properly, you can fix it. It's not that difficult. [65:03] You can contribute it to Wonder.
There's also a method in ERXEOControlUtilities that will do the range at the database, provided the plugin supports it. Now, I want to talk about not modeling data, which may be an odd concept. Most people just model everything. My advice to you is: only model what you actually need to model. [65:38] Because everything you model, EOF has to go around and maintain. It has to keep that relationship consistent, it has to keep the snapshots around. If you don't need it, don't model it. Just because you can doesn't mean you should. A particular one that people always get bitten by is relationships to lookup data. If you go and [inaudible 66:00] model, OK, join these tables, to-many. You've got your address-to-country relationship, and then you've got your country to every address in that country. EOF's very happy with that. You want to add a new address? OK, a new address in Canada, sure. [66:15] Give me all the addresses in Canada. OK, there's all the addresses in Canada. I'll put that new one in there, now I'll save it back out again. You can imagine that gets pretty slow. So, you should only have a one-directional relationship with things like that, from the data to the lookup. Not from the lookup back to the data. [66:35] If you ever actually do need all the addresses in Canada, you're probably not going to go fetch the Canada object and say, give me all the addresses. What you're probably going to do, even without that relationship, is say, "OK, I want a fetch spec on addresses where country is Canada." You can do the same thing, you get the same data, and you don't need the relationship. Simply do not do this. It will kill your application. Man 12: [67:04] There's a class in Wonder for that, ERXUnmodeledToManyRelationship or something. It's a drop-in class. You just drop it in, and you can put in the key you would have used for the relationship and the destination object in the [inaudible 67:23] constructor.
Just drop that in and it's like a [inaudible 67:29] class. Most of the methods that are on the templates are [inaudible 67:32] in that little class. If you don't model the relationship, you can drop in this class, configure it in your EO, and it has all the logic, generically, for doing all the things you want to do with the relationship that isn't modeled anymore. It's called ERXUnmodeledToManyRelationship or something like that. Man 13: [67:53] Make sure you speak right into the microphone. Chuck: [67:58] I think you can model it and just turn the [inaudible 68:02] off so that it's not an instance variable. You're still able to use your relationship from both sides in any qualifier you want, but it won't fetch them. It's the way to be able to still use it in qualifiers without having the overhead. Man 14: [68:22] Would that still work? I tried something like that recently. I can't remember what it was. It didn't work anymore. Man 13: [68:31] This gentleman says that it works. Chuck: [68:33] OK. But don't make it a class property. Another simple, easy thing to do in your model: factor BLOBs out into their own objects. If you have an EO and it's got a BLOB in it and you need the BLOB every time you reference the EO, then sure, leave them together. But if you only need the BLOB sometimes, there's no point paying the price to drag it over from the database, use up your heap space, and then garbage collect it later. All you have to do is make a separate entity that you're [inaudible 69:07]. All it needs is a primary key and the object value. Make it [inaudible 69:13] own destination. A simple change in your code from object.[inaudible 69:17]value to object.[inaudible 69:19].value. [69:19] It defers the cost of fetching the BLOB until you actually do need it. A couple ideas on model optimization here. Trim the fat, which is, I'm sorry... Man 13: [69:39] Sorry about that. Just about the large binary object stuff.
I don't know if anybody knows, but I remember because I used to ask about this quite a bit, about pulling that stuff onto the heap every time. I know some people were working on some streaming things at one point. Did anybody ever finish working on that? Chuck: [70:05] I think there's one for FrontBase. David [inaudible 70:07] was working on it before he died. I have the source code. I may even have it with me. But I think that might have been FrontBase specific, I don't know. But it went directly from the database to the response. [inaudible 70:26] trim the fat. The number one thing in trimming the fat is: if you don't need the relationships, don't model them. If you're working with somebody's legacy database and there are attributes in there that you don't care about, don't need to see, don't need to update, don't model them. You don't need to model them. Unless they're [inaudible 70:46] and you have to set them. You can do crazy things like map multiple entities to the same table. So you can have an entity that [inaudible 70:57] this table that just pulls back a couple of values that you show in the list. I would probably prefer some [inaudible 71:02], but maybe you need some logic, maybe you need supporting code with that as well. You need to be really careful when you do this, because EOF's not going to know they all represent the same entity. So if you change one and save it, it's not going to get reflected in the other ones. [71:19] You really only want one of these that you ever write to. Keep the rest read only. [71:26] This is dubious. I'd be nervous before I went around doing this. But it's one possibility to reduce some of the cost of fetching big objects. A little benefit you can get is reducing the number of attributes used for locking. That's an extra thing that EOF has to compare, it's an extra thing the database has to compare.
[71:44] Some people go so far as to turn off locking on everything, and they just have an integer or a timestamp and lock on that one attribute. Not a huge benefit; it's not going to take your app from somebody on a bicycle to a Ferrari, but it will provide somewhat of a speed-up. [72:04] You can even do things like de-normalize your data, which is something we normally try to avoid, but for purposes of optimization sometimes you do need to de-normalize the data. You can do that in the database with views, you can do it by flattening relationships, or you can do horrible things by just de-normalizing right in the database. Sometimes you need to do what you need to do to get the speed. [72:29] Another thing you can do is keep complex data structures in large objects, a BLOB or a CLOB. Things like dictionaries. Rather than writing a bunch of different values into the database, just keep it all in memory, serialize it into a string, stick it into a BLOB, pull it back out, and de-serialize it. You can really reduce the database traffic like that. [72:53] Consider using things like stored procedures. If you can offload the processing to the database, rather than bringing a bunch of data in and converting it to EOs, and you get the database to do some of that work, that can also be a good win for optimization purposes. [73:13] Inheritance. Inheritance in EOF can be pretty useful. Some people like it a lot, some people hate it a lot. I use it in almost every application I write, but I try not to make too much use of it. The danger is that it can be slow to process. Try to keep hierarchies flat. You just want one or two levels down. You don't want a whole big, long chain of inheritance, because EOF has to do a lot to maintain that. [73:41] Avoid using concrete superclasses if you can. Try to keep the superclass abstract. If you're going to use inheritance, use single table inheritance if you care about performance.
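The "dictionary in a BLOB" trick above is just serialization round-tripping. Here's a small sketch in plain Java of the encode/decode step you'd do around the BLOB column; the class name is made up, and a real app might prefer a plist or JSON encoding over Java serialization for portability.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.UncheckedIOException;
import java.util.HashMap;

public class BlobCodec {
    // Serialize the in-memory dictionary into the byte[] you would
    // store in a single BLOB column, instead of many separate columns.
    public static byte[] encode(HashMap<String, String> dict) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(dict);
            }
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Read the BLOB back into the in-memory structure.
    @SuppressWarnings("unchecked")
    public static HashMap<String, String> decode(byte[] blob) {
        try (ObjectInputStream in =
                 new ObjectInputStream(new ByteArrayInputStream(blob))) {
            return (HashMap<String, String>) in.readObject();
        } catch (IOException | ClassNotFoundException e) {
            throw new RuntimeException(e);
        }
    }
}
```

One insert or update per dictionary instead of one per key is where the traffic reduction comes from.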
If you want a pretty database and a beautiful schema and you don't give a shit how slow your application is, use vertical inheritance. It's great for that. [74:03] Pretty pictures. Pretty, pretty pictures. Pretty slow, too. Single table is pretty good because it can do just one select and get it all out. With the other mappings it ends up doing more and more selects, so that's something to keep in mind. [74:23] This is something that people seem to forget a lot when they're doing web applications. A lot of what I see is that they forget there's any SQL. You've got the model, you've got the Java, it's all good. You forget there's a bunch of SQL going back and forth. I would say when you're developing your application, pretty much all the time, leave SQL logging on, because then you know, when you click through your page and all of a sudden things just start scrolling through your console like that, that you screwed something up. [74:50] It's time to go back and look at batch faulting, look at batch fetching, do prefetching key paths. That's the number one reason I've ever seen that an app is slow: people didn't pay attention to the SQL, they didn't watch what their application was doing, and it's just spewing out reams and reams. Keep it on, keep watch. [75:14] Any change you make to an app that might possibly affect this, you go back in, you make your little fix, turn it back on, go through your page, and watch. Before you commit your code, make sure you're not committing dumbass. [laughs] Man 14: [75:36] That's a tag you can put in and pull out in Eclipse, right? Chuck: [75:39] Don't commit dumbass, yes. A couple of other things to watch for. Repeated single-row selects are almost always a result of not using batch fetching, batch faulting, something like that. There's no reason to ever do repeated single-row selects.
Man 15: [76:05] I often have the case where the adapter [inaudible 76:08] and makes my overall application very slow because of all the logging, so I made a little switch inside the app where I can turn it on before some particular operation and turn it off afterwards. Besides the actual setting, you have to set the [inaudible 76:26] logging for groups, and it's set to debug group database access for it to actually work. Chuck: [76:36] I usually recommend people leave it on, because what I find is that people forget to do that. "Oh yeah, I forgot about that." Man 15: [76:45] Another thing is I have an awk script where I can throw a whole application log at the script, with the adapter debug log, and extract the actual SQL statements with complete values, which you can immediately paste into an SQL editor to run afterwards. Chuck: [77:13] That would be pretty handy to have. Man 15: [77:15] Where should I provide that? Chuck: [77:17] Probably the community wiki. Man 14: [77:21] You didn't use sed. Chuck: [77:24] It's awk. Man 16: [77:29] We are using such a tool, too. It is called [inaudible 77:33]. It's very nice because, again, we directly put the database logs into it and it generates nice HTML reports of combined query results with different statistics. It was very hard for us to follow these batch queries because we had so many. This HTML report gave us real ideas about how to optimize. [inaudible 78:09] is nice. Chuck: [78:13] There's actually a page on the community wiki for EOF optimization, which would be a good place to put things like that. Some other things to check for, besides the obvious 10,000 single selects, are slow queries. Again, as I said at the beginning, having a lot of data, a realistic size of data, is going to make it a lot more obvious when a query is slow. [78:38] The other thing to check for is indexes on common query terms. If it's just "where primary key equals value," you don't need to worry about it.
When you're doing a qualified select, I would go back and take a look: do you have indexes for those common terms? If they're things that people aren't going to search on very often, then maybe you don't want to carry the weight of the index. [79:01] It's good to go back in, take the query, paste it into your SQL tool, and take a look at the query plan. Is there an index? What is it doing with it? I usually end up adding a page or so of what I call performance indexes to the tables, just to handle common query terms so that the fetches are faster. Man 17: [79:26] Just curious, how many here use MySQL? One third, maybe. When you generate the SQL from the model, it doesn't give you the foreign key indexes. You can't really use MySQL's built-in foreign key constraints because they're not deferred. It just messes you up. Even though you could use Chuck's Microsoft SQL Server thing that orders the operations, I don't know, I couldn't get it to work on my app, anyway. [79:55] As a basic user, you must create all your own foreign key indexes. So when you generate the SQL, put it in BBEdit or whatever, and for every foreign key, put an index on those tables, because that's one of the biggest issues I see when someone calls me for help. The biggest problem I see is that they have no foreign key indexes, and they're wondering why their app is slow. Every relationship fetch is killing the app. [80:18] Foreign key integer fields, everyone must have; just create a regular index in your SQL. Chuck: [80:26] Otherwise every select is going to table scan until it finds the value. Man 18: [80:36] I have a question. Using MySQL, I have to put a hint to force MySQL to use an index. Where can I put the hint in the query construction process that EOF builds? I need to put a hint in the SQL request. Chuck: [81:08] Oh, a hint. Sorry. Man 18: [81:18] Is it clear? Chuck: [81:18] I thought he said "int" at first.
I keep forgetting there's no H in French. Man 18: [81:24] I have to put a hint in the SQL, in the request, to force MySQL to use an index. Where can I put the hint? Chuck: [81:50] If you want to put in a hint, you have to customize something. You want to insert custom SQL. There are two approaches. One is you can examine how hints are structured into the statement in the SQL documentation for MySQL, and you can go and actually modify the adapter, modify the EOSQLExpression. You know the fetch spec. [82:12] In the fetch spec, the last constructor option is a hints dictionary, and right now the dictionary only has one entry, which is custom SQL to be used instead of the default generated one. You can create your own key. You can call it "MySQL index hint"; just create your own key, and you could actually even submit a patch to [inaudible 82:33] to modify the EOSQLExpression to look and check for something in that dictionary under the MySQL hint key, and put a property into the SQL expression as it's being constructed. It's the EOSQLExpression subclass in the MySQL plugin. [82:52] It can be done; someone just has to sit down and code it. Man 19: [82:56] Get busy. He could've written it in that time, but he wants to explain it to us. Chuck: [83:11] It is really important to take a look at all the SQL your application is generating and ask yourself: does it have indexes that will allow the database to optimize the query? If it doesn't, is it worth having? Indexes do incur a penalty. If you've got a table with a lot of inserts or updates in it, then every index you have is going to drag down performance on inserts and updates. [83:35] It's a balance you have to strike between speeding up selects and slowing down inserts and updates. If it's a table that doesn't see many inserts and updates, use the indexes. [83:48] Wrong key. Here are a couple of ways you can use to quickly find the slow SQL.
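The suggestion above, carrying a hint in the fetch spec's hints dictionary and splicing it into the generated SQL, can be sketched like this. To be clear, this is a hypothetical illustration, not an existing Wonder or EOF API: the key name, the builder class, and its method are all made up; the real change would live in the MySQL plugin's EOSQLExpression subclass.

```java
import java.util.Map;

public class HintedSqlBuilder {
    // Hypothetical hint key, mirroring the idea of a custom entry
    // in the fetch specification's hints dictionary.
    public static final String MYSQL_INDEX_HINT_KEY = "MySQLIndexHintKey";

    // Build "SELECT ... FROM table [USE INDEX (...)] WHERE ...",
    // splicing in a MySQL index hint when one is present in the hints map.
    public static String select(String columns, String table,
                                String where, Map<String, String> hints) {
        String hint = (hints == null) ? null : hints.get(MYSQL_INDEX_HINT_KEY);
        String tableClause = (hint == null)
            ? table
            : table + " USE INDEX (" + hint + ")";
        return "SELECT " + columns + " FROM " + tableClause + " WHERE " + where;
    }
}
```

With no hint in the dictionary, the generated SQL is unchanged, which is what makes this kind of patch safe to adopt incrementally.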
It's not going to find the hundreds of single-row selects, but it will find the slow SQL. There's the ERXAdaptorChannelDelegate. The SQL logging delegate is in one of my logging frameworks. They both pretty much do the same thing: they both track and log the SQL. [84:15] The ERX version has thresholds for different levels. You can say if it's above two milliseconds, give me an info; if it's above 10, give me a warn; if it's above 100, give me an error. It can also filter by entity. You put in a regular expression so you only see the SQL that's generated for a specific entity. [84:36] My version is actually inspired by the Wonder version. The difference in mine is that the log message is in comma-separated-value format, so it's pretty easy to go in, rip out a log, open it up in Excel, sort it by response time, and at the top you find the ten slowest selects from your test run of your application. [85:01] It can also log the data fetched, which can be useful for debugging, and it can also log the stack trace, because sometimes, with the way EOF goes through things, it can be hard to figure out where the slow fetch is coming from. Sometime when I have a few spare minutes, I'll take my stuff and roll it into the Wonder version. Or somebody can do it for me. Want to contribute? It's an easy way. Man 19: [85:26] Just give us your code, and we'll do that. Chuck: [85:30] It's on SourceForge. I'll move it to GitHub later. I'm busy! [laughter] Chuck: [85:42] This one's a bit away from optimization. The real intention of this is to reduce optimistic locking conflicts. But for some applications it can end up reducing the amount of fetching you do. It can reduce the need to fetch fresh data, because it's moving the snapshots between the instances, so they're getting a refreshed cache of the snapshots without having to go back to the database.
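The threshold scheme just described is simple to picture in code. This is a toy sketch, not the actual ERXAdaptorChannelDelegate implementation; the class name and CSV layout are made up, and the millisecond thresholds are the ones quoted in the talk.

```java
public class SqlLogLevels {
    // Threshold-based level picker: info above 2 ms, warn above 10 ms,
    // error above 100 ms. Anything faster is not logged at all.
    public static String levelFor(long millis) {
        if (millis > 100) return "ERROR";
        if (millis > 10)  return "WARN";
        if (millis > 2)   return "INFO";
        return null; // fast enough: skip logging
    }

    // CSV-style log line, easy to open in a spreadsheet and
    // sort by elapsed time to find the slowest statements.
    public static String csvLine(String entity, long millis, String sql) {
        return entity + "," + millis + ",\"" + sql + "\"";
    }
}
```

Sorting the CSV by the middle column is exactly the "ten slowest selects at the top" workflow Chuck describes.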
[86:10] Again, you need to be able to measure your application to know whether doing something like this is useful or not, because having this makes the machine do extra processing. While it may save you some trips to the database, if the extra processing costs more than that, it's a net loss. You need to be able to measure, to know whether optimizations like this are actually doing you any good or not. [86:35] Join table indexes, I believe, are still a problem; at least for FrontBase they are. They only get one index on them, one primary key index on (first column, second column). So if you end up with a select on the second column, and that happens very often, I find, there's no index, no optimization for it. It's going to go through and look at every single many-to-many row until it finds the one it wants. [87:05] All you have to do, along with the other indexes you're adding on the query terms and the foreign key indexes for MySQL: make sure you have an index on the second column of your many-to-many join tables, so that the database can actually optimize. [87:21] I first ran into that years and years ago. Had an app, it had been deployed for quite a while. No problems, no problems, no problems. [87:32] All of a sudden, Tuesday morning, anything anybody did took 37 seconds. It had just gotten incrementally more rows in the join table, and it hit a point where the database just couldn't do anything with it anymore; it was pulling it all into memory and scanning it. Put one index on it, and it went from 37 seconds to a quarter of a second. It can make a huge difference when you hit certain levels of data. [88:02] Mind you, it wasn't a very fun Tuesday finding that. [88:07] Database tuning. I'll just talk a little bit about this.
I'm assuming most or all of us are using a database where we can actually see what the database's query plan is. For anything other than a really simple "select where primary key equals value," it's good to copy the query out and take a look at the actual plan. Especially for some of the more advanced qualifiers in Wonder; some databases don't like some formats of SQL. [88:36] I know it's been a problem for me with FrontBase and a few things, where you can write equivalent SQL in two ways and one's dead slow and one's really fast, because for some reason the database just looks at one of them and says, "I can't optimize this." So check the plan, make sure it's optimizable. Sometimes you'll have indexes and the database won't use them. [88:55] You've got to take a look at using a different qualifier, or a different way of writing something, or hand-coded SQL, or something, to get it to the point where the database can optimize it. [89:05] As with your application: cache, cache, cache, cache. The more memory you have on your database server, the better. Try to get as much stuff in memory as you can. There's not much more to say about it than that. Just turn on the row caching, use up as much memory as you have available, and check the hit ratio. [89:29] Make sure that you're not wasting memory on one table that you could use on another one. You end up having to do a bit of tuning to figure out where you get the best balance of hit ratios for the cache contents. [89:45] I'm not a database tuning guru. There's lots of information out there. But it can make a huge, huge, huge difference. [89:58] Now, at the end, something fun. Who here doesn't know about Survs? Oh, there's a few people that don't. OK, so this was written by a company called Enough Pepper out of Portugal, and it's an online survey system, much like SurveyMonkey and a few other ones out there. It's quite nicely done. It's quite fast. If you're looking for a survey engine, it's definitely worth a look.
[90:25] I think a lot of people here do use it. I know we use it for WOWODC a lot. It's interesting because of the kinds of optimizations they had to do to make it go. The architect of it now works for me, and I was able to get him and his partners to agree to let a few of the secrets out of the box about how they made this work. [90:54] I think I only have time for this one. So, counters and concurrency. Here's the situation. You need to count records according to some criteria, and there are a lot of records. Doing a select and counting every time is not going to work; counting with a query is just too slow. [91:15] What do you do? You cache the value someplace else. Every time you add a row to the table, you update the counter. Now, to get the count, it's just one select instead of having to count up a bunch of rows. [91:32] Now, the problem with that, in something like a survey system where you've got a lot of people answering a lot of questions and a lot of people querying the results of their surveys, is that count value: now you've got a huge amount of contention for it, because you update the one row, but somebody else has already updated it. [91:51] Then you go back and try again. You refresh the counter and try to update it again. You end up with the application locked up on that, trying to update the counter. What do you do? Ideas? [silence] Chuck: [92:12] No one? Man 20: [92:15] Take a survey. Chuck: [92:16] You could take a survey. [92:24] This is one of the things I was talking about at the very beginning of the presentation: some kinds of optimization are domain-specific. Sometimes the things you have to do, like this, aren't "oh, you should do this all the time when you write a WebObjects application," like "use batch faulting, you should always do this." [92:43] This is a unique, domain-specific thing. But I hope it gives you an idea of ways to come at problems from a different side.
The answer was to create several sub-counters. Instead of having one count in the table, now you have several. If I want to know the count of apples, I now have to sum up three rows. But that's not so bad. Summing up three rows in SQL is pretty easy. Summing up 300,000 is pretty slow. [93:14] What we have here is the counter identifier, which is meaningful, a sub-counter number, just to make them unique, and a value for each one. So how does that work? Man 21: [93:39] This is one of Chuck's interview questions, right? Chuck: [laughs] [93:47] The way they did it is they defined a maximum number of sub-counters. When they read, they just select all the counters for the identifier and sum them. Whether it's 3, or 10, or 100, it's not that big a number. It's easy to handle in SQL, completely indexed. [94:03] To update it in the application, instead of saying, "OK, I want to update the number of apples," you say, "I want to update the number of apples," and you generate a random number between one and ten, or however many counters you have. So you still have the optimistic locking problem on the counter, but instead of having it on one counter, you now have it spread across three counters or ten counters. [94:25] You've reduced the optimistic locking contention in your application by, say, a factor of 10, simply by adding 10 counters. You can drop that in and add 100 counters, still quick to count in SQL, and you've just dropped the concurrency problems by a factor of 100. It's a very, very simple technique. It's just not obvious. It's not eliminating the performance problem, it's just reducing it to a manageable level. Man 22: [95:01] I did something like this for my boss, a very similar idea. He wanted to have Quartz and JMS, but he didn't want to have a separate app to control it. He just wanted all of our app instances, any one of them, to be able to act like something that can handle the messages.
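The sub-counter scheme can be sketched in plain Java, with an array standing in for the N counter rows. This is a minimal illustration of the technique, not the Survs code; the class name, seed handling, and row representation are all made up.

```java
import java.util.Random;

public class SubCounters {
    private final long[] values; // one slot per sub-counter row in the table
    private final Random random;

    public SubCounters(int numSubCounters, long seed) {
        this.values = new long[numSubCounters];
        this.random = new Random(seed);
    }

    // Increment: pick one of the N sub-counter "rows" at random, so two
    // concurrent writers only collide on the same row about 1/N of the time.
    public void increment() {
        values[random.nextInt(values.length)]++;
    }

    // Read: sum the N rows. Cheap for N = 10 or even 100,
    // unlike counting 300,000 base rows.
    public long total() {
        long sum = 0;
        for (long v : values) {
            sum += v;
        }
        return sum;
    }
}
```

In the real system each slot is a database row keyed by (counter identifier, sub-counter number), and the read is a small indexed select-and-sum.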
And so what we did is, when it came time for Quartz to fire, all the app instances do it, but each creates a random number between 0 and 60 before it goes. [95:30] It never happens exactly when you schedule it; it's going to be within a minute. But any one of those 10 app instances, they're not going to pick the same exact second. So the first one wins, and the others just lay off. It was just a quick and simple way. Chuck: [95:45] That's an interesting solution. Man 22: [95:46] It's a similar idea. You don't have to manage a separate message between the app instances; using the random numbers reduces the probability of hitting at the same time. It may happen, but it happens a lot less. Chuck: [96:07] All right. That's it. Thanks, everybody, for your participation and your ideas. [applause] Chuck: [96:14] Are there any last questions? [96:21] No? Oh, there is one. Man 23: [96:25] I have a question about editing context locking. I appreciate the auto-locking, and the long response task should take care of the auto-locking, but actually it does not appear stable to me. I have to manually lock the EC in the long response [inaudible 96:45]. Chuck: [96:48] I'm trying to remember what the code looks like and I can't. I'll have to look at the code. Man 23: [96:51] I saw some people mentioning it on the list, but I can't see why it is that way. Chuck: [97:07] Check the concurrency presentation from last year. [off mic conversation] Man 23: [97:26] I would just like to add a clarification about the ERXEnterpriseObjectCache. There is an option, shouldFetchInitialValues. [silence] Man 23: [97:48] Yeah. The good point is that if you set it to false, when, somewhere else, by notification, an object changes, the cache is automatically updated.
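The jitter trick above is easy to model: each instance independently picks a random delay in [0, 60), and only the earliest firer does the work. A small sketch, with made-up names, and a seed so the example is reproducible; real instances would each use their own unseeded random source.

```java
import java.util.Random;

public class JitteredTrigger {
    // Each instance independently picks a random delay of 0-59 seconds.
    // With 10 instances and 60 slots, exact ties are possible but unlikely.
    public static int[] pickDelays(int instanceCount, long seed) {
        Random random = new Random(seed);
        int[] delays = new int[instanceCount];
        for (int i = 0; i < instanceCount; i++) {
            delays[i] = random.nextInt(60);
        }
        return delays;
    }

    // The instance that fires first wins and handles the job;
    // the others see the work is claimed and lay off.
    public static int winner(int[] delays) {
        int best = 0;
        for (int i = 1; i < delays.length; i++) {
            if (delays[i] < delays[best]) best = i;
        }
        return best;
    }
}
```

It's the same idea as the sub-counters: not eliminating collisions, just making them rare enough not to matter.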
But if your key is a compound key, like a derived attribute, one attribute plus another one, it doesn't work, because when you don't set it to true, the object cache manager tries to create a qualifier, but as the key isn't an attribute, it fails. So as a result, if your key is a compound key, you must set shouldFetchInitialValues to true. Chuck: [98:38] I'll have to take a look at that. Man 23: [98:40] Is it clear again? Chuck: [98:41] I think so. Man 23: [98:43] But the bad side is that the cache is not updated automatically, so when you try to get a value, if the value is null, you have to fetch manually, interact with the cache manually. OK. Man 24: [99:02] That sounds like a bug. Chuck: [99:04] It does. No? [background chatter] Chuck: [99:13] A key that changes. Man 23: [99:15] Yeah, the key is not an attribute. It comes from a method, like one attribute plus another one, to make it unique. And in that case, it can do nothing, because there is no qualifier it can build with that. Man 24: [99:38] I still think that would be a design hole. Man 25: [99:41] You mentioned the load balancer. We created a new one which is a mixture of round robin and load. The original problem with the load balancer that works with session load is that you have to ask the session, or the application, how much load it has, and if it doesn't answer, or answers slowly, the load balancer has a problem. [100:09] What we did is we used the round robin balancer, but we count how many active requests there are in each instance, and we skip instances until we find the minimum. We only give requests to the instances that have the minimum active count. And if they are all zero, we just keep going round robin. That works very well, and if somebody wants to have that, we would just give it out. Chuck: [100:47] That sounds like a much better way of doing it. Yeah, I think we'd like to have that in Wonder.
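The hybrid balancer just described, round robin restricted to instances at the minimum active-request count, can be sketched like this. A simplified single-threaded model with made-up names; the real balancer lives in the web server adaptor and tracks counts per instance as requests start and finish.

```java
public class MinLoadBalancer {
    private final int[] activeRequests; // active request count per instance
    private int cursor = -1;            // round robin position

    public MinLoadBalancer(int instances) {
        this.activeRequests = new int[instances];
    }

    // Round robin, but skip any instance whose active count is above the
    // current minimum. When all counts are equal (e.g. all zero), this
    // degenerates to plain round robin, which is the behavior described.
    public int next() {
        int min = Integer.MAX_VALUE;
        for (int count : activeRequests) {
            min = Math.min(min, count);
        }
        for (int step = 1; step <= activeRequests.length; step++) {
            int candidate = (cursor + step) % activeRequests.length;
            if (activeRequests[candidate] == min) {
                cursor = candidate;
                activeRequests[candidate]++; // request dispatched here
                return candidate;
            }
        }
        throw new IllegalStateException("unreachable: some instance is at min");
    }

    public void finished(int instance) {
        activeRequests[instance]--;
    }
}
```

The appeal over a pure load-based balancer is that it never has to ask the instance anything: the adaptor already knows which requests it has in flight.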
Man 25: [101:00] Yeah, I'm using your JGroups synchronizer to synchronize EOF instances. Where exactly does that fit in, or what exactly is it really doing? Is it synchronizing all the caches? Because there was the JMS synchronizer, right? I think... Chuck: [101:16] Yeah, there are two different flavors of it. Man 24: [101:18] OK, so it's just doing it at the snapshot cache level. Is it actually resending the snapshot data, or is it just telling it to refetch... Chuck: [101:26] It sends the snapshot data. Man 24: [101:31] Did anything from the "dogs barking" thread stuff get into... that's not in Wonder, is it? Or is it... Chuck: [101:38] Dogs barking on the thread? [laughter] Man 24: [101:41] At Apple... I can see the face again. The applications, when a request would come in, at the beginning of the request-response loop, would hand the information off to a thread that this request had started, and then that thread, if the request took over a certain parameter of time... Chuck: [102:03] Oh, yeah. That's still in Wonder. Man 24: [102:05] It is. Man 25: [102:07] The delayed response handler? Chuck: [102:09] No, there's another one in there, a thread watchdog. It's either called Seppuku or Hara-kiri or something. Man 23: [102:17] You tell it that you think your requests are going to come in under a certain amount of time, and if they come in over, then it starts spitting out stacks onto the... Chuck: [102:26] Well, the one I'm thinking of... maybe the one I'm thinking of actually killed the app or killed the thread. Man 23: [102:33] It started spitting out... it would just do a thread dump. Man 25: [102:41] I think, yeah, it's in ERXStats, I think. ERXStats. Chuck: [102:44] Yeah, that's in ERXStats. I was thinking of the one that killed the app, Seppuku.
Man 25: [102:47] ...limit, and you turn it on, and then it logs. Like, after 10 seconds it says slow, warning, and then after three seconds it will give you a [inaudible 102:57] error, like somebody logged out, and for certain requests you get a whole stack of [inaudible 103:00]. Man 23: [103:04] Yeah, they would say then that the dogs were barking. That was the... Chuck: [103:13] A few seconds ago, somebody was asking about locking with an [inaudible 103:17] context, and what Kieran said is right. He did a lot of work, and that's a good thing to see. But just briefly, I think when you do have something that's going to run long, I don't think you should be thinking about locking. I think when you make a new EC, you don't just say, "new EC." You say, "new EC," and you put into the constructor a new object store coordinator. [103:35] What that's going to do is give you a brand new session to your database. You don't have to worry about locking with the rest of your app. You just let it do its thing, and work and work until it's done. Man 24: [103:46] By memory. Man 23: [103:50] Only because it's you. Last question. [103:53] Just to add to what Aaron was saying. If you look at that presentation from last year, in Wonder we have an ERXTaskObjectStoreCoordinatorPool. Basically, you don't want background tasks ever to be using your request thread's object store coordinator, in general, because a background thread is hammering the database, doing a million somethings. So you can adjust a property in Wonder. You can say, "I want four object store coordinators dedicated to background threads." Then it shows you in [inaudible 104:24] how to... There's an abstract class. Every time you make a new editing context, it will do round robin on the pool and pick one of those coordinators. Actually, I think it uses it once. [104:35] It initializes it once for that class, for the task you're running.
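The coordinator pool idea, a fixed set of database sessions handed out round robin to background tasks, reduces to a few lines. This is a toy model, not the ERXTaskObjectStoreCoordinatorPool source; the `Coordinator` placeholder stands in for EOObjectStoreCoordinator, and thread-safety here is just a single synchronized method.

```java
import java.util.ArrayList;
import java.util.List;

public class CoordinatorPool {
    // Placeholder for EOObjectStoreCoordinator: each entry represents a
    // separate database session, kept away from the request threads.
    public static class Coordinator {}

    private final List<Coordinator> pool = new ArrayList<>();
    private int next = 0;

    public CoordinatorPool(int size) {
        for (int i = 0; i < size; i++) {
            pool.add(new Coordinator());
        }
    }

    // Round robin: each new background task checks out the next coordinator,
    // and every editing context that task creates reuses the same one.
    public synchronized Coordinator checkout() {
        Coordinator c = pool.get(next);
        next = (next + 1) % pool.size();
        return c;
    }
}
```

Pinning one coordinator per task is what keeps a database-hammering background job from stalling the coordinator your request threads depend on.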
All new editing contexts in that class will use the same object store coordinator in the background.