EOF Performance Tuning
EOF does many nice things for you. But many people learn some things the hard way; like that a relational->object mapping library can have some serious overhead. A common experience with large and complex object model, is that people model their objects, then do a large fetch and find out that bringing in a large set of EO's can be really slow (relative to what they were expecting with JDBC or direct SQL). The overhead of bringing something into an EO can vary, but seeing 100 times speed degradation over raw SQL is certainly plausible; and for many problems, that is just too stiff a penalty. So what to do about it?
Here are some of the approaches you can take to improve things;
Lighten your objects
The largest penalty in an R/O system, is comparing new objects (items) against other cached verions (snapshots) and consolidating and resolving any changes. So adding things as objects, means you will incur that penalty. Still, there are some minor benefits to be had if you lighten the objects.
But the point is that the lighter and simpler an EO is, the faster it is. However, the main penalty is on serialization/synchronization at all; so the returns are probably not dramatic enough, in most cases, to justify the measures necessary to create these "optimizations".
Adapt your model
When you are going to be using a relational to object mapping library (like EOF), you should expect that this will change your requirements; enough that you can adapt your model to fit the tool.
You can even go further, and start moving complex data structures and relationships into blobs that you manage yourself. This offloads EOF from managing them, and often allows you to speed things up; but the cost is more code maintenance on your part, and of course denormalizing can negatively impact the design so you want to be careful about how zealously you go down this path.
Raw-Rows (late promotion)
One of THE most important techniques for speeding up acces, is using Raw-Rows; or doing late promotion. Basically a raw-row fetch is a way for you to do a database fetch that returns a data structure (array of key-value pairs), instead of fetching and promoting that to a full blown object (EO). This lighter fetch takes much less time, and is much better for large collections.
Then on pages that use those collections can use Raw-Rows, and you promote an object out of that collection to a heavy-weight EO, only on those pages that need to work with an entire object (and all the complexities thereof).
Fetches generated by EOF or EOModeler or other tools are generally not as tuned as those generated by a human being; if they were, then machines wouldn't need us, and could program themselves. By creating the Raw-SQL yourself, you can generally tune and optimize fetches yourself, to either fetch in just the parts of table (both columns and/or rows) that you have interest in; again improving the performance and tuning the areas that most need it.
Another major boost in performance here, is that you can traverse your relationships if you select the fields you need from your relationships as part of the raw rows. This also reduces your transaction load (some databases only allow 100 TPM with cheaper licenses), and often increases fetch speed as all data is fetched in the one SQL query with a multiple table SELECT statement (a JOIN). You can also command your EOModel as to wether the relationships perform inner or outer joins which can be a major benefit for checking data integrity. (Check out some SQL sites to learn about INNER, LEFT OUTER, OUTER and RIGHT OUTER joins to see some great data collection methods)
Get closer to the Database (Stored Procedures)
While it seems like it is defeating one of the purposes of a database abstraction layer to get down and dirty with a database (using stored procedures, etc.); it may be a necessary or prudent step.
Stored procedures can do things at the database level that are much faster and lighten the app server load immensely. The cost is a lack of database independence; and you must learn intimate details of the database you are using. But a few well placed StoreProcedures can offer orders of magnitude difference.
One issue that needs to be discussed on its own is caching. While EO's are slow; they do cache – - so subsequent fetches (refetches) are significantly faster than first (primary fetches). This works well on not mutable (static) data. Less well on dynamic data. Still, you can use more than the default cache behaviors. You can fetch raw-rows of non-mutable data into your own arrays and manage your own cache of those. You can also put complex or compound attibutes in a cached attribute and fetch that item, instead of getting that live or constantly recalculating that from all the source parts (prebuild some calculations or collections).
So caching not only means WO+EOF should cache, but you might want to consider what you can cache to lighten various fetch loads as well.
Prefetching / Hinting
Now various prefetches and hints don't really save you time, in that you still have to load all the objects in and pay the same load time for the; but they can move where you pay that time. Sometimes this can dramatically improve the percieved performance, even if it doesn't alter the actual performance. Users might tollerate a 5 or 10 second penalty on one page, if the next 10 things they interact with are all much faster; and vise versa. So it helps to figure out what a user is trying to do, and how they are trying to do it; and tune accordingly.
Reduce your dataset
This may sound obvious, but reduce your dataset whenever possible. For example, no page should really display 5,000 elements – - that takes way too long to load and display. Better to use fetch limits or similar type of behaviors to show a user a smaller set of data whenever possible; and then use various techniques to fetch as little data as possible. The smaller the dataset, generally the faster you can work with it.
These are just some of the techniques you can think of to speed up EO fetches.
Some might say that if you have to do all these things that you're doing all the work for EOF for it; or your spending as much time optimizing code as it saved you in the first place. While in a few extreme cases that might be true; generally what happens is that EOF allows you to create a solution very quickly and easily; then you go back and learn where the bottlenecks are (profile), and tune those areas until they perform reasonably for your requirements. This allows the advantages of a RAD (Rapid Application Development) tool, with later the power and speed of a lower level tool; if you'll commit the time to tuning those areas that need to be tuned. And many classes of Apps may require very little tuning.
For vendor-specific performance tuning, refer to the vendor-specific sections.
Take a look at the ERXAdaptorChannelDelegate in Project Wonder.
The SQL that EOModeler generates for many to many join tables does not allow full optimization of the join. Many to many join tables are only generated with a single index which is the compound key of both columns.
CREATE TABLE "GROUP_FILE" ( "FILE_PKEY" NUMERIC NOT NULL, "GROUP_PKEY" NUMERIC NOT NULL ); ALTER TABLE "GROUP_FILE" ADD PRIMARY KEY ("GROUP_PKEY","FILE_PKEY") NOT DEFERRABLE INITIALLY IMMEDIATE;
This index can only be used to optimize joins on the first column in the index (which is the second column in the table for some reason). The lack of an index for the other column results in a table scan when joining on the other column. This sort of joins happen when following a to many relationship. For example, group.files() will result in SQL similar to this for each element in files():
SELECT f.* FROM FILE f, GROUP g, GROUP_FILE gf WHERE f.PKEY = gf.FILE_PKEY AND g.PKEY = gf.GROUP_PKEY AND g.PKEY = 10;
Only the GROUP_PKEY has an index which is usable for optimizing this query. The join on FILE_PKEY (f.PKEY = gf.FILE_PKEY) is not optimized and results in table scans. You need to manually add an index for this:
CREATE INDEX GROUP_FILE_FILE_PKEY ON GROUP_FILE (FILE_PKEY);
This can make a truly dramatic difference if there are a lot of rows in the join table.Category:WebObjects