Tuesday, August 14, 2007

How do u identify the bottlenecks in Mappings?

Bottlenecks can occur in

A. Targets The most common performance bottleneck occurs when the informatica server writes to a target database. You can identify target bottleneck by configuring the session to write to a flat file target. If the session performance increases significantly when you write to a flat file, you have a target bottleneck.

Solution : Drop or Disable index or constraints Perform bulk load (Ignores Database log) Increase commit interval (Recovery is compromised) Tune the database for RBS, Dynamic Extension etc.,

B. Sources Set a filter transformation after each SQ and see the records are not through. If the time taken is same then there is a problem. You can also identify the Source problem by Read Test Session – where we copy the mapping with sources, SQ and remove all transformations and connect to file target. If the performance is same then there is a Source bottleneck. Using database query – Copy the read query directly from the log. Execute the query against the source database with a query tool. If the time it takes to execute the query and the time to fetch the first row are significantly different, then the query can be modified using optimizer hints.

Solutions: Optimize Queries using hints. Use indexes wherever possible.

C. Mapping

If both Source and target are OK then problem could be in mapping. Add a filter transformation before target and if the time is the same then there is a problem.

(OR) Look for the performance monitor in the Sessions property sheet and view the counters.

Solutions: If High error rows and rows in lookup cache indicate a mapping bottleneck. Optimize Single Pass Reading:

Optimize Lookup transformation :

1. Caching the lookup table:
When caching is enabled the informatica server caches the lookup table and queries the cache during the session. When this option is not enabled the server queries the lookup table on a row-by row basis.
Static, Dynamic, Shared, Un-shared and Persistent cache

2. Optimizing the lookup condition whenever multiple conditions are placed, the condition with equality sign should take precedence.

3. Indexing the lookup table

The cached lookup table should be indexed on order by columns.
The session log contains the ORDER BY statement, the un-cached lookup since the server issues a SELECT statement for each row passing into lookup transformation, it is better to index the lookup table on the columns in the condition

Optimize Filter transformation:

You can improve the efficiency by filtering early in the data flow. Instead of using a filter transformation halfway through the mapping to remove a sizable amount of data.
Use a source qualifier filter to remove those same rows at the source, If not possible to move the filter into SQ, move the filter transformation as close to the source qualifier as possible to remove unnecessary data early in the data flow.

Optimize Aggregate transformation:

1. Group by simpler columns. Preferably numeric columns.
2. Use Sorted input. The sorted input decreases the use of aggregate caches. The server assumes all input data are sorted and as it reads it performs aggregate calculations.
3. Use incremental aggregation in session property sheet.


Optimize Seq. Generator transformation:

1. Try creating a reusable Seq. Generator transformation and use it in multiple mappings 2. The number of cached value property determines the number of values the informatica server caches at one time.

Optimize Expression transformation:
1. Factoring out common logic
2. Minimize aggregate function calls.
3. Replace common sub-expressions with local variables.
4. Use operators instead of functions.


D. Sessions

If you do not have a source, target, or mapping bottleneck, you may have a session bottleneck. You can identify a session bottleneck by using the performance details. The informatica server creates performance details when you enable Collect Performance Data on the General Tab of the session properties. Performance details display information about each Source Qualifier, target definitions, and individual transformation. All transformations have some basic counters that indicate the Number of input rows, output rows, and error rows. Any value other than zero in the readfromdisk and writetodisk counters for Aggregate, Joiner, or Rank transformations indicate a session bottleneck.

Low bufferInput_efficiency and BufferOutput_efficiency counter also indicate a session bottleneck. Small cache size, low buffer memory, and small commit intervals can cause session bottlenecks.

E. System (Networks)

4 comments:

Rajasekar Nonburaj said...

Nice article,

Thanks,
Raja
http://chennaibi.wordpress.com

Bliss said...

hi, I found useful info at ur blog, good work. Myself Pankaj, Masters from IIT-Kanpur.., persently working as BA in a supply chain solution company, and now a days exploring the area of data modeling, database, DW, data mining etc. if I have any doubt can I mail to u.., if its not a problem can u give me ur mail id.

Thanks,
Pankaj

Bliss said...

my mail id is -action.pankaj@gmail.com

TheNULL said...

Hi m
Your article find to be too good... excellent work !!!