Login Register Actian.com  

Actian Community Wiki

Navigation
Learn About
Developing With
Ingres Talk
Information
Toolbox

Performance Enhancements

From Ingres Community Wiki

Jump to: navigation, search

Contents

SIR 122512: Hash-join, Hash-agg improvements

The DDS for the hash-join, hash-agg performance improvement project is available here as an OpenOffice ODT, file, or here as a Word .doc file. A brief summary of the impovements:

  • decoupled hash algorithm
  • fast skip of same-key entries (join)
  • query setup time analysis of QP for much more accurate hash memory reservation (big change!)
  • better reservation heuristics, especially for agg
  • rehash upon recursion (agg only)
  • decouple read and write (=partition) buffer sizes, and add config parameters
  • simple (RLL) row compression
  • unified spill files: one per partition, instead of two per partition (join)
  • include bitvector space for hash build (join)
  • don't impose merge-join sortedness considerations upon hash join (allows hash join to use full joining key)
  • hot-path code improvements in hash join

This change has been submitted and is available in the "main" code line.


SIR 122513: Partitioning related query execution improvements

The DDS for the partitioning related query execution improvement project is available here as an OpenOffice ODT, file, or here as a Word .doc file. A brief summary of the improvements:

  • PC aggregation (hash agg only)
  • Nested PC-joins, PC-agg over PC-join.
  • qualify more generalized expressions, e.g. OR lists, OR'ed between.
  • join-time partition qualification
  • hot-path code improvements in orig

This change has been submitted and is available in the "main" code line.


SIR 122739: Improved DMF Rowaccessor Scheme

The DDS for improvements to DMF row-accessing is available here as an OpenOffice ODT file, or here as a Word .doc file. The motivation for this change is to make it easier to add new row-level compression types, and to improve the performance of the existing standard compression.


SIR (122757): Various DI Improvements

The DDS for a collection of DI-level improvements is available here as an OpenOffice ODT file, or here as a Word .doc file. The changes include xfs / ext4 fallocate reservations, as well as various other performance improvements.

Performance enhancement list

Here's a plain, unembellished list of performance enhancements done at Datallegro and waiting to be integrated over to the standard Ingres code line:

Items completed:

  • hash-join, hash-agg performance (see above for DDS)
  • Partitioning related query execution improvements (see above for DDS)
  • revised DMF row-accessor framework (see above for DDS)
    • permit new row compression types to be added more easily
    • better compression=(data) performance, especially with many columns
  • XFS reservations (file preallocation, of a sort) (see above for DDS)
  • better DMT-SHOW performance especially for session temps
  • partition close-behind during scan (better memory usage)
  • recovery performance when many nonredo entries are present

Items deferred for post-10.0 release. The last two items (page compression and pseudo-temporary tables) are unlikely to be integrated in their original Datallegro form, as the changes were very specific to that environment.

  • INSERT-SELECT into non-journaled heap via LOAD instead of row-PUT
    • much faster, much less logging
    • bulk-load conditions apply: heap (only), no SI, no journaling
  • DMF sort memory larger, and somewhat configurable
  • partitioned global temporary tables
  • compression = (page)
  • PSEUDOTEMPORARY tables (probably too nonstandard, but maybe not?)

Documentation

Hash Join and Hash Aggregation Improvements

Improvements to hash join and hash aggregation allow faster queries and better concurrent query capability. Environments that will benefit are those with moderate to heavy concurrency (where memory usage is a potential problem), or in which the hash join or hash aggregation cannot fit in memory and is therefore spilled to disk before completing.

This feature adds five new configuration parameters to the DBMS Server component:

qef_hash_rbsize

The size in bytes of a read buffer for a hash operation (join or aggregation). There is one read buffer per hash operation, so we recommend relatively large values.
Default: 128 KB

qef_hash_wbsize

The size in bytes of a write buffer for a hash operation (join or aggregation). There are many write buffers per hash operation, and in the hash join case, the write buffer size interacts with the number of hash “buckets”. Larger buffers can mean fewer (and therefore larger) buckets. Smaller buffers mean less room for rows in memory and more spillage. Installations must balance spill write efficiency (large qef_hash_wbsize) against memory usage and the ability to fit many buckets in memory (small qef_hash_wbsize).
Default: 16 KB

qef_hash_cmp_threshold

The minimum size of the compressible part (non-key part) of a row to be considered for run-length compression. Run-length compression allows more hash-join rows to fit in memory, and reduces the size of hash join or hash aggregation spill to disk, at the expense of some extra CPU overhead.
Default: 128 bytes

qef_hashjoin_min

The minimum allocation, in bytes, to a hash join. Most installations do not need to change qef_hashjoin_min unless the optimizer underestimates the size of hash joins, causing excessive spillage.
Default: 7*qef_hash_wbsize + qef_hash_rbsize + 100K

qef_hashjoin_max

The maximum allocation, in bytes, to a hash join. Most installations do not need to change qef_hashjoin_max unless the optimizer frequently over-estimates the size of hash joins, taking memory away from needier hash joins.
Default: 128 MB

The qef_hash_rbsize and qef_hash_wbsize parameters may need to be changed to suit your environment. The qef_hashjoin_min and qef_hashjoin_max parameters will rarely need to be changed, since they are for preventing problems in extreme situations.

The Ingres upgrade process adds these parameters to an existing config.dat. If for some reason any are missing from config.dat, the default values are taken.

For details on the parameters, see the online help for Configuration-By-Forms (or the equivalent visual tool).


Query Execution Improvements Related to Partitioned Tables

Enhancements to the optimizer improve execution times for queries against partitioned tables, especially queries used in data warehousing applications. Query optimization involving partitioned tables previously included partition pruning against constant predicates, and partition-compatible joining. The enhancements include: more complete analysis of partition pruning opportunities, join-time partition pruning, nested partition-compatible joining, and partition-compatible aggregation.

DMF Row Accessor Scheme

Internal changes to the DMF data structures improve the performance of standard compression, especially for tables with many columns. (Standard compression is specified by using the WITH COMPRESSION=DATA clause on the MODIFY or the DECLARE GLOBAL TEMPORARY TABLE statement.) These improvements will also enable new compression schemes in the future, such as run-length encoding compression.


Direct I/O Improvements (UNIX)

Performance improvements are made to Ingres direct I/O capability in UNIX environments, including Linux. Control of direct I/O is now configured at the system rather than DBMS level. Administrators can select direct I/O independently for tables, the transaction log, and build files (tables being loaded). Direct I/O is beneficial in large data warehouse environments, when the OS file system cache is ineffective (because of large sequential scans). OLTP-oriented installations may run better with direct I/O OFF, making use of the OS file system cache.

These improvements add new direct I/O configuration parameters to Ingres:

ii.$.config.direct_io

Enables direct I/O for tables.
Default: OFF

ii.$.config.direct_io_log

Enables direct I/O for the transaction log.
Default: OFF

A new value, DIRECT, can be specified on the following existing parameter:

ii.$.rcp.load_optim

Enables direct I/O during table build operations, when set to DIRECT. The default remains ON (table build operations use normal cached writing).

Note: If the platform or file system does not support direct I/O, the parameter is ignored, or in the case of load_optim, the value reverts to ON, the default.

The configuration parameters dbms.*.direct_io and recovery.direct_io are now obsolete. During upgrade, if dbms.*.direct_io is ON in the old version, then config.direct_io and config.direct_io_log are set to ON.


File Allocation Improvements (UNIX)

Ingres can now take advantage of the "fallocate" function of certain file systems, which reserves (pre-allocates) disk space as files are written. When applied to table files, pre-allocation may slow creation of the table somewhat, but subsequent reading of the table may be faster, since the table is more likely to be allocated sequentially on disk. The improvement is most likely to be seen when table scans are common, such as in data warehousing installations with large tables.

To take advantage of pre-allocation, the table should be created with the ALLOCATION= and EXTEND= parameters set to reflect the expected table size.

The default allocation and extend values are too small to be useful for pre-allocation.

This improvement adds a DBMS Server configuration parameter to Ingres:

ii.hostname.dbms.*.fallocate

Enables use of file pre-allocation through fallocate. If the file system does not support fallocate or an equivalent, the parameter setting does not matter (is effectively OFF).
Default: OFF
Personal tools
© 2011 Actian Corporation. All Rights Reserved