今天接到同事求助
说有一个select query
在Oracle上要跑一分多钟
他希望能在
s内出结果
以下就是解决这个问题的方法
需要的朋友可以参考下
该sql如下
复制代码 代码如下:
Select /*+ parallel(src
) */ distinct
src
systemname as systemname
src
databasename as databasename
src
tablename as tablename
src
username as username
from <STRONG>meta_dbql_table_usage_exp_hst</STRONG> src
inner <STRONG>join DR_QRY_LOG_EXP_HST</STRONG> rl on
<STRONG>src
acctstringdate = rl
acctstringdate
and src
queryid = rl
queryid</STRONG>
And Src
Systemname = Rl
Systemname
and src
acctstringdate > sysdate
And Rl
Acctstringdate > Sysdate
inner join <STRONG>meta_dr_qry_log_tgt_all_hst </STRONG>tgt on
upper(tgt
systemname) = upper(
MOZART
)
And Upper(tgt
Databasename) = Upper(
GDW_TABLES
)
And Upper(tgt
Tablename) = Upper(
SSA_SLNG_LSTG_MTRC_SD
)
<STRONG>AND src
acctstringdate = tgt
acctstringdate
and rl
statement_id = tgt
statement_id</STRONG>
and rl
systemname = tgt
systemname
And Tgt
Acctstringdate > Sysdate
And Not(
Upper(Tgt
Systemname)=Upper(src
systemname)
And
Upper(Tgt
Databasename) = Upper(Src
Databasename)
And
Upper(Tgt
Tablename) = Upper(Src
Tablename)
)
And tgt
Systemname is not null
And tgt
Databasename Is Not Null
And tgt
tablename is not null
SQL的简单分析
总 得来看这个SQL就是三个表 (meta_dbql_table_usage_exp_hstDR_QRY_LOG_EXP_HSTmeta_dr_qry_log_tgt_all_hst) 的INNER JOIN这三个表数据量都在百万级别且都是分区表(以acctstringdate为分区键)执行计划如下
复制代码 代码如下:
| Id | Operation | Name | Rows | Bytes | Cost | Pstart| Pstop |
|
| SELECT STATEMENT | |
|
|
| | |
|
| PX COORDINATOR | | | | | | |
|
| PX SEND QC (RANDOM) | :TQ
|
|
|
| | |
|
| SORT UNIQUE | |
|
|
| | |
|
| PX RECEIVE | |
|
|
| | |
|
| PX SEND HASH | :TQ
|
|
|
| | |
|*
| TABLE ACCESS BY LOCAL INDEX ROWID| DR_QRY_LOG_EXP_HST |
|
|
| | |
|
| NESTED LOOPS | |
|
|
| | |
|
| NESTED LOOPS | |
|
K|
| | |
|
| BUFFER SORT | | | | | | |
|
| PX RECEIVE | | | | | | |
|
| PX SEND BROADCAST | :TQ
| | | | | |
|
| PARTITION RANGE ITERATOR | |
|
|
| KEY |
|
|*
| TABLE ACCESS FULL | META_DR_QRY_LOG_TGT_ALL_HST |
|
|
| KEY |
|
|
| PX BLOCK ITERATOR | |
|
K|
| KEY | KEY |
|*
| TABLE ACCESS FULL | META_DBQL_TABLE_USAGE_EXP_HST |
|
K|
| KEY | KEY |
|
| PARTITION RANGE ITERATOR | |
| |
| KEY | KEY |
|*
| INDEX RANGE SCAN | DR_QRY_LOG_EXP_HST_IDX |
| |
| KEY | KEY |
Predicate Information (identified by operation id):
filter("RL"
"STATEMENT_ID"="TGT"
"STATEMENT_ID" AND "RL"
"SYSTEMNAME"="TGT"
"SYSTEMNAME" AND "SRC"
"SYSTEMNAME"="RL"
"SYSTEMNAME")
filter(UPPER("TGT"
"SYSTEMNAME")=
MOZART
AND UPPER("TGT"
"DATABASENAME")=
GDW_TABLES
AND
UPPER("TGT"
"TABLENAME")=
SSA_SLNG_LSTG_MTRC_SD
AND "TGT"
"ACCTSTRINGDATE">SYSDATE@!
AND "TGT"
"SYSTEMNAME" IS NOT NULL
"TGT"
"DATABASENAME" IS NOT NULL AND "TGT"
"TABLENAME" IS NOT NULL)
filter("SRC"
"ACCTSTRINGDATE"="TGT"
"ACCTSTRINGDATE" AND (UPPER("TGT"
"SYSTEMNAME")<>UPPER("SRC"
"SYSTEMNAME") OR
UPPER("TGT"
"DATABASENAME")<>UPPER("SRC"
"DATABASENAME") OR UPPER("TGT"
"TABLENAME")<>UPPER("SRC"
"TABLENAME")) AND
"SRC"
"ACCTSTRINGDATE">SYSDATE@!
)
access("SRC"
"QUERYID"="RL"
"QUERYID" AND "SRC"
"ACCTSTRINGDATE"="RL"
"ACCTSTRINGDATE")
filter("RL"
"ACCTSTRINGDATE">SYSDATE@!
)
定位问题
从 上面执行计划中的表连接方式可以知道这三个表之间进行了两次NESTED LOOP问题出现在最里层的NESTED LOOP(对两个表都做了TABLE FULL SCAN)因为表都是百万级别的(即时过滤后的数据量也不小)性能问题就出现在内表(即被驱动 表)META_DBQL_TABLE_USAGE_EXP_HST做了太多次的全表扫描如果能把全表扫描转换成索引则性能可以大幅度提高
下面是NESTED LOOP的介绍
嵌套连接把要处理的数据集分为外部循环(驱动数据源)和内部循环(被驱动数据源)外部循环只执行一次内部循环执行的次数等于外部循环执行返回的数据个数
这种连接的好处是内存使用非常少
如果驱动数据源有限且被驱动表在连接列上有相应的索引则这种连接方式才是高效的
下面是这三个表上索引的情况
复制代码 代码如下:
SQL> select index_name
table_name from user_indexes where table_name in (
DR_QRY_LOG_EXP_HST
upper(
meta_dbql_table_usage_exp_hst
)
upper(
meta_dr_qry_log_tgt_all_hs
INDEX_NAME TABLE_NAME
META_DR_QRY_LOG_TGT_ALL_IDX META_DR_QRY_LOG_TGT_ALL_HST
META_DBQL_TUSAGE_EHST_IDX META_DBQL_TABLE_USAGE_EXP_HST
DR_QRY_LOG_EXP_HST_IDX DR_QRY_LOG_EXP_HST
CREATE INDEX "GV"
"META_DR_QRY_LOG_TGT_ALL_IDX" ON "GV"
"META_DR_QRY_LOG_TGT_ALL_HST" ("STATEMENT_ID"
"ACCTSTRINGDATE")
CREATE INDEX "GV"
"META_DBQL_TUSAGE_EHST_IDX" ON "GV"
"META_DBQL_TABLE_USAGE_EXP_HST" ("QUERYID"
"ACCTSTRINGDATE")
CREATE INDEX "GV"
"DR_QRY_LOG_EXP_HST_IDX" ON "GV"
"DR_QRY_LOG_EXP_HST" ("QUERYID"
"ACCTSTRINGDATE")
这 三个索引都是本地分区索引(都包含分区键acctstringdate)很显然DR_QRY_LOG_EXP_HST表少了个索引因为它与表 meta_dr_qry_log_tgt_all_hst 在statement_id上做join因此应该在它的statement_id上也创建本地分区索引如下
复制代码 代码如下:
create index DR_QRY_LOG_EXP_HST_IDX
on gv
DR_QRY_LOG_EXP_HST (statement_id
ACCTSTRINGDATE) local;
性能对比
新的执行计划如下
复制代码 代码如下:
| Id | Operation | Name | Rows | Bytes | Cost | Pstart| Pstop |
|
| SELECT STATEMENT | |
|
|
| | |
|
| SORT UNIQUE | |
|
|
| | |
|*
| TABLE ACCESS BY LOCAL INDEX ROWID | META_DBQL_TABLE_USAGE_EXP_HST |
|
|
| | |
|
| NESTED LOOPS | |
|
|
| | |
|
| NESTED LOOPS | |
|
|
| | |
|
| PARTITION RANGE ITERATOR | |
|
|
| KEY |
|
|*
| TABLE ACCESS FULL | META_DR_QRY_LOG_TGT_ALL_HST |
|
|
| KEY |
|
|
| PARTITION RANGE ITERATOR | |
|
|
| KEY |
|
|*
| TABLE ACCESS BY LOCAL INDEX ROWID| DR_QRY_LOG_EXP_HST |
|
|
| KEY |
|
|*
| <STRONG>INDEX RANGE SCAN | DR_QRY_LOG_EXP_HST_IDX
</STRONG> |
| |
| KEY |
|
|
| PARTITION RANGE ITERATOR | |
| |
| KEY | KEY |
|*
| INDEX RANGE SCAN | META_DBQL_TUSAGE_EHST_IDX |
| |
| KEY | KEY |
Predicate Information (identified by operation id):
filter((UPPER("TGT"
"SYSTEMNAME")<>UPPER("SRC"
"SYSTEMNAME") OR
UPPER("TGT"
"DATABASENAME")<>UPPER("SRC"
"DATABASENAME") OR UPPER("TGT"
"TABLENAME")<>UPPER("SRC"
"TABLENAME"))
AND "SRC"
"SYSTEMNAME"="RL"
"SYSTEMNAME")
filter(UPPER("TGT"
"SYSTEMNAME")=
MOZART
AND UPPER("TGT"
"DATABASENAME")=
GDW_TABLES
AND
UPPER("TGT"
"TABLENAME")=
SSA_SLNG_LSTG_MTRC_SD
AND "TGT"
"ACCTSTRINGDATE">SYSDATE@!
AND "TGT"
"SYSTEMNAME"
IS NOT NULL AND "TGT"
"DATABASENAME" IS NOT NULL AND "TGT"
"TABLENAME" IS NOT NULL)
filter("RL"
"SYSTEMNAME"="TGT"
"SYSTEMNAME")
access("RL"
"STATEMENT_ID"="TGT"
"STATEMENT_ID" AND "RL"
"ACCTSTRINGDATE">SYSDATE@!
AND
"RL"
"ACCTSTRINGDATE" IS NOT NULL)
access("SRC"
"QUERYID"="RL"
"QUERYID" AND "SRC"
"ACCTSTRINGDATE"="RL"
"ACCTSTRINGDATE")
filter("SRC"
"ACCTSTRINGDATE"="TGT"
"ACCTSTRINGDATE" AND "SRC"
"ACCTSTRINGDATE">SYSDATE@!
)
从新的的执行计划可以看出它的第一个NESTED LOOP果然用了最新创建的索引
下面是执行时间
复制代码 代码如下:
已用时间:
:
:
两秒种搞定远远超出他期望的s )
方法总结
NESTED LOOP高效的条件驱动数据源有限且被驱动表在连接列上有相应的索引