Oracle中in与existnot in与not exist的性能问题.docx

资源描述

Oracle中in与existnot in与not exist的性能问题.docx

《Oracle中in与existnot in与not exist的性能问题.docx》由会员分享，可在线阅读，更多相关《Oracle中in与existnot in与not exist的性能问题.docx（42页珍藏版）》请在冰豆网上搜索。

Oracle中in与existnot in与not exist的性能问题.docx

Oracle中in与existnotin与notexist的性能问题

上星期五与haier讨论in跟exists的性能问题，正好想起原来公司的一个关于notin的规定，本想做个实验证明我的观点是正确的，但多次实验结果却给了我一个比较大的教训。

我又咨询了下oracle公司工作的朋友，确实是我持有的观点太保守了。

于是写个文章总结下，希望对大家有所启发。

后面可能有大篇是关于10053trace的内容，只作实验证明，可直接忽略看最终的结论即可。

我们知道，in 是把外表和内表作hash 连接，而exists是对外表作loop循环，每次loop循环再对内表进行查询。

一直以来认为exists比in效率高的说法是不准确的。

如果查询的两个表大小相当，那么用in和exists是差别不大的。

但如果两个表中一个较小，一个是大表，则子查询表大的用exists，子查询表小的用in，效率才是最高的。

假定表A（小表），表B（大表），cc列上都有索引：

∙select * from A where cc in （select cc from B）;--效率低，用到了A表上cc列的索引

∙select * from A where exists（select cc from B where cc=A.cc）;--效率高，用到了B表上cc列的索引。

相反的：

∙select * from B where cc in （select cc from A）;--效率高，用到了B表上cc列的索引

∙select * from B where exists（select cc from A where cc=B.cc）;--效率低，用到了A表上cc列的索引

通过使用exists，Oracle会首先检查主查询，然后运行子查询直到它找到第一个匹配项，这就节省了时间。

Oracle在执行IN子查询时，首先执行子查询，并将获得的结果列表存放在一个加了索引的临时表中。

在执行子查询之前，系统先将主查询挂起，待子查询执行完毕，存放在临时表中以后再执行主查询。

这也就是使用EXISTS比使用IN通常查询速度快的原因。

那notin跟exists呢？

如果查询语句使用了not in 那么内外表都进行全表扫描，没有用到索引；而not exists 的子查询依然能用到表上的索引。

所以无论那个表大，用not exists都比not in要快。

not in 逻辑上不完全等同于not exists，请看下面的例子：

create table t1 （c1 number,c2 number）;

create table t2 （c1 number,c2 number）;

insert into t1 values （1,2）;

insert into t1 values （1,3）;

insert into t2 values （1,2）;

insert into t2 values （1,null）;

select * from t1 where c2 not in （select c2 from t2）; --结果是no rows found

select * from t1 where not exists （select 1 from t2 where t1.c2=t2.c2）; --结果是1 3

正如所看到的，not in 出现了不期望的结果集，存在逻辑错误。

如果看一下上述两个select语句的执行计划，也会不同。

后者使用了hashjoin。

因此，请尽量不要使用not in（它会调用子查询），而尽量使用not exists（它会调用关联子查询）。

如果子查询中返回的任意一条记录含有空值，则查询将不返回任何记录，正如上面例子所示。

除非子查询字段有非空限制，这时可以使用not in ,并且也可以通过提示让它使用hasg_aj或merge_aj连接。

notin（...）括号中的返回值不能存在null值，是OracleSQL开发的一条铁律。

我们再看下性能方面。

关于这2个谁的性能好坏的讨论从来就没有停止过，我不想牵扯进去。

。

只是先提出一条，基于哪个oracle的版本。

为什么？

因为oracle的CBO算法是一直在优化当中的。

这时，你应该心存感谢，因为我们写的非常多的性能不高的sql，oracle都默默地绞尽脑汁地给你优化过了。

。

废话不多说，我们建2个表用来实验下：

createtabletest1（colnumber）;

createtabletest2（colnumber）;

然后插入一些数据：

insertintotest1

selectlevelfromdualconnectbylevel<=100000;

insertintotest2

selectlevel+1fromdualconnectbylevel<=100000;

commit;

然后来分别看一下使用notexists和notin的性能差异：

17:

16:

30SQL>select*fromv$versionwhererownum=1;

BANNER

-----------------------------------------------------

OracleDatabase10gRelease10.1.0.5.0–Production

17:

01SQL>settimingon

17:

47SQL>select*fromtest1wherenotexists（select1fromtest2wheretest1.col=test2.col）;

COL

----------

Elapsed:

00:

00.25

17:

59SQL>select*fromtest1wherecolnotin（selectcolfromtest2）;

COL

----------

Elapsed:

00:

08:

31.14

确实，两者所需要的时间非常大，这也是我们最常看到的结果。

看下执行计划：

我们发现，对于oracle10g，第一个sql没什么可说的了，出现了hashjoin，直接跳过。

对于第2个sql，可以看到，关联谓词是filter，它类似于两表关联中的nestedloop，也就是跑两层循环，可见它的效率有多差。

为什么notin不能使用hashjoin作为执行计划呢？

正如上面解释的，因为内表或外表中存在空值对最终结果产生的影响是hashjoin无法实现的，因为hashjoin不支持把空值放到hash桶中，所以它没办法处理外表和内表中存在的空值，效率与正确性放在一起时，肯定是要选择正确性，所以oracle必须放弃效率，保证正确性，采用filter谓词。

这个执行计划中我们还有感兴趣的东西，那就是：

LNNVL（"COL"<>:

B1），它在这里的作用很巧妙，oracle知道使用filter性能很差，所以它在扫描内表test2时，会使用LNNVL来检查test2.col是否存在null值，只要扫描到null值，就可以断定最终的结果为空值，也就没有了继续执行的意义，所以oracle可以马上终止执行，在某种意义上它弥补了filter较差的性能。

而具体的优化方法，我们通过最后一个实验来解决。

我们再换一个11g的库看一下：

17:

16:

17SQL>select*fromv$versionwhererownum=1;

BANNER

----------------------------------------------------------------------------

OracleDatabase11gEnterpriseEditionRelease11.2.0.1.0-64bitProduction

17:

15:

44SQL>settimingon

17:

15:

48SQL>select*fromtest1wherenotexists（select1fromtest2wheretest1.col=test2.col）;

COL

----------

Elapsed:

00:

00.24

17:

16:

08SQL>select*fromtest1wherecolnotin（selectcolfromtest2）;

COL

----------

Elapsed:

00:

00.06

看下它的执行计划：

当时看到这执行计划时，确实让我有点不解了，2者的执行计划几乎是一样的，只是在谓语分析部分跟hashjoin的方式略有不同。

再试一次，结果也是一样的。

我们也注意到，执行计划并没有使用到oracle的收集信息，而是使用了动态采样。

我们再尝试使用统计信息收集，再看下执行计划，结果是一样的。

当然，因为使用了hashjoin，结果基本是瞬间的事情。

于是我又咨询了下oracle公司工作的朋友，其实，in和exist的区别只在10.2.0.3及以前的版本中存在；而10.2.0.4及以后的版本中，in和exist的效果是完全一样的。

以下是Oracle官方的解释：

Incertaincircumstances,itisbettertouseINratherthanEXISTS.Ingeneral,iftheselectivepredicateisinthesubquery,thenuseIN.Iftheselectivepredicateisintheparentquery,thenuseEXISTS.

Sometimes,OraclecanrewriteasubquerywhenusedwithanINclausetotakeadvantageofselectivityspecifiedinthesubquery.Thisismostbeneficialwhenthemostselectivefilterappearsinthesubqueryandthereareindexesonthejoincolumns.Conversely,usingEXISTSisbeneficialwhenthemostselectivefilterisintheparentquery.ThisallowstheselectivepredicatesintheparentquerytobeappliedbeforefilteringtherowsagainsttheEXISTScriteria.

所以以后大家遇到有类型的情况，优化使用notexist，毕竟是在所有oracle版本中通用的。

我们再来看一个对notin优化的思路：

首先来看两个sql，返回结果相同，但是耗时差别很大：

SQL>select*fromv$versionwhererownum=1;

BANNER

-----------------------------------------------------

OracleDatabase10gRelease10.2.0.1.0–Production

SQL>selectcount（*）

2fromjustin_goodr

3wherenotexists

4（select'x'fromjustin_countpcwherepc.id=r.justin_good_id）

COUNT（*）

----------

7229

Executedin3.437seconds

SQL>selectcount（*）

2fromjustin_goodr

3wherer.justin_good_idnotin

4（selectpc.idfromjustin_countpc）

COUNT（*）

----------

7229

Executedin128.203seconds

再来看一下它们的执行计划

使用notexist的语句cost为3452，而notin的却达到14216

SQL>explainplanforselectcount（*）

2fromjustin_goodr

3wherenotexists

4（select'x'fromjustin_countpcwherepc.id=r.justin_good_id）;

Explained.

SQL>select*fromtable（dbms_xplan.display）;

PLAN_TABLE_OUTPUT

-------------------------------------------------------------------------------------------------------------------------------------------------------------

Planhashvalue:

1087925722

----------------------------------------------------------------------------------------

----------------------------------------------------------------------------------------

|0|SELECTSTATEMENT||1|9|3452

（2）|00:

00:

42|

|1|SORTAGGREGATE||1|9|||

|*2|HASHJOINRIGHTANTI||59|531|3452

（2）|00:

00:

42|

|3|INDEXFASTFULLSCAN|PK11_1|4562|22810|4（0）|00:

00:

01|

|4|TABLEACCESSFULL|justin_good|602K|2355K|3440

（2）|00:

00:

42|

----------------------------------------------------------------------------------------

PLAN_TABLE_OUTPUT

--------------------------------------------------------------------------------------------------------------------------------------------------------------

PredicateInformation（identifiedbyoperationid）:

---------------------------------------------------

2-access（"PC"."ID"="R"."justin_good_id"）

16rowsselected.

SQL>explainplanforselectcount（*）

2fromjustin_goodr

3wherer.justin_good_idnotin

4（selectpc.idfromjustin_countpc）;

Explained.

SQL>select*fromtable（dbms_xplan.display）;

PLAN_TABLE_OUTPUT

-------------------------------------------------------------------------------------------------------------------------------------------------------------

Planhashvalue:

4119029611

-------------------------------------------------------------------------------------

-------------------------------------------------------------------------------------

|0|SELECTSTATEMENT||1|4|14216

（2）|00:

02:

51|

|1|SORTAGGREGATE||1|4|||

|*2|FILTER||||||

|3|TABLEACCESSFULL|justin_good|602K|2355K|3442

（2）|00:

00:

42|

|*4|INDEXFULLSCAN|PK11_1|1|5|11（0）|00:

00:

01|

-------------------------------------------------------------------------------------

PLAN_TABLE_OUTPUT

--------------------------------------------------------------------------------------------------------------------------------------------------------------

PredicateInformation（identifiedbyoperationid）:

---------------------------------------------------

2-filter（NOTEXISTS（SELECT0FROM"justin"."justin_count""PC"WHERE

LNNVL（"PC"."ID"<>:

B1）））

4-filter（LNNVL（"PC"."ID"<>:

B1））

18rowsselected.

可以看到使用notexist的sql采用了hashjoinanti，而notin的却使用了filter

改写一下语句,确保justin_good_id不会在查询中返回NULL

SQL>setlinesize300

SQL>explainplanforselectcount（*）

2fromjustin_goodr

3wherenvl（r.justin_good_id,'NULL'）notin

4（selectpc.idfromjustin_countpc）;

Explained.

SQL>select*fromtable（dbms_xplan.display）;

PLAN_TABLE_OUTPUT

----------------------------------------------------------------------------------------------------------------------------------------------------------

Planhashvalue:

1087925722

----------------------------------------------------------------------------------------

----------------------------------------------------------------------------------------

|0|SELECTSTATEMENT||1|9|3452

（2）|00:

00:

42|

|1|SORTAGGREGATE||1|9|||

|*2|HASHJOINRIGHTANTI||602K|5298K|3452

（2）|00:

00:

42|

|3|INDEXFASTFULLSCAN|PK11_1|4562|22810|4（0）|00:

00:

01|

|4|TABLEACCESSFULL|justin_good|602K|2355K|3440

（2）|00:

00:

42|

----------------------------------------------------------------------------------------

PLAN_TABLE_OUTPUT

-------------------------------------------------------------------------------------------------------------------------------------------------------

PredicateInformation（identifiedbyoperationid）:

---------------------------------------------------

1-access（"PC"."ID"=NVL（"R"."justin_good_id",TO_NUMBER（'NULL'）））

SQL>explainplanforselectcount（*）

2fromjustin_goodr

3wherer.justin_good_idnotin

4（selectpc.idfromjustin_countpc）

5andr.justin_good_idisnotnull;

Explained.

SQL>select*fromtable（dbms_xplan.display）;

PLAN_TABLE_OUTPUT

-----------------------

展开阅读全文