Oracle中in与existnotin与notexist性能问题.docx

资源描述

Oracle中in与existnotin与notexist性能问题.docx

《Oracle中in与existnotin与notexist性能问题.docx》由会员分享，可在线阅读，更多相关《Oracle中in与existnotin与notexist性能问题.docx（35页珍藏版）》请在冰豆网上搜索。

Oracle中in与existnotin与notexist性能问题.docx

Oracle中in与existnotin与notexist性能问题

上星期五与haier讨论in跟exists的性能问题，正好想起原来公司的一个关于notin的规定，本想做个实验证明我的观点是正确的，但多次实验结果却给了我一个比较大的教训。

我又咨询了下oracle公司工作的朋友，确实是我持有的观点太保守了。

于是写个文章总结下，希望对大家有所启发。

后面可能有大篇是关于10053trace的内容，只作实验证明，可直接忽略看最终的结论即可。

我们知道，in 是把外表和内表作hash 连接，而exists是对外表作loop循环，每次loop循环再对内表进行查询。

一直以来认为exists比in效率高的说法是不准确的。

如果查询的两个表大小相当，那么用in和exists是差别不大的。

但如果两个表中一个较小，一个是大表，则子查询表大的用exists，子查询表小的用in，效率才是最高的。

假定表A（小表），表B（大表），cc列上都有索引：

∙select * from A where cc in （select cc from B）。

--效率低，用到了A表上cc列的索引

∙select * from A where exists（select cc from B where cc=A.cc）。

--效率高，用到了B表上cc列的索引。

相反的：

∙select * from B where cc in （select cc from A）。

--效率高，用到了B表上cc列的索引

∙select * from B where exists（select cc from A where cc=B.cc）。

--效率低，用到了A表上cc列的索引

通过使用exists，Oracle会首先检查主查询，然后运行子查询直到它找到第一个匹配项，这就节省了时间。

Oracle在执行IN子查询时，首先执行子查询，并将获得的结果列表存放在一个加了索引的临时表中。

在执行子查询之前，系统先将主查询挂起，待子查询执行完毕，存放在临时表中以后再执行主查询。

这也就是使用EXISTS比使用IN通常查询速度快的原因。

那notin跟exists呢？

如果查询语句使用了not in 那么内外表都进行全表扫描，没有用到索引；而not exists 的子查询依然能用到表上的索引。

所以无论那个表大，用not exists都比not in要快。

not in 逻辑上不完全等同于not exists，请看下面的例子：

create table t1 （c1 number,c2 number）。

create table t2 （c1 number,c2 number）。

insert into t1 values （1,2）。

insert into t1 values （1,3）。

insert into t2 values （1,2）。

insert into t2 values （1,null）。

select * from t1 where c2 not in （select c2 from t2）。

--结果是no rows found

select * from t1 where not exists （select 1 from t2 where t1.c2=t2.c2）。

--结果是1 3

正如所看到的，not in 出现了不期望的结果集，存在逻辑错误。

如果看一下上述两个select语句的执行计划，也会不同。

后者使用了hashjoin。

因此，请尽量不要使用not in（它会调用子查询），而尽量使用not exists（它会调用关联子查询）。

如果子查询中返回的任意一条记录含有空值，则查询将不返回任何记录，正如上面例子所示。

除非子查询字段有非空限制，这时可以使用not in ,并且也可以通过提示让它使用hasg_aj或merge_aj连接。

notin（...）括号中的返回值不能存在null值，是OracleSQL开发的一条铁律。

我们再看下性能方面。

关于这2个谁的性能好坏的讨论从来就没有停止过，我不想牵扯进去。

。

只是先提出一条，基于哪个oracle的版本。

为什么？

因为oracle的CBO算法是一直在优化当中的。

这时，你应该心存感谢，因为我们写的非常多的性能不高的sql，oracle都默默地绞尽脑汁地给你优化过了。

。

废话不多说，我们建2个表用来实验下：

createtabletest1（colnumber）。

createtabletest2（colnumber）。

然后插入一些数据：

insertintotest1

selectlevelfromdualconnectbylevel<=100000。

insertintotest2

selectlevel+1fromdualconnectbylevel<=100000。

commit。

然后来分别看一下使用notexists和notin的性能差异：

17:

16:

30SQL>select*fromv$versionwhererownum=1。

BANNER

-----------------------------------------------------

OracleDatabase10gRelease10.1.0.5.0–Production

17:

01SQL>settimingon

17:

47SQL>select*fromtest1wherenotexists（select1fromtest2wheretest1.col=test2.col）。

COL

----------

Elapsed:

00:

00.25

17:

59SQL>select*fromtest1wherecolnotin（selectcolfromtest2）。

COL

----------

Elapsed:

00:

08:

31.14

确实，两者所需要的时间非常大，这也是我们最常看到的结果。

看下执行计划：

我们发现，对于oracle10g，第一个sql没什么可说的了，出现了hashjoin，直接跳过。

对于第2个sql，可以看到，关联谓词是filter，它类似于两表关联中的nestedloop，也就是跑两层循环，可见它的效率有多差。

为什么notin不能使用hashjoin作为执行计划呢？

正如上面解释的，因为内表或外表中存在空值对最终结果产生的影响是hashjoin无法实现的，因为hashjoin不支持把空值放到hash桶中，所以它没办法处理外表和内表中存在的空值，效率与正确性放在一起时，肯定是要选择正确性，所以oracle必须放弃效率，保证正确性，采用filter谓词。

这个执行计划中我们还有感兴趣的东西，那就是：

LNNVL（"COL"<>:

B1），它在这里的作用很巧妙，oracle知道使用filter性能很差，所以它在扫描内表test2时，会使用LNNVL来检查test2.col是否存在null值，只要扫描到null值，就可以断定最终的结果为空值，也就没有了继续执行的意义，所以oracle可以马上终止执行，在某种意义上它弥补了filter较差的性能。

而具体的优化方法，我们通过最后一个实验来解决。

我们再换一个11g的库看一下：

17:

16:

17SQL>select*fromv$versionwhererownum=1。

BANNER

----------------------------------------------------------------------------

OracleDatabase11gEnterpriseEditionRelease11.2.0.1.0-64bitProduction

17:

15:

44SQL>settimingon

17:

15:

48SQL>select*fromtest1wherenotexists（select1fromtest2wheretest1.col=test2.col）。

COL

----------

Elapsed:

00:

00.24

17:

16:

08SQL>select*fromtest1wherecolnotin（selectcolfromtest2）。

COL

----------

Elapsed:

00:

00.06

看下它的执行计划：

当时看到这执行计划时，确实让我有点不解了，2者的执行计划几乎是一样的，只是在谓语分析部分跟hashjoin的方式略有不同。

再试一次，结果也是一样的。

我们也注意到，执行计划并没有使用到oracle的收集信息，而是使用了动态采样。

我们再尝试使用统计信息收集，再看下执行计划，结果是一样的。

当然，因为使用了hashjoin，结果基本是瞬间的事情。

于是我又咨询了下oracle公司工作的朋友，其实，in和exist的区别只在10.2.0.3及以前的版本中存在；而10.2.0.4及以后的版本中，in和exist的效果是完全一样的。

以下是Oracle官方的解释：

Incertaincircumstances,itisbettertouseINratherthanEXISTS.Ingeneral,iftheselectivepredicateisinthesubquery,thenuseIN.Iftheselectivepredicateisintheparentquery,thenuseEXISTS.

Sometimes,OraclecanrewriteasubquerywhenusedwithanINclausetotakeadvantageofselectivityspecifiedinthesubquery.Thisismostbeneficialwhenthemostselectivefilterappearsinthesubqueryandthereareindexesonthejoincolumns.Conversely,usingEXISTSisbeneficialwhenthemostselectivefilterisintheparentquery.ThisallowstheselectivepredicatesintheparentquerytobeappliedbeforefilteringtherowsagainsttheEXISTScriteria.

所以以后大家遇到有类型的情况，优化使用notexist，毕竟是在所有oracle版本中通用的。

我们再来看一个对notin优化的思路：

首先来看两个sql，返回结果相同，但是耗时差别很大：

SQL>select*fromv$versionwhererownum=1。

BANNER

-----------------------------------------------------

OracleDatabase10gRelease10.2.0.1.0–Production

SQL>selectcount（*）

2fromjustin_goodr

3wherenotexists

4（select'x'fromjustin_countpcwherepc.id=r.justin_good_id）

5。

COUNT（*）

----------

7229

Executedin3.437seconds

SQL>selectcount（*）

2fromjustin_goodr

3wherer.justin_good_idnotin

4（selectpc.idfromjustin_countpc）

5。

COUNT（*）

----------

7229

Executedin128.203seconds

再来看一下它们的执行计划

使用notexist的语句cost为3452，而notin的却达到14216

SQL>explainplanforselectcount（*）

2fromjustin_goodr

3wherenotexists

4（select'x'fromjustin_countpcwherepc.id=r.justin_good_id）。

Explained.

SQL>select*fromtable（dbms_xplan.display）。

PLAN_TABLE_OUTPUT

-------------------------------------------------------------------------------------------------------------------------------------------------------------

Planhashvalue:

1087925722

----------------------------------------------------------------------------------------

----------------------------------------------------------------------------------------

|0|SELECTSTATEMENT||1|9|3452

（2）|00:

00:

42|

|1|SORTAGGREGATE||1|9|||

|*2|HASHJOINRIGHTANTI||59|531|3452

（2）|00:

00:

42|

|3|INDEXFASTFULLSCAN|PK11_1|4562|22810|4（0）|00:

00:

01|

|4|TABLEACCESSFULL|justin_good|602K|2355K|3440

（2）|00:

00:

42|

----------------------------------------------------------------------------------------

PLAN_TABLE_OUTPUT

--------------------------------------------------------------------------------------------------------------------------------------------------------------

PredicateInformation（identifiedbyoperationid）:

---------------------------------------------------

2-access（"PC"."ID"="R"."justin_good_id"）

16rowsselected.

SQL>explainplanforselectcount（*）

2fromjustin_goodr

3wherer.justin_good_idnotin

4（selectpc.idfromjustin_countpc）。

Explained.

SQL>select*fromtable（dbms_xplan.display）。

PLAN_TABLE_OUTPUT

-------------------------------------------------------------------------------------------------------------------------------------------------------------

Planhashvalue:

4119029611

-------------------------------------------------------------------------------------

-------------------------------------------------------------------------------------

|0|SELECTSTATEMENT||1|4|14216

（2）|00:

02:

51|

|1|SORTAGGREGATE||1|4|||

|*2|FILTER||||||

|3|TABLEACCESSFULL|justin_good|602K|2355K|3442

（2）|00:

00:

42|

|*4|INDEXFULLSCAN|PK11_1|1|5|11（0）|00:

00:

01|

-------------------------------------------------------------------------------------

PLAN_TABLE_OUTPUT

--------------------------------------------------------------------------------------------------------------------------------------------------------------

PredicateInformation（identifiedbyoperationid）:

---------------------------------------------------

2-filter（NOTEXISTS（SELECT0FROM"justin"."justin_count""PC"WHERE

LNNVL（"PC"."ID"<>:

B1）））

4-filter（LNNVL（"PC"."ID"<>:

B1））

18rowsselected.

可以看到使用notexist的sql采用了hashjoinanti，而notin的却使用了filter

改写一下语句,确保justin_good_id不会在查询中返回NULL

SQL>setlinesize300

SQL>explainplanforselectcount（*）

2fromjustin_goodr

3wherenvl（r.justin_good_id,'NULL'）notin

4（selectpc.idfromjustin_countpc）。

Explained.

SQL>select*fromtable（dbms_xplan.display）。

PLAN_TABLE_OUTPUT

----------------------------------------------------------------------------------------------------------------------------------------------------------

Planhashvalue:

1087925722

----------------------------------------------------------------------------------------

----------------------------------------------------------------------------------------

|0|SELECTSTATEMENT||1|9|3452

（2）|00:

00:

42|

|1|SORTAGGREGATE||1|9|||

|*2|HASHJOINRIGHTANTI||602K|5298K|3452

（2）|00:

00:

42|

|3|INDEXFASTFULLSCAN|PK11_1|4562|22810|4（0）|00:

00:

01|

|4|TABLEACCESSFULL|justin_good|602K|2355K|3440

（2）|00:

00:

42|

----------------------------------------------------------------------------------------

PLAN_TABLE_OUTPUT

-------------------------------------------------------------------------------------------------------------------------------------------------------

PredicateInformation（identifiedbyoperationid）:

---------------------------------------------------

1-access（"PC"."ID"=NVL（"R"."justin_good_id",TO_NUMBER（'NULL'）））

SQL>explainplanforselectcount（*）

2fromjustin_goodr

3wherer.justin_good_idnotin

4（selectpc.idfromjustin_countpc）

5andr.justin_good_idisnotnull。

Explained.

SQL>select*fromtable（dbms_xplan.display）。

PLAN_TABLE_OUTPUT

------------------------------

展开阅读全文