双三次插值及优化.docx

资源描述

双三次插值及优化.docx

《双三次插值及优化.docx》由会员分享，可在线阅读，更多相关《双三次插值及优化.docx（54页珍藏版）》请在冰豆网上搜索。

双三次插值及优化.docx

双三次插值及优化

1。

数学模型

对于一个目的像素,其坐标通过反向变换得到的在原图中的浮点坐标为（i+u，j+v），其中i、j均为非负整数，u、v为［0,1）区间的浮点数，双三次插值考虑一个浮点坐标（i+u，j+v）周围的16个邻点，目的像素值f（i+u,j+v）可由如下插值公式得到:

f（i+u,j+v）=[A］*[B]*[C]

[A]=［S（u+1）　S（u+0）　S（u—1）　S（u—2）]

　　┏f（i-1,j—1）　f（i—1，j+0）　f（i—1，j+1）　f（i-1，j+2）┓

［B]=┃f（i+0,j-1）　f（i+0,j+0）　f（i+0,j+1）　f（i+0，j+2）┃

　　┃f（i+1,j-1）　f（i+1,j+0）　f（i+1，j+1）　f（i+1，j+2）┃

　　┗f（i+2，j—1）　f（i+2,j+0）　f（i+2,j+1）　f（i+2,j+2）┛

　　┏S（v+1）┓

[C]=┃S（v+0）┃

　　┃S（v-1）┃

　　┗S（v-2）┛

　　┏1—2＊Abs（x）^2+Abs（x）^3　　　　　,0<=Abs（x）<1

S（x）={4-8*Abs（x）+5＊Abs（x）^2-Abs（x）^3　,1〈=Abs（x）〈2

　　┗0　　　　　　　　　　　　　　　,Abs（x）〉=2

S（x）是对Sin（x＊Pi）/x的逼近（Pi是圆周率——π）,为插值核。

2.计算流程

1.获取16个点的坐标P1、P2……P16

2。

由插值核计算公式S（x）分别计算出x、y方向的插值核向量Su、Sv

3.进行矩阵运算，得到插值结果

iTemp1=Su0*P1+Su1*P5+Su2＊P9+Su3＊P13

iTemp2=Su0*P2+Su1＊P6+Su2＊P10+Su3*P14

iTemp3=Su0＊P3+Su1*P7+Su2＊P11+Su3*P15

iTemp4=Su0*P4+Su1*P8+Su2＊P12+Su3*P16

iResult=Sv1＊iTemp1+Sv2*iTemp2+Sv3*iTemp3+Sv4*iTemp4

4.在得到插值结果图后，我们发现图像中有“毛刺”,因此对插值结果做了个后处理，即:

设该点在原图中的像素值为pSrc,若abs（iResult-pSrc）大于某阈值,我们认为插值后的点可能污染原图,因此用原像素值pSrc代替.

3。

算法优化

由于双三次插值计算一个点的坐标需要其周围16个点，更有多达20次的乘法及15次的加法，计算量可以说是非常大，势必要进行优化。

我们选择了Intel的SSE2优化技术，它只支持在P4及以上的机器。

测试当前CPU是否支持SSE2,可由CPUID指令得到,代码为：

BOOLg_bSSE2=FALSE;

__asm

{

moveax，1；

cpuid；

testedx,0x04000000;

jzNotSupport;

movg_bSSE2,1

NotSupport：

｝

支持SSE2的CPU引入了8个128位的寄存器，这样一个寄存器中就可以存放4个点（RGB），有利于并行计算。

详细代码见Transform。

cpp中函数Optimize_Bicubic。

优化中遇到的问题：

1.图像每个点由RGB通道组成，由于1个SSE2寄存器有16个字节，这样读入4个像素点后，要浪费4个字节，同时要花费时间将数据对齐，即由BRGB|RGBR｜GBRG|BRGB对齐成0RGB|0RGB|0RGB｜0RGB;

2.读16字节数据到寄存器时，由于图像地址不能保证是16字节对齐，因此需用更多时钟周期的MOVDQU指令（6个以上时钟周期）;如能使地址16字节对齐,则可用MOVDQA指令（1个时钟周期）；

3.为了消除除法及浮点运算，对权值放大256倍，这样在计算插值核时，必须用2Bytes来表示1个系数，而图像数据都是1Byte，这样在对齐做乘法时，要浪费一半的SSE2寄存器的空间,导致运算时间变长；而若降低插值核的精度，使其在1Byte表示范围内时，运算的精度又大为下降；

4。

对各指令的周期以及若干行指令是否能够并行流水缺乏经验和认识。

附：

SSE2指令整理

算术（Arithmetic）指令:

ADDPD--PackedDouble—PrecisionFloating-PointAddSSE2

2个double对应相加

ADDPDxmm0，xmm1/m128

ADDPS—-PackedSingle—PrecisionFloating—PointAddSSE

4个float对应相加

ADDPSxmm0，xmm1/m128

ADDSD--ScalarDouble—PrecisionFloating—PointAdd

1个double（低端）对应相加SSE2

ADDSDxmm0，xmm1/m64

ADDSS—-ScalarSingle-PrecisionFloating—PointAddSSE

1个float（低端）对应相加

ADDSSxmm0,xmm1/m32

PADDB/PADDW/PADDD-—PackedAdd

Opcode

Instruction

Description

0FFC/r

PADDBmm,mm/m64

Addpackedbyteintegersfrommm/m64andmm。

660FFC/r

PADDBxmm1,xmm2/m128

Addpackedbyteintegersfromxmm2/m128andxmm1。

0FFD/r

PADDWmm,mm/m64

Addpackedwordintegersfrommm/m64andmm。

660FFD/r

PADDWxmm1，xmm2/m128

Addpackedwordintegersfromxmm2/m128andxmm1。

0FFE/r

PADDDmm,mm/m64

Addpackeddoublewordintegersfrommm/m64andmm。

660FFE/r

PADDDxmm1，xmm2/m128

Addpackeddoublewordintegersfromxmm2/m128andxmm1。

PADDQ—-PackedQuadwordAdd

Opcode

Instruction

Description

0FD4/r

PADDQmm1，mm2/m64

Addquadwordintegermm2/m64tomm1

660FD4/r

PADDQxmm1，xmm2/m128

Addpackedquadwordintegersxmm2/m128toxmm1

PADDSB/PADDSW—-PackedAddwithSaturation

Opcode

Instruction

Description

0FEC/r

PADDSBmm，mm/m64

Addpackedsignedbyteintegersfrommm/m64andmmandsaturatetheresults.

660FEC/r

PADDSBxmm1,

xmm2/m128

Addpackedsignedbyteintegersfromxmm2/m128andxmm1saturatetheresults。

0FED/r

PADDSWmm,mm/m64

Addpackedsignedwordintegersfrommm/m64andmmandsaturatetheresults。

660FED/r

PADDSWxmm1,xmm2/m128

Addpackedsignedwordintegersfromxmm2/m128andxmm1andsaturatetheresults.

PADDUSB/PADDUSW--PackedAddUnsignedwithSaturation

Opcode

Instruction

Description

0FDC/r

PADDUSBmm，mm/m64

Addpackedunsignedbyteintegersfrommm/m64andmmandsaturatetheresults.

660FDC/r

PADDUSBxmm1，xmm2/m128

Addpackedunsignedbyteintegersfromxmm2/m128andxmm1saturatetheresults.

0FDD/r

PADDUSWmm，mm/m64

Addpackedunsignedwordintegersfrommm/m64andmmandsaturatetheresults。

660FDD/r

PADDUSWxmm1，xmm2/m128

Addpackedunsignedwordintegersfromxmm2/m128toxmm1andsaturatetheresults。

PMADDWD——PackedMultiplyandAdd

Opcode

Instruction

Description

0FF5/r

PMADDWDmm，mm/m64

Multiplythepackedwordsinmmbythepackedwordsinmm/m64。

Addthe32—bitpairsofresultsandstoreinmmasdoubleword

660FF5/r

PMADDWDxmm1,xmm2/m128

Multiplythepackedwordintegersinxmm1bythepackedwordintegersinxmm2/m128,andaddtheadjacentdoublewordresults。

PSADBW—-PackedSumofAbsoluteDifferences

Opcode

Instruction

Description

0FF6/r

PSADBWmm1，mm2/m64

Absolutedifferenceofpackedunsignedbyteintegersfrommm2/m64andmm1;differencesarethensummedtoproduceanunsignedwordintegerresult.

660FF6/r

PSADBWxmm1，xmm2/m128

Absolutedifferenceofpackedunsignedbyteintegersfromxmm2/m128andxmm1;the8lowdifferencesand8highdifferencesarethensummedseparatelytoproducetwowordintegerresults.

PSUBB/PSUBW/PSUBD—-PackedSubtract

Opcode

Instruction

Description

0FF8/r

PSUBBmm,mm/m64

Subtractpackedbyteintegersinmm/m64frompackedbyteintegersinmm.

660FF8/r

PSUBBxmm1，xmm2/m128

Subtractpackedbyteintegersinxmm2/m128frompackedbyteintegersinxmm1。

0FF9/r

PSUBWmm,mm/m64

Subtractpackedwordintegersinmm/m64frompackedwordintegersinmm。

660FF9/r

PSUBWxmm1，xmm2/m128

Subtractpackedwordintegersinxmm2/m128frompackedwordintegersinxmm1。

0FFA/r

PSUBDmm,mm/m64

Subtractpackeddoublewordintegersinmm/m64frompackeddoublewordintegersinmm。

660FFA/r

PSUBDxmm1，xmm2/m128

Subtractpackeddoublewordintegersinxmm2/mem128frompackeddoublewordintegersinxmm1。

PSUBQ-—PackedSubtractQuadword

Opcode

Instruction

Description

0FFB/r

PSUBQmm1，mm2/m64

Subtractquadwordintegerinmm1frommm2/m64.

660FFB/r

PSUBQxmm1,xmm2/m128

Subtractpackedquadwordintegersinxmm1fromxmm2/m128。

PSUBSB/PSUBSW--PackedSubtractwithSaturation

Opcode

Instruction

Description

0FE8/r

PSUBSBmm，mm/m64

Subtractsignedpackedbytesinmm/m64fromsignedpackedbytesinmmandsaturateresults.

660FE8/r

PSUBSBxmm1，xmm2/m128

Subtractpackedsignedbyteintegersinxmm2/m128frompackedsignedbyteintegersinxmm1andsaturateresults。

0FE9/r

PSUBSWmm,mm/m64

Subtractsignedpackedwordsinmm/m64fromsignedpackedwordsinmmandsaturateresults。

660FE9/r

PSUBSWxmm1，xmm2/m128

Subtractpackedsignedwordintegersinxmm2/m128frompackedsignedwordintegersinxmm1andsaturateresults.

PSUBUSB/PSUBUSW——PackedSubtractUnsignedwithSaturation

Opcode

Instruction

Description

0FD8/r

PSUBUSBmm,mm/m64

Subtractunsignedpackedbytesinmm/m64fromunsignedpackedbytesinmmandsaturateresult.

660FD8/r

PSUBUSBxmm1,xmm2/m128

Subtractpackedunsignedbyteintegersinxmm2/m128frompackedunsignedbyteintegersinxmm1andsaturateresult.

0FD9/r

PSUBUSWmm,mm/m64

Subtractunsignedpackedwordsinmm/m64fromunsignedpackedwordsinmmandsaturateresult.

660FD9/r

PSUBUSWxmm1,xmm2/m128

Subtractpackedunsignedwordintegersinxmm2/m128frompackedunsignedwordintegersinxmm1andsaturateresult。

SUBPD—-PackedDouble-PrecisionFloating—PointSubtract

Opcode

Instruction

Description

660F5C/r

SUBPDxmm1,xmm2/m128

Subtractpackeddouble—precisionfloating—pointvaluesinxmm2/m128fromxmm1.

SUBPS—-PackedSingle—PrecisionFloating—PointSubtract

Opcode

Instruction

Description

0F5C/r

SUBPSxmm1xmm2/m128

Subtractpackedsingle-precisionfloating—pointvaluesinxmm2/memfromxmm1。

SUBSD-—ScalarDouble-PrecisionFloating-PointSubtract

Opcode

Instruction

Description

F20F5C/r

SUBSDxmm1，xmm2/m64

Subtractsthelowdouble-precisionfloating-pointnumbersinxmm2/mem64fromxmm1。

SUBSS-—ScalarSingle—FPSubtract

Opcode

Instruction

Description

F30F5C/r

SUBSSxmm1,xmm2/m32

Subtractthelowersingle-precisionfloating-pointnumbersinxmm2/m32fromxmm1。

—--—-------—--———-————--———--——-——-———-—--—-————-—---——----———-——--—-—-—-—————-———---—-——————-—-—-———-

PMULHUW—-PackedMultiplyHighUnsigned

Opcode

Instruction

Description

0FE4/r

PMULHUWmm1,mm2/m64

Multiplythepackedunsignedwordintegersinmm1registerandmm2/m64，andstorethehigh16bitsoftheresultsinmm1。

660FE4/r

PMULHUWxmm1，xmm2/m128

Multiplythepackedunsignedwordintegersinxmm1andxmm2/m128，andstorethehigh16bitsoftheresultsinxmm1.

PMULHW—-PackedMultiplyHighSigned

Opcode

Instruction

Description

0FE5/r

PMULHWmm,mm/m64

Multiplythepackedsignedwordintegersinmm1registerandmm2/m64,andstorethehigh16bitsoftheresultsinmm1。

660FE5/r

PMULHWxmm1,xmm2/m128

Multiplythepackedsignedwordintegersinxmm1andxmm2/m128,andstorethehigh16bitsoftheresultsinxmm1。

PMULLW-—PackedMultiplyLowSigned

Opcode

Instruction

Description

0FD5/r

PMULLWmm,mm/m64

Multiplythepackedsignedwordintegersinmm1registerandmm2/m64,andstorethelow16bitsoftheresultsinmm1。

660FD5/r

PMULLWxmm1，xmm2/m128

Multiplythepackedsignedwordintegersinxmm1andxmm2/m128,andstorethelow16bitsoftheresultsinxmm1。

PMULUDQ--MultiplyDoublewordUnsigned

Opcode

Instruction

Description

0FF4/r

PMULUDQmm1，mm2/m64

Multiplyunsigneddoublewordintegerinmm1byunsigneddoublewordintegerinmm2/m64,andstorethequadwordresultinmm1。

66OFF4/r

PMULUDQxmm1,xmm2/m128

Multiplypackedunsigneddoublewordintegersinxmm1bypackedunsigneddoublewordintegersinxmm2/m128，andstorethequadwordresults

展开阅读全文