Bicubic Interpolation and Its Optimization

1. Mathematical model

For a destination pixel, the inverse transform maps its coordinates to a floating-point position (i+u, j+v) in the source image, where i and j are non-negative integers and u and v are floating-point numbers in the interval [0, 1). Bicubic interpolation uses the 16 neighbors surrounding the position (i+u, j+v). The destination pixel value f(i+u, j+v) is given by the following interpolation formula:

$$f(i+u,\ j+v) = A \cdot B \cdot C$$

$$A = \begin{bmatrix} S(u+1) & S(u+0) & S(u-1) & S(u-2) \end{bmatrix}$$

$$B = \begin{bmatrix}
f(i-1,\ j-1) & f(i-1,\ j+0) & f(i-1,\ j+1) & f(i-1,\ j+2) \\
f(i+0,\ j-1) & f(i+0,\ j+0) & f(i+0,\ j+1) & f(i+0,\ j+2) \\
f(i+1,\ j-1) & f(i+1,\ j+0) & f(i+1,\ j+1) & f(i+1,\ j+2) \\
f(i+2,\ j-1) & f(i+2,\ j+0) & f(i+2,\ j+1) & f(i+2,\ j+2)
\end{bmatrix}$$

$$C = \begin{bmatrix} S(v+1) \\ S(v+0) \\ S(v-1) \\ S(v-2) \end{bmatrix}$$

$$S(x) = \begin{cases}
1 - 2|x|^2 + |x|^3, & 0 \le |x| < 1 \\
4 - 8|x| + 5|x|^2 - |x|^3, & 1 \le |x| < 2 \\
0, & |x| \ge 2
\end{cases}$$

S(x) is the interpolation kernel; it approximates sin(πx)/(πx).

2. Computation flow

1. Obtain the 16 neighboring points P1, P2, ..., P16.
2. Use the kernel formula S(x) to compute the kernel vectors Su and Sv for the x and y directions.
3. Perform the matrix operation to obtain the interpolation result:

   iTemp1 = Su0 * P1 + Su1 * P5 + Su2 * P9 + Su3 * P13
   iTemp2 = Su0 * P2 + Su1 * P6 + Su2 * P10 + Su3 * P14
   iTemp3 = Su0 * P3 + Su1 * P7 + Su2 * P11 + Su3 * P15
   iTemp4 = Su0 * P4 + Su1 * P8 + Su2 * P12 + Su3 * P16
   iResult = Sv0 * iTemp1 + Sv1 * iTemp2 + Sv2 * iTemp3 + Sv3 * iTemp4

4. The interpolated image showed "burr" artifacts, so the result is post-processed. Let pSrc be the pixel value at that point in the source image. If abs(iResult - pSrc) exceeds a threshold, the interpolated point is considered likely to contaminate the original image and is replaced by the source value pSrc.

3. Algorithm optimization

Computing one interpolated value requires the 16 surrounding points, up to 20 multiplications (16 for the four intermediate sums plus 4 for the final sum) and 15 additions. The computational cost is high, so optimization is necessary. We chose Intel's SSE2 instruction set, which is supported only on Pentium 4 and later CPUs. Whether the current CPU supports SSE2 can be determined with the CPUID instruction:

    BOOL g_bSSE2 = FALSE;
    _asm {
        mov eax, 1
        cpuid
        test edx, 0x04000000    ; EDX bit 26 is the SSE2 flag
        jz NotSupport
        mov g_bSSE2, 1
    NotSupport:
    }

CPUs that support SSE2 provide eight 128-bit registers (XMM0 to XMM7), so a single register can hold 4 pixels (RGB), which favors parallel computation. For the detailed code, see the function Optimize_Bicubic in Transform.cpp.

Problems encountered during optimization:

1. Each pixel consists of the R, G and B channels. An SSE2 register holds 16 bytes, so after 4 pixels are loaded, 4 bytes are wasted. Time must also be spent realigning the data, from BRGB | RGBR | GBRG | BRGB to 0RGB | 0RGB | 0RGB | 0RGB.
2. When 16 bytes are read into a register, the image address is not guaranteed to be 16-byte aligned, so the MOVDQU instruction, which takes more clock cycles (6 or more), must be used. If the address can be made 16-byte aligned, the MOVDQA instruction (1 clock cycle) can be used instead.
3. To eliminate division and floating-point arithmetic, the weights are scaled by 256. Each kernel coefficient then needs 2 bytes, while the image data is 1 byte per channel. When the two are lined up for multiplication, half of the SSE2 register space is wasted, which lengthens the computation. If the kernel precision is reduced so that a coefficient fits in 1 byte, the accuracy of the result drops considerably.
4. We lacked experience with the latency of each instruction and with whether a given sequence of instructions can be pipelined in parallel.

Appendix: SSE2 instruction summary

Arithmetic instructions:

| Instruction | Full name | Instruction set | Operation | Form |
| --- | --- | --- | --- | --- |
| ADDPD | Packed Double-Precision Floating-Point Add | SSE2 | Adds 2 doubles element-wise | ADDPD xmm0, xmm1/m128 |
| ADDPS | Packed Single-Precision Floating-Point Add | SSE | Adds 4 floats element-wise | ADDPS xmm0, xmm1/m128 |
| ADDSD | Scalar Double-Precision Floating-Point Add | SSE2 | Adds 1 double (low element) | ADDSD xmm0, xmm1/m64 |
| ADDSS | Scalar Single-Precision Floating-Point Add | SSE | Adds 1 float (low element) | ADDSS xmm0, xmm1/m32 |

PADDB/PADDW/PADDD: Packed Add

| Opcode | Instruction | Description |
| --- | --- | --- |
| 0F FC /r | PADDB mm, mm/m64 | Add packed byte integers from mm/m64 and mm. |
| 66 0F FC /r | PADDB xmm1, xmm2/m128 | Add packed byte integers from xmm2/m128 and xmm1. |
| 0F FD /r | PADDW mm, mm/m64 | Add packed word integers from mm/m64 and mm. |
| 66 0F FD /r | PADDW xmm1, xmm2/m128 | Add packed word integers from xmm2/m128 and xmm1. |
| 0F FE /r | PADDD mm, mm/m64 | Add packed doubleword integers from mm/m64 and mm. |
| 66 0F FE /r | PADDD xmm1, xmm2/m128 | Add packed doubleword integers from xmm2/m128 and xmm1. |

PADDQ: Packed Quadword Add

| Opcode | Instruction | Description |
| --- | --- | --- |
| 0F D4 /r | PADDQ mm1, mm2/m64 | Add quadword integer mm2/m64 to mm1. |
| 66 0F D4 /r | PADDQ xmm1, xmm2/m128 | Add packed quadword integers xmm2/m128 to xmm1. |

PADDSB/PADDSW: Packed Add with Saturation

| Opcode | Instruction | Description |
| --- | --- | --- |
| 0F EC /r | PADDSB mm, mm/m64 | Add packed signed byte integers from mm/m64 and mm and saturate the results. |
| 66 0F EC /r | PADDSB xmm1, xmm2/m128 | Add packed signed byte integers from xmm2/m128 and xmm1 and saturate the results. |
| 0F ED /r | PADDSW mm, mm/m64 | Add packed signed word integers from mm/m64 and mm and saturate the results. |
| 66 0F ED /r | PADDSW xmm1, xmm2/m128 | Add packed signed word integers from xmm2/m128 and xmm1 and saturate the results. |

PADDUSB/PADDUSW: Packed Add Unsigned with Saturation

| Opcode | Instruction | Description |
| --- | --- | --- |
| 0F DC /r | PADDUSB mm, mm/m64 | Add packed unsigned byte integers from mm/m64 and mm and saturate the results. |
| 66 0F DC /r | PADDUSB xmm1, xmm2/m128 | Add packed unsigned byte integers from xmm2/m128 and xmm1 and saturate the results. |
| 0F DD /r | PADDUSW mm, mm/m64 | Add packed unsigned word integers from mm/m64 and mm and saturate the results. |
| 66 0F DD /r | PADDUSW xmm1, xmm2/m128 | Add packed unsigned word integers from xmm2/m128 to xmm1 and saturate the results. |

PMADDWD: Packed Multiply and Add

| Opcode | Instruction | Description |
| --- | --- | --- |
| 0F F5 /r | PMADDWD mm, mm/m64 | Multiply the packed words in mm by the packed words in mm/m64. Add the 32-bit pairs of results and store in mm as doubleword. |
| 66 0F F5 /r | PMADDWD xmm1, xmm2/m128 | Multiply the packed word integers in xmm1 by the packed word integers in xmm2/m128, and add the adjacent doubleword results. |

PSADBW: Packed Sum of Absolute Differences

| Opcode | Instruction | Description |
| --- | --- | --- |
| 0F F6 /r | PSADBW mm1, mm2/m64 | Absolute difference of packed unsigned byte integers from mm2/m64 and mm1; differences are then summed to produce an unsigned word integer result. |
| 66 0F F6 /r | PSADBW xmm1, xmm2/m128 | Absolute difference of packed unsigned byte integers from xmm2/m128 and xmm1; the 8 low differences and 8 high differences are then summed separately to produce two word integer results. |

PSUBB/PSUBW/PSUBD: Packed Subtract

| Opcode | Instruction | Description |
| --- | --- | --- |
| 0F F8 /r | PSUBB mm, mm/m64 | Subtract packed byte integers in mm/m64 from packed byte integers in mm. |
| 66 0F F8 /r | PSUBB xmm1, xmm2/m128 | Subtract packed byte integers in xmm2/m128 from packed byte integers in xmm1. |
| 0F F9 /r | PSUBW mm, mm/m64 | Subtract packed word integers in mm/m64 from packed word integers in mm. |
| 66 0F F9 /r | PSUBW xmm1, xmm2/m128 | Subtract packed word integers in xmm2/m128 from packed word integers in xmm1. |
| 0F FA /r | PSUBD mm, mm/m64 | Subtract packed doubleword integers in mm/m64 from packed doubleword integers in mm. |
| 66 0F FA /r | PSUBD xmm1, xmm2/m128 | Subtract packed doubleword integers in xmm2/m128 from packed doubleword integers in xmm1. |

PSUBQ: Packed Subtract Quadword

| Opcode | Instruction | Description |
| --- | --- | --- |
| 0F FB /r | PSUBQ mm1, mm2/m64 | Subtract quadword integer in mm2/m64 from mm1. |
| 66 0F FB /r | PSUBQ xmm1, xmm2/m128 | Subtract packed quadword integers in xmm2/m128 from xmm1. |

PSUBSB/PSUBSW: Packed Subtract with Saturation

| Opcode | Instruction | Description |
| --- | --- | --- |
| 0F E8 /r | PSUBSB mm, mm/m64 | Subtract signed packed bytes in mm/m64 from signed packed bytes in mm and saturate results. |
| 66 0F E8 /r | PSUBSB xmm1, xmm2/m128 | Subtract packed signed byte integers in xmm2/m128 from packed signed byte integers in xmm1 and saturate results. |
| 0F E9 /r | PSUBSW mm, mm/m64 | Subtract signed packed words in mm/m64 from signed packed words in mm and saturate results. |
| 66 0F E9 /r | PSUBSW xmm1, xmm2/m128 | Subtract packed signed word integers in xmm2/m128 from packed signed word integers in xmm1 and saturate results. |

PSUBUSB/PSUBUSW: Packed Subtract Unsigned with Saturation

| Opcode | Instruction | Description |
| --- | --- | --- |
| 0F D8 /r | PSUBUSB mm, mm/m64 | Subtract unsigned packed bytes in mm/m64 from unsigned packed bytes in mm and saturate result. |
| 66 0F D8 /r | PSUBUSB xmm1, xmm2/m128 | Subtract packed unsigned byte integers in xmm2/m128 from packed unsigned byte integers in xmm1 and saturate result. |
| 0F D9 /r | PSUBUSW mm, mm/m64 | Subtract unsigned packed words in mm/m64 from unsigned packed words in mm and saturate result. |
| 66 0F D9 /r | PSUBUSW xmm1, xmm2/m128 | Subtract packed unsigned word integers in xmm2/m128 from packed unsigned word integers in xmm1 and saturate result. |

SUBPD: Packed Double-Precision Floating-Point Subtract

| Opcode | Instruction | Description |
| --- | --- | --- |
| 66 0F 5C /r | SUBPD xmm1, xmm2/m128 | Subtract packed double-precision floating-point values in xmm2/m128 from xmm1. |

SUBPS: Packed Single-Precision Floating-Point Subtract

| Opcode | Instruction | Description |
| --- | --- | --- |
| 0F 5C /r | SUBPS xmm1, xmm2/m128 | Subtract packed single-precision floating-point values in xmm2/mem from xmm1. |

SUBSD: Scalar Double-Precision Floating-Point Subtract

| Opcode | Instruction | Description |
| --- | --- | --- |
| F2 0F 5C /r | SUBSD xmm1, xmm2/m64 | Subtracts the low double-precision floating-point numbers in xmm2/mem64 from xmm1. |

SUBSS: Scalar Single-FP Subtract

| Opcode | Instruction | Description |
| --- | --- | --- |
| F3 0F 5C /r | SUBSS xmm1, xmm2/m32 | Subtract the lower single-precision floating-point numbers in xmm2/m32 from xmm1. |

PMULHUW: Packed Multiply High Unsigned

| Opcode | Instruction | Description |
| --- | --- | --- |
| 0F E4 /r | PMULHUW mm1, mm2/m64 | Multiply the packed unsigned word integers in mm1 register and mm2/m64, and store the high 16 bits of the results in mm1. |
| 66 0F E4 /r | PMULHUW xmm1, xmm2/m128 | Multiply the packed unsigned word integers in xmm1 and xmm2/m128, and store the high 16 bits of the results in xmm1. |

PMULHW: Packed Multiply High Signed

| Opcode | Instruction | Description |
| --- | --- | --- |
| 0F E5 /r | PMULHW mm, mm/m64 | Multiply the packed signed word integers in mm1 register and mm2/m64, and store the high 16 bits of the results in mm1. |
| 66 0F E5 /r | PMULHW xmm1, xmm2/m128 | Multiply the packed signed word integers in xmm1 and xmm2/m128, and store the high 16 bits of the results in xmm1. |

PMULLW: Packed Multiply Low Signed

| Opcode | Instruction | Description |
| --- | --- | --- |
| 0F D5 /r | PMULLW mm, mm/m64 | Multiply the packed signed word integers in mm1 register and mm2/m64, and store the low 16 bits of the results in mm1. |
| 66 0F D5 /r | PMULLW xmm1, xmm2/m128 | Multiply the packed signed word integers in xmm1 and xmm2/m128, and store the low 16 bits of the results in xmm1. |

PMULUDQ: Multiply Doubleword Unsigned

| Opcode | Instruction | Description |
| --- | --- | --- |
| 0F F4 /r | PMULUDQ mm1, mm2/m64 | Multiply unsigned doubleword integer in mm1 by unsigned doubleword integer in mm2/m64, and store the quadword result in mm1. |
| 66 0F F4 /r | PMULUDQ xmm1, xmm2/m128 | Multiply packed unsigned doubleword integers in xmm1 by packed unsigned doubleword integers in xmm2/m128, and store the quadword results |
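As a reference point for the optimized routine, the kernel S(x), the iTemp/iResult steps and the threshold post-processing described in the note can be sketched in plain C++. This is a minimal sketch, not the code from Transform.cpp: the function names `kernelVector`, `bicubic` and `despike` are hypothetical, and it assumes that P1 to P16 are numbered row by row, so that `p[r][c]` holds f(i-1+r, j-1+c).

```cpp
#include <cassert>
#include <cmath>
#include <cstdlib>

// Interpolation kernel S(x) as defined in the mathematical model.
double S(double x) {
    const double a = std::fabs(x);
    if (a < 1.0) return 1.0 - 2.0 * a * a + a * a * a;
    if (a < 2.0) return 4.0 - 8.0 * a + 5.0 * a * a - a * a * a;
    return 0.0;
}

// Fills w[0..3] with S(t+1), S(t), S(t-1), S(t-2) for an offset t in [0, 1).
void kernelVector(double t, double w[4]) {
    assert(t >= 0.0 && t < 1.0);
    for (int k = 0; k < 4; ++k) w[k] = S(t + 1.0 - k);
}

// Evaluates A * B * C for one sample.
// Assumption: p[r][c] = f(i-1+r, j-1+c), i.e. P1..P16 in row-major order.
double bicubic(const double p[4][4], double u, double v) {
    double su[4], sv[4];
    kernelVector(u, su);
    kernelVector(v, sv);
    double result = 0.0;
    for (int c = 0; c < 4; ++c) {
        double temp = 0.0;  // iTemp(c+1) = Su0*P(c+1) + Su1*P(c+5) + ...
        for (int r = 0; r < 4; ++r) temp += su[r] * p[r][c];
        result += sv[c] * temp;
    }
    return result;
}

// Step 4: fall back to the source pixel when the interpolated value
// deviates from it by more than the threshold.
int despike(int iResult, int pSrc, int threshold) {
    return std::abs(iResult - pSrc) > threshold ? pSrc : iResult;
}
```

The four kernel values for any offset in [0, 1) sum to 1, so a flat 4x4 neighborhood is reproduced unchanged; that property is a convenient sanity check for an optimized version.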
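The remark that scaling the weights by 256 forces 2 bytes per coefficient can be illustrated with a small scalar sketch. The names `cubicKernel`, `fixedWeights` and `blend4` are hypothetical, and the rounding and sum-correction policy shown here is an assumption, not something the note specifies.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Same kernel as S(x) in the note, repeated so the block is self-contained.
static double cubicKernel(double x) {
    const double a = std::fabs(x);
    if (a < 1.0) return 1.0 - 2.0 * a * a + a * a * a;
    if (a < 2.0) return 4.0 - 8.0 * a + 5.0 * a * a - a * a * a;
    return 0.0;
}

// Quantizes the four kernel values for offset t in [0, 1) to integers
// scaled by 256. The side taps are negative and the centre tap reaches 256
// at t = 0, so a coefficient fits neither a signed nor an unsigned byte.
void fixedWeights(double t, std::int16_t w[4]) {
    assert(t >= 0.0 && t < 1.0);
    int sum = 0;
    for (int k = 0; k < 4; ++k) {
        w[k] = static_cast<std::int16_t>(std::lround(cubicKernel(t + 1.0 - k) * 256.0));
        sum += w[k];
    }
    // Assumed policy: fold the rounding error into the largest tap so the
    // weights sum to exactly 256 and flat areas keep their value.
    const int big = (t < 0.5) ? 1 : 2;
    w[big] = static_cast<std::int16_t>(w[big] + (256 - sum));
}

// One 1-D pass in integer arithmetic: weighted sum of four bytes,
// rounded, divided by 256 with a shift, and clamped to 0..255.
std::uint8_t blend4(const std::uint8_t p[4], const std::int16_t w[4]) {
    int acc = 128;  // rounding offset, half of 256
    for (int k = 0; k < 4; ++k) acc += w[k] * p[k];
    if (acc < 0) acc = 0;  // clamp before shifting to avoid shifting a negative
    acc >>= 8;
    return static_cast<std::uint8_t>(acc > 255 ? 255 : acc);
}
```

For example, at t = 0.5 the weights come out as -32, 160, 160, -32. The negative side taps are why the result has to be clamped, and why the intermediate sum needs more than 16 bits.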
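The CPUID check in the note is written as 32-bit MSVC inline assembly, which 64-bit MSVC builds do not accept. A possible equivalent using compiler intrinsics is sketched below; the function name `hasSse2` is hypothetical, and the sketch assumes either MSVC (`__cpuid` from `<intrin.h>`) or GCC/Clang on x86 (`__get_cpuid` from `<cpuid.h>`).

```cpp
#include <cassert>
#if defined(_MSC_VER)
#include <intrin.h>   // __cpuid (MSVC)
#elif defined(__i386__) || defined(__x86_64__)
#include <cpuid.h>    // __get_cpuid (GCC/Clang)
#endif

// Returns true when CPUID leaf 1 reports SSE2 (EDX bit 26, mask 0x04000000),
// the same bit the inline-assembly version tests.
bool hasSse2() {
#if defined(_MSC_VER) && (defined(_M_IX86) || defined(_M_X64))
    int regs[4] = {0, 0, 0, 0};  // EAX, EBX, ECX, EDX
    __cpuid(regs, 1);
    return (regs[3] & 0x04000000) != 0;
#elif defined(__i386__) || defined(__x86_64__)
    unsigned int eax = 0, ebx = 0, ecx = 0, edx = 0;
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx)) return false;  // leaf 1 unavailable
    return (edx & 0x04000000u) != 0;
#else
    return false;  // not an x86 target
#endif
}
```

A caller could evaluate `hasSse2()` once at start-up and store the result in a flag such as `g_bSSE2`, choosing between the scalar and SSE2 code paths afterwards.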
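Problem 3 and the PMADDWD entry in the appendix can be tied together with a short SSE2 intrinsics sketch. It is not the author's Optimize_Bicubic: the function name `blendRows8`, the choice of a vertical pass over four rows, and the rounding are assumptions made for illustration. It widens 8 bytes per row to 16-bit words, which is the "half the register" cost the note describes, and uses `_mm_madd_epi16` (PMADDWD) to multiply by 16-bit weights scaled by 256. It requires an x86 target with SSE2 enabled.

```cpp
#include <cassert>
#include <cstdint>
#include <emmintrin.h>  // SSE2 intrinsics

// dst[x] = clamp((w0*row0[x] + w1*row1[x] + w2*row2[x] + w3*row3[x] + 128) >> 8)
// for x = 0..7, with weights pre-scaled by 256.
void blendRows8(const std::uint8_t* const rows[4], const std::int16_t w[4],
                std::uint8_t dst[8]) {
    const __m128i zero = _mm_setzero_si128();
    // Weight pairs (w0, w1) and (w2, w3), repeated in every 32-bit lane.
    const __m128i w01 = _mm_set_epi16(w[1], w[0], w[1], w[0], w[1], w[0], w[1], w[0]);
    const __m128i w23 = _mm_set_epi16(w[3], w[2], w[3], w[2], w[3], w[2], w[3], w[2]);

    // Load 8 bytes per row (no alignment needed) and widen to 16-bit words.
    __m128i r[4];
    for (int k = 0; k < 4; ++k) {
        const __m128i bytes = _mm_loadl_epi64(reinterpret_cast<const __m128i*>(rows[k]));
        r[k] = _mm_unpacklo_epi8(bytes, zero);
    }

    // Interleave row pairs so PMADDWD yields w0*row0[x] + w1*row1[x] per lane.
    __m128i lo = _mm_add_epi32(_mm_madd_epi16(_mm_unpacklo_epi16(r[0], r[1]), w01),
                               _mm_madd_epi16(_mm_unpacklo_epi16(r[2], r[3]), w23));
    __m128i hi = _mm_add_epi32(_mm_madd_epi16(_mm_unpackhi_epi16(r[0], r[1]), w01),
                               _mm_madd_epi16(_mm_unpackhi_epi16(r[2], r[3]), w23));

    // Round, divide by 256 with an arithmetic shift, then saturate to bytes.
    const __m128i half = _mm_set1_epi32(128);
    lo = _mm_srai_epi32(_mm_add_epi32(lo, half), 8);
    hi = _mm_srai_epi32(_mm_add_epi32(hi, half), 8);
    const __m128i words = _mm_packs_epi32(lo, hi);       // 8 signed words
    const __m128i bytes = _mm_packus_epi16(words, zero);  // clamps to 0..255
    _mm_storel_epi64(reinterpret_cast<__m128i*>(dst), bytes);
}
```

Because the products are accumulated in 32-bit lanes, the sum cannot overflow for byte inputs and weights of this size, and the two pack instructions perform the final clamp without branches.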
