    SourcePoint* maxValues = new SourcePoint[M];
    maxCoefficients = new SourcePoint*

    for (int j = 0; j < rows; j++) {
        for (int i = 0; i < cols; i++) {
            float sample = arr[i][j];
            if (sample > maxValues[0].value) {
                int q = 1;
                // check q < M first so maxValues[q] is never read out of bounds
                while (q < M && sample > maxValues[q].value) {
                    maxValues[q-1] = maxValues[q];  // shuffle the values back
                    q++;
                }
                maxValues[q-1].value = sample;
                maxValues[q-1].point = Point(i, j);
            }
        }
    }

A Point struct is just two ints - x and y.

This code basically does an insertion sort of the values coming in. maxValues[0] always contains the SourcePoint with the lowest value that still keeps it within the top M values encountered so far. This gives us a quick and easy bailout: if sample <= maxValues[0].value, we don't do anything. The issue I'm having is the shuffling every time a new better value is found. It works its way all the way down maxValues until it finds its spot, shuffling all the elements in maxValues to make room for itself.

I'm getting to the point where I'm ready to look into SIMD solutions, or cache optimisations, since it looks like there's a fair bit of cache thrashing happening. Cutting the cost of this operation down will dramatically affect the performance of my overall algorithm, since this is called many, many times and accounts for 60-80% of my overall cost.

I've tried using a std::vector and make_heap, but I think the overhead for creating the heap outweighed the savings of the heap operations. This is likely because M and N generally aren't large. M is typically 10-20 and N 10-30 (NxN 100 - 900). The issue is that this operation is called repeatedly, and it can't be precomputed.

I just had a thought to pre-load the first M elements of maxValues, which may provide some small savings. In the current algorithm, the first M elements are guaranteed to shuffle themselves all the way down just to initially fill maxValues.

Any help from optimization gurus would be much appreciated :)

A few ideas you can try. In some quick tests with N=100 and M=15 I was able to get it around 25% faster in VC++ 2010, but test it yourself to see whether any of them help in your case. Some of these changes may have no effect, or even a negative effect, depending on the actual usage/data and compiler optimizations.

- Don't allocate a new maxValues array each time unless you need to. Using a stack variable instead of dynamic allocation gets me +5%.
- Changing g_Source[i][j] to g_Source[j][i] gains you a very little bit (not as much as I'd thought there would be).
- Using the structure SourcePoint1 listed at the bottom gets me another few percent.
- The biggest gain of around +15% was to replace the local variable sample with g_Source[j][i]. The compiler is likely smart enough to optimize out the multiple reads to the array, which it can't do if you use a local variable.
- Trying a simple binary search netted me a small loss of a few percent. For larger M/Ns you'd likely see a benefit.
- If possible, try to keep the source data in arr sorted, even if only partially. Ideally you'd want to generate maxValues at the same time the source data is created.
- Looking at how the data is created/stored/organized may give you patterns or information to reduce the amount of time to generate your maxValues array. For example, in the best case you could come up with a formula that gives you the top M coordinates without needing to iterate and sort.

Code for above:

struct S
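The answer's listing is cut off above, so what follows is only a minimal sketch of two of the suggestions (a caller-owned buffer that can live on the stack, plus the question's own idea of filling the first M slots directly), not the answerer's code. The names `topM` and `kMaxM`, and the flat row-major `arr` layout, are assumptions made for illustration. `SourcePoint` here is the plain layout from the question, not the `SourcePoint1` variant, which is not visible in the text.

```cpp
#include <cassert>

struct Point { int x; int y; };
struct SourcePoint { Point point; float value; };

// Hypothetical upper bound on M so callers can use a fixed-size stack array.
constexpr int kMaxM = 32;

// Scans a rows x cols grid stored row-major in `arr` and keeps the M largest
// samples in `out`, sorted ascending (out[0] is the smallest retained value).
// Returns the number of entries written, i.e. min(M, rows * cols).
int topM(const float* arr, int rows, int cols, int M, SourcePoint* out) {
    assert(M > 0 && M <= kMaxM);
    int count = 0;
    for (int j = 0; j < rows; ++j) {
        for (int i = 0; i < cols; ++i) {
            const float sample = arr[j * cols + i];
            // Quick bailout once the buffer is full.
            if (count == M && sample <= out[0].value) continue;

            int q;
            if (count < M) {
                // Still filling: shift larger entries up by one slot.
                q = count;
                while (q > 0 && out[q - 1].value > sample) {
                    out[q] = out[q - 1];
                    --q;
                }
                ++count;
            } else {
                // Full: drop out[0] and shift smaller entries down by one slot.
                q = 0;
                while (q + 1 < M && out[q + 1].value < sample) {
                    out[q] = out[q + 1];
                    ++q;
                }
            }
            out[q] = SourcePoint{Point{i, j}, sample};
        }
    }
    return count;
}
```

A caller would declare `SourcePoint best[kMaxM];` once, outside the hot loop, and pass it to `topM` on every call, which avoids a `new` per invocation. Whether this beats the original loop depends on the data and the compiler, so it is worth measuring rather than assuming.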