0%

二分写法

二分写法

将一个区间 [l,r] 分为 [l,m][m + 1,r] 两个区间,左右区间元素通过bool Less(int idx) 区分

std::lower_boundstd::upper_bound的区别在哪里呢?

  • std::lower_bound: Less( {左区间元素}, val ) == true
  • std::upper_bound: Less( {右区间元素}, val ) == true

他们都返回右区间起始元素 ( end 是哨兵)

Version 1

二分顾名思义,在有序线性表上每次尝试去除一半的解空间

难以处理的边界值是当 当前范围为2的时候,下边是带注释的直觉的写法

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
T* lower_bound(vector<T>& arr, int val)
{
// [l,r] : possible result
int l = 0, r = arr.size() - 1;
while (l < r)
{
// About whether the mid should be closer to l or r
// this is mattered when l + 1 == r
int mid = l + (r - l) / 2;
// note since Less is operator<, we should always identify the less relation
// this would be LessT(arr[mid], val) for std::upper_bound
if (LessT(arr[mid], val))
l = mid + 1;
else
r = mid;
// take a look at the range shrinking process
// l = mid +1
// or
// r = mid
// since l = mid + 1 is always ok for range reduce
// so to keep r = mid always do so, the mid must be closer to l
}
if (!LessT(arr[l], val))
return arr.data() + l;
else
return arr.data() + arr.size();
}

Version 2

更加简洁优美的写法来自于 sentinel(哨兵)概念的引入, lower_bound 或者 upper_bound 为不存在的状态返回 arr.data() + arr.size(),这其实是一个天然哨兵

在上一个写法中,我们在末尾需要判断当前界值的有效性,这个判断摆在这里就有点多余。

一开始阻碍我将哨兵加入范围判断的思想来源于sentinel的不可访问性

但其实,不必真的访问哨兵

源码可以参考 msvc的实现

故有修改版本

1
2
3
4
5
6
7
8
9
10
11
12
13
14
T* lower_bound(vector<T>& arr, int val)
{
// [l,r] : possible retval
int l = 0, r = arr.size();
while (l < r)
{
int mid = l + (r - l) / 2;
if (LessT(arr[mid - 1], val))
l = mid + 1;
else
r = mid;
}
return arr.data() + l;
}

总结

二分的坑点有二:

  • mid 值的判断:这一步可以先不做处理,等到判断解空间分割的时候,通过观察子解空间必定缩小的条件来决定,mid应当靠近方向
  • 哨兵返回值:将尾元素值认为是绝对符合要求(大于或大于等于)的值

测试代码

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
#include <algorithm>
#include <cassert>
#include <iostream>
#include <vector>

using namespace std;
using T = int;
bool LessT(T a, T b)
{
return a < b;
}

struct BS
{
virtual T* lower_bound(vector<T>& arr, int val) = 0;
};
struct BS_std : public BS
{
T* lower_bound(vector<T>& arr, int val)
{
return std::lower_bound(arr.data(), arr.data() + arr.size(), val, LessT);
}
};

struct BS_V1 : public BS
{
T* lower_bound(vector<T>& arr, int val)
{
// [l,r] : possible result
int l = 0, r = arr.size() - 1;
while (l < r)
{
// About whether the mid should be closer to l or r
// this is mattered when l + 1 == r
int mid = l + (r - l) / 2;
// note since Less is operator<, we should always identify the less relation
// this would be LessT(arr[mid], val) for std::upper_bound
if (LessT(arr[mid], val))
l = mid + 1;
else
r = mid;
// take a look at the range shrinking process
// l = mid +1
// or
// r = mid
// since l = mid + 1 is always ok for range reduce
// so to keep r = mid always do so, the mid be closer to l would be nice.
}
if (!LessT(arr[l], val))
return arr.data() + l;
else
return arr.data() + arr.size();
}
};

struct BS_V2 : public BS
{
T* lower_bound(vector<T>& arr, int val)
{
// [l,r] : possible retval
int l = 0, r = arr.size();
while (l < r)
{
int mid = l + (r - l) / 2;
if (LessT(arr[mid - 1], val))
l = mid + 1;
else
r = mid;
}
return arr.data() + l;
}
};

int main()
{
int testRunCount = 1000;
int arrSize = 1000;
srand(time(0));

BS* bss[] = {
new BS_V1(),
new BS_V2(),
};
for (int r = 0; r < testRunCount; r++)
{
vector<T> data(arrSize);
for (int i = 0; i < arrSize; i++)
data[i] = rand() % (testRunCount * 2);
sort(data.begin(), data.end(), LessT);

for (int t = 0; t < 3; ++t)
{
int tval = rand() % (testRunCount * 2);
T* std_result = BS_std().lower_bound(data, tval);

for (auto bs : bss)
{
T* result = BS_V1().lower_bound(data, tval);
assert(std_result == result);
}
}
}
}