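Ceph BackoffThrottle Analysis

Overview

This article looks at BackoffThrottle, the dynamic throttle that Ceph introduced in Jewel, and then uses that analysis to tune the throttle settings for the Ceph filestore and journal.

References: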

http://blog.wjin.org/posts/ceph-dynamic-throttle.html
https://fossies.org/linux/ceph/src/doc/dynamic-throttle.txt

BackoffThrottle

Jewel introduced a dynamic throttle, implemented in the code as BackoffThrottle. Both the filestore and the journal now use it for throttling:

class FileStore
{
  // ...
  BackoffThrottle throttle_ops, throttle_bytes;
};

class JournalThrottle {
  // ...
  BackoffThrottle throttle;
};

BackoffThrottle is defined as follows, together with its parameters:

/**
* BackoffThrottle
*
* Creates a throttle which gradually induces delays when get() is called
* based on params low_threshhold, high_threshhold, expected_throughput,
* high_multiple, and max_multiple.
*
* In [0, low_threshhold), we want no delay.
*
* In [low_threshhold, high_threshhold), delays should be injected based
* on a line from 0 at low_threshhold to
* high_multiple * (1/expected_throughput) at high_threshhold.
*
* In [high_threshhold, 1), we want delays injected based on a line from
* (high_multiple * (1/expected_throughput)) at high_threshhold to
* (high_multiple * (1/expected_throughput)) +
* (max_multiple * (1/expected_throughput)) at 1.
*
* Let the current throttle ratio (current/max) be r, low_threshhold be l,
* high_threshhold be h, high_delay (high_multiple / expected_throughput) be e,
* and max_delay (max_muliple / expected_throughput) be m.
*
* delay = 0, r \in [0, l)
* delay = (r - l) * (e / (h - l)), r \in [l, h)
* delay = e + (r - h)((m - e)/(1 - h))
*/
class BackoffThrottle {
  // ...

  /// see above, values are in [0, 1].
  double low_threshhold = 0;
  double high_threshhold = 1;

  /// see above, values are in seconds
  double high_delay_per_count = 0;
  double max_delay_per_count = 0;

  /// Filled in in set_params
  double s0 = 0; ///< e / (h - l), l != h, 0 otherwise
  double s1 = 0; ///< (m - e)/(1 - h), 1 != h, 0 otherwise

  /// max
  uint64_t max = 0;
  uint64_t current = 0;

  // ...
};

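Worked example: the filestore throttle

The filestore throttle uses BackoffThrottle, so the sections below use it to walk through the parameter settings.

Configuration options for the filestore throttle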

OPTION(filestore_expected_throughput_bytes, OPT_DOUBLE, 200 << 20)
OPTION(filestore_expected_throughput_ops, OPT_DOUBLE, 200)

OPTION(filestore_queue_max_bytes, OPT_U64, 100 << 20)
OPTION(filestore_queue_max_ops, OPT_U64, 50)

OPTION(filestore_queue_max_delay_multiple, OPT_DOUBLE, 0)
OPTION(filestore_queue_high_delay_multiple, OPT_DOUBLE, 0)

OPTION(filestore_queue_low_threshhold, OPT_DOUBLE, 0.3)
OPTION(filestore_queue_high_threshhold, OPT_DOUBLE, 0.9)

Initializing BackoffThrottle from the configuration options

bool BackoffThrottle::set_params(
  double _low_threshhold,
  double _high_threshhold,
  double _expected_throughput,
  double _high_multiple,
  double _max_multiple,
  uint64_t _throttle_max,
  ostream *errstream)
{
  // ... (parameter validation omitted in this excerpt)

  low_threshhold = _low_threshhold;
  high_threshhold = _high_threshhold;
  high_delay_per_count = _high_multiple / _expected_throughput;
  max_delay_per_count = _max_multiple / _expected_throughput;
  max = _throttle_max;

  // Slope of the first ramp, from low_threshhold to high_threshhold.
  if (high_threshhold - low_threshhold > 0) {
    s0 = high_delay_per_count / (high_threshhold - low_threshhold);
  } else {
    low_threshhold = high_threshhold;
    s0 = 0;
  }

  // Slope of the second ramp, from high_threshhold to 1.
  if (1 - high_threshhold > 0) {
    s1 = (max_delay_per_count - high_delay_per_count)
      / (1 - high_threshhold);
  } else {
    high_threshhold = 1;
    s1 = 0;
  }
  return true;
}

int FileStore::set_throttle_params()
{
  stringstream ss;
  bool valid = throttle_bytes.set_params(
    g_conf->filestore_queue_low_threshhold,
    g_conf->filestore_queue_high_threshhold,
    g_conf->filestore_expected_throughput_bytes,
    g_conf->filestore_queue_high_delay_multiple,
    g_conf->filestore_queue_max_delay_multiple,
    g_conf->filestore_queue_max_bytes,
    &ss);

  valid &= throttle_ops.set_params(
    g_conf->filestore_queue_low_threshhold,
    g_conf->filestore_queue_high_threshhold,
    g_conf->filestore_expected_throughput_ops,
    g_conf->filestore_queue_high_delay_multiple,
    g_conf->filestore_queue_max_delay_multiple,
    g_conf->filestore_queue_max_ops,
    &ss);

  // ... (logging omitted in this excerpt)
  return valid ? 0 : -EINVAL;
}
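
To experiment with these settings outside of Ceph, the arithmetic of set_params() can be reproduced in a small standalone sketch. The names `BackoffParams` and `make_params` are hypothetical and introduced here only for illustration, and the parameter validation done by the real code is left out:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the fields that set_params() fills in.
struct BackoffParams {
  double low = 0, high = 1;   // thresholds, as ratios in [0, 1]
  double high_delay = 0;      // seconds per count at the high threshold
  double max_delay = 0;       // seconds per count when the queue is full
  double s0 = 0, s1 = 0;      // slopes of the first and second ramp
  std::uint64_t max = 0;      // queue capacity (bytes or ops)
};

// Mirrors the arithmetic of the set_params() excerpt above.
BackoffParams make_params(double low, double high, double expected_throughput,
                          double high_multiple, double max_multiple,
                          std::uint64_t throttle_max) {
  assert(expected_throughput > 0);
  BackoffParams p;
  p.low = low;
  p.high = high;
  p.high_delay = high_multiple / expected_throughput;
  p.max_delay = max_multiple / expected_throughput;
  p.max = throttle_max;
  if (p.high - p.low > 0) {
    p.s0 = p.high_delay / (p.high - p.low);
  } else {
    p.low = p.high;  // no room for a first ramp: collapse it
    p.s0 = 0;
  }
  if (1 - p.high > 0) {
    p.s1 = (p.max_delay - p.high_delay) / (1 - p.high);
  } else {
    p.high = 1;      // no room for a second ramp
    p.s1 = 0;
  }
  return p;
}
```

Calling `make_params(0.3, 0.9, 200 << 20, 0, 0, 100 << 20)` leaves both slopes at 0, while passing multiples of 2 and 10 yields a second slope that is much steeper than the first.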

Computing the delay

std::chrono::duration<double> BackoffThrottle::_get_delay(uint64_t c) const
{
  if (max == 0)
    return std::chrono::duration<double>(0);
  double r = ((double)current) / ((double)max);
  if (r < low_threshhold) {
    return std::chrono::duration<double>(0);
  } else if (r < high_threshhold) {
    return c * std::chrono::duration<double>(
      (r - low_threshhold) * s0);
  } else {
    return c * std::chrono::duration<double>(
      high_delay_per_count + ((r - high_threshhold) * s1));
  }
}

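As the function above shows, the delay is computed in one of four ways:

  1. max = 0: always returns 0
  2. current/max < low_threshhold: returns 0
  3. low_threshhold <= current/max < high_threshhold: returns c * (current/max - low_threshhold) * s0
  4. high_threshhold <= current/max: returns c * (high_delay_per_count + (current/max - high_threshhold) * s1)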

[Figure: BackoffThrottle delay curve]

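As the figure shows, in the first region, when load is light, the delay is 0 and no waiting is needed. As load rises and x (current/max) enters the second region, the delay takes effect and grows gradually. Under heavy load x falls into the third region, where the delay climbs much more steeply and the wait time grows markedly, slowing I/O as much as possible to relieve the pressure. Hence the name dynamic throttle.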
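
The three regions can also be written as one piecewise function. This is only a restatement of _get_delay() in the symbols of the BackoffThrottle comment (r = current/max, l and h the thresholds, e and m the per-count delays at h and at 1), with c the count passed to get():

```latex
\text{delay}(r) = c \cdot
\begin{cases}
0, & 0 \le r < l \\[4pt]
(r - l)\,\dfrac{e}{h - l}, & l \le r < h \\[8pt]
e + (r - h)\,\dfrac{m - e}{1 - h}, & h \le r
\end{cases}
```

The two ramps meet at r = h, where both give c * e, so the curve is continuous. Only its slope changes, from s0 = e / (h - l) to s1 = (m - e) / (1 - h).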

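filestore throttle under the default configuration

The filestore has two throttles, one for bytes and one for ops. This section uses the bytes throttle as the example.

By default: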

filestore_queue_high_delay_multiple = 0
filestore_queue_max_delay_multiple = 0

which gives BackoffThrottle the following values:

low_threshhold = 0.3
high_threshhold = 0.9
high_delay_per_count = 0
max_delay_per_count = 0
s0 = 0
s1 = 0
max = 100 << 20

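So under the default configuration the dynamic delay is disabled: both slopes are 0, and _get_delay() returns 0 for every value of current.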

Enabling the dynamic throttle

Following the values used in the earliest version of the code, set:

filestore_queue_high_delay_multiple = 2
filestore_queue_max_delay_multiple = 10

With every other option left at its default, BackoffThrottle holds the following values:

low_threshhold = 0.3
high_threshhold = 0.9
high_delay_per_count = 2/(200 << 20)
max_delay_per_count = 10/(200 << 20)
s0 = (2/(200 << 20))/0.6
s1 = (8/(200 << 20))/0.1
max = 100 << 20

The delay then falls into the following cases, where:

c: op->bytes, the amount of data in one request
current: the amount of data currently in the filestore queue; it starts at 0 and grows on every call to throttle_bytes.get(o->bytes) { current += c; }

  1. current/max < low_threshhold:
    here current < (30 << 20); delay = 0

  2. low_threshhold <= current/max < high_threshhold:
    here (30 << 20) <= current < (90 << 20)
    delay = c * ((current/max - 0.3) * s0)
    a) current = 30 << 20: delay = 0
    b) as current approaches 90 << 20: delay approaches c / (100 << 20)

  3. high_threshhold <= current/max:
    here (90 << 20) <= current
    delay = c * (2/(200 << 20) + (current/max - 0.9) * s1)
    a) current = 90 << 20: delay = c / (100 << 20)
    b) current = 100 << 20: delay = 5 * c / (100 << 20)
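
As a rough numerical check of the cases above, the following C++ sketch evaluates the same piecewise formula with this example's values. The function name `filestore_bytes_delay` is made up for illustration and is not part of Ceph. The thresholds 0.3 and 0.9 and the multiples 2 and 10 are hard-coded from the example:

```cpp
#include <cassert>
#include <cstdint>

// Illustrative helper (not a Ceph function): delay in seconds for an op of
// `c` bytes when `current` bytes are already queued. The defaults are the
// values from the example: 200 MiB/s expected throughput, 100 MiB queue max.
double filestore_bytes_delay(std::uint64_t c, std::uint64_t current,
                             double throughput = 200.0 * (1 << 20),
                             double max = 100.0 * (1 << 20)) {
  assert(throughput > 0 && max > 0);
  const double l = 0.3, h = 0.9;     // low/high thresholds
  const double e = 2 / throughput;   // high_delay_per_count
  const double m = 10 / throughput;  // max_delay_per_count
  const double r = current / max;    // queue fill ratio
  if (r < l) return 0.0;                            // case 1: no delay
  if (r < h) return c * (r - l) * (e / (h - l));    // case 2: first ramp
  return c * (e + (r - h) * ((m - e) / (1 - h)));   // case 3: second ramp
}
```

For a 4 MiB op this gives roughly 0.02 s with 60 MiB queued, 0.04 s with 90 MiB queued, and 0.2 s with 100 MiB queued, which matches c / (100 << 20) and 5 * c / (100 << 20) from the list above.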

Dynamic throttle under our current configuration

The configuration is:

filestore_expected_throughput_bytes = 536870912 // 512 MiB
filestore_queue_max_bytes = 1048576000 // 1000 MiB
filestore_queue_low_threshhold = 0.6
filestore_queue_high_threshhold = 0.9 // default
filestore_queue_high_delay_multiple = 2
filestore_queue_max_delay_multiple = 10

BackoffThrottle then holds the following values:

low_threshhold = 0.6
high_threshhold = 0.9
high_delay_per_count = 2/(512 << 20)
max_delay_per_count = 10/(512 << 20)
s0 = (2/(512 << 20))/0.3
s1 = (8/(512 << 20))/0.1
max = 1000 << 20

The delay then falls into the following cases:

  1. current/max < low_threshhold: here current < (600 << 20); delay = 0

  2. low_threshhold <= current/max < high_threshhold:
    here (600 << 20) <= current < (900 << 20)
    delay = c * ((current/max - 0.6) * s0)
    a) current = 600 << 20: delay = 0
    b) as current approaches 900 << 20: delay approaches c / (256 << 20)

  3. high_threshhold <= current/max:
    here (900 << 20) <= current
    delay = c * (2/(512 << 20) + (current/max - 0.9) * s1)
    a) current = 900 << 20: delay = c / (256 << 20)
    b) current = 1000 << 20: delay = 5 * c / (256 << 20)
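
The two endpoint values in case 3 follow directly from the parameters listed earlier. As a worked check, writing T for the expected throughput of 512 << 20 bytes per second (T is an abbreviation introduced here, not a name from the code):

```latex
\begin{aligned}
e &= \frac{2}{T}, \qquad m = \frac{10}{T}, \qquad
s_1 = \frac{m - e}{1 - h} = \frac{8/T}{0.1} \\[4pt]
\text{delay}(r = 0.9) &= c \cdot e = \frac{2c}{T} = \frac{c}{256 \ll 20} \\[4pt]
\text{delay}(r = 1) &= c\,(e + 0.1 \cdot s_1) = \frac{10c}{T} = \frac{5c}{256 \ll 20}
\end{aligned}
```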

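Conclusion: these settings are not well chosen. No delay at all is applied until about 600 MiB is queued, and beyond that point the delay at a given current/max ratio is smaller than in the previous example (default thresholds and throughput with multiples of 2 and 10), for instance c / (256 << 20) versus c / (100 << 20) at a ratio of 0.9. This may put more pressure on the filestore.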
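
The comparison behind this conclusion can be made concrete with a short sketch. The `Config` struct and the three helper functions are hypothetical names introduced for illustration. They use only the relations derived earlier: throttling starts at low_threshhold * max, and the per-count delays at the high threshold and at a full queue are the two multiples divided by the expected throughput.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical container for the options discussed in this article.
struct Config {
  double expected_throughput;  // bytes per second
  std::uint64_t queue_max;     // bytes
  double low, high;            // thresholds
  double high_multiple, max_multiple;
};

// Queue depth in bytes below which no delay is injected.
double throttle_start_bytes(const Config& c) { return c.low * c.queue_max; }

// Delay in seconds for an op of `op_bytes` when the queue sits at `high`.
double delay_at_high(const Config& c, double op_bytes) {
  assert(c.expected_throughput > 0);
  return op_bytes * c.high_multiple / c.expected_throughput;
}

// Delay in seconds for the same op when the queue is full.
double delay_at_full(const Config& c, double op_bytes) {
  assert(c.expected_throughput > 0);
  return op_bytes * c.max_multiple / c.expected_throughput;
}
```

With `Config{200.0 * (1 << 20), 100ull << 20, 0.3, 0.9, 2, 10}` for the earlier example and `Config{512.0 * (1 << 20), 1000ull << 20, 0.6, 0.9, 2, 10}` for the current settings, a 4 MiB op at the high threshold is delayed by about 0.04 s in the first case and about 0.016 s in the second, and the second configuration only starts delaying once 600 MiB is queued instead of 30 MiB.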
