Accelerating SATA Disks with an SSD: The bcache Strategy

Overview

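An earlier article introduced the flashcache strategy for accelerating SATA disks with an SSD.
We generally recommend the stable flashcache strategy for that purpose, but in practice building and installing it on CentOS proved troublesome, so I took the time to study and try out the bcache strategy.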

In addition, bcache can use a single SSD to cache several SATA disks, which is very convenient for workloads where disks are added or removed at any time.

The project site also claims that bcache outperforms flashcache across the board; for a comparison, see:

http://www.accelcloud.com/2012/04/18/linux-flashcache-and-bcache-performance-testing/

Bcache

Introduction:

https://wiki.archlinux.org/index.php/Bcache

https://bcache.evilpiepirate.org/

Bcache was merged into the mainline Linux kernel in version 3.10, so using it only requires a host kernel of version 3.10 or later that includes the bcache module.
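The version requirement can be checked with a small shell sketch. The function name `kernel_has_mainline_bcache` is hypothetical and not part of bcache-tools. A result of 3.10 or later only means bcache is in mainline for that version; it does not guarantee that the distribution built the module.

```shell
#!/bin/sh
# Hypothetical helper (not part of bcache-tools): succeeds when a kernel
# release string such as "4.4.0-72-generic" is version 3.10 or later.
kernel_has_mainline_bcache() {
    release="$1"
    major="${release%%.*}"          # text before the first dot
    minor="${release#*.}"           # strip "<major>."
    minor="${minor%%[!0-9]*}"       # keep only the leading digits
    [ "$major" -gt 3 ] || { [ "$major" -eq 3 ] && [ "${minor:-0}" -ge 10 ]; }
}

# Check the running kernel.
if kernel_has_mainline_bcache "$(uname -r)"; then
    echo "kernel $(uname -r): bcache is in mainline for this version"
else
    echo "kernel $(uname -r): predates mainline bcache"
fi
```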

bcache-tools source: https://evilpiepirate.org/git/bcache-tools.git

Installing on Ubuntu

System information

# lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 16.04.2 LTS
Release: 16.04
Codename: xenial
# uname -r
4.4.0-72-generic

Build and install

Load the system's bcache module:

# lsmod | grep bcache
# modprobe bcache
# lsmod | grep bcache

Build and install bcache-tools:

# apt-get install -y pkg-config libblkid-dev
# git clone https://evilpiepirate.org/git/bcache-tools.git
# cd bcache-tools
# make
cc -O2 -Wall -g `pkg-config --cflags uuid blkid` -c -o bcache.o bcache.c
bcache.c:125:9: warning: ‘crc_table’ is static but used in inline function ‘crc64’ which is not static
crc = crc_table[i] ^ (crc << 8);
^
cc -O2 -Wall -g `pkg-config --cflags uuid blkid` make-bcache.c bcache.o `pkg-config --libs uuid blkid` -o make-bcache
/tmp/ccW6rXtD.o: In function `write_sb':
/root/yangguanjun/bcache-tools/make-bcache.c:277: undefined reference to `crc64'
collect2: error: ld returned 1 exit status
<builtin>: recipe for target 'make-bcache' failed
make: *** [make-bcache] Error 1

A web search turns up a fix for this bug:

https://www.spinics.net/lists/linux-bcache/msg02847.html

--- a/bcache.c
+++ b/bcache.c

@@ -115,7 +115,7 @@ static const uint64_t crc_table[256] = {
0x9AFCE626CE85B507ULL
};

-inline uint64_t crc64(const void *_data, size_t len)
+uint64_t crc64(const void *_data, size_t len)
{
uint64_t crc = 0xFFFFFFFFFFFFFFFFULL;
const unsigned char *data = _data;

After applying the patch above to bcache.c, the build and install succeed.

# make
cc -O2 -Wall -g `pkg-config --cflags uuid blkid` -c -o bcache.o bcache.c
cc -O2 -Wall -g `pkg-config --cflags uuid blkid` make-bcache.c bcache.o `pkg-config --libs uuid blkid` -o make-bcache
cc -O2 -Wall -g `pkg-config --cflags uuid blkid` probe-bcache.c `pkg-config --libs uuid blkid` -o probe-bcache
cc -O2 -Wall -g -std=gnu99 bcache-super-show.c bcache.o `pkg-config --libs uuid` -o bcache-super-show
cc -O2 -Wall -g -c -o bcache-register.o bcache-register.c
cc bcache-register.o -o bcache-register
# make install
install -m0755 make-bcache bcache-super-show /usr/sbin/
install -m0755 probe-bcache bcache-register /lib/udev/
install -m0644 69-bcache.rules /lib/udev/rules.d/
install -m0644 -- *.8 /usr/share/man/man8/
install -D -m0755 initramfs/hook /usr/share/initramfs-tools/hooks/bcache
install -D -m0755 initcpio/install /usr/lib/initcpio/install/bcache
install -D -m0755 dracut/module-setup.sh /lib/dracut/modules.d/90bcache/module-setup.sh

Installing on CentOS

Because bcache only entered mainline in kernel 3.10, the CentOS kernel must be version 3.10 or later. That rules out CentOS 6; use CentOS 7 instead.

System information

# lsb_release -a
LSB Version: :core-4.1-amd64:core-4.1-noarch
Distributor ID: CentOS
Description: CentOS Linux release 7.3.1611 (Core)
Release: 7.3.1611
Codename: Core
# uname -r
3.10.0-693.17.1.el7.x86_64

Build and install

Load the system's bcache module:

# lsmod | grep bcache
# modprobe bcache
modprobe: FATAL: Module bcache not found.

It turns out that kernel 3.10.0-693 does not build the bcache module by default; see:

https://lakelight.net/2017/12/20/bcache-centos-7.html

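In that case you need to download the source for the running kernel version, build the bcache kernel module, and then load it.

Since I had already explored kernel upgrades while installing flashcache on CentOS, I tried this on a machine whose kernel had already been upgraded and found that the 4.4 kernel ships with the bcache module already built. So when using bcache on CentOS, upgrading to a 4.4 kernel is recommended.
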
# uname -r
4.4.115-1.el7.elrepo.x86_64

# lsmod | grep bcache
# modprobe bcache
# lsmod | grep bcache
bcache 233472 0
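The before-and-after `lsmod` check can be scripted. The sketch below assumes only the `lsmod` line format shown in the transcript; the function name `bcache_in_lsmod` is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: reads `lsmod` output on stdin and succeeds when a
# module named exactly "bcache" is listed.
bcache_in_lsmod() {
    awk '$1 == "bcache" { found = 1 } END { exit !found }'
}

# Sample line copied from the transcript above.
if printf 'bcache 233472 0\n' | bcache_in_lsmod; then
    echo "bcache module is loaded"
fi
```

On a live system this would typically be used as `lsmod | bcache_in_lsmod || modprobe bcache`.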

Build and install bcache-tools:

# yum install -y git pkgconfig libblkid-devel
# git clone https://evilpiepirate.org/git/bcache-tools.git
# cd bcache-tools/
# make
cc -O2 -Wall -g `pkg-config --cflags uuid blkid` -c -o bcache.o bcache.c
cc -O2 -Wall -g `pkg-config --cflags uuid blkid` make-bcache.c bcache.o `pkg-config --libs uuid blkid` -o make-bcache
cc -O2 -Wall -g `pkg-config --cflags uuid blkid` probe-bcache.c `pkg-config --libs uuid blkid` -o probe-bcache
cc -O2 -Wall -g -std=gnu99 bcache-super-show.c bcache.o `pkg-config --libs uuid` -o bcache-super-show
cc -O2 -Wall -g -c -o bcache-register.o bcache-register.c
cc bcache-register.o -o bcache-register
# make install
install -m0755 make-bcache bcache-super-show /usr/sbin/
install -m0755 probe-bcache bcache-register /lib/udev/
install -m0644 69-bcache.rules /lib/udev/rules.d/
install -m0644 -- *.8 /usr/share/man/man8/
install -D -m0755 initramfs/hook /usr/share/initramfs-tools/hooks/bcache
install -D -m0755 initcpio/install /usr/lib/initcpio/install/bcache
install -D -m0755 dracut/module-setup.sh /lib/dracut/modules.d/90bcache/module-setup.sh

Using bcache

The following shows how to use bcache on the CentOS machine.

Disk information

# fdisk -l | grep dev
...
Disk /dev/vdb: 107.4 GB, 107374182400 bytes, 209715200 sectors
Disk /dev/vdc: 107.4 GB, 107374182400 bytes, 209715200 sectors
Disk /dev/vdd: 53.7 GB, 53687091200 bytes, 104857600 sectors

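Three disks are used here: vdb, vdc, and vdd.

vdb and vdc are capacity-oriented disks and vdd is a performance-oriented disk; the experiment uses vdd to accelerate vdb and vdc through bcache.

Steps

The bcache-related commands are make-bcache and bcache-super-show.
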
# make-bcache
Please supply a device
Usage: make-bcache [options] device
-C, --cache Format a cache device
-B, --bdev Format a backing device
-b, --bucket bucket size
-w, --block block size (hard sector size of SSD, often 2k)
-o, --data-offset data offset in sectors
--cset-uuid UUID for the cache set
--writeback enable writeback
--discard enable discards
--cache_replacement_policy=(lru|fifo)
-h, --help display this help and exit

# bcache-super-show
Usage: bcache-super-show [-f] <device>

Create a backing device

# make-bcache -B /dev/vdb
UUID: c602abab-bf5a-4b51-b7f6-1492d34239f4
Set UUID: 423e1910-f61a-45fa-8cdf-a23aca3b5eb8
version: 1
block_size: 1
data_offset: 16

# bcache-super-show /dev/vdb
sb.magic ok
sb.first_sector 8 [match]
sb.csum 1376BA45B5F924B [match]
sb.version 1 [backing device]

dev.label (empty)
dev.uuid c602abab-bf5a-4b51-b7f6-1492d34239f4
dev.sectors_per_block 1
dev.sectors_per_bucket 1024
dev.data.first_sector 16
dev.data.cache_mode 0 [writethrough]
dev.data.cache_state 1 [clean]

cset.uuid 4b60c663-7720-4dea-a17a-e9316078e796

Create a cache device

# make-bcache -C /dev/vdd
UUID: 050998ce-403c-45d7-a89b-492379644c1b
Set UUID: 4b60c663-7720-4dea-a17a-e9316078e796
version: 0
nbuckets: 102400
block_size: 1
bucket_size: 1024
nr_in_set: 1
nr_this_dev: 0
first_bucket: 1

# bcache-super-show /dev/vdd
sb.magic ok
sb.first_sector 8 [match]
sb.csum 68CDDDC337A2E296 [match]
sb.version 3 [cache device]

dev.label (empty)
dev.uuid 050998ce-403c-45d7-a89b-492379644c1b
dev.sectors_per_block 1
dev.sectors_per_bucket 1024
dev.cache.first_sector 1024
dev.cache.cache_sectors 104856576
dev.cache.total_sectors 104857600
dev.cache.ordered yes
dev.cache.discard no
dev.cache.pos 0
dev.cache.replacement 0 [lru]

cset.uuid 4b60c663-7720-4dea-a17a-e9316078e796
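The numbers in the two outputs above appear to be related by simple arithmetic. The sketch below recomputes them from the values shown, assuming 512-byte sectors (which matches the byte and sector counts fdisk printed for /dev/vdd). It is a consistency check on this one device, not a description of the on-disk format.

```shell
#!/bin/sh
# Values copied from the make-bcache and bcache-super-show output above.
total_sectors=104857600      # dev.cache.total_sectors
sectors_per_bucket=1024      # dev.sectors_per_bucket / bucket_size
first_bucket=1               # first_bucket
sector_bytes=512             # assumption: 512-byte sectors, as in the fdisk output

nbuckets=$(( total_sectors / sectors_per_bucket ))         # number of buckets
first_sector=$(( first_bucket * sectors_per_bucket ))      # dev.cache.first_sector
cache_sectors=$(( total_sectors - first_sector ))          # dev.cache.cache_sectors
device_bytes=$(( total_sectors * sector_bytes ))           # size reported by fdisk
bucket_kib=$(( sectors_per_bucket * sector_bytes / 1024 )) # bucket size in KiB

echo "nbuckets=$nbuckets first_sector=$first_sector cache_sectors=$cache_sectors"
echo "device_bytes=$device_bytes bucket_kib=$bucket_kib"
```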

Attach the backing device to the cache device

# echo "4b60c663-7720-4dea-a17a-e9316078e796" > /sys/block/bcache0/bcache/attach
# lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
...
vdb 253:16 0 100G 0 disk
└─bcache0 251:0 0 100G 0 disk
vdc 253:32 0 100G 0 disk
vdd 253:48 0 50G 0 disk
└─bcache0 251:0 0 100G 0 disk
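The UUID written to the attach file is the cache set UUID (the `cset.uuid` field of the cache device). Rather than copying it by hand, it can be pulled out of `bcache-super-show` output. This is a minimal sketch; the function name `cset_uuid_from_show` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: reads `bcache-super-show` output on stdin and prints
# the value of the cset.uuid field.
cset_uuid_from_show() {
    awk '$1 == "cset.uuid" { print $2 }'
}

# Sample lines copied from the cache-device output shown earlier.
sample='dev.cache.replacement 0 [lru]

cset.uuid 4b60c663-7720-4dea-a17a-e9316078e796'

printf '%s\n' "$sample" | cset_uuid_from_show
```

On a live system the same function could feed the attach file directly, for example `bcache-super-show /dev/vdd | cset_uuid_from_show > /sys/block/bcache0/bcache/attach`; that pipeline is untested here.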

Inspect bcache information

1. state

# cat /sys/block/bcache0/bcache/state
clean

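Possible values of state:

  • no cache: the backing device is not attached to any caching device
  • clean: everything is fine; the cache is clean
  • dirty: everything is fine; writeback is enabled and the cache is dirty
  • inconsistent: something went wrong; the backing device is out of sync with the caching device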
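For monitoring scripts, the four states listed above can be mapped to a short message and an exit status. The sketch below is illustrative only; `describe_bcache_state` is a made-up name and the wording of the messages is mine.

```shell
#!/bin/sh
# Hypothetical helper: maps a value read from .../bcache/state to a short
# explanation; returns non-zero for anything outside the four known states.
describe_bcache_state() {
    case "$1" in
        "no cache")    echo "backing device is not attached to a caching device" ;;
        clean)         echo "healthy; no dirty data in the cache" ;;
        dirty)         echo "healthy; writeback enabled and the cache holds dirty data" ;;
        inconsistent)  echo "problem; backing device and caching device are out of sync" ;;
        *)             echo "unrecognized state: $1"; return 1 ;;
    esac
}

# Example with the value shown in the transcript above.
describe_bcache_state "clean"
```

A live call would look like `describe_bcache_state "$(cat /sys/block/bcache0/bcache/state)"`.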

2. Amount of dirty data in the cache

# cat /sys/block/bcache0/bcache/dirty_data
0.0k

3. Cache mode

# cat /sys/block/bcache0/bcache/cache_mode
[writethrough] writeback writearound none
# echo writearound > /sys/block/bcache0/bcache/cache_mode
# cat /sys/block/bcache0/bcache/cache_mode
writethrough writeback [writearound] none
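As the transcript shows, the cache_mode file lists every mode and marks the active one with square brackets. A script that needs just the active mode can extract it as sketched below; the function name `active_cache_mode` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: reads the contents of .../bcache/cache_mode on stdin
# and prints only the mode enclosed in square brackets.
active_cache_mode() {
    sed -n 's/.*\[\([a-z]*\)].*/\1/p'
}

# The two lines below are copied from the transcript above.
echo "[writethrough] writeback writearound none" | active_cache_mode
echo "writethrough writeback [writearound] none" | active_cache_mode
```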

4. Writeback information

# cat /sys/block/bcache0/bcache/writeback_
writeback_delay writeback_percent writeback_rate_debug writeback_rate_p_term_inverse writeback_running
writeback_metadata writeback_rate writeback_rate_d_term writeback_rate_update_seconds

Detach the cache device from the backing device

# echo "697b764f-b3ef-4675-8761-d9518a12089c" > /sys/block/bcache0/bcache/detach
# lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
...
vdb 253:16 0 100G 0 disk
└─bcache0 251:0 0 100G 0 disk
vdc 253:32 0 100G 0 disk
vdd 253:48 0 50G 0 disk

# cat /sys/block/vdb/bcache/state
no cache

After detaching, the device can still be used, just without acceleration from the cache device.

Add a new backing device

# make-bcache -B /dev/vdc
UUID: cc790e62-b3eb-4237-8265-dd1b619e15c0
Set UUID: edb2b1d0-9eeb-4a8a-b811-3dafd676fac0
version: 1
block_size: 1
data_offset: 16

# echo "4b60c663-7720-4dea-a17a-e9316078e796" > /sys/block/bcache1/bcache/attach
# lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
...
vdb 253:16 0 100G 0 disk
└─bcache0 251:0 0 100G 0 disk
vdc 253:32 0 100G 0 disk
└─bcache1 251:1 0 100G 0 disk
vdd 253:48 0 50G 0 disk
└─bcache1 251:1 0 100G 0 disk

Use the bcache device

# mkfs.ext4 /dev/bcache1
# mount /dev/bcache1 /mnt/
# df -h
Filesystem Size Used Avail Use% Mounted on
...
/dev/bcache1 99G 61M 94G 1% /mnt

# umount /mnt/

Unregister the bcache devices

# echo 1 > /sys/block/vdc/bcache/stop
# echo 1 > /sys/block/vdb/bcache/stop
# echo 1 > /sys/fs/bcache/10057a1c-15a2-4631-a6d2-f4652d37645d/unregister
# lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
...
vdb 253:16 0 100G 0 disk
vdc 253:32 0 100G 0 disk
vdd 253:48 0 50G 0 disk

The number echoed does not matter; any value works ;)

Create bcache devices in one step

# make-bcache -B /dev/vdb /dev/vdc -C /dev/vdd
UUID: 09f971eb-6063-4f94-bdac-d7d7117c0e0f
Set UUID: 697b764f-b3ef-4675-8761-d9518a12089c
version: 0
nbuckets: 102400
block_size: 1
bucket_size: 1024
nr_in_set: 1
nr_this_dev: 0
first_bucket: 1
UUID: b45301fa-8932-4194-9518-ab681f37d9c9
Set UUID: 697b764f-b3ef-4675-8761-d9518a12089c
version: 1
block_size: 1
data_offset: 16
UUID: 2c0452f7-76de-4319-bd3c-2a73b4fa4b68
Set UUID: 697b764f-b3ef-4675-8761-d9518a12089c
version: 1
block_size: 1
data_offset: 16
# lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
...
vdb 253:16 0 100G 0 disk
└─bcache0 251:0 0 100G 0 disk
vdc 253:32 0 100G 0 disk
└─bcache1 251:1 0 100G 0 disk
vdd 253:48 0 50G 0 disk
├─bcache0 251:0 0 100G 0 disk
└─bcache1 251:1 0 100G 0 disk

Problems encountered

make-bcache prints a warning

Running make-bcache again on a device that was previously used for bcache prints a warning:

# make-bcache -B /dev/vdb
Already a bcache device on /dev/vdb, overwrite with --wipe-bcache

# lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
...
vdb 253:16 0 100G 0 disk
└─bcache0 251:0 0 100G 0 disk

Despite the message, a bcache device is already present on the disk.

This can be resolved by overwriting the beginning of each device with dd:
# dd if=/dev/zero of=/dev/vdb bs=1M count=100 oflag=direct
# dd if=/dev/zero of=/dev/vdd bs=1M count=100 oflag=direct

Creating the bcache devices again then no longer reports an error:
# make-bcache -B /dev/vdb
UUID: c602abab-bf5a-4b51-b7f6-1492d34239f4
Set UUID: 423e1910-f61a-45fa-8cdf-a23aca3b5eb8
version: 1
block_size: 1
data_offset: 16

# make-bcache -C /dev/vdd
UUID: 050998ce-403c-45d7-a89b-492379644c1b
Set UUID: 4b60c663-7720-4dea-a17a-e9316078e796
version: 0
nbuckets: 102400
block_size: 1
bucket_size: 1024
nr_in_set: 1
nr_this_dev: 0
first_bucket: 1

Unregistering a device without unmounting it first

If a bcache device is unregistered while still mounted, it remains usable; it disappears once it is unmounted.

# lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
...
vdb 253:16 0 100G 0 disk
└─bcache0 251:0 0 100G 0 disk /mnt
vdc 253:32 0 100G 0 disk
vdd 253:48 0 50G 0 disk
└─bcache0 251:0 0 100G 0 disk /mnt

# mount | grep bcache
/dev/bcache0 on /mnt type ext4 (rw,relatime,seclabel,data=ordered)

# echo 1 > /sys/fs/bcache/4b60c663-7720-4dea-a17a-e9316078e796/unregister
# echo 0 > /sys/block/vdb/bcache/stop

# lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
...
vdb 253:16 0 100G 0 disk
└─bcache0 251:0 0 100G 0 disk /mnt
vdc 253:32 0 100G 0 disk
vdd 253:48 0 50G 0 disk

# ls /sys/block/vdb/bcache
ls: cannot access /sys/block/vdb/bcache: No such file or directory

# cd /mnt/
# touch tstfile
# umount /mnt/

# lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
...
vdb 253:16 0 100G 0 disk
vdc 253:32 0 100G 0 disk
vdd 253:48 0 50G 0 disk
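One way to avoid the situation above is to check for an active mount before stopping anything. The sketch below is a dry run under stated assumptions: `is_mounted` and `safe_stop` are hypothetical helpers, the mounts table is read in the `/proc/mounts` format, and the sysfs write is only printed, not performed.

```shell
#!/bin/sh
# Hypothetical helper: succeeds when DEVICE appears as a mount source in a
# mounts table (defaults to /proc/mounts; a file path can be passed instead).
is_mounted() {
    awk -v dev="$1" '$1 == dev { found = 1 } END { exit !found }' "${2:-/proc/mounts}"
}

# Hypothetical wrapper: refuses to stop a backing device while its bcache
# device is still mounted; otherwise prints the sysfs write it would perform.
safe_stop() {
    bcache_dev="$1"
    backing="$2"
    mounts_table="${3:-/proc/mounts}"
    if is_mounted "$bcache_dev" "$mounts_table"; then
        echo "refusing: $bcache_dev is still mounted; umount it first" >&2
        return 1
    fi
    echo "would run: echo 1 > /sys/block/$backing/bcache/stop"
}

# Demo against a temporary table that mimics the mounted state shown above.
demo_table="$(mktemp)"
printf '/dev/bcache0 /mnt ext4 rw,relatime 0 0\n' > "$demo_table"
safe_stop /dev/bcache0 vdb "$demo_table" || echo "stop skipped"
rm -f "$demo_table"
```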