CephFS stripe configuration

Introduction

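CephFS supports configurable file layouts, which control how a file's contents are mapped onto Ceph RADOS objects. The layout is stored in the extended attributes (xattrs) of the file or directory.

  • File layout xattr: ceph.file.layout
  • Directory layout xattr: ceph.dir.layout

Files and subdirectories in a directory inherit the parent directory's layout by default.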

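The supported layout fields are:

  1. pool
    The RADOS pool in which the file's data is stored.
  2. namespace
    The namespace within the RADOS pool in which the file's data is stored (the xattr field name is pool_namespace). At the time of writing, rbd, rgw, and cephfs do not support it yet.
  3. stripe_unit
    The size of one stripe unit, in bytes.
  4. stripe_count
    The number of objects across which consecutive stripe units are written, i.e. the width of one stripe.
  5. object_size
    The maximum size, in bytes, of each RADOS object that the file's data is split into.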

For example, with stripe_unit=524288, stripe_count=2, and the default object size of 4 MB, writing 10 MB of data to a file distributes it as follows:

    --- object set 0 ---            --- object set 1 ---
/-----------\  /-----------\    /-----------\  /-----------\
|   obj 0   |  |   obj 1   |    |   obj 2   |  |   obj 3   |
|===========|  |===========|    |===========|  |===========|
|  stripe   |  |  stripe   |    |  stripe   |  |  stripe   |
|  unit 0   |  |  unit 1   |    |  unit 16  |  |  unit 17  |
|-----------|  |-----------|    |-----------|  |-----------|
|  stripe   |  |  stripe   |    |  stripe   |  |  stripe   |
|  unit 2   |  |  unit 3   |    |  unit 18  |  |  unit 19  |
|-----------|  |-----------|    \===========/  \===========/
|  stripe   |  |  stripe   |
|  unit 4   |  |  unit 5   |       osd 16         osd 20
|-----------|  |-----------|
|  stripe   |  |  stripe   |
|  unit 6   |  |  unit 7   |
|-----------|  |-----------|
|  stripe   |  |  stripe   |
|  unit 8   |  |  unit 9   |
|-----------|  |-----------|
|  stripe   |  |  stripe   |
|  unit 10  |  |  unit 11  |
|-----------|  |-----------|
|  stripe   |  |  stripe   |
|  unit 12  |  |  unit 13  |
|-----------|  |-----------|
|  stripe   |  |  stripe   |
|  unit 14  |  |  unit 15  |
\===========/  \===========/

   osd 25          osd 3
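The mapping drawn above can be reproduced with a little integer arithmetic. The following is a minimal sketch, not Ceph's own code: the function name stripe_locate and its output format are hypothetical, and it assumes object_size is a multiple of stripe_unit. It takes a byte offset in the file and reports the object set, the object number, and the offset inside that object.

```shell
#!/usr/bin/env bash
# stripe_locate OFFSET STRIPE_UNIT STRIPE_COUNT OBJECT_SIZE
# Hypothetical helper: maps a file offset to its object set, object number
# and offset inside the object, following the RAID-0 style layout above.
stripe_locate() {
    local off=$1 su=$2 sc=$3 os=$4
    local units_per_obj=$(( os / su ))       # stripe units that fit in one object
    local blockno=$(( off / su ))            # global stripe unit index
    local stripeno=$(( blockno / sc ))       # row: which stripe the unit belongs to
    local stripepos=$(( blockno % sc ))      # column: position inside the object set
    local objectsetno=$(( stripeno / units_per_obj ))
    local objectno=$(( objectsetno * sc + stripepos ))
    local obj_off=$(( (stripeno % units_per_obj) * su + off % su ))
    echo "set=$objectsetno obj=$objectno off=$obj_off"
}

# The 10 MB example: stripe_unit=512 KB, stripe_count=2, object_size=4 MB.
stripe_locate 0        524288 2 4194304   # prints: set=0 obj=0 off=0
stripe_locate 8388608  524288 2 4194304   # prints: set=1 obj=2 off=0
stripe_locate 10485759 524288 2 4194304   # prints: set=1 obj=3 off=1048575
```

The offset 8 MB is the first byte after object set 0 is full, so it lands at the start of obj 2, and the last byte of the 10 MB file ends up in obj 3.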

Configuring file striping

Log in as the admin user and set the layout attribute on a directory:

# mount -t ceph 10.10.2.1:6789:/ /mnt/tstfs2/
# mkdir /mnt/tstfs2/mike512K/
# setfattr -n ceph.dir.layout -v "stripe_unit=524288 stripe_count=8 object_size=4194304 pool=cephfs_data2" /mnt/tstfs2/mike512K/
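A malformed layout string is easy to produce by hand, so it can help to build it with a small helper that checks the values first. The sketch below is only an illustration: make_layout is a hypothetical function, and the rules it enforces (stripe_unit a positive multiple of 64 KiB, object_size a positive multiple of stripe_unit, stripe_count greater than zero) are assumptions about what Ceph accepts, so check them against the documentation for your release.

```shell
#!/usr/bin/env bash
# make_layout STRIPE_UNIT STRIPE_COUNT OBJECT_SIZE POOL
# Hypothetical helper: validates the values and prints a layout string
# in the same form as the setfattr command above.
make_layout() {
    local su=$1 sc=$2 os=$3 pool=$4
    local min_su=65536    # assumed minimum stripe-unit granularity (64 KiB)

    if (( su <= 0 || su % min_su != 0 )); then
        echo "stripe_unit must be a positive multiple of $min_su" >&2
        return 1
    fi
    if (( sc <= 0 )); then
        echo "stripe_count must be greater than zero" >&2
        return 1
    fi
    if (( os <= 0 || os % su != 0 )); then
        echo "object_size must be a positive multiple of stripe_unit" >&2
        return 1
    fi
    echo "stripe_unit=$su stripe_count=$sc object_size=$os pool=$pool"
}

# Example use against the directory from this article (needs a mounted CephFS):
# setfattr -n ceph.dir.layout -v "$(make_layout 524288 8 4194304 cephfs_data2)" /mnt/tstfs2/mike512K/
```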

Once the layout attribute is set on a directory, files and subdirectories created under it inherit that layout by default:

# touch /mnt/tstfs2/mike512K/tstfile
# getfattr -d -m ceph /mnt/tstfs2/mike512K
getfattr: Removing leading '/' from absolute path names
# file: mnt/tstfs2/mike512K
ceph.dir.entries="1"
ceph.dir.files="1"
ceph.dir.rbytes="4194304000"
ceph.dir.rctime="1495766140.09204154946"
ceph.dir.rentries="2"
ceph.dir.rfiles="1"
ceph.dir.rsubdirs="1"
ceph.dir.subdirs="0"
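The dump above shows directory statistics but no layout entry. One way to confirm that the new file picked up the directory's layout is to request the attribute by name and pull out a single field. The helper below is a hypothetical sketch: layout_field is not part of any Ceph tool, and it assumes the layout value is a space-separated list of key=value pairs, as in the setfattr command earlier.

```shell
#!/usr/bin/env bash
# layout_field FIELD LAYOUT_STRING
# Hypothetical helper: prints the value of one field from a layout string
# such as "stripe_unit=524288 stripe_count=8 object_size=4194304 pool=cephfs_data2".
layout_field() {
    local field=$1 layout=$2 kv
    for kv in $layout; do                    # split on whitespace into key=value pairs
        if [[ ${kv%%=*} == "$field" ]]; then
            echo "${kv#*=}"                  # everything after the first '='
            return 0
        fi
    done
    return 1                                 # field not present
}

# Example use on a mounted CephFS (paths taken from this article):
# layout=$(getfattr --only-values -n ceph.file.layout /mnt/tstfs2/mike512K/tstfile)
# layout_field stripe_count "$layout"
```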

Verifying file striping

Check where the file's data is located:

# dd if=/dev/zero of=/mnt/tstfs2/mike512K/tstfile bs=4M count=100
# cephfs /mnt/tstfs2/mike512K/tstfile show_location
WARNING: This tool is deprecated. Use the layout.* xattrs to query and modify layouts.
location.file_offset: 0 // offset within the file
location.object_offset:0 // offset within the object
location.object_no: 0 // object number
location.object_size: 4194304 // object size is 4 MB
location.object_name: 10000002356.00000000 // object name
location.block_offset: 0 // offset within the block
location.block_size: 524288 // block size is 512 KB
location.osd: 0 // stored on osd 0

# cephfs /mnt/tstfs2/mike512K/tstfile show_location -l 524288
WARNING: This tool is deprecated. Use the layout.* xattrs to query and modify layouts.
location.file_offset: 524288 // offset within the file
location.object_offset:0 // offset within the object
location.object_no: 1 // object number
location.object_size: 4194304 // object size is 4 MB
location.object_name: 10000002356.00000001 // object name
location.block_offset: 0 // offset within the block
location.block_size: 524288 // block size is 512 KB
location.osd: 24 // stored on osd 24
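Since the cephfs tool reports itself as deprecated, the same information can be pieced together by hand. In the output above, object names have the form of the file's inode number in hex, a dot, and the object number as eight hex digits. The sketch below rebuilds such a name. object_name is a hypothetical helper, and it assumes that the inode number reported by stat on the mounted file is the one CephFS uses in object names.

```shell
#!/usr/bin/env bash
# object_name INODE OBJECT_NO
# Hypothetical helper: formats a RADOS object name the way it appears in the
# show_location output, i.e. <inode in hex>.<object number as 8 hex digits>.
object_name() {
    printf '%x.%08x\n' "$1" "$2"
}

# The inode from the output above, given in decimal via bash arithmetic:
object_name $(( 0x10000002356 )) 1   # prints: 10000002356.00000001

# Example use on a live cluster (paths and pool taken from this article):
# ino=$(stat -c %i /mnt/tstfs2/mike512K/tstfile)
# ceph osd map cephfs_data2 "$(object_name "$ino" 1)"
```

The ceph osd map command prints the placement group and the OSDs the object maps to, which can be compared with the location.osd field above.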

Inspect the object on the OSD:

# cd /var/lib/ceph/osd/ceph-0/current/
# find . -name "*10000002356.0000000*"
./14.126_head/10000002356.00000000__head_8CE99726__e
# ll -h ./14.126_head/10000002356.00000000__head_8CE99726__e
-rw-r--r-- 1 ceph ceph 4.0M May 26 10:35 ./14.126_head/10000002356.00000000__head_8CE99726__e
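As a further sanity check, the number of RADOS objects a file should occupy can be estimated from its size and layout. This is a rough sketch under the same striping assumptions as the diagram earlier. object_count is a hypothetical helper, and it ignores sparse files, where objects for unwritten regions may not exist.

```shell
#!/usr/bin/env bash
# object_count FILE_SIZE STRIPE_UNIT STRIPE_COUNT OBJECT_SIZE
# Hypothetical helper: estimates how many objects a fully written file uses.
object_count() {
    local size=$1 su=$2 sc=$3 os=$4
    local set_bytes=$(( sc * os ))                  # capacity of one object set
    local full_sets=$(( size / set_bytes ))         # completely filled object sets
    local rest=$(( size % set_bytes ))              # bytes in the last, partial set
    local rest_units=$(( (rest + su - 1) / su ))    # stripe units in the partial set
    local rest_objs=$(( rest_units < sc ? rest_units : sc ))
    echo $(( full_sets * sc + rest_objs ))
}

# 400 MB written by dd (bs=4M count=100) with stripe_count=8:
object_count 419430400 524288 8 4194304   # prints: 104

# Possible comparison on a live cluster (pool and inode taken from this article):
# rados -p cephfs_data2 ls | grep -c '^10000002356\.'
```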

References

http://docs.ceph.com/docs/jewel/architecture/#data-striping
http://docs.ceph.com/docs/jewel/cephfs/file-layouts/
