
Add 'vardir' disk utilization collecting #296


Closed
wants to merge 1 commit from the avtikhon/gh-qa-97-disk-bound-collect branch

Conversation

avtikhon
Contributor

@avtikhon avtikhon commented Apr 30, 2021

Added a routine that collects disk utilization for the 'vardir' path. It
parses the file:

      /proc/diskstats

takes the number of milliseconds spent doing I/Os for the given device
and computes a 'busy' value in percent, like the iostat tool does for
its '%util' field. See the Linux kernel documentation [1] for more
information. We use Field 10 of the 'diskstats' file, which actually
sits at index 12 once a line is split into whitespace-separated columns
(the first three columns are the major number, minor number and device
name):

  Field  9 -- # of I/Os currently in progress
    The only field that should go to zero. Incremented as requests are
    given to appropriate struct request_queue and decremented as they
    finish.

  Field 10 -- # of milliseconds spent doing I/Os
    This field increases so long as field 9 is nonzero.

The 'vardir' disk utilization is collected for each test run into a
standalone file, and after testing completes the tests with the highest
utilization (up to 10 of them) are printed to stdout.

To compute the 'vardir' disk utilization for each test we need to know
the disk bound value and the time at the moment the test starts. On test
finish we take the finish time and the current disk bound value and
compute the utilization from these deltas. All of these reads and
computations are implemented as the new routine
'get_disk_bound_stat_busy()' in the 'lib/utils.py' file.
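
For illustration, a minimal sketch of such a computation (not the exact
code from the patch; the helper name 'read_disk_busy_ms', the 'sda'
device name and the column handling are assumptions based on the
/proc/diskstats format described above):

import time


def read_disk_busy_ms(dev):
    # Field 10 of /proc/diskstats (milliseconds spent doing I/Os) sits
    # at index 12 after splitting a line: major number, minor number,
    # device name, then fields 1..10 and beyond.
    with open('/proc/diskstats', 'r') as f:
        for line in f:
            columns = line.split()
            if len(columns) > 12 and columns[2] == dev:
                return int(columns[12])
    return 0


# Sample the counter around a test run and derive a '%util'-like value.
busy_start = read_disk_busy_ms('sda')
wall_start = time.time()
# ... run the test ...
busy_delta = read_disk_busy_ms('sda') - busy_start
wall_delta_ms = (time.time() - wall_start) * 1000
busy_percent = 100.0 * busy_delta / wall_delta_ms if wall_delta_ms else 0.0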

To collect disk bounds and durations on test start/finish events, the
'StatisticsWatcher' listener is used; the following of its routines are
involved:

  process_result()
    Uses the appearance of a 'WorkerCurrentTask' queue message to save
    the disk bound values at the start of a test.

    Uses the appearance of a 'WorkerTaskResult' queue message to save
    the final disk bound values for a finished test.

  print_statistics() - prints statistics to stdout after testing.
    Prints disk bounds for failed tests and for up to 10 tests with the
    highest disk usage. A standalone 'statistics' directory under the
    'vardir' path stores the 'durations.log' file, which records the
    disk bound for each tested task in the following format (see the
    sketch after this list):

      <test task name> <disk bound>

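A rough sketch of how such a listener could hook the two message types
(a simplified illustration, not the actual test-run API: it reuses the
hypothetical 'read_disk_busy_ms()' helper from the sketch above, matches
the queue message classes by name, and assumes they expose a 'task_id'
attribute):

class DiskBoundWatcher(object):
    # Sketch of a StatisticsWatcher-like listener; WorkerCurrentTask and
    # WorkerTaskResult are the test-run queue message classes, matched
    # here by class name to keep the sketch self-contained.

    def __init__(self, dev):
        self.dev = dev
        self.started = {}   # task id -> disk bound value at test start
        self.results = {}   # task id -> disk bound delta over the test

    def process_result(self, obj):
        kind = obj.__class__.__name__
        if kind == 'WorkerCurrentTask':
            self.started[obj.task_id] = read_disk_busy_ms(self.dev)
        elif kind == 'WorkerTaskResult':
            begin = self.started.pop(obj.task_id, None)
            if begin is not None:
                self.results[obj.task_id] = \
                    read_disk_busy_ms(self.dev) - begin

    def print_statistics(self):
        # Report up to 10 tasks with the highest disk bound values, in
        # the '<test task name> <disk bound>' format.
        top = sorted(self.results.items(), key=lambda item: item[1],
                     reverse=True)[:10]
        for task_id, bound in top:
            print('{} {}'.format(task_id, bound))
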
Needed for tarantool/tarantool-qa#97

[1]: https://www.kernel.org/doc/Documentation/iostats.txt

@avtikhon avtikhon requested review from Totktonada and kyukhin April 30, 2021 12:47
@avtikhon avtikhon force-pushed the avtikhon/gh-qa-97-disk-bound-collect branch 3 times, most recently from d3ff560 to 979fa99 on April 30, 2021 14:22
@avtikhon avtikhon self-assigned this Apr 30, 2021
@avtikhon avtikhon added the teamQ label Apr 30, 2021
@avtikhon avtikhon force-pushed the avtikhon/gh-qa-97-disk-bound-collect branch from 979fa99 to 59bed08 on May 24, 2021 06:12
@avtikhon avtikhon changed the title from "Add vardir disk bound status collecting routine" to "Add 'vardir' disk utilization collecting" on May 24, 2021
@avtikhon avtikhon force-pushed the avtikhon/gh-qa-97-disk-bound-collect branch 7 times, most recently from f038227 to 24ec878 on May 31, 2021 10:34
@avtikhon avtikhon linked an issue May 31, 2021 that may be closed by this pull request
@avtikhon avtikhon force-pushed the avtikhon/gh-qa-97-disk-bound-collect branch 3 times, most recently from d72bbb5 to 97e2a2d on June 2, 2021 10:39
Part of tarantool/tarantool-qa#97
Closes #308
@avtikhon avtikhon force-pushed the avtikhon/gh-qa-97-disk-bound-collect branch from 97e2a2d to 4edad84 on June 2, 2021 13:43
@Totktonada
Member

We discussed it with @avtikhon. I hope Alexander will summarize our near and far goals in tarantool/tarantool-qa#97.

I'll just note here that, as far as I understand, we should measure disk saturation (whether the disk is able to serve more requests) rather than utilization (the fraction of time spent on I/O). It should be a better indicator of whether the disk I/O is close to aggressive queueing.

Relevant materials:

http://www.brendangregg.com/usemethod.html
http://www.brendangregg.com/USEmethod/use-linux.html
https://www.percona.com/blog/2017/08/28/looking-disk-utilization-and-saturation/
https://www.kernel.org/doc/html/latest/admin-guide/iostats.html
https://www.kernel.org/doc/html/latest/filesystems/proc.html#proc-pid-io-display-the-io-accounting-fields (does not count mmaps?)

Aside from this, I have doubts about splitting system-wide statistics per test. I guess it will be too vague when many tests are running in parallel. I propose to start with some binary indicator: write something to the terminal when the system reaches disk saturation.

@kyukhin asked a good question: will a 1 GiB write() be counted as one request or as many block-sized ones? I don't know, let's look deeper.

@Totktonada
Member

Quickly written notes on the iostat code (just to keep them somewhere):

  • aqu-sz (old name: avgqu-sz) is the difference in field 11, divided by the wall time. See the code. (It seems the / 1000 and * 100 in S_VALUE are about time unit conversion and are not relevant for us.)
  • The await code is here. It is the difference in fields 4 + 8 + 15 (time spent on read + write + discard requests; flush requests are skipped, see the comment in the iostat code), divided by the difference in fields 1 + 5 + 12 (# of read + write + discard requests). It looks like an average time per request.

I don't know which metric is better to choose, queue size or await time, nor how to pick a threshold. We need to experiment with those metrics and work out an indicator that works well for us. A sketch of both computations follows.
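
A minimal sketch of both computations from two /proc/diskstats snapshots
(the field-to-column mapping follows the kernel documentation; the 'sda'
device, the one-second interval and the threshold are illustrative
assumptions, and a kernel new enough to report the discard fields 12..15
is assumed):

import time

# Field numbers follow the kernel iostats documentation; the column
# index is field + 2 because each line starts with the major number,
# minor number and device name.
FIELDS = {'reads': 1, 'read_ms': 4, 'writes': 5, 'write_ms': 8,
          'weighted_ms': 11, 'discards': 12, 'discard_ms': 15}


def snapshot(dev):
    # Return the raw counters of the given device as a dict, or None if
    # the device is not listed (or the kernel lacks the discard fields).
    with open('/proc/diskstats', 'r') as f:
        for line in f:
            cols = line.split()
            if len(cols) > 17 and cols[2] == dev:
                return {name: int(cols[num + 2])
                        for name, num in FIELDS.items()}
    return None


before = snapshot('sda')
wall_start = time.time()
time.sleep(1.0)                      # sampling interval (illustrative)
after = snapshot('sda')
wall_ms = (time.time() - wall_start) * 1000

delta = {key: after[key] - before[key] for key in before}

# aqu-sz: difference in field 11 (weighted time spent doing I/Os),
# divided by the wall time.
aqu_sz = delta['weighted_ms'] / wall_ms

# await: time spent on read + write + discard requests, divided by the
# number of completed read + write + discard requests.
ios = delta['reads'] + delta['writes'] + delta['discards']
io_ms = delta['read_ms'] + delta['write_ms'] + delta['discard_ms']
await_ms = io_ms / float(ios) if ios else 0.0

# A possible binary saturation indicator (the threshold is arbitrary).
if aqu_sz > 1.0:
    print('disk queue is building up: aqu-sz = {:.2f}'.format(aqu_sz))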

Comment on lines +263 to +276
def get_vardir_device(vardir):
    # Find the mount point that contains 'vardir', then look up its
    # backing device in /proc/mounts.
    dev = 'unknown'
    path = os.path.abspath(vardir)
    while not os.path.ismount(path):
        path = os.path.dirname(path)
    try:
        with open('/proc/mounts', 'r') as f:
            for line in f:
                if line.split()[1] == path:
                    mount = line.split()[0]
                    # Strip the '/dev/' prefix to get the bare device name.
                    dev = mount.split('/')[2]
    except (OSError, IOError, IndexError):
        pass
    return dev

Member

I didn't check, but will it work as expected for me?

$ cat /proc/mounts
/dev/root / ext4 rw,noatime 0 0
devtmpfs /dev devtmpfs rw,nosuid,relatime,size=10240k,nr_inodes=2033827,mode=755 0 0
proc /proc proc rw,nosuid,nodev,noexec,relatime 0 0
tmpfs /run tmpfs rw,nodev,relatime,size=1627436k,mode=755 0 0
sysfs /sys sysfs rw,nosuid,nodev,noexec,relatime 0 0
debugfs /sys/kernel/debug debugfs rw,nosuid,nodev,noexec,relatime 0 0
fusectl /sys/fs/fuse/connections fusectl rw,nosuid,nodev,noexec,relatime 0 0
selinuxfs /sys/fs/selinux selinuxfs rw,relatime 0 0
efivarfs /sys/firmware/efi/efivars efivarfs rw,nosuid,nodev,noexec,relatime 0 0
cgroup_root /sys/fs/cgroup tmpfs rw,nosuid,nodev,noexec,relatime,size=10240k,mode=755 0 0
openrc /sys/fs/cgroup/openrc cgroup rw,nosuid,nodev,noexec,relatime,release_agent=/lib/rc/sh/cgroup-release-agent.sh,name=openrc 0 0
none /sys/fs/cgroup/unified cgroup2 rw,nosuid,nodev,noexec,relatime,nsdelegate 0 0
cpuset /sys/fs/cgroup/cpuset cgroup rw,nosuid,nodev,noexec,relatime,cpuset 0 0
cpu /sys/fs/cgroup/cpu cgroup rw,nosuid,nodev,noexec,relatime,cpu 0 0
cpuacct /sys/fs/cgroup/cpuacct cgroup rw,nosuid,nodev,noexec,relatime,cpuacct 0 0
blkio /sys/fs/cgroup/blkio cgroup rw,nosuid,nodev,noexec,relatime,blkio 0 0
memory /sys/fs/cgroup/memory cgroup rw,nosuid,nodev,noexec,relatime,memory 0 0
devices /sys/fs/cgroup/devices cgroup rw,nosuid,nodev,noexec,relatime,devices 0 0
freezer /sys/fs/cgroup/freezer cgroup rw,nosuid,nodev,noexec,relatime,freezer 0 0
net_cls /sys/fs/cgroup/net_cls cgroup rw,nosuid,nodev,noexec,relatime,net_cls 0 0
perf_event /sys/fs/cgroup/perf_event cgroup rw,nosuid,nodev,noexec,relatime,perf_event 0 0
net_prio /sys/fs/cgroup/net_prio cgroup rw,nosuid,nodev,noexec,relatime,net_prio 0 0
hugetlb /sys/fs/cgroup/hugetlb cgroup rw,nosuid,nodev,noexec,relatime,hugetlb 0 0
pids /sys/fs/cgroup/pids cgroup rw,nosuid,nodev,noexec,relatime,pids 0 0
mqueue /dev/mqueue mqueue rw,nosuid,nodev,noexec,relatime 0 0
devpts /dev/pts devpts rw,nosuid,noexec,relatime,gid=5,mode=620,ptmxmode=000 0 0
tmpfs /dev/shm tmpfs rw,nosuid,nodev,noexec,noatime,size=1048576k 0 0
tmpfs /var/tmp/portage tmpfs rw,nodev,noatime,size=4194304k 0 0
binfmt_misc /proc/sys/fs/binfmt_misc binfmt_misc rw,nosuid,nodev,noexec,relatime 0 0
/dev/root /var/lib/docker ext4 rw,noatime 0 0
nsfs /run/docker/netns/default nsfs rw 0 0

$ cat /proc/diskstats 
   7       0 loop0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
   7       1 loop1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
   7       2 loop2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
   7       3 loop3 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
   7       4 loop4 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
   7       5 loop5 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
   7       6 loop6 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
   7       7 loop7 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
   8       0 sda 3914740 2091334 209636333 3300777 20400957 25673178 8530616480 245749826 0 12550525 241975264 1065702 0 590682616 2552745
   8       1 sda1 90 0 9728 53 0 0 0 0 0 44 25 0 0 0 0
   8       2 sda2 28 0 2080 22 0 0 0 0 0 15 11 0 0 0 0
   8       3 sda3 3914590 2091334 209623349 3300680 20339615 25673178 8530616480 245695226 0 12541502 241943738 1065702 0 590682616 2552745
   9       0 md0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
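
Tracing the lookup by hand against this mount table (an illustration of
the concern, not part of the patch): for a vardir whose nearest mount
point is '/', the matching /proc/mounts line would be
'/dev/root / ext4 ...', so:

line = '/dev/root / ext4 rw,noatime 0 0'
mount = line.split()[0]      # '/dev/root'
dev = mount.split('/')[2]    # 'root'

and 'root' does not appear as a device name in the /proc/diskstats
output, where the physical disk is 'sda'.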

@avtikhon
Contributor Author

We discussed it with @avtikhon. I hope Alexander will summarize our near and far goals in tarantool/tarantool-qa#97.

Right, we discussed it and found that the current statistic is really good for HDD-based hosts. For SSDs, however, it doesn't show anything meaningful except that some of the SSD's parallel read/write channels were in use, but not all of them: an SSD may have 16 or 64 such channels, and using only a few of them neither saturates nor overloads the drive.

@avtikhon avtikhon closed this Jun 18, 2021
Development

Successfully merging this pull request may close these issues.

Collect statistics for 'vardisk' disk usage