200.1. Measure and Troubleshoot Resource Usage
Weight: 6
Description: Candidates should be able to measure hardware resource and network bandwidth, identify and troubleshoot resource problems.
Key Knowledge Areas:
Measure CPU usage
Measure memory usage
Measure disk I/O
Measure network I/O
Measure firewalling and routing throughput
Map client bandwidth usage
Match / correlate system symptoms with likely problems
Estimate throughput and identify bottlenecks in a system including networking
The following is a partial list of the used files, terms and utilities:
iostat
netstat
w
top
sar
processes blocked on I/O
blocks out
vmstat
pstree, ps
lsof
uptime
swap
blocks in
Computers hang, people nag, and applications lag. Handling all of these problems rests on the administrator's shoulders. It doesn't matter whether we are talking about a single computer or a machine somewhere in the cloud: we need tools to explore the behavior of a system in order to find a solution. Performance problems have two main roots. Sometimes a lack of system resources such as CPU, memory, disk or network causes them; sometimes we humans make mistakes. When what we expect, what the workload needs and how we have configured the system do not match, computers hang.
Measure memory usage
RAM is like the gateway of a town: everything goes through it. A process must be loaded into RAM before the CPU can run it, and the CPU fills its caches from data in RAM. What about user activity? When a user asks to read some data from the hard disk, it is loaded into RAM first and then the user can work with it.

As it seems, RAM is pretty busy and always in demand. Linux uses several techniques to cope with this problem. To keep things simple, let's first explain some memory terminology:
| Item | Description |
| --- | --- |
| Page | Fixed-size block of memory, usually 4K |
| Paging | Moving pages between secondary storage (disk) and primary storage (RAM) |
| Swap | Emulated memory on the HDD |
| Virtual Memory | Total allocatable memory (known as the process address space). Linux supports terabytes of virtual memory and maps it to physical memory through page tables, with the TLB caching the translations |
| Translation Lookaside Buffer (TLB) | A cache inside the CPU that speeds up translation between virtual and physical memory addresses |
| Page cache | Recently used memory pages are kept here (not the CPU cache); used for parked data |
| Dirty cache | Data waiting to be written from RAM to the HDD (works like a buffer) |
| Buffers | Used for some block devices and to cache file system metadata; not really important |
Let's explore what is going on inside with some tools and commands.
The vmstat command shows virtual memory status and some other information. For a periodic check, give it an interval and a count, for example vmstat 2 5 (one report every 2 seconds, 5 times). Let's try the free command:
With the -h option free prints human-readable values; you can use -m for a megabyte view and -g for a gigabyte view.
As you can see, 241M is allocated to buff/cache. To clear buff/cache (as root, ideally after running sync so that dirty pages are written out first):
echo 3 > /proc/sys/vm/drop_caches ; free -h
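The numbers that free prints come from /proc/meminfo, so the same picture can be derived by hand. The following is a minimal sketch, assuming that "used" means MemTotal minus MemAvailable; the function name mem_used_percent and the sample values are hypothetical.

```shell
# mem_used_percent: read meminfo-style text on stdin and print the share
# of RAM in use. "Used" is taken as MemTotal - MemAvailable, which is an
# assumption of this sketch, not the only possible definition.
mem_used_percent() {
    awk '
        $1 == "MemTotal:"     { total = $2 }
        $1 == "MemAvailable:" { avail = $2 }
        END {
            if (total > 0) printf "%.1f\n", (total - avail) * 100 / total
        }'
}

# Hypothetical sample values in kB, not taken from a real machine.
printf 'MemTotal: 1000000 kB\nMemFree: 100000 kB\nMemAvailable: 250000 kB\n' | mem_used_percent

# On a live system:
#   mem_used_percent < /proc/meminfo
```

MemAvailable already accounts for page cache that the kernel can reclaim, which is why it is a better basis than MemFree when buff/cache is large.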
This system has just 1 gigabyte of RAM, so obviously it starts using swap. What is swappiness?
Swappiness is the kernel parameter that defines how much (and how often) your Linux kernel will copy RAM contents to swap. Its default value is 60 and it can be set anywhere from 0 to 100 (kernels 5.8 and later accept values up to 200). The higher the value of the swappiness parameter, the more aggressively your kernel will swap.
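The current value is exposed as a plain file under /proc. A small sketch for reading it follows; the function name show_swappiness is hypothetical, and the optional file argument exists only so the function can be tried against a test file. The commands for changing the value are left as comments because they need root, and the value 10 and the file name under /etc/sysctl.d are examples, not recommendations.

```shell
# show_swappiness [FILE]: print the swappiness value stored in FILE.
# FILE defaults to the real kernel setting; passing another path is only
# a convenience of this sketch.
show_swappiness() {
    file=${1:-/proc/sys/vm/swappiness}
    value=$(cat "$file") || return 1
    echo "vm.swappiness = $value"
}

# Changing the value needs root (example value, hypothetical file name):
#   sysctl vm.swappiness=10                                        # until the next reboot
#   echo 'vm.swappiness = 10' > /etc/sysctl.d/99-swappiness.conf   # persistent
```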
Monitoring CPU
Monitor and measure the load that you have put on your system with the uptime command:
The uptime command shows the uptime, obviously :), the number of currently logged-in users, and the load average over the last 1, 5 and 15 minutes. The load average counts processes that are running or waiting for the CPU as well as processes blocked on disk I/O. The figure is not normalized by the number of CPU cores: a load of 1.00 means one core's worth of work. So to find the real system load as a percentage, divide the load average by the number of cores and multiply by 100; a load of 2.00 on a 4-core machine is 50%.
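A minimal sketch of that calculation: divide the load average by the number of CPU cores and express the result as a percentage. The function name load_percent is hypothetical and the sample values are made up.

```shell
# load_percent LOAD CORES: print the load average as a percentage of
# total CPU capacity (100 means every core is fully busy).
load_percent() {
    awk -v load="$1" -v cores="$2" 'BEGIN { printf "%.0f\n", load * 100 / cores }'
}

# Hypothetical example: a 1-minute load of 2.00 on a 4-core machine.
load_percent 2.00 4

# On a live system the inputs could come from /proc/loadavg and nproc:
#   load_percent "$(cut -d' ' -f1 /proc/loadavg)" "$(nproc)"
```

A result above 100 means more work is queued than the cores can handle at once.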
If the load goes beyond your CPU capacity, you need to investigate further. Sometimes the CPU is the bottleneck and sometimes the disk is. The top command is here to help us:
The top command gives us information about system uptime, load average, and detailed information about processes. top also has some tricks:
| Key | Description |
| --- | --- |
| press "1" | show the load of every CPU core |
| "shift" + "<" or ">" | move the sort column left or right |
| "shift" + "p" | sort by CPU usage |
| "shift" + "m" | sort by memory usage |
| press "c" | show the full command line (absolute path) of each process |
| press "z" | toggle colored display |
| press "d" and then a delay number | change the refresh delay (by default top refreshes every 3.0 seconds) |
| press "k" and insert the PID of a process | kill the process with that PID |
| press "r" and the PID of a process | renice the process |
| "shift" + "w" | write the current top settings to its configuration file |
| press "q" | exit |
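So the top command is useful for monitoring both CPU and RAM, and it also shows us processes waiting on I/O. For more detail, let's take a look at the third line of top's output. From the OS point of view Linux has two spaces, user space and system (kernel) space, and top breaks CPU time down this way: us (user space), sy (system space), ni (niced user processes), id (idle), wa (waiting for I/O), hi (hardware interrupts), si (software interrupts) and st (time stolen by the hypervisor).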
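The percentages on that third line, including wa (I/O wait), can also be derived by hand from /proc/stat. The sketch below assumes the field order of the aggregate cpu line (user, nice, system, idle, iowait, irq, softirq, steal) and computes the iowait share between two samples. The function name iowait_percent and the sample numbers are hypothetical.

```shell
# iowait_percent "CPU_LINE_1" "CPU_LINE_2": share of CPU time spent in
# iowait between two samples of the aggregate "cpu" line of /proc/stat.
# Fields after "cpu": user nice system idle iowait irq softirq steal ...
iowait_percent() {
    printf '%s\n%s\n' "$1" "$2" | awk '
        {
            total = 0
            for (i = 2; i <= 9 && i <= NF; i++) total += $i   # user .. steal
            t[NR] = total
            w[NR] = $6                                        # iowait
        }
        END {
            dt = t[2] - t[1]
            if (dt > 0) printf "%.1f\n", (w[2] - w[1]) * 100 / dt
        }'
}

# Hypothetical samples, not taken from a real machine.
iowait_percent "cpu 100 0 50 800 50 0 0 0" "cpu 150 0 70 860 120 0 0 0"

# On a live system:
#   a=$(grep '^cpu ' /proc/stat); sleep 1; b=$(grep '^cpu ' /proc/stat)
#   iowait_percent "$a" "$b"
```

The counters in /proc/stat only ever grow, so a single sample says little; the difference between two samples is what matters.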
Did you see wa? It shows the time spent waiting for I/O, and it rises when the disk is the bottleneck. In this condition a process needs some data to be read from the hard disk, but the disk is busy and cannot deliver the data when it is needed, so the poor process must wait until the disk does its job. The process is blocked on I/O and sleeps :) From the CPU's point of view the process goes into an "uninterruptible sleep". This condition is bad because we can't even kill that process :( So monitor processes with top and make sure the I/O wait percentage stays close to zero.
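Processes in uninterruptible sleep show up with a state that starts with D in ps. A small sketch for picking them out follows; the function name blocked_procs and the sample process list are hypothetical.

```shell
# blocked_procs: read "STATE PID COMMAND" lines on stdin and print only
# the processes in uninterruptible sleep (state starting with D).
blocked_procs() {
    awk '$1 ~ /^D/'
}

# Hypothetical sample process list.
printf 'S 1 systemd\nD 742 rsync\nR 900 top\nD+ 951 dd\n' | blocked_procs

# On a live system:
#   ps -e -o stat= -o pid= -o comm= | blocked_procs
```

A process that appears here once is normal; one that stays in the list across several runs is a hint that the storage underneath it is struggling.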
Measure disk I/O
If we have finally determined that the problem is disk speed, we should monitor disk I/O. The top command is great, but we need to know which program is causing the problem. For that we use iotop [it needs to be installed, depending on your distro; kernel >= 2.6]:
There is another tool which gives us less information but is quick and handy. Like vmstat, which we have already talked about, iostat gives us a snapshot of the current disk I/O situation [iostat is part of the sysstat package, which needs to be installed]:
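The raw counters behind tools like iostat live in /proc/diskstats, where sectors are always counted in units of 512 bytes. The sketch below pulls the sectors read and written for one device and turns the difference between two samples into a rate. The function names sectors_for and kb_per_sec, the device name sda and all numbers are hypothetical.

```shell
# sectors_for DEVICE: read /proc/diskstats-style lines on stdin and print
# "sectors_read sectors_written" for DEVICE (fields 6 and 10).
sectors_for() {
    awk -v dev="$1" '$3 == dev { print $6, $10 }'
}

# kb_per_sec BEFORE AFTER SECONDS: convert a sector-count difference
# into KiB per second (one sector = 512 bytes).
kb_per_sec() {
    awk -v a="$1" -v b="$2" -v s="$3" 'BEGIN { printf "%.1f\n", (b - a) * 512 / 1024 / s }'
}

# Hypothetical diskstats line for a device called sda.
printf '   8       0 sda 1200 30 40960 500 800 20 81920 900 0 700 1400\n' | sectors_for sda

# Hypothetical samples: 4096 sectors read within 2 seconds.
kb_per_sec 40960 45056 2

# On a live system, sample twice and compare:
#   sectors_for sda < /proc/diskstats; sleep 2; sectors_for sda < /proc/diskstats
```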
Another tool is sar. Yes, that is funny, because it is the name of a bird in Farsi :) but here it is an acronym for "System Activity Report". To run sar you give it an interval and a count, for example sar 2 5:
Sometimes we need to know which files are open on our system, whether we have a performance issue or just want to see what is going on before rebooting the computer. lsof gives us a "LiSt of Open Files":
lsof prints a very, very long list of open files. We usually use it with grep, and we can also limit it to a specific user with lsof -u myuser.
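Because the list is so long, a summary is often more useful than the raw output. A minimal sketch that counts open files per command follows, assuming the default lsof column layout where the first column is COMMAND. The function name open_files_per_command and the sample lines are hypothetical.

```shell
# open_files_per_command: read lsof output on stdin (header line first)
# and print "COMMAND COUNT", highest count first.
open_files_per_command() {
    awk 'NR > 1 { count[$1]++ } END { for (c in count) print c, count[c] }' |
        sort -k2,2nr -k1,1
}

# Hypothetical lsof-style sample.
printf 'COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME\nsshd 810 root cwd DIR 8,1 4096 2 /\nbash 1501 myuser cwd DIR 8,1 4096 131 /home/myuser\nsshd 810 root txt REG 8,1 876328 5120 /usr/sbin/sshd\n' | open_files_per_command

# On a live system:
#   lsof | open_files_per_command | head
#   lsof -u myuser | open_files_per_command
```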
Measure Network I/O
With the spread of the Internet and the growth of private networks, the importance of networking keeps growing. When we want to explore what has caused a network problem, we should consider the whole path, from beginning to end and vice versa. It is like talking about car traffic in a big city. Let's get familiar with some commands and tools.
iftop
Let's add another member to the top family of commands: iftop. iftop shows the sessions on your NIC and sorts them by transfer rate.
Like top, iftop lets you change the sort order: the < and > keys sort by source and destination, and you can press 1, 2 or 3 to sort by the network activity of the last 2, 10 or 40 seconds. You can also pick a specific interface with iftop -i eth1.
nload
nload (network load) is another handy tool, this time for monitoring current bandwidth.
If you enjoyed that, try ifstat for yourself.
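When none of these tools is installed, a rough bandwidth figure can still be taken from the kernel's per-interface byte counters under /sys/class/net. The sketch below turns two counter readings into a rate; the function name rate_kbit, the interface name eth1 in the comment and the sample numbers are assumptions for illustration.

```shell
# rate_kbit BYTES_BEFORE BYTES_AFTER SECONDS: average rate in kbit/s
# between two readings of an interface byte counter.
rate_kbit() {
    awk -v a="$1" -v b="$2" -v s="$3" 'BEGIN { printf "%.0f\n", (b - a) * 8 / 1000 / s }'
}

# Hypothetical readings: 250000 bytes received within 2 seconds.
rate_kbit 1000000 1250000 2

# On a live system (interface name is an example):
#   rx1=$(cat /sys/class/net/eth1/statistics/rx_bytes); sleep 2
#   rx2=$(cat /sys/class/net/eth1/statistics/rx_bytes)
#   rate_kbit "$rx1" "$rx2" 2
```

The same function works for tx_bytes to measure the sending direction.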
For testing the connection between two nodes of a network we have the nice iperf tool: run iperf -s on one node and iperf -c followed by that node's address on the other.
As both nodes in my example are virtual machines, you can see an amazing 2.95 Gbits/sec of bandwidth. Who knows, some day this might happen in the real world :)
And finally, let's introduce a tiny tool to check your Internet speed from the command line:
apt install speedtest-cli