Linux manages its physical memory in clever and often efficient ways. As a result, it’s not uncommon to think about how the memory in your system is being used only when you run into performance issues. And this is where the frustration can begin: without fully understanding how memory is managed, it can be very difficult to answer some seemingly straightforward questions like ‘How much free memory do I have?’ or ‘How much memory is this process taking?’. There are a lot of complications, and as a result performance monitoring can be a challenge.
I was determined to understand precisely what the various memory figures reported by the kernel mean, and to understand, on a practical level, the implications of Linux’s memory management for our performance-sensitive applications. In this multi-part post we’ll attempt to unravel many of the mysteries of Linux’s memory management.
MemTotal / Memory Available
We’ll start off by looking at the ‘MemTotal’ line, as reported by ‘cat /proc/meminfo’. For the purpose of these tutorials I’ll be using a Linux 2.6.35 kernel on a virtualized (QEMU) ARM versatilepb board.
```
# cat /proc/meminfo
MemTotal:          29372 kB
...
# cat /proc/cmdline
console=ttyAMA0 mem=32M loglevel=9 root=/dev/ram rdinit=/sbin/init
```
The first thing you may notice is that there is a difference between the memory we’ve allowed the kernel to use (32 MB, via mem=32M) and the amount the kernel reports as its total (29372 kB, or about 28.68 MB). In other words, we seem to have already lost around 3.3 MB. To shed some light on this, we’ll examine the kernel log.
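Before turning to the log, the gap itself is easy to quantify by parsing the two outputs shown above. The following is a minimal Python sketch; the helper names `parse_meminfo` and `parse_mem_param_kb` are my own, and the `mem=` parser assumes the simple form `mem=<number>[K|M|G]` (an optional `@start` address is accepted but ignored).

```python
import re

# Sample output taken from the session above. On a live system you could use
# open("/proc/meminfo").read() and open("/proc/cmdline").read() instead.
MEMINFO = "MemTotal:          29372 kB\n"
CMDLINE = "console=ttyAMA0 mem=32M loglevel=9 root=/dev/ram rdinit=/sbin/init"


def parse_meminfo(text):
    """Map each /proc/meminfo field name to its numeric value (kB for most fields)."""
    fields = {}
    for line in text.splitlines():
        match = re.match(r"([\w()]+):\s+(\d+)", line)
        if match:
            fields[match.group(1)] = int(match.group(2))
    return fields


def parse_mem_param_kb(cmdline):
    """Return the mem= boot parameter in kB, or None if it is absent.

    Assumes mem=<number>[K|M|G][@start]; a bare number is taken as bytes
    and the @start address, if present, is ignored.
    """
    multiplier = {"": 1, "K": 1024, "M": 1024 ** 2, "G": 1024 ** 3}
    for token in cmdline.split():
        match = re.fullmatch(r"mem=(\d+)([KMG]?)(?:@\S+)?", token, re.IGNORECASE)
        if match:
            return int(match.group(1)) * multiplier[match.group(2).upper()] // 1024
    return None


requested_kb = parse_mem_param_kb(CMDLINE)
memtotal_kb = parse_meminfo(MEMINFO)["MemTotal"]
missing_kb = requested_kb - memtotal_kb
print(f"requested {requested_kb} kB, MemTotal {memtotal_kb} kB ({memtotal_kb / 1024:.2f} MB)")
print(f"unaccounted for: {missing_kb} kB ({missing_kb / 1024:.2f} MB)")
```

With the article’s figures this reports 3396 kB (about 3.32 MB) that MemTotal does not account for.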
```
# dmesg
...
On node 0 totalpages: 8192
free_area_init_node: node 0, pgdat c05d7dcc, node_mem_map c0613000
Normal zone: 64 pages used for memmap
Normal zone: 0 pages reserved
Normal zone: 8128 pages, LIFO batch:0
Built 1 zonelists in Zone order, mobility grouping on. Total pages: 8128
Kernel command line: console=ttyAMA0 mem=32M loglevel=9 root=/dev/ram rdinit=/sbin/init
PID hash table entries: 128 (order: -3, 512 bytes)
Dentry cache hash table entries: 4096 (order: 2, 16384 bytes)
Inode-cache hash table entries: 2048 (order: 1, 8192 bytes)
Memory: 32MB = 32MB total
Memory: 26264k/26264k available, 6504k reserved, 0K highmem
Virtual kernel memory layout:
vector : 0xffff0000 - 0xffff1000 ( 4 kB)
fixmap : 0xfff00000 - 0xfffe0000 ( 896 kB)
DMA : 0xffc00000 - 0xffe00000 ( 2 MB)
vmalloc : 0xc2800000 - 0xd8000000 ( 344 MB)
lowmem : 0xc0000000 - 0xc2000000 ( 32 MB)
modules : 0xbf000000 - 0xc0000000 ( 16 MB)
.init : 0xc0008000 - 0xc0311000 (3108 kB)
.text : 0xc0311000 - 0xc05be000 (2740 kB)
.data : 0xc05be000 - 0xc05d83e0 ( 105 kB)
```
Our first observations come from the following lines:
```
Memory: 32MB = 32MB total
Memory: 26264k/26264k available, 6504k reserved, 0K highmem
```
We can see that the kernel has picked up our request to use 32MB of memory, and that of this total, 26264k is available while 6504k has been ‘reserved’. The sharp-eyed among you may also have noticed that the ‘available’ memory reported here differs from the MemTotal shown in /proc/meminfo at the end of boot. My previous post on the Init Call Mechanism in Linux explains this difference, but in a nutshell: once boot is complete, the kernel frees the memory previously occupied by initialisation code, as this code will never be executed again. This code lives in the ‘.init’ section of the kernel image, and its size (3108 kB) is reported in the ‘Virtual kernel memory layout’ table shown above. Finally, we can explain where our MemTotal and available memory figures come from:
```
Available = total  - reserved
   26264k = 32768k - 6504k

MemTotal  = total  - reserved + .init
   29372k = 32768k - 6504k    + 3108k
```
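The same arithmetic can be reproduced directly from the raw boot-log lines. In this Python sketch the helper names are my own; the `.init` size is recomputed from its start and end addresses, rounded up to whole kB (a rounding choice that reproduces the .init, .text and .data figures in the layout table), and the reserved figure is pulled from the `Memory:` line.

```python
import re

# Figures copied from the boot log above.
MEMORY_LINE = "Memory: 26264k/26264k available, 6504k reserved, 0K highmem"
INIT_LINE = ".init : 0xc0008000 - 0xc0311000 (3108 kB)"
TOTAL_KB = 32 * 1024  # mem=32M on the kernel command line


def parse_reserved_kb(line):
    """Extract the 'NNNNk reserved' figure from the boot-time Memory: line."""
    match = re.search(r"(\d+)k reserved", line)
    if match is None:
        raise ValueError("no reserved figure in line")
    return int(match.group(1))


def region_size_kb(line):
    """Size of a 'name : 0xSTART - 0xEND' layout entry, rounded up to whole kB."""
    match = re.search(r"0x([0-9a-fA-F]+)\s*-\s*0x([0-9a-fA-F]+)", line)
    if match is None:
        raise ValueError("no address range in line")
    start, end = int(match.group(1), 16), int(match.group(2), 16)
    return (end - start + 1023) // 1024


def memory_accounting(total_kb, reserved_kb, init_kb):
    """Return (available, MemTotal) in kB, following the two formulas above."""
    available_kb = total_kb - reserved_kb
    # Per the formula, .init is still inside the reserved figure at this point
    # but is freed once boot completes, so it reappears in MemTotal.
    memtotal_kb = available_kb + init_kb
    return available_kb, memtotal_kb


reserved_kb = parse_reserved_kb(MEMORY_LINE)
init_kb = region_size_kb(INIT_LINE)
available_kb, memtotal_kb = memory_accounting(TOTAL_KB, reserved_kb, init_kb)
print(f"reserved {reserved_kb}k, .init {init_kb}k")
print(f"Available: {available_kb}k, MemTotal: {memtotal_kb}k")
```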
We’ve seen from the kernel output that 6504k of memory gets ‘reserved’ and thus eats into our available memory, so what is this reserved memory? At a high level, it is memory which the kernel reserves or allocates for its own use and never intends to release. To give you an example, I added some instrumentation to the code to find out exactly where this reserved memory goes:
- 6184 kB: arch/arm/mm/mmu.c:reserve_node_zero (memory occupied by the kernel itself, _stext to _end)
- 56 kB: mm/page_alloc.c:alloc_node_mem_map
- 6 kB: fs/dcache.c:dcache_init_early (directory entry cache for the VFS subsystem)
- 6 kB: arch/arm/mm/mmu.c:reserve_node_zero (memory for page tables)
- 8 kB: kernel/pid.c:pidhash_init (hash table for PID lookups)
- 4 kB: arch/arm/mm/init.c:bootmem_init_node (memory for the bootmem allocator bitmap)
- 4 kB: arch/arm/mm/init.c:free_area_init_node
- 8 kB: arch/arm/mm/mmu.c:paging_init (zero page)
- 8 kB: arch/arm/mm/mmu.c:create_mapping
- 4 kB: arch/arm/kernel/setup.c:request_standard_resources
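A quick way to sanity-check a breakdown like this is to total it against the boot-time figure. The Python sketch below simply transcribes the list; by my arithmetic the entries sum to 6288 kB, leaving 216 kB of the reported 6504k that the list does not itemise, while the kernel image alone accounts for roughly 95% of the reservation.

```python
# (size in kB, call site) pairs transcribed from the breakdown above.
RESERVED_BREAKDOWN = [
    (6184, "arch/arm/mm/mmu.c:reserve_node_zero (kernel image)"),
    (56, "mm/page_alloc.c:alloc_node_mem_map"),
    (6, "fs/dcache.c:dcache_init_early"),
    (6, "arch/arm/mm/mmu.c:reserve_node_zero (page tables)"),
    (8, "kernel/pid.c:pidhash_init"),
    (4, "arch/arm/mm/init.c:bootmem_init_node"),
    (4, "arch/arm/mm/init.c:free_area_init_node"),
    (8, "arch/arm/mm/mmu.c:paging_init"),
    (8, "arch/arm/mm/mmu.c:create_mapping"),
    (4, "arch/arm/kernel/setup.c:request_standard_resources"),
]
REPORTED_RESERVED_KB = 6504  # from "6504k reserved" in the boot log

itemised_kb = sum(size for size, _ in RESERVED_BREAKDOWN)
kernel_image_kb = RESERVED_BREAKDOWN[0][0]
print(f"itemised: {itemised_kb} kB of {REPORTED_RESERVED_KB} kB "
      f"({REPORTED_RESERVED_KB - itemised_kb} kB not itemised)")
print(f"kernel image share: {100 * kernel_image_kb / REPORTED_RESERVED_KB:.1f}%")
```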
The kernel, of course, occupies some memory itself, so a large portion of the reserved space is used to cover the kernel image (to prevent its pages from being allocated). You’ll notice that this figure is similar to the size reported by running ‘size’ on your vmlinux image. Some reserved memory is used for various caches and hash tables, and the remainder is used to initialise the structures required by the memory subsystem and the physical (zone) memory allocator.
There is a bit of a chicken-and-egg situation here: the complex memory allocators and subsystems need memory of their own in order to initialise, yet they are the very things that are supposed to hand memory out! To get around this, the kernel has a ‘bootmem’ allocator just for this purpose. Its life ends very early during boot, once the physical memory allocator has initialised. In fact, the ‘Memory: 26264k/26264k available, 6504k reserved, 0K highmem’ line marks the point where bootmem passes its free pages over to the physical memory allocator. The ‘reserved’ memory is all the memory the bootmem allocator has allocated and not freed by this point.
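To make the hand-over concrete, here is a toy Python model of a boot-time allocator that tracks one used/free flag per page, in the spirit of the bootmem bitmap that appears in the breakdown above. The class and method names are hypothetical and this is not the kernel’s mm/bootmem.c code; it only demonstrates the accounting, namely that whatever is still marked as used at hand-over time is what gets reported as ‘reserved’.

```python
PAGE_KB = 4  # 4 kB pages: 8192 pages make up the 32 MB in the dmesg output


class ToyBootAllocator:
    """Hypothetical page allocator keeping one used/free flag per page.

    Illustrative only: it mimics the bookkeeping idea behind a bootmem
    bitmap, not the real kernel implementation.
    """

    def __init__(self, total_pages):
        self.used = [False] * total_pages  # one flag ("bit") per page

    def reserve(self, start, count):
        """Mark a known range as in use (e.g. the kernel image)."""
        for page in range(start, start + count):
            self.used[page] = True

    def alloc(self, count):
        """First-fit search for `count` contiguous free pages."""
        run = 0
        for page, used in enumerate(self.used):
            run = 0 if used else run + 1
            if run == count:
                first = page - count + 1
                self.reserve(first, count)
                return first
        raise MemoryError("toy allocator: no free run large enough")

    def free(self, start, count):
        """Give pages back before the hand-over."""
        for page in range(start, start + count):
            self.used[page] = False

    def hand_over(self):
        """Return (free_pages, reserved_pages) at the moment of hand-over."""
        reserved = sum(self.used)
        return len(self.used) - reserved, reserved


boot = ToyBootAllocator(total_pages=8192)
boot.reserve(0, 6184 // PAGE_KB)  # kernel image size from the breakdown; placement is hypothetical
scratch = boot.alloc(4)           # temporary early buffer
table = boot.alloc(2)             # table kept for the kernel's lifetime
boot.free(scratch, 4)             # released before the hand-over
free_pages, reserved_pages = boot.hand_over()
print(f"{free_pages * PAGE_KB}k free, {reserved_pages * PAGE_KB}k reserved")
```

Only the scratch buffer escapes the ‘reserved’ count, because it was freed before the hand-over; the table allocated for the kernel’s lifetime stays reserved alongside the kernel image.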
The one useful thing we can take from this is that we can maximise the available system RAM for user processes by reducing the amount of memory reserved by the kernel. And we can do this in the following two ways:
- Reduce kernel size: remove unnecessary drivers and functionality (for example kallsyms and IKCONFIG).
- Make maximum use of the .init section: make sure that all initialisation code and data are placed in the .init section (via __init and __initdata) so that they can be reclaimed after boot.
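For the first point, a simple place to start is checking which of the options mentioned above are enabled in your kernel’s .config. The Python sketch below assumes the standard .config format (`CONFIG_FOO=y` or `=m` for enabled options, `# CONFIG_FOO is not set` for disabled ones); the list of suspects and the sample config are illustrative assumptions, not a recommendation for every system.

```python
# Options named above as candidates for trimming (an illustrative list;
# extend it for your own kernel).
SIZE_SUSPECTS = [
    "CONFIG_KALLSYMS",
    "CONFIG_KALLSYMS_ALL",
    "CONFIG_IKCONFIG",
    "CONFIG_IKCONFIG_PROC",
]


def enabled_options(config_text):
    """Return the set of options set to y or m in a kernel .config."""
    enabled = set()
    for line in config_text.splitlines():
        line = line.strip()
        # Disabled options appear as comments ("# CONFIG_FOO is not set").
        if line.startswith("CONFIG_") and "=" in line:
            name, value = line.split("=", 1)
            if value in ("y", "m"):
                enabled.add(name)
    return enabled


def trimming_candidates(config_text, suspects=SIZE_SUSPECTS):
    """List the suspect options that are currently enabled, in list order."""
    enabled = enabled_options(config_text)
    return [option for option in suspects if option in enabled]


# Hypothetical excerpt; in practice read your build tree's .config file.
SAMPLE_CONFIG = """\
CONFIG_KALLSYMS=y
# CONFIG_KALLSYMS_ALL is not set
CONFIG_IKCONFIG=y
CONFIG_IKCONFIG_PROC=y
"""

print(trimming_candidates(SAMPLE_CONFIG))
```

Any option it flags is only a candidate: whether it can be dropped depends on what your system needs at run time.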
Whilst trying to understand all of this, I came across some very useful documentation on how the kernel manages its memory. If you wish to read more, try here, definitely here, and here for some information on QEMU.
By the way, in order to determine exactly what was using the memory allocated by bootmem, I added some debug output (including a call to dump_stack) to the __reserve and __free functions of the bootmem allocator. [© 2011 embedded-bits.co.uk]