MEMORY_TARGET (SGA_TARGET) or HugePages – which to choose?
Oracle 10g introduced the SGA_TARGET and SGA_MAX_SIZE parameters,
which dynamically resize many SGA components on demand. With 11g Oracle
developed this feature further to include the PGA as well. The feature is now
called “Automatic Memory Management” (AMM) and is enabled by setting the
parameter MEMORY_TARGET.
Automatic Memory Management makes use of the SHMFS – a pseudo
file system just like /proc. Every file created in the SHMFS is created in
memory.
Unfortunately using MEMORY_TARGET or MEMORY_MAX_TARGET together
with Huge Pages is not supported. You have to choose either Automatic Memory
Management or HugePages. In this post I'd like to discuss AMM and Huge Pages.
Automatic Memory Management (AMM)
AMM – what it is
Automatic Memory Management was introduced with Oracle 11g
Release 1 and automates the sizing and resizing of SGA and PGA.
With AMM activated there are two parameters of interest:
· MEMORY_TARGET
· MEMORY_MAX_TARGET
MEMORY_TARGET specifies the amount of memory usable by Oracle system-wide,
while MEMORY_MAX_TARGET specifies the upper bound to which the DBA can
set MEMORY_TARGET. If MEMORY_MAX_TARGET is not specified it defaults to
MEMORY_TARGET.
For AMM to work there is one important requirement: your system
needs to support memory mapped files (on Linux typically mounted on /dev/shm).
AMM is available on the following platforms:
· Linux
· Solaris
· Windows
· HP-UX
· AIX
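The /dev/shm requirement can be checked before enabling AMM. The following is a minimal sketch, assuming (as is commonly documented for AMM on Linux) that the space available in /dev/shm has to be at least as large as the intended MEMORY_TARGET. The function name shm_fits and the 8 GB figure are illustrative only.

```shell
#!/bin/sh
# shm_fits TARGET_KB
# Reads the output of `df -Pk /dev/shm` on stdin and succeeds when the
# "Available" column (4th field of the 2nd line) is >= TARGET_KB.
# Assumption: the available tmpfs space must cover MEMORY_TARGET.
shm_fits() {
    awk -v need="$1" '
        NR == 2 { ok = ($4 + 0 >= need + 0) }
        END     { exit (ok ? 0 : 1) }'
}

# Typical use on a live system for an 8 GB target (8388608 kB):
#   df -Pk /dev/shm | shm_fits 8388608 && echo "large enough" || echo "too small"
```

Reading the df output from stdin keeps the check usable on saved output from another server as well.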
Advantages
· SGA and PGA automatically adjusted
· Dynamically resizable
· Not swappable
Disadvantages
· Only available on a limited number of platforms
Bug or feature?
· Does not work together with HugePages, so it is either AMM or HugePages
Before deciding, let's see what HugePages are:
HugePages
HugePages – what are they?
Here is one description:
Hugepages is a mechanism that allows the Linux kernel to utilise
the multiple page size capabilities of modern hardware architectures. Linux
uses pages as the basic unit of memory, where physical memory is partitioned
and accessed using the basic page unit. The default page size is 4096 Bytes in
the x86 [and x86_64 as well; note by Ronny Egner] architecture. Hugepages
allows large amounts of memory to be utilized with a reduced overhead. Linux
uses “Translation Lookaside Buffers” (TLB) in the CPU architecture. These
buffers contain mappings of virtual memory to actual physical memory addresses.
So utilising a huge amount of physical memory with the default page size
consumes the TLB and adds processing overhead. The Linux kernel is able to
set aside a portion of physical memory to be addressed using a larger
page size. Since the page size is higher, there will be less overhead managing
the pages with the TLB.
(Source: http://unixfoo.blogspot.com/2007/10/hugepages.html)
Advantages
· Huge Pages are not swappable, which keeps your SGA locked in memory
· Overall memory performance is increased: since there are fewer
pages to scan, memory performance improves
· kswapd needs far fewer resources: kswapd regularly scans the page
table for infrequently accessed pages which are candidates for paging to disk.
If the page table is large, kswapd will use a lot of resources in terms of CPU.
With Huge Pages enabled the page table is much smaller and Huge Pages are not
subject to swapping, so kswapd will use fewer resources.
· Improves the TLB hit ratio because fewer entries are needed, thus increasing
memory performance further. The TLB is a small cache on the CPU which stores
virtual-to-physical memory mappings
Disadvantages
· Should be allocated at startup (allocating huge pages at runtime
is possible but will probably fail due to memory fragmentation, so it is
advisable to allocate them at startup)
· Dynamically allocating huge pages is buggy
Linux memory management with and without Huge Pages
The following two figures try to illustrate how memory access
with and without huge pages works. As you can see, every process in a virtual
memory operating system has its own process page table which points to a
system page table. For Oracle processes running on Linux it is not uncommon to
use the same physical memory regions because they access the SGA or the block
cache. This is illustrated in the figures below for pages 2 and 3; they are
accessed by both processes.
Without huge pages memory is divided into chunks (called “pages”)
of 4 KB on Intel x86 and Intel x86_64 (the actual size depends on the hardware
platform). Operating systems offering virtual memory (as most modern operating
systems do, for instance Linux, Solaris, HP-UX, AIX and even Windows) present
each process with a contiguous addressable memory (“virtual memory”) which consists
of memory pages that reside in memory or even on disk (“swap”). See here for
more information.
The following figure, taken from the Wikipedia article mentioned
above, illustrates the concept of virtual memory:
From the process’ view it looks like it is solely running on the
operating system. But it is not. In fact there are a lot of other processes
running.
As mentioned above memory is presented to processes in chunks of
4 KB, so-called “pages”. The operating system manages a list of pages (the
“page table”) for each process and for the operating system as well, which maps
the virtual memory to physical memory.
The page table can be seen as “memory for managing memory”. Each
page table entry (PTE) takes:
· 4 bytes of memory per page (4 KB) per process on 32-bit Intel
and
· 8 bytes of memory per page (4 KB) per process on 64-bit Intel
So for a process to touch every page on a 64-bit system with 16
GB of memory, the following is required:
· for the memory referenced by the process: 4.2 million PTEs
(~ 16 GB) with 8 bytes each = 32 MB
· PLUS for the system page table: 4.2 million PTEs (~ 16 GB) with 8
bytes each = 32 MB
· which adds up to 64 MB for the whole page table as
counted in /proc/meminfo [PageTables]
On systems running Oracle databases with a huge buffer cache and
highly active processes (= sessions) it is not uncommon for the processes to
reference the whole SGA and parts of the PGA after a while. Taking the example
above and assuming a buffer cache of 16 GB, this adds up to 32 MB per process for
the page table. 100 processes will consume 3.2 GB! That's a lot of memory which
is not available and is used solely to manage memory.
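The arithmetic above can be reproduced with shell integer arithmetic. This is only a sketch: the helper name page_table_bytes is made up for illustration, and it assumes every process really touches the full 16 GB.

```shell
#!/bin/sh
# page_table_bytes MEM_BYTES PAGE_BYTES PTE_BYTES
# Bytes of page table needed to map MEM_BYTES with the given page size.
page_table_bytes() {
    echo $(( $1 / $2 * $3 ))
}

mem=$(( 16 * 1024 * 1024 * 1024 ))              # 16 GB buffer cache
per_process=$(page_table_bytes "$mem" 4096 8)   # 4 KB pages, 8-byte PTEs

echo "PTEs per process: $(( mem / 4096 ))"
echo "per process:      $(( per_process / 1024 / 1024 )) MB"
echo "100 sessions:     $(( 100 * per_process / 1024 / 1024 )) MB"
```

With these inputs the script reports 4194304 entries, 32 MB per process and 3200 MB for 100 sessions, matching the figures in the text.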
The size of the page table can be queried on Linux as follows:
cat /proc/meminfo | grep PageT
PageTables: 25096 kB
This command shows the size of the page table (sum of the system and
all process page tables). This amount of memory is unavailable to all processes
and is used solely for managing memory. On systems with a lot of memory, a huge
SGA/PGA and many dedicated server connections the page table can be several
GB in size!
The solution for this memory wastage is to implement huge pages.
Huge pages increase the memory chunks from 4 KB to 2 MB, so a page table entry
still takes 8 bytes on 64-bit Intel but references 2 MB. That is more efficient
by a factor of 512!
So taking our example from above with a buffer cache of 16 GB
(16384 MB) referenced completely by a process, the page table for the process
will be:
· 16384 MB referenced / 2 MB per page = 8192 PTEs with 8 bytes each
= 65536 bytes or 64 KB
I guess the advantage is obvious: 32 MB with no huge pages vs.
64 KB with huge pages!
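To see both page sizes side by side, the comparison can be sketched as a small loop. The helper pte_count is hypothetical, and the sizes are given in kB to keep the numbers small.

```shell
#!/bin/sh
# pte_count MEM_KB PAGE_KB
# Number of page table entries needed to map MEM_KB of memory.
pte_count() {
    echo $(( $1 / $2 ))
}

sga_kb=$(( 16 * 1024 * 1024 ))   # 16 GB expressed in kB
for page_kb in 4 2048; do
    n=$(pte_count "$sga_kb" "$page_kb")
    # 8 bytes per entry on 64-bit Intel
    echo "page size ${page_kb} kB: $n entries, $(( n * 8 / 1024 )) kB of page table"
done
echo "reduction factor: $(( 2048 / 4 ))"
```

The loop yields 32768 kB of page table for 4 kB pages against 64 kB for 2 MB pages, a factor of 512.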
Is my system already using huge pages?
You can check this by doing:
cat /proc/meminfo | grep Huge
There are three possibilities:
No huge pages configured at all
cat /proc/meminfo | grep Huge
HugePages_Total: 0
HugePages_Free: 0
HugePages_Rsvd: 0
Hugepagesize: 2048 kB
Huge pages configured but not used
cat /proc/meminfo | grep Huge
HugePages_Total: 3000
HugePages_Free: 3000
HugePages_Rsvd: 0
Hugepagesize: 2048 kB
Huge pages configured and used
[root@rac1 ~]# cat /proc/meminfo | grep Huge
HugePages_Total: 3000
HugePages_Free: 2601
HugePages_Rsvd: 2290
Hugepagesize: 2048 kB
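The three cases can also be told apart by a script. The following sketch classifies /proc/meminfo-style input by its HugePages_* counters. The function name hugepage_state and the wording of the three result strings are illustrative choices, not something the kernel reports.

```shell
#!/bin/sh
# hugepage_state: reads /proc/meminfo-style text on stdin and prints
# which of the three situations described above applies.
hugepage_state() {
    awk '
        /^HugePages_Total:/ { total = $2 }
        /^HugePages_Free:/  { free  = $2 }
        /^HugePages_Rsvd:/  { rsvd  = $2 }
        END {
            if (total + 0 == 0)
                print "not configured"
            else if (free + 0 == total + 0 && rsvd + 0 == 0)
                print "configured but not used"
            else
                print "configured and used"
        }'
}

# On a live system:
#   hugepage_state < /proc/meminfo
```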
How to configure huge pages?
1. Edit /etc/sysctl.conf and add the following line:
vm.nr_hugepages = <number>
Note: This parameter specifies the number of
huge pages. To get the total size, multiply “nr_hugepages” by the Hugepagesize
value reported by “cat /proc/meminfo | grep Hugepagesize”. For 64-bit Linux on
x86_64 the size of one Huge Page is 2 MB. So for a total amount of 2 GB, or
roughly 2000 MB, you need about 1000 pages (1024 to be exact).
2. Edit /etc/security/limits.conf and add the following lines:
<oracle user> soft memlock unlimited
<oracle user> hard memlock unlimited
3. Reboot the server and check
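As part of the check after the reboot it can help to confirm that the memlock limit of the oracle user is large enough to lock all configured huge pages. The sketch below assumes that the limit (in kB, as printed by `ulimit -l`) has to be at least nr_hugepages times Hugepagesize. The function name memlock_covers_hugepages is hypothetical.

```shell
#!/bin/sh
# memlock_covers_hugepages MEMLOCK_KB NR_HUGEPAGES HUGEPAGESIZE_KB
# Succeeds when the memlock limit is "unlimited" or at least as large
# as the memory covered by the configured huge pages.
memlock_covers_hugepages() {
    [ "$1" = "unlimited" ] && return 0
    [ "$1" -ge $(( $2 * $3 )) ]
}

# On a live system, run as the oracle user:
#   total=$(awk '/^HugePages_Total:/ {print $2}' /proc/meminfo)
#   size=$(awk '/^Hugepagesize:/ {print $2}' /proc/meminfo)
#   memlock_covers_hugepages "$(ulimit -l)" "$total" "$size" || echo "memlock too small"
```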
Some real world examples
The following are two database systems not using Huge Pages.
Let's see how much memory is spent just for managing the memory:
System A
The following is an example of a Linux based database server
running two database instances with approx. 8 GB SGA in total. At the time of
sampling there are 444 dedicated server sessions connected.
MemTotal: 16387608 kB
MemFree: 105176 kB
Buffers: 21032 kB
Cached: 9575340 kB
SwapCached: 1036 kB
Active: 11977268 kB
Inactive: 2378928 kB
HighTotal: 0 kB
HighFree: 0 kB
LowTotal: 16387608 kB
LowFree: 105176 kB
SwapTotal: 8393952 kB
SwapFree: 8247912 kB
Dirty: 9584 kB
Writeback: 0 kB
AnonPages: 4754720 kB
Mapped: 7130088 kB
Slab: 256088 kB
CommitLimit: 16587756 kB
Committed_AS: 22134904 kB
PageTables: 1591860 kB
VmallocTotal: 34359738367 kB
VmallocUsed: 9680 kB
VmallocChunk: 34359728499 kB
HugePages_Total: 0
HugePages_Free: 0
HugePages_Rsvd: 0
Hugepagesize: 2048 kB
As you can see, approx. 10% of all available memory is used for
the Page Tables.
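The share of memory spent on page tables can be computed directly from the MemTotal and PageTables lines. A minimal sketch follows; the function name pagetable_percent is illustrative.

```shell
#!/bin/sh
# pagetable_percent: reads /proc/meminfo-style text on stdin and prints
# PageTables as a percentage of MemTotal, with one decimal place.
pagetable_percent() {
    LC_ALL=C awk '
        /^MemTotal:/   { total = $2 }
        /^PageTables:/ { pt = $2 }
        END { if (total + 0 > 0) printf "%.1f\n", 100 * pt / total }'
}

# On a live system:
#   pagetable_percent < /proc/meminfo
```

Fed with the System A values above (MemTotal 16387608 kB, PageTables 1591860 kB), it prints 9.7.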
System B
System B is a Linux based system with 128 GB memory running one
single database instance with 34 GB SGA and approx 400 sessions:
MemTotal: 132102884 kB
MemFree: 596308 kB
Buffers: 472620 kB
Cached: 111858096 kB
SwapCached: 138652 kB
Active: 65182984 kB
Inactive: 53195396 kB
HighTotal: 0 kB
HighFree: 0 kB
LowTotal: 132102884 kB
LowFree: 596308 kB
SwapTotal: 8393952 kB
SwapFree: 8112828 kB
Dirty: 568 kB
Writeback: 0 kB
AnonPages: 5901940 kB
Mapped: 33971664 kB
Slab: 915092 kB
CommitLimit: 74445392 kB
Committed_AS: 48640652 kB
PageTables: 12023792 kB
VmallocTotal: 34359738367 kB
VmallocUsed: 279912 kB
VmallocChunk: 34359456747 kB
HugePages_Total: 0
HugePages_Free: 0
HugePages_Rsvd: 0
Hugepagesize: 2048 kB
In this example the Page Tables occupy approx. 12 GB of memory. That's a
lot of memory.
Some laboratory examples
Test case no. 1
The first test case is quite simple. I started 100 dedicated
database connections with a delay of 1 second between each database connection.
Each session will log on, sleep for 200 seconds and log off. In the
operating system I will monitor the page table size as the number of database
sessions increases and decreases.
foo.sql script
exec dbms_lock.sleep(200);
exit
doit.sql script
# shell loop (despite the .sql name): start 100 sessions, one per second
for i in {1..100}
do
  echo $i
  sqlplus system/manager@ora11p @foo.sql &
  sleep 1
done
Results without Huge Pages
The following output was observed without huge pages:
while true; do cat /proc/meminfo | grep PageTable && sleep 3; done
PageTables: 58836 kB
PageTables: 60792 kB
PageTables: 62808 kB
PageTables: 64808 kB
PageTables: 66560 kB
PageTables: 68780 kB
PageTables: 70084 kB
PageTables: 72044 kB
PageTables: 72296 kB
PageTables: 74184 kB
PageTables: 76804 kB
PageTables: 79000 kB
PageTables: 80928 kB
PageTables: 82932 kB
PageTables: 84652 kB
PageTables: 86576 kB
PageTables: 88936 kB
PageTables: 90896 kB
PageTables: 94120 kB
PageTables: 96424 kB
PageTables: 98212 kB
PageTables: 100304 kB
PageTables: 101868 kB
PageTables: 103960 kB
PageTables: 105996 kB
PageTables: 108108 kB
PageTables: 109992 kB
PageTables: 111404 kB
PageTables: 113584 kB
PageTables: 114860 kB
PageTables: 116856 kB
PageTables: 118276 kB
PageTables: 120256 kB
PageTables: 120316 kB
PageTables: 120240 kB
PageTables: 120616 kB
PageTables: 120316 kB
PageTables: 121456 kB
PageTables: 121480 kB
PageTables: 121484 kB
PageTables: 121480 kB
PageTables: 121408 kB
PageTables: 121404 kB
PageTables: 121484 kB
PageTables: 121632 kB
PageTables: 121484 kB
PageTables: 121480 kB
PageTables: 120316 kB
PageTables: 120320 kB
PageTables: 120316 kB
PageTables: 120320 kB
PageTables: 121460 kB
PageTables: 121652 kB <==== PEAK AROUND HERE
PageTables: 121500 kB
PageTables: 121540 kB
PageTables: 120096 kB
PageTables: 118136 kB
PageTables: 116188 kB
PageTables: 114192 kB
PageTables: 112236 kB
PageTables: 110240 kB
PageTables: 106556 kB
PageTables: 103792 kB
PageTables: 101820 kB
PageTables: 97916 kB
PageTables: 95900 kB
PageTables: 95120 kB
PageTables: 93104 kB
PageTables: 91848 kB
PageTables: 89852 kB
PageTables: 87860 kB
PageTables: 85896 kB
PageTables: 83868 kB
PageTables: 81940 kB
PageTables: 79944 kB
[...]
Results with Huge Pages
while true; do cat /proc/meminfo | grep PageTable && sleep 3; done
PageTables: 27112 kB
PageTables: 27236 kB
PageTables: 27280 kB
PageTables: 27320 kB
PageTables: 27344 kB
PageTables: 27368 kB
PageTables: 27396 kB
PageTables: 27416 kB
PageTables: 31028 kB
PageTables: 31412 kB
PageTables: 37668 kB
PageTables: 37912 kB
PageTables: 39964 kB
PageTables: 39756 kB
PageTables: 39740 kB
PageTables: 41312 kB
PageTables: 41436 kB
PageTables: 41508 kB
PageTables: 42192 kB
PageTables: 42196 kB
PageTables: 42528 kB
PageTables: 43036 kB
PageTables: 43232 kB
PageTables: 45616 kB
PageTables: 44852 kB
PageTables: 44540 kB
PageTables: 44552 kB
PageTables: 44728 kB
PageTables: 44748 kB
PageTables: 44764 kB
PageTables: 45936 kB
PageTables: 46992 kB
PageTables: 48128 kB
PageTables: 49264 kB
PageTables: 50312 kB
PageTables: 51056 kB
PageTables: 52244 kB
PageTables: 53496 kB
PageTables: 54256 kB
PageTables: 55296 kB
PageTables: 56440 kB
PageTables: 57712 kB
PageTables: 58240 kB
PageTables: 58824 kB
PageTables: 59612 kB
PageTables: 60656 kB
PageTables: 62468 kB
PageTables: 63592 kB
PageTables: 64700 kB
PageTables: 65820 kB
PageTables: 66916 kB
PageTables: 68344 kB
PageTables: 69144 kB
PageTables: 70260 kB
PageTables: 71044 kB
PageTables: 72172 kB
PageTables: 73224 kB
PageTables: 73684 kB
PageTables: 74736 kB
PageTables: 75828 kB
PageTables: 76952 kB
PageTables: 78068 kB
PageTables: 79180 kB
PageTables: 78604 kB
PageTables: 79384 kB
PageTables: 79384 kB
PageTables: 80064 kB
PageTables: 80092 kB
PageTables: 80096 kB
PageTables: 80096 kB
PageTables: 80096 kB
PageTables: 80084 kB
PageTables: 80096 kB
PageTables: 80092 kB
PageTables: 80096 kB <=== PEAK AROUND HERE
PageTables: 80096 kB
PageTables: 79408 kB
PageTables: 79400 kB
PageTables: 79400 kB
PageTables: 79396 kB
PageTables: 79392 kB
PageTables: 79392 kB
PageTables: 79392 kB
PageTables: 79392 kB
PageTables: 79392 kB
PageTables: 79396 kB
PageTables: 79396 kB
PageTables: 70260 kB
[...]
Observations
· without huge pages the page table size peaked at approx. 120 MB
· with huge pages the page table size peaked at approx. 80 MB
In this simple test case the page tables with huge pages needed only about 66%
of the memory used without them.
Because of the chosen test case the memory saving is not that big:
our processes did not reference much memory from the SGA.
The results would be much clearer with larger parts of the SGA (e.g. the buffer
cache) referenced.
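Rather than reading the peak off the sampled output by eye, it can be extracted with a short filter. The sketch below assumes the samples were saved one "PageTables: N kB" line at a time. The helper names pagetable_peak and ratio_percent are made up for illustration.

```shell
#!/bin/sh
# pagetable_peak: reads "PageTables: N kB" lines on stdin and prints
# the largest N seen.
pagetable_peak() {
    awk '/^PageTables:/ { if ($2 + 0 > max) max = $2 + 0 } END { print max + 0 }'
}

# ratio_percent A B: prints A as a rounded percentage of B.
ratio_percent() {
    LC_ALL=C awk -v a="$1" -v b="$2" 'BEGIN { printf "%.0f\n", 100 * a / b }'
}

# Peaks taken from the two runs above (in kB):
ratio_percent 80096 121652    # prints 66
```

To capture a run, the monitoring loop's output can be redirected to a file and then passed to pagetable_peak on stdin.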
Conclusion
HugePages offer some important advantages over AMM, for
instance:
· minimizing the CPU cycles used for scanning memory pages which are
candidates for swapping, thus freeing CPU cycles for your database,
· minimizing the memory spent on managing memory references
The latter point is the most important one. Especially systems
with large memory amounts dedicated to SGA and PGA and many database sessions
(> 100) will benefit from using Huge Pages. The more memory dedicated to SGA
and PGA and the more sessions connected with the database the larger the memory
savings from using Huge Pages will be.
From my point of view, even if AMM simplifies memory management
by including both PGA and SGA, the memory (and CPU) savings from using Huge
Pages are more important than just simplifying memory management.
So if you have an SGA larger than 16 GB and more than 100
sessions, using Huge Pages is definitely worth trying. On systems with only a few
sessions using Huge Pages will give some benefit as well, but only by reducing
the CPU cycles needed for scanning the memory pages.
Hugepage setting itself: vm.nr_hugepages is the total number of hugepages to be allocated on the system.
The number of hugepages required can be determined by finding the maximum amount of SGA memory expected to be used by the system (normally the SGA_MAX_SIZE value, or the sum of them on a server with multiple instances) and dividing it by the size of the hugepages, 2048k or 2M on Linux. To account for Oracle process overhead, add five more hugepages. So, if we want to allow 180G of hugepages, we would use this equation: (180*1024*1024/2048)+5.
This gives us 92165 hugepages for 180G. Note: I took a shortcut in this calculation by expressing the memory in kB rather than in bytes.
To calculate the number in the way I initially described, the equation would be: (180*1024*1024*1024)/(2048*1024), plus the same five extra pages.
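The calculation can be wrapped in a small helper so it is easy to repeat for other SGA sizes. This is only a sketch: the function name hugepages_needed is hypothetical, and the default of five extra pages is simply the allowance mentioned above.

```shell
#!/bin/sh
# hugepages_needed SGA_GB HUGEPAGESIZE_KB [EXTRA_PAGES]
# Number of huge pages needed to hold SGA_GB, plus a few spare pages
# (5 by default, following the rule of thumb in the text).
hugepages_needed() {
    echo $(( $1 * 1024 * 1024 / $2 + ${3:-5} ))
}

hugepages_needed 180 2048    # prints 92165
```

The second argument can be taken from the Hugepagesize line in /proc/meminfo instead of being hard-coded.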
/etc/security/limits.conf
oracle soft memlock 230000000
oracle hard memlock 230000000
/etc/sysctl.conf
vm.nr_hugepages = 92165
# 93273528320 + 1 GB (1073741824) = 94347270144
kernel.shmmax = 94347270144
kernel.shmall =
USE_LARGE_PAGES=only
SGA_TARGET=80G
SGA_MAX_SIZE=80G
MEMORY_MAX_TARGET=0
MEMORY_TARGET=0
Verify the huge page configuration:
cat /proc/meminfo | grep Huge
awk '/Hugepagesize:/{p=$2} / 0 /{next} / kB$/{v[sprintf("%9d GB %-s",int($2/1024/1024),$0)]=$2;next} {h[$0]=$2} /HugePages_Total/{hpt=$2} /HugePages_Free/{hpf=$2} {h["HugePages Used (Total-Free)"]=hpt-hpf} END{for(k in v) print sprintf("%-60s %10d",k,v[k]/p); for (k in h) print sprintf("%9d GB %-s",p*h[k]/1024/1024,k)}' /proc/meminfo|sort -nr|grep --color=auto -iE "^|( HugePage)[^:]*"