本文共 11362 字,大约阅读时间需要 37 分钟。
Netfilter conntrack performance tweaking, v0.8 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Hervé EychenneThis document explains some of the things you need to know for netfilterconntrack (and thus NAT) performance tuning.Latest version of this document can be found at:http://www.wallfire.org/misc/netfilter_conntrack_perf.txt------------------------------------------------------------------------------There are two parameters we can play with:- the maximum number of allowed conntrack entries, which will be called CONNTRACK_MAX in this document- the size of the hash table storing the lists of conntrack entries, which will be called HASHSIZE (see below for a description of the structure)CONNTRACK_MAX is the maximum number of "sessions" (connection tracking entries)that can be handled simultaneously by netfilter in kernel memory.A conntrack entry is stored in a node of a linked list, and there are severallists, each list being an element in a hash table. So each hash table entry(also called a bucket) contains a linked list of conntrack entries.To access a conntrack entry corresponding to a packet, the kernel has to:- compute a hash value according to some defined characteristics of the packet. This is a constant time operation. This hash value will then be used as an index in the hash table, where a list of conntrack entries is stored.- iterate over the linked list of conntrack entries to find the good one. This is a more costly operation, depending on the size of the list (and on the position of the wanted conntrack entry in the list).The hash table contains HASHSIZE linked lists. When the limit is reached(the total number of conntrack entries being stored has reached CONNTRACK_MAX),each list will contain ideally (in the optimal case) aboutCONNTRACK_MAX/HASHSIZE entries.The hash table occupies a fixed amount of non-swappable kernel memory,whether you have any connections or not. But the maximum number of conntrackentries determines how many conntrack entries can be stored (globally into thelinked lists), i.e. how much kernel memory they will be able to occupy at most.This document will now give you hints about how to choose optimal values forHASHSIZE and CONNTRACK_MAX, in order to get the best out of the netfilterconntracking/NAT system.Default values of CONNTRACK_MAX and HASHSIZE============================================By default, both CONNTRACK_MAX and HASHSIZE get average values for"reasonable" use, computed automatically according to the amount ofavailable RAM.Default value of CONNTRACK_MAX------------------------------On i386 architecture, CONNTRACK_MAX = RAMSIZE (in bytes) / 16384 =RAMSIZE (in MegaBytes) * 64.So for example, a 32 bits PC with 512MB of RAM can handle 512*1024^2/16384 =512*64 = 32768 simultaneous netfilter connections by default.But the real formula is:CONNTRACK_MAX = RAMSIZE (in bytes) / 16384 / (x / 32)where x is the number of bits in a pointer (for example, 32 or 64 bits)Please note that:- default CONNTRACK_MAX value will not be inferior to 128- for systems with more than 1GB of RAM, default CONNTRACK_MAX value is limited to 65536 (but can of course be set to more manually).Default value of HASHSIZE-------------------------By default, CONNTRACK_MAX = HASHSIZE * 8. This means that there is an averageof 8 conntrack entries per linked list (in the optimal case, and whenCONNTRACK_MAX is reached), each linked list being a hash table entry(a bucket).On i386 architecture, HASHSIZE = CONNTRACK_MAX / 8 =RAMSIZE (in bytes) / 131072 = RAMSIZE (in MegaBytes) * 8.So for example, a 32 bits PC with 512MB of RAM can store 512*1024^2/128/1024 =512*8 = 4096 buckets (linked lists)But the real formula is:HASHSIZE = CONNTRACK_MAX / 8 = RAMSIZE (in bytes) / 131072 / (x / 32)where x is the number of bits in a pointer (for example, 32 or 64 bits)Please note that:- default HASHSIZE value will not be inferior to 16- for systems with more than 1GB of RAM, default HASHSIZE value is limited to 8192 (but can of course be set to more manually).Reading CONNTRACK_MAX and HASHSIZE==================================Current CONNTRACK_MAX value can be read at runtime, via the /proc filesystem.Before Linux kernel version 2.4.23, use:# cat /proc/sys/net/ipv4/ip_conntrack_maxSince Linux kernel version 2.4.23 (thus Linux 2.6 as well), use:# cat /proc/sys/net/ipv4/netfilter/ip_conntrack_max (old /proc/sys/net/ipv4/ip_conntrack_max is then deprecated!)Current HASHSIZE is always available (for every kernel version) in syslogmessages, as the number of buckets (which is HASHSIZE) is printed there atip_conntrack initialization.Since Linux kernel version 2.4.24 (thus Linux 2.6 as well), current HASHSIZEvalue can be read at runtime with:# cat /proc/sys/net/ipv4/netfilter/ip_conntrack_bucketsModifying CONNTRACK_MAX and HASHSIZE====================================Default CONNTRACK_MAX and HASHSIZE values are reasonable for a typical host,but you may increase them on high-loaded firewalling-only systems.So CONNTRACK_MAX and HASHSIZE values can be changed manually if needed.While accessing a bucket is a constant time operation (hence the interestof having a hash of lists), keep in mind that the kernel has to iterate overa linked list to find a conntrack entry. So the average size of a linkedlist (CONNTRACK_MAX/HASHSIZE in the optimal case when the limit is reached)must not be too big. This ratio is set to 8 by default (when values arecomputed automatically).On systems with enough memory and where performance really matters, you canconsider trying to get an average of one conntrack entry per hash bucket,which means HASHSIZE = CONNTRACK_MAX.Setting CONNTRACK_MAX---------------------Conntrack entries are stored in linked lists, so the maximum number ofconntrack entries (CONNTRACK_MAX) can be easily configured dynamically.Before Linux kernel version 2.4.23, use:# echo $CONNTRACK_MAX > /proc/sys/net/ipv4/ip_conntrack_maxSince Linux kernel version 2.4.23 (thus Linux 2.6 as well), use:# echo $CONNTRACK_MAX > /proc/sys/net/ipv4/netfilter/ip_conntrack_maxwhere $CONNTRACK_MAX is an integer.Setting HASHSIZE----------------For mathematical reasons, hash tables have static sizes. So HASHSIZE must bedetermined before the hash table is created and begins to be filled.Before Linux kernel version 2.4.21, a prime number should be chosen for hashsize, ensuring that the hash table will be efficiently populated. Oddnon-prime numbers or even numbers are strongly discouraged, as the hashdistribution will be sub-optimal.Since Linux kernel version 2.4.21 (thus Linux 2.6 as well), conntrackuses jenkins2b hash algorithm which is happy with all sizes, but powerof 2 works best.If netfilter conntrack is statically compiled in the kernel, the hash tablesize can be set at compile time, or (since kernel 2.6) as a boot option withip_conntrack.hashsize=$HASHSIZEIf netfilter conntrack is compiled as a module, the hash table size can beset at module insertion, with the following command:# modprobe ip_conntrack hashsize=$HASHSIZEwhere $HASHSIZE is an integer.Since 2.6.14, it is possible to set hashsize dynamically at runtime,after boot and module load.Between 2.6.14 and 2.6.19 (included), use:# echo $HASHSIZE > /sys/module/ip_conntrack/parameters/hashsizeSince 2.6.20, use:# echo $HASHSIZE > /sys/module/nf_conntrack/parameters/hashsizeIdeal case: firewalling-only machine------------------------------------In the ideal case, you have a machine _just_ doing packet filtering and NAT(i.e. almost no userspace running, at least none that would have a growingmemory consumption like proxies, ...).The size of kernel memory used by netfilter connection tracking is:size_of_mem_used_by_conntrack (in bytes) = CONNTRACK_MAX * sizeof(struct ip_conntrack) + HASHSIZE * sizeof(struct list_head)where:- sizeof(struct ip_conntrack) can vary quite much, depending on architecture, kernel version and compile-time configuration. To know its size, see the kernel log message at ip_conntrack initialization time. sizeof(struct ip_conntrack) is around 300 bytes on i386 for 2.6.5, but heavy development around 2.6.10 make it vary between 352 and 192 bytes!- sizeof(struct list_head) = 2 * size_of_a_pointer On i386, size_of_a_pointer is 4 bytes.So, on i386, kernel 2.6.5, size_of_mem_used_by_conntrack is aroundCONNTRACK_MAX * 300 + HASHSIZE * 8 (bytes).If we take HASHSIZE = CONNTRACK_MAX (if we have most of the memory dedicatedto firewalling, see "Modifying CONNTRACK_MAX and HASHSIZE" section above),size_of_mem_used_by_conntrack would be around CONNTRACK_MAX * 308 byteson i386 systems, kernel 2.6.5.Now suppose your firewalling-only box has 512MB of RAM (a decent amountof memory considering today's memory prices). You have to spare a bit ofmemory for a few applications (syslog, etc.): 128MB should really be bigenough for a firewall in console mode, for example.The rest can be dedicated to conntrack entries.Then you could set both CONNTRACK_MAX and HASHSIZE approximately to:(512 - 128) * 1024^2 / 308 =~ 1307315 (instead of 32768 for CONNTRACK_MAX,and 4096 for HASHSIZE by default).Since Linux 2.4.21 (thus Linux 2.6 as well), hash algorithm is happy with"power of 2" sizes (it used to be a prime number before).So here we can set CONNTRACK_MAX and HASHSIZE to 1048576 (2^20), for example.This way, you can store about 32 times more conntrack entries than thedefault, and get better performance for conntrack entry access.- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -Last changes on Jan 10, 2008Revision history:0.8 Make the "Ideal case: firewalling-only machine" paragraph a bit more clearer.0.7 Hashsize parameter can be set dynamically since Linux 2.6.14. Thanks to Christopher A. Craig for the suggestion.0.6 Hashsize parameter can be set at boot time with Linux 2.6. Thanks to Tobias Diedrich for pointing this out.0.5 Added further notice about the varying length of the conntrack structure.0.4 Since Linux 2.4.21, hash algorithm is happy with all sizes, not only prime ones. However, power of 2 is best.0.3 Various small precisions.0.2 Information about Linux kernel versions and corresponding /proc entries. (/proc/sys/net/ipv4/netfilter/ip_conntrack_{max,buckets}).0.1 Initial writing, largely based on my discussions with Harald Welte (netfilter maintainer) on the netfilter-devel mailing-list. Many thanks to him!
转载地址:http://prpyo.baihongyu.com/