VMM page replacement tuning

Source: IBM

The memory management algorithm tries to keep the size of the free list and the percentage of real memory occupied by persistent segment pages within specified bounds.

These bounds, discussed in Real-memory management, can be altered with the vmo command, which can only be run by the root user. Changes made by this tool remain in effect until the next reboot of the system. To determine whether the vmo command is installed and available, run the following command:

# lslpp -lI bos.perf.tune

Executing the vmo command with the -a option displays the current parameter settings. For example:

# vmo -a
        cpu_scale_memp = 8
 data_stagger_interval = 161
                 defps = 1
   force_relalias_lite = 0
             framesets = 2
             htabscale = -1
     kernel_heap_psize = 4096
  large_page_heap_size = 0
          lgpg_regions = 0
             lgpg_size = 0
       low_ps_handling = 1
       lru_file_repage = 1
     lru_poll_interval = 0
             lrubucket = 131072
            maxclient% = 80
               maxfree = 1088
               maxperm = 3118677
              maxperm% = 80
                maxpin = 3355444
               maxpin% = 80
       mbuf_heap_psize = 4096
       memory_affinity = 1
         memory_frames = 4194304
         memplace_data = 2
  memplace_mapped_file = 2
memplace_shm_anonymous = 2
    memplace_shm_named = 2
        memplace_stack = 2
         memplace_text = 2
memplace_unmapped_file = 2
              mempools = 1
               minfree = 960
               minperm = 779669
              minperm% = 20
             nokilluid = 0
               npskill = 1536
             npsrpgmax = 12288
             npsrpgmin = 9216
           npsscrubmax = 12288
           npsscrubmin = 9216
               npswarn = 6144
      num_spec_dataseg = 0
             numpsblks = 196608
     page_steal_method = 0
          pagecoloring = n/a
       pinnable_frames = 3868256
 pta_balance_threshold = n/a
   relalias_percentage = 0
              rpgclean = 0
            rpgcontrol = 2
                 scrub = 0
            scrubclean = 0
 soft_min_lgpgs_vmpool = 0
      spec_dataseg_int = 512
      strict_maxclient = 1
        strict_maxperm = 0
              v_pinshm = 0
  vm_modlist_threshold = -1
       vmm_fork_policy = 1

The purpose of the free list is to keep track of real-memory page frames released by terminating processes and to supply page frames to requestors immediately, without forcing them to wait for page steals and the accompanying I/O to complete.

The minfree limit specifies the free-list size below which page stealing to replenish the free list is started. The maxfree parameter is the size above which stealing ends. When strict file cache limits are enabled, through the strict_maxperm or strict_maxclient parameters, the minfree value is also used to decide when page stealing starts: with the strict_maxperm parameter enabled, stealing starts when the number of persistent pages comes within minfree pages of the maxperm value, and with the strict_maxclient parameter enabled, it starts when the number of client pages comes within minfree pages of the maxclient value.

The objectives in tuning these limits are to ensure the following:

  • Any activity that has critical response-time objectives can always get the page frames it needs from the free list.
  • The system does not experience unnecessarily high levels of I/O because of premature stealing of pages to expand the free list.

The default values of the minfree and maxfree parameters depend on the memory size of the machine. The difference between the maxfree and minfree parameters should always be equal to or greater than the value of the maxpgahead parameter, if you are using JFS. For Enhanced JFS, the difference between the maxfree and minfree parameters should always be equal to or greater than the value of the j2_maxPageReadAhead parameter. If you are using both JFS and Enhanced JFS, you should set the value of the minfree parameter to a number that is greater than or equal to the larger pageahead value of the two file systems.
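The sizing rule above comes down to simple arithmetic. The following Python sketch is illustrative only: the function name is hypothetical, and the read-ahead values passed in are made-up inputs rather than settings read from a real system. When both file systems are in use, the sketch assumes the gap between maxfree and minfree should cover the larger of the two read-ahead values.

```python
def free_list_gap_ok(minfree, maxfree, maxpgahead=None, j2_max_page_read_ahead=None):
    """Return True if maxfree - minfree covers the largest read-ahead in use.

    Pass maxpgahead when JFS is in use, j2_max_page_read_ahead when
    Enhanced JFS is in use, or both. All values are counts of 4 KB pages.
    """
    read_aheads = [v for v in (maxpgahead, j2_max_page_read_ahead) if v is not None]
    if not read_aheads:
        # No file-system constraint supplied; only check that the bounds are ordered.
        return maxfree >= minfree
    return (maxfree - minfree) >= max(read_aheads)


# minfree/maxfree are the defaults quoted in this document;
# the read-ahead values are hypothetical.
print(free_list_gap_ok(960, 1088, maxpgahead=8))                # True
print(free_list_gap_ok(960, 1088, j2_max_page_read_ahead=256))  # False
```

A False result suggests raising maxfree (or lowering the read-ahead value) until the gap is large enough.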

The minfree and maxfree parameter values are different if there is more than one memory pool. Memory pools were introduced in AIX® 4.3.3 for MP systems with large amounts of RAM. Each memory pool has its own minfree and maxfree values. Prior to AIX 5.3, the minfree and maxfree values shown by the vmo command are the sum of the minfree and maxfree values for all memory pools. Starting with AIX 5.3, the values shown by the vmo command are per memory pool. The number of memory pools can be displayed with vmo -L mempools.

A less precise but more comprehensive tool for investigating an appropriate size for minfree is the vmstat command. The following is a portion of vmstat command output on a system where the minfree value is being reached:

# vmstat 1
kthr     memory             page              faults        cpu
----- ----------- ------------------------ ------------ -----------
 r  b   avm   fre  re  pi  po  fr   sr  cy  in   sy  cs   us sy id wa
 2  0 70668   414   0   0   0   0    0   0 178 7364  257  35 14  0 51
 1  0 70669   755   0   0   0   0    0   0 196 19119 272  40 20  0 41
 1  0 70704   707   0   0   0   0    0   0 190 8506  272  37  8  0 55
 1  0 70670   725   0   0   0   0    0   0 205 8821  313  41 10  0 49
 6  4 73362   123   0   5  36 313 1646   0 361 16256 863  47 53  0  0
 5  3 73547   126   0   6  26 152  614   0 324 18243 1248 39 61  0  0
 4  4 73591   124   0   3  11  90  372   0 307 19741 1287 39 61  0  0
 6  4 73540   127   0   4  30 122  358   0 340 20097 970  44 56  0  0
 8  3 73825   116   0  18  22 220  781   0 324 16012 934  51 49  0  0
 8  4 74309    26   0  45  62 291 1079   0 352 14674 972  44 56  0  0
 2  9 75322     0   0  41  87 283  943   0 403 16950 1071 44 56  0  0
 5  7 75020    74   0  23 119 410 1611   0 353 15908 854  49 51  0  0

In the above example output, you can see that the minfree value of 120 is constantly being reached. Therefore, page replacement occurs and in this particular case, the free list even reaches 0 at one point. When that happens, threads needing free frames get blocked and cannot run until page replacement frees up some pages. To prevent this situation, you might consider increasing the minfree and maxfree values. If you conclude that you should always have at least 1000 pages free per memory pool, run the following command:

# vmo -o minfree=1000 -o maxfree=1008

To make this a permanent change, include the -p flag:

# vmo -o minfree=1000 -o maxfree=1008 -p 
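Before changing these values, it can help to summarize how often the free list is under pressure. The following Python sketch parses a few rows copied from the vmstat sample earlier in this section and reports the lowest free-list size (fre) and the number of intervals in which pages were being scanned (sr greater than 0). The function name and column constants are illustrative assumptions, and the column positions apply only to the vmstat layout shown in this document.

```python
# Rows copied from the vmstat sample in this document (header lines omitted).
SAMPLE = """\
 2  0 70668   414   0   0   0   0    0   0 178 7364  257  35 14  0 51
 1  0 70669   755   0   0   0   0    0   0 196 19119 272  40 20  0 41
 6  4 73362   123   0   5  36 313 1646   0 361 16256 863  47 53  0  0
 8  4 74309    26   0  45  62 291 1079   0 352 14674 972  44 56  0  0
 2  9 75322     0   0  41  87 283  943   0 403 16950 1071 44 56  0  0
 5  7 75020    74   0  23 119 410 1611   0 353 15908 854  49 51  0  0
"""

FRE, SR = 3, 8  # zero-based positions of the "fre" and "sr" columns


def summarize(text):
    """Return (lowest free-list size, number of intervals with page scanning)."""
    rows = [line.split() for line in text.splitlines() if line.strip()]
    lowest_fre = min(int(row[FRE]) for row in rows)
    scanning = sum(1 for row in rows if int(row[SR]) > 0)
    return lowest_fre, scanning


lowest, scanning = summarize(SAMPLE)
print(lowest, scanning)  # 0 4
```

A lowest value at or near 0 together with frequent scanning intervals is the pattern described above, where threads can block waiting for free frames.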

Starting with AIX 5.3, the default value of the minfree parameter is increased to 960 per memory pool and the default value of the maxfree parameter is increased to 1088 per memory pool.

The vmo -o mempools=number_of_memory_pools command allows you to change the number of memory pools that are configured at system boot time.

The mempools option is therefore not a dynamic change. Do not change this value without a good understanding of the behavior of the system and the VMM algorithms. You cannot change the mempools value on a UP kernel. On an MP kernel, the change is written to the kernel file.

This tunable should only be adjusted when advised by an IBM® service representative.

In AIX® 5.3, the LRU algorithm can use either lists or the page frame table. Prior to AIX 5.3, the page frame table method was the only method available. The list-based algorithm provides a list of pages to scan for each type of segment.

The following is a list of the types of segments:

  • Working
  • Persistent
  • Client
  • Compressed

If WLM is enabled, there are lists for classes as well. You can switch between the list-based LRU feature and the original physical-address-based scanning with the page_steal_method parameter of the vmo command. In AIX 5.3, the default value for the page_steal_method parameter is 0, which means that the list-based LRU feature is disabled and physical-address-based scanning is used. If the page_steal_method parameter is set to 1, lists are used to scan pages. The value for the page_steal_method parameter takes effect after a bosboot and reboot.

Note: With list-based scanning, buckets that are specified with the lrubucket parameter are still used, but buckets can overlap on multiple lists and include a count of the number of pages that were scanned.

Tuning with the lrubucket parameter can reduce scanning overhead on large memory systems.

The page-replacement algorithm scans memory frames looking for a free frame. During this scan, reference bits of pages are reset, and if a free frame has not been found, a second scan is done. In the second scan, if the reference bit is still off, the frame will be used for a new page (page replacement).

On large memory systems, there may be too many frames to scan in one pass, so memory is divided into buckets of frames. The page-replacement algorithm scans the frames in a bucket and then starts over on that bucket for the second scan before moving on to the next bucket. The default number of frames in a bucket is 131072, which corresponds to 512 MB of RAM. The number of frames is tunable with the command vmo -o lrubucket=new value, and the value is in 4 KB frames.
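Because the lrubucket value is a count of 4 KB frames, converting between frames and megabytes is straightforward. The helper names in this Python sketch are hypothetical and exist only to show the arithmetic behind the 131072-frame, 512 MB default.

```python
PAGE_SIZE_BYTES = 4096  # lrubucket is expressed in 4 KB frames


def bucket_size_mb(frames):
    """Size in MB of a bucket holding the given number of 4 KB frames."""
    return frames * PAGE_SIZE_BYTES / (1024 * 1024)


def frames_for_mb(megabytes):
    """Number of 4 KB frames needed for a bucket of the given size in MB."""
    return megabytes * 1024 * 1024 // PAGE_SIZE_BYTES


print(bucket_size_mb(131072))  # 512.0
print(frames_for_mb(1024))     # 262144
```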

The operating system takes advantage of the varying requirements for real memory by leaving in memory pages of files that have been read or written.

If the file pages are requested again before their page frames are reassigned, this technique saves an I/O operation. These file pages may be from local or remote (for example, NFS) file systems.

The ratio of page frames used for files versus those used for computational (working or program text) segments is loosely controlled by the minperm and maxperm values:

  • If the percentage of RAM occupied by file pages rises above maxperm, page replacement steals only file pages.
  • If the percentage of RAM occupied by file pages falls below minperm, page replacement steals both file and computational pages.
  • If the percentage of RAM occupied by file pages is between minperm and maxperm, page replacement steals only file pages unless the number of file repages is higher than the number of computational repages.

In a particular workload, it might be worthwhile to emphasize the avoidance of file I/O. In another workload, keeping computational segment pages in memory might be more important. To understand what the ratio is in the untuned state, use the vmstat command with the -v option.

# vmstat -v
              1048576 memory pages                                         
              1002054 lruable pages                                        
               478136 free pages                                           
                    1 memory pools                                         
                95342 pinned pages                                         
                 80.1 maxpin percentage                                    
                 20.0 minperm percentage                                   
                 80.0 maxperm percentage                                   
                 36.1 numperm percentage                                   
               362570 file pages                                           
                  0.0 compressed percentage                                
                    0 compressed pages                                     
                 35.0 numclient percentage                                 
                 80.0 maxclient percentage                                 
               350782 client pages                                         
                    0 remote pageouts scheduled                            
                   80 pending disk I/Os blocked with no pbuf               
                    0 paging space I/Os blocked with no psbuf              
                 3312 filesystem I/Os blocked with no fsbuf                
                    0 client filesystem I/Os blocked with no fsbuf         
               474178 external pager filesystem I/Os blocked with no fsbuf

The numperm value gives the number of file pages in memory, 362570. This is 36.1 percent of real memory.
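The percentages in the vmstat -v output can be cross-checked against the page counts. The Python sketch below uses only the counts from the sample above. The choice of denominator is an inference from those numbers, not something stated in the text: dividing by lruable pages lands close to the reported 36.1 and 35.0 percent, whereas dividing by total memory pages does not.

```python
# Page counts copied from the vmstat -v sample in this document.
memory_pages = 1048576
lruable_pages = 1002054
file_pages = 362570
client_pages = 350782


def pct(part, whole):
    """Percentage of 'whole' represented by 'part'."""
    return 100.0 * part / whole


print(round(pct(file_pages, lruable_pages), 2))    # 36.18
print(round(pct(file_pages, memory_pages), 2))     # 34.58
print(round(pct(client_pages, lruable_pages), 2))  # 35.01
```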

If you notice that the system is paging out to paging space, it could be that the file repaging rate is higher than the computational repaging rate while the number of file pages in memory is below the maxperm value. In this case, we can prevent computational pages from being paged out by lowering the maxperm value to something lower than the numperm value. Since the numperm value is approximately 36%, we could lower the maxperm value to 30%. With numperm above maxperm, the page replacement algorithm steals only file pages. If the lru_file_repage parameter is set to 0, only file pages are stolen if the number of file pages in memory is greater than the value of the minperm parameter.
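The selection rules described in this section can be summarized as a small decision function. This Python sketch is a simplified reading of the text, not the VMM implementation: the function name, the return labels, and the repage counts are hypothetical, and behavior when numperm exactly equals a threshold is an assumption.

```python
def steal_targets(numperm_pct, minperm_pct, maxperm_pct,
                  file_repages, comp_repages, lru_file_repage=1):
    """Return which page classes page replacement may steal.

    Simplified from the rules in the text; exact-equality handling
    at the minperm and maxperm boundaries is an assumption.
    """
    if numperm_pct > maxperm_pct:
        return "file"
    if numperm_pct < minperm_pct:
        return "file+computational"
    # numperm is between minperm and maxperm.
    if lru_file_repage == 0:
        return "file"
    if file_repages > comp_repages:
        return "file+computational"
    return "file"


# numperm of 36.1% as in the sample; the repage counts are made up.
print(steal_targets(36.1, 20, 80, file_repages=500, comp_repages=100))  # file+computational
print(steal_targets(36.1, 20, 30, file_repages=500, comp_repages=100))  # file
print(steal_targets(36.1, 20, 80, 500, 100, lru_file_repage=0))         # file
```

The second call mirrors the tuning step described above: once maxperm is lowered to 30%, numperm exceeds it and only file pages are candidates.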

The strict_maxperm option of the vmo command, when set to 1, places a hard limit on how much memory is used for a persistent file cache by making the maxperm value be the upper limit for this file cache.

When the upper limit is reached, least recently used (LRU) page replacement is performed on persistent pages.

Attention: The strict_maxperm option should only be enabled for those cases that require a hard limit on the persistent file cache. Improper use of the strict_maxperm option can cause unexpected system behavior because it changes the VMM method of page replacement.

The maxclient parameter represents the maximum number of client pages that can be used for buffer cache if the strict_maxclient parameter is set to 1, which is the default value.

The Enhanced JFS file system uses client pages for its buffer cache. The limit on client pages in real memory is enforced using the maxclient parameter, which is tunable. If the value of the strict_maxclient parameter is set to 0, the maxclient parameter acts as a soft limit. This means that the number of client pages can exceed the value of the maxclient parameter, and if that happens, only client file pages are stolen rather than computational pages when the client LRU daemon runs.

The LRU daemon begins to run when the number of client pages is within the number of minfree pages of the maxclient parameter's threshold. The LRU daemon attempts to steal client pages that have not been referenced recently. If the number of client pages is lower than the value of the maxclient parameter but higher than the value of the minperm parameter, and the value of the lru_file_repage parameter is set to 1, the LRU daemon references the repage counters.

If the value of the file repage counter is higher than the value of the computational repage counter, computational pages, which are the working storage, are selected for replacement. If the value of the computational repage counter exceeds the value of the file repage counter, file pages are selected for replacement.

If the value of the lru_file_repage parameter is set to 0 and the number of file pages exceeds the value of the minperm parameter, file pages are selected for replacement. If the number of file pages is lower than the value of the minperm parameter, any page that has not been referenced can be selected for replacement.

If the number of client pages exceeds the value of the maxclient parameter, which is possible if the value of the strict_maxclient parameter equals 0, file pages are selected for replacement.

The maxclient parameter also affects NFS clients and compressed pages. The maxclient parameter should generally be set to a value that is less than or equal to the maxperm parameter, particularly when the strict_maxperm parameter is enabled (set to 1).