
Tuesday, March 15, 2011

Sort memory capping out

I had a curious problem today: on an otherwise lightly loaded machine with plenty of memory left, logging in as the Oracle user took tens of seconds to complete. Since it was the sourcing of a shell script that was stalling, I turned on tracing to identify where the time was spent:

$ set -x
$ . /path/to/oraenv
...
+ nawk { print $2 }
+ sort
...


The whole delay was spent in sort(1) for no obvious reason. Truss shows the system call:
24.8167 sysconfig(_CONFIG_AVPHYS_PAGES) = 1500736
0.0011 sysconfig(_CONFIG_PAGESIZE) = 8192
0.0008 getpid() = 24178 [24177]


It took 24 seconds to get the number of available memory pages, an operation that worked fine in the global zone. The sysconfig source shows us that the call is very different for a global zone and for a memory-capped zone:

if (!INGLOBALZONE(curproc) &&
    curproc->p_zone->zone_phys_mcap != 0) {
        pgcnt_t cap, rss, free;
        vmusage_t in_use;
        size_t cnt = 1;

        cap = btop(curproc->p_zone->zone_phys_mcap);
        if (cap > physinstalled)
                return (freemem);

        if (vm_getusage(VMUSAGE_ZONE, 1, &in_use, &cnt,
            FKIOCTL) != 0)

If a physical memory cap is set for the zone and it is less than the amount of physical memory in the machine, vm_getusage will be called. It will in turn look at every memory segment for every process, which can take quite a while if the zone is a heavy allocator of memory; in this case the zone was using about 50GB of memory. This is not something you want to do every time a shell script calls sort. If you have ever used prstat -Z with large local zones you have seen the effects of this: it can take a long time.

Comment from the source:
"This file implements the getvmusage() private system call.
getvmusage() counts the amount of resident memory pages and swap
reserved by the specified process collective. A "process collective" is
the set of processes owned by a particular, zone, project, task, or user."


The source of the problem in sort was in utility.c:
size_t phys_total = sysconf(_SC_PHYS_PAGES) * sysconf(_SC_PAGESIZE);
It seems like no other utilities in Solaris use sysconf(_SC_PHYS_PAGES), which is why we had no other problems.

The short-term solution was to disable the physical memory cap for these zones:
# rcapadm -z zone01 -m 0