设为首页收藏本站

LUPA开源社区

 找回密码
 注册
文章 帖子 博客
LUPA开源社区 首页 业界资讯 技术文摘 查看内容

为什么linux下多线程程序如此消耗虚拟内存

2015-2-3 21:16| 发布者: joejoe0332| 查看: 4649| 评论: 0|原作者: 陈斌的博客|来自: 陈斌的博客

摘要: 最近游戏已上线运营,进行服务器内存优化,发现一个非常奇妙的问题,我们的认证服务器(AuthServer)负责跟第三方渠道SDK打交道(登陆和充值),由于采用了curl阻塞的方式,所以这里开了128个线程,奇怪的是每次刚启 ...


3.刨根问底

  经过一番google和代码查看。终于知道了原来是glibc的malloc在这里捣鬼。glibc 版本大于2.11的都会有这个问题:在redhat 的官方文档上:https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/6/html/6.0_Release_Notes/compiler.html

Red Hat Enterprise Linux 6 features version 2.11 of glibc, providing many features and enhancements, including… An enhanced dynamic memory allocation (malloc) behaviour enabling higher scalability across many sockets and cores.This is achieved by assigning threads their own memory pools and by avoiding locking in some situations. The amount of additional memory used for the memory pools (if any) can be controlled using the environment variables MALLOC_ARENA_TEST and MALLOC_ARENA_MAX. MALLOC_ARENA_TEST specifies that a test for the number of cores is performed once the number of memory pools reaches this value. MALLOC_ARENA_MAX sets the maximum number of memory pools used, regardless of the number of cores.

The developer, Ulrich Drepper, has a much deeper explanation on his blog:http://udrepper.livejournal.com/20948.html

Before, malloc tried to emulate a per-core memory pool. Every time when contention for all existing memory pools was detected a new pool is created. Threads stay with the last used pool if possible… This never worked 100% because a thread can be descheduled while executing a malloc call. When some other thread tries to use the memory pool used in the call it would detect contention. A second problem is that if multiple threads on multiple core/sockets happily use malloc without contention memory from the same pool is used by different cores/on different sockets. This can lead to false sharing and definitely additional cross traffic because of the meta information updates. There are more potential problems not worth going into here in detail.

 

The changes which are in glibc now create per-thread memory pools. This can eliminate false sharing in most cases. The meta data is usually accessed only in one thread (which hopefully doesn’t get migrated off its assigned core). To prevent the memory handling from blowing up the address space use too much the number of memory pools is capped. By default we create up to two memory pools per core on 32-bit machines and up to eight memory per core on 64-bit machines. The code delays testing for the number of cores (which is not cheap, we have to read /proc/stat) until there are already two or eight memory pools allocated, respectively.

While these changes might increase the number of memory pools which are created (and thus increase the address space they use) the number can be controlled. Because using the old mechanism there could be a new pool being created whenever there are collisions the total number could in theory be higher. Unlikely but true, so the new mechanism is more predictable.

… Memory use is not that much of a premium anymore and most of the memory pool doesn’t actually require memory until it is used, only address space… We have done internally some measurements of the effects of the new implementation and they can be quite dramatic.

New versions of glibc present in RHEL6 include a new arena allocator design. In several clusters we’ve seen this new allocator cause huge amounts of virtual memory to be used, since when multiple threads perform allocations, they each get their own memory arena. On a 64-bit system, these arenas are 64M mappings, and the maximum number of arenas is 8 times the number of cores. We’ve observed a DN process using 14GB of vmem for only 300M of resident set. This causes all kinds of nasty issues for obvious reasons.

 

Setting MALLOC_ARENA_MAX to a low number will restrict the number of memory arenas and bound the virtual memory, with no noticeable downside in performance – we’ve been recommending MALLOC_ARENA_MAX=4. We should set this in hadoop-env.sh to avoid this issue as RHEL6 becomes more and more common.

  总结一下,glibc为了分配内存的性能的问题,使用了很多叫做arena的memory pool,缺省配置在64bit下面是每一个arena为64M,一个进程可以最多有 cores * 8个arena。假设你的机器是4核的,那么最多可以有4 * 8 = 32个arena,也就是使用32 * 64 = 2048M内存。 当然你也可以通过设置环境变量来改变arena的数量.例如export MALLOC_ARENA_MAX=1

  hadoop推荐把这个值设置为4。当然了,既然是多核的机器,而arena的引进是为了解决多线程内存分配竞争的问题,那么设置为cpu核的数量估计也是一个不错的选择。设置这个值以后最好能对你的程序做一下压力测试,用以看看改变arena的数量是否会对程序的性能有影响。

  mallopt(M_ARENA_MAX, xxx)如果你打算在程序代码中来设置这个东西,那么可以调用mallopt(M_ARENA_MAX, xxx)来实现,由于我们AuthServer采用了预分配的方式,在各个线程内并没有分配内存,所以不需要这种优化,在初始化的时候采用mallopt(M_ARENA_MAX, 1)将其关掉,设置为0,表示系统按CPU进行自动设置。


4.意外发现

  想到tcmalloc小对象才从线程自己的内存池分配,大内存仍然从中央分配区分配,不知道glibc是如何设计的,于是将上面程序中线程每次分配的内存从1k调整为1M,果然不出所料,再分配完64M后,仍然每次都会增加1M,由此可见,新版 glibc完全借鉴了tcmalloc的思想。

  忙了几天的问题终于解决了,心情大好,通过今天的问题让我知道,作为一个服务器程序员,如果不懂编译器和操作系统内核,是完全不合格的,以后要加强这方面的学习。

转自 http://blog.jobbole.com/83878/


酷毙

雷人

鲜花

鸡蛋

漂亮
  • 快毕业了,没工作经验,
    找份工作好难啊?
    赶紧去人才芯片公司磨练吧!!

最新评论

关于LUPA|人才芯片工程|人才招聘|LUPA认证|LUPA教育|LUPA开源社区 ( 浙B2-20090187 浙公网安备 33010602006705号   

返回顶部