Mic's Technical Notes: on Linux, WinCE 6.0... and other gripes mic's another world http://www.ootroo.com/zblog

My latest posts

  • Partitioning error on an HP laptop

    2008-5-12

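    The company bought a new HP KL535AV machine. I wanted to install Linux on it, but partitioning with PQMagic 8.0 failed.

    I called HP customer service and was given another phone number, which turned out to be Red Flag Linux technical support. They kept asking me for a Red Flag serial number, which of course I don't have; I use Fedora.

    So I had to look for articles online. One person said you need to run chkdsk /f first. I tried it and it really worked! I'm recording it here in the hope that it helps others who hit the same problem.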

     

     

  • page allocation failed

    2008-5-09

    After some searching, these are the fixes people report:

    1. increase /proc/sys/vm/min_free_kbytes

    2. apply the following mm patch

    <piggin@cyberone.com.au>
    [PATCH] mm: tune the page allocator thresholds

    without patch:
          pages_min   pages_low   pages_high
    dma        4          8          12
    normal   234        468         702
    high     128        256         384

    with patch:
          pages_min   pages_low   pages_high
    dma       17         21          25
    normal   939       1173        1408
    high     128        160         192

    without patch:
                                 | GFP_KERNEL        | GFP_ATOMIC
    allocate immediately         |   9 dma, 469 norm |  9 dma, 469 norm
    allocate after waking kswapd |   5 dma, 234 norm |  3 dma,  88 norm
    allocate after synch reclaim |   5 dma, 234 norm |  n/a

    with patch:
                                 | GFP_KERNEL         | GFP_ATOMIC
    allocate immediately         |  22 dma, 1174 norm | 22 dma, 1174 norm
    allocate after waking kswapd |  18 dma,  940 norm |  6 dma,  440 norm
    allocate after synch reclaim |  18 dma,  940 norm |  n/a

    So the buffer between GFP_KERNEL and GFP_ATOMIC allocations is:

    2.6.8      | 465 dma, 117 norm, 582 tot = 2328K
    2.6.10-rc  |   2 dma, 146 norm, 148 tot =  592K
    patch      |  12 dma, 500 norm, 512 tot = 2048K

    Which is getting pretty good.

    kswapd starts at:
    2.6.8     477 dma, 496 norm, 973 total
    2.6.10-rc   8 dma, 468 norm, 476 total
    patched    17 dma, 939 norm, 956 total

    So in terms of total pages, that's looking similar to 2.6.8.

    Signed-off-by: Andrew Morton <akpm@osdl.org>
    Signed-off-by: Linus Torvalds <torvalds@osdl.org>

    How to apply the patch

    Yes, as far as I know, the -mm patchset is for the vanilla (www.kernel.org) kernel. To apply a patch to your kernel that you downloaded from www.kernel.org, first make sure that the patch you downloaded is for the kernel version you downloaded. Then extract the kernel source and put the patch in the root kernel source directory (such as /usr/src/linux-2.6.17.1/). Now decompress the patch and enter this command, as root, from that directory:

    Code:
    patch -p1 < [TheNameOfThePatchFile]
    Note the "<". Forgetting to put that in will probably mess up your patch file.

    Happy patching!

    --Dane


     

  • A good article I bookmarked, "When Linux Runs Out of Memory", still to be translated into Chinese

    2008-5-08

    When Linux Runs Out of Memory

    Perhaps you rarely face it, but once you do, you surely know what's wrong: lack of free memory, or Out of Memory (OOM). The results are typical: you can no longer allocate more memory and the kernel kills a task (usually the current running one). Heavy swapping usually accompanies this situation, so both screen and disk activity reflect this.

    At the bottom of this problem lie other questions: how much memory do you want to allocate? How much does the operating system (OS) allocate for you? The basic reason of OOM is simple: you've asked for more than the available virtual memory space. I say "virtual" because RAM isn't the only place counted as free memory; any swap areas apply.

    Exploring OOM

    To begin exploring OOM, first type and run this code snippet that allocates huge blocks of memory:

    #include <stdio.h>
    #include <stdlib.h>

    #define MEGABYTE (1024 * 1024)

    int main(int argc, char *argv[])
    {
        void *myblock = NULL;
        int count = 0;

        /* Program A: request 1MB blocks forever without ever touching them. */
        while (1)
        {
            myblock = malloc(MEGABYTE);
            if (!myblock) break;
            printf("Currently allocating %d MB\n", ++count);
        }

        exit(0);
    }

    Compile the program, run it, and wait for a moment. Sooner or later it will go OOM. Now compile the next program, which allocates huge blocks and fills them with 1:

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>   /* memset() */

    #define MEGABYTE (1024 * 1024)

    int main(int argc, char *argv[])
    {
        void *myblock = NULL;
        int count = 0;

        /* Program B: same loop, but write to every block so its pages get used. */
        while (1)
        {
            myblock = malloc(MEGABYTE);
            if (!myblock) break;
            memset(myblock, 1, MEGABYTE);
            printf("Currently allocating %d MB\n", ++count);
        }

        exit(0);
    }

    Notice the difference? Likely, program A allocates more memory blocks than program B does. It's also obvious that you will see the word "Killed" not too long after executing program B. Both programs end for the same reason: there is no more space available. More specifically, program A ends gracefully because of a failed malloc(). Program B ends because of the Linux kernel's so-called OOM killer.

    The first fact to observe is the amount of allocated blocks. Assume that you have 256MB of RAM and 888MB of swap (my current Linux settings). Program B ended at:

    Currently allocating 1081 MB

    On the other hand, program A ended at:

    Currently allocating 3056 MB

    Where did A get that extra 1975MB? Did I cheat? Of course not! If you look closely at both listings, you will find that program B fills the allocated memory space with 1s, while A merely allocates without doing anything. This happens because Linux employs deferred page allocation. In other words, allocation doesn't actually happen until the moment you really use the memory; for example, by writing data to the block. So, unless you touch the block, you can keep asking for more. The technical term for this is optimistic memory allocation.

    Checking /proc/<pid>/status on both programs will reveal the facts. Here's program A:

    $ cat /proc/<pid of program A>/status
    VmPeak: 3141876 kB
    VmSize: 3141876 kB
    VmLck: 0 kB
    VmHWM: 12556 kB
    VmRSS: 12556 kB
    VmData: 3140564 kB
    VmStk: 88 kB
    VmExe: 4 kB
    VmLib: 1204 kB
    VmPTE: 3072 kB

    Here's program B, shortly before the OOM killer struck:

    $ cat /proc/<pid of program B>/status 
    VmPeak: 1072512 kB
    VmSize: 1072512 kB
    VmLck: 0 kB
    VmHWM: 234636 kB
    VmRSS: 204692 kB
    VmData: 1071200 kB
    VmStk: 88 kB
    VmExe: 4 kB
    VmLib: 1204 kB
    VmPTE: 1064 kB

    VmRSS deserves further explanation. RSS stands for "Resident Set Size." It tells you how much of the memory owned by the task currently resides in RAM. Also note that before B reaches OOM, swap usage is almost 100 percent (most of the 888MB), while A uses no swap at all. Clearly, malloc() itself does nothing more than reserve a memory area.
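
    To watch these numbers from inside a program instead of running cat, the status file can be parsed directly. The following is a minimal sketch, not part of the original article: the function names status_field_kb() and self_status_kb() are made up for illustration, and it assumes the "Key: value kB" layout shown in the listings above.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Return the kB value of a "Key: <n> kB" line in status-style text, or -1. */
long status_field_kb(const char *text, const char *key)
{
    size_t keylen = strlen(key);
    const char *line = text;

    while (line && *line) {
        if (strncmp(line, key, keylen) == 0 && line[keylen] == ':')
            return strtol(line + keylen + 1, NULL, 10);
        line = strchr(line, '\n');   /* advance to the next line */
        if (line)
            line++;
    }
    return -1;
}

/* Read /proc/self/status (Linux only) and return one field in kB, or -1. */
long self_status_kb(const char *key)
{
    char buf[8192];
    size_t n;
    FILE *f = fopen("/proc/self/status", "r");

    if (!f)
        return -1;
    n = fread(buf, 1, sizeof(buf) - 1, f);
    fclose(f);
    buf[n] = '\0';
    return status_field_kb(buf, key);
}
```

    Calling self_status_kb("VmSize") and self_status_kb("VmRSS") inside the allocation loop of either program should show VmSize growing with every malloc(), while VmRSS grows only when the blocks are written to.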

    Another question also arises. "Even without touching the pages, why is the allocation limit 3056MB?" This exposes an unseen limit. For every application in a 32-bit system, there is 4GB of address space available for usage. The Linux kernel usually splits the linear address to provide 0 to 3GB for user space and 3GB to 4GB for kernel space. User space is a room where a task can do anything it wants, while kernel space is solely for the kernel. If you try to cross this 3GB border, you will get a segmentation fault.

    (Side note: There is a kernel patch that gives the whole 4GB to userspace, at the cost of some context-switching.)

    The conclusion is that OOM happens for two technical reasons, or a combination of both:

    1. No more pages are available in the VM.
    2. No more user address space is available.
    3. Both #1 and #2.

    Thus the strategies to prevent those circumstances are:

    1. Know how large the user address space is.
    2. Know how many pages are available.

    When you ask for a memory block, usually by using malloc(), you're asking the runtime C library whether a preallocated block is available. This block's size must at least equal the user request. If there is already a memory block available, malloc() will assign this block to the user and mark it as "used." Otherwise, malloc() must allocate more memory by extending the heap. All requested blocks go in an area called the heap. Do not confuse it with the stack, because the stack stores local variables and function return addresses. These two sections have different jobs.

    Where is the heap located in the address space? The process address map can tell you exactly where:

    $ cat /proc/self/maps
    0039d000-003b2000 r-xp 00000000 16:41 1080084 /lib/ld-2.3.3.so
    003b2000-003b3000 r-xp 00014000 16:41 1080084 /lib/ld-2.3.3.so
    003b3000-003b4000 rwxp 00015000 16:41 1080084 /lib/ld-2.3.3.so
    003b6000-004cb000 r-xp 00000000 16:41 1080085 /lib/tls/libc-2.3.3.so
    004cb000-004cd000 r-xp 00115000 16:41 1080085 /lib/tls/libc-2.3.3.so
    004cd000-004cf000 rwxp 00117000 16:41 1080085 /lib/tls/libc-2.3.3.so
    004cf000-004d1000 rwxp 004cf000 00:00 0
    08048000-0804c000 r-xp 00000000 16:41 130592 /bin/cat
    0804c000-0804d000 rwxp 00003000 16:41 130592 /bin/cat
    0804d000-0806e000 rwxp 0804d000 00:00 0 [heap]
    b7d95000-b7f95000 r-xp 00000000 16:41 2239455 /usr/lib/locale/locale-archive
    b7f95000-b7f96000 rwxp b7f95000 00:00 0
    b7fa9000-b7faa000 r-xp b7fa9000 00:00 0 [vdso]
    bfe96000-bfeab000 rw-p bfe96000 00:00 0 [stack]

    This is an actual address space layout shown for cat, but you may get different results. It is up to the Linux kernel and the runtime C library to arrange them. Notice that recent Linux kernel versions (2.6.x) kindly label the memory area, but don't completely rely on them.
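
    Each line of the map starts with the address range and the permissions, so a program can read its own layout. This is a small sketch under the assumption that lines follow the "start-end perms ..." format shown above; struct vma_range and parse_maps_line() are illustrative names, not an existing API.

```c
#include <stdio.h>

struct vma_range {
    unsigned long start;
    unsigned long end;
    char perms[5];    /* e.g. "rwxp" plus the terminating NUL */
};

/* Parse the address range and permissions of one maps line; 0 on success. */
int parse_maps_line(const char *line, struct vma_range *out)
{
    if (sscanf(line, "%lx-%lx %4s", &out->start, &out->end, out->perms) != 3)
        return -1;
    return 0;
}
```

    Applied to the [heap] line above, end minus start gives 0x21000 bytes, that is, 132KB of heap for cat.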

    The heap is basically free space not already given for program mapping and stack; thus, it narrows down the available address space. It's not a full 3GB, but it's 3GB minus everything else that's mapped. The bigger your program's code segment is, the less space you have for heap. The more dynamic libraries you link into your program, the less space you get for the heap. This is important to remember.

    How does the map for program A look when it can't allocate more memory blocks? With a trivial change to pause the program (see loop.c and loop-calloc.c) just before it exits, the final map is:

    0009a000-0039d000 rwxp 0009a000 00:00 0 ---------> (allocated block)
    0039d000-003b2000 r-xp 00000000 16:41 1080084 /lib/ld-2.3.3.so
    003b2000-003b3000 r-xp 00014000 16:41 1080084 /lib/ld-2.3.3.so
    003b3000-003b4000 rwxp 00015000 16:41 1080084 /lib/ld-2.3.3.so
    003b6000-004cb000 r-xp 00000000 16:41 1080085 /lib/tls/libc-2.3.3.so
    004cb000-004cd000 r-xp 00115000 16:41 1080085 /lib/tls/libc-2.3.3.so
    004cd000-004cf000 rwxp 00117000 16:41 1080085 /lib/tls/libc-2.3.3.so
    004cf000-004d1000 rwxp 004cf000 00:00 0
    005ce000-08048000 rwxp 005ce000 00:00 0 ---------> (allocated block)
    08048000-08049000 r-xp 00000000 16:06 1267 /test-program/loop
    08049000-0804a000 rwxp 00000000 16:06 1267 /test-program/loop
    0806d000-b7f62000 rwxp 0806d000 00:00 0 ---------> (allocated block)
    b7f73000-b7f75000 rwxp b7f73000 00:00 0 ---------> (allocated block)
    b7f75000-b7f76000 r-xp b7f75000 00:00 0 [vdso]
    b7f76000-bf7ee000 rwxp b7f76000 00:00 0 ---------> (allocated block)
    bf80d000-bf822000 rw-p bf80d000 00:00 0 [stack]
    bf822000-bff29000 rwxp bf822000 00:00 0 ---------> (allocated block)

    Six Virtual Memory Areas, or VMAs, reflect the memory request. A VMA is a memory area that groups pages with the same access permission and/or the same backing file. VMAs can exist anywhere within user space, as long as that space is available.

    Now you might think, "Why six? Why not a single big VMA containing all blocks?" There are two reasons. First, it is often impossible to find such a big "hole" to coalesce the blocks into a single VMA. Second, the program does not ask to allocate that approximately 3GB block all at once, but piece by piece. Thus, the glibc allocator has complete freedom to arrange the memory however it wants.

    Why do I mention available pages? Memory allocation occurs in page-sized granularity. This is not a limit of the operating system, but a feature of the Memory Management Unit (MMU) itself. Pages have various sizes, but the normal setting for x86 is 4K. You can discover the page size manually by using the getpagesize() or sysconf() (with the _SC_PAGESIZE parameter) libc functions. The libc allocator manages each page: slicing them into smaller blocks, assigning them to processes, freeing them, and so on. For example, if your program uses 4097 bytes total, you need to use two pages, even though in reality the allocator gives you somewhere between 4105 and 4109 bytes.
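
    The rounding to whole pages can be written down in a few lines. This is only a sketch of the arithmetic; pages_needed() and system_page_size() are hypothetical helper names, and sysconf(_SC_PAGESIZE) is the call mentioned above.

```c
#include <stddef.h>
#include <unistd.h>

/* Number of whole pages needed to hold `bytes` (rounds up). */
size_t pages_needed(size_t bytes, size_t page_size)
{
    return (bytes + page_size - 1) / page_size;
}

/* Page size reported by the running system, or 0 if unavailable. */
size_t system_page_size(void)
{
    long sz = sysconf(_SC_PAGESIZE);
    return sz > 0 ? (size_t)sz : 0;
}
```

    With 4K pages, pages_needed(4097, 4096) is 2, which is the two-page case described above.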

    With 256MB of RAM and no swap, you have 65536 available pages. Is that right? Not really. What you don't see is that some memory areas are in use by kernel code and data, so they're unavailable for any other need. There is also a reserved part of memory for emergencies or high-priority needs. dmesg reveals these numbers for you:

    $ dmesg | grep -n kernel
    36:Memory: 255716k/262080k available (2083k kernel code, 5772k reserved,
    637k data, 172k init, 0k highmem)
    171:Freeing unused kernel memory: 172k freed

    init refers to kernel code and data that is only necessary for the initialization stage; thus the kernel frees it when it is no longer useful. That leaves 2083 + 5772 + 637 = 8492KB. Practically speaking, 2123 pages are gone from the user's point of view. If you enable more kernel features or insert more kernel modules, you'll use up more pages for exclusive kernel use, so be wise.

    Another kernel internal data structure is the page cache. The page cache buffers data recently read from block devices. The more caching work you do, the fewer free pages you actually have--but they are not really occupied, as the kernel will reclaim them when memory is tight.

    From the kernel and hardware points of view, these are the important things to remember:

    1. There is no guarantee that allocated memory area is physically contiguous; it's only virtually contiguous.

      This "illusion" comes from the way address translation works. In a protected mode environment, users always work with virtual addresses, while hardware works with physical addresses. The page directory and page tables translate between these two. For example, two blocks with starting virtual addresses 0 and 4096 could map to the physical addresses 1024 and 8192.

      This makes allocation easier, because in reality physically contiguous free blocks are rarely available, especially for large requests (megabytes or even gigabytes). The kernel will look everywhere for free pages to satisfy the request, not just adjacent free blocks. However, it will do a little more work to arrange page tables so that they appear virtually contiguous.

      There is a price. Because memory blocks might be non-contiguous, sometimes the L1 and L2 caches go underused. Virtually adjacent memory blocks may be spread across different physical cache lines; this means slowing down (sequential) memory access.

    2. Memory allocation takes two steps: first extending the length of the memory area and then allocating pages when needed. This is demand paging. During VMA extension, the kernel merely checks whether the request overlaps an existing VMA and whether the range is still inside user space. By default, it skips the check for whether actual allocation can occur.

      Thus it is not strange if your program asks for a 1GB block and gets it, even if in reality you have only 16MB of RAM and 64MB of swap. This "optimistic" style might not please everybody, because you might get the false hope of thinking that there are still free pages available. The Linux kernel offers tunable parameters to control this overcommit behavior.

    3. There are two types of pages: anonymous pages and file-backed pages. A file-backed page originates from mmap()-ing a file on disk, whereas an anonymous page is the kind you get when doing malloc(). It has no relationship with any files at all. When the RAM becomes tight, the kernel swaps out anonymous pages to swap space and flushes file-backed pages to the file to give room for current requests. In other words, anonymous pages may consume swap area while file-backed pages don't. The only exception is for files mmap()-ed using the MAP_PRIVATE flag. In this case, file modification occurs in RAM only.

      This is where the understanding of swap as RAM extension comes from. Clearly, accessing the page requires bringing it back into RAM.
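
    To make the address translation in point 1 concrete, here is a sketch of how a 32-bit x86 virtual address is cut into the indexes the hardware uses. It assumes classic two-level paging with 4K pages and no PAE (10 bits of page directory index, 10 bits of page table index, 12 bits of offset); the struct and function names are invented for this example.

```c
#include <stdint.h>

struct va_parts {
    uint32_t dir_index;    /* bits 31..22: entry in the page directory */
    uint32_t table_index;  /* bits 21..12: entry in the page table */
    uint32_t offset;       /* bits 11..0:  byte offset inside the page */
};

/* Split a virtual address the way non-PAE x86 paging with 4K pages does. */
struct va_parts split_virtual_address(uint32_t va)
{
    struct va_parts p;

    p.dir_index   = va >> 22;
    p.table_index = (va >> 12) & 0x3FF;
    p.offset      = va & 0xFFF;
    return p;
}
```

    The two virtual addresses from the example in point 1, 0 and 4096, land in page table entries 0 and 1 of the same page directory entry. Each entry holds its own physical frame address, which is why neighbouring virtual pages need not be neighbours in RAM.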



    Inside the Allocator

    The real work actually takes place inside the glibc memory allocator. The allocator hands out blocks to the application, carving them from the heap that comes (however infrequently) from the kernel.

    The allocator is the manager, while the kernel is the worker. With this in mind, it's easy to understand that maximum efficiency comes from a good allocator, not from the kernel.

    glibc uses an allocator named ptmalloc. Wolfram Gloger created it as a modified version of the original malloc library created by Doug Lea. The allocator manages the allocated blocks in terms of "chunks." Chunks represent the memory block you actually requested, but not its size. There is an extra header added inside this chunk besides the user data.

    The allocator uses two functions to get a chunk of memory from the kernel:

    • brk() sets the end of the process's data segment.
    • mmap() creates a new VMA and passes it to the allocator.

    Of course, malloc() uses these functions only if there are no more free chunks in the current pool.

    The decision on whether to use brk() or mmap() requires one simple check. If the request is equal to or larger than M_MMAP_THRESHOLD, the allocator uses mmap(). If it is smaller, the allocator calls brk(). By default, M_MMAP_THRESHOLD is 128KB, but you may freely change it by using mallopt().
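
    The check and the tuning call might look like the sketch below. request_uses_mmap() only restates the rule from the paragraph above and is not glibc's actual code; set_mmap_threshold() is a made-up wrapper around the real mallopt(M_MMAP_THRESHOLD, ...) call and does nothing on C libraries other than glibc.

```c
#include <stdlib.h>
#ifdef __GLIBC__
#include <malloc.h>
#endif

#define DEFAULT_MMAP_THRESHOLD (128 * 1024)  /* default named in the text */

/* The rule as the text states it: mmap() at or above the threshold. */
int request_uses_mmap(size_t request, size_t threshold)
{
    return request >= threshold;
}

/* Change the threshold; returns 1 on success, 0 on failure or non-glibc. */
int set_mmap_threshold(int bytes)
{
#ifdef __GLIBC__
    return mallopt(M_MMAP_THRESHOLD, bytes);
#else
    (void)bytes;
    return 0;
#endif
}
```

    Under this rule the 1MB requests made by programs A and B are served by mmap(), which fits the separate "allocated block" VMAs seen in the map of program A.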

    In the OOM context, how ptmalloc frees memory blocks is interesting. Blocks allocated via mmap() get freed via a munmap() call, and thus become completely released. Freeing blocks allocated via brk() means marking them as free, but they remain under the allocator's control. It can reassign free chunks to satisfy another malloc() if the request's size is less than or equal to the chunk's size. The allocator can consolidate multiple free chunks, as long as they are adjacent. It may even split a free chunk into smaller chunks to satisfy smaller future requests.

    This implies that a free chunk may go abandoned if the allocator cannot fit future requests within it. Failure to coalesce free chunks may also trigger faster OOM. This is usually an indication of moderate to bad memory fragmentation.

    Recovery

    Once an OOM situation occurs, now what? The kernel will terminate one process for sure. Why kill? This is the only way to stop further memory requests. The kernel can not assume there is a sophisticated mechanism inside the process to stop further requests automatically, so it has no other choice but to kill it.

    How does the kernel know exactly which process to kill? The answer lies inside mm/oom_kill.c of the Linux source code. This C code represents the so-called OOM killer of the Linux kernel. The function badness() gives a score to each existing process. The one with the highest score will be the victim. The criteria are:

    1. VM size. This is not the sum of all allocated pages, but the sum of the size of all VMAs owned by the process. The bigger the VM size, the higher the score.
    2. Related to #1, the VM size of the process's children are important too. The VM size is cumulative if a process has one or more children.
    3. Niced processes (those running at lowered priority, with a nice value above zero) get more points.
    4. Superuser processes are important, by assumption; thus they have their scores reduced.
    5. Process runtime. The longer it runs, the lower the score.
    6. Processes that perform direct hardware access are more immune.
    7. The swapper (pid 0) and init (pid 1) processes, as well as any kernel threads, are excluded from the list of potential victims.

    The process with the biggest score "wins" the election and the OOM killer will kill it very soon.

    The heuristic isn't perfect, but usually it works well for most situations. Criteria #1 and #2 clearly show that it is the VMA size that matters, not the number of actual pages a process has. You might think that measuring VMA size will trigger a false alarm, but luckily it doesn't. The badness() call occurs inside the page allocation functions when there are few free pages left and page frame reclamation fails, so the VMA size closely matches the number of pages owned by the process.

    Why not just count the actual number of pages? That would require more time and require the use of locks, thus making the procedure too expensive to make a fast decision. Knowing that OOM killer isn't perfect, you must be ready for a wrong kill.

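    The kernel then sends the victim a signal to stop it. For an ordinary process this is SIGKILL, which is why the shell printed "Killed" for program B; some kernel versions send SIGTERM to processes with direct hardware access.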

    How to Reduce OOM Risk

    The basic rule to avoid OOM risk is simple: don't allocate beyond the machine's current free space. However, many factors come into play, so there are further refinements to the strategy.

    Reduce Fragmentation by Properly Ordering Allocation

    There is no need to use any sophisticated allocator. You can reduce fragmentation by properly ordering memory allocation and deallocation. As holes easily happen, employ the LIFO strategy: the last one you allocate is the first you need to free.

    For example, instead of doing:

    void *a;
    void *b;
    void *c;
    ............
    a = malloc(1024);
    b = malloc(5678);
    c = malloc(4096);

    ......................

    free(b);
    b = malloc(12345);

    It's better to do:

    a = malloc(1024);
    c = malloc(4096);
    b = malloc(5678);
    ......................

    free(b);
    b = malloc(12345);

    This way, there won't be any hole between the a and c chunks. You can also consider realloc() to resize any existing malloc()-ed blocks.

    Two example programs (fragmented1.c and fragmented2.c) demonstrate the effect of allocation rearrangement. Reports at the end of both programs give the number of bytes allocated by the system (kernel and glibc allocator) and the number of bytes actually used. For example, on kernel 2.6.11.1, with glibc 2.3.3-27 and executing without giving an explicit parameter, fragmented1 wasted 319858832 bytes (about 305MB) while fragmented2 wasted 2089200 bytes (about 2MB). That's roughly 150 times less!
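
    If you take the realloc() route, the usual pitfall is overwriting the only pointer to the block when the call fails. A minimal sketch of a safer pattern follows; resize_block() is a hypothetical helper, not a libc function, and it expects new_size to be greater than zero.

```c
#include <stdlib.h>

/* Resize *buf to new_size bytes (new_size > 0).
   On failure the old block stays valid and *buf is left untouched. */
int resize_block(void **buf, size_t new_size)
{
    void *tmp = realloc(*buf, new_size);

    if (tmp == NULL)
        return -1;      /* caller still owns the old block */
    *buf = tmp;
    return 0;
}
```

    In the example above, free(b) followed by b = malloc(12345) could become resize_block(&b, 12345) when the old contents of b are still needed.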

    You can do further experiments by passing various numbers as the program parameter. This parameter acts as the request size of the malloc() call.

    Tweak the Kernel's Overcommit Behavior

    You can change the behavior of the Linux kernel through the /proc filesystem, as documented in Documentation/vm/overcommit-accounting in the Linux kernel's source code. You have three choices when tuning kernel overcommit, expressed as numbers in /proc/sys/vm/overcommit_memory:

    • 0 means that the kernel will use predefined heuristics when deciding whether to allow such an overcommit. This is the default.
    • 1 always overcommits. Perhaps you now realize the danger of this mode.
    • 2 prevents overcommit from exceeding a certain watermark. The watermark is also tunable through /proc/sys/vm/overcommit_ratio. Within this mode, the total commit can not exceed the swap space(s) size + overcommit_ratio percent * RAM size. By default, the overcommit ratio is 50.

    The default mode usually works fine in most situations, but mode #2 offers better protection against overcommit. On the other hand, mode #2 requires you to predict carefully how much space all running applications need. You certainly don't want to see your application unable to get more memory chunks just because the limit is too strict. However, mode #2 is the best way to avoid having a program killed suddenly.

    Suppose that you have 256MB of RAM and 256MB of swap and you want to limit overcommit at 384MB. That means 256MB + 50 percent * 256MB, so write 50 to /proc/sys/vm/overcommit_ratio.
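
    The mode #2 formula is easy to get wrong by hand, so here is the same arithmetic as a sketch. Both function names are invented for illustration, and all sizes are in MB.

```c
/* Mode #2 limit from the text: swap + overcommit_ratio percent of RAM. */
unsigned long commit_limit_mb(unsigned long ram_mb, unsigned long swap_mb,
                              unsigned int overcommit_ratio)
{
    return swap_mb + ram_mb * overcommit_ratio / 100;
}

/* Ratio that yields target_mb, or -1 if the target is below the swap size. */
long ratio_for_limit(unsigned long ram_mb, unsigned long swap_mb,
                     unsigned long target_mb)
{
    if (ram_mb == 0 || target_mb < swap_mb)
        return -1;
    return (long)((target_mb - swap_mb) * 100 / ram_mb);
}
```

    For the machine above, commit_limit_mb(256, 256, 50) gives 384 and ratio_for_limit(256, 256, 384) gives 50. The ratio only takes effect once /proc/sys/vm/overcommit_memory is set to 2.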

    Check for NULL Pointer after Memory Allocation and Audit for Memory Leak

    This is a simple rule, but it sometimes goes omitted. By checking for NULL, at least you know that the allocator could extend the memory area, although there is no obvious guarantee that it will allocate the needed pages later. Usually, you need to bail out or delay the allocation for a moment, depending on your scenarios. Together with overcommit tunables, you have a decent tool to anticipate OOM because malloc() will return NULL if it believes that it cannot acquire free pages later.

    Memory leak is also a source of unnecessary memory consumption. A leaked memory block is one that the application no longer tracks, but that the kernel will not reclaim because, from the kernel's point of view, the task still has it under control. Valgrind is a nice tool to find out such occurrences inside your code without the need to re-code.

    Always Consult Memory Allocation Statistics

    The Linux kernel provides /proc/meminfo as a way to find complete information about memory conditions. This /proc entry is also an information source for utilities such as top, free, and vmstat.

    What you need to check is the free and reclaimable memory. The word "free" needs no further explanation, but what does "reclaimable" mean? It refers to buffers and page caches--the disk cache. They are reclaimable because, when memory is tight, the Linux kernel can simply flush them out back to the disk. These are file-backed pages. I've lightly edited this example of memory statistics:

    $ cat /proc/meminfo
    MemTotal: 255944 kB
    MemFree: 3668 kB
    Buffers: 13640 kB
    Cached: 171788 kB
    SwapCached: 0 kB
    HighTotal: 0 kB
    HighFree: 0 kB
    LowTotal: 255944 kB
    LowFree: 3668 kB
    SwapTotal: 909676 kB
    SwapFree: 909676 kB

    Based on the output above, the free virtual memory is MemFree + Buffers + Cached + SwapFree = 1098772 kB.

    I failed to find any formalized C (glibc) function to find out free (including reclaimable) memory space. The closest I found is by using get_avphys_pages() or sysconf() (with the _SC_AVPHYS_PAGES parameter). They only report the amount of free memory, not the free + reclaimable amount.

    That means to get precise information, you must programmatically parse the /proc/meminfo and calculate it by yourself. If you're lazy, take the procps source package as a reference on how to do it. This package contains tools such as ps, top, and free. It is available under the GPL.
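
    A do-it-yourself parser does not have to be long. The sketch below sums the four fields used in the calculation above from any stream in /proc/meminfo format. The function name is made up, and it assumes each line starts with "Key: value" at column zero, as the real file does.

```c
#include <stdio.h>
#include <string.h>

/* Sum MemFree + Buffers + Cached + SwapFree (kB) from a meminfo-style
   stream. Returns -1 if any of the four fields is missing. */
long free_plus_reclaimable_kb(FILE *f)
{
    static const char *wanted[] = { "MemFree", "Buffers", "Cached", "SwapFree" };
    char line[256], key[64];
    long value, total = 0;
    int found = 0;
    size_t i;

    while (fgets(line, sizeof(line), f)) {
        if (sscanf(line, "%63[^:]: %ld", key, &value) != 2)
            continue;
        for (i = 0; i < sizeof(wanted) / sizeof(wanted[0]); i++) {
            if (strcmp(key, wanted[i]) == 0) {   /* exact match, so SwapCached is skipped */
                total += value;
                found++;
            }
        }
    }
    return found == 4 ? total : -1;
}
```

    Open the file with fopen("/proc/meminfo", "r"), pass the stream in, and close it afterwards. Fed the sample output above, the function returns 1098772.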

    Experiments with Alternative Memory Allocators

    Different allocators yield different ways to manage memory chunks and to shrink, expand, and create virtual memory areas. Hoard is one example. Emery Berger from the University of Massachusetts wrote it as a high performance memory allocator. Hoard seems to work best for multi-threaded applications; it introduces the concept of per-CPU heap.

    Use 64-bit Platforms

    Users who need larger user address spaces can consider using 64-bit platforms. The Linux kernel no longer uses the 3:1 VM split for these machines. In other words, user space becomes quite large. It can be a good match for machines with more than 4GB of RAM.

    This has no connection to extended addressing schemes, such as Intel's Physical Address Extension (PAE), which allows a 32-bit Intel processor to address up to 64GB of RAM. This addressing deals with physical address, while in the virtual address context, the user space itself is still 3GB (assuming the 3:1 VM split). This extra memory is reachable, but not all mappable into the address space. Unmappable portions of RAM are unusable.

    Consider Packed Types on Structures

    Packed attributes can help to squeeze the size of structs, enums, and unions. This is a way to save more bytes, especially for arrays of structs. Here is a declaration example:

    struct test
    {
    char a;
    long b;
    } __attribute__ ((packed));

    The con for this action is that it makes certain field(s) unaligned and thus it costs more CPU cycles to access the field. "Aligned" here means the variable's address is a multiple of its data type's natural size. The net result is that, depending on the data access frequency, the runtime may get relatively slower. However, take into account page alignment and cache coherence.
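
    To see what packing buys on a given machine, compare the two layouts directly. This sketch assumes a compiler that understands the GCC-style __attribute__ ((packed)) shown above (GCC and Clang do); the struct and function names are illustrative.

```c
#include <stddef.h>

struct test_plain {
    char a;
    long b;             /* padding is inserted before b to align it */
};

struct test_packed {
    char a;
    long b;             /* starts right after a, at offset 1 */
} __attribute__ ((packed));

/* Bytes saved per element by packing, on the current compiler and ABI. */
size_t packed_savings(void)
{
    return sizeof(struct test_plain) - sizeof(struct test_packed);
}
```

    The saving is per element, so it only becomes significant for large arrays, which is also where the cost of unaligned access to b is paid most often.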

    Use ulimit() for User Processes

    With ulimit -v, you can limit the address space a process can allocate with mmap(). When you reach the limit, all mmap(), and hence malloc(), calls will fail (malloc() returns NULL) and the kernel's OOM killer will never start. This is most useful in a multi-user environment where you cannot trust all of the users and want to avoid killing random processes.
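
    A program can apply the same limit to itself or to its children with setrlimit(RLIMIT_AS), the resource limit that ulimit -v adjusts. The sketch below runs the experiment in a forked child so the caller keeps its own limits; malloc_fails_under_limit() is a hypothetical name chosen for this example.

```c
#include <stdlib.h>
#include <sys/resource.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fork a child, cap its address space at limit_bytes, and try to malloc
   request_bytes there. Returns 1 if malloc() failed, 0 if it succeeded,
   and -1 if the experiment itself could not be run. */
int malloc_fails_under_limit(rlim_t limit_bytes, size_t request_bytes)
{
    int status;
    pid_t pid = fork();

    if (pid < 0)
        return -1;
    if (pid == 0) {
        struct rlimit rl;
        void *volatile p;   /* volatile keeps the compiler from dropping malloc() */

        rl.rlim_cur = limit_bytes;
        rl.rlim_max = limit_bytes;
        if (setrlimit(RLIMIT_AS, &rl) != 0)
            _exit(2);
        p = malloc(request_bytes);
        _exit(p == NULL ? 1 : 0);
    }
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    if (!WIFEXITED(status) || WEXITSTATUS(status) == 2)
        return -1;
    return WEXITSTATUS(status);
}
```

    With a 256MB cap, a 512MB request should fail cleanly with NULL instead of inviting the OOM killer later.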

    Acknowledgement

    The author gives credit to several people for their assistance and help: Peter Zijlstra, Wolfram Gloger, and Rene Hermant. Mr. Gloger also contributed the ulimit() technique.

    References

    1. "Dynamic Storage Allocation: A Survey and Critical Review," by Paul R. Wilson, Mark S. Johnstone, Michael Neely, and David Boles. Proceeding 1995 International Workshop of Memory Management.
    2. Hoard: A Scalable Memory Allocator for Multithreaded Applications, by Emery D. Berger, Kathryn S. McKinley, Robert D. Blumofe, and Paul R. Wilson
    3. "Once upon a free()" by Anonymous, Phrack Volume 0x0b, Issue 0x39, Phile #0x09 of 0x12.
    4. "Vudo: An Object Superstitiously Believed to Embody Magical Powers," by Michel "MaXX" Kaempf. Phrack Volume 0x0b, Issue 0x39, Phile #0x08 of 0x12.
    5. "Policy-Based Memory Allocation," by Andrei Alexandrescu and Emery Berger. C/C++ Users Journal.
    6. "Security of memory allocators for C and C++," by Yves Younan, Wouter Joosen, Frank Piessens, and Hans Van den Eynden. Report CW419, July 2005
    7. Lecture notes (CS360) about malloc(), by Jim Plank, Dept. of Computer Science, University of Tennessee.
    8. "Inside Memory Management: The Choices, Tradeoffs, and Implementations of Dynamic Allocation," by Jonathan Bartlett
    9. "The Malloc Maleficarum," by Phantasmal Phantasmagoria
    10. Understanding The Linux Kernel, 3rd edition, by Daniel P. Bovet and Marco Cesati. O'Reilly Media, Inc.

  • Memory layout of kmalloc and vmalloc allocations

    2008-5-08

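    On processors that provide an MMU (memory management unit, which assists the operating system with memory management by providing hardware support such as virtual-to-physical address translation), Linux provides a sophisticated memory management system that lets each process address up to 4GB of memory.

    A process's 4GB address space is divided by convention into two parts: user space and kernel space. User space spans addresses 0 to 3GB (PAGE_OFFSET, which is 0xC0000000 on x86), and 3GB to 4GB is kernel space.

    Within kernel space, the addresses from 3G up to vmalloc_start form the physical memory mapping region (which contains the kernel image, the physical page frame table mem_map, and so on). For example, the VMware virtual machine we use has 160M of memory, so 3G to 3G+160M maps physical memory. The vmalloc region follows the physical memory mapping region. On a 160M system, vmalloc_start should be near 3G+160M (an 8M gap sits between the physical memory mapping region and vmalloc_start to catch out-of-bounds accesses), and vmalloc_end is close to 4G (the system reserves a 128k region at the very end for special-purpose page mappings).

    Memory obtained with kmalloc and get_free_page lies in the physical memory mapping region and is also physically contiguous. It differs from the real physical address only by a fixed offset, so the conversion is simple; virt_to_phys() converts a kernel virtual address to a physical address: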
        #define __pa(x) ((unsigned long)(x)-PAGE_OFFSET)
        extern inline unsigned long virt_to_phys(volatile void * address)
        {
             return __pa(address);
        }
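    The conversion above subtracts 3G (PAGE_OFFSET = 0xC0000000) from the virtual address.

    The corresponding function, phys_to_virt(), converts a physical address to a kernel virtual address: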
        #define __va(x) ((void *)((unsigned long)(x)+PAGE_OFFSET))
        extern inline void * phys_to_virt(unsigned long address)
        {
             return __va(address);
        }
    Both virt_to_phys() and phys_to_virt() are defined in include/asm-i386/io.h.
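
    Since the two macros are plain arithmetic, their effect can be tried in an ordinary user-space program. The sketch below is a model of the calculation only and does not touch kernel memory; the demo_ names are invented, and the PAGE_OFFSET value is the i386 one quoted above.

```c
#include <stdint.h>

#define DEMO_PAGE_OFFSET 0xC0000000u   /* 3G, the i386 PAGE_OFFSET */

/* User-space model of __pa(): direct-mapped kernel virtual -> physical. */
uint32_t demo_virt_to_phys(uint32_t vaddr)
{
    return vaddr - DEMO_PAGE_OFFSET;
}

/* User-space model of __va(): physical -> direct-mapped kernel virtual. */
uint32_t demo_phys_to_virt(uint32_t paddr)
{
    return paddr + DEMO_PAGE_OFFSET;
}
```

    For example, the kernel virtual address 0xC0100000 corresponds to physical address 0x00100000, and converting back returns the original value. The model is valid only for addresses inside the physical memory mapping region.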

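    Memory obtained with vmalloc, by contrast, lies between vmalloc_start and vmalloc_end and has no simple conversion to physical addresses. It is virtually contiguous but need not be physically contiguous.

    The following program demonstrates the difference between kmalloc, get_free_page, and vmalloc: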
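    #include <linux/init.h>
    #include <linux/kernel.h>
    #include <linux/module.h>
    #include <linux/gfp.h>
    #include <linux/slab.h>
    #include <linux/vmalloc.h>

    MODULE_LICENSE("GPL");

    static unsigned long pagemem;      /* page allocator returns an address, not a pointer */
    static unsigned char *kmallocmem;
    static unsigned char *vmallocmem;

    static int __init mem_module_init(void)
    {
        /* Real code should check that every allocation succeeded; */
        /* this demonstration code omits the checks. */
        pagemem = __get_free_page(GFP_KERNEL);   /* called get_free_page in the text */
        printk(KERN_ALERT "pagemem addr=%lx\n", pagemem);

        kmallocmem = kmalloc(100, GFP_KERNEL);
        printk(KERN_ALERT "kmallocmem addr=%lx\n", (unsigned long)kmallocmem);

        vmallocmem = vmalloc(1000000);
        printk(KERN_ALERT "vmallocmem addr=%lx\n", (unsigned long)vmallocmem);

        return 0;
    }

    static void __exit mem_module_exit(void)
    {
        /* Each call accepts 0/NULL, so a failed allocation is harmless here. */
        free_page(pagemem);
        kfree(kmallocmem);
        vfree(vmallocmem);
    }

    module_init(mem_module_init);
    module_exit(mem_module_exit);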

      Our system has 160MB of memory. Running the program once, we found pagemem at 0xc7997000 (about 3G+121M), kmallocmem at 0xc9bc1380 (about 3G+155M), and vmallocmem at 0xcabeb000 (about 3G+171M), which matches the memory layout described above.

    Source: http://dev.yesky.com/412/2639912.shtml

  • Linux learning site

    2008-5-01

    http://www.linux-tutorial.info/index.php

    Welcome to the
    Linux Knowledge Base and Tutorial
    "The place where you learn Linux"

    Looking for an in-depth and easy-to-understand introduction to Linux? Then look no further!
    We don't just show you how to execute a handful of commands and use a few utilities. The Linux Tutorial goes beyond the basics, providing you with the knowledge necessary to get the most out of your Linux system.

    Jump right in by clicking here.

  • Want an in-depth introduction to Linux? Then check out hundreds of articles in the tutorial.
  • Need answers to specific questions? Then check out our forums.
  • Not sure what a particular term means? Then check out the glossary.
  • Want to make sure you learned everything? Then test yourself with more than 600 questions in our online quiz.

    For details on all of the features, check out the FAQ.

  • Moblin creativity contest

    2008-4-30

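    Welcome to YourMove, the challenge arena for Moblin developers. We provide all the tools you need to develop applications on the Linux-based Moblin core, whether for entertainment or for communicating with MID devices. You are entering a wonderful world where you can give your creativity free rein. You keep the intellectual property, and we help you gain market leadership. Winners receive generous cash prizes, MID devices, and a free trip to one open source event anywhere in the world. Register now; the contest has already begun. Let's get started!

    Moblin plays three key roles in the MID software ecosystem

    1)    Build a developer community focused on creating and enhancing the core technologies needed for Linux-based MIDs and other devices. This was the project's primary goal when Moblin was first created. Vendors building Linux operating system versions targeted at MIDs will use this core technology.

    2)    Build a developer community focused on creating new software and services that add value or innovation to the Moblin core Linux stack. With the release of the first generation of MIDs based on the latest Intel® Centrino® Atom™ processor technology, the Moblin project's role has grown to include this key function.

    3)    Define a set of standards and tools that let operating system vendors (OSVs) and ISVs easily ensure binary compatibility across Linux versions from multiple vendors.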
     
     

    http://moblincontest.csdn.net/

  • Gmail mobile client

    2008-4-30

    Gmail on the go!

    You can now use a phone or mobile device to check Gmail through a web browser, and read and reply to Gmail messages anytime, anywhere.

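    • Free. Your wireless service may still charge for usage, so it is best to check with your service provider first.
    • Smart. It handles attachments such as photos and .pdf files.
    • Plus these cool features:
      • Automatically optimizes the interface for your phone
      • Opens message attachments, including photos, Microsoft Word documents, and .pdf files
      • If the sender's phone number is stored in your Gmail contacts, you can also reply to the sender by phone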

     


    Try it now!

    Most web-enabled phones and devices with wireless data service can run Gmail for mobile. See for yourself!

    Visit http://gmail.com from your phone's web browser

     

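    My daily work is mostly wading through a pile of email, and it suddenly occurred to me that I could send and receive mail on my phone: while waiting for the bus, riding it... even in the restroom. I rather like Gmail; its search is great and it offers plenty of space.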

    Gmail Mobile

    I installed it on my E65. Installation is simple: visit gmail.com/app from a smartphone to download it.

     

  • Ubuntu port to the Nokia N810 (reposted from LinuxDevices)

    2008-4-22

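    I was very happy to see this news, because I have long been hoping for an ARM version of Ubuntu. With it, we can enjoy Ubuntu's rich software on handheld devices as well.

    I will translate it a little later.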


    A Nokia-sponsored project is porting Ubuntu Linux to the ARM architecture. The "Handheld Mojo" team has completed ARM builds of Feisty Fawn (dubbed "Frisky Firedrake") and Gutsy Gibbon ("Grumpy Griffin"), with Hardy Heron compilation starting soon.

    Mojo's Ubuntu port for ARM can be tested in QEMU, an open source emulator that supports various ARM architectures. Or, it can be run in a chrooted environment from an SD flash memory card installed on Nokia's Linux-based N8xx series Internet tablets. Since the ports are built for ARM Ltd.'s ARMv5EL and ARMv6EL-VFP architectures, they should also run on lots of other devices with ARM9 and ARM11 cores.

    With a few exceptions, most Ubuntu software compiles fine for ARM, according to Andrew Christian, the engineering fellow at Nokia who leads the effort. Notable exceptions are Java, Mono, G77 (a Fortran compiler), and the software that depends on them.

    Speaking at the Embedded Linux Conference in Mountain View this week, Christian showed an N800 tablet running the GIMP, an open source image processing package that he said worked well on the device.

    Christian told attendees that cross-compiling is much faster than native compilation. However, he said that most Debian (and by extension, Ubuntu) packages are not correctly set up for cross-build environments. For that reason, his team found it better to compile natively, because less human intervention is required.

    In setting up a native build environment, Mojo went to the extent of assembling its own single-board computer around an ARM-based Intel processor. Installed in 1U rackmount cases, and stacked up in a rack, the boards can collectively compile the 25,000 binaries comprising a full Ubuntu distribution (some packages build more than one object file) in about 10 days, Christian said. He commented that cooling fans installed in the cases were "probably overkill."

    To bootstrap a native ARM development toolchain, Christian used the ARM EABI port contributed to Debian in early 2007 by single-board computer vendor ADS. This saved considerable time, he said.

    Christian also said he thought Debian should change how it packages source code for ARM's several variations. Instead of treating each as a completely separate architecture, the project should use the Deb package format's directory structure to organize sub-architectures, and the architecture field in the format's meta-data to specify where the package ought to build, with "ARM-all" being one possible option.

    In an interview with LinuxDevices, Christian said that his team looked forward to creating more powerful native ARM build systems. In particular, he was encouraged to try commodity ARM-based NAS servers that could be modified to accept up to 2GB of RAM, according to reports from hobbyists around the Internet. The current Mojo boards top out at 256MB, and become memory-bound building large packages like KDE, he said.

    As an alternative to ARM-based machines, Mojo is also testing x86 servers with QEMU-ARM emulation software. QEMU is reportedly faster than real ARM hardware when run on newer x86-based PCs.

    Besides Christian, other key developers working on the port include Brian Avery, veteran of HP's iPaq Linux port, and George France, former maintainer of the Linux kernel for the Alpha architecture.
  • New to the working world

    2008-4-22

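        A new intern has joined the company and sits diagonally across from me. Wang is mentoring him.
        I remember that when I first started working, colleagues who saw me all asked how I had so much energy. I think that is typical of people new to work: full of interest in and hopes for the job, full of fight, curiosity, and eagerness to learn.
        It seems work really does wear down everyone's fight, curiosity, and enthusiasm. So the question on my mind now is how to preserve those traits. Or rather, how to reawaken the enthusiasm, curiosity, and fight I started with, to be like a puppy rather than a dull old dog.
        Yesterday I watched a program on Phoenix TV, and one line really struck me: "Why does the road of life keep getting narrower?"
        Working in technology has made me very quiet, my world really is getting narrower, and my enthusiasm seems to be slowly wearing away. This is exactly the moment to rekindle my fight.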
  • Repost: "Learn DirectDraw Development in Two Hours"

    2008-4-18

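    This is not just a stunt. There are generally two ways to learn a computer technology. One is to explore on your own: go wrong again and again, lose battle after battle, and eventually get there. The other is to be guided by a person or by good material, so you get twice the result for half the effort and take a shortcut in the right direction, much like a KFC chicken. The first way can produce computer geniuses, because a so-called computer expert is really an expert at trial and error and at debugging. The second produces computer specialists. These two hours (?) of study will not give you a deep command of DirectDraw (DD), but they will give you a skeleton for writing DD programs. If it gives you a starting point, this tutorial has done its job.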


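    DirectDraw programming requires some background knowledge:

    DirectDraw is a library written for high-speed graphics display under Windows 95/NT.


    The basic technique for high-speed graphics display is called page flipping. For what page flipping is, see the introduction to classic techniques. If you are not in a hurry, you will also find it explained further down.


    Under Windows 95/NT, page flipping comes in a full-screen form and a windowed form. In full-screen mode it is called Flip; in windowed mode it is called Blit.


    With that background, we can start writing the program.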


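    Almost every DirectDraw program goes through the following steps:

    1. Initialization, the chore every program needs.
    2. Set the display mode.
    3. Create in memory the two pages that page flipping needs: a front page and a back page.
    4. Put a frame around the display area so drawing does not spill outside it.
    5. Draw on the back page, then swap it to the front in one swift move.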


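    Step 1: Initialization

    DirectDraw is an object-oriented library. "Object-oriented" does not mean facing your girlfriend (in Chinese, the word for "object" also means "partner"). Here you can think of an "object" simply as a template, for example "government". Once you say "I have set up a government", everyone immediately fits you into the "government template" and naturally assumes you can print money. In our program, once you declare a variable (say pMyDD) to be a DirectDraw object (the formal type name is LPDIRECTDRAW), that pMyDD has all the functions and properties of a DirectDraw object. The declaration is:

    LPDIRECTDRAW pMyDD;
    Besides the DD object there are several other important objects: the "page" (surface), the "clipper", and the "palette". Page objects define the front page and the back page:

    LPDIRECTDRAWSURFACE pMyDDSFront;
    LPDIRECTDRAWSURFACE pMyDDSBack;
    A clipper object is used in windowed mode to cut off whatever is drawn beyond the window boundary.

    LPDIRECTDRAWCLIPPER pMyClipper;
    The palette sets the screen's color table and is needed when loading 256-color bitmaps.
    LPDIRECTDRAWPALETTE pMyDDPal;
    Those are the most important objects. DirectX has plenty of other complex and obscure objects, of course, but those cannot be crammed into a crash course.

    Writing a Windows program involves a pile of variables and objects that Windows demands, which is what annoys me most about Microsoft. Microsoft seems to know this, so versions from VC4.0 on include wizards that generate the code for you. Use the wizards as much as possible to make life easier.

    Since our program may occupy a window, give that window a handle:

    HWND myWnd;
    Initialization is not finished yet. We point these objects at a safe value, NULL:

    pMyDD = NULL;
    pMyDDSFront = NULL;
    pMyDDSBack = NULL;
    pMyClipper = NULL;
    pMyDDPal = NULL;

    Finally, have Windows set up the corresponding resources for our pMyDD object:

    DirectDrawCreate(NULL,   // use the current display driver
                     &pMyDD, NULL);
    Okay, the tedious initialization is finally done.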


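    Step 2: Set the display mode

    DirectDraw has its own way of setting up the screen, and its screen modes are divided into "full screen" (exclusive mode) and "windowed" (normal mode). Each is set up differently; the main difference is the argument passed to SetCooperativeLevel.

    In windowed mode, call SetCooperativeLevel like this:
    pMyDD->SetCooperativeLevel(AfxGetMainWnd()->GetSafeHwnd(), DDSCL_NORMAL);
    In full-screen mode, call it like this:

    pMyDD->SetCooperativeLevel(myWnd, DDSCL_EXCLUSIVE | DDSCL_FULLSCREEN);
    A return value of DD_OK means the call succeeded. We can then switch the screen to the mode we want, for example 640x480x8, that is, 256 colors. Which modes are available depends on your display card.

    pMyDD->SetDisplayMode(640, 480, 8);
    We now have a screen but still cannot draw on it. Step 3 creates the drawing boards we need.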


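    Step 3: Create the front and back pages (two drawing boards)

    The advantage of two drawing boards is that you can draw on one while showing people the other, already finished one. When the new one is done, the two boards swap places and people see the freshly drawn one. If you draw fast enough and swap fast enough, the viewer sees animation, just like a film. We call this page flipping.
    What follows shows how to create two drawing boards (double buffering), though you can also create three or four as needed.
    DDSURFACEDESC ddsd;                  // describes the properties of a page
    memset(&ddsd, 0, sizeof(ddsd));      // clear the structure before use
    ddsd.dwSize = sizeof(ddsd);          // structure size, which DirectDraw requires
    ddsd.dwFlags = DDSD_CAPS;
    ddsd.ddsCaps.dwCaps = DDSCAPS_PRIMARYSURFACE;   // this one is the front page

    // create the front page
    HRESULT result;
    result = pMyDD->CreateSurface(&ddsd, &pMyDDSFront, NULL);
    When an error occurs, remember to release the object.

    if (result != DD_OK)
    {
        pMyDD->Release();
        pMyDD = NULL;
        // stop here: without pMyDD nothing else can be created
    }

    // now ask for a back page of the given size
    ddsd.dwFlags = DDSD_WIDTH | DDSD_HEIGHT | DDSD_CAPS;
    ddsd.dwWidth = scr_width;
    ddsd.dwHeight = scr_height;
    ddsd.ddsCaps.dwCaps = DDSCAPS_OFFSCREENPLAIN;

    // create the back page
    result = pMyDD->CreateSurface(&ddsd, &pMyDDSBack, NULL);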




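    Step 4: Put a frame (clipper) around the display area

    In windowed mode, a frame (clipper) is needed to keep DirectDraw from drawing outside the window. Use CreateClipper to create the clipper:
    result = pMyDD->CreateClipper(0, &pMyClipper, NULL);

    Once created, the clipper is attached to the window, so we need to know which window (its handle).

    myWnd = AfxGetMainWnd()->GetSafeHwnd();   // get the window handle from the framework
    result = pMyClipper->SetHWnd(0, myWnd);

    // attach the clipper to the front page
    result = pMyDDSFront->SetClipper(pMyClipper);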


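    Step 5: Draw on the back page and swap front and back

    This is where writing the game really begins. Back when we wrote games under DOS, we started directly at this step. All the work above is imposed on us by Microsoft.


    Before writing the game, let us first sort out the front/back swap.

    // If Windows has "requisitioned" the front page's memory, ask for it back. This check is often forgotten.
    if (pMyDDSFront->IsLost() == DDERR_SURFACELOST)
        pMyDDSFront->Restore();


    DirectDraw offers Blt and BltFast for the swap. BltFast is said to be about 10% faster than Blt. Use one or the other; rcTo and rcFrom are RECTs describing the destination and source areas.

    result = pMyDDSFront->Blt(&rcTo, pMyDDSBack, &rcFrom, DDBLT_WAIT, NULL);
    // or (DDBLTFAST_SRCCOLORKEY needs a source color key set on pMyDDSBack; BltFast is also documented not to work on a page with a clipper attached):
    result = pMyDDSFront->BltFast(0, 0, pMyDDSBack, &rcFrom, DDBLTFAST_SRCCOLORKEY);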


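    If the program runs in full-screen mode, swapping front and back is much simpler, just one statement. (Flip requires the front page to have been created as a flipping chain with an attached back buffer, using DDSCAPS_FLIP | DDSCAPS_COMPLEX and dwBackBufferCount, rather than the off-screen page created in step 3.)

    result = pMyDDSFront->Flip(NULL, 0);
    Now we reach the main part of the game, which we call the "game logic". In an ordinary game, the game logic usually has a lot to do, such as drawing the scene and handling the story. Once that is done, swap with the front page. How you build your game, though, is none of my business :-)

    That covers all the steps. Doesn't it feel really easy? Was half an hour enough? Okay, use the remaining hour and a half to turn them into real code. Open your Visual C++; I use VC5.0, though VC4.0 works too. Anything older probably will not. Do not forget to check that your DirectX SDK is installed. Open VC and choose the MFC AppWizard (exe) to generate the program skeleton. I assume anyone learning DirectDraw can already use VC, so I will not go into how to use the wizard.

    Once in the IDE, add a new CPP file and type in the code for the routines used above.
    You will of course also need a .H file for the variable names, object names, and routine names.

    The MFC (Microsoft Fried Chicken) wizard generates ::InitInstance() and ::OnIdle(LONG lCount) for you. Put the initialization, page creation, and related steps in InitInstance, and put the game logic and page swap in OnIdle.


    Finally, I would like to give everyone a complete runnable example, but typing at an average of six Chinese characters per minute has left me dizzy and numb in all four limbs. Maybe in a couple of days.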
