Linux kernel address mapping model and Linux kernel high memory

April 03, 2023

The Linux kernel and its drivers run in kernel space, while applications run in user space. The two cannot simply pass data to each other through pointers: because Linux uses virtual memory, user-space data may be swapped out, so when the kernel uses a user-space pointer the corresponding data may not be in memory.

Linux kernel address mapping model

The x86 CPU uses a combined segmentation and paging address mapping model. The addresses in process code are logical addresses; physical memory is actually accessed only after segment and page translation.

The segmentation and paging mechanism is shown in the figure below.

Linux kernel address space division

On 32-bit Linux, the address space is usually divided into 0~3G for user space and 3~4G for kernel space. Note that this is the split used by 32-bit kernels; 64-bit kernels divide the address space differently.

The origin of high memory in the Linux kernel

When kernel module code or a kernel thread accesses memory, the addresses in the code are all logical addresses, and each one must be mapped one-to-one to a real physical address. For example, the logical address 0xc0000003 corresponds to the physical address 0x3, 0xc0000004 corresponds to the physical address 0x4, and so on. The relationship between logical address and physical address is:

Physical address = logical address - 0xC0000000

Suppose the kernel used only this simple mapping. The kernel logical address space is 0xc0000000 ~ 0xffffffff, so the corresponding physical memory range is 0x0 ~ 0x40000000, which means only 1G of physical memory can be accessed. If the machine has 8G of physical memory installed, the kernel can reach only the first 1G, and the remaining 7G is inaccessible, because the whole kernel address space has already been mapped to the physical range 0x0 ~ 0x40000000. Take the physical address 0x40000001: how would the kernel access it? The code needs a logical address for that memory, but the address space 0xc0000000 ~ 0xffffffff is already used up, so memory beyond physical address 0x40000000 cannot be accessed.
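The arithmetic above can be checked with a small user-space sketch in C. This is not kernel code: the helper names (logical_to_phys, max_direct_mapped_phys, is_direct_mappable) are hypothetical, and the constants simply restate the 3G/1G split assumed in the text.

```c
#include <stdint.h>

/* Assumed constants for a 32-bit kernel with the 3G/1G split described above. */
#define PAGE_OFFSET 0xC0000000u   /* first kernel logical address */
#define KERNEL_TOP  0xFFFFFFFFu   /* last kernel logical address  */

/* Physical address = logical address - 0xC0000000 */
static inline uint32_t logical_to_phys(uint32_t logical)
{
    return logical - PAGE_OFFSET;
}

/* Highest physical address reachable through the simple mapping (just under 1G). */
static inline uint32_t max_direct_mapped_phys(void)
{
    return logical_to_phys(KERNEL_TOP);
}

/* Returns 1 if phys has a logical address under the simple mapping, 0 otherwise. */
static inline int is_direct_mappable(uint64_t phys)
{
    return phys <= max_direct_mapped_phys();
}
```

With these definitions, logical_to_phys(0xC0000003) yields 0x3, while is_direct_mappable(0x40000001) is false, which is exactly the problem the text describes.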

Clearly the kernel address space 0xc0000000 ~ 0xffffffff cannot all be used for this simple address mapping. So on the x86 architecture Linux divides memory into three zones: ZONE_DMA, ZONE_NORMAL, and ZONE_HIGHMEM. ZONE_HIGHMEM is high memory, and this is the origin of the high memory concept.

On x86, the three zones cover the following ranges:

ZONE_DMA 0 ~ 16MB

ZONE_NORMAL 16MB ~ 896MB

ZONE_HIGHMEM 896MB ~ end of memory
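As an illustration of these boundaries, here is a minimal user-space sketch in C that classifies a physical address by zone. The enum, the MB macro, and the function zone_of are hypothetical names made up for this example; only the 16MB and 896MB limits come from the list above.

```c
#include <stdint.h>

/* Illustrative model of the three x86 zones; not the kernel's own types. */
enum zone { ZONE_DMA, ZONE_NORMAL, ZONE_HIGHMEM };

#define MB(x) ((uint64_t)(x) << 20)

/* Classify a physical address using the 16MB and 896MB boundaries. */
static inline enum zone zone_of(uint64_t phys)
{
    if (phys < MB(16))
        return ZONE_DMA;
    if (phys < MB(896))
        return ZONE_NORMAL;
    return ZONE_HIGHMEM;
}
```

For instance, a physical address at 2G (0x80000000) falls in ZONE_HIGHMEM under this model.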

Understanding Linux kernel high memory

Earlier we explained the origin of high memory: Linux divides memory into three zones, ZONE_DMA, ZONE_NORMAL, and ZONE_HIGHMEM. The first 896MB is mapped linearly, which leaves the kernel address range 0xF8000000 ~ 0xFFFFFFFF (896MB to 1024MB of kernel space) for high memory. So how does the kernel use this 128MB of address space to reach all of physical memory?

When the kernel wants to access physical memory above 896MB, it finds a free range of logical address space of the required size within 0xF8000000 ~ 0xFFFFFFFF and borrows it for a while. It establishes a mapping from this logical range to the physical memory it wants to access (that is, it fills in the kernel PTE page table), uses the mapping temporarily, and returns the range when finished. Other users can then borrow the same address range to access other physical memory, so a limited address space is enough to reach all of physical memory. This is shown in the figure below.

For example, suppose the kernel wants to access 1MB of physical memory starting at 2G, that is, the physical address range 0x80000000 ~ 0x800FFFFF. Before accessing it, the kernel finds 1MB of free address space. Assume the free range found is 0xF8700000 ~ 0xF87FFFFF. This 1MB of logical address space is then mapped to the physical memory at 0x80000000 ~ 0x800FFFFF. The mapping is as follows:

After the kernel finishes accessing the physical memory at 0x80000000 ~ 0x800FFFFF, it releases the kernel linear address range 0xF8700000 ~ 0xF87FFFFF. Other processes or code can then use 0xF8700000 ~ 0xF87FFFFF to access other physical memory.

From the description above, the basic idea of high memory is: borrow a range of address space, set up a temporary mapping, and release it after use, so that the address space can be reused in turn to access all of physical memory.

At this point someone will surely ask: what if a kernel thread or module holds on to a range of logical address space and never releases it? If that happens, the kernel's high memory address space becomes more and more scarce; if all of it is held and never released, no new mappings to physical memory can be established.

In some office buildings in Tsim Sha Tsui, Hong Kong, toilets are scarce and kept locked. A customer who wants to use the toilet collects the key from the front desk and returns it afterward. Although there is only one toilet, it can serve all the customers. But if one customer keeps occupying the toilet and never returns the key, nobody else can use it. The idea behind high memory management in the Linux kernel is similar.

Linux kernel high memory partition

The kernel divides the high memory address space into three parts: VMALLOC_START~VMALLOC_END, PKMAP_BASE~FIXADDR_START, and FIXADDR_START~4G.

For high memory, the corresponding page can be obtained with alloc_page() or similar functions, but to access the actual physical memory the page must be given a linear address (why? Think about how the MMU accesses physical memory). In other words, we need to find a range of linear address space for the high memory page. This process is called high memory mapping.

Corresponding to the three parts of the high memory address space, there are three ways to map high memory:

Mapping to the "kernel dynamic mapping space" (noncontiguous memory allocation)

This method is simple: when memory is requested from the "kernel dynamic mapping space" through vmalloc(), the pages may come from high memory (see the vmalloc implementation), so high memory may end up mapped into the "kernel dynamic mapping space".

Permanent kernel mapping

If a high memory page was obtained through alloc_page(), how do we find a linear address for it?

The kernel sets aside a dedicated range of linear address space for this purpose, from PKMAP_BASE to FIXADDR_START, used to map high memory. On 2.6 kernels this range is 4G-8M to 4G-4M. It is called the "kernel permanent mapping space" or "permanent kernel mapping space". It uses the same page directory as the other regions: for the kernel this is swapper_pg_dir, and for an ordinary process it is the directory that the CR3 register points to. The space is usually 4M in size, so a single page table is enough, and the kernel locates that page table through pkmap_page_table. kmap() maps a page into this space. Because the space is only 4M, at most 1024 pages can be mapped at the same time, so pages that are no longer in use should be released (unmapped) from it. kunmap() releases the linear address assigned to a page.
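The borrow-and-return behaviour of this 1024-slot window can be modelled in user space. The sketch below is only a toy and not how kmap() is implemented: toy_kmap and toy_kunmap are hypothetical names, the page table is replaced by a plain array, and a full window returns 0 here, whereas the real kmap() may sleep until a slot is free. PKMAP_BASE is set to 4G-8M as stated in the text.

```c
#include <stdint.h>

#define PAGE_SHIFT 12
#define LAST_PKMAP 1024          /* 4M window / 4K pages, as in the text */
#define PKMAP_BASE 0xFF800000u   /* 4G - 8M, as in the text              */

static uint32_t slot_pfn[LAST_PKMAP];   /* page frame mapped by each slot */
static int      slot_used[LAST_PKMAP];  /* 1 while a slot is borrowed     */

/* Borrow a free slot for page frame pfn; returns its linear address,
 * or 0 if all 1024 slots are taken (toy behaviour, see lead-in). */
static inline uint32_t toy_kmap(uint32_t pfn)
{
    for (int i = 0; i < LAST_PKMAP; i++) {
        if (!slot_used[i]) {
            slot_used[i] = 1;
            slot_pfn[i] = pfn;
            return PKMAP_BASE + ((uint32_t)i << PAGE_SHIFT);
        }
    }
    return 0;
}

/* Return the slot behind vaddr so that other pages can use it. */
static inline void toy_kunmap(uint32_t vaddr)
{
    uint32_t i = (vaddr - PKMAP_BASE) >> PAGE_SHIFT;

    if (vaddr >= PKMAP_BASE && i < LAST_PKMAP)
        slot_used[i] = 0;
}
```

Once all 1024 slots are held, further requests fail until someone calls toy_kunmap(), which mirrors the "key at the front desk" situation described earlier.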

Temporary kernel mapping

The kernel reserves some linear address space between FIXADDR_START and FIXADDR_TOP for special needs. This space is called the "fixed mapping space". Part of it is used for temporary mappings of high memory.

This space has the following characteristics:

(1) Each CPU owns its own portion of the space.

(2) Each CPU's portion is divided into several small slots. Each slot is one page in size and serves one purpose, defined by km_type in kmap_types.h.

To perform a temporary mapping, you specify its purpose. The purpose selects the corresponding slot, and the address of that slot is used as the mapping address. This means that a new temporary mapping overwrites the previous mapping made for the same purpose on that CPU. Temporary mappings are made with kmap_atomic().
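How a (CPU, purpose) pair selects one fixed slot can be sketched as address arithmetic. The layout below follows the scheme used by older i386 kernels (a slot index of type + KM_TYPE_NR * cpu, counted downward from FIXADDR_TOP), but the concrete values of FIXADDR_TOP, FIX_KMAP_BEGIN, and KM_TYPE_NR are assumptions chosen for illustration, and atomic_slot_vaddr is a hypothetical helper name.

```c
#include <stdint.h>

#define PAGE_SHIFT     12
#define FIXADDR_TOP    0xFFFFF000u  /* assumed top of the fixed mapping space   */
#define FIX_KMAP_BEGIN 8u           /* hypothetical index of the first kmap slot */
#define KM_TYPE_NR     16u          /* hypothetical number of purposes per CPU   */

/* Fixed-map indices grow downward from FIXADDR_TOP, one page per index. */
static inline uint32_t fix_to_virt(uint32_t idx)
{
    return FIXADDR_TOP - (idx << PAGE_SHIFT);
}

/* Linear address of the slot reserved for purpose `type` on CPU `cpu`. */
static inline uint32_t atomic_slot_vaddr(uint32_t cpu, uint32_t type)
{
    uint32_t idx = type + KM_TYPE_NR * cpu;

    return fix_to_virt(FIX_KMAP_BEGIN + idx);
}
```

Because the address depends only on the CPU and the purpose, two mappings made for the same purpose on the same CPU always land on the same page, which is why the second one replaces the first.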

Common questions:

1. Does user space (a process) have a concept of high memory?

User processes have no concept of high memory; high memory exists only in kernel space. A user process can address at most 3G of memory, while the kernel can access all physical memory.

2. Is there high memory in the 64-bit kernel?

In practice, a 64-bit Linux kernel has no high memory, because a 64-bit kernel can support more than 512GB of memory. High memory would exist only if the physical memory installed in the machine exceeded the range of the kernel address space.

3. How much physical memory can a user process access? How much physical memory can the kernel code access?

On a 32-bit system, a user process can access at most 3GB, and kernel code can access all physical memory.

On a 64-bit system, a user process can access more than 512GB, and kernel code can access all physical memory.

4. What is the relationship between high memory and physical addresses, logical addresses, and linear addresses?

High memory is related only to logical addresses; it is not directly related to linear addresses or physical addresses.

5. Why not allocate all address space to the kernel?

If all the address space were given to the kernel, how would user processes use memory? And how could we ensure that the kernel's use of memory does not conflict with that of user processes?

(1) Let us ignore Linux's support for segmented memory mapping. In protected mode, whether the CPU is running in user mode or kernel mode, every address accessed by the executing program is a virtual address. The MMU reads the value of the control register CR3 as a pointer to the current page directory and then, following the paging memory mapping mechanism (see the related documents), converts the virtual address into a real physical address so that the CPU can actually access it.
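For classic two-level 32-bit x86 paging with 4K pages (no PAE), the MMU splits a virtual address into a 10-bit page directory index, a 10-bit page table index, and a 12-bit offset. The user-space C sketch below shows only that split; struct va_parts and split_va are hypothetical names introduced for the example.

```c
#include <stdint.h>

/* The three fields of a 32-bit virtual address under two-level paging. */
struct va_parts {
    uint32_t pgd_index;  /* bits 31..22: entry in the page directory (via CR3) */
    uint32_t pte_index;  /* bits 21..12: entry in the selected page table      */
    uint32_t offset;     /* bits 11..0 : byte offset inside the 4K page        */
};

static inline struct va_parts split_va(uint32_t va)
{
    struct va_parts p;

    p.pgd_index = va >> 22;
    p.pte_index = (va >> 12) & 0x3FFu;
    p.offset    = va & 0xFFFu;
    return p;
}
```

Every address at or above 0xC0000000 has a page directory index of 768 or higher, so the kernel's share of the address space corresponds to the top 256 entries of each page directory.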

(2) On 32-bit Linux, each process has a 4G address space. When a process accesses an address in its own virtual memory space, how is it kept from being confused with the virtual space of other processes? Each process has its own page directory, the PGD. Linux stores a pointer to this directory in the process's memory descriptor, task_struct->mm->pgd (mm is a struct mm_struct). Whenever a process is scheduled (schedule()) and is about to enter the running state, the Linux kernel sets CR3 from that process's PGD pointer (switch_mm()).

(3) When a new process is created, a new page directory PGD is created for it, and the page directory entries for the kernel region are copied from the kernel page directory swapper_pg_dir into the corresponding positions of the new PGD. The call chain is:

do_fork() --> copy_mm() --> mm_init() --> pgd_alloc() --> get_pgd_fast() --> get_pgd_slow() --> memcpy(pgd + USER_PTRS_PER_PGD, swapper_pg_dir + USER_PTRS_PER_PGD, (PTRS_PER_PGD - USER_PTRS_PER_PGD) * sizeof(pgd_t))

In this way, the page directory of each process is divided into two parts. The first part is "user space" and maps the process's own address space (0x0000 0000-0xBFFF FFFF), that is, 3G bytes of virtual addresses. The second part is "system space" and maps 1G bytes of virtual addresses (0xC000 0000-0xFFFF FFFF). The second part of the page directory is therefore identical for every process in the system. From the point of view of a process, each process has 4G bytes of virtual space: the lower 3G bytes are its own user space, and the highest 1G bytes are the system space shared with all other processes and the kernel.
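The effect of that memcpy() can be reproduced in user space. The sketch below is a simplified model rather than kernel code: pgd_t is reduced to a plain 32-bit integer, swapper_pg_dir is an ordinary array, and toy_pgd_init is a hypothetical helper. The entry counts assume non-PAE paging with the 3G/1G split (1024 entries, of which the first 768 belong to user space).

```c
#include <stdint.h>
#include <string.h>

#define PTRS_PER_PGD      1024  /* entries in one page directory      */
#define USER_PTRS_PER_PGD 768   /* entries covering 0 ~ 3G user space */

typedef uint32_t pgd_t;         /* simplified stand-in for the kernel type */

static pgd_t swapper_pg_dir[PTRS_PER_PGD];   /* model of the kernel page directory */

/* Prepare a new process page directory: empty user part,
 * kernel part copied from swapper_pg_dir. */
static inline void toy_pgd_init(pgd_t *pgd)
{
    memset(pgd, 0, USER_PTRS_PER_PGD * sizeof(pgd_t));
    memcpy(pgd + USER_PTRS_PER_PGD,
           swapper_pg_dir + USER_PTRS_PER_PGD,
           (PTRS_PER_PGD - USER_PTRS_PER_PGD) * sizeof(pgd_t));
}
```

Any two directories initialised this way agree on entries 768 to 1023, which is why the 3G~4G region looks the same from every process.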

(4) Now suppose the following scenario: process A sets the computer's network host name through the system call sethostname(const char *name, size_t len). This scenario necessarily involves passing data from user space to kernel space: name is an address in user space, and through the system call its contents must be copied to an address in the kernel. Let us look at some details of this process. A system call is made by storing its parameters, in order, in the registers ebx, ecx, edx, esi, and edi (at most 5 parameters; this scenario has two, name and len), then storing the system call number in the register eax, and then executing the interrupt instruction "int 0x80" to take process A into system space. Since the CPU privilege level of the process is less than or equal to the level 3 set on the trap gate for system calls, it can pass through unimpeded and execute system_call(), the handler installed for int 0x80. Since system_call() lies in kernel space, it runs at privilege level 0, and the CPU switches the stack to the kernel stack, that is, the system space stack of process A. When the kernel creates the task_struct for a new process, it allocates two consecutive pages, 8K in total, and uses about 1K at the bottom for the task_struct (for example, #define alloc_task_struct() ((struct task_struct *) __get_free_pages(GFP_KERNEL,1))); the rest is used as the system space stack. So when going from user space to system space, the stack pointer esp becomes (alloc_task_struct()+8192), which is why kernel code usually obtains the address of the current process's task_struct through the macro current (see its implementation). Each time the process enters system space from user space, the user stack SS, user stack pointer ESP, EFLAGS, user-space CS, and EIP are pushed onto the system stack in turn. system_call() then pushes eax, invokes SAVE_ALL to push ES, DS, EAX, EBP, EDI, ESI, EDX, ECX, and EBX in turn, and finally calls the function at sys_call_table+4*%EAX, which in this scenario is sys_sethostname().
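Because the task_struct sits at the bottom of an 8K-aligned block and the kernel stack grows down from the top of the same block, the task_struct address can be recovered from any kernel stack pointer by masking off the low 13 bits. The sketch below shows that arithmetic in user-space C on plain integers; task_struct_base and initial_kernel_esp are hypothetical names, and the 8K size is the one given in the text.

```c
#include <stdint.h>

#define THREAD_SIZE 8192u   /* two pages: task_struct at the bottom, stack above */

/* Round a kernel stack pointer down to the start of its 8K block,
 * which is where the task_struct lives in this layout. */
static inline uint32_t task_struct_base(uint32_t esp)
{
    return esp & ~(THREAD_SIZE - 1u);
}

/* Stack pointer on entry to system space: the top of the 8K block. */
static inline uint32_t initial_kernel_esp(uint32_t base)
{
    return base + THREAD_SIZE;
}
```

As soon as anything has been pushed, esp lies inside the block, so task_struct_base(esp) returns the block's start regardless of how deep the stack currently is.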

(5) In sys_sethostname(), after some protection checks, copy_from_user(to, from, n) is called, where to points to system_utsname.nodename in kernel space, for example 0xE625A000, and from points to user space, for example 0x8010FE00. Process A is now running in system space inside the kernel. The MMU maps virtual addresses to physical addresses according to the process's PGD, and the data is copied from user space to system space. Before copying, the kernel checks that the user-space address and length are legal. It does not check whether the whole range of that length starting at the user-space address has actually been mapped; if an address within the range turns out to be unmapped or to lack read/write permission, it is treated as a bad address, a page fault is raised, and the page fault handler deals with it. The call sequence is: copy_from_user()->generic_copy_from_user()->access_ok()+__copy_user_zeroing().
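The legality check on the address and length amounts to a range test against the top of user space. The following user-space C sketch shows the idea only; it is not the kernel's access_ok() implementation. range_ok is a hypothetical name, and the limit of 0xC0000000 assumes an ordinary user process on the 3G/1G split.

```c
#include <stdint.h>

#define USER_LIMIT 0xC0000000u   /* assumed upper bound of user space */

/* Returns 1 if [addr, addr + size) lies entirely below USER_LIMIT.
 * The sum is computed in 64 bits so that a wrap past 4G is caught. */
static inline int range_ok(uint32_t addr, uint32_t size)
{
    uint64_t end = (uint64_t)addr + size;

    return end <= USER_LIMIT;
}
```

With the example addresses from the text, the user buffer at 0x8010FE00 passes, while a "from" pointer such as 0xE625A000 would be rejected because it lies in kernel space. The check says nothing about whether the pages are actually mapped; that is left to the page fault path, as described above.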

(6) Summary:

* The process address space is 0~4G.
* A process can access only 0~3G in user mode; it must enter kernel mode to access 3G~4G.
* A process enters kernel mode through a system call.
* The 3G~4G part of every process's virtual space is the same.
* Entering kernel mode from user mode does not change CR3, but it does change the stack.

Linux simplifies the segmentation mechanism so that the virtual address and the linear address are always the same, so the Linux virtual address space is also 0~4G. The Linux kernel divides this 4G bytes of space into two parts. The highest 1G bytes (virtual addresses 0xC0000000 to 0xFFFFFFFF) are used by the kernel and are called "kernel space". The lower 3G bytes (virtual addresses 0x00000000 to 0xBFFFFFFF) are used by the individual processes and are called "user space". Because every process can enter the kernel through a system call, the Linux kernel is shared by all processes in the system. From the point of view of a specific process, then, each process has 4G bytes of virtual space. Linux uses two protection levels: level 0 for the kernel and level 3 for user programs. As the figure shows (the figure cannot be reproduced here), each process has its own private user space (0~3G) that is invisible to the other processes in the system, while the highest 1G bytes of virtual kernel space is shared by all processes and the kernel.

1. Virtual Kernel Space to Physical Space Mapping

Kernel code and data are stored in kernel space, and user code and data are stored in the user space of the process. Both kernel space and user space are virtual spaces. The reader may ask: when the system starts, aren't the kernel code and data loaded into physical memory? Why are they in virtual memory? This has to do with how the kernel is compiled and linked, as the discussion below will show.

Although kernel space occupies the highest 1GB of each virtual address space, its mapping to physical memory always starts from the lowest address (0x00000000). For kernel space the address mapping is a very simple linear mapping: 0xC0000000 is the offset between a physical address and its linear address, called PAGE_OFFSET in the Linux source.

Let's look at how include/asm-i386/page.h describes and defines the kernel-space address mapping:

/*
 * This handles the memory map.. We could make this a config
 * option, but too many people screw it up, and too few need
 * it.
 *
 * A __PAGE_OFFSET of 0xC0000000 means that the kernel has
 * a virtual address space of one gigabyte, which limits the
 * amount of physical memory you can use to about 950MB.
 *
 * If you want more physical memory than this then see the CONFIG_HIGHMEM4G
 * and CONFIG_HIGHMEM64G options in the kernel configuration.
 */
#define __PAGE_OFFSET (0xC0000000)
...
#define PAGE_OFFSET ((unsigned long)__PAGE_OFFSET)
#define __pa(x) ((unsigned long)(x)-PAGE_OFFSET)
#define __va(x) ((void *)((unsigned long)(x)+PAGE_OFFSET))

The comment in the source notes that if your physical memory is larger than 950MB, you need to enable the CONFIG_HIGHMEM4G or CONFIG_HIGHMEM64G option when compiling the kernel; we will not consider that case here. If physical memory is smaller than 950MB, then for kernel space, a virtual address x has the physical address "x - PAGE_OFFSET", and a physical address x has the virtual address "x + PAGE_OFFSET".

To repeat: the macro __pa() only maps a kernel-space virtual address to a physical address. It must never be applied to user-space addresses, whose mapping is much more complex.

2. Kernel image

In the following description, we call the kernel code and data the kernel image. When the system starts, the Linux kernel image is loaded at the physical address 0x00100000, that is, starting at 1MB (the first 1MB is reserved for other uses). In normal operation, however, the entire kernel image must reside in the virtual kernel space. So when the kernel image is linked, the linker adds the offset PAGE_OFFSET to all symbol addresses, and the kernel image therefore starts at address 0xC0100000 in kernel space.

For example, a process's page directory PGD (a kernel data structure) lives in kernel space. On a process switch, the register CR3 must be set to point to the PGD of the new process. The starting address of the directory is a virtual address in kernel space, but CR3 needs a physical address, so __pa() is used to translate it. In mmu_context.h there is a line like this: asm volatile("movl %0,%%cr3": :"r" (__pa(next->pgd)));

This is a line of inline assembly. It converts the starting address of the next process's page directory, next->pgd, into a physical address with __pa(), places it in a register, and then writes it to the CR3 register with the mov instruction. After this line executes, CR3 points to the page directory of the new process, next.
