1. Preface.

In the articles on Linux CPU power management, we mentioned several concepts such as SMP and CPU core. These concepts are closely related to the evolution of the CPU, and are ultimately reflected in the CPU topology. Therefore, this article takes the CPU topology as its main line and introduces the related concepts (mainly taking ARM CPUs as the example).

In addition, the CPU topology describes the composition of the CPUs in a system. Its main function is to provide the kernel scheduler with the information it needs to place tasks reasonably, with performance and power consumption in mind. This is also why I classify "CPU topology" under the "power management subsystem".

2. CPU topology.

2.1 An example.

Before we start, look at an example. Here is the CPU architecture information of a build server, as reported by lscpu:

[xxx@cs ~]# lscpu

Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                24
On-line CPU(s) list:   0-23
Thread(s) per core:    2
Core(s) per socket:    6
Socket(s):             2
NUMA node(s):          2
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 62
Stepping:              4
CPU MHz:               2100.118
BogoMIPS:              4199.92
Virtualization:        VT-x
L1d cache:             32K
L1i cache:             32K
L2 cache:              256K
L3 cache:              15360K
NUMA node0 CPU(s):     0,2,4,6,8,10,12,14,16,18,20,22
NUMA node1 CPU(s):     1,3,5,7,9,11,13,15,17,19,21,23

Note the topology-related fields: the system has 24 CPUs, composed as 2 sockets, each socket with 6 cores, and each core with 2 threads. In addition, these 24 CPUs are divided into 2 NUMA nodes. If these terms are not yet clear, the following sections will explain them one by one.

2.2 Uniprocessor and multiprocessor.

In English, single-core and multi-core systems are called uniprocessor and multiprocessor respectively. Let's first describe these concepts:

The core (or processor) mentioned here is a general term, seen from the perspective of the consumer (or user) of the computing resource. Therefore core, processor, or CPU is a logical concept: it refers to a unit that can independently execute program logic.

This core can exist in any hardware form, for example: a separate chip (the traditional single-core processor); one of multiple cores integrated on a single chip (as in SMP, symmetric multi-processing); one of multiple hardware contexts implemented by a single core to support multi-threading (as in SMT, simultaneous multi-threading); and so on. That is the view from the hardware-implementation perspective.

Finally, from the perspective of operating-system process scheduling, these cores of different hardware implementations are viewed uniformly, such as the 24 CPUs listed in section 2.1, because they share one common feature: each can independently execute a process (or thread).

In the single-core era, the main way to improve processor performance was to increase the frequency. But limited by the physical process, the frequency cannot grow without bound (heat dissipation problems, etc.). With multi-core processors, chip area grows instead, and heat dissipation is easier to solve. This is the background of the multiprocessor.

In addition, the demand for concurrency, together with multi-tasking operating systems, provided both the motivation and the software foundation for the development of the multiprocessor.

2.3 SMP, SMT, NUMA, etc.

A common multiprocessor implementation is to integrate multiple processors with identical functionality (on one chip, or across multiple chips), sharing the bus, memory, and other system resources. This is called SMP (symmetric multi-processing), shown as core000 and core001 in the image below. From the point of view of the Linux kernel, these units are typically called cores.

In addition, depending on the process and packaging technology, chip vendors encapsulate multiple cores in one package, which is also called a socket. The concept of socket comes from the x86 architecture and can be understood as a slot on the motherboard. Assuming each socket contains 2 cores, plugging 2 sockets into the motherboard gives a 4-core system, and 4 sockets give an 8-core system. The term socket is rarely used in the ARM architecture; later we will introduce a similar concept (cluster).

Most operating systems, such as Windows and Linux, have the notions of process and thread. A process is a running instance of a program and can contain multiple threads; a thread is the smallest unit of scheduling. Some processors (cores) can therefore execute multiple threads at the same time by duplicating the hardware register state; this is called SMT (simultaneous multi-threading).

The following image, together with the example in section 2.1, reflects the typical topology of a multi-core system.

[figure: mc_support]

In the cases above, all cores share the bus, memory, and so on. When the number of cores is small there is no problem, but as the core count grows, the demand for bus and memory bandwidth grows significantly, and eventually the bus and memory become the bottleneck of system performance. The solution is:

Give some cores a private bus and memory; such a group is called a node. Normally, a core accesses the memory within its own node, which reduces the bandwidth requirements on the shared bus and memory. However, in some scenarios a core inevitably has to access the memory of another node, which incurs a much larger access latency.

This technique is called NUMA (non-uniform memory access): it reduces the bandwidth requirements on the bus and memory, at the cost of non-uniform memory access latency. This structure places higher demands on the process scheduling algorithm, which should minimize cross-node memory accesses to improve overall system performance.

2.4 ARM HMP (heterogeneous multi-processing).

The topology structures mentioned so far mostly appear in x86-architecture PCs and servers, where the only goal is to raise the CPU's computing performance (not to reduce power consumption). But in the mobile market (most of ARM's world), things are more complicated.

With the popularity of smart devices, users demand more and more of mobile devices while requiring low power consumption, which places higher demands on CPU design. At the same time, battery technology has not evolved at the pace of CPU technology, so the topologies described above are not well suited to the mobile world.

Heterogeneous literally means "of different kinds": the cores inside the processor are of different types (in contrast to SMP). HMP arose from the following two facts:

1) The higher the performance of a processor core, the higher its power consumption when handling the same task. This is determined by the physical process.

2) Taking a smartphone as an example, the fraction of tasks that truly requires a high-performance CPU is very small (e.g. large games, HD video playback); some users may never run such tasks at all.

As a result, ARM proposed an architecture like the one below, which encapsulates two types of ARM cores: a class of high-performance cores (such as Cortex-A15, also known as big cores) and a class of low-performance, low-power cores (such as Cortex-A7, also known as LITTLE cores). It is therefore known as the big.LITTLE architecture. Here:

big cores have high performance, but also high power consumption;

LITTLE cores have low power consumption, at the price of lower performance.

As a result, software (such as the OS scheduler) can assign tasks to big or LITTLE cores as appropriate, striking a balance between performance and power consumption.


In ARM terminology, a group of all big cores or all LITTLE cores is called a cluster (comparable to the socket described in section 2.3, although the meaning is quite different). The CPU topology is therefore as follows:

cluster --> core --> thread

In terms of the software model, this is basically identical to the "socket --> core --> thread" topology described in section 2.3.

3. Linux kernel CPU topology driver.

Having understood the physical basis of the CPU topology, let's look at the Linux kernel's CPU topology driver. Its software hierarchy is as follows:

+----------------------+  +----------------------+
| cpu topology driver  |  | task scheduler etc.  |
+----------------------+  +----------------------+
+----------------------------------------------+
|         kernel general cpu topology          |
+----------------------------------------------+
+----------------------------------------------+
|        arch-dependent cpu topology           |
+----------------------------------------------+

The kernel general CPU topology lives in include/linux/topology.h and defines the standard interfaces for obtaining the system's CPU topology information. The underlying arch-dependent CPU topology implements these interfaces according to the features of the particular platform.

The CPU topology information has two important consumers: one is user space, which queries the current CPU information (e.g. the lscpu output in section 2.1); this path is implemented by the cpu topology driver. The other is the scheduler, which uses the CPU core information for reasonable task placement.

This article focuses on the kernel general CPU topology, the arch-dependent CPU topology, and the cpu topology driver, using the arm64 platform as the example for the arch-dependent part. How the task scheduler uses this information is more involved and will be covered in other articles.

3.1 Kernel general CPU topology.

The kernel general CPU topology provides its APIs mainly as macros of the "#ifndef ... #define" form. The intent is that the arch-dependent CPU topology may redefine these macros: if the arch layer defines one, the arch version takes precedence; otherwise the default in the general code is used. The main APIs are:

/* include/linux/topology.h */

#ifndef topology_physical_package_id
#define topology_physical_package_id(cpu)	((void)(cpu), -1)
#endif
#ifndef topology_core_id
#define topology_core_id(cpu)			((void)(cpu), 0)
#endif
#ifndef topology_thread_cpumask
#define topology_thread_cpumask(cpu)		cpumask_of(cpu)
#endif
#ifndef topology_core_cpumask
#define topology_core_cpumask(cpu)		cpumask_of(cpu)
#endif

#ifdef CONFIG_SCHED_SMT
static inline const struct cpumask *cpu_smt_mask(int cpu)
{
	return topology_thread_cpumask(cpu);
}
#endif

static inline const struct cpumask *cpu_cpu_mask(int cpu)
{
	return cpumask_of_node(cpu_to_node(cpu));
}

topology_physical_package_id is used to obtain the package ID of a given CPU, i.e. the socket or cluster described in chapter 2; its exact meaning depends on the platform implementation;

topology_core_id is used to obtain the core ID of a given CPU, i.e. the core described in chapter 2; again the exact meaning depends on the platform implementation;

topology_thread_cpumask gets all CPUs that belong to the same core as the given CPU, i.e. its hardware threads;

topology_core_cpumask gets all CPUs that belong to the same package (socket) as the given CPU;

cpu_cpu_mask gets all CPUs that belong to the same NUMA node as the given CPU;

cpu_smt_mask is used by SMT scheduling (CONFIG_SCHED_SMT); its meaning is the same as topology_thread_cpumask.

In addition, include/linux/topology.h provides NUMA-related APIs; since NUMA is rarely used on current ARM platforms, they are not covered here.

3.2 Arch-dependent CPU topology.

For arm64, the arch-dependent CPU topology lives in arch/arm64/include/asm/topology.h and arch/arm64/kernel/topology.c, and is responsible for the arm64-specific parts of the topology, including:

1) Defining a data structure, and a variable based on it, for storing the system's CPU topology:

/* arch/arm64/include/asm/topology.h */

struct cpu_topology {
	int thread_id;
	int core_id;
	int cluster_id;
	cpumask_t thread_sibling;
	cpumask_t core_sibling;
};

extern struct cpu_topology cpu_topology[NR_CPUS];

cluster_id, core_id and thread_id correspond to the three topology levels described in sections 2.3 and 2.4 (arm64 uses cluster instead of socket);

thread_sibling and core_sibling are cpumask_t variables that record all CPUs at the same level as this one (same core and same cluster, respectively);

Each CPU (the maximum number is given by NR_CPUS) has one struct cpu_topology variable describing its position in the overall topology. These variables are maintained in the cpu_topology array.

2) Defining the CPU-topology-related macros:

/* arch/arm64/include/asm/topology.h */

#define topology_physical_package_id(cpu)	(cpu_topology[cpu].cluster_id)
#define topology_core_id(cpu)			(cpu_topology[cpu].core_id)
#define topology_core_cpumask(cpu)		(&cpu_topology[cpu].core_sibling)
#define topology_thread_cpumask(cpu)		(&cpu_topology[cpu].thread_sibling)

The implementations are simple: they just fetch the relevant field from the CPU's cpu_topology entry.

3) Providing interfaces to initialize and build the CPU topology, to be invoked during system startup:

/* arch/arm64/include/asm/topology.h */

void init_cpu_topology(void);
void store_cpu_topology(unsigned int cpuid);

The call path of init_cpu_topology is kernel_init --> smp_prepare_cpus --> init_cpu_topology. It mainly performs the following tasks:

Initialize the struct cpu_topology variables of all possible CPUs;

Attempt to parse the CPU topology configuration from the DTS; the configuration format looks like this:

cpus {
	cpu-map {
		cluster0 {
			core0 {
				thread0 {
					cpu = <&big0>;
				};
				thread1 {
					cpu = <&big1>;
				};
			};
			core1 {
				...
			};
		};
	};

	big0: cpu@0 {
		device_type = "cpu";
		compatible = "arm,cortex-a15";
		reg = <0x0>;
	};
	...
};

For details, refer to the description in Documentation/devicetree/bindings/arm/topology.txt.

The call path of store_cpu_topology is kernel_init --> smp_prepare_cpus --> store_cpu_topology. It reads the CPU's topology information from the arm64 MPIDR register; the corresponding code is straightforward and not listed here.

3.3 CPU topology driver.

The cpu topology driver is located in drivers/base/topology.c. Based on the APIs provided by include/linux/topology.h, it exposes sysfs interfaces through which user space can obtain the topology information.

The concrete implementation is simple, and the format of the sysfs files is described in Documentation/cputopology.txt, so it is not detailed here.

When reposting this article, please indicate the source.

Copyright © 2011 Dowemo. All rights reserved. Creative Commons.