B生的部落格: May 6, 2010

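1. SMP hardware architecture
In its simplest form, SMP can be understood as a system with several identical CPUs that all share one bus while each keeps its own registers. Because the bus is shared, memory and peripheral devices are shared as well. Under Linux, every CPU maps the kernel (system) address space identically, so the CPUs are fully symmetric peers.
Having several CPUs raises a question: when a peripheral device raises an interrupt, which CPU handles it?
To answer this, Intel introduced the I/O APIC and local APIC architecture.
The I/O APIC is connected to the peripheral devices and can be programmed with a delivery mode. Based on that setting, the interrupt signal is sent to the local APIC of the corresponding CPU.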
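The local APIC handles interrupts for its own CPU. It not only accepts interrupts forwarded by the I/O APIC but also has to handle interrupts that originate locally on that CPU. The local APIC also provides a timer.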
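How is the boot CPU determined?
According to Intel's documentation, after power-on the MP Initialization Protocol dynamically selects one CPU as the BSP (bootstrap processor); software should treat the choice as arbitrary. Only the BSP runs the BIOS code. All other CPUs, the APs (application processors), enter a wait state and start running only after the BSP triggers them with an IPI. For the details of the MP Initialization Protocol, see Chapter 8 of the Intel® 64 and IA-32 Architectures Software Developer’s Manual Volume 3A: System Programming Guide, Part 1.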
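How does the boot CPU start the other CPUs?
The BSP can use IPI messages to make an AP begin executing at a specified start address. The local APIC integrated in each CPU provides this capability: writing the relevant local APIC registers sends an IPI to the chosen CPU.
How is the CPU hardware information obtained?
After system initialization, the firmware places information about the CPUs, buses, I/O APICs and so on at defined locations in memory. This is the SMP MP table. During initialization, Linux reads that location to obtain the hardware information.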
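As a rough illustration of what locating the MP table involves, the sketch below scans a buffer for the 16-byte MP floating pointer structure: the signature "_MP_", a length field of 1 (in 16-byte units), and a checksum that makes all 16 bytes sum to zero. The names find_mp_floating and byte_sum are hypothetical, and the scan runs over an ordinary buffer as a stand-in for the physical memory regions the kernel would search.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Layout of the MP floating pointer structure, shown for reference. */
struct mp_floating {
    char     signature[4];  /* "_MP_" */
    uint32_t physptr;       /* physical address of the MP configuration table */
    uint8_t  length;        /* size of this structure in 16-byte units (1) */
    uint8_t  spec_rev;      /* MP specification revision */
    uint8_t  checksum;      /* makes all 16 bytes sum to 0 modulo 256 */
    uint8_t  feature[5];    /* MP feature bytes 1..5 */
};

/* Sum of n bytes modulo 256; a valid structure sums to 0. */
static uint8_t byte_sum(const uint8_t *p, size_t n)
{
    uint8_t sum = 0;
    while (n--)
        sum = (uint8_t)(sum + *p++);
    return sum;
}

/*
 * Hypothetical helper: scan buf on 16-byte boundaries and return the
 * offset of the first valid floating pointer structure, or -1 if none.
 */
static long find_mp_floating(const uint8_t *buf, size_t len)
{
    assert(buf != NULL);
    for (size_t off = 0; off + 16 <= len; off += 16) {
        const uint8_t *p = buf + off;
        if (memcmp(p, "_MP_", 4) == 0 &&  /* signature */
            p[8] == 1 &&                  /* length field */
            byte_sum(p, 16) == 0)         /* checksum */
            return (long)off;
    }
    return -1;
}
```

The checksum test is what rejects a stray "_MP_" byte sequence that happens to appear in memory without being a real table.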
2. Overview of the Linux SMP boot flow
setup_arch()
    setup_memory();
        reserve_bootmem(PAGE_SIZE, PAGE_SIZE);
        find_smp_config(); // locate the SMP MP table
    smp_alloc_memory();
        trampoline_base = (void *) alloc_bootmem_low_pages(PAGE_SIZE); // allocate the trampoline page that holds the AP boot code
    get_smp_config(); // read the hardware details from the SMP MP table
trap_init()
    init_apic_mappings();
mem_init();
    zap_low_mappings(); // if SMP is not configured, clear the user-space address mappings
rest_init();
    kernel_thread(init, NULL, CLONE_FS | CLONE_SIGHAND);
        init();
            set_cpus_allowed(current, CPU_MASK_ALL);
            smp_prepare_cpus(max_cpus);
                smp_boot_cpus(max_cpus);
                    connect_bsp_APIC();
                    setup_local_APIC(); // initialize the BSP's local APIC
                    map_cpu_to_logical_apicid();
                    call do_boot_cpu(apicid, cpu) for each CPU
            smp_init(); // every CPU starts scheduling
trampoline.S: AP boot code; 16-bit (real-mode) code that switches on protected mode
head.S: enables paging for the AP
initialize_secondary: uses the state set up when the idle task was forked to jump to start_secondary
start_secondary: checks whether the BSP has released the APs; if so, the AP starts scheduling tasks
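3. Code study summary
find_smp_config(): locates the MP table in memory. For the protocol details, see chapter 4 of the MP specification.
The table describes hardware such as the system's CPUs, buses and I/O APICs.
Two related global variables: smp_found_config records whether an SMP MP table was found, and mpf_found holds the linear address of the SMP MP table.
smp_alloc_memory(): allocates memory for the code that boots the APs. The related global variable trampoline_base holds the linear address of the allocated boot area.
get_smp_config(): reads the hardware information from the contents of the MP table.
init_apic_mappings(): sets up the mapped addresses of the I/O APIC and the local APIC.
zap_low_mappings(): if SMP is not configured, clears the user-space address mappings by zeroing the corresponding entries in swapper_pg_dir.
setup_local_APIC(): initializes the BSP's local APIC.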
do_boot_cpu(apicid, cpu)
idle = alloc_idle_task(cpu);
task = copy_process(CLONE_VM, 0, idle_regs(&regs), 0, NULL, NULL, 0);
init_idle(task, cpu);
This copies the init process with copy_process and calls init_idle to set the CPU on which the new task may run.
idle->thread.eip = (unsigned long) start_secondary;
This modifies thread.eip in the task_struct so that, once the AP has finished initializing, it runs start_secondary.
start_eip = setup_trampoline();
setup_trampoline() copies the code between trampoline_data and trampoline_end to trampoline_base, the memory allocated earlier in setup_arch. The returned start_eip is the physical address that corresponds to trampoline_base.
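The copy and the address conversion can be pictured with a small user-space sketch. It assumes the common i386 layout in which low memory is mapped at a fixed offset of 0xC0000000 (the 3G/1G split); the names copy_trampoline, virt_to_phys_sketch and the _SKETCH constants are hypothetical stand-ins, not the kernel's own definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PAGE_SIZE_SKETCH   4096u
#define PAGE_OFFSET_SKETCH 0xC0000000u  /* assumed kernel mapping offset */

/* Convert a directly mapped kernel virtual address to a physical one. */
static uint32_t virt_to_phys_sketch(uint32_t vaddr)
{
    assert(vaddr >= PAGE_OFFSET_SKETCH);
    return vaddr - PAGE_OFFSET_SKETCH;
}

/*
 * Copy the bytes in [data, end) to base, as setup_trampoline() is
 * described as doing. Returns the number of bytes copied.
 */
static size_t copy_trampoline(uint8_t *base,
                              const uint8_t *data, const uint8_t *end)
{
    assert(end >= data);
    size_t len = (size_t)(end - data);
    assert(len <= PAGE_SIZE_SKETCH);  /* must fit in the single page */
    memcpy(base, data, len);
    return len;
}
```

Under that assumed offset, a trampoline page at virtual address 0xC009F000 would correspond to physical address 0x9F000, which is low enough for a CPU in real mode to reach.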
smpboot_setup_warm_reset_vector(start_eip); writes start_eip to memory location 40:67h as the start address.
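The notation 40:67h is a real-mode segment:offset address, and the start address stored there is itself a segment:offset pair. The sketch below shows only that arithmetic; the struct and function names are hypothetical, and the split shown (segment = address >> 4, offset = address & 0xf) is one valid encoding, assumed here for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* A real-mode far pointer. */
struct real_mode_addr {
    uint16_t segment;
    uint16_t offset;
};

/* Physical address named by segment:offset in real mode. */
static uint32_t real_mode_linear(uint16_t segment, uint16_t offset)
{
    return ((uint32_t)segment << 4) + offset;
}

/* Split a physical address below 1 MiB into a segment:offset pair. */
static struct real_mode_addr split_start_eip(uint32_t start_eip)
{
    assert(start_eip < 0x100000u);  /* real mode reaches only 1 MiB */
    struct real_mode_addr a = {
        (uint16_t)(start_eip >> 4),
        (uint16_t)(start_eip & 0xfu)
    };
    return a;
}
```

For example, real_mode_linear(0x40, 0x67) is 0x467, and a trampoline at physical 0x9F000 splits into segment 0x9F00 and offset 0.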
wakeup_secondary_cpu(apicid, start_eip); programs the APIC_ICR register so that the BSP sends IPI messages to the target AP, which makes that AP start executing at start_eip in real mode.
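To make the register write more concrete, here is a minimal sketch of how the two halves of the interrupt command register could be composed for an INIT IPI and a STARTUP IPI. The bit positions follow the xAPIC ICR layout; the macro and function names are hypothetical, and the sketch only computes values without touching any hardware register.

```c
#include <assert.h>
#include <stdint.h>

/* Selected bit fields of the low ICR dword. */
#define ICR_DM_INIT        0x00000500u  /* delivery mode 101b: INIT */
#define ICR_DM_STARTUP     0x00000600u  /* delivery mode 110b: STARTUP */
#define ICR_LEVEL_ASSERT   0x00004000u  /* level: assert */
#define ICR_TRIGGER_LEVEL  0x00008000u  /* trigger mode: level */

/* High ICR dword: destination APIC ID in bits 24..31. */
static uint32_t icr_high(uint8_t apicid)
{
    return (uint32_t)apicid << 24;
}

/* Low ICR dword that asserts INIT on the destination CPU. */
static uint32_t icr_low_init_assert(void)
{
    return ICR_TRIGGER_LEVEL | ICR_LEVEL_ASSERT | ICR_DM_INIT;
}

/*
 * Low ICR dword for a STARTUP IPI. The vector field carries the page
 * number of the start address, so start_eip must be page aligned and
 * lie below 1 MiB.
 */
static uint32_t icr_low_startup(uint32_t start_eip)
{
    assert((start_eip & 0xfffu) == 0 && start_eip < 0x100000u);
    return ICR_DM_STARTUP | (start_eip >> 12);
}
```

With a trampoline at physical 0x9F000 the STARTUP vector is 0x9F, so icr_low_startup(0x9F000) evaluates to 0x69F. This is also why the trampoline is allocated as a whole low-memory page: the vector can only express a 4 KiB-aligned address below 1 MiB.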
trampoline.S
ENTRY(trampoline_data)
r_base = .
wbinvd # Needed for NUMA-Q should be harmless for others
mov %cs, %ax # Code and data in the same place
mov %ax, %ds
cli # We should be safe anyway
movl $0xA5A5A5A5, trampoline_data - r_base
This sets a marker so that the BSP knows the AP has reached this point.

lidtl boot_idt - r_base # load idt with 0, 0
lgdtl boot_gdt - r_base # load gdt with whatever is appropriate
Load the IDT and the GDT.
xor %ax, %ax
inc %ax # protected mode (PE) bit
lmsw %ax # into protected mode
# flush prefetch and jump to startup_32_smp in arch/i386/kernel/head.S
ljmpl $__BOOT_CS, $(startup_32_smp-__PAGE_OFFSET)
Enable protected mode and jump to startup_32_smp.
# These need to be in the same 64K segment as the above;
# hence we don't use the boot_gdt_descr defined in head.S
boot_gdt:
.word __BOOT_DS + 7 # gdt limit
.long boot_gdt_table-__PAGE_OFFSET # gdt base
boot_idt:
.word 0 # idt limit = 0
.long 0 # idt base = 0L
.globl trampoline_end
trampoline_end:
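This code sets the marker so that the BSP knows the AP has reached the trampoline, and loads the base addresses of the GDT and IDT.
It then enables protected mode and jumps to startup_32_smp.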
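The two mode switches on this path come down to two bits in CR0: PE (bit 0), which the trampoline sets through lmsw, and PG (bit 31), which head.S sets afterwards. The sketch below models them on a plain integer; lmsw_sketch and enable_paging_sketch are hypothetical names, and the model only mirrors the documented behavior that lmsw loads the low four bits of CR0 and cannot clear PE.

```c
#include <assert.h>
#include <stdint.h>

#define CR0_PE 0x00000001u  /* protection enable */
#define CR0_PG 0x80000000u  /* paging enable */

/* Model of lmsw: load CR0 bits 0..3 from msw; PE can be set, not cleared. */
static uint32_t lmsw_sketch(uint32_t cr0, uint16_t msw)
{
    uint32_t low4 = msw & 0xFu;
    uint32_t sticky_pe = cr0 & CR0_PE;
    return (cr0 & ~0xFu) | low4 | sticky_pe;
}

/* Model of "orl $0x80000000,%eax; movl %eax,%cr0". */
static uint32_t enable_paging_sketch(uint32_t cr0)
{
    assert(cr0 & CR0_PE);  /* paging requires protected mode */
    return cr0 | CR0_PG;
}
```

The sequence xor %ax,%ax; inc %ax produces the value 1, which is exactly the PE bit, so lmsw_sketch(cr0, 1) turns protected mode on and leaves the upper bits of CR0 unchanged.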
Excerpt from head.S:
ENTRY(startup_32_smp)
cld
movl $(__BOOT_DS),%eax
movl %eax,%ds
movl %eax,%es
movl %eax,%fs
movl %eax,%gs
xorl %ebx,%ebx
incl %ebx
On an AP, this sets ebx to 1.
movl $swapper_pg_dir-__PAGE_OFFSET,%eax
movl %eax,%cr3 /* set the page table pointer.. */
movl %cr0,%eax
orl $0x80000000,%eax
movl %eax,%cr0 /* ..and set paging (PG) bit */
ljmp $__BOOT_CS,$1f /* Clear prefetch and normalize %eip */
Enable paging.
lss stack_start,%esp
Point esp at the kernel stack of the process created by fork, so that control can later jump to start_secondary.
#ifdef CONFIG_SMP
movb ready, %cl
movb $1, ready
cmpb $0,%cl
je 1f # the first CPU calls start_kernel
# all other CPUs call initialize_secondary
call initialize_secondary
jmp L6
1:
#endif /* CONFIG_SMP */
call start_kernel
When an AP is booting, initialize_secondary is called.
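The branch on the ready byte can be restated in C. In this sketch the first caller sees ready equal to 0 and takes the start_kernel path, while every later caller sees 1 and takes the initialize_secondary path. The enum and function names are hypothetical; only the read-then-set logic is taken from the assembly above.

```c
#include <assert.h>
#include <stddef.h>

enum boot_path {
    PATH_START_KERNEL,         /* first CPU (the BSP) */
    PATH_INITIALIZE_SECONDARY  /* every later CPU (an AP) */
};

/* Mirrors: movb ready,%cl; movb $1,ready; cmpb $0,%cl; je 1f */
static enum boot_path choose_boot_path(unsigned char *ready)
{
    assert(ready != NULL);
    unsigned char old = *ready;  /* movb ready, %cl */
    *ready = 1;                  /* movb $1, ready */
    return old == 0 ? PATH_START_KERNEL : PATH_INITIALIZE_SECONDARY;
}
```

Because the BSP passes through this code long before any AP is woken, the plain read followed by a write is enough here and no atomic operation is involved.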
void __devinit initialize_secondary(void)
{
/*
* We don't actually need to load the full TSS,
* basically just the stack pointer and the eip.
*/
asm volatile(
"movl %0,%%esp\n\t"
"jmp *%1"
:
:"r" (current->thread.esp),"r" (current->thread.eip));
}
This sets the stack pointer to the stack saved at fork time and eip to the saved eip, which transfers control to start_secondary.
start_secondary then does the following:
while (!cpu_isset(smp_processor_id(), smp_commenced_mask))
rep_nop();
This checks smp_commenced_mask to see whether the AP has been released to run. smp_commenced_mask is set in smp_init().
cpu_idle();
Once released, the AP calls cpu_idle and begins scheduling tasks.
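The release handshake can be sketched with a shared bitmask in which the BSP sets one bit per CPU and each AP polls its own bit. The code below uses C11 atomics as a stand-in for the kernel's cpumask helpers; commenced_mask_sketch and the three functions are hypothetical names, and the sketch supports at most 32 CPUs.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical stand-in for smp_commenced_mask: one bit per CPU. */
static atomic_ulong commenced_mask_sketch;

/* BSP side: allow the given CPU to leave its wait loop. */
static void release_cpu(int cpu)
{
    assert(cpu >= 0 && cpu < 32);
    atomic_fetch_or(&commenced_mask_sketch, 1UL << cpu);
}

/* AP side: has this CPU been released yet? */
static int cpu_released(int cpu)
{
    assert(cpu >= 0 && cpu < 32);
    return (int)((atomic_load(&commenced_mask_sketch) >> cpu) & 1UL);
}

/*
 * AP side: spin until released, then return; the kernel would go on to
 * call cpu_idle(). Returns how many polls found the bit still clear.
 */
static unsigned long wait_until_released(int cpu)
{
    unsigned long spins = 0;
    while (!cpu_released(cpu))
        spins++;  /* the kernel executes rep_nop() here */
    return spins;
}
```

In a single-threaded experiment, release_cpu(1) has to be called before wait_until_released(1), otherwise the loop never ends; on real hardware the two sides run on different CPUs.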
This article comes from a CSDN blog; please cite the source when reposting: http://blog.csdn.net/jemmy858585/archive/2009/09/01/4509375.aspx


Linux SMP boot process study notes
