I recently found the cause of a more indirect bug, PR pkg/51623: when running qemu-x86_64 with -smp 4, the additional CPUs don't start. It's not particularly exciting, but I still thought I'd write up the things I did on the way to finding the cause.

The bug
-------

QEMU is a great emulator, but attempting to run NetBSD with -smp 4 would fail. It would boot normally until:

    cpu1: failed to start
    cpu2: failed to start
    cpu3: failed to start

Booting without -smp, or with a single emulated processor, works fine. Since NetBSD boots on an identical real machine, the cause is likely within QEMU, but we'd still like it to work. Where do we start?

Setting up
----------

First of all, we'd like to set up a disk image for QEMU to use. I fetched a NetBSD/amd64 ISO image and ran:

    dd if=/dev/zero of=nbsd.img bs=1048576 count=4096   # 4GB image
    qemu -cdrom NetBSD-7.99.73-amd64.iso -hda nbsd.img

and did an install within QEMU. That got me a working image I could boot with the simple command:

    qemu -hda nbsd.img

We also need the NetBSD sources and a build environment. To fetch the sources and build the toolchain needed for building kernels:

    cvs -danoncvs@anoncvs.NetBSD.org:/cvsroot co src
    cd src
    ./build.sh -U -u -j10 -m amd64 -O ~/obj tools

At this point you can start building kernels, like the one configured by sys/arch/amd64/conf/GENERIC, with the following command:

    ./build.sh -U -u -j10 -m amd64 -O ~/obj kernel=GENERIC

The most important flag to know here is -u (build incrementally). You may want to drop it in some cases: for example, if you edit the kernel config, change header files, or just feel paranoid about your changes taking effect.

Quickly testing changes
-----------------------

Emulators are awesome testing platforms: you can test your changes very quickly.
To test a new kernel, we'll first configure a vnd, a pseudo-disk backed by a file, so that the image can be mounted:

    # vndconfig vnd0 nbsd.img

Then we can mount it to change files:

    # mount /dev/vnd0a /mnt
    # mv /home/fly/obj/sys/arch/amd64/compile/GENERIC/netbsd /mnt
    # umount /mnt

I chose 'mv' because it errors out if the new kernel isn't ready yet when testing repeatedly.

Don't manipulate the image while QEMU is using it. Otherwise you'll see weird non-existent files and possibly host crashes. You'll end up with a dirty filesystem image anyway from repeatedly killing QEMU without running fsck, so that may come back to bite you.

Now, to test a change, I rebuild the kernel, mount the image, mv the kernel into it, unmount the image, and run QEMU. Testing a change takes a few seconds, perfect for "I am trying to guess my way in the dark around a problem I don't fully understand".

Actually the bug
----------------

Since QEMU is popular, a number of people mentioned managing to boot with -smp using Linux KVM with various options, so 'other CPUs in QEMU' was a good starting point. I had heard reports that it works with some CPUs. So I initially suspected the crazy netbsd/x86 kernel feature of patching the kernel at boot based on CPU features. Indeed, emulating an older CPU (-cpu phenom, though I tried a bunch) did work.

A good code pointer is the string "failed to start". It appears in sys/arch/x86/x86/cpu.c:774, so we know with confidence that whatever went wrong happened before it. The code leading up to it sets CPUF_GO in ci->ci_flags. It then waits, checking whether CPUF_RUNNING has been set in ci->ci_flags. In our case, it never is.

What should set it? Searching for the string 'CPUF_RUNNING' finds the place, but why it gets there isn't obvious:

    atomic_or_32(&ci->ci_flags, CPUF_RUNNING);

This is within cpu_init, which can only be called from cpu_hatch. In case that search doesn't help, we can also look for 'CPUF_GO', which leads more obviously into cpu_hatch:

    /*
     * Wait to be brought online.  Use 'monitor/mwait' if available,
     * in order to make the TSC drift as much as possible. so that
     * we can detect it later.  If not available, try 'pause'.
     * We'd like to use 'hlt', but we have interrupts off.
     */
    while ((ci->ci_flags & CPUF_GO) == 0) {
        if ((cpu_feature[1] & CPUID2_MONITOR) != 0) {
            x86_monitor(&ci->ci_flags, 0, 0);
            if ((ci->ci_flags & CPUF_GO) != 0) {
                continue;
            }
            x86_mwait(0, 0);
        } else {
            for (i = 10000; i != 0; i--) {
                x86_pause();
            }
        }
    }

Secondary CPUs loop here until CPUF_GO is set. They then continue, and go on to set CPUF_RUNNING. This code also tests for a CPU feature. That makes it a perfect candidate for our bug, which depends on which CPU the emulator presents.

Since testing a change still took only a few seconds, I quickly tried forcing each branch of the if in turn by wrapping the other in #if 0 / #endif. It didn't immediately work, but I still thought it was a promising candidate, so I looked at the actual function, x86_pause. It's in an assembly file, sys/arch/amd64/amd64/cpufunc.S:443:

    NENTRY(x86_pause)
        pause
        ret

What is the x86 instruction 'pause' supposed to do? Searching suggested it's equivalent to 'nop', so I tried a bunch of nops in place of it, and sure enough, it boots. Normally the loop briefly pauses and re-checks the value. Here, instead, the secondary CPU stalled in the loop and didn't leave it in time to set CPUF_RUNNING. The boot CPU gave up and printed the "failed to start" messages we saw.

Not shown in this article
-------------------------

- Spending some time thinking it was an Intel vs. AMD thing, as 'phenom' worked but no Intel CPU I tried did. I should have tested more CPUs. I attempted to disable various Intel discovery bits and put prints along the way, to no avail.
- Looking at the CPU definitions in QEMU, with no benefit gained.
- Trying to disable parts or all of x86_patch, thinking it was the 'cx8' feature, and then thinking that didn't work because parts of the function are needed.
- Printing strings in random places, hoping to pin down exactly where it hung.
- Sometimes changes would fail to boot or build.
  We default to -Werror. So, to avoid errors about unused variables or functions, I'd add CFLAGS=-Wno-error=unused-function (or whichever warning was firing) to the environment.

Summary
-------

I had previously swapped the arguments for -cpu phenom a couple of times while thinking I got it right. So I made sure to triple-check things: a clean kernel build, clean sources with just that change, and checking that I'm running the same command as in the bug report. After 45 kernels and several hours, I finally have a kernel booting.

What's next? Boot was very slow, so this might be critical code. Monitoring a value sounds a lot better than repeatedly pausing and checking it, so it's probably what we want to do. Is CPUID2_MONITOR set incorrectly by NetBSD or by QEMU? Maybe the whole code is broken. It may just have happened not to be an issue because machines with multiple CPUs also had CPUID2_MONITOR.

Update
------

We're checking CPUF_GO every 10,000 pauses. Printing stuff shows it does run the loop several times. Lowering the number of pause calls between checks to 10 also made it boot successfully.

What appears to be QEMU's implementation of pause, in target/i386/misc_helper.c, is:

    static void do_pause(X86CPU *cpu)
    {
        CPUState *cs = CPU(cpu);

        /* Just let another CPU run.  */
        cs->exception_index = EXCP_INTERRUPT;
        cpu_loop_exit(cs);
    }

That certainly sounds more expensive than nops. Most emulated CPUs don't claim support for MONITOR, but Phenom does. Maybe

permalink: http://coypu.sdf.org/20170525-qemu-smp