Beginner's introduction to GCC

About me

maya@NetBSD.org

coypu@sdf.org

NetBSD/pkgsrc for the last 3 years

(Not a GCC expert, just sharing knowledge)

NetBSD

Entire, bootable operating system (not just a kernel)

x86-64, ARM, PowerPC...

...m68k, VAX, SuperH

This talk

GNU-centric view of toolchains

(alternatives exist, won't be mentioned)

Might be familiar to many

Let's all get on the same page

Top-level overview

preprocessor
compiler
assembler
linker
runtime linker (ld.so)

Upstreams overview

project     component
GCC         preprocessor
GCC         compiler
binutils    assembler
binutils    linker
OS          runtime linker (ld.so)

Independent tools

command             component
cpp                 preprocessor
/usr/libexec/cc1    compiler
as                  assembler
ld                  linker
(kernel)            runtime linker (ld.so)

Can pass flags to each component

GCC flag    component
-Wp,        preprocessor
(none)      compiler
-Wa,        assembler
-Wl,        linker

Can stop after component

GCC flag    stops after
-E          preprocessor
-S          compiler
-c          assembler
(none)      linker

runtime linker - kernel

readelf ... PT_INTERP:

[Requesting program interpreter: /libexec/ld.elf_so]

sys/kern/exec_elf.c:
	if (pp->p_type == PT_INTERP) {
		...
		interp = PNBUF_GET();
		if ((error = exec_read_from(l, epp->ep_vp,
		    pp->p_offset, interp, pp->p_filesz)) != 0)
			goto bad;
		...
	}
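
The kernel excerpt above reads the interpreter path out of the PT_INTERP program header. A minimal userland sketch of the same lookup follows. It assumes a 64-bit ELF file in the host's byte order and a system <elf.h> (present on NetBSD and Linux). find_interp is a made-up name, not a kernel or libc function.

```c
#include <assert.h>
#include <elf.h>
#include <stdio.h>
#include <string.h>

/*
 * Copy the PT_INTERP path of a 64-bit ELF file into buf.
 * Returns 0 on success, -1 if the file cannot be read, is not a
 * 64-bit ELF, or has no PT_INTERP header (e.g. a static binary).
 * Assumes the file uses the host's byte order.
 */
int
find_interp(const char *path, char *buf, size_t buflen)
{
	Elf64_Ehdr eh;
	Elf64_Phdr ph;
	FILE *f;
	int rv = -1;

	assert(path != NULL && buf != NULL);
	if (buflen == 0 || (f = fopen(path, "rb")) == NULL)
		return -1;

	/* Check the ELF magic and that this is a 64-bit object. */
	if (fread(&eh, sizeof(eh), 1, f) != 1 ||
	    memcmp(eh.e_ident, ELFMAG, SELFMAG) != 0 ||
	    eh.e_ident[EI_CLASS] != ELFCLASS64)
		goto out;

	for (unsigned i = 0; i < eh.e_phnum; i++) {
		/* Program headers are e_phentsize bytes apart. */
		long off = (long)eh.e_phoff + (long)i * eh.e_phentsize;

		if (fseek(f, off, SEEK_SET) != 0 ||
		    fread(&ph, sizeof(ph), 1, f) != 1)
			goto out;
		if (ph.p_type != PT_INTERP)
			continue;

		/* p_filesz covers the path including its terminating NUL. */
		if (ph.p_filesz == 0 || ph.p_filesz > buflen)
			goto out;
		if (fseek(f, (long)ph.p_offset, SEEK_SET) != 0 ||
		    fread(buf, 1, ph.p_filesz, f) != ph.p_filesz)
			goto out;
		buf[ph.p_filesz - 1] = '\0';
		rv = 0;
		break;
	}
out:
	fclose(f);
	return rv;
}
```

Called on a dynamically linked binary, this should report the same path that readelf prints. A static binary has no PT_INTERP header, so the function returns -1.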

Preprocessor

Important for packaging: most of our problems are here

Expands preprocessor directives

  • #include <math.h>
  • #if defined(__NetBSD__) || defined(__linux__)...
  • #error "OS not in long list of supported OSes"

#include <math.h> ?

(why would things "just work"?)

  • Built-in default lookup directories
  • Visible with -Wp,--verbose
  • Each tool has --verbose

#include <math.h>

~> gcc -Wp,--verbose test.c
#include "..." search starts here:
#include <...> search starts here:
 /usr/include/gcc-6
 /usr/include
End of search list.
On my system: /usr/include/math.h

#if defined(__NetBSD__) ?

I never defined that...

GCC's internal code calls builtin_define("__NetBSD__")

visible with gcc -dM -E - < /dev/null

~> gcc -dM -E - < /dev/null
#define __NetBSD__ 1
#define _LP64 1
#define __STDC_VERSION__ 201112L
...
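
The macros in the dump above come from the compiler itself, so code can test them without including any header. A small sketch of how a program might use them; target_os and check_predefined_macros are made-up names for illustration, and the OS list is deliberately short:

```c
#include <assert.h>
#include <string.h>

/*
 * Name the OS the compiler says it is targeting. No header defines
 * these macros; the compiler predefines them for its target.
 */
const char *
target_os(void)
{
#if defined(__NetBSD__)
	return "NetBSD";
#elif defined(__linux__)
	return "Linux";
#else
	/* A real package might use #error here instead. */
	return "unknown";
#endif
}

/*
 * Sanity check: when the compiler predefines _LP64, long and
 * pointers are expected to be 64 bits wide.
 */
void
check_predefined_macros(void)
{
	assert(strlen(target_os()) > 0);
#if defined(_LP64)
	assert(sizeof(long) == 8 && sizeof(void *) == 8);
#endif
}
```

Running the file through gcc -E shows which branch of the #if survived preprocessing on a given system.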

Results can be inspected after preprocessing

-save-temps to save all the results

  • Single file with all the used code
  • Standalone test case for compiler problems
  • No need to dig through many include files
  • Quickly debug your Qt5+Boost program with limited prior knowledge
    (the reason for this talk)

Compiler

?? Magical C to assembler machine ??

Assembler

  • Human-readable assembly output
  • Some directives
	.file	"test2.c"
	.text
	.globl	main
	.type	main, @function
main:
.LFB0:
	.cfi_startproc
	pushq	%rbp
	.cfi_def_cfa_offset 16
	.cfi_offset 6, -16
	movq	%rsp, %rbp
	.cfi_def_cfa_register 6
	movl	$0, %eax
	popq	%rbp
	.cfi_def_cfa 7, 8
	ret
	.cfi_endproc
.LFE0:
	.size	main, .-main
	.ident	"GCC: (nb2 20180327) 6.4.0"

Linker/object files

  • Machine-readable
  • Many tools to parse

Parsing binary objects (nm)

~> nm hello-world
0000000000600a78 d _DYNAMIC
0000000000600bf8 d _GLOBAL_OFFSET_TABLE_
                 w _Jv_RegisterClasses
0000000000600a58 D __CTOR_LIST_END__
0000000000400968 r __GNU_EH_FRAME_HDR
00000000004006ea T ___start
0000000000600c92 B __bss_start
                 w __deregister_frame_info
0000000000600c48 D __dso_handle
0000000000600c40 D __progname
0000000000600c98 B __ps_strings
                 w __register_frame_info
00000000004005e8 r __rela_iplt_end
00000000004005e8 r __rela_iplt_start
0000000000400690 T __start
                 U __syscall
0000000000600c92 D _edata
0000000000600cb0 B _end
                 U _exit
00000000004008d0 T _fini
00000000004005f0 T _init
                 U _libc_init
0000000000400690 T _start
                 U abort
                 U atexit
0000000000600ca8 B environ
                 U exit
00000000004008c1 T main
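
The letters in the nm output describe where each symbol lives. A sketch of C source that would typically produce several of them when compiled without optimization; all names here are invented for illustration, and the exact letters can vary with compiler version and flags:

```c
#include <assert.h>

int initialized_global = 42;	/* D: global, initialized data */
static int call_count;		/* b: file-local, zero-filled (bss) */
const char banner[] = "hello";	/* R: global, read-only data */

/* t: file-local function (may disappear if the compiler inlines it) */
static int
bump(void)
{
	return ++call_count;
}

/* T: global function in the text section */
int
next_id(void)
{
	/* On failure assert calls into libc; that helper shows up as U. */
	assert(initialized_global > 0);
	return initialized_global + bump();
}
```

Compiling this with gcc -c and running nm on the object file is a quick way to compare the letters against the comments.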

C runtime stuff

(where did all these extra symbols come from?)

We can see the linker adding them

C runtime stuff

~> gcc -Wl,--verbose test2.c
.. (linker script) ..
attempt to open /usr/lib/crt0.o succeeded
/usr/lib/crt0.o
attempt to open /usr/lib/crti.o succeeded
/usr/lib/crti.o
attempt to open /usr/lib/crtbegin.o succeeded
/usr/lib/crtbegin.o
attempt to open /var/tmp//ccrus4oG.o succeeded
/var/tmp//ccrus4oG.o
attempt to open /usr/lib/libgcc_s.so succeeded
-lgcc_s (/usr/lib/libgcc_s.so)
attempt to open /usr/lib/libgcc.so failed
attempt to open /usr/lib/libgcc.a succeeded
attempt to open /usr/lib/libc.so succeeded
-lc (/usr/lib/libc.so)
attempt to open /usr/lib/libgcc_s.so succeeded
-lgcc_s (/usr/lib/libgcc_s.so)
attempt to open /usr/lib/libgcc.so failed
attempt to open /usr/lib/libgcc.a succeeded
attempt to open /usr/lib/crtend.o succeeded
/usr/lib/crtend.o
attempt to open /usr/lib/crtn.o succeeded
/usr/lib/crtn.o
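
The crt objects in the listing above supply startup code that runs before main, which is also where symbols such as _init and __CTOR_LIST_END__ in the nm output come from. One way to observe that startup phase from C is the GCC/Clang constructor attribute. This is a sketch under that assumption; the names are made up, and how constructors are dispatched differs between systems:

```c
#include <assert.h>

static int startup_ran;

/*
 * GCC/Clang extension: ask the toolchain to call this function
 * during program startup, before main() is entered.
 */
__attribute__((constructor))
static void
early_init(void)
{
	startup_ran = 1;
}

/* Code reached from main() can rely on the constructor having run. */
int
startup_has_run(void)
{
	assert(startup_ran == 1);
	return startup_ran;
}
```

Linking with -nostartfiles drops the crt objects and is a quick way to see how much they normally do.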

Specfiles

Not very legible; dump them with gcc -dumpspecs

The reason you don't need to specify -lc

attempt to open /usr/lib/libc.so succeeded
-lc (/usr/lib/libc.so)
attempt to open /usr/lib/libgcc_s.so succeeded
-lgcc_s (/usr/lib/libgcc_s.so)

Specfiles

*lib:
%{pthread:	  %{!p:		      %{!pg:-lpthread}}
%{p:-lpthread_p}		     
%{shared:-lc}	       %{pg:-lc_p}}}

Parsing binary objects (readelf)

Can see the libraries we use

Dynamic section at offset 0x2adc0 contains 22 entries:
  Tag        Type                         Name/Value
 0x0000000000000001 (NEEDED)             Shared library: [libedit.so.3]
 0x0000000000000001 (NEEDED)             Shared library: [libterminfo.so.1]
 0x0000000000000001 (NEEDED)             Shared library: [libc.so.12]
 0x000000000000000f (RPATH)              Library rpath: [/lib]

Semver

The NEEDED entry names only the major version: libc.so.MAJOR

libc.so.MAJOR -> libc.so.MAJOR.MINOR (symlink)

The library's minor version can change without the binary noticing
A new library major version won't be used by existing binaries
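
The naming rule can be modelled in a few lines. This is only a sketch of the convention, not how the runtime linker works (it simply opens the NEEDED name and follows the symlink). satisfies_needed is a hypothetical helper, and the minor numbers in the usage note are invented:

```c
#include <assert.h>
#include <string.h>

/*
 * Return 1 if the file name `file` (e.g. "libc.so.12.213") could sit
 * behind the symlink named by a NEEDED entry `needed`
 * (e.g. "libc.so.12"): same name, same major, any minor suffix.
 */
int
satisfies_needed(const char *needed, const char *file)
{
	size_t n;

	assert(needed != NULL && file != NULL);
	n = strlen(needed);
	if (strncmp(needed, file, n) != 0)
		return 0;
	/* Exact match, or the match is followed by ".minor". */
	return file[n] == '\0' || file[n] == '.';
}
```

With NEEDED libc.so.12, both libc.so.12.213 and libc.so.12.215 are accepted, while libc.so.13.0 is rejected.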

GCC configuration

  • Primary file: gcc/config.gcc
  • per-OS+arch
  • Add headers to override definitions
  • Add extra makefiles for extra code, e.g. code to run on init
    (NetBSD, for instance, has custom code to rename cabs to __c99_cabs)

Can dump lots of intermediate results

  • -fdump-rtl-all-all
  • -fdump-tree-all

Backend (RTL)

  • gcc/config/ARCH/arch.md
  • Lisp-like syntax; rules to match against

Backend (RTL)

  • constraints must match
  • multiple templates possible
(define_insn "extendqihi2"
  [(set (match_operand:HI 0 "nonimmediate_operand" "=g")
        (sign_extend:HI (match_operand:QI 1 "nonimmediate_operand" "g")))]
  ""
  "cvtbw %1,%0")

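The extendqihi2 pattern above describes sign-extending a QImode (8-bit) value into an HImode (16-bit) one; cvtbw looks like the VAX convert-byte-to-word instruction. A sketch of C source that would plausibly give the compiler a reason to use such a pattern follows. widen_byte is a made-up name, and whether this exact pattern is chosen depends on the target and optimization level.

```c
#include <assert.h>
#include <limits.h>

/*
 * Widen a signed 8-bit value to a signed 16-bit value.
 * In RTL terms: (sign_extend:HI (operand:QI)).
 */
short
widen_byte(signed char c)
{
	/* The sketch assumes the usual 8-bit char and 16-bit short. */
	assert(CHAR_BIT == 8 && sizeof(short) == 2);
	return c;	/* implicit conversion keeps the sign */
}
```

Compiling this with -S for the target of interest, or with -fdump-rtl-all, shows which instruction was actually selected.
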
Condition code

  • Want to reorder assembly
  • Need to specify side-effects (condition codes)
    (x86 FLAGS)
MIPS (no condition codes, add sets no flags):
add r0 r1 r2
x86-64 (cmp sets FLAGS, jne reads them):
cmp $0, 4(%rbx)
jne forward

Backend (RTL)


(define_insn "*cmp<mode>_minus_1"
  [(set (reg FLAGS_REG)
        (compare
          (minus:SWI (match_operand:SWI 0 "nonimmediate_operand" "<r>m,<r>")
                     (match_operand:SWI 1 "<general_operand>" "<r><i>,<r>m"))
          (const_int 0)))]
  "ix86_match_ccmode (insn, CCGOCmode)"
  "cmp{<imodesuffix>}\t{%1, %0|%0, %1}"
  [(set_attr "type" "icmp")
   (set_attr "mode" "<MODE>")])