Hello World (IBM PC bootstrap)

From LiteratePrograms
Other implementations: Alice ML | Amiga E | Assembly Intel x86 Linux | Assembly Intel x86 NetBSD | AWK | bash | BASIC | Batch files | C | C, Cairo | C, Xlib | Candle | C++ | C# | Dylan | E | Forth | FORTRAN | Fortress | Go | Groovy | IBM PC bootstrap | Inform 7 | JavaScript | LaTeX | Logo | Lua | MATLAB | OCaml/F Sharp | occam | Oz | Perl | PHP | PIR | PLI | Prolog | Python | Rexx | Ruby | Scheme | sh | SQL | Standard ML | Tcl | Tcl Tk | Visual Basic

This article describes a small IBM PC bootstrap to flat 32-bit protected mode — just enough to display the classic "Hello, World!".

theory

Ontogeny recapitulates phylogeny — Ernst Haeckel

IBM PCs have traditionally recapitulated their development, powering up in a mode very similar to that of a 1981-era machine, then enabling the "recently acquired" features during the bootstrap process.

practice

32 bit protected mode application

As expected, there is not much to Hello World itself:

<<hello.c>>=
extern void cls(void);
extern void at(int x, int y, char *m);
void rawmain()
{
        cls();
        at(32,12,"Hello, world!");
}

device driver

Not having stdio available, we must provide some kind of device driver for output. The PC boots into a text display mode with a memory-mapped screen buffer at a fixed address, so we simply use C to place the required data onto the screen.

<<screen.c>>=
#define VIDEOMEM        ((char *)0xb8000)
#define SCRX            80
#define SCRY            25
#define ATTRIB          0x71    /* blue on grey */

void cls()
{
        char *p = VIDEOMEM;
        int n;
        for(n = 0; n < SCRX*SCRY; ++n) {
                *p++ = ' ';
                *p++ = ATTRIB;
        }
}

void at(int x, int y, char *m)
{
        char *p = VIDEOMEM + 2*(x+SCRX*y);
        while(*m) {
                *p++ = *m++;
                *p++ = ATTRIB;
        }
}
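
The cell arithmetic used by at() can be checked on an ordinary hosted compiler. The following is a minimal sketch, not part of the boot image: it assumes the usual 80x25 colour text layout of two bytes per cell (character, then attribute), and the names cell_offset, make_attrib and put_text are illustrative helpers that do not appear in screen.c. It writes into a caller-supplied buffer rather than the real screen at 0xb8000.

```c
#include <stddef.h>
#include <stdint.h>

enum { COLS = 80, ROWS = 25 };  /* same geometry as SCRX and SCRY */

/* Byte offset of the cell at column x, row y from the start of the buffer. */
size_t cell_offset(int x, int y)
{
    return 2u * (size_t)(x + COLS * y);
}

/* Pack a 4-bit foreground and a 4-bit background colour into one attribute byte. */
uint8_t make_attrib(unsigned fg, unsigned bg)
{
    return (uint8_t)(((bg & 0x0Fu) << 4) | (fg & 0x0Fu));
}

/* Write a string into any buffer laid out like the text screen. */
void put_text(uint8_t *screen, int x, int y, const char *m, uint8_t attrib)
{
    uint8_t *p = screen + cell_offset(x, y);
    while (*m) {
        *p++ = (uint8_t)*m++;   /* character byte */
        *p++ = attrib;          /* attribute byte */
    }
}
```

With these definitions, cell_offset(32, 12) is 1984 and make_attrib(1, 7) is 0x71, the ATTRIB value used above. In the default text mode the top bit of the attribute usually selects blinking rather than a bright background, so background values above 7 may not behave as colours.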

runtime library

In order to call the functions above, we must first have a working stack. This assembly code initializes the stack and the remaining segment registers so the C code will run in the proper environment. Traditionally, start is the assembly-language entry point to a C program, but here we will just begin with the first instruction of the binary application, and arrange the link so this code appears in the right place.

<<crt0.s>>=
	xor %eax, %eax		# boot sets CS,DS,ES,SS
	mov %eax, %fs
	mov %eax, %gs

	mov $0x20000, %esp	# set up a stack
	call _rawmain		# and enter the C code

The C program might return, but we don't have any continuation at this point. In the absence of better options, we wait for a keypress (with a small polling device driver), then reboot.

<<crt0.s>>=
wkbd:	mov $0x64, %edx		# if it returns, wait for any keypress
	inb %dx, %al
	andl $1, %eax
	jz wkbd

	mov $0x64, %edx		# then reboot the machine
	mov $0xfe, %eax
	outb %al, %dx

	cli
loop:	jmp loop		# (or at least hang)

Exercise: implement exit()

16 bit real mode bootstrap

Now, if the PC starts out in 16-bit ca. 1981 real mode, how do we establish a large flat address space for the C application? The PC looks for a boot sector on the peripherals when it starts up, and while hard drives work slightly differently, CD-ROMs and USB keys (neither of which was available in 1981) can be viewed as floppy-compatible. The boot sector code must fit in a few hundred bytes, but this suffices to load additional code and enable a more recent CPU configuration. We will use debug to assemble this (largely 8086 compatible) bootstrap code.

resources

First, a few pieces of data:

  • the application itself is, to the boot sector, just data to be loaded. We will place it in the two sectors following the boot sector.
  • each floppy carries some metadata in its boot sector.

Exercise: Here we copy the data from a disk image which has been formatted by FreeDOS. Figure out a better way of providing or, better yet, calculating this information.

Exercise: Unfortunately the resulting disk, while it appears to have a FAT filesystem, does not respect it and can easily be corrupted. Rearrange the disk image so that it is useful both as a FAT disk and as a boot disk.

<<boot.src>>=
f 100,700 0
a
;;;; first we bring in our C program (300-700)	;;;;

n a.bin
l 300

a
;;;; then the floppy parameter table (103-13F)	;;;;
; parameters taken from a FREEDOS formatted 1.44M  ;

e 103          46 52 44 4F 53
e 108 34 2E 31 00 02 01 01 00
e 110 02 E0 00 40 0B f0 09 00
e 118 12 00 02 00 00 00 00 00
e 120 00 00 00 00 00 00 29 12
e 128 15 5D 33 45 4D 50 54 59
e 130 44 49 53 4B 20 20 46 41
e 138 54 31 32 20 20 20 31 C0
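
The bytes entered above are less opaque than they look: they follow the standard FAT boot sector layout, in which multi-byte fields are little-endian. As a sketch only (hosted C, run on the development machine rather than in the boot image; the names le16, decode_bpb and struct floppy_geometry are made up for illustration), the core fields can be decoded like this:

```c
#include <stdint.h>

/* Bytes 0x0B..0x1B of the boot sector, copied from the `e 108`..`e 118`
   lines above (debug address = sector offset + 0x100). */
const uint8_t bpb[] = {
    0x00, 0x02,   /* 0x0B bytes per sector       */
    0x01,         /* 0x0D sectors per cluster    */
    0x01, 0x00,   /* 0x0E reserved sectors       */
    0x02,         /* 0x10 number of FATs         */
    0xE0, 0x00,   /* 0x11 root directory entries */
    0x40, 0x0B,   /* 0x13 total sectors          */
    0xF0,         /* 0x15 media descriptor       */
    0x09, 0x00,   /* 0x16 sectors per FAT        */
    0x12, 0x00,   /* 0x18 sectors per track      */
    0x02, 0x00    /* 0x1A heads                  */
};

struct floppy_geometry {
    unsigned bytes_per_sector, total_sectors, sectors_per_track, heads;
    unsigned cylinders;
    unsigned long capacity;   /* in bytes */
};

/* Read a 16-bit little-endian value. */
unsigned le16(const uint8_t *p)
{
    return (unsigned)p[0] | ((unsigned)p[1] << 8);
}

/* b points at sector offset 0x0B, i.e. the first byte of the array above. */
struct floppy_geometry decode_bpb(const uint8_t *b)
{
    struct floppy_geometry g;
    g.bytes_per_sector  = le16(b + 0);
    g.total_sectors     = le16(b + 8);
    g.sectors_per_track = le16(b + 13);
    g.heads             = le16(b + 15);
    g.cylinders = g.total_sectors / (g.sectors_per_track * g.heads);
    g.capacity  = (unsigned long)g.total_sectors * g.bytes_per_sector;
    return g;
}
```

For the table above this yields 512 bytes per sector, 2880 sectors, 18 sectors per track and 2 heads, that is 80 cylinders and 1474560 bytes, the usual 1.44M geometry. Running the same arithmetic in reverse is one possible starting point for the first exercise above.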

disk input

While we are still in 16-bit real mode, we can use the BIOS to read the application off the drive. The BIOS loads the boot code at 0000:7C00-0000:7E00, so we load the application immediately following, at 0000:7E00-0000:8200.

<<boot.src>>=
a
;;;; now we assemble the boot sector		;;;;

a 100
jmp 140                          ; skip over the floppy parameters

a 140
; == floppy I/O ==
xor dx,dx
mov es,dx
mov cx,02
mov bx,7e00
mov ax,0202
int 13				; read the next (al=2) sectors...
mov dx,03f2
xor ax,ax
out dx,al			; ... then turn off the floppy
jmp 0:7c80

(The far jump just ensures that we enter the next code with a CS:IP of 0000:7C80, not 07C0:0080 or some other segment:offset pair that names the same address.)
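
The reason several CS:IP pairs are possible is that a real-mode address is the segment shifted left by four bits plus the offset, so many pairs alias one byte. The sketch below (hosted C; linear and debug_to_load are illustrative names, and the 20-bit wraparound of a real 8086 is ignored) also shows how the offsets used inside debug, which assembles at 100, relate to the addresses seen once the BIOS has loaded the sector at 7C00.

```c
#include <stdint.h>

/* Real-mode physical address: segment * 16 + offset. */
uint32_t linear(uint16_t seg, uint16_t off)
{
    return ((uint32_t)seg << 4) + off;
}

/* Address at run time of a byte that debug shows at debug_addr:
   the image starts at 0x100 in debug and at 0x7C00 after the BIOS load. */
uint32_t debug_to_load(uint16_t debug_addr)
{
    return (uint32_t)debug_addr - 0x100u + 0x7C00u;
}
```

Both linear(0x0000, 0x7C80) and linear(0x07C0, 0x0080) give 0x7C80, which is also debug_to_load(0x180), the code assembled at 180 in the next part of the script.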

At this point we are done with the BIOS and ready to switch to protected mode — we ask the Sherpas to turn back, and make a dash for the summit.

mode switch

Debug, while still available on XP, was originally an 8086 application. Luckily, the machine does not care whether its instructions were assembled from mnemonics, so we enter some of the more unusual instructions as strings of data. (At least we are not toggling them in on front panel switches.)

<<boot.src>>=

a 180
; == enter protected mode ==
cli				; turn off interrupts
				; load GDT (using CS override)
db 2e, 0f, 01, 16, 00, 7d
				; set pmode bit in cr0
db 0f, 20, c0
or al, 1
db 0f, 22, c0
				; load 32-bit data segment registers, and...
mov dx, 10
mov es, dx
mov ds, dx
mov ss, dx
				; ...far jump to load 32-bit code segment	
jmp 18:7e00

If all has gone well, we have loaded the application to 0000:7e00, which is identically 00007e00 in the flat mapping, and this last jump will take us into the runtime library startup code from crt0.s. If not, the machine will at best reboot or, more likely, execute some random code. Caveat lusor!

tables

We can't turn off segmentation in IA-32, but we can effectively avoid it by providing a Global Descriptor Table that sets each segment to an identity map over the entire address space.

00: always null -- the space here is used for the descriptor needed by lgdt
08: (unused)
10: data segment (0-4G)
18: code segment (0-4G)

These descriptor tables are themselves backwards compatible with 16-bit descriptor tables, so the bitfields are split up in odd ways. Here we treat the entire table as an opaque resource, and again just enter the required data rather than worrying about how to construct it.

<<boot.src>>=

a 200
; // GDT //
db 1f, 00, 00, 7d, 00, 00, 00, 00
db 00, 00, 00, 00, 00, 00, 00, 00
db ff, ff, 00, 00, 00, 92, cf, 00
db ff, ff, 00, 00, 00, 9a, cf, 00

a 2fe
; // signature for boot record //
db 55, aa
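
For readers who would rather not treat the table as opaque, the split bitfields can be put back together on a hosted compiler. This is a sketch under the standard IA-32 segment descriptor layout; struct segment, struct gdtr, decode_descriptor, decode_gdtr and last_offset are illustrative names, not part of the bootstrap.

```c
#include <stdint.h>

struct segment {
    uint32_t base;      /* linear address where the segment starts      */
    uint32_t limit;     /* 20-bit limit, in bytes or in 4K pages        */
    uint8_t  access;    /* present bit, privilege level, segment type   */
    int      granular;  /* 1: limit counts 4K pages                     */
    int      is32;      /* 1: 32-bit default operand and address size   */
};

struct gdtr {
    uint16_t limit;     /* size of the table in bytes, minus one */
    uint32_t base;      /* linear address of the table           */
};

/* Reassemble the scattered fields of one 8-byte descriptor. */
struct segment decode_descriptor(const uint8_t d[8])
{
    struct segment s;
    s.limit = (uint32_t)d[0] | ((uint32_t)d[1] << 8)
            | ((uint32_t)(d[6] & 0x0F) << 16);
    s.base  = (uint32_t)d[2] | ((uint32_t)d[3] << 8)
            | ((uint32_t)d[4] << 16) | ((uint32_t)d[7] << 24);
    s.access   = d[5];
    s.granular = (d[6] >> 7) & 1;
    s.is32     = (d[6] >> 6) & 1;
    return s;
}

/* The 6-byte operand of lgdt, stored here in the unused null slot. */
struct gdtr decode_gdtr(const uint8_t p[6])
{
    struct gdtr g;
    g.limit = (uint16_t)(p[0] | (p[1] << 8));
    g.base  = (uint32_t)p[2] | ((uint32_t)p[3] << 8)
            | ((uint32_t)p[4] << 16) | ((uint32_t)p[5] << 24);
    return g;
}

/* Highest valid offset within the segment. */
uint32_t last_offset(struct segment s)
{
    return s.granular ? ((s.limit << 12) | 0xFFFu) : s.limit;
}
```

Applied to the bytes above, both segments decode to base 0 with a last offset of FFFFFFFF (the 0-4G identity map), differing only in the access byte: 92 marks a writable data segment and 9a an executable code segment. The first six bytes decode to a 32-byte table located at 00007d00, which is where debug address 200 ends up after the BIOS load.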

output

Having built an in-memory image of the boot sector and the following 2 application sectors, we save it to produce boot.bin.

<<boot.src>>=

a 300
;;;; finally, we produce the binary (3 sectors)	;;;;

n boot.bin
r cx
600
w
q
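
The 600 loaded into CX is a hexadecimal byte count: one boot sector plus two application sectors of 512 bytes each. A small hosted C sketch of that arithmetic, assuming 512-byte sectors and using illustrative helper names (app_sectors, image_bytes) that are not part of the build:

```c
enum { SECTOR = 512 };

/* Sectors needed to hold app_bytes, rounding up to a whole sector. */
unsigned app_sectors(unsigned long app_bytes)
{
    return (unsigned)((app_bytes + SECTOR - 1) / SECTOR);
}

/* Total image length in bytes: one boot sector plus the application. */
unsigned long image_bytes(unsigned long app_bytes)
{
    return (unsigned long)SECTOR * (1 + app_sectors(app_bytes));
}
```

image_bytes(1024) gives 0x600, matching the script, while an application of 1025 bytes would already need a third application sector.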

Question: suppose the size of the application changes. What numbers above must be modified?

wrapping up

Finally, we set up compilation:

  • without any standard libraries (they are meant for the host environment, not this one)
  • mapping code and data to where they will be loaded by the boot sector, at hex addresses 00007e00 and 00007f00 respectively
  • and arranging for the code from crt0.s to be located at the start of the text section, by listing it first among the sources

<<build.bat>>=
set OBJS=a.exe a.bin
set SRCS=crt0.s hello.c screen.c
set ARGS=-nostdlib -Wl,-Ttext -Wl,0x7e00 -Wl,-Tdata -Wl,0x7f00

We then give a batch file that builds boot.bin with the GNU toolchain and Microsoft's debug.


<<build.bat>>=
del *.bin
gcc %ARGS% %SRCS%
objcopy -O binary a.exe a.bin
if not exist a.bin goto err
debug < boot.src
del %OBJS%

Optional features:

<<build.bat>>=
if not exist tail.img goto end

copy /b boot.bin + tail.img 144.img
del boot.bin

if not exist bochsrc.bxrc goto end

bochsrc.bxrc
goto end

:ERR
@echo off
echo -
echo -
echo gcc and objcopy failed or missing from PATH
pause
:END