Your First CPU - Chapter 3 - Your First Assembler
What good is a CPU without an assembler? In this chapter we use Flex and Bison (modern versions of Lex and Yacc) to build an assembler for our CPU. The assembler writes the machine code as a hexadecimal text file that the Verilog $readmemh system task can load in simulation. In a synthesized FPGA design, the same file can be used to initialize the instruction memory held in the FPGA's internal memory blocks.
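As a sketch of the output side, the C function below writes machine-code words in the plain-text layout $readmemh reads: an `@address` line giving the starting word address (an index into the memory array) in hex, followed by one hexadecimal word per line. It assumes 32-bit instruction words; the real width depends on the CPU design. The name `write_readmemh` is hypothetical and not part of the chapter's assembler.

```c
#include <assert.h>
#include <inttypes.h>
#include <stddef.h>
#include <stdio.h>

/* Write machine-code words in the text format read by Verilog's $readmemh:
 * an "@address" line (hex word address) sets where loading starts, then
 * one hexadecimal word per line. 32-bit words are an assumption here. */
int write_readmemh(FILE *out, uint32_t base, const uint32_t *words, size_t count)
{
    assert(out != NULL);
    assert(words != NULL || count == 0);

    if (fprintf(out, "@%" PRIX32 "\n", base) < 0)
        return -1;                          /* write error */
    for (size_t i = 0; i < count; i++) {
        /* zero-pad to 8 digits so each line holds one full 32-bit word */
        if (fprintf(out, "%08" PRIX32 "\n", words[i]) < 0)
            return -1;
    }
    return 0;
}
```

In a testbench, a statement such as `$readmemh("program.hex", imem);` would then load the file into a memory array; `program.hex` and `imem` are placeholder names.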
Flex and Bison
Flex and Bison have been used to build compilers for many languages; GCC's C and C++ front ends, for example, were once built on Bison-generated parsers. They are modern implementations of Lex and Yacc, which were originally written for Unix. Yacc is an acronym for "Yet Another Compiler Compiler", which whimsically states its purpose.
Text lexer/scanner using Flex
We begin building the assembler's lexer by writing a Flex input file. (The terms scanner and lexer are used interchangeably.) This file defines how the text of an assembly program is broken into tokens such as keywords, identifiers and constants. Its main section lists regular expressions to match against the input text, each paired with an action that produces a token and, where needed, its value for the parser. Actions are written in ordinary C/C++, and most need only a line or two. At build time, the flex program reads this file and generates a C source file containing the scanner. Our makefile includes an extra rule that regenerates this C file whenever the lexer file changes. Because the generated scanner is an intermediate file, it does not need to be included in a distribution, and 'make clean' deletes it.
Here are the regular expressions and actions that implement our first assembler. In the top section, before the %% line, we define some simple named regular expressions (shortcuts) for text sequences we expect to encounter often. After that, we define the actual scanner rules and the action to execute each time one of them matches. Some actions simply return an integer constant, defined elsewhere, that identifies the token encountered. Others, such as the rules for registers, integers, identifiers and strings, also pass a value to the parser through the yylval variable.
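For an identifier, the value passed in yylval is a symbol-table index. The identifier rule in the listing calls `yf_getsymbol` and `yf_addsymbol`, whose implementations are not shown in this chapter. The following is a hypothetical minimal version consistent with how the scanner calls them: a lookup returns an index greater than 0 when the symbol exists and 0 otherwise. The table size, the `ST_LABEL` type and the field names are assumptions for illustration only.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical symbol types: the scanner only uses ST_UNKNOWN; a parser
 * might later change the type, e.g. when a label is defined. */
enum symbol_type { ST_UNKNOWN, ST_LABEL };

#define MAX_SYMBOLS 256                 /* assumed fixed capacity */

struct symbol {
    char *name;              /* private copy: yytext is reused by the next match */
    enum symbol_type type;
    int value;               /* e.g. a label's address, once known */
};

/* Slot 0 is never used, so an index of 0 can mean "not found". */
static struct symbol symtab[MAX_SYMBOLS + 1];
static int nsymbols;

/* Return the symbol's index (> 0) if it exists, otherwise 0. */
int yf_getsymbol(const char *name)
{
    assert(name != NULL);
    for (int i = 1; i <= nsymbols; i++)
        if (strcmp(symtab[i].name, name) == 0)
            return i;
    return 0;
}

/* Add a new symbol and return its index (> 0), or -1 if the table is
 * full or memory runs out. */
int yf_addsymbol(const char *name, enum symbol_type type, int value)
{
    assert(name != NULL);
    if (nsymbols == MAX_SYMBOLS)
        return -1;
    char *copy = malloc(strlen(name) + 1);
    if (copy == NULL)
        return -1;
    strcpy(copy, name);
    nsymbols++;
    symtab[nsymbols].name = copy;
    symtab[nsymbols].type = type;
    symtab[nsymbols].value = value;
    return nsymbols;
}
```

Copying the name matters because flex overwrites yytext on the next match, so storing the yytext pointer itself would leave the table pointing at stale text.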
Partial listing of the scanner input file (scanner.l):
```
delim       [ \t]
whitesp     {delim}+
digit       [0-9]
alpha       [a-zA-Z]
alphanum    [a-zA-Z0-9]
number      [-]?{digit}*[.]?{digit}+
integer     [-]?{digit}+
hex         "0x"[0-9a-fA-F]+
string      \"[^\"]*\"
register    [rR][-]?{digit}+
comment     "#"[^\n]*
identifier  {alphanum}[a-zA-Z0-9_]*
```
The definitions section only declares shortcuts for text we expect to encounter. The rules section, which follows a %% line in the source file, holds the actual scanner expressions and their actions:
```
{register}  { sscanf(yytext+1, "%d", &yylval); return REG; }
{integer}   { sscanf(yytext, "%d", &yylval); return INTEGER; }
{hex}       { sscanf(yytext+2, "%x", &yylval); return INTEGER; }

"\n"        { return NEWLINE; }
","         { return COMMA; }
":"         { return COLON; }

"NOP"       { return NOP; }
"LRI"       { return LRI; }
"ADD"       { return ADD; }
"SUB"       { return SUB; }
"OR"        { return OR; }
"XOR"       { return XOR; }
"HALT"      { return HALT; }
"BRA"       { return BRA; }
"BRANZ"     { return BRANZ; }
"BRAL"      { return BRAL; }
"BRALNZ"    { return BRALNZ; }
"CALL"      { return CALL; }

".imem"     { return sIMEM; }
".regfile"  { return sREGFILE; }
".base"     { return sBASE; }
".define"   { return sDEFINE; }
".register" { return sREGISTER; }
".end"      { return END; }

{identifier} {
                yylval = yf_getsymbol(yytext);
                if(yylval<=0)
                    yylval = yf_addsymbol(yytext, ST_UNKNOWN, 0);
                return IDENTIFIER;
            }
{string}    {
                yytext[strlen(yytext)-1] = 0;
                yylval = yf_addstring(&yytext[1]);
                return STRING;
            }

{whitesp}   { /* No action and no return */ }
{comment}   { /* No action and no return */ }
```
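When several rules could match at the same position, flex picks the rule that matches the most text, and among rules that match the same amount, the one listed first. That is why `NOP` returns the NOP token rather than IDENTIFIER, why `r1` is a register, and why `10` is an INTEGER even though the identifier pattern also accepts it. The sketch below imitates that choice for a single word using POSIX `<regex.h>`. It is not how flex works internally (flex compiles all rules into one automaton), the token codes are placeholders for the ones Bison generates, `scan_token` is a hypothetical name, and punctuation, directives, strings and the symbol-table call are left out for brevity.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>
#include <stdlib.h>

/* Placeholder token codes; a real build uses the header Bison generates. */
enum token {
    REG = 258, INTEGER, IDENTIFIER,
    NOP, LRI, ADD, SUB, OR, XOR, HALT, BRA, BRANZ, BRAL, BRALNZ, CALL
};

/* How the matched text becomes yylval, mirroring the scanner actions. */
enum value_kind { V_NONE, V_REG, V_DEC, V_HEX };

struct rule {
    const char *pattern;     /* POSIX ERE anchored at the start with ^ */
    enum token token;
    enum value_kind kind;
};

/* Same order as scanner.l; order only decides equal-length matches. */
static const struct rule rules[] = {
    { "^[rR]-?[0-9]+",   REG,     V_REG },   /* {register} */
    { "^-?[0-9]+",       INTEGER, V_DEC },   /* {integer}  */
    { "^0x[0-9a-fA-F]+", INTEGER, V_HEX },   /* {hex}      */
    { "^NOP", NOP, V_NONE },       { "^LRI", LRI, V_NONE },
    { "^ADD", ADD, V_NONE },       { "^SUB", SUB, V_NONE },
    { "^OR", OR, V_NONE },         { "^XOR", XOR, V_NONE },
    { "^HALT", HALT, V_NONE },     { "^BRA", BRA, V_NONE },
    { "^BRANZ", BRANZ, V_NONE },   { "^BRAL", BRAL, V_NONE },
    { "^BRALNZ", BRALNZ, V_NONE }, { "^CALL", CALL, V_NONE },
    { "^[a-zA-Z0-9][a-zA-Z0-9_]*", IDENTIFIER, V_NONE }, /* {identifier} */
};

/* Pick the rule flex would choose at the start of text: the longest match,
 * with ties going to the earlier rule. Stores the token's value in *value
 * and the number of characters matched in *length; returns 0 if no rule
 * matches. */
int scan_token(const char *text, int *value, size_t *length)
{
    int best = -1;
    regoff_t best_len = 0;
    assert(text != NULL && value != NULL && length != NULL);

    for (size_t i = 0; i < sizeof rules / sizeof rules[0]; i++) {
        regex_t re;
        regmatch_t m;
        if (regcomp(&re, rules[i].pattern, REG_EXTENDED) != 0)
            continue;                     /* skip a pattern that fails to compile */
        /* only a strictly longer match replaces the best, so ties keep the earlier rule */
        if (regexec(&re, text, 1, &m, 0) == 0 && m.rm_eo > best_len) {
            best = (int)i;
            best_len = m.rm_eo;
        }
        regfree(&re);
    }
    if (best < 0)
        return 0;

    *length = (size_t)best_len;
    switch (rules[best].kind) {
    case V_REG: *value = (int)strtol(text + 1, NULL, 10); break;  /* skip 'r' */
    case V_DEC: *value = (int)strtol(text, NULL, 10); break;
    case V_HEX: *value = (int)strtoul(text + 2, NULL, 16); break; /* skip "0x" */
    default:    *value = 0; break;  /* the real identifier rule stores a symbol index */
    }
    return rules[best].token;
}
```

For example, `scan_token("0x1F", &v, &n)` returns INTEGER with `v` equal to 31 and `n` equal to 4: both {hex} and {identifier} match four characters, and {hex} is listed first.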