
Thursday April 18 , 2024

Your First CPU - Chapter 3 - Your First Assembler - The Bison Parser


The parser using Bison and a grammar definition file

Using a process similar to the one we used for the flex input file, we create a Bison grammar file to implement our parser. Tokens stream from the lexer/scanner into the parser, and our grammar determines which tokens may follow one another; that is, it defines the syntax of our language. The notation Bison uses to write that grammar is very similar to Backus-Naur Form (BNF). We often see BNF when we study the syntax of languages like C/C++, Java, SQL, and more.

For example, here is a sample of a BNF-like grammar from the MySQL documentation:

CREATE [TEMPORARY] TABLE [IF NOT EXISTS] tbl_name
    { LIKE old_tbl_name | (LIKE old_tbl_name) }

create_definition:
    col_name column_definition
  | [CONSTRAINT [symbol]] PRIMARY KEY [index_type] (index_col_name,...)
  | {INDEX|KEY} [index_name] [index_type] (index_col_name,...)
  | [CONSTRAINT [symbol]] UNIQUE [INDEX|KEY]
      [index_name] [index_type] (index_col_name,...)
  | {FULLTEXT|SPATIAL} [INDEX|KEY] [index_name] (index_col_name,...)
  | [CONSTRAINT [symbol]] FOREIGN KEY
      [index_name] (index_col_name,...) reference_definition
  | CHECK (expr)

column_definition:
    data_type [NOT NULL | NULL] [DEFAULT default_value]
      [AUTO_INCREMENT] [UNIQUE [KEY] | [PRIMARY] KEY]
      [COMMENT 'string'] [reference_definition]

Like the flex input file, the Bison grammar input file lets us attach an action to any sequence of input tokens the parser recognizes. For our assembler, these actions will emit the binary machine code for our program in a format loadable by the Verilog $readmemh system task. Inside an action, the $n variables access the values of the symbols already matched by the rule (for a token, this is the contents of the scanner's yylval variable at the time the token was returned). These are the only non-C symbols found in our actions, and Bison converts them to C expressions in the intermediate C file it generates. Thus, in the following sample grammar input line:

result : INTEGER sMUL INTEGER { $$ = $1 * $3; }

The variable $1 references the value of the first INTEGER token, and $3 references the value of the second INTEGER token. (It is the third symbol in the rule; $2 would be the sMUL token.) The $$ variable references the value of result, the symbol on the left-hand side of the rule, which this action sets to the first integer times the second integer. This is an arbitrary example taken from a grammar that parses mathematical expressions.

Most variables and functions used in the grammar listing below are defined in another source file; they are responsible for emitting the assembler's output.

Partial listing of the grammar definition file for our assembler parser (parser.y):

input:    /* empty string */
    | input line { yylineno++; }
    ;
line: NEWLINE
    | statement NEWLINE
    | asm_expr NEWLINE
    | label_decl NEWLINE
    | register_decl NEWLINE
    | definition NEWLINE
    | END
    ;

/* assembler directives */
asm_expr: sIMEM INTEGER { alloc_imem( $2 ); }
    | sIMEM INTEGER INTEGER { sys.imem_width = $3; alloc_imem( $2 ); }
    | sREGFILE INTEGER INTEGER { sys.regfile = $2; sys.reg_addr_bits = $3; }
    | sREGFILE INTEGER { sys.regfile = $2; }
    | sBASE INTEGER { sys.base = $2; }
    ;

/* grammars for each cpu mnemonic */
statement: NOP { gen( xNOP, 0 ); }
    | LRI format_rd_imm { gen( xLRI, $2 ); }
    | ADD format_ra_rb_rd { gen( xADD, $2 ); }
    | SUB format_ra_rb_rd { gen( xSUB, $2 ); }
    | OR format_ra_rb_rd { gen( xOR, $2 ); }
    | XOR format_ra_rb_rd { gen( xXOR, $2 ); }
    | BRA format_0_rb_0 { gen( xBRA, $2 ); }
    | BRANZ format_ra_rb_0 { gen( xBRANZ, $2 ); }
    | BRAL format_0_label { gen( xBRAL, $2 ); }
    | BRALNZ format_label_ra { gen( xBRALNZ, $2 ); }
    | CALL format_label_rd { gen( xCALL, $2 ); }
    | HALT { gen( xHALT, 0xfff ); }
    ;

/* grammars for each of the mnemonic formats */
format_rd_imm: reg COMMA INTEGER { $$ = ENCR( ($3 >> sys.reg_addr_bits) & 0xf, $3 & 0xf, $1 ); };
format_ra_rb_rd: reg COMMA reg COMMA reg { $$ = ENCR( $1, $3, $5 ); };
format_0_rb_0: reg { $$ = ENCR( 0, $1, 0 ); };
/* the second reg is $3; $2 is the COMMA token */
format_ra_rb_0: reg COMMA reg { $$ = ENCR( $1, $3, 0 ); };
format_label_ra: label COMMA reg { $$ = ENCR( $3, $1 & 0xf, ($1 >> sys.reg_addr_bits) & 0xf ); };
format_label_rd: label COMMA reg { $$ = ENCR( ($1 >> sys.reg_addr_bits) & 0xf, $1 & 0xf, $3 ); };
format_0_label: label { $$ = ENCR( 0, $1 & 0xf, ($1 >> sys.reg_addr_bits) & 0xf ); };
//format_0_rb_rd: reg COMMA reg { $$ = ENC( 0, $1, $3 ); };
//format_ra_imm: reg COMMA INTEGER { $$ = ENCR($1, $3 & 0xf, ($3 >> sys.reg_addr_bits)&0xf ); };
//format_imm_0: INTEGER { $$ = ENCR(0, $1, 0); };

label_decl: IDENTIFIER COLON { yf_setsymbol( $1, ST_LABEL, sys.base ); };
register_decl: sREGISTER REG IDENTIFIER { yf_setsymbol( $3, ST_REGISTER, $2 ); };
definition: sDEFINE IDENTIFIER INTEGER { yf_setsymbol( $2, ST_INT, $3 ); }
    | sDEFINE IDENTIFIER STRING { yf_setsymbol( $2, ST_STRING, $3 ); }
    ;

/* a label is a reference to a memory address, a constant is also valid */
label: INTEGER { $$ = $1; }
    | IDENTIFIER { yf_symbol s = yf_getsymbol($1);
          if(s.type==ST_LABEL) $$ = s.lvalue; else yyerror("expected label"); }
    ;

/* a reg is a reference to a register */
reg: REG { $$ = $1; }
    | IDENTIFIER { yf_symbol s = yf_getsymbol($1);
          if(s.type==ST_REGISTER) $$ = s.lvalue; else yyerror("expected register"); }
    ;

 

Download the complete source code for Chapter 3

Chapter 3 - Your First Assembler (tgz)

Chapter 3 - Your First Assembler (zip)


