DIV2js

An ugly presentation about transpilation

## Disclaimer Yes, I know [DIV Go](http://www.divgo.net/), I know [the source code is on GitHub](https://github.com/DIVGAMES/DIV-Games-Studio) and I also know this is probably a useless project but, you know? **I don't care.**
## Index 1. How does DIV work? 2. The execution model 3. The memory model 4. The execution pipeline
## How does DIV work?
![DIV2 running on DOS-BOX](./imgs/div2-dosbox.png)
DIV programs have a point of entry...
![DIV2 main program](./imgs/div2-main-program.png)
...and a set of processes.
![DIV2 processes](./imgs/div2-processes.png)
DIV2 processes are similar to Linux processes in a single-thread.
They run for a while, until executing a `FRAME` sentence. ![Processes list](./imgs/processes-list.png)
![Process Execution](./imgs/div2-process-execution.png)
When all the processes have executed `FRAME`, the graphic engine draws a new frame.
And the execution continues **after the `FRAME`**... ![Resume process list](./imgs/resume-execution.png) ...until finding another one.
![Next frame](./imgs/div2-process-execution-next-frame.png)
Do you know what in JavaScript is very similar to this?
A [generator](https://developer.mozilla.org/en/docs/Web/JavaScript/Reference/Statements/function*)! ```js function* process() { let counter = 0; while (true) { yield counter; // This is like FRAME. counter++; } } ```
![Process hierarchy](./imgs/div2-process-hierarchy.png) Processes have a hierarchy, a type, an ID...
...and memory.
DIV memory is **manually managed**.
![DIV2 Memory](./imgs/memory.png) It is a continous chunk of 4-byte aligned space.
And you know what in JavaScript is very similar to this?
A [TypedArray](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Typed_arrays)! ```js const memory = new Int32Array(5); // 5 32-bit intenger cells. const ANGLE = 2; memory[ANGLE] = 90000; // Variable assignation. memory[ANGLE + 1] = 90000; // Pointer aritmethic! ```
## The execution model
Let's consider generators.
With generators, a DIV process... ```div PROCESS asteroid() BEGIN graph = 1; WHILE (NOT out_of_region(ID, 0)) graph = ((graph + 1) % 10 + 1); y += 10; FRAME; END END ```
...would be translated into a JavaScript generator: ```js function* asteroid(mem) { const GRAPH = 100; const Y = 101; mem[GRAPH] = 1; while (!out_of_region(mem[ID], 0)) { mem[GRAPH] = ((mem[GRAPH] + 1) % 10 + 1); mem[Y] += 10; yield { type: 'frame', completion: 100 }; } } ```
Creating a new process would look like: ```js function newProcess(name, args) { const processType = processes[name]; const mem = new Int32Array(PROCESS_SIZE); initializeMemory(mem, args); const process = processType(mem); process.mem = mem; processList.push(process); } ```
And the scheduler would look like this: ```js function runProcessList() { let retValue; while (processList.length) { for (let i = 0, current; current = processList[i]; i++) { const result = current.next(retValue); // run until yield. retValue = handle(result, current.mem); }; drawFrame(); } } ```
`handle()` would manipulate the process list according to the result of the execution of the process.
`retValue` is sent to the process since some statements can return values (i.e. _calling a process returns its ID_).
* `FRAME` continues with the next process. * `<EOP>` removes the process from the list. * `anotherProcess()` adds a new process to the list. * `DEBUG` starts the debugger, then continues with the current process. * `CLONE` adds a new process, copying the state of the current process.
WAIT A SECOND!
`CLONE` does **WHAT!?**
```div PROGRAM test; BEGIN write_int(0, 0, 0, 0, ID); write(0, 0, 10, 0, "Only the parent."); CLONE x = 100; write_int(0, x, 20, 0, ID); write(0, x, 30, 0, "Only the child."); END write_int(0, x, 40, 0, ID); write(0, x, 50, 0, "Both."); LOOP FRAME; END // alt+x to quit END ```
![Clone execution](./imgs/clone-execution.png)
Here is the problem: copying the memory state is possible but **copying the execution state is not**.
So I tried [the standard way](https://esdiscuss.org/topic/proposal-generator-clone-and-generator-goto). ![Discussion about adding generator#clone and generator#goto](./imgs/generator-clone.png)
I knew the solution was a state machine...
So [I wrote my own](https://github.com/delapuente/div2js/blob/master/src/context.js#L139).
This is how translating `CLONE` looks like: ```js translators.CloneSentence = function (divClone, context) { var insideCloneLabel = context.newLabel(); var afterCloneLabel = context.newLabel(); context.clone(insideCloneLabel, afterCloneLabel); context.label(insideCloneLabel); translateBody(divClone, context); context.label(afterCloneLabel); }; ```
And this is how the `CLONE` example looks once transpiled: ```js function program_test(mem, exec, args) { while (true) { switch (exec.pc) { case 1: return __yieldCallFunction(2, 'write_int', [0, 0, 0, 0, mem[ID]]); case 2: exec.retv.dequeue(); return __yieldCallFunction(4, 'write', [0, 0, 10, 0, 'Only the parent.']); case 4: exec.retv.dequeue(); return __yieldClone(6, 11); case 6: mem[x] = 100; return __yieldCallFunction(8, 'write_int', [0, mem[x], 20, 0, mem[ID] ]); case 8: exec.retv.dequeue(); return __yieldCallFunction(10, 'write', [0, mem[x], 30, 0, 'Only the child.']); case 10: exec.retv.dequeue(); case 11: return __yieldCallFunction(12, 'write_int', [0, mem[x], 40, 0, mem[ID]]); case 12: exec.retv.dequeue(); return __yieldCallFunction(14, 'write', [0, mem[x], 50, 0, 'Both.']); case 14: exec.retv.dequeue(); case 15: exec.pc = 15; break; case 16: return __yieldEnd; } } } ```
### DEBUGGING
You migh be wondering **why so many yields**?
Is it necessary to yield on **every** call?

It is.

(if you want to debug it)

Because you need to return to the scheduler to have the opportunity of launching the debugger.
## The memory model
Each process has a _reserved_ memory region.
![Local data](./imgs/div2-local-data.png)

Implementation goal:

Preserve pointer semantics across regions.

### Process size
Kind reminder:
Memory is a `Int32Array` ``` const mem = new Int32Array(PROCESS_SIZE); ```
So, first question, what is `PROCESS_SIZE`?
Hypothesis: _with no privates, size of the process is size of locals_.
Gut feeling: _`ID` is the address of the process_.
According to the debugger, there are **44 locals**.
```div PROGRAM calc_process_size; BEGIN p(); p(); DEBUG; END PROCESS p(); BEGIN LOOP FRAME; END END ```
![Substracting one process ID fron another, you have the process size](./imgs/div2-process-size.png)

1751 - 1707 = 44; 1795 - 1751 = 44

Fair enough.

Hypothesis: _with privates, size of the process is size of locals + privates_.
```div PROGRAM calc_process_size; BEGIN p(); q(); DEBUG; END PROCESS p(); PRIVATE a; BEGIN LOOP FRAME; END END PROCESS q(); PRIVATE a; b; BEGIN LOOP FRAME; END END ```
![Size is constant](./imgs/div2-constant-size.png)

1769 - 1723 = 46; 1815 - 1769 = 46

WTF?

So, DIV has a fixed process size, which is the sum of local segment and the biggest private region.
![Process memory region](./imgs/memory-segment.png) Regions include enough space for _locals_ and _private_ variables.
In addition to a fixed number of process, the design leads to: * Fixed program footprint. * Very fast process allocation.
### Memory map
Variable names will be array indices ``` mem[X] = 10; // x = 10 mem[Y] = 10; // y = 10 ```
So, second question is, where is the data inside these regions?
What is the first variable? ```div PROGRAM memory_map; PRIVATE pid; BEGIN pid = p(); (*pid) = 1234567890; DEBUG; END PROCESS p(); BEGIN LOOP FRAME; END END ```
![First local is process id](./imgs/div2-first-local.png)
Where are the other? ```div PROGRAM memory_offsets; PRIVATE target; result; BEGIN target = &reserved.id_scan; result = target - ID; write_int(0, 0, 0, 0, &result); LOOP FRAME; END END ```
![id_scan offset is 1](./imgs/div2-memory-locations.png)
Double check privates come after locals. ```div PROGRAM segment_order; PRIVATE a; BEGIN *(ID + 44) = 1234567890; DEBUG; END ```
![Privates goes after locals](./imgs/div2-segment-order.png)
### Pointer arithmetic
Pointer airthmetic goes in between the brackets ```js mem[X] = mem[MOUSE_STRUCT + MOUSE_X]; // x = mouse.x ```
The only type in DIV (first version) was the 4 byte int. So, no need for types at all since this was also the width of the pointer.
But DIV2 adds `BYTE`, `WORD` and `INT` to explicitly refer to 1 byte, 2 byte and 4 byte values respectively.
And it also adds `POINTER` to those types and the `STRING` view.
The alignment is still 4 bytes but DIV2 ensures arrays are continuous. So...
|Value | Space required | Space used | | ------------- | |WORD[1] a; | 4 bytes | 4 bytes| |WORD b, c; | 4 bytes | 8 bytes|
So, how to access `a[1]`?
```div PROGRAM pointer_arithmetic; PRIVATE WORD POINTER pa, pb, pc; WORD[1] a; WORD b, c; BEGIN pa = &a; pb = &b; pc = &c; DEBUG; END ```
![Contiguous memory](./imgs/div2-contiguous-memory.png)
```div PROGRAM pointer_arithmetic; PRIVATE WORD POINTER pa, pb, pc; WORD[1] a; WORD b, c; BEGIN pa = &a[1]; pb = &b; pc = &c; DEBUG; END ```
![No offset](./imgs/div2-no-offset.png)
```div PROGRAM pointer_arithmetic; PRIVATE WORD POINTER pa, pb, pc; WORD[1] a; WORD b, c; BEGIN pa = &a[1]; pb = &b; pc = &c; *pa = 123; *(pa + 1) = 456; DEBUG; END ```
![Double pointer arithmetic](./imgs/div2-double-arithmetic.png)
```div PROGRAM pointer_arithmetic; PRIVATE WORD POINTER pa, pb, pc; WORD[1] a; WORD b, c; BEGIN pa = &a[1]; pb = &b; pc = &c; *pa = 123; *pa = 456 << 16 or *pa; DEBUG; END ```
![Pointer arithmetic for array values](./imgs/div2-double-arithmetic-fixed.png)
So, there is a **double pointer arithmetic**.
A DIV legacy one for **4 byte aligned values**. ```js mem[A] = 123; mem[A + 1] = 456; ```
Another for **array values**. ```js mem[A] = 456 << 16 | mem[A]; ```
### Globals and text segments
Is it possible to achieve total pointer compatibility?
Yes, if processes receive **all the memory**.
Fortunately, source is now on GitHub so [figuring out global memory](https://github.com/DIVGAMES/DIV-Games-Studio/blob/0c006cca548f9d6dc66d174d4f05d167148c7e78/dll/div.h) map was much easier.
  • So, where the globals are?
  • And the text of the program?
  • Are there gaps in between?
## The Execution Pipeline
The process to execute a DIV program traverses several stages.
1. Transpilation 2. Load 3. Execution
### Transpilation
Is the process for which a DIV program is translated into a JavaScript.
```div PROGRAM test; BEGIN text_z = 1; DEBUG; a(); DEBUG; text_z = 3; DEBUG; END PROCESS a(); BEGIN text_z = 2; END ```
During **parsing**, source code is translated into a **DIV Abstract Syntax Tree**.
```js { "type": "Unit", "program": { "type": "Program", "name": { "type": "Identifier", "name": "test" }, "consts": null, "globals": null, "locals": null, "privates": null, "body": { "type": "ProcessBody", "sentences": [ { "type": "ExpressionSentence", "expression": { "type": "AssignmentExpression", "operator": "=", "left": { "type": "Identifier", "name": "text_z" }, "right": { "type": "Literal", "value": 1, "raw": "1" } } }, { "type": "DebugSentence" }, { "type": "ExpressionSentence", "expression": { "type": "CallExpression", "callee": { "type": "Identifier", "name": "a" }, "arguments": [] } }, { "type": "DebugSentence" }, { "type": "ExpressionSentence", "expression": { "type": "AssignmentExpression", "operator": "=", "left": { "type": "Identifier", "name": "text_z" }, "right": { "type": "Literal", "value": 3, "raw": "3" } } }, { "type": "DebugSentence" } ] } }, "processes": [ { "type": "Process", "name": { "type": "Identifier", "name": "a" }, "privates": null, "body": { "type": "ProcessBody", "sentences": [ { "type": "ExpressionSentence", "expression": { "type": "AssignmentExpression", "operator": "=", "left": { "type": "Identifier", "name": "text_z" }, "right": { "type": "Literal", "value": 2, "raw": "2" } } } ] } } ] } ``` [JISON](http://zaa.ch/jison/) is used to write the grammar and parse the source code.
> JISON: Your friendly JavaScript parser generator! [_friendly_](https://github.com/delapuente/div2js/blob/master/src/grammar.y)
The **semantic checker** runs now (which includes the type checker) extracting the enough context to drive the translation.
```js { "_processes": { "a": true }, "_auxNames": {}, "_currentProcessPrivates": {} } ```
With the DIV AST + context, the translator has enough information to create a **JavaScript AST**.
```js { "type": "Program", "body": [ { "type": "VariableDeclaration", "declarations": [ { "type": "VariableDeclarator", "id": { "type": "Identifier", "name": "G_BASE" }, "init": { "type": "Literal", "value": 0, "raw": "0" } }, { "type": "VariableDeclarator", "id": { "type": "Identifier", "name": "G_TEXT_Z" }, "init": { "type": "Literal", "value": 0, "raw": "0" } } ], "kind": "var" }, { "type": "FunctionDeclaration", "id": { "type": "Identifier", "name": "program_test" }, "params": [ { "type": "Identifier", "name": "mem" }, { "type": "Identifier", "name": "exec" }, { "type": "Identifier", "name": "args" } ], "defaults": [], "body": { "type": "BlockStatement", "body": [ { "type": "WhileStatement", "test": { "type": "Literal", "value": true, "raw": "true" }, "body": { "type": "BlockStatement", "body": [ { "type": "SwitchStatement", "discriminant": { "type": "MemberExpression", "computed": false, "object": { "type": "Identifier", "name": "exec" }, "property": { "type": "Identifier", "name": "pc" } }, "cases": [ { "type": "SwitchCase", "test": { "type": "Literal", "value": 1, "raw": "1" }, "consequent": [ { "type": "ExpressionStatement", "expression": { "type": "AssignmentExpression", "operator": "=", "left": { "type": "MemberExpression", "computed": true, "object": { "type": "Identifier", "name": "mem" }, "property": { "type": "BinaryExpression", "operator": "+", "left": { "type": "Identifier", "name": "G_BASE" }, "right": { "type": "Identifier", "name": "G_TEXT_Z" } } }, "right": { "type": "Literal", "value": 1, "raw": "1" } } }, { "type": "ReturnStatement", "argument": { "type": "CallExpression", "callee": { "type": "Identifier", "name": "__yieldDebug" }, "arguments": [ { "type": "Literal", "value": 3, "raw": "3" } ] } } ] }, { "type": "SwitchCase", "test": { "type": "Literal", "value": 3, "raw": "3" }, "consequent": [ { "type": "ReturnStatement", "argument": { "type": "CallExpression", "callee": { "type": "Identifier", "name": "__yieldNewProcess" }, "arguments": [ { "type": "Literal", "value": 4, "raw": "4" }, { "type": "Literal", "value": "a", "raw": "\"a\"" }, { "type": "ArrayExpression", "elements": [] } ] } } ] }, { "type": "SwitchCase", "test": { "type": "Literal", "value": 4, "raw": "4" }, "consequent": [ { "type": "ExpressionStatement", "expression": { "type": "CallExpression", "callee": { "type": "MemberExpression", "computed": false, "object": { "type": "MemberExpression", "computed": false, "object": { "type": "Identifier", "name": "exec" }, "property": { "type": "Identifier", "name": "retv" } }, "property": { "type": "Identifier", "name": "dequeue" } }, "arguments": [] } }, { "type": "ReturnStatement", "argument": { "type": "CallExpression", "callee": { "type": "Identifier", "name": "__yieldDebug" }, "arguments": [ { "type": "Literal", "value": 6, "raw": "6" } ] } } ] }, { "type": "SwitchCase", "test": { "type": "Literal", "value": 6, "raw": "6" }, "consequent": [ { "type": "ExpressionStatement", "expression": { "type": "AssignmentExpression", "operator": "=", "left": { "type": "MemberExpression", "computed": true, "object": { "type": "Identifier", "name": "mem" }, "property": { "type": "BinaryExpression", "operator": "+", "left": { "type": "Identifier", "name": "G_BASE" }, "right": { "type": "Identifier", "name": "G_TEXT_Z" } } }, "right": { "type": "Literal", "value": 3, "raw": "3" } } }, { "type": "ReturnStatement", "argument": { "type": "CallExpression", "callee": { "type": "Identifier", "name": "__yieldDebug" }, "arguments": [ { "type": "Literal", "value": 8, "raw": "8" } ] } } ] }, { "type": "SwitchCase", "test": { "type": "Literal", "value": 8, "raw": "8" }, "consequent": [ { "type": "ReturnStatement", "argument": { "type": "Identifier", "name": "__yieldEnd" } } ] } ] } ] } } ] }, "generator": false, "expression": false }, { "type": "FunctionDeclaration", "id": { "type": "Identifier", "name": "process_a" }, "params": [ { "type": "Identifier", "name": "mem" }, { "type": "Identifier", "name": "exec" }, { "type": "Identifier", "name": "args" } ], "defaults": [], "body": { "type": "BlockStatement", "body": [ { "type": "WhileStatement", "test": { "type": "Literal", "value": true, "raw": "true" }, "body": { "type": "BlockStatement", "body": [ { "type": "SwitchStatement", "discriminant": { "type": "MemberExpression", "computed": false, "object": { "type": "Identifier", "name": "exec" }, "property": { "type": "Identifier", "name": "pc" } }, "cases": [ { "type": "SwitchCase", "test": { "type": "Literal", "value": 1, "raw": "1" }, "consequent": [ { "type": "ExpressionStatement", "expression": { "type": "AssignmentExpression", "operator": "=", "left": { "type": "MemberExpression", "computed": true, "object": { "type": "Identifier", "name": "mem" }, "property": { "type": "BinaryExpression", "operator": "+", "left": { "type": "Identifier", "name": "G_BASE" }, "right": { "type": "Identifier", "name": "G_TEXT_Z" } } }, "right": { "type": "Literal", "value": 2, "raw": "2" } } }, { "type": "ReturnStatement", "argument": { "type": "Identifier", "name": "__yieldEnd" } } ] } ] } ] } } ] }, "generator": false, "expression": false } ] } ``` The output AST follows the [ESTree](https://github.com/estree/estree) specification which is based in the [Mozilla Parser API](https://developer.mozilla.org/en-US/docs/Mozilla/Projects/SpiderMonkey/Parser_API).
Which is the AST for this: ```js var G_BASE = 0, G_TEXT_Z = 0; function program_test(mem, exec, args) { while (true) { switch (exec.pc) { case 1: mem[G_BASE + G_TEXT_Z] = 1; return __yieldDebug(3); case 3: return __yieldNewProcess(4, 'a', []); case 4: exec.retv.dequeue(); return __yieldDebug(6); case 6: mem[G_BASE + G_TEXT_Z] = 3; return __yieldDebug(8); case 8: return __yieldEnd; } } } function process_a(mem, exec, args) { while (true) { switch (exec.pc) { case 1: mem[G_BASE + G_TEXT_Z] = 2; return __yieldEnd; } } } ```
This AST is then combined with a fixed one, which contains a set of **bridge functions** to communicate with the runtime.
```js (function (rt) { 'use strict'; /* Here comes the memory map */ function __yieldDebug(npc) { return new rt.Baton('debug', { npc: npc }); } function __yieldNewProcess(npc, processName, args) { return new rt.Baton('newprocess', { npc: npc, processName: processName, args: args }); } var __yieldEnd = new rt.Baton('end'); return { /* Here come the process and memory maps */ }; }); ``` [Esprima](http://esprima.org/) is used to parse the code above.
Now the —3rd party— **generator** ([escodegen](https://github.com/estools/escodegen)) can create the proper JavaScript code.
```js (function (rt) { 'use strict'; var G_BASE = 0, G_TEXT_Z = 0; function __yieldDebug(npc) { return new rt.Baton('debug', { npc: npc }); } function __yieldNewProcess(npc, processName, args) { return new rt.Baton('newprocess', { npc: npc, processName: processName, args: args }); } var __yieldEnd = new rt.Baton('end'); return { pmap: { program: function program_test(mem, exec, args) { while (true) { switch (exec.pc) { case 1: mem[G_BASE + G_TEXT_Z] = 1; return __yieldDebug(3); case 3: return __yieldNewProcess(4, 'a', []); case 4: exec.retv.dequeue(); return __yieldDebug(6); case 6: mem[G_BASE + G_TEXT_Z] = 3; return __yieldDebug(8); case 8: return __yieldEnd; } } }, process_a: function process_a(mem, exec, args) { while (true) { switch (exec.pc) { case 1: mem[G_BASE + G_TEXT_Z] = 2; return __yieldEnd; } } } }, mmap: { G: { G_BASE: 0, G_TEXT_Z: 0 }, L: {}, P: {} } }; }); ```
### Load
Loading the program means linking the object artefact with the runtime to create a **executable program**.
First the unit is evaluated with `eval()` passing the runtime functions needed to communicate with the scheduler. ```js const unit = eval(objText)(runtime); const processMap = unit.pmap; const memoryMap = unit.mmap; ```
Now the process and memory maps are used to create a new program. ```js const program = new runtime.Runtime(processMap, memoryMap) ```
### Execution
Before executing the program, we can listen for some events. ```js // Both events don't finish until resolving a promise. program.ondebug = () => openDebugger(); program.onfinished = () => showFinishMessage(); ```
But our real interest is to run the program. ```js program.run(); ```
me

Salvador de la Puente González

@salvadelapuente

http://github.com/delapuente

https://delapuente.github.io/presentations/

## Questions?