DIV2js

An ugly presentation about transpilation

## Disclaimer Yes, I know [DIV Go](http://www.divgo.net/), I know [the source code is on GitHub](https://github.com/DIVGAMES/DIV-Games-Studio) and I also know this is probably a useless project but, you know? **I don't care.**

## Index 1. How does DIV work? 2. The execution model 3. The memory model 4. The execution pipeline

## How does DIV work?

![DIV2 running on DOS-BOX](./imgs/div2-dosbox.png)

DIV programs have a point of entry...

![DIV2 main program](./imgs/div2-main-program.png)

...and a set of processes.

![DIV2 processes](./imgs/div2-processes.png)

DIV2 processes are similar to Linux processes in a single-thread.

They run for a while, until executing a `FRAME` sentence. ![Processes list](./imgs/processes-list.png)

![Process Execution](./imgs/div2-process-execution.png)

When all the processes have executed `FRAME`, the graphic engine draws a new frame.

And the execution continues **after the `FRAME`**... ![Resume process list](./imgs/resume-execution.png) ...until finding another one.

![Next frame](./imgs/div2-process-execution-next-frame.png)

Do you know what in JavaScript is very similar to this?

A [generator](https://developer.mozilla.org/en/docs/Web/JavaScript/Reference/Statements/function*)! ```js function* process() { let counter = 0; while (true) { yield counter; // This is like FRAME. counter++; } } ```

![Process hierarchy](./imgs/div2-process-hierarchy.png) Processes have a hierarchy, a type, an ID...

...and memory.

DIV memory is **manually managed**.

![DIV2 Memory](./imgs/memory.png) It is a continous chunk of 4-byte aligned space.

And you know what in JavaScript is very similar to this?

A [TypedArray](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Typed_arrays)! ```js const memory = new Int32Array(5); // 5 32-bit intenger cells. const ANGLE = 2; memory[ANGLE] = 90000; // Variable assignation. memory[ANGLE + 1] = 90000; // Pointer aritmethic! ```

## The execution model

Let's consider generators.

With generators, a DIV process... ```div PROCESS asteroid() BEGIN graph = 1; WHILE (NOT out_of_region(ID, 0)) graph = ((graph + 1) % 10 + 1); y += 10; FRAME; END END ```

...would be translated into a JavaScript generator: ```js function* asteroid(mem) { const GRAPH = 100; const Y = 101; mem[GRAPH] = 1; while (!out_of_region(mem[ID], 0)) { mem[GRAPH] = ((mem[GRAPH] + 1) % 10 + 1); mem[Y] += 10; yield { type: 'frame', completion: 100 }; } } ```

Creating a new process would look like: ```js function newProcess(name, args) { const processType = processes[name]; const mem = new Int32Array(PROCESS_SIZE); initializeMemory(mem, args); const process = processType(mem); process.mem = mem; processList.push(process); } ```

And the scheduler would look like this: ```js function runProcessList() { let retValue; while (processList.length) { for (let i = 0, current; current = processList[i]; i++) { const result = current.next(retValue); // run until yield. retValue = handle(result, current.mem); }; drawFrame(); } } ```

`handle()` would manipulate the process list according to the result of the execution of the process.

`retValue` is sent to the process since some statements can return values (i.e. _calling a process returns its ID_).

* `FRAME` continues with the next process. * `<EOP>` removes the process from the list. * `anotherProcess()` adds a new process to the list. * `DEBUG` starts the debugger, then continues with the current process. * `CLONE` adds a new process, copying the state of the current process.

WAIT A SECOND!

`CLONE` does **WHAT!?**

```div PROGRAM test; BEGIN write_int(0, 0, 0, 0, ID); write(0, 0, 10, 0, "Only the parent."); CLONE x = 100; write_int(0, x, 20, 0, ID); write(0, x, 30, 0, "Only the child."); END write_int(0, x, 40, 0, ID); write(0, x, 50, 0, "Both."); LOOP FRAME; END // alt+x to quit END ```

![Clone execution](./imgs/clone-execution.png)

Here is the problem: copying the memory state is possible but **copying the execution state is not**.

So I tried [the standard way](https://esdiscuss.org/topic/proposal-generator-clone-and-generator-goto). ![Discussion about adding generator#clone and generator#goto](./imgs/generator-clone.png)

I knew the solution was a state machine...

So [I wrote my own](https://github.com/delapuente/div2js/blob/master/src/context.js#L139).

This is how translating `CLONE` looks like: ```js translators.CloneSentence = function (divClone, context) { var insideCloneLabel = context.newLabel(); var afterCloneLabel = context.newLabel(); context.clone(insideCloneLabel, afterCloneLabel); context.label(insideCloneLabel); translateBody(divClone, context); context.label(afterCloneLabel); }; ```

And this is how the `CLONE` example looks once transpiled: ```js function program_test(mem, exec, args) { while (true) { switch (exec.pc) { case 1: return __yieldCallFunction(2, 'write_int', [0, 0, 0, 0, mem[ID]]); case 2: exec.retv.dequeue(); return __yieldCallFunction(4, 'write', [0, 0, 10, 0, 'Only the parent.']); case 4: exec.retv.dequeue(); return __yieldClone(6, 11); case 6: mem[x] = 100; return __yieldCallFunction(8, 'write_int', [0, mem[x], 20, 0, mem[ID] ]); case 8: exec.retv.dequeue(); return __yieldCallFunction(10, 'write', [0, mem[x], 30, 0, 'Only the child.']); case 10: exec.retv.dequeue(); case 11: return __yieldCallFunction(12, 'write_int', [0, mem[x], 40, 0, mem[ID]]); case 12: exec.retv.dequeue(); return __yieldCallFunction(14, 'write', [0, mem[x], 50, 0, 'Both.']); case 14: exec.retv.dequeue(); case 15: exec.pc = 15; break; case 16: return __yieldEnd; } } } ```

### DEBUGGING

You migh be wondering **why so many yields**?

Is it necessary to yield on **every** call?

It is.

(if you want to debug it)

Because you need to return to the scheduler to have the opportunity of launching the debugger.

## The memory model

Each process has a _reserved_ memory region.

![Local data](./imgs/div2-local-data.png)

Implementation goal:

Preserve pointer semantics across regions.

### Process size

Kind reminder:

Memory is a `Int32Array` ``` const mem = new Int32Array(PROCESS_SIZE); ```

So, first question, what is `PROCESS_SIZE`?

Hypothesis: _with no privates, size of the process is size of locals_.

Gut feeling: _`ID` is the address of the process_.

According to the debugger, there are **44 locals**.

```div PROGRAM calc_process_size; BEGIN p(); p(); DEBUG; END PROCESS p(); BEGIN LOOP FRAME; END END ```

![Substracting one process ID fron another, you have the process size](./imgs/div2-process-size.png)

1751 - 1707 = 44; 1795 - 1751 = 44

Fair enough.

Hypothesis: _with privates, size of the process is size of locals + privates_.

```div PROGRAM calc_process_size; BEGIN p(); q(); DEBUG; END PROCESS p(); PRIVATE a; BEGIN LOOP FRAME; END END PROCESS q(); PRIVATE a; b; BEGIN LOOP FRAME; END END ```

![Size is constant](./imgs/div2-constant-size.png)

1769 - 1723 = 46; 1815 - 1769 = 46

WTF?

So, DIV has a fixed process size, which is the sum of local segment and the biggest private region.

![Process memory region](./imgs/memory-segment.png) Regions include enough space for _locals_ and _private_ variables.

In addition to a fixed number of process, the design leads to: * Fixed program footprint. * Very fast process allocation.

### Memory map

Variable names will be array indices ``` mem[X] = 10; // x = 10 mem[Y] = 10; // y = 10 ```

So, second question is, where is the data inside these regions?

What is the first variable? ```div PROGRAM memory_map; PRIVATE pid; BEGIN pid = p(); (*pid) = 1234567890; DEBUG; END PROCESS p(); BEGIN LOOP FRAME; END END ```

![First local is process id](./imgs/div2-first-local.png)

Where are the other? ```div PROGRAM memory_offsets; PRIVATE target; result; BEGIN target = &reserved.id_scan; result = target - ID; write_int(0, 0, 0, 0, &result); LOOP FRAME; END END ```

![id_scan offset is 1](./imgs/div2-memory-locations.png)

Double check privates come after locals. ```div PROGRAM segment_order; PRIVATE a; BEGIN *(ID + 44) = 1234567890; DEBUG; END ```

![Privates goes after locals](./imgs/div2-segment-order.png)

### Pointer arithmetic

Pointer airthmetic goes in between the brackets ```js mem[X] = mem[MOUSE_STRUCT + MOUSE_X]; // x = mouse.x ```

The only type in DIV (first version) was the 4 byte int. So, no need for types at all since this was also the width of the pointer.

But DIV2 adds `BYTE`, `WORD` and `INT` to explicitly refer to 1 byte, 2 byte and 4 byte values respectively.

And it also adds `POINTER` to those types and the `STRING` view.

The alignment is still 4 bytes but DIV2 ensures arrays are continuous. So...

|Value | Space required | Space used | | ------------- | |WORD[1] a; | 4 bytes | 4 bytes| |WORD b, c; | 4 bytes | 8 bytes|

So, how to access `a[1]`?

```div PROGRAM pointer_arithmetic; PRIVATE WORD POINTER pa, pb, pc; WORD[1] a; WORD b, c; BEGIN pa = &a; pb = &b; pc = &c; DEBUG; END ```

![Contiguous memory](./imgs/div2-contiguous-memory.png)

```div PROGRAM pointer_arithmetic; PRIVATE WORD POINTER pa, pb, pc; WORD[1] a; WORD b, c; BEGIN pa = &a[1]; pb = &b; pc = &c; DEBUG; END ```

![No offset](./imgs/div2-no-offset.png)

```div PROGRAM pointer_arithmetic; PRIVATE WORD POINTER pa, pb, pc; WORD[1] a; WORD b, c; BEGIN pa = &a[1]; pb = &b; pc = &c; *pa = 123; *(pa + 1) = 456; DEBUG; END ```

![Double pointer arithmetic](./imgs/div2-double-arithmetic.png)

```div PROGRAM pointer_arithmetic; PRIVATE WORD POINTER pa, pb, pc; WORD[1] a; WORD b, c; BEGIN pa = &a[1]; pb = &b; pc = &c; *pa = 123; *pa = 456 << 16 or *pa; DEBUG; END ```

![Pointer arithmetic for array values](./imgs/div2-double-arithmetic-fixed.png)

So, there is a **double pointer arithmetic**.

A DIV legacy one for **4 byte aligned values**. ```js mem[A] = 123; mem[A + 1] = 456; ```

Another for **array values**. ```js mem[A] = 456 << 16 | mem[A]; ```

### Globals and text segments

Is it possible to achieve total pointer compatibility?

Yes, if processes receive **all the memory**.

Fortunately, source is now on GitHub so [figuring out global memory](https://github.com/DIVGAMES/DIV-Games-Studio/blob/0c006cca548f9d6dc66d174d4f05d167148c7e78/dll/div.h) map was much easier.

So, where the globals are?
And the text of the program?
Are there gaps in between?

## The Execution Pipeline

The process to execute a DIV program traverses several stages.

1. Transpilation 2. Load 3. Execution

### Transpilation

Is the process for which a DIV program is translated into a JavaScript.

```div PROGRAM test; BEGIN text_z = 1; DEBUG; a(); DEBUG; text_z = 3; DEBUG; END PROCESS a(); BEGIN text_z = 2; END ```

During **parsing**, source code is translated into a **DIV Abstract Syntax Tree**.

```js { "type": "Unit", "program": { "type": "Program", "name": { "type": "Identifier", "name": "test" }, "consts": null, "globals": null, "locals": null, "privates": null, "body": { "type": "ProcessBody", "sentences": [ { "type": "ExpressionSentence", "expression": { "type": "AssignmentExpression", "operator": "=", "left": { "type": "Identifier", "name": "text_z" }, "right": { "type": "Literal", "value": 1, "raw": "1" } } }, { "type": "DebugSentence" }, { "type": "ExpressionSentence", "expression": { "type": "CallExpression", "callee": { "type": "Identifier", "name": "a" }, "arguments": [] } }, { "type": "DebugSentence" }, { "type": "ExpressionSentence", "expression": { "type": "AssignmentExpression", "operator": "=", "left": { "type": "Identifier", "name": "text_z" }, "right": { "type": "Literal", "value": 3, "raw": "3" } } }, { "type": "DebugSentence" } ] } }, "processes": [ { "type": "Process", "name": { "type": "Identifier", "name": "a" }, "privates": null, "body": { "type": "ProcessBody", "sentences": [ { "type": "ExpressionSentence", "expression": { "type": "AssignmentExpression", "operator": "=", "left": { "type": "Identifier", "name": "text_z" }, "right": { "type": "Literal", "value": 2, "raw": "2" } } } ] } } ] } ``` [JISON](http://zaa.ch/jison/) is used to write the grammar and parse the source code.

> JISON: Your friendly JavaScript parser generator! [_friendly_](https://github.com/delapuente/div2js/blob/master/src/grammar.y)

The **semantic checker** runs now (which includes the type checker) extracting the enough context to drive the translation.

```js { "_processes": { "a": true }, "_auxNames": {}, "_currentProcessPrivates": {} } ```

With the DIV AST + context, the translator has enough information to create a **JavaScript AST**.

```js { "type": "Program", "body": [ { "type": "VariableDeclaration", "declarations": [ { "type": "VariableDeclarator", "id": { "type": "Identifier", "name": "G_BASE" }, "init": { "type": "Literal", "value": 0, "raw": "0" } }, { "type": "VariableDeclarator", "id": { "type": "Identifier", "name": "G_TEXT_Z" }, "init": { "type": "Literal", "value": 0, "raw": "0" } } ], "kind": "var" }, { "type": "FunctionDeclaration", "id": { "type": "Identifier", "name": "program_test" }, "params": [ { "type": "Identifier", "name": "mem" }, { "type": "Identifier", "name": "exec" }, { "type": "Identifier", "name": "args" } ], "defaults": [], "body": { "type": "BlockStatement", "body": [ { "type": "WhileStatement", "test": { "type": "Literal", "value": true, "raw": "true" }, "body": { "type": "BlockStatement", "body": [ { "type": "SwitchStatement", "discriminant": { "type": "MemberExpression", "computed": false, "object": { "type": "Identifier", "name": "exec" }, "property": { "type": "Identifier", "name": "pc" } }, "cases": [ { "type": "SwitchCase", "test": { "type": "Literal", "value": 1, "raw": "1" }, "consequent": [ { "type": "ExpressionStatement", "expression": { "type": "AssignmentExpression", "operator": "=", "left": { "type": "MemberExpression", "computed": true, "object": { "type": "Identifier", "name": "mem" }, "property": { "type": "BinaryExpression", "operator": "+", "left": { "type": "Identifier", "name": "G_BASE" }, "right": { "type": "Identifier", "name": "G_TEXT_Z" } } }, "right": { "type": "Literal", "value": 1, "raw": "1" } } }, { "type": "ReturnStatement", "argument": { "type": "CallExpression", "callee": { "type": "Identifier", "name": "__yieldDebug" }, "arguments": [ { "type": "Literal", "value": 3, "raw": "3" } ] } } ] }, { "type": "SwitchCase", "test": { "type": "Literal", "value": 3, "raw": "3" }, "consequent": [ { "type": "ReturnStatement", "argument": { "type": "CallExpression", "callee": { "type": "Identifier", "name": "__yieldNewProcess" }, "arguments": [ { "type": "Literal", "value": 4, "raw": "4" }, { "type": "Literal", "value": "a", "raw": "\"a\"" }, { "type": "ArrayExpression", "elements": [] } ] } } ] }, { "type": "SwitchCase", "test": { "type": "Literal", "value": 4, "raw": "4" }, "consequent": [ { "type": "ExpressionStatement", "expression": { "type": "CallExpression", "callee": { "type": "MemberExpression", "computed": false, "object": { "type": "MemberExpression", "computed": false, "object": { "type": "Identifier", "name": "exec" }, "property": { "type": "Identifier", "name": "retv" } }, "property": { "type": "Identifier", "name": "dequeue" } }, "arguments": [] } }, { "type": "ReturnStatement", "argument": { "type": "CallExpression", "callee": { "type": "Identifier", "name": "__yieldDebug" }, "arguments": [ { "type": "Literal", "value": 6, "raw": "6" } ] } } ] }, { "type": "SwitchCase", "test": { "type": "Literal", "value": 6, "raw": "6" }, "consequent": [ { "type": "ExpressionStatement", "expression": { "type": "AssignmentExpression", "operator": "=", "left": { "type": "MemberExpression", "computed": true, "object": { "type": "Identifier", "name": "mem" }, "property": { "type": "BinaryExpression", "operator": "+", "left": { "type": "Identifier", "name": "G_BASE" }, "right": { "type": "Identifier", "name": "G_TEXT_Z" } } }, "right": { "type": "Literal", "value": 3, "raw": "3" } } }, { "type": "ReturnStatement", "argument": { "type": "CallExpression", "callee": { "type": "Identifier", "name": "__yieldDebug" }, "arguments": [ { "type": "Literal", "value": 8, "raw": "8" } ] } } ] }, { "type": "SwitchCase", "test": { "type": "Literal", "value": 8, "raw": "8" }, "consequent": [ { "type": "ReturnStatement", "argument": { "type": "Identifier", "name": "__yieldEnd" } } ] } ] } ] } } ] }, "generator": false, "expression": false }, { "type": "FunctionDeclaration", "id": { "type": "Identifier", "name": "process_a" }, "params": [ { "type": "Identifier", "name": "mem" }, { "type": "Identifier", "name": "exec" }, { "type": "Identifier", "name": "args" } ], "defaults": [], "body": { "type": "BlockStatement", "body": [ { "type": "WhileStatement", "test": { "type": "Literal", "value": true, "raw": "true" }, "body": { "type": "BlockStatement", "body": [ { "type": "SwitchStatement", "discriminant": { "type": "MemberExpression", "computed": false, "object": { "type": "Identifier", "name": "exec" }, "property": { "type": "Identifier", "name": "pc" } }, "cases": [ { "type": "SwitchCase", "test": { "type": "Literal", "value": 1, "raw": "1" }, "consequent": [ { "type": "ExpressionStatement", "expression": { "type": "AssignmentExpression", "operator": "=", "left": { "type": "MemberExpression", "computed": true, "object": { "type": "Identifier", "name": "mem" }, "property": { "type": "BinaryExpression", "operator": "+", "left": { "type": "Identifier", "name": "G_BASE" }, "right": { "type": "Identifier", "name": "G_TEXT_Z" } } }, "right": { "type": "Literal", "value": 2, "raw": "2" } } }, { "type": "ReturnStatement", "argument": { "type": "Identifier", "name": "__yieldEnd" } } ] } ] } ] } } ] }, "generator": false, "expression": false } ] } ``` The output AST follows the [ESTree](https://github.com/estree/estree) specification which is based in the [Mozilla Parser API](https://developer.mozilla.org/en-US/docs/Mozilla/Projects/SpiderMonkey/Parser_API).

Which is the AST for this: ```js var G_BASE = 0, G_TEXT_Z = 0; function program_test(mem, exec, args) { while (true) { switch (exec.pc) { case 1: mem[G_BASE + G_TEXT_Z] = 1; return __yieldDebug(3); case 3: return __yieldNewProcess(4, 'a', []); case 4: exec.retv.dequeue(); return __yieldDebug(6); case 6: mem[G_BASE + G_TEXT_Z] = 3; return __yieldDebug(8); case 8: return __yieldEnd; } } } function process_a(mem, exec, args) { while (true) { switch (exec.pc) { case 1: mem[G_BASE + G_TEXT_Z] = 2; return __yieldEnd; } } } ```

This AST is then combined with a fixed one, which contains a set of **bridge functions** to communicate with the runtime.

```js (function (rt) { 'use strict'; /* Here comes the memory map */ function __yieldDebug(npc) { return new rt.Baton('debug', { npc: npc }); } function __yieldNewProcess(npc, processName, args) { return new rt.Baton('newprocess', { npc: npc, processName: processName, args: args }); } var __yieldEnd = new rt.Baton('end'); return { /* Here come the process and memory maps */ }; }); ``` [Esprima](http://esprima.org/) is used to parse the code above.

Now the —3rd party— **generator** ([escodegen](https://github.com/estools/escodegen)) can create the proper JavaScript code.

```js (function (rt) { 'use strict'; var G_BASE = 0, G_TEXT_Z = 0; function __yieldDebug(npc) { return new rt.Baton('debug', { npc: npc }); } function __yieldNewProcess(npc, processName, args) { return new rt.Baton('newprocess', { npc: npc, processName: processName, args: args }); } var __yieldEnd = new rt.Baton('end'); return { pmap: { program: function program_test(mem, exec, args) { while (true) { switch (exec.pc) { case 1: mem[G_BASE + G_TEXT_Z] = 1; return __yieldDebug(3); case 3: return __yieldNewProcess(4, 'a', []); case 4: exec.retv.dequeue(); return __yieldDebug(6); case 6: mem[G_BASE + G_TEXT_Z] = 3; return __yieldDebug(8); case 8: return __yieldEnd; } } }, process_a: function process_a(mem, exec, args) { while (true) { switch (exec.pc) { case 1: mem[G_BASE + G_TEXT_Z] = 2; return __yieldEnd; } } } }, mmap: { G: { G_BASE: 0, G_TEXT_Z: 0 }, L: {}, P: {} } }; }); ```

### Load

Loading the program means linking the object artefact with the runtime to create a **executable program**.

First the unit is evaluated with `eval()` passing the runtime functions needed to communicate with the scheduler. ```js const unit = eval(objText)(runtime); const processMap = unit.pmap; const memoryMap = unit.mmap; ```

Now the process and memory maps are used to create a new program. ```js const program = new runtime.Runtime(processMap, memoryMap) ```

### Execution

Before executing the program, we can listen for some events. ```js // Both events don't finish until resolving a promise. program.ondebug = () => openDebugger(); program.onfinished = () => showFinishMessage(); ```

But our real interest is to run the program. ```js program.run(); ```

Salvador de la Puente González

@salvadelapuente

http://github.com/delapuente

https://delapuente.github.io/presentations/

## Questions?