Louis Goessling edited this page Jul 24, 2017 · 8 revisions

Orth has a relatively simple syntax. It is inspired by Python, Ruby, and to a lesser extent Go and C. An Orth program is a text file that contains some number of top-level entities. Each of these is one of:

  1. Import statements
  2. Globals declarations
  3. Functions
  4. Type declarations

Type declarations may also contain functions, but this is semantically equivalent to the function being outside of a type declaration.

Import Statements

Import statements are written as either import module_on_path or as import "relative/path/to/module.ort". Each import names exactly one module. Bare names are resolved by searching the search path in order; the search also finds "packages", which are folders containing an __init__.ort.

Globals

Globals are defined with the declaration syntax. E.g. int foo or Logger log. Globals are zero-initialized (0 for integer types, null for pointer types.)

Type Declarations

A type declaration is of the form

type FooType is packed
    int a
    float b
    FooType next
endtype

(packed is optional; it denotes that the compiler should not perform structure packing on its memory layout.) Members are laid out in memory in the order they are declared in the class.

Additionally, functions may be declared within the class; this is the same as declaring them anywhere else. All members should be declared in a block immediately following the class header, but this is not required.
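For readers coming from C, the FooType declaration in this section corresponds roughly to the struct below. This is only a sketch of the layout rule, not compiler output. It assumes that C's int and float stand in for Orth's int and float, and the helper name members_in_declaration_order is made up for illustration. The next member is written as a pointer because record-typed values are pointers to records (see the IMPORTANT NOTE under Code syntax).

```c
#include <stddef.h>

/* C analogue of the Orth FooType declaration.
   Assumption: Orth int/float map onto C int/float. */
typedef struct FooType FooType;
struct FooType {
    int a;
    float b;
    FooType *next;   /* a record-typed member is a pointer to the record */
};

/* Hypothetical helper: returns 1 if the members sit in memory in the
   same order in which they were declared. */
int members_in_declaration_order(void) {
    return offsetof(FooType, a) < offsetof(FooType, b)
        && offsetof(FooType, b) < offsetof(FooType, next);
}
```

C guarantees increasing member offsets for a struct, which is the same ordering rule described above.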

Function Pointers

A function pointer type can be declared as

type FunctionReturningInt is a function->int

Currently no parameter information is used (or allowed) in this syntax. It may be added later.
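As a rough C comparison, the declaration above resembles a function pointer typedef. C requires a parameter list in the pointer type, so the (void) below is an assumption of this sketch and not something Orth records. The names forty_two and call_it are hypothetical.

```c
/* C analogue of "type FunctionReturningInt is a function->int".
   Assumption: C needs a parameter list, so (void) is used here. */
typedef int (*FunctionReturningInt)(void);

/* Hypothetical function whose address can be stored in the pointer type. */
int forty_two(void) {
    return 42;
}

/* Calls whatever function the pointer refers to and returns its result. */
int call_it(FunctionReturningInt fn) {
    return fn();
}
```

Passing the bare name, as in call_it(forty_two), hands over the address of the function.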

Code syntax

Inside of blocks, the code flows as in any other language: one line to the next, excepting control flow structures. Semicolons are not required, but may be used equivalently to line breaks. Statements end on a line break, unless the last character is a backslash, in which case they continue as if there were no line break. Indentation should only ever be tabs!!! This is not enforced, it's just my dictate. Comments may be either //single line or

<# multi
   line  #>

IMPORTANT NOTE: Record types (classes/structs/aggregate/product types) are of type pointer-to-record when thinking about it in the C manner. For example, when assigning FooType something = get_something(), no FooType object is copied, nor is space for one allocated on the stack. Only space for a pointer is allocated (if that). In C, the equivalent statement would be FooType* something = get_something();. The two are strictly semantically the same.
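The practical consequence is aliasing: two names can refer to the same record. The C sketch below illustrates this under stated assumptions. The body of get_something is a hypothetical stand-in that heap-allocates a record, and aliases_share_storage is a made-up helper.

```c
#include <stdlib.h>

typedef struct FooType {
    int a;
} FooType;

/* Hypothetical stand-in for get_something(): returns a heap-allocated record. */
FooType *get_something(void) {
    FooType *f = malloc(sizeof *f);
    if (f) {
        f->a = 1;
    }
    return f;
}

/* Returns 1 if a write through one name is visible through the other,
   i.e. the assignment copied a pointer and not the record itself. */
int aliases_share_storage(void) {
    FooType *something = get_something();
    if (!something) {
        return 0;
    }
    FooType *other = something;   /* copies the pointer only */
    other->a = 42;
    int shared = (something->a == 42);
    free(something);
    return shared;
}
```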

Arithmetic

Operators, in order of precedence, are: * and /, + and -, %, ==, !=, <, >, <=, >=, ^, &, <<, >>. Parentheses may be used to change order of execution as in C or Python.

Arithmetic operations can be overloaded on user-defined types, and are implemented as a call to a member function with a name representing the operation, as in Python. The overload implementer functions are as follows:

+----------------+    +----------------+
| +  -> __add__  |    | -  -> __sub__  |
| *  -> __mul__  |    | /  -> __div__  |
| == -> __eq__   |    | != -> __ne__   |
| >  -> __gt__   |    | <  -> __lt__   |
| >= -> __ge__   |    | <= -> __le__   |
| %  -> __mod__  |    | &  -> __band__ |
| >> -> __shr__  |    | << -> __shl__  |
| ^  -> __xor__  |    |                |
+----------------+    +----------------+

Additionally, unary minus is implemented via __neg__, and takes a single argument.

The signature for these must take the type they are being implemented on as the first argument. The type of the second argument and the return type should usually be the same as the first, but you are free to do whatever you want.
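To make the rewriting concrete, here is a C sketch of what overloaded + and == amount to for a user-defined type. Everything in it is illustrative: Vec2 is a hypothetical type, and Vec2_add and Vec2_eq stand in for Orth methods named Vec2:__add__ and Vec2:__eq__, since C identifiers cannot contain a colon. Records are passed as pointers, matching the pointer-to-record rule.

```c
#include <stdlib.h>

/* Hypothetical user-defined type. */
typedef struct Vec2 {
    int x;
    int y;
} Vec2;

/* Stands in for Vec2:__add__. The first parameter is the type the
   operator is implemented on; returns a new record, or NULL if
   allocation fails. */
Vec2 *Vec2_add(const Vec2 *self, const Vec2 *other) {
    Vec2 *sum = malloc(sizeof *sum);
    if (sum) {
        sum->x = self->x + other->x;
        sum->y = self->y + other->y;
    }
    return sum;
}

/* Stands in for Vec2:__eq__. Returns 1 when both fields match. */
int Vec2_eq(const Vec2 *self, const Vec2 *other) {
    return self->x == other->x && self->y == other->y;
}
```

Under this reading, an expression such as a + b on two Vec2 values becomes the call Vec2_add(a, b), and a == b becomes Vec2_eq(a, b).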

Variables

Variables are declared as int x or FooType y, and are referenced simply by name. Variables should be declared before use (this is enforced in the 1st generation compiler, but not required in shoc). Using the name of a function in the context of a variable evaluates to an rvalue holding the address of the function, typed as an anonymous function pointer type.

Calling methods

Methods are called as in C. As expressions they evaluate to their result, but storing it is not necessary even if one is produced.

Casting

Casting is via the pipe operator, and is similar to Go's .(Type) cast. The syntax is some_var|TargetType. It has a higher precedence than operators and accessors, so some_var|TypeImplementingFoo:foo() is legal, and resolves to TypeImplementingFoo:foo(some_var|TypeImplementingFoo). Casting from a smaller integer to a larger one zero-extends, and casting from a larger to a smaller one truncates. Casting float to integer rounds in an implementation-defined way. Casting integer to float results in the closest value, as produced in an implementation-defined way. Casting integer to pointer is a bitcast, as is pointer to integer and pointer to pointer. Casting float to pointer or pointer to float is illegal (duh!)
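The integer and pointer rules can be pictured with the C sketch below. It is an analogy, not a description of the Orth compiler: C sign-extends signed integers, so unsigned fixed-width types are used here to model the zero-extension described above, and all three function names are hypothetical.

```c
#include <stdint.h>

/* Smaller -> larger integer: the upper bits are filled with zeros. */
uint32_t widen_zero_extend(uint8_t small) {
    return (uint32_t)small;
}

/* Larger -> smaller integer: the upper bits are dropped. */
uint8_t narrow_truncate(uint32_t large) {
    return (uint8_t)large;
}

/* Pointer -> integer -> pointer: the bits are carried through unchanged. */
void *pointer_round_trip(void *p) {
    uintptr_t bits = (uintptr_t)p;
    return (void *)bits;
}
```

For instance, widening 0xFF gives 0x000000FF (255), and narrowing 0x1234 gives 0x34.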

Accessors

Members of record types may be accessed with the dot. As variables that are of a record type are really pointers to an instance of said type, the dot operator is equivalent to the C -> operator in that it is an offset and dereference.

The :: pseudo-accessor is simply syntactic sugar and does nothing (but it is a legal name component, and may thus be used for pseudo-namespacing).

The : pseudo-accessor is used when calling an "instance method" off of an instance of a type. It rewrites the call to target the implementing method of the class the instance belongs to. For example, for a FooType var, executing var:some_method() is equivalent to executing the method defined with name FooType:some_method with the argument (var). Additional arguments are passed on the end, like in Python. For example, var:something(1,2,3) calls FooType:something(var, 1, 2, 3).
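In C terms, the : rewrite is the familiar pattern of passing the instance as an explicit first parameter. The sketch below assumes a hypothetical FooType with a single total field, and FooType_something stands in for the Orth name FooType:something, which is not a valid C identifier.

```c
/* Hypothetical record type with one field. */
typedef struct FooType {
    int total;
} FooType;

/* Stands in for FooType:something. The instance arrives as the first
   parameter and the remaining arguments follow in order. Adds the three
   values to the running total and returns the new total. */
int FooType_something(FooType *self, int a, int b, int c) {
    self->total += a + b + c;
    return self->total;
}
```

So var:something(1,2,3) corresponds to FooType_something(var, 1, 2, 3) here, and the self->total access corresponds to Orth's dot operator.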

Indexing

Indexing with square brackets is the same as in C, with the exception that it always acts as if the operand were a string, returning the byte at a given offset. Indexing may be done on a value of any type. Indexing outside of the length of a piece of memory is undefined. An index array[offset] is equivalent to the C code *((char*)array + offset).
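The byte-offset behaviour matters most when the indexed value is not a string. The C sketch below wraps the equivalent expression in a hypothetical helper, byte_at. It uses unsigned char where the expression above uses char, an assumption made so that the result does not depend on whether char is signed.

```c
#include <stddef.h>

/* Returns the byte located `offset` bytes past the start of `array`,
   regardless of the element type the memory actually holds. */
unsigned char byte_at(const void *array, size_t offset) {
    return *((const unsigned char *)array + offset);
}
```

With an array of 4-byte integers, byte_at(words, 4) reads the first byte of the second element, not the fifth element.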

Control Flow: If

Ifs are of the form if condition do/elif condition do/else do. The final block must be terminated with done. They behave exactly as in C. Condition must be a bool. For example:

int a
if b==72 do
    a=2
elif a+2+some:thing(else)==5 do
    a=-1
else do
    orth::fail("Invalid A!")
done

Control Flow: While

Whiles are of the form while condition do and are terminated with done. Condition must be a bool. For example:

int i=0
while i<args:len() do
    printf("Arg %02i: %s\n", i, args:get(i))
    i+=1
done

Literals

Literals can be base ten integers, base ten floats, double-quoted "strings", true and false, and null. Additionally, hex literals are allowed in the format 0x123abc, but this is unimplemented in shoc. Float literals are also currently unimplemented (all resolve to 0.0). Integer literals are of type int, float literals of type float, string literals of type cstr (representing a null-terminated string), true and false of type bool, and null of type ptr (representing the bitcasted value 0.)

Intrinsics

Intrinsic functions are functions that are provided by the compiler. They are executed at compile-time and resolve to concrete values or operations. They are called by placing an at symbol before the function name, and another after the closing paren. For example: @sizeof(ptr)@ will resolve to be the same as an 8 literal when compiling on a 64-bit system, and the same as a 4 literal in 32-bit mode. An intrinsic must be the last element in a set of parens (it may not be followed by binops/etc). Obviously these bend the semantics of function calls, and each behaves in its own way. See the intrinsics page.

NOTE: As of some updates to shoc, the trailing at is not required, and omitting it is the preferred style. For example: @sizeof(ptr) is now preferred. This is not supported in the 1st gen compiler, but if you plan on writing code that will only be compiled with shoc (you should be) this is the way to go.
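The closest C analogue to @sizeof(ptr) is the sizeof operator applied to a pointer type, which is also resolved at compile time. The sketch below is only a comparison. The names POINTER_SIZE and pointer_size are hypothetical, and the 8 or 4 result depends on the target, just as described above.

```c
#include <stddef.h>

/* An enum constant must be a compile-time constant, so this line only
   compiles because sizeof is resolved by the compiler. */
enum { POINTER_SIZE = sizeof(void *) };

/* Returns the size of a pointer in bytes for the current target. */
size_t pointer_size(void) {
    return sizeof(void *);
}
```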