Introduction

This is a collection of notes on programming languages, frameworks and other computer science topics that I am writing to keep track of my studies. It is a work in progress.

Notes on C

Basics

The minimal "hello world" C program:

#include <stdio.h>

int main()
{
    printf("Hello World!\n");
    return 0;
}

In an array initializer, any position that is not listed is set to 0. For example, in the array A initialized in the Arrays section below, A[2] == 0.0.

Don’t compare to 0, false, or true. Also, all scalars have a truth value. An integer or floating point 0 will always evaluate to false.

// GOOD
bool b = true;
if ( b ) {
  // do something
}

// BAD
bool b = true;
if (( b != false ) == true ) {
  // do something
}

The scalar types:

Name            Where         printf
size_t          <stddef.h>    "%zu" "%zx"
double          Built in      "%e" "%f" "%g" "%a"
signed          Built in      "%d"
unsigned        Built in      "%u" "%x"
bool            <stdbool.h>   "%d"
ptrdiff_t       <stddef.h>    "%td"
char const*     Built in      "%s"
char            Built in      "%c"
void*           Built in      "%p"
unsigned char   Built in      "%hhu" "%02hhx"

size_t is an unsigned integer type that can hold any value in the interval [0, SIZE_MAX]. SIZE_MAX is defined in stdint.h.

Best Practice: Never modify more than one object in a statement.

Best Practice: Never declare several pointers or arrays on the same line. In int* a, b; only a is a pointer, which is an easy source of errors.

Ternary operator:

// Pattern:
// (boolean_expression) ? if_true : if_false;

size_t size_min(size_t a , size_t b) {
    return ( a < b ) ? a : b;
}

Attention: in an expression such as f(a) + g(b), there is no pre-established order specifying whether f(a) or g(b) is to be computed first. If either the function f or g works with side effects (for instance, if f modifies b behind the scenes), the outcome of the expression will depend on the chosen order. The same holds for function arguments:

printf("%g and %g\n", f(a), f(b));

Best Practice: Functions that are called inside expressions should not have side effects.
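The following is a minimal sketch of the evaluation-order problem, assuming a hypothetical helper next() that modifies a shared counter. The first function may yield either result; the second fixes the order by using separate statements.

```c
static int counter = 0;   // shared state modified by next()

// Hypothetical helper with a side effect: increments and returns the counter.
static int next(void) {
    counter = counter + 1;
    return counter;
}

// The order of the two calls is unspecified: the result may be
// 1 - 2 == -1 or 2 - 1 == 1, depending on the compiler.
int fragile_difference(void) {
    counter = 0;
    return next() - next();
}

// Separate statements impose an order, so the result is always 1 - 2 == -1.
int robust_difference(void) {
    counter = 0;
    int first = next();
    int second = next();
    return first - second;
}
```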

Every type in C is either an object type or a function type.

C is call-by-value. When you provide an argument to a function, the value of that argument is copied into a distinct variable for use within the function.
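A small sketch of what call-by-value means in practice; the function names are made up for illustration. Passing a pointer is still call-by-value: the pointer itself is copied, but both copies refer to the caller's object.

```c
// Receives a copy of the caller's int; the caller's object is untouched.
void increment_copy(int n) {
    n = n + 1;
}

// Receives a copy of a pointer; the pointed-to object is the caller's.
void increment_through_pointer(int *n) {
    *n = *n + 1;
}

int demo_call_by_value(void) {
    int x = 10;
    increment_copy(x);              // x is still 10
    increment_through_pointer(&x);  // x is now 11
    return x;
}
```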

Scopes can be nested, with inner and outer scopes. For example, you can have a block scope inside another block scope, and every block scope is defined within a file scope. The inner scope has access to the outer scope, but not vice versa. If you declare the same identifier in both the inner scope and an outer scope, the identifier declared in the outer scope is hidden by the identifier within the inner scope, which takes precedence. In this case, naming the identifier will refer to the object in the inner scope; the object from the outer scope is hidden and cannot be referenced by its name.
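The hiding rule can be sketched as follows (shadow_demo is a hypothetical name). The inner declaration of value hides the outer one only until the inner block ends.

```c
int shadow_demo(void) {
    int value = 1;           // outer block scope
    int seen_inside = 0;
    {
        int value = 2;       // hides the outer 'value' within this block
        seen_inside = value; // refers to the inner object
    }
    // The inner 'value' is out of scope here; the name refers to the outer object again.
    return seen_inside * 10 + value;  // 2 * 10 + 1
}
```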

Scope and lifetime are different. Scope applies to identifiers, whereas lifetime applies to objects. The scope of an identifier is the code region where the object denoted by the identifier can be accessed by its name. The lifetime of an object is the time period for which the object exists.

Automatic lifetimes are declared within a block or as a function parameter. The lifetime of these objects begins when the block in which they’re declared begins execution, and ends when execution of the block ends. If the block is entered recursively, a new object is created each time, each with its own storage.

Objects declared in file scope have static storage duration. The lifetime of these objects is the entire execution of the program, and their stored value is initialized prior to program startup. One can use static to declare a variable within a block scope to have a static lifetime. These objects persist after the function has exited.
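As an illustration of the difference between the two lifetimes, here is a sketch with two hypothetical functions. The identifier calls has block scope in both, but only the static object keeps its value between calls.

```c
// 'calls' has static storage duration: it is initialized once and
// persists after the function returns.
unsigned next_call_number(void) {
    static unsigned calls = 0;
    calls = calls + 1;
    return calls;
}

// For contrast: an automatic object is created afresh on every call.
unsigned always_one(void) {
    unsigned calls = 0;
    calls = calls + 1;
    return calls;
}
```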

Best Practice: Never declare functions with an empty parameter list in C. Always use void in the parameter like so:

int my_function(void);

Enums

An enum allows you to define a type that assigns names (enumerators) to integer values, for cases with an enumerable set of constant values.

enum day { sun, mon, tue, wed, thu, fri, sat };
enum cardinal_points { north = 0, east = 90, south = 180, west = 270 };
enum months { jan = 1, feb, mar, apr, may, jun, jul, aug, sep, oct, nov, dec };
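A short sketch of how enumerator values are assigned: the first enumerator is 0 unless stated otherwise, and each unassigned enumerator is the previous one plus 1. The helper is_weekend is hypothetical.

```c
enum day { sun, mon, tue, wed, thu, fri, sat };   // sun == 0, ..., sat == 6
enum months { jan = 1, feb, mar, apr, may, jun,
              jul, aug, sep, oct, nov, dec };     // feb == 2, ..., dec == 12

// Enumerators are integer constants, so they can be compared directly.
int is_weekend(enum day d) {
    return d == sat || d == sun;
}
```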

Pointers

int* ip;
char* cp;
void* vp;

int i = 17;
int* ip = &i;

The & operator takes the address of an object or function. The * operator converts a pointer to a type into a value of that type. It denotes indirection and operates only on pointers.
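A minimal sketch combining both operators, using a hypothetical swap_ints function: the caller takes addresses with &, and the function reaches the caller's objects with *.

```c
// Exchanges the values of the two ints that a and b point to.
void swap_ints(int *a, int *b) {
    int tmp = *a;   // read the object a points to
    *a = *b;        // write through a
    *b = tmp;       // write through b
}
```

A call looks like swap_ints(&x, &y); for two int objects x and y.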

Arrays

An array is a contiguously allocated sequence of objects that all have the same element type.

int ia[11];
float* afp[17];

int matrix[3][5];

Initializing an array:

double A[5] = {
    [0] = 9.0,
    [1] = 2.9,
    [4] = 3.E+25,
    [3] = .00007,
};

// A[2] is initialized as 0.0
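To check the zero-initialization rule, the sketch below reuses the same initializer inside a hypothetical function and counts the elements that were left at zero. The element count is computed with the common sizeof idiom.

```c
#include <stddef.h>

// Counts how many elements of A were not listed in the initializer.
size_t count_unset_positions(void) {
    double A[5] = {
        [0] = 9.0,
        [1] = 2.9,
        [4] = 3.E+25,
        [3] = .00007,
    };
    size_t n = sizeof A / sizeof A[0];  // number of elements: 5
    size_t zeros = 0;
    for (size_t i = 0; i < n; ++i) {
        if (!A[i]) {                    // a floating point 0 evaluates to false
            ++zeros;
        }
    }
    return zeros;                       // only A[2] was left at 0.0
}
```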

Structs

A struct contains sequentially allocated member objects. One can reference members of a struct by using the struct member operator (.). If you have a pointer to a struct, you can reference its members with the struct pointer operator (->).

typedef struct vec2 { float x, y; } vec2;

vec2 v0 = {1.0f, 2.0f};
vec2 v1 = {.x = 1.0f, .y = 2.0f};
vec2 v2 = {.y = 2.0f};    // missing struct members are set to zero



// Inside functions, runtime-variable values can be used for initialization:
float get_x(void) {
    return 1.0f;
}

void bla(void) {
    vec2 v0 = { .x = get_x(), .y = 2.0f };
}

// But plain assignment from a brace-enclosed list doesn't work:
vec2 v0;
v0 = {1.0f, 2.0f};          // error: does not compile
// Instead, a type hint (a compound literal) is needed:
v0 = (vec2) {1.0f, 2.0f};

Unions

Union types are similar to structures, except that the memory used by the member objects overlaps. Unions can contain an object of one type at one time, and an object of a different type at a different time, but never both objects at the same time, and are primarily used to save memory.
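One common way to use a union safely is to pair it with a tag that records which member is currently stored. The sketch below is only an illustration; the names kind, value and value_as_double are hypothetical.

```c
enum kind { KIND_INT, KIND_DOUBLE };

// The tag says which union member currently holds a value.
struct value {
    enum kind kind;
    union {
        int i;
        double d;
    } as;   // i and d share the same storage
};

// Reads only the member that the tag says is active.
double value_as_double(struct value v) {
    return (v.kind == KIND_INT) ? (double)v.as.i : v.as.d;
}
```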

Qualifiers

Types can be qualified by using one or more of the following qualifiers: const, volatile, and restrict.

  • const: not modifiable through that name. The compiler rejects direct assignments to a const-qualified object at compile time. Such objects may be placed in read-only memory, and modifying one indirectly (for example, through a pointer with the const cast away) is undefined behavior.
  • volatile: values stored in these objects may change without the knowledge of the compiler. For example, every time the value from a real-time clock is read, it may change, even if the value has not been written to by the C program. Using a volatile-qualified type lets the compiler know that the value may change, and ensures that every access to the real-time clock occurs (otherwise, an access to the real-time clock may be optimized away or replaced by a previously read and cached value).
  • restrict: Used to promote optimization. Objects indirectly accessed through a pointer frequently cannot be fully optimized because of potential aliasing, which occurs when more than one pointer refers to the same object. Aliasing can inhibit optimizations, because the compiler can’t tell whether portions of an object can change when another, apparently unrelated object is modified. A restrict-qualified pointer is a promise by the programmer that, during the pointer’s lifetime, the object it points to is accessed only through that pointer, which lets the compiler rule out aliasing.
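A small sketch of const and restrict on function parameters, using a hypothetical add_into function. The restrict qualifiers are a promise made by the caller, not something the compiler checks: passing overlapping arrays would be undefined behavior.

```c
#include <stddef.h>

// Adds src[i] to dst[i] for i in [0, n).
// 'const' forbids writes through src; 'restrict' promises that
// dst and src do not refer to overlapping storage.
void add_into(size_t n, double *restrict dst, double const *restrict src) {
    for (size_t i = 0; i < n; ++i) {
        dst[i] += src[i];
    }
}
```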

Preprocessor

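Defines a macro as 0 and an empty macro (note that identifiers beginning with a double underscore are reserved for the implementation, so the names here are for illustration only):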

#define  __MACRO__ 0

#define  __MACRO2__

Checks if a macro is defined. If it is, throws an error (the error stops the compilation):

#ifdef  __MACRO__
#error "Error!"
#endif

Checks if a macro is not defined. If it isn't, defines it:

#ifndef __MACRO__
#define __MACRO__
#endif

Some Best Practices

  • Enable all warnings: -Wall and -Wextra on GCC and Clang
  • Wrap your structs in a typedef:
typedef struct bla{
    int a, b, c;
} bla;

// Attention: the POSIX standard reserves the ‘_t’ postfix for its own type names to prevent collisions with user types.
  • Use void to indicate that a function does not receive arguments:
// GOOD
void my_func(void) {
    ...
}

// BAD
void my_func() {
    ...
}
  • Don’t be afraid to pass and return structs by value:
typedef struct float2{ float x, y; } float2;

float2 addf2(float2 v0, float2 v1) {
    return (float2) { v0.x + v1.x, v0.y + v1.y };
}

...
float2 v0 = {1.0f, 2.0f};
float2 v1 = {3.0f, 4.0f};
float2 v3 = addf2(v0, v1);
...

// You can also move the initialization of the two inputs right into the function call:
float2 v3 = addf2((float2){ 1.0f, 2.0f }, (float2){ 3.0f, 4.0f });

Some Headers

  • stdint.h: defines integer types with certain widths.
  • tgmath.h: defines type generic math functions, for both real and complex numbers.

Bibliography

  • SEACORD, Robert. Effective C: an introduction to professional C programming. No Starch Press. 2020.

Notes on C++

Basics

The minimal "hello world" C++ program:

#include <iostream>
int main()
{
    std::cout<<"Hello, World!\n";
    return 0;
}

The #include directive makes the declarations of a library header available. The main function is the entry point of the program and returns an int to the operating system; returning 0 signals successful completion. Blocks of code that perform some well-defined action should be encapsulated in a function, which is then called from main(). A function must be declared before being used. The declaration follows this syntax:

Node* next_node();     // Receives no argument, returns pointer to Node
void call_number(int); // Receives an int, returns nothing
double exp(double);    // Receives a double, returns a double.

There can be more than one argument. The types of the arguments and of the return value are checked at compile time.

The code that makes up a piece of software must be comprehensible, because this is the first step to maintainability. To make code more comprehensible, we divide its tasks into functions. That way, we have a basic vocabulary for data (types, both built-in and user-defined) and for actions on that data (functions). Encapsulating a specific action in a function forces us not to repeat ourselves (DRY) and to document that action's dependencies.

If functions with the same name are declared with different argument types, the compiler accepts all of them and, at each call, uses the function that matches the argument types. This allows us to change the behavior of an action based on the type of the data we are dealing with. Note that a call must not be ambiguous; if it is, the compiler reports an error. Also, functions that share a name should share the same semantics, i.e., they should perform the same action. Example:

// This is called function overloading
void sum(int, int);
void sum(double, double);

void use()
{
  sum(2, 2);
  sum(2.7, 3.14);
}

Every object in the language has a type. An object is a portion of memory that holds a value of the given type. A value is a set of bits that are interpreted according to the type. Some built-in types are bool, char, int and double. Integer literals are decimal by default (42) but can also be written in binary (0b1010), hexadecimal (0x04AF) or octal (0432). The usual arithmetic, comparison, logical and modification operations are present:

Arithmetic   Comparison   Logical   Modification
x + y        x == y       x && y    x += y
x - y        x != y       x || y    x -= y
x * y        x < y        !x        ++x
x / y        x > y                  --x
x % y        x <= y                 x *= y
             x >= y                 x /= y
                                    x %= y

Note that C++ performs implicit conversions between basic types:

void my_function()
{
  double d = 3.14;
  int i = 5;
  d = d + i; // d is 8.14 here
  i = d * i; // i is 40 here.
  // The multiplication yields 40.7 (d is already 8.14),
  // but the value is truncated to fit an int
}

There are two forms of variable initialization, "=" or "{}":

double d = 3.14;
double d2 {2.7};

// Vector of ints (requires #include <vector>)
std::vector<int> v {1, 2, 3};

The "=" form is a tradition from C. "{}" is preferred because it rejects narrowing conversions, i.e., implicit conversions that lose information:

int a = 4.8; // a is 4 here
int b {4.8}; // Error: narrowing conversion from double to int

If the data is coming from the user, for example, implicit conversions are a recipe for disaster, if not checked. The type of a variable can be deduced by the compiler with the "auto" keyword. My opinion is that auto should only be used when the types are very long (as they usually are with templates). Explicit (short) types help with code readability.

auto a = 4.8;
auto b {4.8}; // Both a and b are double

User-Defined Types

Modularity

Classes

Constructors

A constructor without parameters or with default parameters set is called a default constructor. It is a constructor which can be called without arguments:

class MyClass
{
  public:
    MyClass()
    {
      std::cout << "Default constructor invoked.\n";
    }
};

int main()
{
  MyClass o; // invoke a default constructor
}

// Default arguments
class MyClass
{
  public:
    MyClass(int x = 123, int y = 456)
    {
      std::cout << "Default constructor invoked.\n";
    }
};

int main()
{
  MyClass o; // invoke a default constructor
}

// If no constructor is declared in the class, the compiler generates a default constructor.
// Constructors are invoked when object initialization takes place. They can’t be invoked directly.

Member initializer list:

class MyClass
{
 public:
   int x, y;
   MyClass(int xx, int yy)
     : x{ xx }
     , y{ yy } // member initializer list
   {
   }
};

int main()
{
   MyClass o{ 1, 2 }; // invoke a user-defined constructor
   std::cout << o.x << ' ' << o.y;
}

Copy Constructors

When we initialize an object with another object of the same class, we invoke a copy constructor. If we do not supply our own copy constructor, the compiler generates one implicitly.

class MyClass
{
  private:
    int x, y;

  public:
    MyClass(int xx, int yy) : x{ xx }, y{ yy }
    {
    }
};
int main()
{
  MyClass o1{ 1, 2 };
  MyClass o2 = o1; // default copy constructor invoked
}

A user defined copy constructor has this signature: MyClass(const MyClass& rhs)

class MyClass
{
  private:
    int x, y;

  public:
    MyClass(int xx, int yy) : x{ xx }, y{ yy }
    {
    }

    MyClass(const MyClass& rhs)
    : x { rhs.x }
    , y { rhs.y }
    {
    }
};
int main()
{
  MyClass o1{ 1, 2 };
  MyClass o2 = o1; // user copy constructor invoked
}

Copy Assignment

When an already-constructed object is assigned from another object of the same class, the copy assignment operator copies the data from the other object:

MyClass from, to;
to = from; // copy assignment

A copy assignment operator is of the following signature: MyClass& operator=(const MyClass& rhs)

class MyClass
{
  public:
    MyClass& operator=(const MyClass& rhs)
    {
      // implement the copy logic here
      return *this;
    }
};

// By convention, an overloaded = operator returns *this (the dereferenced this pointer) at the end, so that assignments can be chained.

Move Constructor

Instead of copying, we can move the data from one object to another. This is called move semantics. A move constructor has this signature: MyClass(MyClass&& rhs)

Operations

Templates

Generic Programming

Standard Library

Strings and Regex

To accept a string from standard input, use std::getline(read_from, into):

std::string s;
std::cout << "Enter string: ";
std::getline(std::cin, s);

To create a substring from a string, use the method .substr(starting_index, length):

std::string s = "Hello World";
std::string sub_s = s.substr(6, 5);

To find a substring in a given string, use the method .find(). If the substring is found, the method returns the position of its first occurrence (the index of the first char of the substring). Otherwise, it returns std::string::npos. The return type of the function is std::string::size_type:

std::string s = "Hello World";
std::string search_for = "World";
std::string::size_type found = s.find(search_for);

I/O

TODO

Containers

TODO

Algorithms

TODO

Utilities

TODO

Numerics

TODO

Concurrency

TODO

References

  • Bjarne Stroustrup. A Tour of C++. Pearson Education. 2018.
  • Slobodan Dmitrovic. Modern C++ for Absolute Beginners. 2020.

Notes on Cython

Compilation

Using distutils with cythonize

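Consider a fib.pyx Cython source file. Our goal is to use distutils to create a compiled extension module (fib.so on Mac OS X or Linux and fib.pyd on Windows). For that, we use a setup.py file like the one below (distutils was removed from the standard library in Python 3.12; on recent versions, import setup from setuptools instead):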

from distutils.core import setup
from Cython.Build import cythonize

setup(name='Fibonacci App',
      ext_modules=cythonize('fib.pyx',
                            nthreads=4,
                            force=False,
                            annotate=True,
                            compiler_directives={'binding': True},
                            language_level="3"))

The full list of arguments of the function cythonize is given in the Cython documentation. The most important ones are explained below:

  • The first argument is the name of the Cython files. It can also be a glob pattern such as src/*.pyx;
  • nthreads: The number of concurrent builds for parallel compilation (requires the multiprocessing module);
  • force: Forces the recompilation of the Cython modules, even if the timestamps don’t indicate that a recompilation is necessary. Default is False;
  • annotate: If True, will produce a HTML file for each of the .pyx or .py files compiled. The HTML file gives an indication of how much Python interaction there is in each of the source code lines, compared to plain C code. It also allows you to see the C/C++ code generated for each line of Cython code. Default is False;
  • compiler_directives: Allows setting compiler directives, which are described in the Cython documentation;
  • language_level: The level of the Python language. 3 is for Python 3.

These two function calls succinctly demonstrate the two stages in the pipeline: cythonize calls the cython compiler on the .pyx source file or files, and setup compiles the generated C or C++ code into a Python extension module. A C compiler, such as gcc, clang or MSVC is necessary at compile time.

To build on Linux and MacOS, run:

python3 setup.py build_ext --inplace

The build_ext argument is a command instructing distutils to build the Extension object or objects that the cythonize call created. The optional --inplace flag instructs distutils to place each extension module next to its respective Cython .pyx source file.

On Windows:

python setup.py build_ext --inplace --compiler=msvc

If you use setuptools instead of distutils, the default action when running python3 setup.py install is to create a zipped egg file which will not work with cimport for pxd files when you try to use them from a dependent package. To prevent this, include zip_safe=False in the arguments to setup().

One can also set compiler options in the setup.py, before calling cythonize(), like so:

from distutils.core import setup
from Cython.Build import cythonize

from Cython.Compiler import Options

Options.embed = True

setup(name='Fibonacci App',
      ext_modules=cythonize('fib.pyx',
                            nthreads=4,
                            force=False,
                            annotate=True,
                            compiler_directives={'binding': True},
                            language_level="3"))

The embed option embeds the Python interpreter, in order to make a standalone executable. This will provide a C function which initialises the interpreter and executes the body of this module. More options are listed in the Cython documentation.

Typing

Typing variables

Untyped dynamic variables are declared and behave exactly like Python variables:

a = 42

Statically typed variables are declared like so:

cdef int a = 42
cdef size_t len
cdef double *p
cdef int arr[10]

and behave like C variables.

It is possible to mix both kinds of variables if there is a trivial correspondence between the types, like C and Python ints:

# C variables
cdef int a, b, c

# Calculations using a, b, and c...

# Inside a Python tuple
tuple_of_ints = (a, b, c)

In Python 3, all int objects have unlimited precision. When converting integral types from Python to C, Cython generates code that checks for overflow. If the C type cannot represent the Python integer, a runtime OverflowError is raised. A Python float is stored as a C double. Converting a Python float to a C float may truncate to 0.0 or positive or negative infinity, according to IEEE 754 conversion rules. The Python complex type is stored as a C struct of two doubles. Cython has float complex and double complex C-level types, which correspond to the Python complex type.

We can also use cdef to statically declare variables with a Python type. We can do this for the built-in types like list, tuple, and dict and extension types like NumPy arrays:

cdef list particles, modified_particles
cdef dict names_from_particles
cdef str pname
cdef set unique_particles

The more static type information we provide, the better Cython can optimize the result.

C Functions

When used to define a function, the cdef keyword creates a function with C-calling semantics. A cdef function’s arguments and return type are typically statically typed, and they can work with C pointer objects, structs, and other C types that cannot be automatically coerced to Python types. It is helpful to think of a cdef function as a C function that is defined with Cython’s Python-like syntax.

cdef long factorial(long n):
    if n <= 1:
        return 1
    return n * factorial(n - 1)

A function declared with cdef can be called by any other function (def or cdef) inside the same Cython source file. However, Cython does not allow a cdef function to be called from external Python code. Because of this restriction, cdef functions are typically used as fast auxiliary functions to help def functions do their job. If we want to use factorial from Python code outside of this extension module, we need a minimal def function that calls factorial internally:

def wrap_factorial(n):
    return factorial(n)

One limitation of this is that wrap_factorial and its underlying factorial are restricted to C integral types only, and do not have the benefit of Python’s unlimited-precision integers. This means that wrap_factorial gives erroneous results for arguments larger than some small value, depending on how large a long is on your system. We always have to be aware of the limitations of the C types.

C Functions with Automatic Python Wrappers

A cpdef function combines features from cdef and def functions: we get a C-only version of the function and a Python wrapper for it, both with the same name. When we call the function from Cython, we call the C-only version; when we call the function from Python, the wrapper is called.

cpdef long factorial(long n):
    if n <= 1:
        return 1
    return n * factorial(n - 1)

A cpdef function has one limitation, due to the fact that it does double duty as both a Python and a C function: its arguments and return types have to be compatible with both Python and C types.

Both cdef and cpdef can be given an inline hint that the C compiler can use or ignore, depending on the situation:

cpdef inline long factorial(long n):
    if n <= 1:
        return 1
    return n * factorial(n - 1)

The inline modifier, when judiciously used, can yield performance improvements, especially for small inlined functions called in deeply nested loops, for example.

Exception Handling

A def function always returns some sort of PyObject pointer at the C level. This invariant allows Cython to correctly propagate exceptions from def functions without issue. Cython’s other two function types (cdef and cpdef) may return a non-Python type, which makes some other exception-indicating mechanism necessary. Example:

cpdef int divide_ints(int i, int j):
    return i / j

To correctly propagate the exception that occurs when j is 0, Cython provides an except clause:

cpdef int divide_ints(int i, int j) except? -1:
    return i / j

The except? -1 clause allows the return value -1 to act as a possible sentinel that an exception has occurred. If divide_ints ever returns -1, Cython checks if the global exception state has been set, and if so, starts unwinding the stack. In this example we use a question mark in the except clause because -1 might be a valid result from divide_ints, in which case no exception state will be set. If there is a return value that always indicates an error has occurred without ambiguity, then the question mark can be omitted.

C structs, unions, enums and typedefs

The following C constructs:

struct mycpx {
    float real;
    float imag;
};

union uu {
    int a;
    short b, c;
};

enum COLORS {ORANGE, GREEN, PURPLE};

can be declared in Cython like this:

cdef struct mycpx:
    float real
    float imag

cdef union uu:
    int a
    short b, c

cdef enum COLORS:
    ORANGE, GREEN, PURPLE

We can combine struct and union declarations with ctypedef, which creates a new type alias for the struct or union:

ctypedef struct mycpx:
    float real
    float imag

ctypedef union uu:
    int a
    short b, c

To declare and initialize:

cdef mycpx a = mycpx(3.1415, -1.0)

# Or
cdef mycpx b = mycpx(real=2.718, imag=1.618034)

# Or
cdef mycpx zz
zz.real = 3.1415
zz.imag = -1.0

# Or, structs can be assigned from a Python dictionary (with CPython overhead):
cdef mycpx zz = {'real': 3.1415, 'imag': -1.0}

Efficient Loops

Considering this Python for loop over a range:

n = 100
# ...
for i in range(n):
    # ...

Its cythonized version that would produce the best performing C code is:

cdef unsigned int i, n = 100
for i in range(n):
    # ...

Extension Types

A Python class such as:

class Particle():
    def __init__(self, m, p, v):
        self.mass = m
        self.position = p
        self.velocity = v
    def get_momentum(self):
        return self.mass * self.velocity

Would be cythonized as:

cdef class Particle():
    cdef double mass, position, velocity

    def __init__(self, m, p, v):
        self.mass = m
        self.position = p
        self.velocity = v
    def get_momentum(self):
        return self.mass * self.velocity

To make an attribute readonly (for a Python caller):

cdef class Particle():
    cdef readonly double mass
    cdef double position, velocity
# ...

mass will be readable and not writable by the Python caller, but position and velocity will be completely private.

cdef class Particle():
    cdef readonly double mass
    cdef public double position
    cdef double velocity
# ...

Here, position will be both readable and writable by the Python caller. If C-level allocations and deallocations must occur, then use the __cinit__ and __dealloc__ methods:

from libc.stdlib cimport malloc, free

cdef class Matrix:
    cdef:
        unsigned int nrows, ncols
        double *_matrix

    def __cinit__(self, nr, nc):
        self.nrows = nr
        self.ncols = nc
        self._matrix = <double*>malloc(nr * nc * sizeof(double))
        if self._matrix == NULL:
            raise MemoryError()

    def __dealloc__(self):
        if self._matrix != NULL:
            free(self._matrix)

You can cast a Python object to a static object:

# p is a Python object that may be a Particle

cdef Particle static_p = p

# Or, with the possibility of segfault if p is not a particle:

<Particle>p

# Or, safely, but with overhead:

<Particle?>p

None can be passed as an argument to functions that expect a statically typed extension type. Accessing C-level attributes on None can lead to segfaults. To protect against it:

def dispatch(Particle p not None):
    print(p.get_momentum())
    print(p.velocity)

Wrapping C++

TODO

Profiling

TODO

Typed Memoryviews

TODO

Parallelism

TODO

References

Notes on Julia

Julia is a high-level, general-purpose dynamic programming language, most commonly used for numerical analysis and computational science. Distinctive aspects of Julia's design include a type system with parametric polymorphism and the use of multiple dispatch as a core programming paradigm, efficient garbage collection, and a just-in-time (JIT) compiler (with support for ahead-of-time compilation).

Julia Deep Dive

Basics

The minimal "hello world" program:

# Single line comment

#=
Multi-line comment
=#

println("Hello, World!")

Indentation doesn't matter. Indexing starts at 1, like Matlab and Octave. In the REPL, by pressing "]" you can enter the "package mode", where you can write commands that manage the packages you have or want. Some commands:

  • status: Retrieves a list with name and versions of locally installed packages
  • update: Updates your local index of packages and all your local packages to the latest version
  • add myPkg: Automatically downloads and installs a package
  • rm myPkg: Removes a package and all the dependencies that were installed automatically only for it
  • add pkgName#master: Checks out the master branch of a package (and free pkgName returns to the released version)
  • add pkgName#branchName: Checks out a specific branch
  • add git@github.com:userName/pkgName.jl.git: Checks out an unregistered package

To use a package on a Julia script, write using [package] at the beginning of the script. To use a package without populating the namespace, write import [package]. But then, you will have to use the functions as [package].function(). You can also include local Julia scripts as such: include("my_script.jl").

I think that using [package] is bad practice because it pollutes the namespace. The best way to import a package is this:

# Importing the JSON package through an alias
import JSON as J

# Using:
J.print(Dict("Hello, " => "World!"))

A particular class of variable names is one that contains only underscores. These identifiers can only be assigned values, which are immediately discarded, and cannot therefore be used to assign values to other variables (i.e., they cannot be used as rvalues) or use the last value assigned to them in any way.

Data Types and Structures

Some built-in data types and structures of the Julia language:

Scalar Types

The usual scalar types are present: Int64, UInt128, BigInt, Float64, Char and Bool.

Const values

Constant values are declared as such:

const foo = 1234

Basic Math

Complex numbers can be defined like so, with im being the square root of -1:

a = 1 + 2im

Exact rational numbers (ratios of integers) can be built with the // operator:

a = 2 // 3

All standard basic mathematical arithmetic operators are supported (+, -, *, /, %, ^). Mathematical constants can be used like so:

MathConstants.e
MathConstants.pi

Natural exponentiation can be done like this:

 a = exp(b)

Strings

Strings are immutable. We use single quotes for chars and double quotes for strings. A single-line string is delimited by a pair of double quotes, while a multi-line string can be delimited by triple double quotes:

a = "a string"
b = "a string\non multiple rows\n"
c = """
    a string
    on multiple rows
    """

Some string operations are also present, like:

  • split: Separates string into other strings based on a char. Default char is whitespace.
  • join([string1, string2], ""): Concatenates strings with a certain string.
  • replace(s, "toSearch" => "toReplace"): Replaces occurrences on the string s.
  • strip(s): Remove leading and trailing whitespaces.

Other ways to concatenate strings:

  • Concatenation operator: *;
  • Function string(string1,string2,string3);
  • Interpolate string variables in a bigger one using the dollar symbol: a = "$str1 is a string and $(myobject.int1) is an integer".

To convert strings representing numbers to integers or floats, use myInt = parse(Int64,"2017"). To convert integers or floats to strings, use myString = string(123).

You can broadcast a function to work over a collection (instead of a scalar) using the dot (.) operator. For example, to broadcast parse to work over an array:

myNewList = parse.(Float64,["1.1","1.2"])

Arrays

Arrays are N-dimensional mutable containers. Ways to create one:

  • a = [] or a = Int64[] or a = Array{T,1}() or a = Vector{T}(): Empty array. Array{} is the constructor, T is the type and Vector{} is an alias for 1 dimensional arrays.
  • a = zeros(5) or a = zeros(Int64,5) or a = ones(5): Array of zeros (or ones)
  • a = fill(j, n): n-element array of identical j elements
  • a = rand(n): n-element array of random numbers
  • a = [1,2,3]: Explicit construction (column vector).
  • a = [1 2 3]: Row vector (this is a two-dimensional array where the first dimension is made of a single row)
  • a = [10, "foo", false]: Can be of mixed types, but will be much slower

If you need to store different types in a data structure, it is better to use a Union: a = Union{Int64,String,Bool}[10, "Foo", false]. Some operations on arrays:

  • a[1]: Access element.
  • a[from:step:to]: Slice
  • collect(myiterator): Transforms an iterator into an array.
  • y = vcat(2015, 2025:2028, 2100): Initialize an array expanding the elements. 2025:2028 means [2025, 2026, 2027, 2028].
  • push!(a,b): Append b to the end of a
  • append!(a,b): Append the elements of b to the end of a. If b is scalar, append b to the end of a.
  • a = [1,2,3]; b = [4,5]; c = vcat(1,a,b): Concatenation of arrays.
  • pop!(a): Remove element from the end of a.
  • popfirst!(a): Remove first element of a.
  • deleteat!(a, pos): Remove element at position pos from array a.
  • pushfirst!(a,b): Add b at the beginning of array a.
  • sort!(a) or sort(a): Sorting, depending on whether we want to modify or not the original array.
  • unique!(a) or unique(a): Remove duplicates
  • a[end:-1:1]: Reverses array a.
  • in(1, a): Checks for existence.
  • length(a): Length of array.
  • a...: The “splat” operator. Converts the values of an array into function parameters
  • maximum(a) or max(a...): Maximum value. max returns the maximum value between the given arguments.
  • minimum(a) or min(a...): Minimum value. min returns the minimum value between the given arguments.
  • isempty(a): Checks if an array is empty.
  • reverse(a): Reverses an array.
  • sum(a): Return the summation of the elements of a.
  • cumsum(a): Return the cumulative sum of each element of a (returns an array).
  • empty!(a): Empty an array (works only for column vectors, not for row vectors).
  • b = vec(a): Transform row vectors into column vectors.
  • shuffle(a) or shuffle!(a): Random-shuffle the elements of a (requires using Random before).
  • findall(x -> x == value, myArray): Find a value in an array and return its indexes.
  • enumerate(a): Get (index,element) pairs. Return an iterator to tuples, where the first element is the index of each element of the array a and the second is the element itself.
  • zip(a,b): Get (a_element, b_element) pairs. Return an iterator to tuples made of elements from each of the arguments

By convention, functions whose names end in '!' modify at least one of their arguments, usually the first.

Map applies a function to every element in the input arrays:

map(func, my_array)

Filter takes a collection of values, xs, and returns a subset, ys, of those values. The specific values from xs that are included in the resulting ys are determined by the predicate p. A predicate is a function that takes some value and always returns a Boolean value:

ys = filter(p, xs)

Reduce takes some binary function, g, as the first argument, and then uses this function to combine the elements in the collection, xs, provided as the second argument:

y = reduce(g, xs)

Mapreduce, called as mapreduce(f, g, xs), can be understood as reduce(g, map(f, xs)).

Multidimensional and Nested Arrays

A matrix is a two-dimensional array. The main difference between a matrix and an array of arrays is that, with a matrix, every column (and every row) must have the same number of elements, and the rules of linear algebra apply.

Attention: Julia is column-major

Ways to create one:

  • a = Matrix{T}(undef, 0, 0): Empty matrix of element type T.
  • a = Array{T}(undef, 0, 0, 0): Empty 3-dimensional array of element type T.
  • a = [[1,2,3] [4,5,6]]: [[elements of the first column] [elements of the second column] ...].
  • a = hcat(col1, col2): By the columns.
  • a = [1 4; 2 5; 3 6]: [elements of the first row; elements of the second row; ...].
  • a = vcat(row1, row2): By the rows.
  • a = zeros(2,3) or a = ones(2,3): A 2x3 matrix filled with zeros or ones.
  • a = fill(j, 2, 3): A 2x3 matrix of identical j elements
  • a = rand(2, 3): A 2x3 matrix of random numbers

Attention to the difference:

  • a = [[1,2,3],[4,5,6]]: creates a 1-dimensional array with 2 elements (each of them a vector).
  • a = [[1,2,3] [4,5,6]]: creates a 2-dimensional array (a 3x2 matrix with 2 columns) holding six scalar elements.

Access the elements with a[row,col]. You can also make a boolean mask and apply to the matrix:

a = [[1,2,3] [4,5,6]]
mask = [[true,true,false] [false,true,false]]
println(a[mask])
# Will print [1, 2, 5]. Always flattened.

Other useful operations:

  • size(a): Returns a tuple with the sizes of the n dimensions.
  • ndims(a): Returns the number of dimensions of the array.
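  • a': Transpose operator (strictly the adjoint, i.e. the conjugate transpose, which equals the transpose for real-valued matrices).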
  • reshape(a, nElementsDim1, nElementsDim2): Reshape the elements of a in a new n-dimensional array with the dimensions given.
  • dropdims(a, dims=(dimToDrop1,dimToDrop2)): Remove the specified dimensions, provided that the specified dimension has only a single element

These last three operations perform only a shallow copy (a view) of the matrix, so if the underlying matrix changes, the view changes too. Use collect(reshape/dropdims/transpose) to force a deep copy.

Tuples

Tuples are an immutable collection of elements. Initialize with a = (1,2,3) or a = 1,2,3. Tuples can be unpacked like so: var1, var2 = (x,y). And you can convert a tuple into a vector like this: v = collect(a).

Named Tuples

Named tuples are immutable collections whose items can be accessed by name as well as by position (index).

  • nt = (a=1, b=2.5): Define a NamedTuple
  • nt.a: Access the elements with the dot notation
  • keys(nt): Return a tuple of the keys
  • values(nt): Return a tuple of the values
  • collect(nt): Return an array of the values
  • pairs(nt): Return an iterable of the pairs (key,value). Useful for looping: for (k,v) in pairs(nt) [...] end

Dictionaries

Dictionaries are mutable mappings from keys to values. Ways to create one:

  • mydict = Dict{T,U}()
  • mydict = Dict('a'=>1, 'b'=>2, 'c'=>3)

Useful operations:

  • mydict[key] = value: Add pairs to the dictionary
  • mydict[key]: Look up value. If it doesn't exist, raises error.
  • get(mydict,'a',0): Look up value with a default value for non-existing key.
  • keys(mydict): Get all keys. Results in an iterator. Use collect() to transform into array.
  • values(mydict): Iterator of all the values.
  • haskey(mydict, 'a'): Checks if a key exists.
  • in(('a' => 1), mydict): Checks if a given key/value pair exists.
  • delete!(mydict, 'a'): Delete the pair with the specified key from the dictionary.

You can iterate over both keys and values:

for (k,v) in mydict
   println("$k is $v")
end

Sets

A set is a mutable collection of unordered and unique values. Ways to create and modify one:

  • a = Set{T}(): Empty set
  • a = Set([1,2,2,3,4]): Initialize with values (duplicates are dropped)
  • push!(a, 5): Add elements
  • delete!(a, 1): Delete elements
  • intersect(set1,set2), union(set1,set2), setdiff(set1,set2): Intersection, union, and difference.

Memory and Copy

Shallow copy (copy of the memory address only) is the default in Julia. Some observations:

  • a = b: This is a name binding. It binds the entity referenced by b to the identifier a. If b is later rebound to some other object, a still refers to the original object. If the object referenced by b is mutated, the change is also visible through a.
  • When one variable is assigned to another: values of immutable basic types (Float64, Int64, String) behave as if deep copied, while containers are shared (shallow copied).
  • copy(x): Simple types are deep copied and containers of simple types are deep copied; for containers of containers, the inner content is shallow copied (the content of the content is only referenced, not copied).
  • deepcopy(x): Everything is deep copied recursively.

Observations on types:

You can check if two objects have the same values with == and if two objects are actually the same with ===.

To cast an object into a different type:

convertedObj = convert(T,x)

Random Numbers

  • rand(): Random float in [0,1).
  • rand(a:b): Random integer in [a,b].
  • rand(a:0.01:b): Random float in [a,b] in steps of 0.01 (i.e., with "precision" to the second digit).
  • rand(2,3): Random 2x3 matrix.
  • rand(DistributionName([distribution parameters])): Random value drawn from a particular distribution (Normal, Poisson,...). Requires the Distributions package.
  • rand(Uniform(a,b)): Random float in [a,b] using a uniform distribution.
  • import Random:seed!; seed!(1234): Sets a seed.

Basic Syntax

The typical control flow is present:

# 1 and 5 are included on this range
for i = 1:5
    println(i)
end

for j in [1, 2, 3]
    println(j)
end

# Nested loops:
for i = 1:2, j = 3:4
    println((i, j))
end

i = 0
while i < 5
    println(i)
    global i += 1
end

if x < y
    println("x is less than y")
elseif x > y
    println("x is greater than y")
else
    println("x is equal to y")
end

There are list comprehensions:

[myfunction(i) for i in [1,2,3]]

[x + 2y for x in [10,20,30], y in [1,2,3]]

mydict = Dict()
[mydict[i]=value for (i, value) in enumerate(mylist)]
# enumerate returns an iterator to tuples with the index and the value of elements in an array

[students[name] = sex for (name,sex) in zip(names,sexes)]
# zip returns an iterator of tuples pairing two or multiple lists, e.g. [("Marc","M"),("Anne","F")]

map((n,s) -> students[n] = s, names, sexes)
# map applies a function to a list of arguments

The ternary operator is present:

a ? b : c
# If a is true, then b, else c

The usual logic operators exist:

  • And: &&
  • Or: ||
  • Not: !

Functions

Functions can be declared like so:

function f(x)
    x+2
end

Function arguments are normally specified by position (positional arguments). However, if a semicolon (;) is used in the parameter list of the function definition, the arguments listed after that semicolon must be specified by name (keyword arguments).

function func(a,b=1;c=2)
    # blabla
end

# Optionally restrict the types of argument the function should accept by annotating the parameter with the type:
function func(a::Int64,b::Int64=1;c::Int64=2)
    # blabla
end

Function that can operate on some types but not others:

# This function can operate on Float64 or on a Vector of Float64.
function func(par::Union{Float64, Vector{Float64}})
    # In the body we check the type using typeof()
end

Function with variable number of arguments:

# The splat operator (...) can specify a variable number of arguments in the parameter declaration
function func(a, args...)
    # The parameter that uses the ellipsis must be the last one
    # In the body we use args as an iterator
end

Julia has multiple-dispatch. If you declare the same function with different arguments, the compiler will choose the correct function to call based on the arguments you passed. You can also do type parametrization on functions:

# T must be introduced with a where clause
function f(x::T) where {T}
    x+2
end

myfunction(x::T, y::T2, z::T2) where {T <: Number, T2} = 5x + 5y + 5z

Functions are objects that can be assigned to new variables, returned, or nested:

f(x) = 2x   # define a function f inline
a = f(2)    # call f and assign the return value to a
a = f       # bind f to a new variable name (it's not a deep copy)
a(5)        # call again the (same) function

Functions work on new local variables, known only inside the function itself. Assigning the variable to another object will not influence the original variable. But if the object bound with the variable is mutable (e.g., an array), the mutation of this object will apply to the original variable as well:

function f(x,y)
    x = 10
    y[1] = 10
end

x = 1
y = [1,1]

# x will not change, but y will now be [10,1]
f(x,y)

Functions that change their arguments have their name, by convention, followed by an '!'. The first parameter is, still by convention, the one that will be modified.

Anonymous functions can be declared like so:

(x, y) -> x^2 + 2y - 1
# you can assign an anonymous function to a variable.

You can broadcast a function to work over all the elements of an array:

myArray = broadcast(i -> replace(i, "x" => "y"), myArray)

# Or like this:
f = i -> replace(i, "x" => "y")
myArray = f.(myArray)

Functions whose name is an operator symbol can be used in infix or prefix form:

5 + 3

+(5, 3)

Custom Types

There are two type operators:

  • The :: operator is used to constrain an object to be of a given type. For example, a::B means “a must be of type B”.
  • The <: operator has a similar meaning, but it’s a bit more relaxed in the sense that the object can be of any subtypes of the given type. For example, A<:B means “A must be a subtype of B”, that is, B is the “parent” type and A is its “child” type.

You can define structures like this:

# Structs are immutable by default. Hence the mutable keyword.
# Immutable structs are often faster, since the compiler can optimize them more aggressively.
mutable struct MyStruct
  property1::Int64
  property2::String
end

# Parametrized:
mutable struct MyStruct2{T<:Number}
  property1::Int64
  property2::String
  property3::T
end

# Instantiating and accessing attribute:
myObject = MyStruct(20,"something")
a = myObject.property1 # 20

An example of object orientation in Julia:

struct Person
  myname::String
  age::Int64
end

struct Shoes
    shoesType::String
    colour::String
end

struct Student
    s::Person
    school::String
    shoes::Shoes
end

function printMyActivity(self::Student)
    println("I study at $(self.school) school")
end

struct Employee
    s::Person
    monthlyIncomes::Float64
    company::String
    shoes::Shoes
end

function printMyActivity(self::Employee)
    println("I work at $(self.company) company")
end

gymShoes = Shoes("gym","white")
proShoes = Shoes("classical","brown")

Marc = Student(Person("Marc",15),"Divine School",gymShoes)
MrBrown = Employee(Person("Brown",45),1200.0,"ABC Corporation Inc.", proShoes)

printMyActivity(Marc)
printMyActivity(MrBrown)

Observations:

  • Functions are not associated with a type. You do not call a function as a method of an object (myobj.func(x,y)); instead you pass the object as a parameter (func(myobj, x, y)).
  • Julia does not allow inheriting from concrete types; reuse is achieved through composition instead (a field of the new type holds an instance of the existing type, giving access to its fields).

Some useful functions:

  • supertype(MyType): Returns the direct parent type of a type.
  • subtypes(MyType): Lists all children of a type.
  • fieldnames(MyType): Queries all the fields of a structure.
  • isa(obj,MyType): Checks if obj is of type MyType.
  • typeof(obj): Returns the type of obj.

I/O

Opening a file is similar to Python's with statement. With the do-block form, the file is closed automatically at the end of the block:

# Write to file
open("file.txt", "w") do f  # "w" for writing, "r" for read and "a" for append.
    write(f, "test\n")      # \n for newline
end

# Read whole file:
open("file.txt", "r") do f
  filecontent = read(f,String)
  print(filecontent)
end

# Read line by line:
open("file.txt", "r") do f
    for ln in eachline(f)
        println(ln)
    end
end

# Read, keeping track of line numbers:
open("file.txt", "r") do f
    for (i,ln) in enumerate(eachline(f))
        println("$i $ln")
    end
end

Metaprogramming

TODO

Exceptions

Exceptions are similar to Python:

try
    # Some dangerous code...
catch
    # What to do if an error happens, most likely send an error message using:
    error("My detailed message")
end

# Check for specific exception:
function volume(region, year)
    try
        return data["volume",region,year]
    catch e
        if isa(e, KeyError)
            return missing
        end
        rethrow(e)
    end
end

REPL

One can load a Julia file into the REPL to experiment with it:

include("my_file.jl")

DataFrames

Examples:

# Read data from a CSV
using DataFrames, CSV
myData = CSV.read(file, DataFrame, header = 1, copycols = true, types=Dict(:column_name => Int64))

# Read data from the web:
using DataFrames, HTTP, CSV
resp = HTTP.request("GET", "https://data.cityofnewyork.us/api/views/kku6-nxdu/rows.csv?accessType=DOWNLOAD")
df = CSV.read(IOBuffer(String(resp.body)), DataFrame)  # the sink type (DataFrame) is passed as in the previous example

# Read data from spreadsheet:
using DataFrames, OdsIO
df = ods_read("spreadsheet.ods";sheetName="Sheet2",retType="DataFrame",range=((tl_row,tl_col),(br_row,br_col)))

# Empty df:
df = DataFrame(A = Int64[], B = Float64[])

Insights about the data:

  • first(df, 6)
  • show(df, allrows=true, allcols=true)
  • last(df, 6)
  • describe(df)
  • unique(df.fieldName) or [unique(c) for c in eachcol(df)]
  • names(df): Returns array of column names
  • [eltype(col) for col = eachcol(df)]: Returns an array of column types
  • size(df): (r,c); size(df)[1]: (r); size(df)[2]: (c).
  • ENV["LINES"] = 60: Change the default number of lines before the content is truncated (default 30).
  • for c in eachcol(df): Iterates over each column.
  • for r in eachrow(df): Iterates over each row.

To query the data from a DataFrame you can use the Query package. Examples:

using Query

dfOut = @from i in df begin
    @where i.col1 > 1
    @select {aNewColName=i.col1, i.col3}
    @collect DataFrame
end

dfOut = @from i in df begin
    @where i.value != 1 && i.cat1 in ["green","pink"]
    @select i
    @collect DataFrame
end

References

Notes on Julia Performance

Statically typing the program, or facilitating the type inference of the JIT compiler makes the code run faster. Some notes:

  • Avoid global variables and run your performance-critical code within functions rather than in the global scope;
  • Annotate the inner type of a container, so it can be stored in memory contiguously;
  • Annotate the fields of composite types (using parametric types where needed);
  • Loop matrices first by column and then by row.

Notes on profiling:

  • To time a part of the code type @time myFunc(args) (be sure you ran that function at least once, or you will measure compile time rather than run-time).
  • @benchmark myFunc(args) (from package BenchmarkTools) also works.
  • Profile a function: Profile.@profile myFunc() (best after the function has already been run once, so that JIT compilation is not profiled).
  • Print the profiling results: Profile.print() (number of samples in corresponding line and all downstream code; file name:line number; function name;)
  • Explore a chart of the call graph with profiled data: ProfileView.view() (from package ProfileView).
  • Clear profile data: Profile.clear().

Julia Plotting

The Plots package provides a unified API to several supported backends. Install the package "Plots" and at least one backend, like PlotlyJS or PyPlot.jl. Example:

using Plots
plotlyjs()
plot(sin, -2pi, pi, label="sine function")

Notes on Python

Python is a high-level, dynamically and strongly typed, garbage-collected, general-purpose programming language. It supports multiple programming paradigms, including structured (particularly procedural), object-oriented programming, functional programming and aspect-oriented programming (including metaprogramming and metaobjects). It is often described as a "batteries included" language due to its comprehensive standard library.

Python uses dynamic typing and a combination of reference counting and a cycle-detecting garbage collector for memory management. It uses dynamic name resolution (late binding), which binds method and variable names during program execution.

Its design offers some support for functional programming in the Lisp tradition. It has filter, map and reduce functions; list comprehensions, dictionaries, sets, and generator expressions. The standard library has two modules (itertools and functools) that implement functional tools borrowed from Haskell and Standard ML.
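As a small illustration of the functional tools listed above, the sketch below applies map, filter and reduce to a list and shows the equivalent comprehension and generator expression. The sample data and variable names are made up for the example.

```python
from functools import reduce
from itertools import accumulate, takewhile

numbers = [1, 2, 3, 4, 5, 6]  # illustrative data

# map / filter return lazy iterators, so we materialize them with list()
squares = list(map(lambda n: n * n, numbers))
evens = list(filter(lambda n: n % 2 == 0, numbers))

# reduce lives in functools; it folds the list with a binary function
total = reduce(lambda acc, n: acc + n, numbers, 0)

# The same ideas written as a list comprehension and a generator expression
squares_of_evens = [n * n for n in numbers if n % 2 == 0]
sum_of_squares = sum(n * n for n in numbers)

# Two helpers from itertools
running_totals = list(accumulate(numbers))
small_prefix = list(takewhile(lambda n: n < 4, numbers))

print(squares, evens, total)
print(squares_of_evens, sum_of_squares)
print(running_totals, small_prefix)
```

In everyday Python, comprehensions and generator expressions are usually preferred over map and filter with lambdas, but both styles are available.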

Its core philosophy is summarized in the Zen of Python (PEP 20), which includes aphorisms such as:

  • Beautiful is better than ugly.
  • Explicit is better than implicit.
  • Simple is better than complex.
  • Complex is better than complicated.
  • Readability counts.

Implementations

CPython is the reference implementation of Python. It is written in C, meeting the C11 standard (beginning with Python 3.11). It compiles Python programs into an intermediate bytecode which is then executed by its virtual machine. CPython is distributed with a large standard library written in a mixture of C and native Python, and is available for many platforms.

Other implementations are:

  • PyPy, a fast, compliant interpreter of Python 2.7 and 3.8. Its just-in-time compiler often brings a significant speed improvement over CPython but some libraries written in C cannot be used with it.
  • Stackless Python, a significant fork of CPython that implements microthreads; it does not use the call stack in the same way, thus allowing massively concurrent programs. PyPy also has a stackless version.
  • Pyston, a variant of the Python runtime that uses just-in-time compilation to speed up the execution of Python programs.
  • Cinder, a performance-oriented fork of CPython 3.8 that contains a number of optimizations, including bytecode inline caching, eager evaluation of coroutines, a method-at-a-time JIT, and an experimental bytecode compiler.
  • Codon, which compiles a subset of statically typed Python to machine code (via LLVM) and supports native multithreading.
  • Cython, which compiles (a superset of) Python to C. The resulting code is also usable with Python via direct C-level API calls into the Python interpreter.
  • PyJL, which compiles/transpiles a subset of Python to "human-readable, maintainable, and high-performance Julia source code".
  • Nuitka, which compiles Python into C.
  • Numba, which uses LLVM to compile a subset of Python to machine code.
  • Pythran, which compiles a subset of Python 3 to C++11.

Python Deep Dive

TODO

Python Performance

TODO

Python Tips

Table of Contents

General Project Guidance

Project Layout

Avoid storing unit tests outside the package directory. These tests should be included in a subpackage of your software so that they aren’t automatically installed as a tests top-level module by setuptools (or some other packaging library) by accident. By placing them in a subpackage, you ensure they can be installed and eventually used by other packages so users can build their own unit tests.

Note that relying on setup.py is strongly discouraged, as it introduces arbitrary code into the build process. Also, executing setup.py directly is deprecated.

Some optional folders can also appear:

  • etc for sample configuration files
  • tools for shell scripts or related tools
  • bin for binary scripts you’ve written that will be installed by setup.py

Organize the code based on features, not on file types. Don't create functions.py or exceptions.py files, but rather api.py or time_travel.py files.

Don't create a module folder which only contains an __init__.py file. If you create a module folder, it should contain several files that belong to its category.

Be careful about the code that you put in the __init__.py file. This file is executed the first time that a module contained in the directory is loaded. Placing the wrong things in your __init__.py can have unwanted side effects. In fact, __init__.py files should be empty most of the time. Don’t try to remove __init__.py files altogether though: Python requires an __init__.py file for the directory to be treated as a regular package. Since Python 3.3 a directory without one can still be imported as a namespace package (PEP 420), but it behaves differently and some tools expect the file to be present.
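To see when __init__.py actually runs, here is a minimal sketch that builds a throwaway package in a temporary directory and imports one of its submodules. The names demo_pkg, time_travel, INIT_RAN and jump are hypothetical and exist only for this example.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def build_demo_package(root: Path) -> None:
    """Create a tiny throwaway package (hypothetical name: demo_pkg)."""
    pkg = root / "demo_pkg"
    pkg.mkdir()
    # The package initializer only sets a flag so we can observe that it ran.
    (pkg / "__init__.py").write_text("INIT_RAN = True\n")
    (pkg / "time_travel.py").write_text("def jump():\n    return 'jumped'\n")


with tempfile.TemporaryDirectory() as tmp:
    build_demo_package(Path(tmp))
    sys.path.insert(0, tmp)
    importlib.invalidate_caches()
    try:
        # Importing the submodule first imports (and executes) the package itself.
        submodule = importlib.import_module("demo_pkg.time_travel")
        package = sys.modules["demo_pkg"]
        init_ran = package.INIT_RAN
        jump_result = submodule.jump()
    finally:
        # Undo the changes made for the demonstration.
        sys.path.remove(tmp)
        sys.modules.pop("demo_pkg.time_travel", None)
        sys.modules.pop("demo_pkg", None)

print(init_ran, jump_result)
```

Because the initializer runs as a side effect of importing any submodule, anything slow or stateful placed there is paid for by every importer.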

Versioning

Two main ways to version Python software:

PEP 440 / PyPA Guidelines

It must obey the following pattern (a schematic notation, not a literal regular expression): N[.N]+[{a|b|c|rc}N][.postN][.devN]

This means versions such as 1.2.0 and 0.4.7 are allowed. Also:

  • Version 1.3.0 is equivalent to 1.3
  • Versions matching N[.N]+ (no suffix) are considered final releases.
  • N[.N]+aN (e.g., 1.2a1) denotes an alpha release, i.e., a version that might be unstable and missing features.
  • N[.N]+bN (e.g., 1.2b1) denotes a beta release, i.e., a version that might be feature complete but still buggy.
  • N[.N]+rcN (e.g., 0.4rc1) denotes a release candidate, i.e., a version that might be released as the final product unless significant bugs emerge.
  • The suffix .postN (e.g., 1.4.post2) indicates a post release. Post releases are typically used to address minor errors in the publication process, such as mistakes in release notes. You shouldn’t use the .postN suffix when releasing a bug-fix version, instead, increment the minor version number.
  • The suffix .devN (e.g., 2.3.4.dev3) indicates a developmental release. It indicates a prerelease of the version that it qualifies: e.g., 2.3.4.dev3 indicates the third developmental version of the 2.3.4 release, prior to any alpha, beta, candidate, or final release. This suffix is discouraged because it is harder for humans to parse.

More details here.
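The rules above can be sketched with a small classifier. This is a simplified illustration that uses only the standard library, not a full PEP 440 parser: it ignores epochs, local versions and the alternative spellings that the PEP normalizes, and unlike the pattern quoted above it also accepts a single-component release such as "1". The names VERSION_RE, classify and release_key are made up for this example.

```python
import re
from typing import Tuple

# Simplified version of the scheme described above (assumption: no epochs,
# no local versions, no alternative spellings).
VERSION_RE = re.compile(
    r"^(?P<release>\d+(?:\.\d+)*)"                  # N[.N]...
    r"(?:(?P<pre_kind>a|b|c|rc)(?P<pre_num>\d+))?"  # aN, bN, cN or rcN
    r"(?:\.post(?P<post>\d+))?"                     # .postN
    r"(?:\.dev(?P<dev>\d+))?$"                      # .devN
)

PRE_RELEASE_NAMES = {
    "a": "alpha",
    "b": "beta",
    "c": "release candidate",
    "rc": "release candidate",
}


def classify(version: str) -> str:
    """Return a human-readable release kind for a version string."""
    match = VERSION_RE.match(version)
    if match is None:
        return "invalid"
    if match["dev"] is not None:
        return "developmental release"
    if match["post"] is not None:
        return "post release"
    if match["pre_kind"] is not None:
        return PRE_RELEASE_NAMES[match["pre_kind"]]
    return "final release"


def release_key(version: str) -> Tuple[int, ...]:
    """Release segment as a tuple, with trailing zeros dropped (1.3.0 == 1.3)."""
    match = VERSION_RE.match(version)
    if match is None:
        raise ValueError(f"unsupported version: {version!r}")
    parts = [int(p) for p in match["release"].split(".")]
    while len(parts) > 1 and parts[-1] == 0:
        parts.pop()
    return tuple(parts)


for v in ["1.2.0", "1.2a1", "1.2b1", "0.4rc1", "1.4.post2", "2.3.4.dev3"]:
    print(v, "->", classify(v))
```

For real projects, a maintained parser (for example the one shipped in the third-party packaging library) is a safer choice than a hand-written pattern.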

Semantic Versioning

Given a version number MAJOR.MINOR.PATCH (X.Y.Z), increment the:

  • MAJOR version when you make incompatible API changes
  • MINOR version when you add functionality in a backward compatible manner
  • PATCH version when you make backward compatible bug fixes

Software using Semantic Versioning MUST declare a public API. This API could be declared in the code itself or exist strictly in documentation. However it is done, it SHOULD be precise and comprehensive.

A normal version number MUST take the form X.Y.Z where X, Y, and Z are non-negative integers, and MUST NOT contain leading zeroes. X is the major version, Y is the minor version, and Z is the patch version. Each element MUST increase numerically. For instance: 1.9.0 -> 1.10.0 -> 1.11.0.

Once a versioned package has been released, the contents of that version MUST NOT be modified. Any modifications MUST be released as a new version.

Major version zero (0.y.z) is for initial development. Anything MAY change at any time. The public API SHOULD NOT be considered stable.

Version 1.0.0 defines the public API. The way in which the version number is incremented after this release is dependent on this public API and how it changes.

Patch version Z (x.y.Z | x > 0) MUST be incremented if only backward compatible bug fixes are introduced. A bug fix is defined as an internal change that fixes incorrect behavior.

Minor version Y (x.Y.z | x > 0) MUST be incremented if new, backward compatible functionality is introduced to the public API. It MUST be incremented if any public API functionality is marked as deprecated. It MAY be incremented if substantial new functionality or improvements are introduced within the private code. It MAY include patch level changes. Patch version MUST be reset to 0 when minor version is incremented.

Major version X (X.y.z | X > 0) MUST be incremented if any backward incompatible changes are introduced to the public API. It MAY also include minor and patch level changes. Patch and minor versions MUST be reset to 0 when major version is incremented.

A pre-release version MAY be denoted by appending a hyphen and a series of dot separated identifiers immediately following the patch version. Identifiers MUST comprise only ASCII alphanumerics and hyphens [0-9A-Za-z-]. Identifiers MUST NOT be empty. Numeric identifiers MUST NOT include leading zeroes. Pre-release versions have a lower precedence than the associated normal version. A pre-release version indicates that the version is unstable and might not satisfy the intended compatibility requirements as denoted by its associated normal version. Examples: 1.0.0-alpha, 1.0.0-alpha.1, 1.0.0-0.3.7, 1.0.0-x.7.z.92, 1.0.0-x-y-z.--.

Build metadata MAY be denoted by appending a plus sign and a series of dot separated identifiers immediately following the patch or pre-release version. Identifiers MUST comprise only ASCII alphanumerics and hyphens [0-9A-Za-z-]. Identifiers MUST NOT be empty. Build metadata MUST be ignored when determining version precedence. Thus two versions that differ only in the build metadata, have the same precedence. Examples: 1.0.0-alpha+001, 1.0.0+20130313144700, 1.0.0-beta+exp.sha.5114f85, 1.0.0+21AF26D3----117B344092BD.

More details here.
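The precedence and increment rules above can be sketched in a few lines. This is an illustrative, standard-library-only sketch rather than a complete validator: it does not reject leading zeroes or empty identifiers, and the names SEMVER_RE, parse, precedence_key and bump are invented for the example.

```python
import re
from typing import Optional, Tuple

SEMVER_RE = re.compile(
    r"^(\d+)\.(\d+)\.(\d+)"      # MAJOR.MINOR.PATCH
    r"(?:-([0-9A-Za-z.-]+))?"    # optional pre-release identifiers
    r"(?:\+([0-9A-Za-z.-]+))?$"  # optional build metadata
)


def parse(version: str) -> Tuple[int, int, int, Optional[str]]:
    """Split a version into (major, minor, patch, pre_release)."""
    match = SEMVER_RE.match(version)
    if match is None:
        raise ValueError(f"not a semantic version: {version!r}")
    major, minor, patch, pre_release, _build = match.groups()
    return int(major), int(minor), int(patch), pre_release


def precedence_key(version: str):
    """Sort key implementing precedence; build metadata is ignored."""
    major, minor, patch, pre_release = parse(version)
    if pre_release is None:
        # A normal version ranks above any of its pre-releases.
        pre_key = (1,)
    else:
        # Numeric identifiers compare numerically and rank below
        # alphanumeric ones, which compare in ASCII order.
        identifiers = tuple(
            (0, int(ident), "") if ident.isdigit() else (1, 0, ident)
            for ident in pre_release.split(".")
        )
        pre_key = (0, identifiers)
    return (major, minor, patch, pre_key)


def bump(version: str, part: str) -> str:
    """Increment one part and reset the lower ones to 0."""
    major, minor, patch, _ = parse(version)
    if part == "major":
        return f"{major + 1}.0.0"
    if part == "minor":
        return f"{major}.{minor + 1}.0"
    if part == "patch":
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown part: {part!r}")


versions = ["1.0.0", "1.0.0-rc.1", "1.0.0-alpha", "1.0.0-beta.11", "1.0.0-beta.2"]
print(sorted(versions, key=precedence_key))
print(bump("1.9.0", "minor"))
```

Encoding each identifier as a tuple lets Python's ordinary tuple comparison do the work, including the rule that a longer list of identifiers wins when all preceding ones are equal.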

Linting and Formatting

Use PEP 8 to ensure good style of your code:

  • Use four spaces per indentation level.
  • Limit all lines to a maximum of 79 characters (this is debatable).
  • Separate top-level function and class definitions with two blank lines.
  • Encode files using ASCII or UTF-8.
  • Use one module import per import statement and per line. Place import statements at the top of the file, after comments and docstrings, grouped first by standard, then by third party, and finally by local library imports.
  • Do not use extraneous whitespaces between parentheses, square brackets, or braces or before commas.
  • Write class names in camel case (e.g., CamelCase), suffix exceptions with Error (if applicable), name functions in lowercase with words and underscores (e.g., my_function) and use a leading underscore for _private attributes or methods.

One should run linters, type checkers and formatters directly from the code editor and in CI/CD pipelines.

Ruff

Ruff is an extremely fast Python linter and formatter, written in Rust. Ruff can be used to replace Black, Flake8 (plus dozens of plugins), isort, pydocstyle, pyupgrade, and more. It can be used on VSCode or on a pipeline.

Usage as a linter:

ruff check .                        # Lint all files in the current directory (and any subdirectories).
ruff check path/to/code/            # Lint all files in `/path/to/code` (and any subdirectories).
ruff check path/to/code/*.py        # Lint all `.py` files in `/path/to/code`.
ruff check path/to/code/to/file.py  # Lint `file.py`.
ruff check @arguments.txt           # Lint using an input file, treating its contents as newline-delimited command-line arguments.

Usage as a formatter:

ruff format .                        # Format all files in the current directory (and any subdirectories).
ruff format path/to/code/            # Format all files in `/path/to/code` (and any subdirectories).
ruff format path/to/code/*.py        # Format all `.py` files in `/path/to/code`.
ruff format path/to/code/to/file.py  # Format `file.py`.
ruff format @arguments.txt           # Format using an input file, treating its contents as newline-delimited command-line arguments.

Usage as a Github Action:

name: Ruff
on: [ push, pull_request ]
jobs:
  ruff:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - uses: chartboost/ruff-action@v1

Configuration

Ruff can be configured through a pyproject.toml, ruff.toml, or .ruff.toml file (see: Configuration, or Settings for a complete list of all configuration options).

If left unspecified, Ruff's default configuration is equivalent to:

[tool.ruff]
# Exclude a variety of commonly ignored directories.
exclude = [
    ".bzr",
    ".direnv",
    ".eggs",
    ".git",
    ".git-rewrite",
    ".hg",
    ".ipynb_checkpoints",
    ".mypy_cache",
    ".nox",
    ".pants.d",
    ".pyenv",
    ".pytest_cache",
    ".pytype",
    ".ruff_cache",
    ".svn",
    ".tox",
    ".venv",
    ".vscode",
    "__pypackages__",
    "_build",
    "buck-out",
    "build",
    "dist",
    "node_modules",
    "site-packages",
    "venv",
]

# Same as Black.
line-length = 88
indent-width = 4

# Assume Python 3.8
target-version = "py38"

[tool.ruff.lint]
# Enable Pyflakes (`F`) and a subset of the pycodestyle (`E`)  codes by default.
select = ["E4", "E7", "E9", "F"]
ignore = []

# Allow fix for all enabled rules (when `--fix`) is provided.
fixable = ["ALL"]
unfixable = []

# Allow unused variables when underscore-prefixed.
dummy-variable-rgx = "^(_+|(_+[a-zA-Z0-9_]*[a-zA-Z0-9]+?))$"

[tool.ruff.format]
# Like Black, use double quotes for strings.
quote-style = "double"

# Like Black, indent with spaces, rather than tabs.
indent-style = "space"

# Like Black, respect magic trailing commas.
skip-magic-trailing-comma = false

# Like Black, automatically detect the appropriate line ending.
line-ending = "auto"

Some configuration options can be provided via the command-line, such as those related to rule enablement and disablement, file discovery, and logging level:

ruff check path/to/code/ --select F401 --select F403 --quiet

See ruff help for more on Ruff's top-level commands, or ruff help check and ruff help format for more on the linting and formatting commands, respectively.

Ruff supports over 700 lint rules, many of which are inspired by popular tools like Flake8, isort, pyupgrade, and others. Regardless of the rule's origin, Ruff re-implements every rule in Rust as a first-party feature.

By default, Ruff enables Flake8's F rules, along with a subset of the E rules, omitting any stylistic rules that overlap with the use of a formatter, like ruff format or Black.

If you're just getting started with Ruff, the default rule set is a great place to start: it catches a wide variety of common errors (like unused imports) with zero configuration.

For a complete enumeration of the supported rules, see Rules.

PyRight

My choice for static type checking in Python. More information here.

Modules, Libraries and Frameworks

Importing

The import keyword is actually a wrapper around a function named __import__.

>>> import itertools
>>> itertools
# <module 'itertools' from '/usr/.../>

is equivalent to

>>> itertools = __import__("itertools")
>>> itertools
# <module 'itertools' from '/usr/.../>

also, it's possible to

>>> it = __import__("itertools")
>>> it
# <module 'itertools' from '/usr/.../>

Modules, once imported, are essentially objects whose attributes are objects.
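As a small sketch of the point above, the standard library's importlib.import_module does the same job as calling __import__ directly, and is the documented way to import a module by name at runtime. The variable name it is only illustrative.

```python
import importlib
import itertools

# importlib.import_module is the documented way to import by name at runtime
it = importlib.import_module("itertools")

# Both names are bound to the same module object
print(it is itertools)                # True

# Module attributes are ordinary objects reachable from either name
print(it.count is itertools.count)    # True
```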

sys Module

The sys module provides access to variables and functions related to Python itself and the operating system it is running on. You can retrieve the modules currently imported through the sys.modules variable, a dictionary that maps each module name to the corresponding module object. Calling sys.modules.keys(), for example, returns the names of all loaded modules.

You can also retrieve the list of modules that are built-in by using the sys.builtin_module_names variable. The built-in modules compiled to your interpreter can vary depending on what compilation options were passed to the Python build system.
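A minimal sketch of inspecting both variables; json is imported only to have a known entry to look up, and the exact contents of both collections depend on the interpreter and on what has been imported so far.

```python
import sys
import json  # imported only so that there is a known entry to look up

# sys.modules maps module names to module objects
print("json" in sys.modules)              # True
print(sys.modules["json"] is json)        # True

# The first few loaded module names (the list varies between runs)
print(sorted(sys.modules)[:5])

# sys.builtin_module_names is a tuple of names compiled into the interpreter
print("sys" in sys.builtin_module_names)  # True
```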

Import Paths

When importing modules, Python relies on a list of paths to know where to look for the module. This list is stored in the sys.path variable. You can change this list, adding or removing paths as necessary, or set the PYTHONPATH environment variable, whose entries are added to sys.path at startup. Adding paths to the sys.path variable can be useful if you want to install Python modules to nonstandard locations, such as a test environment. Note that the list will be iterated over to find the requested module, so the order of the paths in sys.path is important.

The directory containing the script being run (or the current directory in an interactive session) is searched before the Python Standard Library directory. That means that if you name one of your scripts random.py and then use import random, your file will be imported rather than the standard library module.
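The following sketch shows the effect of the search order. It writes a throwaway module into a temporary directory and puts that directory at the front of sys.path; the module name hello_notes and its GREETING attribute are hypothetical, chosen only for this illustration.

```python
import importlib
import pathlib
import sys
import tempfile

# Create a throwaway directory holding a hypothetical module, hello_notes.py
tmp_dir = tempfile.mkdtemp()
pathlib.Path(tmp_dir, "hello_notes.py").write_text("GREETING = 'hi'\n")

# Entries earlier in sys.path are searched first, so insert at the front
sys.path.insert(0, tmp_dir)

# Recommended after creating modules while the interpreter is running
importlib.invalidate_caches()

hello_notes = importlib.import_module("hello_notes")
print(hello_notes.GREETING)   # hi
print(hello_notes.__file__)   # a path inside tmp_dir
```

Appending to sys.path instead of inserting at index 0 would give every existing entry priority over the new directory.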

Useful Standard Libraries

  • atexit allows you to register functions for your program to call when it exits;
  • argparse provides functions for parsing command line arguments;
  • bisect provides bisection algorithms for keeping lists sorted;
  • calendar provides a number of date-related functions;
  • codecs provides functions for encoding and decoding data;
  • collections provides a variety of useful data structures;
  • copy provides functions for copying data;
  • csv provides functions for reading and writing CSV files;
  • datetime provides classes for handling dates and times;
  • fnmatch provides functions for matching Unix-style filename patterns;
  • concurrent (through concurrent.futures) provides asynchronous computation;
  • glob provides functions for matching Unix-style path patterns;
  • io provides functions for handling I/O streams. In Python 3, it also contains StringIO, which allows you to treat strings as files;
  • json provides functions for reading and writing data in JSON format;
  • logging provides access to Python’s own built-in logging functionality;
  • multiprocessing allows you to run multiple subprocesses from your application, while providing an API that makes them look like threads;
  • operator provides functions implementing the basic Python operators, which you can use instead of having to write your own lambda expressions;
  • os provides access to basic OS functions;
  • random provides functions for generating pseudorandom numbers;
  • re provides regular expression functionality;
  • sched provides an event scheduler without using multithreading;
  • select provides access to the select() and poll() functions for creating event loops;
  • shutil provides access to high-level file functions;
  • signal provides functions for handling POSIX signals;
  • tempfile provides functions for creating temporary files and directories;
  • threading provides access to high-level threading functionality;
  • urllib provides functions for handling and parsing URLs;
  • uuid allows you to generate Universally Unique Identifiers (UUIDs).
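As a brief illustration of two modules from the list, the sketch below uses bisect to keep a list sorted and operator.itemgetter in place of a lambda. The sample data is made up for the example.

```python
import bisect
import operator

# bisect: insert into an already sorted list without re-sorting it
scores = [10, 20, 40]
bisect.insort(scores, 30)
print(scores)                            # [10, 20, 30, 40]
print(bisect.bisect_left(scores, 20))    # 1 (index where 20 is found)

# operator.itemgetter(1) replaces `lambda pair: pair[1]`
pairs = [("b", 2), ("a", 3), ("c", 1)]
by_count = sorted(pairs, key=operator.itemgetter(1))
print(by_count)                          # [('c', 1), ('b', 2), ('a', 3)]
```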

Documentation

Your project documentation should always include the following in a README.md file:

  • The problem your project is intended to solve, in one or two sentences.
  • The license your project is distributed under. If your software is open source, you should also include this information in a header in each code file; just because you’ve uploaded your code to the Internet doesn’t mean that people will know what they’re allowed to do with it.
  • A small example of how your code works.
  • Installation instructions.
  • Links to community support, mailing list, IRC, forums, and so on.
  • A link to your bug tracker system.
  • A link to your source code so that developers can download and start delving into it right away.

Also, it's useful to have a CONTRIBUTING.md file that will be displayed when someone submits a pull request. It should provide a checklist for users to follow before they submit the PR, including things like whether your code follows PEP 8 and reminders to run the unit tests.

Some documentation software:

  • Sphinx reads Markdown (through MyST) or reStructuredText and produces HTML or PDF documentation.
  • mdBook reads Markdown and produces HTML or PDF documentation.

Documenting API Changes

Whenever you make changes to an API, the first and most important thing to do is to heavily document them so that a consumer of your code can get a quick overview of what’s changing. Your document should cover:

  • New elements of the new interface
  • Elements of the old interface that are deprecated
  • Instructions on how to migrate to the new interface

Make sure that you don’t remove the old interface right away. I recommend keeping the old interface until it becomes too much trouble to do so. If you have marked it as deprecated, users will know not to use it. Example:

class Car(object):
    def turn_left(self):
        """Turn the car left.

        .. deprecated:: 1.1
            Use :func:`turn` instead with the direction argument set to left
        """
        self.turn(direction='left')

    def turn(self, direction):
        """Turn the car in some direction.

        :param direction: The direction to turn to.
        :type direction: str
        """
        pass

Python also provides the warnings module, which allows your code to issue various kinds of warnings when a deprecated function is called. These warnings, DeprecationWarning and PendingDeprecationWarning, can be used to tell the developer that a function they’re calling is deprecated or going to be deprecated, respectively. Example:

import warnings

class Car(object):
    def turn_left(self):
        """Turn the car left.

        .. deprecated:: 1.1
            Use :func:`turn` instead with the direction argument set to left
        """
        warnings.warn("turn_left is deprecated; use turn instead", DeprecationWarning)
        self.turn(direction='left')

    def turn(self, direction):
        """Turn the car in some direction.

        :param direction: The direction to turn to.
        :type direction: str
        """
        pass

By default, Python 2.7 and later versions ignore DeprecationWarning and PendingDeprecationWarning, so these warnings are not printed (since Python 3.7, a DeprecationWarning triggered directly by code in __main__ is shown again). The option -W all will print all warnings to stderr, which can be a good way to catch warnings and fix them early on when running a test suite. Debtcollector can automate some of this.
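A test can also check for the warning from inside Python instead of relying on the -W option. The sketch below uses warnings.catch_warnings from the standard library; the standalone turn_left function is a hypothetical stand-in for the Car.turn_left method above.

```python
import warnings


def turn_left():
    """Hypothetical stand-in for the deprecated Car.turn_left method."""
    # stacklevel=2 attributes the warning to the caller, not to this function
    warnings.warn(
        "turn_left is deprecated; use turn instead",
        DeprecationWarning,
        stacklevel=2,
    )


# record=True collects warnings in a list instead of printing them
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")  # do not ignore any warning in this block
    turn_left()

print(len(caught))                   # 1
print(caught[0].category.__name__)   # DeprecationWarning
print(caught[0].message)             # turn_left is deprecated; use turn instead
```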

Diátaxis

TODO

Generating Documentation from Docstrings

TODO

Doctesting

TODO

Release Engineering

PyPA recommends Setuptools to package Python software. For what it's worth, I recommend Poetry.

Poetry

TODO

Tox

TODO

The Abstract Syntax Tree

TODO

Bibliography

  • Julien Danjou. Serious Python. No Starch Press, 2019.

Python Libraries

Python's large standard library provides tools suited to many tasks and is commonly cited as one of its greatest strengths. For Internet-facing applications, many standard formats and protocols such as MIME and HTTP are supported. It includes modules for creating graphical user interfaces, connecting to relational databases, generating pseudorandom numbers, arithmetic with arbitrary-precision decimals, manipulating regular expressions, and unit testing.

Some parts of the standard library are covered by specifications: for example, the Web Server Gateway Interface (WSGI) implementation wsgiref follows PEP 333. Most are instead specified by their code, internal documentation, and test suites. However, because most of the standard library is cross-platform Python code, only a few modules need altering or rewriting for variant implementations.

Notes on Numpy

Basics

Basic functions:

import numpy as np

# Creating arrays
np.array([1000, 2300, 4987, 1500])    # Create array from list: array([1000, 2300, 4987, 1500])
np.arange(10)    # array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
np.arange(10, 25, 5)    # array([10, 15, 20])
np.loadtxt(fname='file.txt', dtype=int)    # Creates an array of ints with the contents of file.txt
np.zeros((m, n))    # Create an array of zeros with shape (m, n)
np.ones((m, n),dtype=np.int16)    # Create an array of ones with shape (m, n) and type dtype
np.linspace(0,2,9)    # Create an array of evenly spaced values: array([0., 0.25, 0.5 , 0.75, 1., 1.25, 1.5, 1.75, 2.])
np.full((m, n), a)    # Create an array with shape (m, n) whose elements are always a
np.eye(m)    # Create an identity matrix with dimension m
np.random.random((m, n))    # Create a random matrix with dimension (m, n). Random numbers between 0 and 1
np.empty((m, n))    # Create an empty array with dimension (m, n). The data is garbage.

# Attributes
data_array.dtype    # Returns the type of the elements of data_array
data_array.shape    # Returns the dimensions of the array
data_array.ndim    # Returns the number of dimensions of the array
data_array.size    # Returns the number of elements of the array

Operations with floats and integers broadcast to the whole array:

a = np.ones(4, dtype=int)
a/2
# >>> array([0.5, 0.5, 0.5, 0.5])

Selecting data:

a = np.random.random((2, 2))
a[1, 1]    # Returns the element on the second row and second column
a[-1]    # Returns the last row

b = np.random.random((10))
b[1:4]    # Returns the elements with index 1, 2 and 3
# The syntax is array[min:max:step]. min is 0 by default, max is not included.
b[:4:2]    # Returns the elements with index 0 and 2

c = np.random.random((10, 10))
c[:, 1:3]    # Returns columns 1 and 2 of all the rows
c[:4, ::2]    # Returns the first 4 rows (indexes 0, 1, 2 and 3) with columns in steps of 2 (indexes 0, 2, 4, 6, 8)

Verifying conditions:

a = np.random.random((4))
a > 1    # Returns a boolean array: array([False, False, False, False])
a[a > 0.5]    # Returns an array containing all numbers that obey the condition

Some methods:

a.T    # Returns the transpose of the array
a.tolist()    # Converts the array to a list
a.reshape((m, n), order='C')    # Reorganizes the array into a new array with dimensions (m, n)

# Example:
a = np.arange(10)
a.reshape((5, 2), order='C')    # >>> array([[0, 1], [2, 3], [4, 5], [6, 7], [8, 9]])
a.reshape((5, 2), order='F')    # >>> array([[0, 5], [1, 6], [2, 7], [3, 8], [4, 9]])

# array + array is element-wise sum of arrays
# list + list is concatenation of lists

np.column_stack((a, b, c))    # Take a tuple of 1D arrays and stack them as columns to make a single 2D array.
np.sum(a[:,2])    # Returns the sum of all the elements of the slice

Some basic statistics:

np.mean(a)    # Returns the mean of all the elements of the array
np.mean(a[:, 2])    # Returns the mean of all the elements of the slice
np.mean(a, axis=0)    # Returns the mean along axis 0 (for a 2D array, one mean per column)

np.std(a[:,2])    # Returns the standard deviation of the slice
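A small worked example of the axis argument, using a made-up 2x3 array so the results can be checked by hand. It assumes NumPy is installed.

```python
import numpy as np

a = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])

print(np.mean(a))            # 3.5, the mean of all six elements
print(np.mean(a, axis=0))    # [2.5 3.5 4.5], one mean per column
print(np.mean(a, axis=1))    # [2. 5.], one mean per row
print(np.std(a[:, 2]))       # 1.5, population standard deviation of [3, 6]
```

Reducing along axis 0 collapses the rows, so the result has one entry per column; axis 1 does the opposite.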

Notes on Rust

Table of Contents

Basics

Rust's philosophy:

  • Strictly enforcing safe borrowing of data
  • Functions, methods and closures to operate on data
  • Tuples, structs and enums to aggregate data
  • Pattern matching to select and destructure data
  • Traits to define behavior on data

// Hello world
fn main() {
    println!("Hello!");
}

Macros to write strings:

  • format!: write formatted text to String
  • print!: same as format! but the text is printed to the console (io::stdout).
  • println!: same as print! but a newline is appended.
  • eprint!: same as format! but the text is printed to the standard error (io::stderr).
  • eprintln!: same as eprint! but a newline is appended.

Ways to print to stdout:

fn main() {
    println!("Today is {}", 20);

    // By variable
    let name = "world";
    println!("Hello {name}!");

    // By positional argument
    println!("{0}, this is {1}. {1}, this is {0}", "Alice", "Bob");

    // By naming the arguments
    println!("{subject} {verb} {object}", object="the lazy dog",subject="the quick brown fox", verb="jumps over");

    // By right-aligning a text with a specified width
    println!("{number:>width$}", number=1, width=6);

    // By padding numbers with extra zeroes
    println!("{number:>0width$}", number=1, width=6);

    // By rounding floats
    let pi = 3.141592;
    println!("Pi is roughly {:.3}", pi);
}

Tests:

fn main() {
    // Checks if x is equal to y. If not, panic.
    assert_eq!(x, y);

    // Checks if x is not equal to y. If they are equal, panic.
    assert_ne!(x, y);

    // Checks if expression is true. If not, panic.
    assert!(expression);
}

Comments:

// Regular comment.

/*
    Multi-line
    comment
*/

/// Comment that will become documentation with cargo doc --open
/// It accepts markdown
/// And can contain tests (this one assumes a library crate named my_lib):
///
/// ```
/// use my_lib::*;
///
/// assert_eq!(3, soma(1, 2));
/// ```
pub fn soma(x: i32, y: i32) -> i32 {
    x + y
}

The build system and dependency manager is called cargo:

# Creates a new project
cargo new [name]

# Create a new project without git
cargo new --vcs=none [name]

# Compiles
cargo build

# Compiles with optimizations
cargo build --release

# Checks that the code compiles, without producing a binary
cargo check

# Compiles and executes
cargo run

# Updates the dependencies. Doesn't bump major version
cargo update

# Creates documentation
cargo doc --open

# Formatting, automatic fixes and linting
cargo fmt
cargo fix
cargo clippy

You can make Clippy pedantic by adding the following at the top of the main.rs file:

#![allow(unused)]
#![warn(clippy::all, clippy::pedantic)]
fn main() {
}

Declaration of variables:

fn main() {
    // Immutable variable
    let a = 1;

    // Mutable variable
    let mut b = 2;

    // Constant with type annotation
    const CTE: u32 = 100;

    // It is possible to do shadowing. The final value of x is 6
    let x = 5;
    let x = x + 1;
}

For integers, we have the signed types i8, i16, i32, i64, i128 and isize, and the unsigned types u8, u16, u32, u64, u128 and usize. For floats, f32 and f64. i32 and f64 are the default types. The usual operators are present: + - * / %.

The underscore _ means to throw away something:

#![allow(unused)]
fn main() {
// this does nothing because 42 is a constant
let _ = 42;

// this calls `get_thing` but throws away its result
let _ = get_thing();

// Starting with an underscore means the compiler won't warn about them being unused:
let _x = 42;
}

Tuples:

fn main() {
    // Doesn't grow in size
    let tup: (i32, f64, u8) = (500, 6.4, 1);

    // Destructuring
    let (x, y, z) = tup;
    println!("The value of y is: {}", y);

    // Destructuring only a part of it
    let (_, _, one) = tup;

    // By index
    let five_hundred = tup.0;
}

Arrays:

fn main() {
    // Doesn't grow in size. All elements of the same type.
    let a = [1, 2, 3, 4, 5];
    let first = a[0];

    // Creates an array of 5 elements, each of which is 3
    let a = [3; 5];
}

Structs:

#![allow(unused)]
fn main() {
// Declaration
struct Vec2 {
    x: f64,
    y: f64,
}

// Initialization
// The order does not matter, only the names do
let v1 = Vec2 { x: 1.0, y: 3.0 };
let v2 = Vec2 { y: 2.0, x: 4.0 };

// Initializing the rest of the fields from another struct
let v3 = Vec2 { x: 14.0, ..v2 };
let v4 = Vec2 { ..v3 };

// Destructuring
// `x` is now 1.0, `y` is now `3.0`
let Vec2 { x, y } = v1;

// This throws away `v2.y`
let Vec2 { x, .. } = v2;

// A tuple struct
struct Point2D(u32, u32);

// A unit struct
struct Unit;
}

Tuple structs are similar to classic structs, but their fields have no names. For accessing individual variables, the same syntax is used as with regular tuples, namely, foo.0, foo.1, and so on, starting at zero. Unit structs are most commonly used as markers. They're useful when you need to implement a trait on something but don't need to store any data inside it.

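fn main() {
    // A classic struct, declared here so the example is self-contained.
    // The field types are assumed from the values used below.
    struct Person {
        name: String,
        likes_oranges: bool,
        age: u8,
    }

    // Instantiate a classic struct, with named fields. Order does not matter.
    let person = Person {
        name: String::from("Adam"),
        likes_oranges: true,
        age: 25,
    };

    // Instantiate a tuple struct (Point2D, declared above) by passing the values in the same order as defined.
    let origin = Point2D(0, 0);

    // Instantiate a unit struct (Unit, declared above).
    let unit = Unit;
}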

Functions:

#![allow(unused)]
fn main() {
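// Pattern: fn name(arg1: Type1, arg2: Type2) -> ReturnType { body }
fn add(x: i32, y: i32) -> i32 {
    // The last expression, without a semicolon, is the return value
    x + y
}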
}

You can declare methods on your own types:

#![allow(unused)]
fn main() {
struct Number {
    odd: bool,
    value: i32,
}

impl Number {
    fn is_strictly_positive(self) -> bool {
        self.value > 0
    }
}
}

Traits are something multiple types can have in common. You can implement:

  • One of your traits on anyone's type
  • Anyone's trait on one of your types
  • But not a foreign trait on a foreign type

#![allow(unused)]
fn main() {
struct Number {
    odd: bool,
    value: i32,
}

trait Signed {
    fn is_strictly_negative(self) -> bool;
}

// Our trait on our type
impl Signed for Number {
    fn is_strictly_negative(self) -> bool {
        self.value < 0
    }
}

// Our trait on a foreign type
impl Signed for i32 {
    fn is_strictly_negative(self) -> bool {
        self < 0
    }
}

// A foreign trait on our type:
// The `Neg` trait is used to overload `-`, the unary minus operator.
// An impl block is always for a type, so, inside that block, Self means that type.
impl std::ops::Neg for Number {
    type Output = Self;

    fn neg(self) -> Self {
        Self {
            value: -self.value,
            odd: self.odd,
        }
    }
}
}

Some traits are markers - they don't say that a type implements some methods, they say that certain things can be done with a type:

fn main() {
    // i32 implements trait Copy (in short, i32 is Copy):
    let a: i32 = 15;
    let b = a; // `a` is copied
    let c = a; // `a` is copied again
    print_i32(a); // `a` is copied again

    // The Number struct is not Copy, so this doesn't work:
    let n = Number { odd: true, value: 51 };
    let m = n; // `n` is moved into `m`
    let o = n; // error: use of moved value: `n`

    // It works if print_number takes an immutable reference instead:
    print_number(&n); // `n` is borrowed for the time of the call
    print_number(&n); // `n` is borrowed again

    // It also works if a function takes a mutable reference - but only if our variable binding is also mut:
    let mut m = Number { odd: true, value: 51 };
    print_number(&m);
    invert(&mut m); // `m` is borrowed mutably - everything is explicit
    print_number(&m);
}

fn print_i32(x: i32) {
    println!("x = {}", x);
}

fn print_number(n: &Number) {
    println!("{} number {}", if n.odd { "odd" } else { "even" }, n.value);
}

fn invert(n: &mut Number) {
    n.value = -n.value;
}

Trait methods can also take self by reference or mutable reference:

#![allow(unused)]
fn main() {
impl std::clone::Clone for Number {
    fn clone(&self) -> Self {
        Self { ..*self }
    }
}

// Marker traits like Copy have no methods:
// `Copy` requires that `Clone` is implemented too.
// Number values will no longer be moved, but copied.
impl std::marker::Copy for Number {}

}

When invoking trait methods, the receiver is borrowed implicitly:

fn main() {
    let n = Number { odd: true, value: 51 };
    let mut m = n.clone();
    m.value += 100;

    print_number(&n);
    print_number(&m);
}

Some traits are so common, they can be implemented automatically by using the derive attribute:

#![allow(unused)]
fn main() {
#[derive(Clone, Copy)]
struct Number {
    odd: bool,
    value: i32,
}
}

Functions can be generic:

#![allow(unused)]
fn main() {
fn foobar<T>(arg: T) {
    // do something with `arg`
}

// Multiple type parameters:
fn foobar<L, R>(left: L, right: R) {
    // do something with `left` and `right`
}
}

Type parameters can have constraints:

fn print<T: Display>(value: T) {
    println!("value = {}", value);
}

fn print<T: Debug>(value: T) {
    println!("value = {:?}", value);
}

// Longer syntax:
fn print<T>(value: T)
where
    T: Display,
{
    println!("value = {}", value);
}

// If you want multiple constraints:
use std::fmt::Debug;

fn compare<T>(left: T, right: T)
where
    T: Debug + PartialEq,
{
    println!("{:?} {} {:?}", left, if left == right { "==" } else { "!=" }, right);
}

fn main() {
    compare("tea", "coffee"); // prints: "tea" != "coffee"
}

The type parameters of a generic function can be given explicitly with ::<> (the turbofish syntax):

fn main() {
    use std::any::type_name;

    // Turbofish syntax
    println!("{}", type_name::<i32>()); // prints "i32"
    println!("{}", type_name::<(f64, char)>()); // prints "(f64, char)"
}

Structs can be generic:

struct Pair<T> {
    a: T,
    b: T,
}

fn print_type_name<T>(_val: &T) {
    println!("{}", std::any::type_name::<T>());
}

fn main() {
    let p1 = Pair { a: 3, b: 9 };
    let p2 = Pair { a: true, b: false };
    print_type_name(&p1); // prints "Pair<i32>"
    print_type_name(&p2); // prints "Pair<bool>"

    // Vec is generic
    let mut v1 = Vec::new();
    v1.push(1);
    let mut v2 = Vec::new();
    v2.push(false);
}

Enums are types that can only have some specific values:

fn main() {
    enum WebEvent {
        // An `enum` may either be `unit-like`,
        PageLoad,
        PageUnload,

        // like tuple structs,
        KeyPress(char),
        Paste(String),

        // or c-like structures.
        Click { x: i64, y: i64 },
    }
}

Vectors:

fn main() {
    // Vec is generic
    let mut v1 = Vec::new();
    v1.push(1);
    let mut v2 = Vec::new();
    v2.push(false);

    // Literals:
    let v1 = vec![1, 2, 3];
    let v2 = vec![true, false, true];
}

Hash maps:

The type HashMap<K, V> stores a mapping of keys of type K to values of type V.

fn main() {
    use std::collections::HashMap;

    let mut contacts = HashMap::new();

    contacts.insert("Daniel", "798-1364");
    contacts.insert("Ashley", "645-7689");
    contacts.insert("Katie", "435-8291");
    contacts.insert("Robert", "956-1745");

    // Search for "Daniel" in contacts (`call` is assumed to be defined elsewhere, taking the number as a &str)
    match contacts.get(&"Daniel") {
        Some(&number) => println!("Calling Daniel: {}", call(number)),
        _ => println!("Don't have Daniel's number."),
    }

    // Remove "Ashley" from contacts
    contacts.remove(&"Ashley");

    // `HashMap::iter()` returns an iterator that yields
    // (&'a key, &'a value) pairs in arbitrary order.
    for (contact, &number) in contacts.iter() {
        println!("Calling {}: {}", contact, call(number));
    }
}

Conditional:

#![allow(unused)]
fn main() {
if number % 4 == 0 {
    println!("number is divisible by 4");
} else if number % 3 == 0 {
    println!("number is divisible by 3");
} else if number % 2 == 0 {
    println!("number is divisible by 2");
} else {
    println!("number is not divisible by 4, 3, or 2");
}

let number = if condition {
    5
} else {
    6
};
}

Match:

// Similar to switch in C, but more powerful
match feeling_lucky {
    true => 6,
    false => 4,
}

/*******************************************************/
struct Number {
    odd: bool,
    value: i32,
}

fn main() {
    let one = Number { odd: true, value: 1 };
    let two = Number { odd: false, value: 2 };
    print_number(one);
    print_number(two);
}

fn print_number(n: Number) {
    match n {
        Number { odd: true, value } => println!("Odd number: {}", value),
        Number { odd: false, value } => println!("Even number: {}", value),
    }
}

/*******************************************************/
// A match has to be exhaustive: at least one arm needs to match.
fn print_number(n: Number) {
    match n {
        Number { value: 1, .. } => println!("One"),
        Number { value: 2, .. } => println!("Two"),
        Number { value, .. } => println!("{}", value),
        // If the last arm didn't exist, we would get a compile-time error
    }
}

/*******************************************************/
// _ can be used as a "catch-all" pattern:
fn print_number(n: Number) {
    match n.value {
        1 => println!("One"),
        2 => println!("Two"),
        _ => println!("{}", n.value),
    }
}

Loop:

#![allow(unused)]
fn main() {
// Infinite loop
loop {
    // body
}

// result is 20
let mut counter = 0;
let result = loop {
    counter += 1;
    if counter == 10 {
        break counter * 2;
    }
};

// Counts down: prints 3!, 2!, 1!
let mut number = 3;
while number != 0 {
    println!("{}!", number);
    number -= 1;
}

let a = [10, 20, 30, 40, 50];
for element in a.iter() {
    println!("the value is: {}", element);
}

for number in (1..4).rev() {
    println!("{}!", number);
}
}

A pair of curly braces declares a block, which has its own scope:

// This prints "in", then "out"
fn main() {
    let x = "out";
    {
        // this is a different `x`
        let x = "in";
        println!("{}", x);
    }
    println!("{}", x);
}

Blocks are also expressions, which means they evaluate to a value.

#![allow(unused)]
fn main() {
// This:
let x = 42;

// Is equivalent to this:
let x = { 42 };
}

Inside a block, there can be multiple statements:

#![allow(unused)]
fn main() {
let x = {
    let y = 1; // First statement
    let z = 2; // Second statement
    y + z // This is the tail - what the whole block will evaluate to
};
}

That's why "omitting the semicolon at the end of a function" is the same as returning.

Importing and Namespaces

use directives can be used to "bring in scope" names from other namespaces:

#![allow(unused)]
fn main() {
// std is a crate (~ a library), cmp is a module (~ a source file), and
// min is a function:

use std::cmp::min;
let least = min(7, 1); // This is 1
}

Within use directives, curly brackets have another meaning: they're "globs". If we want to import both min and max, we can do any of these:

#![allow(unused)]
fn main() {
// this works:
use std::cmp::min;
use std::cmp::max;

// this also works:
use std::cmp::{min, max};

// this also works!
use std::{cmp::min, cmp::max};
}

A wildcard ( * ) lets you import every symbol from a namespace:

#![allow(unused)]
fn main() {
// This brings `min` and `max` in scope, and many other things
use std::cmp::*;
}

Panic, Options and Result

#![allow(unused)]
fn main() {
// panic! is a macro that stops execution immediately, printing an error message and the file name / line number of the error
panic!("Error message");

// Option is a type that contains something, or nothing. If .unwrap() is called on it, and it contains nothing, it panics
enum Option<T> {
    None,
    Some(T),
}

let o1: Option<i32> = Some(128);
o1.unwrap(); // this is fine

let o2: Option<i32> = None;
o2.unwrap(); // this panics!

// Result is an enum that can either contain something, or an error. It also panics when unwrapped and containing an error
enum Result<T, E> {
    Ok(T),
    Err(E),
}

// get returns an Option with the value if the index is within bounds
let fruits = vec!["banana", "apple", "coconut", "orange", "strawberry"];
for index in 0..10 {
    match fruits.get(index) {
        Some(fruit_name) => println!("It's a delicious {}!", fruit_name),
        None => println!("There is no fruit! :("),
    }
}

let number = Some(7);
// The `if let` construct reads: "if `let` destructures `number` into
// `Some(i)`, evaluate the block (`{}`)."
if let Some(i) = number {
    println!("Matched {:?}!", i);
}
}

Lifetime

Variables have lifetimes:

fn main() {
    // `x` doesn't exist yet
    {
        // `x` starts existing
        let x = 42;

        // `x_ref` starts existing - it borrows `x`
        let x_ref = &x;

        println!("x_ref = {}", x_ref);

        // `x_ref` stops existing
        // `x` stops existing
    }
    // `x` no longer exists
}

The lifetime of a reference cannot exceed the lifetime of the variable binding it borrows:

fn main() {
    let x_ref = {
        let x = 42;
        &x
    };
    println!("x_ref = {}", x_ref);
    // error: `x` does not live long enough
}

Memory

Stack and Heap

Both the stack and the heap are parts of memory available to your code at runtime, but they are structured differently. The stack is last in, first out (LIFO): adding data is called pushing onto the stack, and removing data is called popping off the stack. All data stored on the stack must have a known, fixed size. Data whose size is unknown at compile time, or whose size might change, must be stored on the heap instead.

The heap is less organized. When you put data on the heap, you request a certain amount of space. The memory allocator finds an empty spot in the heap that is big enough, marks it as in use, and returns a pointer, which is the address of that location. This process is called allocating on the heap, often abbreviated to just allocating. Pushing values onto the stack is not considered allocating. Because a pointer has a known, fixed size, you can store the pointer on the stack, but to reach the actual data you must follow the pointer.

When your code calls a function, the values passed into the function (including, potentially, pointers to data on the heap) and the function's local variables are pushed onto the stack. When the function returns, those values are popped off. Put another way, the stack is memory set aside as scratch space for a thread of execution: each call reserves a block on top of the stack for local variables and some bookkeeping data, and when the function returns that block becomes free for the next call. Because blocks are always reserved and released in LIFO order, the most recently reserved block is always the next one to be freed. This makes the stack simple to track: freeing a block is nothing more than adjusting one pointer.

The heap is memory set aside for dynamic allocation. Unlike the stack, no pattern is enforced on the allocation and deallocation of blocks: you can allocate a block at any time and free it at any time. This makes it much harder to track which parts of the heap are allocated or free at any given moment, and many custom heap allocators exist to tune heap performance for different usage patterns.

Each thread gets its own stack, while there is typically only one heap per application (although multiple heaps for different kinds of allocation are not uncommon). The OS allocates the stack for each system-level thread when the thread is created and reclaims it when the thread exits. The language runtime typically asks the OS for the heap at application startup, and the heap is reclaimed when the application (technically, the process) exits. The size of the stack is set when the thread is created. The heap gets its initial size at startup but can grow as space is needed, with the allocator requesting more memory from the operating system.

The stack is faster than the heap for three reasons. First, nothing has to search for a place to store new data: that location is always the top of the stack, so allocating or deallocating is just incrementing or decrementing a pointer. The heap allocator, by contrast, must first find a big enough space and then perform bookkeeping to prepare for the next allocation. Second, each byte of the stack tends to be reused very frequently, so it tends to stay in the processor's cache. Third, the heap is mostly a global resource and typically has to be thread-safe, so each allocation and deallocation usually needs to be synchronized with the other heap accesses in the program.
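As a rough illustration of the difference, the sketch below contrasts a fixed-size array (stored directly in the stack frame) with a Vec, whose elements live on the heap while only a small handle sits on the stack. The function names stack_value and heap_value are made up for this example.

```rust
use std::mem::size_of;

/// Hypothetical helper: returns a value whose size is known at compile time,
/// so it can live entirely on the stack.
fn stack_value() -> [u8; 4] {
    [1, 2, 3, 4]
}

/// Hypothetical helper: returns a heap-allocated buffer. Only the handle
/// (pointer, capacity, length) is stored on the stack.
fn heap_value(len: usize) -> Vec<u8> {
    vec![0; len]
}

fn main() {
    let a = stack_value();
    let v = heap_value(1000);

    // The array's size is part of its type.
    println!("array occupies {} bytes", size_of::<[u8; 4]>());
    // The Vec handle has the same size no matter how many elements it holds.
    println!("Vec handle occupies {} bytes", size_of::<Vec<u8>>());
    println!("Vec holds {} elements on the heap", v.len());
    println!("first array element: {}", a[0]);
}
```

The handle size printed for the Vec depends on the platform's pointer width, but it does not change with the number of elements, which is why the handle itself can be stored on the stack.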

Ownership

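Each value in Rust has a variable that is called its owner. There can be only one owner at a time. When the owner goes out of scope, the value is dropped. Assigning a heap-backed value such as a String to another variable copies only the part stored on the stack (pointer, length and capacity), so the new variable points to the same heap address. To prevent two owners from freeing the same memory, Rust treats the assignment as a move: the first variable is invalidated and can no longer be used.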

#![allow(unused)]
fn main() {
    let s1 = String::from("Hello!");
    // Ownership of the heap buffer moves from s1 to s2;
    // s1 can no longer be used after this line.
    let s2 = s1;
}
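When a real deep copy of the heap data is wanted, clone can be called explicitly. The sketch below is only an illustration; the helper duplicate is a made-up name, not something from the standard library.

```rust
/// Hypothetical helper: returns the original string together with an
/// independent deep copy of it.
fn duplicate(s: String) -> (String, String) {
    // clone allocates a new heap buffer and copies the bytes into it.
    let copy = s.clone();
    (s, copy)
}

fn main() {
    let s1 = String::from("Hello!");
    let s2 = s1; // move: s1 is no longer usable
    // println!("{}", s1); // would not compile: borrow of moved value

    let (s2, s3) = duplicate(s2);
    // Both strings are valid and have equal contents,
    // but each one owns its own heap buffer.
    println!("{} {}", s2, s3);
    println!("same buffer? {}", s2.as_ptr() == s3.as_ptr());
}
```

Because clone does the extra allocation and copy, it is written out explicitly, whereas a plain assignment stays cheap.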

Values that live entirely on the stack and have a fixed size known at compile time can be copied instead of moved: if the type implements the Copy trait, the old variable stays valid after the assignment. As a general rule, any group of simple scalar values can be Copy, and nothing that requires allocation or is some form of resource is Copy. Examples: integers, floats, booleans, characters, and tuples made up only of these types. Passing a variable to a function will move or copy, just as assignment does:

fn main() {
    // s comes into scope
    let s = String::from("hello");

    // s's value moves into the function and so is no
    // longer valid here
    takes_ownership(s);

    // x comes into scope
    let x = 5;

    // x would move into the function, but i32 is Copy,
    // so it’s okay to still use x afterward
    makes_copy(x);
}
// Here, x goes out of scope, then s. But because s's
// value was moved, nothing special happens.

fn takes_ownership(some_string: String) {
    // some_string comes into scope
    println!("{}", some_string);
}
// Here, some_string goes out of scope and `drop` is
// called. The backing memory is freed.

fn makes_copy(some_integer: i32) {
    // some_integer comes into scope
    println!("{}", some_integer);
}
// Here, some_integer goes out of scope. Nothing
// special happens.
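Returning a value from a function transfers ownership as well, so a function can hand a value back to its caller. A minimal sketch, where measure is a hypothetical helper name:

```rust
/// Hypothetical helper: takes ownership of `s`, then gives it back to the
/// caller together with its length in bytes.
fn measure(s: String) -> (String, usize) {
    let len = s.len();
    // Moving `s` into the returned tuple transfers ownership to the caller.
    (s, len)
}

fn main() {
    let s = String::from("hello");

    // s moves into measure and comes back as part of the return value;
    // the new binding shadows the old, moved-from one.
    let (s, len) = measure(s);

    println!("'{}' has length {}", s, len);
}
```

Passing ownership back and forth like this works but is tedious, which is the motivation for borrowing with references.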

The ownership of a variable follows the same pattern every time: assigning a value to another variable moves it. When a variable that includes data on the heap goes out of scope, the value will be cleaned up by drop unless the data has been moved to be owned by another variable. At any given time, you can have either one mutable reference or any number of immutable references. References must always be valid.
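To see when drop actually runs, the sketch below uses a made-up type, Noisy, that records its name in a shared log when it is dropped. The type, the consume function and the log are all assumptions made for this illustration.

```rust
use std::cell::RefCell;
use std::rc::Rc;

/// Hypothetical type that writes its name to a shared log when dropped.
struct Noisy {
    name: &'static str,
    log: Rc<RefCell<Vec<&'static str>>>,
}

impl Drop for Noisy {
    fn drop(&mut self) {
        self.log.borrow_mut().push(self.name);
    }
}

/// Takes ownership of its argument, which is dropped when the function returns.
fn consume(_n: Noisy) {}

/// Runs a small scenario and returns the order in which things happened.
fn drop_order() -> Vec<&'static str> {
    let log = Rc::new(RefCell::new(Vec::new()));
    {
        let a = Noisy { name: "a", log: Rc::clone(&log) };
        let _b = Noisy { name: "b", log: Rc::clone(&log) };
        consume(a); // a is moved, so it is dropped inside consume
        log.borrow_mut().push("end of scope");
    } // _b is dropped here; a was moved, so it is not dropped again
    let order = log.borrow().clone();
    order
}

fn main() {
    // prints ["a", "end of scope", "b"]
    println!("{:?}", drop_order());
}
```

The moved value is cleaned up by its new owner (the function parameter), and the value that was never moved is cleaned up when its own scope ends.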

References