Introduction to Julia

Author

Haden Bunn

1 Getting started

1.1 The Julia REPL

Opening the Pumas IDE automatically starts a Julia session and displays the interactive command-line REPL (read-eval-print-loop) in the terminal panel. This mode has several useful commands.

ctrl + c interrupts computation
? enter help mode
; enter shell mode
] enter package manager mode
ctrl + l clears screen
; after an expression will prevent its value from showing in the REPL (has no effect in scripts)

1.2 Variables

1.2.1 Simple expressions

A value can be assigned to a new variable with the assignment operator, =.

x = 5              # x is bound to an Int64 value on this 64 bit machine
x, y = 5, 10       # possible to assign multiple values on the same line

1.2.2 Compound expressions

Compound expressions are defined using a begin block or the chain constructor (;). This allows a single expression to evaluate multiple subexpressions and assign the final result to a variable.

# begin block
y = begin           # after: x = 10; y = 110
    x = 10
    x + 100
end

# ; chain
b = (a = 5; a + 30) # after: a = 5; b = 35

1.2.3 Naming restrictions

Variable names can use almost any combination of Unicode characters, with the exception of a few reserved keywords and system-defined variables (e.g., if, end, π).

x = 5
❌ = 5              # accessed with \:x:
χ = 5               # accessed with \chi

x + ❌ + χ

1.2.4 Namespace considerations

A list of global variables and their types can be obtained for the current session using the varinfo function.

varinfo()

Julia does not provide a method to remove variables, functions, and other objects from the current session’s namespace without restarting the session (Stop REPL, then Start REPL from the Command Palette). The current recommended alternative is to build a workflow around modules, but that is an advanced topic that can be revisited later. In practical terms, this only becomes relevant in more advanced workflows, but we mention it here so that the reader will have a resource to return to if needed.

1.3 Operators

This section includes a brief overview of common operators.

1.3.1 Mathematical operators

Julia implements the standard mathematical operators.

1 + 1               # Addition
2 - 4.0             # Subtraction
5 * 7               # Multiplication; also 5(7)
20 / 2              # Division
3^2                 # Exponentiation
(1 + 2)^3 * 2 + 1   # PEMDAS applies
3 % 2               # remainder (same as rem(3, 2)); see also mod
sqrt(4)             # square root
exp(log(1))         # base e exponentiation, log-transformation

1.3.2 Comparison operators

Standard comparison operators are available.

1 > 0               # greater than
1 ≥ 1               # greater than or equal to
0 < 1               # less than
0 ≤ 0               # less than or equal to
1 == 1              # equality
1 != 0              # inequality
1 === 1.0           # identity; false, because the types differ

Unlike most other languages, an arbitrary number of comparisons can be chained together.

1 < 2 <= 2 < 3 == 3     # true

1.3.3 Logical operators

The && and || operators are used for logical “and” and “or” operations. They also use short-circuit evaluation (i.e., they don’t necessarily evaluate their second argument). For example:

In a && b, b is only evaluated if a is true.
In a || b, b is only evaluated if a is false.

Short-circuit evaluation provides a compact alternative syntax for writing very short if statements.

<cond> && <statement> instead of if <cond> <statement> end
<cond> || <statement> instead of if !<cond> <statement> end

Warning

Julia provides bitwise versions of these operators (| and &), and while they can be used for logical operations, we advise that beginners stick with || and &&. For more detail, please refer to the documentation sections on bitwise operators and short-circuit evaluation.

Lastly, note that both operators associate to the right, but && has higher precedence than ||.

true && false || true && false      # false
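The short-circuit patterns described in this section can be sketched as follows. The function name `check_dose` and its return strings are hypothetical, chosen only for illustration.

```julia
# Hypothetical helper illustrating short-circuit evaluation as a compact `if`.
function check_dose(dose)
    # `||`: the right-hand side runs only when the left-hand side is false
    dose >= 0 || return "invalid"
    # `&&`: the right-hand side runs only when the left-hand side is true
    dose == 0 && return "placebo"
    return "active"
end

check_dose(-1)   # "invalid"
check_dose(0)    # "placebo"
check_dose(50)   # "active"
```

This style works best for one-line guards; longer branches usually read better as a regular if block.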

2 Types

Julia’s type system is complex, but it is worth developing a basic understanding of that system before moving on to more advanced concepts.

2.1 Basic types

In programming, a literal is any notation for representing a value (e.g., number, string, boolean, character) in source code. In contrast, identifiers refer to a value in memory. Julia’s type system implements the following basic scalar literals, all of which are immutable:

1::Int       # 64 bit integer on 64 bit Julia
1.0::Float64 # 64 bit float, defines NaN, -Inf, Inf
true::Bool   # boolean, allows "true" and "false"
'c'::Char    # character, allows Unicode
"s"::String  # strings, allows Unicode, see also Strings below

Normally, type assertion is not needed for basic literal values. It was included above to highlight each type.

The type assertion operator (::) in the x::Type syntax asserts that the literal value x is of type Type. Type assertions for variables are made using similar syntax (x::Int = 10) and they can be used to catch bugs in your code (more on that later).
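As a small sketch of how a type assertion behaves when it fails, and how a typed variable converts assigned values, consider the following. The function name `total_count` is hypothetical and used only for illustration.

```julia
# `::` on an expression checks the type of its value at run time.
(1 + 2)::Int          # 3; the assertion passes

# A failed assertion throws a TypeError instead of silently continuing.
result = try
    (1 + 2.0)::Int    # 3.0 is a Float64, not an Int
catch err
    err
end
result isa TypeError  # true

# Inside a function, a typed local variable converts assigned values to that type.
function total_count(a, b)   # hypothetical name, for illustration only
    n::Int = a + b           # errors if a + b cannot be converted exactly to an Int
    return n
end

total_count(1, 2.0)   # 3 (an Int, converted from 3.0)
```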

2.2 Abstract types

Julia relies on abstract types to organize its type system into a conceptual hierarchy. The technical details are presented here, but the basics can be understood with a simple example. Consider the hierarchy for numerical types shown in the tree diagram below.

flowchart TD
    Z[Any] --> A[Number]
    A[Number]:::abType --> B[Complex]
    A[Number] --> C[Real]
    C[Real]:::abType --> D[Irrational]
    C[Real] --> E[Rational]
    C[Real] --> F[Integer]
    F[Integer]:::abType --> G[Bool]
    F[Integer] --> H[Signed]
    H[Signed]:::abType --> I[BigInt]
    H[Signed] --> J[Int64\n...]
    F[Integer] --> L[Unsigned]
    L[Unsigned]:::abType --> M[UInt64\n...]
    C[Real] --> N[AbstractFloat]
    N[AbstractFloat]:::abType --> O[BigFloat]
    N[AbstractFloat] --> P[Float64\n...]
    classDef abType fill:#f96

There is a lot of information to unpack in this diagram, but for now, it is sufficient to understand the following concepts:

All data types in Julia are a subtype of Any.
Each shaded cell (e.g., Number, Real) corresponds to an AbstractType object.
Abstract types act as organizational nodes for the types below them.
- Real and Complex are both subtypes of the abstract type Number.
- Number, in turn, is both an abstract type and the supertype of Real and Complex.
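The relationships in the diagram can be checked directly in the REPL. The sketch below also shows one reason abstract types matter in practice: they can be used in method signatures. The function `describe` is a hypothetical example, not part of Julia.

```julia
isabstracttype(Real)      # true; Real only organizes the types below it
isconcretetype(Float64)   # true; values can actually have this type
Float64 <: AbstractFloat  # true
AbstractFloat <: Real     # true
supertype(Real)           # Number
Bool <: Integer           # true, as shown in the diagram

# Abstract types cannot be instantiated, but they are useful in method signatures.
# `describe` is a hypothetical function used only for this illustration.
describe(x::Integer) = "integer"
describe(x::AbstractFloat) = "float"
describe(x::Number) = "some other number"

describe(1)        # "integer"
describe(2.5)      # "float"
describe(1 + 2im)  # "some other number"
describe(true)     # "integer", because Bool <: Integer
```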

2.3 Conversion

There is no automatic type conversion in Julia. The simplest, and preferred way to convert a value x to type T is by writing T(x) using the appropriate constructor.

Int64('a')      # character to integer
Int64(2.0)      # float to integer
Int64("a")      # error no conversion possible
Float64(1)      # integer to float
Bool(1)         # constructs to boolean true
Bool(0)         # constructs to boolean false
Bool(2)         # construction error
Char(89)        # integer to char
string(true)    # cast Bool to string (works with other types, note the lowercase s)

There are edge cases where additional steps are required. For example, some Float values cannot be converted directly to Int using Int64(x).

Int64(1.3) # throws inexact conversion error

Instead, these values must be rounded using one of the following:

floor(Int64, 1.3)
ceil(Int64, 1.3)
round(Int64, 1.3)

Parsing a string to a number is a common task and is accomplished with parse(Type, str).

parse(Int64, "1") # parse "1" string as Int64
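When the input may not be a valid number, parse throws an error. A minimal sketch of a safer pattern uses tryparse from Base, which returns nothing instead of throwing. The helper `parse_or_default` is hypothetical and shown only for illustration.

```julia
parse(Float64, "80.5")        # 80.5
tryparse(Int64, "12")         # 12
tryparse(Int64, "twelve")     # nothing; parse(Int64, "twelve") would throw an ArgumentError

# A common pattern: fall back to a default when parsing fails.
# `parse_or_default` is a hypothetical helper, not part of Julia.
function parse_or_default(str, default)
    value = tryparse(Int64, str)
    return value === nothing ? default : value
end

parse_or_default("42", 0)     # 42
parse_or_default("n/a", 0)    # 0
```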

2.4 Promotion

Many operations (arithmetic, assignment) are defined in a way that performs automatic type promotion (to a common type, assuming one exists) as a workaround for the lack of automatic conversion in Julia. While the user will not usually need to perform this promotion themselves, it can be achieved using promote.

promote(true, BigInt(1) // 3, 1.0) # tuple (see Tuples) of BigFloats, true promoted to 1.0
promote("a", 1)                    # error, promotion to a common type is not possible

2.5 Special types

There are a few noteworthy “special” types to consider.

Union{} # subtype of all types, no object can have this type
Nothing # type indicating nothing (absence of a value), a subtype of Any
nothing # only instance of Nothing
Missing # type indicating missing value (a value exists but is unknown), a subtype of Any
missing # only instance of Missing

2.5.1 Missing values

Missing values are represented by the missing object and they are propagated automatically when passed to standard mathematical functions.

missing + 1     # missing
abs(missing)    # missing

Missing values also propagate through most comparison operators.

missing > 1             # missing
missing == missing      # missing; must use ismissing()

There are three notable exceptions to the propagation rule (===, isequal, isless). The identity operator (===) and isequal function always return a Bool value and can be used to test for missing values; however, ismissing is the preferred method.

missing === missing     # true; === always returns a Bool
isequal(1, missing)     # false; similar to == except for NaN, missing, -0.0 and 0.0

Missing values are considered greater than any other value. This also applies when sorting a collection that contains missing values.

isless(1, missing)      # true
isless(missing, Inf)    # false
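The rules above can be seen together in a short sketch. The functions skipmissing and coalesce are not discussed in this section; they are included here as two Base utilities that are commonly used alongside ismissing.

```julia
v = [3, missing, 1]

ismissing(v[2])               # true
sum(v)                        # missing, because missing propagates
sum(skipmissing(v))           # 4, missing values are skipped
sort(v)                       # missing is sorted last
coalesce(missing, 0)          # 0, the first non-missing argument
```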

2.6 Type verification

There are several ways to verify a value’s type.

typeof("abc")              # String returned which is an AbstractString subtype
isa("abc", AbstractString) # true
isa(1, Float64)            # false, integer is not a float
isa(1.0, Float64)          # true
1.0 isa Number             # an alternative syntax; true, Number is abstract type
supertype(Int64)           # supertype of Int64
subtypes(Real)             # subtypes of abstract type Real
Int <: Real                # true, <: checks if type is subtype of other type

2.7 Composite Types

This is an advanced topic, but composite (“user-defined”) types are very common in Julia, so we have included a basic description here (see the documentation for more detail.)

Composite types are defined using the struct keyword, and are immutable by default. The mutable keyword can be added to the definition if needed. structs are typically given a name and a set of fields to be populated.

mutable struct Patient
    age::Int64
    wt::Float64
    ht::Real
end

p = Patient(25, 80.5, 182)
p.age             # access field
p.age = 6         # change field value
p.ht = "182"      # error, wrong data type
p.sex = "Male"    # error - no such field
fieldnames(Patient) # get names of type fields
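For contrast with the mutable Patient type, the sketch below shows the default (immutable) behavior. The `Dose` type and its fields are hypothetical and used only for illustration.

```julia
# `Dose` is a hypothetical type used only for this illustration.
struct Dose
    amt::Float64
    unit::String
end

dose = Dose(100, "mg")   # 100 is converted to 100.0 by the default constructor
dose.amt                 # 100.0

# Fields of an immutable struct cannot be reassigned.
outcome = try
    dose.amt = 200.0
catch err
    err
end
outcome isa ErrorException   # true

# Instead, build a new value from the old one.
doubled = Dose(2 * dose.amt, dose.unit)
doubled.amt              # 200.0
```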

3 Strings

This section covers the basics of working with strings. Check out the documentation for a complete discussion.

3.1 Construction

The built-in concrete type for strings and string literals in Julia is String, and it supports the full range of Unicode characters via UTF-8 encoding. There is also the SubString type which is used to avoid copying strings during certain operations. Both String and SubString are subtypes of AbstractString. Usually, when writing your own code, it is best to assume that the user will pass an arbitrary AbstractString.

String literals are defined using double or triple quotes. Triple-quoted strings have special properties in addition to allowing unescaped double quotes (") within a string; see the documentation for details.

"Double quotes for simple strings."

"""Triple quotes when "quoting" is needed."""

'Not allowed'   # error, single quotes are used for defining Chars.

Long strings can be broken up by adding a \ before the newline.

"This is a \
long string"

The \ is also used to escape special characters within a string (i.e., convert a special character to a string literal).

println("string \t without \n escapes.")      # string with tab (\t) and newline (\n)
println("string \\t with \\n escapes.")       # special characters escaped with \

You can also create raw"" string literals, which treat most special characters (except double quotes) as literal values.

println(raw"string \t without \n escapes")    # prints \t and \n literally; raw"" does not process escapes

Caution

It is possible to index into a string ("abc"[1]); however, since Julia encodes standard strings using UTF-8, indexing is based on bytes, not characters. So correct string indexing requires you to understand how UTF-8 encoding works. See the documentation for details.
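A brief sketch of what byte-based indexing means in practice, using a string of two-byte characters:

```julia
s = "αβγ"                  # each Greek letter occupies 2 bytes in UTF-8

length(s)                  # 3 characters
ncodeunits(s)              # 6 bytes
collect(eachindex(s))      # [1, 3, 5], the valid (byte) indices
s[1]                       # 'α'
s[3]                       # 'β'; s[2] would throw a StringIndexError
nextind(s, 1)              # 3, the index of the character after s[1]
first(s, 2)                # "αβ", the first two characters regardless of byte width
collect(s)[2]              # 'β', index by character after collecting into a Vector{Char}
```

For plain ASCII strings every character is one byte, so byte and character indices coincide.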

3.2 Concatenation and interpolation

Strings can be concatenated using the string function or the * operator.

y = 2025
m = "01"
d = 23
t = "09:15:00"
string(y, "-", m, "-", d, " ", t)         # entire expression in a single string call
string(y) * "-" * m * "-" * string(d) * " " * t  # using *

Concatenation can be cumbersome when multiple string calls or * operators are needed. Interpolation using the $ operator offers a more readable alternative.

"$y-$m-$d $t"       # example for previous expression
"1 + 2 = $(1 + 2)"  # complex operations can be enclosed in parentheses
"\$1,000"           # to get $ symbol, you must escape it

3.3 Common operations

Several functions are available to search for substrings; most are case-sensitive.

s = "Pharmacometrics"
findfirst("a", s)       # if found returns range of indices, else nothing 
findfirst('a', s)       # returns an integer index where Char is located
findlast("m", s)        # range for last result
findnext("r", s, 4)     # range for next result at index ≥ 4
findprev("a", s, 6)     # range for previous result at ≤ 6
occursin("Pharma", s)   # true
occursin("pharma", s)   # false

In the last example, the lowercase “p” in “pharma” causes the return value to be false. As a workaround the search string can either be normalized to remove casing, or the search pattern can be altered to be case-insensitive.

occursin("pharma", lowercase(s))  # true, see also uppercase, titlecase, and others
occursin(r"pharma"i, s)           # true, uses regex (see below) with a flag, i,
# to make the search case-insensitive

Strings can be repeated.

"TA"^3            # repeat string
repeat("TA", 3)     # same result

Any iterator can be joined into a single string with the join function.

join([1, 2, 3, 4], ", ", " and ")  # see ?join for syntax details

The chop function can be used to remove n characters from the head and/or tail of a string.

s = "Pharmacometrics"
chop(s)     # removes the last character (the final "s") from the string
chop(s; head = 3, tail = 3) # removes the first and last 3 chars from string

There are times when having a string to represent an object is useful. The repr function can be used to create a string from any value using the show function.

zeros(Int64, 2, 2)        # 2x2 matrix
repr(zeros(Int64, 2, 2)) # returns "[0 0; 0 0]"; easy to copy/paste elsewhere

3.4 Regular expressions

Regular expressions (regex) are a powerful tool to search for patterns in a string instead of specific values. While a full overview of regex is beyond the scope of this document, we have included a basic example below, and encourage the reader to review the documentation for more information.

r = r"A|B"           # create new regexp
occursin(r, "CD")    # false, no match found
m = match(r, "ACBD") # find first regexp match, see the documentation for details
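As a slightly larger sketch, the example below uses capture groups to pull pieces out of a match. The pattern and the input strings are made up for illustration.

```julia
pattern = r"(\d+)\s*(mg|g)"            # two capture groups: amount and unit

m = match(pattern, "Dose: 250 mg once daily")
m.match                                 # "250 mg", the full matched substring
m.captures                              # ["250", "mg"], one entry per capture group
parse(Int64, m.captures[1])             # 250

# match returns nothing when the pattern is not found, so check before using it
match(pattern, "no dose recorded") === nothing   # true

replace("Dose: 250 mg", r"\d+" => "500")         # "Dose: 500 mg"
```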

4 Data structures

Each data structure (i.e., “collection”) discussed in this section is also a type.

Tuple isa Type      # true
NamedTuple isa Type # true
Dict isa Type       # true
Array isa Type      # true

Broadly speaking, each collection can be described as (un)ordered, and (im)mutable. Ordered collections support indexing and mutable collections can be modified after being created. General guidance on how to choose an appropriate collection will be provided at the end of this section.

4.1 Tuples and NamedTuples

Tuples are immutable and ordered (indexed from 1). They can hold a mixture of value types.

4.1.1 Construction

Tuple literals are defined with commas and parentheses, or by using the tuple function.

tuple([1, 2, 3])  # a 1-element tuple containing a vector
([1, 2, 3],)      # same tuple, trailing `,` required
()              # empty tuple
('a', false)::Tuple{Char,Bool} # tuple type assertion

Tuples can also be constructed from an iterator using the Tuple type constructor (note the difference between Tuple and tuple).

Tuple([1, 2, 3])  # 3-element tuple after unpacking the array

NamedTuples are constructed using a similar syntax that will allow you to name each tuple element using a Symbol.

NamedTuple()    # constructor creates an empty named tuple
(a = 1,)          # a one element named tuple, trailing `,` required
(x = "a", y = 1)    # a two element named tuple

Names can also be generated programmatically.

(; a, b, c)     # convenience syntax to create a named tuple with a, b, c fields
# from variables

4.1.2 Common operations

Tuple and named tuple elements can be accessed via indexing, but neither can be modified.

x = (1, 2, 3)
x[1]            # 1 (element)
x[1:2]          # (1, 2) (tuple)
x[4]            # bounds error
x[1] = 1        # error

# named tuple
y = (a = 1, b = 2, c = 3)
y[1]            # 1 (element)
y.a             # 1; dot syntax
y[1] = 1        # error - a tuple is not mutable

Elements from both types of tuple can be “unpacked”. If fewer variables than elements are provided, the remaining elements are discarded; providing more variables than elements throws an error. Elements can be skipped by using _ as a placeholder.

a, b = x        # a = 1, b = 2, last element is dropped
a, _, b = x     # a = 1, b = 3, skips second element

The key/value pairs inside a named tuple can be accessed using the keys and values functions, respectively.

keys(y)         # returns the keys in y; (:a, :b, :c)
values(y)       # returns the values in y; (1, 2, 3)

Although named tuples are immutable, the merge function can be used to build a new named tuple with added or replaced fields, but this should be used sparingly. If routine modification is needed, a mutable data type should be used.

x = (; a = 1)
y = (; b = 2, c = 3)
merge(x, y, (; c = 4))  # merge y into x while also modifying c

4.2 Dictionaries

A dictionary is an unordered, mutable collection of key-value pairs.

4.2.1 Construction

Dictionaries are defined using the Dict constructor and a series of key/value pairs. Each key/value pair is built using the pair operator (=>), which is equivalent to calling the Pair constructor. Any data type can be used as a key, but each key must be unique. The paired value need not be of the same type as the key.

# creation
Dict()                        # an empty dictionary
Dict("a" => 1, "b" => 2)          # a filled dictionary
Dict{Float64,Int64}()        # an empty dictionary mapping floats to integers

New entries can be added using the assignment operator.

d = Dict("a" => 1)          # dictionary with single entry
d["b"] = 2                  # add new entry for key "b" with value 2

4.2.2 Common operations

Dictionaries are mutable, but they are indexed by key rather than by position.

d[1]            # error
d["b"] = 3      # value of "b" updated to 3

The keys and values functions also work on dictionaries.

keys(d), values(d)      # returns tuple of iterators for keys and values in d

To check whether a specific key already exists, use haskey or get.

haskey(d, "b")              # check if d contains key "b"
get(d, "c", "default")      # return d["c"] or "default" if not haskey(d, "c")

Dictionary entries can be removed with the delete! function.

delete!(d, "b")             # delete a key from a collection, see also: pop!
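The operations in this section can be combined in a short sketch. It starts from a fresh dictionary, and also uses get! and pop!, which are Base functions closely related to get and delete!.

```julia
d = Dict("a" => 1, "b" => 2)

for (key, value) in d       # iteration yields key => value pairs, in no guaranteed order
    println(key, " maps to ", value)
end

sum(values(d))              # 3
sort(collect(keys(d)))      # ["a", "b"]; collect and sort when a stable order is needed

get!(d, "c", 0)             # 0; like get, but also stores the default under "c"
haskey(d, "c")              # true
pop!(d, "a")                # 1; removes "a" and returns its value
length(d)                   # 2
```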

4.3 Arrays

Arrays are mutable and ordered. They can contain objects of type Any, but most of the time they should contain objects of a more specific type (e.g., Float64 or String).

4.3.1 Construction

There are many ways to construct an array, starting with the Array, Vector and Matrix type constructors below.

Array{T}(undef, dims...)    # uninitialized dense Array
Vector{T}(undef, n)         # one-dimensional dense array of length n
Matrix{T}(I, m, n)          # m by n identity matrix; requires using LinearAlgebra for I

The dims... argument is common among functions that create arrays and it can accept either a single Tuple of dimension sizes, or a series of sizes passed as a variable number of arguments.

Array{Float64}(undef, (2, 2))    # uninitialized 2x2 matrix of type Float64
Array{Float64}(undef, 2, 2)      # as above, equivalent syntax

There are several convenience functions that make it easy to quickly define and populate N-dimensional arrays. Many of these accept the dims... argument and allow for type specification with their first argument (T). In cases where type specification is allowed but omitted, the default is Float64.

zeros(Int8, 2, 3)       # 2x3 matrix of Int8 zeros
trues(3)                # BitVector with all true, see also falses(dims...)
fill("a", 3)            # vector filled with "a"
randn(5, 2, 2)          # 5x2x2 array w/ random standard normally distributed values

Array literals can also be constructed directly using square brackets where [A, B, C, ...] creates a one-dimensional array. If all arguments are of the same type, that becomes the element type (eltype) of the resulting array. Arrays can be typed with T[A, B, C,...], or via promotion if possible. Heterogeneous arrays have eltype Any (e.g., Vector{Any}); this includes the literal [ ] when no arguments are given.

[1]             # array literal (vector) with one element
[1, 2]           # Array{Int64, 1} because all elements are Int64
Float64[1, 2]    # [1.0,2.0]; typed array converts Int values to float
[1.0, 2]         # Array{Float64, 1} via promotion to common type
[1, "a"]         # Array{Any, 1} heterogeneous array, promotion not possible

4.3.2 Initialization

It can be difficult to know in advance what size array will be needed for a specific task. In such cases, it is good practice to initialize an empty, typed, array with 0 elements that can be populated dynamically later on.

x = Array{Float64}(undef, 0)
x = Float64[]   # shorthand for the array above

When possible, avoid initializing an array using [], as this creates an array of type Any, which retains that eltype even if later populated entirely with another type. The impact of this may not be felt immediately, but it can lead to performance issues or errors (e.g., the array is later passed to a function that expects type Float64).
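The difference between the two initialization styles can be seen in a short sketch. The variable names are arbitrary.

```julia
untyped = []                 # Vector{Any}
push!(untyped, 1.0)
push!(untyped, 2.0)
eltype(untyped)              # Any, even though every element is a Float64

typed = Float64[]            # Vector{Float64}
push!(typed, 1)              # the Int 1 is converted to 1.0 on insertion
push!(typed, 2.5)
eltype(typed)                # Float64

# An existing untyped array can be narrowed after the fact.
narrowed = convert(Vector{Float64}, untyped)
eltype(narrowed)             # Float64
```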

4.3.3 Indexing

Indexing into an n-dimensional array (A) uses the following general syntax:

A[i1, i2, ...]

Where each index (i) can be a scalar integer, an array of integers, a colon (:), a range (a:b:c), or an array of booleans.

Ranges

Range objects are iterators that have a variety of uses throughout the Julia language. They are defined using a start:step:stop syntax, or the range function (see ?range for details). To convert a range into an array, wrap it in the collect function.

# ranges
x = range(0, stop = 1, length = 11) # an iterator having 11 equally spaced elements
x = 1:10                # iterable from 1 to 10; implied step = 1
x = 1:2:10              # iterable from 1 to 9; step = 2
collect(x)              # converts an iterator to vector

In the example below, A is a 3x3 Matrix. Details regarding the syntax used to initialize A are covered in the section on concatenation.

# 1 4 7
# 2 5 8
# 3 6 9
A = [1:3 4:6 7:9]     # 3x3 matrix

Indexing with the begin and end keywords provides the first and last elements of A, respectively.

A[begin]    # first element
A[end]      # last element

Scalar indices used with a matrix return a single element via column-wise linear indexing; cartesian indexing is also available.

A[5]    # single element (5) using linear indexing
A[2, 2]  # single element (5) using cartesian indexing

Multiple elements can be retrieved by passing an array of indices. The return value will have the same number of dimensions as the sum of the dimensions of the provided indices.

A[2, 5, 8]    # error; read as a single cartesian index (2, 5, 8), which is out of bounds
A[[2, 5, 8]]  # column vector of 3 elements; must be wrapped in [ ]

A colon (:) can be used to retrieve all elements of a dimension. In this example, A[1,:] will return all elements of the first row of A as a column vector. To keep the original “shape”, the first element must be wrapped in [ ], (e.g., A[[1],:]).

A[1, :]      # column vector with 3 elements, use A[[1],:] for row vector

Range objects separated by a comma can also be used as indices.

A[2:3, 1:2] # 2x2 matrix

Lastly, boolean (logical) indexing can be used to select specific elements that would evaluate true.

A[A.<5]   # vector of 4 elements, using logical indexing; the `.` operator is
# discussed below

4.3.4 Assignment

Array elements can be assigned based upon index, but the new value must have the same dimensions, else broadcasting is required.

Broadcasting

Broadcasting is a method of applying an operator or function element-wise across a collection (e.g., arrays). The broadcast function has a convenient dot (.) syntax that improves readability.

[1, 2] + 10              # error, no method for + and vectors
broadcast(+, 10, [1, 2]) # vector with 2 elements ([11, 12]), element-wise addition
[1, 2] .+ 10             # as above using dot syntax

x = collect(reshape(1:8, 2, 4)) # 2x4 matrix, 2d array
x[:, 2:3] = [1 2]            # error; size mismatch
x[:, 2:3] .= [1 2]           # OK, broadcasting with .
x[:, 2:3] = repeat([1 2], 2) # OK, 2d array
x[:, 2:3] .= 3               # OK, need to use broadcast with .
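The dot syntax applies to function calls as well as operators. A minimal sketch follows; the function `halve` is a hypothetical example.

```julia
v = [1, 4, 9]

sqrt.(v)                     # [1.0, 2.0, 3.0]; sqrt(v) would throw a MethodError
v .^ 2                       # [1, 16, 81]
v .> 3                       # Bool mask: [false, true, true]

# User-defined functions broadcast the same way; `halve` is a hypothetical example.
halve(x) = x / 2
halve.(v)                    # [0.5, 2.0, 4.5]

# A column vector and a row vector broadcast to a matrix.
[1, 2] .+ [10 20]            # 2x2 matrix [11 21; 12 22]

# @. adds a dot to every call and operator in the expression.
@. sqrt(v) + 1               # [2.0, 3.0, 4.0]
```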

4.3.5 Common operations

4.3.5.1 Characterization

These functions provide an in-depth look at the size, shape, and type of an array.

x = zeros(2, 3)      # 2x3 matrix of zeros

eltype(x)           # the type of elements in x
length(x)           # the number of elements in x; 6
ndims(x)            # the number of dimensions of x; 2
size(x)             # tuple containing the dimensions of x; (2, 3)
size(x, 1)          # size of x along dimension `n`; (n=1; 2 elements)
axes(x)             # tuple containing iterator for valid indices in x
axes(x, 1)          # iterator for valid indices along dimension `n` of x

4.3.5.2 Reshaping

There are several functions for adding or removing the elements of an array.

Note

In Julia, a “bang” (!) at the end of a function name indicates that the function “mutates” at least one of its arguments. Here, the ! is included as a naming convention, not an operator, and is used to distinguish between mutating and non-mutating functions (e.g., sort, sort!).

A = [2, 1, 4, 3, 5]
sort(A)     # return a sorted copy of A
sort!(A)  # sort A in-place

A = collect(1:9)
push!(A, 10)        # add 10 to the end of A
a = pop!(A)         # return 10 and remove it from A
splice!(A, 5)        # remove and return the value at index 5, then shift remaining elements to the left
splice!(A, 2, 99)     # remove and return the value at index 2, then replace it with 99
deleteat!(A, 4:6)    # remove the provided indices and return the modified A

Arrays can also be reshaped without removing elements.

A = reshape(1:12, 3, 4)  # a 3x4 matrix-like object filled column-wise with values from 1 to 12
vec(A)                   # cast an array to vector (single dimension); reuses memory
[1 2]'                   # 2x1 Adjoint matrix (reuses memory)
permutedims([1 2])       # 2x1 matrix (permuted dimensions, new memory)

4.3.5.3 Concatenation

The basics of concatenating one- and two-dimensional arrays are discussed below. Information on joining higher-dimension arrays can be found in the documentation.

Separating array arguments with a single semicolon (;) or newline instead of a comma will vertically concatenate them.

[1:2, 4:5]      # vector of ranges with 2 elements
[1:2; 4:5]      # vector of integers with 4 elements
[
    1:2
    4:5
]           # as above

Similarly, separating arguments with a tab, space, or double semicolons (;;) will horizontally concatenate them.

[1:2 4:5]       # 2x2 matrix using spaces
[1:2;; 4:5]     # 2x2 matrix using double ;, space added for readability

Even though spaces, tabs, and ;; all mean concatenation in the second dimension, spaces (or tabs) and ;; cannot appear in the same array expression unless the ;; is simply serving as a line continuation character.

[
    1 2;;      # 1x4 matrix, ;; acts as line continuation only
    3 4
]

Each of these symbols can be combined to concatenate both vertically and horizontally at the same time. When combining these operations, be aware that spaces and tabs have a higher precedence than any number of semicolons.

[[1 2]; 3 4; [5 6]]     # 3x2 matrix

In addition, (;) has higher precedence than (;;), which means that in an expression using both (;) and (;;), vertical concatenation will occur first before horizontally concatenating the result.

[1:2; 3;; 4; 5:6]       # 3x2 matrix

Lastly, there is a set of convenience functions for concatenation (see ?cat for more details).

cat(A...; dims)     # concatenate input arrays along the dimension(s) given by the dims keyword
vcat(A...)          # shorthand for cat(A..., dims=1), equivalent to [A; B; ...]
hcat(A...)          # shorthand for cat(A..., dims=2), equivalent to [A B ...]
hvcat()             # simultaneous vertical and horizontal concatenation, equivalent to [A B; C D]
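The signatures above are schematic. A concrete sketch with two small vectors:

```julia
a = [1, 2]
b = [3, 4]

vcat(a, b)                   # [1, 2, 3, 4], same as [a; b]
hcat(a, b)                   # 2x2 matrix [1 3; 2 4], same as [a b]
cat(a, b; dims = 1)          # same as vcat(a, b)
size(cat(a, b; dims = 3))    # (2, 1, 2); stacks along a new third dimension
hvcat((2, 2), 1, 2, 3, 4)    # [1 2; 3 4]; the tuple gives the number of elements per row

# Splatting (...) passes the elements of a collection as separate arguments.
parts = [[1, 2], [3], [4, 5]]
vcat(parts...)               # [1, 2, 3, 4, 5]
reduce(vcat, parts)          # same result, often preferred for long collections
```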

4.4 Choosing a collection

Consider the following exercise where each data structure and the previously-defined composite type is populated with the same information. The varinfo function is used to compare memory usage.

age = 25
wt = 80.5
height::Real = 182

_tp = (age, wt, height)     # tuple, 24 bytes
_ntp = (; age, wt, height)  # named tuple, 24 bytes
_d = Dict(pairs(_ntp))      # dictionary, 480 bytes
_a = [age, wt, height]      # array, 64 bytes
_p = Patient(age, wt, height) # Patient, 32 bytes

varinfo(Main, r"_")

Dictionaries offer a lot of flexibility, but consume the most memory.
Tuples are the most memory efficient but are also the least flexible. Named tuples offer a few more options without an additional memory cost but require more keystrokes.
Arrays offer a balance between flexibility and memory efficiency, which makes them the workhorse data structure (they can hold almost anything and do it efficiently).
Structs are only slightly less efficient than tuples (note that named tuples are just anonymous structs) but, when declared mutable, can be modified after creation. The trade-off comes from the added complexity of defining your own type.

5 Programming constructs

This section provides an overview of standard programming constructs implemented in Julia.

5.1 Control flow

The basic elements of control flow are conditional (if-else) and repeated (“loops”) evaluation.

5.1.1 Conditional evaluation

Julia code can be evaluated in a branching fashion based upon the value of a boolean expression inside an if-else block.

if false
    x = 3   # false, no assignment
else
    println("$(1+1)") # else statement is used
end

The same expression could be written using the ternary operator (condition ? true-action : false-action) which offers a more terse syntax.

false ? x = 3 : println("$(1+1)")

In practice, the boolean value will be determined by a comparison operator.

if "x" == "y"   # false
    z = 1
elseif 1 > 2    # false
    z = 2
else
    a = 3       # value 3 assigned to a
end

Comparisons can also be combined (&&) or negated (!) using boolean operators:

if true && !false   # true with false negated to true
    z = 1           # value 1 is assigned to z
else
    a = 3
end

5.1.2 Repeated evaluation

There are two basic constructs for repeat evaluation, while loops and for loops. The former is useful when the total number of iterations needed to reach a stopping condition is unknown while the latter is useful for iterating over the elements of a collection. Consider a simple while loop:

i = 1               # a counter
while true          # a condition to evaluate, loop will continue until false or break condition
    global i += 1   # using increment operator (+=) to add 1 to i, global keyword discussed below
    i > 10 && break # command to break out of a loop (stop iteration) immediately    
end

println("The value of i is: $i")

The global keyword in the example is related to the variable scope created by the while construct. When this code is run from a script, an assignment to i inside the loop without the global keyword would create a new local variable instead of updating the i that was defined outside the loop; recent versions of the REPL are more permissive (more on this topic later in the tutorial).

The break keyword ensures that iteration stops once a specific condition is met (i.e., once i > 10). The break statement could have been omitted if the stop condition had been defined at the start of the loop.

i = 1
while i <= 10
    global i += 1
end

println("The value of i is: $i")

In contrast, for loops iterate over all of the elements in a collection and then stop.

for v = 1:10       # v takes each value in the collection, can also use v in 1:10
    if 3 < v < 6
        continue    # skip one iteration
    end
    println(v)
end

varinfo(v)          # error, v is only defined in the inner scope of the loop

The continue keyword after the conditional skips the remainder of the current iteration (i.e., for v equal to 4 or 5, the loop will “skip” the println function and move to the next iteration).

Nested for loops allow for iteration over multiple ranges.

for i = 1:2
    for j = 3:4
        println((i, j))
        i = 0
    end
end

Note that multiple nested for loops can be condensed to a single outer loop.

for i = 1:2, j = 3:4
    println((i, j))
    i = 0
end

However, there are slight differences in their output. In the first example, the first element of the second and fourth tuple is 0 while in the second those elements are 1 and 2, respectively. This is because in the second example, both iterators (i and j) are set to their current values at the beginning of each iteration and any changes to iterators inside the code block do not affect subsequent iterations. In addition, the inclusion of a break statement in the second example would stop the entire loop, not just the inner loop.

println("Loop 1")
for i = 1:2
    for j = 3:4
        j > 3 && break
        println((i, j))  # iterations 1 and 3 will be printed
    end
end

println("Loop 2")
for i = 1:2, j = 3:4
    j > 3 && break
    println((i, j)) # only the first iteration is printed 
end

Lastly, multiple collections can be iterated over at the same time in a single for loop using the zip function.

for (i, j) in zip([1 2 3], [4 5 6 7])
    println((i, j))
end

The zip function creates an iterator that yields tuples, taking one element in order from each of the containers passed to it. The loop will stop once any of the containers runs out, so the value 7 is never reached in the example above.
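
To make the stopping rule concrete, the short sketch below collects the tuples that a zip-based loop would visit. The names letters, numbers, and pairs_seen are illustrative only.

```julia
letters = ["a", "b", "c"]
numbers = [10, 20, 30, 40]          # one element longer than letters

# collect materializes the tuples that a for loop over zip would see
pairs_seen = collect(zip(letters, numbers))

println(pairs_seen)
println(length(pairs_seen))         # 3; the extra element 40 is never reached
```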

5.1.3 Applications

5.1.3.0.1 Iteration

for-loops are commonly used to iterate over a collection. There are two basic syntaxes: the first retrieves each element, the second retrieves each index.

for a in A
    # Do something with the element a
end

for i in eachindex(A)
    # Do something with i and/or A[i]
end

Note

In contrast with for i = 1:length(A), iterating with eachindex provides an efficient way to iterate over any array type.
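
As a minimal sketch of the two syntaxes, the hypothetical functions below use the element form to read values and the index form to update them in place. The loops are wrapped in functions so that no global variables are assigned inside them.

```julia
# element form: read each value, no index needed
function sum_elements(A)
    total = zero(eltype(A))
    for a in A
        total += a
    end
    return total
end

# index form: the index is needed to write back into the array
# (the trailing ! is a naming convention for functions that modify an argument)
function double!(A)
    for i in eachindex(A)
        A[i] = 2 * A[i]
    end
    return A
end

M = [1 2; 3 4]
sum_elements(M)     # 10
double!(M)          # M is now [2 4; 6 8]
```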

5.1.3.1 Comprehensions

Comprehensions also make use of the for keyword and offer a concise syntax for creating arrays.

A = [f(x,y,...) for x=rx, y=ry, ...]

The code inside the [ ] can be interpreted as “evaluate function (f) for each value of x and y in ranges rx and ry”. In practice, rx and ry can be any iterable object, but they are usually ranges (e.g., 1:n). The resulting array’s type will depend on the computed elements just like any other array literal. As before, the type can be controlled by prepending a type declaration to the comprehension.

A = Float64[2x + 1 for x = 1:10]    # vector with 10 elements

Example usage:

a = [x * y for x = 1:2, y = 1:3] # 2x3 array of Int64
sum(a, dims = 2)       # sum along the 2nd dimension (across columns); similarly: mean, std,
# prod, minimum, maximum, any, all;
# using Statistics is required for mean and std
count(>(0), a)       # count number of times a predicate is true, similar: all, any
# note that >(0) creates a function equivalent to x -> x > 0

5.1.3.2 Generator expressions

Comprehensions can be written with ( ) instead of [ ], producing an object known as a generator. Generators can be iterated on demand to produce a value without allocating memory for an array in advance.

g = (1 / n^2 for n = 1:1000)     # simple generator
sum(g)                       # sum of a series without memory allocation, could
# have also summed directly sum(1/n^2 for n=1:1000)

When writing generator expressions with multiple dimensions inside an argument list, parentheses are needed to separate the generator from the subsequent arguments. This is because all comma-separated expressions after the for are interpreted as ranges.

# map(tuple, 1/(i+j) for i=1:2, j=1:2, [1:4;])    # error, invalid iteration specification
map(tuple, (1 / (i + j) for i = 1:2, j = 1:2), [1:4;])  # vector with 4 tuple elements

Ranges in generators and comprehensions can depend on previous ranges by writing multiple for keywords. In such cases, the result is always 1-d.

[(i, j) for i = 1:3 for j = 1:i]    # vector with 6 tuple elements

Generated values can be filtered using the if keyword.

[(i, j) for i = 1:3 for j = 1:i if i + j == 4]    # vector with 2 tuple elements: (2, 2) and (3, 1)

5.2 Functions

5.2.1 Construction

Functions can be defined using the function keyword, or the shorter, inline “assignment” form. A function, f, is defined using both approaches in the example below.

function f(x, y)       # using function keyword
    return x + y
end

f(x, y) = x + y        # using assignment form

In the first example, the return keyword tells the enclosing function (f) to exit “early” (i.e., any lines that come after return would be ignored) and to return the value of the x + y expression. If omitted, as in the second example, the function will return the value of the last expression to be evaluated.
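
A small sketch of both behaviors in one function is given below. The function safe_sqrt is hypothetical and exists only to show an early exit next to an implicit return.

```julia
# hypothetical function: return early for negative input
function safe_sqrt(x)
    if x < 0
        return nothing      # exit immediately; the line after the if block is not reached
    end
    sqrt(x)                 # last expression evaluated, so its value is returned
end

safe_sqrt(-4.0)   # nothing
safe_sqrt(9.0)    # 3.0
```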

Functions can also be defined without providing a “name” which creates an anonymous function.

function (x, y)         # same as before, note that name 'f' is omitted
    return x + y
end

(x, y) -> x + y         # similar terse syntax using the arrow (->) operator
# can omit ( ) for single argument

These anonymous functions are primarily used as arguments for other functions.

map(x -> x + 3, 1:10)   # map elements of range to anonymous function

map(1:10) do x          # same map using do-block syntax
    x + 3
end

In the second example above, the do block creates an anonymous function that is then passed as the first argument to a function call (here, map). This syntax is especially helpful when using more complex anonymous functions that span multiple lines.
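
For instance, a body containing a conditional is awkward to write with the arrow syntax but reads naturally in a do block. The following is only an illustrative sketch; the variable name labels is arbitrary.

```julia
# label each number; the anonymous function spans several lines
labels = map(1:4) do x
    if iseven(x)
        "$x is even"
    else
        "$x is odd"
    end
end

println(labels)
```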

Lastly, functions can be stored in variables just like any other object.

y = f(3, 3)    # f called using parentheses; assigns value of 3+3 to y
g = f          # parentheses omitted to assign function f to g
g(3, 3) == y   # true

5.2.2 Multiple returns

As stated earlier, multiple variables can be assigned at once by including a comma-separated list (optionally wrapped in ( )) on the LHS (left-hand side) of an expression. The RHS must be an iterator at least as long as the number of variables (any extra elements of the iterator are ignored). The process of iterating over the RHS object and assigning each element to a variable is called destructuring.

a, b, c = 1:4   # a=1, b=2, c=3; 4 is dropped

This feature allows a function to return multiple values as a Tuple or other iterable value.

function f(x, y)
    x + y, x * y        # returns both the sum and product of x and y
end

f(2, 3)              # returns a tuple, (5, 6)
a, b = f(2, 3)       # tuple destructured to a=5, b=6

5.2.3 Arguments

5.2.3.1 Passing arguments

Functions accept positional or keyword arguments. The latter are sometimes referred to as kwargs and they are separated from positional arguments by a semicolon (;). In the example below f has two positional arguments (x, y) and one kwarg (z).

f(x, y; z) = x + y * z
f(5, 10; z = 15)              # kwargs names must be specified
# f(5, y=10; z=15)            # error, positional args cannot include name

5.2.3.2 Optional arguments

Function definitions can include optional positional and keyword arguments. However, only the last positional argument(s) can have a default value.

f(x, y = 10; z = 15) = x + y + z    # f now includes two default values (y, z)
f(5)                            # 30

# f(x=10, y; z=15) = x + y + z    # error, only last positional argument(s) allowed
f(x, y = 10; z = 15, a) = x + y + z + a  # valid

5.2.3.3 Typed arguments

As with other objects, the arguments passed to a function can be restricted to a specific type.

function f(x::Int, y::Int)      # f will only accept integers
    x + y
end

However, in most cases, Julia will identify the type of data provided and compile a specialized version of the function that is suited for that type. There are a limited number of scenarios where argument type declarations are needed, and until there is a clear need for them, it’s best to avoid them.

Common reasons for declaring argument types include:

Dispatch: Functions can have multiple methods (see below) which behave differently for a given set of argument types.
Correctness: Some functions will only return a correct result for a certain argument type (common for Int and Float).
Documentation: Type declarations can serve as a form of documentation for expected arguments in a complex function.
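
As a minimal sketch of what a type restriction does in practice, the hypothetical add_ints function below accepts only Int arguments, so a call with a Float64 fails with a MethodError instead of computing a result.

```julia
# hypothetical function restricted to integers
add_ints(x::Int, y::Int) = x + y

add_ints(1, 2)                      # 3

# calling with a Float64 finds no matching method; capture the error to inspect it
err = try
    add_ints(1, 2.0)
catch e
    e
end

err isa MethodError                 # true
```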

5.2.3.4 Pass by reference

In Julia, if a mutable object is passed as a function argument and modified inside the function, those changes will reflect outside the function, even if the modified argument is not explicitly returned.

function f(x, y)
    x[1] = 42   # modifies x outside
    y = 7y      # y bound to a new value, no modification outside
    return y
end

a = [1, 2, 3]
b = 3

f(a, b)          # returns 21, i.e., 7(3)

a               # [42,2,3]; modified
b               # 3; unmodified

In the example above, f assigns a value of 42 to the first element of the column vector a, which was passed to f as its first positional argument, x. The scalar 3, stored in b, is passed as the second positional argument, y. The function f assigns the product 7(3) = 21 to y and returns the value of y to the caller. The value of b is unchanged because the assignment y = 7y only binds the local name y to a new value; it does not (and, since Int values are immutable, cannot) alter the value that b refers to. In contrast, the array a is mutable and passed to f by reference. When the first element of x was assigned a new value, that change was reflected in a even though x was not included in the return statement.

The reason for this behavior is somewhat technical, but it can be understood intuitively if you think of an array as a container with a unique identifier. When Julia assigns the array [1,2,3] to variable a, it is NOT basing that assignment on the contents of the array. Instead, it is assigning the array’s unique identifier to a. This is the reason that changing the first element of x is reflected in a; x and a are still pointing to the same array. To further illustrate this point, if the expression x[1] = 42 was replaced with x = 42, the value of a would have remained unchanged outside of f. The act of assigning a new value to x “breaks” the link between x and a’s identifier and binds x to a different object.
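
The difference between modifying an element and reassigning the argument can be sketched with two hypothetical functions, rebind and mutate!, applied to the same array.

```julia
function rebind(x)
    x = 42          # x now refers to a new value; the caller's array is untouched
    return x
end

function mutate!(x)
    x[1] = 42       # changes the array that both x and the caller's variable refer to
    return x
end

a = [1, 2, 3]
rebind(a)           # returns 42
a                   # still [1, 2, 3]

mutate!(a)
a                   # [42, 2, 3]
```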

If unaccounted for, this behavior can lead to downstream errors, and users should take care to avoid introducing bugs into their code. The following tips should be helpful:

Avoid using function arguments on the LHS of any assignment operator inside a user-defined function unless that function is intended to modify the argument.
Adhere to the convention of appending a ! to the end of mutating functions.
Use copy or deepcopy inside the function body to create a copy of any array or dictionary argument that should not be modified.

Note that “copied” data will be unaffected by modifications made within a user-defined function. However, copy only creates a so-called “shallow copy” up to the first level of a mutable object. In contrast, deepcopy creates a fully distinct copy, but is more computationally expensive than copy. The difference between these two functions is highlighted in the example below. When in doubt, deepcopy will ensure that an argument remains unchanged.

x = Array{Any}(undef, 2)        # new undefined array
x[1] = [1, 2, 3]                  # assign element x1
x[2] = [4, 5, 6]                  # assign element x2
a = x                           # assign value of x to a
b = copy(x)                     # create a shallow copy of x and assign its value to b
c = deepcopy(x)                 # create a deep copy of x and assign its value to c
x[1] = 99                       # update value of x1
x[2][1] = 99                    # update value of x[2][1]
a                               # identical to x
b                               # only x[2][1] changed from the original x
c                               # contents of the original x
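
The naming convention and the copy tip can be combined, as in the sketch below. The functions scale! and scale are hypothetical; copy is sufficient here only because the vector holds plain numbers rather than nested mutable objects.

```julia
# mutating version: the ! signals that the argument v is modified
function scale!(v, k)
    for i in eachindex(v)
        v[i] *= k
    end
    return v
end

# non-mutating version: work on a copy so the caller's data is untouched
scale(v, k) = scale!(copy(v), k)

data = [1, 2, 3]
scaled = scale(data, 10)    # [10, 20, 30]
data                        # still [1, 2, 3]

scale!(data, 10)
data                        # now [10, 20, 30]
```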

5.2.4 Methods

Each function can have multiple methods, that is, different sets of instructions selected according to argument type. The choice of which method to use is called dispatch, and many object-oriented languages only use the first argument when choosing between methods. In contrast, Julia uses all of a function’s positional arguments to determine which method is appropriate. This process is referred to as multiple dispatch, and it is central to how Julia code is organized and specialized for performance.

Consider the example below:

g(x) = println("$x is not an Integer!")
g(x::Int) = 3x
methods(g)               # g has 2 methods

The first method accepts an x of any type (Any) and prints a message. The second method requires x to be an integer and returns the product 3x.

It is important to note that multiple dispatch only applies to positional arguments. Keyword arguments are processed after method selection.

t(; x::Int = 2) = x
t(; x::Bool = true) = x
t()       # true; the first method of t was overwritten, not added to
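
To illustrate that every positional argument takes part in method selection, the sketch below defines a hypothetical function, describe, with three methods that differ in the types of both arguments.

```julia
# hypothetical function with three methods
describe(x, y) = "two values of any type"
describe(x::Int, y::Int) = "two integers"
describe(x::Int, y::String) = "an integer and a string"

describe(1.5, 2)            # "two values of any type"
describe(1, 2)              # "two integers"
describe(1, "a")            # "an integer and a string"
length(methods(describe))   # 3
```

The most specific method that matches all of the argument types is the one that is called; the untyped method acts as a fallback.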

6 Variable scoping

This is an intermediate level topic, but it is worth introducing here since new users often encounter errors related to scope when defining loops and functions. The topic is covered in detail in the documentation.

A variable’s scope is determined by a set of (scoping) rules that help determine whether the code that surrounds it can “see” that variable. As a result, two functions, f(x) = x and g(x) = x, can both have a positional argument, x, without causing a naming conflict. In each case, the scope of x is limited to the function in which it is defined; f does not know that x was also defined in g.

In Julia, there are two main types of scope: global scope and local scope. When a Julia session is started, the default module (i.e., coding “workspace”), called Main, is loaded which, in turn, introduces a new global scope. Certain programming constructs including comprehensions, generators, function, while, for, and do (among others) introduce a new local scope. It is worth noting that if and begin blocks do not introduce a new scope.
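
The last point can be checked with a short sketch. The function scope_demo and its variable names are hypothetical; the @isdefined macro reports whether a name is defined at the point where it is used.

```julia
function scope_demo(flag)
    if flag
        inside_if = "set in if"         # if does not open a new scope
    end
    begin
        inside_begin = "set in begin"   # neither does begin
    end
    for i in 1:1
        inside_for = "set in for"       # for does: this name is local to the loop
    end
    # report which names are visible here, after the three blocks
    return (@isdefined(inside_if), @isdefined(inside_begin), @isdefined(inside_for))
end

scope_demo(true)    # (true, true, false)
```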

To further understand these concepts, consider the following example:

t                   # error, t is undefined
f() = global t = 1  # f assigns t=1 using global keyword  
f()                 # t is defined globally after calling f

function f1(n)      # f1 introduces new local scope
    x = 0           # x, local variable within scope of f1
    for i = 1:n     # for introduces a new, "inner" local scope
        x = i       # local x already exists, so i is assigned to the existing x
    end
    x               # returned x will have same value as n
end
f1(10)              # 10; inside the loop we use the outer local variable

function f2(n)
    x = 0
    for i = 1:n
        local x     # local keyword creates new x inside inner local scope
        x = i
    end
    x
end
f2(10)              # 0; x in outer local scope remains unchanged

function f3(n)
    for i = 1:n
        h = i       # no local keyword, h not visible to outer local scope
    end
    h               # undefined
end
f3(10)              # error; h not defined in outer scope

function f4(n)
    local h         # h defined using local keyword, assignment not required
    for i = 1:n
        h = i       # h already exists in the outer local scope; assigned value i
    end
    h
end
f4(10)              # 10; h is defined in outer scope

Note that for, while, try, and struct use a so-called soft local scope. Simplifying a bit, if you are working interactively (e.g., REPL, notebook) and use them in a top-level (global) scope, assignments inside them will overwrite an existing global variable:

julia> x = 5
5

julia> for i in 1:10
            x = i
        end

julia> x
10

However, the same code run in a non-interactive session prints a warning and does not overwrite the global variable:

~$ julia -e "x=5; for i in 1:10 x = i end; println(x)"
| Warning: Assignment to `x` in soft scope is ambiguous because a global variable
by the same name exists: `x` will be treated as a new local. Disambiguate by using
`local x` to suppress this warning or `global x` to assign to the existing global variable.
| @ none:1
5

Reuse

CC BY-SA 4.0