Julia Basics

Authors

Jose Storopoli

Kevin Bonham

Juan Oneto

You’ve already seen how to assign variables in Julia, which is not so dissimilar from R. Many other things will be familiar as well, and throughout these tutorials, we will try to point out where differences exist or may be confusing.

Let’s start with Numbers!

1 🧮 Numbers and Math

Julia was designed for technical and mathematical computing, and a great deal of effort has been put in to make math in code look like and work like math written on paper.

This means that a lot of simple operations work just like you would expect:

42 * 2
84
1.3e4 / 1000
13.0
5 % 2 # remainder
1
Note

# makes the remainder of the line into a comment, just like R and Python

# order of operations is PEMDAS
(1 + 2)^3 * 2 + 1 # 3^3 * 2 + 1 => 27 * 2 + 1 => 54 + 1
55
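A few more arithmetic operators are worth trying alongside the ones above. This is only a small sketch, and the variable names are made up for illustration:

```julia
# `/` always returns a floating-point result, even for two integers
quotient = 7 / 2            # 3.5

# `÷` (typed \div<TAB>) or div() gives the integer quotient instead
int_quotient = 7 ÷ 2        # 3
same_quotient = div(7, 2)   # 3

# `^` is exponentiation
power = 2^10                # 1024

# rem() is the function form of `%`
remainder = rem(5, 2)       # 1
```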

Some mathematical operations use functions, which just like in R, are called using the function name, with arguments to the function surrounded by parentheses:

sqrt(10)
3.1622776601683795

In R, the above looks like:

> sqrt(10)
[1] 3.162278

But many functions of this sort also have unicode-based equivalents. For example, the following is identical to sqrt(10):

√10 # this is typed \sqrt<TAB>
3.1622776601683795
Note

In fact, all of the mathematical symbols above are actually Julia functions. For example, 3 + 4 is actually just shorthand for +(3, 4)

You’ll learn much more about Julia functions in a future tutorial.
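To make the "operators are functions" point concrete, here is a short sketch comparing the infix form with the function-call form. The variable names are hypothetical:

```julia
# infix and function-call forms are the same operation
infix_sum = 3 + 4
call_sum = +(3, 4)

# called as a function, `*` accepts more than two arguments
product = *(2, 3, 4)        # 24

# the unicode root symbol is simply another name for sqrt
root = √(16)                # 4.0
same_function = (√) === sqrt
```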

2 ✔️ ❌ Boolean basics

Boolean values are lowercase in Julia (eg true and false rather than TRUE and FALSE), but you can do basic comparisons as you do in R:

1 < 3 # 1 is less than 3
true
5 * 2 == 11 # 5 * 2 is equal to 11
false

And you can negate a boolean expression with !

!(5 * 2 == 11) # or, in this case, 5 * 2 != 11
true

There are also many functions that return boolean values that are often used for conditional evaluation (if / else statements).

isodd(3)
true
!isodd(3) # read "3 is not odd"
false

Boolean expressions can also be combined using && for “AND” and || for “OR”. && returns true if both statements are true, while || returns true if either statement is true.

For example:

isodd(3) && isodd(4) # 3 is odd AND 4 is odd
false
iseven(3) || iseven(4) # 3 is even OR 4 is even
true
Caution

In Julia, Boolean values have the type Bool, which is a subtype of Integer, so they can be used in some mathematical operations as 0 (for false) and 1 (for true). For example:

julia> 1 + true
2

But the reverse is not true. That is, you cannot use 1 in an if/else statement. This is in contrast to R, where any number other than 0 is considered TRUE, and 0 is considered FALSE.

r$> ifelse(1, 10, 20)
[1] 10

r$> ifelse(2, 10, 20)
[1] 10

r$> ifelse(0, 10, 20)
[1] 20

In Julia, this will throw an error:

julia> ifelse(1, 10, 20)
ERROR: TypeError: non-boolean (Int64) used in boolean context

You can, however, explicitly convert a Boolean to an integer with eg Int(false) and Int(true), or convert a 1 or 0 to true or false with eg Bool(1).
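In practice, the usual workaround is to turn the number into a real Bool with an explicit comparison. The following is a minimal sketch of that idea, with `flag` as a made-up variable:

```julia
flag = 1

# ifelse(flag, 10, 20) would throw a TypeError, because `flag` is an Int.
# Comparing against 0 produces a genuine Bool:
result = ifelse(flag != 0, 10, 20)   # 10

# explicit conversions in both directions
as_int = Int(true)                   # 1
as_bool = Bool(0)                    # false

# Note: only 0 and 1 can be converted; Bool(2) throws an InexactError
```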

Using Boolean expressions is quite common in data analysis, for example to filter on observations that meet some criteria. We will see many more examples in future tutorials.

3 🧵 String basics

In Julia, strings are surrounded by double quotes (") only. Single quotes are used only for individual characters:

'C'
'C': ASCII/Unicode U+0043 (category Lu: Letter, uppercase)
# this is an error
'Hello, World!'
ErrorException: ErrorException("syntax: character literal contains multiple characters")
# this is what we meant
"Hello, World!"
"Hello, World!"

To concatenate strings, use the string() function, or join them with the * operator (Julia uses *, not +, for string concatenation).

string("Hello", " ", "world!")
"Hello world!"
"Hello" * " " * "world!"
"Hello world!"

Pattern matching can be done with the contains() function. This is a boolean function that takes 2 arguments: the first is the string that you’re searching in, the second is the pattern that you’re searching for.

contains("banana", "ana")
true
contains("banana", "lana")
false

If you know regular expressions, you can use those as the second argument as well. In Julia, you can make a regular expression using a special “string literal” macro, eg r"my regex".

Don’t worry if you don’t know what this means.

contains("banana", r"(an){2}")
true
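If you do want to experiment with regular expressions, here are a few more patterns to try. This is a sketch only; `word` and the result names are hypothetical:

```julia
word = "banana"

starts_with_ba = contains(word, r"^ba")   # `^` anchors the match at the start
ends_with_na = contains(word, r"na$")     # `$` anchors the match at the end
ignores_case = contains(word, r"ANA"i)    # the `i` flag ignores case
has_digit = contains(word, r"\d")         # `\d` matches a digit; there are none
```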

There are also a handful of functions for modifying strings.

my_string = "Let's see what I can do with this 😀"
"Let's see what I can do with this 😀"
uppercase(my_string)
"LET'S SEE WHAT I CAN DO WITH THIS 😀"
lowercase(my_string)
"let's see what i can do with this 😀"
replace(my_string, "this" => "that")
"Let's see what I can do with that 😀"
split(my_string, 'w') # this makes a "vector" of substrings
3-element Vector{SubString{String}}:
 "Let's see "
 "hat I can do "
 "ith this 😀"

For (MUCH) more on strings, check out the strings tutorial.

4 🥡 Container basics

Things like strings, integers, and floating point values are examples of “scalar” types, but you’re probably also familiar with container types, such as vectors, which are 1-dimensional, ordered containers.

4.0.1 ➡ Vectors

In R, vectors are created with the syntax c(10,20,30). In Julia, the same operation is [10,20,30].

As in R, vectors can be “indexed” or “sliced” using brackets. For example, in R

> my_vec <- c(10,20,30,40,50)
> my_vec[3]
[1] 30
> my_vec[3:5]
[1] 30 40 50
> my_vec[c(1,4)]
[1] 10 40

In Julia, the same tasks are accomplished thusly:

my_vec = [10, 20, 30, 40, 50]
5-element Vector{Int64}:
 10
 20
 30
 40
 50
my_vec[3]
30
my_vec[3:5]
3-element Vector{Int64}:
 30
 40
 50
my_vec[[1, 4]]
2-element Vector{Int64}:
 10
 40

Julia also has a special keyword end that can stand in for the last index of the container.

my_vec[end-2:end]
3-element Vector{Int64}:
 30
 40
 50
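A few more ways to use end, sketched on the same vector. The matching keyword begin (which, to my knowledge, needs Julia 1.4 or newer) stands in for the first index:

```julia
my_vec = [10, 20, 30, 40, 50]

last_item = my_vec[end]          # 50
second_to_last = my_vec[end-1]   # 40
all_but_first = my_vec[2:end]    # [20, 30, 40, 50]
first_item = my_vec[begin]       # 10
```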

Vectors in Julia are “mutable”, which means you can change the contents - updating individual indices, or adding and removing elements.

new_vec = [10, 20, 30, 40, 50]
new_vec[3] = 100 # Change the 3rd element to 100
push!(new_vec, -1) # add -1 to the end
new_vec
6-element Vector{Int64}:
  10
  20
 100
  40
  50
  -1

These operations are roughly equivalent to the following in R (note that R's append() returns a new vector rather than modifying my_vec in place):

> my_vec[3] <- 100
> append(my_vec, -1)
[1]  10  20 100  40  50  -1
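Removing elements was mentioned above but not shown. Here is a minimal sketch using three Base functions, on a hypothetical vector `vec_demo`:

```julia
vec_demo = [10, 20, 100, 40, 50, -1]

last_item = pop!(vec_demo)         # removes and returns the last element (-1)
first_item = popfirst!(vec_demo)   # removes and returns the first element (10)
deleteat!(vec_demo, 2)             # removes the element at index 2 (100)

vec_demo                           # now [20, 40, 50]
```

As with push!(), the trailing ! is a naming convention signalling that the function modifies its argument.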
Caution

There is also a function append!() in Julia that acts on vectors. Unlike in R, this function is not typically used to add a single element, but rather to add each element of another collection. Note this difference in the following:

a_vector = [0.0, "hello", 42]
push!(a_vector, ['a', 'b'])
a_vector
4-element Vector{Any}:
  0.0
   "hello"
 42
   ['a', 'b']
a_vector = [0.0, "hello", 42]
append!(a_vector, ['a', 'b'])
a_vector
5-element Vector{Any}:
  0.0
   "hello"
 42
   'a': ASCII/Unicode U+0061 (category Ll: Letter, lowercase)
   'b': ASCII/Unicode U+0062 (category Ll: Letter, lowercase)

There are a number of other container types in Julia as well, some of which are mutable, and some of which are not. We’ll get to some examples later, but in general, they use the same indexing convention (eg the_container[the_index]).

4.1 Matrices

A Matrix in Julia is a two-dimensional array, and works much like a Vector, with the added syntax that you can index in two dimensions, separated by a , where the first index refers to the row number, and the second index refers to the column number. For aficionados, Julia is “column major”.

Matrices can be constructed using the following multi-line syntax:

my_matrix = [
    1 10 100
    2 20 200
    3 30 300
    4 40 400
]
4×3 Matrix{Int64}:
 1  10  100
 2  20  200
 3  30  300
 4  40  400

Or in a single line, replacing line breaks with ;:

[1 10 100; 2 20 200; 3 30 300; 4 40 400] == my_matrix
true

You can also use the reshape() function to convert a vector (or other iterable) into a Matrix. The reshape() function takes 3 arguments - (1) the thing being reshaped, (2) the size of the first (row) dimension, and (3) the size of the second (column) dimension.

reshape([1, 2, 3, 4, 5, 6], 3, 2)
3×2 Matrix{Int64}:
 1  4
 2  5
 3  6
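Because Julia is column major, reshape() fills the first column before moving on to the second. If you would rather fill row by row, one option (a sketch, not the only way) is to reshape with the dimensions swapped and then transpose with permutedims(). The names `nums`, `by_column`, and `by_row` are made up:

```julia
nums = [1, 2, 3, 4, 5, 6]

# filled down the columns: [1 3 5; 2 4 6]
by_column = reshape(nums, 2, 3)

# filled along the rows: [1 2 3; 4 5 6]
by_row = permutedims(reshape(nums, 3, 2))
```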

4.1.1 ☝️ Indexing

my_matrix[1, 2] # first row, second column
10
my_matrix[2, end] # second row, last column
200
my_matrix[2:4, 1] # 2nd-4th rows, first column. Returns vector
3-element Vector{Int64}:
 2
 3
 4
my_matrix[2:4, [1, 3]] # 2nd-4th rows, first and 3rd columns.
3×2 Matrix{Int64}:
 2  200
 3  300
 4  400
my_matrix[2:4, [1]] # 2nd-4th rows, first column. Returns matrix
3×1 Matrix{Int64}:
 2
 3
 4

Matrices in Julia also have a linear index, which counts down the rows, column by column.

my_matrix[4]
4
my_matrix[5]
10

and you can flatten this linear index into a vector using the vec() function,

vec(my_matrix)
12-element Vector{Int64}:
   1
   2
   3
   4
  10
  20
  30
  40
 100
 200
 300
 400

or using reshape()

reshape(my_matrix, 12)
12-element Vector{Int64}:
   1
   2
   3
   4
  10
  20
  30
  40
 100
 200
 300
 400
Note

Notice that only a single dimension is provided. If you call reshape(my_matrix, 12, 1), you would get a Matrix again, only one with a single column.
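To connect the linear index back to rows and columns, the sketch below uses CartesianIndices and LinearIndices from Base to translate between the two. The matrix `m` is a copy of my_matrix so that the example is self-contained:

```julia
m = [1 10 100; 2 20 200; 3 30 300; 4 40 400]

# with 4 rows, linear index 5 is the first row of the second column
same_element = m[5] == m[1, 2]

row_col = CartesianIndices(m)[5]   # CartesianIndex(1, 2)
linear = LinearIndices(m)[1, 2]    # 5
```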

4.2 📚 Dictionaries

Dictionaries, like “lists” in R, are unordered containers of key-value pairs. Both keys and values can have any type, eg strings, numbers (integer or float), symbols.

my_dict = Dict("a key" => "a value", 'b' => 42, 1 => 2)
Dict{Any, Any} with 3 entries:
  "a key" => "a value"
  'b'     => 42
  1       => 2

You access the values in a dictionary just like indexing into a vector, only you use the keys instead of linear indices.

my_dict["a key"]
"a value"
my_dict[1]
2

If you try to index using a key that doesn’t exist, you’ll get an error, but if you’re not sure whether the key exists, you can also use the get() function, which allows you to provide a value to return in case the key doesn’t exist:

my_dict["another key"]
KeyError: KeyError("another key")
get(my_dict, "another key", "hey, that doesn't exist!")
"hey, that doesn't exist!"

Alternatively, you can check whether a dictionary has a key using the boolean haskey() function (it returns either true or false):

haskey(my_dict, 'a')
false
haskey(my_dict, 'b')
true
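The lookup functions above can be compared side by side. This sketch uses a hypothetical `inventory` dictionary, and also shows get!(), a Base function that behaves like get() but stores the fallback value when the key is missing:

```julia
inventory = Dict("apples" => 3, "pears" => 0)

apples = get(inventory, "apples", 0)         # key exists, returns 3
plums = get(inventory, "plums", 0)           # key missing, returns the fallback 0
before = haskey(inventory, "plums")          # false: get() does not insert anything

stored = get!(inventory, "plums", 0)         # get!() inserts the fallback and returns it
after = haskey(inventory, "plums")           # true
```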

Finally, you can update an entry or insert a new entry using the familiar assignment (=) syntax

my_dict["new key"] = "😉"
"😉"

4.3 Vectorizing operations on containers

It is often useful to apply the same operation on each object in a container. In R, those are often done implicitly, but in Julia, you must be explicit.

For example, let’s say we want to calculate the square root of each number in a vector. In R, you would just call sqrt() on the vector:

r$> a_vec <- c(1,2,3,4,5)

r$> sqrt(a_vec)
[1] 1.000000 1.414214 1.732051 2.000000 2.236068

But in Julia, the square root of a vector is undefined:

a_vec = [1, 2, 3, 4, 5]
5-element Vector{Int64}:
 1
 2
 3
 4
 5
sqrt(a_vec)
MethodError: MethodError(sqrt, ([1, 2, 3, 4, 5],), 0x000000000000831a)

In Julia, there are several different ways to accomplish this.

Tip

In R, it’s very important to make sure all operations are vectorized, since loops written in R are incredibly slow. This is not true in Julia - loops can sometimes be faster!
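Since loops are a perfectly good option in Julia, here is what an explicit loop for the square-root example might look like. This is a sketch; `roots` is a made-up name, and preallocating with zeros() is just one reasonable choice:

```julia
a_vec = [1, 2, 3, 4, 5]

# preallocate a Vector{Float64} of the right length
roots = zeros(length(a_vec))

# eachindex() yields every valid index of a_vec
for i in eachindex(a_vec)
    roots[i] = sqrt(a_vec[i])
end

roots
```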

4.3.1 🗺️ map

The map() function takes a function as its first argument, and a container as the second. It then applies the function to each item in the container, returning another container. This is analogous to the sapply() function in R, though it is much more flexible as we’ll see in future tutorials.

For example:

map(sqrt, a_vec)
5-element Vector{Float64}:
 1.0
 1.4142135623730951
 1.7320508075688772
 2.0
 2.23606797749979
Caution

Note that the order of arguments is reversed relative to sapply(). In Julia, the function being applied comes first, and the container it applies to comes second.

Julia functions that are verbs can often be reasoned about if you put them into a sentence, with the arguments in the same order.

Eg. map(sqrt, a_vec) is “Map sqrt to a_vec”, and contains("banana", "ana") is “banana contains ana?”

For more on map() and using it to apply functions, see the Functions tutorial.

4.3.2 🤏 reduce, and mapreduce

It can also be useful to collapse a container into a single value using some operation. We can do this using the reduce() function (which works similarly to reduce() from the purrr package in R). For example, suppose that you want to multiply all of the numbers in a vector to one another.

reduce(*, a_vec)
120

Keep in mind that you should only use associative operations, that is, operations where the grouping of the arguments doesn’t matter. To be fast, reduce() may group the items in an order you don’t expect.
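Subtraction is a handy counterexample. When the order does matter, Base provides foldl() and foldr(), which guarantee left-to-right and right-to-left grouping respectively. A short sketch, with `nums` as a hypothetical vector:

```julia
nums = [1, 2, 3]

left = foldl(-, nums)    # (1 - 2) - 3 == -4
right = foldr(-, nums)   # 1 - (2 - 3) == 2
```

The two results differ, which is why reduce(-, nums) would not be a safe thing to rely on.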

The mapreduce() function is like combining map() and reduce(). In other words, mapreduce(op1, op2, container) should be identical to reduce(op2, map(op1, container)), with the benefit that Julia doesn’t need to make the intermediate container (for reasons not worth going into, creating large vectors can be slow).

So, if we want to multiply all of the square roots of a_vec:

mapreduce(sqrt, *, a_vec)
10.954451150103324
# just to prove it
reduce(*, map(sqrt, a_vec))
10.954451150103324
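For the common cases of adding or multiplying, Base also has sum() and prod(), which accept a function as an optional first argument, much like mapreduce(). A sketch, using abs2 (which squares its argument) for illustration:

```julia
a_vec = [1, 2, 3, 4, 5]

# same idea as mapreduce(sqrt, *, a_vec)
prod_roots = prod(sqrt, a_vec)

# same idea as mapreduce(abs2, +, a_vec): 1 + 4 + 9 + 16 + 25
sum_squares = sum(abs2, a_vec)   # 55
```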

4.3.3 🤔 Comprehensions

Containers can also be created using “comprehensions.” If you are familiar with using for loops, comprehensions are like mini for loops, and even have a similar syntax in Julia.

For example, the following is identical to map(sqrt, a_vec)

[sqrt(x) for x in a_vec]
5-element Vector{Float64}:
 1.0
 1.4142135623730951
 1.7320508075688772
 2.0
 2.23606797749979

One exceptionally useful thing about comprehensions is that they can be combined with conditional evaluation, so that only things that match some boolean statement will be included. For example, the following only takes the square root of odd numbers:

[sqrt(x) for x in a_vec if isodd(x)]
3-element Vector{Float64}:
 1.0
 1.7320508075688772
 2.23606797749979
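The filtered comprehension above can also be written by combining map() with filter(), a Base function that keeps only the items for which a boolean function returns true. A sketch comparing the two:

```julia
a_vec = [1, 2, 3, 4, 5]

from_comprehension = [sqrt(x) for x in a_vec if isodd(x)]

# filter first (keeps 1, 3, 5), then take the square roots
from_filter_map = map(sqrt, filter(isodd, a_vec))

from_comprehension == from_filter_map
```

Which form reads better is largely a matter of taste.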

We can also make dictionaries and other containers

# for reference
my_dict
Dict{Any, Any} with 4 entries:
  "new key" => "😉"
  "a key"   => "a value"
  'b'       => 42
  1         => 2
Dict(k => my_dict[k] for k in keys(my_dict) if k isa String)
Dict{String, String} with 2 entries:
  "new key" => "😉"
  "a key"   => "a value"

5 ⚠ Interlude on types

You can do a lot in Julia without worrying too much about the types of the objects that you’re working with. But everything in Julia has a type, and it’s good to be aware of them, if only to recognize errors that might show up due to them.

In Julia, types exist in a hierarchy. Every object has a “concrete” type, and some number of “abstract” parent types.

For example, Int16, Int32, and Int64 are concrete types representing 16-bit, 32-bit, and 64-bit integers respectively. All of these types are subtypes of the abstract type Signed, which is itself a subtype of Integer (there are also “unsigned” integer types, like UInt64).

A Float64 is a 64-bit floating point number. It’s not a subtype of Integer, but it shares the abstract type Real with all Integer types.

typeof(1)
Int64
typeof(1.0)
Float64
supertype(Int64)
Signed

Or view all of the supertypes:

using InteractiveUtils: supertypes
supertypes(Int64)
(Int64, Signed, Integer, Real, Number, Any)
1.0 isa Integer
false
1.0 isa Float64
true
1.0 isa Real
true
1 isa Float64
false
1 isa Real
true
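While isa asks whether a value belongs to a type, the <: operator asks whether one type is a subtype of another. Here is a brief sketch with hypothetical variable names:

```julia
int_is_integer = Int64 <: Integer      # true
float_is_integer = Float64 <: Integer  # false
float_is_real = Float64 <: Real        # true

# Bool really is a subtype of Integer, which is why 1 + true works
bool_is_integer = Bool <: Integer      # true

# abstract types sit higher in the hierarchy; values always have concrete types
real_is_abstract = isabstracttype(Real)    # true
int_is_concrete = isconcretetype(Int64)    # true
```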

Containers also have types, and in fact are generally “parameterized” based on the types they contain.

new_dict = Dict('a' => 1, 'b' => 2, 'c' => 3)
Dict{Char, Int64} with 3 entries:
  'a' => 1
  'c' => 3
  'b' => 2
typeof(new_dict)
Dict{Char, Int64}

Notice the Char and Int64 inside the curly braces - those represent the types of the keys and values respectively.

Why do I bring this up now? Well, look what happens when I try to add new key/value pairs without paying attention to the types:

new_dict['d'] = 4.0
4.0
typeof(4.0)
Float64
typeof(new_dict['d'])
Int64
new_dict['e'] = 4.5
InexactError: InexactError(:Int64, Int64, 4.5)

When I added the value 4.0, even though it was a Float64, Julia was able to convert it into an Int64. But 4.5 can’t be converted to an integer without losing information. We could explicitly round it, but Julia won’t do that for us.

new_dict['e'] = round(Int, 4.5)
4
new_dict["I'm a String, not a Char"] = 5
MethodError: MethodError(convert, (Char, "I'm a String, not a Char"), 0x000000000000831f)

So why was I able to add all kinds of different keys and values to my_dict up above? Take a look at its type signature:

typeof(my_dict)
Dict{Any, Any}

In Julia, all types are subtypes of Any. Because I initially made the dictionary with a bunch of different types, Julia could not provide it with a specific parameterization, so it just did the broadest possible one.
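You can also choose the parameterization yourself by writing the types in curly braces when constructing the dictionary. The sketch below uses a hypothetical dictionary `flexible`, declared with the abstract value type Real so that it accepts both integers and floats:

```julia
# keys must be Char, values can be any Real number
flexible = Dict{Char, Real}('a' => 1, 'b' => 2)

# no InexactError this time: 4.5 is a Real, so it is stored as-is
flexible['e'] = 4.5

int_value_type = typeof(flexible['a'])     # Int64
float_value_type = typeof(flexible['e'])   # Float64
```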

Caution

Here are some other examples of type issues in containers. Don’t worry too much about the details, but try to pay attention to what types you’d expect, what actually happens, and the errors that are (or are not!) induced:

floatvec = [10, 11.0, 12]
3-element Vector{Float64}:
 10.0
 11.0
 12.0
typeof(floatvec[1])
Float64
intvec = Int64[10, 11.0, 12]
3-element Vector{Int64}:
 10
 11
 12
typeof(intvec[2])
Int64
anyvec = Any[10, 11.0, 12]
3-element Vector{Any}:
 10
 11.0
 12
Int64[3, 3.5, 4]
InexactError: InexactError(:Int64, Int64, 3.5)
push!(intvec, 12.5)
InexactError: InexactError(:Int64, Int64, 12.5)
anum = 10
10
typeof(anum)
Int64
push!(floatvec, anum)
4-element Vector{Float64}:
 10.0
 11.0
 12.0
 10.0
typeof(floatvec[4])
Float64
push!(intvec, '1') # 49
4-element Vector{Int64}:
 10
 11
 12
 49
Caution

This one surprised me too! Character literals (like '1') represent Unicode characters, each of which has a numerical value (its code point) that can be converted to an integer. The code point of '1' is 49.

See here for more details.

push!(intvec, "1")
MethodError: MethodError(convert, (Int64, "1"), 0x000000000000831f)
push!(anyvec, "1")
4-element Vector{Any}:
 10
 11.0
 12
   "1"
push!(intvec, parse(Int64, "1"))
5-element Vector{Int64}:
 10
 11
 12
 49
  1

6 Miscellany

Here are some additional bits that are useful to introduce at an early stage. You don’t need to keep these things in your head, but hopefully when you see them later, it will jog your memory.

6.1 Collect

Many “array-like” things in Julia aren’t actually arrays, but can be treated as such. This has a number of advantages.

For example, consider an array of odd numbers from 1 to 2,000,000. To put this into an array, you would need to store 1 million integer objects (assuming the typical 64-bit integer, that’s 8 MB of memory).

Instead, you can store 3 integers in a “range”.

Tip

Writing 2,000,000 would be parsed as a tuple (2, 0, 0), rather than as the integer 2 million. Instead, we can use _ for visual separation, which the Julia parser ignores in integers. So 2_000_000 is identical to 2000000 or 2_00000_0

# range syntax for `start : step : stop`
my_range = 1:2:2_000_000
1:2:1999999

This doesn’t actually materialize any of the numbers that are part of this range, but can still be indexed into, or used for indexing

sizeof(my_range) # gives size in bytes
24
my_range[1000] # the thousandth number in the range
1999
my_range[[1000, 1200, 11]]
3-element Vector{Int64}:
 1999
 2399
   21
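The three stored integers, and a few other properties, can be queried directly. A short sketch using Base functions:

```julia
my_range = 1:2:2_000_000

range_start = first(my_range)     # 1
range_step = step(my_range)       # 2
range_stop = last(my_range)       # 1999999, the last value actually reached
n_items = length(my_range)        # 1000000

# membership tests work without materializing the numbers
has_odd = 1999 in my_range        # true
has_even = 2000 in my_range       # false
```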

And some algorithms can use fancy tricks to optimize calculations. For example, sum can use an optimization to calculate this almost instantly

using BenchmarkTools
@benchmark sum($my_range) # about 5 nanoseconds
BenchmarkTools.Trial: 10000 samples with 1000 evaluations.
 Range (min … max):  5.189 ns … 13.251 ns  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     5.204 ns              ┊ GC (median):    0.00%
 Time  (mean ± σ):   5.284 ns ±  0.410 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

  █▄▁▁                  ▄▁         ▂▂                     ▃ ▂
  ████▇▄▃▄▄▃▁▁▁▁▁▁▃▁▁▁▄██▅▄▅▃▃▄▁▆▁██▃▁▁▁▁▁▁▁▁▃▇▁▁▁▁▁▃▃▁▁▃█ █
  5.19 ns      Histogram: log(frequency) by time     5.96 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.
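One plausible shortcut of this kind is the closed-form formula for an arithmetic series: the number of items times the average of the first and last items. The sketch below checks that formula against sum(); it illustrates the idea and is not necessarily how Base implements it:

```julia
my_range = 1:2:2_000_000

n = length(my_range)   # 1_000_000 items

# n * (first + last) / 2, using integer division to stay in Int64
closed_form = n * (first(my_range) + last(my_range)) ÷ 2

matches = closed_form == sum(my_range)
```

Only three numbers are needed for the formula, no matter how long the range is, which is consistent with the near-constant timing above.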

A range (and some other types) can work just like a vector because it is a subtype of AbstractArray, and many functions don’t care about the internal details, they just care that they can get out indices, know the length of the object, etc. Many other “iterators” work the same way.

Nevertheless, sometimes you do actually need the concrete vector, in which case you can use the collect() function:

typeof(my_range)
StepRange{Int64, Int64}
range_as_vector = collect(my_range)
typeof(range_as_vector)
Vector{Int64} (alias for Array{Int64, 1})
sizeof(range_as_vector) # compare this to the 24 bytes used before
8000000
@benchmark sum($range_as_vector)
BenchmarkTools.Trial: 10000 samples with 1 evaluation.
 Range (min … max):  237.768 μs …  1.330 ms  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     241.359 μs              ┊ GC (median):    0.00%
 Time  (mean ± σ):   243.476 μs ± 12.728 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%

    █▅   ▃▅▃  ▃▁                                                
  ▂▇██▇▅▆██████▆▄▃▂▂▁▁▁▁▁▂▂▂▂▃▃▄▄▃▃▃▃▄▃▄▃▄▃▂▂▁▁▁▁▂▂▂▂▁▁▂▂▂▂▂ ▃
  238 μs          Histogram: frequency by time          257 μs <

 Memory estimate: 0 bytes, allocs estimate: 0.

6.2 🪠 Pipes

Sometimes, it can be convenient to chain functions together in a single line. For simple expressions, this can be done in Julia using the “pipe” operator |>, which pipes the output from one expression into the input of the next. In other words, x |> y is equivalent to y(x).

The following are equivalent:

my_range |> collect |> sum

# and

sum(collect(my_range))

But this really only works for single-argument functions. As we’ll see, the Chain.jl package can be used for more complex operations. With Chain.jl, the result from each line of a calculation is passed implicitly as the first argument in the next.

using Chain

@chain my_range begin
    collect
    sum
end
1000000000000

This is of course a trivial example; we’ll see much more complicated versions in future tutorials.
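Without any packages, a multi-argument step can still be used in a plain |> pipe by wrapping it in an anonymous function (the `v -> ...` syntax, which the Functions tutorial presumably covers in more detail). A minimal sketch, using a small hypothetical range to keep it cheap:

```julia
small_range = 1:2:20   # the ten odd numbers 1, 3, ..., 19

# reduce() needs two arguments, so wrap it in a one-argument anonymous function;
# the parentheses keep the anonymous function from swallowing later pipe steps
total = small_range |> collect |> (v -> reduce(+, v))   # 100

same_total = reduce(+, collect(small_range))
```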