You’ve already seen how to assign variables in Julia, which is not so dissimilar from R. Many other things will be familiar as well, and throughout these tutorials we will try to point out where differences exist or may be confusing.
Let’s start with Numbers!
Julia was designed for technical and mathematical computing, and a great deal of effort has been put in to make math in code look like and work like math written on paper.
This means that a lot of simple operations work just like you would expect:
42 * 2
84
1.3e4 / 1000
13.0
5 % 2 # remainder
1
The # character makes the remainder of the line into a comment, just like in R and Python.
# order of operations is PEMDAS
(1 + 2)^3 * 2 + 1 # 3^3 * 2 + 1 => 27 * 2 + 1 => 54 + 1
55
Some mathematical operations use functions, which just like in R, are called using the function name, with arguments to the function surrounded by parentheses:
sqrt(10)
3.1622776601683795
In R, the above looks like:
> sqrt(10)
[1] 3.162278
But many functions of this sort also have Unicode-based equivalents. For example, the following is identical to sqrt(10):
√10 # this is typed \sqrt<TAB>
3.1622776601683795
In fact, all of the mathematical symbols above are actually Julia functions. For example, 3 + 4 is just shorthand for +(3, 4).
You’ll learn much more about Julia functions in a future tutorial.
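To see this for yourself, you can call the operators in their function form and compare the results. A small sketch (the numbers are arbitrary examples):

```julia
# Infix operators are ordinary functions that can also be called with prefix syntax
@assert 3 + 4 == +(3, 4)
@assert 42 * 2 == *(42, 2)
@assert 5 % 2 == rem(5, 2)   # % is the remainder function rem

# √ computes the same thing as sqrt
@assert √10 == sqrt(10)
@assert √16 == 4.0
```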
Boolean values are lowercase in Julia (e.g. true and false rather than TRUE and FALSE), but you can do basic comparisons as you do in R:
1 < 3 # 1 is less than 3
true
5 * 2 == 11 # 5 * 2 is equal to 11
false
And you can negate a boolean expression with !:
!(5 * 2 == 11) # or, in this case, 5 * 2 != 11
true
There are also many functions that return boolean values that are often used for conditional evaluation (if / else statements).
isodd(3)
true
!isodd(3) # read "3 is not odd"
false
Boolean expressions can also be combined using && for “AND” and || for “OR”. && returns true if both statements are true, while || returns true if either statement is true.
For example:
isodd(3) && isodd(4) # 3 is odd AND 4 is odd
false
iseven(3) || iseven(4) # 3 is even OR 4 is even
true
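If it helps to see every combination at once, here is a small sketch that prints a truth table for && and || and then checks a few rows explicitly:

```julia
# Print every combination of two Booleans with && ("AND") and || ("OR")
for a in (true, false), b in (true, false)
    println(a, " && ", b, " = ", a && b, ";  ", a, " || ", b, " = ", a || b)
end

# && is true only when both sides are true
@assert (true && true) == true
@assert (true && false) == false
# || is true when either side is true
@assert (false || true) == true
@assert (false || false) == false
```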
In Julia, Boolean values are a subtype of Integer, and can be used in some mathematical operations as 0 (for false) and 1 (for true). For example:
julia> 1 + true
2
But the reverse is not true: you cannot use 1 as the condition in an if/else statement. This is in contrast to R, where any number other than 0 is considered TRUE, and 0 is considered FALSE.
> ifelse(1, 10, 20)
[1] 10
> ifelse(2, 10, 20)
[1] 10
> ifelse(0, 10, 20)
[1] 20
In Julia, this will throw an error:
julia> ifelse(1, 10, 20)
ERROR: TypeError: non-boolean (Int64) used in boolean context
You can, however, explicitly convert a Boolean to an integer with e.g. Int(false) and Int(true), or convert a 1 or 0 to true or false with e.g. Bool(1).
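A short sketch of these conversions, including what happens when an integer other than 0 or 1 is converted (this is where Julia and R differ most):

```julia
# Explicit conversions between Bool and integers
@assert Int(true) == 1
@assert Int(false) == 0
@assert Bool(1) == true
@assert Bool(0) == false

# Converting first makes the condition valid: Bool(1) is true, so ifelse picks 10
@assert ifelse(Bool(1), 10, 20) == 10

# Only 0 and 1 can become a Bool; any other integer raises an InexactError
# (unlike R, where 2 would simply count as TRUE)
bool_of_two_failed = try
    Bool(2)
    false
catch err
    err isa InexactError
end
@assert bool_of_two_failed
```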
Using Boolean expressions is quite common in data analysis, for example to filter on observations that meet some criteria. We will see many more examples in future tutorials.
In Julia, strings are surrounded by double quotes (") only. Single quotes are only used for individual characters:
'C'
'C': ASCII/Unicode U+0043 (category Lu: Letter, uppercase)
# this is an error
'Hello, World!'
ErrorException: ErrorException("syntax: character literal contains multiple characters")
# this is what we meant
"Hello, World!"
"Hello, World!"
To concatenate strings, use the string() function, or “multiply” them with *:
string("Hello", " ", "world!")
"Hello world!"
"Hello" * " " * "world!"
"Hello world!"
Pattern matching can be done with the contains() function. This is a boolean function that takes 2 arguments: the first is the string you’re searching in, and the second is the pattern you’re searching for.
contains("banana", "ana")
true
contains("banana", "lana")
false
If you know regular expressions, you can use those as the second argument as well. In Julia, you can make a regular expression using a special “string literal” macro, e.g. r"my regex".
Don’t worry if you don’t know what this means.
contains("banana", r"(an){2}")
true
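For readers who do know regular expressions, here are a few more patterns checked against the same word. This is just a sketch; note that contains() itself requires Julia 1.5 or newer.

```julia
# Regex patterns as the second argument to contains()
@assert contains("banana", r"^ban")      # ^ anchors the match to the start
@assert !contains("banana", r"^ana")     # "ana" appears, but not at the start
@assert contains("banana", r"a$")        # $ anchors the match to the end
@assert !contains("banana", r"(an){3}")  # "an" only repeats twice in a row
```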
There are also a handful of functions for modifying strings.
my_string = "Let's see what I can do with this 😀"
"Let's see what I can do with this 😀"
uppercase(my_string)
"LET'S SEE WHAT I CAN DO WITH THIS 😀"
lowercase(my_string)
"let's see what i can do with this 😀"
replace(my_string, "this" => "that")
"Let's see what I can do with that 😀"
split(my_string, 'w') # this makes a "vector" of substrings
3-element Vector{SubString{String}}:
"Let's see "
"hat I can do "
"ith this 😀"
For (MUCH) more on strings, check out the strings tutorial.
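As a small preview, the opposite of split() is join(), which glues a collection of strings back together with a delimiter. A sketch using the same string as above:

```julia
my_string = "Let's see what I can do with this 😀"
pieces = split(my_string, 'w')

@assert length(pieces) == 3
@assert pieces[2] == "hat I can do "
# join() puts the delimiter back between the pieces
@assert join(pieces, 'w') == my_string
@assert join(["a", "b", "c"], ", ") == "a, b, c"
```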
Things like strings, integers, and floating point values are examples of “scalar” types, but you’re probably also familiar with container types, such as vectors, which are 1-dimensional, ordered containers.
In R, vectors are created with the syntax c(10,20,30). In Julia, the same operation is [10,20,30].
As in R, vectors can be “indexed” or “sliced” using brackets. For example, in R:
> my_vec <- c(10,20,30,40,50)
> my_vec[3]
[1] 30
> my_vec[3:5]
[1] 30 40 50
> my_vec[c(1,4)]
[1] 10 40
In Julia, the same tasks are accomplished thusly:
my_vec = [10, 20, 30, 40, 50]
5-element Vector{Int64}:
10
20
30
40
50
my_vec[3]
30
my_vec[3:5]
3-element Vector{Int64}:
30
40
50
my_vec[[1, 4]]
2-element Vector{Int64}:
10
40
Julia also has a special keyword, end, that can stand in for the last index of the container:
my_vec[end-2:end]
3-element Vector{Int64}:
30
40
50
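A few more ways to use end, plus its counterpart begin (available for indexing since Julia 1.4), sketched on the same vector:

```julia
my_vec = [10, 20, 30, 40, 50]

@assert my_vec[end] == 50        # last element
@assert my_vec[end-1] == 40      # second to last
@assert lastindex(my_vec) == 5   # the index that `end` stands for here
@assert my_vec[begin] == 10      # `begin` stands for the first index
@assert my_vec[1] == 10          # like R, Julia indexing starts at 1
```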
Vectors in Julia are “mutable”, which means you can change the contents - updating individual indices, or adding and removing elements.
new_vec = [10, 20, 30, 40, 50]
new_vec[3] = 100 # Change the 3rd element to 100
push!(new_vec, -1) # add -1 to the end
new_vec
6-element Vector{Int64}:
10
20
100
40
50
-1
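These operations are roughly equivalent to the following in R (note that R's append() returns a new vector rather than modifying my_vec, so you would need to assign the result back to keep the change):
> my_vec[3] <- 100
> append(my_vec, -1)
[1] 10 20 100 40 50 -1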
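Because R's append() leaves the original vector alone, the closest non-mutating equivalent in Julia is probably vcat() (vertical concatenation). A minimal sketch contrasting it with push!, which changes the vector in place:

```julia
v = [10, 20, 30]

# vcat returns a new vector; v itself is unchanged (like R's append())
w = vcat(v, -1)
@assert w == [10, 20, 30, -1]
@assert v == [10, 20, 30]

# push! modifies v in place (by convention, a trailing ! marks functions that mutate)
push!(v, -1)
@assert v == [10, 20, 30, -1]
```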
There is also a function append!() in Julia that acts on vectors. Unlike R's append(), this function is not typically used to add a single element, but rather to add each element of another collection. Note the difference in the following:
a_vector = [0.0, "hello", 42]
push!(a_vector, ['a', 'b'])
a_vector
4-element Vector{Any}:
0.0
"hello"
42
['a', 'b']
a_vector = [0.0, "hello", 42]
append!(a_vector, ['a', 'b'])
a_vector
5-element Vector{Any}:
0.0
"hello"
42
'a': ASCII/Unicode U+0061 (category Ll: Letter, lowercase)
'b': ASCII/Unicode U+0062 (category Ll: Letter, lowercase)
There are a number of other container types in Julia as well, some of which are mutable, and some of which are not. We’ll get to some examples later, but in general, they use the same indexing convention (e.g. the_container[the_index]).
A Matrix in Julia is a two-dimensional array, and works much like a Vector, with the added syntax that you can index in two dimensions, separated by a comma (,), where the first index refers to the row number and the second index refers to the column number. For aficionados, Julia is “column major”.
Matrices can be constructed using the following multi-line syntax:
my_matrix = [
1 10 100
2 20 200
3 30 300
4 40 400
]
4×3 Matrix{Int64}:
1 10 100
2 20 200
3 30 300
4 40 400
Or in a single line, replacing line breaks with ;:
[1 10 100; 2 20 200; 3 30 300; 4 40 400] == my_matrix
true
You can also use the reshape() function to convert a vector (or other array) into a Matrix. Here, reshape() takes 3 arguments: (1) the thing being reshaped, (2) the size of the first (row) dimension, and (3) the size of the second (column) dimension. Notice that the values fill the matrix column by column.
reshape([1, 2, 3, 4, 5, 6], 3, 2)
3×2 Matrix{Int64}:
1 4
2 5
3 6
To index into a matrix, give the row first and then the column:
my_matrix[1, 2] # first row, second column
10
my_matrix[2, end] # second row, last column
200
my_matrix[2:4, 1] # 2nd-4th rows, first column. Returns a vector
3-element Vector{Int64}:
2
3
4
my_matrix[2:4, [1, 3]] # 2nd-4th rows, first and 3rd columns
3×2 Matrix{Int64}:
2 200
3 300
4 400
my_matrix[2:4, [1]] # 2nd-4th rows, first column. Returns a matrix
3×1 Matrix{Int64}:
2
3
4
Matrices in Julia also have a linear index, which counts down the rows, column by column.
my_matrix[4]
4
my_matrix[5]
10
and you can flatten the matrix into a vector, in linear-index order, using the vec() function,
vec(my_matrix)
12-element Vector{Int64}:
1
2
3
4
10
20
30
40
100
200
300
400
or using reshape():
reshape(my_matrix, 12)
12-element Vector{Int64}:
1
2
3
4
10
20
30
40
100
200
300
400
Notice that only a single dimension is provided. If you call reshape(my_matrix, 12, 1), you would get a Matrix again, only one with a single column.
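The relationship between the two kinds of index can be written down directly: in a column-major matrix with nrows rows, element (i, j) sits at linear index (j - 1) * nrows + i. A sketch that checks this for every element of my_matrix, and lets the built-in LinearIndices do the same bookkeeping:

```julia
my_matrix = [
    1 10 100
    2 20 200
    3 30 300
    4 40 400
]

nrows, ncols = size(my_matrix)
@assert (nrows, ncols) == (4, 3)

# Column-major: element (i, j) sits at linear index (j - 1) * nrows + i
for j in 1:ncols, i in 1:nrows
    @assert my_matrix[i, j] == my_matrix[(j - 1) * nrows + i]
end

# LinearIndices does this conversion for you
@assert LinearIndices(my_matrix)[1, 2] == 5
@assert my_matrix[5] == 10

# reshape fills column by column, so reshaping vec() recovers the matrix
@assert reshape(vec(my_matrix), 4, 3) == my_matrix
```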
Dictionaries, which fill a role similar to named “lists” in R, are unordered containers of key, value pairs. Both keys and values can have any type, e.g. strings, numbers (integer or float), or symbols.
my_dict = Dict("a key" => "a value", 'b' => 42, 1 => 2)
Dict{Any, Any} with 3 entries:
"a key" => "a value"
'b' => 42
1 => 2
You access the values in a dictionary just like indexing into a vector, only you use the keys instead of linear indices.
my_dict["a key"]
"a value"
my_dict[1]
2
If you try to index using a key that doesn’t exist, you’ll get an error, but if you’re not sure whether the key exists, you can also use the get() function, which allows you to provide a value to return in case the key doesn’t exist:
my_dict["another key"]
KeyError: KeyError("another key")
get(my_dict, "another key", "hey, that doesn't exist!")
"hey, that doesn't exist!"
Alternatively, you can check whether a dictionary has a key using the boolean haskey() function (it returns either true or false):
haskey(my_dict, 'a')
false
haskey(my_dict, 'b')
true
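Here is a sketch that puts these lookup functions together with a few other common dictionary functions (keys(), values(), and delete!()). The scores dictionary is hypothetical example data:

```julia
scores = Dict("alice" => 3, "bob" => 5)   # hypothetical example data

# Safe lookups: haskey() checks first, get() falls back to a default
@assert haskey(scores, "alice")
@assert !haskey(scores, "carol")
@assert get(scores, "carol", 0) == 0
@assert get(scores, "bob", 0) == 5

# keys() and values() can be iterated over or collected (order is not guaranteed)
@assert sort(collect(keys(scores))) == ["alice", "bob"]
@assert sum(values(scores)) == 8

# delete! removes a key (the ! marks that the Dict is modified in place)
delete!(scores, "bob")
@assert !haskey(scores, "bob")
@assert length(scores) == 1
```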
Finally, you can update an entry or insert a new entry using the familiar assignment (=) syntax:
my_dict["new key"] = "😉"
"😉"
It is often useful to apply the same operation to each object in a container. In R, this is often done implicitly, but in Julia, you must be explicit.
For example, let’s say we want to calculate the square root of each number in a vector. In R, you would just call sqrt() on the vector:
> a_vec <- c(1,2,3,4,5)
> sqrt(a_vec)
[1] 1.000000 1.414214 1.732051 2.000000 2.236068
But in Julia, the square root of a vector is undefined:
a_vec = [1, 2, 3, 4, 5]
5-element Vector{Int64}:
1
2
3
4
5
sqrt(a_vec)
MethodError: MethodError(sqrt, ([1, 2, 3, 4, 5],), 0x00000000000082cc)
In Julia, there are several different ways to accomplish this.
In R, it’s very important to make sure all operations are vectorized, since loops written in R are often much slower than vectorized code. This is not true in Julia: loops are fast, and can sometimes be faster than the alternatives!
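For comparison, this is what the explicit loop version looks like. It is a minimal sketch that preallocates an output vector and fills it element by element; it is not the only way to write it:

```julia
a_vec = [1, 2, 3, 4, 5]

# Preallocate an output vector of Float64s, then fill it one element at a time
roots = Vector{Float64}(undef, length(a_vec))
for i in eachindex(a_vec)
    roots[i] = sqrt(a_vec[i])
end

@assert roots[1] == 1.0
@assert roots[4] == 2.0
@assert roots == map(sqrt, a_vec)
```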
map
The map() function takes a function as its first argument, and a container as the second. It then applies the function to each item in the container, returning another container. This is analogous to the sapply() function in R, though it is much more flexible, as we’ll see in future tutorials.
For example:
map(sqrt, a_vec)
5-element Vector{Float64}:
1.0
1.4142135623730951
1.7320508075688772
2.0
2.23606797749979
Note that the order of arguments is reversed relative to sapply(). In Julia, the function being applied comes first, and the container it applies to comes second.
Julia functions that are verbs can often be reasoned about if you put them into a sentence, with the arguments in the same order. E.g. map(sqrt, a_vec) is “Map sqrt to a_vec”, and contains("banana", "ana") is “banana contains ana?”
For more on map() and using it to apply functions, see the Functions tutorial.
reduce and mapreduce
It can also be useful to collapse a container into a single value using some operation. We can do this using the reduce() function (which works similarly to reduce() from the purrr package in R). For example, suppose that you want to multiply all of the numbers in a vector together:
reduce(*, a_vec)
120
Keep in mind that you should only use associative operations, where the grouping doesn’t matter (like * or +, but not - or /). To be fast, reduce() may group the operations in ways you don’t expect. If the order matters, use foldl() or foldr() instead.
The mapreduce() function is like combining map() and reduce(). In other words, mapreduce(op1, op2, container) should be identical to reduce(op2, map(op1, container)), with the benefit that Julia doesn’t need to make the intermediate container (for reasons not worth going into, creating large vectors can be slow).
So, if we want to multiply all of the square roots of a_vec together:
mapreduce(sqrt, *, a_vec)
10.954451150103324
# just to prove it
reduce(*, map(sqrt, a_vec))
10.954451150103324
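To see why associativity matters, compare foldl() and foldr(), which guarantee left-to-right and right-to-left grouping respectively, on subtraction. The sketch also uses mapreduce() to count odd numbers, which works because true counts as 1 when added:

```julia
a_vec = [1, 2, 3, 4, 5]

# Subtraction is not associative, so the grouping changes the answer
@assert foldl(-, [1, 2, 3]) == -4   # (1 - 2) - 3
@assert foldr(-, [1, 2, 3]) == 2    # 1 - (2 - 3)

# mapreduce with isodd and +: each true counts as 1, so this counts odd numbers
@assert mapreduce(isodd, +, a_vec) == 3
@assert mapreduce(isodd, +, a_vec) == reduce(+, map(isodd, a_vec))
```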
Containers can also be created using “comprehensions.” If you are familiar with using for loops, comprehensions are like mini for loops, and even have a similar syntax in Julia.
For example, the following is identical to map(sqrt, a_vec):
[sqrt(x) for x in a_vec]
5-element Vector{Float64}:
1.0
1.4142135623730951
1.7320508075688772
2.0
2.23606797749979
One exceptionally useful thing about comprehensions is that they can be combined with conditional evaluation, so that only things that match some boolean statement will be included. For example, the following only takes the square root of odd numbers:
[sqrt(x) for x in a_vec if isodd(x)]
3-element Vector{Float64}:
1.0
1.7320508075688772
2.23606797749979
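A sketch of how comprehensions line up with the functions from the previous sections: without a condition they match map(), and with an if they behave like filtering first (here with Julia's filter() function) and then mapping:

```julia
a_vec = [1, 2, 3, 4, 5]

# A comprehension produces the same values as map() with the same function
@assert [sqrt(x) for x in a_vec] == map(sqrt, a_vec)

# Adding `if` keeps only the elements that pass the test
@assert [x for x in a_vec if isodd(x)] == [1, 3, 5]
@assert [x * 10 for x in a_vec if iseven(x)] == [20, 40]

# Filtering inside the comprehension matches filtering the container first
@assert [sqrt(x) for x in a_vec if isodd(x)] == map(sqrt, filter(isodd, a_vec))
```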
We can also make dictionaries and other containers with the same syntax:
# for reference
my_dict
Dict{Any, Any} with 4 entries:
"new key" => "😉"
"a key" => "a value"
'b' => 42
1 => 2
Dict(k => my_dict[k] for k in keys(my_dict) if k isa String)
Dict{String, String} with 2 entries:
"new key" => "😉"
"a key" => "a value"
You can do a lot in Julia without worrying too much about the types of the objects that you’re working with. But everything in Julia has a type, and it’s good to be aware of them, if only to recognize errors that might show up due to them.
In Julia, types exist in a hierarchy. Every object has a “concrete” type, and some number of “abstract” parent types.
For example, Int16, Int32, and Int64 are concrete types representing 16-bit, 32-bit, and 64-bit integers respectively. All of these types are subtypes of the abstract type Signed, which is itself a subtype of Integer (there are also “unsigned” integer types, like UInt64).
A Float64 is a 64-bit floating point number. It’s not a subtype of Integer, but it shares the abstract type Real with all Integer types.
typeof(1)
Int64
typeof(1.0)
Float64
supertype(Int64)
Signed
Or view all of the supertypes:
using InteractiveUtils: supertypes
supertypes(Int64)
(Int64, Signed, Integer, Real, Number, Any)
1.0 isa Integer
false
1.0 isa Float64
true
1.0 isa Real
true
1 isa Float64
false
1 isa Real
true
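Where isa asks whether a value belongs to a type, the <: operator asks whether one type is a subtype of another. A sketch that walks through the hierarchy described above:

```julia
# `<:` asks whether one type is a subtype of another
@assert Int64 <: Signed
@assert Signed <: Integer
@assert Int64 <: Real
@assert Float64 <: Real
@assert !(Float64 <: Integer)
@assert Bool <: Integer            # this is why `1 + true` works
@assert UInt64 <: Unsigned

# Every type is ultimately a subtype of Any
@assert Dict{Char, Int64} <: Any

# isa checks a value; typeof gives the value's concrete type
@assert 1 isa Integer && typeof(1) <: Integer
```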
Containers also have types, and in fact are generally “parameterized” based on the types they contain.
new_dict = Dict('a' => 1, 'b' => 2, 'c' => 3)
Dict{Char, Int64} with 3 entries:
'a' => 1
'c' => 3
'b' => 2
typeof(new_dict)
Dict{Char, Int64}
Notice the Char and Int64 inside the curly braces: those represent the types of the keys and values respectively.
Why do I bring this up now? Well, look what happens when I try to add new key/value pairs without paying attention to the types:
new_dict['d'] = 4.0
4.0
typeof(4.0)
Float64
typeof(new_dict['d'])
Int64
new_dict['e'] = 4.5
InexactError: InexactError(:Int64, Int64, 4.5)
When I added the value 4.0, even though it was a Float64, Julia was able to convert it into an Int64. But 4.5 can’t be converted to an integer without losing information. We could explicitly round it, but Julia won’t do that for us.
new_dict['e'] = round(Int, 4.5) # round() breaks ties toward the nearest even number, so 4.5 becomes 4
4
new_dict["I'm a String, not a Char"] = 5
MethodError: MethodError(convert, (Char, "I'm a String, not a Char"), 0x00000000000082d1)
So why was I able to add all kinds of different keys and values to my_dict up above? Take a look at its type signature:
typeof(my_dict)
Dict{Any, Any}
In Julia, all types are subtypes of Any. Because I initially made the dictionary with a bunch of different types, Julia could not give it a more specific parameterization, so it used the broadest possible one.
Here are some other examples of type issues in containers. Don’t worry too much about the details, but try to pay attention to what types you’d expect, what actually happens, and the errors that are (or are not!) induced:
floatvec = [10, 11.0, 12]
3-element Vector{Float64}:
10.0
11.0
12.0
typeof(floatvec[1])
Float64
intvec = Int64[10, 11.0, 12]
3-element Vector{Int64}:
10
11
12
typeof(intvec[2])
Int64
anyvec = Any[10, 11.0, 12]
3-element Vector{Any}:
10
11.0
12
Int64[3, 3.5, 4]
InexactError: InexactError(:Int64, Int64, 3.5)
push!(intvec, 12.5)
InexactError: InexactError(:Int64, Int64, 12.5)
anum = 10
10
typeof(anum)
Int64
push!(floatvec, anum)
4-element Vector{Float64}:
10.0
11.0
12.0
10.0
typeof(floatvec[4])
Float64
push!(intvec, '1') # 49
4-element Vector{Int64}:
10
11
12
49
This one surprised me too! Character literals (like '1') represent Unicode code points, and each code point has a numerical value (49 for '1') that can be converted to an integer.
push!(intvec, "1")
MethodError: MethodError(convert, (Int64, "1"), 0x00000000000082d1)
push!(anyvec, "1")
4-element Vector{Any}:
10
11.0
12
"1"
push!(intvec, parse(Int64, "1"))
5-element Vector{Int64}:
10
11
12
49
1
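When you are unsure what a container will accept, eltype() reports the element type it was created with. A sketch that rebuilds the three vectors from above and checks their element types:

```julia
floatvec = [10, 11.0, 12]
intvec = Int64[10, 11.0, 12]
anyvec = Any[10, 11.0, 12]

# eltype reports the element type each container holds
@assert eltype(floatvec) == Float64
@assert eltype(intvec) == Int64
@assert eltype(anyvec) == Any

# An Any container keeps each element's own type
@assert typeof(anyvec[1]) == Int64
@assert typeof(anyvec[2]) == Float64

# A Float64 container converts integers on the way in
push!(floatvec, 13)
@assert floatvec[end] === 13.0
```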
Here are some additional bits that are useful to introduce at an early stage. You don’t need to keep these things in your head, but hopefully when you see them later, it will jog your memory.
Many “array-like” things in Julia aren’t actually arrays, but can be treated as such. This has a number of advantages.
For example, consider an array of odd numbers from 1 to 2,000,000. To put this into an array, you would need to store 1 million integer objects (assuming the typical 64-bit integer, that’s 8 MB of memory).
Instead, you can store 3 integers in a “range”.
Note that writing 2,000,000 in Julia would be parsed as a tuple (2, 0, 0), rather than as the integer 2 million. Instead, we can use _ for visual separation, which the Julia parser ignores in numbers. So 2_000_000 is identical to 2000000 or 2_00000_0.
# range syntax for `start : step : stop`
my_range = 1:2:2_000_000
1:2:1999999
This doesn’t actually materialize any of the numbers in the range, but the range can still be indexed into, or used for indexing:
sizeof(my_range) # gives size in bytes
24
my_range[1000] # the thousandth number in the range
1999
my_range[[1000, 1200, 11]]
3-element Vector{Int64}:
1999
2399
21
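Two more things worth knowing, sketched below: indexing a range with another range gives back a range rather than a vector, and ranges work as indices into ordinary vectors too (here start:step:stop picks every other element):

```julia
my_range = 1:2:2_000_000

# Indexing a range with another range gives back a (lazy) range
first_three = my_range[1:3]
@assert first_three == [1, 3, 5]
@assert first_three isa AbstractRange

# Ranges can also be used as indices into an ordinary vector
my_vec = [10, 20, 30, 40, 50]
@assert my_vec[1:2:5] == [10, 30, 50]
```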
And some algorithms can use fancy tricks to optimize calculations. For example, sum can take advantage of the regular structure of a range to calculate this almost instantly:
using BenchmarkTools
@benchmark sum($my_range) # about 5 nanoseconds
BenchmarkTools.Trial: 10000 samples with 1000 evaluations.
Range (min … max): 5.489 ns … 24.052 ns ┊ GC (min … max): 0.00% … 0.00%
Time (median): 5.500 ns ┊ GC (median): 0.00%
Time (mean ± σ): 5.560 ns ± 0.411 ns ┊ GC (mean ± σ): 0.00% ± 0.00%
██▄▅▂ ▅▁ ▂
██████▇▆▁▅▃▃▄▁▆▅▅▃▁▃▃▃▄▄▁▁▁▁▁▁▁▁▁▇███▇▄▃▃▁▁▁▃▁▁▁▃▁▄▄▃▁▁▁▁▃ █
5.49 ns Histogram: log(frequency) by time 6.02 ns <
Memory estimate: 0 bytes, allocs estimate: 0.
A range (and some other types) can work just like a vector because it is a subtype of AbstractArray, and many functions don’t care about the internal details; they just care that they can get out indices, know the length of the object, etc. Many other “iterators” work the same way.
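A sketch of that interface in action: generic array functions answer questions about my_range without ever building the million-element vector:

```julia
my_range = 1:2:2_000_000

# A range is an AbstractArray (specifically, a one-dimensional AbstractVector)
@assert my_range isa AbstractArray
@assert my_range isa AbstractVector

# Generic array functions work without materializing the elements
@assert length(my_range) == 1_000_000
@assert first(my_range) == 1
@assert last(my_range) == 1_999_999
@assert step(my_range) == 2
@assert sum(my_range) == 1_000_000_000_000   # the first n odd numbers sum to n^2
```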
Nevertheless, sometimes you do actually need the concrete vector, in which case you can use the collect() function:
typeof(my_range)
StepRange{Int64, Int64}
range_as_vector = collect(my_range)
typeof(range_as_vector)
Vector{Int64} (alias for Array{Int64, 1})
sizeof(range_as_vector) # compare this to the 24 bytes used before
8000000
@benchmark sum($range_as_vector)
BenchmarkTools.Trial: 10000 samples with 1 evaluation.
Range (min … max): 220.313 μs … 2.197 ms ┊ GC (min … max): 0.00% … 0.00%
Time (median): 221.999 μs ┊ GC (median): 0.00%
Time (mean ± σ): 224.189 μs ± 20.990 μs ┊ GC (mean ± σ): 0.00% ± 0.00%
▁▃▇█▇▅▃▁▁▁▁ ▁▁▂▁▁▁ ▁▂▂▃▃▄▃▃▃▃▂▁▁ ▂
▄▆▇█████████████▇▆▇▇████████████████████▇▇▆▆▅▆▅▅▅▇▇██████▇▇▇ █
220 μs Histogram: log(frequency) by time 234 μs <
Memory estimate: 0 bytes, allocs estimate: 0.
Sometimes, it can be convenient to chain functions together in a single line. For simple expressions, this can be done in Julia using the “pipe” operator |>, which pipes the output from one expression into the input of the next. In other words, x |> y is equivalent to y(x).
The following are equivalent:
my_range |> collect |> sum
# and
sum(collect(my_range))
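A couple more pipes built only from one-argument functions used earlier, each checked against the nested-call version. This is just a sketch; parentheses are added around each pipe so the comparison is unambiguous:

```julia
a_vec = [1, 2, 3, 4, 5]

# x |> f is the same as f(x), so this reads left to right: sum, then sqrt
@assert (a_vec |> sum |> sqrt) == sqrt(sum(a_vec))
@assert ("banana" |> uppercase) == "BANANA"
@assert (3 |> isodd) == isodd(3)
```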
But this really only works for single-argument functions. As we’ll see, the Chain.jl package can be used for more complex operations. With Chain.jl, the result from each line of a calculation is passed implicitly as the first argument to the next.
using Chain
@chain my_range begin
    collect
    sum
end
1000000000000
This is of course a trivial example, we’ll see much more complicated versions in future tutorials.