At the highest level, all programming can be thought of as “things” and “actions.” The “things” are data like numbers or strings, or complex objects that contain other data like matrices.
The “actions” are functions. If you’ve done any programming, you’ve no doubt used functions that others have written. Even just displaying a number in the REPL implicitly requires a number of functions to be called. But you can go a long way programming in languages like R without ever writing a function yourself.
This can be nice, since it’s one less thing to think about. And really, you could also program in Julia without writing many functions. But it can also be stifling: if your problem doesn’t fit neatly into a function that someone else has written, you might get stuck. Any time that you’re going to repeat the same action more than two or three times, it’s probably worth writing a function to do it.
In Julia, writing functions is easy! And nothing sets built-in functions apart from the ones you write yourself, so your own functions can be just as powerful.
Functions are the parts of computer programs that DO things. Let’s look at how to use them in Julia.
The 3 main components of a function are its name, its arguments, and its return value.
For example, in the following expression:
join(["Hello", "world!"], " ")
"Hello world!"
The function name is join(). There are two arguments: the vector ["Hello", "world!"] and the string " ". And the return value is a string, "Hello world!".
In Julia, function names can contain any alphanumerical characters (though they can’t start with numbers), and arguments are inside parentheses, separated by commas.
There are a lot of nuances to each of these components that may seem to violate the assumptions above: for example, “anonymous” functions have no name, functions can take zero arguments, and functions can return nothing. But we’ll get to that!
Probably the most common way to write a function in Julia is using the function keyword. It looks like this:
function my_function(arg1, arg2, arg3)
# stuff
return nothing
end
my_function (generic function with 1 method)
In R, the equivalent would be something like:
my_function <- function(arg1, arg2, arg3) {
  # stuff
  NULL
}
For very simple functions, Julia also has a one-line way to define functions using the assignment (=) operator, also known as the compact “assignment form”. For example,
function f(x, a, b, c)
return a * x^2 + b * x + c
end
could instead be written:
f(x, a, b, c) = a * x^2 + b * x + c
Notice that this requires neither the function nor the end keyword.
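To check that the two forms really define the same thing, here is a small sketch that writes the quadratic both ways and compares them. The names quad_long and quad_short are hypothetical, chosen only so the two definitions don't overwrite each other.

```julia
# Long form, using the `function` keyword
function quad_long(x, a, b, c)
    return a * x^2 + b * x + c
end

# Compact "assignment form": same arguments, same body, one line
quad_short(x, a, b, c) = a * x^2 + b * x + c

# Both compute 1*2^2 + 2*2 + 3 = 11
quad_long(2, 1, 2, 3)
quad_short(2, 1, 2, 3)
```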
As mentioned above, function names can contain letters, digits, and underscores (_), plus many Unicode symbols (even emoji!), though they cannot start with a digit.
All of the following are valid function names:
tHe_BeSt_FuNcTiON()
th3w0r5t()
😉()
By convention, function names in Julia use only lowercase letters and avoid underscores unless the name would otherwise be hard to read. That is, myfunc() is preferred over my_func(). The convention is also to avoid names so long that they would need underscores.
Arguments are the values passed to the function, and are separated by commas in between the parentheses. Functions may take any number of arguments, including zero. There are two kinds of arguments in Julia, “positional” arguments, and “keyword” arguments.
The arguments seen above are all examples of positional arguments (often called “args”), and as their name suggests, are determined by their position or order in the argument list. Be careful! It can be easy to confuse yourself if you name your variables and arguments the same way.
For example:
function afunction(thing1, thing2)
return "Here's thing1: $thing1. Here's thing2: $thing2"
end
afunction (generic function with 1 method)
afunction(10, 20)
"Here's thing1: 10. Here's thing2: 20"
thing1 = 100
100
thing2 = 200
200
afunction(thing2, thing1)
"Here's thing1: 200. Here's thing2: 100"
In the last call, the variable thing2 was placed in the first position, which corresponds to the argument thing1. The variable thing1 was used as the second argument, thing2. Inside the function, the names thing1 and thing2 refer only to the arguments, not to the global variables that happen to share those names.
For this reason, we usually try to avoid naming variables and arguments with similar names.
Often called “kwargs”, keyword arguments may be passed in any order, though in the function definition all keyword arguments must come after all positional arguments. In Julia, the syntax to create a function with kwargs is to put them after a semicolon (;).
function withkwargs(pos1; kwarg1, kwarg2)
return "Positional: $pos1. Keyword 1: $kwarg1. Keyword 2: $kwarg2."
end
withkwargs (generic function with 1 method)
withkwargs(1; kwarg2 = "world!", kwarg1 = "hello")
"Positional: 1. Keyword 1: hello. Keyword 2: world!."
When calling the function, the semicolon is not required, so the call above could have been written withkwargs(1, kwarg2 = "world!", kwarg1 = "hello"). Still, we usually encourage using ; to make the separation clear.
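Keyword arguments declared without a default value, like kwarg1 and kwarg2 in withkwargs, are required: leaving one out raises an UndefKeywordError. A small sketch, repeating the definition so it runs on its own:

```julia
function withkwargs(pos1; kwarg1, kwarg2)
    return "Positional: $pos1. Keyword 1: $kwarg1. Keyword 2: $kwarg2."
end

# Keyword order doesn't matter, and the semicolon in the call is optional
withkwargs(1; kwarg2 = "world!", kwarg1 = "hello")
withkwargs(1, kwarg1 = "hello", kwarg2 = "world!")

# Omitting a keyword argument that has no default throws an UndefKeywordError
try
    withkwargs(1; kwarg2 = "world!")
catch err
    println(typeof(err))   # prints UndefKeywordError
end
```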
It is often useful to provide default values for function arguments. In Julia, this can be accomplished using = in the argument list.
function withdefaults(pos1 = 1; a = 'a', b = "other")
return "Positional 1: $pos1. Keyword a: $a. Keyword b: $b"
end
withdefaults (generic function with 2 methods)
withdefaults()
"Positional 1: 1. Keyword a: a. Keyword b: other"
withdefaults("new!")
"Positional 1: new!. Keyword a: a. Keyword b: other"
withdefaults(a = 20)
"Positional 1: 1. Keyword a: 20. Keyword b: other"
You can provide defaults for all arguments, for none of them, or anything in between.
One caveat is that you cannot have a positional argument with a default followed by one without. In other words, you can do myfunc(a, b = 2, c = 3) but not myfunc(a = 1, b, c).
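Julia implements positional defaults by generating one method per number of arguments supplied, which is also why withdefaults above reported 2 methods. A quick sketch with the allowed form of myfunc (the body a + b + c is just an illustration):

```julia
# Defaults are allowed only on trailing positional arguments
function myfunc(a, b = 2, c = 3)
    return a + b + c
end

myfunc(1)           # b and c take their defaults: 1 + 2 + 3 = 6
myfunc(1, 10)       # only c takes its default: 1 + 10 + 3 = 14
myfunc(1, 10, 100)  # no defaults used: 111

# One method per arity: myfunc(a), myfunc(a, b) and myfunc(a, b, c)
length(methods(myfunc))  # 3

# By contrast, `function myfunc(a = 1, b, c) ... end` is rejected with a syntax error.
```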
In Julia, all functions return something. If you do not explicitly use the return keyword, then the function will return the value of the last expression that is evaluated in the function.
For example:
function implicitreturn(a, b)
    x = a + b
    y = x^2
    y + 10
end
implicitreturn (generic function with 1 method)
implicitreturn(10, 20) # (10 + 20) ^2 + 10
910
This is identical to the result we would have gotten had the last line been return y + 10.
If a return is encountered, however, the function will immediately exit, returning that value. For example:
function explicitreturn(a, b)
    x = a + b
    return y = x^2
    y + 10
end
explicitreturn (generic function with 1 method)
explicitreturn(10, 20)
900
Here, the y + 10 line is never evaluated. First, x is assigned the value of a + b; then the next line, return y = x^2, assigns 900 to y and immediately returns that value.
Why would we ever do this? Sometimes, it can be useful to leave a function early, if some value is encountered.
function bail_on_odd(arg)
    if isodd(arg)
        return "That's odd!"
    end
    new_val = arg ÷ 2
    "Half of that value is $new_val"
end
bail_on_odd (generic function with 1 method)
bail_on_odd(4)
"Half of that value is 2"
bail_on_odd(3)
"That's odd!"
It is good style to use return explicitly, rather than relying on the implicit return of the last expression. Also, if the point of your function is to do something where there isn’t an important return value, it is good practice to end it with return nothing to signal this intent.
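As a sketch of that convention, here is a hypothetical helper, printsummary, that is called only for its side effect (printing), so it ends with return nothing:

```julia
# Hypothetical helper: prints a summary of a vector and has no useful result
function printsummary(v)
    println("n = ", length(v), ", sum = ", sum(v))
    return nothing   # signal that the caller shouldn't use the return value
end

result = printsummary([1, 2, 3])   # prints "n = 3, sum = 6"
result === nothing                 # true
```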
Many functions, especially those used in data science, take other functions as arguments. For example, the map() function takes a function as the first argument and a collection as the second argument. It then calls the function on each item in the collection, returning a vector of the results.
map(uppercase, ["hello", "world"])
2-element Vector{String}:
"HELLO"
"WORLD"
Sometimes, the function that we want doesn’t already exist. For example, suppose that we wanted to add an exclamation point to the end of every string in a vector. We could write a named function, then use that in map:
function addexclamation(s)
return string(s, "!")
end
addexclamation (generic function with 1 method)
addexclamation("hello")
"hello!"
addexclamation(3)
"3!"
map(addexclamation, ["hello", "world!"])
2-element Vector{String}:
"hello!"
"world!!"
But often, we are only planning to use the function once, and it is convenient to simply define it right within the call to map. Such unnamed functions are called anonymous functions, and in Julia we write them with the syntax:
args... -> expression
For example, the map
example above could have been written as:
map(s -> string(s, "!"), ["hello", "world!"])
2-element Vector{String}:
"hello!"
"world!!"
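Anonymous functions can also take several arguments (wrapped in parentheses) or none at all, and they can be stored in a variable like any other value. A short sketch; the names greet and double are just for illustration:

```julia
# Two arguments: map walks both vectors in step, passing one element of each
map((x, y) -> x * y, [1, 2, 3], [10, 20, 30])   # [10, 40, 90]

# Zero arguments: an empty tuple before the arrow
greet = () -> "hello"
greet()   # "hello"

# An anonymous function bound to a name can be used like a regular function
double = x -> 2x
map(double, [1, 2, 3])   # [2, 4, 6]
```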
do blocks
Sometimes, our anonymous function is a bit more complicated than what can comfortably fit on one line. For example, in the map example above, perhaps we don’t want to add ! if the string already ends with !.
One way to make an expression take up multiple lines is to use a begin block, which we’ve seen before:
map(s -> begin
if endswith(s, "!")
return s
else
return string(s, "!")
end
end, ["hello", "world!"])
2-element Vector{String}:
"hello!"
"world!"
But, because this is such a common pattern, there’s a special syntax using the keyword do.
This works if the first argument is a function, and in generic terms,
first_arg_func(x -> #= do stuff =#, other_args...)
is the same as
first_arg_func(other_args...) do x
#= do stuff =#
end
So the multi-line map above could have been written a bit more clearly as:
map(["hello", "world!"]) do s
if endswith(s, "!")
return s
else
return string(s, "!")
end
end
2-element Vector{String}:
"hello!"
"world!"
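The do syntax isn't specific to map; it works with any function whose first argument is a function. For example, here is a sketch using filter, which keeps the elements for which the function returns true (the words vector is made up for illustration):

```julia
words = ["hello", "world!", "hi", "julia!"]

# Same as filter(s -> endswith(s, "!") && length(s) > 5, words)
excited = filter(words) do s
    endswith(s, "!") && length(s) > 5
end
# excited == ["world!", "julia!"]
```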
Up until now, we have not mentioned types in our discussion of functions. In many cases, we can (and should!) write functions without specifying the types of the arguments. Writing generic code is a good thing, as well as being expedient.
But sometimes, we want a function to behave differently depending on the types of the arguments we pass to it.
For example, suppose we have some data that should contain only numbers, but sometimes it got incorrectly parsed. This is based on a problem one of us actually encountered in a dataset.
bad_data = [1, 2.3, "7", '5', 16, "23.2"]
6-element Vector{Any}:
1
2.3
"7"
'5': ASCII/Unicode U+0035 (category Nd: Number, decimal digit)
16
"23.2"
We’d like to convert all of these values into the correct number types. Julia has a function designed to convert strings to numbers, called parse():
parsed_int = parse(Int, "5")
5
typeof(parsed_int)
Int64
So what happens if we use that function on all of our inputs? Recall that the map() function can be used to apply a function to all of the elements of a collection:
map(x -> parse(Float64, x), bad_data)
MethodError: no method matching parse(::Type{Float64}, ::Int64)
Closest candidates are:
parse(!Matched::Type{Union{}}, ::Any...; kwargs...)
@ Base parse.jl:39
parse(::Type{T}, !Matched::AbstractString; kwargs...) where T<:Real
@ Base parse.jl:393
We get a MethodError, which usually means that something about the types of our inputs is incorrect. In this case, upon reaching the first element of the vector, we tried to call parse(Float64, 1). The second argument here is a number instead of a string, which is not what parse() is designed for.
Encountering error messages can be daunting, but once you learn to recognize the common kinds of errors, you will be able to find the solution much more quickly!
MethodErrors are pretty common, and typically mean that the types of some of your arguments aren’t what you thought, or that the function you’re calling has no method for those inputs.
We’ll need another approach. But first, some background.
So far, we have talked about functions as if each function name refers to a single thing. In fact, the same function name can refer to a whole host of methods, which operate on different combinations of argument numbers or argument types.
The process that the language uses to decide which method to call is called “dispatch,” and one of the special sauces of Julia is that it uses multiple dispatch, which means that the types of all of the arguments, rather than only one of them, are used to decide on the method.
This is all somewhat abstract; let’s look at some concrete examples.
Below, I’m using the compact “assignment form” to create functions. Recall that, for example,
function foo(x)
return x + 2
end
is the same as
foo(x) = x + 2
typecaller(x) = "Hmm - I don't know that type: $(typeof(x))"
typecaller (generic function with 1 method)
Notice the output of this function definition: typecaller (generic function with 1 method).
typecaller(x::Float64) = "Oooh, a Float64, I like those!"
typecaller (generic function with 2 methods)
And now, the output is: typecaller (generic function with 2 methods).
We now have 2 methods for the same function, typecaller(). In defining the first method, we did not specify the type of x, so it will work on any type. Or, more technically, on any type that is a subtype of Any, which, as we learned in the types tutorial, includes all types.
For the second method, we used x::Float64; this is the syntax for specifying a type constraint on a method. Julia always dispatches to the most specific method that matches its arguments, so this is the method that will be called any time a Float64 is given as an argument.
typecaller(2.2)
"Oooh, a Float64, I like those!"
typecaller("2.2")
"Hmm - I don't know that type: String"
But other types of floats (e.g. 32-bit floats) don’t have more specific methods, so they fall back to the Any method.
typecaller(Float32(2.2))
"Hmm - I don't know that type: Float32"
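Every typecaller method so far dispatches on a single argument. The "multiple" in multiple dispatch means a method can be chosen from the types of several arguments at once. A minimal sketch with a hypothetical two-argument function, pairkind:

```julia
# Hypothetical function whose method is chosen from the types of BOTH arguments
pairkind(a::Number, b::Number) = "two numbers"
pairkind(a::AbstractString, b::AbstractString) = "two strings"
pairkind(a::Number, b::AbstractString) = "a number, then a string"
pairkind(a::AbstractString, b::Number) = "a string, then a number"

pairkind(1, 2.5)     # "two numbers"
pairkind("a", "b")   # "two strings"
pairkind(1, "b")     # "a number, then a string"
pairkind("a", 2)     # "a string, then a number"
```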
Often, we want to write methods as generically as possible. Recall that types exist in a hierarchy with Any at the top.
supertype(Float64)
AbstractFloat
supertype(Float32)
AbstractFloat
All individual objects have concrete types, but we can write generic methods that operate on abstract types like AbstractFloat.
For example, I could add a method that covers every kind of float:
typecaller(x::AbstractFloat) = "Hey look - some generic float type!"
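Adding that method changes where Float32 values end up, but not Float64 ones, because Julia picks the most specific applicable method. A self-contained sketch repeating the three typecaller definitions:

```julia
typecaller(x) = "Hmm - I don't know that type: $(typeof(x))"
typecaller(x::Float64) = "Oooh, a Float64, I like those!"
typecaller(x::AbstractFloat) = "Hey look - some generic float type!"

typecaller(2.2)            # Float64 is more specific than AbstractFloat, so the Float64 method wins
typecaller(Float32(2.2))   # no Float32 method; AbstractFloat is more specific than Any
typecaller(Float16(2.5))   # Float16 is also an AbstractFloat
typecaller("2.2")          # a String is not a float, so it falls back to the Any method
```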
Let’s get back to our example from before: we want to write a function that will use parse() to convert strings to numbers, but won’t fail when given a number.
First, let’s write a generic “fallback” method, that throws an informative error if it encounters something unexpected. When writing data cleaning functions, this can be extremely helpful, since it can flag problems in large datasets that would otherwise be a pain to track down.
# note - we don't specify a type - this is the same as `x::Any`
numberify(x) = error("Can't numberify $x, which has type $(typeof(x))")
numberify (generic function with 1 method)
Then, we’ll add a method for when the argument is a number that will just return the argument unchanged.
There are a bunch of number types, which you can explore by calling supertypes():
using InteractiveUtils
supertypes(Float64)
(Float64, AbstractFloat, Real, Number, Any)
supertypes(Int64)
(Int64, Signed, Integer, Real, Number, Any)
Here, I’ll use Real, since this covers all integer and floating-point types, but not complex numbers.
numberify(x::Real) = x
numberify (generic function with 2 methods)
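To double-check what the Real method will accept, the subtype operator <: can be used directly. One thing worth knowing: Bool is an Integer in Julia, so true and false would also pass through numberify unchanged.

```julia
# Which types will numberify(x::Real) return unchanged?
Int64 <: Real          # true
Float32 <: Real        # true
Rational{Int} <: Real  # true
Bool <: Real           # true: Bool <: Integer <: Real
ComplexF64 <: Real     # false: complex numbers are Numbers, but not Reals
ComplexF64 <: Number   # true
```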
Finally, we’ll define a method that works on AbstractStrings, calling parse(). To add a bit extra, we’ll check whether the string contains a ., in which case we’ll parse it as a float; otherwise, we’ll parse it as an integer.
If you ever need to do something like this in real code, just looking for a . is probably not sufficient, since floating point numbers can also be written as e.g. 6e10, or, in some datasets, with , as the decimal separator. A more robust alternative might be to use regular expressions.
function numberify(x::AbstractString)
if contains(x, ".")
return parse(Float64, x)
else
return parse(Int, x)
end
end
numberify (generic function with 3 methods)
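As the note above says, looking for a . is a fragile test. Here is a minimal sketch of the regular-expression alternative it mentions, written as a separate hypothetical function, numberify_regex, so it doesn't replace the method we just defined. It treats a string as an integer only if it is an optional sign followed by digits, and hands everything else to the float parser. Comma decimal separators are still not handled.

```julia
# Hypothetical variant: choose Int vs Float64 with a regex instead of a "." check
function numberify_regex(x::AbstractString)
    s = strip(x)                         # ignore surrounding whitespace
    if occursin(r"^[+-]?[0-9]+$", s)     # optional sign, then digits only
        return parse(Int, s)
    else
        # parse(Float64, ...) accepts forms like "23.2", "6e10" and "-1.5E-3",
        # and throws an error on anything it can't read
        return parse(Float64, s)
    end
end

numberify_regex("7")      # 7, an Int
numberify_regex("23.2")   # 23.2
numberify_regex("6e10")   # 6.0e10; the "." check would have sent this to parse(Int, ...)
```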
Recall our bad_data from before:
bad_data
6-element Vector{Any}:
1
2.3
"7"
'5': ASCII/Unicode U+0035 (category Nd: Number, decimal digit)
16
"23.2"
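Now, let’s try using numberify to fix it. The dot in numberify.(...) broadcasts the function over each element of the vector, much like map():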
fixed_data = numberify.(bad_data)
Can't numberify 5, which has type Char
Ahh, yes. Our data contains '5', which is a character (Char) rather than a string, and Char is not a subtype of AbstractString. Our fallback method caught it and gave us some useful information.
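If we hadn’t written that fallback, we would have seen a plain MethodError from numberify itself instead. And if our string method had accepted any type, the error would have come from contains(), which we used to check the contents of the string and which isn’t defined for Char: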
contains('5', ".")
MethodError: no method matching contains(::Char, ::String)
Closest candidates are:
contains(::Any)
@ Base strings/util.jl:186
contains(!Matched::AbstractString, ::Any)
@ Base strings/util.jl:129
There are a couple of approaches one could take, but one of the neat things about Julia’s approach to function methods is that we can use one method to convert arguments, then call a different method of the same function!
For example, we can define a method that works for Char by converting the argument to a string, then calling numberify() on the result.
numberify(x::Char) = numberify(string(x))
numberify (generic function with 4 methods)
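With that method in place, the broadcast call from before should now get through every element. A self-contained sketch that collects all four numberify methods so it can be run in a fresh session:

```julia
# Fallback: flag anything we don't know how to handle
numberify(x) = error("Can't numberify $x, which has type $(typeof(x))")

# Already a number: return it unchanged
numberify(x::Real) = x

# Strings: a "." means float, otherwise integer
function numberify(x::AbstractString)
    if contains(x, ".")
        return parse(Float64, x)
    else
        return parse(Int, x)
    end
end

# Characters: convert to a string, then reuse the string method
numberify(x::Char) = numberify(string(x))

bad_data = [1, 2.3, "7", '5', 16, "23.2"]
fixed_data = numberify.(bad_data)
# fixed_data holds 1, 2.3, 7, 5, 16, 23.2 ("7" and '5' became Ints)
```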
Writing functions, even simple anonymous functions, can be an extremely powerful and flexible tool in data exploration and analysis. Many times, you will encounter problems that are unique to your data, or problems that may have solutions in some package, but you don’t want to take the time to search, or you don’t want to take on a whole package for a simple solution.
In other languages, like R, there are good reasons for users to avoid writing their own functions: they are often slower, or can be very difficult to implement correctly. This is far less true in Julia.
Give it a shot!