Functions

Authors

Jose Storopoli

Kevin Bonham

Juan Oneto

At the highest level, all programming can be thought of as “things” and “actions.” The “things” are data like numbers or strings, or complex objects that contain other data like matrices.

The “actions” are functions. If you’ve done any programming, you’ve no doubt used functions that others have written. Even just displaying a number in the REPL implicitly requires a number of functions to be called. But you can go a long way programming in languages like R without ever writing a function yourself.

This can be nice, since it’s one less thing to think about. And really, you could also program in Julia without writing many functions. But it can also be stifling - if your problem doesn’t fit neatly into a function that someone else has written, you might get stuck. Any time that you’re going to repeat the same action more than two or three times, it’s probably worth writing a function to do it.

In Julia, writing functions is easy! And there is no fundamental difference between user-written functions and built-in ones, so the functions you write can be just as powerful.

1 ⚙️ Function basics

Functions are the parts of computer programs that DO things. Let’s look at how to use them in Julia.

1.1 🦾 Parts of functions

The 3 main components of a function are

  1. The function name: how we refer to the function, and how we “call” it (make it do the thing it does)
  2. The arguments: the “things” that the function performs its action(s) on.
  3. The return value: the final output of the function.

For example, in the following expression:

join(["Hello", "world!"], " ")
"Hello world!"

The function name is join(). There are two arguments: the vector ["Hello", "world!"] and the string " ". And the return value is a string, "Hello world!".

In Julia, function names can contain any alphanumeric characters (though they can’t start with a number), and arguments go inside parentheses, separated by commas.

There are a lot of nuances to each of these components that will appear to violate the assumptions above - for example “anonymous” functions can be unnamed, the number of arguments can be zero, and functions can return nothing. But we’ll get to that!

1.2 ✍️ Function definition syntax

Probably the most common way to write a function in Julia is using the function keyword. It looks like this:

function my_function(arg1, arg2, arg3)
    # stuff
    return nothing
end
my_function (generic function with 1 method)

In R, the equivalent would be something like:

my_function <- function(arg1, arg2, arg3) {
    # stuff
    NA
}

Tip

For very simple functions, Julia also has a one-line version for defining functions using the assignment (=) operator. This is also known as the compact “assignment form”. In other words,

function f(x, a, b, c)
    return a * x^2 + b * x + c
end

could instead be written:

f(x, a, b, c) = a * x^2 + b * x + c

Notice that this requires neither the function nor end keywords.
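
To check that the two forms really do define the same behavior, here is a minimal sketch. The names quadratic_long and quadratic_short are made up for this illustration, so that both definitions can exist side by side.

```julia
# Long form, using the `function` keyword
function quadratic_long(x, a, b, c)
    return a * x^2 + b * x + c
end

# Compact assignment form of the same function
quadratic_short(x, a, b, c) = a * x^2 + b * x + c

# Both evaluate 1 * 2^2 + 3 * 2 + 4
quadratic_long(2, 1, 3, 4) == quadratic_short(2, 1, 3, 4)  # true
```

The compact form tends to read best when the whole body is a single expression.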

1.3 🗣️ Function names

As mentioned above, function names can contain any alphanumeric characters (including Unicode characters!) plus underscores (_), though they cannot start with a number.

All of the following are valid function names:

  • tHe_BeSt_FuNcTiON()
  • th3w0r5t()
  • 😉()

By convention, function names in Julia use only lowercase letters and run words together without underscores, so myfunc() is preferred over my_func(). Underscores are acceptable when a name would otherwise be hard to read, but it is usually better to pick a shorter name that doesn’t need them.
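
As a quick sketch, the names listed above can all be defined and called like any other function. The function bodies here are placeholders invented for the illustration, and addone is a made-up example of a conventional name.

```julia
# Valid, though not conventional: mixed case with underscores
tHe_BeSt_FuNcTiON() = "mixed case and underscores"

# Valid: digits are allowed after the first character
th3w0r5t() = "digits after the first character"

# Valid: a Unicode (emoji) name
😉() = "an emoji"

# Conventional style: lowercase, no underscores
addone(x) = x + 1

addone(1)  # 2
```

A name that starts with a digit, such as 2fast, would not be accepted as a function name.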

1.4 🎰 Function arguments

Arguments are the values passed to the function, and are separated by commas in between the parentheses. Functions may take any number of arguments, including zero. There are two kinds of arguments in Julia, “positional” arguments, and “keyword” arguments.

1.4.1 Positional arguments

The arguments seen above are all examples of positional arguments (often called “args”), and as their name suggests, are determined by their position or order in the argument list. Be careful! It can be easy to confuse yourself if you name your variables and arguments the same way.

For example:

function afunction(thing1, thing2)
    return "Here's thing1: $thing1. Here's thing2: $thing2"
end
afunction (generic function with 1 method)
afunction(10, 20)
"Here's thing1: 10. Here's thing2: 20"
thing1 = 100
100
thing2 = 200
200
afunction(thing2, thing1)
"Here's thing1: 200. Here's thing2: 100"

When calling the function in the preceding cell, the variable thing2 was placed in the first position, which is the argument thing1. The variable thing1 was used as the second argument, thing2. Inside the function, only the arguments thing1 and thing2 are considered.

For this reason, we usually try to avoid naming variables and arguments with similar names.

1.4.2 Keyword arguments

Often called “kwargs”, keyword arguments are passed by name, so they can be given in any order, though all keyword arguments must come after all positional arguments. In Julia, the syntax to create a function with kwargs is to put them after a semicolon (;).

function withkwargs(pos1; kwarg1, kwarg2)
    return "Positional: $pos1. Keyword 1: $kwarg1. Keyword 2: $kwarg2."
end
withkwargs (generic function with 1 method)
withkwargs(1; kwarg2 = "world!", kwarg1 = "hello")
"Positional: 1. Keyword 1: hello. Keyword 2: world!."

Tip

When calling the function, the semicolon is not required. So, the call above could have been written withkwargs(1, kwarg2 = "world!", kwarg1 = "hello"). By convention, we usually encourage the use of ; to make the separation clear.
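
A minimal sketch of that equivalence, repeating the withkwargs definition from above so the block runs on its own:

```julia
function withkwargs(pos1; kwarg1, kwarg2)
    return "Positional: $pos1. Keyword 1: $kwarg1. Keyword 2: $kwarg2."
end

# Semicolon at the call site (the encouraged style)
with_semicolon = withkwargs(1; kwarg1 = "hello", kwarg2 = "world!")

# Comma at the call site, which Julia also accepts
with_comma = withkwargs(1, kwarg1 = "hello", kwarg2 = "world!")

with_semicolon == with_comma  # true
```

Note that kwarg1 and kwarg2 have no default values here, so both must be supplied. Leaving one out raises an UndefKeywordError.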

1.4.3 Default values

It is often useful to provide default values for function arguments. In Julia, this can be accomplished using = in the argument list.

function withdefaults(pos1 = 1; a = 'a', b = "other")
    return "Positional 1: $pos1. Keyword a: $a. Keyword b: $b"
end
withdefaults (generic function with 2 methods)
withdefaults()
"Positional 1: 1. Keyword a: a. Keyword b: other"
withdefaults("new!")
"Positional 1: new!. Keyword a: a. Keyword b: other"
withdefaults(a = 20)
"Positional 1: 1. Keyword a: 20. Keyword b: other"

You can provide defaults for all arguments, for none of them, or anything in between.

One caveat is that you cannot have a positional argument with a default followed by one without. In other words, you can do myfunc(a, b = 2, c = 3) but not myfunc(a = 1, b, c).
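
Here is a small sketch of the valid ordering. The function bodies are invented for the illustration. The invalid form is left commented out, since it is expected to be rejected when the definition is evaluated.

```julia
# Valid: positional arguments with defaults come last
myfunc(a, b = 2, c = 3) = a + b + c

myfunc(1)          # uses b = 2 and c = 3, giving 6
myfunc(1, 10)      # overrides b only, giving 14
myfunc(1, 10, 20)  # overrides both, giving 31

# Not valid: a default before a required positional argument.
# Uncommenting the next line is expected to raise an error.
# badfunc(a = 1, b, c) = a + b + c
```

Because positional arguments are matched by position, a call like myfunc(1, 10) could not tell which argument was skipped if defaults were allowed to come first.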

1.5 🎾 Return values

In Julia, all functions return something. If you do not explicitly use the return keyword, then the function will return the value of the last expression that is evaluated in the function.

For example:

function implicitreturn(a, b)
    x = a + b
    y = x^2
    y + 10
end
implicitreturn (generic function with 1 method)
implicitreturn(10, 20) # (10 + 20) ^2 + 10
910

This is identical to the result had we done return y + 10 on the last line.

If a return is encountered, however, the function will immediately exit, returning that value. For example:

function explicitreturn(a, b)
    x = a + b
    return y = x^2
    y + 10
end
explicitreturn (generic function with 1 method)
explicitreturn(10, 20)
900

Here, the y + 10 line is never evaluated. First, a + b is assigned to x. Then the next line, y = x^2, is evaluated, resulting in the value 900, which is returned.

Why would we ever do this? Sometimes, it can be useful to leave a function early, if some value is encountered.

function bail_on_odd(arg)
    if isodd(arg)
        return "That's odd!"
    end

    new_val = arg ÷ 2
    "Half of that value is $new_val"
end
bail_on_odd (generic function with 1 method)
bail_on_odd(4)
"Half of that value is 2"
bail_on_odd(3)
"That's odd!"

Note

It is good style to explicitly use return, rather than relying on the implicit return of the last expression. Also, if the point of your function is to do something where there isn’t an important return value, it is good practice to do return nothing at the end to signal this intent.
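
As a sketch of that last point, here is a function whose only job is to print something. The name logvalue is made up for this example.

```julia
function logvalue(x)
    println("The value is $x")
    return nothing  # signals that there is no meaningful return value
end

result = logvalue(42)
isnothing(result)  # true
```

Without the explicit return nothing, the function would implicitly return whatever println() returns. That also happens to be nothing, but spelling it out makes the intent clear to the reader.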

2 🥷 Anonymous functions

Many functions, especially those used in data science, take other functions as arguments. For example, the map() function takes a function as the first argument, and a collection as the second argument. It then calls the first argument on each item in the second argument, returning a vector of the results.

map(uppercase, ["hello", "world"])
2-element Vector{String}:
 "HELLO"
 "WORLD"

Sometimes, the function that we want doesn’t already exist. For example, suppose that we wanted to add an exclamation to the end of every string in a vector. We could write a named function, then use that in map:

function addexclamation(s)
    return string(s, "!")
end
addexclamation (generic function with 1 method)
addexclamation("hello")
"hello!"
addexclamation(3)
"3!"
map(addexclamation, ["hello", "world!"])
2-element Vector{String}:
 "hello!"
 "world!!"

But often, we are only planning to use the function once, and it is convenient to simply define the function right within the call to map. In Julia, we do this with the syntax:

args... -> expression

For example, the map example above could have been written as:

map(s -> string(s, "!"), ["hello", "world!"])
2-element Vector{String}:
 "hello!"
 "world!!"

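Anonymous functions are not limited to a single argument. As a small sketch, multiple arguments go in parentheses before the arrow, and an anonymous function can also be bound to a variable if you want to reuse it. The name shout is made up for this example.

```julia
# Two arguments: map walks both vectors in step
products = map((x, y) -> x * y, [1, 2, 3], [10, 20, 30])
# products is [10, 40, 90]

# An anonymous function bound to a variable
shout = s -> string(uppercase(s), "!")
shouted = map(shout, ["hello", "world"])
# shouted is ["HELLO!", "WORLD!"]
```
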
2.0.1 do blocks

Sometimes, our anonymous function is a bit more complicated than what can fit on one line comfortably. For example, in the example above, perhaps we don’t want to add ! if the string already ends with !.

One way to make an expression take up multiple lines is using a begin block we’ve seen before:

map(s -> begin
    if endswith(s, "!")
        return s
    else
        return string(s, "!")
    end
end, ["hello", "world!"])
2-element Vector{String}:
 "hello!"
 "world!"

But, because this is such a common pattern, there’s a special syntax using the keyword do.

This works if the first argument is a function, and in generic terms,

first_arg_func(x -> #= do stuff =#, other_args...)

is the same as

first_arg_func(other_args...) do x
  #= do stuff =#
end

So the multi-line map above could have been written a bit more clearly as:

map(["hello", "world!"]) do s
    if endswith(s, "!")
        return s
    else
        return string(s, "!")
    end
end
2-element Vector{String}:
 "hello!"
 "world!"

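The do syntax is not specific to map(). As a sketch, here is the same pattern with filter(), which also takes a function as its first argument. The words vector is invented for the example.

```julia
words = ["hello", "world!", "julia", "wow!"]

# Keep only the strings that end with "!"
excited = filter(words) do s
    endswith(s, "!")
end
# excited is ["world!", "wow!"]

# The equivalent call with an ordinary anonymous function
excited == filter(s -> endswith(s, "!"), words)  # true
```
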
3 Dispatch

Up until now, we have not mentioned types in our discussion of functions. In many cases, we can (and should!) write functions without specifying the types of the arguments. Writing generic code is a good thing, as well as being expedient.

But sometimes, we want a function to behave differently depending on the types of the arguments we pass to it.

For example, suppose we have some data that should contain only numbers, but sometimes it got incorrectly parsed. This is based on a problem one of us actually encountered in a dataset.

bad_data = [1, 2.3, "7", '5', 16, "23.2"]
6-element Vector{Any}:
  1
  2.3
   "7"
   '5': ASCII/Unicode U+0035 (category Nd: Number, decimal digit)
 16
   "23.2"

We’d like to convert all of these values into the correct number types. Julia has a function designed to convert strings to numbers, called parse():

parsed_int = parse(Int, "5")
5
typeof(parsed_int)
Int64

So what happens if we use that function on all of our inputs? Recall the map() function can be used to apply a function to all of the elements of a collection:

map(x -> parse(Float64, x), bad_data)
MethodError: no method matching parse(::Type{Float64}, ::Int64)

Closest candidates are:
  parse(::Type{T}, !Matched::AbstractString; kwargs...) where T<:Real
   @ Base parse.jl:384

Stacktrace:
 [1] (::var"#9#10")(x::Int64)
   @ Main.Notebook ~/_work/PumasTutorials.jl/PumasTutorials.jl/tutorials/DataWranglingInJulia/03-functions.qmd:502
 [2] iterate
   @ ./generator.jl:47 [inlined]
 [3] _collect(c::Vector{Any}, itr::Base.Generator{Vector{Any}, var"#9#10"}, #unused#::Base.EltypeUnknown, isz::Base.HasShape{1})
   @ Base ./array.jl:802
 [4] collect_similar(cont::Vector{Any}, itr::Base.Generator{Vector{Any}, var"#9#10"})
   @ Base ./array.jl:711
 [5] map(f::Function, A::Vector{Any})
   @ Base ./abstractarray.jl:3263
 [6] top-level scope
   @ ~/_work/PumasTutorials.jl/PumasTutorials.jl/tutorials/DataWranglingInJulia/03-functions.qmd:502

We get a MethodError, which usually means that something about the types of our inputs is incorrect. In this case, upon reaching the first element of the vector, we tried to call parse(Float64, 1). The second argument here is a number instead of a string, which is not what parse() is designed for.
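
If you want to check ahead of time whether a call like this has a matching method, one option is the built-in applicable() function. The sketch below also shows catching the error with try/catch. This is just an illustration of what the error means, not the approach taken in the rest of this tutorial.

```julia
# Is there a `parse` method for these argument types?
applicable(parse, Float64, "2.3")  # true: the second argument is a string
applicable(parse, Float64, 1)      # false: no method takes an Int here

# Catching the error instead of letting it stop the program
result = try
    parse(Float64, 1)
catch err
    err isa MethodError ? "caught a MethodError" : rethrow()
end
```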

Tip

Encountering error messages can be daunting, but once you learn to recognize the common kinds of errors, you will be able to find the solution much more quickly!

MethodErrors are pretty common, and typically mean that the types of some of your arguments aren’t what you thought, or that the function you’re calling doesn’t work on the inputs you gave it.

We’ll need another approach. But first, some background.

3.1 Methods vs Functions

So far, we have talked about functions as if each function name refers to a single thing. In fact, the same function name can refer to a whole host of methods, which operate on different combinations of argument numbers or argument types.

The process that the language uses to decide which method to call is called “dispatch,” and one of the special sauces of Julia is that it uses multiple dispatch, which means that the types of all of the arguments, rather than only one of them, are used to decide on the method.

This is all somewhat abstract; let’s look at some concrete examples.
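
As a first sketch of what “the types of all of the arguments” means, here is a function with three methods that differ in the types of both arguments. The name describe_pair is made up for this illustration, and the ::Type annotations restrict which argument types each method accepts.

```julia
function describe_pair(a::Number, b::Number)
    return "two numbers"
end

function describe_pair(a::AbstractString, b::AbstractString)
    return "two strings"
end

function describe_pair(a::Number, b::AbstractString)
    return "a number, then a string"
end

describe_pair(1, 2.5)    # "two numbers"
describe_pair("a", "b")  # "two strings"
describe_pair(1, "b")    # "a number, then a string"
```

There is no method for a string followed by a number, so describe_pair("a", 1) would throw a MethodError.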

Note

Below, I’m using the compact “assignment form” to create functions. Recall that, for example,

function foo(x)
    return x + 2
end

is the same as

foo(x) = x + 2

typecaller(x) = "Hmm - I don't know that type: $(typeof(x))"
typecaller (generic function with 1 method)

Notice the output of this function definition: typecaller (generic function with 1 method).

typecaller(x::Float64) = "Oooh, a Float64, I like those!"
typecaller (generic function with 2 methods)

And now, the output is: typecaller (generic function with 2 methods).

We now have 2 methods for the same function, typecaller(). In defining the first method, we did not specify the type of x, so it will work on any type. Or, more technically, any type that is a subtype of Any, which as we learned in the types tutorial, is all types.

For the second method, we used x::Float64 - this is the syntax for specifying a type constraint on a method. Julia will always dispatch to the most specific method for its arguments, so this is the method that would be called any time a Float64 is given as an argument.

typecaller(2.2)
"Oooh, a Float64, I like those!"
typecaller("2.2")
"Hmm - I don't know that type: String"

But other types of floats (e.g. 32-bit floats) don’t have more specific methods, so they fall back to the Any method.

typecaller(Float32(2.2))
"Hmm - I don't know that type: Float32"

Often, we want to write methods as generically as possible. Recall that types exist in a hierarchy with Any at the top.

supertype(Float64)
AbstractFloat
supertype(Float32)
AbstractFloat

All individual objects have concrete types, but we can write generic methods that operate on abstract types like AbstractFloat.

For example, I could define a method that accepts any floating-point type:

typecaller(x::AbstractFloat) = "Hey look - some generic float type!"
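
To see how such a method would slot in, here is a sketch that repeats the earlier typecaller() definitions so the block runs on its own, then calls the function with a few different types.

```julia
typecaller(x) = "Hmm - I don't know that type: $(typeof(x))"
typecaller(x::Float64) = "Oooh, a Float64, I like those!"
typecaller(x::AbstractFloat) = "Hey look - some generic float type!"

typecaller(2.2)           # Float64 is the most specific match
typecaller(Float32(2.2))  # no Float32 method, so AbstractFloat is used
typecaller("2.2")         # not a float at all, so the fallback is used

length(methods(typecaller))  # 3
```

Float64 values still reach the Float64 method, because it is more specific than AbstractFloat. Only the other float types pick up the new method.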

3.2 Write methods, not functions

Let’s get back to our example from before - we want to write a function that will use parse() to convert strings to numbers, but won’t fail when given a number.

First, let’s write a generic “fallback” method, that throws an informative error if it encounters something unexpected. When writing data cleaning functions, this can be extremely helpful, since it can flag problems in large datasets that would otherwise be a pain to track down.

# note - we don't specify a type - this is the same as `x::Any`
numberify(x) = error("Can't numberify $x, which has type $(typeof(x))")
numberify (generic function with 1 method)

Then, we’ll add a method for when the argument is a number that will just return the argument unchanged.

There are a bunch of number types, which you can explore by calling supertypes():

using InteractiveUtils
supertypes(Float64)
(Float64, AbstractFloat, Real, Number, Any)
supertypes(Int64)
(Int64, Signed, Integer, Real, Number, Any)

Here, I’ll use Real, since this covers all integer and float types, but not complex numbers.

numberify(x::Real) = x
numberify (generic function with 2 methods)

Finally, we’ll define a method that works on AbstractStrings, calling parse(). To add a bit extra, we’ll check if the number has a ., in which case, we’ll parse it as a float, otherwise we’ll parse it as an integer.

Tip

If you ever need to do something like this in real code, just looking for a . is probably not sufficient, since floating point numbers can also be written as e.g. 6e10, or in some datasets, using , for decimals etc.

A more robust alternative might be to use regular expressions.

function numberify(x::AbstractString)
    if contains(x, ".")
        return parse(Float64, x)
    else
        return parse(Int, x)
    end
end
numberify (generic function with 3 methods)
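
The tip above mentions regular expressions as a more robust alternative. Here is one possible sketch, under a hypothetical name (numberify_regex) so that it doesn’t interfere with the numberify() methods in this tutorial. The patterns are my own assumptions about what counts as an integer or a float, and they do not handle , as a decimal separator.

```julia
function numberify_regex(x::AbstractString)
    s = strip(x)  # ignore surrounding whitespace
    if occursin(r"^[+-]?\d+$", s)
        # only digits, with an optional sign: an integer
        return parse(Int, s)
    elseif occursin(r"^[+-]?(\d+\.?\d*|\.\d+)([eE][+-]?\d+)?$", s)
        # digits with a decimal point and/or an exponent: a float
        return parse(Float64, s)
    else
        error("Can't numberify \"$x\"")
    end
end

numberify_regex("7")     # an Int
numberify_regex("23.2")  # a Float64
numberify_regex("6e10")  # a Float64, even though there is no "."
```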

Recall our bad_data from before:

bad_data
6-element Vector{Any}:
  1
  2.3
   "7"
   '5': ASCII/Unicode U+0035 (category Nd: Number, decimal digit)
 16
   "23.2"

Now, let’s try using numberify to fix it:

fixed_data = numberify.(bad_data)
ErrorException: Can't numberify 5, which has type Char
Stacktrace:
  [1] error(s::String)
    @ Base ./error.jl:35
  [2] numberify(x::Char)
    @ Main.Notebook ~/_work/PumasTutorials.jl/PumasTutorials.jl/tutorials/DataWranglingInJulia/03-functions.qmd:642
  [3] _broadcast_getindex_evalf
    @ ./broadcast.jl:683 [inlined]
  [4] _broadcast_getindex
    @ ./broadcast.jl:656 [inlined]
  [5] getindex
    @ ./broadcast.jl:610 [inlined]
  [6] copyto_nonleaf!(dest::Vector{Real}, bc::Base.Broadcast.Broadcasted{Base.Broadcast.DefaultArrayStyle{1}, Tuple{Base.OneTo{Int64}}, typeof(numberify), Tuple{Base.Broadcast.Extruded{Vector{Any}, Tuple{Bool}, Tuple{Int64}}}}, iter::Base.OneTo{Int64}, state::Int64, count::Int64)
    @ Base.Broadcast ./broadcast.jl:1068
  [7] restart_copyto_nonleaf!(newdest::Vector{Real}, dest::Vector{Int64}, bc::Base.Broadcast.Broadcasted{Base.Broadcast.DefaultArrayStyle{1}, Tuple{Base.OneTo{Int64}}, typeof(numberify), Tuple{Base.Broadcast.Extruded{Vector{Any}, Tuple{Bool}, Tuple{Int64}}}}, val::Float64, I::Int64, iter::Base.OneTo{Int64}, state::Int64, count::Int64)
    @ Base.Broadcast ./broadcast.jl:1059
  [8] copyto_nonleaf!(dest::Vector{Int64}, bc::Base.Broadcast.Broadcasted{Base.Broadcast.DefaultArrayStyle{1}, Tuple{Base.OneTo{Int64}}, typeof(numberify), Tuple{Base.Broadcast.Extruded{Vector{Any}, Tuple{Bool}, Tuple{Int64}}}}, iter::Base.OneTo{Int64}, state::Int64, count::Int64)
    @ Base.Broadcast ./broadcast.jl:1075
  [9] copy
    @ ./broadcast.jl:920 [inlined]
 [10] materialize(bc::Base.Broadcast.Broadcasted{Base.Broadcast.DefaultArrayStyle{1}, Nothing, typeof(numberify), Tuple{Vector{Any}}})
    @ Base.Broadcast ./broadcast.jl:873
 [11] top-level scope
    @ ~/_work/PumasTutorials.jl/PumasTutorials.jl/tutorials/DataWranglingInJulia/03-functions.qmd:705

Ahh, yes. Our data contains '5', which is a character (Char), and Char is not a subtype of AbstractString. Our fallback method caught it, and gave us some useful information.

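If we had instead written a single untyped numberify(x) that went straight to the string-handling code, we would have seen a less informative MethodError from inside the function, since contains(), which we used to check the contents of the string, isn’t defined for Char.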

contains('5', ".")
MethodError: no method matching contains(::Char, ::String)

Closest candidates are:
  contains(!Matched::AbstractString, ::Any)
   @ Base strings/util.jl:110
  contains(::Any)
   @ Base strings/util.jl:167

Stacktrace:
 [1] top-level scope
   @ ~/_work/PumasTutorials.jl/PumasTutorials.jl/tutorials/DataWranglingInJulia/03-functions.qmd:721

There are a couple of approaches one could take, but one of the neat things about Julia’s approach to methods is that we can use one method to convert arguments, then call a different method of the same function!

For example, we can define a method that works for Char by converting the argument to a string, then calling numberify() on the result.

numberify(x::Char) = numberify(string(x))
numberify (generic function with 4 methods)
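
With that fourth method in place, broadcasting numberify() over bad_data should now succeed. The sketch below repeats all four methods and the data from above so that it runs on its own.

```julia
# Fallback: informative error for anything unexpected
numberify(x) = error("Can't numberify $x, which has type $(typeof(x))")

# Real numbers pass through unchanged
numberify(x::Real) = x

# Strings are parsed as Float64 if they contain ".", otherwise as Int
function numberify(x::AbstractString)
    if contains(x, ".")
        return parse(Float64, x)
    else
        return parse(Int, x)
    end
end

# Characters are converted to strings, then handled by the method above
numberify(x::Char) = numberify(string(x))

bad_data = [1, 2.3, "7", '5', 16, "23.2"]
fixed_data = numberify.(bad_data)
# fixed_data holds the numbers 1, 2.3, 7, 5, 16, and 23.2
```

Anything that is still not a number, string, or character (a missing value, say) would reach the fallback method and raise the informative error.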

4 💭 Final Thoughts

Writing functions, even simple anonymous functions, can be an extremely powerful and flexible tool in data exploration and analysis. Many times, you will encounter problems that are unique to your data, or problems that may have solutions in some package, but you don’t want to take the time to search, or you don’t want to take on a whole package for a simple solution.

In other languages, like R, there are often good reasons for users to avoid writing their own functions: user-written functions can be slower, or can be very difficult to implement correctly. This is far less true in Julia.

Give it a shot!