# Reading and Writing Data
The first step of data analysis is to read data. Julia has a rich ecosystem of packages devoted to reading data. In this tutorial, we will cover four main file types:

- CSV (Comma Separated Values) files
- Excel files
- SAS data (`.sas7bdat` and `.xpt`) files
- Apache Arrow (`.arrow`) files
These likely constitute the bulk of the data types that you will encounter doing data wrangling in Julia.
If you find yourself with an exotic file type that you need to read data from, don't forget to check the JuliaData organization on GitHub. You might find just the right package for your needs.

Starting with the next tutorials, we'll use the PharmaDatasets.jl package to load our datasets.
## 1 CSV Files with CSV.jl
CSV (Comma Separated Values) files are probably the most common data type you will encounter while reading data. CSV files have the `.csv` extension in the filename.

This file type does not use a proprietary protocol: the data is hard-coded in a simple text file. It uses lines to denote rows (or observations), and frequently the first line is used as a header for the column/variable names. A specific character is used to separate values. This is often a comma (`,`), which is why they are called comma separated values.
Note that different characters can be used to separate values. In some European countries and also in most Latin American countries, the decimal number system uses commas to denote decimals instead of the dot used in the Anglo-Saxon number system, which is adopted by countries such as the United States, the United Kingdom, and Canada. Thus, \(\pi\) rounded to two decimal places, for example, would be written as \(3,14\) instead of \(3.14\).

So, in some countries, the CSV file uses the semicolon (`;`) as the character to separate values. Despite being "semicolon separated values", the file extension remains `.csv`.

Furthermore, the tab character (`\t`) is also used, but then the file type changes to TSV (Tab Separated Values) with the `.tsv` extension.
In order to read a CSV file into Julia, you'll need to first import CSV.jl into your environment. This is done with the `using` statement:
```julia
using CSV
```
### 1.1 Reading a Single CSV File
CSV.jl has two main functions to parse and read text-delimited data:

- `CSV.read()`: passes the input to a valid sink type that will be the recipient of the input data, such as a `DataFrame`, without making extra copies.
- `CSV.File()`: materializes the input to a valid sink type. This means that the input will be copied before being passed to the sink.

Our advice is to almost exclusively use `CSV.read()`. You'll probably never need to read a CSV file with `CSV.File()`.
Also, the most frequent sink you'll use is a `DataFrame` (which we will cover in the next tutorials). So, to read a CSV file into a `DataFrame`, you'll need to pass `CSV.read()` a file path followed by the sink type.

Since we need DataFrames.jl to use `DataFrame`s as sinks, let's import it:

```julia
using DataFrames
```
```julia
df = CSV.read("data/iv_sd_demogs.csv", DataFrame)
first(df, 5)
```
Row | ID | AGE | WEIGHT | SCR | ISMALE | eGFR |
---|---|---|---|---|---|---|
 | Int64 | Float64 | Float64 | Float64 | Int64 | Float64 |
1 | 1 | 34.823 | 38.212 | 1.1129 | 0 | 42.635 |
2 | 2 | 32.765 | 74.838 | 0.8846 | 1 | 126.0 |
3 | 3 | 35.974 | 37.303 | 1.1004 | 1 | 48.981 |
4 | 4 | 38.206 | 32.969 | 1.1972 | 1 | 38.934 |
5 | 5 | 33.559 | 47.139 | 1.5924 | 0 | 37.198 |
Note that we are using a string to indicate the relative file path of the desired CSV file.

A much better approach would be to create a file path that is robust to different filesystems and operating systems. This means that if a user on Windows shares their Julia script with another user on macOS, the script won't fail. macOS, for instance, follows a pattern that uses forward slashes, `/`, for file paths (example: `/Users/JuliaUser/project/src/script.jl`), while Windows uses backslashes, `\`, for file paths (example: `C:\user\JuliaUser\project\src\script.jl`).
This can be accomplished with filesystem best practices. Specifically, three functions/macros that are robust across operating systems are `@__FILE__`, `dirname()`, and `joinpath()`:

- `@__FILE__`: returns the file path of the current Julia script.
- `dirname()`: returns the directory portion of a file path.
- `joinpath()`: joins components to create a full file or directory path.
When we combine these three, we can create a robust file path that will not break our Julia script if we share it or run it on a different operating system.

Suppose you have a script inside a `src/` folder and you need to access a CSV file in the `data/` folder, which is located in the same directory as the `src/` folder. You can do this with the following Julia command:

```julia
my_file = joinpath(dirname(@__FILE__), "..", "data", "file.csv")
```
The code above extracts the directory of the current Julia script with `dirname(@__FILE__)` and then joins it with `".."` (which represents the parent directory), `"data"` (the `data/` directory), and `"file.csv"` (the desired CSV file inside the `data/` directory).

For more information, see the filesystem section of the official Julia documentation.
### 1.2 Custom CSV File Specifications
Most of the time, a plain `CSV.read()` call without any customized options will parse and read the desired file correctly. This is due to the guessing that CSV.jl does under the hood to infer the specifications that the file is using.

But CSV.jl might sometimes fail to correctly guess the specifications of the underlying file. This is where you'll have to pass the specifications yourself. Fortunately, `CSV.read()` has the following arguments:
- `delim`: either a character, e.g. `';'`, or a string, e.g. `"\t"`, that indicates how the values are delimited in the file. If not specified, CSV.jl will try to guess and detect the delimiter from the first 10 rows of the file.
- `decimal`: a character that will be used to parse floating-point numbers. If `','`, then `3,14` will be parsed as a float. The default is a dot, `'.'`.
See this example, where we are reading a CSV file that uses semicolons as delimiters and commas as decimals. This is a common specification when reading EU/LatAm CSV files.

```julia
# the first two lines of the CSV file
readlines("data/iv_sd_demogs_eu.csv")[1:2]
```

```
2-element Vector{String}:
 "ID;AGE;WEIGHT;SCR;ISMALE;eGFR"
 "1;34,823;38,212;1,1129;0;42,635"
```

```julia
# using the keyword arguments delim and decimal
df_eu = CSV.read("data/iv_sd_demogs_eu.csv", DataFrame; delim = ';', decimal = ',')
first(df_eu, 5)
```
Row | ID | AGE | WEIGHT | SCR | ISMALE | eGFR |
---|---|---|---|---|---|---|
 | Int64 | Float64 | Float64 | Float64 | Int64 | Float64 |
1 | 1 | 34.823 | 38.212 | 1.1129 | 0 | 42.635 |
2 | 2 | 32.765 | 74.838 | 0.8846 | 1 | 126.0 |
3 | 3 | 35.974 | 37.303 | 1.1004 | 1 | 48.981 |
4 | 4 | 38.206 | 32.969 | 1.1972 | 1 | 38.934 |
5 | 5 | 33.559 | 47.139 | 1.5924 | 0 | 37.198 |
Sometimes our CSV files have comments or information in the initial lines. These do not constitute the data, but might be hard-coded metadata or other important information.

To handle these situations, `CSV.File()` and `CSV.read()` have the keyword argument `skipto`, which accepts an integer specifying the row number where the data is located.

Similarly, if you have this kind of information or metadata located at the bottom of the CSV file, you can use the keyword argument `footerskip`, which also takes an integer, specifying the number of rows to ignore at the end of the file.
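As a minimal sketch (the file name and row numbers here are hypothetical, not one of this tutorial's datasets), suppose a file has two metadata lines before the header and one trailing metadata line; the header would then sit on row 3 and the data would start on row 4:

```julia
# hypothetical file: rows 1-2 hold metadata, row 3 is the header,
# the data starts on row 4, and the last row is a trailing comment
df_skipped = CSV.read(
    "data/commented_file.csv",
    DataFrame;
    header = 3,     # row that holds the column names
    skipto = 4,     # row where the data starts
    footerskip = 1, # number of rows to ignore at the end of the file
)
```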
### 1.3 Specifying Custom Types for Columns while Reading a CSV File
Sometimes you'll want to override the automatic type detection that CSV.jl infers for the columns present in the CSV file. This can be done with the keyword argument `types`. It accepts several inputs, but the easiest and most customizable is a Julia dictionary, where the keys are either integers (for column indices) or strings/symbols (for column names) and the values are the desired types.

For example, suppose we want the columns `:ID` and `:ISMALE` to be parsed as `String` and `Bool`, respectively:
```julia
df_custom_types = CSV.read(
    "data/iv_sd_demogs.csv",
    DataFrame;
    types = Dict(:ID => String, :ISMALE => Bool),
)
first(df_custom_types, 5)
```
Row | ID | AGE | WEIGHT | SCR | ISMALE | eGFR |
---|---|---|---|---|---|---|
 | String | Float64 | Float64 | Float64 | Bool | Float64 |
1 | 1 | 34.823 | 38.212 | 1.1129 | false | 42.635 |
2 | 2 | 32.765 | 74.838 | 0.8846 | true | 126.0 |
3 | 3 | 35.974 | 37.303 | 1.1004 | true | 48.981 |
4 | 4 | 38.206 | 32.969 | 1.1972 | true | 38.934 |
5 | 5 | 33.559 | 47.139 | 1.5924 | false | 37.198 |
### 1.4 Selecting and Dropping Columns while Reading a CSV File
By default, `CSV.read()` will parse and read all the columns in a CSV file. If you want to select or drop specific columns, you can do so with the following keyword arguments. Both accept a vector of either strings or integers representing, respectively, the names or indices of the columns to select or drop.

- `select`
- `drop`

See the following examples. We are reading the same CSV file as before, but now we are selecting and dropping some columns, either by name or by index.
```julia
# reading the same CSV file but now selecting some columns
# with a vector of strings as names
df_select_names = CSV.read("data/iv_sd_demogs.csv", DataFrame; select = ["ID", "AGE"])
first(df_select_names, 5)
```
Row | ID | AGE |
---|---|---|
 | Int64 | Float64 |
1 | 1 | 34.823 |
2 | 2 | 32.765 |
3 | 3 | 35.974 |
4 | 4 | 38.206 |
5 | 5 | 33.559 |
```julia
# reading the same CSV file but now selecting some columns
# with a vector of integers as indices
df_select_idxs = CSV.read("data/iv_sd_demogs.csv", DataFrame; select = [1, 2])
first(df_select_idxs, 5)
```
Row | ID | AGE |
---|---|---|
 | Int64 | Float64 |
1 | 1 | 34.823 |
2 | 2 | 32.765 |
3 | 3 | 35.974 |
4 | 4 | 38.206 |
5 | 5 | 33.559 |
Notice that we can be clever with the indices. If we want to select a range of indices, we can materialize it into a vector using `collect()`:

```julia
# reading the same CSV file but now selecting a range of columns
# materialized into a vector of indices
df_select_idxs2 = CSV.read("data/iv_sd_demogs.csv", DataFrame; select = collect(1:3))
first(df_select_idxs2, 5)
```
Row | ID | AGE | WEIGHT |
---|---|---|---|
 | Int64 | Float64 | Float64 |
1 | 1 | 34.823 | 38.212 |
2 | 2 | 32.765 | 74.838 |
3 | 3 | 35.974 | 37.303 |
4 | 4 | 38.206 | 32.969 |
5 | 5 | 33.559 | 47.139 |
Now let's explore some examples with the `drop` keyword argument:

```julia
# reading the same CSV file but now dropping some columns
# with a vector of strings as names
df_drop_names = CSV.read("data/iv_sd_demogs.csv", DataFrame; drop = ["SCR", "eGFR"])
first(df_drop_names, 5)
```
Row | ID | AGE | WEIGHT | ISMALE |
---|---|---|---|---|
 | Int64 | Float64 | Float64 | Int64 |
1 | 1 | 34.823 | 38.212 | 0 |
2 | 2 | 32.765 | 74.838 | 1 |
3 | 3 | 35.974 | 37.303 | 1 |
4 | 4 | 38.206 | 32.969 | 1 |
5 | 5 | 33.559 | 47.139 | 0 |
```julia
# reading the same CSV file but now dropping some columns
# with a vector of integers as indices
df_drop_idx = CSV.read("data/iv_sd_demogs.csv", DataFrame; drop = [4, 6])
first(df_drop_idx, 5)
```
Row | ID | AGE | WEIGHT | ISMALE |
---|---|---|---|---|
 | Int64 | Float64 | Float64 | Int64 |
1 | 1 | 34.823 | 38.212 | 0 |
2 | 2 | 32.765 | 74.838 | 1 |
3 | 3 | 35.974 | 37.303 | 1 |
4 | 4 | 38.206 | 32.969 | 1 |
5 | 5 | 33.559 | 47.139 | 0 |
You can also pass a vector of symbols to the `select` and `drop` arguments, as the sketch below shows. So feel free to choose whatever works best for you.
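For instance, the name-based selection from before could also be written with symbols (a minimal sketch reusing this tutorial's demographics file):

```julia
# same selection as the string-based example above, now with symbols
df_select_syms = CSV.read("data/iv_sd_demogs.csv", DataFrame; select = [:ID, :AGE])
first(df_select_syms, 5)
```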
### 1.5 Specifying Missing Values in a CSV File
By default, every blank value in a CSV file will be parsed as a `missing` value.

Sometimes missing values are hardcoded as some other value instead of being left blank. For example, it could be a dot (`"."`) or the string `"NA"`. In fact, any value can be specified to represent `missing`.
We cover how to handle missing/NA values extensively in our Handling NAs and Missing Values tutorial.
Take a look at this dataset. It has several occurrences of `"."`, which tricks Julia into parsing these columns as `String` types:

```julia
missing_df = CSV.read("data/iv_sd_final_target_format.csv", DataFrame)
first(missing_df, 5)
```
Row | ID | TIME_AFTER_DOSING | CONC | AMT | AGE | WEIGHT | SCR | ISMALE | eGFR | EVID | CMT |
---|---|---|---|---|---|---|---|---|---|---|---|
 | Int64 | Float64 | String15 | String3 | Float64 | Float64 | Float64 | Int64 | Float64 | Int64 | String1 |
1 | 1 | 0.0 | . | 100 | 34.823 | 38.212 | 1.1129 | 0 | 42.635 | 1 | 1 |
2 | 1 | 0.25 | 13.026 | . | 34.823 | 38.212 | 1.1129 | 0 | 42.635 | 0 | . |
3 | 1 | 0.5 | 14.984 | . | 34.823 | 38.212 | 1.1129 | 0 | 42.635 | 0 | . |
4 | 1 | 0.75 | 14.16 | . | 34.823 | 38.212 | 1.1129 | 0 | 42.635 | 0 | . |
5 | 1 | 1.0 | 19.316 | . | 34.823 | 38.212 | 1.1129 | 0 | 42.635 | 0 | . |
Fortunately, it is very easy to specify which values CSV.jl should treat as `missing`. You just have to use the keyword argument `missingstring`:

```julia
missing_df_corrected =
    CSV.read("data/iv_sd_final_target_format.csv", DataFrame; missingstring = ".")
first(missing_df_corrected, 5)
```
Row | ID | TIME_AFTER_DOSING | CONC | AMT | AGE | WEIGHT | SCR | ISMALE | eGFR | EVID | CMT |
---|---|---|---|---|---|---|---|---|---|---|---|
 | Int64 | Float64 | Float64? | Int64? | Float64 | Float64 | Float64 | Int64 | Float64 | Int64 | Int64? |
1 | 1 | 0.0 | missing | 100 | 34.823 | 38.212 | 1.1129 | 0 | 42.635 | 1 | 1 |
2 | 1 | 0.25 | 13.026 | missing | 34.823 | 38.212 | 1.1129 | 0 | 42.635 | 0 | missing |
3 | 1 | 0.5 | 14.984 | missing | 34.823 | 38.212 | 1.1129 | 0 | 42.635 | 0 | missing |
4 | 1 | 0.75 | 14.16 | missing | 34.823 | 38.212 | 1.1129 | 0 | 42.635 | 0 | missing |
5 | 1 | 1.0 | 19.316 | missing | 34.823 | 38.212 | 1.1129 | 0 | 42.635 | 0 | missing |
If you have more than one value representing missing values, the keyword argument `missingstring` also accepts a vector of strings.
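For example, a minimal sketch (the file path here is hypothetical) treating both `"."` and `"NA"` as missing:

```julia
# hypothetical file where missing values appear as either "." or "NA"
df_multi_missing = CSV.read("data/my_file.csv", DataFrame; missingstring = [".", "NA"])
```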
### 1.6 Reading Multiple CSV Files
You might encounter data that is dispersed across several CSV files. The organizing logic could be temporal, e.g. `results_jan_2021.csv`, spatial, e.g. `samples_boston.csv`, or anything else, e.g. `results_john.csv`.

With CSV.jl, you can simply pass a vector of strings to `CSV.read()` or `CSV.File()`, and it will concatenate all the CSV file paths into a single unit and materialize the associated files into the desired sink, such as a `DataFrame`.

If any information is encoded in the CSV file names, it will be lost unless you use the `source` keyword argument. When passing a vector of strings to `CSV.read()`, you can pass the argument `source` with a string or symbol. CSV.jl will then create a column named after that string/symbol, capturing the file name each observation came from, and add it to the final data passed to your sink:
```julia
df_multiple_csv = CSV.read(files, DataFrame; source = "DAY")
first(df_multiple_csv, 5)
```

where `files` is a vector of CSV file paths as strings.
All the arguments that we've used before (`delim`, `decimal`, `select`, and `drop`) work fine when `CSV.read()` receives a vector of strings representing multiple CSV file paths as input. So feel free to use them if necessary.

In the example below, we do just that, with `["data/multiple_csv/12-Dec-21.csv", "data/multiple_csv/13-Dec-21.csv"]` as the vector of strings.

```julia
files = filter(x -> endswith(x, ".csv"), readdir("data/multiple_csv/"; join = true))
```

```
2-element Vector{String}:
 "data/multiple_csv/12-Dec-21.csv"
 "data/multiple_csv/13-Dec-21.csv"
```

```julia
df_multiple_csv = CSV.read(files, DataFrame; source = "DAY")
first(df_multiple_csv, 5)
```
Row | ID | CONC | DAY |
---|---|---|---|
 | Int64 | Float64? | String |
1 | 1 | missing | data/multiple_csv/12-Dec-21.csv |
2 | 1 | 13.026 | data/multiple_csv/12-Dec-21.csv |
3 | 1 | 14.984 | data/multiple_csv/12-Dec-21.csv |
4 | 1 | 14.16 | data/multiple_csv/12-Dec-21.csv |
5 | 1 | 19.316 | data/multiple_csv/12-Dec-21.csv |
### 1.7 Writing CSV Files
In order to write CSV files, you'll use the `CSV.write()` function, which can be used in two ways:

- By passing a file path as a string as the first argument and a table (such as a `DataFrame`) as the second argument. For example:

  ```julia
  CSV.write("data/my_file.csv", my_df)
  ```

  where `my_df` is a `DataFrame`.

- By "piping" the table (such as a `DataFrame`) into the `CSV.write()` function while specifying the file path as a string as its first argument:

  ```julia
  my_df |> CSV.write("data/my_file.csv")
  ```

  where `my_df` is a `DataFrame`.
All the keyword arguments that we've used so far can also be used to write a specific CSV file format. For example:

```julia
my_df |> CSV.write("data/my_file.csv"; delim = ';', decimal = ',')
```

will create an EU/LatAm-style CSV file.
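Similarly, a tab-separated file can be written by just changing the delimiter (a minimal sketch with a hypothetical output path):

```julia
# write a TSV file by using the tab character as the delimiter
my_df |> CSV.write("data/my_file.tsv"; delim = '\t')
```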
## 2 Excel Files with XLSX.jl
Excel files are also ubiquitous as a data file format. Commonly used file extensions are `.xls` and `.xlsx`.

Prefer `.xlsx` over `.xls` files. First, `.xlsx` files are compressed by default, so they have smaller file sizes compared to `.xls`. Second, and most important, `.xls` files only support a maximum of 65,536 (\(2^{16}\)) rows and 256 (\(2^8\)) columns, whereas `.xlsx` files support a maximum of 1,048,576 (\(2^{20}\)) rows and 16,384 (\(2^{14}\)) columns. Finally, `.xls` files fail silently whenever that maximum capacity is exceeded.
In order to read an Excel file into Julia, you'll need to first import XLSX.jl into your environment. This is done with the `using` statement:

```julia
using XLSX
```
### 2.1 Reading Excel Files
XLSX.jl has two main functions to read Excel files:

- `XLSX.readxlsx("file.xlsx")`: reads the whole `file.xlsx` and returns a custom type named `XLSXFile`, which contains a summary of all the sheets present in the file along with their respective cell ranges. An `XLSXFile` works pretty much like a Julia dictionary: you can index it with a specific sheet name, and it will return another custom type named `Worksheet`. But don't worry! Just call `XLSX.eachtablerow()` on a worksheet and pipe it to a `DataFrame`, and you'll have a `DataFrame` for your desired sheet. See the example below:
```julia
xf = XLSX.readxlsx("data/iv_sd_demogs.xlsx")
```

```
XLSXFile("data/iv_sd_demogs.xlsx") containing 1 Worksheet
            sheetname size          range
-------------------------------------------------
         iv_sd_demogs 101x6         A1:F101
```

```julia
df = DataFrame(XLSX.eachtablerow(xf["iv_sd_demogs"]))
first(df, 5)
```
Row | ID | AGE | WEIGHT | SCR | ISMALE | eGFR |
---|---|---|---|---|---|---|
 | Any | Any | Any | Any | Any | Any |
1 | 1 | 34.823 | 38.212 | 1.1129 | 0 | 42.635 |
2 | 2 | 32.765 | 74.838 | 0.8846 | 1 | 126.0 |
3 | 3 | 35.974 | 37.303 | 1.1004 | 1 | 48.981 |
4 | 4 | 38.206 | 32.969 | 1.1972 | 1 | 38.934 |
5 | 5 | 33.559 | 47.139 | 1.5924 | 0 | 37.198 |
- `XLSX.readtable("file.xlsx", sheet)`: reads a specific sheet of `file.xlsx`. For the second argument, `sheet`, you can pass either a string with the sheet name or an integer representing the sheet index. See the following example:

```julia
df = DataFrame(XLSX.readtable("data/iv_sd_demogs.xlsx", 1))
first(df, 5)
```
Row | ID | AGE | WEIGHT | SCR | ISMALE | eGFR |
---|---|---|---|---|---|---|
 | Any | Any | Any | Any | Any | Any |
1 | 1 | 34.823 | 38.212 | 1.1129 | 0 | 42.635 |
2 | 2 | 32.765 | 74.838 | 0.8846 | 1 | 126.0 |
3 | 3 | 35.974 | 37.303 | 1.1004 | 1 | 48.981 |
4 | 4 | 38.206 | 32.969 | 1.1972 | 1 | 38.934 |
5 | 5 | 33.559 | 47.139 | 1.5924 | 0 | 37.198 |
### 2.2 Inferring Excel Column Types with `infer_eltypes`
You might have noticed that all the columns of the dataframes we created from Excel files have the type `Any`. This is the default behavior of both `XLSX.eachtablerow()` and `XLSX.readtable()`. You can change this by passing the keyword argument `infer_eltypes = true` to `XLSX.readtable()`:

```julia
df = DataFrame(XLSX.readtable("data/iv_sd_demogs.xlsx", 1; infer_eltypes = true))
first(df, 5)
```
Row | ID | AGE | WEIGHT | SCR | ISMALE | eGFR |
---|---|---|---|---|---|---|
 | Int64 | Float64 | Float64 | Float64 | Int64 | Float64 |
1 | 1 | 34.823 | 38.212 | 1.1129 | 0 | 42.635 |
2 | 2 | 32.765 | 74.838 | 0.8846 | 1 | 126.0 |
3 | 3 | 35.974 | 37.303 | 1.1004 | 1 | 48.981 |
4 | 4 | 38.206 | 32.969 | 1.1972 | 1 | 38.934 |
5 | 5 | 33.559 | 47.139 | 1.5924 | 0 | 37.198 |
XLSX.jl does not have keyword arguments or options to specify `missing` values. By default, any Excel cell that is blank will be considered `missing`.
### 2.3 Writing Excel Files
To write a `DataFrame` as an Excel file, you can use the `XLSX.writetable()` function. Be aware that this function takes the data and the column names as separate arguments. These can be obtained with `collect(eachcol(df))` and `names(df)`, where `df` is your dataframe. For example:

```julia
XLSX.writetable("filename.xlsx", collect(eachcol(df)), names(df))
```
Now, if you want to write data into separate worksheets, you can do so by passing keyword arguments where each name is a sheet name and each value is a tuple of data and column names. For example:

```julia
XLSX.writetable(
    "filename.xlsx",
    sheet_1 = (collect(eachcol(df1)), names(df1)),
    sheet_2 = (collect(eachcol(df2)), names(df2)),
)
```
`names`, like the same-named function used with `dplyr` in R, returns a vector of strings with the column names of the underlying dataframe.
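As a quick illustration with the demographics dataframe `df` read earlier in this tutorial:

```julia
# column names of the demographics data as a vector of strings
names(df) # returns ["ID", "AGE", "WEIGHT", "SCR", "ISMALE", "eGFR"]
```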
## 3 SAS Data Files with ReadStatTables.jl
We can read SAS files with the ReadStatTables.jl package. We will cover two of the most common SAS data file extensions:

- `.sas7bdat` files
- `.xpt` files

ReadStatTables.jl can read, as the name suggests, files from different statistical software, such as Stata, SPSS, SAS, and more. If you need to import data in a format different from the ones above, don't hesitate to check the ReadStatTables.jl documentation.

First, let's load ReadStatTables.jl:

```julia
using ReadStatTables
```
ReadStatTables.jl has the `readstat()` function, which takes a file path and returns a `ReadStatTable` object:

```julia
tb = readstat("data/iv_bolus_sd.sas7bdat")
```
```
300×6 ReadStatTable:
 Row │      id     time      conc       amt                  route     dose
     │ Float64  Float64  Float64?  Float64?  InlineStrings.String3  Float64
─────┼──────────────────────────────────────────────────────────────────────
   1 │     1.0      0.0   missing     250.0                     iv    250.0
   2 │     1.0      0.0   8.38202   missing                     iv    250.0
   3 │     1.0      0.5    7.6079   missing                     iv    250.0
   4 │     1.0      1.0   8.73121   missing                     iv    250.0
   5 │     1.0      2.0   5.84823   missing                     iv    250.0
   6 │     1.0      6.0   2.25315   missing                     iv    250.0
   7 │     1.0     10.0   1.17591   missing                     iv    250.0
   8 │     1.0     16.0  0.346308   missing                     iv    250.0
   ⋮ │    ⋮        ⋮         ⋮         ⋮               ⋮              ⋮
 294 │    30.0      1.0   22.1795   missing                     iv    750.0
 295 │    30.0      2.0   18.0047   missing                     iv    750.0
 296 │    30.0      6.0   8.33707   missing                     iv    750.0
 297 │    30.0     10.0   3.50049   missing                     iv    750.0
 298 │    30.0     16.0   1.05457   missing                     iv    750.0
 299 │    30.0     20.0  0.472131   missing                     iv    750.0
 300 │    30.0     24.0  0.242979   missing                     iv    750.0
                                                           285 rows omitted
```
which can be easily converted to a `DataFrame` by calling the `DataFrame()` constructor on it:

```julia
df_sas7bdat = DataFrame(tb)
first(df_sas7bdat, 5)
```
Row | id | time | conc | amt | route | dose |
---|---|---|---|---|---|---|
 | Float64 | Float64 | Float64? | Float64? | String3 | Float64 |
1 | 1.0 | 0.0 | missing | 250.0 | iv | 250.0 |
2 | 1.0 | 0.0 | 8.38202 | missing | iv | 250.0 |
3 | 1.0 | 0.5 | 7.6079 | missing | iv | 250.0 |
4 | 1.0 | 1.0 | 8.73121 | missing | iv | 250.0 |
5 | 1.0 | 2.0 | 5.84823 | missing | iv | 250.0 |
OK, that was for `.sas7bdat` files. We can do the same for `.xpt` files, but this time in a single line of code by combining the last two operations:

```julia
df_xpt = DataFrame(readstat("data/iv_bolus_sd.xpt"))
first(df_xpt, 5)
```
Row | id | time | conc | amt | route | dose |
---|---|---|---|---|---|---|
 | Float64 | Float64 | Float64? | Float64? | String3 | Float64 |
1 | 1.0 | 0.0 | missing | 250.0 | iv | 250.0 |
2 | 1.0 | 0.0 | 8.38202 | missing | iv | 250.0 |
3 | 1.0 | 0.5 | 7.6079 | missing | iv | 250.0 |
4 | 1.0 | 1.0 | 8.73121 | missing | iv | 250.0 |
5 | 1.0 | 2.0 | 5.84823 | missing | iv | 250.0 |
Note that the package is called ReadStatTables.jl, so it only performs reading of files, not writing. If you need to write a SAS file (or any other statistical software's proprietary file format) for some reason, you'll probably need to export to CSV or Excel and then convert to the desired file format with an external tool. A sketch of that export step follows.
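For example, a minimal sketch of the export step, reusing the `df_sas7bdat` dataframe from above (the output file name is just illustrative):

```julia
# write the SAS data read earlier to a plain CSV file,
# which external tools can then convert to other formats
CSV.write("data/iv_bolus_sd.csv", df_sas7bdat)
```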