R has no consistent table class
And neither do Python and Rust
The usual case
R, in addition to being array-based, can also be table-based: it has a table class in the base language, `data.frame`. This is great, because a lot of data comes in table form.
Here are some simple examples:
twocols <- data.frame(
a = rep(1:3, 4),
b = rep(1:2, 6)
)
twocols
## a b
## 1 1 1
## 2 2 2
## 3 3 1
## 4 1 2
## 5 2 1
## 6 3 2
## 7 1 1
## 8 2 2
## 9 3 1
## 10 1 2
## 11 2 1
## 12 3 2
onecol <- data.frame(
a = rep(1, 5)
)
onecol
## a
## 1 1
## 2 1
## 3 1
## 4 1
## 5 1
One thing we can do with these tables is to look for, or remove, duplicate rows:
duplicated(twocols)
## [1] FALSE FALSE FALSE FALSE FALSE FALSE TRUE TRUE TRUE TRUE TRUE TRUE
duplicated(onecol)
## [1] FALSE TRUE TRUE TRUE TRUE
unique(twocols)
## a b
## 1 1 1
## 2 2 2
## 3 3 1
## 4 1 2
## 5 2 1
## 6 3 2
unique(onecol)
## a
## 1 1
Simple enough, right? Right.
The edge case
Now let’s try this with a data frame with no columns. This is something that R allows, so this should work as expected.
nocol <- onecol[, FALSE, drop = FALSE]
nocol
## data frame with 0 columns and 5 rows
Now, what do we expect to happen when we look for duplicates? Well, every row is the same, so every row after the first is a duplicate, so `unique` should leave a single row. The fact that the rows contain no data is irrelevant. What actually happens?
duplicated(nocol)
## logical(0)
`duplicated` returns a zero-length vector, as if there were no rows. This results in R claiming there are no rows after removing duplicates:
unique(nocol)
## data frame with 0 columns and 0 rows
Uh oh. How about if we only have the one row to begin with?
nocol_onerow <- nocol[1, , drop = FALSE]
nocol_onerow
## data frame with 0 columns and 1 row
duplicated(nocol_onerow)
## logical(0)
unique(nocol_onerow)
## data frame with 0 columns and 0 rows
Oh, dear.
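To pin down what we expected, here is a minimal sketch of duplicate detection written straight from the definition: a row is a duplicate if it equals some earlier row, column by column. The names `rows_equal` and `naive_duplicated` are made up for this illustration, and the quadratic loop is only meant to show the semantics, not to be efficient. With no columns to compare, `all()` of an empty vector is `TRUE`, so every pair of rows counts as equal.

```r
# Hypothetical helpers, for illustration only.

# Two rows are equal if they agree in every column.
# With zero columns, vapply() returns logical(0), and all(logical(0)) is TRUE.
rows_equal <- function(df, i, j) {
  all(vapply(df, function(col) identical(col[[i]], col[[j]]), logical(1)))
}

# A row is a duplicate if it equals any earlier row.
naive_duplicated <- function(df) {
  vapply(
    seq_len(nrow(df)),
    function(i) {
      earlier <- seq_len(i - 1L)
      any(vapply(earlier, function(j) rows_equal(df, i, j), logical(1)))
    },
    logical(1)
  )
}

nocol <- data.frame(a = rep(1, 5))[, FALSE, drop = FALSE]
naive_duplicated(nocol)
## [1] FALSE  TRUE  TRUE  TRUE  TRUE
```

This is the answer the rest of the post treats as correct: the first row is kept, and every later row is a duplicate of it.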
Why this matters
In practice, a table with no columns is not going to turn up much, so you could argue that this doesn’t matter. However, it should matter, if nothing else, for reasons of consistency: if we’re working programmatically, we have no idea what dimension of table we’re working with.
In fact, I’ve run into this problem multiple times when writing the `autodb` package for decomposing a data table into a partially-normalised database.
A database is composed of several relations, which are tables with some additional information. One piece of additional information is the relation’s (candidate) keys, which are sets of the columns that, together, uniquely determine the rows. Each row has a unique set of values for the key’s columns; vice versa, knowing the values for the key’s columns determines which row we’re looking at.
When turning a table of real data into a database, you can get a relation with an empty key. This happens when a column has the same value in every row: its value is constant, and determinable with no information. Such a relation can only have 0 or 1 rows, since an empty key can’t distinguish between multiple rows.
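As a small illustration of how an empty key arises, the sketch below finds the constant columns of a made-up table and splits them off into their own relation. The table `readings` and its column names are invented for this example, and this is not how `autodb` itself is implemented.

```r
# Hypothetical example data: `country` has the same value in every row.
readings <- data.frame(
  country = rep("UK", 4),
  station = c("A", "A", "B", "B"),
  temp = c(11.5, 12.0, 9.8, 10.1)
)

# A column is constant if it has at most one distinct value.
is_constant <- vapply(
  readings,
  function(col) length(unique(col)) <= 1L,
  logical(1)
)
constant_cols <- names(readings)[is_constant]
constant_cols
## [1] "country"

# The relation holding the constant columns collapses to a single row...
constant_rel <- unique(readings[, constant_cols, drop = FALSE])
nrow(constant_rel)
## [1] 1

# ... and nothing is needed to identify that row, so its key is empty.
constant_key <- character()
```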
There are a few reasons an empty key is a problem in R, given how we saw its data frames deal with this case, but let’s take the example where we’re checking that a given database is valid. One thing we need to check is that the columns in each key of a relation have unique values over its rows.
For example, suppose `twocols` above has both of its columns as its sole key. Does the key have unique values over its rows? No, because there are duplicates:
twocols_key <- c("a", "b")
anyDuplicated(twocols[, twocols_key, drop = FALSE]) # returns 0 if unique
## [1] 7
However, removing the duplicates makes the key values unique:
anyDuplicated(unique(twocols)[, twocols_key, drop = FALSE])
## [1] 0
Now, let’s try validating a valid table with an empty key, which can only have 0 or 1 rows:
v <- data.frame(a = 1L, b = 2L, c = FALSE)
v
## a b c
## 1 1 2 FALSE
v_key <- character()
anyDuplicated(v[, v_key, drop = FALSE]) # the right answer...
## [1] 0
duplicated(v[, v_key, drop = FALSE]) # ... for the wrong reason
## logical(0)
How about if that table invalidly has multiple rows?
u <- data.frame(a = c(1L, 2L), b = c(2L, 3L), c = c(FALSE, TRUE))
u
## a b c
## 1 1 2 FALSE
## 2 2 3 TRUE
u_key <- character()
anyDuplicated(u[, u_key, drop = FALSE]) # the wrong answer...
## [1] 0
duplicated(u[, u_key, drop = FALSE]) # ... for the wrong reason
## logical(0)
This shows that we can run into this problem, even when dealing with realistic data. This is clearly a problem when writing a library that models databases! I end up having to write nasty code like this:
dups <- if (length(u_key) == 0) {
  if (nrow(u) == 0)
    logical() # length 0 boolean vector
  else
    c(FALSE, rep(TRUE, nrow(u) - 1))
} else
  duplicated(u[, u_key, drop = FALSE])
dups
## [1] FALSE TRUE
u[dups, , drop = FALSE]
## a b c
## 2 2 3 TRUE
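For the validity check itself, the special case can be hidden behind a small helper. This is only a sketch: `key_is_unique` is a hypothetical name, not a function from `autodb`, and it handles the empty key by checking the row count directly instead of calling `anyDuplicated`.

```r
# Hypothetical helper: does `key` take unique values over the rows of `df`?
key_is_unique <- function(df, key) {
  if (length(key) == 0L) {
    # An empty key cannot distinguish rows, so it only holds for 0 or 1 rows.
    return(nrow(df) <= 1L)
  }
  anyDuplicated(df[, key, drop = FALSE]) == 0L
}

v <- data.frame(a = 1L, b = 2L, c = FALSE)
u <- data.frame(a = c(1L, 2L), b = c(2L, 3L), c = c(FALSE, TRUE))

key_is_unique(v, character()) # one row: valid
## [1] TRUE
key_is_unique(u, character()) # two rows: invalid
## [1] FALSE
key_is_unique(u, c("a", "b")) # non-empty keys go through anyDuplicated
## [1] TRUE
```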
Tibbles are inconsistent
OK, R’s base `data.frame` class is inconsistent, but people also like to use the `tibble` and `data.table` classes instead, from their eponymous libraries. Do they do any better?
Here’s `tibble`:
library(tibble)
nocol_tib <- as_tibble(nocol) # should be 5x0
nocol_tib
## # A tibble: 5 × 0
nocol_onerow_tib <- as_tibble(nocol_onerow) # should be 1x0
nocol_onerow_tib
## # A tibble: 1 × 0
The row counts are preserved, as before.
duplicated(nocol_tib)
## logical(0)
try(unique(nocol_tib))
## Error in x[!duplicated(x, fromLast = fromLast, ...), , drop = FALSE] :
## Can't subset rows with `!duplicated(x, fromLast = fromLast, ...)`.
## ✖ Logical subscript `!duplicated(x, fromLast = fromLast, ...)` must be size 1 or 5, not 0.
duplicated(nocol_onerow_tib)
## logical(0)
try(unique(nocol_onerow_tib))
## Error in x[!duplicated(x, fromLast = fromLast, ...), , drop = FALSE] :
## Can't subset rows with `!duplicated(x, fromLast = fromLast, ...)`.
## ✖ Logical subscript `!duplicated(x, fromLast = fromLast, ...)` must be size 1 or 1, not 0.
Asking for unique rows, however, returns an error. That’s no good, although it’s probably better than the base data frames silently doing the wrong thing.
Data tables are inconsistent
How about `data.table`?
library(data.table)
nocol_dt <- as.data.table(nocol) # should be 5x0
nocol_dt
## Null data.table (0 rows and 0 cols)
nocol_onerow_dt <- as.data.table(nocol_onerow) # should be 1x0
nocol_onerow_dt
## Null data.table (0 rows and 0 cols)
As much as I like `data.table` over `tibble`, this is even worse: the rows are all dropped on conversion. Creating the table directly as a `data.table`, instead of converting from a `data.frame`, makes no difference.
Arrow tables are inconsistent
Another table class is provided by `arrow`, which is an interface for Apache’s Arrow C++ library. How does `arrow` do?
library(arrow)
##
## Attaching package: 'arrow'
## The following object is masked from 'package:utils':
##
## timestamp
nocol_arw <- as_arrow_table(nocol)
nocol_arw
## Table
## 0 rows x 0 columns
##
##
## See $metadata for additional Schema metadata
nocol_onerow_arw <- as_arrow_table(nocol_onerow)
nocol_onerow_arw
## Table
## 0 rows x 0 columns
##
##
## See $metadata for additional Schema metadata
Not well: all rows are dropped, as they were for `data.table`.
try(duplicated(nocol_arw))
## Error in duplicated.default(nocol_arw) :
## duplicated() applies only to vectors
unique(nocol_arw)
## Table (query)
##
##
## See $.data for the source Arrow object
try(duplicated(nocol_onerow_arw))
## Error in duplicated.default(nocol_onerow_arw) :
## duplicated() applies only to vectors
unique(nocol_onerow_arw)
## Table (query)
##
##
## See $.data for the source Arrow object
Furthermore, `duplicated` can’t be used at all, because there’s no `duplicated` method for Arrow tables, only one for `unique`.
Files and file-driven table classes aren’t consistent either
We’ve looked at table classes within an R session. How well do file formats, and the packages that read and write them, cope with zero columns?
We look at four formats here:
- `csv`, handled with basic R read/write functions, and the `parquetize` and `vroom` packages;
- `feather`, handled with the `feather` and `arrow` packages;
- `fst`, handled with the `fst` package;
- `parquet`, handled with the `parquetize` and `arrow` packages.
Since `parquetize` should be able to read from several file formats, we check this one as we go.
The basic issue is that writing a zero-column data frame to a CSV file results in something that can’t be parsed properly:
tf <- tempfile()
write.csv(nocol, tf, row.names = FALSE)
readLines(tf)
## [1] "\"\"" "" "" "" "" ""
try(read.csv(tf, row.names = FALSE))
## Error in read.table(file = file, header = header, sep = sep, quote = quote, :
## first five rows are empty: giving up
`parquetize` and `vroom` don’t have much better luck reading it:
tf_parquet <- tempfile() # for writing parquet files via parquetize
tf_parquet_arrow <- tempfile() # for writing parquet files via arrow
try(parquetize::csv_to_parquet(tf, path_to_parquet = tf_parquet))
## Reading data...
## Error : Could not guess the delimiter.
##
## Use `vroom(delim =)` to specify one explicitly.
vroom::vroom(tf, delim = ",")
## New names:
## Rows: 0 Columns: 1
## ── Column specification
## ──────────────────────────────────────────────────────── Delimiter: "," chr
## (1): ...1
## ℹ Use `spec()` to retrieve the full column specification for this data. ℹ
## Specify the column types or set `show_col_types = FALSE` to quiet this message.
## • `` -> `...1`
## # A tibble: 0 × 1
## # ℹ 1 variable: ...1 <chr>
I wasn’t expecting any option to turn a \(5 \times 0\) table into a \(0 \times 1\) table, but there it is.
Writing the row names doesn’t improve matters much:
tf_rn <- tempfile()
write.csv(nocol, tf_rn, row.names = TRUE)
readLines(tf_rn)
## [1] "\"\"" "\"1\"," "\"2\"," "\"3\"," "\"4\"," "\"5\","
try(read.csv(tf_rn, row.names = TRUE))
## Error in read.table(file = file, header = header, sep = sep, quote = quote, :
## more columns than column names
try(parquetize::csv_to_parquet(tf_rn, path_to_parquet = tf_parquet))
## Reading data...
## Error : Could not guess the delimiter.
##
## Use `vroom(delim =)` to specify one explicitly.
vroom_tf_rn <- vroom::vroom(tf_rn, delim = ",")
## New names:
## Rows: 5 Columns: 1
## ── Column specification
## ──────────────────────────────────────────────────────── Delimiter: "," num
## (1): ...1
## ℹ Use `spec()` to retrieve the full column specification for this data. ℹ
## Specify the column types or set `show_col_types = FALSE` to quiet this message.
## • `` -> `...1`
subset(vroom::problems(vroom_tf_rn), , -file)
## Warning: One or more parsing issues, call `problems()` on your data frame for details,
## e.g.:
## dat <- vroom(...)
## problems(dat)
## # A tibble: 5 × 4
## row col expected actual
## <int> <int> <chr> <chr>
## 1 2 2 1 columns 2 columns
## 2 3 2 1 columns 2 columns
## 3 4 2 1 columns 2 columns
## 4 5 2 1 columns 2 columns
## 5 6 2 1 columns 2 columns
vroom_tf_rn
## # A tibble: 5 × 1
## ...1
## <dbl>
## 1 1
## 2 2
## 3 3
## 4 4
## 5 5
`vroom` now returns a \(5 \times 1\) table, where the row names are misread as the single column’s values.
How about `feather`?
tf_feather <- tempfile()
tf_feather_arrow <- tempfile()
feather::write_feather(nocol, tf_feather)
feather::read_feather(tf_feather)
## # A tibble: 5 × 0
arrow::write_feather(nocol, tf_feather_arrow)
arrow::read_feather(tf_feather_arrow)
## # A tibble: 0 × 0
try(feather::read_feather(tf_feather_arrow))
## Error in eval(expr, envir, enclos) : Invalid: Not a feather file
arrow::read_feather(tf_feather)
## # A tibble: 5 × 0
We have slightly better luck here, depending on which package we use to handle Feather files.
We’re not so lucky with `fst`:
tf_fst <- tempfile()
fst::write_fst(nocol, tf_fst)
fst::read_fst(tf_fst)
## data frame with 0 columns and 0 rows
The same goes for `parquet`, written and read with `arrow`:
arrow::write_parquet(nocol, tf_parquet_arrow)
arrow::read_parquet(tf_parquet_arrow)
## # A tibble: 0 × 0
Let’s summarise everything done above for the \(5 \times 0\) table `nocol`:
format | rows | cols |
---|---|---|
vroom w/o rownames | 0 | 1 |
vroom w/ rownames | 5 | 1 |
feather (write feather, read feather) | 5 | 0 |
feather (write arrow, read arrow) | 0 | 0 |
feather (write feather, read arrow) | 5 | 0 |
fst | 0 | 0 |
parquet | 0 | 0 |
The only library that gives the correct dimensions here is `feather` as the file writer: the file it writes is read back as \(5 \times 0\) by both `feather` and `arrow`. However, it’s only correct when writing the file using the original `feather` package. That package hasn’t been updated since 2019, because the format was integrated into Apache Arrow, and hence into the `arrow` package, so there are no maintained packages that get this right.
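Checks like the ones summarised above can be wrapped in a small round-trip helper. The sketch below is an assumption about how one might organise this, not code from the packages involved: `roundtrip_dim` and `write_csv_base` are hypothetical names, and the helper reports `NA` dimensions if either the write or the read fails.

```r
# Hypothetical helper: write `x` to a temporary file, read it back,
# and return the dimensions seen after the round trip (NA if anything fails).
roundtrip_dim <- function(x, writer, reader) {
  tf <- tempfile()
  on.exit(unlink(tf))
  res <- try({
    writer(x, tf)
    reader(tf)
  }, silent = TRUE)
  if (inherits(res, "try-error"))
    return(c(NA_integer_, NA_integer_))
  dim(res)
}

# Base R CSV writer, without row names.
write_csv_base <- function(x, file) write.csv(x, file, row.names = FALSE)

twocols <- data.frame(a = rep(1:3, 4), b = rep(1:2, 6))
nocol <- data.frame(a = rep(1, 5))[, FALSE, drop = FALSE]

roundtrip_dim(twocols, write_csv_base, read.csv) # dimensions survive
## [1] 12  2
roundtrip_dim(nocol, write_csv_base, read.csv)   # compare against dim(nocol)
```

Other writer and reader pairs, such as those compared above, can be passed in the same way, which makes it easy to re-run the comparison when a package is updated.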
Pandas is no better
Come on, we can’t make it look like Python is preferable.
import pandas as pd
A 2x1 table works as expected:
py_onecol = pd.DataFrame(data = {'a': [1, 1]})
py_onecol
## a
## 0 1
## 1 1
py_onecol.duplicated()
## 0 False
## 1 True
## dtype: bool
py_onecol.drop_duplicates()
## a
## 0 1
But now let’s remove the only column:
py_nocol = py_onecol.iloc[[0, 1], []]
py_nocol
## Empty DataFrame
## Columns: []
## Index: [0, 1]
py_nocol.duplicated()
## Series([], dtype: bool)
py_nocol.drop_duplicates()
## Empty DataFrame
## Columns: []
## Index: [0, 1]
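Like `data.table`, `duplicated` treats the table as empty, returning a zero-length result even though the index still has two entries. `drop_duplicates` leaves both rows in place, rather than reducing them to one.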
`duplicated` and `drop_duplicates` take a subset of columns to check, so we could use this for uniqueness checks by taking the subset as our key. What if the table has a non-zero number of columns, but the key is empty?
py_onecol2 = pd.DataFrame(data = {'a': [1, 2]})
py_onecol2
## a
## 0 1
## 1 2
try: py_onecol2.duplicated(subset = [])
except Exception as e: print(e)
## not enough values to unpack (expected 2, got 0)
try: py_onecol2.drop_duplicates(subset = [])
except Exception as e: print(e)
## not enough values to unpack (expected 2, got 0)
Well, that’s no good either.
Rust’s polars is no better
As it turns out, we can call Rust’s polars library for data frames from R:
library(polars)
ps <- pl$DataFrame(a = 1:5)
ps
## shape: (5, 1)
## ┌─────┐
## │ a │
## │ --- │
## │ i32 │
## ╞═════╡
## │ 1 │
## │ 2 │
## │ 3 │
## │ 4 │
## │ 5 │
## └─────┘
rownames(ps)
## [1] "1" "2" "3" "4" "5"
ps$select()
## shape: (0, 0)
## ┌┐
## ╞╡
## └┘
rownames(ps$select())
## character(0)
This is the same behaviour as that of `data.table` – not surprising, since polars is inspired by pandas – so even the Rustaceans don’t get this right.
Why it’s like this, and possible fixes
I can’t say for the other implementations, but let’s look at base R’s code for `duplicated.data.frame`:
duplicated.data.frame
## function (x, incomparables = FALSE, fromLast = FALSE, ...)
## {
## if (!isFALSE(incomparables))
## .NotYetUsed("incomparables != FALSE")
## if (length(x) != 1L) {
## if (any(i <- vapply(x, is.factor, NA)))
## x[i] <- lapply(x[i], as.numeric)
## if (any(i <- (lengths(lapply(x, dim)) == 2L)))
## x[i] <- lapply(x[i], split.data.frame, seq_len(nrow(x)))
## duplicated(do.call(Map, `names<-`(c(list, x), NULL)),
## fromLast = fromLast)
## }
## else duplicated(x[[1L]], fromLast = fromLast, ...)
## }
## <bytecode: 0x00000220520238c0>
## <environment: namespace:base>
Here we see an approach for looking for duplicate rows that I’ve used directly before: use `Map(list, x)`, after some tidying of `x`, to return a list of rows, where each row is given as the list of its values. Effectively, we take the column-based data frame format, and turn it inside out to get a row-based format. We then check whether these rows are duplicated, using `duplicated`’s default method, so we’re comparing list elements instead of several columns at once.
This is a reasonable approach if we have at least one column. What happens if we try this conversion with no columns?
Data frames are stored as a list, with each element giving a column’s values, and the elements having to be the same length. If there are no columns, this list is empty:
unclass(nocol)
## named list()
## attr(,"row.names")
## [1] 1 2 3 4 5
Therefore, `Map(list, x)` returns an empty list, rather than a list of empty row lists:
Map(list, nocol)
## named list()
When we pass this into `duplicated`, of course, we get a zero-length logical vector.
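To make the inside-out conversion concrete, the sketch below runs the `do.call(Map, ...)` call from the listing above on a made-up two-column table (`small` is invented for this example), and then on a table with no columns.

```r
small <- data.frame(a = 1:2, b = 3:4)

# The call used inside duplicated.data.frame:
# equivalent to Map(list, <column 1>, <column 2>, ...)
rows <- do.call(Map, `names<-`(c(list, small), NULL))
length(rows) # one element per row
## [1] 2
rows[[1]]    # the first row, as a list of its values
## [[1]]
## [1] 1
##
## [[2]]
## [1] 3

# With no columns, Map() has nothing to iterate over,
# so the row count never makes it into the result.
nocol <- data.frame(a = rep(1, 5))[, FALSE, drop = FALSE]
nocol_rows <- do.call(Map, `names<-`(c(list, nocol), NULL))
length(nocol_rows)
## [1] 0
```

The length of the result comes from the columns' lengths, so with no columns there is nothing to carry the five rows through.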
I don’t think there’s much that can be done about this, outside of changing how data frames are stored. It’s a strange situation where only the row names preserve the row count. If we make a copy where they’re removed, as done in `data.table`, then this information is lost:
a <- nocol
attr(a, "row.names") <- NULL # skips `row.names<-` sanity checks
a
## data frame with 0 columns and 0 rows
unclass(a)
## named list()
In turn, this information is only kept because, when asked for a data frame’s row count, R uses the row names:
nrow
## function (x)
## dim(x)[1L]
## <bytecode: 0x0000022051a4ae50>
## <environment: namespace:base>
dim.data.frame
## function (x)
## c(.row_names_info(x, 2L), length(x))
## <bytecode: 0x0000022053099f50>
## <environment: namespace:base>
Effectively, the row names are used like a “header”, but for the rows instead of the columns. This is probably why, if you try to remove them with something like `row.names(a) <- NULL`, R immediately adds integer row names as replacements: removing row names completely would break the information about the table’s size, in a way that removing the column names can’t. We can see this with tables that have columns, too:
b <- data.frame(a = 1:4, b = 2:3)
dim(b)
## [1] 4 2
attr(b, "row.names") <- NULL # attr lets us treat classes as mere suggestions
dim(b) # R now thinks there are 0 rows...
## [1] 0 2
unclass(b) # ... but the data's still there
## $a
## [1] 1 2 3 4
##
## $b
## [1] 2 3 2 3
length(b$a)
## [1] 4
This means that we could fix `duplicated.data.frame` by having it make use of the row names. What would such an implementation of `duplicated` for tables look like? Writing it in a way that’s agnostic to the number of columns is easy enough, but might be inefficient:
duplicated2 <- function(x, incomparables = FALSE, fromLast = FALSE, ...) {
  UseMethod("duplicated2")
}
duplicated2.data.frame <- function(x, incomparables = FALSE, fromLast = FALSE, ...) {
  if (!isFALSE(incomparables))
    .NotYetUsed("incomparables != FALSE")
  if (any(i <- vapply(x, is.factor, NA)))
    x[i] <- lapply(x[i], as.numeric)
  # build one element per row, with column and row names stripped
  lst <- lapply(
    seq_along(row.names(x)),
    function(row) `rownames<-`(`names<-`(x[row, , drop = TRUE], NULL), NULL)
  )
  duplicated(lst, fromLast = fromLast, ...)
}
duplicated2(nocol)
## [1] FALSE TRUE TRUE TRUE TRUE
duplicated2(nocol_onerow)
## [1] FALSE
duplicated2(twocols)
## [1] FALSE FALSE FALSE FALSE FALSE FALSE TRUE TRUE TRUE TRUE TRUE TRUE
duplicated2(onecol)
## [1] FALSE TRUE TRUE TRUE TRUE
Maybe it’s just better to add a second explicit edge case, so we’re not relying on `nrow` using the row names:
duplicated3 <- function(x, incomparables = FALSE, fromLast = FALSE, ...) {
  UseMethod("duplicated3")
}
duplicated3.data.frame <- function(x, incomparables = FALSE, fromLast = FALSE, ...) {
  if (!isFALSE(incomparables))
    .NotYetUsed("incomparables != FALSE")
  if (length(x) == 0L) {
    nr <- nrow(x)
    if (nr == 0L)
      return(logical())
    else
      return(c(FALSE, rep_len(TRUE, nr - 1L)))
  }
  if (length(x) != 1L) {
    if (any(i <- vapply(x, is.factor, NA)))
      x[i] <- lapply(x[i], as.numeric)
    duplicated(do.call(Map, `names<-`(c(list, x), NULL)),
               fromLast = fromLast)
  }
  else duplicated(x[[1L]], fromLast = fromLast, ...)
}
duplicated3(nocol)
## [1] FALSE TRUE TRUE TRUE TRUE
duplicated3(nocol_onerow)
## [1] FALSE
duplicated3(twocols)
## [1] FALSE FALSE FALSE FALSE FALSE FALSE TRUE TRUE TRUE TRUE TRUE TRUE
duplicated3(onecol)
## [1] FALSE TRUE TRUE TRUE TRUE
It looks like it would also be quicker, at least for small tables like these:
microbenchmark::microbenchmark(
duplicated2(nocol),
duplicated3(nocol),
times = 1000,
check = "identical"
)
## Unit: microseconds
## expr min lq mean median uq max neval cld
## duplicated2(nocol) 88.4 100.1 144.0731 109.55 185.45 4879.2 1000 b
## duplicated3(nocol) 5.0 5.8 8.4798 6.20 10.30 67.0 1000 a
microbenchmark::microbenchmark(
duplicated2(nocol_onerow),
duplicated3(nocol_onerow),
times = 1000,
check = "identical"
)
## Unit: microseconds
## expr min lq mean median uq max neval cld
## duplicated2(nocol_onerow) 28.3 30.95 49.5871 40.85 58.65 407.6 1000 b
## duplicated3(nocol_onerow) 4.9 5.30 8.3830 6.20 10.00 122.2 1000 a
microbenchmark::microbenchmark(
duplicated(twocols),
duplicated2(twocols),
duplicated3(twocols),
times = 1000,
check = "identical"
)
## Unit: microseconds
## expr min lq mean median uq max neval cld
## duplicated(twocols) 25.8 29.9 42.3624 32.7 43.10 271.5 1000 a
## duplicated2(twocols) 184.2 206.9 299.8286 228.2 320.25 5932.2 1000 b
## duplicated3(twocols) 21.5 24.6 35.2825 27.2 36.60 349.5 1000 a
microbenchmark::microbenchmark(
duplicated(onecol),
duplicated2(onecol),
duplicated3(onecol),
times = 1000,
check = "identical"
)
## Unit: microseconds
## expr min lq mean median uq max neval cld
## duplicated(onecol) 8.5 10.0 12.7721 10.60 14.65 112.9 1000 a
## duplicated2(onecol) 64.3 74.6 93.9061 78.15 97.75 340.2 1000 b
## duplicated3(onecol) 9.0 10.7 13.2581 11.20 13.30 54.3 1000 a
For `autodb` classes, I’ll probably be writing something like `duplicated3` for internal use, so I don’t have this edge case all over the code any more.
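A matching wrapper for `unique` is then a one-liner on top of the corrected duplicate check. The sketch below is self-contained, so it uses a compact stand-in for the zero-column case instead of reusing the definitions above. `dup_rows` and `unique_rows` are hypothetical names, not functions from base R or `autodb`.

```r
# Hypothetical wrappers, for illustration only.
dup_rows <- function(x, fromLast = FALSE) {
  if (length(x) == 0L) {
    # No columns: all rows are equal, so all but one are duplicates.
    after_first <- seq_len(nrow(x)) > 1L
    return(if (fromLast) rev(after_first) else after_first)
  }
  duplicated(x, fromLast = fromLast)
}

unique_rows <- function(x, fromLast = FALSE) {
  x[!dup_rows(x, fromLast = fromLast), , drop = FALSE]
}

nocol <- data.frame(a = rep(1, 5))[, FALSE, drop = FALSE]
dim(unique_rows(nocol))                     # a single row survives
## [1] 1 0
dim(unique_rows(nocol[0, , drop = FALSE]))  # no rows in, no rows out
## [1] 0 0
```

For tables with at least one column, this falls through to base R's `duplicated`, so it should agree with `unique` there.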
Environment used
R session information:
sessionInfo()
## R version 4.3.2 (2023-10-31 ucrt)
## Platform: x86_64-w64-mingw32/x64 (64-bit)
## Running under: Windows 11 x64 (build 22631)
##
## Matrix products: default
##
##
## locale:
## [1] LC_COLLATE=English_United Kingdom.utf8
## [2] LC_CTYPE=English_United Kingdom.utf8
## [3] LC_MONETARY=English_United Kingdom.utf8
## [4] LC_NUMERIC=C
## [5] LC_TIME=English_United Kingdom.utf8
##
## time zone: Europe/London
## tzcode source: internal
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] polars_0.15.1.9000 fstcore_0.9.12 arrow_14.0.0.2 data.table_1.14.2
## [5] tibble_3.2.1
##
## loaded via a namespace (and not attached):
## [1] xfun_0.35 bslib_0.3.1 lattice_0.21-9
## [4] tzdb_0.4.0 vctrs_0.6.4 tools_4.3.2
## [7] generics_0.1.2 sandwich_3.0-2 curl_4.3.2
## [10] parallel_4.3.2 fansi_1.0.2 RSQLite_2.3.4
## [13] highr_0.9 blob_1.2.2 pkgconfig_2.0.3
## [16] Matrix_1.6-1.1 assertthat_0.2.1 lifecycle_1.0.3
## [19] compiler_4.3.2 stringr_1.4.0 microbenchmark_1.4.9
## [22] codetools_0.2-19 fst_0.9.8 htmltools_0.5.2
## [25] sass_0.4.0 yaml_2.2.1 pillar_1.9.0
## [28] crayon_1.4.2 jquerylib_0.1.4 MASS_7.3-60
## [31] ellipsis_0.3.2 cachem_1.0.6 feather_0.3.5
## [34] multcomp_1.4-19 tidyselect_1.2.0 digest_0.6.33
## [37] mvtnorm_1.1-3 stringi_1.7.6 dplyr_1.1.3
## [40] purrr_1.0.2 bookdown_0.24 splines_4.3.2
## [43] forcats_0.5.1 rprojroot_2.0.2 fastmap_1.1.0
## [46] grid_4.3.2 here_1.0.1 cli_3.6.1
## [49] magrittr_2.0.3 survival_3.5-7 utf8_1.2.2
## [52] TH.data_1.1-1 readr_2.1.1 withr_2.5.0
## [55] bit64_4.0.5 parquetize_0.5.6.1 rmarkdown_2.18
## [58] bit_4.0.4 reticulate_1.35.0 blogdown_1.16
## [61] zoo_1.8-9 png_0.1-7 hms_1.1.1
## [64] memoise_2.0.1 evaluate_0.23 knitr_1.41
## [67] haven_2.4.3 rlang_1.1.1 Rcpp_1.0.8
## [70] glue_1.6.2 DBI_1.1.2 rstudioapi_0.13
## [73] vroom_1.6.5 jsonlite_1.8.7 R6_2.5.1
Python version (a little old, but installing/updating things in Python is so awful I don’t want to do it again):
import sys
print(sys.version)
## 3.12.2 (tags/v3.12.2:6abddd9, Feb 6 2024, 21:26:36) [MSC v.1937 64 bit (AMD64)]