The jsonlite
package provides two mechanisms for
converting an arbitrary object to JSON, toJSON
and
serializeJSON
. For S4 classes neither of these messages are
whole satisfactory. There is probably no existing method for
toJSON
, and serializeJSON
produces a format
which is readable by another R program, but is difficulty to parse for
any other program (e.g., a Mongo (r) database).
To illustrate the problem, I will start with an example class: an
event in a log file. (This is a simplification of the
P4Message
class in the Proc4
package[For more
information about the Proc4
package, see https://pluto.coe.fsu.edu/Proc4, https://github.com/ralmond/Proc/, and https://ralmond.r-universe.dev/Proc4.].)
setClass("Event",
slots=list(uid="character",
mess="character",
timestamp="POSIXt",
processed="logical",
data="list"
),
contains="MongoRec")
anEvent <- new("Event",uid="Tester",mess="Typed",
timestamp=as.POSIXct("2022-05-30 16:38:58 EDT"),
processed=FALSE,
data=list(input="Hello, World!"))
The event class can be loaded by running
mongo::load_example()
. The complete source file can be
found using the command
system.file("examples","Event.R",package="mongo")
.
The event class is divided into a number of header fields, and the data, which could be anything. As such, it is perfect for storing in the Mongo database. Mongo can build indexes in the header fields, making it straightforward to build a query that asked for all of the events where Fred typed something that has not yet been processed sorted by timestamp. The data field can be anything that can be serialized in JSON.
Note that this class is also a subclass of MongoRec
which is a small class with a single slot _id
(accessed
with the function m_id()
. This is the Mongo identifier of
the object. The goal of object oriented programming is that we don’t
need to worry about the details in our package, as long at the authors
of the MongoRec
class have done their job properly.
toJSON
to serialize an S4 classNeither toJSON
nor serializeJSON
work out
of the box. The toJSON
function simply throws an error, as
the new class is not yet registered. The function
serializeJSON
generates proper JSON output, but not in a
format for which it will be easy to write queries about.
try(jsonlite::toJSON(anEvent))
#> Error : No method for S4 class:Event
jsonlite::serializeJSON(anEvent)
#> {"type":"S4","attributes":{"uid":{"type":"character","attributes":{},"value":["Tester"]},"mess":{"type":"character","attributes":{},"value":["Typed"]},"timestamp":{"type":"double","attributes":{"class":{"type":"character","attributes":{},"value":["POSIXct","POSIXt"]},"tzone":{"type":"character","attributes":{},"value":[""]}},"value":[1653928738]},"processed":{"type":"logical","attributes":{},"value":[false]},"data":{"type":"list","attributes":{"names":{"type":"character","attributes":{},"value":["input"]}},"value":[{"type":"character","attributes":{},"value":["Hello, World!"]}]},"_id":{"type":"character","attributes":{},"value":[]}},"value":{"class":"Event","package":".GlobalEnv"}}
A simple trick produces something a lot closer to usable JSON output.
First, apply the attributes()
function to turn the object
into a list, and then toJSON()
to turn the list into
JSON.
jsonlite::toJSON(attributes(anEvent),pretty=TRUE)
#> {
#> "uid": ["Tester"],
#> "mess": ["Typed"],
#> "timestamp": ["2022-05-30 16:38:58"],
#> "processed": [false],
#> "data": {
#> "input": ["Hello, World!"]
#> },
#> "_id": [],
#> "class": ["Event"]
#> }
This is almost there. The toJSON
function has turned all
of the scalars into vectors. That is because, in R, there is no
difference. However, it is going be harder to build the queries if the
header fields are all vectors. The function
jsonlite::unbox()
takes care of this. The function
unboxer()
takes this one step further, adding the unbox
flag to any scaler it finds, no matter how deeply nested.
jsonlite::prettify(
jsonlite::toJSON(lapply(attributes(anEvent),unboxer)),
indent=2)
#> {
#> "uid": "Tester",
#> "mess": "Typed",
#> "timestamp": "2022-05-30 16:38:58",
#> "processed": false,
#> "data": {
#> "input": "Hello, World!"
#> },
#> "_id": [
#>
#> ],
#> "class": "Event"
#> }
#>
This is much closer, but still a bit crude. In particular, it is not
particularly generalizable. The as.json
function and
as.jlist
helper function provide a more generalizable
method for handling S4 classes.
as.json
and as.jlist
functions.The as.json()
function implements the following
algorithm for converting an S4 class to JSON is as follows:
attributes()
; this
list will be called a jlist in the sequel.toJSON()
on the resulting list.Steps 1 and 3 are common to many different S4 objects, it is step 2
that needs customization for each one. So as.json()
calls
as.jlist()
to perform this step. The default method for
as.json()
is shown below. (The function
as.json()
is generic, so if necessary a new method can be
introduced at this level as well.) Most of the options are simply passed
to toJSON
.
## as.json,ANY-method
function(x, serialize = TRUE, ...) {
jlist <- as.jlist(x, attributes(x), serialize)
mongolite::toJSON(jlist,...)
}
The first argument of as.jlist()
is the object itself.
This is used mostly for method dispatch, but sometimes it will be easier
to work with the original object than the jlist. The second argument,
ml
, is the jlist (produced by
attributes(obj)
). The third object is a serialize flag that
is meant to indicate whether compound objects should be serialized or
not.
setMethod("as.jlist",c("Event","list"),
function(obj, ml, serialize=TRUE) {
ml$uid <- unboxer(ml$uid)
ml$mess <- unboxer(ml$mess)
ml$timestamp <- unboxer(ml$timestamp)
ml$processed <- unboxer(ml$processed)
ml$data <- unparseData(ml$data,
serialize)
callNextMethod(obj, ml, serialize)
})
Several things to note about this generic function.
First, it modifies the ml
argument, which is what is
eventually returned. In fact, the ANY
method for
as.jlist
just returns the ml
argument, so that
is the base case. Second, it finishes with a call to
callNextMethod
. The idea is that the MongoRec
method for as.jlist
knows how to take care of the
_id
field: callNextMethod
ensures that code is
called. Finally, the unparseData()
function is called to
deal with the arbitrary data field. This function is described in the
following section.
The next example shows as.json()
in action.
jsonlite::prettify(as.json(anEvent), indent=2)
#> {
#> "uid": "Tester",
#> "mess": "Typed",
#> "timestamp": {
#> "$date": 1653928738000
#> },
#> "processed": false,
#> "data": "{\"type\":\"list\",\"attributes\":{\"names\":{\"type\":\"character\",\"attributes\":{},\"value\":[\"input\"]}},\"value\":[{\"type\":\"character\",\"attributes\":{},\"value\":[\"Hello, World!\"]}]}",
#> "class": "Event"
#> }
#>
unparseData
functionThere are three ways to deal with the large complex object
data
whose exact composition is unknown. The safest way is
to call serializeJSON
and turn it into a representation
which can be read exactly back into R. This, however, is not inspectable
and will not be easy for other programs to parse. The second way is to
attempt to apply the unboxer
function recursively to try to
force scalar elements into scalars in the JSON. These two approaches are
implemented using unparseData(data, serialize=TRUE)
and
unparseData(data, serialize=FALSE)
respectively. Note that
the serialize
argument is passed from as.json
to as.jlist
, so this can be decided at a later point in
time.
jsonlite::prettify(as.json(anEvent, serialize=FALSE),
indent=2)
#> {
#> "uid": "Tester",
#> "mess": "Typed",
#> "timestamp": {
#> "$date": 1653928738000
#> },
#> "processed": false,
#> "data": {
#> "input": "Hello, World!"
#> },
#> "class": "Event"
#> }
#>
The third method is to write custom code for unparsing (turning into
something toJSON
will properly serialize) and parsing the
custom object. This method is superior to the others, but much more
costly in terms of programmer time.
So the serialized form of the object has all of the fields needed to
rebuild the object. The class
field gives the name of the
class, and the other field the appropriate slot values. The function
fromJSON
returns the serialized data as a list (a
jlist), so essentially the object can be recreated by
new(class,jlist)
.
The function parse.json
function takes this approach. It
converts the JSON object to a list (using fromJSON
) and
then passes the list to a builder function for making the object. The
default builder function buildObject
function. This calls
the parse.jlist
method appropriate for the class to clean
up the list of fields, then calls new
to create the
object.
## buildObject
function (rec, class=rec$class) {
jlp <- selectMethod("parse.jlist",c(class,"list"))
if (!is.null(jlp))
rec <- do.call(jlp,list(class,rec))
rec$class <- NULL # Make sure it is not marked as an extra argument.
do.call("new",c(class,rec))
}
The actual buildObject
implementation is slightly more
complicated to try an handle some common S3 cases.
This makes heavy use of the do.call
function, which
allows the program to compute the call to the function. In the last line
of the function, the argument list for new
is built by
prepending the class to the list of slot values rec
, and
then new
is called on the result.
S4 (and R6) objects work with new
function. S3 object
constructors do not have a uniform syntax. The default
buildObject
implementation handles the common
implementation of S3 classes as a list of fields with a
class
attribute. However, in many cases, a custom builder
is needed. This can be passed as an argument to as.json
(as
well as getOneRec
and getManyRecs
).
Just as the as.json
delegate much of its work to the
as.jlist
, buildObject
delegates its work to
the parse.jlist
function. Again, this is an object orient
function, so that each class can process its own unique slots and then
callNextMethod
to deal with the inherited slots. It bottoms
out at an ANY
method which removes the class
element so it doesn’t in the way when the object is built.)
Note that there is object of the required class available to dispatch
the method for the parse.jlist
generic function. So
buildObject
once again uses the do.call
trick,
this time with the selectMethod
function used to find the
method from the class name.
Here is the parse.jlist
method for the
Event
object.
setMethod("parse.jlist",c("Event","list"),
function(class, rec) {
rec$uid <- as.character(ununboxer(rec$uid))
rec$mess <- as.character(ununboxer(rec$mess))
rec$timestamp <- as.POSIXct(ununboxer(rec$timestamp))
rec$processed <- as.logical(ununboxer(rec$processed))
rec$data <- parseData(rec$data)
callNextMethod(class, rec)
})
For the most part, the steps are just ensuring that all of the slots
of the class have the proper value types. The parseData
function is the inverse of unparseData
and handles both
serialized and non-serialized data.
The call to ununboxer
seems a bit odd. The reason is
mostly to facilitate testing. This function removes the mark that
jsonlite
adds to indicate that a value should be a scalar
and not a vector. That mark will cause all.equal
to fail,
this makes it harder to text. In particular, after adding the
ununboxer
calls, the as.jlist
and
parse.jlist
are inverses, and this can be used to build
unit tests.
atlist <- attributes(anEvent)
jlist <- as.jlist(anEvent,atlist)
plist <- parse.jlist(anEvent,jlist)
## Need to ensure order is the same.
res <- try(testthat::expect_equal(plist[names(atlist)],atlist))
#> Error : plist[names(atlist)] (`actual`) not equal to `atlist` (`expected`).
#>
#> `names(actual$_id)` is a character vector ('oid')
#> `names(expected$_id)` is absent
#>
#> `actual$_id`: NA
#> `expected$_id`:
#>
#> `actual$class` is an S3 object of class <scalar/character>, a character vector
#> `expected$class` is a character vector ('Event')
res
#> [1] "Error : plist[names(atlist)] (`actual`) not equal to `atlist` (`expected`).\n\n`names(actual$_id)` is a character vector ('oid')\n`names(expected$_id)` is absent\n\n`actual$_id`: NA\n`expected$_id`: \n\n`actual$class` is an S3 object of class <scalar/character>, a character vector\n`expected$class` is a character vector ('Event')\n"
#> attr(,"class")
#> [1] "try-error"
#> attr(,"condition")
#> <expectation_failure/expectation/error/condition>
#> plist[names(atlist)] (`actual`) not equal to `atlist` (`expected`).
#>
#> `names(actual$_id)` is a character vector ('oid')
#> `names(expected$_id)` is absent
#>
#> `actual$_id`: NA
#> `expected$_id`:
#>
#> `actual$class` is an S3 object of class <scalar/character>, a character vector
#> `expected$class` is a character vector ('Event')
#> Backtrace:
#> ▆
#> 1. ├─base::try(testthat::expect_equal(plist[names(atlist)], atlist))
#> 2. │ └─base::tryCatch(...)
#> 3. │ └─base (local) tryCatchList(expr, classes, parentenv, handlers)
#> 4. │ └─base (local) tryCatchOne(expr, names, parentenv, handlers[[1L]])
#> 5. │ └─base (local) doTryCatch(return(expr), name, parentenv, handler)
#> 6. └─testthat::expect_equal(plist[names(atlist)], atlist)
Oops. This failed. The problem is related to the _id
field inherited from MongoRec
. This can be seen below:
The problem is two different conventions for a missing
_id
attribute. The Event
class needs an
initializion method that takes care if this. For the purposes of
illustration, this is done manually below.
m_id(anEvent) <- NA_character_
atlist <- attributes(anEvent)
jlist <- as.jlist(anEvent,atlist)
plist <- parse.jlist(anEvent,jlist)
## Need to ensure order is the same.
res <- try(testthat::expect_equal(plist[names(atlist)],atlist))
#> Error : plist[names(atlist)] (`actual`) not equal to `atlist` (`expected`).
#>
#> `actual$class` is an S3 object of class <scalar/character>, a character vector
#> `expected$class` is a character vector ('Event')
res
#> [1] "Error : plist[names(atlist)] (`actual`) not equal to `atlist` (`expected`).\n\n`actual$class` is an S3 object of class <scalar/character>, a character vector\n`expected$class` is a character vector ('Event')\n"
#> attr(,"class")
#> [1] "try-error"
#> attr(,"condition")
#> <expectation_failure/expectation/error/condition>
#> plist[names(atlist)] (`actual`) not equal to `atlist` (`expected`).
#>
#> `actual$class` is an S3 object of class <scalar/character>, a character vector
#> `expected$class` is a character vector ('Event')
#> Backtrace:
#> ▆
#> 1. ├─base::try(testthat::expect_equal(plist[names(atlist)], atlist))
#> 2. │ └─base::tryCatch(...)
#> 3. │ └─base (local) tryCatchList(expr, classes, parentenv, handlers)
#> 4. │ └─base (local) tryCatchOne(expr, names, parentenv, handlers[[1L]])
#> 5. │ └─base (local) doTryCatch(return(expr), name, parentenv, handler)
#> 6. └─testthat::expect_equal(plist[names(atlist)], atlist)
A more complete test would be to create an Event object, write it out
to JSON format and read it back, building the object. This test requires
an equality test for the Event object, that is a method for
all.equal
; as this is an S3 generic, the method will be
all.equal.Event
.
all.equal.Event <- function (target, current,
...,
checkTimestamp=FALSE,check_ids=TRUE) {
if (!is(current,"Event"))
return(paste("Target is 'Event' and current is '",
class(current),"'."))
msg <- character()
if (check_ids)
if ((is.na(target@"_id") && !is.na(current@"_id")) ||
(!is.na(target@"_id") &&
!isTRUE(all.equal(target@"_id", current@"_id"))))
msg <- c(msg,"Database IDs do not match.")
if (!isTRUE(all.equal(target@uid,current@uid)))
msg <- c(msg,"User IDs do not match.")
if (!isTRUE(all.equal(target@mess,current@mess)))
msg <- c(msg,"Messages do not match.")
if (!isTRUE(all.equal(target@processed,current@processed)))
msg <- c(msg, "Processed flags do not match.")
## Check Data
namet <- names(target@data)
namec <- names(current@data)
if (length(target@data) != length(current@data) ||
!setequal(namet,namec)) {
msg <- c(msg,"Names or number of data differ.")
if (length(setdiff(namet,namec)) > 0L)
msg <- c(msg,paste("Data in target but not in current:",
setdiff(namet,namec)))
if (length(setdiff(namec,namet)) > 0L)
msg <- c(msg,paste("Data in current but not in target:",
setdiff(namec,namet)))
}
msgd <- all.equal(target@data,current@data,...)
if (!isTRUE(msgd)) msg <- c(msg,msgd)
## Timestamp
if (checkTimestamp) {
if (abs(target@timestamp-current@timestamp) >
as.difftime(.1,units="secs"))
msg <- c(msg,"Timestamps differ by more than .1 secs")
}
## Return true if message list is empty.
if (length(msg)==0L) TRUE
else msg
}
A few notes about this function. First, all.equal
returns true if the target
and current
entries
are the same, and otherwise returns a character vector giving the
differences. So, the method builds up a list of differences. Second,
remember that NA == NA
returns NA
, not true.
Using all.equal
will return TRUE
if both
arguments are true, but not false if they are different. So
isTrue(all.equal(...))
test for equality with
NA==NA
being marked as true.
So lets test this equality method.
anEvent1 <- anEvent
all.equal(anEvent,anEvent1)
#> [1] TRUE
anEvent1@mess <- "Shouted"
all.equal(anEvent,anEvent1)
#> [1] "Messages do not match."
This looks good, so now for the final check.
A big benefit of being able to store S4 objects as JSON documents and
restore them is that they now can be saved into a collection in a Mongo
database. The functions saveRec
, getOneRec
,
and getManyRecs
facilitate that.
First a reference to a Mongo collection can be made using the
mongo::MongoRec
class or the mongolite::mongo
class. MongoRec
is a wrapper for
mongolite::mongo
which facilitates including a reference to
the collection in an S4 or R6 class.
The function saveRec
saves a record to a collection.
load_example() ## Loads some sample events.
anEvent <- saveRec(eventCol,anEvent)
sampleEvents <- lapply(sampleEvents, function(e) saveRec(eventCol,e))
Note that in each case the Event object is saved back into the same
variable. The saveRec
function returns the input object,
but now has the Mongo ID (m_id
field) set. The
saveRec
function uses this field to determine whether it
will add the record as a new document in the collection
(is.na(m_id(obj))
) or replace the object with the Mongo ID
m_id(obj)
.
The following example illustrates the idea:
mdbDrop(eventCol) ## This clears the collection.
## Insertion
newEvent <- Event(uid="Student1",mess="Startled",data=list(browser="chromium"))
m_id(newEvent)
mdbCount(eventCol,buildJQuery(uid="Student1")) ## Should be 0
newEvent <- saveRec(eventCol,newEvent) ## Inserts
m_id(newEvent) ## Set during the save.
mdbCount(eventCol,buildJQuery(uid="Student1")) ## Should now be 1.
savedEvent <- getOneRec(eventCol,buildJQuery(uid="Student1"))
all.equal(savedEvent, newEvent)
m_id(newEvent) ## non-NA value indicates that it has a representation in the database.
newEvent$mess <- "Started" ## Fix misspelling
mdbCount(eventCol,buildJQuery(uid="Student1")) ## Should be 1
newEvent <- saveRec(eventCol,newEvent) ## Replaces existing event.
mdbCount(eventCol,buildJQuery(uid="Student1")) ## Should still be 1
savedEvent <- getOneRec(eventCol,buildJQuery(uid="Student1"))
savedEvent$mess ## Internal object was modified.
Note that getOneRec
(also getManyRecs
) and
mdbCount
take a query argument. This is a JSON expression
restricting the values of one or more fields. The simplest version just
selects records which match on the listed field. The
buildJQuery
function converts from R syntax to JSON
syntax.
There are two differences between getOneRec
and
getManyRecs
. The first function returns a single object,
and the second function returns a list of objects. If there are multiple
objects which match the query, then getOneRec
will return
them all.
mdbDrop(eventCol) ## reset the collection contents
load_example() ## Loads some sample events.
sampleEvents <- lapply(sampleEvents, function(e) saveRec(eventCol,e))
mdbCount(eventCol,buildJQuery(uid="Fred"))
getOneRec(eventCol,buildJQuery(uid="Fred"))
length(getManyRecs(eventCol,buildJQuery(uid="Fred")))
mdbCount(eventCol,buildJQuery(uid="Phred")) ## Phred event is unique
getOneRec(eventCol,buildJQuery(uid="Phred"))
getManyRecs(eventCol,buildJQuery(uid="Phred")) ## Returns a list.
```
The serializeJSON
and unserializeJSON
methods provide the safest way of storing an R object as a JSON string;
however, the representation is not friendly for other applications. In
particular, if the object is stored in an Mongo database, and indexes
are built on some of the fields, a class-specific approach is needed to
map the slots of the S4 class to Mongo data types.
The as.jlist
and parse.jlist
generic
functions provide a regular mechanism for describing the transformation.
The slots of the object are elements of the jlist
, and the
as.jlist
function is responsible for doing pre-processing
on the objects before toJSON
is called, and
parse.jlist
is responsible for post-processing after
fromJSON
is called.
The generic function as.json
can then be called to do
the conversion, and the function buildObject
can be called
to reconstruct an object from the output of fromJSON
. Note
that the basic mechanism may not work for all classes. Some classes may
need a special method for as.json
. Although
buildObject
is not generic, if needed, the developer can
always write a buildObject
function to do the work.
Writing serialization and unserialization methods, like writing print and equality testing methods, is part of the work required when building a new class. Although this framework doesn’t do all the work, hopefully it makes the process a little bit more straightforward, providing a mechanism which can exploit the inheritance structure of the objects.