Restricted evaluation of user input in R

Posted at 28 Oct 2024
Tags: r, security

I recently worked on a software project in which R code submitted by users had to be executed on a server. This is, of course, a security nightmare: users can run essentially any code on the server with the permissions of the R process that interprets the submitted code. For example, a user may submit code that calls system() to manipulate or delete files on the server, view system configuration details, etc.

There are several ways to mitigate this problem, and you would usually combine them rather than rely on just one. The first is to make sure that the R process that interprets the submitted code runs with minimal privileges and has access to a minimal set of files on the system. On Linux, the RAppArmor package is well suited for this: it enables dynamic sandboxing using a temporary fork of the R process via the Linux AppArmor project. Furthermore, you should run the R process inside an isolated environment such as a Docker container to separate it from the host operating system.

Finally, and this is the focus of this blog post, you can restrict the functions and operators that are available to the user. This may be especially beneficial in a learning environment for programming, where users should solve some task only with a certain subset of functions (e.g. those that they were introduced to in the current lesson).

To run user-submitted code in R, you first parse the submitted code string using parse() and then pass the resulting expression object to eval(). You can see this in action here with a potentially evil system() call:

> expr <- parse(text = "system('echo foo')")
> eval(expr)
foo

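The eval() function has two arguments that matter for our problem: envir, the environment in which the expression is evaluated, and enclos, the enclosing environment in which R looks for objects that are not found in envir (eval() consults it when envir is a list or data frame).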

We can use that to build a restricted evaluation environment in which only the functions and operators that we explicitly allow can actually be used (a whitelisting approach).
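As a small sketch of how envir and enclos interact (the variable x and the values are made up for illustration): when envir is given as a list, names that are not in the list are looked up in enclos, so an empty enclosure leaves even the "+" operator undefined.

```r
# envir is a list here, so eval() looks up everything that is not in the
# list (such as the "+" function) in the enclosure given by enclos.
with_base <- eval(quote(x + 1), envir = list(x = 1), enclos = baseenv())

# With the empty environment as enclosure, "+" cannot be found and the
# evaluation fails; try() captures the error instead of stopping.
without_base <- try(
  eval(quote(x + 1), envir = list(x = 1), enclos = emptyenv()),
  silent = TRUE
)

print(with_base)                          # 2
print(inherits(without_base, "try-error")) # TRUE
```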

First, let’s try running code in the most minimal environment one can imagine: the empty environment. We set both envir and enclos to the empty environment which is – as the name suggests – empty, meaning that no objects are defined in it. By default, eval() uses the base environment as enclos. This is the environment of the base package which is – of course – not empty, but where all base functions, operators, etc. are defined. Strictly speaking, eval() only consults enclos when envir is a list or data frame; when envir is an environment, lookups follow that environment’s own chain of parents, and the empty environment has none. Passing enclos explicitly does no harm, though, and makes the intent obvious.

> expr <- parse(text = "system('echo foo')")
> eval(expr, envir = emptyenv(), enclos = emptyenv())
Error in system("echo foo") : could not find function "system"

As expected, the submitted code fails to be evaluated because there’s no system() function defined in the empty environment. Actually, there is not a single function and not even any operator defined that a user could use in their code:

> expr <- parse(text = "1+1")
> eval(expr, envir = emptyenv(), enclos = emptyenv())
Error in 1 + 1 : could not find function "+"

The only thing we can actually “run” is code that only consists of literals, such as a number or a string:

> expr <- parse(text = "32")
> eval(expr, envir = emptyenv(), enclos = emptyenv())
[1] 32

So that’s very secure, fine.

At the same time it’s of course very useless, but it’s a good starting point from where we can now pass a list of operators and functions that we want to allow:

> e <- rlang::new_environment()
> assign("+", `+`, envir=e)
> expr <- parse(text = "1+1")
> eval(expr, envir = e, enclos = emptyenv())
[1] 2

I’m using rlang’s new_environment() function here because it creates a new environment that we can modify and whose parent is, by default, the empty environment. I then put the + operator (which actually is a function in R) into this environment via assign("+", `+`, envir=e). Finally, we evaluate the expression “1+1” in this environment. Using any operator or function other than “+” would fail, since “+” is the only object we assigned to the execution environment e.
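If you prefer not to depend on rlang, the same setup can be sketched in base R. The names e_base and e_leaky are only illustrative. The important detail is the parent argument: an environment created with the default parent still reaches base functions through its chain of parents, so it restricts nothing.

```r
# Base R sketch: a fresh, modifiable environment whose parent is the
# empty environment, so name lookup stops right there.
e_base <- new.env(parent = emptyenv())
assign("+", `+`, envir = e_base)

expr <- parse(text = "1+1")
res <- eval(expr, envir = e_base, enclos = emptyenv())
print(res)   # 2

# Contrast: new.env() with its default parent inherits from the calling
# environment, so system() is still reachable from inside it.
e_leaky <- new.env()
leaky <- is.function(eval(quote(system), envir = e_leaky))
print(leaky) # TRUE
```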

We can also list all available objects in the environment using ls.str() and we can see that only “+” is available:

> ls.str(e)
+ : function (e1, e2)

All that is left now is a nicer way to define the objects that should be available in our execution environment. To do this, we can again use some tools from the rlang package and define all the objects in a character vector. Note that even parentheses are “operators” that you explicitly need to assign to the environment to be able to use them. We can then assign all the listed operators, functions and constants via lapply():

> e <- rlang::new_environment()
> ops <- c("+", "-", "*", "/", "(", "^", "sqrt",
           "exp", "expm1", "log", "logb", "log10", "log2", "log1p",
           "cos", "sin", "tan", "acos", "asin", "atan", "atan2", "pi",
           "choose", "factorial")
> lapply(ops, function(op) {
      assign(op, rlang::eval_bare(rlang::sym(op)), envir = e)
  })

To check, we make a senseless computation that uses several of the whitelisted objects:

> expr <- parse(text = "(1+2-3*5)/-2^2 * 2.5 + sqrt(4)")
> eval(expr, envir = e, enclos = emptyenv())
[1] 9.5
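The whitelist setup can also be wrapped in a small helper using base R only. This is a sketch, not the approach used above: make_restricted_env is a hypothetical name, and the helper assumes that every whitelisted object lives in the base environment (mget() raises an error for names it cannot find there, which catches typos in the whitelist).

```r
# Hypothetical helper: build a whitelist environment from a character
# vector of object names, looking each name up in the base environment.
make_restricted_env <- function(allowed) {
  objs <- mget(allowed, envir = baseenv())  # errors on unknown names
  list2env(objs, parent = emptyenv())       # new env, empty parent
}

e2 <- make_restricted_env(c("+", "-", "*", "/", "(", "^", "sqrt", "pi"))

val <- eval(parse(text = "(1 + 2) * sqrt(4) - pi / pi"),
            envir = e2, enclos = emptyenv())
print(val)   # 5

# Anything outside the whitelist is still unavailable.
blocked <- try(eval(parse(text = "mean(1:3)"),
                    envir = e2, enclos = emptyenv()),
               silent = TRUE)
print(inherits(blocked, "try-error"))   # TRUE
```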

Functions that we didn’t explicitly whitelist still won’t work – just as we wanted:

> expr <- parse(text = "system('echo foo')")
> eval(expr, envir = e, enclos = emptyenv())
Error in system("echo foo") : could not find function "system"

> expr <- parse(text = "mean(1:3)")
> eval(expr, envir = e, enclos = emptyenv())
Error in mean(1:3) : could not find function "mean"

You should additionally wrap the parse() and eval() calls in tryCatch() so that user errors such as malformed syntax or the use of undefined symbols can be caught and reported accordingly.
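A minimal sketch of such a wrapper could look as follows. The function name safe_eval and the shape of its return value are assumptions made for illustration. Note that parse() sits inside tryCatch() as well, since syntax errors are raised during parsing and not during evaluation.

```r
# Hypothetical wrapper: parse and evaluate a code string in a restricted
# environment and return either the value or the error message.
safe_eval <- function(code, env) {
  tryCatch(
    {
      expr <- parse(text = code)  # syntax errors are raised here
      list(ok = TRUE,
           value = eval(expr, envir = env, enclos = emptyenv()))
    },
    error = function(cond) {
      list(ok = FALSE, message = conditionMessage(cond))
    }
  )
}

env <- new.env(parent = emptyenv())
assign("+", `+`, envir = env)

r_ok     <- safe_eval("1+1", env)           # ok, value 2
r_syntax <- safe_eval("1+", env)            # parse error is caught
r_undef  <- safe_eval("system('ls')", env)  # system() is not whitelisted

print(r_ok$value)
print(r_syntax$message)
print(r_undef$message)
```

Returning a list instead of re-raising the error keeps the calling code simple: it only has to check the ok field before deciding what to report back to the user.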

All in all, the provided solution is a good additional security measure when you need to run untrusted R code and know in advance which operators and functions should be used, as is for example often the case in an educational scenario.

If you spotted a mistake or want to comment on this post, please contact me: post -at- mkonrad -dot- net.
 