07. RStudio/R Tool Guide (Part 6: Running R Commands in the Background)
Preface
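We often run into situations like these:
- an R package is being installed, so no other command can be run;
- a piece of code takes a long time, and all we can do is sit and wait for it~
A rather blunt workaround is to open another Rproj: "why don't we just start all over again~"
But that is too much hassle.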
A simple idea: could we submit a command to the background, the way & does in Linux?
Setting it up in RStudio
Reference: https://www.jianshu.com/p/797778c7703e
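We can push time-consuming commands, such as package installation, into the background so that they do not block other code from running.
Once the script is written, choose the script to run and click Start.
By default, code in the script does not read variables from the current environment: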
a <- 3*x
Error in eval(statements[[idx]], envir = sourceEnv) :
object 'x' not found
Calls: sourceWithProgress -> eval -> eval
Execution halted
So we need to tick the option Run job with copy of global environment.
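The error above can be reproduced outside RStudio with plain environments. The sketch below is only an analogy: it assumes a fresh environment can stand in for the job's new R session, whereas a real job runs in a separate process. The names session, fresh, and copied are hypothetical.

```r
# A one-line script that depends on a variable x defined elsewhere.
script <- tempfile(fileext = ".R")
writeLines("a <- 3 * x", script)

# `session` stands in for the interactive global environment.
session <- new.env()
assign("x", 1, envir = session)

# Case 1: a fresh environment that cannot see `session`, like a job
# started without a copy of the global environment.
fresh <- new.env(parent = baseenv())
msg <- tryCatch(
  {
    sys.source(script, envir = fresh)
    "ok"
  },
  error = function(e) conditionMessage(e)
)

# Case 2: copy the session's objects first, like "Run job with copy of
# global environment", then run the same script.
copied <- list2env(as.list(session), parent = baseenv())
sys.source(script, envir = copied)
a_value <- get("a", envir = copied)
```

In the first case msg holds the "object not found" error message; in the second, a is computed inside the copy, and session itself is left untouched.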
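If we also want the results of the script back, we can use the option To results object in global environment.
"Copy job results" offers three options:
Don't copy: nothing is copied into the current global environment.
To global environment: variables are copied directly into the current global environment.
To results object in global environment: variables are stored in an environment object.
With the last option, variables assigned in the script do not overwrite existing variables even when the names clash, because they live in that environment object: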
> test_results$x
[1] 3
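The same dialog choices can also be made from code with rstudioapi::jobRunScript(), whose importEnv and exportEnv arguments correspond to the two options described above. This is a minimal sketch, assuming the rstudioapi package is installed and the code is run inside RStudio; the script contents and the job name are made up for illustration.

```r
# Write a small script for the job; it uses x from the calling session
# and then assigns its own x.
script <- tempfile(fileext = ".R")
writeLines(c("a <- 3 * x", "x <- 3"), script)

x <- 1

# Jobs only exist inside RStudio, so the call is skipped elsewhere.
in_rstudio <- requireNamespace("rstudioapi", quietly = TRUE) &&
  rstudioapi::isAvailable()

if (in_rstudio) {
  rstudioapi::jobRunScript(
    path = script,
    name = "background demo",    # illustrative job name
    importEnv = TRUE,            # run with a copy of the global environment
    exportEnv = "test_results"   # collect results in an environment object
  )
  # After the job finishes, test_results$x holds the job's x, while the
  # x defined above keeps its original value.
}
```

Setting exportEnv = "R_GlobalEnv" would correspond to "To global environment", and leaving it as the empty string to "Don't copy".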
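The R package job
See: https://mp.weixin.qq.com/s/67rjY7w-Uh0AfnaxNoik8Q
Earlier we covered running R scripts in the background; for long-running code or complicated package installations, that approach keeps the foreground free.
Install it directly: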
remotes::install_github("lindeloev/job")
P.S. On Windows I found that the installation fails with an error:
> remotes::install_github("lindeloev/job")
Error: Failed to install 'unknown package' from GitHub:
Line starting 'Config/testthat/edit ...' is malformed!
Now there is a more convenient way: we only need to call the functions of the job package in our code to run things in the background:
job::job({
  tmp <- matrix(sample(letters, 1000, replace = TRUE), ncol = 10)
})
The usage is:
job::job({<your code>})
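In effect, the manual steps have simply been turned into code.
If we want to keep the results of the background run separate from the foreground, so that the two do not pollute each other, we can also store the variables in a new environment: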
# Assumes brms is attached and that model and data exist in the current session.
job::job(brm_result = {
  fit = brm(model, data)
  fit = add_criterion(fit, "loo")
  print(summary(fit))  # Show a summary in the job
  the_test = hypothesis(fit, "hp > 0")
})
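We can then access the variables inside the created environment as brm_result$xx. The variables of the global environment and of the child environment do not interfere with each other, which avoids unnecessary problems caused by name clashes.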
For example, when there are several jobs:
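Several jobs with separate result environments could be launched roughly as follows. This is a sketch, assuming the job package is installed and the code runs inside RStudio; sim_a, sim_b, and the toy simulations are hypothetical stand-ins for real, slow workloads.

```r
# Jobs only exist inside RStudio, so the calls are skipped elsewhere.
in_rstudio <- requireNamespace("job", quietly = TRUE) &&
  requireNamespace("rstudioapi", quietly = TRUE) &&
  rstudioapi::isAvailable()

# Toy input standing in for real data.
n <- 1000

if (in_rstudio) {
  # Each call returns immediately; the work happens in its own job.
  job::job(sim_a = {
    draws <- rnorm(n)
    result <- mean(draws)
  })
  job::job(sim_b = {
    draws <- runif(n)
    result <- mean(draws)
  })
  # Once both jobs finish, sim_a$result and sim_b$result are available,
  # and neither job's `draws` overwrites the other's.
}
```

Both jobs use the same variable names internally, but because each result lands in its own environment there is no clash.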
Some further useful information:
Finer control
RStudio jobs spin up a new session, i.e., a new environment. By default, job::job() will make this environment identical to your current one. But you can fine-tune this:
import: the default "auto" setting imports all objects that are referenced by the code into the job. Control this using job::job({}, import = c(model, data)). You can also import everything (import = "all") or nothing (import = NULL).
packages: by default, all attached packages are attached in the job. Control this using job::job({}, packages = c("brms")) or set packages = NULL to load nothing. If brms is not loaded in your current session, adding library("brms") to the job code may be more readable.
options: by default, all options are overwritten/inserted to the job. Control this using, e.g., job::job({}, opts = list(mc.cores = 2)) or set opts = NULL to use default options. If you want to set job-specific options, adding options(mc.cores = 2) to the job code may be more readable.
export: in the example above, we assigned the job environment to brm_result upon completion. Naturally, you can choose any name, e.g., job::job(fancy_name = {a = 5}). To return nothing, use an unnamed code chunk (which inserts results into globalenv()) and remove everything before returning: job::job({a = 5; rm(list = ls())}). Returning nothing is useful when
your main result is a text output or a file on the disk, or
when the return is a very large object. The underlying rstudioapi::jobRunScript() is slow in the back-transfer so it's usually faster to saveRDS(obj, filename) them in the job and readRDS(filename) into your current session.
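The save-to-disk pattern from the last point might look like the sketch below. It assumes the job package is installed and RStudio is running for the background branch; heavy_task, result_file, and the use of tempdir() are illustrative choices, not part of the package. Outside RStudio the same steps simply run in the foreground.

```r
# File used to hand the result back; tempdir() is only for illustration.
result_file <- file.path(tempdir(), "big_result.rds")

# Hypothetical heavy task: build an object and save it to disk.
heavy_task <- function(path) {
  big <- matrix(rnorm(1e4), ncol = 10)
  saveRDS(big, path)
  invisible(path)
}

in_rstudio <- requireNamespace("job", quietly = TRUE) &&
  requireNamespace("rstudioapi", quietly = TRUE) &&
  rstudioapi::isAvailable()

if (in_rstudio) {
  # Unnamed chunk ending in rm(list = ls()): nothing is transferred back.
  job::job({
    heavy_task(result_file)
    rm(list = ls())
  })
  # When the job has finished, load the result in the current session:
  # big <- readRDS(result_file)
} else {
  # Outside RStudio, run the same steps in the foreground.
  heavy_task(result_file)
  big <- readRDS(result_file)
}
```

The job itself returns nothing, so the slow back-transfer is avoided and the object is read from disk only when it is needed.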
Some use cases
Model training, cross validation, or hyperparameter tuning: train multiple models simultaneously, each in their own job. If one fails, the others continue.
Heavy I/O tasks, like processing large files. Save the results to disk and return nothing.
Run unit tests and other code in an empty environment. By default, devtools::test() runs in the current environment, including manually defined variables (e.g., from the last test-run) and attached packages. Call job::job({devtools::test()}, import = NULL, packages = NULL, opts = NULL) to run the test in complete isolation.
Upgrading packages
See also