How could I go about running str() in R on all of these files loaded in the workspace at the same time? I simply want to export this information, in a batch-like process, to a .csv file. I have over 100 of them, and want to compare one workspace with another to help locate incongruities in data structure and avoid mismatches.

I came painfully close to a solution via UCLA's R Code Fragment; however, it leaves out the instructions for how to write the read.dta call that loops through the files. That is the part I need help with.
What I have so far:
#Define the file path
f <- file.path("C:/User/Datastore/RData")
#List the files in the path
fn <- list.files(f)
#loop through file list, return str() of each .RData file
#write to .csv file with 4 columns (file name, length, size, value)
Here is an example of what I am after (the Environment pane in RStudio, which simply lists the Name, Type, Length, Size, and Value of all of the RData files). I want to basically replicate this view, but export it to a .csv. I am adding the RStudio tag in case someone knows a way of exporting this table automatically; I couldn't find one.
Thanks in advance.
I've actually written a function for this already. I also asked a question about it and about dealing with promise objects within the function; that post might be of some use to you.

The issue with the last column is that str() is not meant to do anything but print a compact description of objects, so at first I couldn't use it (that's been changed with recent edits). This updated function gives a description of the values similar to that of the RStudio table. Data frames and lists are tricky because their str() output spans more than one line, but this should be good.
objInfo <- function(env = globalenv()) {
  # Gather all objects in the environment by name
  obj <- mget(ls(env), env)
  out <- lapply(obj, function(x) {
    vals1 <- c(
      Type = toString(class(x)),
      Length = length(x),
      Size = object.size(x)
    )
    # First line of str() output, cleaned up, serves as the "Value" column
    val2 <- gsub("|^\\s+|'data.frame':\t", "", capture.output(str(x))[1])
    if (grepl("environment", val2)) val2 <- "Environment"
    c(vals1, Value = val2)
  })
  out <- do.call(rbind, out)
  rownames(out) <- seq_len(nrow(out))
  noquote(cbind(Name = names(obj), out))
}
And then we can test it out on a few objects:
x <- 1:10
y <- letters[1:5]
e <- globalenv()
df <- data.frame(x = 1, y = "a")
m <- matrix(1:6)
l <- as.list(1:5)
objInfo()
# Name Type Length Size Value
# 1 df data.frame 2 1208 1 obs. of 2 variables
# 2 e environment 11 56 Environment
# 3 l list 5 328 List of 5
# 4 m matrix 6 232 int [1:6, 1] 1 2 3 4 5 6
# 5 objInfo function 1 24408 function (env = globalenv())
# 6 x integer 10 88 int [1:10] 1 2 3 4 5 6 7 8 9 10
# 7 y character 5 328 chr [1:5] a b c d e
Which is pretty close, I guess. Here's the screenshot of the Environment pane in RStudio for comparison.
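To take this the rest of the way to the .csv the question asks for, here is a minimal sketch (my assumptions: each .RData file can be load()-ed into its own environment, and the output file name workspace_summary.csv is made up) that runs objInfo() once per file and binds the results:

f <- file.path("C:/User/Datastore/RData")
fn <- list.files(f, pattern = "\\.RData$", full.names = TRUE)

summaries <- lapply(fn, function(path) {
  env <- new.env()                            # keep each workspace isolated
  load(path, envir = env)                     # populate the environment from the file
  cbind(File = basename(path), objInfo(env))  # tag each row with its source file
})

write.csv(do.call(rbind, summaries), "workspace_summary.csv", row.names = FALSE)

This simply reuses the file path and file list from the question and adds a File column so rows from different workspaces can be told apart.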
I would write a function, something like below, and then loop over your file list with that function, so you only have to write the code for a single dataset:
library(foreign)

giveSingleDataset <- function(oneFile) {
  # Read .dta file
  df <- read.dta(oneFile)
  # Give e.g. structure
  s <- ls.str(df)
  # Return what you want
  return(s)
}

# Actually call the function
result <- lapply(fn, giveSingleDataset)
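If you then want to export the result to a file, here is one rough sketch (my addition, not tested against your data; ls.str() output is plain text, so a text dump is more natural than a .csv, and the file name dta_structure_summary.txt is made up):

names(result) <- basename(fn)
lines <- unlist(lapply(names(result), function(nm) {
  # capture.output() turns the printed ls.str() summary into character lines
  c(paste0("== ", nm, " =="), capture.output(print(result[[nm]])))
}))
writeLines(lines, "dta_structure_summary.txt")

Note that read.dta() needs paths it can resolve, so build fn with list.files(f, full.names = TRUE) or set the working directory to f first.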