Tidyverse summary

8/23/2023

We first discuss the model and data background. The data consists of end-of-semester student evaluations for a sample of 463 courses taught by 94 professors from the University of Texas at Austin. This data is included in the evals data frame from the moderndive package. In the following table, we present a subset of 9 of the 14 variables included for a random sample of 5 courses:

ID uniquely identifies the course, whereas prof_ID identifies the professor who taught this course. This distinction is important since many professors taught more than one course. score is the outcome variable of interest: the average professor evaluation score out of 5 as given by the students in this course.

Recently, I was trying to calculate the percentiles of a set of variables within a data set grouped by another variable. However, I quickly realized that this is not very straightforward when using dplyr's summarize. Before I demonstrate, let's load the libraries that we will need. If you don't believe me when I say that it is not straightforward, go ahead and try to run the following block of code. If you run the code, you will see that it throws the following error:

Error in summarise_impl(.data, dots) : Column `quants` must be length 1 (a summary value), not 3

This error tells us that the expression returns an object of length 3 (our three quantiles) when summarize expects only one value per group.

A quick Google search turns up numerous Stack Overflow questions and answers about this. Most of these solutions revolve around using the do function to calculate the quantiles for each group. However, according to Hadley, do will eventually be "going away". While there is no definite time frame for this, I try to use it as little as possible. The new recommended practice for most cases of grouping is a combination of tidyr::nest, dplyr::mutate and purrr::map. I love this approach for most things (and it is even the accepted answer for one of the Stack Overflow questions mentioned above), but I worked up a new solution that I think is useful for calculating any number of percentiles across multiple groups.

This method uses purrr::map and a function operator, purrr::partial, to create a list of functions that can then be applied to a data set using dplyr::summarize_at and a little magic from rlang. Let's start by creating a vector of the desired percentiles to calculate. In this example, we will calculate the 20th, 50th, and 80th percentiles. Looking at p_funs, we can see that we have a named list in which each element is a function built from quantile with one of those percentiles preset. The beauty of this is that you can use this list in the same way you would pass multiple functions to any other summarize_at or summarize_all call. The only difference is that we now have to use the "bang-bang-bang" operator (!!!) from rlang (it is also exported from dplyr):

summarize_at(vars(mpg), funs(!!!p_funs))
# A tibble: 3 x 4

I think this provides a pretty neat way to get the desired output in a format that does not require much post-calculation manipulation. In addition, it is, in my opinion, more straightforward than many of the do methods. This method also allows quantiles to be calculated for more than one variable, although some post-processing is necessary in that case, for example selecting and ordering the columns:

select(cyl, contains("mpg"), contains("hp"))
# A tibble: 3 x 7

While this is surely a basic application of its functionality, one can easily see how powerful this function can be. partial is yet another tool from the purrr package that can greatly enhance your R coding abilities.
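The code for the partial-based method is not reproduced in this copy of the post, so here is a minimal sketch of it. It makes several assumptions. It uses the built-in mtcars data grouped by cyl, which is inferred from the select(cyl, ...) call and the three-row tibbles. It uses the variables mpg and hp. The names p, p_names and mtcars_pct are illustrative. funs() has been deprecated since dplyr 0.8.0, so this sketch replaces summarize_at(vars(mpg), funs(!!!p_funs)) with summarise(across()), which takes the named list of functions directly. That makes the !!! splice unnecessary. The sketch requires dplyr 1.0 or later and purrr 0.3 or later.

```r
# Sketch: assumes dplyr >= 1.0, purrr >= 0.3 and the built-in mtcars data
library(dplyr)
library(purrr)

# Assumed name: percentiles to calculate (20th, 50th and 80th)
p <- c(0.2, 0.5, 0.8)

# Assumed helper: labels such as "20%", used as function names below
p_names <- paste0(p * 100, "%")

# One function per percentile: quantile() with probs preset by partial()
p_funs <- map(p, function(prob) partial(quantile, probs = prob, na.rm = TRUE)) %>%
  set_names(p_names)

# Apply every function in the named list to mpg and hp within each cyl group;
# across() names the output columns "{.col}_{.fn}", e.g. "mpg_20%"
mtcars_pct <- mtcars %>%
  group_by(cyl) %>%
  summarise(across(c(mpg, hp), p_funs))

mtcars_pct
```

Each element of p_funs returns a single value, so summarise() accepts it without the length error. Adding a percentile only means adding a value to p.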

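For comparison, the tidyr::nest, dplyr::mutate and purrr::map pattern mentioned in the post might look like the sketch below. It assumes tidyr 1.0 or later, which provides the nest(data = -cyl) and unnest(c(...)) syntax. It again uses mtcars, mpg and cyl as stand-in data. The column names prob and quants and the object name are illustrative, not taken from the post.

```r
# Sketch: assumes tidyr >= 1.0, dplyr and purrr, with mtcars as stand-in data
library(dplyr)
library(tidyr)
library(purrr)

p <- c(0.2, 0.5, 0.8)

mtcars_nested_pct <- mtcars %>%
  # One row per cyl value; all other columns go into the list-column `data`
  nest(data = -cyl) %>%
  mutate(
    prob   = list(p),
    # quantile() returns all requested percentiles for each nested data frame
    quants = map(data, ~ quantile(.x$mpg, probs = p, na.rm = TRUE))
  ) %>%
  select(-data) %>%
  # Expand the list-columns so that each percentile gets its own row
  unnest(c(prob, quants)) %>%
  arrange(cyl, prob)

mtcars_nested_pct
```

This produces long-format output with one row per group and percentile. To match the one-row-per-group layout of the partial approach, it needs a further step such as tidyr::pivot_wider().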

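The error quoted in the post comes from dplyr versions before 1.0.0. dplyr 1.0.0 allowed summarise() to return several rows per group, and dplyr 1.1.0 deprecated that use in favour of reframe(). On dplyr 1.1.0 or later, a minimal sketch that computes several quantiles per group in one call could look like this. It again assumes mtcars grouped by cyl, and the column names are illustrative.

```r
# Sketch: assumes dplyr >= 1.1.0 (reframe() is available) and mtcars
library(dplyr)

p <- c(0.2, 0.5, 0.8)

mtcars_reframe_pct <- mtcars %>%
  group_by(cyl) %>%
  # reframe() may return any number of rows per group, so three quantiles
  # per group do not trigger the "must be length 1" error
  reframe(prob = p, quants = quantile(mpg, probs = p))

mtcars_reframe_pct
```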