Unlike most commercial statistics programs, which rely on precompiled
and pre-packaged routines, R allows users to program functions and run
them in the same environment. If you notice a perceptible lag when
running your R code, you may improve the performance of your programs
by taking the following steps:
- Reduce the number of loops. If it is absolutely necessary to nest
loops, give the inner loop the larger number of cycles, so that the
overhead of starting the inner loop is incurred fewer times.
Frequently, you can eliminate loops by using vectors rather than
scalars. Most R functions deal with vectors in an efficient and
mathematically intuitive manner.
- Do away with loops altogether. You can often replace loops with
the apply(), mapply(), sapply(), lapply(), and replicate() functions.
If the function passed to these *apply() functions is specified
properly, the resulting code is usually more concise than an explicit
loop and can run faster, although the size of any gain depends on the
function being applied.
- Call compiled C or Fortran code. R itself is interpreted rather
than compiled, but it can use precompiled C or Fortran code, calling
that code seamlessly from within R wrapper functions (which pass
input from the R function to the compiled code and back to R). For
this reason, almost every regression package includes C or Fortran
algorithms, which are compiled locally on Linux systems or
distributed precompiled for Windows. The recommended Linux compilers
are gcc for C and gfortran (the successor to g77) for Fortran, so you
should make sure that your code is compatible with those compilers to
achieve the widest possible distribution.
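
The first two suggestions above can be illustrated with a minimal
sketch. The task (a sum of squares), the vector length, and all
variable names are hypothetical and chosen only for illustration; the
code computes the same quantity with an explicit loop, with vectorized
arithmetic, and with sapply(), and then uses replicate() in place of a
loop that would fill a result vector.

```r
# Hypothetical task: the sum of squares of 1..n, written three ways.
n <- 10000
x <- seq_len(n)

# 1. Explicit loop over scalars.
total_loop <- 0
for (i in x) {
  total_loop <- total_loop + i^2
}

# 2. Vectorized: ^ and sum() operate on the whole vector at once.
total_vec <- sum(x^2)

# 3. sapply() applies a function to each element and simplifies the
#    result to a vector, which is then summed.
total_sapply <- sum(sapply(x, function(i) i^2))

# All three approaches should agree.
stopifnot(isTRUE(all.equal(total_loop, total_vec)),
          isTRUE(all.equal(total_vec, total_sapply)))

# replicate() evaluates an expression repeatedly and collects the
# results, here 200 means of 50 standard normal draws each.
set.seed(1)
sample_means <- replicate(200, mean(rnorm(50)))

# system.time() gives a rough timing comparison; the numbers depend on
# the machine and the R version, so none are shown here.
system.time(for (i in x) i^2)
system.time(x^2)
```

Timing each version with system.time() on your own data is the most
reliable way to check whether a given rewrite actually helps.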
Gary King
2011-11-29