What factors should organizations take into account when evaluating if the R language is right for their analytics needs?
R is a gloriously specialized language -- I love it. If you need what it can do, it's a match made in heaven. So, what are its strengths, and what features does it have that might tempt you into a long and meaningful relationship?
R's core strength is data sampling and data manipulation. Suppose, for example, you want to take a random sample of 100 values from a normal distribution with a mean of 65.342 and a standard deviation of 2.1. All you need is a single line:
rnorm(100, mean = 65.342, sd = 2.1)
And from that, R will generate the data you're looking for.
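To try this yourself, here is a slightly fuller sketch built around the same rnorm() call. The set.seed() line and the seed value 42 are our additions, used only so that repeated runs produce the same "random" numbers. The checks at the end compare the sample statistics with the parameters you asked for; they should be close, but not exactly equal.

```r
# Fix the random seed so the sample is reproducible (seed value is arbitrary)
set.seed(42)

# Draw 100 values from a normal distribution with mean 65.342 and sd 2.1
x <- rnorm(100, mean = 65.342, sd = 2.1)

# Sanity checks: sample size, plus sample mean and standard deviation,
# which should sit near (not exactly on) the requested parameters
length(x)   # 100
mean(x)
sd(x)
```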
Now, for many people, that might sound unbelievably boring. But the power of R analytics lies in what you can build on these abilities: it is a perfect tool for numerical simulations. For example, I recently wanted to perform a Monte Carlo simulation of a scoring system called the Net Promoter Score (NPS). Monte Carlo simulations are a vital part of analytics; they allow you to model the behavior of complex systems so that you can understand them better. They have been used by analytics professionals for many years and work by randomly sampling sets of numbers thousands or even millions of times.
R excels at creating and running Monte Carlo simulations, and the NPS simulation described above took a mere nine lines of code. I would love to tell you that I'm a hero because I managed to do it in nine lines, but that really isn't the case. The R programming language is simply exceptionally good at generating huge sets of numbers and then manipulating them. It's also good for prototyping big data manipulations.
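The nine-line NPS simulation mentioned above is not reproduced here. As a rough illustration only, this is one way such a simulation could be sketched in base R. NPS is conventionally computed from 0-to-10 ratings as the percentage of promoters (scores 9 or 10) minus the percentage of detractors (scores 0 to 6). The category probabilities, survey size and number of repetitions below are made-up assumptions, not figures from the original simulation.

```r
set.seed(1)

# ASSUMPTION (hypothetical): true shares of promoters (9-10), passives (7-8)
# and detractors (0-6) in the customer population; true NPS = 50 - 20 = 30
probs <- c(promoter = 0.5, passive = 0.3, detractor = 0.2)

n_respondents <- 200     # size of one simulated survey (assumed)
n_sims        <- 10000   # number of Monte Carlo repetitions (assumed)

# Simulate one survey and return its NPS:
# percentage of promoters minus percentage of detractors
simulate_nps <- function() {
  responses <- sample(names(probs), n_respondents, replace = TRUE, prob = probs)
  100 * (mean(responses == "promoter") - mean(responses == "detractor"))
}

# Repeat the survey many times to approximate the sampling distribution of NPS
nps <- replicate(n_sims, simulate_nps())

mean(nps)                       # should be close to the true NPS of 30
quantile(nps, c(0.025, 0.975))  # range covering the middle 95% of simulated scores
```

The spread of the simulated scores shows how much a single survey's NPS could move by chance alone, which is the kind of question a Monte Carlo simulation is good at answering.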
How does R manage to be so good at these kinds of tasks? The answer is that it has a whole raft of functions designed specifically for this kind of work. Where do they come from? R is free and open source. If people want a function and can't find it, they can write one and add it to the "bank" of functions that is R; in practice, these contributions are distributed as packages, most of them through the Comprehensive R Archive Network (CRAN). People have been doing that for about 15 years, which means that most of the functions you will ever need are already there.
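As an illustration of how users extend R, here is a minimal sketch of a user-defined function. The name sample_means and its parameters are hypothetical and not part of base R or any package. It simulates the sampling distribution of the mean for normally distributed data by combining functions that already ship with R (rnorm(), replicate(), mean(), summary()). Functions like this can be bundled into a package and shared, for example through CRAN, and installed by others with install.packages().

```r
# HYPOTHETICAL helper (not part of base R or any package):
# draw `reps` samples of size n from a normal distribution and return each sample's mean
sample_means <- function(n, mu, sigma, reps = 1000) {
  replicate(reps, mean(rnorm(n, mean = mu, sd = sigma)))
}

set.seed(7)
means <- sample_means(n = 100, mu = 65.342, sigma = 2.1)

# Summarise the simulated means with built-in functions
summary(means)
sd(means)   # theory predicts roughly 2.1 / sqrt(100) = 0.21
```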
Finally, R is a very easy language to learn -- you can just download the language and a front-end environment (such as RStudio, which I used to create the image embedded here) and start typing.
So, if you have numerical manipulations you want to perform, particularly Monte Carlo simulations, I really recommend taking a look to see whether the R language fits your needs. If you don't need to manipulate numbers in these kinds of ways, R is probably not for you.