Tuesday, June 26, 2018

R Programming and its benefits

What is R Programming?

R is a programming language and environment commonly used in statistical computing, data analytics and scientific research.

It is one of the most popular languages used by statisticians, data analysts, researchers and marketers to retrieve, clean, analyze, visualize and present data.

Due to its expressive syntax and easy-to-use interface, it has grown in popularity in recent years.
Why use R for statistical computing and graphics?
R is open source and free!
R is free to download because it is licensed under the terms of the GNU General Public License. You can look at the source to see what’s happening under the hood. What’s more, most R packages are available under the same license, so you can use them, even in commercial applications, without having to call your lawyer.
R is popular – and increasing in popularity
IEEE Spectrum publishes a list of the most popular programming languages each year. R was ranked 5th in 2016, up from 6th in 2015. It is a big deal for a domain-specific language like R to be more popular than a general purpose language like C#. This shows the increasing interest not only in R as a programming language, but also in fields like Data Science and Machine Learning where R is commonly used.
R runs on all platforms
You can find distributions of R for all popular platforms – Windows, Linux and Mac.
R code that you write on one platform can easily be ported to another without any issues. Cross-platform interoperability is an important feature to have in today’s computing world – even Microsoft is making its coveted .NET platform available on all platforms after realizing the benefits of technology that runs on all systems.

Learning R will increase your chances of getting a job
According to the Data Science Salary Survey conducted by O’Reilly Media in 2014, data scientists are paid a median of $98,000 worldwide. The figure is higher in the US – around $144,000. Of course, knowing how to write R programs won’t get you a job straight away; a data scientist has to juggle a lot of tools to do their work. Even if you are applying for a software developer position, R programming experience can make you stand out from the crowd.
R is being used by the biggest tech giants
Adoption by tech giants is always a sign of a programming language’s potential. Today’s companies don’t make their decisions on a whim. Every major decision has to be backed by concrete analysis of data.
Companies Using R
R is the right mix of simplicity and power, and companies all over the world use it to make calculated decisions. Here are a few ways industry stalwarts are using R and contributing to the R ecosystem.
Company – Application/Contribution
Twitter – Monitor user experience
Ford – Analyse social media to support design decisions for their cars
New York Times – Infographics, data journalism
Microsoft – Released Microsoft R Open, an enhanced R distribution, and Microsoft R Server after acquiring Revolution Analytics in 2015
Human Rights Data Analysis Group – Measure the impact of war
Google – Created the R style guide for the R user community inside Google
While using R, you can rest assured that you are standing on the shoulders of giants.

Is R programming an easy language to learn?
This is a difficult question to answer. Many researchers are learning R as their first language to solve their data analysis needs.

That’s the power of R programming: it is simple enough to learn as you go. All you need is data and a clear intent to draw a conclusion from an analysis of that data.

In fact, R is based on the S programming language, which was originally intended to help students learn programming while playing around with data.

However, programmers who come from a Python, PHP or Java background might find R quirky and confusing at first. The syntax that R uses is a bit different from other common programming languages.

While R does have all the capabilities of a programming language, you will not find yourself writing a lot of if conditions or loops in R. Instead, data structures like vectors, lists, data frames, data tables and matrices allow you to perform transformations on data in bulk.
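
To make the point about bulk transformations concrete, here is a minimal sketch comparing a loop with the vectorized equivalent. The prices vector and the 10% discount are made-up values used only for illustration.

```r
# Hypothetical data: prices of five items
prices <- c(10, 25, 40, 5, 60)

# Loop version: apply a 10% discount one element at a time
discounted_loop <- numeric(length(prices))
for (i in seq_along(prices)) {
  discounted_loop[i] <- prices[i] * 0.9
}

# Vectorized version: the same operation on the whole vector at once
discounted_vec <- prices * 0.9

# Logical indexing replaces an explicit if inside a loop
expensive <- prices[prices > 20]

print(discounted_vec)
print(expensive)
```

Both versions give the same numbers; the vectorized form is simply the more common way to write it in R.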

Applications of R Programming in Real World
Data Science
Harvard Business Review named data scientist the “sexiest job of the 21st century”. Glassdoor named it the “best job of the year” for 2016. With the advent of IoT devices creating terabytes and terabytes of data that can be used to make better decisions, data science is a field that has no other way to go but up. Simply explained, a data scientist is a statistician with an extra asset: computer programming skills. Programming languages like R give a data scientist superpowers that allow them to collect data in real time, perform statistical and predictive analysis, create visualizations and communicate actionable results to stakeholders.
Most courses on data science include R in their curriculum because it is the data scientist’s favourite tool.

Statistical computing
R is the most popular programming language among statisticians. In fact, it was initially built by statisticians for statisticians. It has a rich package repository with more than 9100 packages covering nearly every statistical function you can imagine. R’s expressive syntax allows researchers, even those from non-computer-science backgrounds, to quickly import, clean and analyze data from various data sources.
R also has charting capabilities, which means you can plot your data and create interesting visualizations from any dataset.
Machine Learning
R has found a lot of use in predictive analytics and machine learning. It has various packages for common ML tasks like linear and non-linear regression, decision trees, linear and non-linear classification and many more. Everyone from machine learning enthusiasts to researchers uses R to implement machine learning algorithms in fields like finance, genetics research, retail, marketing and health care.
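
As a small illustration of the linear regression mentioned above, the following sketch fits a model with base R's lm function on the mtcars dataset that ships with R. The choice of dataset and of the weight variable as predictor is an assumption made for this example only.

```r
# mtcars ships with base R (datasets package), so nothing needs downloading
fit <- lm(mpg ~ wt, data = mtcars)

# Estimated intercept and slope
print(coef(fit))

# Predict fuel economy for a hypothetical car; wt is measured in 1000 lbs
new_car <- data.frame(wt = 3)
print(predict(fit, newdata = new_car))
```

Dedicated packages cover the other tasks listed, but lm is enough to show the fit-then-predict pattern most of them follow.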
Alternatives to R programming
R is not the only language that you can use for statistical computing and graphics. Some of the popular alternatives to R are:

Python – Popular general purpose language
Python is a very powerful high-level, object-oriented programming language with an easy-to-use and simple syntax.

Python is extremely popular among data scientists and researchers. Most of the packages in R have equivalent libraries in Python as well.

While R is the first choice of statisticians and mathematicians, professional programmers prefer implementing new algorithms in a programming language they already know.

The choice between R vs Python also depends on what you are trying to accomplish with your code. If you are trying to analyze a dataset and present the findings in a research paper, then R is probably a better choice. But if you are writing a data analysis program that runs in a distributed system and interacts with lots of other components, it would be preferable to work with Python.

SAS (Statistical Analysis System)
SAS is a powerful software suite that has been the first choice of private enterprise for their analytics needs for a long time. Its GUI and comprehensive documentation, coupled with reliable technical support, make it a very good tool for companies.

While R is the undisputed champion in academia and research, SAS is extremely popular in commercial analytics. But R and Python are gaining momentum in the enterprise space and companies are also trying to move towards open-source technologies. Time will tell if SAS will continue its dominance or R/Python will take over.

SPSS – Software package for statistical analysis
SPSS is another popular statistical tool. It is used most commonly in the social sciences and is considered the easiest to learn among enterprise statistical tools.

SPSS is loved by non-statisticians because it is similar to Excel, so those who are already familiar with Excel will find SPSS very easy to use.

SPSS has the same downside as SAS – it is expensive. SPSS was acquired by IBM in 2009 for a reported $1.2 billion.



Run R Programming on Your Computer

This section shows the easiest way to run R on your system (Windows, Mac OS X or Linux).

Run R Programming in Mac
Go to the official site of R programming
Click on the CRAN link on the left sidebar
Select a mirror
Click “Download R for (Mac) OS X”
Download the latest pkg binary
Run the file and follow the steps in the instructions to install R.
Run R Programming in Linux
On Ubuntu
The Advanced Packaging Tool (APT) that comes with Ubuntu uses a file called sources.list to decide where to search for packages.

Before we can install R, we need to tell Ubuntu to look into the CRAN R repositories and also add a public key for secure download.

Open the sources.list file (usually located at /etc/apt/sources.list) in a text editor and add the following line at the end:
deb https://<my.favorite.cran.mirror>/bin/linux/ubuntu <distribution>/
For instance, if you are running Ubuntu trusty and want to use the RStudio CRAN mirror, the line would be:

deb https://cran.rstudio.com/bin/linux/ubuntu trusty/
If you are lazy like all good programmers, you can do this directly from the terminal without having to open a text editor:

sudo sh -c 'echo "deb http://cran.rstudio.com/bin/linux/ubuntu trusty/" >> /etc/apt/sources.list'
Authenticate the Ubuntu packages on CRAN
The packages for Ubuntu that are stored on CRAN mirrors are all signed using a key with ID E084DAB9.
We download the public key from the Ubuntu keyserver using this ID and add it to our system using the command:
sudo apt-key adv --keyserver keyserver.ubuntu.com --recv-keys E084DAB9
Update the list of available packages
Since we modified sources.list, we need to tell APT to refresh its list of packages, including those available from the CRAN servers, by running the command:
sudo apt-get update
Download and install R
Almost done. Just download and install the R package by running the command:
sudo apt-get -y install r-base
Open up the R console by issuing the following command.
$ R
If there were no issues during installation, the R console should open successfully with information about your R installation.
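
Once the console is open, a quick sanity check can confirm which R build is actually running. The following is a minimal sketch that uses only base R; the exact output depends on your installation.

```r
# Print the version string of the running R installation
print(R.version.string)

# Show platform details and the packages attached to this session
print(sessionInfo())
```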

RedHat-based Distributions
The process is similar for RedHat-based Linux distributions like CentOS. Instead of modifying a file like sources.list, you can directly add the repository for EPEL (Extra Packages for Enterprise Linux) by using the following command.

su -c 'rpm -Uvh http://download.fedoraproject.org/pub/epel/6/i386/epel-release-6-8.noarch.rpm'
Use the URL of the rpm file that matches your system's release and architecture; the one above is for EPEL 6 on i386.

Now it’s just a matter of updating the list of available packages and installing R.

sudo yum update
sudo yum install R
Fedora
Installing R on Fedora is a piece of cake. The Fedora repositories already include the R binaries.

Just run the commands:

sudo yum update
sudo yum install R
Run R Programming in Windows
Go to the official site of R programming
Click on the CRAN link on the left sidebar
Select a mirror
Click “Download R for Windows”
Click on the link that downloads the base distribution
Run the file and follow the steps in the instructions to install R.
Should I install the 32-bit version or the 64-bit version?
Most people don’t need to worry about this. Obviously the 64-bit version of R won’t work on a 32-bit machine, but both the 32-bit and 64-bit versions of R run seamlessly on 64-bit Windows.

You might want to consider installing the 32-bit version of R if your production environment is 32-bit, because some packages might have compatibility issues and cause the “But it works on my machine” fiasco.
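
If you are unsure which build you ended up with, R can report it. This is a small sketch using base R only; the values in the comments are examples of what a build may report, not guaranteed output.

```r
# Architecture R was built for, for example "x86_64" on a 64-bit build
print(R.version$arch)

# Pointer size in bytes: 8 on a 64-bit build, 4 on a 32-bit build
bits <- .Machine$sizeof.pointer * 8
cat("This R session is", bits, "bit\n")
```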

Installing RStudio
RStudio is the most popular IDE for running R programs and has a free license.

The installation process is straightforward. Download RStudio for your platform (Windows, Linux or Mac OS X), run the file and follow the instructions to install it.

Note: R should be installed on your system before you can run RStudio.

After you install RStudio and open it for the first time, it will ask you to choose which version of R to use.

Choose R version in RStudio

If RStudio detects that R hasn’t been installed on your system, it will show you a warning.

If R has been installed, you’ll see the RStudio interface. In the beginning, you can only see the R console, where you can write one-line statements in R and execute them.

However, even for trivial work, you will need to perform a sequence of steps and it is better to create an R script.

Go to File > New File > R Script as shown in the screenshot below to create a new R script.

New file in RStudio

You can now see the R Script Editor where you can type and save R programs that span multiple lines. RStudio isn’t just a text editor but an IDE that helps you run and debug R scripts with ease.

The RStudio GUI is divided into 4 major sections as shown in the screenshot below:

RStudio GUI



Your First R Program

R holds a reputation for getting things done with very little code. If you’re a programmer and thinking “Here comes the Hello World code”, you’re in for a surprise.

In just three lines of code, your first R program will generate 10,000 random numbers from a normal distribution, count how often each value occurs and create a fancy bar chart.

Copy the following code into the RStudio window, press Ctrl+A (Windows) or Cmd+A (Mac) to select all three lines and press Ctrl+Enter (Windows) or Cmd+Enter (Mac) to run them:

n <- floor(rnorm(10000, 500, 100))
t <- table(n)
barplot(t)
Look at the bottom-right section of RStudio and you will see this beautiful bar graph showing the bell curve of a random normal distribution.

Creating a bar graph using r

Here’s what each part of the code does:

Getting a vector of random numbers from a normal distribution
n <- floor(rnorm(10000, 500, 100))
The first line generates a vector of 10000 random numbers drawn from a normal distribution with a mean of 500 and a standard deviation of 100.

The floor function takes each number in this vector and rounds it down to the nearest whole number, discarding the fractional part.
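
The same call can be written with named arguments, which makes the role of each number easier to see. This is a short sketch; the two small floor examples use values picked only to show the rounding direction.

```r
# Same call as above with the arguments named explicitly
n <- floor(rnorm(n = 10000, mean = 500, sd = 100))

# floor always rounds down, toward negative infinity
print(floor(c(499.7, 500.2)))   # prints [1] 499 500
print(floor(-1.5))              # prints [1] -2

# Peek at the first few generated values (they differ on every run)
print(head(n))
```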

You can even try running this code separately in the R console and see the output as:

R floor function

Counting occurrences of each value
The table function takes these 10000 numbers and counts how often each value occurs.

Table function in R programming

Since it is a normal distribution, you can clearly see the frequencies of the numbers gradually increase as we approach the mean.
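
With 10000 values the table is long, so the counting is easier to follow on a tiny input. A minimal sketch, using a hand-written vector that is not part of the original program:

```r
# A small hypothetical vector with repeated values
values <- c(498, 500, 500, 501, 500, 498)

# table() returns one count per distinct value, labelled by that value
counts <- table(values)
print(counts)

# Look up the count for a single value by its label
print(counts[["500"]])   # prints [1] 3
```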

Plotting the frequencies on a bar graph
The barplot function takes this table of frequencies and creates the bar chart out of the data.

We don’t really need three lines. We could have done the same thing in just one line, while also adding labels to the x and y axes, with

barplot(table(floor(rnorm(10000, 500, 100))), xlab="Numbers", ylab="Frequencies")
This is the power of the R programming language. As a tool specifically built for statisticians, it performs all common operations using an expressive syntax that you will learn to love.
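
Because rnorm draws random numbers, every run produces a slightly different chart. If you want the same chart each time, one option is to fix the random seed first. This is a sketch; the seed value 42 is arbitrary.

```r
# Fixing the seed makes the random draws repeatable; 42 is an arbitrary choice
set.seed(42)
first_run <- floor(rnorm(10000, 500, 100))

# Resetting the same seed reproduces exactly the same numbers
set.seed(42)
second_run <- floor(rnorm(10000, 500, 100))

print(identical(first_run, second_run))   # prints [1] TRUE

barplot(table(first_run), xlab = "Numbers", ylab = "Frequencies")
```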

Getting Started with the R Console

While RStudio is an amazing tool to get started learning R, it is only an interface to the R console. It is important to be familiar with running R programs directly through the command prompt or terminal because you might not always have access to a graphical interface if you are running R programs on a server.

If R is installed correctly, you can open the R console by typing ‘R’ on the terminal and pressing Return/Enter.

When you start R, the first thing you will see is the R console with the default “>” prompt. We can start typing commands directly at the prompt and hit return to execute them.

For instance, try typing the following commands at the R prompt:

> n <- c(2, 3, 5, 10, 14)
> mean(n)
[1] 6.8
As you can see, each command is executed as soon as you press the return key, and any output (the mean in the above example) is displayed.

If the command is incomplete when you hit return, the prompt changes to “+” and continues to take input until the command is syntactically complete.
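
A function definition is a typical command that spans several lines. In the sketch below, typing the first line at the console leaves the command incomplete, so R shows the “+” prompt until the closing brace is entered. The function name square is made up for this example.

```r
# At the console, R shows "+" after the first line because the brace is still open
square <- function(x) {
  x^2
}

# The command is complete once the closing brace is typed
print(square(4))   # prints [1] 16
```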

Alternatively, we can execute R commands stored in an external file using the function source() as follows.

> source("example.R")
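
To try source() without creating a file by hand, the following sketch writes a two-line script to a temporary file and then runs it. The temporary file stands in for example.R, and the variable name avg is an assumption of this example.

```r
# Write a two-line script to a temporary file (a stand-in for example.R)
script_path <- tempfile(fileext = ".R")
writeLines(c("n <- c(2, 3, 5, 10, 14)", "avg <- mean(n)"), script_path)

# source() runs the file; objects it creates appear in the current workspace
source(script_path)
print(avg)   # prints [1] 6.8
```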
To exit the command prompt we can call the q() function (as in quit).
> q()
Different ways to run R scripts
Sometimes you may need to run an R program inside a batch or shell script. There are different ways to achieve that.

Method 1: Using R CMD BATCH command
Save your R script in a text file with .R extension and type the following command.

R CMD BATCH /home/demo/learnR/Rprogramming.R
The output of this command will be stored in a file called Rprogramming.Rout

Method 2: Using Rscript
Use the following command

Rscript /home/demo/learnR/Rprogramming.R
The difference between R CMD BATCH and Rscript is that Rscript prints the output to STDOUT instead of a file.

If you want to turn your R program into an executable, you can specify that you want the file to run using Rscript by adding the following line at the beginning of your R script.

#!/usr/bin/env Rscript
For example, if your R program looks like
#!/usr/bin/env Rscript
n <- c(2, 3, 5, 10, 14)
mean(n)
After making the file executable (for example with chmod +x Rprogramming.R), you can directly execute it from the terminal as ./Rprogramming.R
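
Scripts run this way often need input from the command line. One way to read it is base R's commandArgs function. The sketch below assumes a hypothetical file name mean_args.R and falls back to the numbers used earlier when no arguments are given.

```r
#!/usr/bin/env Rscript
# Hypothetical usage: ./mean_args.R 2 3 5 10 14

# Keep only the arguments supplied after the script name
args <- commandArgs(trailingOnly = TRUE)

if (length(args) == 0) {
  # Fall back to the example numbers when nothing is passed
  n <- c(2, 3, 5, 10, 14)
} else {
  # Command-line arguments arrive as text, so convert them to numbers
  n <- as.numeric(args)
}

cat("Mean:", mean(n), "\n")
```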

Learn to Code in R programming

There is no one best way to learn how to program using the R programming language. Depending on your learning style, you can choose between any of the resources available online.

Learn R from DataMentor
At DataMentor, we have created a ton of resources to help you get started with learning R. You can use our tutorials to get started with statistics using R. We cover how to:

Download the software to run R scripts
Write R code
Understand the R syntax
Perform basic statistical operations
Learn advanced R concepts