Analyzing World Datasets in R Assignment Sample

This document provides a detailed walkthrough of an R programming assignment focused on analyzing a dataset named "world.csv". It utilizes R code snippets, commands, and comments to guide the reader through each step of the data analysis process.


Introduction to the R Programming Assignment: Analysis of World GDP and OECD Membership Data


This assignment works through a series of exercises on a dataset named “world.csv”, which is modified and analyzed step by step. All tasks are carried out with R code in RStudio, and the code is expected to run without errors. Comments in the code are marked with the ‘#’ symbol. The tasks begin by building a data frame from the dataset’s variables, followed by a frequency table that answers the assignment questions. A bar graph is then drawn with R code as required. GDP per capita is analyzed for the countries in the dataset, including its mean and median. Histograms of GDP per capita are drawn for democratic and non-democratic countries, and the mean GDP per capita of each group is calculated.

Task 1 The main purpose of this task is to load the dataset into RStudio and store it as an object named “world.data”.

The analysis can only begin once the dataset has been loaded into the software (Lund, 2020). The first task therefore covers importing a CSV file into RStudio, either with R code or through the import options of the interface.

The dataset is saved under the name “world.csv” and then imported into the software (Chen et al. 2022). RStudio offers an “Import Dataset” option in its “Environment” pane. The file is selected by browsing to its location on the PC and is then read into the software. Once imported, the dataset is stored as a named object, “world.data”, which completes task 1 of this assignment.
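As an alternative to the import dialog, the same step can be sketched in code. This is a minimal example, not the assignment's actual script: it writes a tiny stand-in file to a temporary folder so that it runs anywhere, and the column names other than “gdp_10” and “democ_regime” are assumptions made for illustration.

```r
# Stand-in for the real file: a few made-up rows written to a temp folder.
# Replace csv_path with the actual location of "world.csv".
csv_path <- file.path(tempdir(), "world.csv")
writeLines(c("country,oecd,gdp_10,democ_regime",   # 'country' and 'oecd' are hypothetical names
             "A,Yes,4.2,Yes",
             "B,No,0.3,No",
             "C,No,,Yes"),                        # empty field becomes NA
           csv_path)

# Load the CSV and store it as the object world.data
world.data <- read.csv(csv_path, stringsAsFactors = FALSE)

# Inspect the structure of the loaded data frame
str(world.data)
```

Loading the file in code rather than through the dialog keeps the step reproducible when the script is rerun.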

Task 2 Making The Dummy Variable Data Frame Based On The Main Data Set

A categorical variable can store either integer or string values, where integer values are numerical codes and string values are text labels (Kaya et al. 2019). The data frame has been named ‘OECD’. It classifies the countries into two groups: OECD members and non-OECD members.

A summary and description of this data frame are also produced in this task, as required. Next, the values of the categorical variable are counted to create a table for this task.

The table is named “frequency” and contains the columns value, frequency, and percentage (Chakravarthi et al. 2022). The data frame object is defined as ‘ft.OECD’. Its columns are ‘Var1’ for the value, ‘Freq’ for the frequency, and a percentage column. Finally, the column “Var1” is renamed “OECD Member”, as required by task 2.
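The steps of task 2 might look roughly like the sketch below. The toy data and the 0/1 source column named “oecd” are assumptions for illustration, and the grouping is stored here as a factor column called OECD; only the names ‘ft.OECD’, ‘Var1’, ‘Freq’ and “OECD Member” come from the text.

```r
# Toy data standing in for world.data; the 0/1 column 'oecd' is hypothetical
world.data <- data.frame(country = c("A", "B", "C", "D", "E"),
                         oecd    = c(1, 0, 0, 1, 0))

# Dummy (categorical) variable with two labelled groups
world.data$OECD <- factor(world.data$oecd,
                          levels = c(0, 1),
                          labels = c("Non-OECD member", "OECD member"))

# Summary of the new variable: one count per group
summary(world.data$OECD)

# Frequency table as a data frame; table() yields the columns Var1 and Freq
ft.OECD <- as.data.frame(table(world.data$OECD))

# Add a percentage column
ft.OECD$Percentage <- round(100 * ft.OECD$Freq / sum(ft.OECD$Freq), 2)

# Rename Var1 to "OECD Member"
names(ft.OECD)[names(ft.OECD) == "Var1"] <- "OECD Member"
ft.OECD
```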

Task 3 Making The Table Of Frequency

Task 3 follows the assignment requirements and answers a set of questions about OECD membership. The answers are as follows:

  • The number of countries that are OECD members is 88.
  • The number of countries that are not OECD members is 103.
  • OECD members make up 46.07% of all countries (88 of 191).
  • Non-OECD members make up 53.93% of all countries (103 of 191).

These four answers complete the task. Part (A) gives the number of countries that are OECD members and Part (B) the number that are not. Part (C) gives the percentage of countries that are OECD members, while Part (D) gives the percentage that are not. Since the answers to Part (A) and Part (B) are quantities, they are reported as counts, while the answers to Part (C) and Part (D) are reported as percentages. Comments marked with the ‘#’ symbol explain what each part of the code does (Blischak et al. 2019). Comments are not executed as code; they are notes added by the programmer to explain a particular part of it.
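The percentages in Parts (C) and (D) follow directly from the two counts reported in Parts (A) and (B). A short arithmetic check, using only those counts (the variable names are chosen for illustration):

```r
# Counts reported in Part (A) and Part (B)
n_oecd     <- 88
n_non_oecd <- 103
n_total    <- n_oecd + n_non_oecd   # 191 countries in total

# Part (C) and Part (D): each group's share of all countries, in percent
pct <- round(100 * c(OECD = n_oecd, NonOECD = n_non_oecd) / n_total, 2)
pct   # 46.07 and 53.93
```

The two shares should sum to 100, which is a quick sanity check on any frequency table of this kind.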

Task 4 Requirement Of The Assignment Questions

A nominal variable names the attribute being analyzed. Here the label for the x-axis is “OECD membership” and the label for the y-axis is “Number of countries”.

A bar graph is used to study the nominal variable because it shows the frequency of each category (Hossard, 2019). The categories are plotted along the x-axis and the frequencies along the y-axis.

The frequencies appear as vertical columns in the bar graph (Balduzzi et al. 2019). The graph describes the nominal variable of the data frame and makes the counts in each category easy to compare. The ‘ggplot2’ package is used to produce the graphs in RStudio (Bülow, 2020), and the “geom_bar” command draws the bar plot for the “OECD” variable.

  • The library command loads the ggplot2 package, whose functions are used to plot the bar graph.
  • The axis labels have been changed as the question requires, using the “xlab” and “ylab” options. The x-axis label is set to “OECD membership” and the y-axis label to “Number of countries”.
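A bar graph along these lines could be sketched as follows. It assumes the ggplot2 package is installed and uses a small made-up data frame in place of the real data; the column name OECD is an assumption.

```r
library(ggplot2)

# Toy data standing in for the real data frame
world.data <- data.frame(
  OECD = factor(c("OECD member", "Non-OECD member", "Non-OECD member",
                  "OECD member", "Non-OECD member"))
)

# geom_bar() counts the rows in each category and draws one column per category
p <- ggplot(world.data, aes(x = OECD)) +
  geom_bar() +
  xlab("OECD membership") +      # x-axis label required by the task
  ylab("Number of countries")    # y-axis label required by the task

print(p)
```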

Task 5 Analysis Of The Numerical Variables As Interval-Level Variables

The variable is named ‘gdp_10’ in the dataset and records each country’s GDP per capita. It is measured in units of 10,000 US dollars, so a value of 4 means forty thousand dollars, not four dollars.

In this task, the minimum and maximum of the variable are computed to obtain its range (Arnold, 2019). The first and third quartiles are reported as well. A single command returns these values together with the mean and median. The range is the difference between the maximum value and the minimum value of a variable.

The mean is the average of the values of a variable, while the median is the middle value when the values are sorted. The standard deviation has been computed with R commands; it is the square root of the variance and measures how scattered the values are around the mean. Missing values are excluded from the calculation with the na.rm option.

The R commands are provided together with the numerical results, as required. The maximum is 4.7354 and the minimum is 0.009, giving a range of 4.7264. The standard deviation is 0.9433982.
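The descriptive statistics described in this task can be sketched with base R functions. The values below are made up for illustration and will not reproduce the figures reported above.

```r
# Toy GDP per capita values in units of 10,000 US dollars (made up), with one NA
gdp_10 <- c(0.05, 0.3, 0.8, NA, 1.2, 4.1)

# One command for minimum, quartiles, median, mean and maximum
summary(gdp_10)

# Minimum and maximum, then the range as their difference
rng <- range(gdp_10, na.rm = TRUE)
range_width <- diff(rng)

# Standard deviation, ignoring missing values
sd_gdp <- sd(gdp_10, na.rm = TRUE)

# The same value obtained as the square root of the variance
sd_from_var <- sqrt(var(gdp_10, na.rm = TRUE))
```

Without na.rm = TRUE, a single missing value would make range(), sd() and var() return NA.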

Task 6 The Median And Mean Values Are Separately Calculated And Analyzed.

The analysis shows that the mean is higher than the median.

The distribution of the variable is therefore not symmetric. The answers to the questions are given below:

  • The mean is much higher than the median, so the distribution is positively skewed.
  • Because the mean lies above the median, the distribution is not negatively skewed.

Here, the answers are given in words rather than R commands, as the question requires (Chen, 2021). The distribution is positively skewed, or skewed to the right: the median is the middle value, while the mean is pulled to the right of the median, in the positive direction of the x-axis, by a small number of large values.
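Although the question asks for a verbal answer, the comparison behind it can be illustrated in a few lines. This sketch uses made-up values, and the mean-versus-median rule is a rough heuristic rather than a formal skewness test.

```r
# Toy GDP per capita values (made up); one large value pulls the mean upward
gdp_10 <- c(0.05, 0.3, 0.8, NA, 1.2, 4.1)

m  <- mean(gdp_10, na.rm = TRUE)     # average of the non-missing values
md <- median(gdp_10, na.rm = TRUE)   # middle of the sorted non-missing values

# Heuristic: mean above median suggests a right (positive) skew
skew_hint <- if (m > md) {
  "positively skewed"
} else if (m < md) {
  "negatively skewed"
} else {
  "roughly symmetric"
}
skew_hint
```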

Task 7

The GDP variable is displayed graphically in this task. The axis labels have been changed as specified in the question: the x-axis is labelled “GDP per capita” and the y-axis “Number of countries”.

The graph is a histogram of GDP per capita. A histogram represents the frequency distribution of a variable: the class intervals are shown along the x-axis and the frequency of each interval is plotted along the y-axis.
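A histogram of this kind could be drawn as in the sketch below, assuming ggplot2 is installed. The data are made up and the number of bins is an arbitrary choice for illustration.

```r
library(ggplot2)

# Toy data standing in for world.data (values made up)
world.data <- data.frame(gdp_10 = c(0.05, 0.3, 0.8, NA, 1.2, 4.1, 0.4, 0.6))

# Drop missing values first so the plot is built only from observed values
plot_data <- world.data[!is.na(world.data$gdp_10), , drop = FALSE]

# geom_histogram() splits the x-axis into class intervals (bins)
# and counts the observations falling in each one
p <- ggplot(plot_data, aes(x = gdp_10)) +
  geom_histogram(bins = 10) +    # bin count chosen arbitrarily here
  xlab("GDP per capita") +
  ylab("Number of countries")

print(p)
```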

Task 8

The standard error of the mean has been calculated for the per capita GDP variable from the standard deviation computed in the previous tasks and the number of non-missing observations.

The appropriate R commands have been used for this calculation. The standard error is 0.129188.
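The standard error of the mean is the standard deviation divided by the square root of the number of non-missing observations. A minimal sketch with made-up values:

```r
# Toy GDP per capita values (made up), with one missing value
gdp_10 <- c(0.05, 0.3, 0.8, NA, 1.2, 4.1)

# Number of observations actually used (missing values excluded)
n <- sum(!is.na(gdp_10))

# Standard error of the mean: sd / sqrt(n)
se_gdp <- sd(gdp_10, na.rm = TRUE) / sqrt(n)
se_gdp
```

Using length(gdp_10) instead of the non-missing count would overstate n and understate the standard error whenever the variable has missing values.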

Task 9

In this task, the mean and its standard error are used to construct a 95% confidence interval.

The interval is built around the sample mean of “gdp_10”. Both the R commands and the numerical values are given, as the assignment requires.
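One common way to build such an interval is the mean plus or minus a critical value times the standard error. The sketch below assumes a t-based critical value; the assignment may instead have used the normal value of about 1.96, which the text does not specify. The data are made up.

```r
# Toy GDP per capita values (made up), with one missing value
gdp_10 <- c(0.05, 0.3, 0.8, NA, 1.2, 4.1)

x  <- gdp_10[!is.na(gdp_10)]     # keep only observed values
n  <- length(x)
m  <- mean(x)
se <- sd(x) / sqrt(n)

# Critical value for a 95% interval from the t distribution (assumption)
t_crit <- qt(0.975, df = n - 1)

# Lower and upper bounds of the 95% confidence interval
ci <- c(lower = m - t_crit * se, upper = m + t_crit * se)
ci

# Cross-check against the interval returned by t.test()
t.test(gdp_10)$conf.int
```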

Task 10

In this task, histograms of GDP per capita are drawn separately for democracies and non-democracies. The regime variable codes democracy as ‘Yes’ and autocracy as ‘No’.

A new data frame named “dem.gdp” is used to generate the histograms; rows where the value of the “democ_regime” variable is missing are excluded.

Finally, the values in the data frame are relabelled from “Yes” to “Democracy” and from “No” to “Autocracy”.
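The filtering, relabelling and plotting steps might be sketched as below, assuming ggplot2 is installed. The data are made up, and placing the two histograms side by side with facet_wrap() is one possible layout, not necessarily the one used in the assignment.

```r
library(ggplot2)

# Toy data standing in for world.data (values made up)
world.data <- data.frame(
  gdp_10       = c(0.9, 0.2, 1.1, 1.4, 0.5, 2.1),
  democ_regime = c("Yes", "No", NA, "Yes", "No", "Yes")
)

# New data frame without the rows where democ_regime is missing
dem.gdp <- world.data[!is.na(world.data$democ_regime),
                      c("gdp_10", "democ_regime")]

# Relabel "Yes" as "Democracy" and "No" as "Autocracy"
dem.gdp$democ_regime <- factor(dem.gdp$democ_regime,
                               levels = c("Yes", "No"),
                               labels = c("Democracy", "Autocracy"))

# One histogram panel per regime type
p <- ggplot(dem.gdp, aes(x = gdp_10)) +
  geom_histogram(bins = 10) +    # bin count chosen arbitrarily here
  facet_wrap(~ democ_regime) +
  xlab("GDP per capita") +
  ylab("Number of countries")

print(p)
```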

Task 11

The standard error and the mean have been calculated, and a 95% confidence interval has been constructed. GDP per capita is shown in the graph.

The interval is calculated for the mean per capita GDP of the democracies (Bikbov, 2018). The confidence level is again 95%. The commands and the numerical results are given as required, and the mean for democracies is 0.6579.

Task 12

In this last task, the same calculation is carried out for the mean per capita GDP of the autocracies.

These are the non-democratic countries. A 95% confidence interval is again used for the mean. The mean GDP per capita for autocracies is 0.5450.
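Tasks 11 and 12 repeat the same interval calculation for each regime group. A compact sketch is shown below; the helper ci95() is a hypothetical function written for this example, the t-based interval is an assumption, and the data are made up, so the output will not match the means reported above.

```r
# Hypothetical helper: mean and t-based 95% confidence interval of a vector
ci95 <- function(x) {
  x    <- x[!is.na(x)]                    # drop missing values
  n    <- length(x)
  se   <- sd(x) / sqrt(n)                 # standard error of the mean
  half <- qt(0.975, df = n - 1) * se      # half-width of the interval
  c(mean = mean(x), lower = mean(x) - half, upper = mean(x) + half)
}

# Toy version of dem.gdp (values made up)
dem.gdp <- data.frame(
  gdp_10       = c(0.9, 1.4, 2.1, 0.2, 0.5, NA, 0.7),
  democ_regime = c("Democracy", "Democracy", "Democracy",
                   "Autocracy", "Autocracy", "Autocracy", "Autocracy")
)

# Split GDP per capita by regime and apply the helper to each group;
# the result is a matrix with one column per regime
by_regime <- sapply(split(dem.gdp$gdp_10, dem.gdp$democ_regime), ci95)
by_regime
```

Keeping the calculation in one function ensures that both groups are treated identically, which makes the two intervals directly comparable.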

Conclusion

All tasks in this assignment have been completed and each question has been answered. R code was used throughout in RStudio, with comments marked by the ‘#’ symbol, and the code ran without errors. The dataset was loaded and a data frame was built from its variables. A frequency table was then made to answer the questions on OECD membership, and a bar graph was drawn with R code as required. GDP per capita was analyzed across countries, and the commands and numerical results are reported for each task as required.

References

Arnold, T.W., 2019. Data and R code supporting "A Meta-Analysis of Band Reporting Probabilities for North American Waterfowl".

Balduzzi, S., Rücker, G. and Schwarzer, G., 2019. How to perform a meta-analysis with R: a practical tutorial. Evidence-based mental health, 22(4), pp.153-160.

Bikbov, B., 2018. R open source programming code for calculation of the kidney donor profile index and kidney donor risk index. Kidney Diseases, 4(4), pp.269-272.

Blischak, J.D., Carbonetto, P. and Stephens, M., 2019. Creating and sharing reproducible research code the workflowr way. F1000Research, 8.

Bülow, E., 2020. coder: an R package for code-based item classification and categorization. Journal of Open Source Software, 5(56), p.2916.

Chakravarthi, B.R., Priyadharshini, R., Muralidaran, V., Jose, N., Suryawanshi, S., Sherly, E. and McCrae, J.P., 2022. Dravidiancodemix: Sentiment analysis and offensive language identification dataset for Dravidian languages in code-mixed text. Language Resources and Evaluation, pp.1-42.

Chen, S., 2021. Exploring Value Set Analysis for Binary Code Hardening and Vulnerability Detection (Doctoral dissertation, The Ohio State University).

Chen, Y., Pal, B., Lindeman, G.J., Visvader, J.E. and Smyth, G.K., 2022. R code and downstream analysis objects for the scRNA-seq atlas of normal and tumorigenic human breast tissue. Scientific Data, 9(1), pp.1-9.

Ge, S.X., Son, E.W. and Yao, R., 2018. iDEP: an integrated web application for differential expression and pathway analysis of RNA-Seq data. BMC bioinformatics, 19(1), pp.1-24.

Hossard, L., 2019. R code, partial dataset and results for: Modelling the impacts of agricultural landscape changes: A bibliometric review.

Kaya, E., Agca, M., Adiguzel, F. and Cetin, M., 2019. Spatial data analysis with R programming for environment. Human and ecological risk assessment: An International Journal, 25(6), pp.1521-1530.

Lund, B.D., 2020. Assessing library topics using sentiment analysis in R: a discussion and code sample. Public Services Quarterly, 16(2), pp.112-123.

Motogna, S., Cristea, D., Şotropa, D. and Molnar, A.J., 2022. Formal concept analysis model for static code analysis. Carpathian Journal of Mathematics, 38(1), pp.159-168.

Puri, R., Kung, D.S., Janssen, G., Zhang, W., Domeniconi, G., Zolotov, V., Dolby, J., Chen, J., Choudhury, M., Decker, L. and Thost, V., 2021. CodeNet: A large-scale AI for code dataset for learning a diversity of coding tasks. arXiv preprint arXiv:2105.12655.

Wolfson, D., Fieberg, J.R. and Andersen, D.E., 2018. Data and R Code Supporting: Juvenile Sandhill Cranes Exhibit Wider Ranging and More Exploratory Movements Than Adults During the Breeding Season.

 
