Programming For Artificial Intelligence And Data Science Assignment

Mastering Programming Fundamentals for Data Science and Artificial Intelligence: From Data Preprocessing to Visualization with Python


Introduction To Programming For Artificial Intelligence And Data Science

In the modern world, data science is a crucial field because it aids in the analysis and interpretation of the vast volumes of data needed to make defensible decisions. Businesses can make better decisions and enhance their operations by using data science to gain insights into trends, customer behaviour, and market conditions. Data science can also be used to automate procedures, lower expenses, and increase effectiveness, and new services, technologies, and products are being developed with it.


Programming is therefore crucial for both artificial intelligence (AI) and data science, since it is needed to develop and apply the algorithms on which AI and data analysis depend; without it, machines could not learn from or analyse data. This study analyses customer-related data using the Python programming language. Customer data is information obtained from clients, such as buying patterns, contact details, and preferences, and by using this information firms can better understand their clients' wants and tailor their offerings. The datasets are also visualised, and several specific JSON files are produced and submitted.

Data Pre-processing

A critical phase in the data science process is data pre-processing, which guarantees that the data is ready for subsequent analysis. Cleaning, normalising, aggregating, and other transformation steps are used to turn raw data into a more acceptable and meaningful format (Ahmad et al. 2022). Cleaning entails eliminating redundant or irrelevant data, normalising entails formatting data consistently, and aggregating entails fusing data from several sources.

Feature extraction is a further pre-processing step that increases the information content of the data. Pre-processing is required to guarantee the data's objectivity and accuracy: it can lessen noise, find inaccuracies in the data, and help identify and remove outliers. It can also reveal patterns, correlations, and trends that can be used to build predictive models, and it reduces the time needed to process massive datasets (Giorgi et al. 2022).

Code to import and read the datasets

Figure 1: Code to import and read the datasets

(Source: Acquired from Jupyter Notebook)

In this figure, the data from the "acw user data.csv" file is stored in the "rows" list. The CSV module is used to parse the file and extract the data, and a separate "header" variable holds the file's header row. The file is opened in read-only mode and the csv.reader() method is used to build a CSV reader object (Schwaller et al. 2022). The code then loops through the file row by row, appending each row to the "rows" list, and finally prints the header and rows to the console.
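
The figure itself is not reproduced here, so the following is a minimal sketch of the code it describes, assuming the underscored file name "acw_user_data.csv" and that the file sits next to the notebook:

import csv

# Read the CSV file once, keeping the header row separate from the data rows
rows = []
with open("acw_user_data.csv", "r", newline="") as csv_file:
    reader = csv.reader(csv_file)
    header = next(reader)        # first line holds the column names
    for row in reader:
        rows.append(row)         # remaining lines are the data records

print(header)
print(rows[:5])                  # preview the first few records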

Implementation code to create nested data

Figure 2: Implementation code to create nested data

(Source: Acquired from Jupyter Notebook)

The above figure shows the dataset-analysis step in Python, which consists of constructing a CSV reader over the file and then iterating over its rows to build the various nested data structures (Lee et al. 2022).
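
One possible shape for this nesting step, continuing from the header and rows variables in the sketch above; the column names used below are assumptions rather than the dataset's actual headers:

# Build a nested dictionary per customer from the flat CSV rows
customers = []
for row in rows:
    record = dict(zip(header, row))              # flat mapping: header -> value
    customers.append({
        "name": {
            "first": record.get("First Name"),   # assumed column name
            "last": record.get("Last Name"),     # assumed column name
        },
        "vehicle": {
            "make": record.get("Vehicle Make"),  # assumed column name
            "model": record.get("Vehicle Model"),
            "year": record.get("Vehicle Year"),
        },
    })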

Code for creating processed.json file

Figure 3: Code for creating processed.json file

(Source: Acquired from Jupyter Notebook)

The above image shows the Python code that generates the processed JSON file, which holds the full contents of the CSV dataset. All the headers are used as keys, so each value can be fetched by its header name when the CSV data is written to the JSON file (Hua et al. 2022).
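
A short sketch of how processed.json could be produced; csv.DictReader already uses the header names as keys, which matches the description above (the file name is again an assumption):

import csv
import json

# Each CSV row becomes a dictionary keyed by the header names
with open("acw_user_data.csv", "r", newline="") as csv_file:
    records = list(csv.DictReader(csv_file))

# Write the whole list of records to processed.json
with open("processed.json", "w") as json_file:
    json.dump(records, json_file, indent=4)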

Creating employed.json file

Figure 4: Creating employed.json file

(Source: Acquired from Jupyter Notebook)

The employed.json file is produced by filtering the records held in processed.json and writing the newly filtered data, in this case the customers who are currently employed, to a separate JSON file.
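
A hedged sketch of that filtering step; the "Employer Company" key and the rule that a non-empty value means "currently employed" are assumptions:

import json

with open("processed.json") as json_file:
    records = json.load(json_file)

# Keep only customers with a non-empty employer field (assumed key name)
employed = [r for r in records if str(r.get("Employer Company", "")).strip()]

with open("employed.json", "w") as json_file:
    json.dump(employed, json_file, indent=4)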

Code for creating retired.json file

Figure 5: Code for creating retired.json file

(Source: Acquired from Jupyter Notebook)

The above image shows the code that creates the retired JSON file from the processed JSON file by extracting the records flagged in the retired column of the CSV dataset.
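
A possible version of this step, assuming a "Retired" key whose values are the strings "True"/"False" carried over from the CSV:

import json

with open("processed.json") as json_file:
    records = json.load(json_file)

# Keep only the records flagged as retired (assumed key name and value format)
retired = [r for r in records if str(r.get("Retired", "")).lower() == "true"]

with open("retired.json", "w") as json_file:
    json.dump(retired, json_file, indent=4)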

Creating commute.json file

Figure 6: Creating commute.json file

(Source: Acquired from Jupyter Notebook)

The above image shows the code that creates the commute JSON file from the processed JSON file by extracting the distance-commuted and yearly-salary columns of the CSV dataset.
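
A sketch of the commute.json step; the key names "Distance Commuted to Work" and "Yearly Salary" are assumptions based on the description above:

import json

with open("processed.json") as json_file:
    records = json.load(json_file)

# Keep only the two columns of interest for each customer (assumed key names)
commute = [
    {
        "Distance Commuted to Work": r.get("Distance Commuted to Work"),
        "Yearly Salary": r.get("Yearly Salary"),
    }
    for r in records
]

with open("commute.json", "w") as json_file:
    json.dump(commute, json_file, indent=4)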

Implementation code for creating removed_card.json file

Figure 7: Implementation code for creating removed_card.json file

(Source: Acquired from Jupyter Notebook)

The above image shows the code that creates the removed_card JSON file from the processed JSON file by extracting the credit card start date and expiry date columns of the CSV dataset.
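
A hedged sketch consistent with the file name, in which the credit-card date fields are dropped from each record before writing; the key names, and the choice to drop rather than keep the fields, are assumptions:

import json

with open("processed.json") as json_file:
    records = json.load(json_file)

removed_card = []
for r in records:
    r = dict(r)                                  # copy so the processed data is untouched
    r.pop("Credit Card Start Date", None)        # assumed key name
    r.pop("Credit Card Expiry Date", None)       # assumed key name
    removed_card.append(r)

with open("removed_card.json", "w") as json_file:
    json.dump(removed_card, json_file, indent=4)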

Data Visualization

Import libraries in Jupyter notebook

Figure 8: Import libraries in Jupyter notebook

(Source: Acquired from Jupyter Notebook)

The above figure shows the Python libraries being imported into the notebook file to perform the necessary data visualisation on the given dataset.
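
The imports will have looked roughly like the following; seaborn is included because a later figure mentions a dist plot, and the file name is an assumption:

import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Load the customer data into a dataframe for plotting
df = pd.read_csv("acw_user_data.csv")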

Description of datasets

Figure 9: Description of datasets

(Source: Acquired from Jupyter Notebook)

Describing each data point gives the reader more insight into the dataset. To colour-code the points by class, a dictionary is created that maps each class to a colour, each class is scattered separately inside a for-loop, and the results are combined in a single figure. The bar method can then be used to make a bar chart.
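
A minimal sketch of that colour-coded scatter, reusing the df dataframe loaded in the import sketch above; the column names "Age", "Yearly Salary" and "Marital Status", and the category labels, are assumptions:

# Map each class to a colour and scatter each class separately in a loop
colour_map = {"married": "tab:blue", "single": "tab:orange", "divorced": "tab:green"}

fig, ax = plt.subplots()
for status, colour in colour_map.items():
    subset = df[df["Marital Status"] == status]
    ax.scatter(subset["Age"], subset["Yearly Salary"], color=colour, label=status, s=10)
ax.set_xlabel("Age")
ax.set_ylabel("Yearly Salary")
ax.legend()
plt.show()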

Tail and Head of the datasets

Figure 10: Tail and Head of the datasets

(Source: Acquired from Jupyter Notebook)

We use the pandas value_counts method to estimate the frequency of each category, because the bar chart does not count categories automatically. A bar chart is handy when there are fewer than about thirty separate categories of categorical data; with more than thirty the plot can become quite messy.
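
For example, under the assumption that the dataframe has a "Marital Status" column:

# Count each category with value_counts and plot the counts as a bar chart
counts = df["Marital Status"].value_counts()
counts.plot(kind="bar")
plt.xlabel("Marital Status")
plt.ylabel("Count")
plt.show()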

Code for calculation of income, salary, and age

Figure 11: Code for calculation of income, salary, and age

(Source: Acquired from Jupyter Notebook)

The Python code shown in the above figure carries out various calculations on the customers' income, salary, and individual ages. These calculated values support the subsequent data visualisation of the dataset, in which salaries, ages, and incomes are summarised.
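
A sketch of the kind of summary calculations implied by the figure; the exact quantities computed in the original notebook may differ, and the column names are assumptions:

# Simple summary statistics for salary and age
print("Mean yearly salary:", df["Yearly Salary"].mean())
print("Median yearly salary:", df["Yearly Salary"].median())
print("Mean age:", df["Age"].mean())
print("Oldest / youngest age:", df["Age"].max(), "/", df["Age"].min())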

Histogram of yearly pension

Figure 12: Histogram of yearly pension

(Source: Acquired from Jupyter Notebook)

A histogram is an essential distribution plot for a numerical column: it places the values into bins and plots the bin counts, so the shape of the distribution can be seen at a glance. The dist plot is a second, slightly enhanced form of the histogram; it overlays a kernel density estimate on the histogram and so approximates the underlying probability density function (Rohini et al. 2022).
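
For example, assuming a "Yearly Pension" column (seaborn's histplot is the current replacement for the older distplot API):

# Histogram of the yearly pension with a kernel density overlay
sns.histplot(df["Yearly Pension"], bins=20, kde=True)
plt.xlabel("Yearly Pension")
plt.show()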

Histogram plot of age

Figure 13: Histogram plot of age

(Source: Acquired from Jupyter Notebook)

Histogram of Vehicle year

Figure 14: Histogram of Vehicle year

(Source: Acquired from Jupyter Notebook)

The above figures show the visualisation of the dataset as histogram plots of the age and the vehicle year columns, respectively.
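
A sketch of the two histograms, assuming "Age" and "Vehicle Year" as the column names:

# Side-by-side histograms of age and vehicle year
fig, axes = plt.subplots(1, 2, figsize=(10, 4))
axes[0].hist(df["Age"], bins=20)
axes[0].set_title("Age")
axes[1].hist(df["Vehicle Year"], bins=20)
axes[1].set_title("Vehicle Year")
plt.show()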

Data value of marital status

Figure 15: Data value of marital status

(Source: Acquired from Jupyter Notebook)

The figure gives a clear representation of the marital status data in Python, listing the category names and counts along with the name, length, and dtype of the resulting series (Nti et al. 2022).

Scatter Plot diagram

Figure 16: Scatter Plot diagram

(Source: Acquired from Jupyter Notebook)

Bivariate analysis is performed mainly to explore the relationship between two significant variables. The aim is to uncover regularities between the variables that can then be used to build potential models (Naz et al. 2022).
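
A minimal bivariate scatter; the choice of "Distance Commuted to Work" and "Yearly Salary" as the two variables is an assumption:

# Scatter plot exploring the relationship between two numeric columns
plt.scatter(df["Distance Commuted to Work"], df["Yearly Salary"], s=10)
plt.xlabel("Distance Commuted to Work")
plt.ylabel("Yearly Salary")
plt.show()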

Line plot of yearly salary and age

Figure 17: Line plot of yearly salary and age

(Source: Acquired from Jupyter Notebook)

The resulting line plot of yearly salary against age is produced in Python as part of the analysis.
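
A sketch of such a line plot, sorting by age first so the line reads left to right; the column names are assumptions:

# Line plot of yearly salary against age
ordered = df.sort_values("Age")
plt.plot(ordered["Age"], ordered["Yearly Salary"])
plt.xlabel("Age")
plt.ylabel("Yearly Salary")
plt.show()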

Line plot of dependants

Figure 18: Line plot of dependants

(Source: Acquired from Jupyter Notebook)

Artificial intelligence and data analysis require programming to enable machines to learn from and analyse data. Customer data is a crucial aspect of data science, providing valuable information such as buying patterns, contact details, and preferences.

Line plot diagram of yearly pension

Figure 19: Line plot diagram of yearly pension

(Source: Acquired from Jupyter Notebook)

The amount that a customer receives annually is shown graphically in a line plot diagram. It can be used to keep track of changes in the amount they receive and to spot any potential issues. In most cases the line plot has a horizontal axis representing years and a vertical axis reflecting the annual pension amount (Mcbride et al. 2022).
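
A hedged sketch of such a plot; plotting the pension values against the record index is an assumption, since the original horizontal axis is not visible here:

# Line plot of the yearly pension values
plt.plot(df["Yearly Pension"].reset_index(drop=True))
plt.xlabel("Customer index")
plt.ylabel("Yearly Pension")
plt.show()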

Conclusion

In conclusion, data science is a crucial field in the modern world, as it helps businesses to analyse and interpret vast amounts of data, which in turn enables them to make informed decisions. By using data science, businesses can gain insights into trends, customer behaviour, and market conditions, which can aid in automating procedures, lowering costs, and increasing efficiency. Moreover, data science is essential in developing new services, technologies, and products, and none of this would be possible without programming. By analysing customer data, firms can gain a better understanding of their customers' wants and tailor their offerings to meet their needs. Visualising datasets is also crucial in data science, as it helps to identify patterns and trends that may be difficult to spot through simple data analysis. Overall, data science and programming play a significant role in enhancing business operations and improving decision-making processes.

References

Ahmad, H. and Mustafa, H., 2022. The impact of artificial intelligence, big data analytics and business intelligence on transforming capability and digital transformation in Jordanian telecommunication firms. International Journal of Data and Network Science, 6(3), pp.727-732.

Giorgi, F.M., Ceraolo, C. and Mercatelli, D., 2022. The R language: an engine for bioinformatics and data science. Life, 12(5), p.648.

Hua, T.K., 2022. A Short Review on Machine Learning. Authorea Preprints.

Lee, I. and Perret, B., 2022, June. Preparing High School Teachers to Integrate AI Methods into STEM Classrooms. In Proceedings of the AAAI Conference on Artificial Intelligence (Vol. 36, No. 11, pp. 12783-12791).

Mcbride, K. and Philippou, C., 2022. “Big results require big ambitions”: big data, data analytics and accounting in masters courses. Accounting Research Journal, 35(1), pp.71-100.

Naz, F., Agrawal, R., Kumar, A., Gunasekaran, A., Majumdar, A. and Luthra, S., 2022. Reviewing the applications of artificial intelligence in sustainable supply chains: Exploring research propositions for future directions. Business Strategy and the Environment, 31(5), pp.2400-2423.

Nti, I.K., Quarcoo, J.A., Aning, J. and Fosu, G.K., 2022. A mini-review of machine learning in big data analytics: Applications, challenges, and prospects. Big Data Mining and Analytics, 5(2), pp.81-97.

Rohini, P., Tripathi, S., Preeti, C.M., Renuka, A., Gonzales, J.L.A. and Gangodkar, D., 2022, April. A study on the adoption of Wireless Communication in Big Data Analytics Using Neural Networks and Deep Learning. In 2022 2nd International Conference on Advance Computing and Innovative Technologies in Engineering (ICACITE) (pp. 1071-1076). IEEE.

Schwaller, P., Vaucher, A.C., Laplaza, R., Bunne, C., Krause, A., Corminboeuf, C. and Laino, T., 2022. Machine intelligence for chemical reaction space. Wiley Interdisciplinary Reviews: Computational Molecular Science, 12(5), p.e1604.
