Data Science In Business Growth Of Amazon Company Assignment
Introduction
Data science is the process of cleaning, structuring and building datasets so that they can be analyzed and meaning can be extracted from them. It involves several steps: forming hypotheses, running experiments to gather data, assessing data quality, cleaning and streamlining the datasets, and organizing and structuring the data for analysis. In Amazon's business applications, data science helps to gain customer insights, strengthen security, inform internal finances, streamline manufacturing and predict future market trends. When critical thinking is combined with machine learning algorithms, data can yield insights, guide efficiency efforts and inform predictions. Data scientists transform large-scale datasets into valuable business insights for stakeholders. This study discusses the data types that Amazon generally uses and the issues associated with that data, the machine learning methods and infrastructure that support the company's business growth, and the useful patterns and discoveries that data science provides for the betterment of the company.
Data types, Data preprocessing and Data Issues
Data Types
Amazon generally uses Big Data, which has helped it build links with manufacturers and track inventory so that orders are fulfilled quickly. Big Data refers both to datasets that are larger and more complex than traditional datasets and to the technologies used to manage such large amounts of data. Data collection is now largely digitized (Zamri et al. 2020), and technologies such as Artificial Intelligence, mobile applications and the Internet of Things have increased both the complexity and the volume of data. Amazon uses Big Data in several ways, including Alexa voice recordings, a personalized recommendation system, one-click ordering and its anticipatory shipping model.
Data Preprocessing
The machine learning life cycle has several key phases: data collection, feature engineering, data preprocessing, model training, evaluation and deployment. Producing well-prepared training datasets has generally required knowledge of data analysis frameworks and libraries, which presents a barrier that slows iteration for practitioners (Hewage et al. 2018). AWS Glue DataBrew was released to address this issue. It is a visual data preparation service with more than 250 transformations for automating data preprocessing tasks without writing any code. Amazon EMR is a cluster platform that can process and analyze large amounts of data using frameworks such as Apache Spark and Apache Hadoop.
Figure 1: AWS cloud architecture
(Source: Patil, 2017)
The steps are as follows:
- Load the dataset to Amazon S3; the Census Income dataset is used to train the ML model.
- Prepare the data and engineer features with DataBrew, which is used to explore the dataset sample uploaded to Amazon S3.
Data Issues
Amazon faces various data issues relating to security, control, hackers and related considerations. The big data challenges the company faces include data silos, data controllership, incorporating machine learning and difficulty in analyzing diverse datasets (Zamri et al. 2020). Amazon uses data lakes to address these challenges: they break down silos, manage data access and accelerate machine learning.
Figure 2: Data lake essential components
(Source: Hewage, 2018)
Machine Learning Methods and Infrastructure
Methods
There are some basic steps for creating a machine learning model on AWS for the betterment of the company (Turban et al. 2020). The steps are discussed below:
Formulate the Issue
Before building the application, the issue to be solved and the target to be predicted should be known, since the model will be built around them.
Gathering labeled data
Machine learning generally requires a large amount of data, so labeled data needs to be gathered for training the model. Each labeled record should have two kinds of attributes: the target and the variables.
Pre-processing of data
Once labeling is complete, the data has to be converted into a format suitable for the model's algorithm.
Feature Transformation
Feature transformation, or feature processing, is the process of changing the variables to suit the ML model. The processing checklist is:
- Domain-specific transformation of variables
- Cartesian product of one variable with another
- Non-linear transformation
- Replacement of missing values
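The four checklist items can be illustrated with a minimal Python sketch. The function names, the choice of a weekend flag as the domain-specific transformation, the logarithm as the non-linear transformation and the mean as the fill value are all assumptions made for illustration; they are not taken from Amazon ML.

```python
import math
from datetime import date
from statistics import mean


def is_weekend(iso_dates):
    """Domain-specific transformation: turn an order date into a weekend flag."""
    # date.weekday() returns 0 for Monday through 6 for Sunday.
    return [date.fromisoformat(d).weekday() >= 5 for d in iso_dates]


def cartesian_feature(a, b):
    """Cartesian product of two categorical variables, combined row by row."""
    return [f"{x}_x_{y}" for x, y in zip(a, b)]


def log_transform(values):
    """Non-linear transformation: compress a skewed, non-negative variable."""
    return [math.log1p(v) for v in values]


def impute_missing(values):
    """Replace missing entries (None) with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]
```

Each function takes a column as a plain list and returns a new list, so the steps can be applied in any order before the data is passed to a model.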
Data splitting
The data is split because one portion is needed to train the model and the remainder to evaluate it (Hewage et al. 2018). Amazon Machine Learning generally splits the dataset into a 70/30 ratio of training to evaluation data.
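A split of this kind can be sketched in a few lines of Python. The function name and its parameters are hypothetical, and the training fraction is left as an argument rather than fixed, so this is an illustration of the idea and not Amazon ML's own routine.

```python
import random


def split_dataset(rows, train_fraction, shuffle=False, seed=0):
    """Split rows into a training part and an evaluation part.

    train_fraction is the share of rows used for training (between 0 and 1).
    With shuffle=False the split is sequential: the first rows train the
    model and the remaining rows evaluate it.
    """
    if not 0.0 < train_fraction < 1.0:
        raise ValueError("train_fraction must be between 0 and 1")
    rows = list(rows)
    if shuffle:
        # A seeded generator keeps the split reproducible.
        random.Random(seed).shuffle(rows)
    cut = round(len(rows) * train_fraction)
    return rows[:cut], rows[cut:]
```

A sequential split suits data that is already in random order; if the rows are sorted, for example by date or by label, shuffling first avoids training and evaluating on systematically different records.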
Training of the ML model
Learning algorithms are then used to train the model. Amazon ML employs linear models to build predictive applications.
Learning Algorithms
Amazon ML generally uses the following learning algorithms:
- For binary classification, Amazon ML uses logistic regression.
- For multiclass classification, Amazon ML uses multinomial logistic regression.
- For regression, Amazon ML uses linear regression.
Understanding the training parameters
The predictive accuracy of the ML model can be improved by adjusting certain parameters that are used during training (Turban et al. 2020). The hyperparameters that can boost model performance are the learning rate, model size, number of passes, data shuffling and regularization.
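To show where these hyperparameters enter the training loop, the following is a minimal sketch of a binary logistic regression trained with stochastic gradient descent. It is a textbook formulation written for illustration, not Amazon ML's implementation; the function and parameter names are assumptions, L2 is assumed as the form of regularization, and model size is not represented.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(w, b, x):
    """Probability that x belongs to the positive class."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def train_logistic_sgd(X, y, learning_rate=0.1, passes=10, l2=0.0,
                       shuffle=True, seed=0):
    """Train weights w and bias b on rows X with 0/1 labels y.

    learning_rate: step size of each update
    passes:        number of full sweeps over the data
    l2:            strength of the (assumed) L2 regularization
    shuffle:       whether to reshuffle the rows before each pass
    """
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    b = 0.0
    order = list(range(len(X)))
    for _ in range(passes):
        if shuffle:
            rng.shuffle(order)
        for i in order:
            error = predict_proba(w, b, X[i]) - y[i]
            # Gradient of the log loss plus the L2 penalty on the weights.
            w = [wj - learning_rate * (error * xj + l2 * wj)
                 for wj, xj in zip(w, X[i])]
            b -= learning_rate * error
    return w, b
```

A larger learning rate or more passes fits the training data more closely, while a larger `l2` value shrinks the weights and trades some training accuracy for robustness.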
Evaluating the accuracy of the model
Various metrics are used to evaluate the predictive accuracy of the model. Binary classification models are generally evaluated with the Area Under the Curve (AUC) metric. Multiclass classification models are evaluated with the confusion matrix. The accuracy metrics used for regression models are Root Mean Square Error (RMSE) and Mean Absolute Percentage Error (MAPE).
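The four metrics named above can be computed from their standard definitions, as in the following sketch. The function names are chosen for illustration, and the AUC is computed by direct pairwise comparison, which is simple but only practical for small datasets.

```python
import math


def auc(y_true, scores):
    """Share of (positive, negative) pairs in which the positive example
    receives the higher score; ties count as half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def confusion_matrix(y_true, y_pred, labels):
    """Rows are true classes, columns are predicted classes."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        matrix[index[t]][index[p]] += 1
    return matrix


def rmse(y_true, y_pred):
    """Root Mean Square Error."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


def mape(y_true, y_pred):
    """Mean Absolute Percentage Error, in percent (true values must be non-zero)."""
    ratios = [abs((t - p) / t) for t, p in zip(y_true, y_pred)]
    return 100.0 * sum(ratios) / len(ratios)
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect ranking, while for RMSE and MAPE lower values indicate a better regression model.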
Making predictions
Once the ML model is trained and tested, it can be used to make predictions. Predictions can be made in two ways: batch predictions and online predictions.
Figure 3: Machine learning Methods
(Source: Zamri, 2020)
Data Infrastructure
Amazon uses a cloud-based infrastructure. Amazon Web Services (AWS) is a comprehensive, evolving cloud computing platform provided by Amazon that includes IaaS, PaaS and SaaS offerings (Turban et al. 2020). Cloud infrastructure is widely used because it offers data recovery, flexibility, easy access, low maintenance and high-level security.
Pattern Detection and Discoveries
Large enterprise customers need a scalable data lake with a unified access enforcement mechanism to support their analytics workloads. The AWS Lake House architecture encompasses a single management framework. AWS Lake Formation is a fully managed service that helps to build, manage and secure data lakes easily. It automates and simplifies several complex steps that are generally needed to create a data lake (Hewage et al. 2018). These steps include collecting, moving, cleansing and cataloging data, and making the data securely available for machine learning.
Figure 4: Lake House architecture
(Source: Turban, 2020)
Algorithms used in ML
Logistic Regression
Logistic regression is a popular algorithm that falls under supervised machine learning (Zamri et al. 2020). It predicts the output of a categorical dependent variable and gives probabilistic values that lie between 0 and 1. It is generally used for solving classification problems.
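The reason the output always lies between 0 and 1 can be seen from the standard textbook form of the model, given here as an illustration. The symbols are introduced for this purpose and do not appear in the cited sources: x is the feature vector, w the weight vector and b the bias.

```latex
P(y = 1 \mid \mathbf{x})
  = \sigma(\mathbf{w}^{\top}\mathbf{x} + b)
  = \frac{1}{1 + e^{-(\mathbf{w}^{\top}\mathbf{x} + b)}},
\qquad 0 < P(y = 1 \mid \mathbf{x}) < 1
```

A record is then typically assigned to the positive class when this probability exceeds a chosen threshold, commonly 0.5.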
Multinomial Logistic Regression
The loss function of multinomial logistic regression generalizes the loss function of binary logistic regression from 2 classes to K classes.
The binary loss has two terms, the first of which is non-zero when y=1 and the second when y=0; multinomial regression generalizes these two terms to K terms, one per class.
The vector y takes the value 1 at the position of the true class k and 0 everywhere else. Any vector like this, with a single value of 1 and the rest 0, is known as a one-hot vector.
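The relationship between the two loss functions can be checked numerically with a short sketch. It uses the standard cross-entropy definitions; the function names are chosen for illustration, and softmax is assumed as the function that turns class scores into probabilities.

```python
import math


def binary_log_loss(y, p):
    """Two-term loss: the first term is non-zero when y = 1, the second when y = 0.
    p is the predicted probability of class 1 and must lie strictly between 0 and 1."""
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))


def one_hot(k, num_classes):
    """Vector with a 1 at position k and 0 everywhere else."""
    return [1 if j == k else 0 for j in range(num_classes)]


def softmax(scores):
    """Turn K class scores into K probabilities that sum to 1."""
    top = max(scores)  # subtracting the maximum avoids overflow
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def multinomial_log_loss(y_one_hot, probs):
    """K-term loss: only the term of the true class is non-zero."""
    return -sum(y * math.log(p) for y, p in zip(y_one_hot, probs) if y)
```

With K = 2, calling `multinomial_log_loss(one_hot(1, 2), [1 - p, p])` gives the same value as `binary_log_loss(1, p)`, which is the sense in which the multinomial loss extends the binary one.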
Conclusions
Data science has an essential effect on today's growing business market. Its methodologies can be used to explore historical data, draw comparisons with the competition and make recommendations, which helps a company make sound decisions about its products and its operating metrics. This study has discussed the effect of data science at Amazon. Among the various data types, the company generally uses big data; the preprocessing of this data and the issues associated with it have been discussed. The methods by which a machine learning model can be built and evaluated have been briefly described. AWS, a comprehensive cloud platform, serves as the data infrastructure. Data science also offers positive future opportunities: data can be analyzed easily to find insights about the products consumers choose, and the company can adopt further digital methods through a data science approach to gain technological benefits for its growth.
References
Journals
Idoine, C., Krensky, P., Brethenoux, E., Hare, J., Sicular, S. and Vashisth, S., 2018. Magic Quadrant for data science and machine-learning platforms. Gartner, Inc, p.13.
Engin, Z. and Treleaven, P., 2019. Algorithmic government: Automating public services and supporting civil servants in using data science technologies. The Computer Journal, 62(3), pp.448-460.
Sharda, R., Delen, D. and Turban, E., 2020. Analytics, Data Science, & Artificial Intelligence. Pearson Education, Limited.
Zamri, N.E., Mansor, M., Mohd Kasihmuddin, M.S., Alway, A., Mohd Jamaludin, S.Z. and Alzaeemi, S.A., 2020. Amazon employees resources access data extraction via clonal selection algorithm and logic mining approach. Entropy, 22(6), p.596.
Ghimire, A., Thapa, S., Jha, A.K., Adhikari, S. and Kumar, A., 2020, October. Accelerating business growth with big data and artificial intelligence. In 2020 Fourth International Conference on I-SMAC (IoT in Social, Mobile, Analytics and Cloud)(I-SMAC) (pp. 441-448). IEEE.
Hewage, T.N., Halgamuge, M.N., Syed, A. and Ekici, G., 2018. Big Data Techniques of Google, Amazon, Facebook and Twitter. J. Commun., 13(2), pp.94-100.
Ekapure, S., Jiruwala, N., Patnaik, S. and SenGupta, I., 2021. A data-science-driven short-term analysis of Amazon, Apple, Google, and Microsoft stocks. arXiv preprint arXiv:2107.14695.
Saleem, H., Muhammad, K.B., Nizamani, A.H., Saleem, S. and Aslam, A.M., 2019. Data Science and Machine Learning Approach to Improve E-Commerce Sales Performance on Social Web. International Journal of Computer Science and Network Security (IJCSNS), 19.
Jeble, S., Kumari, S. and Patil, Y., 2017. Role of big data in decision making. Operations and Supply Chain Management: An International Journal, 11(1), pp.36-44.
Passi, S. and Jackson, S.J., 2018. Trust in data science: Collaboration, translation, and accountability in corporate data science projects. Proceedings of the ACM on Human-Computer Interaction, 2(CSCW), pp.1-28.