Introduction Of The Data Mining & Big Data Analytics Assignment
Business Understanding
Business understanding is the process of analyzing a problem and choosing methods to solve it. The problem is first analyzed, a model is then built around it, and finally an effective solution is developed (Dai et al. 2020). In this research, data mining and big data analytics are used for the analysis and the solution, and several algorithms, namely "K-means clustering", "regression", and "decision tree", are applied. Data mining is the process of converting raw data into useful information. Organizations use it to understand a market and the needs of their customers, which helps them build the products the market requires and increase profit (Hariri et al. 2019). The resulting information supports understanding the market, identifying the customer segment for a product, and building a strategy that increases sales. The outcome of data mining depends on the quality of the data, the data collection process, and how the data is processed. In business analysis, data mining is used to uncover business patterns and relationships in the data so that better decisions can be made; it can also help anticipate new market trends, develop a better market strategy, and predict the customer segment for a product in a locality. "Big data analysis" is the process of analyzing very large volumes of data to extract useful information. It helps organizations reach and serve customers more effectively (Martínez-Plumed et al. 2019) and allows them to respond in real time.
In this project, "data mining" and "big data analysis" are used to analyze the number of accidents in different areas, the relationship between accidents and factors such as traffic, time, and location, the severity of each accident, and the action the police took at the time. The dataset records accidents at different locations and times, and each of its many columns is useful in the analysis: it includes location details, longitude, latitude, number of accidents, traffic, time, and so on. The main task is to analyze this dataset using several algorithms, namely "K-means clustering", "decision tree", and "regression" (Ristevski and Chen, 2018). Road accidents have an "iceberg effect" on business: the visible costs are only a small part of the total, because accidents also cause deaths, physical injuries, psychological problems, and many other issues that have a heavy impact on victims' families. The project therefore analyzes how the number of accidents and road traffic relate to the different variables, and what impact "road traffic" and "accidents" have on the business problem (Romero and Ventura, 2020).
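As an illustration of how K-means can group accident locations into hotspots, here is a minimal sketch. It assumes the latitude and longitude columns have already been extracted into a NumPy array; the coordinates below are made-up toy values, and the choice of three clusters is an assumption, not a figure from the analysis. Using plain Euclidean distance on degrees is only an approximation that is reasonable over small areas.

```python
import numpy as np
from sklearn.cluster import KMeans


def cluster_accident_locations(coords, n_clusters=3, seed=0):
    """Group (latitude, longitude) points into n_clusters hotspots with K-means.

    Returns the cluster label of each point and the cluster centres.
    """
    model = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed)
    labels = model.fit_predict(coords)
    return labels, model.cluster_centers_


# Hypothetical toy data: 20 accidents scattered around each of three made-up points.
rng = np.random.default_rng(42)
centres = np.array([[51.50, -0.12], [53.48, -2.24], [52.48, -1.89]])
coords = np.vstack([c + rng.normal(scale=0.01, size=(20, 2)) for c in centres])

labels, found_centres = cluster_accident_locations(coords, n_clusters=3)
print(np.bincount(labels))  # number of accidents in each cluster
print(found_centres.round(2))  # estimated hotspot centres
```

The number of clusters has to be chosen by the analyst; in practice it is usually compared across several values (for example with the elbow method) rather than fixed in advance.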
Data Understanding
The dataset contains many columns, each representing a variable that affects road traffic and accidents. To understand the data, it is first explored in an unstructured way. This exploration reveals initial patterns, points of interest, and characteristics of the dataset. It does not uncover every pattern the data holds, but it gives a big picture of the dataset and its most important trends (Singh and Yassine, 2018). This stage uses automated tools such as charts, initial reports, and data visualization, and it sets the direction for the rest of the analysis. Visualization in particular gives a simple, direct view of the data that would be hard to obtain by reading thousands of raw numbers; here it shows the relationship between road traffic and the number of accidents in different areas, giving a broader view of the dataset and highlighting the points that matter for the analysis. Exploration also exposes irrelevant and missing data. Irrelevant data can cause problems later in the analysis and distort the final result, so identifying it makes the analysis more efficient (Wang et al. 2020). In this dataset, blank values are removed, and variables that are not useful for the analysis are dropped to simplify the dataset.
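A minimal pandas sketch of this exploration step might look as follows. The column names and values are hypothetical stand-ins for the real accident data, and "record_id" is assumed to be an example of a column that is irrelevant to the analysis. Only rows that are entirely blank are removed here; rows with a few gaps are left for the preprocessing stage.

```python
import numpy as np
import pandas as pd


def explore(df, irrelevant_columns):
    """Report missing values per column, then drop irrelevant columns and fully blank rows."""
    missing = df.isna().sum()
    print("Missing values per column:")
    print(missing[missing > 0])
    cleaned = df.drop(columns=irrelevant_columns).dropna(how="all")
    return missing, cleaned


# Hypothetical sample; column names are illustrative, not taken from the original dataset.
df = pd.DataFrame({
    "latitude": [51.50, 51.51, np.nan, 52.48],
    "longitude": [-0.12, -0.13, np.nan, -1.89],
    "severity": ["Slight", "Serious", np.nan, "Slight"],
    "record_id": ["a1", "a2", "a3", "a4"],
})

missing, cleaned = explore(df, irrelevant_columns=["record_id"])
print(cleaned)
```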
Data preprocessing removes unnecessary data from the dataset. It deals with removing noise and replacing missing values, and it is the stage that makes the dataset ready for analysis. The aim of this stage is to remove irrelevant data points from the accident dataset and reduce the noise present in it (Wong et al. 2019). After data visualization and data preprocessing, the data is ready for further analysis, and different algorithms are applied to find the relationship between the number of accidents, traffic, and their impact on the business.
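One common way to replace missing values is sketched below, assuming duplicate records count as noise and that the median (for numeric columns) and the most frequent value (for text columns) are acceptable fill values. These choices are assumptions for illustration; the original analysis does not state which replacement rule was used, and the columns are hypothetical.

```python
import numpy as np
import pandas as pd


def preprocess(df):
    """Drop duplicate rows, then fill missing values: median for numbers, mode for text."""
    out = df.drop_duplicates().copy()
    for col in out.columns:
        if pd.api.types.is_numeric_dtype(out[col]):
            out[col] = out[col].fillna(out[col].median())
        else:
            out[col] = out[col].fillna(out[col].mode().iloc[0])
    return out


# Hypothetical records: one exact duplicate row and one gap in each column.
df = pd.DataFrame({
    "vehicles": [2, 1, 1, np.nan, 3],
    "weather": ["Fine", "Rain", "Rain", "Fine", None],
})

clean = preprocess(df)
print(clean)
```

The median is used instead of the mean because it is less affected by extreme values, which is often a sensible default for counts such as the number of vehicles involved.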
Data Preparation
Data preparation is the process of organizing raw data so that it is suitable for further processing and analysis. Its key steps are organizing, cleaning, and labeling the raw data into a form suitable for machine learning algorithms, and then exploring and visualizing it. Data preparation cleans and transforms raw data before processing and analysis; it is an important stage that often involves reformatting data, correcting data, and combining datasets to enrich the data (Martínez-Plumed et al. 2019). In other words, it converts raw data into a clean dataset. Data gathered from different sources usually arrives in a raw format that is not directly usable for analysis, and to get useful results from machine learning models the data must be in an appropriate format. Some models require a specific format: for example, a linear regression algorithm does not accept null values, so null values must be handled in the original raw dataset before linear regression can be run (Romero and Ventura, 2020). Another consideration is that the dataset should be formatted so that several machine learning and deep learning algorithms can be run on the same data and the best-performing one selected.
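To illustrate why null values must be handled before linear regression, the sketch below removes incomplete rows and one-hot encodes a categorical column so that every feature is numeric, then fits scikit-learn's LinearRegression (which raises an error on NaN inputs). The columns and values are hypothetical toy data, chosen so that casualties equal the number of vehicles; the fitted coefficient for "vehicles" should therefore come out close to 1.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Hypothetical rows: vehicles involved, road type (categorical) and casualties.
df = pd.DataFrame({
    "vehicles": [1, 2, 3, np.nan, 2, 4],
    "road_type": ["Single", "Dual", "Single", "Dual", "Single", "Dual"],
    "casualties": [1, 2, 3, 2, 2, 4],
})

# LinearRegression rejects NaN inputs, so rows with missing values are removed first.
prepared = df.dropna()

# One-hot encode the categorical column; drop_first avoids a redundant dummy column.
X = pd.get_dummies(prepared[["vehicles", "road_type"]], columns=["road_type"],
                   drop_first=True, dtype=float)
y = prepared["casualties"]

model = LinearRegression().fit(X, y)
coefs = dict(zip(X.columns, model.coef_))
print(coefs, model.intercept_)
```

Dropping rows is the simplest option; when many rows have gaps, filling them (as in the preprocessing stage) keeps more data for the model.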
Figure 3.1: Importing the libraries
(Source: created by the learner)
The researcher imported the libraries as "numpy as np", "random as rnd", "pandas as pd", "seaborn as sns", and "sklearn as sk", together with the estimators "LogisticRegression", "KMeans", "LinearRegression", "DecisionTreeRegressor", etc.
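Written out as code, the imports listed for Figure 3.1 might look like the sketch below. The module paths for the estimators are scikit-learn's standard locations; seaborn is left out here only because this snippet draws no plots, and the final lines are a quick check that each estimator can be created.

```python
import random as rnd

import numpy as np
import pandas as pd
import sklearn as sk
from sklearn.cluster import KMeans
from sklearn.linear_model import LinearRegression, LogisticRegression
from sklearn.tree import DecisionTreeRegressor

# Quick check that the libraries are available and the estimators can be instantiated.
print(np.__version__, pd.__version__, sk.__version__)
estimators = [KMeans(n_clusters=2, n_init=10), LinearRegression(),
              LogisticRegression(), DecisionTreeRegressor()]
rnd.seed(0)  # seeds Python's own random module for reproducible sampling
```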
Figure 3.2: Importing the CSV File
(Source: created by the learner)
This figure shows that the dataset was imported successfully and displays the data. Before data is loaded, it is essential to map out the structure and format of the repository in order to decide how each incoming data source should be manipulated to fit the storage design, so that the data can be used for research and data mining (Islam et al. 2018).
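A minimal sketch of importing the CSV file with pandas is shown below. The file name and columns of the real dataset are not given, so a small hypothetical CSV held in a string stands in for it; with the real file, the path (for example the hypothetical "accidents.csv") would be passed to `pd.read_csv` instead. The sketch also derives the hour of day from an assumed "HH:MM" time column, since time is one of the factors studied.

```python
import io

import pandas as pd

# Stand-in for the accident CSV; with a real file use pd.read_csv("accidents.csv") (hypothetical name).
csv_text = """latitude,longitude,severity,vehicles,casualties,time
51.50,-0.12,Slight,2,1,17:42
53.48,-2.24,Serious,1,1,08:15
52.48,-1.89,Slight,3,2,23:05
"""

df = pd.read_csv(io.StringIO(csv_text))

# Derive the hour of day from the "HH:MM" strings for later time-based analysis.
df["hour"] = pd.to_datetime(df["time"], format="%H:%M").dt.hour

print(df.shape)
print(df.head())
```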
The output therefore displays the data. Descriptive statistics summarize the data through techniques such as graphical displays and measures of central tendency, as well as …
This figure shows the data description of the accident dataset: how many accidents happened and whether the injured people had any kind of protection at the time. It also records the road number, junction details, nearest places, and so on (Arunachalam et al. 2018).
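Descriptive statistics of this kind can be produced in pandas with `describe()` for numeric columns and `value_counts()` for categorical ones, as in the minimal sketch below. The columns, including "police_attended", are hypothetical toy data rather than values from the real dataset.

```python
import pandas as pd

# Hypothetical accident records.
df = pd.DataFrame({
    "casualties": [1, 1, 2, 3, 1],
    "severity": ["Slight", "Slight", "Serious", "Fatal", "Slight"],
    "police_attended": ["Yes", "No", "Yes", "Yes", "Yes"],
})

# Numeric summary: count, mean, standard deviation, min, quartiles and max.
summary = df["casualties"].describe()
print(summary)

# Frequency table for a categorical column, and the share of accidents the police attended.
severity_counts = df["severity"].value_counts()
police_share = df["police_attended"].value_counts(normalize=True)
print(severity_counts)
print(police_share)
```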
Modeling
Data modeling refers to a set of processes in which large amounts of data are combined and analyzed to discover relationships and patterns. The goal of data modeling is to use past data to predict future events. Data mining is one step in the data modeling process. An application defines the kinds of data it uses with models: a model is a Python class that inherits from a base model class, and the model class defines a new kind of datastore entity and the properties that kind is expected to take (Gupta et al. 2019). The kind name is defined by the name of the instantiated class that inherits from the base class. Data modeling is essential for storing data in a data warehouse, a repository for data obtained from multiple sources, which may hold similar or related data in different formats. The data model is therefore an important enabler of analytical tools, organizational information systems, data mining, and integration with all data systems and applications. In the early stages of designing any system, data modeling is a key prerequisite on which all other stages depend, because it lays the foundation on which every program, function, and tool relies. The data model is a common language that allows systems to communicate through their shared understanding and acceptance of the data as defined in the model (Hossain et al. 2019). This is more important than ever in today's world of artificial intelligence, big data, cloud connectivity, machine learning, and IoT.
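Using past data to predict future events can be illustrated with scikit-learn's DecisionTreeRegressor, one of the estimators imported earlier. The sketch below is only an assumption about how such a model might be set up: the features (vehicles involved, hour of day), the target (casualties), and all values are hypothetical toy data, and the depth limit is an illustrative choice.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor, export_text

# Hypothetical past records: [vehicles involved, hour of day] -> casualties.
X_past = np.array([[1, 8], [1, 20], [2, 8], [2, 20],
                   [3, 8], [3, 20], [4, 8], [4, 20]])
y_past = np.array([1, 1, 1, 1, 3, 3, 3, 3])

# A shallow tree keeps the learned rules readable and limits overfitting.
tree = DecisionTreeRegressor(max_depth=2, random_state=0).fit(X_past, y_past)

# Show the learned decision rules in plain text.
print(export_text(tree, feature_names=["vehicles", "hour"]))

# Predict casualties for two new (hypothetical) accidents.
X_new = np.array([[1, 12], [4, 12]])
predictions = tree.predict(X_new)
print(predictions)
```

Because the toy casualties depend only on the number of vehicles, the tree needs a single split on "vehicles" and ignores the hour; with real data, the printed rules show which factors the model actually relies on.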
The heat map is used to visualize important data and represents the level of risk with colour. A heat map gives an immediate summary of the information and helps the viewer understand a complex dataset. Data modeling is likely to become more structured and standardized as more data, more databases, and more varieties of data appear (Kibria et al. 2018).
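The grid behind an accident heat map is usually a table of counts. The minimal sketch below builds one with `pd.crosstab`, counting accidents per day of week and hour; the records are hypothetical, and the day/hour layout is an assumption, since the original figure's axes are not described. Combinations with no accidents become 0, and the resulting table is what a colour scale would be applied to.

```python
import pandas as pd

# Hypothetical accident records with day of week and hour of day.
df = pd.DataFrame({
    "day_of_week": ["Mon", "Mon", "Mon", "Fri", "Fri", "Sat"],
    "hour": [8, 8, 17, 17, 17, 23],
})

# Count accidents in each (day, hour) cell; combinations with no accidents are 0.
heat = pd.crosstab(df["day_of_week"], df["hour"])
print(heat)

# With seaborn installed, the grid could be drawn as a heat map, e.g. sns.heatmap(heat, cmap="Reds").
```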
Evaluation
Data modeling has arguably existed for as long as data processing, data storage, and computer programming, although the term itself probably only came into common use around the time database management systems began to develop. There is nothing new or innovative about the idea of planning and architecting a new structure. Data modeling is more fundamental than ever as technologists deal with new sources of data, including a flood of unstructured data arriving at volumes and velocities that exceed the capabilities of traditional systems. There is now a steady demand for new systems, innovative database structures and techniques, and new data models to tie these developments together.
References
- Ali, F., El-Sappagh, S., Islam, S.R., Ali, A., Attique, M., Imran, M. and Kwak, K.S., 2021. An intelligent healthcare monitoring framework using wearable sensors and social networking data. Future Generation Computer Systems, 114, pp.23-43.
- Arunachalam, D., Kumar, N. and Kawalek, J.P., 2018. Understanding big data analytics capabilities in supply chain management: Unravelling the issues, challenges and implications for practice. Transportation Research Part E: Logistics and Transportation Review, 114, pp.416-436.
- Dai, H.N., Wang, H., Xu, G., Wan, J. and Imran, M., 2020. Big data analytics for manufacturing internet of things: opportunities, challenges and enabling technologies. Enterprise Information Systems, 14(9-10), pp.1279-1303.
- Gupta, S., Chen, H., Hazen, B.T., Kaur, S. and Gonzalez, E.D.S., 2019. Circular economy and big data analytics: A stakeholder perspective. Technological Forecasting and Social Change, 144, pp.466-474.
- Hariri, R.H., Fredericks, E.M. and Bowers, K.M., 2019. Uncertainty in big data analytics: survey, opportunities, and challenges. Journal of Big Data, 6(1), pp.1-16.
- Hossain, E., Khan, I., Un-Noor, F., Sikander, S.S. and Sunny, M.S.H., 2019. Application of big data and machine learning in smart grid, and associated security concerns: A review. IEEE Access, 7, pp.13960-13988.
- Islam, M.S., Hasan, M.M., Wang, X., Germack, H.D. and Noor-E-Alam, M., 2018, May. A systematic review on healthcare analytics: application and theoretical perspective of data mining. In Healthcare (Vol. 6, No. 2, p. 54). MDPI.
- Kibria, M.G., Nguyen, K., Villardi, G.P., Zhao, O., Ishizu, K. and Kojima, F., 2018. Big data analytics, machine learning, and artificial intelligence in next-generation wireless networks. IEEE Access, 6, pp.32328-32338.
- Martínez-Plumed, F., Contreras-Ochando, L., Ferri, C., Hernández-Orallo, J., Kull, M., Lachiche, N., Ramírez-Quintana, M.J. and Flach, P., 2019. CRISP-DM twenty years later: From data mining processes to data science trajectories. IEEE Transactions on Knowledge and Data Engineering, 33(8), pp.3048-3061.
- Muangprathub, J., Boonnam, N., Kajornkasirat, S., Lekbangpong, N., Wanichsombat, A. and Nillaor, P., 2019. IoT and agriculture data analysis for smart farm. Computers and Electronics in Agriculture, 156, pp.467-474.
- Ristevski, B. and Chen, M., 2018. Big data analytics in medicine and healthcare. Journal of Integrative Bioinformatics, 15(3).
- Romero, C. and Ventura, S., 2020. Educational data mining and learning analytics: An updated survey. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery, 10(3), p.e1355.
- Singh, S. and Yassine, A., 2018. Big data mining of energy time series for behavioral analytics and energy consumption forecasting. Energies, 11(2), p.452.
- Wang, J., Yang, Y., Wang, T., Sherratt, R.S. and Zhang, J., 2020. Big data service architecture: a survey. Journal of Internet Technology, 21(2), pp.393-405.
- Wang, Y., Kung, L. and Byrd, T.A., 2018. Big data analytics: Understanding its capabilities and potential benefits for healthcare organizations. Technological Forecasting and Social Change, 126, pp.3-13.
- Wong, Z.S., Zhou, J. and Zhang, Q., 2019. Artificial intelligence for infectious disease big data analytics. Infection, Disease & Health, 24(1), pp.44-48.