
2024 | Book

Big Data Analytics

Theory, Techniques, Platforms, and Applications

Authors: Ümit Demirbaga, Gagangeet Singh Aujla, Anish Jindal, Oğuzhan Kalyon

Publisher: Springer Nature Switzerland


About this book

This book introduces readers to big data analytics. The first two chapters cover the background and concepts of big data, big data analytics, and cloud computing, along with the process of setting up, configuring, and getting familiar with big data analytics working environments. The third chapter provides comprehensive information on big data processing systems, from installing these systems to implementing real-world data applications, along with the necessary code. The next chapter dives into the details of big data storage technologies, including their types, essentiality, durability, and availability, and shows how their properties differ. The fifth and sixth chapters guide the reader through understanding, configuring, and performing the monitoring and debugging of big data systems and present the available commercial and open-source tools for this purpose. Chapter seven introduces a trending machine learning technique, the Bayesian network, a probabilistic graphical model, by presenting a real-world probabilistic application for understanding causal, complex, and hidden relationships for diagnosis and forecasting in a scalable manner for big data. Special sections throughout the eighth chapter present different case studies and applications to help readers develop their big data analytics skills using various big data analytics frameworks.

The book will be of interest to business executives and IT managers as well as university students and their course leaders, and indeed to anyone who wants to get involved in the big data world.

Table of contents

Chapter 1. Introduction
Abstract
The world is being overrun by an unprecedented amount of data in the twenty-first century. This data comes from various sources, ranging from the subtle clicks of a mouse to the complicated data streams obtained via satellite technologies. Big data analytics is a discipline positioned to unearth priceless insights, spur innovation, and revolutionise decision-making paradigms due to the exponential growth of data. This book thoroughly introduces the complex field of big data analytics.
Chapter 2. Big Data
Abstract
This chapter introduces the fundamental concepts of big data, offering a comprehensive understanding of its definition, key characteristics, and the widely recognised 5 Vs. The multifaceted challenges associated with realising the enormous potential of big data are explored, encompassing issues related to data collection, storage, privacy, security, and the complexities of deriving value from this extensive resource. In addition, avenues for harnessing the power of big data are investigated, including applying advanced analytics and machine learning, utilising data visualisation techniques, and implementing communication strategies. Lastly, a glimpse into the future of big data is provided, shedding light on emerging trends and directions that will shape its ongoing evolution and influence across various domains.
Chapter 3. Big Data Analytics
Abstract
This chapter introduces the dynamic domain of big data analytics, illuminating its multifaceted aspects and profound significance. It commences by furnishing a comprehensive definition of big data analytics and delves into the taxonomy of this discipline, encompassing descriptive, diagnostic, predictive, prescriptive, and cognitive analytics, each underscored by its distinctive applications. Furthermore, this chapter elucidates the manifold advantages that big data analytics affords, notably its pivotal role in bolstering risk management, effecting cost reduction, facilitating informed decision-making, and catalysing advancements in product development. In parallel, it conscientiously scrutinises the challenges endemic to this field, encompassing the dearth of proficient practitioners, misconceptions, concerns about escalating data volumes, intricacies associated with tool selection, and the salient issues of data security and privacy. The essential stages inherent to big data analytics are methodically expounded to facilitate a comprehensive understanding, encompassing data acquisition, preprocessing, storage, and analysis, thereby furnishing a nuanced appreciation of the foundational principles and intricate nuances intrinsic to this pivotal discipline.
Chapter 4. Cloud Computing for Big Data Analytics
Abstract
In this chapter, the exploration unfolds within the domain of cloud computing, emphasising its instrumental role in empowering the realm of big data analytics. Commencing with a comprehensive exposition, the historical evolution of cloud computing is meticulously traced across various computing generations, culminating in its contemporary manifestation as a transformative and indispensable component of the Information Technology (IT) landscape. Subsequently, cloud computing service models are systematically elucidated in conjunction with an exhaustive examination of deployment models, including public, private, hybrid, and community clouds. Furthermore, multi-cloud strategies are explored, with an in-depth exploration of key cloud computing platforms. A thorough comparison of these renowned cloud providers is offered to aid in making well-informed decisions and provide stakeholders with the necessary knowledge to effectively use cloud computing’s promise to enhance big data analytics.
Chapter 5. Big Data Analytics Platforms
Abstract
This chapter explores big data analytics platforms by shedding light on their essential characteristics for processing and deciphering vast datasets.
Chapter 6. Big Data Storage Solutions
Abstract
Embarking on a journey through the landscape of big data storage solutions, this chapter unfolds the critical role these systems play in managing and extracting value from extensive datasets.
Chapter 7. Big Data Monitoring
Abstract
In this chapter, the reader is introduced to the pivotal domain of big data monitoring, delving into the fundamental concepts and tools essential for ensuring the seamless functioning of complex systems. The exploration begins by elucidating the different types of monitoring, namely proactive and reactive, and underscores the critical need for effective monitoring of big data systems. The chapter further dissects the components integral to monitoring, including alerts/notifications, events, logs, metrics, incident tracking, and debugging capabilities. Providing a comprehensive overview, the narrative then outlines various available monitoring tools tailored for big data systems, ranging from DataDog and SequenceIQ to Sematext, Apache Chukwa, Nagios, Ganglia, DMon, and SmartMonit. By examining these tools, readers gain insights into the diverse functionalities and features contributing to efficient big data monitoring practices.
Chapter 8. Debugging Big Data Systems for Big Data Analytics
Abstract
This chapter unveils the intricate art of debugging big data systems for optimal analytics performance, providing a comprehensive guide to navigating real-world performance challenges. The exploration commences by delineating the critical debugging steps essential for identifying and resolving issues within big data systems. Focussing on the specific problems that can afflict these systems, such as data locality, resource heterogeneity, network issues, resource over-allocation, unnecessary speculation, and poor scheduling policies, the chapter dives into the intricacies of root cause analysis. Emphasising the importance of this analysis in the context of big data analytics, the narrative elucidates the systematic steps involved, accompanied by insightful details on tools and techniques, challenges, and considerations. The chapter explores available diagnosis tools tailored for big data systems, including Mantri, Texas Advanced Computing Centre (TACC) Stats, Data Centre Data Base (DCDB) Wintermute, and AutoDiagn, empowering practitioners to effectively diagnose and address complex issues in their analytics infrastructure.
Chapter 9. Machine Learning for Big Data Analytics
Abstract
This chapter delves into the possibilities of using machine learning to extract meaningful insights from large amounts of data. It first dissects supervised machine learning for big data analytics, unravelling the challenges inherent in its application and elucidating the pre-processing methodologies essential for optimal outcomes. A comprehensive array of popular supervised machine learning algorithms is scrutinised, including Linear Regression, Logistic Regression, Decision Tree, Random Forest, Support Vector Machines, Naïve Bayes Classifier, and K-Nearest Neighbour. The chapter then navigates the landscape of unsupervised machine learning, shedding light on diverse techniques such as K-means Clustering, Hierarchical Clustering, Density-Based Spatial Clustering of Applications with Noise (DBSCAN), Gaussian Mixture Models, Principal Component Analysis, t-distributed Stochastic Neighbour Embedding (t-SNE), Apriori Algorithm, Isolation Forest, and Expectation-Maximisation. The chapter culminates by venturing into neural network algorithms, probabilistic learning fundamentals, and performance evaluation and optimisation techniques, providing a holistic panorama of machine learning paradigms tailored to the challenges of big data analytics.
Chapter 10. Real-World Big Data Analytics Case Studies
Abstract
This chapter unfolds a panoramic view across diverse sectors, unveiling the transformative impact of big data analytics on real-world challenges. The exploration commences in the government sector, where data-driven governance enhances public services, enables predictive analytics for smart city planning, fortifies security and surveillance, and even extends to election forecasting and voter analytics. Transitioning to the healthcare industry, the chapter delves into the revolutionary role of big data analytics in tailoring treatments through precision medicine and predicting and preventing disease outbreaks. The entertainment industry takes centre stage, showcasing applications such as content personalization, recommendation systems, box office predictions, revenue optimization, and audience engagement through social media analytics. The banking sector comes to life with risk assessment, credit scoring, customer relationship management, personalization, fraud detection, security, and strategic decision-making. The retail industry follows suit, emphasising inventory management, demand forecasting, customer segmentation, personalization, supply chain optimization, and in-store analytics. The chapter finally highlights the energy and utilities sector by illuminating applications in grid management, smart grids, predictive maintenance, asset optimization, energy generation, renewable integration, energy efficiency, demand response, and environmental sustainability.
Chapter 11. Big Data Analytics in Smart Grids
Abstract
This chapter explores the application of big data analytics within the smart grid domain. The journey commences with a comprehensive examination of the smart grid concept, setting the stage for a nuanced understanding. The discourse then transitions to an in-depth analysis of the various analytics types viable in smart grids, detailing the essential reasons driving the need for such analytical interventions. The chapter culminates in a practical illustration of big data analytics, specifically the prediction of societal load demand. This example serves as a tangible demonstration of how sophisticated analytics can be wielded to gain valuable insights within the dynamic landscape of smart grids.
Chapter 12. Big Data Analytics in Bioinformatics
Abstract
This chapter introduces the intricate interplay between big data analytics and bioinformatics, providing a comprehensive perspective on leveraging large-scale genomic data. Delving into the challenges posed by big data in bioinformatics, the narrative unfolds to explore frameworks tailored for managing extensive genomic datasets and the pivotal role of biological databases. The core focus is applying big data analytics in bioinformatics, spanning the employment of Hadoop, MapReduce, and deep learning methodologies. A detailed case study exemplifies the practical implementation of variant detection in genomes, illustrating processes like data copying to HDFS, MapReduce-based data processing, and the multistep intricacies of variant calling and interpretation. This chapter serves as a roadmap by navigating the synergy between cutting-edge analytics and the intricate nuances of bioinformatics.
Metadata
Title
Big Data Analytics
Authors
Ümit Demirbaga
Gagangeet Singh Aujla
Anish Jindal
Oğuzhan Kalyon
Copyright year
2024
Electronic ISBN
978-3-031-55639-5
Print ISBN
978-3-031-55638-8
DOI
https://doi.org/10.1007/978-3-031-55639-5
