
2024 | Book

Probability and Statistics for Machine Learning

A Textbook


About this Book

This book covers probability and statistics from the machine learning perspective. The chapters of this book belong to three categories:

1. The basics of probability and statistics: These chapters focus on the basics of probability and statistics, and cover the key principles of these topics. Chapter 1 provides an overview of the area of probability and statistics as well as its relationship to machine learning. The fundamentals of probability and statistics are covered in Chapters 2 through 5.

2. From probability to machine learning: Many machine learning applications are addressed using probabilistic models, whose parameters are then learned in a data-driven manner. Chapters 6 through 9 explore how different models from probability and statistics are applied to machine learning. Perhaps the most important tool that bridges the gap from data to probability is maximum-likelihood estimation, which is a foundational concept from the perspective of machine learning. This concept is explored repeatedly in these chapters.

3. Advanced topics: Chapter 10 is devoted to discrete-state Markov processes. It explores the application of probability and statistics to a temporal and sequential setting, although the applications extend to more complex settings such as graphical data. Chapter 11 covers a number of probabilistic inequalities and approximations.

The style of writing promotes the learning of probability and statistics simultaneously with a probabilistic perspective on the modeling of machine learning applications. The book contains over 200 worked examples in order to elucidate key concepts. Exercises are included both within the text of the chapters and at the end of the chapters. The book is written for a broad audience, including graduate students, researchers, and practitioners.
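As a small illustration of the maximum-likelihood idea highlighted in the second category above, the following Python sketch (illustrative only and not drawn from the book; the synthetic data and NumPy usage are assumptions) estimates the parameters of a Gaussian from samples. For the Gaussian, maximizing the log-likelihood yields the sample mean and the biased sample variance in closed form.

    import numpy as np

    # Synthetic data standing in for observed samples (assumed for illustration).
    rng = np.random.default_rng(0)
    data = rng.normal(loc=2.0, scale=1.5, size=1000)

    # Maximum-likelihood estimates for a Gaussian: sample mean and biased sample variance.
    mu_hat = data.mean()
    sigma2_hat = ((data - mu_hat) ** 2).mean()
    print(f"MLE mean: {mu_hat:.3f}, MLE variance: {sigma2_hat:.3f}")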

Table of Contents

Frontmatter
Chapter 1. Probability and Statistics: An Introduction
Abstract
Machine learning builds mathematical models that are learned from data samples and from which predictions are made. The predictions are naturally probabilistic because the samples provide only an incomplete view of the entire data.
Charu C. Aggarwal
Chapter 2. Summarizing and Visualizing Data
Abstract
In the modern era, statisticians and analysts are often faced with large amounts of data, which creates the need to summarize and visualize the data before analysis. Easily digestible summaries and visual representations enable the analyst to obtain a better idea of the broad patterns in the data.
Charu C. Aggarwal
Chapter 3. Probability Basics and Random Variables
Abstract
Probability theory predicts the expected frequencies of specific outcomes of experiments. On the other hand, statistical methods view data as outcomes of probabilistic experiments. Therefore, there is a natural connection between probability and statistics in terms of the relationship between theory and practice.
Charu C. Aggarwal
Chapter 4. Probability Distributions
Abstract
Several families of probability distributions arise repeatedly in various machine learning settings. We refer to these probability distributions as families, because they are defined in terms of parameters.
Charu C. Aggarwal
Chapter 5. Hypothesis Testing and Confidence Intervals
Abstract
The previous chapter introduced several probability distributions, including the normal distribution, the t-distribution, and the χ2-distribution. These distributions are important in statistics because they enable a key concept in experimental science, referred to as hypothesis testing. This method is a formal technique for evaluating the reliability of a conclusion about the population from (limited) experimental data.
Charu C. Aggarwal
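A minimal sketch of the hypothesis-testing workflow described in this chapter's abstract, assuming SciPy's ttest_1samp and synthetic data (neither is taken from the book): it tests the null hypothesis that the population mean is zero.

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(1)
    sample = rng.normal(loc=0.3, scale=1.0, size=50)   # assumed synthetic sample

    # Under the null hypothesis (population mean = 0), the t-statistic follows
    # a t-distribution with n - 1 degrees of freedom.
    t_stat, p_value = stats.ttest_1samp(sample, popmean=0.0)
    print(f"t = {t_stat:.3f}, p = {p_value:.3f}")
    # A small p-value (e.g., below 0.05) would suggest rejecting the null hypothesis.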
Chapter 6. Reconstructing Probability Distributions from Data
Abstract
Machine learning applications often assume that the observed data is sampled from probability distributions. How can these probability distributions be reverse engineered from observed data? The main challenge is that the data analyst only has access to observed data but no a priori knowledge of the shape of the underlying probability distribution.
Charu C. Aggarwal
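One way to make the abstract's question concrete is a minimal sketch, assuming SciPy and synthetic exponential data (both are assumptions, not the book's material): a parametric family is fitted to the observations by maximum likelihood, with a histogram as a nonparametric alternative.

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(2)
    observed = rng.exponential(scale=2.0, size=2000)   # assumed synthetic observations

    # Parametric reconstruction: assume an exponential family and fit its
    # scale parameter by maximum likelihood (floc=0 fixes the location at zero).
    loc, scale = stats.expon.fit(observed, floc=0)
    print(f"Fitted rate parameter: {1.0 / scale:.3f}")

    # Nonparametric alternative: estimate the density with a histogram.
    density, edges = np.histogram(observed, bins=30, density=True)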
Chapter 7. Regression
Abstract
The regression problem works with pairs of observations \((\vec {x}_1, y_1)\), \((\vec {x}_2, y_2)\), …, \((\vec {x}_n, y_n)\) in order to construct a model that maps each \(\vec {x}_i\) to \(y_i\) with a functional relationship.
Charu C. Aggarwal
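A minimal least-squares sketch of the regression setup in the abstract, assuming NumPy and synthetic data (not the book's code): the pairs \((\vec {x}_i, y_i)\) are fitted with a linear functional relationship.

    import numpy as np

    rng = np.random.default_rng(3)
    X = rng.normal(size=(100, 3))                      # predictor vectors x_i
    true_w = np.array([1.0, -2.0, 0.5])                # assumed ground-truth coefficients
    y = X @ true_w + rng.normal(scale=0.1, size=100)   # noisy targets y_i

    # Least-squares fit of the linear relationship y ~ X w.
    w_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
    print("Estimated coefficients:", np.round(w_hat, 3))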
Chapter 8. Classification: A Probabilistic View
Abstract
The previous chapter introduced the regression modeling problem, which predicts numeric outcomes from predictor variables. What happens in cases where the outcome variable is binary or categorical? Such a change to the regression problem definition results in an important and different problem, which is referred to as classification.
Charu C. Aggarwal
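To illustrate the shift from numeric to binary outcomes, here is a minimal sketch assuming scikit-learn's LogisticRegression and synthetic labels (both assumptions, not material from the book): the model outputs class probabilities rather than numeric predictions.

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(4)
    X = rng.normal(size=(200, 2))
    # Binary labels generated from a simple linear rule plus noise (assumed).
    y = (X[:, 0] - X[:, 1] + rng.normal(scale=0.5, size=200) > 0).astype(int)

    clf = LogisticRegression().fit(X, y)
    print(clf.predict_proba(X[:3]))   # class probabilities for the first three points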
Chapter 9. Unsupervised Learning: A Probabilistic View
Abstract
The previous two chapters have introduced algorithms for supervised learning in which the dependent variable has a significant influence on the learned model. In unsupervised learning, this type of supervision is not available. Rather, the model learns the trends and patterns in the underlying data in terms of carefully designed summaries (models).
Charu C. Aggarwal
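As one possible probabilistic summary of unlabeled data, a minimal sketch assuming scikit-learn's GaussianMixture and synthetic two-group data (both assumptions, not taken from the book):

    import numpy as np
    from sklearn.mixture import GaussianMixture

    rng = np.random.default_rng(5)
    # Unlabeled data drawn from two well-separated groups (assumed for illustration).
    data = np.vstack([rng.normal(-3.0, 1.0, size=(150, 2)),
                      rng.normal(3.0, 1.0, size=(150, 2))])

    # A mixture of Gaussians summarizes the data as two probabilistic clusters.
    gmm = GaussianMixture(n_components=2, random_state=0).fit(data)
    print("Component means:", np.round(gmm.means_, 2))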
Chapter 10. Discrete State Markov Processes
Abstract
The probabilistic processes discussed thus far in this book for generating random variables (e.g., binomial or Poisson processes) are based on trials that are independent of one another. In other words, if multiple random variables were to be generated, the generation of each variable is an independent and identical process.
Charu C. Aggarwal
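A minimal sketch of the dependence that distinguishes Markov processes from independent trials, assuming a two-state chain and NumPy (an illustrative assumption, not the book's example): each new state depends on the current one through a transition matrix.

    import numpy as np

    # Transition matrix of a two-state Markov chain: row i gives the probabilities
    # of the next state, conditioned on the current state i.
    P = np.array([[0.9, 0.1],
                  [0.3, 0.7]])

    rng = np.random.default_rng(6)
    state, chain = 0, [0]
    for _ in range(20):
        state = rng.choice(2, p=P[state])   # the next state depends only on the current state
        chain.append(int(state))
    print(chain)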
Chapter 11. Probabilistic Inequalities and Approximations
Abstract
Numerous probabilistic inequalities are used to bound the probabilities of different events. These inequalities generally apply either to special forms of a random variable or to special regions of a random variable, such as its extreme (tail) regions.
Charu C. Aggarwal
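A minimal numerical check of one such inequality (Markov's inequality, chosen here as an illustration; the simulation setup is an assumption, not the book's): for a nonnegative random variable, P(X ≥ a) ≤ E[X]/a.

    import numpy as np

    rng = np.random.default_rng(7)
    X = rng.exponential(scale=1.0, size=100_000)   # nonnegative variable with E[X] = 1

    # Markov's inequality bounds the tail probability by E[X] / a for any a > 0.
    for a in (2.0, 5.0, 10.0):
        empirical = (X >= a).mean()
        bound = X.mean() / a
        print(f"a = {a}: empirical P(X >= a) = {empirical:.4f}, Markov bound = {bound:.4f}")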
Backmatter
Metadata
Title
Probability and Statistics for Machine Learning
Author
Charu C. Aggarwal
Copyright Year
2024
Electronic ISBN
978-3-031-53282-5
Print ISBN
978-3-031-53281-8
DOI
https://doi.org/10.1007/978-3-031-53282-5
