About

This notebook isn’t really about code as much as it’s about math. As my knowledge grows (and/or refreshes) with respect to statistics, calculus and other maths, this notebook will gradually expand, and the organizational structure will shift to accommodate. All math formulas were made using my custom tool at: https://christopherball.github.io/math/asciiMath/.

Algebra

Logarithms

Trigonometry

Unit Circle Relationships

Trigonometric Function Graphs

Common Trig Values Table (Exact)

Complete Trig Values Table (Approx.)

Calculus

Derivatives

Product Rule

Formula Source:

d/dx[color(red)f(x)color(blue)g(x)]=color(red)f(x)d/dx[color(blue)g(x)]+d/dx[color(red)f(x)]color(blue)g(x)

Interpretation: When finding derivatives, often times it can be useful to chunk the expression so that it’s easier to tackle determining the derivatives of its smaller constituent parts. Key takeaway is look for expressions that contain multiplication, as they are prime candidates.


Quotient Rule

Formula Source:

d/dx[color(red)f(x)/color(blue)g(x)]=(d/dx[color(red)f(x)]color(blue)g(x)-color(red)f(x)d/dx[color(blue)g(x)])/[color(blue)g(x)]^2

Interpretation: When finding derivatives, often times it can be useful to chunk the expression so that it’s easier to tackle determining the derivatives of its smaller constituent parts. Key takeaway is look for expressions that contain division, as they are prime candidates.


Chain Rule

Formula Source:

d/dx[(f(g(x))]=f'(g(x))*g'(x)

Interpretation: When finding derivatives, often times it can be useful to chunk the expression so that it’s easier to tackle determining the derivatives of its smaller constituent parts. Key takeaway is look for expressions that can be broken down in two steps, such as a square root of x-squared, where we’d consider the outer step of square root to be f(x) and the inner step of x-squared to be g(x).


Statistics

Population Standard Deviation

Formula Source:

σ=sqrt(1/Nsum_(i=1)^N (color(red)(x_i)-color(blue)(mu))^2)

color(red)(x_i)=i^(th)"observation value"
color(blue)(mu)="population mean"
N="population size"

Summary: Population Standard Deviation (SD) is typically represented in math equations using the Greek letter sigma σ. Standard deviations apply to normal distributions and speak about deviations from the mean. The value of a standard deviation is in the same units as the response variable.

Interpretation: There is a n empirical rule (68%/95%/99.7%) corresponding to 1, 2, and 3 SD +/-. It states that approximately 68% of the given population values will fall within 1 SD +/- (that is, 1 SD above, 1 SD below), 95% within 2 SD +/-, and 99.7% within 3 SD +/-. This means that all but .03% of your normal population should fall within 3 SD.


Sample Standard Deviation

Formula Source:

s=sqrt(1/(n-1)sum_(i=1)^n (color(red)(x_i)-color(blue)(bar x))^2)

color(red)(x_i)=i^(th)"observation value"
color(blue)(bar x)="sample mean"
n="sample size"

Summary: Sample Standard Deviation is typically represented in math equations using a lowercase s, to differentiate it from the full population flavor. The only other difference worth calling out is that the final mean taken uses 1 less than the sample size, known as Bessel’s Correction.

Interpretation: Same as for Population Standard Deviation, keeping in mind that this calculation here is an estimate based on the sample.


Root Mean Square

Formula Source:

RMS=sqrt(1/nsum_(i=1)^n color(red)(x_i)^2)

color(red)(x_i)=i^(th)"observation value"
n="sample size"

Summary: Also known as the quadratic mean.

Interpretation:


Root Mean Square Error

Formula Source:

RMSE=sqrt(1/Nsum_(i=1)^N (color(red)(x_i)-color(blue)(hat x_i))^2)

color(red)(x_i)=i^(th)"observation value"
color(blue)(hat x_i)=i^(th)"prediction value"
N="population size"

Summary: RMSE is also known as the Root Mean Square Deviation (RMSD). Given a set of numeric model predictions, and their corresponding observational response values, for every prediction, we’re basically diffing how far off our prediction is, squaring that diff to eliminate negative sign issues, summing up all of these across our set, taking the mean of these values, and finally taking the square root to bring the final value back into the same scale as the predictions and observations.

Interpretation: Numbers are absolute values in the same unit as the predicted / observation values. As a model improves and matures, your RMSE should get smaller and smaller because your predictions should become closer and closer to the actual observational values (as modeled by a linear regression, etc…).


Residual Sum of Squares

Formula Source:

RSS=sum_(i=1)^n(color(red)(y_i)-color(blue)(hat y_i))^2

color(red)(y_i)=i^(th)"observation value"
color(blue)(hat y_i)=i^(th)"prediction value"
n="sample size"

Summary:

Interpretation:


Total Sum of Squares

Formula Source:

TSS=sum_(i=1)^n(color(red)(y_i)-color(blue)(bar y))^2

color(red)(y_i)=i^(th)"observation value"
color(blue)(bar y)="sample mean"
n="sample size"

Summary:

Interpretation:


R-Squared

Formula Source:

R^2=1-(RSS)/(TSS)

Summary: Otherwise known as the “Coefficient of Determination”, to be used for assessing model accuracy where only one independent variable is used in the model. If you add more independent variables, R-squared will go up which is why you should use Adjusted R-Squared instead for such situations (see below). For normal R-squared, you can think of the numerator as “explained variation” and the denominator as “total variation”.

Interpretation: Results are always between 0 and 100%, with higher generally being better. That is, high R-squared percentages signal a prediction model that likely fits (explains) the data well. There are exceptions to this rule however. It’s fairly common for R-squared results to be high, but when you look closely at the prediction curve of the model compared to the response data, there can be a pattern (bias) where certain areas of the model consistently under or over-predict response values. This is confirmed by looking at a “residuals versus fits” plot and looking for patterns (you want to see randomness).


Adjusted R-Squared

Formula Source:

R_(adj)^2=1-(1-R^2)((color(red)n-1)/(color(red)n-color(blue)k-1))

color(blue)k="number of independent variables, excluding constant"
color(red)n="sample size"

Summary: Like R-Squared, Adjusted R-Squared also indicates how well a model curve explains response values. Unlike regular R-Squared, the Adjusted version handles models that are built on top of multiple independent variables, and will penalize the score if additional independent variables don’t add to the model accuracy. Adjusted R-Squared values can’t be higher than regular R-Squared.

Interpretation: Precisely the same as normal R-Squared.


---
title: "Math Formulas Explained"
output: 
  html_notebook: 
    toc: yes
    toc_depth: 4
    toc_float: yes
---

```{css, echo = FALSE}
@media only screen and (max-width: 768px) {
  #TOC {
    display: none;
  }
}
```

## About

This notebook isn't really about code as much as it's about math. As my knowledge grows (and/or refreshes) with respect to statistics, calculus and other maths, this notebook will gradually expand, and the organizational structure will shift to accommodate. All math formulas were made using my custom tool at: <https://christopherball.github.io/math/asciiMath/>.

## Algebra

#### Logarithms

![](images/paste-E6E02BB5.png){width="550"}

## Trigonometry

#### Unit Circle Relationships

![](images/paste-965F23C4.png){width="550"}

#### Trigonometric Function Graphs

![](images/paste-876B34E8.png){width="550"}

#### Common Trig Values Table (Exact)

![](images/paste-21D5C095.png){width="550"}

#### Complete Trig Values Table (Approx.)

![](images/paste-89E5D00E.png){width="550"}

## Calculus

#### Derivatives

[**Product Rule**]{.ul}

![](images/paste-6EF66FCB.png){width="550"}

**Formula Source**:

    d/dx[color(red)f(x)color(blue)g(x)]=color(red)f(x)d/dx[color(blue)g(x)]+d/dx[color(red)f(x)]color(blue)g(x)

**Interpretation**: When finding derivatives, often times it can be useful to chunk the expression so that it's easier to tackle determining the derivatives of its smaller constituent parts. Key takeaway is look for expressions that contain multiplication, as they are prime candidates.

------------------------------------------------------------------------

[**Quotient Rule**]{.ul}

![](images/paste-BF6EEC5F.png){width="550"}

**Formula Source**:

    d/dx[color(red)f(x)/color(blue)g(x)]=(d/dx[color(red)f(x)]color(blue)g(x)-color(red)f(x)d/dx[color(blue)g(x)])/[color(blue)g(x)]^2

**Interpretation**: When finding derivatives, often times it can be useful to chunk the expression so that it's easier to tackle determining the derivatives of its smaller constituent parts. Key takeaway is look for expressions that contain division, as they are prime candidates.

------------------------------------------------------------------------

[**Chain Rule**]{.ul}

![](images/paste-5547725D.png){width="550" height="51"}

**Formula Source**:

    d/dx[(f(g(x))]=f'(g(x))*g'(x)

**Interpretation**: When finding derivatives, often times it can be useful to chunk the expression so that it's easier to tackle determining the derivatives of its smaller constituent parts. Key takeaway is look for expressions that can be broken down in two steps, such as a square root of x-squared, where we'd consider the outer step of square root to be f(x) and the inner step of x-squared to be g(x).

------------------------------------------------------------------------

## Statistics

#### Population Standard Deviation

![](images/paste-0F2BCFC2.png){width="550"}

**Formula Source**:

    σ=sqrt(1/Nsum_(i=1)^N (color(red)(x_i)-color(blue)(mu))^2)

    color(red)(x_i)=i^(th)"observation value"
    color(blue)(mu)="population mean"
    N="population size"

**Summary**: Population Standard Deviation (SD) is typically represented in math equations using the Greek letter sigma σ. Standard deviations apply to normal distributions and speak about deviations from the mean. The value of a standard deviation is in the same units as the response variable.

**Interpretation**: There is a n empirical rule (68%/95%/99.7%) corresponding to 1, 2, and 3 SD +/-. It states that approximately 68% of the given population values will fall within 1 SD +/- (that is, 1 SD above, 1 SD below), 95% within 2 SD +/-, and 99.7% within 3 SD +/-. This means that all but .03% of your normal population should fall within 3 SD.

------------------------------------------------------------------------

#### Sample Standard Deviation

![](images/paste-29CCDEBF.png){width="550"}

**Formula Source**:

    s=sqrt(1/(n-1)sum_(i=1)^n (color(red)(x_i)-color(blue)(bar x))^2)

    color(red)(x_i)=i^(th)"observation value"
    color(blue)(bar x)="sample mean"
    n="sample size"

**Summary**: Sample Standard Deviation is typically represented in math equations using a lowercase s, to differentiate it from the full population flavor. The only other difference worth calling out is that the final mean taken uses 1 less than the sample size, known as Bessel's Correction.

**Interpretation**: Same as for Population Standard Deviation, keeping in mind that this calculation here is an estimate based on the sample.

------------------------------------------------------------------------

#### Root Mean Square

![](images/paste-6D6AF2E9.png){width="550"}

**Formula Source**:

    RMS=sqrt(1/nsum_(i=1)^n color(red)(x_i)^2)

    color(red)(x_i)=i^(th)"observation value"
    n="sample size"

**Summary**: Also known as the quadratic mean.

**Interpretation**:

------------------------------------------------------------------------

#### Root Mean Square Error

![](images/paste-49B2AE09.png){width="550"}

**Formula Source**:

    RMSE=sqrt(1/Nsum_(i=1)^N (color(red)(x_i)-color(blue)(hat x_i))^2)

    color(red)(x_i)=i^(th)"observation value"
    color(blue)(hat x_i)=i^(th)"prediction value"
    N="population size"

**Summary**: RMSE is also known as the Root Mean Square Deviation (RMSD). Given a set of numeric model predictions, and their corresponding observational response values, for every prediction, we're basically diffing how far off our prediction is, squaring that diff to eliminate negative sign issues, summing up all of these across our set, taking the mean of these values, and finally taking the square root to bring the final value back into the same scale as the predictions and observations.

**Interpretation**: Numbers are absolute values in the same unit as the predicted / observation values. As a model improves and matures, your RMSE should get smaller and smaller because your predictions should become closer and closer to the actual observational values (as modeled by a linear regression, etc...).

------------------------------------------------------------------------

#### Residual Sum of Squares

![](images/paste-2E989DE2.png){width="550"}

**Formula Source**:

    RSS=sum_(i=1)^n(color(red)(y_i)-color(blue)(hat y_i))^2

    color(red)(y_i)=i^(th)"observation value"
    color(blue)(hat y_i)=i^(th)"prediction value"
    n="sample size"

**Summary**:

**Interpretation**:

------------------------------------------------------------------------

#### Total Sum of Squares

![](images/paste-AA50F75D.png){width="550"}

**Formula Source**:

    TSS=sum_(i=1)^n(color(red)(y_i)-color(blue)(bar y))^2

    color(red)(y_i)=i^(th)"observation value"
    color(blue)(bar y)="sample mean"
    n="sample size"

**Summary**:

**Interpretation**:

------------------------------------------------------------------------

#### R-Squared

![](images/paste-50B194CA.png){width="550" height="51"}

**Formula Source**:

    R^2=1-(RSS)/(TSS)

**Summary**: Otherwise known as the "Coefficient of Determination", to be used for assessing model accuracy where only one independent variable is used in the model. If you add more independent variables, R-squared will go up which is why you should use Adjusted R-Squared instead for such situations (see below). For normal R-squared, you can think of the numerator as "explained variation" and the denominator as "total variation".

**Interpretation**: Results are always between 0 and 100%, with higher *generally* being better. That is, high R-squared percentages signal a prediction model that likely fits (explains) the data well. There are exceptions to this rule however. It's fairly common for R-squared results to be high, but when you look closely at the prediction curve of the model compared to the response data, there can be a pattern (bias) where certain areas of the model consistently under or over-predict response values. This is confirmed by looking at a "residuals versus fits" plot and looking for patterns (you want to see randomness).

------------------------------------------------------------------------

#### Adjusted R-Squared

![](images/paste-6DB139DD.png){width="550"}

**Formula Source**:

    R_(adj)^2=1-(1-R^2)((color(red)n-1)/(color(red)n-color(blue)k-1))

    color(blue)k="number of independent variables, excluding constant"
    color(red)n="sample size"

**Summary**: Like R-Squared, Adjusted R-Squared also indicates how well a model curve explains response values. Unlike regular R-Squared, the Adjusted version handles models that are built on top of multiple independent variables, and will penalize the score if additional independent variables don't add to the model accuracy. Adjusted R-Squared values can't be higher than regular R-Squared.

**Interpretation**: Precisely the same as normal R-Squared.

------------------------------------------------------------------------
