Statistics

Sigma notation

Descriptive statistics with code in Python

Descriptive statistics

  • Divided into
    • Measures of central tendency
      • Answers “What does the middle of our data look like?”
        • Mean
        • Median
        • Mode
    • Measures of spread
      • Answers “How much does my data vary?”
        • Range and interquartile range
        • Standard deviation
        • Variance
  • Mean
    • Average value of a data set
    • Sum of all values divided by the number of observations
    • Defines a typical value in the data set
  • Median
    • Requires no calculation beyond sorting the data (with an even number of observations, take the average of the two middle values)
    • Value that sits in the middle of the sorted data set
  • Mode
    • Value that appears most frequently in our data
    • Intuition of the mode as the “middle” is not as immediate as for the mean or median, but there is a clear rationale
    • It is the value with the highest weight in the mean, because it contributes to the sum more often than any other value
  • Range
    • Maximum - minimum value
  • Interquartile range (IQR)
    • Also called the midspread, the middle 50%, or (technically) the H-spread
    • Measure of statistical dispersion equal to the difference between the 75th and 25th percentiles, that is, between the upper and lower quartiles: IQR = Q3 − Q1
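
The measures listed above can be computed with Python's standard `statistics` module. The following is a minimal sketch on a made-up sample (the list `data` is hypothetical, not taken from these notes). Quartiles can be defined in several ways; the `method="inclusive"` choice here is an assumption, and other conventions can give slightly different Q1 and Q3 values.

```python
import statistics

# Hypothetical sample, chosen only for illustration.
data = [2, 4, 4, 5, 7, 9, 11]

# Measures of central tendency
mean = statistics.mean(data)      # sum of values / number of observations
median = statistics.median(data)  # middle value of the sorted data
mode = statistics.mode(data)      # most frequent value

# Measures of spread
data_range = max(data) - min(data)  # maximum - minimum

# quantiles(n=4) returns the three cut points Q1, Q2 (the median), Q3.
q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")
iqr = q3 - q1                       # IQR = Q3 - Q1

print(f"mean={mean}, median={median}, mode={mode}")
print(f"range={data_range}, Q1={q1}, Q3={q3}, IQR={iqr}")
```

For this sample the mean is 6, the median is 5, the mode is 4, the range is 9, and the IQR is 8 − 4 = 4.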

Figures: interquartile range; interquartile range example

  • Standard deviation
    • Measure of the spread of your observations
    • Statement of “how much your data deviates from a typical data point”
    • Summarizes how much your data differs from the mean

Standard deviation formula
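
For reference, the standard deviation is conventionally written in one of two forms, sketched below. Which form the figure above uses is an assumption, so both are given; the symbols ($\mu$ and $N$ for a whole population, $\bar{x}$ and $n$ for a sample) follow the usual convention and are not taken from these notes.

$$
\sigma = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(x_i - \mu\right)^2}
\qquad
s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^2}
$$

The variance is the same expression without the square root: $\sigma^2$ for a population, $s^2$ for a sample.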

  • Variance
    • Square of the standard deviation (equivalently, the average of the squared deviations from the mean)
    • The deviations are squared in order to:
      • Avoid negative values in the sum
      • Emphasize the significance of outliers
      • Have a squared (quadratic, not exponential) term, which is smooth and lets us find the point of minimum deviation
    • Usually it is enough to report the mean and standard deviation, but it is good to note the variance as well

Variance formula
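
The relationship between variance and standard deviation can be checked step by step in Python. This is a minimal sketch on a hypothetical sample; it computes the population form (divide by n) by hand and compares it with the `statistics` module, then shows the sample form (divide by n − 1) for contrast. Which form the notes intend is an assumption.

```python
import math
import statistics

# Hypothetical sample, chosen so the numbers come out round.
data = [2, 4, 4, 4, 5, 5, 7, 9]

n = len(data)
mean = sum(data) / n

# Squaring removes the sign and weights large deviations more heavily.
squared_deviations = [(x - mean) ** 2 for x in data]

# Population form: divide by n.
population_variance = sum(squared_deviations) / n
population_std = math.sqrt(population_variance)

# Sample form: divide by n - 1 (Bessel's correction).
sample_variance = sum(squared_deviations) / (n - 1)
sample_std = math.sqrt(sample_variance)

print(f"mean={mean}")
print(f"population: variance={population_variance}, std={population_std}")
print(f"sample:     variance={sample_variance:.4f}, std={sample_std:.4f}")

# The standard library gives the same results.
print(statistics.pvariance(data), statistics.pstdev(data))
print(statistics.variance(data), statistics.stdev(data))
```

Here the mean is 5, the population variance is 4, and the population standard deviation is 2, so the variance is indeed the square of the standard deviation.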

  • Classify as 1/0 (yes/no)
  • Gives a probability, for example, “there is a 65% probability of ‘yes’”

Sigmoid function
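
The two bullets above (a probability, then a 1/0 decision) can be sketched with the sigmoid function $\sigma(z) = \frac{1}{1 + e^{-z}}$. In this minimal sketch the input score `z`, the helper `classify`, and the 0.5 decision threshold are assumptions made for illustration; the notes do not say how the score is produced.

```python
import math


def sigmoid(z: float) -> float:
    """Map any real score z to a value strictly between 0 and 1."""
    # Two algebraically equal branches avoid overflow in math.exp
    # for large positive or large negative z.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def classify(z: float, threshold: float = 0.5) -> tuple[float, int]:
    """Return (probability of 'yes', 1/0 label). The threshold is an assumption."""
    p = sigmoid(z)
    return p, int(p >= threshold)


# A score of 0 sits exactly on the boundary: probability 0.5.
print(classify(0.0))

# A score of ln(0.65 / 0.35) corresponds to a 65% probability of 'yes'.
z_65 = math.log(0.65 / 0.35)
probability, label = classify(z_65)
print(f"probability={probability:.2f}, label={label}")
```

Changing `threshold` trades false positives against false negatives; 0.5 is only a common default.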

  • Derivative at a point ← slope of the straight line tangent to f(x) at a chosen value x
  • Derivative of f(x) is
    • Slope of f(x)
    • Instantaneous rate of change of f(x)
  • Calculating derivative
    • $f(x) = x^{10}$
    • $f'(x) = 10x^9$ ← typical derivative (power rule: $(x^n)' = nx^{n-1}$)
    • More examples
  • Uses of derivatives
    • Find minima
    • Find maxima
    • Find inflection points
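
The idea of the derivative as the slope of the tangent line can be checked numerically. The sketch below approximates the slope with a central difference and compares it with the power-rule result $f'(x) = 10x^9$ from above. The helper name `numerical_derivative`, the step size `h`, and the second function `g` are assumptions chosen for illustration.

```python
def numerical_derivative(f, x, h=1e-5):
    """Approximate f'(x) as the slope between two nearby points."""
    return (f(x + h) - f(x - h)) / (2 * h)


def f(x):
    return x ** 10


def f_prime(x):
    # Power rule: (x^n)' = n * x^(n - 1)
    return 10 * x ** 9


x = 1.5
approx = numerical_derivative(f, x)
exact = f_prime(x)
print(f"numerical slope at x={x}: {approx:.6f}")
print(f"power rule at x={x}:      {exact:.6f}")


# At a minimum the tangent is horizontal, so the derivative is zero.
def g(x):
    return (x - 2) ** 2  # hypothetical function with its minimum at x = 2


slope_at_minimum = numerical_derivative(g, 2.0)
print(f"slope of g at its minimum: {slope_at_minimum:.2e}")
```

Searching for points where the derivative is zero is how minima and maxima are located; whether such a point is a minimum or a maximum depends on the sign of the second derivative there, and inflection points are where the second derivative changes sign.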

Figures: derivative; derivative vs integral

Integral

  • S (darker colored section) ← integral of f(x) in the interval from “a” to “b”
  • $\int_{a}^{b} f(x) \, dx$ ← integral notation
  • To calculate: find a function $F(x)$ whose derivative equals $f(x)$ (an antiderivative); then $\int_{a}^{b} f(x) \, dx = F(b) - F(a)$
  • Important formulas
    • $\int x^n \, dx = \frac{1}{n+1}x^{n+1} + C$ ← for $n \neq -1$
      • because for $n = -1$ the formula would give $\int x^{-1} \, dx = \frac{1}{0}x^0$ ← division by zero, which is not possible
    • $\int \frac{1}{x} \, dx = \ln|x| + C$ ← used for $n = -1$
  • Examples:
    • $\int x^2 \, dx = \frac{1}{3}x^3 + C$
      • because $\left( \frac{1}{3}x^3 \right)' = \frac{1}{3} \times 3x^2 = x^2$
      • C = any constant
    • $\int 5x^7 \, dx = 5\int x^7 \, dx = 5 \times \frac{1}{8}x^8 + C$
    • $\int \frac{x^2+x^7}{x^3} \, dx = \int \left( \frac{1}{x}+x^4 \right) \, dx = \int \frac{1}{x} \, dx + \int x^4 \, dx = \ln|x| + \frac{1}{5}x^5 + C$ ← it’s enough to type one C
    • $\int \left( 5+7x^{-\frac{1}{4}} \right) \, dx = \int 5 \, dx + 7\int x^{-\frac{1}{4}} \, dx = 5x + 7 \times \frac{4}{3}x^{\frac{3}{4}} + C$
      • imagine that $5 = 5 \times x^0$ ← but it’s not normal to think like that
    • $\int \cos(x) \, dx = \sin(x) + C$
    • $\int \sin(x) \, dx = -\cos(x) + C$
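
The antiderivatives above can be sanity-checked numerically: the area under $f(x)$ between $a$ and $b$ should equal $F(b) - F(a)$, where $F$ is the antiderivative. The sketch below uses the trapezoid rule as the numerical method; the helper names, the intervals, and the number of slices are assumptions chosen for illustration.

```python
import math


def trapezoid(f, a, b, n=10_000):
    """Approximate the area under f on [a, b] with n trapezoids."""
    h = (b - a) / n
    total = 0.5 * (f(a) + f(b))
    for i in range(1, n):
        total += f(a + i * h)
    return total * h


def definite_integral(F, a, b):
    """Evaluate F(b) - F(a); the constant C cancels out."""
    return F(b) - F(a)


# (label, f, antiderivative F, a, b); intervals are hypothetical.
checks = [
    ("x^2 on [0, 3]", lambda x: x ** 2, lambda x: x ** 3 / 3, 0.0, 3.0),
    ("5x^7 on [0, 1]", lambda x: 5 * x ** 7, lambda x: 5 * x ** 8 / 8, 0.0, 1.0),
    ("1/x on [1, e]", lambda x: 1 / x, lambda x: math.log(abs(x)), 1.0, math.e),
    ("cos(x) on [0, pi/2]", math.cos, math.sin, 0.0, math.pi / 2),
    ("sin(x) on [0, pi]", math.sin, lambda x: -math.cos(x), 0.0, math.pi),
]

results = {}
for label, f, F, a, b in checks:
    numeric = trapezoid(f, a, b)
    exact = definite_integral(F, a, b)
    results[label] = (numeric, exact)
    print(f"{label:22s} numeric={numeric:.6f}  F(b)-F(a)={exact:.6f}")
```

The two columns agree to about six decimal places, which is what the trapezoid rule can be expected to deliver with this many slices on these smooth functions.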