NumPy is a Python library for working with arrays. It also provides functions for linear algebra, Fourier transforms, and matrices. Its core array object, the ndarray, is typically much faster to process than ordinary Python lists.
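As a rough illustration of the array-versus-list point, the sketch below squares the same 100,000 integers both ways. The variable names and the array size are made up for this example, and the timings depend on the machine, so no particular speed-up is claimed.

```python
import timeit

import numpy as np

values = list(range(100_000))
arr = np.arange(100_000, dtype=np.int64)  # explicit dtype avoids 32-bit overflow

# Pure Python: visit every element in a loop
squares_list = [v * v for v in values]
# NumPy: one vectorised expression over the whole array
squares_arr = arr * arr

# Both approaches give the same numbers
same = squares_list == squares_arr.tolist()

# Time 20 repetitions of each approach; results vary by machine
t_list = timeit.timeit(lambda: [v * v for v in values], number=20)
t_arr = timeit.timeit(lambda: arr * arr, number=20)
print(f"list: {t_list:.4f}s  ndarray: {t_arr:.4f}s")
```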
Useful Functions
- np.arange(start, stop, step) - exclusive of stop
  - returns ndarray of evenly spaced values
- np.linspace(start, stop, n) - returns n evenly spaced values, inclusive of both endpoints
  - use for generating x-ranges for prediction plots
- np.where(condition, x, y) - where condition is true, yield x, otherwise yield y
np.where([[True, False], [True, True]],
         [[1, 2], [3, 4]],
         [[9, 8], [7, 6]])
# array([[1, 8],
#        [3, 4]])
df_liverpool_2223["result"] = np.where(
    df_liverpool_2223.GF > df_liverpool_2223.GA, 'W',
    np.where(df_liverpool_2223.GF < df_liverpool_2223.GA, 'L',
             np.where(df_liverpool_2223.GF == df_liverpool_2223.GA, 'D', '')))
df_liverpool_2223["pts"] = np.where(df_liverpool_2223.result == 'W', 3,
                                    np.where(df_liverpool_2223.result == 'D', 1, 0))
- np.quantile(a, q) - a: array, q: quantile
  - useful for getting IQR
- np.zeros(n) / np.ones(n) - initialise result arrays for simulation loops
- np.sqrt(x), np.floor(x), np.exp(x) - element-wise math on arrays
- np.sum(arr) / np.sum(c**2 / n_vals) - used in contrast SE calculations
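A few of the functions above, sketched on made-up data. The arrays, the contrast weights c, the group sizes n_vals, and the mse value are all hypothetical, and the contrast standard error is assumed to follow the usual sqrt(MSE * sum(c_i^2 / n_i)) form.

```python
import numpy as np

# arange excludes the stop value; linspace includes it by default
steps = np.arange(0, 1, 0.25)   # 0.0, 0.25, 0.5, 0.75
grid = np.linspace(0, 1, 5)     # 0.0, 0.25, 0.5, 0.75, 1.0

# IQR from the 25th and 75th percentiles (made-up data)
data = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9])
q1, q3 = np.quantile(data, [0.25, 0.75])
iqr = q3 - q1                   # 7.0 - 3.0 = 4.0

# Contrast SE, assuming SE = sqrt(MSE * sum(c_i^2 / n_i))
c = np.array([1.0, -1.0, 0.0])  # hypothetical contrast weights
n_vals = np.array([10, 10, 10]) # hypothetical group sizes
mse = 4.0                       # hypothetical mean squared error
se = np.sqrt(mse * np.sum(c**2 / n_vals))
```

The arange and linspace calls differ only in whether the right endpoint is kept, which is why linspace is the more convenient choice for plotting grids.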
Random Number Generation (L10)
# Modern API — preferred for reproducibility
rng = np.random.default_rng(seed=2137)
# Pass rng into scipy .rvs() calls:
from scipy.stats import norm, gamma, poisson, binom
X = norm.rvs(0, 1, size=100, random_state=rng)
X = gamma.rvs(2, scale=3, size=50, random_state=rng)
X = poisson.rvs(1.3, size=50, random_state=rng)
X = binom.rvs(2, 0.3, size=50, random_state=rng)
- Always set the seed before the simulation loop for reproducibility
- rng is passed as random_state=rng to scipy distribution .rvs() calls
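To see what the seed buys you, the sketch below builds two generators from the same seed and checks that they produce identical draws. It uses the Generator's own sampling methods (rng.normal, rng.poisson) rather than scipy so that it only needs NumPy; the names rng_a and rng_b are made up for the example.

```python
import numpy as np

# Two generators built from the same seed produce the same stream
rng_a = np.random.default_rng(2137)
rng_b = np.random.default_rng(2137)

draws_a = rng_a.normal(loc=0, scale=1, size=5)
draws_b = rng_b.normal(loc=0, scale=1, size=5)
identical = np.array_equal(draws_a, draws_b)  # same seed, same draws

# Drawing again from rng_a advances its state, so new values come out
draws_next = rng_a.normal(loc=0, scale=1, size=5)

# Other distributions are available as Generator methods too
counts = rng_a.poisson(lam=1.3, size=5)
```

Because each draw advances the generator's state, creating the generator once before the loop (rather than inside it) is what keeps a whole simulation reproducible.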
Simulation Pattern (L10)
import numpy as np
from scipy.stats import norm, poisson

rng = np.random.default_rng(2137)
output_vec = np.zeros(100)  # pre-allocate result array
n = 20
lambda_ = 0.5
t = norm.ppf(0.975)  # normal critical value for a 95% interval
for i in range(100):
    X = poisson.rvs(lambda_, size=n, random_state=rng)
    Xbar = X.mean()
    s = X.std()  # NumPy default is ddof=0; X.std(ddof=1) gives the sample SD
    CI = [Xbar - t*s/np.sqrt(n), Xbar + t*s/np.sqrt(n)]
    if CI[0] < lambda_ and CI[1] > lambda_:
        output_vec[i] = 1  # interval covers the true lambda
output_vec.mean()  # estimated coverage
Full reference: Numpy Glossary
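The same coverage simulation can also be written without an explicit loop. The version below is a sketch under a few assumptions: it draws with rng.poisson instead of scipy's poisson.rvs, uses the sample standard deviation (ddof=1), and takes the normal critical value from the standard library's NormalDist, so its numbers need not match the loop version exactly.

```python
from statistics import NormalDist

import numpy as np

rng = np.random.default_rng(2137)
reps, n, lambda_ = 100, 20, 0.5

# One row per replication, n Poisson draws per row
X = rng.poisson(lam=lambda_, size=(reps, n))

xbar = X.mean(axis=1)              # sample mean of each replication
s = X.std(axis=1, ddof=1)          # sample SD of each replication
z = NormalDist().inv_cdf(0.975)    # normal critical value for a 95% interval
half_width = z * s / np.sqrt(n)

# True where the interval contains the true lambda
covered = (xbar - half_width < lambda_) & (lambda_ < xbar + half_width)
coverage = covered.mean()          # estimated coverage
```

Taking the mean of a boolean array gives the proportion of True values, which replaces the pre-allocated 0/1 vector from the loop.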