MScAS 2025 - DSAS Lecture 2
2025-09-30
Note
Unlike Python lists, arrays contain elements of the same data type.
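A small sketch of what "same data type" means in practice (variable names are illustrative only): a list keeps each element's own type, while NumPy upcasts mixed input to one common dtype.

```python
import numpy as np

# A Python list can hold mixed types; each element keeps its own type.
mixed_list = [1, 2.5, 3]
list_types = [type(x).__name__ for x in mixed_list]

# NumPy chooses a single dtype for the whole array (integers are upcast to float).
arr = np.array([1, 2.5, 3])

print(list_types)  # ['int', 'float', 'int']
print(arr.dtype)   # float64
```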
arange vs linspace
- np.arange(start, stop, step): Creates values with fixed step size
  - arange(0, 20, 5) → [0, 5, 10, 15] (stops before 20)
  - Like range(), endpoint is excluded
- np.linspace(start, stop, num): Creates num evenly spaced values
  - linspace(0, 10, 5) → [0.0, 2.5, 5.0, 7.5, 10.0] (includes 10)
Reminder on Distributions
Why Random Numbers?
Random number generation has many uses, e.g. Monte Carlo simulations in risk assessment, bootstrapping for confidence intervals, and synthetic data generation for testing.
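As a rough illustration of one of these uses, the sketch below draws uniform random numbers and bootstraps a confidence interval for their mean. The seed, sample size, and number of resamples are arbitrary assumptions, not values from the lecture.

```python
import numpy as np

# Seeded generator so the draws are reproducible (the seed is arbitrary).
rng = np.random.default_rng(2025)

# 1000 uniform draws; each value x satisfies 0 <= x < 1.
u = rng.random(1000)

# Bootstrap: resample with replacement and recompute the mean each time.
boot_means = np.array([
    rng.choice(u, size=u.size, replace=True).mean() for _ in range(200)
])

# Approximate 95% confidence interval for the mean.
ci_low, ci_high = np.percentile(boot_means, [2.5, 97.5])
print(u.mean(), ci_low, ci_high)
```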
Interval Notation Explained
The notation [0, 1) uses mathematical interval notation:
- [ (square bracket) = inclusive (includes the value)
- ) (parenthesis) = exclusive (excludes the value)
- [0, 1) means: 0 ≤ x < 1 (includes 0, excludes 1)
Which of the following is true?
Array Attributes
The attributes of a numpy array are:
- ndim: number of dimensions (axes) of an array.
- shape: tuple containing the size of an array in each dimension.
- size: total number of elements in an array.
- dtype: gives the data type of the elements of an array.
- itemsize: provides the size (in bytes) of the elements of an array.
Key Differences
For an array with shape (3, 4) and dtype int64:
- shape = (3, 4) → Structure: 3 rows × 4 columns
- size = 12 → Total elements: 3 × 4 = 12 values
- itemsize = 8 → Each element occupies 8 bytes (int64)
- size × itemsize = 12 × 8 = 96 bytes
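The numbers above can be checked directly. This is a minimal sketch: the array is filled with zeros only for convenience, since its contents do not affect the attributes.

```python
import numpy as np

# A 3 x 4 array of 64-bit integers.
arr = np.zeros((3, 4), dtype=np.int64)

print(arr.ndim)      # 2
print(arr.shape)     # (3, 4)
print(arr.size)      # 12
print(arr.dtype)     # int64
print(arr.itemsize)  # 8

# Total memory used by the elements; NumPy also exposes this as arr.nbytes.
total_bytes = arr.size * arr.itemsize
print(total_bytes)   # 96
```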
For an array with shape (2, 3, 4), what is the total size?
Note
Indexing and slicing are similar to Python lists but extended to multiple dimensions. Again, remember that indexing in Python starts at 0.
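A short sketch of multi-dimensional indexing and slicing on an illustrative 3 × 4 array (the array and variable names are assumptions made for the example):

```python
import numpy as np

# 3 rows x 4 columns holding the values 0..11.
arr = np.arange(12).reshape(3, 4)

first = arr[0, 0]      # row 0, column 0
middle = arr[1, 2]     # row 1, column 2
last_row = arr[-1]     # negative indices count from the end, as with lists
block = arr[:2, 1:3]   # rows 0-1 and columns 1-2

print(first, middle)   # 0 6
print(last_row)        # [ 8  9 10 11]
print(block.shape)     # (2, 2)
```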
Warning
Arrays are mutable, and slices are views of the same data, meaning that changes made through a slice modify the original (remember to use .copy() when needed!).
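The difference between a view and a copy can be sketched as follows (the values are illustrative):

```python
import numpy as np

a = np.arange(5)            # [0, 1, 2, 3, 4]

# A slice is a view: it shares memory with the original array.
view = a[1:4]
view[0] = 99                # also changes a[1]

# An explicit copy is independent of the original.
safe = a[1:4].copy()
safe[0] = -1                # a is unaffected

print(a)                           # [ 0 99  2  3  4]
print(np.shares_memory(a, view))   # True
print(np.shares_memory(a, safe))   # False
```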
For a 2D array arr, what does arr[:, 1] select?
Idea
How can we rearrange data without changing values?
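One way to rearrange data without changing values is reshaping. The sketch below uses an illustrative array of 12 elements; the total number of elements stays the same in every layout.

```python
import numpy as np

a = np.arange(12)          # 12 values in one dimension

m = a.reshape(3, 4)        # 3 rows x 4 columns
m2 = a.reshape(2, -1)      # -1 lets NumPy infer the missing dimension
t = m.T                    # transpose: 4 rows x 3 columns
flat = m.ravel()           # back to one dimension

print(m.shape, m2.shape, t.shape)  # (3, 4) (2, 6) (4, 3)
print(np.array_equal(flat, a))     # True
```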
Can you reshape an array of size 12 into shape (3, 5)?
Operator | NumPy Function | Description |
---|---|---|
+ | np.add | Addition |
- | np.subtract | Subtraction |
* | np.multiply | Multiplication |
/ | np.divide | Division |
** | np.power | Exponentiation |
% | np.mod | Modulus |
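A brief sketch showing that each operator applies element-wise and gives the same result as its NumPy function (the input arrays are illustrative):

```python
import numpy as np

a = np.array([7, 8, 9])
b = np.array([2, 3, 4])

total = a + b        # element-wise addition
power = a ** b       # element-wise exponentiation
remainder = a % b    # element-wise modulus

print(total)      # [ 9 11 13]
print(power)      # [  49  512 6561]
print(remainder)  # [1 2 1]

# The operator and the function form are interchangeable.
same = np.array_equal(a / b, np.divide(a, b))
print(same)       # True
```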
What does np.array([1, 2, 3]) * 2 produce?
Idea
Aggregations are often used for summarizing data and doing some statistical analysis.
How
Specify the axis along which the aggregate is computed; without an axis argument, the aggregate is taken over all elements.
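The effect of the axis argument can be sketched on a small illustrative 2 × 3 array (the values are made up for the example):

```python
import numpy as np

# 2 rows x 3 columns.
data = np.array([[10, 20, 30],
                 [40, 50, 60]])

grand_total = data.sum()        # no axis: aggregate over every element
col_totals = data.sum(axis=0)   # collapse the rows: one value per column
row_totals = data.sum(axis=1)   # collapse the columns: one value per row
row_means = data.mean(axis=1)

print(grand_total)  # 210
print(col_totals)   # [50 70 90]
print(row_totals)   # [ 60 150]
print(row_means)    # [20. 50.]
```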
Claim Severity Analysis
Using a gamma distribution to model claim amounts (common in actuarial practice)
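A minimal sketch of such an analysis. The shape and scale parameters, the seed, and the number of simulated claims are hypothetical choices for illustration and do not come from the lecture.

```python
import numpy as np

# Hypothetical parameters: shape 2 and scale 1000 give a mean claim of 2000.
shape, scale = 2.0, 1000.0
rng = np.random.default_rng(42)   # arbitrary seed for reproducibility

# Simulate 10,000 claim amounts.
claims = rng.gamma(shape=shape, scale=scale, size=10_000)

mean_claim = claims.mean()
median_claim = np.median(claims)
p99 = np.percentile(claims, 99)   # a high quantile of the severity distribution

print(f"theoretical mean: {shape * scale:.0f}")
print(f"simulated mean:   {mean_claim:.0f}")
print(f"median:           {median_claim:.0f}")
print(f"99th percentile:  {p99:.0f}")
```

Because the gamma distribution is right-skewed, the simulated median should fall below the mean, and the 99th percentile well above both.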
For array [[1, 2], [3, 4]], what is arr.sum(axis=0)?
Broadcasting is NumPy’s powerful mechanism for performing operations on arrays of different shapes.
Intuition
Think of it as “stretching” smaller arrays to match larger ones, enabling element-wise operations without explicit copying.
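The "stretching" idea can be sketched with an illustrative matrix and row vector. The last part shows a pair of shapes that NumPy refuses to combine.

```python
import numpy as np

m = np.array([[1, 2, 3],
              [4, 5, 6]])          # shape (2, 3)
row = np.array([10, 20, 30])       # shape (3,)

# The row is applied to every row of m without being copied explicitly.
shifted = m + row

# A scalar is broadcast to every element.
doubled = m * 2

# Shapes (2, 3) and (2,) do not line up, so NumPy raises an error.
try:
    m + np.array([1, 2])
    compatible = True
except ValueError:
    compatible = False

print(shifted)     # rows become [11 22 33] and [14 25 36]
print(compatible)  # False
```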
Broadcasting Rules
What is np.newaxis?
Simply put, np.newaxis increases the number of dimensions of an existing array by one each time it is used. Thus, a 1D array becomes 2D, a 2D array becomes 3D, and so on.
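A short sketch of how the position of np.newaxis decides where the new dimension goes (the vector is illustrative):

```python
import numpy as np

v = np.array([1, 2, 3])        # shape (3,)

col = v[:, np.newaxis]         # new axis last: column vector
row = v[np.newaxis, :]         # new axis first: row vector
cube = col[:, :, np.newaxis]   # using it again adds one more dimension

print(v.shape)     # (3,)
print(col.shape)   # (3, 1)
print(row.shape)   # (1, 3)
print(cube.shape)  # (3, 1, 1)
```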
Can arrays with shapes (3, 1) and (1, 4) be broadcast together?
Idea
Create boolean arrays for filtering and analysis
Operator | NumPy Function | Description |
---|---|---|
& | np.logical_and | Element-wise AND |
\| | np.logical_or | Element-wise OR |
~ | np.logical_not | Element-wise NOT |
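A sketch of combining boolean masks, using made-up claim amounts. Each comparison is wrapped in parentheses because & and | bind more tightly than the comparison operators.

```python
import numpy as np

claims = np.array([500, 1500, 7000, 3000, 200])

small = claims < 1000          # boolean array, one entry per claim
large = claims > 5000

# Combine masks element-wise, then use the result to filter.
mid_range = claims[(claims >= 1000) & (claims <= 5000)]
extreme = claims[small | large]
not_small = claims[~small]

print(mid_range)  # [1500 3000]
print(extreme)    # [ 500 7000  200]
print(not_small)  # [1500 7000 3000]
```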
What does arr[arr > 5] return?
Use arrays of indices to select multiple elements
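A minimal sketch of index arrays in one and two dimensions (the arrays are illustrative):

```python
import numpy as np

arr = np.array([5, 15, 25, 35])

# Indices may come in any order and may repeat.
picked = arr[[3, 0, 0]]

m = np.arange(12).reshape(3, 4)

# A single index list selects whole rows.
rows = m[[0, 2]]

# Two index lists are paired up: elements (0, 1) and (2, 3).
pairs = m[[0, 2], [1, 3]]

print(picked)      # [35  5  5]
print(rows.shape)  # (2, 4)
print(pairs)       # [ 1 11]
```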
For array [10, 20, 30, 40, 50], what does arr[[0, 2, 4]] return?
Efficiency Tip
When you only need the k smallest/largest elements, use partition instead of full sorting. It’s O(n) vs O(n log n)!
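A sketch of this tip with np.partition, using illustrative data and k = 3. The selected elements are not guaranteed to come back in sorted order, so they are sorted here only for display.

```python
import numpy as np

x = np.array([7, 2, 9, 4, 1, 8])
k = 3

# After partitioning at index k - 1, the first k entries are the k smallest.
k_smallest = np.partition(x, k - 1)[:k]

# After partitioning at index -k, the last k entries are the k largest.
k_largest = np.partition(x, -k)[-k:]

print(np.sort(k_smallest))  # [1 2 4]
print(np.sort(k_largest))   # [7 8 9]
```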
Common problem: Data that contains missing values
Standard NumPy functions propagate NaN values, potentially ruining calculations. Use NaN-safe versions when dealing with missing data!
Standard Function | NaN-safe Version | Description |
---|---|---|
np.sum | np.nansum | Sum ignoring NaNs |
np.mean | np.nanmean | Mean ignoring NaNs |
np.std | np.nanstd | Std deviation ignoring NaNs |
np.var | np.nanvar | Variance ignoring NaNs |
np.min | np.nanmin | Minimum ignoring NaNs |
np.max | np.nanmax | Maximum ignoring NaNs |
np.argmin | np.nanargmin | Index of min ignoring NaNs |
np.argmax | np.nanargmax | Index of max ignoring NaNs |
np.percentile | np.nanpercentile | Percentile ignoring NaNs |
np.median | np.nanmedian | Median ignoring NaNs |
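A short sketch contrasting a standard function with its NaN-safe counterparts on illustrative data containing one missing value:

```python
import numpy as np

x = np.array([10.0, np.nan, 30.0, 50.0])

plain_mean = np.mean(x)        # NaN propagates through the calculation
safe_mean = np.nanmean(x)      # mean of the three valid values
safe_sum = np.nansum(x)
idx_max = np.nanargmax(x)      # position of the largest valid value
n_missing = np.isnan(x).sum()  # how many values are missing

print(plain_mean)  # nan
print(safe_mean)   # 30.0
print(safe_sum)    # 90.0
print(idx_max)     # 3
print(n_missing)   # 1
```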
What does np.nanmean([1, 2, np.nan, 4]) return?
NumPy arrays are the foundation of numerical computing in Python
Vectorization replaces explicit Python loops and can speed up calculations by 10-100x
Broadcasting enables flexible operations on different-shaped arrays
Boolean indexing provides powerful data filtering capabilities
Aggregations along axes enable sophisticated data analysis
Integration with pandas, matplotlib, and scikit-learn creates a complete workflow
One line takeaway
If you had to take away just one thing from this lecture, it would be: NumPy is not just about arrays, it’s about thinking in terms of vectorized operations and efficient numerical computing patterns that scale to real-world actuarial problems.
DSAS 2025 | HEC Lausanne