Exercise Set II - NumPy


Exercise 1

We start off by practicing array creation and some basic NumPy operations.

Part A: Basics

Create the following one-dimensional NumPy array of integers: array_1 = np.array([23, 12, 23, 6, 32, 78, 3, 8, 4, 223, 12, 56, 78, 324]). Write some Python code to answer the following questions:

  1. Get the \(4\) largest values of array_1.

  2. Compute array_1 to the power of \(2\) using \(3\) different methods.

  3. Given array_1, negate all elements which are between \(23\) and \(84\), in place.

  4. Find the indices of the elements of array_1 that are equal to \(12\).

  5. Convert the one-dimensional array array_1 to a two-dimensional array with \(7\) rows.

  • Use np.sort() with array slicing for finding largest values.

  • For powers, try element-wise operations (**), NumPy functions (np.power()), and custom loops.

  • Boolean indexing with & combines conditions for filtering. Use np.where() or np.nonzero() to find indices.

  • The .reshape() method changes array dimensions.
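The hints above can be tried out on a small array before tackling array_1. The following is a minimal sketch, assuming a made-up array named demo and made-up thresholds; it also assumes "between" means strictly between, which the exercise does not specify.

```python
import numpy as np

# Hypothetical toy data, deliberately different from array_1 in the exercise.
demo = np.array([5, 40, 12, 7, 40, 3, 28, 9])

# Largest k values: sort ascending, then slice the last k entries.
top_3 = np.sort(demo)[-3:]

# Three ways to square every element.
squared_op = demo ** 2                          # element-wise operator
squared_fn = np.power(demo, 2)                  # NumPy function
squared_loop = np.array([x * x for x in demo])  # explicit Python loop

# Negate values strictly between 10 and 30; work on a copy so demo stays intact.
negated = demo.copy()
mask = (negated > 10) & (negated < 30)
negated[mask] *= -1

# Indices of the elements equal to 40.
idx_40 = np.where(demo == 40)[0]

# Reshape the 8 elements into 4 rows; -1 lets NumPy infer the column count.
demo_2d = demo.reshape(4, -1)
```

Modifying negated[mask] changes the array in place, whereas an expression such as -negated would build a new array.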

Part B: Claims Example

Now, let’s create NumPy arrays to represent actuarial data and perform basic operations.

  1. Create a NumPy array representing annual claim amounts for a portfolio: [12500, 8900, 15600, 22000, 5400, 18900, 31200]

  2. Create a 2D array representing loss ratios for 4 policies over 3 years:

    Policy 1: [0.68, 0.72, 0.65]
    Policy 2: [0.89, 0.91, 0.85] 
    Policy 3: [0.45, 0.52, 0.48]
    Policy 4: [0.78, 0.83, 0.76]
  3. Use NumPy functions to create:

    • An array of 1000 zeros (for initializing claim counts)
    • An array of 500 ones multiplied by 1200 (base premiums)
    • An array of integers from 18 to 65 (insurable ages)
    • 10 evenly spaced discount factors between 1.0 and 0.5
  • Use np.array() to convert Python lists to NumPy arrays

  • For the 2D array, create a list of lists where each inner list represents one policy

  • For creating special arrays, look at: np.zeros(), np.ones(), np.arange(), np.linspace()

  • Remember that np.arange(start, stop) doesn’t include the stop value

  • You can multiply arrays by scalars: array * 1200
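As a quick illustration of the creation functions named in the hints, here is a minimal sketch. All sizes and values are made up for the example and are not the ones the exercise asks for.

```python
import numpy as np

# Hypothetical sizes and values, chosen only to illustrate the calls.
counts = np.zeros(5)                 # five zeros, float64 by default
premiums = np.ones(3) * 250          # three entries, each 250.0
years = np.arange(2020, 2025)        # 2020..2024; the stop value 2025 is excluded
weights = np.linspace(1.0, 0.0, 5)   # 5 evenly spaced values, both endpoints included

# A list of lists becomes a 2D array: one inner list per row.
grid = np.array([[1, 2, 3], [4, 5, 6]])
```

Note the difference between the two range helpers: np.arange takes a step and excludes the stop value, while np.linspace takes a number of points and includes both endpoints by default.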

Part C: Array Attributes and info

Using the arrays created above:

  1. Print the shape, number of dimensions, and total number of elements for the loss ratios array

  2. Calculate the memory usage of the claim amounts array

  3. Check and explain the data types of all arrays

  • Array attributes you need: .shape, .ndim, .size, .dtype

  • For memory usage: .itemsize (bytes per element), .nbytes (total bytes)

  • Data types depend on the input: integers typically become int64 (the default integer type can differ by platform and NumPy version), decimals become float64

  • Arrays created with np.zeros() default to float64, np.ones() also defaults to float64
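The attributes listed above can be inspected as follows. This is a minimal sketch on a made-up 3×2 array named ratios, not the loss ratios array from Part B.

```python
import numpy as np

# Hypothetical array used only to show the attributes.
ratios = np.array([[0.5, 0.6], [0.7, 0.8], [0.9, 1.0]])

print(ratios.shape)     # (3, 2): 3 rows, 2 columns
print(ratios.ndim)      # 2
print(ratios.size)      # 6
print(ratios.dtype)     # float64
print(ratios.itemsize)  # 8 bytes per float64 element
print(ratios.nbytes)    # 48 = size * itemsize
```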

Exercise 2

In this exercise, we’ll practice indexing, slicing, and boolean operations.

Part A: Functions

Write a Python function to compute averages using a sliding window over an array. That is, your function will take two arguments (or inputs): (i) an array and (ii) a sliding window size. For example, if you provide your function with the array array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9]) and the sliding window size \(2\) as inputs, then your function should return as output the array array([0.5, 1.5, 2.5, 3.5, 4.5, 5.5, 6.5, 7.5, 8.5]).
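One possible approach, among several, is to convolve the array with a kernel of equal weights. The sketch below assumes a function name, sliding_average, and an input check that are not part of the exercise statement.

```python
import numpy as np

def sliding_average(values, window):
    """Return the mean of each run of `window` consecutive elements."""
    values = np.asarray(values, dtype=float)
    if window < 1 or window > values.size:
        raise ValueError("window must be between 1 and len(values)")
    # Convolving with equal weights 1/window averages each window;
    # mode="valid" keeps only positions where the window fits entirely.
    kernel = np.ones(window) / window
    return np.convolve(values, kernel, mode="valid")

result = sliding_average(np.arange(10), 2)
```

For an input of length n and window size w, the output has n - w + 1 elements, which matches the nine values in the example above. A loop over slices that calls np.mean on each one would work as well and may be easier to read.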

Now, let’s practice accessing and filtering actuarial data using various indexing techniques.

Part B: Indexing and Slicing

Using the loss ratios array from Exercise 1 parts B and C:

  1. Access the loss ratio for Policy 2 in Year 3

  2. Get all loss ratios for Policy 1 across all years

  3. Get loss ratios for all policies in Year 2

  4. Extract a subset containing the first 2 policies and first 2 years

  5. Get every other policy (1st, 3rd, etc.)

  • NumPy indexing: array[row_index, column_index]

  • Use : to select all elements along an axis: array[:, 1] gets all rows, column 1

  • Slicing syntax: start:stop:step

  • For subsets: array[0:2, 0:2] gets first 2 rows and first 2 columns

  • Step slicing: array[::2, :] gets every other row (step=2)
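The indexing patterns in the hints can be checked on a small table first. This is a minimal sketch on a made-up 3×4 array named table. Keep in mind that indices start at zero, so "item 2, period 3" sits at table[1, 2].

```python
import numpy as np

# Hypothetical 3x4 table (rows = items, columns = periods), not the exercise data.
table = np.arange(12).reshape(3, 4)
# [[ 0  1  2  3]
#  [ 4  5  6  7]
#  [ 8  9 10 11]]

single = table[1, 2]         # row index 1, column index 2
first_row = table[0, :]      # all columns of row 0
second_col = table[:, 1]     # all rows of column 1
block = table[0:2, 0:2]      # first 2 rows and first 2 columns
every_other = table[::2, :]  # rows 0 and 2
```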

Part C: Booleans

  1. Find all loss ratios greater than 0.80

  2. Identify which policies have any year with loss ratio > 0.80

  3. Count how many loss ratios are between 0.60 and 0.80

  4. Create a “high risk” mask for policies with average loss ratio > 0.75

  • Boolean operations: >, <, >=, <=, ==

  • Combine conditions with & (and) and | (or): (condition1) & (condition2)

  • Use parentheses around each condition when combining

  • np.any(array, axis=1) checks if any element is True along rows

  • np.where(condition) returns indices where condition is True

  • np.sum(boolean_array) counts True values (True=1, False=0)

  • Average calculation: \(\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i\)

  • np.mean(array, axis=1) calculates average along rows (for each policy)

  • Risk assessment: Policies with high loss ratios indicate higher risk
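The boolean techniques above combine as in the following minimal sketch. The array ratios and all thresholds are made up for illustration and differ from the exercise values.

```python
import numpy as np

# Hypothetical ratios (rows = policies, columns = years), not the exercise data.
ratios = np.array([[0.50, 0.95],
                   [0.70, 0.72],
                   [0.90, 0.88]])

high = ratios[ratios > 0.85]                     # values above the threshold
rows_with_high = np.any(ratios > 0.85, axis=1)   # one True/False flag per row
row_idx = np.where(rows_with_high)[0]            # indices of the flagged rows
in_band = np.sum((ratios >= 0.60) & (ratios <= 0.80))  # count inside a band
risky = np.mean(ratios, axis=1) > 0.75           # mask built from row averages
```

The parentheses around each comparison matter because & binds more tightly than >= and <=.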

Exercise 3

Now, let’s apply NumPy’s mathematical functions to actuarial calculations.

Part A: Premium Calculations

  1. Calculate risk-adjusted premiums for 5 policies with base premium $1000 and risk multipliers: [0.8, 1.2, 1.5, 0.9, 1.8]

  2. Compute present values for the following cashflows using 4% annual discount rate:

    • $5000 in year 1
    • $7500 in year 2
    • $10000 in year 3
  3. Calculate monthly compound interest for initial amount $100,000 at 6% annual rate over 5 years

  • Vectorized multiplication: array1 * array2 multiplies element by element

  • Array arithmetic: array + value, array - value

  • Present value formula: \(PV = \frac{FV}{(1 + r)^n}\) or \(PV = FV \times (1 + r)^{-n}\)

    • Where: \(FV\) = future value, \(r\) = discount rate, \(n\) = number of periods
  • Use np.array() to create arrays of cashflows and years

  • Compound interest formula: \(A = P(1 + r)^n\)

    • Where: \(A\) = final amount, \(P\) = principal, \(r\) = interest rate, \(n\) = periods
  • For monthly compounding: monthly rate = annual rate / 12, periods = years × 12

  • np.arange(1, months+1) creates array [1, 2, 3, …, months]
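The formulas above translate almost directly into vectorized code. The following is a minimal sketch with made-up amounts, rates, and terms, so the numbers are not the answers to the exercise.

```python
import numpy as np

# Hypothetical inputs for illustration only.
base = 500.0
multipliers = np.array([0.9, 1.0, 1.3])
adjusted = base * multipliers                 # the scalar is applied to every element

# Present value: PV = FV * (1 + r) ** (-n), evaluated for all cashflows at once.
cashflows = np.array([1000.0, 2000.0])
years = np.array([1, 2])
rate = 0.05
present_values = cashflows * (1 + rate) ** (-years)
total_pv = present_values.sum()

# Monthly compounding: monthly rate = annual rate / 12, periods = years * 12.
principal = 1000.0
annual_rate = 0.12
n_years = 1
months = np.arange(1, n_years * 12 + 1)       # [1, 2, ..., 12]
balances = principal * (1 + annual_rate / 12) ** months
```

Here balances holds the account value at the end of each month, so its last entry is the final amount.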

Part B: Statistical Analysis

Using the claim amounts from Exercise 1:

  1. Calculate mean, median, standard deviation, minimum, and maximum

  2. Find the 25th, 50th, and 75th percentiles

  3. Calculate the coefficient of variation (std/mean)

  4. Identify claims that are more than 1 standard deviation above the mean

  • Basic statistics: np.mean(), np.median(), np.std(), np.min(), np.max()

  • Percentiles: np.percentile(array, [25, 50, 75]) gives 25th, 50th, 75th percentiles

  • Coefficient of variation: \(CV = \frac{\sigma}{\mu}\) (standard deviation / mean)

    • Measures relative variability; useful for comparing risk across different claim sizes
  • Outlier detection: Values beyond \(\mu \pm k\sigma\) (often \(k = 1\), \(2\), or \(3\))

    • For this exercise: threshold = \(\mu + \sigma\) (mean + 1 standard deviation)
  • len(array) gives the number of elements

  • Boolean indexing: array[array > threshold] selects elements meeting condition
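The statistics in the hints can be computed as in this minimal sketch, which uses a made-up sample named x instead of the claim amounts. Note that np.std divides by n by default (ddof=0); pass ddof=1 if you want the sample standard deviation.

```python
import numpy as np

# Hypothetical sample, not the claim amounts from Exercise 1.
x = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])

mean = np.mean(x)          # 5.0
median = np.median(x)      # 4.5
std = np.std(x)            # 2.0 (population standard deviation, ddof=0)
q25, q50, q75 = np.percentile(x, [25, 50, 75])
cv = std / mean            # 0.4
threshold = mean + std     # 7.0
above = x[x > threshold]   # only values strictly greater than the threshold
```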

Exercise 4

In this part, we’ll focus on a practical actuarial application by creating a simple life insurance premium calculation system.

You need to calculate annual premiums for term life insurance policies. Given:

  • Face amounts: [$100,000, $250,000, $500,000, $1,000,000]

  • Ages: [25, 35, 45, 55]

  • Base mortality rates: [0.001, 0.002, 0.005, 0.012] (annual probability of death)

  • Interest rate: 3% annually

  • Policy term: 20 years

  • Loading factor: 20% (to cover expenses and profit)

Now, given the information above, do the following:

  1. Create arrays for all the given data

  2. Calculate the expected present value of death benefits for each policy

  3. Calculate the expected present value of a $1 annuity for 20 years at each age

  4. Calculate the net annual premium (expected PV of benefits / expected PV of annuity)

  5. Calculate the gross annual premium (net premium × (1 + loading factor))

  6. Create a summary table showing age, face amount, net premium, and gross premium

  • Start by creating arrays for all input data using np.array()

  • Expected Present Value of Death Benefits: \(EPV_{benefits} = \sum_{t=1}^{n} {}_{t-1}p_x \cdot q_{x+t-1} \cdot v^{t-0.5}\)

    • Where: \({}_{t-1}p_x\) = survival probability, \(q_{x+t-1}\) = death probability, \(v^{t-0.5}\) = discount factor
  • Expected Present Value of Annuity: \(EPV_{annuity} = \sum_{t=1}^{n} {}_{t-1}p_x \cdot v^{t-1}\)

  • Net Premium Formula: \(P_{net} = \frac{\text{Face Amount} \times EPV_{benefits}}{EPV_{annuity}}\)

  • Gross Premium: \(P_{gross} = P_{net} \times (1 + loading)\)

  • Survival probability: \({}_{t-1}p_x = (1-q)^{t-1}\)

  • Death probability in year t: \({}_{t-1}p_x \times q\)

  • Present value discounting: \(v^t = (1 + i)^{-t}\) where \(i\) is interest rate

  • Use np.sum() to add up present values across all years

  • Broadcasting: face_amounts[:, np.newaxis] creates column vector for matrix operations
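The formulas in the hints can be evaluated for all policies at once with broadcasting. The following is a minimal sketch under stated assumptions: the inputs are made up (two policies, a 3-year term), the death probability q is constant over the term as in the survival-probability hint, and all variable names are chosen for this example.

```python
import numpy as np

# Hypothetical inputs (two policies, 3-year term), not the exercise data.
face = np.array([1000.0, 2000.0])
q = np.array([0.01, 0.02])    # constant annual death probability per policy
i = 0.05                      # annual interest rate
n = 3                         # policy term in years
loading = 0.10

v = 1 / (1 + i)
t = np.arange(1, n + 1)                     # years 1..n, shape (n,)
surv = (1 - q[:, np.newaxis]) ** (t - 1)    # survival to start of year t, shape (policies, n)

# Benefits are discounted to mid-year (t - 0.5); annuity payments fall at the start of each year.
epv_benefit = np.sum(surv * q[:, np.newaxis] * v ** (t - 0.5), axis=1)
epv_annuity = np.sum(surv * v ** (t - 1), axis=1)

net = face * epv_benefit / epv_annuity
gross = net * (1 + loading)
```

The column vector q[:, np.newaxis] has shape (policies, 1) and t has shape (n,), so each product has shape (policies, n) and summing over axis=1 gives one value per policy.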

Exercise 5

In this part, we’ll focus on some advanced array operations, namely reshaping, broadcasting, and advanced manipulations.

Part A: Portfolio Analysis

You have quarterly premium data for 3 product lines over 2 years (8 quarters total):

Product A: [100, 120, 110, 130, 125, 140, 135, 150] (in thousands)
Product B: [200, 180, 220, 210, 190, 230, 225, 240]  
Product C: [50, 60, 55, 65, 58, 70, 68, 75]
  1. Create a 2D array with products as rows and quarters as columns

  2. Reshape the data to show annual totals (2 years × 3 products)

  3. Calculate quarterly growth rates for each product

  4. Find the best and worst performing quarters for each product

  • Create 2D array with np.array([product_a, product_b, product_c])

  • For annual totals: use .reshape(3, 2, 4) to get (products, years, quarters)

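  • Then call .sum(axis=2) on the reshaped array, or np.sum(reshaped, axis=2), to sum over quarters; the result has shape (3, 2), products × years, so transpose it with .T if you want years as rows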

  • Growth rate formula: \(Growth\,Rate = \frac{New\,Value - Old\,Value}{Old\,Value} \times 100\%\)

  • Year-over-year growth: \(YoY = \frac{Year_2 - Year_1}{Year_1} \times 100\%\)

  • For quarterly growth: compare each quarter to the previous one

  • np.argmax() and np.argmin() find indices of max/min values

  • Convert quarter index to readable format: Q{(index%4)+1}Y{(index//4)+1}
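The reshaping and growth-rate hints can be sketched on a smaller made-up data set. The array sales and its values are assumptions for this example, and "best" and "worst" are read here as the largest and smallest quarterly values, which is one possible interpretation.

```python
import numpy as np

# Hypothetical data: 2 products x 8 quarters (2 years), not the exercise data.
sales = np.array([[10, 12, 11, 13, 14, 15, 16, 20],
                  [20, 18, 22, 20, 25, 24, 26, 30]], dtype=float)

# Split the quarter axis into (years, quarters) and sum over the quarters.
annual = sales.reshape(2, 2, 4).sum(axis=2)   # shape (products, years)
annual_by_year = annual.T                     # shape (years, products)

# Quarter-over-quarter growth in percent; one fewer column than sales.
growth = (sales[:, 1:] - sales[:, :-1]) / sales[:, :-1] * 100

best = np.argmax(sales, axis=1)    # column index of each product's largest value
worst = np.argmin(sales, axis=1)   # column index of each product's smallest value
labels = [f"Q{k % 4 + 1}Y{k // 4 + 1}" for k in best]
```

If you rank quarters by growth instead, remember that column j of growth compares quarter j + 1 with quarter j, so the index must be shifted by one before converting it to a label.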

Part B: Broadcasting

Apply broadcasting to calculate insurance premiums across multiple dimensions:

  • 3 age groups: [25, 40, 55]

  • 4 coverage amounts: [$100k, $250k, $500k, $1M]

  • Base rate: $2 per $1000 of coverage

  • Age multipliers: [0.8, 1.0, 1.5]

Calculate the premium matrix showing all combinations.

  • Premium calculation formula: \(Premium = \frac{Coverage}{1000} \times Base\,Rate \times Age\,Multiplier\)

  • Broadcasting principle: Arrays with compatible shapes can be operated on together

    • Compatible shapes: (3,1) and (1,4) → result (3,4)
  • Reshape arrays: age_multipliers.reshape(-1, 1) makes column vector, coverage.reshape(1, -1) makes row vector

  • Base premium calculation: (coverage / 1000) * base_rate

  • Broadcasting automatically expands arrays to compatible shapes

  • Final operation: base_premiums * age_multipliers_column

  • Result will be 3×4 matrix (3 ages × 4 coverage amounts)

  • Use nested loops and f-string formatting for nice table output
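The broadcasting steps above can be sketched as follows. The inputs are made up (three multipliers, two coverage amounts, a base rate of 3 per 1000), and the variable names are assumptions for this example.

```python
import numpy as np

# Hypothetical inputs, not the exercise data.
coverage = np.array([10_000, 50_000])       # 2 coverage amounts
multipliers = np.array([0.5, 1.0, 2.0])     # 3 groups
base_rate = 3.0                             # per 1000 of coverage

base_premiums = (coverage / 1000 * base_rate).reshape(1, -1)  # row vector, shape (1, 2)
mult_column = multipliers.reshape(-1, 1)                      # column vector, shape (3, 1)
premiums = mult_column * base_premiums                        # broadcast to shape (3, 2)

# Print one row per multiplier with aligned columns.
for m, row in zip(multipliers, premiums):
    print(f"x{m:<4}" + "".join(f"{p:>10.2f}" for p in row))
```

No explicit loop is needed for the calculation itself: the (3, 1) and (1, 2) shapes are stretched to (3, 2) automatically, and the loop is only used for display.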

Summary

These exercises covered the essential NumPy skills for actuarial science:

  1. Array Creation: Using appropriate data structures for actuarial data
  2. Indexing & Slicing: Accessing specific policies, years, or conditions
  3. Boolean Operations: Filtering high-risk policies and claims
  4. Mathematical Operations: Premium calculations and present value computations
  5. Statistical Analysis: Risk assessment and portfolio analysis
  6. Advanced Operations: Reshaping data and using broadcasting for multi-dimensional calculations