Course Overview

Welcome!

Welcome to your journey into data science for actuarial applications! This course will equip you with essential programming and data analysis skills that are fundamental to modern actuarial practice. Whether you’re calculating reserves, analyzing mortality patterns, or building pricing models, the tools and techniques you’ll learn here form the foundation of data-driven decision making in insurance and risk management.

Objectives and Timeline

This course is designed around three core principles:

  1. Hands-on Learning: You’ll write code from day one. Theory and practice go hand in hand.
  2. Real-world Relevance: Every concept is illustrated with actuarial examples.
  3. Progressive Complexity: Starting with basics, we gradually build complete data science workflows.
Important: Bring Your Laptop!

This is a hands-on course centered around coding. You’ll need your laptop in every session to follow along with live coding demonstrations and work on exercises.

The course runs for six weeks. Each Tuesday session (10:15-14:00) has the following structure:

  • 50% Interactive Lectures: Live coding demonstrations where we explore concepts together
  • 50% Practical Lab Exercises: Hands-on problem solving based on the lectures

The exact balance will vary by topic. Some concepts need more explanation, others more practice.

Topics

```{mermaid}
flowchart LR
    %% Main linear flow
    Acquire --> Tidy
    Tidy --> Transform

    %% Circular connections between Transform, Visualize, and Model
    Transform <--> Visualize
    Visualize <--> Model
    Model <--> Transform

    %% Final connections to Communicate
    %% Visualize --> Communicate
    Model --> Communicate

    %% Styling for clean appearance
    classDef default fill:#ffffff,stroke:#000000,stroke-width:2px,color:#000000,font-family:Arial,font-size:14px

    class Acquire,Tidy,Transform,Visualize,Model,Communicate default
```
Figure 1: Data Science Workflow

Each chapter can be attributed to one or more of the workflow steps shown above (Figure 1).

Note

Modeling is only covered in “Introduction to DS II”.

Tip: Data Science Career Paths

The field of data science offers various specializations, each focusing on different stages of the data science workflow (Figure 1). Here’s a brief overview and how this course helps prepare you:

  • Data Engineer: Designs and builds robust systems for collecting, storing, and processing data at scale. This role is crucial for the Acquire and initial Tidy and Transform steps, ensuring data is ready for analysis. Our course focuses more on analysis and the science of working with data.
  • Data Analyst: Primarily focuses on interpreting data to identify trends and draw conclusions from existing data. This role heavily emphasizes Transform (smaller extent also Tidy), Visualize, and Communicate steps. This course provides strong foundations in data manipulation with Pandas and visualization, which are core skills for data analysts.
  • Machine Learning Engineer/Scientist: Specializes in designing, building, and deploying AI/ML models. This role focuses heavily on Transform (feature engineering), Model, and deploying solutions that often integrate into communication channels. This course lays the groundwork in Python and data preparation, which are prerequisites for developing advanced ML solutions.
  • Data Scientist: Builds and implements statistical models and machine learning algorithms. Data scientists are involved across the entire workflow and are considered “jacks of all trades”. This course provides foundational programming and data processing skills essential for advanced modeling, with further depth explored in “Introduction to DS II”.

Here’s a more detailed breakdown of the topics (a short code sketch follows the table):

| Topic | What You’ll Learn |
|-------|-------------------|
| Python Fundamentals | Variables, data types, and control structures; functions and error handling; object-oriented concepts for organizing code |
| Numerical Computing with NumPy | Array operations (slicing, indexing, etc.); vectorized operations and functions; actuarial examples with NumPy |
| Data Manipulation with Pandas | DataFrames and wrangling tables; grouping and aggregation; handling time series data |
| Data Visualization | Statistical plots with Matplotlib; visualizations with Seaborn; univariate and multivariate plots |
| Complete Data Science Pipeline | Cookbook and steps to follow; health insurance example; principal component analysis for interest rates |
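To give a flavor of these topics in practice, here is a minimal sketch in the vectorized NumPy style taught in the course. The cash-flow amounts and the 3% interest rate are invented purely for illustration:

```python
# A minimal sketch (invented numbers) of vectorized NumPy code:
# the present value of a stream of annual claim payments,
# computed without an explicit loop.
import numpy as np

payments = np.array([1200.0, 1100.0, 950.0, 800.0])  # hypothetical cash flows
years = np.arange(1, len(payments) + 1)               # payment times 1, 2, 3, 4
v = 1 / 1.03                                          # discount factor at 3% interest

present_value = np.sum(payments * v**years)           # elementwise, then sum
print(f"Present value: {present_value:.2f}")
```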

Learning Outcomes

By the end of this course, you will have acquired both technical skills:

  • Write Python programs, with a focus on solving actuarial problems
  • Manipulate and analyze datasets effectively and efficiently
  • Create professional visualizations for technical and non-technical audiences
  • Build reproducible data analysis pipelines

and applied competencies, allowing you to perform tasks such as the following (a short example is sketched after the list):

  • Performing exploratory data analysis on insurance portfolios
  • Automating repetitive tasks in pricing and valuation
  • Calculating actuarial metrics (loss ratios, claim frequencies, reserves)
  • Preparing data for advanced modeling (covered in DS II)
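As a small, hedged example of the third item above, here is what a loss-ratio calculation can look like with Pandas. The portfolio data and column names are made up for illustration; real portfolio data will look different:

```python
# A sketch of computing loss ratios per line of business with Pandas.
# The DataFrame and its column names are invented; real data will differ.
import pandas as pd

portfolio = pd.DataFrame({
    "line":    ["motor", "motor", "property", "property"],
    "premium": [5000.0, 7000.0, 10000.0, 8000.0],
    "claims":  [3500.0, 4200.0, 9000.0, 2400.0],
})

# Aggregate premiums and claims per line, then take the ratio.
totals = portfolio.groupby("line")[["premium", "claims"]].sum()
totals["loss_ratio"] = totals["claims"] / totals["premium"]
print(totals)
```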

These are just some examples. You’re only bound by your imagination!

Assessment

Your learning will be evaluated through a practical project that demonstrates your ability to apply course concepts to a real actuarial problem.

Project Components

| Component | Weight | Description |
|-----------|--------|-------------|
| Proposal | 20% | Define problem and approach |
| Report | 40% | Complete analysis with code |
| Presentation | 40% | Communicate findings effectively |

You will work in groups of 3 (possibly 4, depending on the number of participants). It’s up to you to find your group members, but there is a Moodle forum to facilitate the process.

Tip: Project Ideas

Start thinking about actuarial problems you’d like to explore:

  • Mortality or morbidity analysis
  • Claims frequency/severity modeling
  • Portfolio risk assessment
  • Pricing algorithm development
  • Regulatory reporting automation

Some sample projects will be shared later during the course.

Important Dates

  • October 9: Project proposal deadline
  • November 12: Project report deadline
  • November 14: Slides submission deadline
  • November 18: Project presentations
Important

Note that all submissions are due by 23:59 (CEST) on the respective day.

Tech Stack

The term “tech stack” refers to the collection of technologies used to build a product or service. Here, it is the set of tools we use throughout this course.

Here’s an overall picture of how all the tools fit together in a data science workflow (a quick environment check follows the table):

| Tool | Purpose | When You’ll Use It | Why It’s Important |
|------|---------|--------------------|--------------------|
| Python | Programming language | Writing all code and analyses | The foundation; everything else builds on this |
| Anaconda | Package distribution | Initial setup only | Bundles Python with essential libraries |
| Jupyter Notebooks | Interactive coding | Exploratory analysis, exercises | See results immediately, mix code with notes |
| VS Code | Code editor | Writing scripts, larger projects | Professional development environment |
| Git/GitHub | Version control | Saving work, collaboration | Track changes, build portfolio |
| Terminal | Command interface | Running scripts, managing packages | Direct computer control |
| Quarto | Document creation | Reports, presentations | Combine code, results, and narrative |
| Libraries | Open-source code | Data manipulation, visualization | Pre-built functions for complex tasks |
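Once your environment is set up (covered in Exercise Set 0), a quick sanity check like the one below, run in a Jupyter cell or from a script, confirms that the core libraries of this stack are installed. The printed versions will vary by machine:

```python
# Environment sanity check: import the core libraries of the course stack
# and print their versions. A failing import means the package is missing
# from your Anaconda environment.
import sys
import numpy as np
import pandas as pd
import matplotlib

print("Python    :", sys.version.split()[0])
print("NumPy     :", np.__version__)
print("Pandas    :", pd.__version__)
print("Matplotlib:", matplotlib.__version__)
```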
Note: The Big Picture

Think of these tools as your actuarial toolkit:

  • Python is your calculator, with libraries for re-using existing code
  • Jupyter is your scratch pad (quick and dirty analysis)
  • VS Code is your professional workspace (what we refer to as production code)
  • Git is your filing system (think OneDrive/iCloud)
  • Quarto is your reporting tool (PDF, Word documents, and even websites from a single file)

Website Tools

Throughout the course, you’ll encounter several interactive elements, found at the beginning of each document:

  • 🗄️ Data downloads the datasets to run the examples outside of the DSAS website
  • Jupyter downloads Jupyter notebook files to work locally
  • Google Colab is a cloud-based alternative to local Jupyter notebooks
  • 💡 Hints provide guidance when you’re stuck
  • ℹ️ Solutions let you check your work after attempting exercises. These are locked until the end of the day (00:00 CEST)

Live Code Environment

This website features interactive code blocks that run directly in your browser:

Warning: Browser Code is Temporary!

Code you run in these interactive blocks is not saved. The environment resets when you refresh the page. For permanent work, download the Jupyter notebooks or use Google Colab.

Note: Learning to Code

Learning to program is like learning a new language: it takes practice and patience. Some advice:

  • Errors are normal: Even experienced programmers encounter errors daily
  • Search, search and search: If you run into an issue, most likely you’re not the only one
  • Type everything: Don’t copy-paste during learning; muscle memory helps
  • Experiment freely: Try changing code to see what happens
  • Collaborate: Discuss approaches with classmates (but write your own code)
Tip: If at some point you feel overwhelmed, that is completely normal!

In technical fields there is always a lot to learn: there are many tools, and they are constantly changing and evolving. With that in mind, these are only “tools” that this course introduces to you; what you choose to take away for your academic and professional career is up to you. The key takeaway is doing “correct” data science: building complete (or nearly complete) DS workflows and using Python (as one of many ways) to implement and execute your DS tasks.

Note: On AI and Learning

While AI tools like ChatGPT can write code, you are strongly encouraged to avoid them during the learning phase. Understanding the fundamentals deeply will make you a much stronger data scientist. You’re welcome to use AI tools for your final project, but master the basics yourself first.

Getting Started

Ready to begin? Your first tasks:

  1. Set up your environment (covered in Exercise Set 0)
  2. Join the course resources:
    • Sign up on Moodle here
    • Access course materials on this website
    • Download Anaconda Python distribution
    • Create a GitHub account for version control
  3. Prepare for interactive learning:
    • Bring a fully charged laptop to every session
    • Be ready to code along during demonstrations
    • Don’t hesitate to ask questions!

In our next session, we’ll dive into Python fundamentals. Before then:

  • Complete Exercise Set 0 to set up your development environment
  • Explore the course website and familiarize yourself with the interface
  • Think about actuarial problems you’d like to solve with data science

Welcome aboard! Let’s begin your data science journey.