```mermaid
flowchart LR
    %% Main linear flow
    Acquire --> Tidy
    Tidy --> Transform
    %% Circular connections between Transform, Visualize, and Model
    Transform <--> Visualize
    Visualize <--> Model
    Model <--> Transform
    %% Final connections to Communicate
    %% Visualize --> Communicate
    Model --> Communicate
    %% Styling for clean appearance
    classDef default fill:#ffffff,stroke:#000000,stroke-width:2px,color:#000000,font-family:Arial,font-size:14px
    class Acquire,Tidy,Transform,Visualize,Model,Communicate default
```

Figure 1: The data science workflow
Course Overview
Welcome!
Welcome to your journey into data science for actuarial applications! This course will equip you with essential programming and data analysis skills that are fundamental to modern actuarial practice. Whether you’re calculating reserves, analyzing mortality patterns, or building pricing models, the tools and techniques you’ll learn here form the foundation of data-driven decision making in insurance and risk management.
Objectives and Timeline
This course is designed around three core principles:
- Hands-on Learning: You’ll write code from day one. Theory and practice go hand in hand.
- Real-world Relevance: Every concept is illustrated with actuarial examples.
- Progressive Complexity: Starting with basics, we gradually build complete data science workflows.
This is a hands-on course centered around coding. You’ll need your laptop in every session to follow along with live coding demonstrations and work on exercises.
The course runs for six weeks. Each Tuesday session (10:15-14:00) follows this structure:
- 50% Interactive Lectures: Live coding demonstrations where we explore concepts together
- 50% Practical Lab Exercises: Hands-on problem solving based on the lectures
The exact balance will vary by topic. Some concepts need more explanation, others more practice.
Topics
Each chapter covers one of the following topics:
Introduction to Python: Python Fundamentals that are crucial for the entire workflow
Introduction to NumPy: Numerical Computing with NumPy (Tidy + Transform)
Introduction to Pandas: Data Manipulation with Pandas (Tidy + Transform)
Introduction to Plotting: Visualization (univariate and multivariate plots)
Data Science pipeline: Complete Data Science Pipeline
Exercise Set 0 - Setup: Communication via Quarto
Modeling is only covered in Introduction to DS II
The field of data science offers various specializations, each focusing on different stages of the data science workflow (Figure 1). Here’s a brief overview and how this course helps prepare you:
- Data Engineer: Designs and builds robust systems for collecting, storing, and processing data at scale. This role is crucial for the Acquire step and the initial Tidy and Transform steps, ensuring data is ready for analysis. Our course focuses more on analysis and the science of working with data.
- Data Analyst: Primarily focuses on interpreting data to identify trends and draw conclusions from existing data. This role heavily emphasizes the Transform (and, to a smaller extent, Tidy), Visualize, and Communicate steps. This course provides strong foundations in data manipulation with Pandas and visualization, which are core skills for data analysts.
- Machine Learning Engineer/Scientist: Specializes in designing, building, and deploying AI/ML models. This role focuses heavily on Transform (feature engineering), Model, and deploying solutions that often integrate into communication channels. This course lays the groundwork in Python and data preparation, which are prerequisites for developing advanced ML solutions.
- Data Scientist: Builds and implements statistical models and machine learning algorithms. Data scientists are involved across the entire workflow and are considered more of a “jack of all trades”. This course provides foundational programming and data processing skills essential for advanced modeling, with further depth explored in “Introduction to DS II”.
Here’s a more detailed breakdown of the topics:
| Topic | What You’ll Learn |
|---|---|
| Python Fundamentals | • Variables, data types, and control structures • Functions and error handling • Object-oriented concepts for organizing code |
| Numerical Computing with NumPy | • Array operations (slicing, indexing, etc.) • Vectorized operations and functions • Actuarial examples with NumPy |
| Data Manipulation with Pandas | • DataFrames and wrangling tables • Grouping and aggregation • Handling time series data |
| Data Visualization | • Statistical plots with Matplotlib • Visualizations with Seaborn • Univariate and multivariate plots |
| Complete Data Science Pipeline | • Cookbook and steps to follow • Health insurance example • Principal component analysis for interest rates |
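To give a flavour of what these chapters build toward, here is a minimal, illustrative sketch of the kind of vectorized NumPy calculation the course works through. All figures (the interest rate and survival probabilities) are made up for this example:

```python
# Illustrative sketch only: expected present value of a 5-year annuity-due
# paying 1 per year while alive, using made-up survival probabilities.
import numpy as np

i = 0.03                                             # flat annual interest rate (assumed)
px = np.array([0.99, 0.98, 0.97, 0.96, 0.95])        # one-year survival probabilities (made up)

tpx = np.concatenate(([1.0], np.cumprod(px)[:-1]))   # probability of being alive at t = 0, ..., 4
v = (1 + i) ** -np.arange(5)                         # discount factors for t = 0, ..., 4
epv = np.sum(tpx * v)                                # expected present value of the payments

print(f"EPV of a 5-year annuity-due: {epv:.4f}")
```

Written with an explicit loop this would be longer and slower; vectorized thinking of this kind is a recurring theme of the NumPy chapter.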
Learning Outcomes
By the end of this course, you will have acquired technical skills enabling you to:
- Write Python programs, with a focus on solving actuarial problems
- Manipulate and analyze datasets effectively and efficiently
- Create professional visualizations for technical and non-technical audiences
- Build reproducible data analysis pipelines
As well as applied competencies that allow you to perform tasks such as:
- Performing exploratory data analysis on insurance portfolios
- Automating repetitive tasks in pricing and valuation
- Calculating actuarial metrics (loss ratios, claim frequencies, reserves)
- Preparing data for advanced modeling (covered in DS II)
These are just some examples (one is sketched below); you're only bound by your imagination!
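As a small illustration, here is a sketch of computing a loss ratio and a claim frequency from a made-up portfolio table; the figures and column names are invented for this example:

```python
# Illustrative sketch only: simple portfolio metrics from a made-up claims table.
import pandas as pd

portfolio = pd.DataFrame({
    "earned_premium": [250_000.0, 180_000.0, 320_000.0],   # per line of business (invented)
    "incurred_claims": [140_000.0, 95_000.0, 260_000.0],
    "policy_years": [1_000.0, 750.0, 1_200.0],
    "claim_count": [85, 60, 140],
})

loss_ratio = portfolio["incurred_claims"].sum() / portfolio["earned_premium"].sum()
claim_frequency = portfolio["claim_count"].sum() / portfolio["policy_years"].sum()

print(f"Portfolio loss ratio: {loss_ratio:.1%}")
print(f"Claim frequency (per policy-year): {claim_frequency:.3f}")
```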
Assessment
Your learning will be evaluated through a practical project that demonstrates your ability to apply course concepts to a real actuarial problem.
Project Components
| Component | Weight | Description |
|---|---|---|
| Proposal | 20% | Define problem and approach |
| Report | 40% | Complete analysis with code |
| Presentation | 40% | Communicate findings effectively |
You will work in groups of 3 (maybe 4, depending on the number of participants). It's up to you to find your group members, but there is a Moodle forum to facilitate the process.
Start thinking about actuarial problems you’d like to explore:
- Mortality or morbidity analysis
- Claims frequency/severity modeling
- Portfolio risk assessment
- Pricing algorithm development
- Regulatory reporting automation
Some sample projects will be shared later during the course.
Important Dates
- October 9: Project proposal deadline
- November 12: Project report deadline
- November 14: Slides submission deadline
- November 18: Project presentations
Note that all submissions are due by 23:59 (CEST) on the respective day.
Tech Stack
The term “tech stack” refers to the collection of technologies that are used to build a product or service. In this case, the tech stack is the collection of technologies that we use for this course.
Here’s an overall picture of how all the tools can fit together in a data science workflow:
| Tool | Purpose | When You’ll Use It | Why It’s Important |
|---|---|---|---|
| Python | Programming language | Writing all code and analyses | The foundation - everything else builds on this |
| Anaconda | Package distribution | Initial setup only | Bundles Python with essential libraries |
| Jupyter Notebooks | Interactive coding | Exploratory analysis, exercises | See results immediately, mix code with notes |
| VS Code | Code editor | Writing scripts, larger projects | Professional development environment |
| Git/GitHub | Version control | Saving work, collaboration | Track changes, build portfolio |
| Terminal | Command interface | Running scripts, managing packages | Direct computer control |
| Quarto | Document creation | Reports, presentations | Combine code, results, and narrative |
| Libraries | Open-source code | Data manipulation, visualization | Pre-built functions for complex tasks |
Think of these tools as your actuarial toolkit:
- Python is your calculator, with libraries for re-using existing code
- Jupyter is your scratch pad (quick and dirty analysis)
- VS Code is your professional workspace (what we refer to as production code)
- Git is your filing system (think OneDrive/iCloud)
- Quarto is your reporting tool (PDF, Word documents and even websites from a single file)
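As a tiny illustration of the "calculator plus libraries" idea, here is a sketch with made-up figures: the same accumulation done with plain Python arithmetic and then vectorized over several interest rates with NumPy:

```python
# Illustrative sketch only: Python as a "calculator", plus a library for re-use.
import numpy as np

premium = 1_000.0
accumulated = premium * (1 + 0.025) ** 10    # plain-Python arithmetic: accumulate for 10 years at 2.5%
print(f"Accumulated value at 2.5%: {accumulated:,.2f}")

rates = np.array([0.01, 0.025, 0.04])        # NumPy evaluates several rates in one expression
print(premium * (1 + rates) ** 10)
```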
Website Tools
Throughout the course, you’ll encounter several interactive elements, found at the beginning of the documents:
- 🗄️ Data downloads the datasets to run the examples outside of the DSAS website
- Jupyter downloads Jupyter notebook files to work on locally
- Google Colab is a cloud-based alternative to local Jupyter notebooks
- 💡 Hints provide guidance when you're stuck
- ℹ️ Solutions let you check your work after attempting the exercises. These remain locked until 00:00 (CEST) at the end of the day
Live Code Environment
This website features interactive code blocks that run directly in your browser:
Code you run in these interactive blocks is not saved. The environment resets when you refresh the page. For permanent work, download the Jupyter notebooks or use Google Colab.
Note: Learning to Code
Learning to program is like learning a new language: it takes practice and patience. Some advice:
- Errors are normal: Even experienced programmers encounter errors daily
- Search, search and search: If you run into an issue, most likely you’re not the only one
- Type everything: Don’t copy-paste during learning; muscle memory helps
- Experiment freely: Try changing code to see what happens
- Collaborate: Discuss approaches with classmates (but write your own code)
In technical fields there is always a lot to learn: there are many tools, and they are constantly changing and evolving. With that in mind, these are only “tools” that the course introduces to you. What you choose to take away from this course for your academic and professional career is up to you. The key takeaway of this course is doing “correct” data science: building complete (or nearly complete) DS workflows and using Python (as one of many ways) to implement and execute your DS tasks.
While AI tools like ChatGPT can write code, you are strongly encouraged to avoid them during the learning phase. Understanding the fundamentals deeply will make you a much stronger data scientist. You're welcome to use AI tools for your final project, but master the basics yourself first.
Getting Started
Ready to begin? Your first tasks:
- Set up your environment (covered in Exercise Set 0)
- Join the course resources:
  - Sign up on Moodle here
  - Access course materials on this website
  - Download the Anaconda Python distribution
  - Create a GitHub account for version control
- Prepare for interactive learning:
  - Bring a fully charged laptop to every session
  - Be ready to code along during demonstrations
  - Don’t hesitate to ask questions!
In our next session, we’ll dive into Python fundamentals. Before then:
- Complete Exercise Set 0 to set up your development environment
- Explore the course website and familiarize yourself with the interface
- Think about actuarial problems you’d like to solve with data science
Welcome aboard! Let’s begin your data science journey.