Introduction to Data Science I

MScAS 2025 - Overview

Ilia Azizi

2025-09-23

Welcome!

Course Details

  • 🧑🏻‍🏫 Instructor: Ilia Azizi
  • 🕛 Time: Tuesdays 10:15-14:00
  • 🏫 Room: Internef 122
  • 📚 Duration: 6 weeks


Course website for the material

Today’s Agenda

  • Course Overview & Objectives
  • Why Python for Actuarial Science?
  • Development Environment Setup
  • Introduction to Python


Moodle for communication

About Me

  • Finishing my PhD in Business Analytics at HEC Lausanne
  • I work on multi-modal machine learning with uncertainty quantification
  • MSc from HEC Lausanne and BSc from University of Rome Tor Vergata
  • I speak English, French, Italian, Spanish, Farsi and some Arabic
  • I love teaching, open-source development and generally building things
  • Some of my hobbies include PC gaming (& building), technology, and food science

Objectives

This course is designed around three core principles:

  1. Hands-on Learning: You’ll write code from day one. Theory and practice go hand in hand.
  2. Real-world Relevance: Every concept is illustrated with actuarial examples.
  3. Progressive Complexity: Starting with basics, we gradually build complete data science workflows.

Bring Your Laptop!

This is a hands-on course centered around coding. You’ll need your laptop in every session to follow along with live coding demonstrations and work on exercises.

Structure

The course runs for six weeks, and each Tuesday session (10:15-14:00) follows the same structure:

  • 50% Interactive Lectures: Live coding demonstrations where we explore concepts together
  • 50% Practical Lab Exercises: Hands-on problem solving based on the lectures
Week | Date | Topic | Lecture | Lab
1 | Tue, Sep 23 | Introduction to DS & Programming | 📄 Overview; 📄 Intro to Python | 📝 Setup; 📝 Lab 1
2 | Tue, Sep 30 | Numerical Computation | 📄 Intro to Numpy | 📝 Lab 2
3 | Tue, Oct 7 | Data Wrangling | 📄 Intro to Pandas | 📝 Lab 3
  | Thu, Oct 9 | Project proposal deadline (Google Form) | ❗ |
4 | Tue, Oct 14 | Visualization | 📄 Intro to Plotting | 📝 Lab 4
5 | Tue, Oct 21 | Data Science Workflow & Case Study | 📄 Data Science Pipeline | 📝 Lab 5; 📝 Lab 6
6 | Tue, Oct 28 | Guest Lectures, Catch up & Coaching | 💬 1-1 group meetings |
  | Wed, Nov 12 | Project report deadline (Moodle) | ❗ |
  | Fri, Nov 14 | Project presentation deadline (Moodle) | ❗ |
7 | Tue, Nov 18 | Final presentations (on site) | ❗ |

Topics

flowchart LR
    %% Main linear flow
    Acquire --> Tidy
    Tidy --> Transform
    
    %% Circular connections between Transform, Visualize, and Model
    Transform <--> Visualize
    Visualize <--> Model
    Model <--> Transform
    
    %% Final connections to Communicate
    %% Visualize --> Communicate
    Model --> Communicate
    
    %% Styling for clean appearance
    classDef default fill:#ffffff,stroke:#000000,stroke-width:2px,color:#000000,font-family:Arial,font-size:14px
    
    class Acquire,Tidy,Transform,Visualize,Model,Communicate default

Each chapter covers one or more of these stages.

Note

Modeling is only covered in Introduction to DS II
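Because modeling is left to DS II, the stages this course covers can be sketched end to end. The example below is a minimal, standard-library-only sketch of Acquire, Tidy, Transform and Communicate (Visualize is skipped; later chapters use NumPy, Pandas and plotting libraries for these steps). The inline CSV, its column names and the function names are hypothetical, chosen only for illustration.

```python
import csv
import io

# Hypothetical raw data: one row per policy, including a record with a missing value.
RAW_CSV = """policy_id,region,exposure_years,n_claims
1,Vaud,1.0,0
2,Vaud,0.5,1
3,Geneva,1.0,2
4,Geneva,,1
5,Valais,1.0,0
"""


def acquire(text: str) -> list[dict[str, str]]:
    """Acquire: read raw records (from a string here, instead of a file or API)."""
    return list(csv.DictReader(io.StringIO(text)))


def tidy(rows: list[dict[str, str]]) -> list[dict]:
    """Tidy: drop incomplete records and convert fields to proper types."""
    clean = []
    for row in rows:
        if not row["exposure_years"]:
            continue  # skip records with missing exposure
        clean.append({
            "region": row["region"],
            "exposure": float(row["exposure_years"]),
            "claims": int(row["n_claims"]),
        })
    return clean


def transform(rows: list[dict]) -> dict[str, float]:
    """Transform: claim frequency per region = total claims / total exposure."""
    claims: dict[str, int] = {}
    exposure: dict[str, float] = {}
    for row in rows:
        claims[row["region"]] = claims.get(row["region"], 0) + row["claims"]
        exposure[row["region"]] = exposure.get(row["region"], 0.0) + row["exposure"]
    return {region: claims[region] / exposure[region] for region in claims}


def communicate(freq: dict[str, float]) -> None:
    """Communicate: print a short, readable summary sorted by region."""
    for region, value in sorted(freq.items()):
        print(f"{region:<8} claim frequency: {value:.2f}")


frequencies = transform(tidy(acquire(RAW_CSV)))
communicate(frequencies)
```

Each stage is a separate function so it can be tested and replaced on its own, which is the same idea behind the reproducible pipelines built later with Pandas.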

Learning Outcomes

What you will be able to do at the end of the course:

Technical Skills | Applied Competencies
Write Python programs, with a focus on solving actuarial problems | Perform exploratory data analysis on insurance portfolios
Manipulate and analyze datasets effectively and efficiently | Automate repetitive tasks in pricing and valuation
Create professional visualizations for technical and non-technical audiences | Calculate actuarial metrics (loss ratios, claim frequencies, reserves)
Build reproducible data analysis pipelines | Prepare data for advanced modeling (covered in DS II)

These are just some examples

You’re only bound by your imagination!
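As a taste of the "calculate actuarial metrics" outcome, here is a minimal sketch of three basic portfolio ratios. The `Portfolio` class, its field names and all figures are hypothetical; reserves are not computed, since reserving methods need far more than aggregate totals.

```python
from dataclasses import dataclass


@dataclass
class Portfolio:
    """Hypothetical aggregate figures for one line of business over one year."""
    earned_premium: float   # premiums earned over the period (CHF)
    incurred_losses: float  # claims paid plus change in loss reserves (CHF)
    n_claims: int           # number of reported claims
    exposure: float         # policy-years of exposure


def loss_ratio(p: Portfolio) -> float:
    """Incurred losses as a share of earned premium."""
    return p.incurred_losses / p.earned_premium


def claim_frequency(p: Portfolio) -> float:
    """Number of claims per policy-year of exposure."""
    return p.n_claims / p.exposure


def average_severity(p: Portfolio) -> float:
    """Average cost per claim."""
    return p.incurred_losses / p.n_claims


motor = Portfolio(earned_premium=1_200_000, incurred_losses=780_000,
                  n_claims=150, exposure=2_500)
print(f"Loss ratio:      {loss_ratio(motor):.1%}")
print(f"Claim frequency: {claim_frequency(motor):.3f}")
print(f"Avg severity:    {average_severity(motor):,.0f} CHF")
```

Multiplying frequency by severity gives the pure premium per policy-year (0.06 × 5,200 = 312 CHF with these figures), which is the same as incurred losses divided by exposure.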

Career Paths

Here’s how this course prepares you for different data science roles:


  • Data Engineer: Focuses on Acquire, Tidy, and Transform - building robust data systems. Our course emphasizes analysis over infrastructure.
  • Data Analyst: Emphasizes Transform, Visualize, and Communicate - interpreting data for insights. Strong foundation in Pandas and visualization from this course.
  • Machine Learning Engineer/Scientist: Focuses on Transform (feature engineering) and Model - building AI/ML solutions. This course provides essential Python and data preparation prerequisites.
  • Data Scientist: Involved across entire workflow - “jack of all trades” approach. This course builds foundational skills for advanced modeling in DS II.

Assessment

  • A practical project that demonstrates your ability to apply course concepts to a real actuarial problem.
  • Work in groups of 3 (possibly 4, depending on the number of participants). You form your own groups; a Moodle forum is available to help you find members.
Component | Weight | Description
Proposal | 20% | Define the problem and approach
Report | 40% | Complete analysis with code
Presentation | 40% | Communicate findings effectively
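The weighting is a plain weighted average of the three components. A minimal sketch, assuming all components are scored on the same scale; the 1-6 scale and the scores below are hypothetical, and any rounding rules are not modelled.

```python
# Component weights from the assessment table (proposal 20%, report 40%, presentation 40%).
WEIGHTS = {"proposal": 0.20, "report": 0.40, "presentation": 0.40}


def project_grade(scores: dict[str, float]) -> float:
    """Weighted average of the component scores (all on the same scale)."""
    if set(scores) != set(WEIGHTS):
        raise ValueError(f"expected scores for {sorted(WEIGHTS)}")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


# Hypothetical scores on a 1-6 scale, for illustration only.
print(project_grade({"proposal": 5.0, "report": 4.5, "presentation": 5.5}))
```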


  • October 9 at 23:59 (CEST): Project proposal deadline
  • November 12 at 23:59 (CEST): Project report deadline
  • November 14 at 23:59 (CEST): Slides submission deadline
  • November 18: Project presentations

Start thinking about actuarial problems you’d like to explore:

  • Mortality or morbidity analysis
  • Claims frequency/severity modeling
  • Portfolio risk assessment

Examples of specific research questions:

  • How do mortality patterns vary across different regions in Switzerland, and what demographic factors have the most substantial impacts on life expectancy?

  • What seasonal patterns exist in automobile insurance claims, and how do weather conditions correlate with claim frequency and severity?

  • Do earthquakes follow a Poisson process, and how can an index insurance product be designed for this type of disaster?

  • How effectively can the claim frequency for Belgian Motor-TPL insurance be predicted using ML algorithms?
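For questions like the earthquake one, a common first check uses the Poisson property that event counts per period have a variance equal to their mean, so the dispersion index (variance / mean) should be close to 1. A minimal sketch with hypothetical annual counts; it is only a quick diagnostic, not a formal test, and a full analysis would also examine inter-arrival times.

```python
import statistics

# Hypothetical number of events per year (illustration only, not real data).
annual_counts = [3, 5, 2, 4, 6, 3, 4, 2, 5, 4]

mean = statistics.mean(annual_counts)
variance = statistics.variance(annual_counts)  # sample variance (n - 1 denominator)
dispersion_index = variance / mean

print(f"mean={mean:.2f}, variance={variance:.2f}, dispersion={dispersion_index:.2f}")
```

A value well above 1 points to overdispersion (for example, clustered events such as aftershocks); a value well below 1 points to underdispersion.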

Some sample projects will be shared later during the course.

Tech Stack

The term “tech stack” refers to the collection of technologies used to build a product or service. Here, it is the set of tools we use throughout this course.

Tool | Purpose | When You’ll Use It | Why It’s Important
Python | Programming language | Writing all code and analyses | The foundation: everything else builds on this
Anaconda | Package distribution | Initial setup only | Bundles Python with essential libraries
Jupyter Notebooks | Interactive coding | Exploratory analysis, exercises | See results immediately, mix code with notes
VS Code | Code editor | Writing scripts, larger projects | Professional development environment
Git/GitHub | Version control | Saving work, collaboration | Track changes, build portfolio
Terminal | Command interface | Running scripts, managing packages | Direct computer control
Quarto | Document creation | Reports, presentations | Combine code, results, and narrative
Libraries | Open-source code | Data manipulation, visualization | Pre-built functions for complex tasks

Think of these tools as your actuarial toolkit:

  • Python is your calculator, with libraries for re-using existing code
  • Jupyter is your scratch pad (quick and dirty analysis)
  • VS Code is your professional workspace (what we refer to as production code)
  • Git is your filing system (think OneDrive/iCloud)
  • Quarto is your reporting tool (PDF, Word documents and even websites from a single file)
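To make the "calculator with libraries" analogy concrete, here is a minimal sketch that values an annuity-immediate (1 CHF paid at the end of each year for n years) twice: once by summing the discounted payments directly, and once with the closed form (1 - v^n) / i. It then reuses the standard library's `math.isclose` to compare the two. The interest rate and term are assumed values.

```python
import math

# Python as a calculator: present value of 1 CHF paid at the end of each year
# for n years at annual interest rate i (an annuity-immediate).
i = 0.02         # assumed annual interest rate
n = 10           # assumed term in years
v = 1 / (1 + i)  # one-year discount factor

# Direct arithmetic: v^1 + v^2 + ... + v^n
pv_sum = sum(v**t for t in range(1, n + 1))

# Closed form: (1 - v^n) / i
pv_closed = (1 - v**n) / i

# Reusing library code: math.isclose compares floats with a tolerance
print(round(pv_sum, 4), math.isclose(pv_sum, pv_closed))
```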

Website Tools

The course website features interactive code blocks that run directly in your browser:

Important: Browser Code is Temporary!

Code you run in these interactive blocks is not saved. The environment resets when you refresh the page. For permanent work, download the Jupyter notebooks or use Google Colab.

Throughout the course, you’ll encounter several interactive elements at the beginning of each document:

  • 🗄️ Data downloads the datasets so you can run the examples outside the DSAS website
  • Jupyter downloads the Jupyter notebook files so you can work locally
  • Google Colab is a cloud-based alternative to local Jupyter notebooks
  • 💡 Hints give you guidance when you’re stuck
  • ℹ️ Solutions show the answers after you have attempted the exercises. They stay locked until the end of the lecture day (00:00 CEST), with a countdown timer showing when they unlock

Note: Learning to Code

Learning to program is like learning a new language; it takes practice and patience. Some advice:

  • Errors are normal: Even experienced programmers encounter errors daily
  • Search, search and search: If you run into an issue, most likely you’re not the only one
  • Type everything: Don’t copy-paste during learning; muscle memory helps
  • Experiment freely: Try changing code to see what happens
  • Collaborate: Discuss approaches with classmates (but write your own code)

Feeling overwhelmed is normal!

Technical fields involve many evolving tools. Focus on the key takeaway: doing “correct” data science and building complete workflows using Python as one implementation approach.

On AI and learning

Avoid AI tools during learning and master the fundamentals first. You can use them for your final project after building a strong foundation.

Getting Started

Ready to begin? Your first tasks:

  1. Set up your environment (covered in Exercise Set 0)
  2. Join the course resources:
  3. Prepare for interactive learning:
    • Bring a fully charged laptop to every session
    • Be ready to code along during demonstrations
    • Don’t hesitate to ask questions!

Start thinking about your project

Do you already have ideas about which actuarial problems data science could help solve? Great! You can start working on your project proposal now!

Questions?