```mermaid
flowchart LR
    %% Main linear flow
    Acquire --> Tidy
    Tidy --> Transform
    %% Circular connections between Transform, Visualize, and Model
    Transform <--> Visualize
    Visualize <--> Model
    Model <--> Transform
    %% Final connections to Communicate
    %% Visualize --> Communicate
    Model --> Communicate
    %% Styling for clean appearance
    classDef default fill:#ffffff,stroke:#000000,stroke-width:2px,color:#000000,font-family:Arial,font-size:14px
    class Acquire,Tidy,Transform,Visualize,Model,Communicate default
```

Figure 1: The data science workflow
Course Overview
Welcome!
Welcome to your journey into data science for actuarial applications! This course will equip you with essential programming and data analysis skills that are fundamental to modern actuarial practice. Whether you’re calculating reserves, analyzing mortality patterns, or building pricing models, the tools and techniques you’ll learn here form the foundation of data-driven decision making in insurance and risk management.
Objectives and Timeline
This course is designed around three core principles:
- Hands-on Learning: You’ll write code from day one. Theory and practice go hand in hand.
- Real-world Relevance: Every concept is illustrated with actuarial examples.
- Progressive Complexity: Starting with basics, we gradually build complete data science workflows.
This is a hands-on course centered around coding. You’ll need your laptop in every session to follow along with live coding demonstrations and work on exercises.
The course runs for six weeks. Each Tuesday session (10:15-14:00) follows this structure:
- 50% Interactive Lectures: Live coding demonstrations where we explore concepts together
- 50% Practical Lab Exercises: Hands-on problem solving based on the lectures
The exact balance will vary by topic. Some concepts need more explanation, others more practice.
Topics
Each chapter covers one of the following topics:
Introduction to Python: Python Fundamentals that are crucial for the entire workflow
Introduction to NumPy: Numerical Computing with NumPy (Tidy + Transform)
Introduction to Pandas: Data Manipulation with Pandas (Tidy + Transform)
Introduction to Plotting: Visualization (univariate and multivariate plots)
Data Science pipeline: Complete Data Science Pipeline
Exercise Set 0 - Setup: Communication via Quarto
Modeling is only covered in Introduction to DS II
The field of data science offers various specializations, each focusing on different stages of the data science workflow (Figure 1). Here’s a brief overview and how this course helps prepare you:
- Data Engineer: Designs and builds robust systems for collecting, storing, and processing data at scale. This role is crucial for the Acquire step and the initial Tidy and Transform steps, ensuring data is ready for analysis. Our course focuses more on analysis and the science of working with data.
- Data Analyst: Primarily focuses on interpreting data to identify trends and draw conclusions from existing data. This role heavily emphasizes the Transform (and, to a smaller extent, Tidy), Visualize, and Communicate steps. This course provides strong foundations in data manipulation with Pandas and visualization, which are core skills for data analysts.
- Machine Learning Engineer/Scientist: Specializes in designing, building, and deploying AI/ML models. This role focuses heavily on Transform (feature engineering), Model, and deploying solutions that often integrate into communication channels. This course lays the groundwork in Python and data preparation, which are prerequisites for developing advanced ML solutions.
- Data Scientist: Builds and implements statistical models and machine learning algorithms. Data scientists are involved across the entire workflow and are considered more of a “jack of all trades”. This course provides foundational programming and data processing skills essential for advanced modeling, with further depth explored in “Introduction to DS II”.
Here’s a more detailed breakdown of the topics:
| Topic | What You’ll Learn |
|---|---|
| Python Fundamentals | • Variables, data types, and control structures • Functions and error handling • Object-oriented concepts for organizing code |
| Numerical Computing with NumPy | • Array operations (slicing, indexing, etc.) • Vectorized operations and functions • Actuarial examples with NumPy |
| Data Manipulation with Pandas | • DataFrames and wrangling tables • Grouping and aggregation • Handling time series data |
| Data Visualization | • Statistical plots with Matplotlib • Visualizations with Seaborn • Univariate and multivariate plots |
| Complete Data Science Pipeline | • Cookbook and steps to follow • Health insurance example • Principal component analysis for interest rates |
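To give a flavour of what these chapters build toward, here is a minimal, illustrative sketch of the kind of vectorized NumPy calculation the course works through. All figures (the interest rate and survival probabilities) are made up for this example:

```python
# Illustrative sketch only: expected present value of a 5-year annuity-due
# paying 1 per year while alive, using made-up survival probabilities.
import numpy as np

i = 0.03                                             # flat annual interest rate (assumed)
px = np.array([0.99, 0.98, 0.97, 0.96, 0.95])        # one-year survival probabilities (made up)

tpx = np.concatenate(([1.0], np.cumprod(px)[:-1]))   # probability of being alive at t = 0, ..., 4
v = (1 + i) ** -np.arange(5)                         # discount factors for t = 0, ..., 4
epv = np.sum(tpx * v)                                # expected present value of the payments

print(f"EPV of a 5-year annuity-due: {epv:.4f}")
```

Written with an explicit loop this would be longer and slower; vectorized thinking of this kind is a recurring theme of the NumPy chapter.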
Learning Outcomes
By the end of this course, you will have acquired technical skills enabling you to:
- Write Python programs, with a focus on solving actuarial problems
- Manipulate and analyze datasets effectively and efficiently
- Create professional visualizations for technical and non-technical audiences
- Build reproducible data analysis pipelines
As well as applied competencies that allow you to perform tasks such as:
- Performing exploratory data analysis on insurance portfolios
- Automating repetitive tasks in pricing and valuation
- Calculating actuarial metrics (loss ratios, claim frequencies, reserves)
- Preparing data for advanced modeling (covered in DS II)
These are just some examples (one is sketched below); you're only bound by your imagination!
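As a small illustration, here is a sketch of computing a loss ratio and a claim frequency from a made-up portfolio table; the figures and column names are invented for this example:

```python
# Illustrative sketch only: simple portfolio metrics from a made-up claims table.
import pandas as pd

portfolio = pd.DataFrame({
    "earned_premium": [250_000.0, 180_000.0, 320_000.0],   # per line of business (invented)
    "incurred_claims": [140_000.0, 95_000.0, 260_000.0],
    "policy_years": [1_000.0, 750.0, 1_200.0],
    "claim_count": [85, 60, 140],
})

loss_ratio = portfolio["incurred_claims"].sum() / portfolio["earned_premium"].sum()
claim_frequency = portfolio["claim_count"].sum() / portfolio["policy_years"].sum()

print(f"Portfolio loss ratio: {loss_ratio:.1%}")
print(f"Claim frequency (per policy-year): {claim_frequency:.3f}")
```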
Assessment
Your learning will be evaluated through a practical project that demonstrates your ability to apply course concepts to a real actuarial problem.
Project Components
| Component | Weight | Description |
|---|---|---|
| Proposal | 20% | Define problem and approach |
| Report | 40% | Complete analysis with code |
| Presentation | 40% | Communicate findings effectively |
You will work in groups of 3 (maybe 4, depending on the number of participants). It's up to you to find your group members, but there is a Moodle forum to facilitate the process.
Start thinking about actuarial problems you’d like to explore:
- Mortality or morbidity analysis
- Claims frequency/severity modeling
- Portfolio risk assessment
- Pricing algorithm development
- Regulatory reporting automation
Some sample projects will be shared later during the course.
Important Dates
- October 9: Project proposal deadline
- November 12: Project report deadline
- November 14: Slides submission deadline
- November 18: Project presentations
Note that all submissions are due by 23:59 (CEST) on the respective day.
Tech Stack
The term “tech stack” refers to the collection of technologies that are used to build a product or service. In this case, the tech stack is the collection of technologies that we use for this course.
Here’s an overall picture of how all the tools can fit together in a data science workflow:
| Tool | Purpose | When You’ll Use It | Why It’s Important |
|---|---|---|---|
| Python | Programming language | Writing all code and analyses | The foundation - everything else builds on this |
| Anaconda | Package distribution | Initial setup only | Bundles Python with essential libraries |
| Jupyter Notebooks | Interactive coding | Exploratory analysis, exercises | See results immediately, mix code with notes |
| VS Code | Code editor | Writing scripts, larger projects | Professional development environment |
| Git/GitHub | Version control | Saving work, collaboration | Track changes, build portfolio |
| Terminal | Command interface | Running scripts, managing packages | Direct computer control |
| Quarto | Document creation | Reports, presentations | Combine code, results, and narrative |
| Libraries | Open-source code | Data manipulation, visualization | Pre-built functions for complex tasks |
Think of these tools as your actuarial toolkit:
- Python is your calculator, with libraries for re-using existing code
- Jupyter is your scratch pad (quick and dirty analysis)
- VS Code is your professional workspace (what we refer to as production code)
- Git is your filing system (think OneDrive/iCloud)
- Quarto is your reporting tool (PDF, Word documents and even websites from a single file)
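As a tiny illustration of the "calculator plus libraries" idea, here is a sketch with made-up figures: the same accumulation done with plain Python arithmetic and then vectorized over several interest rates with NumPy:

```python
# Illustrative sketch only: Python as a "calculator", plus a library for re-use.
import numpy as np

premium = 1_000.0
accumulated = premium * (1 + 0.025) ** 10    # plain-Python arithmetic: accumulate for 10 years at 2.5%
print(f"Accumulated value at 2.5%: {accumulated:,.2f}")

rates = np.array([0.01, 0.025, 0.04])        # NumPy evaluates several rates in one expression
print(premium * (1 + rates) ** 10)
```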
Website Tools
Throughout the course, you’ll encounter several interactive elements, found at the beginning of the documents:
- 🗄️ Data downloads the datasets to run the examples outside of the DSAS website
- Jupyter downloads Jupyter notebook files to work on locally
- Google Colab is a cloud-based alternative to local Jupyter notebooks
- 💡 Hints provide guidance when you're stuck
- ℹ️ Solutions let you check your work after attempting the exercises. These remain locked until 00:00 (CEST) at the end of the day
Live Code Environment
This website features interactive code blocks that run directly in your browser:
Code you run in these interactive blocks is not saved. The environment resets when you refresh the page. For permanent work, download the Jupyter notebooks or use Google Colab.
Note: Learning to Code
Learning to program is like learning a new language: it takes practice and patience. Some advice:
- Errors are normal: Even experienced programmers encounter errors daily
- Search, search and search: If you run into an issue, most likely you’re not the only one
- Type everything: Don’t copy-paste during learning; muscle memory helps
- Experiment freely: Try changing code to see what happens
- Collaborate: Discuss approaches with classmates (but write your own code)
In technical fields there is always a lot to learn: there are many tools, and they are constantly changing and evolving. With that in mind, these are only “tools” that the course introduces to you. What you choose to take away from this course for your academic and professional career is up to you. The key takeaway of this course is doing “correct” data science: building complete (or nearly complete) DS workflows and using Python (as one of many ways) to implement and execute your DS tasks.
While AI tools like ChatGPT can write code, you are strongly encouraged to avoid them during the learning phase. Understanding the fundamentals deeply will make you a much stronger data scientist. You're welcome to use AI tools for your final project, but master the basics yourself first.
Getting Started
Ready to begin? Your first tasks:
- Set up your environment (covered in Exercise Set 0)
- Join the course resources:
  - Sign up on Moodle here
  - Access course materials on this website
  - Download the Anaconda Python distribution
  - Create a GitHub account for version control
- Prepare for interactive learning:
  - Bring a fully charged laptop to every session
  - Be ready to code along during demonstrations
  - Don’t hesitate to ask questions!
In our next session, we’ll dive into Python fundamentals. Before then:
- Complete Exercise Set 0 to set up your development environment
- Explore the course website and familiarize yourself with the interface
- Think about actuarial problems you’d like to solve with data science
Welcome aboard! Let’s begin your data science journey.