Project Directives

Data Science 2025 Project Guidelines

For this assignment, you will work on a real data science project. The goal of the project is to go through the complete data science process to answer questions about a topic of your own choosing. You will:

  • Acquire the data, clean and wrangle it
  • Make visualizations
  • Run statistical analysis
  • Communicate the results

Evaluation:

  • 20% for the project proposal
  • 40% for the report
  • 40% for the presentation
  • The final grade ranges from 1 to 6 (absent = 0) and is rounded to the nearest 0.5.
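
As a rough illustration of how the weights and the rounding rule above combine, here is a minimal Python sketch. It assumes, which the guidelines do not state, that each part is graded on the same 1 to 6 scale, that rounding is applied once to the weighted average, and that exact ties round upward. The function name `final_grade` is hypothetical, and the absent case (grade 0) is not handled.

```python
from decimal import Decimal, ROUND_HALF_UP

# Weights taken from the evaluation list above.
WEIGHTS = {
    "proposal": Decimal("0.20"),
    "report": Decimal("0.40"),
    "presentation": Decimal("0.40"),
}


def final_grade(proposal: float, report: float, presentation: float) -> float:
    """Weighted average of the three parts, rounded to the nearest 0.5.

    Assumptions (not stated in the guidelines): all parts use the same
    1-6 scale, and exact ties such as 4.25 round upward.
    """
    scores = {"proposal": proposal, "report": report, "presentation": presentation}
    # Decimal avoids binary floating-point surprises right at the tie points.
    weighted = sum(WEIGHTS[name] * Decimal(str(score)) for name, score in scores.items())
    # Doubling, rounding to a whole number, then halving snaps to a 0.5 grid.
    halves = (weighted * 2).quantize(Decimal("1"), rounding=ROUND_HALF_UP)
    return float(halves / 2)


print(final_grade(4.0, 5.0, 4.5))  # weighted average 4.6
```

Under these assumptions, a proposal graded 4.0, a report graded 5.0, and a presentation graded 4.5 give a weighted average of 4.6, which rounds to 4.5.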

Deliverables:

  • Report: A single PDF document or HTML file of approximately 4-8 A4 pages (not including appendix or supplementary material).
  • Presentation: the slides of your presentation.

Deadlines:

  • Project proposal: Thursday the 9th of October 2025 at 23h59 (CEST)
  • Project report: Wednesday the 12th of November 2025 at 23h59 (CET)
  • Presentation slides: Friday the 14th of November 2025 at 23h59 (CET)

Group members:

  • Groups consist of 3 students, selected by the students themselves.

Content

The suggested structure is as follows:

  • An abstract: summary of the paper including the main results.
  • An introduction: context and objective of the study. This may include further sections:
    • Overview: Provide an overview of the project goals and the motivation for it. Consider that this will be read by people who did not see your project proposal.
    • Related work: Anything that inspired you, such as a paper, a website, or something we discussed in class.
    • Research questions: What questions are you trying to answer? How did these questions evolve over the course of the project? What new questions did you consider in the course of your analysis?
  • Dataset: Source, scraping method, wrangling, missing values and outliers.
  • Exploratory Analysis: What visualizations did you use to look at your data in different ways? Justify the decisions you made, and show any major changes to your ideas. How did you reach these conclusions?
  • Conclusion: limitations, discussion, and outlook subsections.
  • Appendix: supplementary elements that do not fit in the paper body.
  • References: in an appropriate style (e.g., APA).

Alternatively, you can follow a more academic structure:

  • An abstract: summary of the paper including the main results.
  • An introduction: context and objective of the study.
  • A literature review: state of the art, previous works, and appropriate citations.
  • A methodology section: the methods and models used in the paper.
  • Results: all the interesting results in the form of tables, figures, and interpretations.
  • Conclusion: limitations, discussion, and outlook subsections.
  • Appendix: supplementary elements that do not fit in the paper body.
  • References: in an appropriate style (e.g., APA).

You are free to adapt either structure to your needs. However, you must clearly explain the motivation for choosing this project, the research questions, the dataset acquisition and cleaning, and the exploratory analysis. You must also answer your research questions and close with a conclusion that covers limitations and future work.

Dataset

It’s important to select datasets that are substantial enough to allow meaningful analysis but manageable within the project’s timeframe. Below are the recommended guidelines:

Justification for These Recommendations
  • Data Cleaning and Wrangling: A dataset with a mix of variable types presents realistic challenges, such as handling missing values, outliers, and inconsistent data formats.
  • Exploratory Data Analysis (EDA): Multiple variables allow for univariate, bivariate, and multivariate analyses, helping to uncover patterns and relationships.
  • Modeling Opportunities: With diverse variable types, you can apply various modeling techniques (e.g., linear regression for numeric variables, classification algorithms for categorical outcomes).
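
To make the data cleaning point above more concrete, the following Python sketch counts missing entries and flags outliers in a single numeric column. It is only one possible convention, not a required method: it assumes missing entries were loaded as `None` and uses the common 1.5 × IQR rule for outliers. The function name `summarize_column` and the sample values are hypothetical.

```python
import statistics


def summarize_column(values):
    """Count missing entries and flag IQR outliers in one numeric column.

    `values` is a list in which missing entries are None (an assumption
    about how the data was loaded).
    """
    present = [v for v in values if v is not None]
    n_missing = len(values) - len(present)
    # Quartiles of the observed values ("inclusive" treats them as the full data).
    q1, _, q3 = statistics.quantiles(present, n=4, method="inclusive")
    iqr = q3 - q1
    # Tukey's rule: anything beyond 1.5 * IQR from the quartiles is flagged.
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    outliers = [v for v in present if v < low or v > high]
    return {"missing": n_missing, "outliers": outliers}


# Hypothetical column with two missing entries and one extreme value.
print(summarize_column([10, 12, None, 11, 13, 12, 95, None, 11]))
```

Flagged values are candidates for inspection, not automatic deletions. Whether to keep, correct, or drop them is a decision to justify in the report.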
Summary Recommendations
  • Minimum of 1,000 observations to support meaningful statistical analysis.
  • 8 to 12 variables to provide sufficient complexity.
  • Include variable types:
    • Numeric (integer/double)
    • Categorical
    • Date/time
    • Text (character)
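
The recommendations above can be turned into a quick self-check before committing to a dataset. The sketch below is a heuristic under stated assumptions: rows are dictionaries of raw strings (for example from `csv.DictReader`), dates are in ISO format, and a column with at most 20 distinct values counts as categorical. The names `infer_type`, `check_dataset`, and `max_categories` are hypothetical.

```python
from datetime import datetime

MIN_ROWS = 1000              # "Minimum of 1,000 observations"
MIN_COLS, MAX_COLS = 8, 12   # "8 to 12 variables"
WANTED_TYPES = {"numeric", "categorical", "datetime", "text"}


def infer_type(values, max_categories=20):
    """Rough type guess for one column of raw strings (heuristic only)."""
    present = [v for v in values if v not in ("", None)]
    if not present:
        return "empty"

    def all_parse(parser):
        # True if every non-missing value is accepted by `parser`.
        try:
            for v in present:
                parser(v)
            return True
        except ValueError:
            return False

    if all_parse(float):
        return "numeric"
    if all_parse(datetime.fromisoformat):
        return "datetime"
    # Few distinct values suggests categories, many suggests free text.
    return "categorical" if len(set(present)) <= max_categories else "text"


def check_dataset(rows):
    """`rows` is a list of dicts mapping column name to raw string value."""
    columns = list(rows[0].keys()) if rows else []
    types = {c: infer_type([r[c] for r in rows]) for c in columns}
    return {
        "enough_rows": len(rows) >= MIN_ROWS,
        "column_count_ok": MIN_COLS <= len(columns) <= MAX_COLS,
        "types": types,
        "missing_types": sorted(WANTED_TYPES - set(types.values())),
    }
```

A dataset that fails one of these checks is not automatically unsuitable, since the figures are recommendations, but the result shows where the project may need additional data or variables.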

Presentations

Presentations will be organized on site in a 10+10-minute format (presentation + questions). If the number of groups is large, then either the presentation time will be reduced or, if that is not possible, another oral presentation session will be organized later (possibly during the exam session). Note that you only need to be present for your own presentation.

Important

Missing a deadline (presentation and/or report) reduces the grade by 0.1 per started hour of delay. There is no maximum penalty for the project report and presentation slides.
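
The "per started hour" rule means that any fraction of an hour counts as a full hour. A minimal Python sketch of that arithmetic follows, assuming the deadline is the instant 23:59:00 and that both timestamps are in the same local time. The function name `late_penalty` is hypothetical, and the guidelines do not say exactly how the deduction is applied to the grade.

```python
import math
from datetime import datetime

PENALTY_PER_STARTED_HOUR = 0.1  # from the rule above


def late_penalty(deadline: datetime, submitted: datetime) -> float:
    """Grade deduction for a late submission; 0.0 if on time."""
    delay_seconds = (submitted - deadline).total_seconds()
    if delay_seconds <= 0:
        return 0.0
    # Every started hour counts in full, so the delay is rounded up.
    started_hours = math.ceil(delay_seconds / 3600)
    # No cap is applied, matching "no maximum penalty".
    return round(started_hours * PENALTY_PER_STARTED_HOUR, 1)


report_deadline = datetime(2025, 11, 12, 23, 59)
print(late_penalty(report_deadline, datetime(2025, 11, 13, 1, 30)))  # 1 h 31 min late
```

In this example the report arrives 1 hour and 31 minutes late, so two hours have started and the deduction is 0.2. A submission one minute late already costs 0.1.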