Data Science Project
Professor: Filipe A. N. Verri
Email: filipe.verri@gp.ita.br
Updates
September 11, 2025: Extra materials about formalization of data handling:
- Towards Scalable Dataframe Systems
- LaraDB: A Minimalist Kernel for Linear and Relational Algebra Computation
September 8, 2025: Tests will be distributed in printed form in class. Students can finish the test at home and bring their answers to the next class on September 29, 2025.
September 4, 2025: Examples of projects from previous years are now available for download: Previous Year Projects Examples. These projects can serve as inspiration and reference for developing your own data science project.
September 4, 2025: I have realized that there is no need to give the test online. Therefore, the test in week 8 will be given in person, as originally planned.
Updates on our schedule (Quarter 1):
- Week 1, Jul 28, No class
- Week 2, Aug 4, History and Mathematical Foundations
- Week 3, Aug 11, Written Test and Fundamental Concepts
- Week 4, Aug 18, Data Science Project
- Week 5, Aug 25, Structured Data
- Week 6, Sep 1, Structured Data and Data Handling
- Week 7, Sep 8, Data Handling
- Week 8, Sep 15, Written Test and Project Discussions
No classes on September 22 (“Semaninha do ITA”).
We start the 2nd quarter on September 29.
July 26, 2025: Classes are confirmed to start on August 4, 2025, in room 209 at ICT Unifesp.
Course Program
Brief history of data science. Fundamental data concepts. Methodologies for data science projects. Structured data, database normalization, and tidy data. Data handling operators and their properties. Learning from data and principles of statistical learning theory. Data preprocessing tasks. Evaluation and validation of data science products.
Course Information
Important: Only graduate students are permitted to enroll in this course.
- Number of students: Approximately 20
- Course load: 3–0–0–4
- Schedule: Mondays, 8:00–11:00, starting August 4, 2025
- Classroom: Unifesp ICT, room 209 (Avenida Cesare Mansueto Giulio Lattes, n° 1201, Eugênio de Mello, São José dos Campos, SP, Brazil)
- Language: All classes will be given in English. Students are encouraged to ask questions in English, but Portuguese is also permitted. All written and oral assignments must be in English.
Prerequisites
- Advanced programming skills
- Strong statistical background
- Machine learning skills
Goals
Provide the theoretical foundation and practical concepts needed to develop an end-to-end data science project for an inductive task.
Teaching Methodology
Expository classes in a regular classroom, using a whiteboard, slide presentations, coding examples, books, and scientific papers. Supplementary didactic materials will be available on this page. The case study will be developed during home study hours, including programming and scientific paper writing.
Assessment
Grading Components
- T₁, T₂: Individual written tests in the 1st quarter
- T₃: Individual written test in the 2nd quarter
- L: Group activity including:
  - Writing a scientific paper (optional)
  - Developing a data science product
  - 30-minute presentation
Final Grade Calculation
Final grades will be calculated as:
√((T₁ + T₂ + T₃)/3 × L)
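As a minimal sketch, the formula above can be read as the geometric mean of the average test grade and the group activity grade L. The function name, the parameter names, and the sample grades below are hypothetical and chosen only for illustration; the grading scale is not stated on this page.

```python
import math


def final_grade(t1: float, t2: float, t3: float, group: float) -> float:
    """Return sqrt(((t1 + t2 + t3) / 3) * group).

    Assumption: `group` stands for L, and all grades are non-negative
    numbers on a common scale (the scale itself is not specified here).
    """
    grades = (t1, t2, t3, group)
    if any(g < 0 for g in grades):
        raise ValueError("grades must be non-negative")
    test_mean = (t1 + t2 + t3) / 3  # arithmetic mean of the three tests
    return math.sqrt(test_mean * group)  # geometric mean with L


# Hypothetical grades, for illustration only.
print(final_grade(8, 8, 8, 8))  # 8.0
print(final_grade(9, 9, 9, 4))  # 6.0
```

Because the two components are multiplied before taking the square root, a low grade in either the tests or the group activity pulls the final grade down more than an arithmetic mean would, and a zero in either component gives a final grade of zero.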
Case Study Project
Ideally, 3 groups will be formed. Each group will be responsible for a case study. Students must choose a real-world problem and develop a data science project, including:
- Data collection
- Data handling
- Inductive learning
- Validation
- Documentation
- Deployment
The results must be presented in a 30-minute presentation. Extra points will be awarded to groups that write a scientific paper about the case study. The trained models must be incorporated into a data science product, such as a web application, a mobile application, or a web service.
Bibliography
- Filipe A. N. Verri (2025). Data Science Project: An Inductive Learning Approach. Victoria, British Columbia, Canada: Leanpub. Available at https://leanpub.com/dsp.
- Nina Zumel & John Mount (2019). Practical Data Science with R.
- Hadley Wickham & Garrett Grolemund (2023). R for Data Science.
Any required extra material will be made available on this page.
Schedule
1st Quarter
Week | Topics |
---|---|
1 | Chapter 1: A brief history of data science and Mathematical foundations |
2 | Written test (60 min) and Chapter 2: Fundamental concepts |
3 | Chapter 3: Data science project |
4-5 | Chapter 4: Structured data |
6-7 | Chapter 5: Data handling |
8 | Written test (60 min) and Project discussions |
2nd Quarter
Week | Topics |
---|---|
1 | Chapter 6: Learning from data |
2 | Chapter 7: Data preprocessing |
3 | Chapter 8: Solution validation |
4 | Project discussions |
5 | Written test (60 min) and Project discussions |
6-7 | Project discussions |
8 | Presentations |
Presentation Details
At most, 3 case studies will be presented per day, with 30 minutes for each presentation and 20 minutes for questions.
A break of 1 week will be observed between the 1st and 2nd quarters.