Hasso-Plattner-Institut
Prof. Dr. Tilmann Rabl
 

ML Systems

Instructors

Ilin Tolovski, Ricardo Salazar Diaz, Nils Strassenburg, Prof. Dr. Tilmann Rabl

The course will be conducted on-site at HPI. The sessions will take place on Wednesdays, 13:30 - 15:00 in room D-E.9/10

Description

As data volumes grow, machine learning (ML) has become a preferred approach for process automation, text generation, image recognition, and many other applications. This growth is accompanied by a substantial increase in the number of algorithms, preprocessing operators, and tools, which makes it challenging to manage and execute ML models and preprocessing pipelines efficiently. In this seminar, we are mainly interested in three aspects of ML systems.

First, we are interested in the efficient execution of ML and analytics pipelines. To this end, we explore the portability of such pipelines to a database management system (DBMS): we analyze the pipeline and express it in a representation that the DBMS can read. We also explore integrating a DBMS into the pipeline's native environment to take advantage of the data engine's processing features and to leverage the DBMS hardware.

Next, we are interested in efficiently managing the parameters and metadata of models in ML systems. Model management is a crucial task in ML systems, and it becomes increasingly complex as models are updated more frequently, the number of individual models increases, and the number of parameters per model grows exponentially. Data management techniques allow us to efficiently manage the update and storage of ML models and their associated pipelines.

Last, we are interested in the performance of ML inference. Inference performance is crucial when a deployed ML pipeline executes many prediction tasks and has a complex architecture, e.g., a deep neural network (DNN). Such applications demand substantial computing and storage resources. In this seminar, we are interested in data management techniques that improve the performance of ML inference pipeline execution.

    Project

    This seminar is structured around project work in the field of machine learning systems. Based on topic proposals provided by the teaching staff, students work in groups of two to develop a project idea, implement it, and evaluate it. Project progress is discussed in weekly meetings with one of the seminar supervisors and is presented to the seminar participants in the form of

    (1) a proposal presentation,

    (2) an intermediate presentation,

    (3) a final presentation.

    At the end of the course, the students should summarize their findings in a written report.

    Paper presentations

    In this course, students will have the opportunity to prepare discussion sessions on state-of-the-art research in machine learning systems. This includes studying a research paper in detail, presenting it to the group, contributing valuable insights, and leading the subsequent discussion. To prepare for this, we will first discuss best practices for reading, writing, and presenting scientific papers. Ideally, the papers presented in our sessions will cover the related work of the chosen project topics. Every week, each student will need to summarize one of the presented papers in a one-pager.

    Grading

    • Project + report - 50%

    • Paper presentation(s) - 20%

    • Intermediate presentation - 10%

    • Final presentation - 15%

    • Participation/Feedback - 5%

    • One-pagers - pass/fail - they will not be included in the final grade

    Announcements

    • The course will be conducted on-site at HPI. The sessions will take place on Wednesdays, 13:30 - 15:00 in room D-E.9/10.

    • All seminar announcements and course materials will be shared through Moodle: HPI Moodle Course.

    • The course is limited to 12 students.

    • If you have any questions, please contact us.

     
