Machine Learning Course

Created: 29.1.2023, Edited: 29.1.2023, Edited: 14.3.2023, Edited: 26.6.2023

In this post, I describe my machine learning course, which I developed during my teacher studies with the purpose of improving my teaching on YouTube.

Figure 1: Machine Learning Course Thumbnail image

Context

I already have experience with creating online courses. One that I'm particularly proud of is my Self-driving car - Javascript course. It has accumulated about 100,000 views on my channel and almost 2 million views on the FreeCodeCamp channel: https://youtu.be/Rs_rAxEsAvI. It has many positive reviews and active engagement (~10% of the people watching it have seen it all the way through). One thing that course is lacking, however, is interaction with the students and giving them feedback. Because the number of students participating online is quite large, I initially thought about giving assignments with automatic evaluation. For that, I took as an example the Full Stack Open course offered by the University of Helsinki through Open University. I pitched the idea to my tutor, Kimmo, and he agreed it could be an interesting case study, so I started looking into how that open course is implemented in practice. As I was doing so, I became less enthusiastic about the impersonal way of providing feedback (automatic evaluation) and my viewpoint started to shift.

In October last year, I met with my tutor and asked what he thought about a more personal way of evaluating at massive scale, with Discord as the discussion platform. Because the number of students can be quite high, I speculated that more advanced students might help the novices out of goodwill, which could be of great help. I believe this to be the case because some interaction is already happening naturally on my Discord server [3]. Perhaps more will happen if the course is planned with interaction in mind... We agreed that this more personal approach is the way to go. We also discussed the possibility of keeping the course open all the time while allowing students to interact on the platform, which shouldn't be a problem. I wanted students to feel they are really taking part in a course, so I asked them to help me generate the data we'll be working with.


My hope is that this makes them feel included: because their data is part of the course, it feels personalized, and the course already starts with a pre-task (interaction).

Learning Outcomes

In this course, I demonstrate how to implement a fully working machine learning system able to recognize drawings. The software is built without using libraries because I find it the best way to study all the inner workings of such a system, and students can practice modern software development practices as well. The learning outcomes fall into three groups: software engineering, mathematics and machine learning.

Software Engineering

  • JavaScript Programming Language [Lessons: 1, 2, 3, 4, 5, the chart lesson and homeworks ≥ 2, 6, 7, 8]
  • Web Application Development [Lessons: 1, 2, 3, 4, 5, the chart lesson and homeworks ≥ 2, 6, 7, 8]
  • Back-end Development [Lessons: 2, 3, 4, 5 and homeworks ≥ 3, 6, 8]
  • Event-based Programming [Lessons: 1, 4, the chart lesson and the homework]
  • Modular Programming [Lessons: 1, 2 and the chart lesson]
  • Data Processing and Visualization [Lessons: 1, 2, 3, 5, the chart lesson and homeworks ≥ 3, 6, 7, 8]
  • Code Reusability [Lessons: 2, 3, 9]
  • Software Testing [Homework: 1]

Mathematics

Machine Learning

  • Feature Extraction [Lessons: 3, 4; Homeworks: 3, 4]
  • Data Scaling [Lesson: 5; Homework: 4]
  • Nearest Neighbor Classification [Lesson: 4]
  • Model Evaluation [Lesson: 7]
  • Decision Boundary Plots [Lesson: 8]
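
To make the Nearest Neighbor Classification outcome above more concrete, here is a minimal sketch written without libraries, in the spirit of the course. It is not the course's actual code: the function names (`distance`, `classifyNearest`), the sample labels and the feature values are all made up for illustration.

```javascript
// Hypothetical helpers, not taken from the course code.

// Euclidean distance between two feature vectors of equal length.
function distance(a, b) {
  let sum = 0;
  for (let i = 0; i < a.length; i++) {
    sum += (a[i] - b[i]) ** 2;
  }
  return Math.sqrt(sum);
}

// Returns the label of the training sample closest to `point`,
// or null when there are no samples.
function classifyNearest(point, samples) {
  let best = null;
  let bestDist = Infinity;
  for (const sample of samples) {
    const d = distance(point, sample.features);
    if (d < bestDist) {
      bestDist = d;
      best = sample;
    }
  }
  return best === null ? null : best.label;
}

// Made-up samples: [width, height] of a drawing, in pixels.
const samples = [
  { features: [200, 80], label: "car" },
  { features: [60, 180], label: "tree" },
  { features: [150, 150], label: "house" },
];

console.log(classifyNearest([190, 90], samples)); // "car"
```

Because the classifier relies only on distances, features with a larger numeric range dominate the result, which is one reason data scaling appears in the list as well.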

Students will learn the concepts above by coding along. In addition, they will be able to demonstrate their understanding of the topics by doing the homework task(s) presented at the end of each lesson. The learning outcomes considered in each lesson are specified above (they will be updated as the course progresses). The first homework is already out; it is about Software Testing (students participate by filling in this spreadsheet). The second homework is to style the data viewer page in a different way and share updates on Discord [3]. The third homework is to extract two other features from the data. The chart lesson has a more demanding homework, described here. The fourth homework is to account for rotation when computing the width and height features. The fifth homework is to implement standardization and apply it to the data. The 'reward' for doing these homeworks is that selected submissions (the quickest or best ones) will be showcased in Phase 2 of the course.
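
For readers unfamiliar with standardization (the topic of the fifth homework), the idea is to rescale each feature so that it has zero mean and unit standard deviation: z = (x - mean) / std. The following is only a rough sketch of one possible approach, not a reference solution. The names `computeStats` and `standardize` are hypothetical, the data is invented, and the population standard deviation (dividing by n) is an assumption.

```javascript
// Hypothetical helpers, not taken from the course code.

// Computes the per-feature mean and population standard deviation
// of a non-empty array of feature vectors.
function computeStats(points) {
  const n = points.length;
  const dims = points[0].length;
  const mean = new Array(dims).fill(0);
  const std = new Array(dims).fill(0);
  for (const p of points) {
    for (let i = 0; i < dims; i++) mean[i] += p[i];
  }
  for (let i = 0; i < dims; i++) mean[i] /= n;
  for (const p of points) {
    for (let i = 0; i < dims; i++) std[i] += (p[i] - mean[i]) ** 2;
  }
  for (let i = 0; i < dims; i++) std[i] = Math.sqrt(std[i] / n);
  return { mean, std };
}

// Returns new points with each feature mapped to (x - mean) / std.
// A constant feature (std of 0) is mapped to 0 to avoid dividing by zero.
function standardize(points, stats) {
  return points.map((p) =>
    p.map((v, i) => (stats.std[i] === 0 ? 0 : (v - stats.mean[i]) / stats.std[i]))
  );
}

// Made-up [width, height] features.
const training = [
  [100, 10],
  [200, 20],
  [300, 30],
];
const stats = computeStats(training);
const scaled = standardize(training, stats);
console.log(scaled);
```

One design choice worth noting: the statistics are computed once from the training data and then reused, so that any later testing data can be transformed with the same mean and standard deviation.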

YouTube

Before I discuss my considerations when making a YouTube course, it is important to note that the success of a YouTube video depends largely on the algorithm [1]. Even if you have the best video in the world, if YouTube doesn't decide to share it, it will remain unseen. The algorithm works in mysterious ways: it changes constantly as users interact with the platform. If YouTube decides to share a video today, it may not decide to do it tomorrow and so on. However, some patterns are clear. For example, when a new video is uploaded, the first minutes are important. If people engage with it in a positive way (like, comment, subscribe), YouTube will show the video to more people and so on.

This algorithm poses a problem when publishing a course on YouTube. The first video is usually well received; however, the second video in a series typically does significantly worse. This is because the second video will likely be presented to people who haven't seen the first one, and their immediate reaction is to close it. Even if the viewer then goes to watch video 1, the signal that video 2 is not interesting has already been sent, and the damage is done.

Table 1. View count of each lesson on 14.3.2023.

Lesson    View Count
1         6.5K
2         3.0K
3         1.9K
Chart     2.2K
4         1.2K
5         726

(updated 26.6.2023)

Lesson    View Count
1         11K
2         4.4K
3         2.8K
Chart     3.6K
4         1.9K
5         1.1K
6         1.2K
7         1.2K
8         1K
9         1.2K

This observation is important. It means that YouTube is not the best platform for publishing courses, that is, video lectures that are related to each other. I counteracted this in a number of ways. First, I kept the data collected in the pre-task a secret in video 1. I will reveal it in video 2 (scheduled on 3.2.2023). I hope that it will gather enough interest from those who did the pre-task to give YouTube a positive signal. I speculate video 2 may actually do better than video 1, because when students notice their drawings they may be inclined to comment more. I was wrong, partially at least. The video is currently the second most viewed in the series with 3K views, compared to lesson 1 at 6.5K views.

Another technique I used is including a standalone video in the middle of the course. That video teaches how to implement a general chart component. Users who land on this video won't feel the need to click away. Moreover, that standalone video promotes the machine learning course, so if it becomes a 'hit' there is a double benefit. The downward trend at the moment is noticeable (see Table 1), except for the Chart lesson, which does not follow it. I expect this lesson to gain more attention in time, especially after I publish the course on FreeCodeCamp (planned for April), because that version does not have this optional lesson, and those students who want to learn more will visit my channel for this specific video.

Finally, the course can roughly be divided into two parts: software engineering (first half) and machine learning (second half), where we study learning methods using the tools we built in the first part. This means that there is a topical shift somewhere halfway, and people who are not interested in building the tools (presumably because they already know how they work) may start watching halfway through the course. There is no evidence of this so far.

This was not the only way to approach publishing a course on YouTube. Alternatively, the entire course could have been presented as a single, long video. I disagree with the popular opinion that short videos are better [2]. I think the reason they perform better is that they are usually of better quality: it is much easier to make a high-quality short video than a high-quality long video. I base this claim on the fact that the best-performing video on my channel is 12 hours long. On YouTube, long videos can be split into chapters, so they feel less overwhelming, and viewers can skip to a given chapter if they want. Moreover, some viewers complained when I published the first video, saying they wouldn't follow along because they would need to wait between the lectures. This lost me potential viewers. However, with the long-video strategy, videos appear on the channel less frequently (every few months instead of every week). This impedes channel growth, especially for small channels like my own.

In conclusion, YouTube is not ideal for publishing courses, because it promotes each video independently, while video lectures usually relate to each other. However, by organizing the course to have some independent videos and shifting the topic in just the right way, the course can become an overall success [remains to be seen...].

Student Participation

The participation at the moment (14.3.2023) is as follows. The pre-task (collecting the data = drawing the 8 items) has been completed by a total of 654 students, which is amazing! The homework participation, on the other hand, is well below my expectations (see Table 2). The only homework attempted so far is Homework 1, which involves software testing; none of the coding tasks have been attempted. When I asked students on Discord [3] for a reason, I learned that they either have trouble finding time for the tasks or find the tasks too difficult. I believe both of these to be valid reasons: the former because this course is entirely voluntary and there are no credits or certificates associated with it, and the latter... in hindsight, I believe it to be true. All I can say is that it is hard for me to estimate the level of my YouTube audience (I have been told by other teachers that I publish advanced content when I consider it intermediate or below). This causes some difficulties now, when preparing Phase 2 of the course (to air sometime at the end of May). For that, I planned to use student submissions as examples and as a way to make the course even more interactive, but at the moment this doesn't seem realistic. I believe, however, that the situation can still change, especially after submitting the course to FreeCodeCamp in April. Then significantly more students will gain access to it, increasing the probability of participation overall. I plan to reassess the situation in May.

Table 2. Student participation at each homework on 14.3.2023.

Homework    Submissions
Pre-task    654
1           6
2           0
Chart       0
3           0
4           0

(updated 26.6.2023 after publishing on FreeCodeCamp).

Homework    Submissions
Pre-task    716
1           38
2           1
Chart       0
3           1
4           1
5           0
6           0
7           1
8           1

For a detailed analysis of homework submissions, please watch Phase 2 of the course, starting HERE.


REFERENCES
  1. Arthurs, Jane, Sophia Drakopoulou, and Alessandro Gandini. 2018. "Researching YouTube." Convergence 24, no. 1: 3-15.
  2. https://greenbuzzagency.com/short-video-vs-long-video-optimizing-video-length
  3. https://discord.gg/tJh3bfWq