So it was time to study for uni exams, which meant I needed to revise the content of all my lectures. But wait! The lecture slides are only accessible through a convoluted website system called Canvas LMS.
- The system only allows one PDF to be downloaded at a time
- Each PDF requires 2 link clicks to download
- Too many mouse clicks:
40 PDFs * 2 links * 4 courses = 320 clicks!
I read somewhere that automatically browsing and extracting information from webpages (web scraping) is a legally debatable subject. However, my reasoning is that the information I’m scraping is intended to be downloaded by me anyway, so I’m not breaking any rules.
This seemed like a perfect opportunity to use scraping to solve the problem.
The implementation is a Python script that reads the userId, password and course as arguments and then runs the download. The stack used was:
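To make the argument handling concrete, here is a minimal sketch using the standard library's `argparse`. The function name `parse_args` and the argument names `user_id`, `password` and `course` are my own placeholders for illustration and may not match what the actual script uses.

```python
import argparse


def parse_args(argv=None):
    """Parse the three inputs mentioned above: user id, password and course.

    The argument names are hypothetical, chosen only for this sketch.
    Passing argv=None makes argparse read from sys.argv.
    """
    parser = argparse.ArgumentParser(
        description="Download lecture PDFs for one course"
    )
    parser.add_argument("user_id", help="university login id")
    parser.add_argument("password", help="login password")
    parser.add_argument("course", help="name or code of the course to download")
    return parser.parse_args(argv)
```

Calling `parse_args(["a1234567", "secret", "COMP1001"])` returns a namespace with `user_id`, `password` and `course` attributes. Taking the password as a plain argument is convenient but leaves it in the shell history, so prompting with `getpass` would be a reasonable alternative.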
A typical example for scraping using robobrowser looks like the following:
# Get lecture module links
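As a rough sketch of what this step can look like with robobrowser: log in through a form, open the modules page, then collect the links that point at PDFs. The URLs, the form field names (`username`, `password`) and the "href ends in .pdf" rule are all assumptions made for illustration, and the helper names `pdf_links` and `get_lecture_pdfs` are hypothetical. The real site's markup will likely differ.

```python
from urllib.parse import urljoin


def pdf_links(anchors, base_url):
    """Return absolute URLs for anchors whose href looks like a PDF.

    `anchors` can be any objects with a dict-style .get("href"),
    such as the tags returned by browser.select("a").
    """
    urls = []
    for anchor in anchors:
        href = anchor.get("href")
        # Ignore any query string when checking the extension.
        if href and href.split("?")[0].lower().endswith(".pdf"):
            urls.append(urljoin(base_url, href))
    return urls


def get_lecture_pdfs(login_url, modules_url, user_id, password):
    """Log in, open the modules page and return the PDF links found there."""
    # Imported here so pdf_links() can be used without robobrowser installed.
    from robobrowser import RoboBrowser

    browser = RoboBrowser(parser="html.parser", history=True)

    # Log in. Assumes the first form on the page is the login form and
    # that its fields are named "username" and "password".
    browser.open(login_url)
    form = browser.get_form()
    form["username"].value = user_id
    form["password"].value = password
    browser.submit_form(form)

    # Get lecture module links
    browser.open(modules_url)
    return pdf_links(browser.select("a"), browser.url)
```

Keeping the link filtering in its own function means it can be checked against a saved copy of the page without logging in each time.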
It was fun to learn the basics of web scraping, and this little script saved me countless hours of mindlessly downloading course content. It has saved others the trouble too.
Full Source Code: https://github.com/benwinding/myuni-dl