It was time to study for uni exams, which meant revising the content of every lecture. But wait! The lecture slides are only accessible through a convoluted website system called Canvas LMS.
Problem
- The system only allows one PDF to be downloaded at a time
- Each PDF takes 2 link clicks to download
- Too many mouse clicks:
40 PDFs * 2 clicks * 4 courses = 320 clicks!
Solution: Scrape it
I read somewhere that automatically browsing and extracting information from webpages (web scraping) is a legally debatable subject. However, my reasoning is that the information I'm scraping is intended to be downloaded by me anyway, so I'm not breaking any rules.
So, I thought this would be a perfect opportunity to use scraping to solve the problem.
Pseudo Code
Begin; |
Implementation
The implementation is a Python script that reads the user ID, password and course as command-line arguments, then logs in and downloads the PDFs. The stack used was:
- Python3
- robobrowser
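To give a feel for the argument handling, here is a minimal sketch using the standard library's argparse. The argument names (user_id, password, course) and the build_parser helper are my own placeholders for illustration and may not match what the actual script uses.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser for the three values the script needs."""
    parser = argparse.ArgumentParser(
        description="Download lecture PDFs for one course (illustrative sketch)."
    )
    # These argument names are assumptions, not the real script's interface.
    parser.add_argument("user_id", help="login user ID")
    parser.add_argument("password", help="login password")
    parser.add_argument("course", help="course name or code to download")
    return parser
```

Calling build_parser().parse_args() at the top of a script would then expose args.user_id, args.password and args.course. Passing a password on the command line leaves it in the shell history, so prompting with getpass.getpass() may be a safer choice.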
Example
A typical example for scraping using robobrowser looks like the following:
# Get lecture module links |
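The link-gathering step can also be sketched without robobrowser. The snippet below is a stand-in that uses only the standard library's html.parser, not the code from the script itself: it collects every anchor whose href ends in .pdf and resolves it against the page URL. The names PdfLinkParser and find_pdf_links are hypothetical, and the assumption that download links end in .pdf may not hold on a real Canvas page.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class PdfLinkParser(HTMLParser):
    """Collect absolute URLs of <a> tags that point at PDF files."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.pdf_links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        # Assumption: PDF links end in ".pdf" once any query string is ignored.
        if href and href.lower().split("?")[0].endswith(".pdf"):
            self.pdf_links.append(urljoin(self.base_url, href))


def find_pdf_links(html, base_url):
    """Return all PDF links found in an HTML string, in document order."""
    parser = PdfLinkParser(base_url)
    parser.feed(html)
    return parser.pdf_links
```

Given the HTML of a module page and that page's URL, find_pdf_links returns a list of absolute URLs that a download loop could then fetch one by one.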
Conclusion
It was fun to learn the basics of web scraping. This little script saved me countless hours of mindlessly downloading course content, and it has saved others the trouble too.
Full Source Code: https://github.com/benwinding/myuni-dl