uvic-course-scraper
UVic Course Scraper is a Node.js library that parses information from the University of Victoria (UVic) course calendar and course schedule information sources. It uses Cheerio under the hood to parse HTML.
As a developer, you can use it to parse HTML and JSON from Kuali and BAN1P that you have retrieved by any method, such as fetch.
Install
npm install @vikelabs/uvic-course-scraper
API
The following table describes the methods available on the object generated by UVicCourseScraper().
| Method | Description |
|--------|-------------|
| getAllCourses() | Returns KualiCourseCatalog[] with all active courses in the Kuali catalog |
| getCourseDetails(pid: string) | Returns KualiCourseItem with details for the course with the given pid |
| getCourseSections(subject: string, code: string, term: string) | Returns ClassScheduleListing[] with section details for all sections of the course in the given term |
| getSectionSeats(term: string, crn: string) | Returns DetailedClassInformation with the seats and waitListSeats for the course |
Example
const { UVicCourseScraper } = require('@vikelabs/uvic-course-scraper');
// get all courses from the Kuali course catalog
const allCourses: KualiCourseCatalog[] = await UVicCourseScraper.getAllCourses();
const courseTitle: string = allCourses[0].title;
// get course details for course with pid 'ByS23Pp7E' (in this case that's ACAN 225)
const courseDetails: KualiCourseItem = await UVicCourseScraper.getCourseDetails('ByS23Pp7E');
const courseDescription: string = courseDetails.description;
const courseLectureHours: string = courseDetails.hoursCatalogText.lecture;
// get course sections for CSC 111 in spring 2021
const courseSections: ClassScheduleListing[] = await UVicCourseScraper.getCourseSections('202101', 'CSC', '111');
const courseSectionCode: string = courseSections[0].sectionCode;
// get seats for course section with CRN 10953 in spring 2021 (in this case that's ECE 260 - A01)
const sectionSeats: DetailedClassInformation = await UVicCourseScraper.getSectionSeats('202101', '10953');
const sectionTotalSeats: number = sectionSeats.seats.capacity;
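Once the seat data is available, a small helper can turn it into an availability status. The following is a minimal sketch, not part of the library: only seats.capacity and waitListSeats appear in the example above, so the actual and remaining fields, the interface names, and the sectionAvailability function are all assumptions made for illustration.

```typescript
// Hypothetical shapes: `actual` and `remaining` are assumed field names,
// only `seats.capacity` and `waitListSeats` are named in this README.
interface SeatCounts {
  capacity: number;
  actual: number;
  remaining: number;
}

interface SectionSeatInfo {
  seats: SeatCounts;
  waitListSeats: SeatCounts;
}

type Availability = 'open' | 'waitlist' | 'full';

// Classify a section by whether a seat or a waitlist spot is still free.
function sectionAvailability(info: SectionSeatInfo): Availability {
  if (info.seats.remaining > 0) return 'open';
  if (info.waitListSeats.remaining > 0) return 'waitlist';
  return 'full';
}

// Made-up numbers for illustration only.
const sampleSeats: SectionSeatInfo = {
  seats: { capacity: 100, actual: 100, remaining: 0 },
  waitListSeats: { capacity: 20, actual: 5, remaining: 15 },
};

console.log(sectionAvailability(sampleSeats)); // prints "waitlist"
```

If the real DetailedClassInformation type uses different field names, only the two interfaces need to change.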
Developing
- Clone the repo: git clone https://github.com/VikeLabs/uvic-course-scraper.git
- Run npm install
- Optionally, experiment with example.ts using npx ts-node-dev src/example/example.ts to get a feel for how cheerio and RegEx work on the type of sites our project is scraping.
- Find an unassigned task on ZenHub to work on.
- Create a new branch using git checkout -b <branch-name> (make sure it's up to date with master)
- Commit the changes you've made and push to GitHub to create a Pull Request.
Testing
This project uses the Jest testing framework. You can execute tests by running npm test.
This will run Jest on files whose names match *.test*.
npx jest --watch will put Jest into watch mode, which re-runs tests as files change.
Developer Tools
This repository contains a CLI to make development-related tasks easier.
npm run dump -- --term 202009 --type courses
- Dumps the course details for the 202009 term.
- Outputs to a courses.json file.
npm run dump -- --term 202009 --type schedules
- Dumps the schedule details for all 202009 term classes.
- These schedule details correspond to the Class Schedule Listing page view on BAN1P.
- This command can only be run after dumping courses data.
npm run dump -- --term 202009 --type class --crn 10953
- Dumps the HTML of a "Detailed Class Information" page for a given term and CRN.
npm run dump -- --term 202009 --type sections
- Dumps the section details for all 202009 term classes by CRN.
- This command can only be run after dumping schedules data.
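The notes above imply an order: courses first, then schedules, then sections. As a rough sketch, the command lines for a full dump of one term could be generated in that order. The dumpCommands helper and the DUMP_ORDER constant are hypothetical names introduced here; only the npm run dump invocations themselves come from this README.

```typescript
type DumpType = 'courses' | 'schedules' | 'sections';

// Order taken from the notes above: schedules need the courses dump,
// and sections need the schedules dump.
const DUMP_ORDER: DumpType[] = ['courses', 'schedules', 'sections'];

// Build the dump command lines for one term, in dependency order.
function dumpCommands(term: string): string[] {
  return DUMP_ORDER.map((type) => `npm run dump -- --term ${term} --type ${type}`);
}

// Joining with && stops the chain if an earlier dump fails.
console.log(dumpCommands('202009').join(' && '));
```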
Target Pages
The following are some of the pages we are currently parsing.
Schedule Information (BAN1P)
Class Schedule Listing
Class Schedule Listing - ECE 260 - 202009
This page is the source for information about a specific class, such as the term dates, location, and CRN. You can change the query string parameters term_in, subj_in, and crse_in to view other class listings. For example, you could set them to 202101, CHEM, and 101 respectively.
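To make the parameter substitution concrete, here is a small sketch that builds the query string for a Class Schedule Listing request. The parameter names term_in, subj_in, and crse_in come from the text above; the classScheduleListingQuery function is a hypothetical helper, and the page's base URL is deliberately left out because it is not reproduced in this README.

```typescript
// Build the query string for a Class Schedule Listing page.
// The function name is hypothetical; the parameter names come from the README.
function classScheduleListingQuery(term: string, subject: string, course: string): string {
  const params: Array<[string, string]> = [
    ['term_in', term],
    ['subj_in', subject],
    ['crse_in', course],
  ];
  // Encode each value so unexpected characters cannot break the URL.
  return params
    .map(([key, value]) => `${key}=${encodeURIComponent(value)}`)
    .join('&');
}

console.log(classScheduleListingQuery('202101', 'CHEM', '101'));
// prints "term_in=202101&subj_in=CHEM&crse_in=101"
```

The result would be appended after a "?" to the listing page's URL.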
Detailed Class Information
This page is the source for information about a specific section of a class, such as the class and waitlist capacity. You can change the parameters term_in and crn_in to view other sections. For example, you could set them to 202101 and 12345 respectively.
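Both pages take a term code such as 202101, which this README describes as spring 2021. The sketch below decodes such a code into a year and a season. It assumes the code is a four-digit year followed by a two-digit month, and that 05 and 09 mean summer and fall; only the 01 (spring) case is confirmed by the examples in this README. The parseTermCode function and its types are hypothetical.

```typescript
type Season = 'spring' | 'summer' | 'fall';

interface ParsedTerm {
  year: number;
  season: Season;
}

// Decode a term code such as "202101" into its year and season.
// Assumption: the last two digits are 01 (spring), 05 (summer), or 09 (fall).
function parseTermCode(term: string): ParsedTerm {
  if (!/^\d{6}$/.test(term)) {
    throw new Error(`Unrecognized term code: ${term}`);
  }
  const year = Number(term.slice(0, 4));
  const month = term.slice(4);
  switch (month) {
    case '01':
      return { year, season: 'spring' };
    case '05':
      return { year, season: 'summer' };
    case '09':
      return { year, season: 'fall' };
    default:
      throw new Error(`Unrecognized term month: ${month}`);
  }
}

const springTerm = parseTermCode('202101');
console.log(`${springTerm.season} ${springTerm.year}`); // prints "spring 2021"
```

Validating the code before building a request gives an early, readable error for a mistyped term.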
Course Information (Kuali)
The course information from this source is mostly JSON already, so this library does little with it; it is mainly used to create a list of courses for other processes. However, some parsing is done: the preAndCorequisites field is HTML, so we intend to parse it.
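As an illustration of what parsing an HTML requisites field could involve, the sketch below pulls course codes out of an HTML string using regular expressions. It is not the library's implementation: the sample HTML is made up, the helper names are hypothetical, and the pattern assumes codes look like two to four capital letters, a space, and three digits. A real parser would more likely use Cheerio to walk the markup.

```typescript
// Remove tags and collapse whitespace so the requisite text is readable.
function stripTags(html: string): string {
  return html.replace(/<[^>]*>/g, ' ').replace(/\s+/g, ' ').trim();
}

// Collect unique codes shaped like "CSC 110" (assumed pattern).
function extractCourseCodes(html: string): string[] {
  const matches = stripTags(html).match(/\b[A-Z]{2,4} \d{3}\b/g);
  return matches === null ? [] : Array.from(new Set(matches));
}

// Made-up markup for illustration; the real field may be structured differently.
const sampleRequisites =
  '<ul><li>Complete all of: <a href="#">CSC 110</a></li>' +
  '<li><a href="#">MATH 100</a></li></ul>';

// Logs the two codes found in the sample: CSC 110 and MATH 100.
console.log(extractCourseCodes(sampleRequisites));
```

Regex extraction like this loses the all-of / one-of structure of the requirements, which is why it is only a starting point.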
This is the JSON file, which contains basic information about every course currently offered and some courses that were offered recently.
To get more detailed information about a course, one must make another request using the pid value from the above JSON.
This contains detailed information about a class, such as:
- Description
- Requirements
- Pre and co-requisites
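The two-step flow described above (find a course in the catalog JSON, then request its details by pid) can be sketched as follows. The CatalogEntry and CourseDetails shapes are pared-down assumptions, detailsByTitle is a hypothetical helper, and the detail request is passed in as a function so that the sketch does not assume any particular URL or HTTP client.

```typescript
// Hypothetical minimal shapes; the real catalog and detail objects have more fields.
interface CatalogEntry {
  pid: string;
  title: string;
}

interface CourseDetails {
  pid: string;
  description: string;
}

// The second request is injected, so any fetching method can be plugged in.
type DetailFetcher = (pid: string) => Promise<CourseDetails>;

// Step 1: find the catalog entry. Step 2: request its details using the pid.
async function detailsByTitle(
  catalog: CatalogEntry[],
  title: string,
  fetchDetails: DetailFetcher
): Promise<CourseDetails | undefined> {
  const entry = catalog.find((course) => course.title === title);
  return entry === undefined ? undefined : fetchDetails(entry.pid);
}

// Made-up data and a stub fetcher for illustration.
const sampleCatalog: CatalogEntry[] = [
  { pid: 'pid-1', title: 'Example Course A' },
  { pid: 'pid-2', title: 'Example Course B' },
];
const stubFetcher: DetailFetcher = async (pid) => ({ pid, description: `Details for ${pid}` });

detailsByTitle(sampleCatalog, 'Example Course B', stubFetcher).then((details) => {
  console.log(details?.description); // prints "Details for pid-2"
});
```

In real use, the stub would be replaced by a function that performs the second request and parses its response.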