README
nrk-sapmi-crawler
Crawler for NRK Sapmi news bulletins. The crawled text will be the basis for stopword-sami and for an example search engine for content in Sami.
It crawls news bulletins in Northern Sami, Lule Sami and South Sami.
Getting a list of article IDs to crawl
import { getList, readIfExists, calculateListAndWrite, fetchOptions } from '../index.js'
const southSami = {
  id: '1.13572943',
  languageName: 'Åarjelsaemien',
  url: 'https://www.nrk.no/serum/api/content/json/1.13572943?v=2&limit=1000&context=items',
  file: './lib/list.southSami.json'
}
// To change user-agent for the crawler
// fetchOptions['user-agent'] = 'name of crawler/version - comment (e.g. contact info)'
// Bringing it all together, fetching URL and reading file, and if new content -> merging arrays and writing
Promise.all([getList(southSami.url, fetchOptions), readIfExists(southSami.file).catch(e => e)])
  .then((data) => {
    calculateListAndWrite(data, southSami.id, southSami.file, southSami.languageName)
  })
  .catch(function (err) {
    console.log('Error: ' + err)
  })
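
The "if new content -> merging arrays" step can be illustrated with a small standalone sketch. This is not the code inside calculateListAndWrite. The helper names mergeIds and toIdArray are hypothetical, and the sketch assumes the list file holds a flat array of article ID strings, which this README does not confirm. It also shows one way to handle the case where readIfExists(...).catch(e => e) resolves to an error because the file does not exist yet.

```javascript
// Hypothetical helper, not part of nrk-sapmi-crawler: merges newly fetched
// article IDs into the IDs already stored, keeping first-seen order.
function mergeIds (storedIds, fetchedIds) {
  const seen = new Set(storedIds)
  const merged = [...storedIds]
  for (const id of fetchedIds) {
    if (!seen.has(id)) {
      seen.add(id)
      merged.push(id)
    }
  }
  // Report whether anything was added so the caller can skip the file write
  return { merged, changed: merged.length > storedIds.length }
}

// Hypothetical helper: a failed file read that was caught with
// .catch(e => e) yields an Error instead of an array, so fall back to []
function toIdArray (value) {
  return Array.isArray(value) ? value : []
}

// Assumed data shape: plain ID strings (made-up IDs, for illustration only)
const stored = toIdArray(new Error('file not found'))
const result = mergeIds(stored, ['1.100', '1.101', '1.100'])
console.log(result.merged, result.changed)
```

Returning a changed flag alongside the merged array lets the caller write the file only when the fetch actually brought in new IDs.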