nrk-sapmi-crawler

Crawler for NRK Sapmi news bulletins that will be the basis for Sami stopword lists and an example search engine for content in Sami.

Usage (no npm install needed):

<script type="module">
  import nrkSapmiCrawler from 'https://cdn.skypack.dev/nrk-sapmi-crawler';
</script>


The crawler collects news bulletins in Northern Sami, Lule Sami and South Sami. The collected text is meant to feed stopword-sami and an example search engine for content in Sami.

Getting a list of article IDs to crawl

import { getList, readIfExists, calculateListAndWrite, fetchOptions } from '../index.js'

const southSami = {
 id: '1.13572943',
 languageName: 'Åarjelsaemien',
 url: 'https://www.nrk.no/serum/api/content/json/1.13572943?v=2&limit=1000&context=items',
 file: './lib/list.southSami.json'
}

// To change user-agent for the crawler
// fetchOptions['user-agent'] = 'name of crawler/version - comment (e.g. contact-info)'

// Bringing it all together: fetch the list from the URL and read the existing file in parallel; if there is new content, merge the arrays and write the file
Promise.all([getList(southSami.url, fetchOptions), readIfExists(southSami.file).catch(e => e)])
 .then((data) => {
   calculateListAndWrite(data, southSami.id, southSami.file, southSami.languageName)
 })
 .catch(function (err) {
   console.log('Error: ' + err)
 })
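
The "if new content -> merging arrays and writing" step above can be pictured roughly as follows. This is only a sketch of the idea, not the actual code behind calculateListAndWrite: the helper name mergeIds, its return shape, and the article IDs are all hypothetical and made up for illustration.

```javascript
// Hypothetical helper, NOT part of nrk-sapmi-crawler: appends fetched IDs
// that are not already known, keeping the existing order intact.
function mergeIds (existing, fetched) {
  const seen = new Set(existing)
  const added = []
  for (const id of fetched) {
    if (!seen.has(id)) {
      seen.add(id)
      added.push(id)
    }
  }
  return { merged: [...existing, ...added], added }
}

// Made-up IDs, for illustration only
const onDisk = ['1.100', '1.101']
const fromApi = ['1.101', '1.102', '1.103']

const result = mergeIds(onDisk, fromApi)
console.log(result.merged)

// Only write the file when something new was found
console.log(result.added.length > 0 ? 'new content, write file' : 'nothing new')
```

Returning the added IDs separately makes it cheap to decide whether the list file needs to be rewritten at all.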