😎 @web-master/node-web-fetch 😎
Fetch web data as easily as possible
Description
It is the combination of @web-master/node-web-crawler and @web-master/node-web-scraper.
It can:
- FETCH
  - SCRAPE
    - It scrapes the specific page
    - It gathers data from the page according to the ScrapeConfig
  - CRAWL
    - It scrapes the specific page and gathers links
    - It crawls the links and scrapes each page of the link
    - It gathers data from each page according to CrawlConfig
Installation
$ npm install --save @web-master/node-web-fetch
Usage
Single Page Scraping
Basic
import fetch from '@web-master/node-web-fetch';

const data = await fetch({
  target: 'http://example.com',
  fetch: {
    title: 'h1',
    info: {
      selector: 'p > a',
      attr: 'href',
    },
  },
});

console.log(data);
// {
//   title: 'Example Domain',
//   info: 'http://www.iana.org/domains/example'
// }
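The shape of the `fetch` option in the example above can be sketched as a type: each key maps either to a CSS selector string or to an object with `selector` and an optional `attr`. The names `FieldSpec`, `FetchFields`, `SinglePageOptions` and `normalizeField` are hypothetical and inferred only from this README; the types the package actually exports may differ.

```typescript
// Hypothetical types inferred from the README example; not the package's own definitions.
type FieldSpec = string | { selector: string; attr?: string };
type FetchFields = Record<string, FieldSpec>;

interface SinglePageOptions {
  target: string;
  fetch: FetchFields;
}

// Bring both field forms to a single { selector, attr? } shape.
function normalizeField(spec: FieldSpec): { selector: string; attr?: string } {
  return typeof spec === 'string' ? { selector: spec } : spec;
}

const options: SinglePageOptions = {
  target: 'http://example.com',
  fetch: {
    title: 'h1',
    info: { selector: 'p > a', attr: 'href' },
  },
};

// Normalize every field so later code only has to handle one shape.
const normalized: Record<string, { selector: string; attr?: string }> = {};
for (const key of Object.keys(options.fetch)) {
  normalized[key] = normalizeField(options.fetch[key]);
}

console.log(normalized);
```

Typing the config this way lets the compiler catch a misspelled `selector` or `attr` key before any request is made.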
Waitable (by using puppeteer)
import fetch from '@web-master/node-web-fetch';

const data = await fetch({
  target: 'http://example.com',
  waitFor: 3 * 1000, // wait for the content to load (e.g. single-page apps)
  fetch: {
    title: 'h1',
    info: {
      selector: 'p > a',
      attr: 'href',
    },
  },
});

console.log(data);
// {
//   title: 'Example Domain',
//   info: 'http://www.iana.org/domains/example'
// }
Multi-Page Crawling
You Already Know the Target URLs
import fetch from '@web-master/node-web-fetch';

const pages = await fetch({
  target: [
    'https://example1.com',
    'https://example2.com',
    'https://example3.com',
  ],
  fetch: () => ({
    title: 'h1',
  }),
});

console.log(pages);
// [
//   { title: 'An easiest crawling and scraping module for NestJS' },
//   { title: 'A minimalistic boilerplate on top of Webpack, Babel, TypeScript and React' },
//   { title: '[Experimental] React SSR as a view template engine' }
// ]
You Don't Know the Target URLs and Want to Crawl Dynamically
import fetch from '@web-master/node-web-fetch';

const pages = await fetch({
  target: {
    url: 'https://news.ycombinator.com',
    iterator: {
      selector: 'span.age > a',
      convert: (x) => `https://news.ycombinator.com/${x}`,
    },
  },
  fetch: () => ({
    title: '.title > a',
  }),
});

console.log(pages);
// [
//   { title: 'An easiest crawling and scraping module for NestJS' },
//   { title: 'A minimalistic boilerplate on top of Webpack, Babel, TypeScript and React' },
//   ...
//   ...
//   { title: '[Experimental] React SSR as a view template engine' }
// ]
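The `convert` callback above builds each URL by string concatenation, which can produce a double slash if a link already starts with `/`. A minimal sketch of a more forgiving converter follows, assuming `convert` receives the raw `href` string of each matched link (this README does not state what the argument is). `toAbsoluteUrl` and `BASE` are hypothetical names; `URL` is the standard WHATWG URL class available globally in Node.js.

```typescript
// Assumption: the value passed to `convert` is the link's raw href string.
const BASE = 'https://news.ycombinator.com';

// Hypothetical helper: resolve an href against a base URL.
function toAbsoluteUrl(href: string, base: string = BASE): string {
  // new URL(input, base) resolves relative paths and keeps absolute URLs unchanged.
  return new URL(href, base).href;
}

console.log(toAbsoluteUrl('item?id=1'));
// https://news.ycombinator.com/item?id=1
console.log(toAbsoluteUrl('/item?id=1'));
// https://news.ycombinator.com/item?id=1
console.log(toAbsoluteUrl('https://example.com/page'));
// https://example.com/page
```

Under that assumption it could be passed as `convert: (x) => toAbsoluteUrl(x)` in the `iterator` config.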
Waitable (by using puppeteer)
import fetch from '@web-master/node-web-fetch';

const pages = await fetch({
  target: {
    url: 'https://news.ycombinator.com',
    iterator: {
      selector: 'span.age > a',
      convert: (x) => `https://news.ycombinator.com/${x}`,
    },
  },
  waitFor: 3 * 1000, // wait for the content to load (e.g. single-page apps)
  fetch: () => ({
    title: '.title > a',
  }),
});

console.log(pages);
// [
//   { title: 'An easiest crawling and scraping module for NestJS' },
//   { title: 'A minimalistic boilerplate on top of Webpack, Babel, TypeScript and React' },
//   ...
//   ...
//   { title: '[Experimental] React SSR as a view template engine' }
// ]
TypeScript Support
import fetch from '@web-master/node-web-fetch';

interface HackerNewsPage {
  title: string;
}

const pages: HackerNewsPage[] = await fetch({
  target: {
    url: 'https://news.ycombinator.com',
    iterator: {
      selector: 'span.age > a',
      convert: (x) => `https://news.ycombinator.com/${x}`,
    },
  },
  fetch: () => ({
    title: '.title > a',
  }),
});

console.log(pages);
// [
//   { title: 'An easiest crawling and scraping module for NestJS' },
//   { title: 'A minimalistic boilerplate on top of Webpack, Babel, TypeScript and React' },
//   ...
//   ...
//   { title: '[Experimental] React SSR as a view template engine' }
// ]
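The `HackerNewsPage[]` annotation above is only checked at compile time; scraped data can still be missing a field at runtime if a selector matches nothing. One way to guard against that is a type predicate, sketched below. `isHackerNewsPage` and `assertPages` are hypothetical helpers and not part of this package.

```typescript
interface HackerNewsPage {
  title: string;
}

// Hypothetical runtime check: true only when `value` is an object with a string `title`.
function isHackerNewsPage(value: unknown): value is HackerNewsPage {
  return (
    typeof value === 'object' &&
    value !== null &&
    typeof (value as { title?: unknown }).title === 'string'
  );
}

// Hypothetical helper: validate every item, throwing on the first one that does not match.
function assertPages(values: unknown[]): HackerNewsPage[] {
  for (let i = 0; i < values.length; i++) {
    if (!isHackerNewsPage(values[i])) {
      throw new TypeError(`Item ${i} is not a HackerNewsPage`);
    }
  }
  return values as HackerNewsPage[];
}

// Stand-in data; in practice this would be the array returned by fetch().
const sample: unknown[] = [{ title: 'A' }, { title: 'B' }];
const checked = assertPages(sample);
console.log(checked.length);
```

Running the result of `fetch` through such a check surfaces selector mismatches early instead of letting `undefined` titles propagate.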