Monday, December 22, 2014

Web Crawler with MEAN (MongoDB, Express, Angular and NodeJS), Part 2


This post is the second part of a tutorial series on MEAN. Today we will look at the crawler logic, built with cheerio (a Node library).

In app.js, require the crawler module:

var crawler = require('./routes/crawler.js');

Next, we set a timer to fetch the page source:


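    // Timer callback: runs every minute (assuming a repeating timer;
    // use setTimeout instead for a single run).
    setInterval(function () {

      console.log('trying crawler in: ' + new Date());

      download(options, function downloadResult(data) {

        if (data) {
          // Page downloaded: look up keywords and users, then parse the HTML.
          findKeywordsAndusers(data, parseHtml);
        } else {
          console.log("error");
        }

      });

    }, 1 * 60 * 1000);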

The code above calls the download function. In its callback, if data was returned, it calls findKeywordsAndusers and passes parseHtml as that function's callback.
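The callback chain described above can be sketched with stand-in functions. This is only an illustration: the bodies of download and findKeywordsAndusers, the arguments passed to parseHtml, the crawlOnce wrapper and the sample options are all assumptions, since the post does not show them.

```javascript
// Hypothetical stand-ins for the functions named in the post. The real ones
// would perform an HTTP request and a database lookup.
function download(options, callback) {
  // Pretend the request succeeded and hand back some HTML.
  // Passing null instead would signal a failed download.
  callback('<ul><li><h3><a href="/1">Node job</a></h3></li></ul>');
}

function findKeywordsAndusers(data, callback) {
  // Assumed shape: forward the HTML together with the keywords to look for.
  var keywords = ['node'];
  callback(data, keywords);
}

// Records each call so the flow can be inspected.
var calls = [];
function parseHtml(data, keywords) {
  calls.push({ htmlLength: data.length, keywords: keywords });
}

// One pass of the crawl, with the same structure as the timer body.
function crawlOnce(options) {
  download(options, function downloadResult(data) {
    if (data) {
      findKeywordsAndusers(data, parseHtml);
    } else {
      console.log('error');
    }
  });
}

crawlOnce({ host: 'example.com', path: '/' });
console.log(calls);
```

Keeping each step behind a callback means the timer body never has to know how the download or the keyword lookup is implemented.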

The main logic is in the parseHtml function:

// load the HTML returned by the download function

var $ = cheerio.load(data);

// iterate over each 'li' element and read the text of the link inside its 'h3' tag

$("li").each(function(i, e) {

    var title = $(e).find("h3>a").text();

});

Once we have the element we need, the rest of the logic retrieves the keywords, matches them against the title, and sends an e-mail alert.
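The matching step might look something like the sketch below. The post does not show this code, so matchKeywords, notifyMatches, the sample titles and the sendAlert callback are all hypothetical; a real version would send e-mail where this one only collects the alerts in an array.

```javascript
// Hypothetical helper: returns the keywords that occur in a title,
// ignoring upper/lower case.
function matchKeywords(title, keywords) {
  var lowerTitle = title.toLowerCase();
  return keywords.filter(function (keyword) {
    return lowerTitle.indexOf(keyword.toLowerCase()) !== -1;
  });
}

// Hypothetical notifier: calls sendAlert once for every title that
// matches at least one keyword.
function notifyMatches(titles, keywords, sendAlert) {
  titles.forEach(function (title) {
    var hits = matchKeywords(title, keywords);
    if (hits.length > 0) {
      sendAlert(title, hits);
    }
  });
}

// Collect alerts in memory instead of sending real e-mail.
var alerts = [];
notifyMatches(
  ['Senior NodeJS developer', 'Office manager', 'MongoDB administrator'],
  ['nodejs', 'mongodb'],
  function (title, hits) {
    alerts.push({ title: title, keywords: hits });
  }
);
console.log(alerts);
```

Passing the alert function as a parameter keeps the matching logic separate from the e-mail transport, so either one can be changed without touching the other.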