WireHose Developers Guide

Crawling feeds

The next step is to write the crawler. The crawler method is similar to the importFeeds method, except that it calls fetchDictionaryFromURL and insertResources repeatedly, once for each available feed. It also assigns tags to the items in each feed based on that feed's tags.

  1. Add this method to Importer.java:
    public static void crawlFeeds() {
        EOEditingContext ec = new EOEditingContext();
        ec.lock();
        
        NSArray feeds = fetchFeedsToCrawl(ec);
        NSLog.debug.appendln("Found "+feeds.count()+" feeds to crawl...");
        
        RSSFeed feed;
        NSMutableDictionary rss;
        NSMutableArray snapshots;
        NSDictionary statusDict;
        NSArray inserted;
        
        // iterate through feeds and fetch items from each one
        for (int i=0, count=feeds.count(); i<count; i++) {
            feed = (RSSFeed)feeds.objectAtIndex(i);
            NSLog.debug.appendln("Crawling "+feed.name()+": "+feed.link());
       
            try {
       
                // import the dictionary from the feed's URL
                rss = WHImporter.fetchDictionaryFromURL(
                  feed.link(), "Contents/Resources/rss20MappingModel.xml");
       
                // extract and clean up the dictionaries
                snapshots = cleanSnapshots(rss.valueForKeyPath("channel.items"));
                
                // insert the resources into the database
                // insertResources returns a dictionary of inserted, updated, deleted items
                statusDict = WHImporter.insertResources(ec, 
                    snapshots, "RSSItem", "Content/", null, WHImporter.IgnoreAndTag, 
                    true, true, true, true, false);
       
                // get inserted items from the returned dictionary
                inserted = (NSArray)statusDict.objectForKey(WHImporter.InsertedKey);
       
                // add tags to the inserted items based on the feed's tags
                tagItemsForFeed(ec, inserted, feed);
       
                // record the fetch time so this feed isn't fetched again for another hour
                feed.setLastFetchDate(new NSTimestamp());
                
                ec.saveChanges();
            } catch (Exception e) {
                NSLog.debug.appendln("Exception importing "+feed.link()+" - "+e);
                feed.setLastFetchWasInvalid(true);
            }
        }
        ec.unlock();
        ec.dispose();
    }

    The insertResources method returns a status dictionary that contains arrays of the inserted, updated, removed, and ignored objects. The importFeeds method ignored this return value, but here the array of inserted items is extracted from the dictionary so that those items can be tagged.
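
The status dictionary pattern described above can be illustrated with a minimal plain-Java sketch. This is not the WireHose API: it uses java.util collections in place of NSDictionary and NSArray, and the class name, the importItems and tagInserted helpers, and the "inserted" and "updated" key names are all hypothetical, chosen only for this illustration. It shows an import step that groups items by outcome, and a caller that tags only the newly inserted items.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Plain-Java stand-in for the status dictionary pattern; not the WireHose API.
class StatusDictionarySketch {

    // Hypothetical key names, used only in this sketch.
    static final String INSERTED = "inserted";
    static final String UPDATED = "updated";

    // Stand-in for an import step: sorts incoming item titles into
    // "inserted" (not seen before) and "updated" (already known).
    static Map<String, List<String>> importItems(List<String> known,
                                                 List<String> incoming) {
        Map<String, List<String>> status = new HashMap<>();
        status.put(INSERTED, new ArrayList<>());
        status.put(UPDATED, new ArrayList<>());
        for (String item : incoming) {
            String key = known.contains(item) ? UPDATED : INSERTED;
            status.get(key).add(item);
        }
        return status;
    }

    // Copies the feed's tags onto the inserted items only; updated items
    // are left alone because they were tagged when first inserted.
    static Map<String, List<String>> tagInserted(Map<String, List<String>> status,
                                                 List<String> feedTags) {
        Map<String, List<String>> tagsByItem = new HashMap<>();
        for (String item : status.get(INSERTED)) {
            tagsByItem.put(item, new ArrayList<>(feedTags));
        }
        return tagsByItem;
    }

    public static void main(String[] args) {
        List<String> known = Arrays.asList("Old story");
        List<String> incoming = Arrays.asList("Old story", "New story");

        Map<String, List<String>> status = importItems(known, incoming);
        Map<String, List<String>> tags =
            tagInserted(status, Arrays.asList("news", "tech"));

        // Only "New story" receives the feed's tags.
        System.out.println(tags);
    }
}
```

Returning every outcome group from the import step, rather than only the inserted items, lets a caller such as importFeeds ignore the result entirely while a caller such as crawlFeeds picks out just the group it needs.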

