Category Archives: scripting

Automating bulk web stuff with iMacros

I found myself needing to download a whole bunch of JSON data from a server that had a weird authentication thing that Chrome could deal with but wget/curl/Ruby couldn’t. Since my Firefox was on the fritz, I couldn’t use Selenium IDE or the Selenium Webdriver. iMacros to the rescue! (Chrome, Firefox)

2013-11-21 Geek notes - automating bulk web stuff with iMacros

That plus lots of keyboard macros and text manipulation in Emacs, plus a little parsing and regexp substitution in Ruby, plus more Emacs munging got me the data I wanted. Hooray for bubblegum and string scripting!

Scripting and the grocery store flyer

We plan the week’s meals around the grocery store flyers, taking advantage of what’s on sale (chicken, ground beef, etc.) and stocking up when the opportunity presents itself (for example, diced tomatoes or cream of mushroom soup).

The flyers are usually delivered on Thursdays. We do most of our grocery shopping at No Frills because it’s convenient and almost always a good price, but sometimes we’ll go to Freshco or Metro if there’s a particularly good sale. I might flip through the other flyers if we’re looking for something in particular, but most of the time, we just toss them out. I’d love to opt out of paper flyers, but that doesn’t seem to be an option in our neighbourhood.

It doesn’t take a lot of time to review the flyers, but I figured it would be fun to write a script that highlights specific items for us. Now I have a script that parses the output of the No Frills accessible flyer to create a table like this:

Y Clementines 2.47 2 lb bag product of Spain $2.47
Y Smithfield Bacon 3.97 500 g selected varieties $3.97
Y Thomas’ Cinnamon Raisin Bread 2.5 675 g or Weston Kaisers 12’s selected varieties $5.00 or $2.50 ea.
Y Unico Tomatoes 0.97 796 mL or Beans 540 mL selected varieties $0.97
Fresh Boneless Skinless Chicken Breast 3.33 2.78 BIG Pack!™ DECEMBER 18TH – 24TH ONLY! $3.33 lb/$7.34/kg save $2.78/lb
Purex 3.97 2.02 2.03 L $3.97 save $2.02
Frozen Steelhead Trout Fillets 5.97 2.0 filets de truite $5.97 lb/$13.16/kg save $2.00/lb
Heinz Tomato Juice 0.97 1.52 1.36 L selected varieties $0.97 save $1.52
Nestlé Multi-Pack Chocolate or Bagged Chocolate 2.88 0.61 45-246 g selected varieties $2.88 save 61¢
Source 4.97 0.5 16 x 100 g selected varieties $4.97 save 50 ¢
Franco Gravy 0.67 0.32 284 mL selected varieties $0.67 save 32¢
Ocean Spray Cranberry Sauce 1.67 0.32 348 mL whole or jellied $1.67 save 32¢
10 lb Bag Yellow Potatoes, 5 lb Bag Carrots or 10 lb Bag Yellow Cooking Onions 1.87 product of Ontario, Canada no. 1 grade or 5 lb Bag Rutabaga product of Canada, no. 1 grade $1.87
Betty Crocker Hamburger Helper 1.5 158-255 g or Mashed or Scallop Potatoes 141-215 g selected varieties $3.00 or $1.50 ea.
Blackberries 1.25 6 oz product of U.S.A. or Mexico DECEMBER 18TH – 24TH ONLY! 4/$5.00 or $1.25 ea.

(more lines omitted)

The table is sorted by whether the item name matches one of the things we usually buy (first column: Y), then how much the sale is for, and then the name of the item. Over time, I’ll add more things to the priority list, and the script will get smarter and smarter.

I can use Org commands to move the rows up or down or remove the rows I’m not interested in. Then I can take the second column of the script’s output with Emacs’ copy-rectangle-as-kill command (C-x r M-w), and paste it into OurGroceries‘ import dialog. That builds a shopping list that’s sorted by the aisles I’ve previously set up, and this list is synchronized with our phones.

I’ve added the script to https://github.com/sachac/scripts/blob/master/check-grocery-flyer.js, so you can check there for updates. The flyer URL and the list of staples are defined in a separate configuration file that I haven’t included in the repository, but you can probably come up with your own if you want to adapt the idea. =)

Here’s the source, just in case:

#!/usr/bin/env node

/*
  Creates a prioritized list based on the flyers, like this:

Y Clementines 2.47    2 lb bag product of Spain $2.47
Y Smithfield Bacon  3.97    500 g selected varieties $3.97
Y Thomas' Cinnamon Raisin Bread 2.50    675 g or Weston Kaisers 12's selected varieties $5.00 or $2.50 ea.
Y Unico Tomatoes  0.97    796 mL or Beans 540 mL selected varieties $0.97
  Fresh Boneless Skinless Chicken Breast  3.33  2.78  BIG Pack!™ DECEMBER 18TH - 24TH ONLY! $3.33 lb/$7.34/kg save $2.78/lb
  Purex 3.97  2.02  2.03 L $3.97 save $2.02
  Frozen Steelhead Trout Fillets  5.97  2.00  filets de truite $5.97 lb/$13.16/kg save $2.00/lb
  Heinz Tomato Juice  0.97  1.52  1.36 L selected varieties $0.97 save $1.52
  Nestlé Multi-Pack Chocolate or Bagged Chocolate 2.88  0.61  45-246 g selected varieties $2.88 save 61¢
  ...
  */

var rp = require('request-promise');
var cheerio = require('cheerio');
var homeDir = require('home-dir');
var config = require(homeDir() + '/.secret');
var staples = config.grocery.staples; // array of lower-case text to match against flyer items
var flyerURL = config.grocery.flyerURL; // accessible URL

function parseValue(details) {
  var matches;
  var price;
  if ((matches = details.match(/\$([\.0-9]+)( | )+(ea|lb|\/kg)/i))) {
    price = matches[1];
  }
  else if ((matches = details.match(/\$([\.0-9]+)/i))) {
    price = matches[1];
  }
  else if ((matches = details.match(/([0-9]+) *¢/))) {
    price = parseInt(matches[1]) / 100.0;
  }
  return price;
}

function getFlyer(url) {
  return rp.get(url).then(function(response) {
    var $ = cheerio.load(response);
    var results = [];
    $('table[colspan="2"]').each(function() {
      var cells = $(this).find('td');
      // $0.67  or  2/$3.00 or $1.25ea
      var item = $(cells[0]).text().replace(/^[ \t\r\n]+|[ \t\r\n]+$/g, '');
      var details = $(cells[1]).text().replace(/([ \t\r\n\u00a0\u0000]| )+/g, ' ').replace(/^[ \t\r\n]+|[ \t\r\n]+$/g, '');
      var matches;
      var save = '';
      var price = parseValue(details);
      details = details.replace(/ \/ [^A-Z$]+/, ' ');
      if (details.match(/To Our Valued Customers/)) {
        details = details.replace(/To Our Valued Customers.*/, 'DELAYED');
      }
      if ((matches = details.match(/save .*/))) {
        save = parseValue(matches[0]);
      }
      results.push({item: item,
                    details: details,
                    price: price,
                    save: save});
    });
    return results;
  });
}

function prioritizeFlyer(data) {
  for (var i = 0; i < data.length; i++) {
    var name = data[i].item.toLowerCase();
    for (var j = 0; j < staples.length; j++) {
      if (name.match(staples[j])) {
        data[i].priority = true;
      }
    }
  }
  return data.sort(function(a, b) {
    if (a.priority && !b.priority) return -1;
    if (!a.priority && b.priority) return 1;
    if (a.save > b.save) return -1;
    if (a.save < b.save) return 1;
    if (a.item < b.item) return -1;
    if (a.item > b.item) return 1;
  });
}

function displayFlyerData(data) {
  for (var i = 0; i < data.length; i++) {
    var o = data[i];
    console.log((o.priority ? 'Y' : '') + '\t' + o.item + "\t" + o.price + "\t" + o.save + "\t" + o.details);
  }
}

getFlyer(flyerURL).then(prioritizeFlyer).then(displayFlyerData);

I’ll check next week to see if the accessible flyer URL changes each time, or if I can determine the correct publication ID by going to a stable URL. Anyway, this was fun to write!