So, last week I launched chs-tonight. It's a completely autonomous website that scrapes the event schedules of several venues in and around Charleston and displays them in an organized and arguably stylish manner (currently 11 venues are being scraped. More are in the works. Want to see a venue represented? TELL ME).

So far I've gotten some very nice feedback from people who've seen it. Nobody's specifically told me they've used it yet, but, to paraphrase one of the kids from Angels in the Outfield, hey, it could happen.

In any case, I'd like to detail how chs-tonight works and talk about the process that went into creating it. Because I think some people may be interested in it, but also because I just kind of want to do it.

So, here we go. Part One: How Do?

On the about page for chs-tonight, it says that it was inspired by dctn by a guy named tom macwright. Credit where credit's due. Anyway, the back end of chs-tonight in a node application that runs on a Digital Ocean droplet I own (can you 'own' a droplet?). It's pretty simple. It iterates through all the modules in the sources folder and concatinates all the results into an array calls shows.

The sources folder contains all the web scrapers. Let's look at the scraper for the Tin Roof.

First things first, we request the site that lists all of the Tin Roof's shows. Tin Roof unfortunately uses a ReverbNation widget for the schedule on their actual website, but as long as you can find a venue's ReverbNation ID, you can access a regular ol' html list with all the shows.

So yeah, now we have the raw html for their list of shows. There's a neal little Node module called Cheerio that provides some JQuery-type functionality on the server. As you can see, we load the body of the page with Cheerio with

var $ = cheerio.load(body)

Then we iterate through all the list item elements and load each individual list item into Cheerio:

$('.show_nugget').each(function(i, elem) { 
      var $$ = cheerio.load(elem)

From there, we can just select each bit of data individually. For example, to get the date:

 var date = $$('.shows_date_').text().split('@')[0].trim()

SIDE NOTE: My Cheerio code, while it works, got pretty messy. Definitely let me know if you've got a better way to do this stuff.

One important task was to normalize the dates for all events across all venues. This made it very easy to organize everything in the LevelDB database that houses all the show data on the server...but that's all for a later post. Morgan advised a YYYY-MM-DD format, so that's what I did:

 function normalizeDate(date, year) {
  var newDate = []
  var day 
  var month = date.split(' ')[1]
  if(date.split(' ')[2] != '') day = date.split(' ')[2]
  else day = date.split(' ')[3]
  month = (months.indexOf(month)+1).toString()
  if(month.length<2) month = '0'+month
  if(day.length<2) day = '0'+day
  return newDate.join('-')

Looking back, this probably could be a lot cleaner using MomentJS. I'll refactor all this in the future. So that's how the scrapers work, basically. Does this even make sense? Is it interesting? Is this something you want to keep reading about?

Oh jeez, going through all this is gonna take more posts than I thought it would. Tune in next time for Part Two: LevelDB and HTML Generation, -or- How to Bore People Who Don't Program (And Some Who Do) To Death.