Cardiff NHS Hack Day 2015

Posted by mattjw in Uncategorized

Team HEW at work. Credit: Paul Clarke.

This weekend I've been building a health data visualisation webapp at the Cardiff NHS hackathon, together with Will Webberley, Martin Chorley, Glyn Mottershead, and a bunch of talented Cardiff Computational Journalism students! The students have also been live blogging throughout the weekend.

Visit the working webapp here. Find the code and data on GitHub.

Coverage elsewhere:

  • Post on the Cardiff Computational Journalism blog.
  • Post by Dyfrig Williams from the Wales Audit Office's Good Practice Exchange.
  • Photos by Paul Clarke: Sat and Sun.
  • Cardiff Computational Journalism live blog.
  • Martin's blog post.

Academic Genealogy

Posted by mattjw in Uncategorized

I compiled and designed an academic genealogy graphic and had it printed and framed as a gift. Here's how I did it. Materials for you to do your own, including scripts and an example design, are available in this repository on GitHub. You'll likely need some familiarity with Python syntax.

The framed academic genealogy.

Supervisor Family Trees

Supervisor family trees are a fun bit of academic self-indulgence. These are like real family trees, but instead of depicting parent-child associations between individuals, a supervisor family tree depicts supervisor-student associations. Exactly what constitutes supervision is an open question -- the most obvious definition is the supervision of a student's doctoral thesis, but this is a bit too narrow. The current conception of a supervised, research-based PhD degree only originated in the 19th century, but the tradition of academic mentorship is far older (e.g., we could go back at least as far as Socrates and Plato). With a wider definition we can build an academic genealogy that goes back many centuries.

Having recently completed my own PhD, I took the opportunity to explore my own academic ancestry and put together a genealogy that I could give to my supervisors as a gift. (Also, by researching my supervisors' ancestry I'd also be researching my own -- the perfect combination of self-indulgence and altruism!)

Design of the genealogy. Full size PDF.

These academic family trees are nice because nearly all academics will be able to trace their ancestry back to at least a few notable scientists or mathematicians, in the same way that most western Europeans can trace their familial ancestry back to Charlemagne. Marin Mersenne, Isaac Newton, and Galileo Galilei are all ancestors of mine. In addition to direct ancestors, we can also look at individuals with whom one shares a common ancestor. For example, Alan Turing and Peter Hilton, both code-breakers at Bletchley Park during the Second World War, can be regarded as academic cousins as they both share Oswald Veblen as the supervisor of their respective doctoral supervisors.

The Data

The big challenge of compiling a genealogy is of course gathering the history of mentor-student relationships for those involved. Fortunately, the Mathematics Genealogy Project (MGP) has done a lot of the work for us. The MGP has mapped over 175,000 academics and their students. Although it is predominantly focused on mathematicians, it also includes academics who have made contributions in other fields, including physics, computer science, chemistry, and biology. A few people have written scripts and libraries that access this database to build a visualisation of an individual's academic genealogy. The best I've found is David Alber's Geneagrapher, which is written in Python. These scripts, however, only attempt to show an individual's direct ancestors and descendants, not any interesting academics that they may share a common ancestor with.

Including common ancestors in the genealogy is a lot more challenging. The number of individuals that have a shared common ancestor with a typical living academic is going to be huge, resulting in a lot of queries to the MGP and producing an unwieldy visualisation. Instead, we want some way of selecting a few interesting individuals to see if their ancestry can be connected to the person we're building a genealogy for (I'll call this person the focal academic for short) and then building the visualisation around that, possibly culling a few unwanted branches of the tree in the process.
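
As a rough illustration of what "connecting" a seed to the focal academic means (this is a dependency-free sketch, not the code in genealogy_miner.py), the genealogy can be held as a mapping from each academic to their supervisors, and two people are connected when their ancestor sets overlap. The function names are hypothetical, and the two intermediate supervisors in the toy data (Alonzo Church and J. H. C. Whitehead) are filled in from general knowledge rather than from this post.

```python
from collections import deque

def ancestors(supervisors, person):
    """Return everyone reachable by following supervisor links upwards."""
    seen = set()
    queue = deque(supervisors.get(person, ()))
    while queue:
        current = queue.popleft()
        if current in seen:
            continue
        seen.add(current)
        queue.extend(supervisors.get(current, ()))
    return seen

def common_ancestors(supervisors, focal, seed):
    """Academic ancestors shared by the focal academic and a seed."""
    return ancestors(supervisors, focal) & ancestors(supervisors, seed)

# Toy data: student -> list of supervisors (the Turing/Hilton example).
supervisors = {
    "Alan Turing": ["Alonzo Church"],
    "Peter Hilton": ["J. H. C. Whitehead"],
    "Alonzo Church": ["Oswald Veblen"],
    "J. H. C. Whitehead": ["Oswald Veblen"],
}

shared = common_ancestors(supervisors, "Alan Turing", "Peter Hilton")
print(shared)  # {'Oswald Veblen'}
```

A seed whose ancestor set does not intersect the focal academic's would simply be left out of the genealogy.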

Some Scripts

The Script-GenealogyMiner directory contains a Python script, genealogy_miner.py, that crawls the MGP for a given focal academic, attempting to connect him/her to other academics. It's configured through another Python file (specified as a command line argument) that contains configuration options. I've provided an example, config_turing.py, with Alan Turing as the focal academic. To run this example, download the Script-GenealogyMiner directory and execute:

python genealogy_miner.py config_turing.py turing.dot

For demonstration purposes, the configuration only has a few seed academics (see SEED_ID_LIST). Seeds are academics the script will attempt to find shared ancestry with. Crawling can take a while, depending on the number of individuals to be crawled, the number of ancestors they have, and the response time of the MGP servers. The crawl with the 14 example seed academics should take less than four minutes.

GraphViz rendering of the Turing example with all demo seeds.

The output is a plain-text dot file (turing.dot) describing the genealogy as a list of nodes and edges, along with some formatting instructions such as text and arrow colours. dot is a popular graph description format and is fairly well supported, so the file can be imported into other applications (I used OmniGraffle) for further design work. If you have GraphViz installed on your system, you can have it generate a rendering via:

dot -T png turing.dot > turing.png
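
For readers who haven't met dot before, the sketch below shows roughly what such a file contains. It is an illustration only: the to_dot function and the choice of the fontcolor attribute are assumptions, and the real script's output will differ in its details.

```python
def to_dot(edges, colours=None):
    """Serialise (supervisor, student) pairs as a dot digraph."""
    colours = colours or {}
    nodes = sorted({name for edge in edges for name in edge})
    lines = ["digraph genealogy {"]
    for name in nodes:
        # One line per node, carrying a text colour as formatting.
        colour = colours.get(name, "black")
        lines.append(f'    "{name}" [fontcolor="{colour}"];')
    for supervisor, student in edges:
        # One line per supervisor -> student edge.
        lines.append(f'    "{supervisor}" -> "{student}";')
    lines.append("}")
    return "\n".join(lines)

edges = [("Oswald Veblen", "Alonzo Church"), ("Alonzo Church", "Alan Turing")]
dot_text = to_dot(edges, colours={"Alan Turing": "red"})
print(dot_text)
```

Text in this form, saved to a file, is what the dot command above consumes.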

The GraphViz rendering isn't production-quality -- for the final graphic I imported the dot file into OmniGraffle -- but it's useful when you're tweaking the crawl configuration. It takes a bit of guesswork to determine which academics might be reachable from the focal node. The configuration file allows you to specify a few different features which you'll need to play around with, so having GraphViz on hand to do quick renderings of the resulting genealogy is useful.

I should note that the script builds on Geneagrapher (already included in the Script-GenealogyMiner directory), which it uses to query the online MGP database.

Taking a look in config_turing.py shows how we can configure the crawler:

  • The focal academic (ID_FOCAL_NODE): The MGP identifier of the academic for whom we are generating the genealogy. This is given in the Mathematics Genealogy Project URL for a particular academic; e.g., Turing's page ends in ...?id=8014.
  • Prospective connections (SEED_ID_LIST): A list of individuals in the MGP, again identified by their MGP identifier. The script will try to find common ancestry between the focal node and these individuals. So, given the Turing example configuration, the script will look to see if Richard Feynman and Alan Turing have a common ancestor, and if so, it will include both their ancestries in the genealogy.
  • Tree pruning (CULL_AND_ABOVE and ERASE_INDIVIDUAL): Including a particular academic can introduce a large ancestry and produce an ungainly genealogy. These two parameters (the cull list and the erasure list) allow us to prune the tree. Culling (CULL_AND_ABOVE) will remove an individual and his/her entire ancestry. Erasure (ERASE_INDIVIDUAL) will remove a particular individual but leave his/her ancestors untouched. Culling (as opposed to erasure) an individual will also insert an ellipsis above its children nodes to indicate that part of the tree was removed there.
  • Colour scheme: Colour individuals in the genealogy based on their relationship with the focal academic. This includes colouring based on whether the individual shares a common ancestor, is a direct ancestor, is a direct descendant, and so on. Colour instructions are included in the output dot file; most applications (e.g., graphviz and OmniGraffle) should be able to interpret these instructions.

You'll likely need a bit of familiarity with Python syntax to get the most out of the script. Loading a Python module for configuration is a bit of a taboo, but is convenient enough for what this script needs to do. I've used networkx to make manipulating the genealogy (which is, more formally, a directed acyclic graph) simpler, since the script needs to handle joining and splitting of subgraphs, culling of disconnected components, and some traversals for node colouring.
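
The difference between culling and erasure is easiest to see on a toy graph. The sketch below is a literal reading of the two options described above, written with plain dictionaries rather than networkx so that it runs without extra packages. It is not the script's implementation: the function names are hypothetical, it does not insert the ellipsis marker, and it does not tidy up any components left disconnected.

```python
def ancestors(supervisors, person):
    """Everyone reachable by following supervisor links upwards."""
    seen, stack = set(), list(supervisors.get(person, ()))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(supervisors.get(current, ()))
    return seen

def members(supervisors):
    """All academics mentioned anywhere in the mapping."""
    people = set(supervisors)
    for sups in supervisors.values():
        people |= set(sups)
    return people

def erase(supervisors, person):
    """Remove one individual but leave their ancestors in place."""
    pruned = {student: set(sups) - {person}
              for student, sups in supervisors.items() if student != person}
    for sup in supervisors.get(person, ()):
        pruned.setdefault(sup, set())  # keep the erased person's supervisors as nodes
    return pruned

def cull_and_above(supervisors, person):
    """Remove an individual together with their entire ancestry."""
    doomed = ancestors(supervisors, person) | {person}
    return {student: set(sups) - doomed
            for student, sups in supervisors.items() if student not in doomed}

# Toy data: student -> set of supervisors.
supervisors = {
    "Alan Turing": {"Alonzo Church"},
    "Peter Hilton": {"J. H. C. Whitehead"},
    "Alonzo Church": {"Oswald Veblen"},
    "J. H. C. Whitehead": {"Oswald Veblen"},
}

erased = erase(supervisors, "Alonzo Church")
culled = cull_and_above(supervisors, "Alonzo Church")
print(sorted(members(erased)))
print(sorted(members(culled)))
```

In this reading, culling Church also removes Veblen even though Whitehead descends from him too. How the real script treats ancestors shared with other branches isn't covered here.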

The script also takes a list of scientific prize winners. If any of these appear in the genealogy, they will be given a special colour, as per the colour scheme. Which prizes you wish to include is up to you. The Script-FindPrizeWinners directory contains a crude script that will compare the names of academics (which can be copied and pasted from a dot file for convenience) to prize winners and return any matches, so you can figure out who in your genealogy is a winner. I've included a few text files containing lists of winners (up to 2013) for various scientific prizes; namely, Abel, Cole, Fields, Turing, and Wolf prizes. It does fuzzy string matching, since the Wikipedia lists of winners (from where the names are sourced) might have slightly different spellings to those in the MGP, so it will likely produce false positives -- please use as a starting point only.
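
One way to do this kind of fuzzy name matching is with difflib from Python's standard library. The sketch below assumes that approach. The actual script in Script-FindPrizeWinners may work differently, and the function name, the cutoff, and the spelling variant in the example are all illustrative.

```python
import difflib

def find_prize_winners(genealogy_names, winners, cutoff=0.85):
    """Map each genealogy name to its closest prize-winner name, if any."""
    lowered = {winner.lower(): winner for winner in winners}
    matches = {}
    for name in genealogy_names:
        close = difflib.get_close_matches(
            name.lower(), list(lowered), n=1, cutoff=cutoff
        )
        if close:
            matches[name] = lowered[close[0]]
    return matches

names_in_genealogy = ["Alonzo Church", "Michael F. Atiyah"]
fields_medallists = ["Michael Atiyah", "John Milnor"]
found = find_prize_winners(names_in_genealogy, fields_medallists)
print(found)  # {'Michael F. Atiyah': 'Michael Atiyah'}
```

Lowering the cutoff catches more spelling variants at the cost of more false positives, which is why the results are best treated as candidates to check by hand.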

Design

After a few iterations of generating a dot file, checking its graphviz rendering, and tweaking the original configuration (e.g., adding more seeds, culling unwanted subtrees, erasing some nodes, etc.) I went on to import the file into OmniGraffle to do a prettier design. For anyone who wants somewhere to start, I've included the final design for one of my PhD supervisors, Roger Whitaker (who, interestingly, connects to my other supervisor, Stuart Allen, through William Hopkins), in the Designs directory. It's in A-paper ratio (1:√2) but will need resizing to whatever print size is required. I had it printed on glossy A3 paper and put it in this John Lewis picture frame.

(N.b.: Kudos to Stuart for kicking off this idea by stumbling on one of the older MGP genealogy scripts.)

One of the toys that the Computing Club has to play with is an AR.Drone 1.0. This is a pre-built WiFi-enabled quadrocopter manufactured by Parrot. There are official iOS and Android applications for remotely controlling the quadrocopter. The AR.Drone also streams a live video feed from its onboard camera to the controller. Flying the drone around from an app is fun enough, but where things get really interesting for the Computing Club is programming it to do things! Over the last few months undergraduates have been tinkering with the drone, making it do various things using the open-source javadrone API.

Kirill Sidorov and I, organisers of the Computing Club this academic year, were asked to prepare a demo for an upcoming School of Computer Science & Informatics Open Day. The aim of these open days is to enthuse A-Level students who are considering study in Computer Science. We needed something that was interactive and fun, but also allowed us to highlight some of the concepts of computer science and what makes it interesting. We decided on a motion-tracking AR.Drone demo. We'd use the on-board camera to have the drone follow an individual holding a target. There's some neat computer science here – control and computer vision in particular – and it also demonstrates the power of using software to program real-world devices. It also meant we could build on the work done by Computing Club students and bring them in to chat to visitors at the Open Day.

Conveniently, a few days before the first Open Day (17 April) was the two-day "Open Sauce" Hackathon. Kirill and I were attending anyway to help with the student-organised event, so we took advantage of the fruitful combination of hackathon ambience, energy drinks, and free food to build the demo over those two days. The repository is hosted on GitHub. The original output from the Hackathon is in this branch (warning: gnarled, hackathon-quality code). This was tweaked and (slightly) refactored over the following days in preparation for the Open Day, resulting in this.

Building the AR.Drone demo at the 2013 "Open Sauce" hackathon.

The target we used during the hackathon was a ping-pong paddle wrapped in an A4 sheet of paper coloured with pink highlighter. In hindsight, the lighting conditions of the venue were very consistent, making it a favourable test environment. Kirill prototyped some image-by-image video processing in MATLAB to extract the target, and then translated it to native Java. I handled the interaction with the AR.Drone and the control loop. We also implemented a fairly crude but useful GUI to view the raw and processed image streams, debug some control parameters, and initiate take-off and landing (emergency, typically). The javadrone API made controlling the drone straightforward, and even allowed us to implement some nifty features like changing the drone's LED colours when the target is lost.

The image component outputs the location (a pixel coordinate) and extent (a measure proportional to the target's size in view) of the target in the camera's view. This information is used to handle our three control variables:

  1. Forward/back tilt for moving forwards and backwards to maintain a particular distance from the target.
  2. Left/right rotation to keep the target horizontally centred.
  3. Vertical ascent/descent to keep the camera and target at the same height.
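
The three control variables above can be sketched as simple proportional (linear) controllers. This is a Python illustration rather than the Java code from the demo. The gains, the desired extent, the sign conventions, and the assumption that each command lies in [-1, 1] are all made up for the example.

```python
from dataclasses import dataclass

@dataclass
class Target:
    x: float       # pixel column of the target centre
    y: float       # pixel row of the target centre
    extent: float  # grows as the target gets closer to the camera

def clamp(value, low=-1.0, high=1.0):
    return max(low, min(high, value))

def control(target, width=320, height=240, desired_extent=40.0,
            k_tilt=0.02, k_yaw=1.0, k_climb=1.0):
    """Return (tilt, yaw, climb) commands, each in [-1, 1]."""
    # Normalised offsets from the image centre; image rows grow downwards.
    x_err = (target.x - width / 2) / (width / 2)
    y_err = (height / 2 - target.y) / (height / 2)
    tilt = clamp(k_tilt * (desired_extent - target.extent))  # > 0: move forwards
    yaw = clamp(k_yaw * x_err)                               # > 0: rotate right
    climb = clamp(k_climb * y_err)                           # > 0: ascend
    return tilt, yaw, climb

# A small target, right of centre and above centre: approach, turn right, climb.
print(control(Target(x=240, y=60, extent=20)))
```

A target that is centred and at the desired extent produces zero for all three commands, so the drone would hold position.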

We didn't have much time to fully explore the handling of the drone with respect to these control variables, but experimenting with a few simple linear controllers and a PID or two resulted in decent tracking, as undergraduate George Sale demonstrates in this video:

(As shown in the video, as well as this other one, pretty much every flight ended up with a haywire drone and me initiating a forced landing.)
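
For anyone unfamiliar with the term, a PID controller combines the current error, its accumulated sum, and its rate of change. The class below is a textbook version in Python, offered as a sketch only. The demo's own controllers and gains are not reproduced here.

```python
class PID:
    """Textbook PID controller: output = kp*e + ki*integral(e) + kd*de/dt."""

    def __init__(self, kp, ki, kd):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.integral = 0.0
        self.previous_error = None

    def update(self, error, dt):
        self.integral += error * dt
        if self.previous_error is None:
            derivative = 0.0  # no history on the first sample
        else:
            derivative = (error - self.previous_error) / dt
        self.previous_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative

# Example: feed a horizontal offset (as a fraction of half the frame width)
# into a yaw controller once per video frame. Gains are illustrative.
yaw_pid = PID(kp=1.0, ki=0.1, kd=0.05)
for offset in (0.5, 0.4, 0.2):
    print(yaw_pid.update(offset, dt=0.05))
```

The derivative term damps the kind of lunging that a purely proportional controller tends to produce, at the price of being sensitive to noisy measurements.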

That was the hackathon; the Open Day proved much more challenging. In our hackathon experiments, the specificity of our target detection was excellent. Specificity was our primary concern, since a false-positive target detection puts bystanders wearing unfortunately coloured clothing on the receiving end of multi-bladed drone fury. The Open Day venue had very uneven lighting, with patchy artificial lights, and a large window in one corner that would temporarily flood the camera depending on the drone's angle. This caused the colour profile of the paddle to change drastically depending on the angle of the drone, the location of the target, and the location of the drone.

To deal with this, our first trick was to change the target. Significant variation in light reflection between dimly lit and brightly lit areas meant large changes in the target's brightness and hue. By switching to a backlit target we could ensure fairly consistent brightness, irrespective of ambient light. Using a bike light, a home-made filter (highlighted A4 paper), a diffuser (coffee filter paper), and filter assembly (polystyrene cup), we hacked together the following target:

(Yes, we effectively built a cheap PlayStation Move controller.)

The resulting target had a very consistent and distinct appearance. After this there were just a few camera-related issues to tackle; in particular:

  • Although the camera resolution is 640x480, the drone only streams 320x240 back to the laptop. Nothing much to say here, except it's surprising (802.11g is capable of the bandwidth and latency) and inconvenient.
  • Either the camera hardware or drone firmware was doing some unwanted brightness auto-adjustment which we had to un-adjust back on the laptop.
  • The lens quality is poor. We had to discard everything outside a centre 320px-wide circle to cull corner artefacts.
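
To make the pipeline concrete, here is a small Python sketch of one plausible approach: ignore pixels outside a centre circle, keep those whose colour falls in a pink-ish range, then report the centroid and a size measure. Colour thresholding in HSV space is an assumption, as are the threshold values and the use of the square root of the pixel count as the extent. The demo's Java code may do something quite different.

```python
import colorsys
import math

def detect_target(pixels, width, height, hue_range=(0.80, 0.95),
                  min_saturation=0.4, min_value=0.5, radius=160):
    """pixels: row-major list of (r, g, b) tuples with channels in 0..255.

    Returns ((x, y), extent) for the detected target, or None if not found.
    """
    cx, cy = width / 2, height / 2
    hits = []
    for index, (r, g, b) in enumerate(pixels):
        x, y = index % width, index // width
        if math.hypot(x - cx, y - cy) > radius:
            continue  # discard everything outside the centre circle
        h, s, v = colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)
        if hue_range[0] <= h <= hue_range[1] and s >= min_saturation and v >= min_value:
            hits.append((x, y))
    if not hits:
        return None
    mean_x = sum(x for x, _ in hits) / len(hits)
    mean_y = sum(y for _, y in hits) / len(hits)
    extent = math.sqrt(len(hits))  # roughly proportional to the target's width
    return (mean_x, mean_y), extent

# Tiny synthetic frame: black background with a 2x2 pink block.
width, height = 8, 6
pink, black = (255, 105, 180), (0, 0, 0)
frame = [black] * (width * height)
for x, y in [(3, 2), (4, 2), (3, 3), (4, 3)]:
    frame[y * width + x] = pink

print(detect_target(frame, width, height))  # ((3.5, 2.5), 2.0)
```

Returning None when nothing matches gives the control loop an explicit "target lost" signal to react to.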

And then, finally, we were left with a superb signal and a negligible false-positive rate.

Target triumph! Left panel: raw stream. Right panel: processed video stream; red pixels and white circle indicate detected target.

The control still needs a lot of work, but the drone flies and reacts well. It's enjoyable watching people have a go at it. Initially people are very tentative. This is unsurprising; the drone's forward/back lunging can be vicious at first (although it usually stabilises before quite reaching the volunteer). After a few goes, they're eventually able to start taking it on tours around the demo area, almost like walking a dog; albeit a dog that is noisier, less well behaved, and hovering in mid-air.

2013 "Open Sauce" Hackathon Round-up

Posted by mattjw in Uncategorized

Last weekend's "Open Sauce" Hackathon was a big success. In addition to the funding I mentioned in my previous post, GitHub also got in touch a day before the event to bolster each prize category with one-year bronze and silver accounts.

There's a write-up and more photos at the CSCF website, so please navigate there for more information. I'll also maintain a list of other individuals' posts below.

Hackathon 2013 group photo.

While at the event Kirill Sidorov and I, Computer Club co-organisers, also took the opportunity to write the software for a motion-tracking quadrocopter demo we'd been asked to prepare for the School's upcoming Open Day. Write-up to follow on this blog.

Thanks to all the judges, undergraduate organisers, sponsors, and attendees for making it a great event!

Elsewhere:

Open Sauce Hackathon 2013

Posted by mattjw in Uncategorized

Last year I attended the inaugural School of Computer Science & Informatics "Open Sauce" Hackathon as a participant. It was a hugely successful event, and good fun to work with Mark and Chris in building Motion Kitty Pi, a prototype Spotify home music streaming service for Raspberry Pi (with motion-triggered playback!). Not only was the event a success, it was superbly organised by undergraduates in the School of Computer Science & Informatics's Computer Club. Click here for a report on last year's event.

The 2012 hackathon.

Now that I'm a lecturer and co-runner of the Computer Club, I get to assist the undergraduates in organising this year's Hackathon, and it's shaping up to be even better than last year's! They've done an excellent job of organising and promoting the event, with over 40 attendees already registered. Among these are undergraduate students from Cardiff University and other institutions, PhD students, staff members, and local professionals.

As with last year, the School is supporting the event with facilities and a contribution to the prize fund. What makes this year even more impressive is the amount of external sponsorship the students have secured. Box UK are very kindly providing the ever-important food and (energy) drinks for the two-day event, and a total of £500 worth of prizes is being contributed by Linode, DigiStump, and eysys. On top of that, John Greenaway (Cardiff University Information Services), Richard Gaywood, Stuart Allen (Cardiff University School of CS&I), and Humphrey Sheil (eysys) will be on hand to judge the final projects.

So: free food, free drink, big prizes, and, importantly, building something cool with friends. What more could you want in an event? Get more information or sign up if you haven't! And well done to Joe, Henry, Geraint, James, and all the organisers!

Box UK's "For the Social Good" Hackday and webapp "Gritly"

Posted by mattjw in Uncategorized

I spent last Sunday at Box UK's "For the Social Good" hackathon. It was a very successful event and big thanks to Box UK for putting it on and providing a venue, food, and prizes. Over five teams hacked together apps on the broad theme of "social good" (something of benefit to the local community) in eight hours. Check out this post on Box UK's blog for more information.

Mark Greenwood, Martin Chorley, and I formed the "Cardiff University PhD students" team and built 'Gritly', a winter road conditions map mashup (more information below). Cardiff University did very well at the event, with Computer Science undergraduates winning the runner-up team and individual hacker prizes, and our own Gritly winning the first prize! It was great to see the apps everyone had built. Here are a few write-ups from elsewhere:

Team Gritly – Martin, me, and Mark. Photo by Dan Green.

Gritly

Mark, Martin, and I decided to put a team in and spent an hour last week brainstorming some ideas for a project. We'd decided on using a data.gov.uk dataset in some way and needed an idea for something that could benefit the local community. I'd had the phrase "Winter is Coming" rattling around in my head for a while because, well, Winter is coming. Also, and more relevantly, it's the time of year when UK news outlets do the usual winter weather doomforecasting. (OK, Game of Thrones too.) From this we edged towards the idea of trying to manage winter weather road hazards (i.e., ice and snow) by either notifying the council of areas that need more gritting (hence the name 'Gritly') or at least warning drivers of roads that may be hazardous during very cold spells.

We found a very fine-grained data.gov.uk dataset of a few years' road traffic accidents which includes information on the road and weather conditions at time of incident, and so decided on using this to build a Google Maps mashup that would plot the locations of road accidents where ice or snow were factors. The dataset also provides the date and severity of an incident (from 'low' up to 'fatal'), which the webapp can display. Since the data.gov.uk data on accidents is only provided annually, we wanted to also include a realtime component. For this we took inspiration from the #UKSnow Twitter mashup and decided that people could submit current road hazards by tweeting a post code and warning along with the hashtag #UKIce.
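
The tweet format described above can be parsed with a regular expression. The sketch below is an illustration, not Gritly's code: the function name is hypothetical, the post code pattern is a simplified approximation of the UK format, and the example tweet is invented.

```python
import re

# Simplified UK post code pattern: outward code, optional space, inward code.
POSTCODE = re.compile(r"\b([A-Z]{1,2}\d[A-Z\d]?)\s*(\d[A-Z]{2})\b")

def parse_report(tweet_text, hashtag="#UKIce"):
    """Return the post code and text of a hazard report, or None."""
    if hashtag.lower() not in tweet_text.lower():
        return None
    match = POSTCODE.search(tweet_text.upper())
    if match is None:
        return None
    return {
        "postcode": f"{match.group(1)} {match.group(2)}",
        "text": tweet_text,
    }

report = parse_report("Black ice outside the library cf24 3aa #ukice")
print(report["postcode"])  # CF24 3AA
```

The normalised post code is what would then be handed to a geocoder to get a map location.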

On the day, Mark handled the Twitter realtime component, Martin designed and implemented the web frontend, and I built the backend and data endpoints and handled deployment. I also did minor processing on the accidents dataset to extract the cold-weather related events and prepare it to be served. The code is available on GitHub here. The backend is written in Python and runs on Django. The realtime component uses tweepy to grab the relevant tweets and geopy to geocode the post codes. The front end is Bootstrap, Google's Maps JS library, and a bit of jQuery.
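
The dataset processing amounts to a filter over the accident records. The sketch below shows the idea with Python's csv module. The column names, the condition labels, and the sample rows are all invented for illustration and are not the real data.gov.uk schema.

```python
import csv
import io

# Hypothetical labels for cold-weather road surface conditions.
COLD_CONDITIONS = {"snow", "frost or ice"}

def cold_weather_accidents(rows):
    """Keep only the accidents where ice or snow was recorded on the road."""
    return [row for row in rows
            if row["road_surface"].strip().lower() in COLD_CONDITIONS]

# Made-up sample rows standing in for the real accidents file.
sample = io.StringIO(
    "date,latitude,longitude,severity,road_surface\n"
    "2010-12-18,51.48,-3.18,low,Frost or ice\n"
    "2011-06-02,51.49,-3.17,low,Dry\n"
    "2010-12-20,51.50,-3.20,fatal,Snow\n"
)

cold = cold_weather_accidents(csv.DictReader(sample))
for accident in cold:
    print(accident["date"], accident["severity"], accident["road_surface"])
```

The filtered rows carry the date, location, and severity that the map frontend needs for each marker.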

Gritly.

You can visit Gritly at http://gritly.nomovingparts.net/.

It was great fun to build and we managed the project very well, finishing with a relieving 30 minutes to spare. We were also pleased to take the first-place group prize at the event -- thanks to Box UK for the Amazon vouchers!

Check out Mark's and Martin's write-ups here and here, respectively. Gritly has since been covered in a few other places as well:

Why Local Companies Should Sponsor Hackathons

Posted by mattjw in Uncategorized

Having attended and organised a few hackathons recently, I've been impressed by their ability to glue together the local tech community, university students, and local companies. Hackathons work best when they have some sponsorship behind them; this lends some status to the event and, of course, cash to incentivise participants with food and prizes. As a company, you may be approached by a third party organising a hackathon. Here are some tips on how to get the most out of the event. You may even wish to organise a public hackathon yourself; the same applies.

Here's what you do as an employer. First do your due diligence -- check the organisers are legit, how many people they can genuinely pull in, whether they've secured an appropriate venue, and so on. If all is well, you then put some money into the hackathon to support it in some way. This could be covering the catering (e.g., lunch on each day), sponsoring a prize, or simply handing over an amount for the organisers to use as they want. (A note to newbie organisers here: sponsors will really appreciate a mention on the web page for your event. If you don't have a web page yet, make one! It's a good idea to ask how they want their name and/or logo to appear.)

Go to the event and bring a few employees; if possible, developers would be best! Put a few flyers for your company around the venue. See if you can even convince some employees to get stuck in and participate. For those not participating: interact with the groups, find out what they're doing, how they're doing it, what technologies they're familiar with, and so on. If you're sponsoring a prize, then you'll likely also be acting as a judge. This gives you yet more opportunities to quiz the teams and find out more about the members, and a chance to highlight their achievements at the end.

It's important not to make it a recruiting drive. The participants' primary focus is their projects; they'll likely become alienated by heavy-handed recruiting. They already know you're a company and possibly looking to hire some talented developers, so there's no need to remind them of it.

At the end-of-event round-up, briefly mention your company, what it does, and how/where to find out about job opportunities there. Being a sponsor of a prize helps here since you'll naturally have the floor for the moment while you announce the winners.

There are many benefits to sponsoring a hackathon, for both company and participant. First, you get a feel for the prospective employees. You've seen them working on a practical project over two days. You have an impression of their personality, how they work in a group, and their experience with various technologies. In fact, at the end of the event, for some of the participants you'll already have answers to the behavioural questions that might be asked during an interview. This is especially true if you (or your employees) participated in a few teams.

Your company's profile will be raised, and with key people in the local tech community. Typically, hackathon participants are very active in tech. They keep up with trends, they love Twitter, and, importantly for you, are well connected in the community. Even if you don't get a direct hire out of the hackathon, word will spread among the acquaintances and friends of participants.

Participants will (hopefully) leave the event with a positive, lasting impression of your company, your employees, and the culture within your company. Traditional advertising will not buy you that level of engagement. (Participants will also, of course, be very grateful for the free food and/or prizes.) The organisers will be very appreciative, and will gladly mention your contribution to the event.