A few days ago, I found Rowan Manning’s joblint package via Twitter. Joblint is a linter (a style checker and bug finder, of the kind usually run on code) for job advertisements. It looks for keywords that indicate sexist, abusive, or bland content, unrealistic expectations, or outdated technology requirements, and prints a report of possible problems with the ad. Each problem is reported at one of three severity levels: errors, warnings, and notifications.
Of course, the first thing I thought of doing was evaluating the job posts on Hacker News. Hacker News (HN) only posts job advertisements from Y Combinator alumni, so all of them are from tech startups. Also, while this is no reflection on the companies posting there, HN commenters and readers tend to have more of a “bro-ish,” startup-douchebag streak in them than the other free software and programming communities I usually enjoy hanging out in.
So, I wrote my first quick and dirty scraper script using Python, requests, and lxml to find all the job links on the Hacker News jobs page, download the pages they link to, and run joblint on each one.
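A scraper along these lines might look like the sketch below. It is an illustration under stated assumptions, not the actual script: it uses only the Python standard library (urllib and html.parser standing in for requests and lxml), assumes the job listing lives at https://news.ycombinator.com/jobs, treats every link that leaves the HN domain as a job ad, and assumes a joblint executable on the PATH that accepts a file path as its argument. The function names are hypothetical.

```python
import subprocess
import tempfile
import time
import urllib.request
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

# Assumption: this is where the HN job listing lives.
JOBS_URL = "https://news.ycombinator.com/jobs"


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag in a document."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def extract_external_links(html, base_url):
    """Return unique absolute http(s) links that leave base_url's host."""
    parser = LinkCollector()
    parser.feed(html)
    base_host = urlparse(base_url).netloc
    links = []
    for href in parser.hrefs:
        url = urljoin(base_url, href)
        parts = urlparse(url)
        is_web = parts.scheme in ("http", "https")
        if is_web and parts.netloc != base_host and url not in links:
            links.append(url)
    return links


def fetch(url):
    """Download a page and decode it leniently as text."""
    with urllib.request.urlopen(url, timeout=30) as response:
        return response.read().decode("utf-8", errors="replace")


def lint_job_pages(delay_seconds=5.0):
    """Fetch each linked job page and print joblint's report for it."""
    for url in extract_external_links(fetch(JOBS_URL), JOBS_URL):
        with tempfile.NamedTemporaryFile(
            "w", suffix=".html", encoding="utf-8"
        ) as page:
            page.write(fetch(url))
            page.flush()
            # Assumption: `joblint <file>` prints a report for that file.
            result = subprocess.run(
                ["joblint", page.name], capture_output=True, text=True
            )
        print(url)
        print(result.stdout)
        # Pause between pages so nobody gets flooded with requests.
        time.sleep(delay_seconds)
```

Calling lint_job_pages() would print one report per linked page. Treating every off-site link as a job ad is a crude heuristic, so a real run would probably need a tighter selector for the job titles.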
One HN job ad, the OrderAhead engineering job page, actually passes joblint! Good job, OrderAhead!
Every other ad received a notification for sounding “competitive and performance based.” These are just notifications, not warnings or errors, since some jobs are in fact internally competitive and heavily based on performance. Perhaps all Y Combinator companies really do work that way, and these ads are accurate. I don’t think there’s any way of knowing from the outside. Three advertisements listed stupid and superficial perks: beer, Xbox, and pizza. Does the Xbox count as paleo? The other two sure don’t.
Overall, by this metric, the current pool of job postings is much better than I expected, and many of the warnings are due to content on the page that isn’t part of the ad. Indeed, it could have been much worse. It’s clear that joblint is really written for you to copy and paste just the ad copy into: the rate of false positives is quite high when it analyzes entire HTML pages. The raw output is available after the cut. You can also download the script and run it yourself, if you want, to evaluate future sets of ads, although be careful not to flood HN with requests.
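One way to cut down on the false positives that come from linting whole HTML pages would be to strip the markup first and hand joblint only the visible text. The sketch below shows that idea using the standard library's html.parser. It is a suggestion, not something the script described here does, and the names VisibleText and html_to_text are hypothetical. It still keeps navigation and footer text, so it would reduce the noise rather than eliminate it.

```python
from html.parser import HTMLParser


class VisibleText(HTMLParser):
    """Gather text nodes, ignoring the contents of <script> and <style>."""

    SKIPPED = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is outside script/style blocks.
        if not self._skip_depth and data.strip():
            self.chunks.append(data.strip())


def html_to_text(html):
    """Return the visible text of an HTML document, one chunk per line."""
    parser = VisibleText()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.chunks)
```

The resulting text could be written to a file and passed to joblint in place of the raw page.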