Reflecting on working with closed data (Data Study Group: Health and Wellbeing)

I’ve been doing lots of thinking about working open (data, science, source, etc.) vs more closed / proprietary environments recently. This comes after attending the Health and Wellbeing Data Study Group and finding the experience a bit frustrating - my team was working with some highly sensitive data, with a couple of problems: not everyone could see the data, and those who could see it found it was so highly sandboxed that they couldn’t get the tools they needed (e.g. analysis languages and libraries) to the data. I ended up doing some open source mapping work based on a mix of the little open data available via API and a small set of random data I generated based on descriptions of what the real data might look like.

Above: my output for the week - animation based on mocked data.

Closed data sucks, but it’s important

After thinking about this a lot, I came to a simple conclusion: Closed data sucks, and closed environments aren’t for me.

Now, don’t get me wrong - some data needs to be closed, perhaps because of locational or biomedical personal privacy concerns, or maybe even conservation reasons - don’t tell people where the baby rare creature is, lest it get poached! Closed / carefully protected data can be entirely appropriate.

Working with sensitive datasets that are closed for good reasons is still probably going to be massively important - insights can still be gleaned from them - but the challenges around this type of data mean it doesn’t lend itself to quick or easy rewards.

Open (source|data) is way more fun

By contrast, when I work on an open source project like InterMine, most of my work provides an immediate sense of gratification - I can see the ~30 public instances of our software online, interact with the community freely, use the tools, libraries, and technologies that interest me (within reason), and generally feel like my creativity and brainpower are spent well. If I think of an exciting solution, usually the only thing that might prevent me from implementing it is time.

Looking back at closed environments - whether it’s corporate “risk mitigation”, data protection, or proprietary intellectual property protection - it’s tough when you think of a good solution but aren’t allowed to implement it, for reasons that mostly feel like a simple “because”. This type of environment tends to make me feel disappointed, which harms the little work I can do.

I guess, ultimately, working with closed data belongs to those with a stronger temperament than I. I hope I can always enjoy the fantastically nurturing environment of a funded open source project. I just need to try to remember this next time I come across a chance like the Turing Data Study Group - I might have enjoyed it a lot more had my subgroup been one of the open data groups.