BiVi - 3rd BiVi Annual Meeting (2017)
BiVi this year was exhaustingly busy but good fun to attend with some great networking activities. As usual I live tweeted on the first day, gave a lightning talk - 3 slides in 1 minute, phew! - and presented an InterMine poster.
The second day was mostly interactive workshops, so I found myself recording notes in a Gist instead:
- Imported basic setup from an .ova file so an entire Galaxy is set up on your machine in a VM.
- Used a two post-it note system to indicate if you're done (green) or need help (pink).
Galaxy is a workflow system that preserves history and offers many configurable tools. It includes a Galaxy "toolshed" (https://toolshed.g2.bx.psu.edu/) that lets admins install additional tools. I liked that you can keep an old version of a tool alongside a new version at once!
- State in galaxy workflow is indicated with colour in the workflow history bar on the right.
- History steps can be hidden if they seem uninteresting.
- New histories are started manually, not automatically, and can be named to keep analyses clear and separate.
- A workflow can be re-run on different input datasets
- Galaxy has dataset visualisation, including genome browsers, charts, 3d pdb vis, etc.
- New plugins developer info here: https://github.com/galaxyproject/training-material/tree/master/Dev-Corner
- Can run jobs on clusters and execute analyses of large jobs in parallel - designed to be scalable.
- Integrates with Jupyter, RStudio, and others.
- easily shared datasets & good for creating supplementary materials for publication
- "galaxy tours" UI tour showed us how to upload data and add a tool to a workflow.
- Easy to upload example files from a URL, typed data, or direct upload.
- Workflows can be renamed and edited as can every step within a workflow.
- Re-running a tool re-runs the workflow with the new parameters; the re-run has to be started manually.
- the "delete" button doesn't permanently delete items - it just adds them to a trash bin. The trash is purged eventually, depending on your Galaxy setup, but this gives you some grace time to recover things if needed.
- reproducibility information - metadata about the workflow - is available in the workflow panel on the right by clicking on the "i" icon for an expanded workflow.
- galaxy scratchbook allows you to create analyses with side-by-side windowed mode. Nice.
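The delete-then-purge behaviour noted in the list above can be pictured with a small model. This is only a sketch of the idea, not Galaxy's actual implementation: the `Dataset` and `History` classes and all their method names are made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Dataset:
    name: str
    deleted: bool = False  # in the trash bin, still recoverable
    purged: bool = False   # removed for good


@dataclass
class History:
    # Hypothetical stand-in for a history; maps dataset name -> Dataset
    datasets: dict = field(default_factory=dict)

    def add(self, name):
        self.datasets[name] = Dataset(name)

    def delete(self, name):
        # Soft delete: only flag the item, keep the data around
        self.datasets[name].deleted = True

    def undelete(self, name):
        ds = self.datasets[name]
        if ds.purged:
            raise ValueError(f"{name} has been purged and cannot be recovered")
        ds.deleted = False

    def purge_deleted(self):
        # Stand-in for whatever cleanup the server admin has configured
        for ds in self.datasets.values():
            if ds.deleted:
                ds.purged = True

    def visible(self):
        return [n for n, ds in self.datasets.items() if not ds.deleted]
```

Until `purge_deleted()` runs, `undelete()` brings an item back; afterwards it raises an error, which is the "grace time" in a nutshell.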
The Ensembl genetree pipeline uses various tools (BLAST, T-Coffee, others) to generate information about gene families and protein families. Generally requires programming knowledge to use. The GeneSeqToFamily Galaxy tool was created to make identifying gene families easier.
We used three sample files with species info, CDS, and JSON, then ran GeneSeqToFamily preparation on the files.
- we had to change the datatype of the .nhx species file in the right hand pane using the pencil icon. Just set it to .nhx
- now open the GeneSeqToFamily workflow. (Not the preparation one this time.)
- go to the NCBI BLAST+ blastp step and change max hits to show to 4. (It won't look like it saved but apparently it did!)
- Press run workflow. This took a while on my VM with relatively limited resources.
- when everything turned green, I went to the final step "Gene align and family aggregator on data", expanded the step, and clicked on the bar chart button to get the Aequatus visualisation.
- you can also examine the other green data result steps on the right to get info about the data steps that led you here. Provenance!
- use the magnifying glass on the left to see the 4 gene families in the results. Unfortunately they're numbered, not named.
- exons that are similar are coloured similarly across organisms
- a bit hard to see what's going on on my MacBook Pro screen - might be better on a large monitor.
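To get a feel for what capping "max hits to show" at 4 means, here is a rough sketch that keeps the best few hits per query from BLAST tabular output (the standard 12-column format, with the bit score in the last column). It is an after-the-fact illustration rather than what the Galaxy step does internally, since BLAST applies its own limit during the search. The function name `top_hits` is made up.

```python
from collections import defaultdict


def top_hits(tabular_text, max_hits=4):
    """Keep the best-scoring subjects per query from 12-column BLAST tabular text."""
    hits = defaultdict(list)
    for line in tabular_text.splitlines():
        # Skip blank lines and comment lines
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.split("\t")
        # Columns: 0 = query id, 1 = subject id, 11 = bit score
        query, subject, bitscore = cols[0], cols[1], float(cols[11])
        hits[query].append((bitscore, subject))
    # Sort each query's hits by bit score, highest first, and keep max_hits
    return {
        query: [s for _, s in sorted(rows, key=lambda r: r[0], reverse=True)[:max_hits]]
        for query, rows in hits.items()
    }
```

Calling `top_hits(text, max_hits=4)` returns a dict mapping each query ID to at most four subject IDs, best score first.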