The open science stack

Creating open science workflows

What is open science?

Complete transparency in the scientific process

Open science workflows

(open science workflows Hampton et al 2014)

The rise of open science

(adapted from Hampton et al 2014)

Why Open Science?

Crisis in public confidence

Why Open Science?

Combat high profile retractions


"The debunkers could do their debunking only because of a bit of luck: Data they needed happened to be available not from its original source, but through another researcher who had posted it to meet a journal’s open-data policies." (fivethirtyeight.com)

Why Open Science?

Journals care.

"the major hurdle to overcome when trying to convince others that we should strive for Open Science: it is a major pain in the ass and is really expensive, in terms of both the money and amount of time required.

We need to stop telling people 'You should' and get better at telling people 'Here’s how'" - Emilio Bruna, UF, editor of Biotropica

What is the open science stack?

A stack is a complete set of components that work together to achieve a goal.

What is the open science stack?



  • Open lab notebooks / sharing
  • Open Data
  • Open Source / code sharing
  • Reproducible writing
  • Open Access / pre-prints



The open science stack is the set of tools you need to produce open science.

Open lab notebook

http://www.carlboettiger.info/lab-notebook.html

Virtual department on Twitter


(Figure 2A - Darling et al 2013)

Virtual department on Twitter

Share early results or discuss major findings with primary authors in other departments

Open Lab notebook / Twitter



  • Open lab notebooks = amazing provenance / opportunity for engagement
  • Open lab notebooks can require more technical skill to set-up
  • Sharing on Twitter / blogs is easier
  • Twitter is a poor platform for idea provenance

Open Lab notebook / Twitter



"This evidence suggests that the practice of open notebook science can facilitate both the performance and dissemination of research while remaining compatible and even synergistic with academic publishing." - Carl Boettiger

"...we believe there can be great and unexpected value to including social media into the life cycle of a scientific paper." - Darling et al 2013

Open data



“Open data and content can be freely used, modified, and shared by anyone for any purpose” - Open Knowledge Foundation

Advantages of open data

Your data can be used long after you're gone

(Figure 1D - Vines et al 2014)

Advantages of open data

Increased citation (9%)

(Figure 2 - Piwowar and Vision 2013)

Have a plan for your data

(dataone.org)

http://dmptool.org

TL;DR rules for sharing open data



  1. Use an open format
  2. Use a metadata standard
  3. Use an open license
  4. Use an open repository

Open data formats



What makes a format open?

  • ASCII based
  • Binary but maintained by an open consortium
  • Machine independent
  • Machine readable (should be)

Data format examples

Open

  • FASTA / EMBL / Genbank
  • NeXML / NEXUS
  • GeoJSON / KML
  • CSV
  • NetCDF/HDF5

Closed

  • Excel
  • Any proprietary DB
    • Oracle
    • Access
  • ESRI shapefile
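
As a rough illustration of the open formats listed above, the sketch below writes one small table to both CSV and GeoJSON using only the Python standard library. The site names, coordinates, and counts are invented placeholder values, and `to_csv` and `to_geojson` are hypothetical helper names, not functions from any library.

```python
import csv
import io
import json

# Placeholder observations (invented for illustration only).
records = [
    {"site": "A", "lat": 29.65, "lon": -82.32, "count": 12},
    {"site": "B", "lat": 30.10, "lon": -81.90, "count": 7},
]


def to_csv(rows):
    """Serialize a list of dicts as CSV text (plain text, machine readable)."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=list(rows[0].keys()), lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def to_geojson(rows):
    """Serialize the same rows as a GeoJSON FeatureCollection of points."""
    features = [
        {
            "type": "Feature",
            # GeoJSON orders coordinates as [longitude, latitude].
            "geometry": {"type": "Point", "coordinates": [row["lon"], row["lat"]]},
            "properties": {"site": row["site"], "count": row["count"]},
        }
        for row in rows
    ]
    return json.dumps({"type": "FeatureCollection", "features": features}, indent=2)


csv_text = to_csv(records)
geojson_text = to_geojson(records)
print(csv_text)
print(geojson_text)
```

Both outputs are plain text, so they can be opened on any machine without proprietary software.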




  • Know your discipline specific standard
  • Know your funding agency policy
  • Know your journal's policy
  • Know your repository's policy



Some metadata standards


  • EML - Ecology
  • Darwin Core - Biodiversity data
  • CF - Climate data
  • ISO 19115 - GIS data
  • MIMS / MIMARKS - Genomic / Metagenomic data
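
One way to put a metadata standard to work is to check records against its term names before sharing. The sketch below uses four Darwin Core term names (`scientificName`, `eventDate`, `decimalLatitude`, `decimalLongitude`). Treating these four as required is an assumption made for this example, not a rule of the standard, and `missing_terms` is a hypothetical helper. The records are invented.

```python
# A small subset of Darwin Core term names. Treating them as "required"
# is an assumption for this sketch, not something the standard mandates.
REQUIRED_TERMS = (
    "scientificName",
    "eventDate",
    "decimalLatitude",
    "decimalLongitude",
)


def missing_terms(record, required=REQUIRED_TERMS):
    """Return the required terms that are absent or empty in a record."""
    return [term for term in required if record.get(term) in (None, "")]


# Placeholder records (invented values).
complete = {
    "scientificName": "Quercus virginiana",
    "eventDate": "2014-06-01",
    "decimalLatitude": 29.65,
    "decimalLongitude": -82.32,
}
incomplete = {"scientificName": "Quercus virginiana", "eventDate": ""}

print(missing_terms(complete))
print(missing_terms(incomplete))
```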

License please!



"To anyone who wants to photocopy, bind, and give a copy of the book to their loved one — more power to them. He/She will likely be disappointed that you’re so cheap, though." - Randall Munroe (xkcd)

License please!



Your most open choice: public domain!

Choose a Creative Commons license that fits your comfort level

No license does not mean your data is open!

http://creativecommons.org/choose/

Data repositories



Ideally:

  • Persistent with fail safes
  • Require metadata
  • Allow versioning
  • Issue a DOI for citability
  • Be open (with an API)!
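
A DOI makes a dataset citable because it resolves through https://doi.org/ wherever the files are hosted. The sketch below builds a resolvable link and a simple citation string. The author, title, and DOI are placeholders rather than a real dataset, and `doi_url` and `format_citation` are hypothetical helpers. The citation layout is one plausible style, not a prescribed one.

```python
def doi_url(doi):
    """Turn a bare DOI into a resolvable https://doi.org/ link."""
    return "https://doi.org/" + doi.strip()


def format_citation(authors, year, title, repository, doi):
    """Assemble a simple dataset citation ending in the DOI link."""
    return f"{authors} ({year}). {title}. {repository}. {doi_url(doi)}"


# Placeholder values: this DOI is illustrative and not a real dataset.
citation = format_citation(
    authors="Doe, J.",
    year=2014,
    title="Example seedling survey data",
    repository="Zenodo",
    doi="10.5072/example.12345",
)
print(citation)
```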

Data repositories



Some suggestions

  • General purpose - Figshare / Zenodo
  • Biodiversity - GBIF / KNB
  • Nucleic acid sequences - Genbank / EMBL

For more suggestions:

http://www.nature.com/sdata/data-policies/repositories

http://journals.plos.org/plosone/s/data-availability

Open source / code sharing



Advantages of open source



  • Facilitates reproducibility
  • Enables collaboration
  • Incentivises writing clean code (future you thanks you)
  • More people will use what you build

Sharing code



  • Use version control! (git / svn)
  • Write human readable comments
  • Use a license (MIT / GPL / BSD)
  • Share on a public repository (GitHub / Bitbucket)
  • Use an open source platform (e.g. NOT MATLAB, Mathematica)
  • Distribute it (CRAN / PyPI)
  • Archive releases and assign DOIs

    http://guides.github.com/activities/citable-code/
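
Part of the checklist above can be checked automatically. The sketch below looks for marker files that suggest version control, a license, and a README are in place. The `CHECKS` table and `audit_project` are hypothetical names, and the list of marker files is an assumption about common conventions, not an exhaustive rule.

```python
import tempfile
from pathlib import Path

# Hypothetical checklist: file or directory names that suggest each
# practice is in place. Extend it to match your own conventions.
CHECKS = {
    "version control": [".git", ".svn"],
    "license": ["LICENSE", "LICENSE.txt", "LICENSE.md"],
    "readme": ["README", "README.md", "README.txt"],
}


def audit_project(root):
    """Map each checklist item to True if any of its marker paths exists."""
    root = Path(root)
    return {
        item: any((root / name).exists() for name in names)
        for item, names in CHECKS.items()
    }


# Demonstration on a throwaway directory with no license file.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / ".git").mkdir()
    (Path(tmp) / "README.md").write_text("# My analysis\n")
    report = audit_project(tmp)

print(report)
```

A check like this only confirms that the files exist; it says nothing about whether the license or comments are adequate.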

Sharing code and data



Wolkovich et al. 2012

Reproducible documents



PDF text and figures are generated from the code on the left

Reproducible documents



Code snippets embedded in text formatting
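
The idea can be sketched without any document toolchain: the text holds placeholders, and the numbers are computed from the data each time the document is built. The example below uses Python's `string.Template` as a stand-in for a real tool such as knitr. The measurements are invented and `render` is a hypothetical helper.

```python
import statistics
from string import Template

# Placeholder measurements (invented for illustration).
heights_cm = [12.1, 13.4, 11.8, 14.0, 12.7]

# The manuscript text holds placeholders instead of hard-coded numbers.
manuscript = Template(
    "We measured $n seedlings; mean height was $mean cm (SD $sd)."
)


def render(template, values):
    """Fill the text template with statistics computed from the data."""
    return template.substitute(
        n=len(values),
        mean=f"{statistics.mean(values):.1f}",
        sd=f"{statistics.stdev(values):.2f}",
    )


rendered = render(manuscript, heights_cm)
print(rendered)
```

If the data change, rerunning the script updates every number in the sentence, so text and results cannot drift apart.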

    

Writing in the open

Collaboration on GitHub

Reproducible document skills



  • Markdown / LaTeX
  • Git or other VCS
  • R (or Python)
  • Patience!

Reproducible document

Advantages

  • Open format
  • Fully reproducible document
  • Strong provenance tracking

Disadvantages

  • Formatting problems
  • Your collaborators may hate you
  • Opportunity costs
  • Software updates can break your document
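
One partial defence against software updates breaking a document is to record the versions it was built with. The sketch below takes a snapshot of the Python version and selected package versions using the standard library's `importlib.metadata`. `environment_snapshot` is a hypothetical helper, and the package names passed in are placeholders for whatever your document depends on.

```python
import json
import platform
import sys
from importlib import metadata


def environment_snapshot(packages):
    """Collect the Python version and the installed version of each package."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # not installed in this environment
    return {
        "python": platform.python_version(),
        "platform": sys.platform,
        "packages": versions,
    }


# Placeholder package names; the second is deliberately not installed.
snapshot = environment_snapshot(["pip", "a-package-that-is-not-installed"])
print(json.dumps(snapshot, indent=2))
```

Saving this output alongside the document gives a future reader a starting point for rebuilding the same environment.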

Pre-prints



"...not only does our newly-accepted PNAS paper have two citations, both from before it was accepted, but another group has already extended our approach in a new direction." - C. Titus Brown, UC Davis

http://ivory.idyll.org/blog/science-f-yeah.html

Pre-prints



(Figure 1. Desjardins-Proulx et al 2013)

  1. Immediate visibility for your work
  2. Establishment of idea precedence
  3. Improved peer-review
  4. Citation before publication

    (Desjardins-Proulx et al 2013)

Pre-prints

Pre-print feedback on White et al. 2013

Pre-prints

Where to submit:

  • PeerJ
  • arXiv
  • bioRxiv
  • Figshare
    Be aware of your target journal's preprint policy!

Open Access


“Open Access (OA) literature is digital, online, free of charge, and free of most copyright and licensing restrictions.” - Peter Suber (Suber 2012)

Open Access


Two levels of Open Access

Gold Open Access

  • Open license
  • No restrictions
  • Author pays fees
  • Publisher hosts

Green Open Access

  • License varies by journal
  • Journal restrictions
    • Embargo
    • Copyright
    • Repository location
    • Text-mining / reuse
  • Free
  • Author / Institution hosts

Open Access



Some Gold OA journals

  • PLoS
  • PeerJ
  • Scientific Reports
  • Nature Communications
  • F1000 Research
  • Ecosphere
  • BioMed Central

Advantages of open science



  • Efficiency in the research cycle
  • Greater collaboration / scrutiny
  • New research capabilities
  • Increased impact

(Whyte and Prior 2011)

"It is possible to achieve some measure of traditional success while being open. Grants; publications; tenure. 'nuff said." - C. Titus Brown, UC Davis

http://bit.ly/osstack
@emhrt_