Key Points
Software Packaging Overview
Reproducibility is an integral concept in the FAIR4RS principles. Appropriate software packaging, which involves collecting and configuring software components into a format deployable across different computer systems, is one way to make research software reproducible.
Software packaging is akin to packing a box for shipment. Components such as the software source code, installation instructions, user documentation, and test scripts all help to ensure reproducibility.
The purpose of a software package is to install source code for execution on various systems, with considerations including target users, dependencies, testability and scalability.
Accessing Packages
- pip is the most common tool used to download and install Python packages from PyPI.
- PyPI is an online package repository to which users can upload their packages for others to use.
- CRAN is R’s repository, which has a far higher barrier to entry than PyPI.
- install.packages is the standard way of installing packages from CRAN.
- pak is a modern, fast package manager for R and can install from a variety of sources.
- Both pip and pak can also be used to install packages located on your local system (installing from source).
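As a rough illustration of the two pip routes listed above, the sketch below builds the command for installing either a named package from PyPI or a local source directory. The helper name build_pip_command and the example targets are hypothetical; only the "python -m pip install" invocation is standard pip usage. The function returns the command without running it, so nothing is installed.

```python
import sys
from pathlib import Path


def build_pip_command(target: str) -> list[str]:
    """Return the pip command that would install `target`.

    `target` is either a package name on PyPI or a path to a local
    source directory (hypothetical helper, for illustration only).
    """
    # Running pip as "python -m pip" ties it to the current interpreter.
    command = [sys.executable, "-m", "pip", "install"]
    if Path(target).is_dir():
        # An existing directory is installed from source.
        command.append(str(Path(target).resolve()))
    else:
        # Anything else is treated as a package name to fetch from PyPI.
        command.append(target)
    return command


if __name__ == "__main__":
    print(build_pip_command("requests"))  # install from PyPI
    print(build_pip_command("."))         # install from the current directory
```

Passing the returned list to subprocess.run would perform the actual installation.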
Creating Python Packages
- A package can be built with as few as 3 files: a metadata file, a Python script, and an __init__.py file
- pyproject.toml files have 2 key tables, [build-system] and [project]
- Editable installs allow for quick and easy package development
- There are multiple standards out there for Python packaging, but pyproject.toml is the current recommended way.
- uv streamlines the package development process compared with using inbuilt Python tooling
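The three-file layout described above can be sketched as a short script that writes a package skeleton to disk. The package name demo_pkg, the module greet.py, the version number, and the choice of setuptools as the build backend are all assumptions made for illustration; other backends declare themselves in the same [build-system] table.

```python
from pathlib import Path
import tempfile

# Contents of a minimal pyproject.toml with the two key tables.
# The setuptools backend and the project name/version are assumptions.
PYPROJECT = """\
[build-system]
requires = ["setuptools>=61"]
build-backend = "setuptools.build_meta"

[project]
name = "demo-pkg"
version = "0.1.0"
"""

# A tiny module standing in for the package's real source code.
MODULE = 'def hello():\n    return "hello"\n'


def write_skeleton(root: Path) -> list[Path]:
    """Create the three files of a minimal package under `root`."""
    package_dir = root / "demo_pkg"
    package_dir.mkdir(parents=True, exist_ok=True)

    files = {
        root / "pyproject.toml": PYPROJECT,  # metadata file
        package_dir / "__init__.py": "",     # marks the directory as a package
        package_dir / "greet.py": MODULE,    # Python script
    }
    for path, text in files.items():
        path.write_text(text)
    return sorted(files)


if __name__ == "__main__":
    root = Path(tempfile.mkdtemp())
    for path in write_skeleton(root):
        print(path.relative_to(root))
    # From `root`, an editable install would then be:
    #   python -m pip install -e .
```

With an editable install, changes to the files under demo_pkg take effect without reinstalling, which is what makes the development loop quick.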
Creating R Packages
- A package can be built with as few as 3 files: DESCRIPTION, NAMESPACE, and a source file.
- usethis helps generate package skeletons, add dependencies, and add source code files
- devtools::load_all() loads the current package, allowing for quick testing without needing to install it
- devtools::check() validates the package structure and contents
Versioning
Versioning is crucial for tracking the development, improvements, and bug fixes of a software package over time. It ensures that changes are documented and managed systematically, aiding in reproducibility and reliability of the software.
It also lets users track code changes and dependencies, so that a specific version of the software can be reliably recreated.
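A small sketch can show why version numbers need to be treated as structured values when recreating a specific release. It assumes simple dotted numeric versions such as 1.10.0, a scheme the text above does not prescribe; the helper names parse_version and pin and the package name demo-pkg are hypothetical. Real-world version strings with pre-release or development suffixes need a fuller parser, for example the one in the packaging library.

```python
def parse_version(text: str) -> tuple[int, ...]:
    """Split a dotted numeric version such as "1.10.0" into integers."""
    return tuple(int(part) for part in text.split("."))


def pin(name: str, version: str) -> str:
    """Return a requirement string that fixes one exact version."""
    return f"{name}=={version}"


if __name__ == "__main__":
    # Plain string comparison orders versions incorrectly...
    print("1.10.0" > "1.9.0")  # False
    # ...while comparing the parsed numbers gives the intended order.
    print(parse_version("1.10.0") > parse_version("1.9.0"))  # True
    # An exact pin records which version a result was produced with.
    print(pin("demo-pkg", "1.10.0"))  # demo-pkg==1.10.0
```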
Publishing Packages
- R and Python packages can both be installed directly from GitHub
- GitHub allows you to create named releases using tags
- You can easily publish your package on PyPI for the wider Python community, allowing your users to simply install your software using pip install.
- Publishing a package on CRAN is a thorough process with a manual review
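To connect the first two points, the sketch below builds the requirement string pip accepts for installing a Python package straight from a tagged GitHub release. The helper name github_requirement and the owner, repository, and tag values are placeholders, not a real project; the "git+https://...@tag" form is pip's syntax for installing from a Git repository and needs git to be available on the machine.

```python
def github_requirement(owner: str, repo: str, tag: str) -> str:
    """Return a pip requirement pointing at a tagged commit on GitHub."""
    return f"git+https://github.com/{owner}/{repo}.git@{tag}"


if __name__ == "__main__":
    # "example-org", "demo-pkg" and "v0.1.0" are placeholders.
    requirement = github_requirement("example-org", "demo-pkg", "v0.1.0")
    print(f"python -m pip install {requirement}")
```

Installing from a tag rather than a branch means every user gets the same code, which supports the reproducibility goals described under Versioning.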