Key Points

Software Packaging Overview


  • Reproducibility is an integral concept in the FAIR4RS (FAIR for Research Software) principles. Appropriate software packaging is one way to make research software reproducible: it involves collecting and configuring software components into a format that can be deployed across different computer systems.

  • Software packaging is akin to packing a box for shipment. Contents such as the source code, installation instructions, user documentation, and test scripts all help to ensure reproducibility.

  • The purpose of a software package is to let its source code be installed and executed on a range of systems. Designing one involves considering the target users, dependencies, testability, and scalability.

Accessing Packages


  • pip is the most common tool for downloading and installing Python packages from PyPI.
  • PyPI (the Python Package Index) is an online repository to which anyone can upload packages for others to use.
  • CRAN (the Comprehensive R Archive Network) is R’s main package repository, and it has a far higher barrier to entry than PyPI.
  • install.packages() is the gold standard way of installing packages from CRAN.
  • pak is a modern, fast package manager for R that can install from a variety of sources, such as CRAN, Bioconductor, and GitHub.
  • Both pip and pak can also install a package from a local directory on your own system (installing from source).

Creating Python Packages


  • A package can be built with as few as 3 files: a metadata file (pyproject.toml), a Python module, and an __init__.py file.
  • A pyproject.toml file has 2 key tables: [build-system] and [project].
  • Editable installs (pip install -e .) allow quick and easy package development, because changes to the source take effect without reinstalling.
  • There are multiple standards for configuring Python packaging (for example setup.py and setup.cfg), but pyproject.toml is the currently recommended one.
  • uv streamlines the package development process compared with Python’s built-in tooling such as venv and pip.

Creating R Packages


  • A package can be built with as few as 3 files: DESCRIPTION, NAMESPACE, and an R source file (in the R/ directory).
  • usethis helps generate package skeletons (usethis::create_package()), add dependencies (usethis::use_package()), and add source code files (usethis::use_r()).
  • devtools::load_all() loads the package in the current directory, allowing quick testing without needing to install it.
  • devtools::check() runs R CMD check to validate the package structure and contents.
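For comparison with the Python layout, the three files of a minimal R package can also be written by hand. The sketch below uses Python only so that all examples here share one language; in practice usethis::create_package() and usethis::use_r() create these files for you. The package name hellor, its author, and the GPL-3 licence are hypothetical choices, and the DESCRIPTION reader handles only simple one-line fields.

```python
import tempfile
from pathlib import Path

# Minimal R package contents; "hellor" and its author are hypothetical.
R_FILES = {
    "DESCRIPTION": (
        "Package: hellor\n"
        "Title: Say Hello\n"
        "Version: 0.1.0\n"
        'Authors@R: person("Ada", "Lovelace", email = "ada@example.org", role = c("aut", "cre"))\n'
        "Description: A minimal example package.\n"
        "License: GPL-3\n"
    ),
    "NAMESPACE": "export(hello)\n",  # makes hello() visible to package users
    "R/hello.R": 'hello <- function(name = "world") paste0("Hello, ", name, "!")\n',
}


def write_r_package(root: Path) -> None:
    """Write the three files under root, creating the R/ directory as needed."""
    for rel_path, text in R_FILES.items():
        path = root / rel_path
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text(text)


def read_description(path: Path) -> dict[str, str]:
    """Parse one-line 'Field: value' entries; continuation lines are skipped in this sketch."""
    fields = {}
    for line in path.read_text().splitlines():
        if not line or line[0].isspace():
            continue
        key, _, value = line.partition(":")
        fields[key] = value.strip()
    return fields


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        write_r_package(root)
        print(read_description(root / "DESCRIPTION")["Package"])  # hellor
```

Calling devtools::load_all() from inside that directory would then make hello() available for interactive testing.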

Versioning


  • Versioning is crucial for tracking the development, improvements, and bug fixes of a software package over time. It ensures that changes are documented and managed systematically, which supports the reproducibility and reliability of the software.

  • Version numbers let users identify exactly which release of a package, and of its dependencies, they are running, so that a specific software setup can be recreated reliably.
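As an illustration of how version numbers encode change, here is a small sketch assuming a plain MAJOR.MINOR.PATCH scheme such as Semantic Versioning; the helper names parse_version and bump are hypothetical, and pre-release or build suffixes (e.g. "1.0.0-rc1") are deliberately not handled.

```python
import re

# Assumes plain MAJOR.MINOR.PATCH versions (e.g. Semantic Versioning without suffixes).
_VERSION_RE = re.compile(r"(\d+)\.(\d+)\.(\d+)")


def parse_version(text: str) -> tuple[int, int, int]:
    """Split 'MAJOR.MINOR.PATCH' into integers so versions compare numerically."""
    match = _VERSION_RE.fullmatch(text)
    if match is None:
        raise ValueError(f"not a MAJOR.MINOR.PATCH version: {text!r}")
    major, minor, patch = (int(g) for g in match.groups())
    return major, minor, patch


def bump(text: str, part: str) -> str:
    """Return the next version after a 'major', 'minor' or 'patch' change."""
    major, minor, patch = parse_version(text)
    if part == "major":
        return f"{major + 1}.0.0"  # incompatible change
    if part == "minor":
        return f"{major}.{minor + 1}.0"  # new, backwards-compatible feature
    if part == "patch":
        return f"{major}.{minor}.{patch + 1}"  # backwards-compatible bug fix
    raise ValueError(f"unknown part: {part!r}")


if __name__ == "__main__":
    print(bump("1.4.2", "patch"))  # 1.4.3
    print(bump("1.4.2", "minor"))  # 1.5.0
    print(bump("1.4.2", "major"))  # 2.0.0
    # Tuples compare numerically, so 1.10.0 sorts after 1.9.0 (plain strings would not).
    print(parse_version("1.10.0") > parse_version("1.9.0"))  # True
```

Parsing into integer tuples matters because comparing version strings directly is lexicographic. Users can then pin an exact release, for example pip install mypkg==1.4.3 for a hypothetical mypkg, to recreate the same environment.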

Publishing Packages


  • R and Python packages can both be installed directly from GitHub.
  • GitHub allows you to create named releases from Git tags.
  • You can publish your package on PyPI for the wider Python community, so that users can install your software with a simple pip install.
  • Publishing a package on CRAN is a more thorough process that includes a manual review.
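Tags and GitHub installs fit together: a release tag gives users a fixed reference to install from. The sketch below builds the install strings for a tagged release; the helper names, the package mypkg, the owner example-org, and the tag v0.1.0 are all hypothetical. The pip form is a PEP 508 direct reference using pip's git+https VCS syntax, and the pak form pins a GitHub reference to a tag with @.

```python
# Hypothetical helpers that build install references for a tagged GitHub release.
def github_requirement(name: str, owner: str, repo: str, tag: str) -> str:
    """pip form: a PEP 508 direct reference to a Git tag on GitHub."""
    return f"{name} @ git+https://github.com/{owner}/{repo}.git@{tag}"


def pak_ref(owner: str, repo: str, tag: str) -> str:
    """pak form: 'owner/repo@ref', where ref here is a release tag."""
    return f"{owner}/{repo}@{tag}"


if __name__ == "__main__":
    # "mypkg", "example-org" and "v0.1.0" are placeholder names.
    print(github_requirement("mypkg", "example-org", "mypkg", "v0.1.0"))
    # mypkg @ git+https://github.com/example-org/mypkg.git@v0.1.0
    print(pak_ref("example-org", "mypkg", "v0.1.0"))
    # example-org/mypkg@v0.1.0
```

The first string can be passed to pip install (quoted in the shell), and the second to pak::pak() in R.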