The code we write is often a means to an end: our main outputs are papers, PhD theses, talks and grants. But a lot of work goes into writing the code, and often this work is lost when someone leaves, and repeated by the next person. The code is also a record of how results were generated.

This week, Ben led us through how and why to make your code available to others, and in particular how to package python projects.

Why (and why not) make your code open?

Pros

  • Enables your work to have a wider impact; others can build on what you have done
  • Something to be known for when building a career
  • Widens the number of collaborators, potential coauthors
  • You can benefit from fixes and improvements
  • The code commits are a public record

Cons

  • A significant effort to document, tidy code
  • Time investment in supporting other users. Less time doing physics research

Most software projects are abandoned

[columns]

[column=0.65]

  • As of April 2017, GitHub claims to have 57 million repositories [Wikipedia]
  • In 2012 a presentation on Ohloh (now Open Hub) showed:
    • Only 5% of projects had a commit in the last year and more than one committer
    • Only about half of projects had more than one committer
    • 17% of projects had a commit in the last year

https://www.slideshare.net/blackducksoftware/open-source-by-the-numbers

[column=0.35]

\begin{figure} \centering \includegraphics[width=\textwidth]{project_commits.png}

\includegraphics[width=\textwidth]{project_team.png} \caption{} \end{figure}

[/columns]

Barriers to adopting code

  • Finding the code (**)
  • Understanding what it does
  • Getting it installed (**)
  • Learning how to use it
  • Fixing it when something goes wrong

The Tale of J.Random Newbie: http://www.catb.org/~esr/writings/taoup/html/ch16s01.html

Finding the code

Understanding what a code is/does

[columns]

[column=0.5]

If you are making code available, you should include these text files:

  • README, describing what the code does
  • LICENSE, specifying the license/copyright
  • INSTALL, with instructions on how to install it

Github and other online repositories make these part of the project front page

[column=0.5]

\begin{figure} \centering

\includegraphics[width=\textwidth]{numpy_license.png}

\includegraphics[width=\textwidth]{numpy_github.png} \caption{} \end{figure}

[/columns]

Understanding what a code is/does

[columns]

[column=0.5]

If you are making code available, you should include these text files:

  • README, describing what the code does
  • LICENCE, specifying the license/copyright
  • INSTALL, with instructions on how to install it

Github and other online repositories make these part of the project front page

[column=0.5]

\begin{figure} \centering

\includegraphics[width=\textwidth]{numpy_license.png}

\includegraphics[width=\textwidth]{numpy_github_raw.png} \caption{} \end{figure}

[/columns]

Installing code

If your code is hard to install, people need a very strong motivation or they will not do it.

  • Compiled languages (C, Fortran etc.) need to be compiled: configure, make, make install

  • Python and other languages need an interpreter e.g. Python 2 or 3 installed
  • Most code needs libraries

Help your users (and yourself) by:

  • Documenting the libraries your code needs
  • Documenting the install process
  • Having a Makefile or other build system
  • (Preferably) having a configuration script and install target

Python packaging

Python code is organised into packages, which are just directories and files

thing/
     __init__.py
     code.py
  • __init__.py does not have to contain anything, just exist

If thing is in your current directory, or in the PYTHONPATH environment variable, then you can run:

from thing import code

code.myfunction()

Tidying up the interface

What if we try something slightly different:

import thing
thing.code.myfunction()
-> AttributeError: module 'thing' has no attribute 'code'

When we import thing we don’t automatically import the code.py file. This is what the __init__.py file is for.

In __init__.py

from . import code

The code file is then imported automatically when thing is imported.

Note: The format of these relative imports have changed from Python 2 to Python 3.

Python installation: setuptools

A small python script setup.py can be used to specify

  • Which libraries or packages your code depends on
  • How to compile any C or Fortran code included in your package

These scripts can then automatically install your code.

from setuptools import setup

setup(
    name = "Thing",
    version = "0.1",
    packages = ["thing"]
    )

https://packaging.python.org/tutorials/distributing-packages/

Testing your installation process

Installing in your system paths is not a good idea, particularly when testing. Instead:

$ python setup.py install --user
...
creating thing.egg-info
installing library code to build/bdist.linux-x86_64/egg
...
creating 'dist/thing-0.1-py3.6.egg' and adding 'build/bdist.linux-x86_64/egg' to it
...
Processing thing-0.1-py3.6.egg
Installed /home/bd512/.local/lib/python3.6/site-packages/thing-0.1-py3.6.egg
Processing dependencies for thing==0.1
Finished processing dependencies for thing==0.1

Console and GUI scripts

If your project is supposed to be run as an application, setuptools can automatically create executables that will work under both Linux and Windows:

setup(
      ...
      entry_points={'console_scripts': [
                        'script = package.__main__:main'],
                    'gui_scripts': [
                        'app = package.__main__:gui'},
      )

This will create an executable called script (or script.exe for Windows) that imports the function main from package.__main__ and then calls it with no arguments.

The gui_scripts only makes a difference on Windows, where it will launch the program without opening a console window.

Entry points

Here’s a basic console entry point function in package/__main__.py:

import sys

def main(args=None):
   """The main routine"""
   if args is None:
       args = sys.argv[1:]
   print("This was called with command line arguments: ", args)

if __name__ == "__main__":
    main()

The magic __name__ == "__main__" part allows the package to be invoked with python -m <package> to run the main() function

Dependencies

setuptools also handles dependencies, and will fetch packages from PyPI as needed.

setup(
      ...
      install_requires=['numpy>=1.8',
                        'scipy>=0.14',
                        'matplotlib>=1.3'],
      )
  • Can specify a minimum version, an exact version (==), or more complicated ranges
  • Dependencies can be at a web address or repository (git/hg/svn) if they have a setup.py file or are a single python file

Distributing python packages: PyPI

   $ python setup.py sdist
  • [Optional] Create a wheel (various types, pre-built)
    $ pip install --user wheel
    $ python setup.py bdist_wheel --universal
  • Install twine (> 1.8.0) and upload the project (may need ~/.pypirc )
   $ pip install --user twine
   $ twine upload dist/*

Distributing Packages: https://packaging.python.org/tutorials/distributing-packages/

More complicated projects

The setup.py script can handle much more complex situations (NumPy,SciPy)

  • Cython
   from setuptools.extension import Extension
   from Cython.Build import cythonize
   import numpy

   extensions = [ Extension("somemodule",
                            ["somemodule.pyx"],
                            include_dirs=[numpy.get_include()])
                ]
   setup(...
         ext_modules=cythonize(extensions)
         )

https://copyninja.info/blog/cython_setuptools.html

More complicated projects

The setup.py script can handle much more complex situations (NumPy,SciPy)

  • Cython
  • C, Fortran extensions
  • Installation scripts
  • Installing your code as a script (executable command)

Conclusions

  • Making your code available is generally a good thing
  • Important to include README, LICENSE, documentation, tests
  • Make life easier for new users

  • Python setuptools very easy to set up for simple projects
  • More complicated projects are… more complicated
  • Uploading to PyPI makes installing your code easy. The process has changed several times, so documentation not consistent

One I made earlier…

    $ pip install freegs

https://jeffknupp.com/blog/2013/08/16/open-sourcing-a-python-project-the-right-way/