Starting a new project? Preparing your software for release? Inherited spaghetti code and want to give it a clean up? Among the many other things to think about will be how to best structure your project. This week at Coding Club, we discussed everything project structure – from directory layout to naming conventions.
You can get the slides for the talk here
Outline
- Higher-level things
 - Low-level things
 
Projects
- Individual piece of software
- Collected set of components, e.g.:
    - Simulation
    - Post-processing
    - Data sets
    - Website
 
 
The central ideas
- Abstract over related things
 - Give things names
 - Group things in namespaces
 - Keep it simple
 
Software directory layout
A typical C++ project layout
my_code
|-- docs
|-- examples
|-- include
|   \-- my_code_public.hxx
|-- src
|   |-- my_code_private.hxx
|   \-- my_code.cxx
|-- tests
|   \-- test_my_code.cxx
|-- LICENCE
\-- README.md
Software directory layout
A typical Python project layout
my_code
|-- docs
|-- examples
|-- my_code
|   |-- __init__.py
|   \-- my_code.py
|-- tests
|   \-- test_my_code.py
|-- LICENCE
|-- README.md
\-- setup.py
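As a rough illustration, the Python layout above could be created with a few lines of pathlib. The function name scaffold_project is made up for this sketch and the files it creates are empty placeholders; it is not part of any packaging tool.

```python
import tempfile
from pathlib import Path


def scaffold_project(root: Path, name: str) -> list[str]:
    """Create an empty Python project skeleton and return the relative paths made."""
    project = root / name
    directories = ["docs", "examples", name, "tests"]
    files = [
        f"{name}/__init__.py",
        f"{name}/{name}.py",
        f"tests/test_{name}.py",
        "LICENCE",
        "README.md",
        "setup.py",
    ]
    for directory in directories:
        (project / directory).mkdir(parents=True, exist_ok=True)
    for file in files:
        (project / file).touch()
    # Report everything that now exists, relative to the project root
    return sorted(p.relative_to(project).as_posix() for p in project.rglob("*"))


# Build the skeleton in a throwaway directory so nothing is left behind
with tempfile.TemporaryDirectory() as tmp:
    created = scaffold_project(Path(tmp), "my_code")
```

Here created holds the four directories and six files from the tree above.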
Software directory layout
Essential
- README
    - What, why and how
 
 
Very good to have
- Separate source directory (“src”)
 - Separate documentation directory
 - Examples for libraries
 - Licence! What am I allowed to do?
 
Up to you
- Separate tests directory
 - Examples directory
 
The README
The most important file
- Often the first file people see
- Make a good impression!
- Details what the code is for
- Details how to get started
    - How to get access
    - Where to download from
    - How to install (including dependencies!)
    - How to run tests/examples
- Where to get more information
    - FAQ, papers, forums, etc.
 
 
The README
Example
# PlanetDetector
PlanetDetector detects planets in images.
Installation:
    pip install --user planetdetector
Run it like:
    planetdetector --mars image03.jpg
## Requirements
PlanetDetector needs `libplanet > 2.3`
The directories
- Separate out “project admin” from source
    - True for both Python and C++/Fortran
- Might be divided into subcomponents
- Tests might be alongside source or separate
- include directory for public API
- Documentation might be separate files or inline
 
Handling dependencies
Python
- requirements.txt
- List required/optional dependencies and versions
 
Compiled languages
- Some “standard” dependencies, e.g. FFTW, LAPACK
    - Provide instructions for installing
- For non-standard dependencies, put under externals/
- Can commit files directly
- Or include as git submodules
    - Easier updating
 
 
Source code
- Data structures
 - Functions
 - Files/Modules
 - Sub-components
 
Data types
“Bad programmers worry about the code. Good programmers worry about data structures and their relationships.”
- Linus Torvalds
 
- Choosing the right data structure can make all the difference
 - Can be easier to reason about data structure than logic
 
Data types
Not great
DiffLookup lookupFunc(DiffLookup *table, string label) {
  for (int i = 0; DiffNameTable[i].method != DIFF_DEFAULT; ++i) {
    if (strcasecmp(label.c_str(), DiffNameTable[i].label) == 0) {
      auto method = DiffNameTable[i].method;
      if (isImplemented(table, method)) {
        for (int j=0;;++j){
          if (table[j].method == method){
            return table[j];
          }
        }
      }
    }
  }
  ...
Data types
Better
DiffLookup lookupFunc(const map<string, DiffLookup>& table, const string& label) {
  // at() throws std::out_of_range for an unknown label instead of inserting one
  return table.at(label);
}
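The same idea can be sketched in Python. The DiffMethod class and the labels below are invented for illustration; the point is only that once the data lives in a dictionary, the lookup logic mostly disappears.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DiffMethod:
    label: str
    order: int


# Hypothetical entries, standing in for a table of differencing methods
METHODS = [DiffMethod("c2", 2), DiffMethod("c4", 4), DiffMethod("u1", 1)]


def lookup_by_scan(methods, label):
    """Linear search: the caller has to write (and debug) the loop."""
    for method in methods:
        if method.label.lower() == label.lower():
            return method
    raise KeyError(label)


# Building the table once turns every later lookup into a single expression
METHOD_TABLE = {method.label.lower(): method for method in METHODS}


def lookup_by_table(label):
    """Dictionary lookup: the data structure does the searching."""
    return METHOD_TABLE[label.lower()]
```

Both functions give the same answer for a known label and raise KeyError for an unknown one, but the second has far less to get wrong.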
Data types
Object oriented programming
- Wrap up several concepts into a higher-level abstraction
- Bundle together related nouns (data) and verbs (functions)
- Abstract a Particle, wrapping up mass, charge, position, etc., and how to calculate energy, force, etc.
- Reduces cognitive load, freeing up mental energy to think about more important things
 
Object oriented programming
Before
energy = calculate_kinetic_energy(mass1, charge1, position1,
                                  velocity1, E_field)
force = coulomb_force(charge1, charge2, position1, position2)
update_position(position1, mass1, charge1, velocity1, force)
Object oriented programming
After
energy = particle1.kinetic_energy(E_field)
particle1.set_coulomb_force(particle2)
particle1.push()
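For the curious, here is one way such a Particle class might look. This is only a sketch under simplifying assumptions: it works in one dimension, leaves out the E_field argument, adds a time step dt to push, and uses a plain explicit Euler update. None of these choices come from the talk.

```python
from dataclasses import dataclass

COULOMB_CONSTANT = 8.9875517923e9  # N m^2 / C^2


@dataclass
class Particle:
    mass: float       # kg
    charge: float     # C
    position: float   # m (one dimension, to keep the sketch short)
    velocity: float   # m/s
    force: float = 0.0  # N, set by set_coulomb_force

    def kinetic_energy(self) -> float:
        return 0.5 * self.mass * self.velocity**2

    def set_coulomb_force(self, other: "Particle") -> None:
        """Store the electrostatic force that `other` exerts on this particle."""
        separation = self.position - other.position
        magnitude = COULOMB_CONSTANT * self.charge * other.charge / separation**2
        # Like charges give a positive magnitude, pushing self away from other
        self.force = magnitude if separation > 0 else -magnitude

    def push(self, dt: float) -> None:
        """Advance one explicit Euler step using the stored force."""
        self.position += self.velocity * dt
        self.velocity += self.force / self.mass * dt


particle1 = Particle(mass=1.0, charge=1e-6, position=1.0, velocity=2.0)
particle2 = Particle(mass=1.0, charge=1e-6, position=0.0, velocity=0.0)
energy = particle1.kinetic_energy()
particle1.set_coulomb_force(particle2)
particle1.push(dt=0.1)
```

The calling code no longer needs to know which of mass, charge, position and velocity each calculation requires.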
Data types
Object oriented programming
- Object-oriented programming is a way to wrap up data and functions that operate on that data
 - Can be a good mental fit for lots of problems in physics
- Four “pillars”:
    - Abstraction
    - Encapsulation
    - Inheritance
    - Polymorphism
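
A small sketch may help make the four pillars concrete. The class names Potential, Harmonic and UniformGravity are invented for this example, and the comments mark where each pillar shows up.

```python
from abc import ABC, abstractmethod


class Potential(ABC):
    """Abstraction: callers only need to know that a potential has an energy."""

    @abstractmethod
    def energy(self, x: float) -> float:
        ...


class Harmonic(Potential):  # Inheritance: a Harmonic is a Potential
    def __init__(self, stiffness: float):
        self._stiffness = stiffness  # Encapsulation: internal detail, by convention

    def energy(self, x: float) -> float:
        return 0.5 * self._stiffness * x**2


class UniformGravity(Potential):
    def __init__(self, mass: float, g: float = 9.81):
        self._weight = mass * g

    def energy(self, x: float) -> float:
        return self._weight * x


def total_energy(potentials, x: float) -> float:
    # Polymorphism: the same call works for every kind of Potential
    return sum(potential.energy(x) for potential in potentials)
```

Adding a new kind of potential then means writing one new class; total_energy does not change.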
 
 
Functions
- Reusable tasks
- Gives names to things
    - Names are amazing!
- Single responsibility principle
    - Each “thing” should have one task
- If it’s hard to name, is it really two functions?
 
Functions
Example
for i in range(len(array)-1, 0, -1):
    for j in range(i):
        if array[j] > array[j+1]:
            temp = array[j]
            array[j] = array[j+1]
            array[j+1] = temp
Functions
…becomes
result = sorted(array)
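If the built-in sort were not available, the loop could still be given a name by wrapping it in a function. A minimal sketch, with bubble_sort as an illustrative name; unlike the inline loop, it sorts a copy and leaves its input alone.

```python
def bubble_sort(values):
    """Return a sorted copy of `values`, leaving the input untouched."""
    result = list(values)
    for i in range(len(result) - 1, 0, -1):
        for j in range(i):
            if result[j] > result[j + 1]:
                # Swap neighbours that are out of order
                result[j], result[j + 1] = result[j + 1], result[j]
    return result
```

Readers of the calling code now see what happens (a sort) without having to work out how.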
Naming things
- Names are hard!
 - Trade off between short and descriptive
 - Variables are nouns, functions are (usually) verbs
- Dnt ndlssly abbrev
    - Absolutely no single-letter variables!
- bool very_long_variable_names_can_be_difficult_to_read = true
- Naming conventions help distinguish between types of names, e.g.:
    - PascalCase for classes/types
    - camelCase/snake_case for functions/variables
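
A short before-and-after shows the effect of these guidelines. All of the names here are made up for the example.

```python
# Hard to read: abbreviations and single letters hide the intent
def calc(m, v):
    return 0.5 * m * v**2


# Easier to read: a verb for the function, nouns for the variables
def calculate_kinetic_energy(mass, velocity):
    return 0.5 * mass * velocity**2


class ChargedParticle:  # PascalCase for a class
    def __init__(self, mass, velocity):
        self.mass = mass
        self.velocity = velocity

    def kinetic_energy(self):  # snake_case for a method
        return calculate_kinetic_energy(self.mass, self.velocity)
```

Both functions compute the same number; only the second tells you what that number is.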
 
Runtime checks and multiple conditions
Arrow code
def calculate_thing(x, y, limit, dry_run):
    if x < limit:
        if y >= 0:
            if not dry_run:
                pass  # do thing
            else:
                return True
        else:
            raise ValueError("negative y")
    else:
        raise ValueError("x over limit")
Runtime checks and multiple conditions
Prefer preconditions
def calculate_thing(x, y, limit, dry_run):
    if x >= limit:
        raise ValueError("x over limit")
    if y < 0:
        raise ValueError("negative y")
    if dry_run:
        return True
    # do thing
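Here is a runnable version of the precondition style. The "thing" is replaced by a placeholder calculation (x + y), which is purely an assumption for the sake of the example. It treats x equal to limit as out of range, matching the x < limit check in the arrow version.

```python
def calculate_thing(x, y, limit, dry_run=False):
    """Hypothetical example: the 'thing' computed here is just x + y."""
    # Preconditions first, one per line, each with its own error message
    if x >= limit:
        raise ValueError("x over limit")
    if y < 0:
        raise ValueError("negative y")
    if dry_run:
        return True
    # The main logic is no longer indented three levels deep
    return x + y
```

Each check sits next to its error message, so it is easy to see which condition produces which failure.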
Splitting up files
- Single files get unmanageable above 10k lines
- Group logically related things together
- Sub-components might even go in separate directories
- Namespace: set of symbols to organise objects
    - filesystem! /home/peter/documents and /home/nicky/documents
    - Python modules
    - C++ std::
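
Python's standard library has a handy example of namespaces at work: the math and cmath modules both define a function called sqrt, and the module prefix says which one is meant.

```python
import cmath
import math

# Two different functions share the name sqrt without clashing
real_root = math.sqrt(4.0)
complex_root = cmath.sqrt(-4.0)
```

math.sqrt only handles real numbers, while cmath.sqrt returns a complex result, so they can co-exist in the same program without confusion.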
 
Fortran Modules
- Always use modules!
- Compiler generates interfaces for procedures in modules
    - Doesn’t do this for bare subroutines in files
- Interfaces catch bugs!
- Can control access to “internals”
 
Fortran Modules
integrator.f90
module integrator
  private              ! Everything private by default
  public :: integrate  ! Make integrate public
contains
  real function integrate(array, spacing)
    real, dimension(:), intent(in) :: array
    real, intent(in) :: spacing
    ...
  end function
  subroutine helper(...)
  ...
  end subroutine
end module
Fortran Modules
example.f90
real function total_mass(volume, density)
  use integrator
  ...
  total_mass = integrate(density, dv)
end function
No namespaces
- Fortran modules are not namespaced
 
Fortran Modules
even_better_example.f90
real function total_mass(volume, density)
  use integrator, only : integrate
  ...
  total_mass = integrate(density, dv)
end function
No namespaces
- Fortran modules are not namespaced
 
Python Modules
- Python modules can be single files or whole directories
- Make a directory into a module with an __init__.py file
- __init__.py defines what is imported by default
- Modules are namespaces:
    - mycode.min and numpy.min can co-exist
 
Example.py
mycode/
mycode/
 |-- __init__.py
 \-- integrator.py
mycode/integrator.py
def integrate(data):
    ...
mycode/__init__.py
from .integrator import integrate
Using mycode
import mycode
mycode.integrate(data)
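To try this out without creating files by hand, the sketch below writes a tiny package into a temporary directory and imports it. The package name mycode_demo, the body of integrate and its spacing parameter are all placeholders chosen for the example.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def build_demo_package(root: Path) -> None:
    """Write a two-file package: integrator.py plus an __init__.py that re-exports it."""
    package = root / "mycode_demo"
    package.mkdir()
    (package / "integrator.py").write_text(
        "def integrate(data, spacing=1.0):\n"
        "    return sum(data) * spacing\n"
    )
    # The leading dot makes this a relative import within the package
    (package / "__init__.py").write_text("from .integrator import integrate\n")


with tempfile.TemporaryDirectory() as tmp:
    build_demo_package(Path(tmp))
    sys.path.insert(0, tmp)
    importlib.invalidate_caches()  # make sure the freshly written files are seen
    try:
        mycode_demo = importlib.import_module("mycode_demo")
        total = mycode_demo.integrate([1.0, 2.0, 3.0], spacing=0.5)
    finally:
        sys.path.remove(tmp)
```

Because __init__.py re-exports integrate, users can call mycode_demo.integrate directly and never need to know which file it lives in.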
C++
Namespaces
- No modules in C++ before C++20
- #include-ing files literally inserts the text of the file
- Namespaces allow reusing the same name for different things

#include <vector>

namespace mycode {
  class vector { ... };
}

std::vector<double> standard;
mycode::vector mine;
Anonymous namespaces
- Reduces the scope of members to the translation unit
    - Translation unit: a source file and all its included headers
 
 
Documentation
- Writing documentation is bad
- Undocumented code is worse
- Try to make code self-explanatory
    - Always in sync!
- Then write documentation directly in source
    - => Reference manual
- Then write standalone documentation
    - => How-to guide or tutorial
 
 
Inline documentation
Documentation builders
- Built in to some languages like Python:

def foo(a, b):
    """
    Foos a and b together, returning a list of the results
    """

- Tools exist for other languages, e.g. Doxygen, FORD

/// Foos a and b together, returning a list of the results
std::list<result> foo(int a, int b) {

function foo(a, b) result(list)
  !! Foos a and b together, returning a list of the results
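
In Python the docstring is more than a comment: it is stored on the function object, which is how help() and documentation builders find it. A minimal sketch, where the body of foo is a placeholder:

```python
def foo(a, b):
    """Foo a and b together, returning a list of the results.

    This docstring layout is one common convention, not the only one.
    """
    return [a, b]


# The docstring is stored on the function, so tools (and help()) can read it
summary = foo.__doc__.splitlines()[0]
```

Here summary holds the first line of the docstring, which many tools use as the short description in an index.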
Inline documentation
Documentation builders
- Compiles inline documentation into e.g. LaTeX, PDF, HTML
    - HTML could go on project website
    - Most allow LaTeX
- Web services exist for doing this automatically
    - e.g. Read the Docs https://readthedocs.org/
 
 
Conclusion
- Abstract over related things
 - Give things names
 - Group things in namespaces
 - Keep it simple