
Accounting Engine
-----------------

Quick Overview
--------------
The fundamental accounting unit is a "journal entry".  Each entry consists
of a quantity (number of dollar bills, number of shares, etc.), the
price of that quantity (the price of one dollar is 1.00 dollars, etc.),
a memo, a pointer to the "credited" account, a reconciled flag and timestamp, 
an "action" field, and a pointer to any additional generic data.   Because 
the word "entry" has many other meanings, the remainder of the documentation, 
and the code, refers to it as a "split".  So when you see "Split", think "Journal
Entry".

The notion of "double entry" is embodied in the "transaction" structure.
The transaction consists of a date, a description field, a number, and
a list of one or more "splits".  When double-entry rules are properly 
enabled in the engine (they are settable with various flags), the sum-total
value of all of the splits is forced to be zero.  Note that if there
is just one split, its value must be zero; this is useful because a 
zero-valued split can store a price (useful for e.g. tracking stocks).
If there are two splits, then the value of one must be positive, the
other negative: this denotes that one account is credited, and another
is debited by an equal amount.  It is this very forcing of the splits to
add up to zero that causes a double-entry accounting system to always
automatically balance.  
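
The balancing rule can be sketched in a few lines of C.  This is only an
illustration: DemoSplit, DemoTransaction and demo_trans_imbalance() are
hypothetical names invented here, not the engine's actual structures, and
all splits are assumed to be valued in one common currency.

```c
#include <stddef.h>

/* Hypothetical, simplified split: just a quantity and a price. */
typedef struct {
    double quantity;   /* number of dollars, shares, etc. */
    double price;      /* price of one unit, in the common currency */
} DemoSplit;

/* Hypothetical, simplified transaction: a list of splits. */
typedef struct {
    const DemoSplit *splits;
    size_t num_splits;
} DemoTransaction;

/* Sum of the values of all splits.  A transaction that obeys the
 * double-entry rule returns exactly zero. */
double demo_trans_imbalance(const DemoTransaction *trans)
{
    double total = 0.0;
    for (size_t i = 0; i < trans->num_splits; i++)
        total += trans->splits[i].quantity * trans->splits[i].price;
    return total;
}
```

A single split with a quantity of zero balances no matter what price it
carries, which is why such a split can be used to record a price.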

Through various flags, double-entry can be disabled; this is often 
desirable for novice, home-oriented users. As an alternative, it may be
better to leave double-entry enabled, but credit all dangling splits 
to some dummy account, and then simply not show that dummy account to 
the user.

Note that the sum of the values of the splits is always computed with respect to 
a currency; thus splits can be balanced even when they are in different
currencies, as long as they share a common currency.  The conversion price
is simply the price stored in the split.  This feature allows currency-trading 
accounts to be established.

Every split must point at its parent transaction, and that transaction
must in turn include that split in its list of splits.   A split can 
belong to at most one transaction.  This relationship is forced by the engine.
The engine user cannot accidentally destroy this relationship as long as they
stick to using the API, and never access internal structures directly.

Splits are grouped into "accounts". Each account consists of a list of 
splits that credit (debit) that account.   To ensure consistency, 
if a split points at an account, then the account must point at the 
split, and vice-versa.  A split can belong to at most one account.
Besides merely containing a list of splits, the account structure
also gives the account a name, a code number, description and notes fields,
a place to store general info, a pointer to the currency that is used
for all splits in this account, and a pointer to the "security" used 
for all splits in this account.  The "security" can be the name of a stock
(e.g. "IBM", "McDonald's"), or another currency (e.g. "Yen", "Franc"). 
The security is used during split balancing to enable trading between 
accounts denominated in different currencies, or to, for example, move 
stocks from one account to another.

Accounts can be arranged in a hierarchical tree.  The nodes of the 
tree are called "Account Groups".  By accounting convention, the 
value of an account is equal to the value of all of its splits plus
the value of all of its sub-accounts.  Account Groups are implemented 
as doubly-linked trees.
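
The accounting convention for account values maps naturally onto a
recursive walk of the tree.  The sketch below assumes a hypothetical
DemoAccount node that stores pre-computed split values; the engine's
real account and group structures are different and are not shown here.

```c
#include <stddef.h>

/* Hypothetical, simplified account node; not the engine's real layout. */
typedef struct DemoAccount {
    const double *split_values;      /* value of each split in this account */
    size_t num_splits;
    struct DemoAccount **children;   /* sub-accounts */
    size_t num_children;
    struct DemoAccount *parent;      /* back-pointer, making the tree doubly linked */
} DemoAccount;

/* Value of an account: its own splits plus all of its sub-accounts. */
double demo_account_value(const DemoAccount *acc)
{
    double total = 0.0;
    for (size_t i = 0; i < acc->num_splits; i++)
        total += acc->split_values[i];
    for (size_t j = 0; j < acc->num_children; j++)
        total += demo_account_value(acc->children[j]);
    return total;
}
```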


Stocks, non-Currency-Denominated Assets
---------------------------------------
The engine includes support for non-currency-denominated assets, 
such as stocks, bonds, mutual funds, and inventory.  This is done with
two values in the Split structure:

   double share_price;
   double damount;

"damount" is the number of shares/items.  It is an "immutable" quantity,
in that it cannot change except by transfer (sale/purchase).  It is the
quantity that is used when computing balances.

"share_price" is the price of the item in question.  The share-price is
of course subject to fluctuation.

The net-value of a split is the product of "damount" and "share_price".
The currency balance of an account is the sum of all "damounts" times
the most recent share-price.

Currency accounts should use a share price of 1.0 for all splits.
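
As a rough sketch of these definitions, the following C fragment computes
a split's net value, an account's share balance, and its currency balance.
The DemoStockSplit type and the demo_* functions are hypothetical, and the
code assumes that the splits are stored oldest first, so that the last
split carries the most recent share price.

```c
#include <stddef.h>

/* Hypothetical stand-in for the two fields described above. */
typedef struct {
    double share_price;   /* price of one share/item */
    double damount;       /* number of shares/items */
} DemoStockSplit;

/* Net value of one split. */
double demo_split_value(const DemoStockSplit *s)
{
    return s->damount * s->share_price;
}

/* Share balance: the sum of all damounts. */
double demo_share_balance(const DemoStockSplit *splits, size_t n)
{
    double shares = 0.0;
    for (size_t i = 0; i < n; i++)
        shares += splits[i].damount;
    return shares;
}

/* Currency balance: share balance times the most recent price.
 * Assumes splits[n - 1] is the newest split. */
double demo_currency_balance(const DemoStockSplit *splits, size_t n)
{
    if (n == 0)
        return 0.0;
    return demo_share_balance(splits, n) * splits[n - 1].share_price;
}
```

For a currency account every share price is 1.0, so the currency balance
and the share balance coincide.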

To maintain the double-entry consistency, one must have the following
hold true:

   0.0 == sum of all split values.

If all splits are in the same currency, then this becomes:

   0.0 == sum of all ((split->damount) * (split->share_price))  

Thus, for example, the purchase of shares can be represented as:

   source:
   debit ABC Bank for $1045  (1045 dollars * dollar "price" of 1.00)
   
   destination:
   credit PQR Stock for $1000 (100 shares at $10 per share)
   credit StockBroker category $45 in fees

If the splits are in mixed currencies and securities, then there must
be at least one common currency/security between all of them.  Thus, 
for example:

   source:
   debit ABC Bank for $1045  (1045 dollars * dollar "price" of 1.00)
   
   destination:
   credit VolksTrader for 2000 DM (1000 dollars at 2.0 marks per dollar)
   credit Fees category $45 in fees

If the "currency" field is set to "DM" for the VolksTrader account, 
and the "security" field is set to "USD", while the currency for ABC bank is
"USD", then the balancing equation becomes: 

   0.0 == 1045 * $1.00 - $1000 - 45 * $1.00

Note that we ignored the price when adding the second split.



Error Reporting
---------------
The error reporting architecture (partially implemented) uses a globally
visible subroutine to return an error.  In the naivest possible implementation,
the error reporting mechanism would look like this:

    int error_num;   /* global error number */

    int xaccGetError (void) { return error_num; }

    void xaccSomeFunction (Args *various_args) {
        if (bad_thing_happened) error_num = 42;
    }  

Many programmers are used to a different interface, e.g.

    int xaccSomeFunction (Args *various_args) {
        if (bad_thing_happened) return (42);
        return (0);   /* no error */
    }  

Because of this, it is important to explain why the former design was chosen over
the latter.  Let us begin by listing how the chosen design is as good as, and in
many ways can be better than, the latter design.

 (1) Allows programmer to check for errors asynchronously, e.g. outside
     of a performance critical loop, or far away, after the return of
     several subroutines.
 (2) (with the right implementation) Allows reporting of multiple, complex
     errors.  For example, it can be used to implement a trace mechanism.
 (3) (with the right implementation) Can be thread safe.
 (4) Allows errors that occurred deep in the implementation to be reported
     up to much higher levels without requiring baggage in the middle.

The right implementation for (2) is to implement not a single variable, but a stack
or a ring (circular queue) on which error codes are placed, and from which error
codes can be retrieved.  The right implementation for (3) is to use
pthread_getspecific() to define a per-thread global and/or ring/queue.
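
A minimal sketch of the ring idea for (2) follows.  All names
(demo_push_error, demo_pop_error, DEMO_ERR_RING_SIZE) are hypothetical,
the capacity of 8 is arbitrary, and a return value of 0 is assumed to mean
"no error pending".  The state here is one plain global, so this sketch is
NOT thread safe; for (3) each thread would need its own copy of the ring,
for example one obtained through pthread_getspecific().

```c
#include <stddef.h>

#define DEMO_ERR_RING_SIZE 8   /* arbitrary capacity chosen for this sketch */

static int demo_err_ring[DEMO_ERR_RING_SIZE];
static size_t demo_err_head = 0;    /* index of the oldest stored code */
static size_t demo_err_count = 0;   /* number of codes currently stored */

/* Record an error code.  When the ring is full, the oldest code is
 * overwritten so that the most recent errors are always kept. */
void demo_push_error(int code)
{
    size_t tail = (demo_err_head + demo_err_count) % DEMO_ERR_RING_SIZE;
    demo_err_ring[tail] = code;
    if (demo_err_count < DEMO_ERR_RING_SIZE)
        demo_err_count++;
    else
        demo_err_head = (demo_err_head + 1) % DEMO_ERR_RING_SIZE;
}

/* Retrieve and remove the oldest pending error code, or 0 if none. */
int demo_pop_error(void)
{
    if (demo_err_count == 0)
        return 0;
    int code = demo_err_ring[demo_err_head];
    demo_err_head = (demo_err_head + 1) % DEMO_ERR_RING_SIZE;
    demo_err_count--;
    return code;
}
```

A caller can run a whole loop of engine calls and only afterwards drain
the ring with demo_pop_error() until it returns 0, which is the
asynchronous checking described in (1).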


Engine Isolation
----------------
Goals of engine isolation:
  o Hide the engine behind an API so that multiple, pluggable engines
    could be created, e.g. SQL or CORBA.
  o Engine users are blocked from being able to put engine internal
    structures in an inconsistent state.  Updates are "atomic".

Some half-finished thoughts about the engine API:

-- The engine structures should not be accessible to any code outside 
   of the engine.  Thus, the engine structures have been moved to 
   AccountP.h, TransactionP.h, etc.
   The *P.h files should not be included by code outside of the engine.

-- The down-side of hiding is that it can hurt performance.  Even trivial data 
   accesses require a subroutine call.  Maybe a smarter idea would be to leave
   the structures exposed, allow direct manipulation, and then "copy-in" and
   "copy-out" the structures into parallel structures when a hidden back end
   needs to be invoked.

-- The up-side of hiding behind an API is that the engine can be 
   instrumented with extension language (perl, scheme, tcl, python) hooks 
   for pre/post processing of the data.  To further enable such hooks, we 
   should probably surround all operations on structures with "begin-edit" 
   and "end-edit" calls.

-- Begin/end braces could potentially be useful for two-phase commit schemes,
   where "end-edit" is replaced by "commit-edit" or "reject-edit".


Reconciliation
--------------
> * From: "Christopher B. Browne" <cbbrowne@knuth.brownes.org>
> >
> > /* Values for the reconciled field in Transaction: */
> > #define CREC 'c'              /* The transaction has been cleared */
> > #define YREC 'y'              /* The transaction has been reconciled */
> > #define FREC 'f'              /* frozen into accounting period */
> > #define NREC 'n'              /* not reconciled or cleared */

Note that FREC is not yet used/implemented ...

> I've reconciled the bank/credit card statement containing the
> transaction, and completed the reconciliation.  One could consider
> the transaction to now be "set in stone."

If a transaction has been marked "reconciled" in the gui, should
the GUI then block any changes to the transaction?

How about the following proposal:
-- transactions marked 'y' (reconciled) in gui cannot be edited.
-- gui will allow 'y' to be changed back to 'n' or 'c'
   (thus allowing editing).
-- engine will also enforce the above restrictions
-- transactions marked 'f' cannot be changed, period, either
   in the gui or the engine.

Let me know if this is a bad idea, otherwise I'll implement it.

Should I change the font or color or something for reconciled
transactions, to provide some (stronger) visual cue?  If so, what
color/font/whatever?


> (In a more traditional accounting system, this would be very much 
> the case.  Once a period is "closed," you can't change the data 
> anymore...)
> 
> I think that there should be a date stamp attached to the reconciliation
> field so that as well as knowing that it has been reconciled, you also 
> know *when* it was reconciled.
> 
> This isn't so important for personal finances for the periodic user; I
> have in the past wanted to know when a particular transaction was 
> reconciled.  This is useful if you want to trace back from the 
> electronic record to determine when the item actually cleared through 
> the bank.
> 
> This means that I can look at Cheque #428, written Jan 1/97, cashed in May 
> 1997 (it sat in someone's desk for a while) in the computer system and say 
> "Ah.  It was marked as reconciled on June 12th/97. That was when I did the 
> reconciliation of the May bank statements.  Ergo, the cheque cleared in May, 
> and that's the statement to go to to find a copy of the cheque..."
> 
> It's not terribly important for cheques that get cashed right away; it *is* 
> for things that hang around uncashed for a while.

If the above is implemented, what date should be stored if the user
toggles the recn flag a few times?  The date of the last toggle?
The very first date that it was recn'ed?

Automatic Backup
----------------
The following has been implemented:

Have (by default) xacc create a backup file
filename-timestamp.xac on every save.  This will eat up some disk
space until you go back and kill the older versions, but it's
better than not realizing that the data's subtly corrupt a week
later. 

A lot of small-office/home systems do this.  It is primarily useful as a
historical record, in case you accidentally wipe out something, and
don't spot it until later.  Limited usefulness, but very nice in case
you accidentally delete an entire account.
     
To a limited degree, it provides atomicity/consistency/etc. at the
coarse-grained account-group level.
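
For illustration, a backup name of the form filename-timestamp.xac could
be built as in the sketch below.  The helper demo_backup_name() is
hypothetical, and the YYYYMMDDHHMMSS layout of the timestamp is an
assumption made for this example; the format actually written by the
code is not specified in this document.

```c
#include <stdio.h>
#include <time.h>

/* Build "<base>-<YYYYMMDDHHMMSS>.xac" into buf.
 * Returns 0 on success, -1 if the name does not fit or formatting fails.
 * The timestamp layout is an assumption of this sketch. */
int demo_backup_name(char *buf, size_t buflen,
                     const char *base, const struct tm *when)
{
    char stamp[32];
    if (strftime(stamp, sizeof stamp, "%Y%m%d%H%M%S", when) == 0)
        return -1;
    int n = snprintf(buf, buflen, "%s-%s.xac", base, stamp);
    return (n >= 0 && (size_t)n < buflen) ? 0 : -1;
}
```

A timestamp that sorts lexically in date order makes it easy to find and
delete the older backups later.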


Transaction Processing
----------------------
There is a rudimentary level of "TP" built in, via the routines
xaccTransBeginEdit(), xaccTransRollbackEdit(), and xaccTransCommitEdit(),
which allow changes to be made to a transaction, and then committed,
or rejected at the end.  Handy for the GUI, if the user makes a bunch
of changes which they don't want to keep; also handy for an SQL back end,
where the Commit() routine triggers the actual update of the SQL database.
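
The begin/rollback/commit pattern can be sketched with a simple snapshot.
The code below is not the engine's implementation and does not show the
signatures of the xaccTrans*Edit() routines; DemoTrans and the demo_*
functions are hypothetical names used only to illustrate the idea.

```c
/* Hypothetical editable payload of a transaction. */
typedef struct {
    int num;         /* transaction number */
    double amount;   /* some editable value */
} DemoTransData;

/* Hypothetical transaction with a saved copy for rollback. */
typedef struct {
    DemoTransData current;   /* live data, modified during an edit */
    DemoTransData saved;     /* snapshot taken at begin-edit */
    int editing;             /* nonzero while an edit is open */
} DemoTrans;

/* Open an edit: remember the current state. */
void demo_begin_edit(DemoTrans *t)
{
    t->saved = t->current;
    t->editing = 1;
}

/* Reject the edit: restore the remembered state. */
void demo_rollback_edit(DemoTrans *t)
{
    if (t->editing) {
        t->current = t->saved;
        t->editing = 0;
    }
}

/* Accept the edit.  A back end (e.g. SQL) would write the data here. */
void demo_commit_edit(DemoTrans *t)
{
    t->editing = 0;
}
```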


Journal Logs
------------
The following has been implemented; see TransLog.c for details.

Transaction logs.  The idea was that every time a transaction 
was called that would cause a data update, the information that 
was updated would be dumped out into an "append only" log file.

This somewhat parallels what better database systems do to ensure
integrity; Oracle, for instance, has what is called an "archive log." 
You have a copy of the database "synced up" as at some point in time, 
and can apply "archive logs" to bring that old copy of the database 
up to date should something go wrong to trash today's copy.

In effect, you'd have things like

=== 97/01/01 04:32:00 === Add Transaction ==== [whatever was added] ====
=== 97/01/01 04:32:02 === Delete Transaction ==== [whatever was deleted] ====

It is also a useful debugging tool: if you make sure that the
"log_transaction()" call starts by opening the log file, writes, and
then closes and syncs, you know what is going on with the data even if
horrible things happen to the "master" database file.
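
The open/write/sync/close discipline might look like the sketch below.
It is not the code in TransLog.c; demo_log_transaction() and its
parameters are hypothetical, the record layout simply mimics the sample
lines above, and fsync() and fileno() assume a POSIX system.

```c
#define _POSIX_C_SOURCE 200809L   /* for fileno() and fsync() */
#include <stdio.h>
#include <unistd.h>

/* Append one record to the log.  The file is opened, written, flushed,
 * synced to disk and closed on every call, so that each record survives
 * a later crash.  Returns 0 on success, -1 on any failure. */
int demo_log_transaction(const char *path, const char *timestamp,
                         const char *action, const char *details)
{
    FILE *fp = fopen(path, "a");   /* "a": append only */
    if (fp == NULL)
        return -1;

    int ok = fprintf(fp, "=== %s === %s ==== %s ====\n",
                     timestamp, action, details) >= 0;
    if (fflush(fp) != 0)           /* push stdio buffer to the kernel */
        ok = 0;
    if (fsync(fileno(fp)) != 0)    /* ask the kernel to write to disk */
        ok = 0;
    if (fclose(fp) != 0)
        ok = 0;
    return ok ? 0 : -1;
}
```

Reopening the file for every record is slow, but it keeps the log
readable up to the last completed call.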

Session Mgmt
------------
To allow the user of the engine some guarantee of atomic updates,
serialized file I/O, and related miscellany, the concept of a session
is supported.  No file I/O can be performed until a session has
been created, and file updates are not guaranteed atomic unless
performed within a SessionBegin/SessionEnd pair.

Note that (in the current implementation) data can be manipulated 
outside of the session; it's just that it cannot be saved/made persistent.

The goal of session management is to ensure that e.g. two users don't end 
up editing the same file at the same time, or, e.g. that an automatic
stock-quote update daemon running under a different pid doesn't trash 
data being currently edited by the user.
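
One common way to keep two processes from editing the same file is an
exclusive lock file.  This document does not say how the session code
achieves its goal, so the sketch below is purely an assumption: the
demo_session_begin() and demo_session_end() helpers are hypothetical and
rely on the POSIX open() flags O_CREAT | O_EXCL, which fail if the lock
file already exists.

```c
#include <fcntl.h>
#include <unistd.h>

/* Try to start a session by creating a lock file exclusively.
 * Returns 0 if the lock was acquired, -1 if another session holds it
 * (or the lock file could not be created). */
int demo_session_begin(const char *lockpath)
{
    int fd = open(lockpath, O_CREAT | O_EXCL | O_WRONLY, 0600);
    if (fd < 0)
        return -1;
    close(fd);
    return 0;
}

/* End the session by removing the lock file. */
void demo_session_end(const char *lockpath)
{
    unlink(lockpath);
}
```

A stale lock left behind by a crashed process has to be cleaned up by
hand in this simple scheme.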


Remaining Work Items
--------------------
To find other remaining work items in the code, grep for the string 
"hack alert".


Ideas for engine enhancements:
------------------------------
  1) Have (by default) gnucash immediately re-read a file after every
     write, and compare the two resulting AccountGroups for equality.

     During development, this is a good idea, as it will help uncover
     thinkos more quickly, instead of letting them hide for weeks or months
     (as the last one did).  It's not a bad self-consistency check when
     monkeying with the internals, damn the performance.
     
     It can be removed/disabled for production versions.


September 1998
