Events v200

Program and documentation copyright © 1997-1998 by

Romuald Ireneus 'Scibor-Marchocki

 

Abstract

The AnovaMath.dll exports ten real subroutines: numerical integration by a generalization of the Simpson algorithm to approximating polynomials of up to the twelfth degree, which can integrate to positive infinity; three functions: Gamma, Beta, and the incomplete Gamma; and four cumulative probability distributions: Gaussian, Student-t, chi-squared, and Fisher-f. It also exports a six-way analysis of variance subroutine, which accommodates Latin squares and sporadic missing conditions.

The Animals.mdb database (normalized to 3NF) may be employed to gather observational data on animal behavior. It generates win-lose events and computes the dominance hierarchy for each kind of binary event ("bout of behavior"). It computes an analysis of variance on those events, as well as on the unary events ("scan"). This six-way analysis of variance considers the one or two animals, the time of day, the day of week, the observer, and one user-definable factor. It also computes the Student-t confidence criterion among these conditions.

 

Table of Contents

I Introduction

II The Concepts

A Mathematical

B Analysis of Variance

C Computation of the subscript

D Student-t

E The Dominance Hierarchy

F The Win-Lose algorithm

G Statistics of the binary events

H Statistics of the Unary Events

III The Implementation

A Sample output print-outs

B The Access database and its included WinLose VBA module

C Weight

D Importation of data from a spreadsheet

E Calling convention and sequences

F Detailed description of the functions exported from the AnovaMath.dll

1 The complete Gamma function.

2 The logarithm of the Beta function [Next revision]

3 The incomplete Gamma function.

4 The Student-t cumulative probability distribution.

5 The Chi-square cumulative probability distribution.

6 The Fisher-f cumulative probability distribution.

7 The Gaussian cumulative probability distribution.

8 Numerical integration over a bounded interval.

9 Numerical integration over the semi-closed interval [0, oo).

10 Six-way analysis of variance.

IV Installation and Other Details

A Installation

B Files

C Switchboard tree

    1. Animals
    2. Analysis

D Use

E Database

V Closing Remarks

A References

B Future

C Marketing

 

Introduction

This documentation is provided in three versions: a Word 97 document (the master copy), an MS-DOS text version, and an HTML version, which includes internal cross-references.

This program was originally written late in 1997 as a set of 16-bit FORTRAN programs, to help perform the calculations required for the homework in a behavioral-study class that I was taking. The choice of language was one of expedience, and a poor one at that. The program proved very useful, but it is best regarded as a "proof of principle".

During the early part of 1998, I rewrote the program in 32-bit C, as a set of five major modules plus a "main" module and a file-opening module. They are linked together into the event.exe program. The major modules are: mathematical, analysis of variance, hierarchy from a matrix input, hierarchy from a binary-events input, and unary events. Each of the latter two performs an analysis of variance upon the data.

During the summer of 1998, I rewrote the program once again, in Microsoft Access and its VBA (= Visual Basic for Applications), which calls a dll written in Microsoft C. Since this dll uses neither internal storage nor any persistent objects, it needs neither constructors nor destructors. Hence, there is no need for C++.

This 32-bit Windows program has been written with, and tested on, Microsoft C++ version 5.0 and Microsoft Access 97, on a Microsoft Windows NT 4.0 server. The program should also run on Microsoft Windows 95 or 98, as well as on a Windows NT 4.0 workstation. The Access development edition is required only if the user wishes to modify the database design.

While the Beta function is employed in the computation of the Fisher-f probability distribution, and the bilinear transformation is employed in each of the cumulative probability distributions, neither is exported in this revision; both missed the cut-off. Sorry. Next revision.

Since all of the data input is now handled by standard Microsoft Access forms, the former documentation of the data description and data input has been removed.

 

The Concepts

Mathematical

The mathematical module contains subroutines for the double-precision (64-bit) real-variable calculation of each of the following real functions or algorithms:

Gamma function, both complete and incomplete.

The closely related complete Beta function.

The Student-t cumulative probability distribution.

The Chi-square cumulative probability distribution.

The Fisher-f cumulative probability distribution.

The Gaussian (also known as the normal) cumulative probability distribution.

A subroutine for the numerical integration of an arbitrary function, employing a generalization of the Simpson algorithm, extended to use an approximating polynomial of a user-selectable degree in the closed interval [1, 12]. By means of a bilinear transformation, such as x = t/(1 - t), the domain [0, oo) may be mapped one-to-one onto the domain [0, 1). Hence, this transformation enables the integration routine to extend its domain to infinity. The integration routine has provisions for handling a removable singularity at either end-point.

 

Analysis of Variance

Each combination of the n ways, taken one or more at a time, is called an effect, of which there are 2**n - 1. An analysis of variance computes the Fisher-f statistic and its confidence criterion for the variance of each of these effects, thus directing one's attention to only the significant effects. See the Student-t section for a further detailed analysis of each of these.
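Because each effect is a non-empty subset of the ways, it can be encoded as a bitmask. The following C sketch counts and labels the effects; the helper names are hypothetical, and the bitmask ordering is not necessarily the order in which Anova6B stores its results.

```c
/* Each effect of an n-way analysis of variance is a non-empty subset of the ways,
 * encoded as a bitmask 1 .. 2^n - 1 (bit 0 = way A, bit 1 = way B, ...). */
int effect_count(int nways)
{
    if (nways < 1 || nways > 26) return 0;
    return (1 << nways) - 1;
}

/* Write the letters of the effect `mask` into buf, e.g. mask 5 gives "AC". */
void effect_label(unsigned mask, char buf[27])
{
    int len = 0;
    for (int bit = 0; bit < 26; bit++)
        if (mask & (1u << bit)) buf[len++] = (char)('A' + bit);
    buf[len] = '\0';
}
```

For a six-way analysis, effect_count(6) gives 63, running from the six main effects A through F up to the single six-way interaction ABCDEF.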

Up to a six-way analysis of variance may be performed upon descriptive-statistics data in the form of a set of three matrices: the count of the samples, the mean of each sample, and the standard deviation of each sample. There is a provision for the input of the sum of the x's instead of the means. There also is a provision for the input of the sum of the squares of x, the variance, or the standard error of the mean, instead of the standard deviations. Each of these may be centered at the mean or at the origin, and each may be either biased or unbiased. The Fisher-f statistic and its confidence criterion are computed. We display only those whose confidence criterion is at least as large as that specified in the "System parameters" table.
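To show why counts, means, and standard deviations suffice, here is a C sketch of the one-way case computed from descriptive statistics alone. It is an illustration under that simplifying assumption, not the Anova6B algorithm, which also handles up to six ways, Latin squares, and missing conditions; the function name is hypothetical.

```c
/* One-way analysis of variance from descriptive statistics only: for each of the k
 * conditions, the count n[i], the mean mean[i], and the unbiased standard deviation
 * sd[i]. Stores the Fisher-f statistic and its degrees of freedom; returns -1 on
 * illegal input. */
int oneway_anova_from_desc(const int n[], const double mean[], const double sd[], int k,
                           double *fisher, int *df_between, int *df_within)
{
    if (k < 2) return -1;
    long total = 0;
    double grand = 0.0;
    for (int i = 0; i < k; i++) {
        if (n[i] < 1) return -1;
        total += n[i];
        grand += n[i] * mean[i];                   /* n * mean recovers the sum of x */
    }
    if (total <= k) return -1;                     /* no within-condition freedom */
    grand /= total;

    double ss_between = 0.0, ss_within = 0.0;
    for (int i = 0; i < k; i++) {
        double d = mean[i] - grand;
        ss_between += n[i] * d * d;                /* spread of the condition means */
        ss_within += (n[i] - 1) * sd[i] * sd[i];   /* n - 1 undoes the unbiased divisor */
    }
    *df_between = k - 1;
    *df_within = (int)(total - k);
    double ms_within = ss_within / *df_within;
    if (ms_within <= 0.0) return -1;
    *fisher = (ss_between / *df_between) / ms_within;
    return 0;
}
```

With two conditions of three observations each, means 2 and 4, and standard deviations 1 and 1, the sketch gives a Fisher-f statistic of 6 with 1 and 4 degrees of freedom.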

The algorithm employed for this computation of the analysis of variance takes into account Latin squares and sporadic missing conditions. Latin squares allow the consideration of multiple values of a factor without the requirement of gathering more data. However, to be effective, the Latin squares have to be incorporated into the design of the experiment at the outset. We encourage you to employ Latin squares and their higher-dimensional counterpart, often called "Graeco-Latin squares", sometimes "Graeco-Roman squares". The spelling of the Greek portion of the name varies.

Largely because of the surfeit of data provided by computer-controlled data-collection, Latin squares have gone out of fashion. You will have to make an effort to find a textbook or a table. However, if you are gathering data by hand, the search for such books will be well worth the effort.

 

Computation of the subscript

Since even within a single language, different compilers may compute the subscripts of multi-dimensional arrays in different ways, we provide our own method. The following VBA function does so:

- - - - - - - -

Private Function Sub6(A As Long, B As Long, C As Long, D As Long, e As Long, f As Long, _
    Amx As Long, Bmx As Long, Cmx As Long, Dmx As Long, Emx As Long, _
    Fmx As Long, Mmx As Long) As Long
    ' Returns the linear subscript of element (A, B, C, D, E, F), with A varying
    ' fastest, or -1 if any subscript is out of range.
    Dim y As Long
    Sub6 = -1
    If A < 0 Or B < 0 Or C < 0 Or D < 0 Or e < 0 Or f < 0 Then Exit Function ' underflow
    If Amx <= A Or Bmx <= B Or Cmx <= C Or Dmx <= D Or Emx <= e Or _
        Fmx <= f Then Exit Function ' overflow
    y = A + Amx * (B + Bmx * (C + Cmx * (D + Dmx * (e + Emx * f))))
    If Mmx <= y Then Exit Function ' beyond the Mmx allocated elements
    Sub6 = y
End Function

- - - - - - - -
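The same linear subscript, with A varying fastest, can be written in C as sketched below, which may help when preparing the arrays passed to the dll. The function name sub6 and the array-based parameter passing are illustrative assumptions; as in the VBA version above, the bounds are checked before the multiplication.

```c
/* Linear subscript for a six-dimensional array stored with the first subscript
 * varying fastest, mirroring the VBA Sub6 function. idx[] holds A..F, dims[] holds
 * Amx..Fmx, and total is Mmx, the allocated number of elements.
 * Returns -1 on any error. */
long sub6(const long idx[6], const long dims[6], long total)
{
    long y = 0;
    for (int d = 5; d >= 0; d--) {
        if (idx[d] < 0 || idx[d] >= dims[d]) return -1;   /* underflow or overflow */
        y = idx[d] + dims[d] * y;                          /* Horner-style nesting */
    }
    return (y < total) ? y : -1;
}
```

With dimensions 3, 4, 2, 2, 2, 2 (192 elements), the subscripts (1, 0, 0, 0, 0, 0) map to 1, (0, 1, 0, 0, 0, 0) map to 3, and the last element (2, 3, 1, 1, 1, 1) maps to 191.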

The subscripts are A, B, C, D, E, and F. Their domains are the semi-closed intervals [0, Amx), [0, Bmx), [0, Cmx), [0, Dmx), [0, Emx), and [0, Fmx), respectively. Specifically for the analysis of variance subroutine, in the count, sum of x, and sum of x squared arrays, any element with one (or more) of its subscripts equal to one less than the upper end-point is reserved for the decoration and should be preset to zero. Also, any unused elements of the arrays should be set to zero before calling the Anova6 subroutine. A violation of either of these requirements will produce meaningless results.

 

Student-t

Once the analysis of variance has filtered the significant effects, we compute the Student-t statistic (and its confidence criterion) of the difference between the means of each element of each ordered pair of conditions, within each of these significant effects.

The standard error of a single mean of m observations is the square root of the centered sum of the squares, S, divided by the product of m and m-1. The m-1 gives us an unbiased variance, while the m converts from the variance of each observation to that of the mean. As an equation, it is sqrt(S/(m(m-1))). For two samples of m and n observations, with centered sums of squares S and T, the standard error of the difference between the two means is sqrt(((S+T)/(m+n-2)) (1/m + 1/n)), which simplifies to sqrt((S+T)/(m(m-1))) when n=m. The Student-t statistic is the mean (or the difference of the means) divided by the applicable standard error of the mean. The confidence criterion is the Student-t cumulative probability distribution of that statistic. We display only those whose confidence criterion is at least as large as that specified in the "System parameters" table.

 

The Dominance Hierarchy

A set of entities may exhibit a binary, directed, pair-wise dominance relationship. If this relationship is transitive, we may arrange these entities into a dominance hierarchy, most easily expressed as an incidence matrix. This concept is applicable to the calling tree of subroutines as well as to a collection of animals. We adopt one of the possible definitions of a dominance hierarchy, as follows.

Given an incidence-matrix of the binary dominance relationship over a set of elements, we define the dominance hierarchy as the matrix which results from the following algorithm:

Define: Nonneg(x) = x if x > 0 else 0, x an integer.

Define: Clip(x) = 1 if x > 0 else -1 if x < 0 else 0, x an integer

Each of the foregoing two functions operates upon the elements, if its argument is either a vector or a matrix.

Given N a natural number.

Given an N vector V0 of distinct integers.

Given an NxN matrix M1 of non-negative integers.

Let M2 = Nonneg(M1 - transpose(M1)). Let M3 = Clip(M2).

Let R1 = row totals of M1. Let C1 = column totals of M1.

Let RC1 = R1 + C1. Let RC1c = Clip(RC1).

Let R2 = row totals of M2. Let C2 = column totals of M2.

Let R3 = row totals of M3. Let C3 = column totals of M3.

Consider a, b in closed interval [1,N].

A vector is sorted by V(a) - V(b). A matrix is sorted by M(a,b) - M(b,a).

To obtain the dominance hierarchy, sort by (major, intermediate, minor):

RC1c, M3, R3, R2, reverse C3, reverse C2, reverse V0.

"Reverse" means that the difference should be negated, e.g., -(V(a) - V(b)).

These matrices are directed from the left-side to the top, that is, the rows indicate the direction. An event matrix (whose elements are the natural numbers) displays the cardinality of the events. An incidence matrix (whose elements are the Boolean 0 = FALSE and 1 = TRUE) displays the possibility of the events. Thus, M1 and M2 are event matrices, while M3 is an incidence matrix.

In the aforementioned sort, except for M3, the elements of each level may be assigned integral ordinal values; hence, they are totally ordered. Thus, the dominance hierarchy is totally ordered, except for the effect of the M3 matrix. Iff (= if and only if) M2 (hence M3) is a permutation of a triangular matrix, the dominance hierarchy is totally ordered; else it will contain at least one cycle.

Since the hierarchy may not be totally ordered, and thus the elements need not have ordinal numbers, the "heap sort" algorithm (or any other advanced sorting algorithm) cannot be employed. The suggested sorting algorithm is to compare each element with each of the following elements and to exchange them iff they are out of order. The sort has to be performed from high to low; i.e., in decreasing order.
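The keys and the suggested exchange sort can be sketched in C as follows. The fixed limit MAXN, the structure, and the function names are assumptions made for illustration. The comparison applies the keys in the order RC1c, M3, R3, R2, reverse C3, reverse C2, reverse V0, and the first non-zero difference decides; when M3 contains a cycle, the result may depend on the initial order.

```c
#define MAXN 16   /* hypothetical upper limit on the number of animals */

typedef struct {
    int n;
    int m3[MAXN][MAXN];                 /* incidence matrix Clip(M2) */
    int rc1c[MAXN];                     /* Clip(R1 + C1): 1 if the animal took part */
    int r2[MAXN], c2[MAXN], r3[MAXN], c3[MAXN];
    int v0[MAXN];                       /* distinct identifiers, the last tie-breaker */
} HierKeys;

static int clip(int x) { return (x > 0) - (x < 0); }
static int nonneg(int x) { return x > 0 ? x : 0; }

/* Build every sort key from the event matrix M1 (rows dominate columns). */
void hier_keys(int n, int m1[][MAXN], const int v0[], HierKeys *k)
{
    k->n = n;
    for (int a = 0; a < n; a++) {
        int r1 = 0, c1 = 0;
        k->r2[a] = k->c2[a] = k->r3[a] = k->c3[a] = 0;
        for (int b = 0; b < n; b++) { r1 += m1[a][b]; c1 += m1[b][a]; }
        k->rc1c[a] = clip(r1 + c1);
        k->v0[a] = v0[a];
    }
    for (int a = 0; a < n; a++)
        for (int b = 0; b < n; b++) {
            int m2 = nonneg(m1[a][b] - m1[b][a]);      /* M2 = Nonneg(M1 - M1') */
            k->m3[a][b] = clip(m2);
            k->r2[a] += m2;           k->c2[b] += m2;
            k->r3[a] += k->m3[a][b];  k->c3[b] += k->m3[a][b];
        }
}

/* Positive if a ranks above b: the first non-zero key difference decides. */
int hier_compare(const HierKeys *k, int a, int b)
{
    int d;
    if ((d = k->rc1c[a] - k->rc1c[b]) != 0) return d;
    if ((d = k->m3[a][b] - k->m3[b][a]) != 0) return d;
    if ((d = k->r3[a] - k->r3[b]) != 0) return d;
    if ((d = k->r2[a] - k->r2[b]) != 0) return d;
    if ((d = -(k->c3[a] - k->c3[b])) != 0) return d;
    if ((d = -(k->c2[a] - k->c2[b])) != 0) return d;
    return -(k->v0[a] - k->v0[b]);
}

/* Compare each element with each following one; exchange iff out of order. */
void hier_sort(const HierKeys *k, int order[])
{
    for (int i = 0; i < k->n; i++) order[i] = i;
    for (int i = 0; i + 1 < k->n; i++)
        for (int j = i + 1; j < k->n; j++)
            if (hier_compare(k, order[i], order[j]) < 0) {
                int t = order[i]; order[i] = order[j]; order[j] = t;
            }
}
```

In a small example where animal 2 beats animals 0 and 1, animal 0 beats animal 1, and animal 3 never takes part, the sort yields the order 2, 0, 1, 3.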

 

The Win-Lose algorithm

We define a win-lose event as an in-response-to sequence of binary events, as follows:

There must be a major aggression event (e.g., Contact Aggression, Rush, or Chase), designated a type-1 event.

There must be a clear resolution, in which one of the participants of a preceding un-retired type-1 event (who will be called the loser) either:

Avoids the other participant (e.g., an Avoid action), designated a type-2 event, or

Moves towards another animal (which ordinarily would be superior to the other participant of the type-1 event) for help, designated a type-3 event.

The other participant of the type-1 event then will be declared the winner, and the type-1 event will be retired thereby.

These generated win-lose events are inserted into their chronological position in the input stream of observed events.
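A minimal C sketch of this scan follows. It assumes that the loser is the actor of the type-2 or type-3 event, and that the matching type-1 event is the most recent un-retired one involving that actor; the record layouts and names are hypothetical, and the Access WinLose module may apply further conditions.

```c
#include <stdlib.h>

/* Hypothetical event record: type 1 = major aggression, 2 = avoidance,
 * 3 = solicitation of help; other types are ignored here. */
typedef struct { int type; int actor; int recipient; } Event;

/* A generated win-lose event, to be inserted right after event index `after`. */
typedef struct { int winner; int loser; int after; } WinLose;

/* Scan the chronological stream; each type-2 or type-3 event retires the most recent
 * un-retired type-1 event in which its actor took part. Returns the number of
 * win-lose events generated (at most max_out are stored), or -1 if out of memory. */
int generate_win_lose(const Event ev[], int n, WinLose out[], int max_out)
{
    unsigned char *retired = calloc(n > 0 ? (size_t)n : 1, 1);
    if (retired == NULL) return -1;
    int count = 0;
    for (int i = 0; i < n; i++) {
        if (ev[i].type != 2 && ev[i].type != 3) continue;
        int loser = ev[i].actor;
        for (int j = i - 1; j >= 0; j--) {
            if (ev[j].type != 1 || retired[j]) continue;
            if (ev[j].actor != loser && ev[j].recipient != loser) continue;
            retired[j] = 1;                        /* the type-1 event is retired */
            if (count < max_out) {
                out[count].winner = (ev[j].actor == loser) ? ev[j].recipient : ev[j].actor;
                out[count].loser = loser;
                out[count].after = i;
            }
            count++;
            break;
        }
    }
    free(retired);
    return count;
}
```

For instance, if animal 7 chases animal 9 and animal 9 then avoids animal 7, the sketch records 7 as the winner and 9 as the loser, right after the avoidance.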

 

Statistics of the binary events

The observed binary events are categorized into classes, which are inserted into their chronological position in the input stream of observed events.

From the tallied binary events, the descriptive statistics, an analysis of variance (including its Fisher-f confidence criterion), and the Student-t confidence criterion are computed. Typically, the first four ways consist of the time of day, the day of week, the observer, and perhaps the year or some other user-specified condition. The fifth and sixth ways are for the animal and the subordinate, if any. These events also are tallied into a matrix for each activity. Each such matrix is employed as the M1 matrix in the foregoing definition of the dominance hierarchy, and the dominance hierarchy is computed.

 

Statistics of the Unary Events

Unary events are tallied into samples. The data is categorized into classes. Descriptive statistics, a five-way analysis of variance, and the Student-t confidence criterion are computed upon the data and the classes.

 

The Implementation

Sample output print-outs

The actual reports are handsomely formatted; these sample files have been stripped of all formatting, to save space and to be universally readable. They are the three most important reports produced by the database program.

Dom Hier.txt is a sample of the dominance hierarchy.

Anova out.txt is a sample of the six-way analysis of variance.

Anova e student.txt is a sample of the Student-t computation on the foregoing.

 

The Access database and its included WinLose VBA module

The database has been normalized to the third normal form (3NF). The switchboard provides entry to each of the form and display modules. The win-lose and dominance-hierarchy algorithms are implemented as subroutines in the WinLose VBA (= Visual Basic for Applications) module. This module also includes a subroutine for the interface to the six-way analysis of variance subroutine and another subroutine for the Student-t, in the AnovaMath.dll.

The animal names and the several actions have had their ID fields entered as two characters. More characters would provide a better mnemonic. Alternatively, the forms could have the corresponding combo box expanded to include the third field of the name.

 

Query:

(Scaner) Scaner w query ->

(One) Flatened One [valid] One count -> One stats -> (Anova One)

One w query ->

One valid

One invalid

 

(Scaner) Scaner w query ->

Two Flatened two Two count [not self] -> Two stats -> (Anova two)

Two w query ->

Two self

 

Flatened binary pair Win lose report (WinLose)

(Win Lose temp)

DH 7 count [not self] DH 7 stats -> (Anova binary pair)

(WinLose) DH 8 count [not self] DH 8 stats -> (Anova WinLose)

DH 8 sum [not self] -> (Dominance hierarchy WinLose)

 

DH 2 (not transposed)

DH 3 (transposed) DH 4 sum DH 6 (flattened) [not self] ->

DH 6 stats -> (Anova synopsis)

DH 6 sum -> (Dominance hierarchy synopsis)

 

(Anova test) Anova t l[not self] -> (Anova test)

 

(DH test) DH t [not self] -> (Dominance hierarchy test)

 

(6 Anova) -> DH e union DH e stats ->

Distinct A in e ->

Distinct B in e ->

Distinct C in e ->

Distinct D in e ->

Distinct Subordinate in e ->

Distinct Animals in e ->

Anova out report (Anova output)

(Anova e decorated)

Anova e filtered

Anova e squared ->

Anova e student report (Anova e student)

 

(3 Dominance hierarchy) DH a union DH a pos DH a net DH b union

DH b net DH c union DH c incidence DH c Animal

DH c Subord

DH d union DH d mat DH d incidence

Hier report (Dom Hier) Dom Hier sum

Dom Hier inc

 

 

Also, a small, bare-bones database, Anova.mdb, is provided just for the calculation of the analysis of variance of a single variable.

 

Query:

(Session) Q a ->

(Anova test) -> Q b Q bf ->

D A ->

D B ->

D C ->

D D ->

D E ->

D F ->

Anova out report (Anova out)

(Anova decorated)

 

(Anova decorated) Q f Q fs Anova student report (Anova student)

 

 

The user is encouraged to experiment with modifications to the design of the database and the VBA code.

 

Weight

Animals: The unary detail events ("Obs one" and "Obs two") are weighted by each entry in the Scanner. Since there is no obvious weighting for the Binary Pair, those events are unweighted. In the "Anova test", if you do not want to weight your data, let the weight default to one. However, you should not mix weighted and unweighted entries.

Alternatively, "Obs one" could be weighted by itself: just substitute "One w query" for "Scaner w query" in the "One stats query". Likewise, "Obs two" could be weighted by itself: just substitute "Two w query" for "Scaner w query" in the "Two stats query". Which is better? Neither. I employed the "Scaner w query" because it was the first that occurred to me. Which is more appropriate depends upon the design of your experimental protocol.

Analysis: In the "Session", if you do not want to weight your data, let the weight default to one. In the "Anova test", if you do not want to weight your data, let the weight default to zero. In either case, you should not mix weighted and unweighted entries.

 

Importation of data from a spreadsheet

If someone has existing data in Excel or another spreadsheet, or prefers to enter the data into a spreadsheet, the data may be imported into Access. Indeed, Access will create a table to match the data layout, using the first row as the names of the fields. A filter will then have to be designed to transfer this data into our database. If the data is confined to a single table, an append query may suffice. Usually, however, a VBA subroutine that reads the data and writes it into one or more tables will be necessary. This author might be willing to assist in the creation of such a filter.

 

Calling convention and sequences

The C language is very versatile. The Microsoft implementation provides more than half a dozen calling conventions, and if that is insufficient for you, you may compile a function in the "naked" mode and surround it with your own prologue and epilogue. However, VBA can call a dll function only with the __stdcall calling convention. Thus, to make the analysis of variance and the several mathematical subroutines usable from Access, they are exported with this calling convention.

 

Detailed description of the functions exported from the AnovaMath.dll

Three variables occur in most of these functions. Anything in the closed interval [-small, small] is considered to be zero; a zero value for small defaults to 0.00001. A zero value for the degree of the interpolating polynomial employed by the numerical integration, deg, usually defaults to 6, except as noted otherwise. A zero value for the number of steps employed during the numerical integration, imx, usually defaults to 20, except as noted otherwise. The choice of these values is not crucial and is more of an art than a science. Our goal was to provide fast computation, good to five decimal places.
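The default rules can be collected into a small helper, sketched below in C. The structure and function names are hypothetical; the default deg and imx are passed in because some functions (e.g., the incomplete Gamma function) use other defaults.

```c
#include <math.h>

/* Hypothetical helper collecting the default rules stated in the text:
 * a zero value means "use the default" for small, deg, and imx. */
typedef struct { double small; int deg; int imx; } NumParams;

NumParams resolve_defaults(double small, int deg, int imx, int deg_default, int imx_default)
{
    NumParams p;
    p.small = (small == 0.0) ? 0.00001 : small;
    p.deg = (deg == 0) ? deg_default : deg;   /* usually 6 */
    p.imx = (imx == 0) ? imx_default : imx;   /* usually 20 */
    return p;
}

/* Anything in the closed interval [-small, small] counts as zero. */
int is_zero(double x, double small)
{
    return fabs(x) <= small;
}
```

For example, resolve_defaults(0.0, 0, 0, 6, 20) yields small = 0.00001, deg = 6, and imx = 20, while resolve_defaults(0.0, 0, 0, 8, 320) reflects the incomplete Gamma defaults.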

Each of the functions is supposed to return a value of zero, if it detects an illegal or meaningless value of its parameters. However, it will require extensive testing to verify that such is the case under all circumstances.

You are welcome to experiment. The higher deg and imx are, the more precise but longer (in proportion to their product) the computation. However, an excessively high value for deg (with a small imx) may cause instability; thus, for deg greater than six, imx has to be increased rapidly. On the other hand, an excessively high value for imx will incur an accumulated round-off error. A value over 12 for deg requires better than double precision, so as not to begin curve-fitting to the round-off errors inherent in the numerical computation of the integrand. In the context of analogue-to-digital conversion, this error is called "quantization noise". Hence, while the necessary coefficients are easy to compute to any degree, we have not gone beyond 12 in this implementation.

 

The complete Gamma function.

__declspec( dllexport ) double __stdcall gammaB(double x, double small)

The function gammaB computes the Gamma function of x. The domain of x is (-oo, oo). This function will return a one within a small neighborhood of any pole; the poles are located at zero and at each negative integer.

 

The logarithm of the Beta function [Next revision]

The Beta function is defined as G(m) G(n) / G(m + n), where G is the Gamma function. Taking logarithms, we obtain log(B(m, n)) = log(G(m)) + log(G(n)) - log(G(m + n)). The domain of (m, n) is (-oo, oo) squared, and the Beta function is symmetric in (m, n). Each of these Gamma functions will return a zero within a small neighborhood of any of its poles, thus preventing the dreaded divide-by-zero hardware fault. However, the resulting value returned from the Beta function will then be meaningless. Of course, nothing is lost, because the Beta function is not defined at any of these points.

 

The incomplete Gamma function.

__declspec( dllexport ) double __stdcall incomplete_gamma_cumB(int deg,

double x, int imx, double nr, double small)

The domain of x is [0, oo). The domain of nr is (-oo, oo). The deg defaults to 8 and the imx defaults to 320. As x increases without bound, the incomplete Gamma function approaches the complete Gamma function of nr.

 

The Student-t cumulative probability distribution.

__declspec( dllexport ) double __stdcall fn_student_t_cumB(int deg,

double x, double sigma, double mean, int imx, unsigned int n, double small)

The function fn_student_t_cumB computes the Student-t cumulative probability distribution of x, with n degrees of freedom. The mean and sigma are the sample mean and sample standard-deviation. The domain of x is (-oo, oo). The domain of n is [1, oo). If n has the illegal value of zero, this function will return a zero.

The Chi-square cumulative probability distribution.

__declspec( dllexport ) double __stdcall fn_chi_square_cumB(int deg,

double x, int imx, unsigned int n, double small)

The function fn_chi_square_cumB computes the chi-squared cumulative probability distribution of x, with n degrees of freedom. It is a wrapper for the incomplete Gamma function. Since this is a difficult integrand, we default deg to 8 and imx to 320. The domain of x is [0, oo). The domain of n is (-oo, oo).

The Fisher-f cumulative probability distribution.

__declspec( dllexport ) double __stdcall fn_fisher_cumB(int deg,

double x, unsigned int num, unsigned int denom, int imx, double small)

The function fn_fisher_cumB computes the Fisher-f cumulative probability distribution of x, with num degrees of freedom in the numerator and denom degrees of freedom in the denominator. It employs the Beta function as a divisor outside of the integration. The domain of x is (-oo, oo). The domain of (num, denom) is [1, oo) squared. If either num or denom has the illegal value of zero, this function will return a zero.

The Gaussian cumulative probability distribution.

__declspec( dllexport ) double __stdcall fn_gauss_cumB(int deg,

double x, double sigma, double mean, int imx, double small)

This distribution sometimes is called by the over-used name "normal". The function fn_gauss_cumB computes the Gaussian cumulative probability distribution of x, with the given mean and standard deviation sigma. The domain of x is (-oo, oo).

 

Numerical integration over a bounded interval.

__declspec( dllexport ) double simpC(int deg, double a, double b,

double fn(double, parameters *),

parameters *pars, int imx, bool fr, double first, bool ls, double last)

Since only the C language can accept an arbitrary function within the calling sequence, we have exported this function with the default C calling convention (__cdecl).

The function simpC performs a numerical integration over the open interval (a, b). If there is a removable singularity at the left end-point, the limit of the integrand as x approaches a from the right should be provided as first, and fr set to true. If there is a removable singularity at the right end-point, the limit of the integrand as x approaches b from the left should be provided as last, and ls set to true. There is no default for the degree of the interpolating polynomial, deg, which should be specified in the closed interval [1, 12]. There is no default for the number of steps, imx; use your discretion. If deg is given outside of its domain, or imx is given less than one, this function will return a zero.

 

Numerical integration over the semi-closed interval [0, oo).

This technique is employed internally for the computation of each of the cumulative probability distributions. However, it is not exported in this revision, because it will require extensive testing as an exported function. Sorry. Maybe we will provide it in the next revision.

 

Six-way analysis of variance.

__declspec( dllexport ) int __stdcall Anova6B(int *Count, double *Sx, double *Ssqx,

int Amx, int Bmx, int Cmx, int Dmx, int Emx, int Fmx, int Mmx, int DescInput,

double Small, int Deg, int Imx,double *Ss, int *Df, double *Ms, double *Fisher,

double *ConfCrit)

The Count is the count, Sx is the sum of the x's, and Ssqx is the sum of the squares of x. Each is a six-dimensional array. The Amx, Bmx, Cmx, Dmx, Emx, and Fmx are the sizes of these dimensions. Each subscript is in the semi-open interval [0, Amx), etc. The highest value, e.g., Amx-1, is reserved for the decoration. The elements of Count, Sx, and Ssqx with any one (or more) such subscript should be initialized to zero. The DescInput describes the available options for the input of the data. It is the sum of several flags, in two mutually exclusive styles. If it is positive, the flags are: 1 Sx is the mean (versus the sum of x), 2 Ssqx is centered on the mean (versus on the origin), 4 Ssqx is divided by the Count, 8 Ssqx is divided by the Count-1, and 16 Ssqx has had its square root extracted. If it is negative, the flags are: 1 Sx is the mean (versus the sum of x), 2 Ssqx is centered on the mean (versus on the origin), 4 Ssqx is a variance, 8 Ssqx is unbiased (versus biased), 16 Ssqx is the standard deviation, and 32 Ssqx is the standard error of the mean. Observe that not every value is meaningful. The Small is employed in the analysis of variance itself and in the included computation of the Fisher-f confidence criterion. The Deg and Imx are employed in this Fisher-f computation. The remaining parameters are for return values. Each is a single-dimensional array of 67 elements; that is, [0, 67). It does not hurt if somewhat more room is provided. They should be pre-initialized to zero. The Ss is the sum of squares. The Df is the number of degrees of freedom. The Ms is the mean square of the analysis of variance. The Fisher is the Fisher-f statistic of the analysis of variance. The ConfCrit is its confidence criterion. The Sx array will be decorated and converted to the mean. The Ssqx array will be decorated and converted to the standard error of the mean. Each pair of the triples from the corresponding Count, Sx, and Ssqx may be employed to compute the Student-t statistic and confidence criterion.
This function will return a non-zero value, if it has detected an illegal condition.
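For the positive style of DescInput, the conversion of one cell back to a raw sum and a sum of squares about the origin can be sketched in C as follows. The function name, and the order in which the flags are undone (square root, then the divisor, then the centering), are assumptions for illustration; the negative style is not covered.

```c
/* Convert one cell of descriptive input back to the raw sum of x and the sum of
 * squares about the origin, for the POSITIVE style of DescInput:
 * 1 = Sx is the mean, 2 = Ssqx is centered on the mean, 4 = Ssqx divided by Count,
 * 8 = Ssqx divided by Count - 1, 16 = Ssqx has had its square root extracted.
 * Returns -1 on illegal or meaningless input. */
int desc_to_raw(int desc, int count, double sx_in, double ssqx_in,
                double *sx, double *ssqx)
{
    if (desc < 0 || count < 1 || ((desc & 8) && count < 2) || ((desc & 4) && (desc & 8)))
        return -1;
    double s = (desc & 1) ? sx_in * count : sx_in;   /* mean -> sum of x */
    double q = ssqx_in;
    if (desc & 16) q = q * q;                        /* undo the square root */
    if (desc & 4)  q *= count;                       /* undo division by Count */
    if (desc & 8)  q *= count - 1;                   /* undo division by Count - 1 */
    if (desc & 2)  q += s * s / count;               /* centered -> about the origin */
    *sx = s;
    *ssqx = q;
    return 0;
}
```

For example, the data {1, 2, 3} entered as mean 2 and unbiased standard deviation 1 (DescInput = 1 + 2 + 8 + 16 = 27) are converted back to Sx = 6 and Ssqx = 14.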

 

Installation and Other Details

Installation

System requirements: Microsoft Windows NT 4.0 server, NT 4.0 workstation, 98 or 95. Optionally, the development version of Microsoft Access 97.

Extract the "Animals.zip" into a directory of your choice. Move the AnovaMath.dll into your system directory, where it will find a home with all of the other *.dll files. My system directory is C:\winnt\system32\, but yours might be different. If you want to run the program, double-click the "Animals.mde" or the "Anova.mde" file, and then double-click the switchboard within the Forms tab. If you want to modify the database design and have the development version of Access 97, drag the "Animals.mdb" file and drop it into Access. Exactly how it responds depends upon the edition of Access available on your computer.

 

Files

The files included in the Animals.zip are this document (Events.v200.doc, Events v200.txt, and Events v200.htm), the three sample output files (Dom hier.txt, Anova output.txt, and Anova e student.txt), the dynamic link library AnovaMath.dll, and the Animals database (Animals.mdb is the source, Animals.mde is the executable). Also included is the analysis of variance database (Anova.mdb is the source, Anova.mde is the executable).

 

Switchboard tree

The Animals switchboard.

Main:Data Entry _

add _

primary _

Anova A

Anova B

Anova C

Anova D

Anova 4

Session

Main _

Add _

Animals _

Gender

Animal Names

Main _

Add _

Actions _

Binary Synopsis

Binary

Unary

Unary Detail

Proximity

Main _

Add _

Observations _

Scanner

Obs12

Obs

Obs One

Obs Two

Binary Pair [formerly Hierev]

Main _

Add _

Test and system

Dominance test [formerly Hier]

Anova test [formerly Anova]

Anova titles

System parameters

Main _

Add _

Main _

Data Entry _

edit _

primary _

Anova A

Anova B

Anova C

Anova D

Anova 4

Session

Main _

Edit _

Animals _

Gender

Animal Names

Main _

Edit _

Actions _

Binary Synopsis

Binary

Unary

Unary Detail

Proximity

Main _

Edit _

Observations _

Scanner

Obs12 (weakly synchronized to Scanner)

Obs (weakly synchronized to Scanner)

Obs One (weakly synchronized to Obs)

Obs Two (weakly synchronized to Obs)

One invalid (null)

Two self (null)

Binary Pair

Pair self (null)

Main _

Edit _

Test and system

Dominance test

Anova test

Anova titles

System parameters

Main _

Edit _

Main _

Data Entry _

Main _

Display _

Input _

primary _

Anova A

Anova B

Anova C

Anova D

Anova 4

Session

Main _

Input _

Animals _

Gender

Animal Names

Main _

Input _

Actions _

Binary Synopsis

Binary

Unary

Unary Detail

Proximity

Main _

Input _

Observations _

Scanner

Events Pair

Main _

Input _

Test and system

Dominance test

Anova test

Anova titles

System parameters

Main _

Input _

Main _

Display _

Intriguing _

A-N _

Animal distinct in Obs

Animal distinct in pair

Binary distinct in pair

Detail distinct in One

DCBA distinct in e no report

Main _

Intriguing _

O-P _

One valid

Pairs distinct in pair

Pairs distinct in Two

Proximity distinct in Two

Main _

Intriguing _

Q-Z _

Subordinate distinct in pair

Synopsis distinct in pair

Unary distinct in One

Main _

Intriguing _

Main _

Display _

Flattened _

Binary pair, flattened

One, flattened

Two, flattened

Main _

Display _

Output _

Dominance hierarchy _ (must run first)

WinLose (must run first) *

Dominance hierarchy (must run second) *

Dominance hierarchy incidence

Dominance hierarchy sums

Main _

Output _

6-way Anova _

Anova out (must run first) *

Anova e decorated

Anova e filtered

Anova e Student *

Main _

Output _

Main _

Display _

Main _

 

Note:

The form Obs12 contains tabular versions of Obs One and Obs Two. They are strongly synchronized with their parent columnar form Obs, in both the add and the edit modes. However, since this Obs Two suffers from an identity crisis, I have been unable to implement two form-based rules: 1) the proscription against self (the Subordinate must not be the same as the Animal of its parent form Obs); 2) once a Unary is entered, only its subsidiary Details are offered in its ComboBox. Nevertheless, both of these rules are applied posteriorly, during subsequent queries, which ignore any non-compliant record.
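Applying the two rules posteriorly amounts to filtering the observation records before they are counted. The Python sketch below illustrates that filtering under stated assumptions: the record classes `ObsTwo` and `ObsOne`, the functions `compliant_two` and `compliant_one`, the mapping `details_of` from each Unary to its permitted Details, and the animal and action names in the example are all hypothetical. The database itself does this in its Access queries, which may be written quite differently.

```python
from dataclasses import dataclass

# Hypothetical record shapes; the field names only echo the form labels.
@dataclass(frozen=True)
class ObsTwo:
    animal: str       # Animal of the parent Obs record
    subordinate: str  # Subordinate entered in Obs Two

@dataclass(frozen=True)
class ObsOne:
    unary: str        # Unary action entered in Obs One
    detail: str       # Detail entered in the same record

def compliant_two(rows):
    """Rule 1 (proscription against self): drop any record whose
    Subordinate is the same animal as the Animal of its parent Obs."""
    return [r for r in rows if r.subordinate != r.animal]

def compliant_one(rows, details_of):
    """Rule 2: keep a record only if its Detail is one of the subsidiary
    Details of its Unary. details_of maps Unary -> set of Details."""
    return [r for r in rows if r.detail in details_of.get(r.unary, set())]

if __name__ == "__main__":
    twos = [ObsTwo("Ana", "Bo"), ObsTwo("Bo", "Bo")]
    print(compliant_two(twos))  # [ObsTwo(animal='Ana', subordinate='Bo')]
    details = {"groom": {"head", "back"}}
    ones = [ObsOne("groom", "back"), ObsOne("groom", "tail")]
    print(compliant_one(ones, details))  # [ObsOne(unary='groom', detail='back')]
```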

The asterisk marks the four reports that perform computations when they are opened. Hence, "WinLose" must be run before any of the other reports. The computation within "Dominance hierarchy" is required only by the two reports that follow it. The computation within "Anova out" is required by the three reports that follow it. The computation within "Anova e Student" is not required by any other report.
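The dependencies stated in this note form a small directed graph, and any topological order of it is a valid order in which to open the reports. The Python sketch below encodes the note as a hypothetical `PREREQS` table and uses the standard-library `graphlib.TopologicalSorter` (Python 3.9 or later) to produce one such order. The helpers `run_order` and `missing` are illustrative only; the database itself leaves the ordering to the user.

```python
from graphlib import TopologicalSorter  # Python 3.9+

# Direct prerequisites of each computing or dependent report, read off the
# note: WinLose precedes everything; "Dominance hierarchy" precedes the two
# reports after it; "Anova out" precedes the three reports after it.
PREREQS = {
    "WinLose": set(),
    "Dominance hierarchy": {"WinLose"},
    "Dominance hierarchy incidence": {"Dominance hierarchy"},
    "Dominance hierarchy sums": {"Dominance hierarchy"},
    "Anova out": {"WinLose"},
    "Anova e decorated": {"Anova out"},
    "Anova e filtered": {"Anova out"},
    "Anova e Student": {"Anova out"},
}

def run_order(prereqs):
    """One opening order in which every report follows its prerequisites."""
    return list(TopologicalSorter(prereqs).static_order())

def missing(report, already_run, prereqs=PREREQS):
    """Direct and indirect prerequisites of `report` not yet in `already_run`."""
    needed, stack = set(), list(prereqs[report])
    while stack:
        p = stack.pop()
        if p not in needed:
            needed.add(p)
            stack.extend(prereqs[p])
    return needed - set(already_run)

if __name__ == "__main__":
    print(run_order(PREREQS)[0])                    # WinLose
    print(missing("Anova e Student", ["WinLose"]))  # {'Anova out'}
```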

 

The analysis of variance switchboard

Main:Data Entry _

add _

primary _

Anova A

Anova B

Anova C

Anova D

Anova E

Anova F

Main _

Add _

secondary _

Anova 6

Main _

Add _

Observations _

Session

Main _

Add _

Test and system

Anova test [formerly Anova]

Anova titles

System parameters

Main _

Add _

Main _

Data Entry _

edit _

primary _

Anova A

Anova B

Anova C

Anova D

Anova E

Anova F

Main _

Edit _

secondary _

Anova 6

Main _

Edit _

Observations _

Session

Main _

Edit _

Test and system

Anova test [formerly Anova]

Anova titles

System parameters

Main _

Edit _

Main _

Data Entry _

Main _

Display _

Input _

primary _

Anova A

Anova B

Anova C

Anova D

Anova E

Anova F

Main _

Input _

secondary _

Anova 6

Main _

Input _

Observations _

Session

Main _

Input _

Test and system

Anova test

Anova titles

System parameters

Main _

Input _

Main _

Data Entry _

Output _

6-way Anova _

Anova out (must run first) *

Anova decorated

Anova filtered

Anova Student *

Main _

Output _

Main _

Display _

Main _

 

Note:

The asterisk marks the two reports that perform computations when they are opened. Hence, "Anova out" must be run before any of the other reports; its computation is required by the three reports that follow it. The computation within "Anova Student" is not required by any other report.

 

Use

The "switchboard" is located in its lexicographical position within the Forms tab.

Enter your data in the order of the menu items in the switchboard. You may delete any of the sample data provided, with two exceptions: 1) The "Anova titles" table is used as is. Neither add nor delete any entries; but you may change the "Title" field. 2) The system will not run unless there is at least one entry in each of the "Anova A", "Anova B", "Anova C", "Anova D", "Anova E", "Anova F", "Anova 4", "Anova 6", "Session", and "Animal Names" tables.

Obviously, you need more entries in these and the other tables to obtain a meaningful computation. Data entry is an onerous chore. This program does not reduce the amount of data that must be entered; but, at least, all of it can be entered with just a mouse and a few short-cut keys. Then the program computes everything and produces all of the reports. There is no need to re-enter or copy the data.

To state the obvious, data is not entered into a record-set until you un-pend (save) the record in a form. Only then may you use this record (or any of its fields) in another form. The short-cut alternate-apostrophe repeats the last entry of the field in focus.

 

Database

Since every component of Microsoft Office 97 employs the same VBA (Visual Basic for Applications) programming language, in principle it should be possible to perform the six-way analysis of variance and the associated Student-t calculation from, e.g., Word or Excel. However, a spreadsheet is good for a one-dimensional data set. It works for a two-dimensional data set. It can be stretched for a three-dimensional data set. But there is no way it can be made to provide input for a six-dimensional data set. By elimination, we are left with either Access or SQL.

You need a way to enter the data. Then the database has to compute the descriptive statistics (count, mean, and variance or standard deviation). The subroutines within the "WinLose" module illustrate how to call the imported analysis of variance and the Student-t subroutines. Finally, the program has to print the reports.
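A database table avoids the spreadsheet's dimensional limit by storing one row per observation, with one column per factor; each cell of the six-way layout is then simply a distinct combination of six column values. The Python sketch below groups integer event counts into such cells and accumulates, per cell, the count n, the sum of x, and the sum of x squared, from which the mean and variance follow. The factor names, the record layout, and the function `cell_sums` are hypothetical; in the actual program this grouping is done by the database, not by code like this.

```python
from collections import defaultdict

# Hypothetical names for the six factors: the one or two animals, time of
# day, day of week, the observer, and one user-definable factor. The actual
# field names and codings in Animals.mdb may differ.
FACTORS = ("animal", "partner", "time_of_day", "day_of_week", "observer", "user")

def cell_sums(observations):
    """Group integer event counts into six-way cells.

    Each observation is a dict holding one level per factor plus an integer
    "count". Returns {cell: (n, sum_x, sum_x2)}, accumulated in exact integers.
    """
    cells = defaultdict(lambda: [0, 0, 0])
    for obs in observations:
        cell = tuple(obs[f] for f in FACTORS)  # one coordinate per factor
        x = obs["count"]
        acc = cells[cell]
        acc[0] += 1      # n
        acc[1] += x      # sum of x
        acc[2] += x * x  # sum of x squared
    return {cell: tuple(acc) for cell, acc in cells.items()}

if __name__ == "__main__":
    base = dict(animal="Ana", partner="Bo", time_of_day="am",
                day_of_week="Mon", observer="R", user="u1")
    data = [dict(base, count=3), dict(base, count=5)]
    print(cell_sums(data))  # {('Ana', 'Bo', 'am', 'Mon', 'R', 'u1'): (2, 8, 34)}
```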

Treat the provided database design as a sample. Feel free to design one for your specific requirements. Then share your design by distributing it under the GNU license.

The database computes the descriptive statistics either with the built-in mean and (unbiased) variance, or from the sum of x and the sum of x squared. Like many business applications, Access uses only integer arithmetic; thus, the division inherent in the descriptive statistics causes large round-off errors. You could scale the values. Fortunately, our data begins as a count of events, perforce an integer. Then the summing and squaring can be performed in integer arithmetic without any round-off error. We employ this latter method; but you could experiment.
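To illustrate why integer sums avoid round-off, here is a minimal Python sketch of the second method. It assumes the data for one cell is a list of integer event counts; n, the sum of x, and the sum of x squared stay exact integers, and the division is postponed to the very end, using the identity that the unbiased variance equals (n·Σx² − (Σx)²) / (n(n − 1)). Carrying that final division in `fractions.Fraction` (and the function name `describe`) is an illustrative choice, not what Access does; a program that must return floating-point values would convert only at that last step.

```python
from fractions import Fraction

def describe(counts):
    """Count, mean, and unbiased variance of a list of integer event counts.

    n, sum(x), and sum(x*x) are exact integers; the only divisions are done
    once, at the end, on exact rationals, so no round-off accumulates.
    """
    n = len(counts)
    if n == 0:
        raise ValueError("no observations in this cell")
    sx = sum(counts)
    sxx = sum(x * x for x in counts)
    mean = Fraction(sx, n)
    # Unbiased variance: (sum(x^2) - (sum x)^2 / n) / (n - 1)
    #                  = (n*sum(x^2) - (sum x)^2) / (n*(n - 1))
    var = Fraction(n * sxx - sx * sx, n * (n - 1)) if n > 1 else None
    return n, mean, var

if __name__ == "__main__":
    n, mean, var = describe([3, 5, 4, 8])
    print(n, mean, var, float(var))  # 4 5 14/3 4.666666666666667
```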

On the large scale, AnovaMath.dll employs an analysis of variance algorithm that takes into account Latin squares and sporadic missing conditions. Furthermore, on the small scale, missing data is accommodated by weighting the mean and variance.

 

Closing Remarks

References

Access 97 Programming. Scott Billings, Joe Rhemann, et al. Sams Publishing, 1997, first edition. ISBN 0-672-21049-X. [Textbook].

Access 97 Developer's Handbook. Paul Litwin, Ken Getz, and Mike Gilbert. Sybex, 1997, third edition. ISBN 0-7821-1941-7. [Reference manual].

Introduction to Mathematical Statistics. Robert V. Hogg and Allen T. Craig. Prentice Hall Press, 1995, fifth edition. ISBN 0-02-355722-2. [Textbook].

Statistics: A First Course. Donald H. Sanders. McGraw-Hill, Inc., 1990, fifth edition. ISBN 0-07-054900-1. [Elementary textbook].

For "Latin square", check with Alta-Vista. There are theorems, calculators, and examples.

 

Future

Customize Anova.mdb for your specific application, or retain me to do so.

Extend the analysis of variance beyond six-way. However, how realistic or practical is it for humans to gather enough data to justify more than a six-way analysis of variance?

Add to the mathematical module the computation of various other statistical functions. In the late 1940s, I devised an algorithm for the analytical (infinite series of trigonometric functions) computation of the incomplete elliptic integral of the third kind, in both of its real cases. However, it was not feasible to use the algorithm on the mechanical calculators available at that time. Now, with modern computers and my knowledge of numerical analysis, it would be interesting to implement subroutines for the elliptic integrals of all three kinds, both complete and incomplete. But did my notes survive? Does anyone have any suggestions or requirements for some other function?

I am open to suggestions, comments, or encouragement.

 

Marketing

This program is available from the author (and from other sites).

 

Romuald Ireneus 'Scibor-Marchocki

15 250 East Arrow Highway

Baldwin Park, California 91706-1850

e-mail: rism22@sowest.net

It is marketed in two parts. AnovaMath.dll is shareware; Animals.mde (with its source code Animals.mdb) and Anova.mde (with its source code Anova.mdb) are distributed under the GNU license. If you habitually or frequently use AnovaMath.dll, please send me a check for $25.00.

 

Please send any comments, suggestions, questions, report of a problem with this program, or other correspondence to the foregoing e-mail address. And please include an e-mail address for a possible response -- do not expect any response by snail-mail. I also would be interested in hearing about any successful applications of this program and about the behavioral studies in which this program has been employed.

 

Saturday 26th September (IX) 1998 RISM.