OSBib v3.0

A collection of PHP classes to manage bibliographic formatting for OS bibliography software using the OSBib standard. Taken from and originally developed in WIKINDX (http://wikindx.sourceforge.net).

Released through http://bibliophile.sourceforge.net under the GPL licence.

If you make improvements, please consider contacting the administrators at bibliophile.sourceforge.net so that your improvements can be added to the release package.

October 2005
Mark Grimshaw (WIKINDX)
Andrea Rossato (Uniwakka)
Guillaume Gardey (BibOrb)
Christian Boulanger (Bibliograph)


INTRODUCTION
BIBSTYLE
CITESTYLE
TESTOSBIB
PARSEXML
LOADSTYLE
PARSESTYLE
STYLEMAP
UTF8
BIBFORMAT
BIBFORMAT USAGE
CITEFORMAT
CITEFORMAT USAGE


INTRODUCTION

OSBib is an Open Source bibliographic formatting engine written in PHP that uses XML style files to store formatting data for in-text or endnote-style (including footnote) citations and bibliographic lists. Released through Bibliophile, OSBib is designed to work with bibliographic data stored in any format via mapping arrays as defined in the class STYLEMAP. For those bibliographic systems whose data are stored in or that can be accessed as bibtex-type arrays, STYLEMAPBIBTEX is a set of pre-defined mapping arrays designed to get you up and running within a matter of minutes. Data stored in other formats require that STYLEMAP be edited.

OSBib can print the formatted output to web browsers, export it to Rich Text Format (for insertion into OpenOffice and similar word processors) or to OpenOffice's native sxw format, or produce plain text with no font formatting.

Style files are stored in XML format and are available for download from the Bibliophile site at:
http://bibliophile.sourceforge.net
The naming of the style files to be downloaded is (for example):
OSBib-americanPsychologicalAssociation_1.0_1.1
where the first number (in this case '1.0') is the minimum version of the OSBib classes that the style is compatible with and the second number (in this case '1.1') is the version number of the style file itself. For an explanation of the structure of the XML file, see bibliography_xml and citation_xml.
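
As a rough illustration of the naming scheme above, the following sketch splits a style file name into its three parts. parseStyleFileName() is a hypothetical helper written for this example and is not part of the OSBib package:

```php
<?php
// Split a downloaded style file name into its parts.
// parseStyleFileName() is a hypothetical helper, not part of OSBib.
function parseStyleFileName($name)
{
    // Expected shape: OSBib-<styleName>_<osbibVersion>_<styleVersion>
    if (!preg_match('/^OSBib-(.+)_(\d+(?:\.\d+)*)_(\d+(?:\.\d+)*)$/', $name, $m)) {
        return FALSE;
    }
    return array(
        'style'        => $m[1],
        'osbibVersion' => $m[2], // minimum compatible OSBib version
        'styleVersion' => $m[3], // version of the style file itself
    );
}

$parts = parseStyleFileName('OSBib-americanPsychologicalAssociation_1.0_1.1');
print_r($parts);
```

A name that does not follow the scheme makes the helper return FALSE.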

The OSBib package has two sections which share some common PHP files. Files in the directory format/ will format the bibliography output as described above. Files in the directory create/ will create or edit the XML style files. As supplied in the OSBib package, the create interface is stand-alone and runs via index.php. Users wishing to integrate the creation/editing interface within their bibliographic management system will need to modify or extract various portions of index.php for use in their own PHP code.


BIBSTYLE

This is not part of the distribution package but is here as an example of how WIKINDX uses OSBib-Format. BIBSTYLE::process() is the loop that parses each bibliographic entry one by one. You are likely to need a similar process loop. Further comments are found in BIBSTYLE.php.


CITESTYLE

This is not part of the distribution package but is here as an example of how WIKINDX uses OSBib-Format. CITESTYLE::start() is the method that parses citations within a block of text. You will need a similar method. Further comments and help are found in CITESTYLE.php. Many of the methods used in CITESTYLE are similar to those used in BIBSTYLE so are not described separately here.


TESTOSBIB

This is not part of the distribution package but is here as a very simple example of how to set up bibliography formatting without many of the extra options found in BIBSTYLE. It can be run directly from a web browser to display how raw input is transformed into a formatted bibliography.


PARSEXML

Parse the XML style file into usable arrays. Used within BIBFORMAT::loadStyle() and CITEFORMAT.


LOADSTYLE

include_once($pathToOsbibClasses . "LOADSTYLE.php");
ARRAY LOADSTYLE::loadDir($pathToStyleFileDirectory);

This scans the style file directory and returns an alphabetically sorted (on the key) array of available bibliographic styles e.g.
$styles = LOADSTYLE::loadDir("styles/bibliography");
print_r($styles);

An example output from this would be:
Array (
    [APA] => American Psychological Association (APA)
    [BRITISHMEDICALJOURNAL] => British Medical Journal (BMJ)
    [CHICAGO] => Chicago
    [HARVARD] => Harvard
    [IEEE] => Institute of Electrical and Electronics Engineers (IEEE)
    [MLA] => Modern Language Association (MLA)
    [TEST] => test
    [TURABIAN] => Turabian
    [WIKINDX] => WIKINDX -- Show All
)

Use this to provide your users with an HTML form select box for choosing their preferred style. The key from the array above is the style name to pass to BIBFORMAT::loadStyle().
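
A minimal sketch of such a select box follows. styleSelectBox() and the form field name 'style' are assumptions made for this example; the sample array is copied from the output above so that the code runs without OSBib installed:

```php
<?php
// Build an HTML select box from the array returned by LOADSTYLE::loadDir().
// styleSelectBox() and the field name 'style' are hypothetical.
function styleSelectBox($styles, $selected = FALSE)
{
    $html = '<select name="style">' . "\n";
    foreach ($styles as $key => $label) {
        // Pre-select the user's current style, if any
        $attr = ($key === $selected) ? ' selected="selected"' : '';
        $html .= '<option value="' . htmlspecialchars($key) . '"' . $attr . '>'
            . htmlspecialchars($label) . "</option>\n";
    }
    return $html . "</select>\n";
}

// Sample data; in practice use
// $styles = LOADSTYLE::loadDir("styles/bibliography");
$styles = array(
    'APA'     => 'American Psychological Association (APA)',
    'CHICAGO' => 'Chicago',
    'HARVARD' => 'Harvard',
);
$html = styleSelectBox($styles, 'CHICAGO');
print $html;
```

The value submitted by the form is then the key to hand to BIBFORMAT::loadStyle().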


PARSESTYLE

This is used internally in BIBFORMAT and CITEFORMAT and parses a single style definition string for a particular resource type (book, web article etc.) from a style XML file into an array to be used by OSBib.


STYLEMAP

(If your database stores or accesses its records in a BibTeX-style format, you should use STYLEMAPBIBTEX instead. It is a version of STYLEMAP devised as an out-of-the-box solution for such systems and should not require editing. See also BIBFORMAT USAGE below.)

This contains all the mapping between your particular database/bibliographic management system and OSBib. There are plenty of comments in that file so read them carefully.
1/ You should edit $this->types array.
2/ You should edit each resource type's array changing only the key of each element. However, do not edit any key (or its value) that is 'creator1', 'creator2', 'creator3', 'creator4' or 'creator5'. For resource types in $this->types that you set to FALSE, you do not need to do anything to the specific resource array as these arrays will then be ignored.

A SQL query in WIKINDX to display each resource in a format suitable for OSBib processing may return the following associative array for one resource:
Array (
    [resourceId] => 1
    [type] => journal_article
    [title] => {X} Window System, Version 11
    [subtitle] =>
    [noSort] => The
    [url] =>
    [isbn] =>
    [field1] => 20
    [field2] => S2
    [field3] =>
    [field4] =>
    [field5] =>
    [field6] =>
    [field7] =>
    [field8] =>
    [field9] =>
    [file] =>
    [collection] => 1
    [publisher] =>
    [miscField1] =>
    [miscField2] =>
    [miscField3] =>
    [miscField4] =>
    [tag] =>
    [addUserIdResource] => 1
    [editUserIdResource] =>
    [year1] => 1990
    [year2] =>
    [year3] =>
    [pageStart] =>
    [pageEnd] =>
    [creator1] => 1,2,3
    [creator2] =>
    [creator3] =>
    [creator4] =>
    [creator5] =>
    [quotes] =>
    [paraphrases] =>
    [musings] =>
    [publisherName] =>
    [publisherLocation] =>
    [publisherType] =>
    [collectionTitle] => Software Practice and Experience
    [collectionTitleShort] =>
    [collectionType] => journal
    [timestamp] => 2005-04-24 10:48:15
)

What is important here is that the key names of the above array match the key names of the resource type arrays in STYLEMAP. This is how the data from your particular database is mapped to a format that OSBib understands and this is why you must edit the key names of the resource type array in STYLEMAP. The one exception to this is the handling of creator elements (author, editor, composer, inventor etc.) which OSBib expects to be listed as 'creator1', 'creator2', 'creator3', 'creator4' and 'creator5' where 'creator1' is always the primary creator (usually the author). Do not edit these key names.
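
The idea of key-based mapping can be sketched as follows. This is illustrative only: the right-hand names in $journalArticleMap and the translateRow() helper are assumptions made for this example, not a copy of what STYLEMAP contains, so consult the comments in STYLEMAP itself for the real arrays:

```php
<?php
// Illustrative only: the right-hand names below are assumptions, not a copy
// of STYLEMAP. Keys are database column names; values are the names a style
// would refer to.
$journalArticleMap = array(
    'title'           => 'title',
    'collectionTitle' => 'journal',
    'field1'          => 'volume',
    'field2'          => 'issue',
    'year1'           => 'year',
    'creator1'        => 'author', // creator keys are never renamed
);

// translateRow() is a hypothetical helper showing the idea of the lookup.
function translateRow($map, $row)
{
    $item = array();
    foreach ($map as $dbKey => $styleName) {
        // Skip columns that are missing or empty in this row
        if (isset($row[$dbKey]) && $row[$dbKey] !== '') {
            $item[$styleName] = $row[$dbKey];
        }
    }
    return $item;
}

// A cut-down version of the WIKINDX row shown above
$row = array(
    'type'            => 'journal_article',
    'title'           => '{X} Window System, Version 11',
    'field1'          => '20',
    'field2'          => 'S2',
    'field3'          => '',
    'year1'           => '1990',
    'creator1'        => '1,2,3',
    'collectionTitle' => 'Software Practice and Experience',
);
$item = translateRow($journalArticleMap, $row);
print_r($item);
```

Columns that do not appear as a key in the map (here 'type' and 'field3') are simply ignored, which is why the key names must match your own database fields.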


UTF8

include_once($pathToOsbibClasses . "UTF8.php");
$utf8 = new UTF8();

BIBFORMAT expects its data to be in UTF-8 format and will return its formatted data in UTF-8 format. If you need to encode or decode your data prior to or after using OSBib, do not use PHP's utf8_encode() and utf8_decode() functions. Use the OSBib functions UTF8::encodeUtf8() and UTF8::decodeUtf8() instead. Additionally, if you need to manipulate UTF-8-encoded strings with functions such as strtolower(), strlen() etc., you should strongly consider using the appropriate methods in the OSBib UTF8 class.
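
To see why byte-oriented functions are a problem, consider the following sketch, which compares PHP's strlen() with a character count made using a UTF-8-aware regular expression. countUtf8Chars() is a hypothetical stand-in written for this example and is not how the OSBib UTF8 class is implemented:

```php
<?php
// Byte-oriented functions miscount multi-byte UTF-8 characters.
$word = "\xC3\x89t\xC3\xA9"; // the French word for "summer" as explicit UTF-8 bytes

$bytes = strlen($word); // counts bytes, not characters

// Count characters with a UTF-8-aware regular expression (the /u modifier).
// countUtf8Chars() is a hypothetical stand-in, not the OSBib implementation.
function countUtf8Chars($string)
{
    return count(preg_split('//u', $string, -1, PREG_SPLIT_NO_EMPTY));
}
$chars = countUtf8Chars($word);

print "bytes: $bytes, characters: $chars\n";
```

The word has three characters but occupies five bytes, so any code that truncates or measures it by bytes risks cutting a character in half.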

METHODS

UTF8::encodeUtf8()
$utf8String = $utf8->encodeUtf8(STRING: $string);

Properly encode a string into multi-byte UTF-8.


UTF8::decodeUtf8()
$string = $utf8->decodeUtf8(STRING: $utf8String);

Properly decode a multi-byte UTF-8 string.


UTF8::utf8_strtolower()
$utf8String = $utf8->utf8_strtolower(STRING: $utf8String);

Convert a UTF-8 string to lowercase. Where PHP has been compiled with mbstring support, mb_strtolower() will be used.


UTF8::utf8_strtoupper()
$utf8String = $utf8->utf8_strtoupper(STRING: $utf8String);

Convert a UTF-8 string to uppercase. Where PHP has been compiled with mbstring support, mb_strtoupper() will be used.


UTF8::utf8_substr()
$utf8String = $utf8->utf8_substr(STRING: $utf8String, INT: $start [, INT: $length=NULL]);

Return a portion of a UTF-8 string. Where PHP has been compiled with mbstring support, mb_substr() will be used.


UTF8::utf8_ucfirst()
$utf8String = $utf8->utf8_ucfirst(STRING: $utf8String);

Ensure that the first letter of a UTF-8 string is uppercase.


UTF8::utf8_strlen()
$length = $utf8->utf8_strlen(STRING: $utf8String);

Return the length of a UTF-8 string. Where PHP has been compiled with mbstring support, mb_strlen() will be used.



BIBFORMAT

This is the main OSBib engine for formatting bibliographic entries.
include_once($pathToOsbibClasses . "BIBFORMAT.php");
$bibformat = new BIBFORMAT([STRING: $pathToOsbibClasses = FALSE, BOOLEAN: $useBibtex = FALSE]);

By default, $pathToOsbibClasses will be the same directory as BIBFORMAT is in.

NB - BIBFORMAT expects its data to be in UTF-8 format and will return its formatted data in UTF-8 format. If you need to encode or decode your data prior to or after using OSBib, do not use PHP's utf8_encode() and utf8_decode() functions. Use the OSBib functions UTF8::encodeUtf8() and UTF8::decodeUtf8() instead. Additionally, if you need to manipulate UTF-8-encoded strings with functions such as strtolower(), strlen() etc., you should strongly consider using the appropriate methods in the OSBib UTF8 class.

PROPERTIES (to be set after instantiating the BIBFORMAT class)
$bibformat->output -- By default this property is 'html' but you can change it to 'rtf' for exporting to RTF files, 'sxw' for OpenOffice or 'plain' for plain text. It is used to format bold, underline, italics etc. for the appropriate output medium.
$bibformat->patterns -- A preg pattern (e.g. "/matchThis|matchThat/i") that, in conjunction with $bibformat->patternHighlight, is used to highlight words or phrases when displaying the results to a browser. This is useful when the bibliography to be displayed is the result of a SQL search. Default is FALSE and its value will be ignored if $bibformat->output is anything other than 'html'.
$bibformat->patternHighlight -- A CSS class defining the highlighting for above. Default is FALSE.
$bibformat->bibtexParsePath -- If you wish to use STYLEMAPBIBTEX because your database stores or accesses its data in a form similar to BibTeX, you should set the constructor parameter $useBibtex to TRUE and set this property to the path where PARSECREATORS, PARSEMONTH and PARSEPAGE can be found. These classes are not part of OSBib but are part of the bibtexParse package that can be downloaded from http://bibliophile.sourceforge.net. By default, this path will be to a bibtexParse/ directory in the same directory as BIBFORMAT is in.
$bibformat->cleanEntry -- If TRUE, convert BibTeX (and LaTeX) special characters to UTF-8. Default is FALSE.
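
The effect of the patterns and patternHighlight properties can be pictured with the following sketch. highlight() is a hypothetical helper written for this example; BIBFORMAT's own highlighting code may well differ:

```php
<?php
// Wrap every match of a preg pattern in a span carrying a CSS class.
// highlight() is a hypothetical helper; BIBFORMAT's own code may differ.
function highlight($text, $pattern, $cssClass)
{
    // $0 in the replacement stands for the whole matched text
    return preg_replace(
        $pattern,
        '<span class="' . $cssClass . '">$0</span>',
        $text
    );
}

$out = highlight(
    'Software Practice and Experience',
    '/practice|experience/i',
    'highlight'
);
print $out . "\n";
```

Because the pattern carries the /i modifier, the match is case-insensitive while the original capitalisation of the text is preserved in the output.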


METHODS

BIBFORMAT::loadStyle()
list($info, $citation, $styleCommon, $styleTypes) = $bibformat->loadStyle(STRING: $pathToStyleFiles, STRING: $styleFile);

Parses the XML style file into raw arrays (to be further processed in BIBFORMAT::getStyle()). The four associative arrays returned are:
$info -- general information about the style including description, language, version etc.
$citation -- in-text citation styling (not currently used).
$styleCommon -- common styling for bibliographic output such as formatting of names, title capitalisation etc.
$styleTypes -- bibliographic styling for each resource type supported by that particular style.

These last two are used in BIBFORMAT::getStyle().


BIBFORMAT::getStyle()
$bibformat->getStyle(ASSOC_ARRAY: $styleCommon, ASSOC_ARRAY: $styleTypes);

Transform the raw XML arrays from BIBFORMAT::loadStyle() into OSBib-usable arrays and perform some pre-processing.

loadStyle() and getStyle() need be called only once so can be outside your process loop.

The following should be called for each database row you wish to process.


BIBFORMAT::preProcess()
$row = $bibformat->preProcess(STRING: $type, ASSOC_ARRAY: $row);

$row -- an associative array returned from your SQL query as described in the STYLEMAP section above.
$type -- the resource type which must be one of the ones listed in $this->types in STYLEMAP.

Among other things, preProcess() supplies one of the three generic style definitions if the requested bibliographic style does not provide a definition for a specific resource type. It also handles editor/author switching for books which have only editors.

Internally within BIBFORMAT, data from the SQL query $row is formatted and stored in a $item associative array. The following methods accomplish this:

BIBFORMAT::formatNames()
$bibformat->formatNames(ASSOC_ARRAY: $creators, STRING: $nameType);

This method should be called for each type of creator the resource has. (See BIBSTYLE for an example of how this is used in WIKINDX.)

$creators -- Multi-associative array of creator names. e.g. this array might be of the primary authors (in 'creator1'):
array(
0 => array('surname' => 'Grimshaw', 'firstname' => 'Mark', 'initials' => 'N', 'prefix' => ''),
1 => array('surname' => 'Witt', 'firstname' => 'Jan', 'initials' => '', 'prefix' => 'de')
);

$nameType -- One of 'creator1', 'creator2', 'creator3', 'creator4' or 'creator5'. This is mapped against the resource type array in STYLEMAP to determine what type of creator we're looking at. 'creator1' is always assumed to be the primary creator whether that is an author, composer, inventor etc.


BIBFORMAT::formatTitle()
$bibformat->formatTitle(STRING: $title[, STRING: $delimitLeft, STRING: $delimitRight]);

Format the title of the resource.

$title -- The title of the resource.
$delimitLeft
$delimitRight
-- Some bibliographic styles require all except the first letter of the title to be lowercased. If your bibliographic system allows users to specify groups of letters in the title that should not be lowercased (for example, proper names), then you enter the delimiters here. WIKINDX uses '{' and '}' as delimiters to protect character case.
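
The protection of character case by delimiters can be sketched as below. This is a simplified, ASCII-only illustration; lowercaseTitle() is a hypothetical function and is not how BIBFORMAT::formatTitle() is implemented:

```php
<?php
// Lowercase all but the first letter of a title, leaving text between the
// delimiters untouched. A simplified ASCII-only sketch; lowercaseTitle() is
// hypothetical and is not how BIBFORMAT::formatTitle() is implemented.
function lowercaseTitle($title, $delimitLeft = '{', $delimitRight = '}')
{
    $pattern = '/(' . preg_quote($delimitLeft, '/') . '.*?'
        . preg_quote($delimitRight, '/') . ')/';
    // Split into alternating unprotected / protected chunks
    $chunks = preg_split($pattern, $title, -1, PREG_SPLIT_DELIM_CAPTURE);
    $out = '';
    foreach ($chunks as $i => $chunk) {
        if ($i % 2 == 1) {
            // Protected chunk: strip the delimiters, keep the case
            $out .= substr($chunk, strlen($delimitLeft), -strlen($delimitRight));
        } else {
            $out .= strtolower($chunk);
        }
    }
    return ucfirst($out);
}

print lowercaseTitle('The {X} Window System, Version 11') . "\n";
```

For real data the case conversion would need to be UTF-8 aware, which is what the methods of the OSBib UTF8 class are for.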


BIBFORMAT::formatEdition()
$bibformat->formatEdition($edition);

Bibliographic styles may require the book edition number to be a cardinal or an ordinal number. If your edition number is stored in the database as a cardinal number, then it will be formatted as an ordinal number if required by the bibliographic style. If your edition number is stored as anything other than a cardinal number it will be used unchanged. The conversion is English - i.e. '3' => '3rd'. This works all the way up to infinity-1 ;-)
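
The English cardinal-to-ordinal rule described above can be sketched as follows. toOrdinal() is a hypothetical helper written for this example, not BIBFORMAT's own code:

```php
<?php
// Convert a cardinal edition number to an English ordinal; anything that is
// not a plain cardinal number is returned unchanged.
// toOrdinal() is a hypothetical helper, not BIBFORMAT's own code.
function toOrdinal($edition)
{
    if (!preg_match('/^\d+$/', $edition)) {
        return $edition; // e.g. 'Revised' stays as it is
    }
    $lastTwo = (int) substr($edition, -2);
    $last = (int) substr($edition, -1);
    if ($lastTwo >= 11 && $lastTwo <= 13) {
        $suffix = 'th'; // 11th, 12th, 13th, 111th ...
    } elseif ($last == 1) {
        $suffix = 'st';
    } elseif ($last == 2) {
        $suffix = 'nd';
    } elseif ($last == 3) {
        $suffix = 'rd';
    } else {
        $suffix = 'th';
    }
    return $edition . $suffix;
}

print toOrdinal('3') . "\n";
```

Only the last one or two digits decide the suffix, which is why the rule holds for arbitrarily large edition numbers.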


BIBFORMAT::formatPages()
$bibformat->formatPages(STRING: $pageStart [, STRING: $pageEnd]);

BIBFORMAT::formatDate()
$bibformat->formatDate(INT: $day, INT: $month);

BIBFORMAT::formatRunningTime()
$bibformat->formatRunningTime(INT: $minutes, INT: $hours);

Running time for films, broadcasts etc.


BIBFORMAT::addItem()
$bibformat->addItem(STRING: $item, STRING: $fieldName);

Add an item to the internal $item array in BIBFORMAT. Use this to add elements of your resource to the $item array that do not require special formatting with the methods above. If it's not added, it won't be displayed. You'll notice a use of this in the example BIBSTYLE for the URL of a resource. If you don't need to do your own special formatting, it's far easier to use addAllOtherItems() below.


BIBFORMAT::addAllOtherItems()
$bibformat->addAllOtherItems(ASSOC_ARRAY: $row);

Add all remaining items to the internal $item array in BIBFORMAT. Use this to add elements of your resource to the $item array that do not require special formatting with the methods above. If it's not added, it won't be displayed.


BIBFORMAT::map()
STRING $bibformat->map();

After you have added resource elements to the $item array using the methods above, calling map() will produce a formatted string suitable for printing to the output medium.

 


BIBFORMAT USAGE:

The formatting in BIBFORMAT works on one resource at a time so you will want to call it via a loop as you cycle through your data.

If you do not intend to use STYLEMAPBIBTEX, the following is a rough order of events within the loop described above. It's a general outline of what happens in BIBSTYLE as used by WIKINDX:

// Instantiate the BIBFORMAT class and initialize various parameters
include_once("core/styles/BIBFORMAT.php");
$bibformat = new BIBFORMAT();
list($info, $citation, $styleCommon, $styleTypes) = $bibformat->loadStyle("styles/bibliography/", "APA");
$bibformat->getStyle($styleCommon, $styleTypes);

After calling $bibformat->getStyle(), you can set some localisation for months and other variables. For example (these settings are the default):

		$bibformat->longMonth = array(
				1	=>	'January',
				2	=>	'February',
				3	=>	'March',
				4	=>	'April',
				5	=>	'May',
				6	=>	'June',
				7	=>	'July',
				8	=>	'August',
				9	=>	'September',
				10	=>	'October',
				11	=>	'November',
				12	=>	'December',
			);
		$bibformat->shortMonth = array(
				1	=>	'Jan',
				2	=>	'Feb',
				3	=>	'Mar',
				4	=>	'Apr',
				5	=>	'May',
				6	=>	'Jun',
				7	=>	'Jul',
				8	=>	'Aug',
				9	=>	'Sep',
				10	=>	'Oct',
				11	=>	'Nov',
				12	=>	'Dec',
			);

The title/subtitle separator can be set as:

		$bibformat->titleSubtitleSeparator = ": ";

// process loop starts here:
// Get the resource type ('book', 'journal_article', 'artwork' etc.)
// $databaseRow is one row returned from your SQL query
$resourceType = $databaseRow['type'];
$row = $bibformat->preProcess($resourceType, $databaseRow);

// PreProcessing may change the value of $resourceType so get it back!
$resourceType = $bibformat->type;
// Add various resource elements to the BIBFORMAT::item array that require special processing and formatting
1. Creator names
2. Resource title
3. Resource edition
4. Resource pages
5. Resource date
6. Resource running time
7. Add the URL creating a hyperlink for web browser display
// Add all the other elements of the resource to BIBFORMAT::item array
$bibformat->addAllOtherItems($row);
// Finally, get the formatted resource string ready for printing to the web browser or exporting to RTF, OpenOffice or plain text
$string = $bibformat->map();
// process loop ends here


If you are using STYLEMAPBIBTEX for reasons described in the sections above, then the following is a rough order of events within the loop described above (with an example bibtex array supplied).

// Instantiate the BIBFORMAT class and initialize various parameters
include_once("core/styles/BIBFORMAT.php");
$bibformat = new BIBFORMAT(FALSE, TRUE);
list($info, $citation, $styleCommon, $styleTypes) = $bibformat->loadStyle("styles/bibliography/", "APA");
$bibformat->getStyle($styleCommon, $styleTypes);

// process loop starts here:
// $resourceArray must be an array of all the elements in the resource where the key names are valid, lowercase BibTeX field names. e.g.:
$resourceArray = array(
'author' => 'Grimshaw, Mark and Boulanger, Christian',
'title' => 'How Bibliographies Ruined our Lives',
'year' => '2005',
'volume' => '20',
'number' => '4',
'journal' => 'Journal of Mundane Trivia',
'pages' => '42--111',
'howpublished' => "\url{http://bibliophile.sourceforge.net}",
);
// Get the resource type ('book', 'article', 'inbook' etc.)
$resourceType = 'misc';
// In this case, BIBFORMAT::preProcess() adds all the resource elements automatically to the BIBFORMAT::item array...
$bibformat->preProcess($resourceType, $resourceArray);
// Finally, get the formatted resource string ready for printing to the web browser or exporting to RTF, OpenOffice or plain text
$string = $bibformat->map();
// process loop ends here
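
To give a feel for what happens to BibTeX fields such as 'author' and 'pages' in the example above, here is a much simplified sketch. splitAuthors() and splitPages() are hypothetical stand-ins that handle only the simple forms shown; they are not the PARSECREATORS and PARSEPAGE classes from the bibtexParse package:

```php
<?php
// Simplified stand-ins for what PARSECREATORS and PARSEPAGE are used for.
// Both functions are hypothetical and handle only the simple forms shown.
function splitAuthors($authorField)
{
    $creators = array();
    // BibTeX separates multiple names with the word "and"
    foreach (preg_split('/\s+and\s+/i', trim($authorField)) as $name) {
        // Handles the "Surname, Firstname" form only
        $parts = array_map('trim', explode(',', $name, 2));
        $creators[] = array(
            'surname'   => $parts[0],
            'firstname' => isset($parts[1]) ? $parts[1] : '',
        );
    }
    return $creators;
}

function splitPages($pagesField)
{
    // BibTeX page ranges use one or two hyphens
    $parts = preg_split('/\s*--?\s*/', trim($pagesField), 2);
    return array($parts[0], isset($parts[1]) ? $parts[1] : FALSE);
}

$creators = splitAuthors('Grimshaw, Mark and Boulanger, Christian');
list($pageStart, $pageEnd) = splitPages('42--111');
print_r($creators);
print "$pageStart to $pageEnd\n";
```

Real BibTeX name fields are considerably more varied (braces, "von" parts, "Firstname Surname" order), which is why the dedicated bibtexParse classes are needed.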

 


CITEFORMAT

This is the main OSBib engine for formatting in-text and endnote-style citations within a block of text.
include_once($pathToOsbibClasses . "CITEFORMAT.php");
$citeformat = new CITEFORMAT(CLASSOBJECT: &$bibstyleClass, CLASSMETHOD: $process [, STRING: $pathToOsbibClasses = FALSE]);

CITEFORMAT uses BIBFORMAT to format its appended bibliographies. You must set up a class similar to BIBSTYLE and a method similar to BIBSTYLE::process() (see above) prior to implementing CITEFORMAT and passing both the class and the method to CITEFORMAT.

By default, $pathToOsbibClasses will be the same directory as CITEFORMAT is in.

NB - CITEFORMAT expects its data to be in UTF-8 format and will return its formatted data in UTF-8 format. If you need to encode or decode your data prior to or after using OSBib, do not use PHP's utf8_encode() and utf8_decode() functions. Use the OSBib functions UTF8::encodeUtf8() and UTF8::decodeUtf8() instead. Additionally, if you need to manipulate UTF-8-encoded strings with functions such as strtolower(), strlen() etc., you should strongly consider using the appropriate methods in the OSBib UTF8 class.

PROPERTIES (to be set after instantiating the CITEFORMAT class)
$citeformat->output -- By default this property is 'html' but you can change it to 'rtf' for exporting to RTF files or 'plain' for plain text. It is used to format bold, underline, italics etc. for the appropriate output medium.
$citeformat->hyperlinkBase -- By default this property is FALSE but, if displaying the parsed block of text back to a web browser, you can turn on hyperlinking of citations by specifying the URL instead. CITEFORMAT will append the unique ID number as extracted for each bibliographic entry from the database (see usage below). WIKINDX uses "index.php?action=resourceView&id=".
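
The way a resource ID is appended to the hyperlink base can be sketched as follows. citationLink() is a hypothetical helper for illustration only and does not reproduce CITEFORMAT's internal code:

```php
<?php
// Build a citation hyperlink by appending the resource ID to the base URL.
// citationLink() is a hypothetical helper for illustration.
function citationLink($hyperlinkBase, $resourceId, $citationText)
{
    if ($hyperlinkBase === FALSE) {
        return $citationText; // hyperlinking switched off
    }
    // Escape the ampersand so the URL is valid inside an HTML attribute
    $url = htmlspecialchars($hyperlinkBase . $resourceId);
    return '<a href="' . $url . '">' . $citationText . '</a>';
}

print citationLink('index.php?action=resourceView&id=', 1, '(Grimshaw 2005)') . "\n";
```

With the base left at FALSE the citation text is returned unchanged.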

CITEFORMAT is a little more complex to use than BIBFORMAT, mainly due to disambiguation requirements and decisions as to whether to use in-text, endnote or footnote citations, so read the instructions carefully.


CITEFORMAT USAGE:

The following is a rough order of events you will need to set up and is a general outline of what happens in CITESTYLE as used by WIKINDX. Do not change anything that uses $this->citeformat, and take care to position such variables in your code in the same way that WIKINDX does:

// Instantiate the CITEFORMAT class and initialize various parameters
include_once("core/styles/CITEFORMAT.php");
$citeformat = new CITEFORMAT();
list($info, $citation, $styleCommon, $styleTypes) = $citeformat->loadStyle("styles/bibliography/", "APA");
$citeformat->getStyle($styleCommon, $styleTypes);

After calling $citeformat->getStyle(), you can set some localisation for months and other variables. For example (these settings are the default):

		$citeformat->longMonth = array(
				1	=>	'January',
				2	=>	'February',
				3	=>	'March',
				4	=>	'April',
				5	=>	'May',
				6	=>	'June',
				7	=>	'July',
				8	=>	'August',
				9	=>	'September',
				10	=>	'October',
				11	=>	'November',
				12	=>	'December',
			);
		$citeformat->shortMonth = array(
				1	=>	'Jan',
				2	=>	'Feb',
				3	=>	'Mar',
				4	=>	'Apr',
				5	=>	'May',
				6	=>	'Jun',
				7	=>	'Jul',
				8	=>	'Aug',
				9	=>	'Sep',
				10	=>	'Oct',
				11	=>	'Nov',
				12	=>	'Dec',
			);

Two forms of possessive (for creator names) and 'et al.' equivalent can be set as:

		$citeformat->possessive1 = "'s"; // Set to FALSE if not used
		$citeformat->possessive2 = "s"; // Set to FALSE if not used
		$citeformat->textEtAl = "et al.";

// start() is the method called externally that starts the whole process:
1. Parse the input text for citation tags. WIKINDX uses BBCode-like [cite]....[/cite] tags that hold the unique ID from the database for the resource being cited in addition to any page numbers. Text prior to the citation tags and text within the citation tags are captured into two separate arrays.
2. Store the resource IDs from the citation tags in an array.
3. For these resource IDs, get the raw bibliographic data from the database and place this into a multi-dimensional array keyed by the resource's ID. Each element of this array should be an array in the same format as the array you use in BIBSTYLE or equivalent class (it will later be sent one array at a time to BIBSTYLE::process() within CITEFORMAT). The resource type (book, journal article etc.) is also required. The order in which you get data from the database is important as it is used for citation disambiguation.
4. Loop through the captured text, formatting the resource title, creators, publication year and pages, and storing these in various $this->citeformat arrays.
5. Run $this->citeformat->process() to process the citations.
6. Finally, gather the appended bibliography formatted for either in-text or endnote-style citations and return the formatted text block.
// start() ends here
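
Steps 1 and 2 above might be sketched as follows. The "ID:pages" layout inside the tag is an assumption made for this example, and extractCitations() and splitAroundCitations() are hypothetical helpers; see CITESTYLE.php for how WIKINDX actually does this:

```php
<?php
// Steps 1 and 2 of start(): find [cite]...[/cite] tags and collect the IDs.
// The "ID:pages" layout inside the tag is an assumption made for this sketch.
function extractCitations($text)
{
    preg_match_all(
        '/\[cite\](\d+)(?::([^\[\]]*))?\[\/cite\]/',
        $text,
        $matches,
        PREG_SET_ORDER
    );
    $citations = array();
    foreach ($matches as $m) {
        $citations[] = array(
            'id'    => (int) $m[1],
            'pages' => (isset($m[2]) && $m[2] !== '') ? $m[2] : FALSE,
        );
    }
    return $citations;
}

// The text surrounding the tags, in document order
function splitAroundCitations($text)
{
    return preg_split('/\[cite\].*?\[\/cite\]/s', $text);
}

$text = 'X is old [cite]1:23-25[/cite] but still used [cite]7[/cite].';
$citations = extractCitations($text);
$chunks = splitAroundCitations($text);
print_r($citations);
print_r($chunks);
```

Keeping the text chunks and the citation data in two parallel arrays makes it straightforward to interleave them again once each citation has been formatted.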