SynTag, Version 0.2d (old)

Note: this is not the latest version!

File: README.txt

"SynTag" - Flocke's Syntax Highlighter

A PHP class to create syntax-highlighted HTML from source code.

Version 0.2d - always find the latest version at
http://flocke.vssd.de/prog/code/php/syntag/

Copyright (C) 2005, 2006 Volker Siebert <flocke@vssd.de>
All rights reserved.

Permission is hereby granted, free of charge, to any person obtaining a
copy of this software and associated documentation files (the "Software"),
to deal in the Software without restriction, including without limitation
the rights to use, copy, modify, merge, publish, distribute, sublicense,
and/or sell copies of the Software, and to permit persons to whom the
Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in
all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
DEALINGS IN THE SOFTWARE.

---------------------------------------------------------------------------

SynTag is a set of PHP functions for syntax highlighting, i.e., taking a
piece of source code and putting special TAGs around SYNtactical elements.
So "SynTag" is an abbreviation of "SYNtactical TAGger". In the simplest
case this would put all keywords of a language in bold, e.g. "<b>if</b>".

In its current state (version 0.2d), SynTag has just enough features to
satisfy my needs, i.e., to highlight source code on my homepage. Some
special cases that do not occur in my own code still need to be refined.

If you're just interested in highlighting source code, here's what you
have to do:

1. If you have the source of a file and its name, just call

  +-[PHP]---------------------------------------------------------------
  | $filename = "foo.php";
  | $contents = file_get_contents("source/" . $filename);
  |
  | include("syntag/syntag.php");
  | $highlighted = SynTag::Highlight($contents, $filename);
  |
  | echo "<b>" . htmlspecialchars($filename) . "</b>\n";
  | echo '<pre>' . $highlighted . "</pre>\n";
  +---------------------------------------------------------------------

2. If you have just the source and the file type, simply call

  +-[PHP]---------------------------------------------------------------
  | $language = 'pascal';
  | $contents = "begin\n  writeln('foo');\nend.";
  |
  | include("syntag/syntag.php");
  | $highlighted = SynTag::HighlightLanguage($contents, $language);
  |
  | echo "<pre>" . $highlighted . "</pre>\n";
  +---------------------------------------------------------------------

The return value of SynTag::Highlight and SynTag::HighlightLanguage is
valid HTML, with tabs converted to spaces and special characters converted
to HTML entities. If you don't put it inside <pre> tags, you should call
SynTag::FixFormat on the result to replace line feeds with <br> tags and
leading spaces with &nbsp;.
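
The README only describes what SynTag::FixFormat does, not how it does it.
Below is a minimal sketch of that transformation based solely on the
description above; fix_format_sketch() is a hypothetical name, not SynTag's
implementation, and it assumes tabs have already been expanded to spaces.

```php
<?php
// Hypothetical stand-in for SynTag::FixFormat, based only on the README:
// line feeds become <br> tags and leading spaces become &nbsp;.
function fix_format_sketch(string $html): string
{
    // Normalize Windows/Mac line endings so we only have to split on "\n".
    $html = str_replace(array("\r\n", "\r"), "\n", $html);
    $lines = explode("\n", $html);
    foreach ($lines as $i => $line) {
        // Count the leading spaces and replace exactly that many with &nbsp;.
        $indent = strlen($line) - strlen(ltrim($line, ' '));
        $lines[$i] = str_repeat('&nbsp;', $indent) . substr($line, $indent);
    }
    // Re-join the lines, putting a <br> tag at every former line break.
    return implode("<br>\n", $lines);
}

echo fix_format_sketch("<b>begin</b>\n  writeln(&#039;foo&#039;);\n<b>end</b>.");
```

Only leading spaces are converted, as described; runs of spaces inside a
line would still be collapsed by the browser.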

SynTag is quite fast, but highlighting is still an expensive operation, so
you should ideally cache the results.
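
A minimal sketch of such a cache, assuming a writable cache directory.
highlight_cached() is a hypothetical helper, not part of SynTag; it takes
the highlighter as a callable, so in real use you could pass
'SynTag::Highlight' (assuming Highlight is a static method, as the calls
above suggest), while the demo uses a trivial stand-in.

```php
<?php
// Hypothetical file cache around any highlighter (not part of SynTag).
// $highlight must accept ($contents, $filename) and return HTML.
function highlight_cached(string $contents, string $filename,
                          callable $highlight, string $cacheDir): string
{
    // The file name selects the language, so it is part of the cache key.
    $key = md5($filename . "\0" . $contents);
    $cacheFile = $cacheDir . DIRECTORY_SEPARATOR . $key . '.html';

    if (is_file($cacheFile)) {
        return file_get_contents($cacheFile);    // hit: skip highlighting
    }
    $html = $highlight($contents, $filename);     // miss: highlight once
    file_put_contents($cacheFile, $html, LOCK_EX);
    return $html;
}

// Demo with a stand-in highlighter that counts how often it is called.
$calls = 0;
$fake = function (string $src, string $name) use (&$calls): string {
    $calls++;
    return htmlspecialchars($src);
};
$dir = sys_get_temp_dir() . DIRECTORY_SEPARATOR . 'syntag-cache-' . uniqid();
mkdir($dir);
$first  = highlight_cached("if a < b then", "demo.pas", $fake, $dir);
$second = highlight_cached("if a < b then", "demo.pas", $fake, $dir);
```

Because the key includes the source text, editing a file simply produces a
new cache entry; after upgrading SynTag itself you would clear the
directory by hand.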

HOW IT WORKS
============

The main work is done by pattern matching with regular expressions. As an
example, here is the definition of a Pascal string:

  +-[PHP]---------------------------------------------------------------
  | 'string.s' => array(
  |   '(' => "'",
  |   ')' => "'",
  |   'o' => '#string'
  | ),
  +---------------------------------------------------------------------

The element '(' defines the opening regular expression (a simple character
in this case) and the element ')' the closing expression. It would also be
possible to write this in the following way:

  +-[PHP]---------------------------------------------------------------
  | 'string.s' => array(
  |   '(' => "'([^'\n]|'')*'",
  |   'o' => '#string'
  | ),
  +---------------------------------------------------------------------

This is even more correct, because a string with embedded (doubled) single
quotes, such as 'it''s', is not tagged as two separate strings. Note,
however, that the latter definition is slightly slower because of its more
complex regular expression.
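
To illustrate the difference, here is a sketch that applies both
definitions to a Pascal statement containing a doubled quote. The helpers
tag_open_close() and tag_single() are hypothetical; they only mimic the
'(' / ')' semantics described above and assume the patterns contain no '/'.

```php
<?php
// Hypothetical helpers contrasting the two 'string.s' definitions.

// Variant 1: separate opening and closing patterns, as with '(' and ')'.
function tag_open_close(string $src, string $open, string $close): array
{
    $spans = array();
    $pos = 0;
    while (preg_match('/' . $open . '/', $src, $m, PREG_OFFSET_CAPTURE, $pos)) {
        $start = $m[0][1];
        $after = $start + strlen($m[0][0]);
        // Look for the closing pattern only after the opening match.
        if (!preg_match('/' . $close . '/', $src, $c, PREG_OFFSET_CAPTURE, $after)) {
            break;
        }
        $end = $c[0][1] + strlen($c[0][0]);
        $spans[] = substr($src, $start, $end - $start);
        $pos = $end;
    }
    return $spans;
}

// Variant 2: one pattern that describes the whole string literal.
function tag_single(string $src, string $pattern): array
{
    preg_match_all('/' . $pattern . '/', $src, $m);
    return $m[0];
}

$src = "s := 'it''s';";
print_r(tag_open_close($src, "'", "'"));       // two spans: 'it' and 's'
print_r(tag_single($src, "'([^'\\n]|'')*'"));  // one span: 'it''s'
```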

For languages embedded in other languages, e.g. JavaScript inside HTML,
SynTag supports "context switching", as the rule for JavaScript embedded
in HTML shows:

  +-[PHP]---------------------------------------------------------------
  | 'javascript.1' => array(
  |   '(' => '<script[^>]*(javascript|jscript)[^>]*>',
  |   ')' => '</script>',
  |   'o' => 'html.tag',
  |   'i' => 'javascript',
  |   '<' => SYNTAG_CONTEXT_SWITCH_BEGIN,
  |   '>' => SYNTAG_CONTEXT_SWITCH_END
  | )
  +---------------------------------------------------------------------

The array elements '(' and ')' define the beginning and the end of a
"context block". The elements '<' and '>' give special formatting
instructions; in this case the background color is changed to highlight
the switch of context from HTML to JavaScript. The elements 'o' and 'i'
specify the context (i.e. language) for the "outer" and "inner" match. The
outer matches are the strings matched by the regular expressions
themselves, and the inner match is everything in between these two matches.

Applied to a piece of HTML, this rule does the following:

  +-[HTML]------------------------------------+
  | <div>JavaScript will be added here: <b>   | html
  | <script type="text/javascript">           | outer match -> html.tag
  | document.writeln('This was javascript!'); | inner match -> javascript
  | </script>                                 | outer match -> html.tag
  | </b> - did you see what I mean?</div>     | html
  +-------------------------------------------+
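
The following sketch splits a piece of HTML into the three kinds of
segments from the table above. split_contexts() is a hypothetical helper,
not SynTag's actual code; it assumes case-insensitive patterns without '#',
handles a single rule only, and ignores the '<' and '>' formatting elements.

```php
<?php
// Hypothetical sketch of one context-switch rule (not SynTag's implementation).
// Returns a list of array(context, text) pairs covering the whole input.
function split_contexts(string $src, array $rule, string $base): array
{
    $open  = '#' . $rule['('] . '#i';   // '#' as delimiter: ')' contains '/'
    $close = '#' . $rule[')'] . '#i';
    $out = array();
    $pos = 0;
    while (preg_match($open, $src, $m, PREG_OFFSET_CAPTURE, $pos)) {
        $start = $m[0][1];
        $innerStart = $start + strlen($m[0][0]);
        if (!preg_match($close, $src, $c, PREG_OFFSET_CAPTURE, $innerStart)) {
            break;                      // unterminated block: leave it to $base
        }
        if ($start > $pos) {
            $out[] = array($base, substr($src, $pos, $start - $pos));
        }
        $out[] = array($rule['o'], $m[0][0]);                   // outer match
        $out[] = array($rule['i'],                              // inner match
                       substr($src, $innerStart, $c[0][1] - $innerStart));
        $out[] = array($rule['o'], $c[0][0]);                   // outer match
        $pos = $c[0][1] + strlen($c[0][0]);
    }
    if ($pos < strlen($src)) {
        $out[] = array($base, substr($src, $pos));
    }
    return $out;
}

$rule = array(
    '(' => '<script[^>]*(javascript|jscript)[^>]*>',
    ')' => '</script>',
    'o' => 'html.tag',
    'i' => 'javascript',
);
$html = "<b><script type=\"text/javascript\">document.writeln('x');</script></b>";
foreach (split_contexts($html, $rule, 'html') as $seg) {
    echo str_pad($seg[0], 11) . '| ' . $seg[1] . "\n";
}
```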

TODO
====

Adding more languages ;) They will come when I need them
(or if YOU provide them).

SynTag can be fooled by very simple constructs because, as of version 0.2,
the end of a context is matched by a regular expression alone. For
example, the following code will not be highlighted correctly, because the
end of the assembler block is found simply by searching for 'end':

  +-[Pascal]------------------------------------------------------------
  | asm
  |         mov  eax, @string
  |@string: db   'end'
  | end;
  +---------------------------------------------------------------------

Instead, when switching to another (embedded) language, we should define a
special operation 'x' that tells SynTag to leave the current context and
switch back to the previous one.
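
A minimal sketch of how such an operation could work, assuming contexts
are kept on a stack: each rule either enters a new context or, with 'x',
pops back to the previous one. The rule table and find_asm_end() are
hypothetical and cover only the assembler example above. Because 'end' is
only recognized in the asm context, the 'end' inside the string literal is
skipped.

```php
<?php
// Hypothetical sketch of the proposed 'x' operation (not SynTag code).
// Each context maps to rules of the form [pattern, action]; the action is
// either the name of a context to enter, or 'x' to leave the current one.
function find_asm_end(string $src): ?int
{
    $rules = array(
        'pascal' => array(array('/\basm\b/i', 'asm')),
        'asm'    => array(array("/'/", 'string'), array('/\bend\b/i', 'x')),
        'string' => array(array("/'/", 'x')),
    );
    $stack = array('pascal');
    $pos = 0;
    while (true) {
        $ctx = end($stack);
        // Pick the earliest match among the current context's rules only.
        $best = null;
        foreach ($rules[$ctx] as $rule) {
            if (preg_match($rule[0], $src, $m, PREG_OFFSET_CAPTURE, $pos)
                && ($best === null || $m[0][1] < $best[0])) {
                $best = array($m[0][1], strlen($m[0][0]), $rule[1]);
            }
        }
        if ($best === null) {
            return null;                 // no closing 'end' found
        }
        $pos = $best[0] + $best[1];
        if ($best[2] !== 'x') {
            $stack[] = $best[2];         // enter the embedded context
        } elseif (array_pop($stack) === 'asm') {
            return $best[0];             // offset of the 'end' closing the block
        }
    }
}

$src = "asm\n        mov  eax, @string\n@string: db   'end'\nend;";
// Naive approach: the first 'end' after 'asm' is the one inside the string.
$naive = preg_match('/\bend\b/i', $src, $m, PREG_OFFSET_CAPTURE, 3) ? $m[0][1] : null;
$stacked = find_asm_end($src);
echo "naive: $naive, stack-based: $stacked\n";
```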

This issue will be fixed in one of the next versions.
Flocke's Garage
(C) 2005-2018 Volker Siebert.
The entire content of this website is licensed under a Creative Commons license (unless otherwise stated).