SynTag, Version 0.2b (old)
Note: this is not the latest version!
File: README.txt

"SynTag" - Flocke's Syntax Highlighter

A PHP class to create syntax highlighted HTML from sourcecode.

Version 0.2b - always find the latest version at
http://flocke.vssd.de/prog/code/php/syntag/

Copyright (C) 2005 Volker Siebert <flocke@vssd.de>
All rights reserved.

Permission is hereby granted, free of charge, to any person obtaining a
copy of this software and associated documentation files (the "Software"),
to deal in the Software without restriction, including without limitation
the rights to use, copy, modify, merge, publish, distribute, sublicense,
and/or sell copies of the Software, and to permit persons to whom the
Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in
all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
DEALINGS IN THE SOFTWARE.

---------------------------------------------------------------------------

SynTag is a set of PHP functions for syntax highlighting, i.e., taking a
piece of sourcecode and putting special TAGs around SYNtactical elements.
So "SynTag" is an abbreviation of "SYNtactical TAGger". In the simplest
case this would put all keywords of a language in bold, e.g. "<b>if</b>".

At its current stage (version 0.2b), SynTag has just enough features to
satisfy my needs, i.e., to highlight sourcecode on my homepage. Some things
still need to be refined for special cases in sourcecode that do not appear
in the code I have written.

If you're just interested in highlighting source code, here's what you
have to do:

1. If you have the source of a file and its name, just call

   +-[PHP]---------------------------------------------------------------
   | $filename = "foo.php";
   | $contents = file_get_contents("source/" . $filename);
   |
   | include("syntag/syntag.php");
   | // The language is chosen from the file name.
   | $highlighted = SynTag::Highlight($contents, $filename);
   |
   | echo "<b>" . htmlspecialchars($filename) . "</b>\n";
   | echo '<pre>' . $highlighted . "</pre>\n";
   +---------------------------------------------------------------------

2. If you have just the source and the file type, simply call

   +-[PHP]---------------------------------------------------------------
   | $language = 'pascal';
   | $contents = "begin\n  writeln('foo');\nend.";
   |
   | include("syntag/syntag.php");
   | // The language is given explicitly.
   | $highlighted = SynTag::HighlightLanguage($contents, $language);
   |
   | echo "<pre>" . $highlighted . "</pre>\n";
   +---------------------------------------------------------------------

The return value of SynTag::Highlight and SynTag::HighlightLanguage is
valid HTML with tabs converted to spaces and special characters converted
to HTML entities. If you don't put it in <pre>-tags, you should call
SynTag::FixFormat on the result to replace linefeeds by <br>-tags and
leading spaces by &nbsp;.

SynTag is quite fast, but you should be aware that highlighting is an
expensive operation. Ideally, you should cache the results.

HOW IT WORKS
============

The main work is done by pattern matching through regular expressions. As
an example, here is the definition for a pascal string:

   +-[PHP]---------------------------------------------------------------
   | 'string.s' => array(
   |   '(' => "'",
   |   ')' => "'",
   |   'o' => '#string'
   | ),
   +---------------------------------------------------------------------

The element '(' defines the opening regular expression (a single character
in this case) and the element ')' the closing expression.
It would also be possible to write this in the following way:

   +-[PHP]---------------------------------------------------------------
   | 'string.s' => array(
   |   '(' => "'([^'\n]|'')+'",
   |   'o' => '#string'
   | ),
   +---------------------------------------------------------------------

This is more correct, because a string with embedded (doubled) single
quotes, such as 'it''s', is then tagged as one string rather than two. But
note that the latter definition is a little bit slower because of the more
complex regular expression.

For languages embedded in other languages, e.g. javascript inside of html,
SynTag supports "context switching", as you can see in the block for
javascript embedded in html:

   +-[PHP]---------------------------------------------------------------
   | 'javascript.1' => array(
   |   '(' => '<script[^>]*(javascript|jscript)[^>]*>',
   |   ')' => '</script>',
   |   'o' => 'html.tag',
   |   'i' => 'javascript',
   |   '<' => SYNTAG_CONTEXT_SWITCH_BEGIN,
   |   '>' => SYNTAG_CONTEXT_SWITCH_END
   | )
   +---------------------------------------------------------------------

The array elements '(' and ')' define the beginning and the end of a
"context block". The elements '<' and '>' give special formatting
instructions; in this case the background color is changed to highlight
the switch of the context from html to javascript.

The elements 'o' and 'i' specify the context (i.e. language) for the
"outer" and "inner" match. The outer match consists of the strings matched
by the two regular expressions themselves; the inner match is everything
in between these two matches.
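The split into outer and inner match can be sketched with plain preg_match
calls. This is only an illustration of the idea, not SynTag's actual code:
the function name split_context_block, the '~' pattern delimiter and the
case-insensitive flag are assumptions made for this example.

```php
<?php
// Hypothetical helper, NOT part of SynTag: splits $text at the first
// context block delimited by the regular expressions $open and $close.
function split_context_block($text, $open, $close)
{
    // Opening outer match; PREG_OFFSET_CAPTURE also returns its byte offset.
    if (!preg_match('~' . $open . '~i', $text, $m, PREG_OFFSET_CAPTURE)) {
        return null;
    }
    $openText   = $m[0][0];
    $openStart  = $m[0][1];
    $innerStart = $openStart + strlen($openText);

    // Closing outer match, searched only after the opening match.
    if (!preg_match('~' . $close . '~i', $text, $m, PREG_OFFSET_CAPTURE, $innerStart)) {
        return null;
    }
    $closeText  = $m[0][0];
    $closeStart = $m[0][1];

    return array(
        'before' => (string) substr($text, 0, $openStart),
        'open'   => $openText,                                   // outer match
        'inner'  => (string) substr($text, $innerStart, $closeStart - $innerStart),
        'close'  => $closeText,                                  // outer match
        'after'  => (string) substr($text, $closeStart + strlen($closeText)),
    );
}

$html  = '<b><script type="text/javascript">document.writeln(\'hi\');</script></b>';
$parts = split_context_block(
    $html,
    '<script[^>]*(javascript|jscript)[^>]*>',
    '</script>'
);
```

In terms of the 'javascript.1' definition, 'open' and 'close' would be
highlighted in the 'o' context (html.tag) and 'inner' in the 'i' context
(javascript), while 'before' and 'after' stay in the surrounding html
context.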
Applied to a piece of html, the 'javascript.1' rule does the following:

   +-[HTML]------------------------------------+
   | <div>JavaScript will be added here: <b>   | html
   | <script type="text/javascript">           | outer match -> html.tag
   | document.writeln('This was javascript!'); | inner match -> javascript
   | </script>                                 | outer match -> html.tag
   | </b> - did you see what I mean?</div>     | html
   +-------------------------------------------+

TODO
====

Adding more languages ;) They will come when I need them (or if YOU
provide them).

SynTag can be fooled by very simple constructs, because as of version 0.2
the end of a context is found by a regular expression alone. For example,
the following code will probably not be highlighted correctly, because the
end of the assembler block is found just by looking for 'end':

   +-[Pascal]------------------------------------------------------------
   | asm
   |   mov eax, @string
   | @string: db 'end'
   | end;
   +---------------------------------------------------------------------

Instead, when switching to another, embedded language, we should define a
special operation 'x' that tells SynTag to leave the current context and
switch back to the previous one. This issue will be fixed in one of the
next versions.
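
The problem and one possible way around it can be sketched in a few lines
of PHP. This is not the planned 'x' operation and not SynTag code; the
function name find_context_end is hypothetical, and the string pattern is a
simplified assumption. The idea is that the inner context consumes quoted
strings as whole tokens, so an 'end' inside a string is never tested as the
closing match.

```php
<?php
// The Pascal fragment from above; the asm block really ends at the last "end".
$source = "asm\n  mov eax, @string\n@string: db 'end'\nend;";

// Naive approach: the context ends at the first word "end" after "asm".
preg_match('~\bend\b~', $source, $m, PREG_OFFSET_CAPTURE, 3);
$naiveEnd = $m[0][1];   // points into the string literal 'end'

// Hypothetical helper: returns the offset of the first "end" that is not
// inside a quoted string, or -1 if there is none.
function find_context_end($source, $offset)
{
    // Alternation order matters: a quoted string is matched as one token,
    // so its contents are skipped in a single step.
    $pattern = "~'[^'\\n]*'|\\bend\\b~";
    while (preg_match($pattern, $source, $m, PREG_OFFSET_CAPTURE, $offset)) {
        if ($m[0][0] === 'end') {
            return $m[0][1];
        }
        // A string literal was matched: continue searching behind it.
        $offset = $m[0][1] + strlen($m[0][0]);
    }
    return -1;
}

$realEnd = find_context_end($source, 3);

echo substr($source, $naiveEnd - 1, 5), "\n";  // text around the naive match
echo substr($source, $realEnd), "\n";          // text from the real end
```

The comparison is case-sensitive to keep the sketch short; since Pascal
keywords are case-insensitive, real code would have to take that into
account.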