11. Parsers
Module Parser.XML
- Method node_to_struct
mapping(string:string|array|mapping) node_to_struct(.NSTree.NSNode|.Tree.Node rootnode)
- Description
XML parsing made easy.
- Returns
A hierarchical structure of nested mappings and arrays representing the XML structure starting at rootnode using a minimal depth.
"" : string
    The text content of the node.
"/" : mapping
    The arguments on this node.
"..." : string
    The text content of a simple subnode.
"..." : array
    A list of subnodes.
"..." : mapping
    A complex subnode (recurse).
- Example
Parser.XML.node_to_struct(Parser.XML.NSTree.parse_input("<foo>bar</foo>"));
Class Parser.XML.Simple
- Method compat_allow_errors
void compat_allow_errors(string version)
- Description
Set whether the parser should allow certain errors for compatibility with earlier versions. version can be:
"7.2"
    Allow more data after the root element.
"7.6"
    Allow multiple and invalidly placed "<?xml ... ?>" and "<!DOCTYPE ... >" declarations (invalid "<?xml ... ?>" declarations are otherwise treated as normal PIs). Allow "<![CDATA[ ... ]]>" outside the root element. Allow the root element to be absent.
version can also be zero to enable all error checks.
- Method define_entity
void define_entity(string entity, string s, function(:void) cb, mixed ... extras)
- Description
Define an entity or an SMEG.
- Parameter entity
Entity name, or SMEG name (if preceded by a "%").
- Parameter s
Expansion of the entity. Entity evaluation will be performed.
- See also
define_entity_raw()
- Method define_entity_raw
void define_entity_raw(string entity, string raw)
- Description
Define an entity or an SMEG.
- Parameter entity
Entity name, or SMEG name (if preceded by a "%").
- Parameter raw
Verbatim expansion of the entity.
- See also
define_entity()
- Method lookup_entity
string lookup_entity(string entity)
- Returns
Returns the verbatim expansion of the entity.
- Method parse
array parse(string xml, string context, function(:void) cb, mixed ... extra_args)
array parse(string xml, function(:void) cb, mixed ... extra_args)
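A minimal sketch of driving parse() with a callback. This section does not document the callback arguments, so the sketch assumes the callback receives the construct type, name, attributes and contents, followed by further arguments, in line with the callback type listed for Parser.XML.Validating.parse(). The names seen and record are illustrative only, and the arguments are typed as mixed because the exact types per construct are not stated here.

```pike
// Collected "type name" pairs, in the order the parser reports them.
array(string) seen = ({});

// Illustrative callback (hypothetical name). The argument layout is an
// assumption; everything is typed loosely on purpose.
mixed record(mixed type, mixed name, mixed attrs, mixed contents,
             mixed ... rest)
{
  seen += ({ sprintf("%O %O", type, name) });
  return 0;
}

int main()
{
  // Two-argument form of parse(): XML data and callback.
  Parser.XML.Simple()->parse("<foo a='1'>bar</foo>", record);

  // Print one reported construct per line.
  write("%{%s\n%}", seen);
  return 0;
}
```

Printing the reported types is a quick way to find out which constructs the callback is invoked for before writing real handling code.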
- Method parse_dtd
mixed parse_dtd(string dtd, string context, function(:void) cb, mixed ... extras)
mixed parse_dtd(string dtd, function(:void) cb, mixed ... extras)
Class Parser.XML.Simple.Context
- Method create
Parser.XML.Simple.Context Parser.XML.Simple.Context(string s, string context, int flags, function(:void) cb, mixed ... extra_args)
Parser.XML.Simple.Context Parser.XML.Simple.Context(string s, int flags, function(:void) cb, mixed ... extra_args)
- Parameter s
- Parameter context
These two arguments are passed along to push_string().
- Parameter flags
Parser flags.
- Parameter cb
Callback function. This function gets called at various stages during the parsing.
- Method push_string
void push_string(string s)
void push_string(string s, string context)
- Description
Add a string to parse at the current position.
- Parameter s
String to insert at the current parsing position.
- Parameter context
Optional context used to refer to the inserted string. This is typically a URL, but may also be an entity (preceded by an "&") or an SMEG reference (preceded by a "%"). It is not used by the XML parser as such, but is simply passed into the callbackinfo mapping as the field "context", where it can be useful for e.g. resolving relative URLs when parsing DTDs, or for determining where errors occur.
- Method create
- Method compat_allow_errors
Class Parser.XML.Validating
- Description
Validating XML parser.
Validates an XML file according to a DTD.
- Method get_external_entity
string|zero get_external_entity(string sysid, string|void pubid, mapping|void info, mixed ... extra)
- Description
Get an external entity.
Called when a <!DOCTYPE> with a SYSTEM identifier is encountered, or when an entity reference needs expanding.
- Parameter sysid
The SYSTEM identifier.
- Parameter pubid
The PUBLIC identifier (if any).
- Parameter info
The callbackinfo mapping containing the current parser state.
- Parameter extra
The extra arguments as passed to parse() or parse_dtd().
- Returns
Returns a string with a DTD fragment on success. Returns 0 (zero) on failure.
- Note
Returning zero will cause the validator to report an error.
- Note
In Pike 7.7 and earlier info had the value 0 (zero).
- Note
The default implementation always returns 0 (zero). Override this function to provide other behaviour.
- See also
parse(), parse_dtd()
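A minimal sketch of overriding get_external_entity() in a subclass, assuming DTD fragments are kept in memory. The class name MyValidator, the variable my_dtd_store and the identifier "foo.dtd" are hypothetical. The signature is copied from the listing above, so the zero type requires a Pike version that accepts it.

```pike
class MyValidator
{
  inherit Parser.XML.Validating;

  // Hypothetical in-memory DTD store, keyed by SYSTEM identifier.
  mapping(string:string) my_dtd_store = ([
    "foo.dtd": "<!ELEMENT foo (#PCDATA)>"
  ]);

  // Return the DTD fragment for sysid, or 0 (zero) to signal failure.
  string|zero get_external_entity(string sysid, string|void pubid,
                                  mapping|void info, mixed ... extra)
  {
    return my_dtd_store[sysid];
  }
}

int main()
{
  MyValidator v = MyValidator();

  // Call the override directly to show both outcomes.
  write("%O\n", v->get_external_entity("foo.dtd"));
  write("%O\n", v->get_external_entity("missing.dtd"));
  return 0;
}
```

The override is called directly here only to show the two return cases. In normal use the validator is expected to call it when it meets a SYSTEM identifier.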
- Method parse
array parse(string data, string|function(string, string, mapping, array|string, mapping(string:mixed), __unknown__ ... : mixed) callback, mixed ... extra)
- FIXME
Document this function
- Method parse_dtd
array parse_dtd(string data, string|function(string, string, mapping, array|string, mapping(string:mixed), __unknown__ ... : mixed) callback, mixed ... extra)
- FIXME
Document this function
- Method validate
private mixed validate(string kind, string name, mapping attributes, array|string contents, mapping(string:mixed) info, function(string, string|zero, mapping|zero, array|string, mapping(string:mixed), __unknown__ ... : mixed) callback, array(mixed) extra)
- Description
The validation callback function.
- See also
::parse()
Class Parser.XML.Validating.Element
- Description
XML Element node.
Module Parser.XML.NSTree
- Description
A namespace aware version of Parser.XML.Tree. This implementation does as little validation as possible, so e.g. you can call your namespace xmlfoo without complaints.
- Method parse_input
NSNode parse_input(string data, void|string default_ns)
- Description
Takes an XML string data and produces a namespace node tree. If default_ns is given, it will be used as the default namespace.
- Throws
Throws an error when an error is encountered during XML parsing.
- Method visualize
string visualize(Node n, void|string indent)
- Description
Makes a visualization of a node graph suitable for printing out on a terminal.
- Example
> object x = parse_input("<a><b><c/>d</b><b><e/><f>g</f></b></a>");
> write(visualize(x));
Node(ROOT)
  NSNode(ELEMENT,"a")
    NSNode(ELEMENT,"b")
      NSNode(ELEMENT,"c")
      NSNode(TEXT)
    NSNode(ELEMENT,"b")
      NSNode(ELEMENT,"e")
      NSNode(ELEMENT,"f")
        NSNode(TEXT)
Result 1: 201
Class Parser.XML.NSTree.NSNode
- Description
Namespace aware node.
- Method add_namespace
void add_namespace(string ns, void|string symbol, void|bool chain)
- Description
Adds a new namespace to this node. The preferred symbol to use to identify the namespace can be provided in the symbol argument. If chain is set, no attempts to overwrite an already defined namespace with the same identifier will be made.
- Method change_namespace
void change_namespace(string from, string to)
- Description
Changes all elements and attributes in the subtree that are in the namespace given by from to the namespace given by to. In case an attribute is defined in both namespaces it will be overwritten.
- Method child_namespaces
mapping child_namespaces(mapping(Node:mapping(string:string)) intermediate)
- Description
Return the defined namespaces from the tree.
- Parameter intermediate
If namespaces are clobbered, the nodes that need additional xmlns attributes are added to this mapping.
- Method diff_namespaces
mapping(string:string) diff_namespaces()
- Description
Returns the difference between the namespaces of this node and those of its parent.
- Method get_default_ns
string get_default_ns()
- Description
Returns the default namespace in the current scope.
- Method get_defined_nss
mapping(string:string) get_defined_nss()
- Description
Returns a mapping with all the namespaces defined in the current scope, except the default namespace.
- Note
The returned mapping is the same as the one in the node, so destructive changes will affect the node.
- Method get_ns
string get_ns()
- Description
Returns the namespace in which the current element is defined.
- Method get_ns_attributes
mapping(string:mapping(string:string)) get_ns_attributes()
- Description
Returns all the attributes in all namespaces that are associated with this node.
- Note
The returned mapping is the same as the one in the node, so destructive changes will affect the node.
- Method get_ns_attributes
mapping(string:string) get_ns_attributes(string namespace)
- Description
Returns the attributes in this node that are declared in the provided namespace.
- Method get_ns_short
string get_ns_short(string ns)
- Description
Returns the short name for the given namespace in this context. Returns the empty string if the namespace is the default namespace. Returns 0 if the namespace is unknown.
- Method get_short_attributes
mapping(string:string) get_short_attributes()
- Description
Returns the attributes for the element, with the names given their short name prefixes.
- Method get_xml_name
string get_xml_name()
- Description
Returns the element name as it occurs in XML files, e.g. "zonk:name" for the element "name" defined in a namespace denoted with "zonk". It will look up a symbol for the namespace in the symbol tables for the node and its parents. If none is found, a new label will be generated by hashing the namespace.
- Method remove_child
void remove_child(NSNode child)
- Description
remove_child is not updated to take care of namespace issues. To properly remove all the parent's namespaces from the child, call remove_node in the child.
- Method rename_namespace
void rename_namespace(string from, string to)
- Description
Renames the namespace prefix of a namespace. No checks will be made to see if the namespace represented is the same throughout the subtree.
Module Parser.XML.SloppyDOM
- Description
A somewhat DOM-like library that implements lazy generation of the node tree, i.e. it's generated from the data upon lookup. There's also a little bit of XPath evaluation to do queries on the node tree.
Implementation note: This is generally more pragmatic than Parser.XML.DOM, meaning it's not so pretty and compliant, but more efficient.
Implementation status: There's only enough implemented to parse a node tree from source and access it, i.e. modification functions aren't implemented. Data hiding stuff like NodeList and NamedNodeMap is not implemented, partly since it's cumbersome to meet the "live" requirement. Also, Parser.HTML is used in XML mode to parse the input. Thus it's too error tolerant to be XML compliant, and it currently doesn't handle DTD elements, like "<!DOCTYPE", or the XML declaration (i.e. "<?xml version='1.0'?>").
- Method parse
Document parse(string source, void|int raw_values)
- Description
Normally entities are decoded, and Node.xml_format will encode them again. If raw_values is nonzero then all text and attribute values are instead kept in their original form.
Class Parser.XML.SloppyDOM.Document
- Note
The node tree is very likely a cyclic structure, so it might be a good idea to destruct it when you're finished with it, to avoid garbage. Destructing the Document object always destroys all nodes in it.
- Method get_elements
array(Element) get_elements(string name)
- Description
Note that this one looks among the top level elements, as opposed to get_elements_by_tag_name. This means that if the document is correct, you can only look up the single top level element here.
- Note
Not DOM compliant.
Class Parser.XML.SloppyDOM.Node
- Description
Basic node.
- Method get_text_content
string get_text_content()
- Description
If the raw_values flag is set in the owning document, the text is returned with entities and CDATA blocks intact.
- See also
parse
- Method simple_path
mapping(string:string)|Node|array(mapping(string:string)|Node)|string|zero simple_path(string path, void|int xml_format)
- Description
Access a node or a set of nodes through an expression that is a subset of an XPath RelativeLocationPath in abbreviated form.
That means one or more Steps separated by "/" or "//". A Step consists of an AxisSpecifier followed by a NodeTest and then optionally by one or more Predicates.
"/" before a Step causes it to be matched only against the immediate children of the node(s) selected by the previous Step. "//" before a Step causes it to be matched against any children in the tree below the node(s) selected by the previous Step. The initial selection before the first Step is this element.
The currently allowed AxisSpecifier NodeTest combinations are:
name to select all elements with the given name. The name can be "*" to select all.
@name to select all attributes with the given name. The name can be "*" to select all.
comment() to select all comments.
text() to select all text and CDATA blocks. Note that all entity references are also selected, under the assumption that they would expand to text only.
processing-instruction("name") to select all processing instructions with the given name. The name can be left out to select all. Either ' or " may be used to delimit the name. For compatibility, it can also occur without surrounding quotes.
node() to select all nodes, i.e. the whole content of an element node.
. to select the currently selected element itself.
A Predicate is of the form [PredicateExpr], where PredicateExpr currently can be in any of the following forms:
An integer indexes one item in the selected set, according to the document order. A negative index counts from the end of the set.
A RelativeLocationPath as specified above. It's executed for each element in the selected set and those where it yields an empty result are filtered out while the rest remain in the set.
A RelativeLocationPath as specified above followed by ="value". The path is executed for each element in the selected set and those where the text result of it is equal to the given value remain in the set. Either ' or " may be used to delimit the value.
If xml_format is nonzero, the return value is an XML-formatted string of all the matched nodes, in document order. Otherwise the return value is as follows:
Attributes are returned as one or more index/value pairs in a mapping. Other nodes are returned as the node objects. If the expression is of a form that can give at most one answer (i.e. there's a predicate with an integer index) then a single mapping or node is returned, or zero if there was no match. If the expression can give more answers then the return value is an array containing zero or more attribute mappings and/or nodes. The array follows document order.
- Note
Not DOM compliant.
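A minimal sketch of querying a parsed document with simple_path(), assuming the method can be called on the Document object returned by Parser.XML.SloppyDOM.parse() (the section lists it under Node). The sample XML and the paths are illustrative only, and results are printed with %O rather than assumed.

```pike
int main()
{
  object doc = Parser.XML.SloppyDOM.parse(
    "<a><b id='1'>x</b><b id='2'>y</b></a>");

  // Nonzero second argument: ask for an XML-formatted string of the
  // matched nodes.
  write("%O\n", doc->simple_path("a/b", 1));

  // Attribute step: select the id attributes of the b elements.
  write("%O\n", doc->simple_path("a/b/@id"));

  // The node tree is likely cyclic, so destruct it when done.
  destruct(doc);
  return 0;
}
```

Destructing the document at the end follows the note on Parser.XML.SloppyDOM.Document about avoiding garbage from cyclic structures.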
Class Parser.XML.SloppyDOM.NodeWithChildElements
- Description
Node with child elements.
- Method get_descendant_elements
array(Element) get_descendant_elements()
- Description
Returns all descendant elements in document order.
- Note
Not DOM compliant.
- Method get_descendant_nodes
array(Node) get_descendant_nodes()
- Description
Returns all descendant nodes (except attribute nodes) in document order.
- Note
Not DOM compliant.
Module Parser.XML.Tree
- Description
XML parser that generates node-trees.
Has some support for XML namespaces; see http://www.w3.org/TR/REC-xml-names/ and RFC 2518 section 23.4.
- Note
This module defines two sets of node trees; the SimpleNode-based, and the Node-based. The main difference between the two is that the Node-based trees have parent pointers, which tend to generate circular data references and thus garbage.
There are some more subtle differences between the two. Please read the documentation carefully.
- Constant XML_ATTR
constant int Parser.XML.Tree.XML_ATTR
- Description
Attribute nodes are created on demand.
- Method attribute_quote
string attribute_quote(string data, void|string ignore)
- Description
Quotes the string given in data by escaping &, <, >, ' and ".
- Method parse_file
Node parse_file(string path, bool|void parse_namespaces)
- Description
Loads the XML file path, creates a node tree representation and returns the root node.
- Method parse_input
RootNode parse_input(string data, void|bool no_fallback, void|bool force_lowercase, void|mapping(string:string) predefined_entities, void|bool parse_namespaces, ParseFlags|void flags)
- Description
Takes an XML string and produces a node tree.
- Note
flags is not used for PARSE_WANT_ERROR_CONTEXT, PARSE_FORCE_LOWERCASE or PARSE_ENABLE_NAMESPACES since they are covered by the separate flag arguments.
- Method roxen_attribute_quote
string roxen_attribute_quote(string data, void|string ignore)
- Description
Quotes strings just like attribute_quote, but entities in the form &foo.bar; will not be quoted.
- Method roxen_text_quote
string roxen_text_quote(string data)
- Description
Quotes strings just like text_quote, but entities in the form &foo.bar; will not be quoted.
- Method simple_parse_file
SimpleRootNode simple_parse_file(string path, void|mapping predefined_entities, ParseFlags|void flags, string|void default_namespace)
- Description
Loads the XML file path, creates a SimpleNode tree representation and returns the root node.
- Method simple_parse_input
SimpleRootNode simple_parse_input(string data, void|mapping predefined_entities, ParseFlags|void flags, string|void default_namespace)
- Description
Takes an XML string and produces a SimpleNode tree.
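A minimal sketch of parsing a string with simple_parse_input() and reading the resulting tree. It assumes the node methods get_first_element(), get_elements(), get_attributes() and value_of_node() are available on the returned nodes; value_of_node() is not described in this section. The sample XML is illustrative, and variables are typed as object to avoid assumptions about the exact node classes.

```pike
int main()
{
  // Parse into a tree without parent pointers.
  object root = Parser.XML.Tree.simple_parse_input(
    "<list><item id='1'>one</item><item id='2'>two</item></list>");

  // The root node is not an element; fetch the top level element.
  object list = root->get_first_element("list");

  // Print the id attribute and the text content of every item child.
  foreach (list->get_elements("item"), object item)
    write("%s: %s\n", item->get_attributes()->id, item->value_of_node());

  return 0;
}
```

The SimpleNode variant is used because it has no parent pointers, so the tree can be reclaimed by plain reference counting.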
- Method text_quote
string text_quote(string data)
- Description
Quotes the string given in data by escaping &, < and >.
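A short sketch comparing text_quote() and attribute_quote() on the same input. The sample string is illustrative. Per the descriptions, both calls escape &, < and >, while only attribute_quote() also escapes the quote characters.

```pike
int main()
{
  string raw = "a < b & \"c\"";

  // Escapes &, < and > only; suitable for text content.
  write("%s\n", Parser.XML.Tree.text_quote(raw));

  // Additionally escapes ' and "; suitable for attribute values.
  write("%s\n", Parser.XML.Tree.attribute_quote(raw));

  return 0;
}
```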
Enum Parser.XML.Tree.ParseFlags
- Description
Flags used together with simple_parse_input() and simple_parse_file().
Class Parser.XML.Tree.AbstractNode
- Annotations
@Pike.Annotations.Implements(AbstractSimpleNode)
- Description
Base class for nodes with parent pointers.
- Method add_child
AbstractNode add_child(AbstractNode c)
- Description
Adds the node c to the list of children of this node. The node is added last in the list.
- Note
Returns the new child node, NOT the current node.
- Returns
The new child node is returned.
- Method add_child_after
AbstractNode add_child_after(AbstractNode c, AbstractNode old)
- Description
Adds the node c to the list of children of this node. The node is added after the node old, which is assumed to be an existing child of this node. The node is added first if old is zero.
- Returns
The current node.
- Method add_child_before
AbstractNode add_child_before(AbstractNode c, AbstractNode old)
- Description
Adds the node c to the list of children of this node. The node is added before the node old, which is assumed to be an existing child of this node. The node is added last if old is zero.
- Returns
The current node.
- Method clone
AbstractNode clone(void|int(-1..1) direction)
- Description
Clones the node, optionally connected to parts of the tree. If direction is -1 the cloned node's parent will be set; if direction is 1 the cloned node's children will be set.
- Method fix_tree
void fix_tree()
- Description
Fix all parent pointers recursively in a tree that has been built with tmp_add_child.
- Method get_ancestors
array(AbstractNode) get_ancestors(bool include_self)
- Description
Returns a list of all ancestors, with the top node last. The list will start with this node if include_self is set.
- Method get_following
array(AbstractNode) get_following()
- Description
Returns all the nodes that follow after the current one.
- Method get_following_siblings
array(AbstractNode) get_following_siblings()
- Description
Returns all following siblings, i.e. all siblings present after this node in the parent's children list.
- Method get_preceding
array(AbstractNode) get_preceding()
- Description
Returns all preceding nodes, excluding this node's ancestors.
- Method get_preceding_siblings
array(AbstractNode) get_preceding_siblings()
- Description
Returns all preceding siblings, i.e. all siblings present before this node in the parent's children list.
- Method get_root
AbstractNode get_root()
- Description
Follows all parent pointers and returns the root node.
- Method get_siblings
array(AbstractNode) get_siblings()
- Description
Returns all siblings, including this node.
- Method low_clone
optional AbstractNode low_clone()
- Description
Returns an initialized copy of the node.
- Note
The returned node has no children, and no parent.
- Method remove_child
void remove_child(AbstractNode c)
- Description
Removes all occurrences of the provided node from the called node's list of children. The removed node's parent reference is set to null.
- Method remove_node
void remove_node()
- Description
Removes this node from its parent. The parent reference is set to null.
- Method replace_child
AbstractNode|zero replace_child(AbstractNode old, AbstractNode|array(AbstractNode) new)
- Description
Replaces the first occurrence of the old node child with the new node child or children. All parent references are updated.
- Note
The returned value is NOT the current node.
- Returns
Returns the new child node.
- Method replace_children
void replace_children(array(AbstractNode) children)
- Description
Replaces the node's children with the provided ones. All parent references are updated.
- Method replace_node
AbstractNode|array(AbstractNode) replace_node(AbstractNode|array(AbstractNode) new)
- Description
Replaces this node with the provided one.
- Returns
Returns the new node.
- Method tmp_add_child
- Method tmp_add_child_before
- Method tmp_add_child_after
AbstractNode tmp_add_child(AbstractNode c)
AbstractNode tmp_add_child_before(AbstractNode c, AbstractNode old)
AbstractNode tmp_add_child_after(AbstractNode c, AbstractNode old)
- Description
Variants of add_child, add_child_before and add_child_after that don't set the parent pointer in the newly added children.
This is useful while building a node tree, to get efficient refcount garbage collection if the build stops abruptly. fix_tree has to be called on the root node when the building is done.
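A minimal sketch of building a small Node-based tree with tmp_add_child() and then calling fix_tree(). The constructor calls ElementNode(name, attributes) and TextNode(text) are assumptions, since this section lists those classes without their signatures. The variable names are illustrative.

```pike
int main()
{
  object root = Parser.XML.Tree.RootNode();
  // Assumed constructors: element name plus attribute mapping, and
  // plain text.
  object elem = Parser.XML.Tree.ElementNode("a", ([]));
  object text = Parser.XML.Tree.TextNode("x");

  // Build the tree without setting parent pointers.
  elem->tmp_add_child(text);
  root->tmp_add_child(elem);

  // Set all parent pointers in one pass once the tree is complete.
  root->fix_tree();

  // With the parent pointers in place, get_root() can walk upwards.
  write("%O\n", text->get_root() == root);
  return 0;
}
```

Deferring the parent pointers keeps the partially built tree free of reference cycles, which is the benefit the description refers to.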
Class Parser.XML.Tree.AbstractSimpleNode
- Description
Base class for nodes.
- Method `[]
AbstractSimpleNode|zero res = Parser.XML.Tree.AbstractSimpleNode()[pos]
- Description
The [] operator indexes among the node children, so node[0] returns the first node and node[-1] the last.
- Note
The [] operator will select a node from all the node's children, not just its element children.
- Method add_child
AbstractSimpleNode add_child(AbstractSimpleNode c)
- Description
Adds the given node to the list of children of this node. The new node is added last in the list.
- Note
The return value differs from the one returned by Node()->add_child().
- Returns
The current node.
- Method add_child_after
AbstractSimpleNode add_child_after(AbstractSimpleNode c, AbstractSimpleNode old)
- Description
Adds the node c to the list of children of this node. The node is added after the node old, which is assumed to be an existing child of this node. The node is added first if old is zero.
- Returns
The current node.
- Method add_child_before
AbstractSimpleNode add_child_before(AbstractSimpleNode c, AbstractSimpleNode old)
- Description
Adds the node c to the list of children of this node. The node is added before the node old, which is assumed to be an existing child of this node. The node is added last if old is zero.
- Returns
The current node.
- Method clone
optional AbstractSimpleNode clone()
- Description
Returns a clone of the sub-tree rooted in the node.
- Method get_children
array(AbstractSimpleNode) get_children()
- Description
Returns all the node's children.
- Method get_descendants
array(AbstractSimpleNode) get_descendants(bool include_self)
- Description
Returns a list of all descendants in document order. Includes this node if include_self is set.
- Method get_last_child
AbstractSimpleNode|zero get_last_child()
- Description
Returns the last child node or zero.
- Method iterate_children
int iterate_children(function(AbstractSimpleNode, mixed ... : int|void) callback, mixed ... args)
- Description
Iterates over the node's children from left to right, calling the function callback for every node. If the callback function returns STOP_WALK the iteration is promptly aborted and STOP_WALK is returned.
- Method low_clone
optional AbstractSimpleNode low_clone()
- Description
Returns an initialized copy of the node.
- Note
The returned node has no children.
- Method node_factory
optional this_program node_factory(int type, string name, mapping attr, string text)
- Description
Optional factory for creating contained nodes.
- Parameter type
Type of node to create. One of:
XML_TEXT
    XML text. text contains a string with the text.
XML_COMMENT
    XML comment. text contains a string with the comment text.
XML_HEADER
    <?xml?>-header. attr contains a mapping with the attributes.
XML_PI
    XML processing instruction. name contains the name of the processing instruction and text the remainder.
XML_ELEMENT
    XML element tag. name contains the name of the tag and attr the attributes.
XML_DOCTYPE
    DTD information.
DTD_ENTITY
DTD_ELEMENT
DTD_ATTLIST
DTD_NOTATION
- Parameter name
Name of the tag if applicable.
- Parameter attr
Attributes for the tag if applicable.
- Parameter text
Contained text of the tag if any.
This function is called during parsing to create the various XML nodes.
Define this function to provide application-specific XML nodes.
- Returns
Returns one of
AbstractSimpleNode
    A node object representing the XML tag.
int(0)
    0 (zero) if the subtree rooted here should be cut.
zero
    UNDEFINED to fall back to the next level of parser (i.e. behave as if this function does not exist).
- Note
This function is only relevant for XML_ELEMENT nodes.
- Note
This function is not available in Pike 7.6 and earlier.
- Note
In Pike 8.0 and earlier this function was only called in root nodes.
- Method remove_child
void remove_child(AbstractSimpleNode c)
- Description
Removes all occurrences of the provided node from the list of children of this node.
- Method replace_child
AbstractSimpleNode|zero replace_child(AbstractSimpleNode old, AbstractSimpleNode|array(AbstractSimpleNode) new)
- Description
Replaces the first occurrence of the old node child with the new node child or children.
- Note
The return value differs from the one returned by Node()->replace_child().
- Returns
Returns the current node on success, and 0 (zero) if the node old wasn't found.
- Method replace_children
void replace_children(array(AbstractSimpleNode) children)
- Description
Replaces the node's children with the provided ones.
- Method walk_inorder
int walk_inorder(function(AbstractSimpleNode, mixed ... : int|void) callback, mixed ... args)
- Description
Traverse the node subtree in inorder, left subtree first, then root node, and finally the remaining subtrees, calling the function callback for every node. If the function callback returns STOP_WALK the traversal is promptly aborted and STOP_WALK is returned.
- Method walk_postorder
int walk_postorder(function(AbstractSimpleNode, mixed ... : int|void) callback, mixed ... args)
- Description
Traverse the node subtree in postorder, first subtrees from left to right, then the root node, calling the function callback for every node. If the function callback returns STOP_WALK the traversal is promptly aborted and STOP_WALK is returned.
- Method walk_preorder
int walk_preorder(function(AbstractSimpleNode, mixed ... : int|void) callback, mixed ... args)
- Description
Traverse the node subtree in preorder, root node first, then subtrees from left to right, calling the callback function for every node. If the callback function returns STOP_WALK the traversal is promptly aborted and STOP_WALK is returned.
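A minimal sketch of a preorder walk that collects element names. It assumes the node methods get_node_type() and get_tag_name(), which are not described in this section. The names names and collect, and the sample XML, are illustrative.

```pike
// Element names in the order the walk visits them.
array(string) names = ({});

// Illustrative callback: record element nodes, ignore everything else.
// Returning nothing lets the walk continue.
void collect(object node)
{
  if (node->get_node_type() == Parser.XML.Tree.XML_ELEMENT)
    names += ({ node->get_tag_name() });
}

int main()
{
  object root = Parser.XML.Tree.simple_parse_input("<a><b/><c><d/></c></a>");

  // Visit the root first, then each subtree from left to right.
  root->walk_preorder(collect);

  write("%s\n", names * " ");
  return 0;
}
```

A callback that returns Parser.XML.Tree.STOP_WALK instead would end the walk early, as described above.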
- Method walk_preorder_2
int walk_preorder_2(function(AbstractSimpleNode, mixed ... : int|void) cb_1, function(AbstractSimpleNode, mixed ... : int|void) cb_2, mixed ... args)
- Description
Traverse the node subtree in preorder, root node first, then subtrees from left to right. For each node we call cb_1 before iterating through children, and then cb_2 (which always gets called even if the walk is aborted earlier). If the callback function returns STOP_WALK the descent is aborted and STOP_WALK is returned once all waiting cb_2 functions have been called.
Class Parser.XML.Tree.AttributeNode
- Annotations
@Pike.Annotations.Implements(Node)
Class Parser.XML.Tree.CommentNode
- Annotations
@Pike.Annotations.Implements(Node)
Class Parser.XML.Tree.DTDAttlistNode
- Annotations
@Pike.Annotations.Implements(Node)
Class Parser.XML.Tree.DTDElementNode
- Annotations
@Pike.Annotations.Implements(Node)
@Pike.Annotations.Implements(DTDElementHelper)
Class Parser.XML.Tree.DTDEntityNode
- Annotations
@Pike.Annotations.Implements(Node)
Class Parser.XML.Tree.DTDNotationNode
- Annotations
@Pike.Annotations.Implements(Node)
Class Parser.XML.Tree.DoctypeNode
- Annotations
@Pike.Annotations.Implements(Node)
Class Parser.XML.Tree.ElementNode
- Annotations
@Pike.Annotations.Implements(Node)
Class Parser.XML.Tree.HeaderNode
- Annotations
@Pike.Annotations.Implements(Node)
Class Parser.XML.Tree.Node
- Annotations
@Pike.Annotations.Implements(AbstractNode)
@Pike.Annotations.Implements(VirtualNode)
- Description
XML node with parent pointers.
- Method get_attribute_nodes
array(Node) get_attribute_nodes()
- Description
Creates and returns an array of new nodes; they will not be added as proper children to the parent node, but the parent link in the nodes is set so that upwards traversal is made possible.
Class Parser.XML.Tree.PINode
- Annotations
@Pike.Annotations.Implements(Node)
Class Parser.XML.Tree.RootNode
- Annotations
@Pike.Annotations.Implements(Node)
- Description
The root node of an XML-tree consisting of Nodes.
- Method create
Parser.XML.Tree.RootNode Parser.XML.Tree.RootNode(string|void data, mapping|void predefined_entities, ParseFlags|void flags)
- Method flush_node_id_cache
void flush_node_id_cache()
- Description
Clears the node id cache built and used by get_element_by_id.
- Method get_element_by_id
ElementNode get_element_by_id(string id, int|void force)
- Description
Find the element with the specified id.
- Parameter id
The XML id of the node to search for.
- Parameter force
Force a regeneration of the id lookup cache. Needed the first time after the node tree has been modified by adding or removing element nodes, or by changing the id attribute of an element node.
- Returns
Returns the element node with the specified id if any. Returns UNDEFINED otherwise.
- See also
flush_node_id_cache
Class Parser.XML.Tree.SimpleCommentNode
- Annotations
@Pike.Annotations.Implements(SimpleNode)
Class Parser.XML.Tree.SimpleDTDAttlistNode
- Annotations
@Pike.Annotations.Implements(SimpleNode)
Class Parser.XML.Tree.SimpleDTDElementNode
- Annotations
@Pike.Annotations.Implements(SimpleNode)
@Pike.Annotations.Implements(DTDElementHelper)
Class Parser.XML.Tree.SimpleDTDEntityNode
- Annotations
@Pike.Annotations.Implements(SimpleNode)
Class Parser.XML.Tree.SimpleDTDNotationNode
- Annotations
@Pike.Annotations.Implements(SimpleNode)
Class Parser.XML.Tree.SimpleDoctypeNode
- Annotations
@Pike.Annotations.Implements(SimpleNode)
Class Parser.XML.Tree.SimpleElementNode
- Annotations
@Pike.Annotations.Implements(SimpleNode)
Class Parser.XML.Tree.SimpleHeaderNode
- Annotations
@Pike.Annotations.Implements(SimpleNode)
Class Parser.XML.Tree.SimpleNode
- Annotations
@Pike.Annotations.Implements(AbstractSimpleNode)
@Pike.Annotations.Implements(VirtualNode)
- Description
XML node without parent pointers and attribute nodes.
Class Parser.XML.Tree.SimplePINode
- Annotations
@Pike.Annotations.Implements(SimpleNode)
Class Parser.XML.Tree.SimpleRootNode
- Annotations
@Pike.Annotations.Implements(SimpleNode)
- Description
The root node of an XML-tree consisting of SimpleNodes.
- Methodcreate
Parser.XML.Tree.SimpleRootNodeParser.XML.Tree.SimpleRootNode(
string
|void
data
,mapping
|void
predefined_entities
,ParseFlags
|void
flags
,string
|void
default_namespace
)
- Methodflush_node_id_cache
void
flush_node_id_cache()- Description
Clears the node id cache built and used by
get_element_by_id
.
- Methodget_element_by_id
SimpleElementNode
get_element_by_id(string
id
,int
|void
force
)- Description
Find the element with the specified id.
- Parameter
id
The XML id of the node to search for.
- Parameter
force
Force a regeneration of the id lookup cache. Needed the first time after the node tree has been modified by adding or removing element nodes, or by changing the id attribute of an element node.
- Returns
Returns the element node with the specified id if any. Returns
UNDEFINED
otherwise.- See also
flush_node_id_cache
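- Example
A minimal sketch of an id lookup followed by a forced cache rebuild. The XML snippet and the helper name tag_for_id are made up for illustration; the tree is built with the SimpleRootNode constructor documented above.

```pike
// Hypothetical helper: returns the tag name of the element with the
// given id, or 0 if no such element exists.
string|zero tag_for_id(Parser.XML.Tree.SimpleRootNode root, string id,
                       int|void force)
{
  Parser.XML.Tree.SimpleElementNode node = root->get_element_by_id(id, force);
  return node && node->get_tag_name();
}

int main()
{
  Parser.XML.Tree.SimpleRootNode root =
    Parser.XML.Tree.SimpleRootNode("<doc><item id='a'/><note id='b'/></doc>");

  write("%O\n", tag_for_id(root, "a"));

  // Changing an id attribute leaves the cached lookup table stale...
  root->get_element_by_id("a")->get_attributes()->id = "c";

  // ...so pass a non-zero force to rebuild it on the next lookup.
  write("%O\n", tag_for_id(root, "c", 1));
  return 0;
}
```

Calling flush_node_id_cache() after the modification should have the same effect as passing force on the next lookup.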
Class Parser.XML.Tree.SimpleTextNode
- Annotations
@
Pike.Annotations.Implements
(SimpleNode
)
Class Parser.XML.Tree.TextNode
- Annotations
@
Pike.Annotations.Implements
(Node
)
Class Parser.XML.Tree.VirtualNode
- Description
Node in XML tree
- Methodcast
(int)Parser.XML.Tree.VirtualNode()
(float)Parser.XML.Tree.VirtualNode()
(string)Parser.XML.Tree.VirtualNode()
(array)Parser.XML.Tree.VirtualNode()
(mapping)Parser.XML.Tree.VirtualNode()
(multiset)Parser.XML.Tree.VirtualNode()- Description
It is possible to cast a node to a string, which will return
render_xml()
for that node.
- Methodcreate
Parser.XML.Tree.VirtualNodeParser.XML.Tree.VirtualNode(
int
type
,string
|zero
name
,mapping
|zero
attr
,string
|zero
text
)
- Methodget_attributes
mapping
(string
:string
) get_attributes()- Description
Returns this node's attributes, which can be altered destructively to alter the node's attributes.
- See also
replace_attributes()
- Methodget_elements
array
(AbstractNode
) get_elements(string
|void
name
,bool
|void
full
)- Description
Returns all element children to this node.
- Parameter
name
If provided, only elements with that name are returned.
- Parameter
full
If specified, name matching will be done against the full name.
- Returns
Returns an array with matching nodes.
- Methodget_first_element
AbstractNode
|zero
get_first_element(string
|void
name
,bool
|void
full
)- Description
Returns the first element child to this node.
- Parameter
name
If provided, the first element child with that name is returned.
- Parameter
full
If specified, name matching will be done against the full name.
- Returns
Returns the first matching node, and 0 if no such node was found.
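- Example
A short sketch of how get_first_element() and get_elements() might be combined. The helper name item_names and the element and attribute names are assumptions chosen for illustration.

```pike
// Hypothetical helper: collects the "name" attribute of every <item>
// element directly below the document element.
array(string) item_names(string xml)
{
  Parser.XML.Tree.SimpleRootNode root = Parser.XML.Tree.SimpleRootNode(xml);

  // Without a name argument the first element child is returned,
  // which for a root node is the document element.
  object doc = root->get_first_element();
  if (!doc) return ({});

  // Only element children named "item" are returned; text nodes and
  // other elements are skipped.
  return map(doc->get_elements("item"),
             lambda(object item) {
               return item->get_attributes()->name;
             });
}

int main()
{
  write("%O\n",
        item_names("<doc><item name='x'/><other/><item name='y'/></doc>"));
  return 0;
}
```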
- Methodget_node_type
int
get_node_type()- Description
Returns the node type. See defined node type constants.
- Methodget_short_attributes
mapping
get_short_attributes()- Description
Returns this node's name-space adjusted attributes.
- Note
set_short_namespaces()
or set_short_attributes()
must have been called before calling this function.
- Methodget_tag_name
string
get_tag_name()- Description
Returns the name of the element node, or the nearest element above if an attribute node.
- Methodrender_to_file
void
render_to_file(Stdio.File
f
,void
|bool
preserve_roxen_entities
)- Description
Creates an XML representation for the node sub tree and streams the output to the file
f
. If the flag preserve_roxen_entities
is set, entities on the form
&foo.bar;
will not be escaped.
- Methodrender_xml
string
render_xml(void
|bool
preserve_roxen_entities
,void
|mapping
(string
:string
)namespace_lookup
,void
|string
encoding
,void
|int(2bit)
quote_mode
)- Description
Creates an XML representation of the node sub tree. If the flag
preserve_roxen_entities
is set, entities on the form
&foo.bar;
will not be escaped.- Parameter
namespace_lookup
Mapping from namespace prefix to namespace symbol prefix.
- Parameter
encoding
Force a specific output character encoding. By default the encoding set in the document XML processing instruction will be used, with UTF-8 as a fallback. Setting this value will change the XML processing instruction, if present.
- Parameter
quote_mode
0
Defaults to single quote, but use double quote if it avoids escaping.
1
Defaults to double quote, but use single quote if it avoids escaping.
2
Use only single quote.
3
Use only double quote.
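- Example
A minimal sketch of rendering a parsed tree back to text with a specific quote_mode. The helper name render_double_quoted is hypothetical, and the zeros simply leave the first three optional arguments unset.

```pike
// Hypothetical helper: parses xml and renders it again with attribute
// values always enclosed in double quotes (quote_mode 3).
string render_double_quoted(string xml)
{
  Parser.XML.Tree.SimpleRootNode root = Parser.XML.Tree.SimpleRootNode(xml);
  // Arguments: preserve_roxen_entities, namespace_lookup, encoding,
  // quote_mode.
  return root->render_xml(0, 0, 0, 3);
}

int main()
{
  write("%s\n", render_double_quoted("<a b='c'>text</a>"));

  // Casting a node to string is documented to return render_xml()
  // for that node.
  write("%s\n", (string)Parser.XML.Tree.SimpleRootNode("<a b='c'>text</a>"));
  return 0;
}
```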
- Methodreplace_attributes
void
replace_attributes(mapping
(string
:string
)attrs
)- Description
Replace the entire set of attributes.
- See also
get_attributes()
- Methodset_short_attributes
void
set_short_attributes(mapping
short_attrs
)- Description
Sets this node's name-space adjusted attributes.
- Methodset_tag_name
void
set_tag_name(string
name
)- Description
Change the tag name destructively. Can only be used on element and processing-instruction nodes.
Class Parser.XML.Tree.XMLNSParser
- Description
Namespace aware parser.
Class Parser.XML.Tree.XMLParser
- Description
Mixin for parsing XML.
Uses
Parser.XML.Simple
to perform the actual parsing.
- Methodnode_factory
protected
AbstractSimpleNode
node_factory(int
type
,string
name
,mapping
attr
,string
text
)- Description
Factory for creating nodes.
- Parameter
type
Type of node to create. One of:
XML_TEXT
XML text.
text
contains a string with the text.
XML_COMMENT
XML comment.
text
contains a string with the comment text.
XML_HEADER
<?xml?>-header.
attr
contains a mapping with the attributes.
XML_PI
XML processing instruction.
name
contains the name of the processing instruction and
text
the remainder.
XML_ELEMENT
XML element tag.
name
contains the name of the tag and
attr
the attributes.
XML_DOCTYPE
DTD information.
DTD_ENTITY
DTD_ELEMENT
DTD_ATTLIST
DTD_NOTATION
- Parameter
name
Name of the tag if applicable.
- Parameter
attr
Attributes for the tag if applicable.
- Parameter
text
Contained text of the tag if any.
This function is called during parsing to create the various XML nodes.
Overload this function to provide application-specific XML nodes.
- Returns
Returns a node object representing the XML tag, or
0
(zero) if the subtree rooted in the tag should be cut.- Note
This function is not available in Pike 7.6 and earlier.
- See also
node_factory_dispatch()
,AbstractSimpleNode()->node_factory()
Class Parser.HTML
- Description
This is a simple parser for SGML structured markups. It's not really HTML, but it's useful for that purpose.
The simple way to use it is to give it some information about available tags and containers, and what callbacks those are to call.
The object is easily reused, by calling the
clone()
function.- See also
add_tag
,add_container
,finish
- Method_inspect
mapping
_inspect()- Description
This is a low-level way of debugging a parser. This gives a mapping of the internal state of the Parser.HTML object.
The format and contents of this mapping may change without further notice.
- Method_set_tag_callback
Method_set_entity_callback
Method_set_data_callback Parser.HTML
_set_tag_callback(function
(:void
)|string
|array
to_call
)Parser.HTML
_set_entity_callback(function
(:void
)|string
|array
to_call
)Parser.HTML
_set_data_callback(function
(:void
)|string
|array
to_call
)- Description
These functions set up the parser object to call the given callbacks upon tags, entities and/or data. The callbacks will only be called if there isn't another tag/container/entity handler for these.
The callback function will be called with the parser object as first argument, and the active string as second. Note that no parsing of the contents has been done. Both endtags and normal tags are called; there is no container parsing.
The return values from the callbacks are handled in the same way as the return values from callbacks registered with
add_tag
and similar functions.
The data callback will be called as seldom as possible with the longest possible string, as long as it doesn't get called out of order with any other callback. It will never be called with a zero length string.
If a string or array is given instead of a function, it will act as the return value from the function. Arrays or empty strings are probably preferable to avoid recursion.
- Returns
Returns the object being called.
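- Example
A minimal sketch of a data callback that transforms all text outside tags. The helper name upcase_text is hypothetical. The callback returns an array so that its result is not reparsed.

```pike
// Hypothetical helper: upper-cases all text data, leaving tags that
// have no registered callbacks untouched.
string upcase_text(string html)
{
  Parser.HTML p = Parser.HTML();
  p->_set_data_callback(lambda(Parser.HTML parser, string data) {
                          // Array result: inserted as-is, never reparsed.
                          return ({ upper_case(data) });
                        });
  return p->finish(html)->read();
}

int main()
{
  write("%s\n", upcase_text("<b>hi</b> there"));
  return 0;
}
```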
- Methodadd_tag
Methodadd_container
Methodadd_entity
Methodadd_quote_tag
Methodadd_tags
Methodadd_containers
Methodadd_entities Parser.HTML
add_tag(string
name
,mixed
to_do
)Parser.HTML
add_container(string
name
,mixed
to_do
)Parser.HTML
add_entity(string
entity
,mixed
to_do
)Parser.HTML
add_quote_tag(string
name
,mixed
to_do
,string
end
)Parser.HTML
add_tags(mapping
(string
:mixed
)tags
)Parser.HTML
add_containers(mapping
(string
:mixed
)containers
)Parser.HTML
add_entities(mapping
(string
:mixed
)entities
)- Description
Registers the actions to take when parsing various things. Tags, containers, entities are as usual.
add_quote_tag()
adds a special kind of tag that reads any data until the next occurrence of the end string immediately before a tag end.- Parameter
to_do
This argument can be any of the following.
function
(:void
)The function will be called as a callback function. It will get the following arguments, depending on the type of callback.
mixed tag_callback(Parser.HTML parser,mapping args,mixed ... extra)
mixed container_callback(Parser.HTML parser,mapping args,string content,mixed ... extra)
mixed entity_callback(Parser.HTML parser,mixed ... extra)
mixed quote_tag_callback(Parser.HTML parser,string content,mixed ... extra)
string
This tag/container/entity is then replaced by the string. The string is normally not reparsed, i.e. it's equivalent to writing a function that returns the string in an array (but a lot faster). If
reparse_strings
is set the string will be reparsed, though.array
The first element is a function as above. It will receive the rest of the array as extra arguments. If extra arguments are given by
set_extra()
, they will appear after the ones in this array.int(0..)
If there is a tag/container/entity with the given name in the parser, it's removed.
The callback function can return:
string
This string will be pushed on the parser stack and be parsed. Be careful not to return anything in this way that could lead to an infinite recursion.
array
The elements of the array are the result of the function. This will not be parsed, which is useful for avoiding infinite recursion. The array can be of any size, so the empty array is the most efficient thing to return if you don't care about the result. If the parser is operating in
mixed_mode
, the array can contain anything. Otherwise only strings are allowed.int(0)
This means "don't do anything", ie the item that generated the callback is left as it is, and the parser continues.
int(1)
Reparse the last item again. This is useful to parse a tag as a container, or vice versa: just add or remove callbacks for the tag and return this to jump to the right callback.
- Returns
Returns the object being called.
- See also
tags
,containers
,entities
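- Example
A short sketch showing three kinds of to_do values together: a plain string, a container callback, and an entity replacement. The helper name render and the chosen tag names are assumptions for illustration only.

```pike
// Hypothetical helper: converts a tiny HTML subset to plain text.
string render(string html)
{
  Parser.HTML p = Parser.HTML();

  // String action: every <br> is replaced by a newline. The string is
  // not reparsed unless reparse_strings is set.
  p->add_tag("br", "\n");

  // Function action: the container callback receives the parser, the
  // tag arguments and the unparsed content between <b> and </b>.
  p->add_container("b", lambda(Parser.HTML parser, mapping args,
                               string content) {
                          // Array result, so it is not parsed again.
                          return ({ "*" + content + "*" });
                        });

  // Entities are registered by name, without the '&' and ';'.
  p->add_entity("amp", "&");

  return p->finish(html)->read();
}

int main()
{
  write("%s\n", render("a<br>&amp;<b>x</b>"));
  return 0;
}
```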
- Methodat
Methodat_line
Methodat_char
Methodat_column array
(int
) at()int
at_line()int
at_char()int
at_column()- Description
Returns the current position. Characters and columns count from
0
, lines count from 1.
at()
gives an array with the following layout.
Array
int
0
Line.
int
1
Character.
int
2
Column.
- Methodcase_insensitive_tag
int
case_insensitive_tag(void
|int
value
)- Description
All tags and containers are matched case insensitively, and argument names are converted to lowercase. Tags added with
add_quote_tag()
are not affected, though. Switching to case insensitive mode and back won't preserve the case of registered tags and containers.
- Methodclear_tags
Methodclear_containers
Methodclear_entities
Methodclear_quote_tags Parser.HTML
clear_tags()Parser.HTML
clear_containers()Parser.HTML
clear_entities()Parser.HTML
clear_quote_tags()- Description
Removes all registered definitions in the different categories.
- Returns
Returns the object being called.
- See also
add_tag
,add_tags
,add_container
,add_containers
,add_entity
,add_entities
- Methodclone
Parser.HTML
clone(mixed
...args
)- Description
Clones the
Parser.HTML
object. A new object of the same class is created, filled with the parse setup from the old object.
This is the simplest way of flushing a parse feed/output.
The arguments to clone are sent to the new object, simplifying work for custom classes that inherit
Parser.HTML
.- Returns
Returns the new object.
- Note
create is called _before_ the setup is copied.
- Methodtags
Methodcontainers
Methodentities mapping
(string
:mixed
) tags()mapping
(string
:mixed
) containers()mapping
(string
:mixed
) entities()- Description
Returns the current callback settings. When matching is done case insensitively, all names will be returned in lowercase.
Implementation note: These run in constant time since they return copy-on-write mappings.
- See also
add_tag
,add_tags
,add_container
,add_containers
,add_entity
,add_entities
- Methodcontext
string
context()- Description
Returns the current output context as a string.
"data"
In top level data. This is always returned when called from tag or container callbacks.
"arg"
In an unquoted argument.
"splice_arg"
In a splice argument.
The return value can also be a single character string, in which case the context is a quoted argument. The string contains the starting quote character.
This function is typically only useful in entity callbacks, which can be called both from text and argument values of different sorts.
- See also
splice_arg
- Methodcurrent
string
current()- Description
Gives the current range of data, ie the whole tag/entity/etc being parsed in the current callback. Returns zero if there's no current range, i.e. when the function is not called in a callback.
- Methodfeed
Parser.HTML
feed()Parser.HTML
feed(string
s
,void
|int
do_parse
)- Description
Feed new data to the
Parser.HTML
object. This will start a scan and may result in callbacks. Note that not all of the fed data is necessarily processed; to ensure that it is, call
finish()
.
If the function is called without arguments, no data is fed, but the parser is run. If the string argument is followed by a
0
, as in
->feed(s,0);
, the string is fed, but the parser isn't run.- Returns
Returns the object being called.
- See also
finish
,read
,feed_insert
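- Example
A minimal sketch of incremental parsing, where input arrives in chunks and a container may span a chunk boundary. The helper name parse_chunks and the <i> callback are assumptions for illustration.

```pike
// Hypothetical helper: feeds the chunks one at a time and collects
// whatever output is available after each step.
string parse_chunks(array(string) chunks)
{
  Parser.HTML p = Parser.HTML();
  p->add_container("i", lambda(Parser.HTML parser, mapping args,
                               string content) {
                          return ({ "_" + content + "_" });
                        });

  string out = "";
  foreach (chunks, string chunk) {
    p->feed(chunk);      // May leave unfinished input in the parser.
    out += p->read();    // Collect the output produced so far.
  }

  // finish() processes whatever input is still pending.
  out += p->finish()->read();
  return out;
}

int main()
{
  write("%s\n", parse_chunks(({ "a <i>b", "c</i> d" })));
  return 0;
}
```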
- Methodfeed_insert
Parser.HTML
feed_insert(string
s
)- Description
This pushes a string on the parser stack.
- Returns
Returns the object being called.
- Note
Don't use!
- Methodfinish
Parser.HTML
finish()Parser.HTML
finish(string
s
)- Description
Finish a parser pass. A string may be sent here, similar to feed().
- Returns
Returns the object being called.
- Methodget_extra
array
get_extra()- Description
Gets the extra arguments set by
set_extra()
.- Returns
Returns the array of extra arguments.
- Methodignore_tags
int
ignore_tags(void
|int
value
)- Description
Do not look for tags at all. Normally tags are matched even when there are no callbacks for them at all. When this is set, the tag delimiters
'<'
and '>'
will be treated as any normal character.
- Methodignore_unknown
int
ignore_unknown(void
|int
value
)- Description
Treat unknown tags and entities as text data, continuing parsing for tags and entities inside them.
- Note
When functions are specified with
_set_tag_callback()
or _set_entity_callback()
, all tags or entities, respectively, are considered known. However, if one of those functions returns 1 and ignore_unknown is set, they are treated as text data instead of making another call to the same function again.
- Methodlazy_argument_end
int
lazy_argument_end(void
|int
value
)- Description
A
'>'
in a tag argument closes both the argument and the tag, even if the argument is quoted.
- Methodlazy_entity_end
int
lazy_entity_end(void
|int
value
)- Description
Normally, the parser searches indefinitely for the entity end character (i.e.
';'
). When this flag is set, the characters '&'
,'<'
,'>'
,'"'
,'''
, and any whitespace break the search for the entity end, and the entity text is then ignored, i.e. treated as data.
- Methodmatch_tag
int
match_tag(void
|int
value
)- Description
Unquoted nested tag starters and enders will be balanced when parsing tags. This is the default.
- Methodmax_parse_depth
int
max_parse_depth(void
|int
value
)- Description
Maximum recursion depth during parsing. Recursion occurs when a tag/container/entity/quote tag callback function returns a string to be reparsed. The default value is
10
.
- Methodmixed_mode
int
mixed_mode(void
|int
value
)- Description
Allow callbacks to return arbitrary data in the arrays, which will be concatenated in the output.
- Methodparse_tag_args
mapping
parse_tag_args(string
tag
)- Description
Parses the tag arguments from a tag string without the name and surrounding brackets, i.e. a string on the form
"some='tag'  args"
.- Returns
Returns a mapping containing the tag arguments.
- See also
tag_args
- Methodparse_tag_name
string
parse_tag_name(string
tag
)- Description
Parses the tag name from a tag string without the surrounding brackets, i.e. a string on the form
"tagname some='tag'  args"
.- Returns
Returns the tag name or an empty string if none.
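- Example
A small sketch combining parse_tag_name() and parse_tag_args() on a tag string without its surrounding brackets. The helper name split_tag is hypothetical, and it assumes that parse_tag_args() tolerates the leading whitespace left after the name is cut off.

```pike
// Hypothetical helper: returns ({ name, args }) for a tag string such
// as "img src='logo.png' ismap".
array split_tag(string tag)
{
  Parser.HTML p = Parser.HTML();
  string name = p->parse_tag_name(tag);

  // parse_tag_args() expects the string without the tag name, so cut
  // the name off before parsing the arguments.
  mapping args = p->parse_tag_args(tag[sizeof(name)..]);
  return ({ name, args });
}

int main()
{
  write("%O\n", split_tag("img src='logo.png' ismap"));
  return 0;
}
```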
- Methodquote_stapling
int
quote_stapling(int
|void
enable
)- Description
Enable old-style attribute quoting by stapling.
- Parameter
enable
Enable/disable the mode. Defaults to keeping the old setting.
- Returns
Returns the prior setting.
- Note
Any use of this mode is discouraged, and is only provided for compatibility with versions of Pike prior to 8.0.
- Note
Note also that this mode will output runtime warnings whenever the mode has had an effect on the parsing.
- Methodquote_tags
mapping
(string
:array
(mixed
|string
)) quote_tags()- Description
Returns the current callback settings. The values are arrays ({callback, end_quote}). When matching is done case insensitively, all names will be returned in lowercase.
Implementation note:
quote_tags()
allocates a new mapping for every call and thus, unlike e.g. tags()
runs in linear time.- See also
add_quote_tag
- Methodread
string
|array
(mixed
) read()string
|array
(mixed
) read(int
max_elems
)- Description
Read parsed data from the parser object.
- Returns
Returns a string of parsed data if the parser isn't in
mixed_mode
, an array of arbitrary data otherwise.
- Methodreparse_strings
int
reparse_strings(void
|int
value
)- Description
When a plain string is used as a tag/container/entity/quote tag callback, it's not reparsed if this flag is unset. Setting it causes all such strings to be reparsed.
- Methodset_extra
Parser.HTML
set_extra(mixed
...args
)- Description
Sets the extra arguments passed to all tag, container and entity callbacks.
- Returns
Returns the object being called.
- Methodsplice_arg
string
splice_arg(void
|string
name
)- Description
If given a string, it sets the splice argument name to it. It returns the old splice argument name.
If a splice argument name is set, it's parsed in all tags, both those with callbacks and those without. Wherever it occurs, its value (after being parsed for entities in the normal way) is inserted directly into the tag. E.g:
<foo arg1="val 1" splice="arg2='val 2' arg3" arg4>
becomes
<foo arg1="val 1" arg2='val 2' arg3 arg4>
if
"splice"
is set as the splice argument name.
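- Example
A sketch of the splice example above as seen from a tag callback. The helper name spliced_args is hypothetical. It assumes the spliced arguments show up in the args mapping passed to the callback, as the description suggests.

```pike
// Hypothetical helper: returns the argument mapping that the callback
// for <foo> receives when "splice" is the splice argument name.
mapping spliced_args(string html)
{
  mapping result = ([]);
  Parser.HTML p = Parser.HTML();
  p->splice_arg("splice");
  p->add_tag("foo", lambda(Parser.HTML parser, mapping args) {
                      result = args;   // Remember what the parser saw.
                      return ({});     // Drop the tag from the output.
                    });
  p->finish(html);
  return result;
}

int main()
{
  write("%O\n",
        spliced_args("<foo arg1=\"val 1\" splice=\"arg2='val 2' arg3\" arg4>"));
  return 0;
}
```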
- Methodtag
array
tag(void
|mixed
default_value
)- Description
Returns the equivalent of the following calls.
Array string
0
tag_name()
mapping
(string
:mixed
)1
tag_args(default_value)
string
2
tag_content()
- Methodtag_args
mapping
(string
:mixed
) tag_args(void
|mixed
default_value
)- Description
Gives the arguments of the current tag, parsed to a convenient mapping consisting of key:value pairs. If the current thing isn't a tag, it gives zero.
default_value
is used for arguments which have no value in the tag. If default_value
isn't given, the value is set to the same string as the key.
- Methodtag_content
string
tag_content()- Description
Gives the content of the current tag, if it's a container or quote tag. Otherwise returns zero.
- Methodtag_name
string
|zero
tag_name()- Description
Gives the name of the current tag, or zero. If used from an entity callback, it gives the string inside the entity.
- Methodwrite_out
Parser.HTML
write_out(mixed
...args
)- Description
Send data to the output stream, i.e. it won't be parsed and it won't be sent to the data callback, if any.
Any data is allowed when the parser is running in
mixed_mode
. Only strings are allowed otherwise.- Returns
Returns the object being called.
- Methodws_before_tag_name
int
ws_before_tag_name(void
|int
value
)- Description
Allow whitespace between the tag start character and the tag name.
- Methodxml_tag_syntax
int
xml_tag_syntax(void
|int
value
)- Description
Whether or not to use XML syntax to tell empty tags and container tags apart.
0
Use HTML syntax only. If there's a
'/'
last in a tag, it's just treated as any other argument.1
Use HTML syntax, but ignore a
'/'
if it comes last in a tag. This is the default.2
Use XML syntax, but when a tag that does not end with
'/>'
is found which only got a non-container tag callback, treat it as a non-container (i.e. don't start to seek for the container end).3
Use XML syntax only. If a tag got both container and non-container callbacks, the non-container callback is called when the empty element form (i.e. the one ending with
'/>'
) is used, and the container callback otherwise. If only a container callback exists, it gets the empty string as content when there's none to be parsed. If only a non-container callback exists, it will be called (without the content argument) for both kinds of tags.
Module Parser
- Methoddecode_numeric_xml_entity
string
|zero
decode_numeric_xml_entity(string
chref
)- Description
Decodes the numeric XML entity
chref
, e.g. "&#x34;", and returns the character as a string.
chref
is the name part of the entity, i.e. without the leading '&' and trailing ';'. Returns zero if
chref
isn't on a recognized form or if the character number is too large to be represented in a string.
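- Example
A brief sketch of the accepted input. The argument is the name part only, so the character reference for U+0034 is passed as "#x34" (hexadecimal) or "#52" (decimal). The last call illustrates the documented zero return for a string that is not a numeric reference.

```pike
int main()
{
  // Hexadecimal and decimal forms of the same character, the digit 4.
  write("%O\n", Parser.decode_numeric_xml_entity("#x34"));
  write("%O\n", Parser.decode_numeric_xml_entity("#52"));

  // A named entity is not a numeric reference, so zero is returned.
  write("%O\n", Parser.decode_numeric_xml_entity("amp"));
  return 0;
}
```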
- Methodencode_html_entities
string
encode_html_entities(string
raw
)- Description
Encode characters to HTML entities, e.g. turning
"<"
into "&lt;".
The characters that will be encoded are characters <= 32,
"\"&'<>"
and characters >= 127 and <= 160 and characters >= 255.
- Methodget_xml_parser
HTML
get_xml_parser()- Description
Returns a
Parser.HTML
initialized for parsing XML. It has all the flags set properly for XML syntax and callbacks to ignore comments, CDATA blocks and unknown PI tags, but it has no registered tags and doesn't decode any entities.
- Methodhtml_entity_parser
Methodparse_html_entities HTML
html_entity_parser()string
parse_html_entities(string
in
)HTML
html_entity_parser(int
noerror
)string
parse_html_entities(string
in
,int
noerror
)- Description
Parse any HTML entities in the string to unicode characters. Either return a complete parser (to build on or use) or parse a string. Unless noerror is set, an error is thrown if the string contains an unrecognized entity.
- Note
Currently using XHTML 1.0 tables.
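- Example
A minimal round-trip sketch using encode_html_entities() and parse_html_entities(). The helper name roundtrip is hypothetical, and "&nosuch;" is just a made-up unknown entity to show the noerror flag.

```pike
// Hypothetical helper: encodes raw text to entities and decodes it
// again. The result is expected to equal the input.
string roundtrip(string raw)
{
  string escaped = Parser.encode_html_entities(raw);
  return Parser.parse_html_entities(escaped);
}

int main()
{
  write("%O\n", Parser.encode_html_entities("a<b&c"));
  write("%O\n", roundtrip("a<b&c"));

  // With noerror set, an unrecognized entity does not throw.
  write("%O\n", Parser.parse_html_entities("&nosuch;", 1));
  return 0;
}
```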
Class Parser.CSV
- Description
This is a parser for line oriented data that is either comma, semi-colon or tab separated. It extends the functionality of the
Parser.Tabular
with some specific functionality related to a header and record oriented parsing of huge datasets.
We document only the differences from the basic
Parser.Tabular
.- See also
Parser.Tabular
- Methodfetchrecord
mapping
fetchrecord(void
|array
|mapping
format
)- Description
This function consumes a single record from the input. To be used in conjunction with
parsehead()
.- Returns
It returns the mapping describing the record.
- See also
parsehead()
,fetch()
- Methodparsehead
int
parsehead(void
|string
delimiters
,void
|string
|object
matchfieldname
)- Description
This function consumes the header-line preceding a typical comma, semicolon or tab separated value list and autocompiles a format description from that. After this function has successfully parsed a header-line, you can proceed with either
fetchrecord()
or fetch()
to get the remaining records.- Parameter
delimiters
Explicitly specify a string containing all the characters that should be considered field delimiters. If not specified or empty, the function will try to autodetect the single delimiter in use.
- Parameter
matchfieldname
A string containing a regular expression, using
Regexp.SimpleRegexp
syntax, or an object providing a method compatible with the single string argument
Regexp.SimpleRegexp.match()
. It must match every individual field name before the header is considered valid.- Returns
It returns true if a CSV head has successfully been parsed.
- See also
fetchrecord()
,fetch()
,compile()
Class Parser.RCS
- Description
An RCS file parser that reads an RCS *,v file and presents its contents as Pike data structures.
- Constantmax_revisions_supported
constant
int
Parser.RCS.max_revisions_supported
- Description
Feature detection constant for the max_revisions argument to
create()
,parse()
and parse_delta_sections()
.
- Variableaccess
array
(string
) Parser.RCS.access- Description
The usernames listed in the ACCESS section of the RCS file.
- Variablebranch
string
|int(0)
Parser.RCS.branch- Description
The default branch (or revision), if present,
0
otherwise.
- Variablebranches
mapping
(string
:string
) Parser.RCS.branches- Description
Maps branch numbers (indices) to branch names (values).
- Note
The indices are short branch revision numbers (ie
"1.1.2"
and not "1.1.0.2"
).
- Variablecomment
string
|int(0)
Parser.RCS.comment- Description
The RCS file comment if present,
0
otherwise.
- Variableexpand
string
Parser.RCS.expand- Description
The keyword expansion options (as named by RCS) if present,
0
otherwise.
- Variablelocks
mapping
(string
:string
) Parser.RCS.locks- Description
Maps from username to revision for users that have acquired locks on this file.
- Variablercs_file_name
string
Parser.RCS.rcs_file_name- Description
The filename of the RCS file as sent to
create()
.
- Variablerevisions
mapping
(string
:Revision
) Parser.RCS.revisions- Description
Data for all revisions of the file. The indices of the mapping are the revision numbers, whereas the values are the data from the corresponding revision.
- Variabletags
mapping
(string
:string
) Parser.RCS.tags- Description
Maps tag names (indices) to tagged revision numbers (values).
- Note
This mapping typically contains raw revision numbers for branches (ie
"1.1.0.2"
and not "1.1.2"
).
- Variabletrunk
array
(Revision
) Parser.RCS.trunk- Description
Data for all revisions on the trunk, sorted in the same order as the RCS file stored them - ie descending, most recent first, I'd assume (rcsfile(5), of course, fails to state such irrelevant information).
- Methodcreate
Parser.RCSParser.RCS(
string
|void
file_name
,string
|int(0)
|void
file_contents
,void
|int
max_revisions
)- Description
Initializes the RCS object.
- Parameter
file_name
The path to the raw RCS file (includes trailing ",v"). Used mainly for error reporting (truncated RCS file or similar). Stored in
rcs_file_name
.- Parameter
file_contents
If a string is provided, that string will be parsed to initialize the RCS object. If a zero (
0
) is sent, no initialization will be performed at all. If no value is given at all, but file_name
was provided, that file will be loaded and parsed for object initialization.- Parameter
max_revisions
Maximum number of revisions to process. If unset, all revisions will be processed.
- Methodexpand_keywords_for_revision
string
|zero
expand_keywords_for_revision(string
|Revision
rev
,string
|void
text
,int
|void
expansion_mode
)- Description
Expand keywords and return the resulting text according to the expansion rules set for the file.
- Parameter
rev
The revision to apply the expansion for.
- Parameter
text
If supplied, substitute keywords for that text instead using values that would apply for the given revision. Otherwise, revision
rev
is used.- Parameter
expansion_mode
Expansion mode
1
Perform expansion even if the file was checked in as binary.
0
Perform expansion only if the file was checked in as non-binary with expansion enabled.
-1
Perform contraction if the file was checked in as non-binary.
- Note
The Log keyword (which lacks sane quoting rules) is not expanded. Keyword expansion rules set in CVSROOT/cvswrappers are ignored. Only implements the -kkv, -ko and -kb expansion modes.
- Note
Does not perform any line-ending conversion.
- See also
get_contents_for_revision
- Methodget_contents_for_revision
string
|zero
get_contents_for_revision(string
|Revision
rev
,void
|bool
dont_cache_data
)- Description
Returns the file contents from the revision
rev
, without performing any keyword expansion. If dont_cache_data
is set we will not keep intermediate revisions in memory unless they already existed. This will cut down memory use at the expense of slow access to older revisions.- See also
expand_keywords_for_revision()
- Methodparse
this_program
parse(array
raw
,void
|function
(string
:void
)progress_callback
,void
|int
max_revisions
)- Description
Parse the RCS file
raw
and fully initialize all members of this object.- Parameter
raw
The unprocessed RCS file.
- Parameter
progress_callback
Passed on to
parse_deltatext_sections
.- Parameter
max_revisions
Maximum number of revisions to process. If unset, all revisions will be processed.
- Returns
The fully initialized object (only returned for API convenience; the object itself is destructively modified to match the data extracted from
raw
)- See also
parse_admin_section
,parse_delta_sections
,parse_deltatext_sections
,create
- Methodparse_admin_section
array
parse_admin_section(string
|array
raw
)- Description
Lower-level API function for parsing only the admin section (the initial chunk of an RCS file, see manpage rcsfile(5)) of an RCS file. After running
parse_admin_section
, the RCS object will be initialized with the values for head
,branch
,access
,branches
,tokenize
,tags
,locks
,strict_locks
,comment
and expand
.- Parameter
raw
The tokenized RCS file, or the raw RCS-file data.
- Returns
The rest of the RCS file, admin section removed.
- See also
parse_delta_sections
,parse_deltatext_sections
,parse
,create
- FIXME
Does not handle rcsfile(5) newphrase skipping.
- Methodparse_delta_sections
array
parse_delta_sections(array
raw
,void
|int
max_revisions
)- Description
Lower-level API function for parsing only the delta sections (the second chunk of an RCS file, see manpage rcsfile(5)) of an RCS file. After running
parse_delta_sections
, the RCS object will be initialized with the value of description
and populated revisions
mapping and trunk
array. Their Revision
members are however only populated with the members Revision->revision
,Revision->branch
,Revision->time
,Revision->author
,Revision->state
,Revision->branches
,Revision->rcs_next
,Revision->ancestor
and Revision->next
.- Parameter
raw
The tokenized RCS file, with admin section removed. (See
parse_admin_section
.)- Parameter
max_revisions
Maximum number of revisions to process. If unset, all revisions will be processed.
- Returns
The rest of the RCS file, delta sections removed.
- See also
parse_admin_section
,tokenize
,parse_deltatext_sections
,parse
,create
- FIXME
Does not handle rcsfile(5) newphrase skipping.
- Methodparse_deltatext_sections
void
parse_deltatext_sections(array
raw
,void
|function
(string
:void
)progress_callback
,array
|void
callback_args
)- Description
Lower-level API function for parsing only the deltatext sections (the final and typically largest chunk of an RCS file, see manpage rcsfile(5)) of an RCS file. After a
parse_deltatext_sections
run, the RCS object will be fully populated.- Parameter
raw
The tokenized RCS file, with admin and delta sections removed. (See
parse_admin_section
,tokenize
and parse_delta_sections
.)- Parameter
progress_callback
This optional callback is invoked with the revision of the deltatext about to be parsed (useful for progress indicators).
- Parameter
callback_args
Optional extra trailing arguments to be sent to
progress_callback
- See also
parse_admin_section
,parse_delta_sections
,parse
,create
- FIXME
Does not handle rcsfile(5) newphrase skipping.
- Methodtokenize
array
(array
(string
)) tokenize(string
data
)- Description
Tokenize an RCS file into tokens suitable as argument to the various parse functions
- Parameter
data
The RCS file data
- Returns
An array with arrays of tokens
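The lower-level functions above can be chained by hand. The following is a minimal sketch, assuming the path of an RCS file is given on the command line; the usage message and progress output are illustrative only, and passing 0 as the second constructor argument follows the DeltatextIterator example in this section (it is assumed here to skip automatic parsing).

```pike
int main(int argc, array(string) argv)
{
  if (argc < 2) {
    werror("Usage: %s file,v\n", argv[0]);
    return 1;
  }

  string data = Stdio.read_file(argv[1]);
  if (!data) {
    werror("Could not read %s\n", argv[1]);
    return 1;
  }

  // Second argument 0: assumed to create the object without parsing.
  Parser.RCS rcs = Parser.RCS(argv[1], 0);

  // Each stage consumes its own section and returns the remaining tokens.
  array rest = rcs->tokenize(data);
  rest = rcs->parse_admin_section(rest);
  rest = rcs->parse_delta_sections(rest);

  // The callback receives the revision whose deltatext is about to be parsed.
  rcs->parse_deltatext_sections(rest, lambda(string rev) {
    werror("Parsing deltatext for %s\n", rev);
  });

  foreach (rcs->revisions; string id; Parser.RCS.Revision rev)
    write("%s by %s (+%d -%d)\n", id, rev->author, rev->added, rev->removed);
  return 0;
}
```

Stopping after parse_delta_sections is enough when only the revision graph (authors, states, branches) is needed, since the deltatext sections are typically the largest part of the file.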
Class Parser.RCS.DeltatextIterator
- Description
Iterator for the deltatext sections of the RCS file. Typical usage:
- Example
string|array raw = Stdio.read_file(my_rcs_filename);
Parser.RCS rcs = Parser.RCS(my_rcs_filename, 0);
raw = rcs->parse_delta_sections(rcs->parse_admin_section(raw));
foreach(rcs->DeltatextIterator(raw); int n; Parser.RCS.Revision rev)
  do_something(rev);
- Method_iterator_index
protected
int
_iterator_index()- Returns
the number of deltatext entries processed so far (0..N-1, N being the total number of revisions in the rcs file)
- Method_iterator_next
protected
int
_iterator_next()- Description
Advance the iterator one step.
Returns
UNDEFINED
when the iterator is finished, and otherwise the same as _iterator_index()
.
- Method_iterator_value
protected
Revision
_iterator_value()- Returns
the
Revision
at whose deltatext data we are, updated with its info
- Methodcreate
Parser.RCS.DeltatextIterator Parser.RCS.DeltatextIterator(
array
deltatext_section
,void
|function
(string
,mixed
... :void
)progress_callback
,void
|array
(mixed
)progress_callback_args
)- Parameter
deltatext_section
the deltatext section of the RCS file in its entirety
- Parameter
progress_callback
This optional callback is invoked with the revision of the deltatext about to be parsed (useful for progress indicators).
- Parameter
progress_callback_args
Optional extra trailing arguments to be sent to
progress_callback
- See also
the rcsfile(5) manpage outlines the sections of an RCS file
- Syntax
int
Parser.RCS.DeltatextIterator.n
protected
bool
read_next()- Description
Drops the leading whitespace before next revision's deltatext entry and sets this_rev to the revision number we're about to read.
- Methodparse_deltatext_section
protected
int
parse_deltatext_section(array
raw
,int
o
)- Description
Chops off the first deltatext section from the token array
raw
and returns the rest of the string, or the value 0
(zero) if we had already visited the final deltatext entry. The deltatext's data is stored destructively in the appropriate entry of the revisions
array.- Note
raw
+o
must start with a deltatext entry for this method to work- FIXME
does not handle rcsfile(5) newphrase skipping
- FIXME
if the rcs file is truncated, this method writes a descriptive error to stderr and then returns 0 - some nicer error handling wouldn't hurt
Class Parser.RCS.Revision
- Description
All data tied to a particular revision of the file.
- Variableadded
int
Parser.RCS.Revision.added- Description
The number of lines that were added from the previous revision to make this revision (for the initial revision too).
- See also
lines
,removed
- Variableancestor
string
|zero
Parser.RCS.Revision.ancestor- Description
The revision of the ancestor of this revision, or
0
if this was the initial revision.- See also
next
- Variableauthor
string
Parser.RCS.Revision.author- Description
The userid of the user that committed the revision.
- Variablebranch
string
Parser.RCS.Revision.branch- Description
The branch name on which this revision was committed (calculated according to how cvs manages branches).
- Variablebranches
array
(string
) Parser.RCS.Revision.branches- Description
When there are branches from this revision, an array with the first revision number for each of the branches, otherwise
0
.
Follow the
next
fields to get to the branch head.
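Walking from a branch's first revision to its head can be sketched as follows. This is a minimal illustration, not part of the module: Rev is a hypothetical stand-in for Parser.RCS.Revision carrying only the next field, and branch_head is a helper name invented for the example.

```pike
// Hypothetical stand-in for Parser.RCS.Revision with only the field used here.
class Rev
{
  string next;
  protected void create(string|void n) { next = n; }
}

// Follow the next pointers from the first revision of a branch to its head.
string branch_head(mapping(string:object) revisions, string first)
{
  string id = first;
  while (revisions[id] && revisions[id]->next)
    id = revisions[id]->next;
  return id;
}

int main()
{
  mapping(string:object) revisions = ([
    "1.2.2.1": Rev("1.2.2.2"),
    "1.2.2.2": Rev("1.2.2.3"),
    "1.2.2.3": Rev(),          // head of the branch: next is 0
  ]);
  write("%s\n", branch_head(revisions, "1.2.2.1"));   // prints 1.2.2.3
  return 0;
}
```

With a parsed RCS object, the same helper would presumably be called with the object's revisions mapping and each entry of a revision's branches array.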
- Variablelines
int
Parser.RCS.Revision.lines- Description
The number of lines this revision contained, altogether (not of particular interest for binary files).
- See also
added
,removed
- Variablenext
string
|zero
Parser.RCS.Revision.next- Description
The revision that succeeds this revision, or
0
if none exists (i.e. if this is the HEAD of the trunk or of a branch).- See also
ancestor
- Variablercs_next
string
|zero
Parser.RCS.Revision.rcs_next- Description
The revision stored next in the RCS file, or
0
if none exists.- Note
This field is straight from the RCS file, and has somewhat weird semantics. Usually you will want to use one of the derived fields
next
or prev
or possibly rcs_prev
.- See also
next
,prev
,rcs_prev
- Variablercs_prev
string
|zero
Parser.RCS.Revision.rcs_prev- Description
The revision that this revision is based on, or
0
if it is the HEAD.
This is the reverse pointer of
rcs_next
and branches
, and is used by get_contents_for_revision()
when applying the deltas to set text
.- See also
rcs_next
- Variablercs_text
string
Parser.RCS.Revision.rcs_text- Description
The raw delta as stored in the RCS file.
- See also
text
,get_contents_for_revision()
- Variableremoved
int
Parser.RCS.Revision.removed- Description
The number of lines that were removed from the previous revision to make this revision.
- See also
lines
,added
- Variablerevision
string
Parser.RCS.Revision.revision- Description
The revision number (i.e.
rcs_file->revisions["1.1"]->revision == "1.1"
).
- Variablestate
string
Parser.RCS.Revision.state- Description
The state of the revision - typically
"Exp"
or "dead"
.
- Variabletext
string
|zero
Parser.RCS.Revision.text- Description
The text as committed or
0
if get_contents_for_revision()
hasn't been called for this revision yet.
Typically you don't access this field directly, but use
get_contents_for_revision()
to retrieve it.- See also
get_contents_for_revision()
,rcs_text
Class Parser.SGML
- Description
This is a handy simple parser for SGML-like syntax such as HTML. It doesn't do anything advanced beyond finding the corresponding end-tags.
It's used like this:
array res=Parser.SGML()->feed(string)->finish()->result();
The resulting structure is an array of atoms, where the atom can be a string or a tag. A tag contains a similar array, as data.
- Example
A string
"<gat> <gurka> </gurka> <banan> <kiwi> </gat>"
results in
({ tag "gat" object with data:
   ({ tag "gurka" object with data:
        ({ " " })
      tag "banan" object with data:
        ({ " "
           tag "kiwi" object with data:
             ({ " " })
        })
   })
})
i.e., simple "tags" (not containers) are not detected, but containers are ended implicitly by a surrounding container _with_ an end tag.
The 'tag' is an object with the following variables:
string name; - name of tag
mapping args; - argument to tag
int line,char,column; - position of tag
int eline,echar,ecolumn; - end position of tag, src[char..echar-1] got the block (added by Xuesong Guo)
string file; - filename (see create)
array(SGMLatom) data; - contained data
int open; - is not an empty element and has no end tag (added by Xuesong Guo)
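The nested result can be walked recursively. The following is a minimal sketch using only the tag variables listed above; dump is a helper name invented for the example, and result() is called without arguments, as in the usage line earlier in this description.

```pike
// Print the name and line of every tag, indented by nesting depth.
void dump(array atoms, int depth)
{
  foreach (atoms, mixed atom) {
    if (stringp(atom)) continue;        // plain text between tags
    write("%s%s (line %d)\n", "  " * depth, atom->name, atom->line);
    if (atom->data) dump(atom->data, depth + 1);
  }
}

int main()
{
  object p = Parser.SGML();
  p->feed("<gat> <gurka> </gurka> <banan> <kiwi> </gat>");
  p->finish();                          // no result is usable before this
  dump(p->result(), 0);
  return 0;
}
```

The calls are kept as separate statements so the sketch does not depend on what finish() returns.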
- Methodcreate
Parser.SGML Parser.SGML()
Parser.SGML Parser.SGML(
string
filename
,function
(:void
)|void
name_formater
,function
(:void
)|void
argname_formater
)- Description
This object is created with this filename. It's passed to all created tags, for debug and trace purposes. All tag names are replaced by name_formater(name), and all argument names by argname_formater(arg_name).
- Note
No, it doesn't read the file itself. See
feed()
.
- Methodfeed
Methodfinish
Methodresult object
feed(string
s
)array
(SGMLatom
|string
) finish()array
(SGMLatom
|string
) result(string
s
)- Description
Feed new data to the object, or finish the stream. No result can be used until
finish()
is called. Both
finish()
and result()
return the computed data. feed()
returns the called object.
Class Parser.SGML.SGMLatom
- Variablename
Variableargs
Variableline
Variablechar
Variablecolumn
Variableeline
Variableechar
Variableecolumn
Variablefile
Variabledata
Variableopen
string
Parser.SGML.SGMLatom.name
mapping
Parser.SGML.SGMLatom.args
int
Parser.SGML.SGMLatom.line
int
Parser.SGML.SGMLatom.char
int
Parser.SGML.SGMLatom.column
int
Parser.SGML.SGMLatom.eline
int
Parser.SGML.SGMLatom.echar
int
Parser.SGML.SGMLatom.ecolumn
string
Parser.SGML.SGMLatom.file
array
(SGMLatom
) Parser.SGML.SGMLatom.data
int
Parser.SGML.SGMLatom.open
Class Parser.Tabular
- Description
This is a parser for line and block oriented data. It provides a flexible yet concise record-description language to parse character/column/delimiter-organised records.
- See also
Parser.LR
, http://www.wikipedia.org/wiki/Comma-separated_values, http://www.wikipedia.org/wiki/EDIFACT
- Methodcompile
array
|mapping
compile(string
|Stdio.File
|Stdio.FILE
input
)- Description
Compiles the format description language into a compiled structure that can be fed to
setformat
,fetch
, or create
.
The format description is case sensitive.
The format description starts with a single line containing:
[Tabular description begin]
The format description ends with a single line containing:
[Tabular description end]
Any lines before the startline are skipped.
Any lines after the endline are not consumed.
Empty lines are skipped.
Comments start after a
#
or ;
.
The depth level of a field is indicated by the number of leading spaces or colons at the beginning of the line.
The fieldname must not contain any whitespace.
An arbitrary number of single character field delimiters can be specified between brackets, e.g.
[,;]
or [,]
would be for CSV.
When field delimiters are being used: in case of CSV type delimiters
[\t,; ]
the standard CSV quoting rules apply; in case other delimiters are used, no quoting is supported and the last field on a line should not specify a delimiter, but should specify a 0 field width instead.
A fixed field width can be specified by a plain decimal integer; a value of 0 indicates a field with arbitrary length that extends till the end of the line.
A matching regular expression can be enclosed in
""
, it has to match the complete field content and uses Regexp.SimpleRegexp
syntax.
On records the following options are supported:
- mandatory
This record is required.
- fold
Fold this record's contents in the enclosing record.
- single
This record is present at most once.
On fields the following options are supported:
- drop
After reading and matching this field, drop the field content from the resulting mapping structure.
- See also
setformat()
,create()
,fetch()
- Example
Example of the description language:
[Tabular description begin]
csv
:gtz
::mybankno [,]
::transferdate [,]
::mutatiesoort [,]
::volgnummer [,]
::bankno [,]
::name [,]
::kostenplaats [,] drop
::amount [,]
::afbij [,]
::mutatie [,]
::reference [,]
::valutacode [,]
mt940
:messageheader1 mandatory
::exporttime "0000" drop
::CS1 " " drop
::exportday "01" drop
::exportaddress 12
::exportnumber 5 "[0-9]+"
:messageheader3 mandatory fold single
::messagetype "940" drop
::CS1 " " drop
::messagepriority "00" drop
:TRN fold
::tag ":20:" drop
::reference "GTZPB|MPBZ|INGEB"
:accountid fold
::tag ":25:" drop
::accountno 10
:statementno fold
::tag ":28C:" drop
::settlementno 0 drop
:openingbalance mandatory single
::tag ":60F:" drop
::creditdebit 1
::date 6
::currency "EUR"
::amount 0 "[0-9]+,[0-9][0-9]"
:statements
::statementline mandatory fold single
:::tag ":61:" drop
:::valuedate 6
:::creditdebit 1
:::amount "[0-9]+,[0-9][0-9]"
:::CS1 "N" drop
:::transactiontype 3 # 3 for Postbank, 4 for ING
:::paymentreference 0
::informationtoaccountowner fold single
:::tag ":86:" drop
:::accountno "[0-9]*( |)"
:::accountname 0
::description fold
:::description 0 "|[^:].*"
:closingbalance mandatory single
::tag ":62[FM]:" drop
::creditdebit 1
::date 6
::currency "EUR"
::amount 0 "[0-9]+,[0-9][0-9]"
:informationtoaccountowner fold single
::tag ":86:" drop
::debit "D" drop
::debitentries 6
::credit "C" drop
::creditentries 6
::debit "D" drop
::debitamount "[0-9]+,[0-9][0-9]"
::credit "C" drop
::creditamount "[0-9]+,[0-9][0-9]" drop
::accountname "(\n[^-:][^\n]*)*" drop
:messagetrailer mandatory single
::start "-"
::end "XXX"
[Tabular description end]
- Methodcreate
Parser.Tabular Parser.Tabular(
void
|string
|Stdio.File
|Stdio.FILE
input
,void
|array
|mapping
|string
|Stdio.File
|Stdio.FILE
format
,void
|int
verbose
)- Description
This function initialises the parser.
- Parameter
input
The input stream or string.
- Parameter
format
The format to be used (either precompiled or not). The format description language is documented under
compile()
.- Parameter
verbose
If
>1
, it specifies the number of characters to display from the beginning of each record as a progress indicator. Special values are:
-4
Turns on format debugging with visible mismatches.
-3
Turns on format debugging with named field contents.
-2
Turns on format debugging with field contents.
-1
Turns on basic format debugging.
0
Turns off verbosity. Default.
1
Is the same as setting it to
70
.- See also
compile()
,setformat()
,fetch()
- Methodfeed
object
feed(string
content
)- Parameter
content
Is injected into the input stream.
- Returns
This object.
- See also
fetch()
- Methodfetch
mapping
|zero
fetch(void
|array
|mapping
format
)- Description
This function consumes as much input as needed to parse the full tabular structures at once.
- Parameter
format
Describes (precompiled only) formats to be parsed. If no format is specified, the format specified on
create()
is used, and empty lines are automatically skipped.- Returns
A nested mapping that contains the complete structure as described in the specified format.
If nothing matches the specified format, no input is consumed (except empty lines, if the default format is used), and zero is returned.
- See also
compile()
,create()
,setformat()
,skipemptylines()
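A possible end-to-end use of create() and fetch() is sketched below. The two-column layout, the record name "line" and the sample input are assumptions made up for the illustration, modelled on the "csv" part of the description-language example under compile(); the exact shape of the returned mapping is not shown because it depends on the format.

```pike
int main()
{
  // Hypothetical two-column CSV layout (names are illustrative only).
  string format = #"[Tabular description begin]
csv
:line
::name [,]
::amount [,]
[Tabular description end]
";

  string input = "apples,3\npears,5\n";

  // The format is passed uncompiled; create() is documented to accept
  // either a precompiled format or the description text.
  object parser = Parser.Tabular(input, format);

  // fetch() is documented to return zero when nothing matches.
  // The iteration cap is only a safety net for this sketch.
  mapping rec;
  for (int i = 0; i < 10 && (rec = parser->fetch()); i++)
    write("%O\n", rec);
  return 0;
}
```

When the same format is used for many inputs, compiling it once with compile() and passing the compiled structure to create() or setformat() avoids recompiling the description each time.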
- Methodsetformat
array
|mapping
setformat(array
|mapping
format
)- Parameter
format
Replaces the default (precompiled only) format.
- Returns
The previous default format.
- See also
compile()
,fetch()
Module Parser.C
- Methodgroup
array
(Token
|array
) group(array
(string
|Token
)tokens
,void
|mapping
(string
:string
)groupings
)- Description
Fold sub blocks of an array of tokens into sub arrays, for grouping purposes.
- Parameter
tokens
The token array to fold.
- Parameter
groupings
Supplies the tokens marking the boundaries of blocks to fold. The indices of the mapping mark the start of a block, the corresponding values mark where the block ends. The sub arrays will start and end in these tokens. If no groupings mapping is provided, {}, () and [] are used as block boundaries.
- Methodhide_whitespaces
array
hide_whitespaces(array
tokens
)- Description
Folds all whitespace tokens into the previous token's trailing_whitespaces.
- Methodreconstitute_with_line_numbers
string
reconstitute_with_line_numbers(array
(string
|Token
|array
)tokens
)- Description
Like
simple_reconstitute
, but adding additional #line n "file" preprocessor statements in the output wherever a new line or file starts.
- Methodsimple_reconstitute
string
simple_reconstitute(array
(string
|Token
|array
)tokens
)- Description
Reconstitutes the token array into a plain string again; essentially reversing
split()
and whichever of the tokenize
, group
and hide_whitespaces
methods may have been invoked.
- Methodsplit
array
(string
) split(string
data
,void
|mapping
(string
:string
)state
)- Description
Splits the
data
string into an array of tokens. An additional element with a newline will be added to the resulting array of tokens. If the optional argument state
is provided, the split function is able to pause and resume splitting inside #"" and /**/ tokens. The state
argument should be an initially empty mapping, in which split will store its state between successive calls.
- Methodstrip_line_statements
array
(Token
|array
) strip_line_statements(array
(Token
|array
)tokens
)- Description
Strips off all (preprocessor) line statements from a token array.
- Methodtokenize
array
(Token
) tokenize(array
(string
)s
,void
|string
file
)- Description
Returns an array of
Token
objects given an array of string tokens.
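The functions in this module are typically chained. The following is a minimal sketch of one such pipeline, using only the functions documented above; the source snippet and the file name "example.c" are placeholders.

```pike
int main()
{
  string src = "int f(int x) { return x + 1; }\n";

  // Raw string tokens, including whitespace tokens.
  array(string) raw = Parser.C.split(src);

  // Wrap the strings in Token objects; the file name is only a label here.
  array tokens = Parser.C.tokenize(raw, "example.c");

  // Move whitespace into the preceding token's trailing_whitespaces.
  tokens = Parser.C.hide_whitespaces(tokens);

  // Fold {}, () and [] blocks into sub arrays (the default groupings).
  array grouped = Parser.C.group(tokens);

  write("%d top-level elements\n", sizeof(grouped));

  // Turn the grouped tokens back into a plain string.
  write("%s", Parser.C.simple_reconstitute(grouped));
  return 0;
}
```

Hiding whitespace before grouping keeps the grouped arrays free of separate whitespace entries, which makes it easier to match token sequences.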
Class Parser.C.Token
- Description
Represents a C token, along with a selection of associated data and operations.
- Variabletrailing_whitespaces
string
Parser.C.Token.trailing_whitespaces- Description
Trailing whitespaces.
- Method_sprintf
string sprintf(string format, ... Parser.C.Token arg ... )
- Description
If the object is printed as %s it will only output its text contents.
- Method`+
string
res =Parser.C.Token()
+s
- Description
A string can be added to the Token, which will be added to the text contents.
- Method`==
int
res =Parser.C.Token()
==foo
- Description
Tokens are considered equal if the text contents are equal. It is also possible to compare the Token object with a text string directly.
- Method`[]
int
|string
res =Parser.C.Token()
[a
]- Description
Characters and ranges may be indexed from the text contents of the token.
- Method``+
string
res =s
+Parser.C.Token()
- Description
A string can be added to the Token, which will be added to the text contents.
- Methodcast
(int)Parser.C.Token()
(float)Parser.C.Token()
(string)Parser.C.Token()
(array)Parser.C.Token()
(mapping)Parser.C.Token()
(multiset)Parser.C.Token()- Description
It is possible to cast a Token object to a string. The text content will be returned.
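The operators described for Token can be exercised as in the following minimal sketch. The input string and the file name are placeholders, and no whitespace hiding is applied, so the first token is assumed to be the leading word of the input.

```pike
int main()
{
  array tokens = Parser.C.tokenize(Parser.C.split("return x;\n"), "example.c");
  object t = tokens[0];

  // `== compares the text contents with a plain string.
  if (t == "return")
    write("keyword: %s\n", t);           // %s prints only the text contents

  // `[] indexes into the text contents.
  write("first character: %c\n", t[0]);

  // cast returns the text contents as a string.
  write("as string: %O\n", (string)t);
  return 0;
}
```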
Class Parser.C.UnterminatedCharacterError
- Description
Error thrown when an unterminated character token is encountered.
Class Parser.C.UnterminatedCommentError
- Description
Error thrown when an unterminated comment token is encountered.
Module Parser.ECMAScript
- Description
ECMAScript/JavaScript token parser based on ECMAScript 2017 (ECMA-262), chapter 11: Lexical Grammar.
Module Parser.LR
- Description
LALR(1) parser generator.
Enum Parser.LR.SeverityLevel
- Description
Severity level
Class Parser.LR.ErrorHandler
- Description
Class handling reporting of errors and warnings.
- Variableverbose
optional
int(-1..1)
Parser.LR.ErrorHandler.verbose- Description
Verbosity level
-1
Just errors.
0
Errors and warnings.
1
Also notices.
Class Parser.LR.Parser
- Description
This object implements an LALR(1) parser and compiler.
Normal use of this object would be:
set_error_handler
{add_rule, set_priority, set_associativity}*
set_symbol_to_string
compile
{parse}*
- Variableerror_handler
function
(SeverityLevel
,string
,string
,mixed
... :void
) Parser.LR.Parser.error_handler- Description
Compile error and warning handler.
- Variableknown_states
mapping
(string
:Kernel
) Parser.LR.Parser.known_states- Description
LR0 states that are already known to the compiler.
- Variables_q
StateQueue
|zero
Parser.LR.Parser.s_q- Description
Contains all states used. In the queue section are the states that remain to be compiled.
- Method_sprintf
string sprintf(string format, ... Parser.LR.Parser arg ... )
- Description
Pretty-prints the current grammar to a string.
- Methodcast
(int)Parser.LR.Parser()
(float)Parser.LR.Parser()
(string)Parser.LR.Parser()
(array)Parser.LR.Parser()
(mapping)Parser.LR.Parser()
(multiset)Parser.LR.Parser()- Description
Implements casting.
- Parameter
type
Type to cast to.
- Methodcompile
int
compile()- Description
Compiles the grammar into a parser, so that parse() can be called.
- Methoditem_to_string
string
item_to_string(Item
i
)- Description
Pretty-prints an item to a string.
- Parameter
i
Item to pretty-print.
- Methodparse
mixed
parse(object
|function
(void
:string
|array
(string
|mixed
))scanner
,void
|object
action_object
)- Description
Parse the input according to the compiled grammar. The last value reduced is returned.
- Note
The parser must have been compiled (with compile()) prior to calling this function.
- Bugs
Errors should be throw()n.
- Parameter
scanner
The scanner function. It returns the next symbol from the input. It should either return a string (terminal) or an array with a string (terminal) and a mixed (value). EOF is indicated with the empty string.
- Parameter
action_object
Object used to resolve those actions that have been specified as strings.
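The scanner contract described above (a terminal string, or an array with a terminal and a value, and the empty string at EOF) can be illustrated without a grammar. In this minimal sketch, make_scanner and the terminal names "number" and "+" are hypothetical and chosen only for the example; the loop in main stands in for the parser's calls.

```pike
// Build a scanner that hands out pre-made symbols one at a time.
function make_scanner(array tokens)
{
  int pos = 0;
  return lambda() {
    if (pos >= sizeof(tokens)) return "";   // EOF is the empty string
    return tokens[pos++];
  };
}

int main()
{
  function scanner = make_scanner(({
    ({ "number", 1 }),    // terminal with an attached value
    "+",                  // plain terminal
    ({ "number", 2 }),
  }));

  // Drain the scanner the way a parser would, stopping at EOF.
  mixed sym;
  while ((sym = scanner()) != "")
    write("%O\n", sym);
  return 0;
}
```

A function built this way could be passed as the scanner argument of parse() once a grammar using the same terminal names has been compiled.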
- Methodrule_to_string
string
rule_to_string(Rule
r
)- Description
Pretty-prints a rule to a string.
- Parameter
r
Rule to print.
- Methodset_associativity
void
set_associativity(string
terminal
,int
assoc
)- Description
Sets the associativity of a terminal.
- Parameter
terminal
Terminal to set the associativity for.
- Parameter
assoc
Associativity; negative - left, positive - right, zero - no associativity.
- Methodset_error_handler
void
set_error_handler(void
|function
(SeverityLevel
,string
,string
,mixed
... :void
)handler
)- Description
Sets the error report function.
- Parameter
handler
Function to call to report errors and warnings. If zero or not specified, use the built-in function.
- Methodset_priority
void
set_priority(string
terminal
,int
pri_val
)- Description
Sets the priority of a terminal.
- Parameter
terminal
Terminal to set the priority for.
- Parameter
pri_val
Priority; higher = prefer this terminal.
- Methodset_symbol_to_string
void
set_symbol_to_string(void
|function
(int
|string
:string
)s_to_s
)- Description
Sets the symbol to string conversion function. The conversion function is used by the various *_to_string functions to make comprehensible output.
- Parameter
s_to_s
Symbol to string conversion function. If zero or not specified, use the built-in function.
- Methodstate_to_string
string
state_to_string(Kernel
state
)- Description
Pretty-prints a state to a string.
- Parameter
state
State to pretty-print.
Class Parser.LR.Parser.Item
- Description
An LR(0) item, a partially parsed rule.
- Variabledirect_lookahead
multiset
(string
) Parser.LR.Parser.Item.direct_lookahead- Description
Look-ahead set for this item.
- Variableerror_lookahead
multiset
(string
) Parser.LR.Parser.Item.error_lookahead- Description
Look-ahead set used for detecting conflicts
- Variableitem_id
int
Parser.LR.Parser.Item.item_id- Description
Used to identify the item. Equal to r->number + offset.
- Variablemaster_item
Item
|zero
Parser.LR.Parser.Item.master_item- Description
Item representing this one (used for shifts).
- Variablenext_state
Kernel
|zero
Parser.LR.Parser.Item.next_state- Description
The state we will get if we shift according to this rule
- Variablenumber
int
Parser.LR.Parser.Item.number- Description
Item identification number (used when compiling).
- Variableoffset
int
Parser.LR.Parser.Item.offset- Description
How long into the rule the parsing has come.
Class Parser.LR.Parser.Kernel
- Description
Implements an LR(1) state
- Variableaction
mapping
(int
|string
:Kernel
|Rule
) Parser.LR.Parser.Kernel.action- Description
The action table for this state
object(kernel) SHIFT to this state on this symbol.
object(rule) REDUCE according to this rule on this symbol.
- Variableclosure_set
multiset
Parser.LR.Parser.Kernel.closure_set- Description
The symbols that closure has been called on.
- Variableitem_id_to_item
mapping
(int
:Item
) Parser.LR.Parser.Kernel.item_id_to_item- Description
Used to lookup items given rule and offset
- Variablerules
multiset
(Rule
) Parser.LR.Parser.Kernel.rules- Description
Used to check if a rule already has been added when doing closures.
- Variablesymbol_items
mapping
(int
:multiset
(Item
)) Parser.LR.Parser.Kernel.symbol_items- Description
Contains the items whose next symbol is this non-terminal.
- Methodclosure
void
closure(int
nonterminal
)- Description
Make the closure of this state.
- Parameter
nonterminal
Nonterminal to make the closure on.
- Methoddo_goto
Kernel
do_goto(int
|string
symbol
)- Description
Generates the state reached when doing goto on the specified symbol, i.e. it compiles the LR(0) state.
- Parameter
symbol
Symbol to make goto on.
Class Parser.LR.Parser.StateQueue
- Description
This is a queue, which keeps the elements even after they are retrieved.
Class Parser.LR.Priority
- Description
Specifies the priority and associativity of a rule.
Class Parser.LR.Rule
- Description
This object is used to represent a BNF-rule in the LR parser.
- Variableaction
function
(:void
)|string
|zero
Parser.LR.Rule.action- Description
Action to do when reducing this rule.
function - call this function.
string - call this function by name in the object given to the parser.
The function is called with arguments corresponding to the values of the elements of the rule. The return value of the function will be the value of this non-terminal. The default rule is to return the first argument.
- Variablenum_nonnullables
int
Parser.LR.Rule.num_nonnullables- Description
This rule has this many non-nullable symbols at the moment.
- Variablenumber
int
Parser.LR.Rule.number- Description
Sequence number of this rule (used for conflict resolution). Also used to identify the rule.
- Methodcreate
Parser.LR.Rule Parser.LR.Rule(
int
nt
,array
(string
|int
)r
,function
(:void
)|string
|void
a
)- Description
Create a BNF rule.
- Example
The rule
rule : nonterminal ":" symbols ";" { add_rule };
might be created as
rule(4, ({ 9, ":", 5, ";" }), "add_rule");
where 4 corresponds to the nonterminal "rule", 9 to "nonterminal" and 5 to "symbols", and the function "add_rule" is to be called when this rule is reduced.
- Parameter
nt
Non-terminal to reduce to.
- Parameter
r
Symbol sequence that reduces to nt.
- Parameter
a
Action to do when reducing according to this rule.
function - Call this function.
string - Call this function by name in the object given to the parser.
The function is called with arguments corresponding to the values of the elements of the rule. The return value of the function will become the value of this non-terminal. The default rule is to return the first argument.
Module Parser.LR.GrammarParser
- Description
This module generates an LR parser from a grammar specified according to the following grammar:
directives : directive ;
directives : directives directive ;
directive : declaration ;
directive : rule ;
declaration : "%token" terminals ";" ;
rule : nonterminal ":" symbols ";" ;
rule : nonterminal ":" symbols action ";" ;
symbols : symbol ;
symbols : symbols symbol ;
terminals : terminal ;
terminals : terminals terminal ;
symbol : nonterminal ;
symbol : "string" ;
action : "{" "identifier" "}" ;
nonterminal : "identifier" ;
terminal : "string";
- Methodmake_parser
Parser
make_parser(string
str
,object
|void
m
)- Description
Compiles the parser-specification given in the first argument. Named actions are taken from the object if available, otherwise left as is.
- Bugs
Returns error-code in both GrammarParser.error and return_value->lr_error.
Module Parser.Markdown
- Description
This is a port of the JavaScript Markdown parser 'Marked' https://github.com/chjj/marked. The only method you need to use is
parse()
, which transforms Markdown text to HTML.
For a description of Markdown, see the web page of the inventor of Markdown https://daringfireball.net/projects/markdown/.
- Methodencode_html
protected
string
encode_html(string
html
,void
|bool
enc
)- Description
HTML encode <>"'. If
enc
is true, & will also be encoded.
- Methodparse
string
parse(string
md
,void
|mapping
options
)- Description
Convert markdown
md
to html- Parameter
options
"gfm"
:bool
Enable Github Flavoured Markdown. (true)
"tables"
:bool
Enable GFM tables. Requires "gfm" (true)
"breaks"
:bool
Enable GFM "breaks". Requires "gfm" (false)
"pedantic"
:bool
Conform to obscure parts of markdown.pl as much as possible. Don't fix any of the original markdown bugs or poor behavior. (false)
"sanitize"
:bool
Sanitize the output. Ignore any HTML that has been input. (false)
"mangle"
:bool
Mangle (obfuscate) autolinked email addresses (true)
"smart_lists"
:bool
Use smarter list behavior than the original markdown. (true)
"smartypants"
:bool
Use "smart" typographic punctuation for things like quotes and dashes. (false)
"header_prefix"
:string
Add prefix to ID attributes of header tags (empty)
"xhtml"
:bool
Generate self closing XHTML tags (false)
"newline"
:bool
Add a newline after tags. If false the output will be on one line (well, newlines in text will be kept). (false)
"renderer"
:Renderer
Use this renderer to render output. (Renderer)
"lexer"
:Lexer
Use this lexer to parse blocks of text. (Lexer)
"inline_lexer"
:InlineLexer
Use this lexer to parse inline text. (InlineLexer)
"parser"
:Parser
Use this parser instead of the default. (Parser)
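A call to parse() with a few of the options listed above might look like the following minimal sketch. The Markdown input and the "doc-" prefix are arbitrary example values, and the exact HTML produced is not shown since it depends on the renderer.

```pike
int main()
{
  string md = "# Title\n\nSome *emphasised* text.\n";

  // Options not given here keep the defaults listed in the table above.
  string html = Parser.Markdown.parse(md, ([
    "xhtml": 1,               // self-closing XHTML tags
    "newline": 1,             // newline after tags instead of one long line
    "header_prefix": "doc-",  // prefix for ID attributes of header tags
  ]));

  write("%s\n", html);
  return 0;
}
```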
- Methodreplace1
protected
string
replace1(string
subject
,string
from
,string
to
)- Description
Replaces the first occurrence of
from
in subject
with to
Class Parser.Markdown.InlineLexer
- Description
Lexer used for inline text (eg bold text inside a paragraph).
Class Parser.Markdown.Lexer
- Description
Block-level lexer (parses paragraphs, lists, tables, etc).
Class Parser.Markdown.Parser
- Description
Top-level parsing handler. It's usually easier to replace the Renderer instead.
Class Parser.Markdown.Renderer
- Methodattrs
string
attrs(mapping
token
,mapping
|void
dflt
)- Description
Collect additional attributes from the token and render them as HTML attributes. Default attributes can be provided.
- Methodhtml
Methodtext
Methodstrong
Methodem
Methoddel
Methodcodespan
Methodbr string
html(string
text
,mapping
token
)string
text(string
t
,mapping
token
)string
strong(string
t
,mapping
token
)string
em(string
t
,mapping
token
)string
del(string
t
,mapping
token
)string
codespan(string
t
,mapping
token
)string
br(mapping
token
)
Module Parser.Pike
- Description
This module parses and tokenizes Pike source code.
Module Parser.Python
Module Parser._parser
- Description
Low-level helpers for parsers.
- Note
You probably don't want to use the modules contained in this module directly, but instead use the other
Parser
modules. See instead the modules below.- See also
Parser
,Parser.C
,Parser.Pike
,Parser.RCS
,Parser.HTML
,Parser.XML
Module Parser._parser._C
- Description
Low-level helpers for
Parser.C
.- Note
You probably want to use
Parser.C
instead of this module.- See also
Parser.C
,_Pike
.
Module Parser._parser._Pike
- Description
Low-level helpers for
Parser.Pike
.- Note
You probably want to use
Parser.Pike
instead of this module.- See also
Parser.Pike
,_C
.
Module Parser._parser._RCS
- Description
Low-level helpers for
Parser.RCS
.- Note
You probably want to use
Parser.RCS
instead of this module.- See also
Parser.RCS
- Methoddecode_numeric_xml_entity