Uses
- find elements within an XML document
- test whether particular element(s) exist in an XML document
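A minimal sketch of both uses, assuming Python's standard-library `xml.etree.ElementTree`, which implements only a limited subset of XPath. The sample document and its element names are invented for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document, invented for this sketch.
doc = ET.fromstring(
    "<library>"
    "<book id='1'><title>Dune</title></book>"
    "<book id='2'><title>Emma</title></book>"
    "</library>"
)

# Use 1: find elements within the document.
titles = [t.text for t in doc.findall(".//title")]

# Use 2: test whether a particular element exists.
has_second_book = doc.find(".//book[@id='2']") is not None
has_magazine = doc.find(".//magazine") is not None
```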
Background
-
A DSL for selecting nodes from an XML document
-
Spec:
- Versions (1.0 is the de facto default)
- As of 2020-04-09, current version is 3.1 https://www.w3.org/TR/xpath-3/
- 1.0 spec: https://www.w3.org/TR/1999/REC-xpath-19991116/
- Many/most implementations only support 1.0
- e.g. Nokogiri depends on libxml2, which supports only 1.0 and has no plans for 2.0 support because the spec changed a lot from 1.0 to 2.0
- e.g. Firefox Gecko implements 1.0
- defines path expressions
-
can be used with any XML-like document: XML, HTML, SVG
-
implementations exist for most language stacks
- not all browsers/stacks implement all features
-
xpath has functions e.g.
position()
- lets you choose an element by position
number()
string()
string-length()
- ...
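A small sketch of selecting by position, assuming Python's standard-library `xml.etree.ElementTree`. In XPath 1.0 a bare number in a predicate, such as `item[2]`, is shorthand for `item[position()=2]`. As far as I know ElementTree accepts only the shorthand and `last()`, not the `position()` call itself, and it has no `string-length()`, so that one is emulated in plain Python. The sample document is made up.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document, invented for this sketch.
doc = ET.fromstring("<list><item>a</item><item>b</item><item>c</item></list>")

# item[2] abbreviates item[position()=2]; positions start at 1, not 0.
second = doc.find("item[2]").text

# last() is a core function that returns the context size.
final = doc.find("item[last()]").text

# ElementTree has no string-length(), so emulate string-length(item[1]) by hand.
first_length = len(doc.find("item[1]").text)
```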
-
syntax
- starting points for the search
//
- recursive descent operator
- starting the path with
//
means starting anywhere within the document
/
- selects the document root
- ignores whatever context node you might have passed in to begin with
- attribute names begin with
@
- conditions go within
[]
- logical AND: chain the conditions one after another e.g.
//things[@type="good"][@size="big"]
returns all <things> elements which have the attributes type="good" AND size="big"
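- OR: use the or operator inside a single predicate e.g. //things[@type="good" or @size="big"]
- NOT: wrap the condition in the not() function e.g. //things[not(@type="good")]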
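The syntax notes above can be tried out with a sketch like the following, assuming Python's standard-library `xml.etree.ElementTree`. It supports only part of XPath and wants a leading `.` before `//` when searching from an element. The sample document is invented.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document, invented for this sketch.
doc = ET.fromstring(
    "<root>"
    "<things type='good' size='big'>A</things>"
    "<things type='good' size='small'>B</things>"
    "<box><things type='good' size='big'>C</things></box>"
    "</root>"
)

# ".//" searches every descendant of the context node (doc here).
all_things = doc.findall(".//things")

# Two predicates in a row act as a logical AND.
good_and_big = doc.findall(".//things[@type='good'][@size='big']")

# "@size" on its own tests only that the attribute exists.
sized = doc.findall(".//things[@size]")
```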
-
axes
- there are 13 axes
- an axis represents a relationship to the context node and is used to locate nodes relative to that node in the tree
- not all implementations support all axes
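To make "a relationship to the context node" concrete, here is a rough sketch that emulates a few axes by hand on top of Python's standard-library `xml.etree.ElementTree`. The helper names `ancestors` and `following_siblings` are hypothetical, and ElementTree has no document root node, so the emulated ancestor axis stops at the top element.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document, invented for this sketch.
doc = ET.fromstring("<a><b><c/><d/><e/></b><f/></a>")

# ElementTree elements do not store their parent, so build a lookup table.
parent_of = {child: parent for parent in doc.iter() for child in parent}

def ancestors(node):
    """Emulate the ancestor axis: parent, grandparent, and so on upward."""
    found = []
    while node in parent_of:
        node = parent_of[node]
        found.append(node)
    return found

def following_siblings(node):
    """Emulate the following-sibling axis: later children of the same parent."""
    parent = parent_of.get(node)
    if parent is None:
        return []
    siblings = list(parent)
    return siblings[siblings.index(node) + 1:]

context = doc.find(".//d")                            # the context node
ancestor_names = [n.tag for n in ancestors(context)]
sibling_names = [n.tag for n in following_siblings(context)]
child_names = [n.tag for n in doc.find("b")]          # child axis of <b>
descendant_names = [n.tag for n in doc.iter()][1:]    # descendant axis of <a>
```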
-
models the document as a tree of nodes
-
types of nodes
- element nodes
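- attribute nodes
- text nodes
- also root, namespace, processing instruction, and comment nodes (XPath 1.0 defines seven node types in total)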
-
defines a way to compute a string value for each type of node
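A short sketch of the string value for three node types, assuming Python's standard-library `xml.etree.ElementTree`. ElementTree has no separate attribute or text node objects, so the last two values only approximate what XPath would return. The sample markup is invented.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample element, invented for this sketch.
para = ET.fromstring("<p id='intro'>Hello <b>big</b> world</p>")

# Element node: concatenate every descendant text node in document order.
element_value = "".join(para.itertext())

# Attribute node: the attribute's value.
attribute_value = para.get("id")

# Text node: its character data (ElementTree keeps it on .text and .tail).
first_text_value = para.text
```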
-
some types of nodes have "names"
- a "name" is a tuple of (local-part, namespace-uri)
- the namespace-uri can be null
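One way to see the (local-part, namespace-uri) tuple is through Python's standard-library `xml.etree.ElementTree`, which stores an element name as `{uri}local`. The helper `expanded_name` and the sample document are hypothetical; the SVG namespace URI is the real one.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"

# Hypothetical sample document mixing namespaced and plain elements.
doc = ET.fromstring(f"<root xmlns:s='{SVG_NS}'><s:circle/><plain/></root>")

def expanded_name(tag):
    """Split ElementTree's '{uri}local' tag into (local-part, namespace-uri)."""
    if tag.startswith("{"):
        uri, local = tag[1:].split("}", 1)
        return (local, uri)
    return (tag, None)  # no namespace: the namespace-uri is null

names = [expanded_name(el.tag) for el in doc.iter()]

# Queries match on the expanded name, so the prefix used here ("svg") need not
# be the prefix used in the document ("s").
circles = doc.findall("svg:circle", namespaces={"svg": SVG_NS})
```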
-
xpath expressions return one of 4 basic types when evaluated
- node-set
- unordered collection of nodes without duplicates
- boolean (true or false)
- number (a floating-point number)
- string (a sequence of UCS characters)
-
note that "a single node" is not a basic type - nodes are always in a node-set
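The four basic types convert into one another by fixed rules. As a sketch, here is the XPath 1.0 conversion to boolean written in plain Python; `xpath_boolean` is a hypothetical helper, and a node-set is represented as a plain list.

```python
import math

def xpath_boolean(value):
    """Convert one of the four basic types to a boolean, XPath 1.0 style."""
    if isinstance(value, bool):
        return value
    if isinstance(value, (int, float)):
        # A number is true unless it is zero or NaN.
        return not (value == 0 or math.isnan(value))
    if isinstance(value, str):
        # A string is true unless it is empty.
        return len(value) > 0
    # Anything else is treated as a node-set (a list of nodes in this sketch):
    # true when it contains at least one node.
    return len(value) > 0
```

This is why a bare path inside a predicate works as an existence test: an empty node-set converts to false. Note also that the string "false" is true, because it is non-empty.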
-
expressions are evaluated with respect to a context (think of it like "context" in the sense of evaluating a function in a language with lexical scope)
-
context consists of
- a node (the context node)
- a pair of positive integers (context-position, context-size)
- context-position is always less than or equal to context size
- a set of variable bindings
- a mapping from variable names to variable values
- variable values can have any type valid for the return value of an expression, and may also have additional types that XPath itself does not specify
- a function library
- a mapping of function names to functions
- each function takes 0 or more args
- each function returns one of the 4 basic types (see above)
- the spec defines a set of core functions
- XSLT and XPointer define additional functions and data types
- a set of namespace declarations in scope for the expression
- a mapping from prefixes to namespace URIs
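The pieces of the context can be pictured as one record. This is a minimal sketch in plain Python; the class `XPathContext` and its field names are hypothetical and do not come from any real XPath library.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict

@dataclass
class XPathContext:
    node: Any                      # the context node
    position: int = 1              # context position (1-based)
    size: int = 1                  # context size
    variables: Dict[str, Any] = field(default_factory=dict)   # name -> value
    functions: Dict[str, Callable[..., Any]] = field(default_factory=dict)
    namespaces: Dict[str, str] = field(default_factory=dict)  # prefix -> URI

    def __post_init__(self):
        # The position is at least 1 and never exceeds the size.
        if not 1 <= self.position <= self.size:
            raise ValueError("context position must satisfy 1 <= position <= size")

ctx = XPathContext(
    node="<para>",  # placeholder standing in for a real node
    position=2,
    size=5,
    variables={"limit": 10.0},
    functions={"string-length": lambda s: float(len(s))},
    namespaces={"svg": "http://www.w3.org/2000/svg"},
)
```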
-
a location path expression is a sequence of location steps, each separated by
/
-
each node selected by a step becomes, in turn, the context node for the location step to its right e.g.
- example:
a/b/c
- each node selected by a becomes, in turn, the context node for b
- each node selected by b becomes, in turn, the context node for c, and so on
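The step-by-step evaluation of `a/b/c` can be sketched by hand and compared with a real evaluator. This assumes Python's standard-library `xml.etree.ElementTree`; `evaluate_child_steps` is a hypothetical helper that handles only plain child steps with element names, and the sample document is invented.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document, invented for this sketch.
doc = ET.fromstring(
    "<root>"
    "<a><b><c>1</c><c>2</c></b><b><c>3</c></b></a>"
    "<a><x><c>skipped</c></x></a>"
    "</root>"
)

def evaluate_child_steps(context_node, path):
    """Evaluate a path of plain child steps such as 'a/b/c', one step at a time."""
    current = [context_node]
    for name in path.split("/"):
        selected = []
        for node in current:            # each selected node becomes the context node
            for child in node:          # child axis
                if child.tag == name:   # node test on the element name
                    selected.append(child)
        # No de-duplication is needed here: a child has exactly one parent.
        current = selected
    return current

step_by_step = [c.text for c in evaluate_child_steps(doc, "a/b/c")]
built_in = [c.text for c in doc.findall("a/b/c")]
```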
-
each location step has 3 parts:
- an axis
- specifies the tree relationship between the nodes selected by this location step and the context node
- a node test
- specifies node type and expanded name of the nodes selected by this location step
- zero or more predicates
- use arbitrary expressions to further refine the set of nodes selected
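The three parts are easiest to see in the unabbreviated syntax `axis::node-test[predicate]`. Below is a rough sketch that splits a single step into those parts with a regular expression. `split_step` is a hypothetical helper, not a real XPath parser: it ignores nested brackets, brackets inside string literals, and the `.` and `..` abbreviations.

```python
import re

STEP = re.compile(
    r"^(?:(?P<axis>[a-z-]+)::)?"            # optional axis, e.g. "child::"
    r"(?P<node_test>[^\[\]]+)"              # node test, e.g. "para" or "text()"
    r"(?P<predicates>(?:\[[^\[\]]*\])*)$"   # zero or more [predicates]
)

def split_step(step):
    """Split one location step into (axis, node test, list of predicates)."""
    match = STEP.match(step)
    if match is None:
        raise ValueError(f"not a simple location step: {step!r}")
    axis = match.group("axis")
    node_test = match.group("node_test")
    if axis is None and node_test.startswith("@"):
        axis, node_test = "attribute", node_test[1:]  # "@" abbreviates attribute::
    elif axis is None:
        axis = "child"                                # child is the default axis
    predicates = re.findall(r"\[([^\[\]]*)\]", match.group("predicates"))
    return axis, node_test, predicates
```

For example, `split_step("child::para[position()=1]")` separates the axis `child`, the node test `para`, and the single predicate `position()=1`.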