Latest commit

889a2a5 · Oct 26, 2020

xpath.md

File metadata and controls

95 lines (88 loc) · 4.16 KB

XPath

Uses

  1. find elements within an XML document
  2. test whether particular element(s) exist in an XML document
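Both uses can be sketched with Python's standard-library `xml.etree.ElementTree`, which implements only a subset of XPath. The sample document and its element names are invented for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document, made up for this example.
doc = ET.fromstring(
    "<library>"
    "<book id='1'><title>Dune</title></book>"
    "<book id='2'><title>Emma</title></book>"
    "</library>"
)

# Use 1: find elements. findall() returns every match in document order.
titles = [t.text for t in doc.findall("./book/title")]

# Use 2: test for existence. find() returns None when nothing matches.
has_book_2 = doc.find("./book[@id='2']") is not None
has_book_9 = doc.find("./book[@id='9']") is not None

print(titles, has_book_2, has_book_9)
```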

Background

  • A DSL for selecting nodes from an XML document

  • Spec:

  • can be used with any XML-like document: XML, HTML, SVG

  • implementations exist for most language stacks

    • not all browsers/stacks implement all features
  • xpath has functions e.g.

    • position()
      • lets you choose an element by position
    • number()
    • string()
    • string-length()
    • ...
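A small sketch of choosing an element by position, again using Python's `xml.etree.ElementTree`. As far as I know this library does not accept a literal `position()` call, which is an example of a stack implementing only part of XPath. It does accept the numeric shorthand `[1]` (which full XPath defines as short for `[position()=1]`) and `last()`. The sample data is hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data.
root = ET.fromstring("<list><item>a</item><item>b</item><item>c</item></list>")

# Positions are 1-based, not 0-based.
first = root.find("./item[1]").text
last = root.find("./item[last()]").text
second_to_last = root.find("./item[last()-1]").text

print(first, last, second_to_last)
```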
  • syntax

    • starting points for the search
      • //
        • recursive descent operator
        • starting the path with // means starting anywhere within the document
      • /
        • selects the document root
        • ignores whatever context node you might have passed in to begin with
    • attribute names begin with @
    • conditions go within []
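      • logical AND: chain predicates one after another e.g. //thing[@type="good"][@size="big"] returns all <thing> elements that have both type="good" AND size="big"; the and operator inside a single predicate is equivalent here: //thing[@type="good" and @size="big"]
      • logical OR: use the or operator inside a single predicate e.g. //thing[@type="good" or @size="big"]
      • NOT: wrap the condition in the not() function e.g. //thing[not(@type="good")]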
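A minimal sketch of the starting points and chained predicates, using Python's `xml.etree.ElementTree`. Two caveats: this library wants paths on an element to be relative, so `.//` stands in for a leading `//`, and its XPath subset has no `or` operator or `not()` function, so only AND-by-chaining is shown. The sample document is hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data; the name attribute only labels the elements.
root = ET.fromstring(
    "<things>"
    "<thing type='good' size='big' name='a'/>"
    "<thing type='good' size='small' name='b'/>"
    "<group><thing type='good' size='big' name='c'/></group>"
    "<thing type='bad' size='big' name='d'/>"
    "</things>"
)

# ".//" searches every descendant of the context node.
anywhere = [t.get("name") for t in root.findall(".//thing[@type='good'][@size='big']")]

# "./" looks at direct children only, so the nested <thing> is not found.
children_only = [t.get("name") for t in root.findall("./thing[@type='good'][@size='big']")]

print(anywhere, children_only)
```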
  • axes

    • there are 13 axes
    • an axis represents a relationship to the context node and is used to locate nodes relative to that node in the tree
    • not all implementations support all axes
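To make the idea of an axis concrete, here is a sketch of three of them using the abbreviated forms that Python's `xml.etree.ElementTree` understands (it does not accept the long `axis::name` spelling, as far as I know). The tiny document is hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: <a> has two <b> children, each with one child.
root = ET.fromstring("<a><b><c/></b><b><d/></b></a>")

# child axis: the default when a step is just a name.
children = [e.tag for e in root.findall("./b")]

# descendant axis: reached through the // abbreviation.
descendants = [e.tag for e in root.findall(".//*")]

# parent axis: reached through the .. abbreviation.
parents_of_c = [e.tag for e in root.findall(".//c/..")]

print(children, descendants, parents_of_c)
```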
  • models the document as a tree of nodes

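  • types of nodes (XPath 1.0 defines 7)

    1. element nodes
    2. attribute nodes
    3. text nodes
    4. root nodes
    5. namespace nodes
    6. processing instruction nodes
    7. comment nodes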
  • defines a way to compute a string value for each type of node
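For example, XPath 1.0 defines the string value of an element node as the concatenation of all its descendant text nodes in document order. A rough equivalent in Python's `xml.etree.ElementTree` (the sample markup is hypothetical):

```python
import xml.etree.ElementTree as ET

# Hypothetical sample markup with text inside a nested element.
para = ET.fromstring("<p>Hello <b>big</b> world</p>")

# itertext() walks the text of the element and its descendants in document order.
string_value = "".join(para.itertext())

print(string_value)
```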

  • some types of nodes have "names"

    • a "name" is a tuple of (local-part, namespace-uri)
      • the namespace-uri can be null
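Python's `xml.etree.ElementTree` stores such a name in one string, `{namespace-uri}local-part`. The helper below, `expanded_name`, is a hypothetical function written for this note that splits it back into the tuple described above.

```python
import xml.etree.ElementTree as ET

def expanded_name(tag):
    """Split an ElementTree tag into (local-part, namespace-uri)."""
    if tag.startswith("{"):
        uri, local = tag[1:].split("}", 1)
        return (local, uri)
    # No namespace: the namespace-uri is null (None in Python).
    return (tag, None)

namespaced = ET.fromstring("<svg xmlns='http://www.w3.org/2000/svg'><g/></svg>")
plain = ET.fromstring("<note/>")

print(expanded_name(namespaced.tag))
print(expanded_name(plain.tag))
```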
  • xpath expressions return one of 4 basic types when evaluated

    1. node-set
      • unordered collection of nodes without duplicates
    2. boolean (true or false)
    3. number (a floating-point number)
    4. string (a sequence of UCS characters)
  • note that "a single node" is not a basic type - nodes are always in a node-set

  • expressions are evaluated with respect to a context (think of it like "context" in the sense of evaluating a function in a language with lexical scope)

  • context consists of

    1. a node (the context node)
    2. a pair of positive integers (context-position, context-size)
      • context-position is always less than or equal to context size
    3. a set of variable bindings
      • a mapping from variable names to variable values
      • variable values can have any type valid for the return value of an expression, and may also have additional types that the XPath spec does not define
    4. a function library
      • a mapping of function names to functions
      • each function takes 0 or more args
      • each function returns one of the 4 basic types (see above)
      • the spec defines a set of core functions
      • XSLT and XPointer define additional functions and data types
    5. a set of namespace declarations in scope for the expression
      • a mapping from prefixes to namespace URIs
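The five parts of the context can be modelled as a plain data structure. This is only an illustrative sketch: `XPathContext` and its field names are invented here and are not taken from the spec or from any library.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict
import xml.etree.ElementTree as ET

@dataclass
class XPathContext:
    """Hypothetical model of an XPath evaluation context."""
    node: ET.Element                     # 1. the context node
    position: int                        # 2. context-position ...
    size: int                            #    ... and context-size
    variables: Dict[str, Any] = field(default_factory=dict)               # 3. variable bindings
    functions: Dict[str, Callable[..., Any]] = field(default_factory=dict)  # 4. function library
    namespaces: Dict[str, str] = field(default_factory=dict)              # 5. prefix -> namespace URI

    def __post_init__(self):
        # context-position is at least 1 and never exceeds context-size.
        if not (1 <= self.position <= self.size):
            raise ValueError("need 1 <= position <= size")

ctx = XPathContext(
    node=ET.fromstring("<a/>"),
    position=1,
    size=3,
    functions={"string-length": lambda s: float(len(s))},
)
print(ctx.functions["string-length"]("abc"))
```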
  • a location path expression is a sequence of location steps, each separated by /

  • each node in the set selected by a step becomes, in turn, the context node for the location step to its right e.g.

    • example:
      a/b/c
      
    • each node selected by a becomes a context node for b
    • each node selected by b becomes a context node for c, and so on
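The step-by-step hand-off can be sketched by evaluating a/b/c manually. `evaluate_child_steps` is a hypothetical helper that handles only plain element names on the child axis; it is compared against the built-in `findall` of Python's `xml.etree.ElementTree`. The sample document is made up.

```python
import xml.etree.ElementTree as ET

def evaluate_child_steps(context_node, path):
    """Evaluate a path of plain names such as 'a/b/c', one step at a time."""
    node_set = [context_node]
    for name in path.split("/"):
        next_set = []
        # Every node selected so far acts as the context node for this step.
        for node in node_set:
            next_set.extend(child for child in node if child.tag == name)
        node_set = next_set
    return node_set

# Hypothetical sample document.
root = ET.fromstring(
    "<root>"
    "<a><b><c>1</c></b><b><c>2</c><x/></b></a>"
    "<a><b/><c>no</c></a>"
    "</root>"
)

manual = [c.text for c in evaluate_child_steps(root, "a/b/c")]
builtin = [c.text for c in root.findall("a/b/c")]
print(manual, builtin)
```

The `<c>no</c>` element is skipped because its parent is an a, not a b, so it is never reached through the b step.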
  • each location step has 3 parts:

    1. an axis
      • specifies the tree relationship between the nodes selected by this location step and the context node
    2. a node test
      • specifies node type and expanded name of the nodes selected by this location step
    3. zero or more predicates
      • use arbitrary expressions to further refine the set of nodes selected
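To show the three parts, here is a rough sketch that splits a single location step into axis, node test, and predicates. `split_step` and its regular expression are hypothetical and deliberately simplistic: nested brackets inside a predicate are not handled, and the only abbreviation understood is a leading @ for the attribute axis.

```python
import re

# axis:: (optional), then the node test, then zero or more [predicate] groups.
STEP = re.compile(
    r"^(?:(?P<axis>[a-z-]+)::)?(?P<test>[^\[\]]+)(?P<preds>(?:\[[^\[\]]*\])*)$"
)

def split_step(step):
    """Return (axis, node_test, predicates) for one location step."""
    match = STEP.match(step)
    if match is None:
        raise ValueError(f"cannot parse step: {step!r}")
    axis = match.group("axis")
    test = match.group("test")
    if axis is None:
        # With no explicit axis, @name means the attribute axis; otherwise child.
        if test.startswith("@"):
            axis, test = "attribute", test[1:]
        else:
            axis = "child"
    predicates = re.findall(r"\[([^\[\]]*)\]", match.group("preds"))
    return axis, test, predicates

print(split_step('child::thing[@type="good"][@size="big"]'))
print(split_step("descendant-or-self::node()"))
print(split_step("@id"))
```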