string.Template should allow inspection of identifiers #90465

benkehoe · 2022-01-08T20:12:30Z

BPO	46307
Nosy	@warsaw, @serhiy-storchaka, @miss-islington, @benkehoe
PRs	bpo-46307: Add string.Template.get_identifiers() method #30493

^{Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.}

Show more details

GitHub fields:

assignee = None
closed_at = <Date 2022-01-12.22:12:21.559>
created_at = <Date 2022-01-08.20:12:29.789>
labels = ['type-feature', 'library', '3.11']
title = 'string.Template should allow inspection of identifiers'
updated_at = <Date 2022-01-12.22:12:21.559>
user = 'https://github.com/benkehoe'

bugs.python.org fields:

activity = <Date 2022-01-12.22:12:21.559>
actor = 'AlexWaygood'
assignee = 'none'
closed = True
closed_date = <Date 2022-01-12.22:12:21.559>
closer = 'AlexWaygood'
components = ['Library (Lib)']
creation = <Date 2022-01-08.20:12:29.789>
creator = 'ben11kehoe'
dependencies = []
files = []
hgrepos = []
issue_num = 46307
keywords = ['patch']
message_count = 11.0
messages = ['410112', '410127', '410128', '410129', '410131', '410144', '410148', '410172', '410174', '410175', '410322']
nosy_count = 4.0
nosy_names = ['barry', 'serhiy.storchaka', 'miss-islington', 'ben11kehoe']
pr_nums = ['30493']
priority = 'normal'
resolution = 'fixed'
stage = 'resolved'
status = 'closed'
superseder = None
type = 'enhancement'
url = 'https://bugs.python.org/issue46307'
versions = ['Python 3.11']

benkehoe · 2022-01-08T20:12:30Z

Currently, the only thing that can be done with a string.Template instance and a mapping is either attempt to substitute with substitute() and catch a KeyError if some identifier has not been provided in the mapping, or substitute with safe_substitute() and not know whether all identifiers were provided.

I propose adding a method that returns the identifiers in the template. Because the template string and pattern are exposed, this is already possible as a separate function:

def get_identifiers(template):
    return list(
        set(
            filter(
                lambda v: v is not None,
                (mo.group('named') or mo.group('braced') 
                 for mo in template.pattern.finditer(template.template))
            )
        )
    )

However, this function is not easy for a user of string.Template to construct without learning how the template pattern works (which is documented but intended to be learned only when subclassing or modifying id patterns).

As a method on string.Template, this would enable use cases like more comprehensive error handling (e.g., finding all missing mapping keys at once) or interactive prompting.

warsaw · 2022-01-09T01:11:39Z

I've never personally needed this, but I could see where it could come in handy.

I wonder if __iter__() would be the right API for that? I wonder then if we should also implement __contains__()?

Would you be interested in creating a PR for the feature?

benkehoe · 2022-01-09T02:51:18Z

Happy to make a PR! In my mind I had been thinking it would be the get_identifiers() method with the implementation above, returning a list.

As for __iter__, I'm less clear on what that would look like:

t = string.Template(...)
for identifier in t:
  # what would I do here?
  # would it include repeats if they appear more than once in the template?

I guess there are two ways to think about it: one is "what identifiers are in this template?" which I think should return a list with no repeats, which I can then iterate over or check if a value is in it. The other is, "what are the contents of the template?" in the style of string.Formatter.parse().

Given that string.Template is supposed to be the "simple, no-frills" thing in comparison to string.Formatter, I see less use for the latter option.

warsaw · 2022-01-09T03:28:17Z

I think you’re right that the iterator API isn’t very helpful. I also agree that you probably really want to answer the “why identifiers are in this template?” question. As for repeats, there’s two ways to think about it. You could return all the identifiers in the order in which they’re found in the template (and you can unique-ify them if you want by passing that list to set()). But maybe you don’t really need that either.

get_identifiers() works for me!

On Jan 8, 2022, at 18:51, Ben Kehoe <report@bugs.python.org> wrote:

Ben Kehoe <ben@kehoe.io> added the comment:

Happy to make a PR! In my mind I had been thinking it would be the get_identifiers() method with the implementation above, returning a list.

As for __iter__, I'm less clear on what that would look like:

t = string.Template(...)
for identifier in t:

what would I do here?

would it include repeats if they appear more than once in the template?

I guess there are two ways to think about it: one is "what identifiers are in this template?" which I think should return a list with no repeats, which I can then iterate over or check if a value is in it. The other is, "what are the contents of the template?" in the style of string.Formatter.parse().

Given that string.Template is supposed to be the "simple, no-frills" thing in comparison to string.Formatter, I see less use for the latter option.

benkehoe · 2022-01-09T03:49:10Z

I opened a PR. By default, it raises an exception if there's an invalid identifier; there's a keyword argument raise_on_invalid to control that.

The implementation I have adds them to a set first, which means the order is not guaranteed. I'm of two minds about this: if there's a big template, you want to gather the identifiers in a set so uniqueness is checked immediately and O(1) and without duplication. On the other hand, if templates are never very big, you could build a list (in order) and check it, O(N) style, or build a duplicate list and set in parallel. Or build a big list and check it at the end.

I kind of think ordering doesn't matter? What would someone do with that information?

benkehoe · 2022-01-09T12:47:25Z

Having slept on it, I realized that if I was presenting interactive prompts for a template, I would expect the prompts to be in order that the identifiers appear in the template. Accordingly, I've updated the PR to maintain ordering.

serhiy-storchaka · 2022-01-09T13:53:49Z

What are the use cases for this feature?

benkehoe · 2022-01-09T22:20:42Z

The point is to be able to programmatically determine what is needed for a
successful substitute() call. A basic use case for this is better error
messages; calling substitute() with an incomplete mapping will tell you
only the first missing identifier it encounters; if you know all the
identifiers you can raise an error about all the missing identifiers.
Another error handling use case is checking whether the template is valid,
without needing to provide a complete mapping. A use case unrelated to
error handling that I’ve encountered a few times is interactive prompting
for template values, which you can only do if you can get a list of the
identifiers in the template.

serhiy-storchaka · 2022-01-09T22:39:57Z

The simplest way of collecting template names is to use a defaultdict:

>>> d = collections.defaultdict(str)
>>> string.Template('$a $b').substitute(d)
' '
>>> d.keys()
dict_keys(['a', 'b'])

You can use a custom mapping if you need special handling of absent keys.

benkehoe · 2022-01-09T22:48:03Z

That doesn’t really seem like a Pythonic way of extracting that
information? Nor does it seem like it would be an obvious trick for the
average developer to come up with. A method that provides the information
directly seems useful.

miss-islington · 2022-01-11T19:15:51Z

New changeset dce642f by Ben Kehoe in branch 'main':
bpo-46307: Add string.Template.get_identifiers() method (GH-30493)
dce642f

jgiacobbi · 2024-03-21T19:56:37Z

I have a use case that has been made more complicated by the decision not to list identifiers with duplicates. I am substituting identifiers with capture groups to match output strings. Since re does not allow repeated named capture groups, I have to use unnamed capture groups and match them to their identifiers after the fact. So that leaves three cases:

No repeats, i.e. len(groups) == len(ids)
Repeats in the template but not the groups, so deduplicating the capture groups matches the template since they are both in order
The worst case, repeats in the capture group contents that could fit into more than one identifier and are not actually duplicates.

If the identifiers were listed naively then every case would be case 1.

This is the code I have to work around this problem:

    def parse_groups(self, groups: tuple[str]) -> dict[str, str]:
        ids = self.template.get_identifiers()
        if len(ids) == len(groups):
            return dict(zip(ids, groups))

        # fastest way to dedup and preserve order in 3.7+
        unique = list(dict.fromkeys(groups))
        if len(ids) == len(unique):
            return dict(zip(ids, unique))

        known = set()
        matched = {}

        found = iter(groups)

        # ids and groups are in the order found in the string being matched
        # duplicates can be skipped, but sometimes a duplicate isn't a duplicate,
        # ex. (\d{2})/(\d{2}) - 15/15 - hour/minute
        for id in ids:
            try:
                while value := next(found):
                    if re.match(self.patterns[id], value):
                        matched[id] = value
                        known.add(value)
                        break
                    elif value in known:
                        break  # skip duplicates
            except StopIteration:
                break

        if len(ids) == len(matched):
            return matched

The pull request to change this behavior is simple and obvious, but I haven't made it because I have questions

Am I just wrong? Is there an easy way to do this that I am overlooking?
Is there a good reason not to leave it up to the consumer if they want to convert to a set or an ordered set? Doing it for them destroys information.
If both use cases are valuable then why not support both?

benkehoe mannequin added 3.11 only security fixes stdlib Python modules in the Lib dir type-feature A feature request or enhancement labels Jan 8, 2022

AlexWaygood closed this as completed Jan 12, 2022

ezio-melotti transferred this issue from another repository Apr 10, 2022

CAM-Gerlach mentioned this issue Oct 16, 2022

gh-95913 Add string section to Whatsnew with new Template methods #98311

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

string.Template should allow inspection of identifiers #90465

string.Template should allow inspection of identifiers #90465

benkehoe mannequin commented Jan 8, 2022

benkehoe mannequin commented Jan 8, 2022

warsaw commented Jan 9, 2022

benkehoe mannequin commented Jan 9, 2022

warsaw commented Jan 9, 2022

what would I do here?

would it include repeats if they appear more than once in the template?

benkehoe mannequin commented Jan 9, 2022

benkehoe mannequin commented Jan 9, 2022

serhiy-storchaka commented Jan 9, 2022

benkehoe mannequin commented Jan 9, 2022

serhiy-storchaka commented Jan 9, 2022

benkehoe mannequin commented Jan 9, 2022

miss-islington commented Jan 11, 2022

jgiacobbi commented Mar 21, 2024

string.Template should allow inspection of identifiers #90465

string.Template should allow inspection of identifiers #90465

Comments

benkehoe mannequin commented Jan 8, 2022

benkehoe mannequin commented Jan 8, 2022

warsaw commented Jan 9, 2022

benkehoe mannequin commented Jan 9, 2022

warsaw commented Jan 9, 2022

what would I do here?

would it include repeats if they appear more than once in the template?

benkehoe mannequin commented Jan 9, 2022

benkehoe mannequin commented Jan 9, 2022

serhiy-storchaka commented Jan 9, 2022

benkehoe mannequin commented Jan 9, 2022

serhiy-storchaka commented Jan 9, 2022

benkehoe mannequin commented Jan 9, 2022

miss-islington commented Jan 11, 2022

jgiacobbi commented Mar 21, 2024