More strict entity matching and better handle duplicate enum entity cases #1221
Pull Request Template
PR Checklist
Ran npm test locally and all tests are passing.
PR Description
This PR mainly adds a new feature that limits the matched entities to the ones defined by the provided utterances. This is mostly helpful when builtins are used: they match content on their own, so they can add entities that were not expected and that may overlap with defined entities. Because builtin extractors were executed before other entities, they could take precedence.
With the new NER setting considerOnlyIntentEntities (true/false), the matched entities can be strictly limited to the ones defined by the utterances of the matched intent. All other entities that might have matched are filtered out.
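The filtering idea can be illustrated with a minimal sketch. This is not the code from this PR: the helper names (entityNamesFromUtterances, filterEntities), the @name placeholder syntax, and the shape of the entity objects are assumptions made for illustration only. Only the setting name considerOnlyIntentEntities comes from the description above.

```javascript
// Hypothetical sketch, not the actual implementation in this PR.
// Assumption: utterances reference entities with "@name" placeholders.
function entityNamesFromUtterances(utterances) {
  const names = new Set();
  for (const utterance of utterances) {
    for (const match of utterance.matchAll(/@([A-Za-z0-9_]+)/g)) {
      names.add(match[1]);
    }
  }
  return names;
}

// Keep only entities that the matched intent's utterances define.
// With the flag off, every extracted entity is returned unchanged.
function filterEntities(entities, utterances, considerOnlyIntentEntities) {
  if (!considerOnlyIntentEntities) {
    return entities;
  }
  const allowed = entityNamesFromUtterances(utterances);
  return entities.filter((e) => allowed.has(e.entity));
}

// Utterances of the matched intent (illustrative data).
const intentUtterances = ['book a flight to @city', 'fly me to @city'];

// Illustrative extraction result: a builtin also matched a number.
const extracted = [
  { entity: 'city', sourceText: 'Paris' },
  { entity: 'number', sourceText: '2' },
];

const kept = filterEntities(extracted, intentUtterances, true);
console.log(kept.map((e) => e.entity)); // [ 'city' ]
```

In this sketch the unexpected builtin match (number) is dropped because no utterance of the matched intent references it.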
Additionally, the code now handles overlapping enum entities. If more than one is defined for the intent, the matching checks whether one has already been matched and then fills the other ones. In the past, a match was always assigned to "the first" enum entity with the relevant value. Now the entities are filled in the order they were added.
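A rough sketch of the "fill in the order they were added" behaviour, under stated assumptions: the function name assignInOrder, the entity names fromCity and toCity, and the match objects are hypothetical, and the sketch assumes all listed enum entities accept the same values. The real matching in this PR may differ.

```javascript
// Hypothetical sketch, not the actual implementation in this PR.
// `matches` are enum value hits in the order they appear in the text.
// `entityOrder` lists overlapping enum entities in the order they were added.
function assignInOrder(matches, entityOrder) {
  const taken = new Set();
  return matches.map((match) => {
    // Prefer the first entity that has not been filled yet;
    // fall back to the first entity when all are already filled.
    const entity =
      entityOrder.find((name) => !taken.has(name)) || entityOrder[0];
    taken.add(entity);
    return { ...match, entity };
  });
}

const assigned = assignInOrder(
  [{ sourceText: 'Madrid' }, { sourceText: 'Barcelona' }],
  ['fromCity', 'toCity']
);
console.log(assigned.map((r) => `${r.entity}=${r.sourceText}`));
// [ 'fromCity=Madrid', 'toCity=Barcelona' ]
```

Under the previous behaviour described above, both values would have been attributed to the first entity (fromCity in this example).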
The PR also adds many more tests for entity matching.