Multiple entities of same type/same values resolving incorrectly #1054

manikantag · 2022-01-20T17:26:19Z

Describe the bug
When the utterances have multiple entities of the same type (i.e., same values), then the entities are resolved incorrectly.

To Reproduce
Steps to reproduce the behavior:

Download this codesandbox: https://codesandbox.io/s/nlp-entities-test-k7mds (somehow can't run this online)
Extract the zip to 'nlp-entities-test' directory
cd nlp-entities-test
npm i
node .
Console message shows the entities resolved incorrectly

corpus.json:

{
  "name": "Forex Corpus",
  "locale": "en-US",
  "data": [
      {
          "intent": "GetForexRates",
          "utterances": [
              "convert from @ccyFrom to @ccyTo"
          ],
          "answers": [
              "{{ ccyFrom }} is equal to 76 {{ ccyTo }}"
          ]
      }
  ],
  "entities": {
      "ccyFrom": {
          "options": {
              "usd": ["USD", "Dollar"],
              "inr": ["INR", "rupee"]
          }
      },
      "ccyTo": {
          "options": {
              "usd": ["USD", "Dollar"],
              "inr": ["INR", "rupee"],
              "aud": ["AUD"],
              "gbp": ["GBP", "Pound"]
          }
      }
  }
}

Here, USD & INR are part of both ccyFrom and ccyTo. In my case, these 2 entities are actually of the same type. For this test purpose only I removed a few entries from ccyFrom.

When the test phrase is: convert from USD to INR, both the entities are resolved as same ccyFrom with alias as ccyFrom_0 = USD and ccyFrom_1 = INR. Below is the output:

(node:15384) [PINODEP008] PinoWarning: prettyPrint is deprecated, look at https://github.com/pinojs/pino-pretty for alternatives.
(Use `node --trace-warnings ...` to show where the warning was created)
Epoch 1 loss 1 time 0ms
Epoch 2 loss 0.9650657932707684 time 0ms
Epoch 3 loss 0.5299402324845675 time 0ms
Epoch 4 loss 0.19224289789421256 time 0ms
Epoch 5 loss 0.03955679344829571 time 0ms
Epoch 6 loss 0.0014750207906256687 time 0ms
Epoch 7 loss 0.002237304943064969 time 0ms
Epoch 8 loss 0.006008538506772707 time 0ms
Epoch 9 loss 0.005510054266446926 time 1ms
Epoch 10 loss 0.003098378075640447 time 0ms
Epoch 11 loss 0.0011584788091092459 time 0ms
Epoch 12 loss 0.00025270211351046876 time 0ms
Epoch 13 loss 0.000012782181064874715 time 0ms
{
 locale: 'en',
 utterance: 'convert from USD to INR',
 settings: undefined,
 languageGuessed: false,
 localeIso2: 'en',
 language: 'English',
 nluAnswer: {
   classifications: [ [Object] ],
   entities: undefined,
   explanation: undefined
 },
 classifications: [ { intent: 'GetForexRates', score: 1 } ],
 intent: 'GetForexRates',
 score: 1,
 domain: 'default',
 sourceEntities: [],
 entities: [
   {
     start: 13,
     end: 15,
     len: 3,
     levenshtein: 0,
     accuracy: 1,
     entity: 'ccyFrom', // <<<<<< resolved as 'ccyFrom' which is expected
     type: 'enum',
     option: 'usd',
     sourceText: 'USD',
     utteranceText: 'USD',
     alias: 'ccyFrom_0'
   },
   {
     start: 20,
     end: 22,
     len: 3,
     levenshtein: 0,
     accuracy: 1,
     entity: 'ccyFrom',  // <<<<<< resolved as 'ccyFrom', but 'ccyTo' is expected
     type: 'enum',
     option: 'inr',
     sourceText: 'INR',
     utteranceText: 'INR',
     alias: 'ccyFrom_1'
   }
 ],
 answers: [ { answer: 'USD is equal to 76 {{ ccyTo }}', opts: undefined } ],
 answer: 'USD is equal to 76 {{ ccyTo }}',
 actions: [],
 sentiment: {
   score: 0.625,
   numWords: 5,
   numHits: 2,
   average: 0.125,
   type: 'senticon',
   locale: 'en',
   vote: 'positive'
 }
}

If the test phrase is convert from USD to GBP, then the entities are resolved as expected to ccyFrom = USD and ccyTo = GBP as GBP is not present in ccyFrom options.

Expected behavior
Entities should be resolved by the entity names used in the utterances (in this case, matching utterance is 'convert from @ccyFrom to @ccyto')

Desktop (please complete the following information):
- OS: Windows 11
- Browser: NA
- Package version: 4.22.7
- Node version: 16.13.

The text was updated successfully, but these errors were encountered:

jankoenig · 2022-02-04T12:49:26Z

We're seeing this issue as well. Here is a discussion about it: https://community.jovo.tech/t/entities-resolved-incorrectly/1767/6?u=jan

Apollon77 · 2022-08-08T15:35:45Z

I think the reason is that entities are parsed just by searching for the words in the whole text. So the "order of the entities in the utterance" is not taken into account right now. Especially because of the support to detect "repeated entities" of the same type in such nunbered entities makes that very difficult.

I think one idea could be to combine an enum with "trim rules" (would be new feature), so allow to specify that e.g. before the enum always e.g. "to" needs to be existence. then the "ccyTo" wopuld have an options list AND a trim definition and only if both matches it would be allowed.

Currently if you define two entitiy types for the same entity name they are processed both and in the end you might end up with two entities found ... this would needed to change, so in fact kind of a breakign change ....

I could look into this because I'm currently working on entity stuff if wanted

Apollon77 · 2022-08-12T07:47:42Z

I prepare a PR for it - also in connection to #1174 ... but most likely PR nreeds to wait until my other 4 PRs are merged ... It starts to overlap code-wise, so else it gets a merge hell.

With the changes from the said PR we end in this matching:

"entities": [
        {
          "start": 13,
          "end": 15,
          "len": 3,
          "levenshtein": 0,
          "accuracy": 1,
          "entity": "ccyFrom",
          "type": "enum",
          "option": "usd",
          "sourceText": "USD",
          "utteranceText": "USD"
        },
        {
          "start": 20,
          "end": 22,
          "len": 3,
          "levenshtein": 0,
          "accuracy": 1,
          "entity": "ccyTo",
          "type": "enum",
          "option": "inr",
          "sourceText": "INR",
          "utteranceText": "INR"
        }
      ],

Apollon77 · 2022-11-30T08:02:20Z

should be adressed by #1221

rNineTheFirst · 2023-06-02T14:30:01Z

I think one idea could be to combine an enum with "trim rules" (would be new feature), so allow to specify that e.g. before the enum always e.g. "to" needs to be existence. then the "ccyTo" wopuld have an options list AND a trim definition and only if both matches it would be allowed.

Hi @Apollon77 is this feature already implemented and if so is there sample corpus for this? Thanks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Multiple entities of same type/same values resolving incorrectly #1054

Multiple entities of same type/same values resolving incorrectly #1054

manikantag commented Jan 20, 2022 •

edited

Loading

jankoenig commented Feb 4, 2022

Apollon77 commented Aug 8, 2022

Apollon77 commented Aug 12, 2022

Apollon77 commented Nov 30, 2022

rNineTheFirst commented Jun 2, 2023

Multiple entities of same type/same values resolving incorrectly #1054

Multiple entities of same type/same values resolving incorrectly #1054

Comments

manikantag commented Jan 20, 2022 • edited Loading

corpus.json:

jankoenig commented Feb 4, 2022

Apollon77 commented Aug 8, 2022

Apollon77 commented Aug 12, 2022

Apollon77 commented Nov 30, 2022

rNineTheFirst commented Jun 2, 2023

manikantag commented Jan 20, 2022 •

edited

Loading