Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Multiple entities of same type/same values resolving incorrectly #1054

Open
manikantag opened this issue Jan 20, 2022 · 5 comments
Open

Multiple entities of same type/same values resolving incorrectly #1054

manikantag opened this issue Jan 20, 2022 · 5 comments

Comments

@manikantag
Copy link

manikantag commented Jan 20, 2022

Describe the bug
When the utterances have multiple entities of the same type (i.e., same values), then the entities are resolved incorrectly.

To Reproduce
Steps to reproduce the behavior:

  1. Download this codesandbox: https://codesandbox.io/s/nlp-entities-test-k7mds (somehow can't run this online)
  2. Extract the zip to 'nlp-entities-test' directory
  3. cd nlp-entities-test
  4. npm i
  5. node .
  6. Console message shows the entities resolved incorrectly

corpus.json:

{
  "name": "Forex Corpus",
  "locale": "en-US",
  "data": [
      {
          "intent": "GetForexRates",
          "utterances": [
              "convert from @ccyFrom to @ccyTo"
          ],
          "answers": [
              "{{ ccyFrom }} is equal to 76 {{ ccyTo }}"
          ]
      }
  ],
  "entities": {
      "ccyFrom": {
          "options": {
              "usd": ["USD", "Dollar"],
              "inr": ["INR", "rupee"]
          }
      },
      "ccyTo": {
          "options": {
              "usd": ["USD", "Dollar"],
              "inr": ["INR", "rupee"],
              "aud": ["AUD"],
              "gbp": ["GBP", "Pound"]
          }
      }
  }
}

Here, USD & INR are part of both ccyFrom and ccyTo. In my case, these 2 entities are actually of the same type. For this test purpose only I removed a few entries from ccyFrom.

When the test phrase is: convert from USD to INR, both the entities are resolved as same ccyFrom with alias as ccyFrom_0 = USD and ccyFrom_1 = INR. ​Below is the output:

(node:15384) [PINODEP008] PinoWarning: prettyPrint is deprecated, look at https://github.com/pinojs/pino-pretty for alternatives.
(Use `node --trace-warnings ...` to show where the warning was created)
Epoch 1 loss 1 time 0ms
Epoch 2 loss 0.9650657932707684 time 0ms
Epoch 3 loss 0.5299402324845675 time 0ms
Epoch 4 loss 0.19224289789421256 time 0ms
Epoch 5 loss 0.03955679344829571 time 0ms
Epoch 6 loss 0.0014750207906256687 time 0ms
Epoch 7 loss 0.002237304943064969 time 0ms
Epoch 8 loss 0.006008538506772707 time 0ms
Epoch 9 loss 0.005510054266446926 time 1ms
Epoch 10 loss 0.003098378075640447 time 0ms
Epoch 11 loss 0.0011584788091092459 time 0ms
Epoch 12 loss 0.00025270211351046876 time 0ms
Epoch 13 loss 0.000012782181064874715 time 0ms
{
 ​locale: 'en',
 ​utterance: 'convert from USD to INR',
 ​settings: undefined,
 ​languageGuessed: false,
 ​localeIso2: 'en',
 ​language: 'English',
 ​nluAnswer: {
   ​classifications: [ [Object] ],
   ​entities: undefined,
   ​explanation: undefined
 ​},
 ​classifications: [ { intent: 'GetForexRates', score: 1 } ],
 ​intent: 'GetForexRates',
 ​score: 1,
 ​domain: 'default',
 ​sourceEntities: [],
 ​entities: [
   ​{
     ​start: 13,
     ​end: 15,
     ​len: 3,
     ​levenshtein: 0,
     ​accuracy: 1,
     ​entity: 'ccyFrom', // <<<<<< resolved as 'ccyFrom' which is expected
     ​type: 'enum',
     ​option: 'usd',
     ​sourceText: 'USD',
     ​utteranceText: 'USD',
     ​alias: 'ccyFrom_0'
   ​},
   ​{
     ​start: 20,
     ​end: 22,
     ​len: 3,
     ​levenshtein: 0,
     ​accuracy: 1,
     ​entity: 'ccyFrom',  // <<<<<< resolved as 'ccyFrom', but 'ccyTo' is expected
     ​type: 'enum',
     ​option: 'inr',
     ​sourceText: 'INR',
     ​utteranceText: 'INR',
     ​alias: 'ccyFrom_1'
   ​}
 ​],
 ​answers: [ { answer: 'USD is equal to 76 {{ ccyTo }}', opts: undefined } ],
 ​answer: 'USD is equal to 76 {{ ccyTo }}',
 ​actions: [],
 ​sentiment: {
   ​score: 0.625,
   ​numWords: 5,
   ​numHits: 2,
   ​average: 0.125,
   ​type: 'senticon',
   ​locale: 'en',
   ​vote: 'positive'
 ​}
}

If the test phrase is convert from USD to GBP, then the entities are resolved as expected to ccyFrom = USD and ccyTo = GBP as GBP is not present in ccyFrom options.

Expected behavior
Entities should be resolved by the entity names used in the utterances (in this case, matching utterance is 'convert from @ccyFrom to @ccyto')

Desktop (please complete the following information):
​- OS: Windows 11
​- Browser: NA
​- Package version: 4.22.7
​- Node version: 16.13.

@jankoenig
Copy link

We're seeing this issue as well. Here is a discussion about it: https://community.jovo.tech/t/entities-resolved-incorrectly/1767/6?u=jan

@Apollon77
Copy link
Contributor

I think the reason is that entities are parsed just by searching for the words in the whole text. So the "order of the entities in the utterance" is not taken into account right now. Especially because of the support to detect "repeated entities" of the same type in such nunbered entities makes that very difficult.

I think one idea could be to combine an enum with "trim rules" (would be new feature), so allow to specify that e.g. before the enum always e.g. "to" needs to be existence. then the "ccyTo" wopuld have an options list AND a trim definition and only if both matches it would be allowed.

Currently if you define two entitiy types for the same entity name they are processed both and in the end you might end up with two entities found ... this would needed to change, so in fact kind of a breakign change ....

I could look into this because I'm currently working on entity stuff if wanted

@Apollon77
Copy link
Contributor

I prepare a PR for it - also in connection to #1174 ... but most likely PR nreeds to wait until my other 4 PRs are merged ... It starts to overlap code-wise, so else it gets a merge hell.

With the changes from the said PR we end in this matching:

"entities": [
        {
          "start": 13,
          "end": 15,
          "len": 3,
          "levenshtein": 0,
          "accuracy": 1,
          "entity": "ccyFrom",
          "type": "enum",
          "option": "usd",
          "sourceText": "USD",
          "utteranceText": "USD"
        },
        {
          "start": 20,
          "end": 22,
          "len": 3,
          "levenshtein": 0,
          "accuracy": 1,
          "entity": "ccyTo",
          "type": "enum",
          "option": "inr",
          "sourceText": "INR",
          "utteranceText": "INR"
        }
      ],

@Apollon77
Copy link
Contributor

should be adressed by #1221

@rNineTheFirst
Copy link

I think one idea could be to combine an enum with "trim rules" (would be new feature), so allow to specify that e.g. before the enum always e.g. "to" needs to be existence. then the "ccyTo" wopuld have an options list AND a trim definition and only if both matches it would be allowed.

Hi @Apollon77 is this feature already implemented and if so is there sample corpus for this? Thanks

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants