!!! note
    To run this notebook in JupyterLab, load [`examples/ex0_0.ipynb`](https://github.com/DerwenAI/textgraphs/blob/main/examples/ex0_0.ipynb)

-
+

# demo: TextGraphs + LLMs to construct a 'lemma graph'

@@ -37,34 +37,34 @@ import textgraphs
%watermark
```

- Last updated: 2024-01-03T13:12:25.022356-08:00
-
+ Last updated: 2024-01-09T13:34:52.709527-08:00
+
Python implementation: CPython
Python version       : 3.10.11
- IPython version      : 8.18.1
-
+ IPython version      : 8.20.0
+
Compiler    : Clang 13.0.0 (clang-1300.0.29.30)
OS          : Darwin
Release     : 21.6.0
Machine     : x86_64
Processor   : i386
CPU cores   : 8
Architecture: 64bit
-
+



``` python
%watermark --iversions
```

+ textgraphs: 0.3.2.dev3+gaea63b7.d20240108
+ pyvis     : 0.3.2
sys       : 3.10.11 (v3.10.11:7d4cc5aa85, Apr 4 2023, 19:05:19) [Clang 13.0.0 (clang-1300.0.29.30)]
- matplotlib: 3.8.2
- textgraphs: 0.2.4
- spacy     : 3.7.2
pandas    : 2.1.4
- pyvis     : 0.3.2
-
+ spacy     : 3.7.2
+ matplotlib: 3.8.2
+


## parse a document
@@ -73,7 +73,7 @@ provide the source text


``` python
- SRC_TEXT: str = """
+ SRC_TEXT: str = """
Werner Herzog is a remarkable filmmaker and an intellectual originally from Germany, the son of Dietrich Herzog.
After the war, Werner fled to America to become famous.
"""
@@ -139,22 +139,22 @@ spacy.displacy.render(
Werner Herzog
<span style="font-size: 0.8em; font-weight: bold; line-height: 1; border-radius: 0.35em; vertical-align: middle; margin-left: 0.5rem">PERSON</span>
</mark>
- is a remarkable filmmaker and an intellectual originally from
+ is a remarkable filmmaker and an intellectual originally from
<mark class="entity" style="background: #feca74; padding: 0.45em 0.6em; margin: 0 0.25em; line-height: 1; border-radius: 0.35em;">
Germany
<span style="font-size: 0.8em; font-weight: bold; line-height: 1; border-radius: 0.35em; vertical-align: middle; margin-left: 0.5rem">GPE</span>
</mark>
- , the son of
+ , the son of
<mark class="entity" style="background: #aa9cfc; padding: 0.45em 0.6em; margin: 0 0.25em; line-height: 1; border-radius: 0.35em;">
Dietrich Herzog
<span style="font-size: 0.8em; font-weight: bold; line-height: 1; border-radius: 0.35em; vertical-align: middle; margin-left: 0.5rem">PERSON</span>
</mark>
- .<br>After the war,
+ .<br>After the war,
<mark class="entity" style="background: #aa9cfc; padding: 0.45em 0.6em; margin: 0 0.25em; line-height: 1; border-radius: 0.35em;">
Werner
<span style="font-size: 0.8em; font-weight: bold; line-height: 1; border-radius: 0.35em; vertical-align: middle; margin-left: 0.5rem">PERSON</span>
</mark>
- fled to
+ fled to
<mark class="entity" style="background: #feca74; padding: 0.45em 0.6em; margin: 0 0.25em; line-height: 1; border-radius: 0.35em;">
America
<span style="font-size: 0.8em; font-weight: bold; line-height: 1; border-radius: 0.35em; vertical-align: middle; margin-left: 0.5rem">GPE</span>
@@ -174,9 +174,9 @@ display(SVG(parse_svg))
```


-
+
![svg](ex0_0_files/ex0_0_17_0.svg)
-
+


## collect graph elements from the parse
@@ -465,15 +465,15 @@ display(wordcloud.to_image())
```


-
+
![png](ex0_0_files/ex0_0_37_0.png)
-
+


## cluster communities in the lemma graph

In the tutorial
- <a href="https://towardsdatascience.com/how-to-convert-any-text-into-a-graph-of-concepts-110844f22a1a" target="_blank">"How to Convert Any Text Into a Graph of Concepts"</a>,
+ <a href="https://towardsdatascience.com/how-to-convert-any-text-into-a-graph-of-concepts-110844f22a1a" target="_blank">"How to Convert Any Text Into a Graph of Concepts"</a>,
Rahul Nayak uses the
<a href="https://en.wikipedia.org/wiki/Girvan%E2%80%93Newman_algorithm"><em>girvan-newman</em></a>
algorithm to split the graph into communities, then clusters on those communities.
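The paragraph above names the Girvan-Newman algorithm without showing how a split happens. As a hedged illustration only, here is a minimal, dependency-free Python sketch of its first split, assuming an unweighted, undirected graph given as an edge list without self-loops: compute edge betweenness with Brandes-style breadth-first accumulation, remove the highest-scoring edge, and repeat until the graph gains a connected component. The function names `edge_betweenness`, `components`, and `girvan_newman_split` are hypothetical and are not part of TextGraphs; for real use, NetworkX provides `networkx.algorithms.community.girvan_newman`.

```python
from collections import deque


def edge_betweenness(adj: dict) -> dict:
    """Edge betweenness for an unweighted, undirected graph (Brandes-style).

    adj maps each node to the set of its neighbours; the result maps
    frozenset({u, v}) to the number of node-pair shortest paths through that edge.
    """
    scores = {frozenset((u, v)): 0.0 for u in adj for v in adj[u]}
    for source in adj:
        # Breadth-first search from source, counting shortest paths (sigma).
        order, preds = [], {v: [] for v in adj}
        sigma = dict.fromkeys(adj, 0)
        dist = dict.fromkeys(adj, -1)
        sigma[source], dist[source] = 1, 0
        queue = deque([source])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Walk back from the farthest nodes, pushing dependency onto edges.
        delta = dict.fromkeys(adj, 0.0)
        for w in reversed(order):
            for v in preds[w]:
                share = sigma[v] / sigma[w] * (1.0 + delta[w])
                scores[frozenset((v, w))] += share
                delta[v] += share
    # Every path was counted once from each of its two endpoints.
    return {edge: s / 2.0 for edge, s in scores.items()}


def components(adj: dict) -> list:
    """Connected components as a list of node sets."""
    seen, result = set(), []
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        comp, queue = set(), deque([start])
        while queue:
            v = queue.popleft()
            comp.add(v)
            for w in adj[v] - seen:
                seen.add(w)
                queue.append(w)
        result.append(comp)
    return result


def girvan_newman_split(edges) -> list:
    """Remove the highest-betweenness edge, one at a time, until the graph splits."""
    adj = {}
    for u, v in edges:
        if u != v:  # this sketch ignores self-loops
            adj.setdefault(u, set()).add(v)
            adj.setdefault(v, set()).add(u)
    start = len(components(adj))
    while any(adj.values()):
        scores = edge_betweenness(adj)
        u, v = tuple(max(scores, key=scores.get))
        adj[u].discard(v)
        adj[v].discard(u)
        parts = components(adj)
        if len(parts) > start:
            return parts
    return components(adj)


if __name__ == "__main__":
    # Two triangles joined by a single bridge edge c-d.
    edges = [("a", "b"), ("b", "c"), ("a", "c"), ("c", "d"),
             ("d", "e"), ("e", "f"), ("d", "f")]
    print(sorted(sorted(part) for part in girvan_newman_split(edges)))
    # prints [['a', 'b', 'c'], ['d', 'e', 'f']]
```

The bridge `c-d` carries all 9 shortest paths between the two triangles, so it has the highest betweenness and is removed first. Removing one edge per iteration follows the original formulation; when several edges tie, which one goes first depends on iteration order.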
@@ -486,9 +486,9 @@ render.draw_communities();
```


-
+
![png](ex0_0_files/ex0_0_40_0.png)
-
+


## graph of relations transform
@@ -554,7 +554,7 @@ profiler.stop()



- <pyinstrument.session.Session at 0x1362a3610>
+ <pyinstrument.session.Session at 0x13c7e6770>



@@ -563,57 +563,61 @@ profiler.stop()
profiler.print()
```

-
- _ ._ __/__ _ _ _ _ _/_ Recorded: 13:12:25 Samples: 12311
- /_//_/// /_\ / //_// / //_'/ // Duration: 64.403 CPU time: 79.383
+
+ _ ._ __/__ _ _ _ _ _/_ Recorded: 13:34:52 Samples: 10578
+ /_//_/// /_\ / //_// / //_'/ // Duration: 55.462 CPU time: 68.980
/ _/ v4.6.1
-
- Program: /Users/paco/src/textgraphs/venv/lib/python3.10/site-packages/ipykernel_launcher.py -f /Users/paco/Library/Jupyter/runtime/kernel-ef214b7e-84b2-47a4-b352-a78f796e2343.json
-
- 64.402 _UnixSelectorEventLoop._run_once asyncio/base_events.py:1832
- └─ 64.397 Handle._run asyncio/events.py:78
+
+ Program: /Users/paco/src/textgraphs/venv/lib/python3.10/site-packages/ipykernel_launcher.py -f /Users/paco/Library/Jupyter/runtime/kernel-21c48172-c498-4e47-889b-254035b61b7d.json
+
+ 55.462 _UnixSelectorEventLoop._run_once asyncio/base_events.py:1832
+ └─ 55.461 Handle._run asyncio/events.py:78
[12 frames hidden] asyncio, ipykernel, IPython
- 45.188 ZMQInteractiveShell.run_ast_nodes IPython/core/interactiveshell.py:3391
- ├─ 24.298 <module> ../ipykernel_67071/1708547378.py:1
- │ ├─ 16.518 InferRel_Rebel.__init__ textgraphs/rel.py:103
- │ │ └─ 16.412 pipeline transformers/pipelines/__init__.py:531
+ 39.675 ZMQInteractiveShell.run_ast_nodes IPython/core/interactiveshell.py:3394
+ ├─ 19.565 <module> ../ipykernel_45146/1708547378.py:1
+ │ ├─ 14.605 InferRel_Rebel.__init__ textgraphs/rel.py:121
+ │ │ └─ 14.335 pipeline transformers/pipelines/__init__.py:531
│ │ [39 frames hidden] transformers, torch, <built-in>, json
- │ ├─ 4.979 PipelineFactory.__init__ textgraphs/pipe.py:319
- │ │ └─ 4.958 load spacy/__init__.py:27
- │ │ [24 frames hidden] spacy, en_core_web_sm, catalogue, imp...
- │ ├─ 1.833 TextGraphs.create_pipeline textgraphs/doc.py:87
- │ │ └─ 1.833 PipelineFactory.create_pipeline textgraphs/pipe.py:379
- │ │ └─ 1.833 Pipeline.__init__ textgraphs/pipe.py:152
- │ │ └─ 1.833 English.__call__ spacy/language.py:1016
- │ │ [15 frames hidden] spacy, spacy_dbpedia_spotlight, reque...
- │ └─ 0.966 InferRel_OpenNRE.__init__ textgraphs/rel.py:33
- │ └─ 0.959 get_model opennre/pretrain.py:126
- └─ 18.954 <module> ../ipykernel_67071/1245857438.py:1
- └─ 18.953 TextGraphs.perform_entity_linking textgraphs/doc.py:327
- └─ 18.953 KGWikiMedia.perform_entity_linking textgraphs/kg.py:246
- ├─ 9.114 KGWikiMedia._link_spotlight_entities textgraphs/kg.py:699
- │ └─ 9.109 KGWikiMedia.dbpedia_search_entity textgraphs/kg.py:519
- │ └─ 9.065 get requests/api.py:62
- │ [29 frames hidden] requests, urllib3, http, socket, ssl,...
- ├─ 9.072 KGWikiMedia._link_kg_search_entities textgraphs/kg.py:771
- │ └─ 9.070 KGWikiMedia.dbpedia_search_entity textgraphs/kg.py:519
- │ └─ 8.990 get requests/api.py:62
- │ [27 frames hidden] requests, urllib3, http, socket, ssl,...
- └─ 0.767 KGWikiMedia._secondary_entity_linking textgraphs/kg.py:871
- └─ 0.767 KGWikiMedia.wikidata_search textgraphs/kg.py:465
- └─ 0.766 KGWikiMedia._wikidata_endpoint textgraphs/kg.py:364
- └─ 0.765 get requests/api.py:62
- [7 frames hidden] requests, urllib3
- 18.389 InferRel_Rebel.gen_triples_async textgraphs/pipe.py:133
- ├─ 17.360 InferRel_Rebel.gen_triples textgraphs/rel.py:223
- │ ├─ 16.276 InferRel_Rebel.tokenize_sent textgraphs/rel.py:121
- │ │ └─ 16.267 TranslationPipeline.__call__ transformers/pipelines/text2text_generation.py:341
+ │ ├─ 3.545 PipelineFactory.__init__ textgraphs/pipe.py:430
+ │ │ └─ 3.527 load spacy/__init__.py:27
+ │ │ [22 frames hidden] spacy, en_core_web_sm, catalogue, imp...
+ │ ├─ 0.760 TextGraphs.create_pipeline textgraphs/doc.py:90
+ │ │ └─ 0.760 PipelineFactory.create_pipeline textgraphs/pipe.py:504
+ │ │ └─ 0.760 Pipeline.__init__ textgraphs/pipe.py:212
+ │ │ └─ 0.760 English.__call__ spacy/language.py:1016
+ │ │ [11 frames hidden] spacy, spacy_dbpedia_spotlight, reque...
+ │ └─ 0.653 InferRel_OpenNRE.__init__ textgraphs/rel.py:33
+ │ └─ 0.647 get_model opennre/pretrain.py:126
+ ├─ 18.260 <module> ../ipykernel_45146/1245857438.py:1
+ │ └─ 18.260 TextGraphs.perform_entity_linking textgraphs/doc.py:445
+ │ └─ 18.260 KGWikiMedia.perform_entity_linking textgraphs/kg.py:288
+ │ ├─ 8.876 KGWikiMedia._link_kg_search_entities textgraphs/kg.py:914
+ │ │ └─ 8.875 KGWikiMedia.dbpedia_search_entity textgraphs/kg.py:623
+ │ │ └─ 8.808 get requests/api.py:62
+ │ │ [37 frames hidden] requests, urllib3, http, socket, ssl,...
+ │ ├─ 8.664 KGWikiMedia._link_spotlight_entities textgraphs/kg.py:833
+ │ │ └─ 8.660 KGWikiMedia.dbpedia_search_entity textgraphs/kg.py:623
+ │ │ └─ 8.598 get requests/api.py:62
+ │ │ [37 frames hidden] requests, urllib3, http, socket, ssl,...
+ │ └─ 0.720 KGWikiMedia._secondary_entity_linking textgraphs/kg.py:1041
+ │ └─ 0.720 KGWikiMedia.wikidata_search textgraphs/kg.py:557
+ │ └─ 0.717 KGWikiMedia._wikidata_endpoint textgraphs/kg.py:426
+ │ └─ 0.717 get requests/api.py:62
+ │ [7 frames hidden] requests, urllib3
+ └─ 0.563 <module> ../ipykernel_45146/644158021.py:1
+ └─ 0.563 IceCreamDebugger.__call__ icecream/icecream.py:204
+ 15.131 InferRel_Rebel.gen_triples_async textgraphs/pipe.py:184
+ ├─ 14.398 InferRel_Rebel.gen_triples textgraphs/rel.py:259
+ │ ├─ 12.981 InferRel_Rebel.tokenize_sent textgraphs/rel.py:145
+ │ │ └─ 12.980 TranslationPipeline.__call__ transformers/pipelines/text2text_generation.py:341
│ │ [44 frames hidden] transformers, torch, <built-in>
- │ └─ 1.070 KGWikiMedia.resolve_rel_iri textgraphs/kg.py:301
- └─ 1.029 InferRel_OpenNRE.gen_triples textgraphs/rel.py:49
- └─ 0.888 KGWikiMedia.resolve_rel_iri textgraphs/kg.py:301
-
-
+ │ └─ 1.416 KGWikiMedia.resolve_rel_iri textgraphs/kg.py:352
+ │ └─ 0.914 get_entity_dict_from_api qwikidata/linked_data_interface.py:21
+ │ [16 frames hidden] qwikidata, requests, urllib3, http, s...
+ └─ 0.733 InferRel_OpenNRE.gen_triples textgraphs/rel.py:58
+ └─ 0.672 KGWikiMedia.resolve_rel_iri textgraphs/kg.py:352
+
+


## outro
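To compare the two pyinstrument reports in this commit, here is a small worked calculation in Python. The per-stage seconds are copied from the newer report (`Duration: 55.462`) and the older total from the previous one (`Duration: 64.403`); grouping the timings into these six top-level stages is an editorial assumption, and the names `NEW_STAGES` and `time_shares` are hypothetical, not part of TextGraphs or pyinstrument.

```python
# Seconds copied from the newer pyinstrument report in this commit.
# Grouping them into these six top-level stages is an editorial assumption.
NEW_TOTAL = 55.462  # "Duration" (wall-clock) in the new report
OLD_TOTAL = 64.403  # "Duration" (wall-clock) in the previous report

NEW_STAGES = {
    "InferRel_Rebel.__init__": 14.605,
    "PipelineFactory.__init__": 3.545,
    "TextGraphs.create_pipeline": 0.760,
    "InferRel_OpenNRE.__init__": 0.653,
    "TextGraphs.perform_entity_linking": 18.260,
    "InferRel_Rebel.gen_triples_async": 15.131,
}


def time_shares(stages: dict[str, float], total: float) -> dict[str, float]:
    """Return each stage's fraction of the total wall-clock duration."""
    return {name: secs / total for name, secs in stages.items()}


if __name__ == "__main__":
    shares = time_shares(NEW_STAGES, NEW_TOTAL)
    # List stages from slowest to fastest.
    for name, share in sorted(shares.items(), key=lambda kv: kv[1], reverse=True):
        print(f"{share:6.1%}  {name}")
    saved = OLD_TOTAL - NEW_TOTAL
    print(f"saved {saved:.3f} s ({saved / OLD_TOTAL:.1%}) versus the previous run")
```

On these figures entity linking accounts for roughly a third of the newer run, most of it in the two `dbpedia_search_entity` calls, and the newer run is about 8.9 s shorter overall. Because those calls go through `requests.get`, their timing depends on the remote services as well as on the code.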