A collection of corpora for named entity recognition (NER) and entity recognition tasks. These annotated datasets cover a variety of languages, domains and entity types.
Updated Nov 29, 2024 - Python
Natural language processing (NLP) is a field of computer science that studies how computers and humans interact through natural language. In 1950, Alan Turing published an article that proposed a measure of machine intelligence, now called the Turing test. More modern techniques, such as deep learning, have produced strong results in language modeling, parsing, and many other natural-language tasks.
Portuguese pre-trained BERT models
This is a continuously updated handbook for readers to easily track the latest NL2SQL (Text-to-SQL) techniques in the literature and provide practical guidance for researchers and practitioners. If we missed any interesting work, feel free to contact us.
This repository contains code and datasets related to entity/knowledge papers from the VERT (Versatile Entity Recognition & disambiguation Toolkit) project, by the Knowledge Computing group at Microsoft Research Asia (MSRA).
A lexicon for Sudachi
TriggerNER: Learning with Entity Triggers as Explanations for Named Entity Recognition (ACL 2020)
Natural Language Processing (NLP), covering topics such as tokenization, part-of-speech (POS) tagging, machine translation, named entity recognition (NER), classification, and sentiment analysis.
A Python module that fetches a page of a word/phrase from the Online Indonesian Dictionary (https://kbbi.kemdikbud.go.id).
Python library for feature selection on text features. It provides filter methods, a genetic algorithm, and TextFeatureSelectionEnsemble for improving text classification models.
Assignment solutions for CS224N: Natural Language Processing with Deep Learning - Stanford / Winter 2023
Interface for reading the Paraphrase Database (PPDB)
A collection of natural language processing notebooks.
Roundtrip translation (aka back translation) Python package
Debiasing word embeddings
A Typed Event-Focused Lexical Inference Benchmark for Evaluating Natural Language Inference
ThamizhiPOSt - A neural POS tagger for Tamil
Basic Universal Dependencies Part-of-Speech Tagger for Tibetan
An extensive dataset for Arabic written in Latin script.
Scripts for compiling the Universal Derivations collections of harmonised word-formation resources for multiple languages.
Created by Alan Turing