Technical Description:
http://qtleap.eu/wp-content/uploads/2015/05/Pilot1_technical_description.pdf
http://qtleap.eu/wp-content/uploads/2015/05/TechnicalDescriptionPilot2_D2.7.pdf
http://qtleap.eu/wp-content/uploads/2016/11/TechnicalDescriptionPilot3_D2.10.pdf
RudriCo-TOK is a tokenizer that splits contractions into their component words. Number of de-contraction rules: 178.
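To illustrate what rule-based de-contraction means in general, here is a minimal sketch in Python. It is not RudriCo-TOK's implementation or rule format: the `RULES` table and the `tokenize` function are hypothetical, and the five rules shown are just well-known Portuguese contractions chosen for illustration. The lookup is context-free, so it ignores ambiguity and lowercases expanded forms, which a real rule set would have to handle.

```python
import re

# Hypothetical rule table (NOT the tool's actual rules):
# surface contraction -> the words it expands to.
RULES = {
    "do": ["de", "o"],
    "da": ["de", "a"],
    "no": ["em", "o"],
    "na": ["em", "a"],
    "pelo": ["por", "o"],
}


def tokenize(text):
    """Split text into word and punctuation tokens, expanding contractions."""
    # \w matches accented letters in Python 3, so Portuguese words stay whole.
    raw_tokens = re.findall(r"\w+|[^\w\s]", text)
    tokens = []
    for tok in raw_tokens:
        expansion = RULES.get(tok.lower())
        if expansion is None:
            tokens.append(tok)        # no rule applies: keep the token as-is
        else:
            tokens.extend(expansion)  # a rule applies: emit the expanded words
    return tokens


print(tokenize("Gosto do livro."))
```

A dictionary lookup keeps the sketch short; a rule engine with 178 rules would presumably also condition on context, which this example does not attempt.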
An NER classifier based on memory-based learning, trained on the CINTIL dataset, a corpus that includes part of the Corpus de Referência do Português Contemporâneo (CRPC, Reference Corpus of Contemporary Portuguese). https://portulanclarin.net/repository/browse/cintil-corpus-internacional-do-por...
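Memory-based learning stores the training instances and labels a new instance by comparing it with the most similar stored ones (a nearest-neighbour approach). The sketch below shows that idea in Python under stated assumptions: the class `MemoryBasedClassifier`, the three-feature representation (previous word, whether the token is capitalized, next word), the overlap similarity, and the toy training data are all hypothetical and are not taken from the classifier described above or from CINTIL.

```python
from collections import Counter


def overlap(a, b):
    """Count the feature positions where two instances agree."""
    return sum(1 for x, y in zip(a, b) if x == y)


class MemoryBasedClassifier:
    """k-nearest-neighbour classifier over symbolic feature tuples."""

    def __init__(self, k=1):
        self.k = k
        self.memory = []

    def fit(self, instances, labels):
        # "Training" only stores the examples; no model is abstracted.
        self.memory = list(zip(instances, labels))
        return self

    def predict(self, instance):
        # Rank stored examples by similarity and vote among the top k.
        ranked = sorted(
            self.memory,
            key=lambda item: overlap(item[0], instance),
            reverse=True,
        )
        votes = Counter(label for _, label in ranked[: self.k])
        return votes.most_common(1)[0][0]


# Toy, made-up instances: (previous word, is capitalized, next word).
train_x = [("em", True, "."), ("o", False, "é"), ("senhor", True, "disse")]
train_y = ["LOC", "O", "PER"]

clf = MemoryBasedClassifier(k=1).fit(train_x, train_y)
print(clf.predict(("em", True, ",")))
```

Because classification is a lookup over stored examples, all the cost sits in `predict`; practical memory-based learners add feature weighting and indexing, which this sketch omits.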