VIDiom-PT

VIDiom-PT is a European Portuguese corpus annotated for verbal idioms, designed to support NLP applications in idiom processing. The resulting corpus comprises 5,178 annotated instances covering 747 distinct verbal idioms. The annotation process was validated through an inter-annotator agreement ...

Resource Type: Corpus
Media Type: Text
Language: Portuguese

News-QTLeap WSD/NED corpus

The texts are sentences from the News parallel corpus. They contain monolingual sentences from parallel corpora for the following pairs: Basque-English, Bulgarian-English, Czech-English, Portuguese-English and Spanish-English. The English corpus consists of the English side of the Spanis...

Resource Type: Corpus
Media Type: Text
Languages: Basque, Bulgarian, Czech, English, Portuguese, Spanish; Castilian

UIMA/U-Compare OpenNLP Tokenizer

This is a UIMA wrapper for the OpenNLP Tokenizer tool. It splits English sentences into individual tokens. The tool forms part of the in-built library of components provided with the U-Compare platform (see separate META-SHARE record) for building and evaluating text mining workflows. The U-Comp...

Resource Type: Tool / Service
Language: English

QTLeap LRT-M31-WP4

Treebanks and semantic lexicons for Basque, Bulgarian, Dutch, German and Portuguese, created within the European project QTLeap.

Resource Type: Corpus
Media Type: Text
Languages: Basque, Bulgarian, Dutch; Flemish, German

Termcat Social Webs

Terms of Social Webs

Resource Type: Lexical / Conceptual
Media Type: Text
Languages: Catalan; Valencian, English, French, Galician, Italian, Portuguese, Spanish; Castilian

Europarl-QTLeap WSD/NED corpus

The texts are sentences from the Europarl parallel corpus (Koehn, 2005). They contain the monolingual sentences from parallel corpora for the following pairs: Bulgarian-English, Czech-English, Portuguese-English and Spanish-English. The English corpus consists of the English side of th...

Resource Type: Corpus
Media Type: Text
Languages: Basque, Bulgarian, Czech, English, Portuguese, Spanish; Castilian

Albertina PT-PT base

Albertina PT-PT base is a large foundation language model for European Portuguese, the variety spoken in Portugal. It is an encoder of the BERT family, based on the Transformer neural architecture and built on the DeBERTa model, with highly competitive performance for this language. It is distributed free ...

Resource Type: Language Description
Media Type: Text
Language: Portuguese

Spanish to English Machine translation module

Technical Description: http://qtleap.eu/wp-content/uploads/2015/05/Pilot1_technical_description.pdf http://qtleap.eu/wp-content/uploads/2015/05/TechnicalDescriptionPilot2_D2.7.pdf http://qtleap.eu/wp-content/uploads/2016/11/TechnicalDescriptionPilot3_D2.10.pdf

Resource Type: Tool / Service
Languages: English, Spanish; Castilian

English to Spanish Machine translation module

Technical Description: http://qtleap.eu/wp-content/uploads/2015/05/Pilot1_technical_description.pdf http://qtleap.eu/wp-content/uploads/2015/05/TechnicalDescriptionPilot2_D2.7.pdf http://qtleap.eu/wp-content/uploads/2016/11/TechnicalDescriptionPilot3_D2.10.pdf

Resource Type: Tool / Service
Languages: English, Spanish; Castilian

GistSumm

GistSumm (GIST SUMMarizer) is a summarization tool for Portuguese. It uses the gist of a text as a guideline to identify and select the segments to include in the final extract. Automatically produced extracts have been evaluated for gist preservation and textuality.

Resource Type: Tool / Service
Languages: English, Portuguese
