DA-EN Danish Ministry of Higher Education and Science 3 (Processed)

This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Europe Facility - Automated Translation (CEF.AT) action. For further information on the project: http://lr-coordination.eu. Parallel texts Danish-English from the Danish Ministry o...

Resource Type:Corpus
Media Type:Text
Languages:Danish, English
Dataset of Nuanced Assertions on Controversial Issues (NAoCI dataset)

The Dataset of Nuanced Assertions on Controversial Issues (NAoCI) consists of over 2,000 assertions on sixteen different controversial issues. It has over 100,000 judgments of whether people agree or disagree with the assertions, and about 70,000 judgments indicating how strongly peopl...

Resource Type:Corpus
Media Type:Text
Language:English
A Repository of State of the Art and Competitive Baseline Summaries for DUC 2004

In the period since 2004, many novel sophisticated approaches for generic multi-document summarization have been developed. Intuitive simple approaches have also been shown to perform unexpectedly well for the task. Yet it is practically impossible to compare the existing approaches directly, bec...

Resource Type:Corpus
Media Type:Text
Language:English
HiEve

A corpus of manually annotated event hierarchies in news stories.

Resource Type:Corpus
Media Type:Text
Language:English
Manually annotated corpora for teaching and learning purposes of Brazilian Portuguese, Dutch, Estonian, and Slovene

These are manually annotated corpora for teaching and learning purposes of Brazilian Portuguese, Dutch, Estonian, and Slovene, as a contribution to the Manually Annotated Corpora Family available in CLARIN. Sentences are annotated with “problematic” or “non-problematic” labels, from the point of ...

Resource Type:Corpus
Media Type:Text
Languages:Brazilian Portuguese, Dutch, Estonian, Slovene
Spanish-English website parallel corpus (Processed)

This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Europe Facility - Automated Translation (CEF.AT) action. For further information on the project: http://lr-coordination.eu. This is a parallel corpus of bilingual texts crawled fro...

Resource Type:Corpus
Media Type:Text
Languages:English, Spanish; Castilian
Polish-English parallel corpus from the website of the ING Polish Art Foundation (Processed)

This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Europe Facility - Automated Translation (CEF.AT) action. For further information on the project: http://lr-coordination.eu. Polish-English parallel corpus from the website of the I...

Resource Type:Corpus
Media Type:Text
Languages:English, Polish
Corpus of State-related content from the Latvian Web (Processed)

This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Europe Facility - Automated Translation (CEF.AT) action. For further information on the project: http://lr-coordination.eu. Latvian Web, home pages of ministries and state public s...

Resource Type:Corpus
Media Type:Text
Languages:English, Latvian
Laws of Malta (Processed)

This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Europe Facility - Automated Translation (CEF.AT) action. For further information on the project: http://lr-coordination.eu. Compilation of bilingual Maltese legislation (Maltese-En...

Resource Type:Corpus
Media Type:Text
Languages:English, Maltese
U-Compare Species Disambiguation Service

Web service created by exporting a UIMA-based workflow from the U-Compare text mining system. Functionality: Identifies biological named entities and disambiguates them according to species, by assigning a species ID from the NCBI taxonomy. Also identifies sentences and tokens. Tools in workflow...

Resource Type:Tool / Service
Language:English
