Gerard de Melo
Gerard de Melo's Projects and Resources
Major Resources and Tools
Universal Wordnet (UWN)
One of the largest multilingual knowledge graphs, transforming the well-known WordNet database into a massively multilingual resource covering over 1 million words and several million named entities in a single semantically organized hierarchy. This is based on machine learning along with the
extension based on Wikipedia. Our derivative project
) is being used by
Contributes information about words and other language-related entities to the Linked Data Web and Semantic Web, leading to a Web of Data in which the
Spanish National Library and others have linked their data to Lexvo.org, and Lexvo.org in turn connects its own data to other valuable resources.
Datasets and resources for sentiment analysis and fine-grained emotion analysis, in part available for multiple languages.
An ontology providing an enormous body of axiomatized world knowledge based on
as well as the Suggested Upper Merged Ontology (SUMO). YAGO was used in IBM's famous Jeopardy!-winning system, Watson.
Natural Language Processing
A database of etymological and derivational relationships between words in different languages, mined from Wiktionary.
Pyramid Evaluation of summary quality using Automated Knowledge extraction: a method for evaluating the quality of a summary (e.g., one written by a student) with the Pyramid method, which is known to be significantly more reliable than ROUGE when evaluating individual summaries.
Vector embeddings of words and concepts from the biomedical domain. The source code is a part of
Lexical resource providing information about Portuguese nominalizations.
Good, Great, Excellent: Semantic Intensity Information
System that scores the relative intensities of different words.
MASC Word Sense Alignment Visualizations
Non-1-to-1 alignments of word senses from two inventories visualized. See also our
about this project.
Thesauri in many languages, obtained by translating Roget's Thesaurus using task-specific statistical techniques.
Typo Correction Data
Large spelling correction training datasets that enable deep learning-powered context-sensitive spelling correction.
Information Extraction and Information Retrieval
Source code for a deep neural Information Retrieval system.
System that visualizes information from across multiple documents using a graph-based user interface to browse relationships.
Knowledge and Data Resources
FrameBase uses frame semantics, a theory of natural language semantics, to represent knowledge about the world in a consistent way. I also developed a new browsing interface for the FrameNet lexical resource, which FrameBase relies on.
WebChild / Knowlywood
Large amounts of common-sense knowledge extracted from the Web.
Entity Type Description Generator
Source code for a deep learning-based system that generates natural language descriptions of entities, along with corresponding benchmark datasets (based on
Visualizing and Curating Knowledge Graphs over Time and Space
Video of our temporal knowledge browser (
Online interface to the SPASS-XDB reasoning system, which combines state-of-the-art theorem proving with support for large-scale knowledge sources.
Wikipedia IMDb Mappings
Mappings between Wikipedia articles and the corresponding Internet Movie Database (IMDb) entries.
File Viewer (for DOS) supporting over 400 different file formats.
Challenge Dataset for video classification.
Dataset providing discourse relationships between images and text.
We have published the source code for a number of different research projects. Follow the link for a list of available code bases.
Conway's Game of Life
Online version of Conway's Game of Life cellular automaton.
Simple wire game that runs in the browser.