International Journal on Web Service Computing (IJWSC)
ISSN: 0976 - 9811 (Online); 2230 - 7702 (Print)
Webpage URL: https://airccse.org/journal/jwsc/ijwsc.html
Learning Cross-lingual Word Embeddings with Universal Concepts
Pezhman Sheinidashtegol and Aibek Musaev, University of Alabama, USA
Abstract
Recent advances in generating monolingual word embeddings based on word co-occurrence for universal languages have inspired new efforts to extend the model to support diverse languages. State-of-the-art methods for learning cross-lingual word embeddings rely on the alignment of monolingual word embedding spaces. Our goal is to implement word co-occurrence across languages using the universal concepts method. Such concepts are notions that are fundamental to humankind and are thus persistent across languages, e.g., man or woman, war or peace. Given bilingual lexicons, we built universal concepts as undirected graphs of connected nodes and then replaced the words belonging to the same graph with a unique graph ID. This intuitive design makes use of universal concepts in monolingual corpora, which helps generate meaningful word embeddings across languages via word co-occurrence. Standardized benchmarks demonstrate that this underutilized approach is competitive with state-of-the-art (SOTA) methods on bilingual word semantic similarity and word similarity and relatedness tasks.
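The graph construction and word replacement described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes that a universal concept corresponds to a connected component of the undirected graph whose edges are bilingual lexicon entries, and it finds components with a union-find structure. The function names, the (language, word) tuple encoding, and the `CONCEPT_n` ID format are all hypothetical choices made for illustration.

```python
def build_concepts(lexicon_pairs):
    """Map each (lang, word) node to the ID of its connected component.

    lexicon_pairs: iterable of ((lang_a, word_a), (lang_b, word_b)) edges
    taken from bilingual lexicons (hypothetical input format).
    """
    parent = {}

    def find(node):
        # Register unseen nodes as their own root, then walk up to the root.
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]  # path halving
            node = parent[node]
        return node

    # Each lexicon entry is an undirected edge: merge the two components.
    for a, b in lexicon_pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    # Give every component one shared ID (assumed format: CONCEPT_<n>).
    ids = {}
    concept_of = {}
    for node in list(parent):
        root = find(node)
        if root not in ids:
            ids[root] = f"CONCEPT_{len(ids)}"
        concept_of[node] = ids[root]
    return concept_of


def replace_with_concepts(tokens, lang, concept_of):
    """Replace tokens that belong to a concept graph with the graph ID."""
    return [concept_of.get((lang, tok), tok) for tok in tokens]


# Toy lexicon entries, invented for this example only.
pairs = [
    (("en", "man"), ("es", "hombre")),
    (("en", "man"), ("de", "Mann")),
    (("en", "peace"), ("es", "paz")),
]
concept_of = build_concepts(pairs)
spanish = replace_with_concepts(["el", "hombre", "quiere", "paz"], "es", concept_of)
english = replace_with_concepts(["the", "man", "wants", "peace"], "en", concept_of)
```

After this replacement, "hombre" and "man" appear as the same token in their respective monolingual corpora, so a standard co-occurrence-based embedding model trained on the combined text would see shared context anchors across languages.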
Keywords
Word Embedding Model, NLP, Word Embedding Evaluation Tasks, Universal Concepts, Bilingual and Cross-lingual Word Embeddings
Original Source URL: https://aircconline.com/ijwsc/V10N3/10319ijwsc02.pdf
Volume URL: https://airccse.org/journal/jwsc/current2019.html