Lexical Representation of Dense Numerical Vectors: Introducing LangVec

Simeon Emanuilov, Aleksandar Dimov
Sofia University “St. Kliment Ohridski” (Bulgaria)

https://doi.org/10.53656/math2024-3-1-lex

Abstract. High-dimensional numerical vectors are widely used in machine learning for searching and indexing data. However, it is often difficult for users to interpret their meaning. To address this, we introduce a novel approach that transforms dense vectors into human-readable lexical representations using percentile-based mapping. Its essence is a mapping of words from a predefined or custom lexicon to vectors based on their relative local magnitudes. In this way, it enables intuitive visualization of the semantic similarities and differences between complex data points and allows for domain-specific interpretability. It also provides an easy way to deduplicate dense vectors (including near-duplicates) and can generate locality-aware, hash-like representations that can be used for efficient indexing and retrieval in various applications. The approach has been implemented in an open-source library called LangVec. The paper provides examples of LangVec usage and highlights its key applications, including semantic search, recommendation systems, and clustering of numerical data in a human-readable format.
Keywords: interpretable machine learning, vector representations, lexical mapping, semantic similarity, clustering, recommendation systems
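
The percentile-based mapping described in the abstract can be illustrated with a minimal Python sketch. It is not LangVec's actual API: the function names `fit_percentile_edges` and `to_words`, the three-word lexicon, and the sample values are all hypothetical and chosen only for illustration. The sketch assumes that bin boundaries are taken from the pooled values of a set of sample vectors and that each vector component is then replaced by the lexicon word of the bin it falls into.

```python
from bisect import bisect_right


def fit_percentile_edges(vectors, n_bins):
    """Return n_bins - 1 cut points splitting the pooled values into
    roughly equal-count bins (a simple nearest-rank percentile rule)."""
    values = sorted(v for vec in vectors for v in vec)
    if not values:
        raise ValueError("need at least one value to fit")
    last = len(values) - 1
    return [values[min(last, (k * len(values)) // n_bins)]
            for k in range(1, n_bins)]


def to_words(vector, edges, lexicon):
    """Replace each component with the word of the bin that contains it."""
    return [lexicon[bisect_right(edges, x)] for x in vector]


# Hypothetical lexicon and sample data (assumptions, not from the paper).
lexicon = ["low", "mid", "high"]
samples = [[0.1, 0.5, 0.9], [0.2, 0.6, 0.8]]
edges = fit_percentile_edges(samples, n_bins=len(lexicon))

original = to_words([0.10, 0.50, 0.90], edges, lexicon)
near_duplicate = to_words([0.11, 0.52, 0.88], edges, lexicon)
print(original, near_duplicate)
```

Under these assumptions, two vectors that differ only slightly fall into the same percentile bins and therefore receive the same word sequence, which is the property the abstract relies on for near-duplicate detection and hash-like indexing.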
