26,99 €
inkl. MwSt.
Versandkostenfrei*
Versandfertig in 6-10 Tagen
payback
13 °P sammeln
  • Broschiertes Buch

This book presents various algorithms to compute semantic similarities between english texts. I explored three different algorithms for computing English sentence similarity. The first algorithm, which is well-explored in the literature [Salton and Buckley, 1988, Wu and Salton, 1981], weights words in each sentence according to term frequency and inverse document frequency (tf-idf ) and uses no semantic information. The second algorithm uses measures of the semantic distance between words belonging to the same part of speech. The third algorithm combines the tf-idf scores and the semantic…mehr

Produktbeschreibung
This book presents various algorithms to compute semantic similarities between english texts. I explored three different algorithms for computing English sentence similarity. The first algorithm, which is well-explored in the literature [Salton and Buckley, 1988, Wu and Salton, 1981], weights words in each sentence according to term frequency and inverse document frequency (tf-idf ) and uses no semantic information. The second algorithm uses measures of the semantic distance between words belonging to the same part of speech. The third algorithm combines the tf-idf scores and the semantic distance scores between words. I evaluated the performance of the second and third algorithms on two data sets: O'Shea's set of sentence pairs with human similarity judgements [Li et al., Aug, Rubenstein and Goodenough, 1965], and Microsoft Research's sentence-level paraphrase dataset [Rus et al., 2012]. On O'Shea's data set, the third algorithm more accurately matches human judgments than the second. On the Microsoft data set, there was not a significant difference between the two algorithms
Autorenporträt
Anis Zaman is a Software Engineer living in New York, USA. His works in the Computer Science research space spans from Robotics, Computer Vision to Natural Language Processing. He is also obsessed about the tech industry and the technologies that enables one to write stable, robust applications.