Welcome to TIGRIS Virtual Lab Project Area
The project "e-ŠNUNNA" applies Text Mining tools, in particular Clustering, to discover homogeneous groups and hidden relations in a corpus of letters written in the dialect of EŠNUNNA, a local variant of Old Babylonian used in the small Kingdom of the same name, which flourished in central Mesopotamia (Diyala Valley) in the XXI-XVIII centuries B.C.
Because of the innovative aspects of the project, and in order to better validate results and procedures, a small corpus of texts has been studied: the case study is a homogeneous, consistent group of fifty-one letters (XVIII century B.C.) found at Tell Harmal / Šaduppum, an important town of that Kingdom.
In the project "e-ŠNUNNA" a corpus-based lexicon was extracted from that collection of 51 tablets and then annotated with detailed grammatical, morphological, syntactic and semantic tags.
Texts were studied according to the main edition by Albrecht Goetze (1897-1971), whose transliteration into the Latin alphabet (Sumer XIV, 1958) has been slightly revised.
Transliterated texts were simplified by replacing Unicode characters with ASCII equivalents (see text encoding).
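The substitution step can be illustrated with a minimal Python sketch. The mapping table below is an assumption, loosely modelled on ASCII transliteration conventions used in Assyriology (for example "sz" for "š"); it is not the project's actual encoding scheme, and the function names are hypothetical.

```python
import unicodedata

# Hypothetical substitution table (an assumption, not the project's scheme).
_RAW_MAP = {
    "š": "sz", "Š": "SZ",
    "ṣ": "s,", "Ṣ": "S,",
    "ṭ": "t,", "Ṭ": "T,",
    "ḫ": "h", "Ḫ": "H",
}
# Normalise the keys so each one is a single precomposed code point.
_TABLE = str.maketrans(
    {unicodedata.normalize("NFC", k): v for k, v in _RAW_MAP.items()}
)


def to_ascii(text: str) -> str:
    """Replace the mapped Unicode letters with their ASCII stand-ins."""
    # NFC first, so "s" + combining caron is treated like precomposed "š".
    return unicodedata.normalize("NFC", text).translate(_TABLE)


def remaining_non_ascii(text: str) -> list:
    """List the characters that the table did not cover."""
    return sorted({ch for ch in text if ord(ch) > 127})


if __name__ == "__main__":
    sample = "a-na šar-ri-im"  # illustrative string, not quoted from the corpus
    print(to_ascii(sample))
    print(remaining_non_ascii(to_ascii(sample)))
```

Characters outside the table (accented vowels, for instance) pass through unchanged, so `remaining_non_ascii` is a quick way to check whether the table needs extending.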
For this case study an original ENEA Data Mining algorithm has been applied, together with other e-tools, to the transliterated e-texts.
For Data Mining purposes, after lemmatization of all graphic forms, the TALTAC2 software reconstructed the whole corpus, generating files in which every fragment is represented by its lemmas in the original word order.
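The idea of representing a fragment by its lemmas while keeping the original word order can be sketched as follows. This is not TALTAC2 code and does not use its file formats; the form-to-lemma dictionary and the sample tokens are hypothetical placeholders, not entries from the project lexicon.

```python
def lemmatize_fragment(tokens, lemma_of):
    """Replace each graphic form with its lemma, preserving word order.

    Forms missing from the lexicon are kept as they are, so gaps stay visible.
    """
    return [lemma_of.get(token, token) for token in tokens]


if __name__ == "__main__":
    # Hypothetical lexicon: graphic form -> lemma (placeholder entries).
    lexicon = {
        "a-na": "ana",
        "szar-ri-im": "szarrum",
    }
    fragment = ["a-na", "szar-ri-im", "x-x"]
    print(" ".join(lemmatize_fragment(fragment, lexicon)))
```

Keeping unknown forms unchanged, rather than dropping them, means the output has the same length as the input, which makes it easy to align the lemma sequence with the original text.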
Preliminary results show that document clustering is highly effective in discovering high-quality groups and in highlighting interesting relations among the data.
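To give a rough idea of what clustering lemmatized documents involves, here is a minimal Python sketch. It is not the ENEA algorithm, whose details are not described here; it swaps in a generic technique (cosine similarity on lemma counts with single-link grouping at a threshold). The document names, lemmas and threshold value are all illustrative assumptions.

```python
from collections import Counter
from math import sqrt


def cosine(a, b):
    """Cosine similarity between two lemma-count vectors (Counters)."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def cluster(docs, threshold=0.5):
    """Single-link grouping: a document joins (and merges) every cluster
    that contains at least one member whose similarity reaches the threshold."""
    vectors = {name: Counter(lemmas) for name, lemmas in docs.items()}
    clusters = []
    for name in docs:
        linked, rest = [], []
        for group in clusters:
            similar = any(
                cosine(vectors[name], vectors[member]) >= threshold
                for member in group
            )
            (linked if similar else rest).append(group)
        clusters = rest + [set().union({name}, *linked)]
    return clusters


if __name__ == "__main__":
    # Placeholder documents: name -> lemma sequence (not real corpus data).
    docs = {
        "letter_A": ["ana", "szarrum", "qabum"],
        "letter_B": ["ana", "szarrum", "szaparum"],
        "letter_C": ["eqlum", "alpum"],
    }
    for group in cluster(docs):
        print(sorted(group))
```

Single-link grouping was chosen only because it fits in a few lines with the standard library; a real study would weigh lemma frequencies and compare several clustering strategies.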