PhD Candidate Research Assistant

About Me

LLM reasoning, grounding, robustness, and evaluation using multilingual and cultural variation as a scientific lens.

I am currently a PhD student and research assistant at Saarland University, supervised by Prof. Dr.-Ing. Philipp Slusallek and Prof. Dr. Dietrich Klakow. With over four years of experience in the machine learning industry, I have worked in diverse applied research roles, including Applied Research Data Scientist at iQuartic and Applied ML Developer at SingularityNet/iCog-Labs.

I hold an MSc in Mathematical Sciences – Machine Intelligence from AIMS-AMMI and a Bachelor of Science in Software Engineering from Addis Ababa Institute of Technology. I am an active member of the Masakhane, ETHIO NLP, and AI-Grid communities.

My research focuses on effective domain adaptation and evaluation methods for large language models (LLMs) in low-resource human and data languages. The overarching goal of my work is to advance the development of AI systems that are accurate, interpretable, robust, and culturally grounded, while remaining human-centered and contextually aware. I investigate how LLMs can better understand, reason, and communicate across linguistic, cultural, and structural boundaries, bridging the gap between human meaning and machine representation. Ultimately, my work aims to create socially aware and context-sensitive AI systems that perform reliably across diverse languages and data modalities.

Education

Universität des Saarlandes, Saarbrücken, Germany

PhD candidate

September 2022 – December 2026 [Expected]

Thesis: Understanding, Confusion, and Consequence of LLMs in Low Resource Languages.
Supervisors: Prof. Dr.-Ing. Philipp Slusallek and Prof. Dr. Dietrich Klakow.

African Institute for Mathematical Sciences, Accra, Ghana

Master of Science, Mathematical Sciences – Machine Intelligence

October 2019 – March 2021

Thesis: Exploring Data Imbalance and Modality Bias in Hateful Memes.
Supervisor: Prof. Dr. Marcus Rohrbach.

Addis Ababa University, Addis Ababa, Ethiopia

Bachelor of Science, Software Engineering

September 2012 – December 2017

Project: Amharic online handwriting recognition keyboard.
Supervisor: Dr. Natnael Argaw Wondimu.
Recognition: Certificate of Recognition for Outstanding Graduation Project.

Experience

University of South Florida, Tampa, Florida

PhD Internship

July 2025 – Present

Socio-Cultural Large Language Models and Low Resource Languages.
Guest Lecture: University of South Florida.

Universität des Saarlandes, Saarbrücken, Germany

Research Assistant | HPC cluster support

September 2022 – Present

Conducting research in NLP, multimodal learning, and AI/ML.
Managing HPC clusters and multi-GPU model training.
Providing HPC-related training and technical support.

iQuartic, New Wave, Maryland, USA

Applied Research Data Scientist

March 2021 – October 2022

Developed AI solutions for automated risk adjustment, transitioning pipelines to in-house solutions.
Built end-to-end pipelines for medical OCR, spell checking, and ICD code extraction.
Managed data collection, model building, deployment, and testing phases.

SingularityNet, iCog-Labs, Addis Ababa, Ethiopia

Applied ML Developer

June 2017 – August 2022

Developed CNN, GAN, and VAE models for fake news and plant disease detection (97–98% accuracy).
Implemented web recommendation systems using Doc2Vec and DeepCTR.
Built large-scale image retrieval systems and conducted visual graph analytics.

iCog-Labs Software Consultancy, Addis Ababa, Ethiopia

Internship - Junior Programmer

January 2015 – May 2017

Developed YANETU, an AI-powered teaching assistant for children.
Built product components using Unity3D, Blender, and OpenCog NLP.

Publications

AFRILANGTUTOR: Advancing Language Tutoring and Culture Education in Low-Resource Languages with Large Language Models

April 2026

Tadesse Destaw Belay, Shahriar Kabir Nahin, Israel Abebe Azime, Ocean Monjur, Marek Rei, Chris Biemann, Shamsuddeen Hassan Muhammad, Seid Muhie Yimam, Anshuman Chhabra

How can language learning systems be developed for languages that lack sufficient training resources? This challenge is increasingly faced by developers across the African continent who aim to build AI systems capable of understanding and responding in local languages. To address this gap, we introduce AFRILANGDICT, a collection of 194.7K African language-English dictionary entries designed as seed resources for generating language-learning materials, enabling us to automatically construct large-scale, diverse, and verifiable student-tutor question-answer interactions suitable for training ...

AfrIFact: Cultural Information Retrieval, Evidence Extraction and Fact Checking for African Languages

April 2026

Israel Abebe Azime, Jesujoba Oluwadara Alabi, Crystina Zhang, Iffat Maab, Atnafu Lambebo Tonja, Tadesse Destaw Belay, Folasade Peace Alabi, Salomey Osei, Saminu Mohammad Aliyu, Nkechinyere Faith Aguobi, Bontu Fufa Balcha, Blessing Kudzaishe Sibanda, Davis David, Mouhamadane Mboup, Daud Abolade, Neo Putini, Philipp Slusallek, David Ifeoluwa Adelani, Dietrich Klakow

Assessing the veracity of a claim made online is a complex and important task with real-world implications. When these claims are directed at communities with limited access to information and the content concerns issues such as healthcare and culture, the consequences intensify, especially in low-resource languages. In this work, we introduce AfrIFact, a dataset that covers the necessary steps for automatic fact-checking (i.e., information retrieval, evidence extraction, and fact checking), in ten African languages and English. Our evaluation results show that even the best embedding model...

Ethio-ASR: Joint Multilingual Speech Recognition and Language Identification for Ethiopian Languages

March 2026

Badr M. Abdullah, Israel Abebe Azime, Atnafu Lambebo Tonja, Jesujoba O. Alabi, Abel Mulat Alemu, Eyob G. Hagos, Bontu Fufa Balcha, Mulubrhan A. Nerea, Debela Desalegn Yadeta, Dagnachew Mekonnen Marilign, Amanuel Temesgen Fentahun, Tadesse Kebede, Israel D. Gebru, Michael Melese Woldeyohannis, Walelign Tewabe Sewunetie, Bernd Möbius, Dietrich Klakow

We present Ethio-ASR, a suite of multilingual CTC-based automatic speech recognition (ASR) models jointly trained on five Ethiopian languages: Amharic, Tigrinya, Oromo, Sidaama, and Wolaytta. These languages belong to the Semitic, Cushitic, and Omotic branches of the Afroasiatic family, and remain severely underrepresented in speech technology despite being spoken by the vast majority of Ethiopia’s population. We train our models on the recently released WAXAL corpus using several pre-trained speech encoders and evaluate against strong multilingual baselines, including OmniASR. Our best mod...

AmharicStoryQA: A Multicultural Story Question Answering Benchmark in Amharic

February 2026

Israel Abebe Azime, Abenezer Kebede Angamo, Hana Mekonen Tamiru, Dagnachew Mekonnen Marilign, Philipp Slusallek, Seid Muhie Yimam, Dietrich Klakow

With the growing emphasis on multilingual and cultural evaluation benchmarks for large language models, language and culture are often treated as synonymous, and performance is commonly used as a proxy for a model's understanding of a given language. In this work, we argue that such evaluations overlook meaningful cultural variation that exists within a single language. We address this gap by focusing on narratives from different regions of Ethiopia and demonstrate that, despite shared linguistic characteristics, region-specific and domain-specific content substantially influences language e...

Israel A. Azime