Sanju Basak
About me
I'm Sanju (he/him), a recent graduate from the Department of Computer Science & Engineering (CSE) at Bangladesh University of Engineering and Technology (BUET). I am currently an adjunct lecturer at the Department of CSE, BUET, and am also gaining experience as a GenAI software engineer at ResTech AI, a US-based company. My research focuses on Natural Language Processing, Information Retrieval, and Human-AI Interaction, with a special interest in low-resource and multilingual language processing. My career goal is to pursue a PhD in these areas, starting in Fall 2025.
My undergraduate thesis project was supervised by Dr. Rifat Shahriyar, CSE, BUET. We created an effective Retrieval-Augmented Generation pipeline for Open Domain Question Answering in Bengali. Our manuscript (joint first-authored) is ready for submission.
I have worked as a Research Assistant under Dr. Md. Mostofa Akbar in the Department of Computer Science and Engineering (CSE) at BUET. Our project focuses on developing a web-based platform aimed at automating the evaluation of medical exams. This involves utilizing advanced computer vision models and optical character recognition (OCR) technology to accurately extract answers from handwritten or printed exam scripts. The integration of these technologies aims to streamline and enhance the efficiency and accuracy of the medical exam grading process.
Outside of my academic pursuits, I have a deep passion for travel. In my free time, I enjoy watching movies or playing the flute.
Work Experience
-
Adjunct Lecturer
Department of Computer Science and Engineering,
Bangladesh University of Engineering and Technology
August 2024 - Present
Course Instructor:
CSE314: Operating System Sessional
CSE200: Signals and Linear Systems Sessional
CSE108: Object Oriented Programming Language Sessional
CSE102: Structured Programming Language Sessional
CSE110: Computer Programming Sessional
-
Software Engineer - Generative AI
ResTech AI
July 2024 - Present
- Contributing to ezGPT™, a secure and scalable Generative AI platform designed for document intelligence in regulated industries (e.g., government, healthcare, finance).
- Built modular backend services in Python with FastAPI and Pydantic, deployed with Docker.
- Integrated state-of-the-art LLMs such as Gemini and LLaMA 3, along with the Ollama model runtime, to extract suggestions, generate insights, and perform document-based Q&A tasks.
- Used LangChain to orchestrate LLM workflows and developed secure prompt-based pipelines with privacy-aware data handling.
- Contributed to core platform modules including knowGPT and pxlGPT, focusing on knowledge retrieval and data preparation workflows.
-
Research Assistant
Bangladesh University of Engineering and Technology
September 2024 - Present
Working under the supervision of Dr. Md. Mostofa Akbar, CSE, BUET, along with Md. Ashraful Islam, on a machine learning and software development project: an automated examination script marking app using OCR and vision models.
-
Intern
eSRD-Lab & MySoft Limited
May 2023 - June 2023
• Completed a one-month Health Data Analytics Virtual Training.
• Learned data pre-processing, multidimensional data modeling, and online analytical processing (OLAP).
• Learned aggregation, correlation, association, clustering, and prediction on data, as well as data visualization.
Research Experience
-
Effective Retrieval-Augmented Generation for Open Domain Question Answering in Bengali (2023 - Present)
Natural Language Processing, Information Retrieval
Sanju Basak, Noshin Nawal, Rifat Shahriyar [Manuscript Ready for Submission]
Undergraduate thesis project under Dr. Rifat Shahriyar and Abhik Bhattacharjee (Research Assistant, Department of CSE, BUET). In this work, I co-developed the first-ever Bengali Retrieval-Augmented Generation (RAG) pipeline. We benchmarked two Bengali open-domain question-answering datasets, SQuAD BN and BanglaRQA, using six state-of-the-art embedding models and three retrieval methods, covering both sparse and dense approaches. We also evaluated three large language models (LLMs) with and without the RAG pipeline, comparing their factual response capabilities on Bengali-region-specific data against global data.
-
Multi-ToM: Evaluating Multilingual Theory of Mind Capabilities in Large Language Models (2024 - Present)
Natural Language Processing, Theory of Mind, Cultural Bias
Jayanda Shadhu, Ayan Antik Khan, Sanju Basak, Noshin Nawal, Abhik Bhattacharjee, Rifat Shahriyar
We presented a multilingual Theory of Mind (ToM) dataset translated from a bilingual version, encompassing seven major languages, and also developed a culturally nuanced dataset. We evaluated six state-of-the-art large language models (LLMs) on both datasets to assess how cultural relevance impacts their social reasoning abilities. Our analysis focused on examining the variations in LLM performance across different languages and cultural contexts, highlighting the influence of linguistic and cultural diversity on the models' understanding of social reasoning.
-
Reviving Inert Brains: Investigating Signal Processing and Memory Potential in Detached Goat Brains (2021 - Present)
Computational Neuroscience
Farbin Fayza, Aaiyeesha Mostak, Mashiyat Mahjabin Prapty, Md Toki Tahmid, Md. Asif Haider, Sanju Basak, Md. Mehrab Haque, Anup Bhowmik, K. M. Asifur Rahman, Azizur Rahman Anik, Nafis Karim, A. B. M. Alim Al Islam [Manuscript Ready for Submission] (Best Student Poster @ NSysS '21)
We investigate the potential of a deceased animal brain to process signals. Specifically, we examine the brain’s responses to external stimuli in the form of electrical signals and its ability to act as a memory unit. We also explore the transfer characteristics of the deceased goat brain and elucidate the corresponding function through representative circuits.
NSysS 2021: [Poster Presentation]
Education
-
B.Sc. in Computer Science and Engineering
Bangladesh University of Engineering and Technology
April 2019 - July 2024
CGPA: 3.92/4.00
- Ranked 18th in a class of 123 students
Notable Courses:
CSE471: Machine Learning
CSE405: Computer Security
CSE317: Artificial Intelligence
CSE309: Compiler Design
CSE321: Computer Networks
CSE313: Operating Systems
CSE463: Introduction to Bioinformatics
CSE409: Computer Graphics
CSE305: Computer Architecture
CSE411: Simulation and Modeling
MATH245: Statistics and Probability
MATH247: Linear Algebra
-
Higher Secondary School Certificate (HSC)
Notre Dame College, Dhaka
2018
GPA: 5.00/5.00
- Board Talent Pool Scholarship
-
Secondary School Certificate (SSC)
Dinajpur Zila School
2016
GPA: 5.00/5.00
- Board Talent Pool Scholarship
- Placed 2nd in Dinajpur Board
Technical Skills
-
Programming Languages
Python, C/C++, C#, Java, JavaScript, SQL, HTML/CSS, Bash, LaTeX, MySQL, x86 Assembly, Bison/Flex
-
Libraries/Frameworks
PyTorch, Keras, TensorFlow, LangChain, Scikit-learn, LlamaIndex, Node.js, NS2, xv6, Django REST, ReactJS, Wireshark
-
Tools/Platforms
Docker, Git, Google Colab, Kaggle, Visual Studio Code, Linux
Languages
English (Fluent), Bengali (Native)