Sanju Basak
NLP, HCI, IR

About me

I'm Sanju (he/him), a recent graduate of the Department of Computer Science & Engineering (CSE) at Bangladesh University of Engineering and Technology (BUET). I am currently an adjunct lecturer in the Department of CSE, BUET, and am also gaining experience as a GenAI software engineer at ResTech AI, a US-based company. My research focuses on Natural Language Processing, Information Retrieval, and Human-AI Interaction, with a special interest in low-resource and multilingual language processing. My career goal is to pursue a PhD in these areas, starting in Fall 2025.

My undergraduate thesis project was supervised by Dr. Rifat Shahriyar, CSE, BUET. We created an Effective Retrieval-Augmented Generation pipeline for Open Domain Question Answering in Bengali. Our manuscript (joint first-authored) is currently ready for submission.

I have worked as a Research Assistant under Dr. Md. Mostofa Akbar in the Department of Computer Science and Engineering (CSE) at BUET. Our project focuses on developing a web-based platform aimed at automating the evaluation of medical exams. This involves utilizing advanced computer vision models and optical character recognition (OCR) technology to accurately extract answers from handwritten or printed exam scripts. The integration of these technologies aims to streamline and enhance the efficiency and accuracy of the medical exam grading process.

Outside of my academic pursuits, I have a deep passion for travel. In my free time, I enjoy watching movies or playing the flute.

My Resume

Work Experience

  1. Adjunct Lecturer

    Department of Computer Science and Engineering,

    Bangladesh University of Engineering and Technology

    August 2024 — Present

    Course Instructor:

    CSE314: Operating System Sessional

    CSE200: Signals and Linear Systems Sessional

    CSE108: Object Oriented Programming Language Sessional

    CSE102: Structured Programming Language Sessional

    CSE110: Computer Programming Sessional

  2. Software Engineer - Generative AI

    ResTech AI

    July 2024 — Present
    • Contributing to ezGPT™, a secure and scalable Generative AI platform designed for document intelligence in regulated industries (e.g., government, healthcare, finance).
    • Built modular backend services in Python with frameworks such as FastAPI and Pydantic, and deployed them with Docker.
    • Integrated state-of-the-art LLMs and model runtimes such as Ollama, Gemini, and LLaMA 3 to extract suggestions, generate insights, and perform document-based Q&A tasks.
    • Used LangChain to orchestrate LLM workflows and developed secure prompt-based pipelines with privacy-aware data handling.
    • Contributed to core platform modules including knowGPT and pxlGPT, focusing on knowledge retrieval and data preparation workflows.
  3. Research Assistant

    Bangladesh University of Engineering and Technology

    September 2024 — Present

    Working under the supervision of Dr. Md. Mostofa Akbar, CSE, BUET, along with Md. Ashraful Islam, on a machine learning and software development project: implementing an automated examination script marking app using OCR and vision models.

  4. Intern

    eSRD-Lab & MySoft Limited

    May 2023 — June 2023

    • One-month-long Health Data Analytics Virtual Training.

    • Learned data pre-processing, multidimensional data modeling, and online analytical processing (OLAP).

    • Learned aggregation, correlation, association, clustering, and prediction on data, as well as data visualization.

Research Experience

  1. Effective Retrieval-Augmented Generation for Open Domain Question Answering in Bengali (2023 - Present)

    Natural Language Processing, Information Retrieval

    Sanju Basak, Noshin Nawal, Rifat Shahriyar

    [Manuscript Ready for Submission]


    Undergraduate thesis project under Dr. Rifat Shahriyar and Abhik Bhattacharjee (Research Assistant, Department of CSE, BUET). In this work, I co-developed the first-ever Bengali Retrieval-Augmented Generation (RAG) pipeline. We benchmarked two Bengali open-domain question-answering datasets (SQuAD BN and BanglaRQA) using six state-of-the-art embedding models and three retrieval methods, covering both sparse and dense approaches. We also evaluated three large language models (LLMs) with and without the RAG pipeline, comparing their factual response capabilities on Bengali-region-specific data against global data.


  2. Multi-ToM: Evaluating Multilingual Theory of Mind Capabilities in Large Language Models (2024 - Present)

    Natural Language Processing, Theory of Mind, Cultural Bias

    Jayanda Shadhu, Ayan Antik Khan, Sanju Basak, Noshin Nawal, Abhik Bhattacharjee, Rifat Shahriyar

    [Preprint Available]


    We presented a multilingual Theory of Mind (ToM) dataset translated from a bilingual version, encompassing seven major languages, and also developed a culturally nuanced dataset. We evaluated six state-of-the-art large language models (LLMs) on both datasets to assess how cultural relevance impacts their social reasoning abilities. Our analysis focused on examining the variations in LLM performance across different languages and cultural contexts, highlighting the influence of linguistic and cultural diversity on the models' understanding of social reasoning.

  3. Reviving Inert Brains: Investigating Signal Processing and Memory Potential in Detached Goat Brains (2021 - Present)

    Computational Neuroscience

    Farbin Fayza, Aaiyeesha Mostak, Mashiyat Mahjabin Prapty, Md Toki Tahmid, Md. Asif Haider, Sanju Basak, Md. Mehrab Haque, Anup Bhowmik, K. M. Asifur Rahman, Azizur Rahman Anik, Nafis Karim, A. B. M. Alim Al Islam

    [Manuscript Ready for Submission] (Best Student Poster @ NSysS '21)


    We investigate the potential of a deceased animal brain to process signals. Specifically, we examine the brain’s responses to external stimuli in the form of electrical signals and its ability to act as a memory unit. We also explore the transfer characteristics of the deceased goat brain and elucidate the corresponding function through representative circuits.


    NSysS 2021: [Poster Presentation]
    EGU

Education

  1. B.Sc. in Computer Science and Engineering

    Bangladesh University of Engineering and Technology

    April 2019 - July 2024

    CGPA: 3.92/4.00

    - Ranked 18th in a class of 123 students


    Notable Courses:

    CSE471: Machine Learning
    CSE405: Computer Security
    CSE317: Artificial Intelligence
    CSE309: Compiler Design
    CSE321: Computer Networks
    CSE313: Operating Systems
    CSE463: Introduction to Bioinformatics
    CSE409: Computer Graphics
    CSE305: Computer Architecture
    CSE411: Simulation and Modeling
    MATH245: Statistics and Probability
    MATH247: Linear Algebra

  2. Higher Secondary School Certificate (HSC)

    Notre Dame College, Dhaka

    2018

    GPA: 5.00/5.00

    - Board Talent Pool Scholarship

  3. Secondary School Certificate (SSC)

    Dinajpur Zila School

    2016

    GPA: 5.00/5.00

    - Board Talent Pool Scholarship

    - Placed 2nd in Dinajpur Board

Technical Skills

  1. Programming Languages

    Python, C/C++, C#, Java, JavaScript, SQL, HTML/CSS, Bash, LaTeX, MySQL, x86 Assembly, Bison/Flex

  2. Libraries/Frameworks

    PyTorch, Keras, TensorFlow, LangChain, Scikit-learn, LlamaIndex, Node.js, NS2, xv6, Django REST, ReactJS, Wireshark

  3. Tools/Platforms

    Docker, Git, Google Colab, Kaggle, Visual Studio Code, Linux

  4. Languages

    English (Fluent), Bengali (Native)

Projects

  • Bangla Handwritten Character Recognition

    View on Github

    Python, PyTorch, Ultralytics YOLO, EfficientNet, Kaggle

    A machine learning project to detect Bangla handwritten characters in images of form fields. We fine-tuned YOLOv8 on the WTW dataset, mainly for box detection, and then fine-tuned EfficientNet on the BanglaLekhaIsolated dataset, reaching 93.5% accuracy.

  • Machine Learning Algorithms and Neural Network from Scratch

    View on Github

    Python, NumPy, Scikit-learn, Pandas, Seaborn

    Implemented matrix transformation and image reconstruction using singular value decomposition. Implemented the AdaBoost algorithm with logistic regression, along with exploratory data analysis and preprocessing techniques. Trained and evaluated a feed-forward neural network from scratch using only NumPy. Implemented PCA and clustering with the expectation-maximization algorithm on Gaussian mixture models.

  • OCR Based CRVS Form Digitalization

    View on Github

    Tesseract, ReactJS, NodeJS, PostgreSQL

    A web platform to process handwritten CRVS (Civil Registration and Vital Statistics) forms through OCR, facilitating accurate extraction, digitalization, and correction of information. Participated in box detection of various form fields and designing the front end.

  • Ray Tracing

    View on Github

    Graphics, OpenGL, C++

    Implemented a raster-based graphics pipeline, followed by ray tracing using OpenGL.

  • SEED Labs: Hands-on Labs for Security Education

    View on Github

    Python, Docker, Wireshark, Azure Cloud, TigerVNC

    Implemented various cryptographic algorithms such as AES and RSA, and conducted hands-on experiments to learn security concepts such as buffer overflow, CSRF, XSS, and SQLi attacks, malware, and firewalls. We also completed a project in which we analyzed all the features of The Hive and wrote a report. View the Project Report

  • Badhan Data Input API

    View on Github

    NodeJS, JavaScript, Firebase, Vercel

    Contributed to the development of the app for BADHAN, BUET Zone, a voluntary blood donors' organization managed by students at Bangladesh University of Engineering and Technology, Dhaka, Bangladesh. Played a key role in enhancing the BADHAN Data Input API, ensuring rigorous validation and seamless integration of newly added persons' data into the database.

  • Xv6 Memory Management

    View on Github

    Xv6 Operating System

    Developed the paging framework of the xv6 operating system and implemented many other operating system functionalities.

  • PeekABook

    View on Github

    NodeJS, PL/SQL, HTML, CSS, ExpressJS, Handlebars

    A database management project: an online book-buying platform that connects users with various bookshops for seamless book purchases. Participated in managing the database and creating the necessary APIs for the project.

  • Brick Breaker

    View on Github

    C, C++, iGraphics

    An engaging arcade game inspired by DX Ball.

  • Yet Another C Compiler from scratch

    View on Github

    Bison, Lex, C, C++

    Implemented a C-language compiler from scratch using Bison and Flex.

  • Object Sorting Machine

    View on Github

    Microcontroller: ATmega32, C, AVR programming, Sensors, Arduino

    Hardware project for the Microcontroller course. A Smart Object Sorting Machine designed to detect and categorize objects based on color and weight, pre-configured for 3 to 4 colors, each with two different weight categories, and to place them efficiently into designated containers. View on YouTube

Achievements

Honors & Awards

  1. Dean’s List Scholarship

    2019, 2021

    This scholarship is granted to undergraduate students for their academic excellence.

    Awarded in four semesters for academic excellence.

  2. Student Research Poster Champion

    2021

    Venue: 8th NSysS 2021 Conference

    Our poster, titled "Is It Really Dead?" - Digging into Dead Brains through Analyzing Its Behavior in Response to Inducing External Impulses, was named champion of the poster presentation. [Poster Link]

  3. University Merit List Scholarship

    2019, 2021

    This scholarship is granted to the top 10% of undergraduate students for their academic excellence.

    Awarded in four semesters for academic excellence.

  4. Dhaka Board Talent Pool Scholarship (HSC)

    2018
  5. Dinajpur Board Talent Pool Scholarship (SSC)

    2016

    Ranked 2nd in the Board.

Leadership Experience

  1. Director (Logistics)

    BUET Cyber Security Club

    April 2023 - June 2024

    Founding Member of the Club

    Organized national-level contests, hackathons, and capture-the-flag competitions. [Visit our Website]

  2. Vice President

    Badhan, BUET Zone

    June 2019 - June 2024

    Badhan is a voluntary blood donor organization. I developed and managed an app with a database of 5,000 donors, coordinated numerous blood donations, and have personally donated 11 times. [Visit Our Website]
  3. President

    Saraswati Puja Organizing Committee, BUET

    December 2019 - June 2024

    Actively participated in and managed various tasks associated with the Saraswati Puja every year.

    Organized the whole event in 2023. [Visit Our Facebook Page]

  4. Assistant General Secretary

    Satyen Bose Science Club, BUET

    June 2022 - June 2024
  5. Assistant Media & Publication Secretary

    ENVIRONMENT WATCH : BUET

    April 2021 - June 2023
  6. Class Representative, CSE, BUET

    December 2020 - June 2023

    Represented the class in the departmental meetings, organized class events, and communicated with the faculty on behalf of the class.

  7. Student Tutor

    December 2018 - Present

    Tutored many students, ranging from middle school and high school to college and undergraduate level, during my undergraduate years.