IIT Mandi - research intern
Honest deep-dive on a research internship - what I actually built, what the PI built, how the lab worked, and the meta-lessons about research workflows that have shaped my engineering ever since.
The context
I was a third-year undergrad looking for a summer 2024 internship. Industry slots were tight and I had not yet built the resume that would later get me into Binocs. A research internship at IIT Mandi came through a college network connection. May to September 2024, five months, two projects under one PI.
This deep-dive is honest. The work was educational, not breakthrough. Two years later, the specifics are stale enough that I do not claim recent expertise on either project. What stuck is what I want to talk about.
How the lab worked
The lab structure shaped everything I learned. The PI took on 6-8 undergrad interns per summer, paired each with a PhD student, and ran a weekly lab meeting where two interns presented their progress. The PhD student was your day-to-day mentor. The PI showed up for the big-picture pivots.
The cadence -
- Mondays - 1:1 with the PhD student. Review last week's results, plan this week.
- Tuesdays to Thursdays - heads-down work. Implement experiments, run them on the lab GPU server.
- Fridays - lab meeting. Two interns present, everyone asks hard questions.
- Reading - 3 papers a week, written summary in the lab Slack.
The reading habit was the single biggest takeaway. I had read maybe 10 ML papers in college. By the end of the internship I had read 50, and could read an ML paper in 30 to 45 minutes with comprehension. That speed transferred to engineering papers, blog posts, internal design docs - everything.
Project 1 - continuous authentication
The thesis. Traditional authentication is one-shot - you log in once, the device assumes you are still you until you log out. A common attack vector is the unlocked phone you set down for 5 minutes. Continuous authentication adds a second factor that runs in the background - it watches behavioral signals (how you type, how you swipe, how you hold the phone) and degrades trust as those signals diverge from your baseline.
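The idea of trust degrading over time can be sketched in a few lines. This is only an illustration, not the lab's method: the exponential smoothing rule, the `alpha` value, the lock threshold, and the function names are all assumptions, and the per-window match scores are made up.

```python
def update_trust(trust: float, match_score: float, alpha: float = 0.3) -> float:
    """Blend the previous trust with the newest window's match score.

    Both values are assumed to lie in [0, 1]; alpha controls how fast trust reacts.
    """
    return (1.0 - alpha) * trust + alpha * match_score


def run_session(match_scores: list[float], lock_below: float = 0.5) -> tuple[list[float], bool]:
    """Replay per-window match scores; stop and lock once trust drops too low."""
    trust = 1.0  # assumption: a fresh login starts fully trusted
    history = []
    for score in match_scores:
        trust = update_trust(trust, score)
        history.append(trust)
        if trust < lock_below:
            return history, True  # session locked
    return history, False


# Owner keeps using the phone: scores stay close to the baseline.
owner_history, owner_locked = run_session([0.9, 0.95, 0.9, 0.92])
# Someone else picks it up: scores collapse and the session eventually locks.
intruder_history, intruder_locked = run_session([0.9, 0.1, 0.05, 0.1, 0.05])
print(owner_locked, intruder_locked, len(intruder_history))
```

The smoothing means one odd window does not lock the phone, but a run of mismatched windows does.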
My piece of this. I built the data collection tooling. An Android app that recorded keystroke timings (key press duration, inter-key interval), swipe patterns (start point, end point, duration, pressure, velocity), and accelerometer signals (how the phone moved while you used it). The app shipped to a small group of lab volunteers who used it for a few weeks, generating a labeled dataset (each session tagged with the user's anonymous ID).
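For a sense of what the keystroke features look like once the raw events are off the phone, here is a minimal sketch. The `KeyEvent` record and its field names are hypothetical, and the inter-key interval is taken as press-to-press time, which is one common definition (release-to-press is another).

```python
from dataclasses import dataclass


@dataclass
class KeyEvent:
    key: str
    down_ms: float  # timestamp of the key press
    up_ms: float    # timestamp of the key release


def dwell_times(events: list[KeyEvent]) -> list[float]:
    """Key press duration: how long each key was held down."""
    return [e.up_ms - e.down_ms for e in events]


def inter_key_intervals(events: list[KeyEvent]) -> list[float]:
    """Time between consecutive key presses (press-to-press, by assumption)."""
    return [b.down_ms - a.down_ms for a, b in zip(events, events[1:])]


events = [
    KeyEvent("h", 0.0, 80.0),
    KeyEvent("i", 150.0, 220.0),
    KeyEvent("!", 400.0, 490.0),
]
dwell = dwell_times(events)
intervals = inter_key_intervals(events)
print(dwell, intervals)
```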
Then the modeling. I trained a small classifier in PyTorch that took a 30-second window of behavioral features and predicted which user was active. The architecture was a 1D CNN over the time series followed by a small MLP. Accuracy was well above chance (around 78 percent on the test set with 8 users, against 12.5 percent for uniform guessing) but nowhere near production-ready (which would have needed 99 percent and a much larger user population).
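A classifier of this shape might look roughly like the sketch below. Only the overall structure (1D CNN over time, then a small MLP, 8 users) comes from the description above. The channel count, kernel sizes, hidden width, and the 300-step window are assumptions chosen for illustration.

```python
import torch
from torch import nn


class BehaviorCNN(nn.Module):
    """1D CNN over a window of behavioral features, followed by a small MLP."""

    def __init__(self, n_features: int, n_users: int = 8, hidden: int = 32):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv1d(n_features, hidden, kernel_size=5, padding=2),
            nn.ReLU(),
            nn.Conv1d(hidden, hidden, kernel_size=5, padding=2),
            nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),  # average over the time axis
        )
        self.head = nn.Sequential(
            nn.Linear(hidden, hidden),
            nn.ReLU(),
            nn.Linear(hidden, n_users),  # one logit per user
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x has shape (batch, n_features, time_steps)
        h = self.conv(x).squeeze(-1)  # -> (batch, hidden)
        return self.head(h)           # -> (batch, n_users)


# Hypothetical input: 4 windows, 6 feature channels, 300 time steps each.
model = BehaviorCNN(n_features=6)
window = torch.randn(4, 6, 300)
logits = model(window)
print(logits.shape)
```

Training such a model for user identification would typically use `nn.CrossEntropyLoss` on these logits.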
What I learned -
- The bottleneck was data, not model. With 8 users and a few hours of data each, no model was going to be amazing.
- Privacy considerations were non-trivial. Every behavioral signal is identifying. The lab's IRB process taught me how to think about user data carefully.
- The signal that mattered most was the inter-key interval distribution, not the absolute timing. Per-user normalization was a real win.
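One way to read "per-user normalization" is to z-score each timing value against that user's own mean and spread, so the model sees the shape of the distribution instead of raw milliseconds. The sketch below assumes that reading; the exact scheme used in the project is not stated, and the function name and sample values are made up.

```python
from collections import defaultdict
from statistics import mean, pstdev


def normalize_per_user(samples: list[tuple[str, float]]) -> list[tuple[str, float]]:
    """Z-score each value against the mean and std of its own user."""
    by_user: dict[str, list[float]] = defaultdict(list)
    for user_id, value in samples:
        by_user[user_id].append(value)

    # Fall back to a std of 1.0 when a user has constant values.
    stats = {u: (mean(vs), pstdev(vs) or 1.0) for u, vs in by_user.items()}
    return [(u, (v - stats[u][0]) / stats[u][1]) for u, v in samples]


# A fast typist ("a") and a slow typist ("b"), inter-key intervals in ms.
samples = [("a", 100.0), ("a", 200.0), ("b", 300.0), ("b", 500.0)]
normalized = normalize_per_user(samples)
z_scores = [z for _, z in normalized]
print(z_scores)
```

After normalization both typists land on the same scale, even though their absolute speeds differ by a factor of two or more.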
The project did not ship anywhere. It was a learning exercise. The PhD student went on to use the dataset for her thesis.
Project 2 - GNN molecular olfaction
The thesis. Predict the perceptual descriptors of a molecule from its structure. A molecule's smell is a high-dimensional perceptual property (fruity, citrus, woody, etc.) determined by its 3D shape and chemistry. A graph neural network can take the molecular graph (atoms as nodes, bonds as edges) and learn a representation that predicts smell descriptors.
This was around the same time Google published their olfaction GNN paper. The lab was reproducing parts of that work and extending it to a different odor dataset.
My piece -
- The dataset pipeline. Public olfaction datasets come as SMILES strings (a compact text representation of molecules) with associated descriptor labels. I used RDKit to parse SMILES to molecular graphs, extracted node features (atom type, charge, hybridization, aromaticity) and edge features (bond type, ring membership). Output was a PyTorch Geometric dataset ready to feed a model.
- The model. A message-passing GNN with 3 layers, hidden dim 128, mean-pooling over nodes for the graph readout, MLP head for the multi-label descriptor prediction. Standard architecture for the time.
- Training and evaluation. AdamW optimizer, BCE loss for the multi-label setup, ROC-AUC per descriptor as the eval metric.
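The SMILES-to-graph step of the pipeline can be sketched with RDKit as below. The node and edge features mirror the ones listed above, but the dictionary layout and the `smiles_to_graph` name are assumptions for illustration, and the conversion into a PyTorch Geometric dataset is left out.

```python
from rdkit import Chem


def smiles_to_graph(smiles: str) -> dict:
    """Parse a SMILES string into plain-Python node and edge feature lists."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:  # RDKit returns None for strings it cannot parse
        raise ValueError(f"could not parse SMILES: {smiles!r}")

    nodes = [
        {
            "symbol": atom.GetSymbol(),
            "charge": atom.GetFormalCharge(),
            "hybridization": str(atom.GetHybridization()),
            "aromatic": atom.GetIsAromatic(),
        }
        for atom in mol.GetAtoms()
    ]

    edges = []
    for bond in mol.GetBonds():
        i, j = bond.GetBeginAtomIdx(), bond.GetEndAtomIdx()
        feat = {"bond_type": str(bond.GetBondType()), "in_ring": bond.IsInRing()}
        # Store both directions so messages can flow either way along a bond.
        edges.append((i, j, feat))
        edges.append((j, i, feat))

    return {"nodes": nodes, "edges": edges}


benzene = smiles_to_graph("c1ccccc1")
ethanol = smiles_to_graph("CCO")
print(len(benzene["nodes"]), len(benzene["edges"]))
```

In a real pipeline the categorical features would be one-hot encoded into tensors before they reach the model.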
The result. My model reproduced the published baseline's performance on the dataset. It did not extend beyond that. We tried a few variations (different message-passing schemes, larger hidden dims, attention pooling) without a clear win. By the end of the summer, the PhD student took the pipeline forward and I rotated off.
What I learned -
- Reproducing a published baseline is harder than people think. The paper omits details. The dataset has quirks. The training is unstable. Getting to baseline taught me more than trying to beat baseline ever would have.
- Hyperparameter sweeps are humbling. I ran 200 sweeps over the summer. The best result was barely better than my first reasonable guess.
- The math of message passing finally clicked for me. Before this I had read GNN tutorials and not really understood them. After building one, I understood them.
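For readers who want the message-passing math in concrete form, here is a minimal NumPy sketch of the forward pass. The 3 layers, hidden dim 128, and mean-pooling readout come from the model described earlier. The specific update rule (ReLU over a self term plus the mean of neighbor states) and the random weights are assumptions, not the scheme the lab used.

```python
import numpy as np


def message_passing_layer(h, adj, w_self, w_neigh):
    """One round: each node mixes its own state with the mean of its neighbors."""
    degree = adj.sum(axis=1, keepdims=True)
    neighbor_mean = (adj @ h) / np.maximum(degree, 1.0)  # avoid divide-by-zero
    return np.maximum(0.0, h @ w_self + neighbor_mean @ w_neigh)  # ReLU


def init_layers(in_dim, hidden_dim=128, n_layers=3, seed=0):
    """Random (untrained) weights, just to exercise the shapes."""
    rng = np.random.default_rng(seed)
    dims = [in_dim] + [hidden_dim] * n_layers
    return [
        (rng.normal(0, 0.1, (d_in, d_out)), rng.normal(0, 0.1, (d_in, d_out)))
        for d_in, d_out in zip(dims, dims[1:])
    ]


def graph_embedding(x, adj, layers):
    """Run all layers, then mean-pool node states into one graph vector."""
    h = x
    for w_self, w_neigh in layers:
        h = message_passing_layer(h, adj, w_self, w_neigh)
    return h.mean(axis=0)


# Toy 3-atom chain (0-1-2) with 4 made-up features per atom.
x = np.eye(3, 4)
adj = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)
layers = init_layers(in_dim=4)
embedding = graph_embedding(x, adj, layers)

# Relabeling the atoms must not change the graph-level output.
perm = [2, 0, 1]
embedding_perm = graph_embedding(x[perm], adj[np.ix_(perm, perm)], layers)
print(embedding.shape, np.allclose(embedding, embedding_perm))
```

The permutation check at the end is the property that makes this a graph model: the output depends on the bonds, not on the order the atoms happen to be listed in.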
The meta-lessons that stuck
Two years later, the specific code is stale. The meta-lessons are not.
Reproduce before you extend
The PI's advice in my second week. "Do not chase state of the art on day one. Reproduce a known result first; that tells you the pipeline is right. Then push." I used this exact pattern at Binocs when I rebuilt the deck pipeline - first I reproduced the existing pipeline's outputs on the eval set with my new code, then I optimized. Without the reproduce step you do not know if an improvement is real or a bug.
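The reproduce step can be made mechanical with a small check like the one below. It is a sketch under assumptions: outputs are taken to be one numeric score per eval example, and the function name, tolerances, and example IDs are hypothetical.

```python
import math


def find_mismatches(
    reference: dict[str, float],
    candidate: dict[str, float],
    rel_tol: float = 1e-6,
    abs_tol: float = 1e-9,
) -> list[str]:
    """Return the example IDs where the new pipeline disagrees with the old one."""
    mismatches = []
    for key in sorted(set(reference) | set(candidate)):
        if key not in reference or key not in candidate:
            mismatches.append(key)  # an example missing on either side counts
            continue
        if not math.isclose(reference[key], candidate[key], rel_tol=rel_tol, abs_tol=abs_tol):
            mismatches.append(key)
    return mismatches


old_outputs = {"ex1": 0.5, "ex2": 0.25, "ex3": 0.9}
new_outputs = {"ex1": 0.5000000001, "ex2": 0.3, "ex3": 0.9}
diff = find_mismatches(old_outputs, new_outputs)
print(diff)
```

Only once this list comes back empty does a later change in the metric mean anything.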
Read 3 papers a week
The lab norm of weekly paper reading is the single highest-leverage habit I picked up. I still read papers and engineering blog posts weekly. The compounding is real - the more you read, the faster you read, the more connections you see between things.
Write up what you read
The lab Slack required a written summary of every paper you read. I hated it at first. By the end of the summer I realized writing about a paper is what forces you to understand it. Now I write a paragraph on every notable thing I read, even if no one will read my writing. The writing is for me.
Friday presentations
Presenting your work to a critical audience every few Fridays teaches you to defend your choices. Why did you pick that architecture? Why that loss function? Why those hyperparameters? "Because the tutorial did" was not an acceptable answer. By the third presentation, I started anticipating the questions during the work and adjusting in real time.
Engineering quality matters in research too
I came in thinking research code did not have to be clean. The PhD student disabused me of that. Sloppy code in research means you cannot reproduce your own results, you cannot extend your own pipeline, and you cannot hand off to a teammate. The research code I wrote that summer used type hints, had tests for the data pipeline, and had a Makefile that re-ran experiments deterministically. That habit carried into my engineering work.
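As one small example of what deterministic re-runs can mean in practice, here is a sketch of a seeded train/test split with type hints. It is an illustration, not the code from that summer: the function name, the 20 percent test fraction, and the session IDs are all made up.

```python
import random
from typing import Sequence


def split_ids(
    ids: Sequence[str], seed: int, test_fraction: float = 0.2
) -> tuple[list[str], list[str]]:
    """Return (train_ids, test_ids); the same seed always gives the same split."""
    rng = random.Random(seed)  # local RNG, so global random state is untouched
    shuffled = sorted(ids)     # sort first so the input order does not matter
    rng.shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


session_ids = [f"s{i}" for i in range(10)]
train_a, test_a = split_ids(session_ids, seed=7)
train_b, test_b = split_ids(list(reversed(session_ids)), seed=7)
print(test_a == test_b)
```

A test for the data pipeline can then assert exactly this: same seed, same split, no example lost.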
How I talk about this in interviews
Honestly. The script is roughly -
"I was a research intern at IIT Mandi in summer 2024 on two projects. The first was continuous authentication via behavioral biometrics - I built the Android data collection app and trained a small classifier that beat chance but was not production-ready. The second was a GNN for molecular olfaction - I built the dataset pipeline and reproduced the published baseline. The technical specifics are about a year and a half old at this point so I would not claim recent expertise. The meta-lessons that stuck were the discipline of reading 3 papers a week, the habit of reproducing before extending, and the practice of defending your choices to a critical audience. Those have shaped how I work as an engineer ever since."
Interviewers respect honesty about scope. They do not respect inflation. The PhD student did the real intellectual lifting on both projects. I did the engineering and got an education in return. That is a good summer.
Why I left research
The pace was the deciding factor. I am an engineer who likes shipping things people use. Research moves at a cadence (a paper a year, a thesis in five) that does not match my temperament. The decision was not a judgment on research - the PhD students I worked with were brilliant and patient. It was a judgment on me. I wanted to go ship.
I joined ViewR in January 2025. Then Binocs in July 2025. The rest is the other 8 topics in this section.
Learn more
- PyTorch docs
- RDKit docs
- Lin Clark articles (lin-clark.com)