Fairify Team
NextGenAI Fellows
"This isn't a success story. But it is a story of learning how to build an AI auditing product - our journey through research, interviews, and understanding when to pivot."
This isn't a success story. But it is a story of learning how to build an AI auditing product. When we joined the SimPPL program to build an innovative product, as our college curriculum suggested, we had one idea: detecting and mitigating biases in NLP models for Indian languages. We thought it was important and necessary. None of us knew anything about ML or NLP, but we were motivated, so we jumped in.
The Research Phase: Drowning in Papers
We had joined the program a bit later than the other teams, so Swapneel gave us some leeway there: plenty of time and free space to research and choose any topic we would, firstly, enjoy and, secondly, be able to commit to in the long run.
We started by reading research papers - lots and lots of research papers. We were reading about everything. That was fun. We did not have a product, but we thought, okay, let's just do research and the product will shortly follow. Because, how difficult could it be?
We were so mistaken. It was very difficult.
In July, we did some coding, just implementing what we read in research papers. After a few months, slowly, we realised something - we had hit saturation. We had read everything we could read, and whatever new we were trying to read, we couldn't understand. We were tired, we were overwhelmed.
We weren't moving forward. We wanted to, but we just didn't know what to do next. The road was winding, the signs unclear. Our curiosity, without direction, had eventually hit a wall.
The Pivot: Finding AI Auditing
This program was about sustainable ways to build responsible AI technologies. We wanted to build a product, something tangible that people could use, but had no idea what to do.
In late August, we were considering changing our project entirely, to something completely unrelated to bias. We did care about the topic, but could not identify a product idea within the field.
Afterwards, we came across some products that showed us another way out of our confusion. We looked into AI auditing tools - products that help developers assess their models for fairness and safety. Making something like that felt right to us. It would use what we had learned and be something we cared about. It aligned. We decided on this in early September.
Then we started all the phases that the other teams had already been through.
Market Discovery: Finding Our Users
First we looked at the existing products and got a rough idea of what to do. We then started working on our market segmentation and settled on a market for our product: ML developers. The journey to that decision was a bit funny. At first, we envisioned our entire market as startups and smaller businesses wanting to use AI bots in their products, which we could help by providing some semblance of bias detection for their models. But that plan ultimately failed when we realized it was not only an incredibly niche market to target but also a bit redundant: users whose day-to-day work doesn't revolve around ML models would not care about a bias-free environment nearly as much as an ML developer would. And, voilà! There was our epiphany. Our market needed to be ML developers.
From all the papers we had read, and the existing products we had reviewed, we had a good idea of what the most commonly used bias detection metrics were. We knew our beachhead market. So we moved on to the next phase - conducting user interviews.
User Interviews: Learning the Hard Way
We started reaching out to ML developers on LinkedIn. No one replied. This crushed our spirits so badly, because all this time we had thought, "Our idea is so cool! Our product is even cooler!"
Looking back, this should have been our first warning flag.
"We will manage somehow" is not a plan. We are not Elle Woods. Execution is imperative.
We started getting ahead of ourselves, trusting our idea to do the nitty-gritty work for us and building the product up in our heads into something it would probably never be, because we could never stop worshipping the thought behind it long enough to start implementing the actual result. But not so fast - it wasn't just our doe-eyed idealization. A major reason we got almost zero responses was that, honestly, our messages were terribly constructed.
After talking to Swapneel about this, we quickly rewrote all of it according to what he advised us. Instead of representing ourselves as a student project, we portrayed ourselves as a responsible innovation idea that was supported by Mozilla via SimPPL.
Language matters. How you portray yourself matters. "Student project" sounds unserious. "Product" sounds serious.
Slowly, replies started trickling in. We set up video calls. We felt some of our lost motivation return.
Learning to Interview: Finding Direction
We knew we weren't supposed to tell them our idea upfront because that could lead to biased answers as people try to be more encouraging and less direct. Swapneel made it a point to hammer that idea in, so we went in knowing what we had to do–simply listen and gather information on the issues that were faced by our potential customers. Instead of steering the conversation, we let it flow. We weren't there to pitch, we were there to learn. We were listening carefully for any mention of bias or fairness that they raised of their own volition. Because if they didn't, it probably meant that the issue wasn't really top of mind for them. And if that was the case, either our hypothesis was wrong, or we were talking to the wrong people.
They kept asking us to tell them about our project. The entire experience was so unfamiliar that the best analogy we can offer is the unsettling realization of being the first character in a horror movie. The whole interaction felt chaotic, and we were all inexperienced, just going wherever the plot took us.
We took a step back. We learned not to use the word 'project' - both to avoid such situations and because it led people not to take us seriously. Alas, the conversations were still not giving us meaningful insights.
Thus, we went to Swapneel again for help. He directed us to understand the deeper objective of all our questions and really understand why we were asking what we were asking. And how the answers would even help us.
If you don't know what you are trying to learn from an interview, you won't learn anything. This was such a core learning point. Before, we had dived into the interviews head-on with no clear direction.
Now, we regrouped. We reworked all of our questions. We stopped trying to validate our idea and focused on getting into the mindspace of our immediate users. We tried to understand their workflow, their priorities, their pain points.
Building Fairify: Implementation and Reality
After wandering through the fog of uncertainty, chasing shadows of meaning in endless scrolls of research and meandering interviews, just as weariness threatened to take us, it came - the first glimmer of light on the horizon. Our noble calling for something far bigger than us. Our purpose on this green earth. Some developers brought up fairness audits on their own.
From the interviews, we realised that creating datasets for testing was one of the hardest and most time-consuming parts for them. A lot of developers wanted to test their models for bias, but didn't have domain-relevant datasets or didn't know how to build good test cases. That stuck with us.
We dove back into research - this time with direction. We weren't aimlessly exploring anymore. We were looking for tools, techniques, and metrics. And we knew what kind of product we wanted to build.
After reading more papers and reviewing what was feasible, we settled on two core metrics to include in Fairify - Counterfactual Sentence Testing (CST) and Sentence Encoder Association Test (SEAT).
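To make the first of these concrete, here is a minimal sketch of the idea behind Counterfactual Sentence Testing. It is a hypothetical illustration, not Fairify's actual code: the `score` function stands in for any model output (say, a sentiment or toxicity probability), and the identity-term pairs are illustrative rather than taken from any standard benchmark.

```python
# Minimal sketch of Counterfactual Sentence Testing (CST).
# `score` is a stand-in for any model output (e.g. a sentiment or
# toxicity probability); the term pairs below are illustrative only.

TERM_PAIRS = [("he", "she"), ("his", "her"), ("man", "woman")]

def counterfactual(sentence: str) -> str:
    """Return the sentence with each identity term swapped for its pair."""
    mapping = {}
    for a, b in TERM_PAIRS:
        mapping[a], mapping[b] = b, a
    return " ".join(mapping.get(w.lower(), w) for w in sentence.split())

def cst_gap(sentence: str, score) -> float:
    """Absolute difference in model score between a sentence and its
    counterfactual. A large gap suggests the model treats the two
    identity groups differently."""
    return abs(score(sentence) - score(counterfactual(sentence)))

# Toy example: a "model" that scores sentences mentioning "he" higher.
toy_score = lambda s: 1.0 if "he" in s.split() else 0.0
print(cst_gap("he is a great engineer", toy_score))  # 1.0
```

SEAT works on a different principle - it measures associations between sets of attribute and target sentences in a sentence encoder's embedding space - but the spirit is the same: probe the model with controlled inputs and quantify the difference in its behaviour across identity groups.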
We decided our USP would be custom dataset creation for their use case - something we picked up directly from the interviews.
We'd seen papers using LLMs to create test cases. It didn't look too hard. We thought we could implement custom dataset creation too.
By the end of December, we had implemented both CST and SEAT in Fairify. It was working. We could run tests, and get results, just like we wanted.
Then we took a break to deal with our college commitments in the form of internships and exams.
You can't build a product just because you've read enough papers.
The Reality Check: When Things Fall Apart
When we came back, things started to wobble. We still hadn't figured out custom dataset creation. We tried, but it wasn't working the way we imagined. We were stuck.
And just having CST and SEAT wasn't enough. There were already tools out there that implemented them, and better. What were we offering that was new?
We did not know what else to do to make Fairify succeed. So we did more interviews - to find out what else we could do, and to look for a co-development partner.
We got feedback on our work. People told us they liked what we were building. That it was interesting.
But, politeness is not validation. "Interesting" is not commitment. We realized that pretty late.
But we still weren't getting clarity on what to do next. No one was committing to co-develop the product with us. We didn't even know what set our tool apart from everything else out there. Other teams had found collaborators. We hadn't even found a direction.
Still, we didn't give up. The feedback made us think we were doing something right.
But generally speaking… people avoid discomfort. They prefer to give good news over hard truths. They delay saying no when they're not really interested.
This is why, at least in the initial phase, you do not mention your idea, but wait for the user to bring it up. If it matters to them, they will bring it up on their own. If they don't, it probably isn't important to them.
The Decision to Stop: Learning When to Quit
Soon, we gave up.
We stopped pushing for new interviews. We stopped making changes to the product. We stopped checking in every day. The momentum was gone, and we knew it.
We gave up when we realised we were holding on to a product we didn't fully believe in anymore, just because we'd spent too much time on it. We gave up when it became clear that no one was truly excited to build it with us. We gave up when the honest path forward was to stop.
And that's something we wished we had realized earlier: It's okay to quit. What's not okay is ignoring the signs.
Fairify might see the light of day again someday. When we reflect on our whole journey from start to end, the way one re-watches a favourite movie, we hope we find those moments everywhere - the clues that clearly foreshadow the final ending, but only make sense on a rewatch.
Reflection
The biggest thing we learned, and the one most likely to stick with us for the rest of our lives, is how to reflect before reacting. That means asking questions, both of ourselves and of others. Perhaps when we find ourselves wanting everything, it is because we are dangerously close to wanting nothing. Understanding this has made us realize how crucial it is to focus on what truly matters, to stop chasing every idea, and to listen closely to the feedback that guides us toward what we genuinely need.
We would like to extend a huge thank you to everyone we interviewed, even if they were only giving us their time to be kind. And the biggest thank you to the SimPPL team, Swapneel, and Dhara, for building a space where learning was celebrated and out-of-the-box ideas weren't shut down, but got the chance to serve their purpose. That is what we experienced, and we are so grateful for it.