Starting the new year with reflections on Dalgo’s direction

Jan 2025

The first sprint of 2025 had us camped out in Mahabalipuram (58 km from Chennai), a town of rock-cut temples and monuments steeped in ancient history. With its shore touching the Bay of Bengal, the town is blessed with an evening breeze that helps you calm down and reflect on the activities of a long day. This sprint week was also full of reflections, meaningful conversations with NGOs, and internal discussions about potential future features of Dalgo.

Around 8-9 NGOs (~12 participants in total) attended the sprint for two days. Going into the sprint, the Dalgo team had a few goals for our time with them:

Facilitate or co-facilitate sessions for the NGOs on “Leveraging AI for non-profits” and “Practical challenges on data” (Aditya from Goalkeep led the latter)
Walk the NGOs through our roadmap and get their feedback on it
Prepare a prototype of the chat with data feature and get feedback on it from the NGOs
Get some quick wins during the work time

In addition to the above, I had some goals of my own:

Listen & understand how the NGOs are currently using Dalgo
Listen & understand what other problems they face in their organizations that Dalgo could potentially help with
Form a rough blueprint of Dalgo’s direction for the next 2 quarters, based on internal discussions and conversations with the NGOs

NGO sessions/workshops

Leveraging AI for non-profits (Ref: ppt)

Pratiksha and Sanjeev did a great job curating the presentation & facilitating the session. The session started with a high-level explanation of topics like token limits, context windows and chat poisoning in chatbots, to help NGOs understand some limitations of LLMs. It then moved on to a comparative analysis of the most popular LLM providers out there (ChatGPT, Perplexity, Claude AI, Meta AI) and what their best use cases might be.

Prompt engineering was up next. Due to internet issues, the facilitators couldn’t run the prompt engineering exercise, so the session felt a bit monotonous. That said, it was more of a refresher, since many NGOs already use ChatGPT in their day-to-day work. Pratiksha then took the participants through some Gen AI tools (you can find all the references/names of the tools in the ppt attached above).

Sanjeev concluded the session by talking about the risks of using AI and how to approach a problem that might have an AI solution. The picture below sums it up very nicely.

Practical challenges on data

This session was one of my highlights from the sprint. Aditya (from Goalkeep) did a great job facilitating it. Something new I learned was the “5 Whys” framework he introduced for drilling down into a problem statement to find its root cause. The idea behind it is that a problem often isn’t quite what it first seems, and you get a different view of it when you dig deeper by repeatedly asking “why”.

Each NGO had a Tech4dev team member facilitating, i.e. asking the “whys” to drill down on the problem statement. I sat with Vinod, alongside Akshata (Product/Data engineer) & Rohit (MnE) from Antarang Foundation. One of the problem statements we discussed started off as a data quality issue: an MnE team member had to do extra ad hoc analysis to figure out why the numbers looked the way they did. After drilling down, we understood that the issue was really one of data trust and access. The numbers were off due to pipeline failures/resets on Dalgo. A simple first step was to give the MnE team (Rohit) access to the Dalgo dashboard, so they have more visibility into the org’s data pipeline and can access the elementary dashboard. A more thorough approach we discussed was to build a data validation/quality dashboard that would help the MnE team trust the data more. We agreed that the MnE team should be more involved in this exercise.

Following the “5 Whys” exercise, participants could join any of three round-table discussions to share their problem(s) & learn from the other NGO participants. Tech4dev team member(s) facilitated the conversation at each table.

Data Quality & Pipeline Reliability
Operationalize data to enable decision making
Communicate our story effectively using data

All three topics were well thought out and apt, given what we had heard from participants the previous day and the work we have been doing with them.

Chat with data feature demo

Our current data analysis feature lets users ask one-off questions about their data. It has two inherent limitations:

Users need to write a SQL filter that sets the context for the question
The answer is a one-time summary, in the sense that one cannot ask any follow-up questions

The prototype we developed takes a user query in natural language and does the following to arrive at an answer:

Convert the text to SQL
Run the SQL to get the results, then ship them off to the LLM provider (OpenAI)
Set up a file search assistant on the data uploaded in step 2 and ask it the query
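To make the three steps concrete, here is a minimal sketch of steps 1 and 2, assuming a SQL database and a CSV upload. It is not the prototype's code: `text_to_sql` is a hypothetical stand-in for the LLM call (a fixed lookup so the sketch runs offline), SQLite plays the role of the warehouse, and step 3 (the OpenAI file search assistant) is left out.

```python
import csv
import io
import sqlite3


# Hypothetical stand-in for step 1. In the prototype this is an LLM call;
# here it is a fixed lookup so the sketch runs without any API key.
def text_to_sql(question: str) -> str:
    canned = {
        "total funding per donor": (
            "SELECT donor, SUM(amount) AS total FROM fundings "
            "GROUP BY donor ORDER BY total DESC"
        ),
    }
    return canned[question]


def run_sql(conn: sqlite3.Connection, sql: str) -> tuple[list[str], list[tuple]]:
    """Step 2a: execute the generated SQL and return column names plus rows."""
    cur = conn.execute(sql)
    columns = [d[0] for d in cur.description]
    return columns, cur.fetchall()


def to_csv(columns: list[str], rows: list[tuple]) -> str:
    """Step 2b: serialize the result set as CSV, ready to be uploaded as a
    file for the file search assistant (step 3, not shown)."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(columns)
    writer.writerows(rows)
    return buf.getvalue()


# Synthetic data in the spirit of the donor/project/funding demo dataset.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE fundings (donor TEXT, project TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO fundings VALUES (?, ?, ?)",
    [
        ("Donor A", "Literacy", 500.0),
        ("Donor A", "Health", 250.0),
        ("Donor B", "Literacy", 400.0),
    ],
)

sql = text_to_sql("total funding per donor")
columns, rows = run_sql(conn, sql)
payload = to_csv(columns, rows)
print(payload)
```

Keeping SQL generation and execution separate from the LLM upload means the numbers in the payload come from the database, not from the model.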

We wanted to validate the idea of chat with data and see how useful it would be for the NGOs. We ran it against sample/synthetic data about donors, projects and the funding each donor has given to various projects. The session was very engaging; participants were excited & asked a lot of questions. Some of them are below (participants didn’t know which tables/columns existed under the hood):

I want to know more about the projects donors are funding – Our bot answered this correctly, listing 5-6 projects with donor details, project details and the funding amount
Describe your data to me in English – The bot responded with a schema of the columns and their meaning
What’s the correlation between project length and budget? – The bot first responded with Python code that computes the correlation coefficient; we had to ask it 2-3 more questions before it actually computed and returned the value
I have a program that helps children with their homework, how likely is it that donors will fund us and why? – It answered this nicely, though in a generic manner.
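For a question like the correlation one, a few lines of deterministic code give a value you can check the bot against. A minimal sketch of Pearson’s correlation coefficient; the project lengths and budgets below are made-up illustrative numbers, not the demo data.

```python
import math


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient r between two equal-length samples."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two samples of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Covariance numerator and the two sum-of-squares terms.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / math.sqrt(ss_x * ss_y)


# Hypothetical data: project length in months vs. budget in lakhs.
length_months = [6, 12, 18, 24, 36]
budget_lakhs = [5, 9, 16, 20, 31]
r = pearson(length_months, budget_lakhs)
print(f"r = {r:.3f}")
```

On Python 3.10+, `statistics.correlation` from the standard library computes the same coefficient.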

Summarizing the overall feedback, based on the NGOs’ comments during the demo:

This feature could be used for exploratory data analysis and ad hoc analysis/reporting by MnE teams.
How do users trust the analysis done by LLMs (the bot)? For example, the correlation coefficient.
Is it possible for the chat feature to produce visualizations/charts when prompted with a question?
How do we handle the limits on context windows? If a question requires sending a large amount of data to the LLM, it might exceed or exhaust the context limit.
1. On this point, I asked a few participants to come up with questions from their use cases that would require sending millions of rows to the LLM. In most cases, users are interested in looking at a slice of the data (e.g., give me an aggregated “metric” by year, by age, by state, etc.)
How can we export the bot’s analysis in a format (maybe Excel or CSV) that is easy to share with various stakeholders?
How can we hide PII so that the feature can be used safely?
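The “slice of data” observation suggests one way to handle both the context-window and part of the PII concern: aggregate inside the database and send only the summary to the LLM. A minimal sketch, assuming a hypothetical `beneficiaries` table in SQLite. This is an illustration, not a Dalgo feature, and it only keeps PII out for aggregate questions. Row-level questions would still need explicit masking.

```python
import sqlite3

# Hypothetical beneficiary table: one row per person, including PII columns
# (name, phone) that should never be sent to an external LLM.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE beneficiaries "
    "(name TEXT, phone TEXT, state TEXT, year INTEGER, score REAL)"
)
conn.executemany(
    "INSERT INTO beneficiaries VALUES (?, ?, ?, ?, ?)",
    [
        ("Asha", "98xxxxxx01", "Maharashtra", 2023, 62.0),
        ("Ravi", "98xxxxxx02", "Maharashtra", 2024, 70.0),
        ("Meena", "98xxxxxx03", "Karnataka", 2024, 81.0),
        ("Arjun", "98xxxxxx04", "Karnataka", 2024, 75.0),
    ],
)

# Push the aggregation into the database: the LLM only ever sees one row per
# (state, year) slice, and the name/phone columns never leave the warehouse.
SLICE_SQL = """
    SELECT state, year, COUNT(*) AS n, ROUND(AVG(score), 1) AS avg_score
    FROM beneficiaries
    GROUP BY state, year
    ORDER BY state, year
"""
summary = conn.execute(SLICE_SQL).fetchall()
for row in summary:
    print(row)
```

The payload grows with the number of slices rather than the number of rows, which is what keeps it inside the context window even for large tables.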

During the next two days of the sprint, I sat with folks from Antarang and Janaagraha to run the chat with data prototype on their own datasets. It worked quite well, answering 75-80% of their queries. An additional observation was that folks weren’t interested in a lot of back & forth (i.e. chat); instead they asked very specific questions, got the answer in the first go and moved on.

Dalgo roadmap

Based on the conversations during the sprint and the consulting work we have been doing with our NGOs, the broad areas of the roadmap on which we will focus to build Dalgo features look like this:

User interface for Data transformations
Integrating the chat functionality
Tools for presentation and storytelling
Operational reviews with qualitative insights
Mobile access for non-power users

One thing our Dalgo team agrees on is that Dalgo needs to be more NGO-centric, and its features need to move in this direction. Catering to this ecosystem will set us apart from other tools in the market that might be our equals in terms of tech.

On the closing day of the sprint, we also identified people from various organizations who agreed to collaborate on building the features listed above. I am very excited to work closely with them over the next few months.

Highlights from the week

The weather in Mahabalipuram was beautiful, and evening beach walks with everyone were so soothing. We would just go to the beach and sit there for some time.
Playing basketball in the mornings with the Avni team and folks from Tech4dev was so much fun. I didn’t expect the hotel to have a basketball court, but we lucked out.
I had some amazing seafood, mutton and biryani. Food just keeps getting better and better every time I explore a new city/town in South India. There are a whole lot of places I haven’t visited in South India and hopefully I get a chance to visit them this year.

I had an interesting conversation with Vinod and Sanjeev on our way to get some filter coffee at Ananda Bhavan (a 2 km walk from the hotel):
1. We discussed/debated whether the “chat” in “chat with data” is really the way to go. NGOs just want an answer to their question and shouldn’t have to go back & forth for it. In most cases, they know the question they are asking & the metric they are looking for.
2. We also talked about whether traditional ML, like prediction and forecasting, would be useful to the NGOs. It might be in some cases, for example for Anandita (MnE) from Sneha, who currently uses Stata for her statistical analysis.
3. Could we just use an agentic framework to generate Python code for the question being asked and then run it to get the answer? That way we would be able to “trust” the answer, and convince users of it/market it to them on that basis.

I sat with the Antarang team to help with their BigQuery cost issue. They had been seeing cost spikes and wanted to understand where they came from. We traced the root cause to some queries from the old pipeline that were still running. BigQuery’s information schema has a jobs view (INFORMATION_SCHEMA.JOBS_BY_PROJECT) that tracks every query along with the amount of data processed (which can be used to compute cost).
Another quick win was with the Sneha team, where we figured out why some cases were missing in Dalgo. These cases were visible in their CommCare instance but not in Dalgo. They didn’t have a form attached/filled, and CommCare’s list API doesn’t fetch such cases. The cases had entered Sneha’s CommCare via a migration exercise a few months back, done through the backend, so they had no registration form filled even though their status was open. To get them syncing in the pipeline, we decided to attach a dummy form to each of them and fill it programmatically.
I got to know more about Bhumi and their processes at the “Chai Pe Charcha” event. Bhumi has a little over 2,00,000 volunteers helping them in their mission. One of the problems they face in managing this staggering number of volunteers is data collection: volunteers don’t follow the template when collecting data in the field, so validating it takes more manual effort.
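The BigQuery cost investigation above boils down to one query against the jobs view plus a conversion from bytes billed to money. A minimal sketch, with three assumptions to flag: the `region-us` qualifier (use your datasets’ region), the 30-day window, and the on-demand rate of USD 6.25 per TiB (check current pricing). Real rows could be fetched with the google-cloud-bigquery client, e.g. `client.query(TOP_QUERIES_SQL).result()`. Here the rows are hypothetical and hard-coded so the sketch runs offline.

```python
from dataclasses import dataclass

# Top queries by bytes billed over the last 30 days. The `region-us`
# qualifier is an assumption: use the region your datasets live in.
TOP_QUERIES_SQL = """
SELECT query, COUNT(*) AS runs, SUM(total_bytes_billed) AS bytes_billed
FROM `region-us`.INFORMATION_SCHEMA.JOBS_BY_PROJECT
WHERE creation_time >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 30 DAY)
  AND job_type = 'QUERY'
GROUP BY query
ORDER BY bytes_billed DESC
LIMIT 20
"""

TIB = 1024 ** 4
# Assumed on-demand rate in USD per TiB billed; verify against current pricing.
USD_PER_TIB = 6.25


@dataclass
class QueryCost:
    query: str
    runs: int
    bytes_billed: int

    @property
    def est_usd(self) -> float:
        return self.bytes_billed / TIB * USD_PER_TIB


def rank_by_cost(rows: list[dict]) -> list[QueryCost]:
    """Turn result rows (dict-like, as returned by the query above) into
    QueryCost records, most expensive first. bytes_billed can be NULL."""
    costs = [
        QueryCost(r["query"], r["runs"], r["bytes_billed"] or 0) for r in rows
    ]
    return sorted(costs, key=lambda c: c.est_usd, reverse=True)


# Hypothetical rows: a leftover query from an old pipeline dominates the bill.
sample = [
    {"query": "SELECT * FROM new_pipeline.daily", "runs": 30,
     "bytes_billed": 2 * 1024 ** 3},
    {"query": "SELECT * FROM old_pipeline.full_scan", "runs": 720,
     "bytes_billed": 8 * TIB},
]
for c in rank_by_cost(sample):
    print(f"${c.est_usd:,.2f}  {c.runs:>4} runs  {c.query}")
```

Grouping by the query text surfaces repeated jobs, such as scheduled queries from a pipeline nobody switched off, as single high-cost lines.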
