Orchestrate all the Things - The Engineer in the Machine: How Neo Is Rewriting What It Means to Build AI. Featuring Gaurav and Saurabh Vij, Neo Co-Founders
Episode Date: April 16, 2026

A fully autonomous machine learning engineering agent. A benchmark that matters. And a question that cuts deeper than the hype: when a machine does the work, what happens to the learning part?

The race to automate software engineering is expanding to new territory: machine learning. Neo is a fully autonomous machine learning engineering agent that handles the entire pipeline from problem statement to deployed model. Built by Gaurav and Saurabh Vij, Neo topped the MLE-Bench leaderboard and compressed a six-month production effort into one week. The harder question Neo raises - whether agentic AI accelerates learning or hollows it out - remains open.

Article published on Orchestrate all the Things: https://linkeddataorchestration.com/2026/04/16/autonomous-machine-learning-engineering-agent-neo/

---

The fundamentals still matter, arguably more, not less, as these systems get more capable. Get up to speed with Pragmatic AI Training: from data literacy to data science, governance, and responsible AI. Learn what you need to work alongside tools like this, not just alongside the hype. Custom quotes available. 👉 http://pragmaticai.training
Transcript
A fully autonomous machine learning engineering agent, a benchmark that matters, and a question
that cuts deeper than the hype.
When a machine does the work, what happens to the learning part?
The race to automate software engineering is expanding to new territory:
Machine Learning.
Neo is a fully autonomous machine learning engineering agent that handles the entire pipeline
from problem statement to deployed model.
Built by Gaurav and Saurabh Vij, it topped the MLE-Bench leaderboard and compressed
a six-month production effort into one week.
The harder question it raises, whether agentic AI accelerates learning or hollows it out,
turns out to be one that its founders have thought about carefully.
Neo is a fully autonomous machine learning engineering agent.
Machine learning is very messy, in the sense that it requires a lot of effort, grunt work, research,
and thought to solve a business problem.
What if there could be a way to employ large language models for such complex tasks, which require a lot of reasoning, iterative experimentation, and exploration across architectures? One, you know, can we help existing machine learning engineers move really fast? Second, can we help software engineers move into this world quickly? Because it's a very steep learning curve for them today, and they spend a lot of time, they undergo a lot of struggle to, you can say, perform machine learning tasks by themselves. So the core idea was: can we help them? And like one year back, we started looking into agents and realized that agents are becoming really capable, to a point that it might not be able to replace machine learning engineers today, but it can help them move 10 times faster.
I hope you will enjoy this. If you like my work on Orchestrate all the Things, you can subscribe
to my podcast, available on all major platforms. My self-published newsletter is also syndicated on
Substack, Hackernoon, Medium, and DZone. Or follow
Orchestrate all the Things on your social media of choice.
Perfect. So yeah, I'm Gaurav. Saurabh and I are brothers, and together we co-founded
a couple of startups. This is our third one, Neo.
Neo is a fully autonomous machine learning engineering agent. Prior to this,
I worked on computer vision research, background removal and real-time streaming,
something I used to work on around eight, nine years back.
Over the last 10 years, I have been fully involved in building models,
exploring architectures, experimenting with RAG pipelines, large language models,
fine-tuning them, deploying them at scale.
While building these solutions, Saurabh and I together realized
that a lot of ML engineers face this challenge.
Like, I was personally facing it,
but later on we realized that everybody in the community is facing the challenge
of using LLMs for their specific work.
These days, AI models are being used for various use cases.
But we saw that what if there could be a way to employ large language models for such complex tasks,
which will require a lot of reasoning, iterative experimentation, exploration across architectures,
Plus, data science and machine learning is very messy, in the sense that it requires a lot of effort, grunt work, research, and thought to solve a business problem using machine learning.
Having faced that hands on, I saw that we can potentially make this work.
While Saurabh and I were previously working on Monster API, it was a platform that provided thousands of developers three-click fine-tuning, no-code fine-tuning in a way, and one-click deployment.
So that's basically the whole idea where we realized that a lot of the developers
coming on Monster API and the platform that we had were not able to build models as they wanted
to because they didn't know how to pre-process their data sets, how to analyze models,
like architectures. So that's where we thought could there be an autonomous way of helping them?
Could there be an agent that can build models, experiment with datasets, visualize for them,
analyze for them? And that's how Neo came into being. And maybe Saurabh can give his quick background as well.
Great, thanks. I think you already went a little bit into the founder story as well, but it's fine. So hi,
welcome, Saurabh, and thanks for
joining us. Gaurav just kicked off the conversation by sharing a little bit about himself and
his background and how he got involved in this effort with Neo that you're both working on.
So now it's your turn to just tell us a little bit about yourself and then how you started working
on that.
So my background is in particle physics, high energy particle
physics and nuclear science.
I worked at CERN where they found the Higgs boson.
It is the largest physics lab in the world,
where they smashed trillions of protons together
to find what they call the God particle:
the Higgs field and the Higgs boson.
That gave me lots of exposure to, you can say, advanced machine
learning, because they were using those models,
neural networks, at that time.
And then I switched to entrepreneurship,
a couple of startups, and finally I started working with Gaurav, my brother. Five years back we started Q Blocks;
I think Gaurav briefly shared a bit about that.
So it was inspired by the idea of distributed computing that had seen across different labs
where anyone can donate their computing power to scientific endeavors.
And so the idea was can we do that for AI?
And we did that for a while, served thousands of developers, and then moved up the stack, you know, when we realized deeper problems with AI.
Like, everyone can't really, you know, fine-tune models or build RAG pipelines easily.
At least two years back, it was not that trivial.
Now, you know, building these things is becoming trivial.
So the idea was, you know, can we help?
One, you know, can we help existing machine learning engineers move really fast?
Second, can we help software engineers move into this world quickly
because it's a very steep learning curve for them today.
And they spend a lot of time,
they undergo a lot of struggle to, you can say,
perform machine learning tasks by themselves.
So the core idea was, can we help them?
And like one year back, we started looking into agents
and realized that agents are becoming really capable,
to a point that, yes, it might not be able to replace machine learning engineers today, but it can help them move 10 times faster.
It can help.
One is speed.
If you are a machine learning engineer, it will help you perform the same tasks of data cleaning, feature engineering, experimentation, model training, building, fine-tuning, all of that really, really fast.
Second is, I would say, something that people don't talk about: a lot of these engineers don't want to do grunt work.
You hired them, you know, companies hired them, for doing innovative stuff: to build new models, to think
at the system level. But mostly they're stuck with cleaning their data or, you know, engineering features,
things like that. So what if there's a co-engineer that can work by their side and take over, you know,
the grunt work from their plate, so that they can free up and focus more on innovation? So that was the core
premise, you know, behind Neo.
Okay, thank you. And it's a very interesting idea, I would say. Probably not entirely new, because it's a sibling, I would call it, to the idea of fully or partially automating software engineering in general, which has been going on for a while. But it's definitely innovative in the sense that, at least to the best of my knowledge, I don't know of any other platform that has attempted to automate the end-to-end pipeline in machine learning.
There were some efforts with things like AutoML, for example,
but I don't think any of those would at least claim to cover the entire pipeline.
And to share a little background of how I came to know about what you are doing,
there are some people that have been very enthusiastically sharing their experience
with your platform. And I think, well, you know, this is a force multiplier, basically.
Because if we're able to do what you just described, do all of this grunt work
in a way that makes things faster, then we are free to do more creative things, and so on.
And it also sort of democratizes, let's say, access for people. And the claim to fame, let's say,
or the proof that these people saw for the accomplishments they have managed through
your platform, is some benchmarks, basically, most prominently the MLE benchmark.
So what I'd like to ask you is, well, first, if you can just share a few words about the benchmark itself,
for people who may not necessarily be familiar with it, and then your engagement with the
benchmark: why did you choose it, and how do you submit? And then we can move on to a follow-up
question I have, which is: okay, how does that translate to real-world results?
Cool. I think I'll just probably share what it is, and then Gaurav can do a deep dive.
So MLE-Bench was created by OpenAI, and the core purpose of this benchmark is
to evaluate different agents for machine learning engineering.
And so what they do is they have carved out 75 complex Kaggle competitions, and you can run your agent on them.
There are three runs.
So technically your agents are going through these competitions around 225 times.
One run is 75 competitions;
you create a score, then you do it three times,
and then you take an average of that score.
And there are different, you can say,
difficulty tiers to it:
one is low, then medium, then there is high, in terms of complexity.
So when Neo got a 35% score, it was an average of this, while Neo dramatically improved on the high-complexity side.
Like it was able to solve some of the most complex problems.
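To make the arithmetic concrete, the scoring scheme described here, three full passes over 75 competitions with the medal rate averaged across runs, can be sketched as follows. The medal flags below are random placeholders, not actual leaderboard data:

```python
# Sketch of the MLE-Bench-style aggregation described above:
# each run attempts all 75 competitions, a run's score is the
# fraction of competitions where the agent earned a medal,
# and the final score averages the three runs.
import random

N_COMPETITIONS = 75
N_RUNS = 3

random.seed(0)
# True = medal earned in that competition (placeholder values)
runs = [[random.random() < 0.35 for _ in range(N_COMPETITIONS)]
        for _ in range(N_RUNS)]

run_scores = [sum(medals) / N_COMPETITIONS for medals in runs]
final_score = sum(run_scores) / N_RUNS

print(f"total competition attempts: {N_COMPETITIONS * N_RUNS}")
print(f"per-run medal rates: {[round(s, 3) for s in run_scores]}")
print(f"averaged score: {final_score:.1%}")
```

This is what makes the "around 225 times" figure fall out: 75 competitions times 3 runs, with the headline percentage being the mean of the three per-run medal rates.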
And, yeah, just a bit of data about that: today in the entire world you have around 300,000
machine learning engineers; out of those, around 5,000 or 6,000 are Kaggle masters, and
only 500 are Kaggle grandmasters. So the way we are designing Neo, you can say the holy
grail for this company is to be able to reach the level of a Kaggle grandmaster. Because if you go out
there today to hire a Kaggle grandmaster: one, there are only 500 in the entire world; second, they don't
want to work for a lot of companies. They are very selective. They want to work for SpaceX, NASA,
Uber, Apple, these kinds of companies.
But what if a startup, you know, like three guys, four guys, just started out,
and they need a Kaggle grandmaster for their core, you know, research?
They can't hire them today.
So with Neo, we truly want to democratize this kind of intelligence, so that you can actually
hire a Kaggle grandmaster for a fraction of the cost.
And the benchmark for that, again, would be MLE-Bench, because it helps in validating
whether Neo is cognitively superior and can solve complex problems. Now, Gaurav can go
really deep. So, as Saurabh already mentioned,
MLE-Bench consists of 75 Kaggle competitions. These are divided into
different domains, you can say different types of ML problems: text
classification, image classification, sequence-to-sequence, tabular-data-based problems. So, overall,
Across these 75 competitions, Neo was able to achieve a medal, bronze, silver, or gold, as per Kaggle's standard, in 34.2% of these competitions, fully autonomously, without any human intervention.
So that required it to reason: think through the problem statement, explore various architectural options, maybe ensembling strategies, stacking models together.
There were various options in front of it, but which algorithm or approach to choose
came about as an outcome, an insight, after doing certain experiments.
So Neo performed experiments, analyzed, reviewed, and based on the learnings from that,
proceeded further with further optimizations or enhancements or model architecture explorations.
That led it to a medal podium in 34.2 percent
of the competitions.
So when Neo launched on MLE-Bench,
it was state of the art, well ahead
of any other entry listed on MLE-Bench at that time.
Okay, thank you.
So yeah, follow up questions to that.
Let's see, let's start with, I think the most fundamental question.
So in order to be able to submit a solution to those tasks, as you also pointed out yourself, you need to perform a series of
subtasks, actually, like, you know, finding the dataset, cleaning the dataset, and then creating,
fine-tuning, and so on and so forth. And since you're taking an agentic approach, the thing
with this is that there is the so-called compounding error. So if at any point, at any step of
the process, there is an error, it compounds as you move along the pipeline, let's say.
And this is also a well-known vulnerability in using LLMs as well.
So I wonder, how did you manage to tackle this issue?
One of the approaches, a novel approach that we have developed, is a mix of two things:
one is our context transfer protocol, enabling Neo to stay on course, together with multi-agent orchestration.
These two approaches that we have developed in-house enable Neo to, you can say, not drift off the problem statement.
It is able to analyze, perform experiments, executions, and after that, evaluate its actions.
There are various open-source architectures available in the market right now,
such as agentic frameworks like CodeAct and ReAct.
A lot of them are somewhat decent in terms of their approach.
But when it comes to machine learning, it can potentially become a cascading set of errors:
one after the other, they can lead to failures of experiments, resulting in very poor model performance, because in the first place you didn't pre-process the data properly.
There could be missing steps in between.
So we thought of how we could model this on a real ML engineer:
what steps would they take ahead of time so that the next steps, which depend on them, won't face failures? Plus self-evaluation, not being dependent on a single model or agent. That led to very high performance and, you can say, fewer failures such as hallucinated solutions and cascading
errors.
Okay.
And in order to do things such as, you know, clean data sets or develop models and so on,
I presume that you must also have granted your core models, let's say, tool use, correct?
Yes.
It has the ability to perform tool use.
It has the ability to search online, fetch relevant sources, in
fact even go through arXiv papers for various tasks. So yeah, that, I
think, makes it a whole agent system.
Okay, interesting. I didn't realize the
part about being able to consume, let's say, papers as well. So that, practically...
It's an experimental feature that we have
only recently integrated, but yeah, we are on our way to making this
a fully capable ML engineering agent.
Yeah, because, you know, that way your model doesn't just rely on its training, however elaborate it may be,
but it can also integrate new knowledge in real time, basically.
Yeah.
Absolutely.
That's a pretty interesting idea, even though it's at an experimental stage,
which in a way is precisely what I wanted to ask you about as well.
So, you know, all of this sounds pretty impressive.
But I wonder, how does that translate to results beyond benchmarks, basically?
Benchmarks, as fine as they may be, as elaborate as they may be,
may be somewhat representative of the real world,
but the real world is always more messy.
So I wonder, have you been able to replicate this type of result with real use cases?
And if yes, I wonder if there's any of those use cases that you can share.
So one of the use cases that comes to my mind is ETA forecasting.
That was from one of our, you can say, early design partners.
The idea was: can Neo, without much human intervention,
if the problem statement is correctly provided,
and if the dataset is provided for, let's say, the last 12 or nine months,
build high-quality ETA prediction models?
This is a very real-world use case:
it requires predicting the ETA of a delivery from point A to point B.
This ETA prediction is potentially currently being used in so many applications,
Uber, OLA, like taxi sharing and food delivery apps.
There are endless consumer apps today which rely on ETA prediction for the best consumer experience,
and building high-quality models to predict accurately when your delivery is going to happen is a very critical component.
And that requires building high-quality models using the data that you have,
performing the right feature engineering on your dataset,
evaluating whether a particular algorithm is working appropriately or not, and performing
comparative analysis across various model architectures. So Neo was autonomously
able to do that. In Neo there is a human-in-the-loop mode where you can
provide feedback and guidance, and you can disable it as well, allowing Neo to
autonomously handle the problem statement. So with the least amount of human
involvement or guidance, Neo was
able to develop models beyond 80-85% accuracy on such real-world data.
It was evaluated on unseen data and delivered that model accuracy.
And if further hyperparameter optimization or tuning were performed, these
models might perform even better.
And Neo can integrate into the pipeline of that ML engineer's workflow.
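The kind of held-out evaluation described above, comparing a model's ETAs against actual delivery times on unseen data, can be sketched with toy numbers. All values and the error budget below are illustrative, not the design partner's data:

```python
# Toy sketch of evaluating an ETA model on unseen data:
# compare predicted vs. actual delivery times and report the
# mean absolute error in minutes. All numbers are made up.

actual_minutes = [2880, 3100, 4320, 2950, 4100]      # true delivery times
predicted_minutes = [2800, 3205, 4200, 3050, 4020]   # model's ETAs

errors = [abs(a - p) for a, p in zip(actual_minutes, predicted_minutes)]
mae = sum(errors) / len(errors)

print(f"mean absolute ETA error: {mae:.0f} minutes")

# An operations team might then compare this against an acceptable
# error budget, such as the roughly two-hour window mentioned later
# in the episode.
budget_minutes = 120
print("within budget" if mae <= budget_minutes else "over budget")
```

The interesting metric in this domain is not classification accuracy but the error margin in time units, which is why the conversation below frames acceptability as "two hours".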
Okay, just to add on this use case: this was a design partner, and they're a shipment tracking company. So imagine, like, Uber... not Uber, what should I say, I think the analogy wouldn't be right. They're like Google Maps for shipments. They track billions, at least 3.4 billion, shipments every single year, something like that. Now, for that, they have built ETA models.
And so they have these brownfield projects.
And their machine learning team's main goal, you can say, is to first
maintain the accuracy of the model.
Because with new data, models tend to diverge.
So they want to maintain the accuracy.
Second, they want to perform experimentation
so that they can actually increase the accuracy over time.
So the acceptable accuracy in this industry is around two hours,
because shipment times are, let's say, 48
to 72 hours.
So the acceptable accuracy, or you can say the error range, is two hours.
Now they had built this model with two hours of accuracy, sorry, error, over a period of six months
with their team.
So when they started using Neo, one of the challenges, you know, was, because of bandwidth
issues, could they dedicate one member?
And that one member, I won't say he was an expert ML engineer; he was a new ML engineer,
a novice with some experience. So we gave him Neo to try out on that use case. He was able to
beat that accuracy: instead of two hours, now it was 90 minutes, and he was able to achieve that in
one week. So imagine: earlier there was a team doing this job over a period of six months
to achieve an error of two hours. Now one guy using Neo was able to achieve
90 minutes within one week. So that was the transformative experience that we were able to deliver
using Neo. So not only does it help in maintaining your projects and increasing accuracy
by performing more experimentation, but the amount of resources that you need, the human resources,
would dramatically reduce, and the time to achieve the same results is, I would say, 20 times
lower compared to what was, you know, originally there. Okay, lots of questions, but let's start
with the human in the loop mode.
that you mentioned because this is something that I was also wondering about.
So precisely because there are a number of steps in its pipeline:
how does the human-in-the-loop mode work?
Does the human in this loop get to interact with Neo at each step of the process,
or perhaps one time at the end? Or is it selective; is this interaction tunable?
Yes. So the workflow has been designed such that when the human-in-the-loop mode is enabled,
Neo will perform experiments. It will run through its course, but if it faces a major
blocker where it may need your guidance after multiple iterations, or if it has successfully
completed a major phase of the entire pipeline, there it will wait for the next feedback:
would you like it to proceed, or is there anything
you would like to add based on the work it has done up to this point? So after each major
phase completes successfully, it waits for that feedback. If the human-in-the-loop mode is disabled,
it will autonomously try to either resolve its errors or, if it has completed one major
phase, proceed to the next major phase as a continuation: okay, this was
successful, I have successfully pre-processed the dataset, I have performed feature engineering
correctly as per my understanding, I can now proceed with model training, or with
any other aspect that needs to be looked after as part of the problem statement.
So that's how the handover happens, and as a user I can contribute, I can guide that workflow
in between those major phases.
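The phase-gated workflow described here, running autonomously within a phase and pausing for feedback at phase boundaries when human-in-the-loop is enabled, might look roughly like this. The phase names and helper functions are hypothetical, not Neo's actual API:

```python
# Hypothetical sketch of a phase-gated, human-in-the-loop agent workflow.
# Phase names and helpers are illustrative, not Neo's internals.

PHASES = ["preprocess_data", "feature_engineering",
          "model_training", "evaluation"]

def run_phase(phase: str) -> str:
    # Stand-in for the agent autonomously executing one pipeline phase.
    return f"{phase} completed"

def run_pipeline(human_in_loop: bool, get_feedback=lambda phase: "proceed"):
    log = []
    for phase in PHASES:
        log.append(run_phase(phase))
        if human_in_loop:
            # After each major phase, wait for user guidance before moving on.
            feedback = get_feedback(phase)
            if feedback != "proceed":
                log.append(f"incorporating feedback on {phase}: {feedback}")
    return log

# Fully autonomous: no pauses between phases.
print(run_pipeline(human_in_loop=False))

# With a human in the loop, feedback is requested at each phase boundary.
print(run_pipeline(
    human_in_loop=True,
    get_feedback=lambda p: ("add hate-speech feature"
                            if p == "feature_engineering" else "proceed")))
```

The key design point is that feedback is solicited only at major phase boundaries, not at every internal iteration, which keeps the agent autonomous within a phase.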
And I presume that the interaction happens through a text interface, right?
So you are presented with some results, and then you can verify or reject them, I guess.
Yes. Yes. It's a chat interface through which you can guide the workflow.
But I was just wondering precisely on the topic of feature engineering that you mentioned.
So how granular can this interaction be?
So for example, let's take an example where you're doing image recognition, classification, whatever.
And you know that there are many examples of models that seem to perform correctly but are actually trained on the wrong feature:
for example, some visual element in an image that does not correspond to what you actually want to recognize, but something else.
So how granular, what is the level of interaction you can have?
Can you just say: hey, this feature is wrong, I want to change it, and instead I would like to add that other one?
Yes, absolutely. You can go as granular as you want. For example,
let's say it's a tabular dataset and Neo engineered the features. I can take an
example: let's say there was a chat moderation workflow in which there are abusive
sentences in a dataset, and I want to build a model that can predict whether a given
text has abusive language or not, detect that.
Now there is no specific column that represents abusive words.
So based on my task description,
it was able to autonomously figure out that it needed to create,
first of all, a feature that represents abusive language
from the given dataset of, you can say,
natural language strings.
So that is one aspect: it autonomously builds features and then trains models on them. The other is, I can
guide the behavior: okay, in addition to this abusive language feature, I also want another
feature that focuses on, let's say, hate speech or some other aspect; can you
combine multiple features together to create a joint feature, like we typically
do in SQL databases? The idea is, I can go as granular as I want in
this respect. On the evaluation side, I can ask Neo to evaluate the
models on F1 score, or precision and recall; don't try this ensembling
strategy, just go with this approach; you have a GPU available, use it, prioritize
it; for data processing you can use cuDF. There are various optimizations
that I can also suggest. Apart from that, it autonomously tries to
optimize the entire pipeline itself as well. So on every aspect,
I can give feedback, like: this approach was wrong in terms of how you evaluated the model;
it's not as I wanted it to be. Because ultimately, we have a mixture of our own fine-tuned models plus publicly available foundation models.
Our fine-tuned models are definitely better in some areas, specifically coding or certain reasoning aspects;
in others, some foundation models are great.
But even so, you would notice that LLMs do hallucinate sometimes.
We need to bring them back on course.
Our system helps in achieving that.
But even then, they may lack information about certain frameworks or algorithms.
For example, after LoRA, there was QLoRA for fine-tuning.
Then there was DoRA.
New approaches keep coming in.
And getting that relevant information from the Internet can also
sometimes be unreliable, because the source itself does not contain proper information.
But then I can interject: this is how you should do it,
this is the appropriate approach,
this is how you would use the framework in this code.
I can also provide code snippets if I want to.
So it's just like a collaborative ML engineer, but it is doing the heavy lifting:
it is training the models, evaluating them, using the environment in the most optimized
manner, pre-processing the datasets for me.
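The feature-combination idea Gaurav describes, deriving a joint flag from two engineered text features much like a computed column in SQL, can be illustrated in plain Python. The keyword lists below are crude placeholders for the learned features an agent would actually build:

```python
# Toy illustration of combining two engineered text features into one,
# similar to a computed column in SQL. The keyword matching is a crude
# stand-in for the learned features described in the conversation.

texts = [
    "have a nice day",
    "you are an idiot",                            # abusive
    "people like you should leave this country",   # hate-speech-like
]

abusive_words = {"idiot", "stupid"}                # placeholder lexicon
hate_phrases = ["should leave this country"]       # placeholder lexicon

def is_abusive(t: str) -> bool:
    return any(w in t.split() for w in abusive_words)

def is_hate(t: str) -> bool:
    return any(p in t for p in hate_phrases)

# Joint feature, like a computed column in SQL:
# flag text that triggers either component feature.
is_toxic = [is_abusive(t) or is_hate(t) for t in texts]

print(is_toxic)  # one boolean per input text
```

In a real pipeline these component features would be learned or far more robust; the point is only the shape of the operation: two engineered columns collapsed into one joint feature a downstream model can train on.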
Yeah, I mean, the way you describe it, to me it sounds like you may have tackled, better than any other solution that at least I'm familiar with,
the core reasoning problem.
So how do you combine these different pieces of information, and not just combine the information, but combine it in a logical flow, let's say, in a way that does not produce errors and hallucinations,
and so on and so forth?
And you yourself highlighted that as a key feature of how Neo works.
And I know, because I saw that you mentioned somewhere that you're also filing for a patent,
so I would not expect you to fully disclose how you do that.
But I just wonder if you can share with us what family, let's say, of solutions you are using for that.
If I had to guess, my guess would be that it's probably not something deterministic,
but something that leans more towards a more traditional, let's say,
workflow-engine sort of solution.
I'm not sure if I got your question correctly.
Are you asking about our engine, or what Neo uses?
Yes, yes, I'm speculating about how you could potentially have
implemented this reasoning engine of yours, basically.
Definitely, I'm not sure if I can go into the depth of the implementation.
I don't expect you to. I was just wondering if you can share a very, you know, high-level description. So, is it basically deterministic, or have you found some way to pull together LLMs in a way that does not hallucinate?
Yeah. Oh, okay. Yeah, got it. Actually, it's a mix of both. Honestly, you have to use LLMs for the variance, but you still need determinism.
When you get the best of both worlds, you can create a system that is reliably usable in
production-grade workflows, especially for complex tasks like machine learning. With our
architecture, we have been able to achieve that in a way that works, you can say, pretty well for
a lot of machine learning and AI research topics: how you can develop
an outcome from an LLM which is
highly reliable and can be reproduced repeatedly.
A lot of the time, LLMs are themselves not able to reproduce their answers.
For example, if you perform ten experiments with the same LLM,
you would get eight different results out of ten.
Even if you keep the same temperature, chances are you
won't get the same approach or answer.
But while building such systems, which should not break, where ultimately you are building
a prediction algorithm, a recommendation engine, computer vision
models detecting objects, you need to build reliable models.
That requires a mixture of high-quality data and algorithmic excellence: experimenting with
architectures, and manipulating how you move the data through the model's pipeline
so that you can repeatedly get that accuracy. It's something where we feel, after so much
experimentation with Neo, that we are well ahead of a lot of the solutions that have been trying
to use LLMs for machine learning or AI research. I would just, you know, add this: at the micro level,
LLMs are going to hallucinate.
But at the macro level, you can remove a lot of the errors created by hallucination.
That's what we have worked on.
So we have a multi-agent system, and every agent is powered by a fine-tuned model.
When you narrow down the scope, because of fine-tuning, it reduces hallucination.
So that's one area which has helped us significantly improve this.
Second: hallucinations will still happen; there is no such thing as zero hallucinations at the,
you can say, individual output level. But when you look at it at the system level, you can
remove those hallucinations, because in the multi-agent flows these agents are helping each other.
So imagine an agent acting as a judge for another agent: one agent creates an output,
and the judge agent can evaluate it and stop it at the very source, you can say, when it starts hallucinating.
So I'm really trying to explain without giving you the entire algorithm.
But the core idea here is that LLMs are going to hallucinate.
You can reduce that by narrowing down the scope by fine-tuning them.
And second, because of the multi-agent architecture, you can stop them in their tracks when they start to hallucinate, and you can guide them in a certain direction.
So it's a mixture of these practices that has ultimately given us fewer hallucinations compared to, let's say, if we were just using Claude, or just using a base model to perform these tasks. Let's say they were hallucinating and, as an arbitrary number, the rate was 80 percent; because of this multi-agent system, the overall hallucinations are like 11 or 12 percent.
So you can dramatically lower that with a mixture of these techniques.
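The judge pattern described here can be sketched as a simple gate. The worker and judge below are stand-in functions, not Neo's actual agents, and the threshold and retry budget are made-up parameters:

```python
def worker_agent(task: str) -> str:
    """Stand-in for an LLM agent producing a candidate answer."""
    return f"candidate answer for: {task}"

def judge_agent(task: str, answer: str) -> float:
    """Stand-in for a judge LLM returning a confidence score in [0, 1]."""
    # A real judge would prompt a second model; here we fake a score.
    return 0.9 if task in answer else 0.2

def gated_answer(task: str, threshold: float = 0.5, max_retries: int = 3):
    """Retry the worker until the judge accepts the output, or give up."""
    for _ in range(max_retries):
        answer = worker_agent(task)
        if judge_agent(task, answer) >= threshold:
            return answer
    return None  # escalate to a human or a stronger model

print(gated_answer("classify churn"))
```

The key property is that a bad output is stopped at the source, before it propagates to downstream agents.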
Okay.
And as far as the models that you use are concerned, you mentioned that you actually use a mixture
of models, some of which you have fine-tuned yourselves and some of which, I presume,
are going to be commercially available models.
For the ones that you have produced yourselves, you have total control, so you don't have to worry about that in a way.
For the models, for the commercial ones,
the fact that they constantly evolve and change,
and there's sometimes backwards compatibility issues
and APIs break and all of these things,
how do you deal with that in order to keep your system running?
Well, absolutely.
Like, LLMs are somewhat unreliable a lot of the time. The APIs return 429s, and token context window limits are also one of the biggest limitations, you can say, while you are dealing with such systems.
So, for example, you would have seen in the past also that ML engineering pipelines are pretty huge, in the sense that the datasets are big and the model experiments are numerous.
So the conversation grows quite a lot, and managing the context is definitely one of the biggest challenges that we have overcome with our architecture.
Apart from that, we have definitely implemented a lot of safeguards and fallback mechanisms to ensure that if one LLM fails, there is a backup LLM to take over.
But the idea is: switching between LLMs is easy; making sure that they can produce similar outcomes is hard. That requires either fine-tuning models or providing appropriate context. Now, how do you do that? When you have switched the model, the capability also changes, and that's another thing that we have worked on internally. It enables us to reliably switch between a set of models while ensuring that the quality remains consistent.
Definitely not exactly the same, but quite similar in terms of producing useful, valuable outcomes and artifacts, like models or pre-processed datasets, for the users.
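The fallback idea described above can be sketched roughly like this. Here `call_model` is a placeholder for a real provider client, and the model names and retry budget are illustrative, not Neo's configuration:

```python
import time

class RateLimited(Exception):
    """Raised when a provider returns HTTP 429."""

def call_model(name: str, prompt: str) -> str:
    # Placeholder: a real implementation would call the provider's API.
    if name == "primary-model":
        raise RateLimited("429 Too Many Requests")
    return f"[{name}] reply to: {prompt}"

def robust_call(prompt: str, models=("primary-model", "backup-model")):
    """Try each model in order, with a small retry budget per model."""
    for name in models:
        for attempt in range(2):
            try:
                return call_model(name, prompt)
            except RateLimited:
                time.sleep(0)  # real code would back off exponentially
    raise RuntimeError("all models failed")

print(robust_call("summarize the dataset"))
```

As the speakers note, the switching itself is the easy part; keeping output quality consistent across models is the harder problem this sketch does not address.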
Okay, so it sounds like you have invested in what people have come to call context engineering.
And I also presume probably in the MCP protocol, is that something that you use?
Yes, absolutely. MCP is also used internally. We are actually thinking of pretty soon providing that as a feature to users as well, so that they can bring their MCP tools and connect Neo with them directly. For example, it could be a Databricks MCP, or any other data provider's MCP, or an experiment-tracking MCP. It could be easily integrated into Neo through the user's dashboard.
Okay.
So previously, in the example that you shared about a model that people have created through Neo, it reached something like 80 to 85% accuracy.
For some use cases that may be good enough and, you know, case closed.
In other use cases, you would have to actually put additional work into this model to bring it to 90-plus accuracy.
So how would that work?
Would a human engineer be able to take over after Neo produces that model and work iteratively on the parts of the pipeline?
Absolutely. Neo provides these artifacts, and you can also ask Neo to build an inference pipeline on top of that. Or you can export them to your existing pipeline. Maybe you keep the model artifacts in an artifact store such as MLflow for production serving; there could be other tools that you use, or you might just want to do an S3 dump of all the models and later on integrate them into your pipeline or experiment further.
So Neo can provide the code scripts, the model artifacts, the pre-processed datasets, the evaluation reports, and comparative analysis that you can further use to gain insight into why this model failed and why this one worked for my specific data.
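As a rough illustration of that hand-off, not Neo's actual export format, the artifacts could be dumped to a directory that a downstream pipeline or an S3 sync then picks up. The file names and layout here are invented:

```python
import json
import pickle
import tempfile
from pathlib import Path

def export_artifacts(model, report: dict, out_dir: str) -> list[str]:
    """Write a trained model and its evaluation report to a hand-off dir."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    with open(out / "model.pkl", "wb") as f:
        pickle.dump(model, f)  # the trained model object
    (out / "eval_report.json").write_text(json.dumps(report, indent=2))
    return sorted(p.name for p in out.iterdir())

with tempfile.TemporaryDirectory() as d:
    files = export_artifacts({"weights": [0.1, 0.2]},
                             {"accuracy": 0.85}, d)
    print(files)
```

From such a directory, an engineer can load the model for further iteration, or register it with a serving tool of their choice.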
And what were the issues in my data? A lot of the time, as a developer, I may not be aware of the issues in my data.
It could be that the data has bias.
The data may have unstructured information: columns may carry textual information, and there is a lot of noisy information present in practical datasets that might need cleanup. So Neo can provide that in a report that can help me further improve my data. And I can also bring new data to Neo and ask it to retrain the model and see where it can go, and to perform further hyperparameter tuning to experiment and see if it can potentially improve the accuracy even further.
Okay, so I presume that in that kind of scenario, Neo produces a model with a certain level of accuracy, and then some engineer takes over and improves the model further. Would it be possible to actually close the loop, let's say, and feed the final artifacts into Neo for future improvement?
Yeah, absolutely. Through tool use and MCP, we are developing the pipeline such that
Neo can be integrated into your existing pipelines and existing ML infrastructure, so that it can build the models and push them to your repository.
And from there on, you can provide further tasks to Neo to optimize or improve.
If there is drift in the model with new data, it can improve on that.
It can take your guidance and work accordingly, like a collaborative ML engineer who would regularly optimize and enhance the pipeline as new data comes in.
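A minimal sketch of that drift-triggered loop, with a toy model and a made-up tolerance rather than anything Neo-specific:

```python
def evaluate(model, data):
    """Stand-in metric: fraction of examples the model gets right."""
    return sum(model(x) == y for x, y in data) / len(data)

def maybe_retrain(model, baseline_acc, fresh_data, tolerance=0.05):
    """Flag the model for retraining if accuracy drops below the baseline."""
    acc = evaluate(model, fresh_data)
    if baseline_acc - acc > tolerance:
        return "retrain", acc  # hand the task back to the agent
    return "keep", acc

model = lambda x: x % 2  # toy "model": predicts parity of the input
drifted = [(i, (i + 1) % 2) for i in range(10)]  # labels flipped: drift
decision, acc = maybe_retrain(model, baseline_acc=0.95, fresh_data=drifted)
print(decision, acc)
```

In the scenario described, the "retrain" decision would be a task handed to the agent, together with the fresh data.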
Okay, cool.
So far, the scenarios that we have referred to all sort of assume that the person operating Neo, let's say, is someone who knows what they're doing, basically.
So I'm a machine learning engineer, and we've talked about different ways of interacting with the system and so on.
But what if this is actually not the case?
I mean, one of the scenarios that you mentioned
is having people use Neo that are software engineers.
for example. So these people don't necessarily know, I mean, obviously they are familiar with programming and, I don't know, data structures and whatnot, but they don't necessarily know how machine learning works, the differences between the algorithms, building pipelines, and all of those things.
So what made me wonder about this particular scenario, and the reason that I also have, let's say, a personal interest in this, is because I have been that person. I mean, my background is in software engineering. So at some point I sort of taught myself AI, in a way, and at some point I had to manage AI projects and make architectural decisions and so on.
And the way that I made this work is by trial and error, basically.
You know, I had to educate myself and then try out different things and see what works and what breaks and what I should do, and so on.
So what I'm wondering is: okay, if you get a person with that profile and you get them to use Neo, first, what kind of results can you expect, and then, what kind of educational outcome can you expect for the person?
So maybe you get some result in the end, but will the operator, let's say, get to really understand how machine learning works?
By the way, thanks for sharing your personal experience.
I would also love to understand how much time it took you for this transition.
That's a good question.
So I think in total, so I didn't actually start from data science and machine learning.
I started from the other side of the aisle, let's say.
So more knowledge representation and reasoning and symbolic AI.
I already had like a solid grasp of data structures and algorithms and all of those things.
So going, let's say, from having like a very abstract idea of data science to actually being able to, you know, determine algorithms and run pipelines and all of those things, I think it took me like a few months, let's say half a year, roughly.
Yeah.
So, I mean, Gaurav, do you want to take it, or I can share my personal experience?
Yeah, please go ahead.
I can add on that.
You know, when I was doing physics at these labs, like CERN and then later on the Atomic Energy Commission in France, these days you never do pure physics.
Everything is driven by systems and code.
So I had to code all day, every day.
And honestly, at some point, like 90% of my work was writing different algorithms, writing code every day.
I could just spend 10% of my time on actual physics.
And a lot of researchers that I've met, you know, in astronomy, in physics and chemistry, and even in biology, they are drifting away from their core science and have been spending more time on building algorithms.
Because fundamentally, they know that AI can actually 10x their productivity.
AI can actually help not just in doing low-level work; if they are well versed in building these models, they can actually do some advanced research. And that's why every one of these researchers in their fields is now trying to implement machine learning.
Like, recently I met someone in astronomy. They have massive amounts of data, and with machine learning models you can actually detect a lot of patterns and hidden signals.
But his frustration is: I'm not a machine learning engineer. I basically did, you know, computer science, but I have to spend all my time learning how to set up GPUs, how to fine-tune a model, how to build pipelines, and all of that.
So that has been a real frustration. Now, because of Neo, and it's just been, I would say, one month of me practicing with it, today I can build models.
Like, recently I built a movie recommender model all on my own, end to end. Then I started to, you know, tinker around and really tweak some of the existing models. So, for example,
you have protein folding models.
It's well established.
You can take a model.
But what if I could use the principle of least action from physics and integrate that with 3D protein folding?
No one has done that.
But now I have the confidence to try this out.
And maybe this can lead to better accuracy in protein folding.
So these kinds of things are something that a lot of researchers and innovators from different fields want to try, and they were not able to do that.
And with Neo, you can just start, there are two things.
I would say one is you can really start small.
We give you all the different existing projects.
You can start with them, tweak around, ask questions in plain English, and really, you know, scale fast.
Like scale fast in terms of learning really fast because as a human, we can't really catch up with all the progress that is happening all around us right now, especially in the world of AI every single day.
There are, you can say, hundreds of new research papers every day on optimizing your infrastructure, on fine-tuning methods, on training methods, all of that.
And if my core research is in quantum physics, I don't have the time to read all the research papers.
So I always dreamed, you know, that there could be an engineer who could sit by my side and do all of this work, while I actually focus on pure physics.
Now I think with Neo we can do that, because Neo is that co-engineer: now I can actually integrate some of the principles from physics, and I can tweak existing models.
And that is like the first step to enter this field.
Second, I would say, what we have done is we give you two views.
One is a macroscopic view, through which you can look at all the different systems.
And the second is a microscopic view, so you can actually look under the hood.
You can look at the interdependencies, artifacts, code, everything.
And because of that, you can deeply understand, and ask questions of Neo: okay, what did you actually do when you were fine-tuning with this method?
Or let's say I ask Neo to perform 100 experiments with different fine-tuning methods and different datasets.
Now, Neo will perform those experiments, eliminate some of them based on a threshold value that I have provided, and then it will run the top five eventually and give me an evaluation report.
And in that evaluation report, it will do a comparative analysis. For example, let's say you are building a computer vision model for self-driving cars. You could have, let's say, an R-CNN or RT-DETR eventually, and it created a report for these two.
And it gave you the difference: okay, this one is faster but has lower accuracy, and this one is slower but has higher accuracy.
Now, as a human, I can decide, and I can command Neo, because now I understand.
And because of this understanding, I can command Neo: okay, you can choose one path and go there, and also explain, you know, step by step, what you are actually doing.
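The triage flow described, running many experiments, dropping those below a user-provided threshold, and keeping the top five, can be sketched like this. The scores here are fabricated for illustration; a real run would come from actual training:

```python
def triage(results, threshold, top_k=5):
    """results: list of (config_name, score). Keep the top_k survivors
    whose score meets the user-provided threshold."""
    survivors = [(name, s) for name, s in results if s >= threshold]
    survivors.sort(key=lambda r: r[1], reverse=True)
    return survivors[:top_k]

# 100 hypothetical experiment configs with made-up accuracy scores.
experiments = [(f"config-{i}", 0.5 + i * 0.005) for i in range(100)]
finalists = triage(experiments, threshold=0.9)
print(finalists)
```

Only the finalists would then get the full evaluation and the comparative report the speakers describe.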
So, just like that, and I'm not even a software engineer, I don't code these days, but even then I was able to build all of this in like one month.
So the greatest wish I have is that I can provide this superpower to all the researchers in the world
so that they can actually speed up, accelerate their research using NEO.
So they don't spend a lot of their time in building or dealing with cloud infrastructure, GPUs,
reading research papers on AI or writing code, they should become experts in their field,
while tools like Neo should help them.
Yeah, I mean, I think the low-level stuff that you also mentioned previously, like, I don't know, setting up GPUs or cloud deployment and all that, nobody's going to miss that.
I was just wondering more about, you know, the more interesting things, like fine-tuning algorithms and choosing one over the other and that kind of thing.
Like one thing, you know, like you said, from your experience: these days a lot of people say that you don't have to learn how to code, but I disagree. And I think you would agree, because of your experience. When you learn how to code, it gives you, I would say, a skill for your entire life.
You learn how the different systems interact with each other, you learn how to solve problems at the algorithm level, how different systems work, systems engineering.
And I think you can take that to any field.
And even if I didn't enjoy coding, it actually helped me, so that today I can understand, you know, how different fine-tuning methods actually work, what's really happening under the hood.
So with Neo, you know, we can question it and have it give us details about the different methods that it has tried, and what the difference is between QLoRA and LoRA, or DPO, or DoRA, or different methods. And then, based on my priority, I can ask it to choose one method and run with that.
Is your sound okay now? Yeah, yeah, sorry about that.
No worries. You wanted to add something, Gaurav? No, absolutely, I think that sort of captured the entire context.
And I think that's the premise: the macroscopic view and the microscopic view help.
So, just as I have seen it, there are analysts, data analysts, business analysts, people like that also using Neo.
What we have realized is that a lot of people have data without insights.
They don't really know what information their data is giving them.
So in order to get that information, analysis and experimentation with your data can be very crucial.
And it could be as crucial as, for example, if I'm building a fintech trading bot model that can help me predict the price of a stock or a crypto.
Even in that scenario, there is so much data that, as a human, it becomes very difficult to analyze and assess those large volumes of data coming from different sources.
Now could there be a model?
I may have wanted to build a model,
but I'm not very good at ML engineering
or I don't know much about the algorithms.
So could there be a system that can help me get started,
help me in my journey,
and maybe as I build with that system,
I can also start learning which algorithms are prioritized, why they are prioritized, and how to pre-process datasets.
So I think that while building with Neo, everybody learns the depths of ML engineering, like how to choose a model, how to evaluate models for various use cases. Because Neo as a system is general, it can be used for various tasks. It is not restricted to, say, only working with tabular datasets. It can work with image classification; it can work with gen AI models.
It can build RAG pipelines also.
So breadth-wise, it is very capable in a lot of the domains around AI and ML.
So even with less understanding of such deeper concepts,
I can get started with Neo.
And slowly and steadily, I can start learning
the deeper concepts with the microscopic view in place.
Okay, well, thanks. So we're close to wrapping up, and so far we have dived really kind of deep into the tech side of things: how does it work, what can you do with it, and so on.
So let's wrap up with the business side of things because I'm sure that by now if people are listening,
they must be wondering like, okay, this sounds like a superpower.
I want to use it.
So how can people use it?
And what are the next steps for you as a company?
So how do you scale out, basically?
So there's a lot of stuff happening, you know, behind the scenes right now.
But I would say broadly speaking, two things.
One is, you know, we are making Neo better at handling more and more complex tasks.
Second is integrations.
We want Neo to integrate across all your systems from your database to different clouds.
And these two things are, you know, designed keeping in mind the long-term vision: we want to actually provide an ML engineer, eventually a Kaggle Grandmaster-level ML engineer, to every researcher, developer, innovator, and company on the planet.
So from that vision, these two things are incredibly important. One, it should be able to handle more and more complex tasks, so we are improving the capabilities of Neo at the algorithm level to handle more complex tasks, which will be evident in our increasing score on MLE-Bench in the coming days.
Second is integrations: like Gaurav mentioned, MCP and many other tools.
And third, I would say one of the things is the ability to read a research paper, write code,
and integrate that within your existing brownfield project.
And lastly, I would say, Neo is now ready.
We have opened up our wait list.
You can try it today.
Okay.
So actually I would argue that most of the things you mentioned, except maybe for the last point, are really improvements or feature additions or whatever you want to call them.
I was more thinking in terms of things like growing the company or going to market or I don't know, maybe getting funding or this type of thing.
Is this something that you are working on or that you are able to share?
Right now the focus is completely, you know, on going deep with our existing user base.
We have thousands of developers right now who are tinkering with Neo.
And every single day we get lots of feedback.
So we are working on improving the product.
And the goal here is before we hit the market, you know, for you can say more marketing
or raising money, we want to make sure we have these engaged users who are using Neo
on a day-to-day basis.
And once we achieve that goal, we have like certain milestones internally.
Once we hit those milestones, we are going to go out and raise money.
Okay, well, maybe you will have to create another wait list for that then, because I think you will have many people who are interested in this.
Yeah, it is always exciting.
And we are releasing it in batches, because that really helps, you can say, in giving a more refined product to the next batch, instead of giving the initial version to all the users.
Okay, well, thanks.
It's been a very, very interesting conversation.
and I don't know, maybe if I manage to find some time to actually use the opportunity,
I may join your wait list as well. It sounds super interesting.
Thanks for sticking around. For more stories like this, check the link in bio and follow Linked Data Orchestration.
