Orchestrate all the Things - The Engineer in the Machine: How Neo Is Rewriting What It Means to Build AI. Featuring Gaurav and Saurabh Vij, Neo Co-Founders
Episode Date: April 16, 2026

A fully autonomous machine learning engineering agent. A benchmark that matters. And a question that cuts deeper than the hype: when a machine does the work, what happens to the learning part?

The race to automate software engineering is expanding to new territory: machine learning. Neo is a fully autonomous machine learning engineering agent that handles the entire pipeline from problem statement to deployed model. Built by Gaurav and Saurabh Vij, Neo topped the MLE-Bench leaderboard and compressed a six-month production effort into one week. The harder question Neo raises - whether agentic AI accelerates learning or hollows it out - remains open.

Article published on Orchestrate all the Things: https://linkeddataorchestration.com/2026/04/16/autonomous-machine-learning-engineering-agent-neo/

---

The fundamentals still matter, arguably more, not less, as these systems get more capable. Get up to speed with Pragmatic AI Training: from data literacy to data science, governance, and responsible AI. Learn what you need to work alongside tools like this, not just alongside the hype. Custom quotes available. 👉 http://pragmaticai.training
Transcript
A fully autonomous machine learning engineering agent, a benchmark that matters, and a question
that cuts deeper than the hype.
When a machine does the work, what happens to the learning part?
The race to automate software engineering is expanding to new territory:
Machine Learning.
Neo is a fully autonomous machine learning engineering agent that handles the entire pipeline
from problem statement to deployed model.
Built by Gaurav and Saurabh Vij, it topped the MLE-Bench leaderboard and compressed
a six-month production effort into one week.
The harder question it raises, whether agentic AI accelerates learning or hollows it out,
turns out to be one that its founders have thought about carefully.
Neo is a fully autonomous machine learning engineering agent.
Machine learning is very messy, in the sense that it requires a lot of effort, grunt work, research,
and thought to solve a business problem.
What if there could be a way to employ large language models for such complex tasks, which require a lot of reasoning, iterative experimentation, and exploration across architectures? One, you know, can we help existing machine learning engineers move really fast? Second, can we help software engineers move into this world quickly? Because it's a very steep learning curve for them today, and they spend a lot of time, they undergo a lot of struggle to, you can say, perform machine learning tasks by themselves. So the core idea was: can we help them? And like one year back, we started looking into agents and realized that agents are becoming really capable, to a point that it might not be able to replace machine learning engineers today, but it can help them move 10 times faster.
I hope you will enjoy this. If you like my work on Orchestrate all the Things, you can subscribe
to my podcast, available on all major platforms. My self-published newsletter is also syndicated on
Substack, Hackernoon, Medium, and DZone. Or follow
Orchestrate all the Things on your social media of choice.
Perfect. So yeah, I'm Gaurav. Saurabh and I are brothers, and together we co-founded
a couple of startups. This is our third one, Neo.
Neo is a fully autonomous machine learning engineering agent. Prior to this,
I worked on computer vision research, background removal and real-time streaming,
something I used to work on around eight, nine years back.
Over the last 10 years, I have been fully involved in building models,
exploring architectures, experimenting with RAG pipelines, large language models,
fine-tuning them, deploying them at scale.
While building these solutions, Saurabh and I together realized
that a lot of ML engineers face this challenge.
Like, I was personally facing it,
but later on we realized that everybody in the community is facing the challenge
of using LLMs for their specific work.
These days, AI models are being used for various use cases.
But we saw that what if there could be a way to employ large language models for such complex tasks,
which will require a lot of reasoning, iterative experimentation, exploration across architectures,
Plus, data science and machine learning is very messy, in the sense that it requires a lot of effort, grunt work, research, and thought to solve a business problem using machine learning.
Having faced that hands on, I saw that we can potentially make this work.
While Saurabh and I were previously working on Monster API, it was a platform that provided thousands of developers three-click fine-tuning, no-code fine-tuning in a way, and one-click deployment.
So that's basically the whole idea where we realized that a lot of the developers
coming on Monster API and the platform that we had were not able to build models as they wanted
to because they didn't know how to pre-process their data sets, how to analyze models,
like architectures. So that's where we thought could there be an autonomous way of helping them?
Could there be an agent that can build models, experiment with datasets, visualize for them,
analyze for them? And that's how Neo came into being. And maybe Saurabh can give his quick background as well.
Great, thanks. I think you already went a little bit into the founder story as well, but it's fine. So hi,
welcome, Saurabh, and thanks for
joining us. Gaurav just kicked off the conversation by sharing a little bit about himself and
his background and how he got involved in this effort with Neo that you're both working on.
So now it's your turn to just tell us a little bit about yourself and then how you started working
on that.
So my background is in particle physics, high energy particle
physics and nuclear science.
I worked at CERN where they found the Higgs boson.
It is the largest physics lab in the world,
where they smashed trillions of protons together
to find what they call the God particle:
the Higgs field and the Higgs boson.
That gave me lots of exposure to, you can say, advanced machine
learning, because they were using those models,
neural networks, at that time.
And then I switched to entrepreneurship,
a couple of startups, and finally I started working with Gaurav, my brother. Five years back we started Q Blocks;
I think Gaurav briefly shared a bit about that.
So it was inspired by the idea of distributed computing that had seen across different labs
where anyone can donate their computing power to scientific endeavors.
And so the idea was can we do that for AI?
And we did that for a while, served thousands of developers, and then moved up the stack, you know, when we realized deeper problems with AI.
Like, everyone can't really, you know, fine-tune models or build RAG pipelines easily.
At least two years back, it was not that trivial.
Now, you know, building these things is becoming trivial.
So the idea was, you know, can we help?
One, you know, can we help existing machine learning engineers move really fast?
Second, can we help software engineers move into this world quickly
because it's a very steep learning curve for them today.
And they spend a lot of time,
they undergo a lot of struggle to, you can say,
perform machine learning tasks by themselves.
So the core idea was, can we help them?
And like one year back, we started looking into agents
and realized that agents are becoming really capable,
to a point that, yes, it might not be able to replace machine learning engineers today, but it can help them move 10 times faster.
It can help.
One is speed.
If you are a machine learning engineer, it will help you perform the same tasks of data cleaning, feature engineering, experimentation, model training, building, fine-tuning, all of that really, really fast.
Second is, I would say, something that people don't talk about: a lot of these engineers don't want to do grunt work.
You hired them, you know, companies hired them, for doing innovative stuff: to build new models, to think
at the system level. But mostly they're stuck with cleaning their data or, you know, engineering features,
things like that. So what if there's a co-engineer that can work by their side and take over, you know,
the grunt work from their plate, so that they can free up and focus more on innovation? So that was the core
premise, you know, behind Neo.
Okay, thank you. And it's a very interesting idea, I would say. Probably not entirely new, because it's a sibling, I would call it, to the idea of fully or partially automating software engineering in general, which has been going on for a while. But it's definitely innovative in the sense that, at least to the best of my knowledge, I don't know of any other platform that has attempted to automate the end-to-end pipeline in machine learning.
There were some efforts with things like AutoML, for example,
but I don't think any of those would at least claim to cover the entire pipeline.
And to share a little background of how I came to know about what you are doing,
there are some people that have been very enthusiastically sharing their experience
with your platform. And I think, well, you know, this is a force multiplier, basically.
Because if we're able to do what you just described, do all of this grunt work
in a way that makes things faster, then we are free to do more creative things, and so on.
And it also sort of democratizes, let's say, access for people. And the claim to fame, let's say,
or the proof that these people saw for the accomplishments they have managed through
your platform, is some benchmarks, basically, most prominently the MLE benchmark.
So what I'd like to ask you is, well, first, if you can just share a few words about the benchmark itself,
for people who may not necessarily be familiar with it, and then your engagement with the
benchmark: why did you choose it, and how do you submit? And then we can move on to a follow-up
question I have, which is: okay, how does that translate to real-world results?
Cool. I think I'll just probably share what it is, and then Gaurav can do a deep dive.
So MLE-Bench was created by OpenAI, and the core purpose of this benchmark is
to evaluate different agents for machine learning engineering.
And so what they do is they have carved out 75 complex Kaggle competitions, and you can run your agent on them.
There are three runs.
So technically your agents are going through these competitions around 225 times.
One run is 75 competitions;
you create a score, then you do it three times,
and then you take an average of that score.
And there are different, you can say,
difficulty tiers to it:
one is low, then medium, then there is high, in terms of complexity.
So when Neo got a 35% score, it was an average of this, while Neo dramatically improved on the high-complexity side.
Like it was able to solve some of the most complex problems.
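To make the arithmetic concrete, the scoring scheme described here, three full passes over 75 competitions with the medal rate averaged across runs, can be sketched as follows. The medal flags below are random placeholders, not actual leaderboard data:

```python
# Sketch of the MLE-Bench-style aggregation described above:
# each run attempts all 75 competitions, a run's score is the
# fraction of competitions where the agent earned a medal,
# and the final score averages the three runs.
import random

N_COMPETITIONS = 75
N_RUNS = 3

random.seed(0)
# True = medal earned in that competition (placeholder values)
runs = [[random.random() < 0.35 for _ in range(N_COMPETITIONS)]
        for _ in range(N_RUNS)]

run_scores = [sum(medals) / N_COMPETITIONS for medals in runs]
final_score = sum(run_scores) / N_RUNS

print(f"total competition attempts: {N_COMPETITIONS * N_RUNS}")
print(f"per-run medal rates: {[round(s, 3) for s in run_scores]}")
print(f"averaged score: {final_score:.1%}")
```

This is what makes the "around 225 times" figure fall out: 75 competitions times 3 runs, with the headline percentage being the mean of the three per-run medal rates.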
And, yeah, just a bit of data about that: today in the entire world you have around 300,000
machine learning engineers; out of those, around 5,000 or 6,000 are Kaggle masters, and
only 500 are Kaggle grandmasters. So the way we are designing Neo, you can say the holy
grail for this company is to be able to reach the level of a Kaggle grandmaster. Because if you go out
there today to hire a Kaggle grandmaster: one, there are only 500 in the entire world; second, they don't
want to work for a lot of companies. They are very selective. They want to work for SpaceX, NASA,
Uber, Apple, these kinds of companies.
But what if a startup, you know, like three guys, four guys, just started out,
and they need a Kaggle grandmaster for their core, you know, research?
They can't hire them today.
So with Neo, we truly want to democratize this kind of intelligence, so that you can actually
hire a Kaggle grandmaster for a fraction of the cost.
And the benchmark for that, again, would be MLE-Bench, because it helps in validating
whether Neo is cognitively superior and can solve complex problems. Now, Gaurav can go
really deep. So, as Saurabh already mentioned,
MLE-Bench consists of 75 Kaggle competitions. These are divided into
different domains, you can say different types of ML problems: text
classification, image classification, sequence-to-sequence, tabular-data-based problems. So, overall,
Across these 75 competitions, Neo was able to achieve a medal, bronze, silver, or gold, as per Kaggle's standard, in 34.2% of these competitions, fully autonomously, without any human intervention.
So that required it to reason: think through the problem statement, explore various architectural options, maybe ensembling strategies, stacking models together.
There were various options in front of it, but which algorithm or approach to choose
came about as an outcome, an insight, after doing certain experiments.
So Neo performed experiments, analyzed, reviewed, and based on the learnings from that,
proceeded further with further optimizations or enhancements or model architecture explorations.
That led it to a medal podium in 34.2 percent
of the competitions.
So when Neo launched on MLE-Bench,
it was state of the art, well ahead
of any other entry listed on MLE-Bench at that time.
Okay, thank you.
So yeah, follow up questions to that.
Let's see, let's start with, I think the most fundamental question.
So in order to be able to submit a solution to those tasks, as you also pointed out yourself, you need to perform a series of
subtasks, actually, like, you know, finding the dataset, cleaning the dataset, and then creating,
fine-tuning, and so on and so forth. And since you're taking an agentic approach, the thing
with this is that there is the so-called compounding error. So if at any point, at any step of
the process, there is an error, it compounds as you move along the pipeline, let's say.
And this is also a well-known vulnerability in using LLMs as well.
So I wonder, how did you manage to tackle this issue?
One of the approaches, a novel approach that we have developed, is a mix of two things:
one is our context transfer protocol, enabling Neo to stay on course, together with multi-agent orchestration.
These two approaches that we have developed in-house enable Neo to, you can say, not drift off the problem statement.
It is able to analyze, perform experiments, executions, and after that, evaluate its actions.
There are various open-source architectures available in the market right now,
such as agentic frameworks like CodeAct and ReAct.
A lot of them are somewhat decent in terms of their approach.
But when it comes to machine learning, it can potentially become a cascading set of errors:
one after the other, they can lead to failures of experiments, resulting in very poor model performance, because in the first place you didn't pre-process the data properly.
There could be missing steps in between.
So we thought of how we could model this on a real ML engineer:
what steps would they take ahead of time so that the next steps, which depend on them, won't face failures? Plus self-evaluation, not being dependent on a single model or agent. That led to very high performance and, you can say, fewer failures such as hallucinated solutions and cascading
errors.
Okay.
And in order to do things such as, you know, clean data sets or develop models and so on,
I presume that you must also have granted your core models, let's say, tool use, correct?
Yes.
It has the ability to perform tool use.
It has the ability to search online, fetch relevant sources, in
fact even go through arXiv papers for various tasks. So yeah, that, I
think, makes it a whole agent system.
Okay, interesting. I didn't realize the
part about being able to consume, let's say, papers as well. So that, practically...
It's an experimental feature that we have
only recently integrated, but yeah, we are on our way to making this
a fully capable ML engineering agent.
Yeah, because, you know, that way your model doesn't just rely on its training, however elaborate it may be,
but it can also integrate new knowledge in real time, basically.
Yeah.
Absolutely.
That's a pretty interesting idea, even though it's at an experimental stage,
which in a way is precisely what I wanted to ask you about as well.
So, you know, all of this sounds pretty impressive.
But I wonder, how does that translate to results beyond benchmarks, basically?
Benchmarks, as fine as they may be, as elaborate as they may be,
may be somewhat representative of the real world,
but the real world is always more messy.
So I wonder, have you been able to replicate this type of result with real use cases?
And if yes, I wonder if there's any of those use cases that you can share.
So one of the use cases that comes to my mind is ETA forecasting.
That was from one of our, you can say, early design partners.
The idea was: can Neo, without much human intervention,
if the problem statement is correctly provided,
and if the dataset is provided for, let's say, the last 12 or nine months,
build high-quality ETA prediction models?
This is a very real-world use case:
it requires predicting the ETA of a delivery from point A to point B.
This ETA prediction is potentially currently being used in so many applications,
Uber, OLA, like taxi sharing and food delivery apps.
There are endless consumer apps today which rely on ETA prediction for the best consumer experience,
and building high-quality models to predict accurately when your delivery is going to happen is a very critical component.
And that requires building high-quality models using the data that you have,
performing the right feature engineering on your dataset,
evaluating whether a particular algorithm is working appropriately or not, and performing
comparative analysis across various model architectures. So Neo was autonomously
able to do that. In Neo there is a human-in-the-loop mode where you can
provide feedback and guidance, and you can disable it as well, allowing Neo to
autonomously handle the problem statement. So with the least amount of human
involvement or guidance, Neo was
able to develop models beyond 80-85% accuracy on such real-world data.
It was evaluated on unseen data and delivered that model accuracy.
And if further hyperparameter optimization or tuning were performed, these
models might perform even better.
And Neo can integrate into the pipeline of that ML engineer's workflow.
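The kind of held-out evaluation described above, comparing a model's ETAs against actual delivery times on unseen data, can be sketched with toy numbers. All values and the error budget below are illustrative, not the design partner's data:

```python
# Toy sketch of evaluating an ETA model on unseen data:
# compare predicted vs. actual delivery times and report the
# mean absolute error in minutes. All numbers are made up.

actual_minutes = [2880, 3100, 4320, 2950, 4100]      # true delivery times
predicted_minutes = [2800, 3205, 4200, 3050, 4020]   # model's ETAs

errors = [abs(a - p) for a, p in zip(actual_minutes, predicted_minutes)]
mae = sum(errors) / len(errors)

print(f"mean absolute ETA error: {mae:.0f} minutes")

# An operations team might then compare this against an acceptable
# error budget, such as the roughly two-hour window mentioned later
# in the episode.
budget_minutes = 120
print("within budget" if mae <= budget_minutes else "over budget")
```

The interesting metric in this domain is not classification accuracy but the error margin in time units, which is why the conversation below frames acceptability as "two hours".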
Okay, just to add on this use case: this was a design partner, and they're a shipment tracking company. So imagine, like, Uber... not Uber, what should I say, I think the analogy wouldn't be right. They're like Google Maps for shipments. They track billions, at least 3.4 billion, shipments every single year, something like that. Now, for that, they have built ETA models.
And so they have these brownfield projects.
And their machine learning team's main goal, you can say, is to first
maintain the accuracy of the model.
Because with new data, models tend to diverge.
So they want to maintain the accuracy.
Second, they want to perform experimentation
so that they can actually increase the accuracy over time.
So the acceptable accuracy in this industry is around two hours,
because shipment times are, let's say, 48
to 72 hours.
So the acceptable accuracy, or you can say the error range, is two hours.
Now they had built this model with two hours of accuracy, sorry, error, over a period of six months
with their team.
So when they started using Neo, one of the challenges, you know, was, because of bandwidth
issues, could they dedicate one member?
And that one member, I won't say he was an expert ML engineer; he was a new ML engineer,
a novice with some experience. So we gave him Neo to try out on that use case. He was able to
beat that accuracy: instead of two hours, now it was 90 minutes, and he was able to achieve that in
one week. So imagine: earlier there was a team doing this job over a period of six months
to achieve an error of two hours. Now one guy using Neo was able to achieve
90 minutes within one week. So that was the transformative experience that we were able to deliver
using Neo. So not only does it help in maintaining your projects and increasing accuracy
by performing more experimentation, but the amount of resources that you need, the human resources,
would dramatically reduce, and the time to achieve the same results is, I would say, 20 times
lower compared to what was, you know, originally there. Okay, lots of questions, but let's start
with the human in the loop mode.
that you mentioned because this is something that I was also wondering about.
So precisely because there are a number of steps in its pipeline:
how does the human-in-the-loop mode work?
Does the human in this loop get to interact with Neo at each step of the process,
or perhaps one time at the end? Or is it selective; is this interaction tunable?
Yes. So the workflow has been designed such that when the human-in-the-loop mode is enabled,
Neo will perform experiments. It will run through its course, but if it faces a major
blocker where it may need your guidance after multiple iterations, or if it has successfully
completed a major phase of the entire pipeline, there it will wait for the next feedback:
would you like it to proceed, or is there anything
you would like to add based on the work it has done up to this point? So after each major
phase completes successfully, it waits for that feedback. If the human-in-the-loop mode is disabled,
it will autonomously try to either resolve its errors or, if it has completed one major
phase, proceed to the next major phase as a continuation: okay, this was
successful, I have successfully pre-processed the dataset, I have performed feature engineering
correctly as per my understanding, I can now proceed with model training, or with
any other aspect that needs to be looked after as part of the problem statement.
So that's how the handover happens, and as a user I can contribute, I can guide that workflow
in between those major phases.
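The phase-gated workflow described here, running autonomously within a phase and pausing for feedback at phase boundaries when human-in-the-loop is enabled, might look roughly like this. The phase names and helper functions are hypothetical, not Neo's actual API:

```python
# Hypothetical sketch of a phase-gated, human-in-the-loop agent workflow.
# Phase names and helpers are illustrative, not Neo's internals.

PHASES = ["preprocess_data", "feature_engineering",
          "model_training", "evaluation"]

def run_phase(phase: str) -> str:
    # Stand-in for the agent autonomously executing one pipeline phase.
    return f"{phase} completed"

def run_pipeline(human_in_loop: bool, get_feedback=lambda phase: "proceed"):
    log = []
    for phase in PHASES:
        log.append(run_phase(phase))
        if human_in_loop:
            # After each major phase, wait for user guidance before moving on.
            feedback = get_feedback(phase)
            if feedback != "proceed":
                log.append(f"incorporating feedback on {phase}: {feedback}")
    return log

# Fully autonomous: no pauses between phases.
print(run_pipeline(human_in_loop=False))

# With a human in the loop, feedback is requested at each phase boundary.
print(run_pipeline(
    human_in_loop=True,
    get_feedback=lambda p: ("add hate-speech feature"
                            if p == "feature_engineering" else "proceed")))
```

The key design point is that feedback is solicited only at major phase boundaries, not at every internal iteration, which keeps the agent autonomous within a phase.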
And I presume that the interaction happens through a text interface, right?
So you are presented with some results, and then you can verify or reject them, I guess.
Yes. Yes. It's a chat interface through which you can guide the workflow.
But I was just wondering precisely on the topic of feature engineering that you mentioned.
So how granular can this interaction be?
So for example, let's take an example where you're doing image recognition, classification, whatever.
And you know that there are many examples of models that seem to perform correctly but are actually trained on the wrong feature:
for example, some visual element in an image that does not correspond to what you actually want to recognize, but something else.
So how granular, what is the level of interaction you can have?
Can you just say: hey, this feature is wrong, I want to change it, and instead I would like to add that other one?
Yes, absolutely. You can go as granular as you want. For example,
let's say it's a tabular dataset and Neo engineered the features. I can take an
example: let's say there was a chat moderation workflow in which there are abusive
sentences in a dataset, and I want to build a model that can predict whether a given
text has abusive language or not, detect that.
Now there is no specific column that represents abusive words.
So based on my task description,
it was able to autonomously figure out that it needed to create,
first of all, a feature that represents abusive language
from the given dataset of, you can say,
natural language strings.
So that is one aspect: it autonomously builds features and then trains models on them. The other is, I can
guide the behavior: okay, in addition to this abusive language feature, I also want another
feature that focuses on, let's say, hate speech or some other aspect; can you
combine multiple features together to create a joint feature, like we typically
do in SQL databases? The idea is, I can go as granular as I want in
this respect. On the evaluation side, I can ask Neo to evaluate the
models on F1 score, or precision and recall; don't try this ensembling
strategy, just go with this approach; you have a GPU available, use it, prioritize
it; for data processing you can use cuDF. There are various optimizations
that I can also suggest. Apart from that, it autonomously tries to
optimize the entire pipeline itself as well. So on every aspect,
I can give feedback, like: this approach was wrong in terms of how you evaluated the model;
it's not as I wanted it to be. Because ultimately, we have a mixture of our own fine-tuned models plus publicly available foundation models.
Our fine-tuned models are definitely better in some areas, specifically coding or certain reasoning aspects;
in others, some foundation models are great.
But even so, you would notice that LLMs do hallucinate sometimes.
We need to bring them back on course.
Our system helps in achieving that.
But even then, they may lack information about certain frameworks or algorithms.
For example, after LoRA, there was QLoRA for fine-tuning.
Then there was DoRA.
New approaches keep coming in.
And getting that relevant information from the Internet can also
sometimes be unreliable, because the source itself does not contain proper information.
But then I can interject: this is how you should do it,
this is the appropriate approach,
this is how you would use the framework in this code.
I can also provide code snippets if I want to.
So it's just like a collaborative ML engineer, but it is doing the heavy lifting:
it is training the models, evaluating them, using the environment in the most optimized
manner, pre-processing the datasets for me.
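The feature-combination idea Gaurav describes, deriving a joint flag from two engineered text features much like a computed column in SQL, can be illustrated in plain Python. The keyword lists below are crude placeholders for the learned features an agent would actually build:

```python
# Toy illustration of combining two engineered text features into one,
# similar to a computed column in SQL. The keyword matching is a crude
# stand-in for the learned features described in the conversation.

texts = [
    "have a nice day",
    "you are an idiot",                            # abusive
    "people like you should leave this country",   # hate-speech-like
]

abusive_words = {"idiot", "stupid"}                # placeholder lexicon
hate_phrases = ["should leave this country"]       # placeholder lexicon

def is_abusive(t: str) -> bool:
    return any(w in t.split() for w in abusive_words)

def is_hate(t: str) -> bool:
    return any(p in t for p in hate_phrases)

# Joint feature, like a computed column in SQL:
# flag text that triggers either component feature.
is_toxic = [is_abusive(t) or is_hate(t) for t in texts]

print(is_toxic)  # one boolean per input text
```

In a real pipeline these component features would be learned or far more robust; the point is only the shape of the operation: two engineered columns collapsed into one joint feature a downstream model can train on.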
Yeah, I mean, the way you describe it, to me it sounds like you may have tackled, better than any other solution that at least I'm familiar with,
the core reasoning problem.
So how do you combine these different pieces of information, and not just combine the information, but combine it in a logical flow, let's say, in a way that does not produce errors and hallucinations,
and so on and so forth?
And you yourself highlighted that as a key feature of how Neo works.
And I know, because I saw that you mentioned somewhere that you're also filing for a patent,
so I would not expect you to fully disclose how you do that.
But I just wonder if you can share with us what family, let's say, of solutions you are using for that.
If I had to guess, my guess would be that it's probably not something deterministic,
but something that leans more towards a more traditional, let's say,
workflow-engine sort of solution.
I'm not sure if I got your question correctly.
Are you asking about our engine, or what Neo uses?
Yes, yes, I'm speculating about how you could potentially have
implemented this reasoning engine of yours, basically.
Definitely, I'm not sure if I can go into the depth of the implementation.
I don't expect you to. I was just wondering if you can share a very, you know, high-level description. So, is it basically deterministic, or have you found some way to pull together LLMs in a way that does not hallucinate?
Yeah. Oh, okay. Yeah, got it. Actually, it's a mix of both. Honestly, you have to use LLMs for the variance, but you still need determinism.
When you get the best of both worlds, you can create a system that is reliably usable in
production-grade workflows, especially for complex tasks like machine learning. With our
architecture, we have been able to achieve that in a way that works, you can say, pretty well for
a lot of machine learning and AI research topics: how you can develop
an outcome from an LLM which is
highly reliable and can be reproduced repeatedly.
A lot of the time, LLMs are themselves not able to reproduce their answers.
For example, if you perform ten experiments with the same LLM,
you would get eight different results out of ten.
Even if you keep the same temperature, chances are you
won't get the same approach or answer.
But while building such systems, which should not break, where ultimately you are building
a prediction algorithm, a recommendation engine, computer vision
models detecting objects, you need to build reliable models.
That requires a mixture of high-quality data and algorithmic excellence: experimenting with
architectures, and manipulating how you move the data through the model's pipeline
so that you can repeatedly get that accuracy. It's something where we feel, after so much
experimentation with Neo, that we are well ahead of a lot of the solutions that have been trying
to use LLMs for machine learning or AI research. I would just, you know, add this: at the micro level,
LLMs are going to hallucinate.
But at the macro level, you can remove a lot of the errors created by hallucination.
That's what we have worked on.
So we have a multi-agent system, and every agent is powered by a fine-tuned model.
When you narrow down the scope, because of fine-tuning, it reduces hallucination.
So that's one area which has helped us significantly improve this.
Second: hallucinations will still happen; there is no such thing as zero hallucinations at the,
you can say, individual output level. But when you look at it at the system level, you can
remove those hallucinations, because in the multi-agent flows these agents are helping each other.
So imagine an agent acting as a judge for another agent: one agent creates an output,
and the judge agent can evaluate it and stop it at the very source, you can say, when it starts hallucinating.
So I'm really trying to explain without giving you the entire algorithm.
But the core idea here is that LLMs are going to hallucinate.
You can reduce that by narrowing down the scope by fine-tuning them.
And second, because of the multi-agent architecture, you can stop them in their tracks when they start to hallucinate, and you can guide them in a certain direction.
So it's a mixture of these practices that has ultimately given us fewer hallucinations compared to, let's say, if we were just using Claude, or just using a base model to perform these tasks. Let's say they were hallucinating and, as an arbitrary number, the rate was 80 percent; because of this multi-agent system, the overall hallucinations are like 11 or 12 percent.
So you can dramatically lower that with a mixture of these techniques.
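The judge pattern described here can be sketched as a simple gate. The worker and judge below are stand-in functions, not Neo's actual agents, and the threshold and retry budget are made-up parameters:

```python
def worker_agent(task: str) -> str:
    """Stand-in for an LLM agent producing a candidate answer."""
    return f"candidate answer for: {task}"

def judge_agent(task: str, answer: str) -> float:
    """Stand-in for a judge LLM returning a confidence score in [0, 1]."""
    # A real judge would prompt a second model; here we fake a score.
    return 0.9 if task in answer else 0.2

def gated_answer(task: str, threshold: float = 0.5, max_retries: int = 3):
    """Retry the worker until the judge accepts the output, or give up."""
    for _ in range(max_retries):
        answer = worker_agent(task)
        if judge_agent(task, answer) >= threshold:
            return answer
    return None  # escalate to a human or a stronger model

print(gated_answer("classify churn"))
```

The key property is that a bad output is stopped at the source, before it propagates to downstream agents.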
Okay.
And as far as the models that you use are concerned, you mentioned that you actually use a mixture
of models, some of which you have fine-tuned yourselves and some of which, I presume,
are going to be commercially available models.
For the ones that you have produced yourselves, you have total control, so you don't have to worry about that in a way.
For the models, for the commercial ones,
the fact that they constantly evolve and change,
and there's sometimes backwards compatibility issues
and APIs break and all of these things,
how do you deal with that in order to keep your system running?
Well, absolutely.
Like, LLMs are somewhat unreliable a lot of the time. The APIs return 429s, and token context window limits are also one of the biggest limitations, you can say, while you are dealing with such systems.
So, for example, you would have seen in the past also that ML engineering pipelines are pretty huge, in the sense that the datasets are big and the model experiments are numerous.
So the conversation grows quite a lot, and managing the context is definitely one of the biggest challenges that we have overcome with our architecture.
Apart from that, we have definitely implemented a lot of safeguards and fallback mechanisms to ensure that if one LLM fails, there is a backup LLM to take over.
But the idea is: switching between LLMs is easy; making sure that they can produce similar outcomes is hard. That requires either fine-tuning models or providing appropriate context. Now, how do you do that? When you have switched the model, the capability also changes, and that's another thing that we have worked on internally. It enables us to reliably switch between a set of models while ensuring that the quality remains consistent.
Definitely not exactly the same, but quite similar in terms of producing useful, valuable outcomes and artifacts, like models or pre-processed datasets, for the users.
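The fallback idea described above can be sketched roughly like this. Here `call_model` is a placeholder for a real provider client, and the model names and retry budget are illustrative, not Neo's configuration:

```python
import time

class RateLimited(Exception):
    """Raised when a provider returns HTTP 429."""

def call_model(name: str, prompt: str) -> str:
    # Placeholder: a real implementation would call the provider's API.
    if name == "primary-model":
        raise RateLimited("429 Too Many Requests")
    return f"[{name}] reply to: {prompt}"

def robust_call(prompt: str, models=("primary-model", "backup-model")):
    """Try each model in order, with a small retry budget per model."""
    for name in models:
        for attempt in range(2):
            try:
                return call_model(name, prompt)
            except RateLimited:
                time.sleep(0)  # real code would back off exponentially
    raise RuntimeError("all models failed")

print(robust_call("summarize the dataset"))
```

As the speakers note, the switching itself is the easy part; keeping output quality consistent across models is the harder problem this sketch does not address.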
Okay, so it sounds like you have invested in what people have come to call context engineering.
And I also presume probably in the MCP protocol, is that something that you use?
Yes, absolutely. MCP is also used internally. We are actually thinking of pretty soon providing that as a feature to users as well, so that they can bring their MCP tools and connect Neo with them directly. For example, it could be a Databricks MCP, or any other data provider's MCP, or an experiment-tracking MCP. It could be easily integrated into Neo through the user's dashboard.
Okay.
So previously, in the example that you shared about a model that people have created through Neo, it reached something like 80 to 85% accuracy.
For some use cases that may be good enough and, you know, case closed.
In other use cases, you would have to actually put additional work into this model to bring it to 90-plus accuracy.
So how would that work?
Would a human engineer be able to take over after Neo produces that model and work iteratively on the parts of the pipeline?
Absolutely. Neo provides these artifacts, and you can also ask Neo to build an inference pipeline on top of that. Or you can export them to your existing pipeline. Maybe you keep the model artifacts in an artifact store such as MLflow for production serving; there could be other tools that you use, or you might just want to do an S3 dump of all the models and later on integrate them into your pipeline or experiment further.
So Neo can provide the code scripts, the model artifacts, the pre-processed datasets, the evaluation reports, and comparative analysis that you can further use to gain insight into why this model failed and why this one worked for my specific data.
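As a rough illustration of that hand-off, not Neo's actual export format, the artifacts could be dumped to a directory that a downstream pipeline or an S3 sync then picks up. The file names and layout here are invented:

```python
import json
import pickle
import tempfile
from pathlib import Path

def export_artifacts(model, report: dict, out_dir: str) -> list[str]:
    """Write a trained model and its evaluation report to a hand-off dir."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    with open(out / "model.pkl", "wb") as f:
        pickle.dump(model, f)  # the trained model object
    (out / "eval_report.json").write_text(json.dumps(report, indent=2))
    return sorted(p.name for p in out.iterdir())

with tempfile.TemporaryDirectory() as d:
    files = export_artifacts({"weights": [0.1, 0.2]},
                             {"accuracy": 0.85}, d)
    print(files)
```

From such a directory, an engineer can load the model for further iteration, or register it with a serving tool of their choice.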
And what were the issues in my data? A lot of the time, as a developer, I may not be aware of the issues in my data.
It could be that the data has bias.
The data may have unstructured information: columns may carry textual information, and there is a lot of noisy information present in practical datasets that might need cleanup. So Neo can provide that in a report that can help me further improve my data. And I can also bring new data to Neo and ask it to retrain the model and see where it can go, and to perform further hyperparameter tuning to experiment and see if it can potentially improve the accuracy even further.
Okay, so I presume that in that kind of scenario, Neo produces a model with a certain level of accuracy, and then some engineer takes over and improves the model further. Would it be possible to actually close the loop, let's say, and feed the final artifacts into Neo for future improvement?
Yeah, absolutely. Through tool use and MCP, we are developing the pipeline such that
Neo can be integrated into your existing pipelines and existing ML infrastructure, so that it can build the models and push them to your repository.
And from there on, you can provide further tasks to Neo to optimize or improve.
If there is drift in the model with new data, it can improve on that.
It can take your guidance and work accordingly, like a collaborative ML engineer who would regularly optimize and enhance the pipeline as new data comes in.
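A minimal sketch of that drift-triggered loop, with a toy model and a made-up tolerance rather than anything Neo-specific:

```python
def evaluate(model, data):
    """Stand-in metric: fraction of examples the model gets right."""
    return sum(model(x) == y for x, y in data) / len(data)

def maybe_retrain(model, baseline_acc, fresh_data, tolerance=0.05):
    """Flag the model for retraining if accuracy drops below the baseline."""
    acc = evaluate(model, fresh_data)
    if baseline_acc - acc > tolerance:
        return "retrain", acc  # hand the task back to the agent
    return "keep", acc

model = lambda x: x % 2  # toy "model": predicts parity of the input
drifted = [(i, (i + 1) % 2) for i in range(10)]  # labels flipped: drift
decision, acc = maybe_retrain(model, baseline_acc=0.95, fresh_data=drifted)
print(decision, acc)
```

In the scenario described, the "retrain" decision would be a task handed to the agent, together with the fresh data.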
Okay, cool.
So far, the scenarios that we have referred to all sort of assume that the person operating Neo, let's say, is someone who knows what they're doing, basically.
So I'm a machine learning engineer, and we've talked about different ways of interacting with the system and so on.
But what if this is actually not the case?
I mean, one of the scenarios that you mentioned
is having people use Neo that are software engineers.
for example. So these people don't necessarily know, I mean, obviously they are familiar with programming and, I don't know, data structures and whatnot, but they don't necessarily know how machine learning works, the differences between the algorithms, building pipelines, and all of those things.
So what made me wonder about this particular scenario, and the reason that I also have, let's say, a personal interest in this, is because I have been that person. I mean, my background is in software engineering. So at some point I sort of taught myself AI, in a way, and at some point I had to manage AI projects and make architectural decisions and so on.
And the way that I made this work is by trial and error, basically.
You know, I had to educate myself and then try out different things and see what works and what breaks and what I should do, and so on.
So what I'm wondering is: okay, if you get a person with that profile and you get them to use Neo, first, what kind of results can you expect, and then, what kind of educational outcome can you expect for the person?
So maybe you get some result in the end, but will the operator, let's say, get to really understand how machine learning works?
By the way, thanks for sharing your personal experience.
I would also love to understand how much time it took you for this transition.
That's a good question.
So I think in total, so I didn't actually start from data science and machine learning.
I started from the other side of the aisle, let's say.
So more knowledge representation and reasoning and symbolic AI.
I already had like a solid grasp of data structures and algorithms and all of those things.
So going, let's say, from having like a very abstract idea of data science to actually being able to, you know, determine algorithms and run pipelines and all of those things, I think it took me like a few months, let's say half a year, roughly.
Yeah.
So, I mean, Gaurav, do you want to take it, or I can share my personal experience?
Yeah, please go ahead.
I can add on that.
You know, when I was doing physics at these labs, like CERN and then later on the Atomic Energy Commission in France, these days you never do pure physics.
Everything is driven by systems and code.
So I had to code all day, every day.
And honestly, at some point, like 90% of my work was writing different algorithms, writing code every day.
I could just spend 10% of my time on actual physics.
And a lot of researchers that I've met, you know, in astronomy, in physics and chemistry, and even in biology, they are drifting away from their core science and have been spending more time on building algorithms.
Because fundamentally, they know that AI can actually 10x their productivity.
AI can actually help not just in doing low-level work; if they are well versed in building these models, they can actually do some advanced research. And that's why every one of these researchers in their fields is now trying to implement machine learning.
Like, recently I met someone in astronomy. They have massive amounts of data, and with machine learning models you can actually detect a lot of patterns and hidden signals.
But his frustration is: I'm not a machine learning engineer. I basically did, you know, computer science, but I have to spend all my time learning how to set up GPUs, how to fine-tune a model, how to build pipelines, and all of that.
So that has been a real frustration. Now, because of Neo, and it's just been, I would say, one month of me practicing with it, today I can build models.
Like, recently I built a movie recommender model all on my own, end to end. Then I started to, you know, tinker around and really tweak some of the existing models. So, for example,
you have protein folding models.
It's well established.
You can take a model.
But what if I could use the principle of least action from physics and integrate that with 3D protein folding?
No one has done that.
But now I have the confidence to try this out.
And maybe this can lead to better accuracy in protein folding.
So these kinds of things are something that a lot of researchers and innovators from different fields want to try, and they were not able to do that.
And with Neo, you can just start, there are two things.
I would say one is you can really start small.
We give you all the different existing projects.
You can start with them, tweak around, ask questions in plain English, and really, you know, scale fast.
Like scale fast in terms of learning really fast because as a human, we can't really catch up with all the progress that is happening all around us right now, especially in the world of AI every single day.
There are, you can say, hundreds of new research papers every day on optimizing your infrastructure, on fine-tuning methods, on training methods, all of that.
And if my core research is in quantum physics, I don't have the time to read all the research papers.
So I always dreamed, you know, that there could be an engineer who could sit by my side and do all of this work, while I actually focus on pure physics.
Now I think with Neo we can do that, because Neo is that co-engineer: now I can actually integrate some of the principles from physics, and I can tweak existing models.
And that is like the first step to enter this field.
Second, I would say, what we have done is we give you two views.
One is a macroscopic view, through which you can look at all the different systems.
And the second is a microscopic view, so you can actually look under the hood.
You can look at the interdependencies, artifacts, code, everything.
And because of that, you can deeply understand, and ask questions of Neo: okay, what did you actually do when you were fine-tuning with this method?
Or let's say I ask Neo to perform 100 experiments with different fine-tuning methods and different datasets.
Now, Neo will perform those experiments, eliminate some of them based on a threshold value that I have provided, and then it will run the top five eventually and give me an evaluation report.
And in that evaluation report, it will do a comparative analysis. For example, let's say you are building a computer vision model for self-driving cars. You could have, let's say, an R-CNN or RT-DETR eventually, and it created a report for these two.
And it gave you the difference: okay, this one is faster but has lower accuracy, and this one is slower but has higher accuracy.
Now, as a human, I can decide, and I can command Neo, because now I understand.
And because of this understanding, I can command Neo: okay, you can choose one path and go there, and also explain, you know, step by step, what you are actually doing.
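The triage flow described, running many experiments, dropping those below a user-provided threshold, and keeping the top five, can be sketched like this. The scores here are fabricated for illustration; a real run would come from actual training:

```python
def triage(results, threshold, top_k=5):
    """results: list of (config_name, score). Keep the top_k survivors
    whose score meets the user-provided threshold."""
    survivors = [(name, s) for name, s in results if s >= threshold]
    survivors.sort(key=lambda r: r[1], reverse=True)
    return survivors[:top_k]

# 100 hypothetical experiment configs with made-up accuracy scores.
experiments = [(f"config-{i}", 0.5 + i * 0.005) for i in range(100)]
finalists = triage(experiments, threshold=0.9)
print(finalists)
```

Only the finalists would then get the full evaluation and the comparative report the speakers describe.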
So, just like that, and I'm not even a software engineer, I don't code these days, but even then I was able to build all of this in like one month.
So the greatest wish I have is that I can provide this superpower to all the researchers in the world
so that they can actually speed up, accelerate their research using NEO.
So they don't spend a lot of their time in building or dealing with cloud infrastructure, GPUs,
reading research papers on AI or writing code, they should become experts in their field,
while tools like Neo should help them.
Yeah, I mean, I think the low-level stuff that you also mentioned previously, like, I don't know, setting up GPUs or cloud deployment and all that, nobody's going to miss that.
I was just wondering more about, you know, the more interesting things, like fine-tuning algorithms and choosing one over the other and that kind of thing.
Like one thing, you know, like you said, from your experience: these days a lot of people say that you don't have to learn how to code, but I disagree. And I think you would agree, because of your experience. When you learn how to code, it gives you, I would say, a skill for your entire life.
You learn how the different systems interact with each other, you learn how to solve problems at the algorithm level, how different systems work, systems engineering.
And I think you can take that to any field.
And even if I didn't enjoy coding, it actually helped me, so that today I can understand, you know, how different fine-tuning methods actually work, what's really happening under the hood.
So with Neo, you know, we can question it and have it give us details about the different methods that it has tried, and what the difference is between QLoRA and LoRA, or DPO, or DoRA, or different methods. And then, based on my priority, I can ask it to choose one method and run with that.
Is your sound okay now? Yeah, yeah, sorry about that.
No worries. You wanted to add something, Gaurav? No, absolutely, I think that sort of captured the entire context.
And I think that's the premise: the macroscopic view and the microscopic view help.
So, just as I have seen it, there are analysts, data analysts, business analysts, people like that also using Neo.
What we have realized is that a lot of people have data without insights.
They don't really know what information their data is giving them.
So in order to get that information, analysis and experimentation with your data can be very crucial.
And it could be as crucial as, for example, if I'm building a fintech trading bot model that can help me predict the price of a stock or a crypto.
Even in that scenario, there is so much data that, as a human, it becomes very difficult to analyze and assess those large volumes of data coming from different sources.
Now could there be a model?
I may have wanted to build a model,
but I'm not very good at ML engineering
or I don't know much about the algorithms.
So could there be a system that can help me get started,
help me in my journey,
and maybe as I build with that system,
I can also start learning which algorithms are prioritized, why they are prioritized, and how to pre-process datasets.
So I think that while building with Neo, everybody learns the depths of ML engineering, like how to choose a model, how to evaluate models for various use cases. Because Neo as a system is general, it can be used for various tasks. It is not restricted to, say, only working with tabular datasets. It can work with image classification; it can work with gen AI models.
It can build RAG pipelines also.
So breadth-wise, it is very capable in a lot of the domains around AI and ML.
So even with less understanding of such deeper concepts,
I can get started with Neo.
And slowly and steadily, I can start learning
the deeper concepts with the microscopic view in place.
Okay, well, thanks. So we're close to wrapping up, and so far we have dived really kind of deep into the tech side of things: how does it work, what can you do with it, and so on.
So let's wrap up with the business side of things because I'm sure that by now if people are listening,
they must be wondering like, okay, this sounds like a superpower.
I want to use it.
So how can people use it?
And what are the next steps for you as a company?
So how do you scale out, basically?
So there's a lot of stuff happening, you know, behind the scenes right now.
But I would say broadly speaking, two things.
One is, you know, we are making Neo better at handling more and more complex tasks.
Second is integrations.
We want Neo to integrate across all your systems from your database to different clouds.
And these two things are, you know, designed keeping in mind the long-term vision: we want to actually provide an ML engineer, eventually a Kaggle Grandmaster-level ML engineer, to every researcher, developer, innovator, and company on the planet.
So from that vision, these two things are incredibly important. One, it should be able to handle more and more complex tasks, so we are improving the capabilities of Neo at the algorithm level to handle more complex tasks, which will be evident in our increasing score on MLE-Bench in the coming days.
Second is integrations: like Gaurav mentioned, MCP and many other tools.
And third, I would say one of the things is the ability to read a research paper, write code,
and integrate that within your existing brownfield project.
And lastly, I would say, Neo is now ready.
We have opened up our wait list.
You can try it today.
Okay.
So actually I would argue that most of the things you mentioned, except maybe for the last point, are really improvements or feature additions or whatever you want to call them.
I was more thinking in terms of things like growing the company or going to market or I don't know, maybe getting funding or this type of thing.
Is this something that you are working on or that you are able to share?
Right now the focus is completely, you know, on going deep with our existing user base.
We have thousands of developers right now who are tinkering with Neo.
And every single day we get lots of feedback.
So we are working on improving the product.
And the goal here is before we hit the market, you know, for you can say more marketing
or raising money, we want to make sure we have these engaged users who are using Neo
on a day-to-day basis.
And once we achieve that goal, we have like certain milestones internally.
Once we hit those milestones, we are going to go out and raise money.
Okay, well, maybe you will have to create another wait list for that then, because I think you will have many people who are interested in this.
Yeah, it is always exciting.
And we are releasing it in batches, because that really helps, you can say, in giving a more refined product to the next batch, instead of giving the initial version to all the users.
Okay, well, thanks.
It's been a very, very interesting conversation.
and I don't know, maybe if I manage to find some time to actually use the opportunity,
I may join your wait list as well. It sounds super interesting.
Thanks for sticking around. For more stories like this, check the link in bio and follow Linked Data Orchestration.
