Microsoft Research Podcast - Abstracts: October 9, 2023
Episode Date: October 9, 2023Members of the research community at Microsoft work continuously to advance their respective fields. Abstracts brings its audience to the cutting edge with them through short, compelling conversatio...ns about new and noteworthy achievements. In this episode, Dr. Sheng Zhang, a Senior Researcher at Microsoft Research, joins host Dr. Gretchen Huizinga to discuss “UniversalNER: Targeted Distillation from Large Language Models for Open Named Entity Recognition.” In this paper, Zhang and his coauthors present mission-focused instruction tuning, a method for distilling large language models into smaller, more efficient ones for a broad application class. Their UniversalNER models achieved state-of-the-art performance in named entity recognition, an important natural language processing (NLP) task. Model distillation has the potential to make NLP and other capabilities more accessible, particularly in specialized domains such as biomedicine, which could benefit from more resource-efficient and transparent options. Learn more:View the paperUniversalNER project website with demoCode on GitHubDataset and models on Hugging Face
Transcript
Discussion (0)
Welcome to Abstracts,
a Microsoft Research podcast that puts
the spotlight on world-class research in brief.
I'm Dr. Gretchen Huizenga.
In this series,
members of the research community at Microsoft give us
a quick snapshot or a podcast abstract
of their new and noteworthy papers.
Today, I'm talking to Dr. Shen Zhang, a senior researcher at Microsoft Research.
Dr. Zhang is co-author of a paper called Universal NER,
Targeted Distillation from Large Language Models for Open Named Entity Recognition,
and you can read this paper now on Archive.
Shen Zhang, thanks for joining us on Abstracts.
Thanks for having me.
So in a few sentences, give us a brief introduction or overview of the issue
or problem that your research addresses and why we should care about it. Sure. Well, our research
addresses the challenge of efficiently replicating the capabilities of large language models for targeted application.
Particularly, we focus on NAMNTT recognition or NER.
And people should care because this work aims
to create more cost-effective and transparent models
that can recognize a wide range of NTT types
across various domains,
which is crucial for knowledge extraction
and has numerical practical applications.
So how does your approach, your particular approach, build on or differ from what's been
done previously in this field?
Well, our approach builds on the idea of instruction tuning, which is used to fine-tune language
models to follow human instructions.
However, unlike existing work that focuses on tuning models into replicas of large language
models in every aspect, we propose a method called mission-focused instruction tuning,
where we train a smaller model to specifically excel in a broad
application class, such as open information instruction. And in our case study, we focus
on name entity recognition, NER, and we demonstrate how targeted distillation from large language
models can maximize the capabilities for this application.
At the same time, the smaller model, the student model, also preserves generalizability across
different semantic types and domains. This approach differs from previous work also because
we emphasize the importance of increasing the diversity of input data and generating more comprehensive
coverage of antitypes, which ultimately leads to better performance in the targeted application.
Okay, and in the paper you talk about student models trailing the original large language models
by large margins in what you call downstream applications.
Give me an example of what downstream application looks like.
Yeah, so we here specifically focus on name entity recognition,
that is, identifying name entities in the written text.
So there's various types of name entities.
So the canonical ones like person, geographic location, organization, and the people have various needs.
They can go beyond those core screen types.
They can go into very fine-grained types like athlete, a politician, and even finer-grained types.
And you cannot predefine what types will be considered in your task.
That's why we care about this universal
concept of NAM entity recognition.
Well, let's talk about methodology for a bit. What kind of research methodology
did you use, and how did you conduct this research?
We developed a general recipe for targeted distillation from large-language
models. And in this case, we applied to OpenNER.
And our methodology consists of two main steps,
data construction and mission-focused instruction tuning.
For data construction, we sampled inputs
from a large corpus across diverse domains.
And then we use a large language model, ChatGPT,
to annotate anti-dimensions and their associated anti-types
in the sampled inputs.
This process allowed us to create a data set
with a wide coverage of anti-types.
For mission-focused instruction tuning,
we fine-tune smaller models using our constructed data set
in a conversational style format.
For each anti-pe in the output,
we transform it into a natural language query and tune the model to generate structured outputs
that contain all entities of that type in the input passage. We also incorporate negative sampling
to account for antitypes not mentioned in that passage. And besides these two main steps, our research also involved assembling the largest to date
and the most diverse NER benchmark for evaluation.
We compared the performance of our targeted distillation approach with other state-of-the-art
models to demonstrate the effectiveness of our methodology.
Okay. So you talk about NER as a case study, and you had 43 datasets and nine domains.
Give me an example of some of those domains that you pulled from.
Yeah. So one very, you know, typical domain is like news, right? We read news every day, and the news mentioned about people,
events, and
location. So that's like
a very common domain. And
there are other very interesting domains like code.
People also write code.
And the computer can understand
the code, but the person would also
want to understand the code in some different
way. So if you have
code-specific
name entity recognition capability, that would be awesome for some people that want to understand
what's happening in the code. Right. And you mentioned programming or code, but I also see
in the paper biomedicine on one kind of complex and academic end and social media on another.
So those are wildly different domains that you pulled from.
Did you do that for a reason, that spectrum of different kinds of data?
Yes.
The reason is that, you know, for some high-value domain like biomedicine,
it's quite expensive to annotate some data to train a model like that.
So traditionally, people will have to hire an expert to do that.
That is quite expensive and not scalable.
And here in the universal NIR paper, we propose a way to distill that specific domain knowledge from the large language model.
So the whole process is automatic. And the result model, you can see, it does pretty
well and maybe equally well on the model that based on, you know, human expert annotated corpus.
So after all this, a research paper presents findings. I imagine you had some interesting
discoveries in this study.
What were your major findings?
Yes, our major findings were that the targeted distillation approach,
specifically here, the universal NER model we developed,
it achieved state-of-the-art performance in name antirecognition across a wide range of antitypes and domains.
And when we compare to other models like APACA, Vicuna,
and Instruct-UIE, Universal NER significantly
outperformed them in terms of F1 score.
This demonstrated the effectiveness
of mission-focused instruction tuning
for creating more cost-effective and transparent models that
can excel in targeted applications,
such as open AR. So let's talk a little bit more about real-world impact. We've already discussed
a little bit about that. But how would you say, based on these findings, that this impacts the
real world and how people will use this? Yeah, absolutely. I would say our work is very significant in terms of
real-world impact. Because first of all, NER is a fundamental task in natural language processing,
and it plays a crucial role in knowledge extraction, information retrieval, and data
mining. And by developing a more cost-effective and transparent model like Universal NER,
which can recognize a wide range of antitypes and domains, we enable better performance in
this downstream application. And like I said, this is particularly important in high-value
domains such as biomedicine, where specialized expertise is required for annotation and the new
antitypes keep emerging. Our approach can help save time and resources for effectively
recognizing these new antitypes without the need for extensive annotated data. And secondly,
our work can have a broader impact as it represents a general recipe for targeted distillation
from large language models.
And this approach can be applied to other application classes, such as open relation
extraction.
And this allows researchers and the practitioner to create much smaller models that can be
more efficient and transparent while maintaining high performance
in their targeted tasks. If there was one thing you want our listeners to take away from this work
and you could distill that into a short take, what would it be? One key takeaway from our work
is that targeted distillation from large language models using our mission-focused
instruction tuning can lead to more cost-effective and transparent models that excel in a broader
application class.
And our application demonstrates that it is possible to harness the capabilities of large
language models and distill them into much smaller models that not only maintain
general liability across semantic types and domains, but also surpass the performance
of their larger counterparts in the targeted application.
And this opens up new avenues for research and practical application in various fields, making knowledge extractions
and natural language processing tasks more efficient and accessible.
It sounds very promising, and it sounds like you're excited about it.
Yeah, I'm pretty excited.
Well then, tell us, given this new vista that you've opened up with this universal NER,
what unanswered questions or unsolved problems still remain in this area?
And what's next on your research agenda?
Yeah, our work demonstrates the effectiveness of targeted simulation for open NER,
but several unanswered questions remain.
And I would say the first one is adapting the approach to other
application classes. Our method is a general recipe for targeted distillation, and it would
be interesting to explore its effectiveness in other broader application classes, such as open
relation extraction. And the second one is handling label conflicts
and the dataset-specific definition.
So in our work, we propose a dataset-specific
instruction tuning template to address label conflicts.
But more research is needed to better understand
and develop methods for harmonizing discrepancies
in label definition across datasets.
And the last one is exploring more efficient data construction methods.
We use ChatGPT for data construction, but alternative approaches could be explored to
generate more diverse and comprehensive datasets for mission-focused instruction tuning.
And as for our research agenda,
we plan to continue exploring targeted distillation techniques and apply them to other application classes,
as well as investigate ways to improve data construction
for better performance and efficiency in real-world tasks.
Sounds like you got your work cut out for you.
Yes.
Shen Zhang, thanks for joining us today.
And to our listeners, thanks for tuning in.
If you're interested in learning more about this paper,
you can find a link at aka.ms forward slash abstracts,
or you can read the paper on Archive.
See you next time on Abstracts.