The Good Tech Companies - Generative AI: Expert Insights on Evolution, Challenges, and Future Trends
Episode Date: July 23, 2024
This story was originally published on HackerNoon at: https://hackernoon.com/generative-ai-expert-insights-on-evolution-challenges-and-future-trends. Dive into the world of generative AI with ELEKS' expert analysis, discover the challenges and see what the future holds. Check more stories related to machine-learning at: https://hackernoon.com/c/machine-learning. You can also check exclusive content about #ai, #generative-ai, #data-science, #llms, #future-of-ai, #ai-vs-genai, #ai-regulation, #good-company, and more. This story was written by: @elekssoftware. Learn more about this writer by checking @elekssoftware's about page, and for more stories, please visit hackernoon.com.
Transcript
This audio is presented by Hacker Noon, where anyone can learn anything about any technology.
Generative AI. Expert insights on evolution, challenges, and future trends.
By ELEKS.
AI has captured the attention of tech enthusiasts and industry experts for quite some
time. In this article, we delve into the evolution of AI, shedding light on the issues it poses and
the emerging trends on the horizon. As we observe the exponential growth of AI technology, it becomes increasingly crucial
to have a comprehensive understanding of its capabilities in order to maximize its potential
benefits. Delving into this complex realm, Volodymyr Getmansky, the head of the data
science office at ELEKS, shares his insights and expertise on this trending topic.
AI versus Gen AI: key differences explained.
Firstly, generative AI is part of the AI field. While AI mainly focuses
on automating or optimizing human tasks, generative AI focuses on creating different objects.
Typical AI tasks such as building conversational or decision-making agents, intelligent automation,
image recognition and processing, as well as translation, can be enhanced with Gen AI.
It allows the generation of text and reports, images and designs, speech and music, and more.
As a result, the integration of generative AI into everyday tasks and workflows has become increasingly seamless and impactful. One might wonder which type of
data generation is the most popular. However, the answer is not straightforward. Multimodal
models allow the generation of different types of data based on diverse input. So, even if we
had usage statistics, it would be challenging to determine the most popular type of data being
generated. However, based on current business needs, large language
models are among the most popular. These models can process both text and numerical information
and can be used for tasks like question answering, text transformation, translation, spell checking,
enrichment, and generating reports. This functionality is a significant part of
operational activities for enterprises across industries, unlike image or video generation, which is less common.
Large language models, from text generation to modern giants.
Large language models (LLMs) are huge transformers, which are a type of deep learning model or,
to put it simply, specific neural networks.
Generally, LLMs have anywhere from 8 billion to 70 billion parameters and are trained
on vast amounts of data. For instance, Common Crawl, one of the largest datasets, contains web pages
and information from the past decade, amounting to dozens of petabytes of data. To put it in
perspective, the Titanic dataset, which consists of around 900 samples describing which passengers
survived the Titanic shipwreck, is less than 1 megabyte in size, and a model that can efficiently predict the probability
of survival may have around 25 to 100 parameters. LLMs also have a long history,
and they didn't suddenly appear. For example, the ELEKS data science department used GPT-2 for
response generation in 2019, while the first GPT (Generative Pre-trained
Transformer) model was released in 2018. However, even that wasn't the first appearance of
text generation models. Before the transformer era started in 2017, tasks such as text generation
had been addressed using different approaches, for example:
Generative adversarial networks (GANs), an approach where the generator trains based on feedback from another network, the discriminator.
Autoencoders, a general and well-known approach where the model tries to reproduce the input.
In 2013, efficient word vector embeddings like Word2Vec were proposed, and even earlier,
in the previous century, there were examples of probabilistic and pattern-based generation, such as the ELIZA chatbot in 1964. So, as we can see, natural language generation
(NLG) tasks and attempts have existed for many years. Most users of current LLMs,
such as ChatGPT (GPT), Gemini, Copilot, Claude, etc., are likely unaware of this because the results weren't as
promising as they became after the first release of InstructGPT, when OpenAI offered public access and
promoted it, followed by the first release of ChatGPT in November 2022, which received millions
of mentions on social media.
The AI regulation debate: balancing innovation and safety.
Nowadays, the AI community is divided on
the topic of AI risks and compliance needs, with some advocating for AI regulations and safety
control while others oppose them. Among the critics is Yann LeCun, chief of Meta (Facebook)
AI, who stated that such AI agents have intelligence not even comparable to that of a dog.
Meta AI Group, formerly Facebook AI
Research, is one of the developers of free and publicly available AI models such as Detectron,
Llama, Segment Anything, and ELF, which can be freely downloaded and used with only some
commercial limitations. Open access has definitely been favorably received by the worldwide AI
community.
"Those systems are still very limited. They don't have any understanding of the underlying reality of the real world because they are purely trained on text, a massive amount of text."
Yann LeCun, chief AI scientist at Meta
The concerns regarding the regulations have
also been raised by officials. For example, French President Emmanuel Macron warned that
landmark EU legislation designed to tackle the development of artificial intelligence risks
hampering European tech companies compared to rivals in the US, UK, and China. On the other
hand, there are AI regulation supporters. According to Elon Musk, Tesla CEO, AI is one
of the biggest risks to the future of civilization.
Representatives of non-public, paid AI providers take the same position, but here the real driver
of such a position may be market competition: limiting the spread of competing AI models.
Overview of the EU Artificial Intelligence Act.
In 2023, the EU Parliament passed the AI Act,
the first set of comprehensive rules governing the use of AI
technologies within the European Union. This legislation sets a precedent for responsible
and ethical AI development and implementation.
Key issues addressed by the EU AI Act.
Firstly, there are logical limitations on the use of personal data, as already outlined by different standards, like GDPR (EU), APPI (Japan), HIPAA (US), and PIPEDA (Canada),
which cover personal data processing, biometric identification, etc. Connected to this are
scoring systems or any form of people categorization, where model bias can have a significant impact,
potentially leading to discrimination. Finally, there is behavioral manipulation,
where some models can try to increase business KPIs (conversion rates, over-consumption).
AI model preparation and usage: challenges and concerns.
There are many issues and concerns
connected to model preparation, usage, and other hidden activities. For example, the data used for
model training may include personal data
that wasn't authorized for such purposes. Global providers offer services focused on
private correspondence (emails) or other private assets (photos, video) that can be used for
model training in a hidden mode without any announcement. There was recently a question
addressed to OpenAI's CTO regarding the use of private videos for Sora training
(Sora is a non-public OpenAI service for generating videos based on textual queries), but she could not provide a clear answer.
Another issue can be related to data labeling and filtering.
We don't know the personal characteristics, skills, stereotypes, and knowledge of the specialists involved there,
and this can introduce unwanted statements or content into the data. There was also an ethical issue: it was reported that some of the global Gen AI
providers involved labelers from Kenya and underpaid them. Model bias and so-called model
hallucinations, in which the models provide incorrect or partially incorrect answers that
appear to be perfect, are also problems. Recently, the ELEKS data science team was
working on improving our customers' retrieval-augmented generation (RAG) solution, in which
some data is shown to the model and the model summarizes or provides answers based on that data.
During the process, our team realized that many modern online (larger but paid) or offline
(smaller and public) models confuse the enterprise
names and numbers. We had data containing financial statements and audit information
for a few companies, and the request was to show company A's revenue. However, the revenue for
company A wasn't directly provided in the data and needed to be calculated. Most models, including
leaders in the LLM arena benchmark, responded with the wrong revenue level
that belonged to company B. This error occurred due to partially similar character combinations
in the companies' names, such as "Limited", "Service", etc. Here, even prompt learning didn't help.
Adding a statement like "If you aren't confident or some information is missing,
please answer 'don't know'" didn't resolve the issue.
Another thing is numerical representation. LLMs perceive numbers as
tokens, or even as several tokens: a number like 0.3333 can be encoded as "0.3" plus a separate token for the remaining digits, according to the
byte pair encoding approach, so it is hard to deal with complicated
numerical transformations without additional adapters.
The recent appointment of retired
U.S. Army General Paul M. Nakasone to OpenAI's board of directors has sparked a mixed reaction.
On the one hand, Nakasone's extensive background in cybersecurity and intelligence is seen as a
significant asset, likely to implement robust strategies to defend
against cyber attacks, crucial for a company dealing with AI research and development.
On the other hand, there are concerns about the potential implications of Nakasone's appointment
due to his military and intelligence background (former head of the National Security Agency,
NSA, and U.S. Cyber Command), which may lead to increased government surveillance and intervention.
The fear is that Nakasone could facilitate more extensive access by government agencies to
OpenAI's data and services. Thus, some fear that this appointment can affect both the use of the service
(data, requests by government agencies) and the limitations of the service itself.
Finally, there are other concerns, such as generated code vulnerabilities,
contradictory suggestions, inappropriate usage (passing exams or getting instructions on how to
create a bomb), and more.
How to improve LLM usage for more robust results?
First, it's crucial to determine whether using an LLM is necessary and whether it should be a
general foundational model. In some cases, the purpose
and the decomposed task are not so complicated and can be resolved by simpler offline models,
for tasks such as misspelling detection, pattern-based generation, and parsing or information retrieval. Additionally,
a general model can answer questions not related to the intended purpose of the LLM integration.
There are examples where a company used a direct online LLM integration (e.g., GPT,
Gemini) without any additional adapters (pre- and post-processors) and encountered unexpected
behavior. For example, a user asked a car dealer chatbot to write a Python script to
solve the Navier-Stokes fluid flow equation, and the chatbot said, "Certainly. I'll do that."
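One way to picture the missing pre-processor in the car dealer example is a scope check that runs before any request reaches the model. The sketch below is purely illustrative: the keyword list, the function names `is_in_scope` and `guarded_answer`, and the refusal text are all assumptions, and a real deployment would more likely rely on a trained intent classifier than on keyword matching.

```python
import re

# Hypothetical vocabulary for a car dealer assistant (an assumption for this sketch).
IN_SCOPE_KEYWORDS = {
    "car", "cars", "vehicle", "vehicles", "price", "lease", "financing",
    "warranty", "dealership", "mileage", "suv", "sedan",
}

REFUSAL = "Sorry, I can only help with questions about our vehicles and services."


def is_in_scope(query: str, min_hits: int = 1) -> bool:
    """Return True if the query mentions at least `min_hits` in-scope keywords."""
    words = set(re.findall(r"[a-z]+", query.lower()))
    return len(words & IN_SCOPE_KEYWORDS) >= min_hits


def guarded_answer(query: str, llm_call) -> str:
    """Forward the query to the model only when it passes the scope check.

    `llm_call` is any callable that takes the query and returns text;
    it stands in for the real model integration.
    """
    if not is_in_scope(query):
        return REFUSAL
    return llm_call(query)
```

With this guard in place, the Navier-Stokes request would be refused before it reaches the model, while a question about the price of an SUV would be passed through unchanged.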
Next comes the question of which LLM to use: public and offline, or paid and online.
The decision depends on the complexity of the task and the computing possibilities.
Online and paid models are larger and have higher performance, while offline and public
models require significant expenditures for hosting, often needing at least 40 gigabytes of
VRAM. When using
online models, it's essential to have strict control of the sensitive data shared with the provider.
Typically, for such things, we build a pre-processing module that can remove personal
or sensitive information, such as financial details or private agreements, without significantly
changing the query, so as to preserve the context, leaving information like the enterprise size or approximate location if needed.
The initial step in decreasing the model's
bias and avoiding hallucinations is to choose the right data or context, or to rank the candidates
(e.g., for RAG). Sometimes, vector representations and similarity metrics, such as cosine similarity,
may not be effective. This is because small variations,
like the presence of the word "no" or slight differences in names (e.g., Oracle vs. Orich),
can have a significant impact. As for post-processing, we can instruct the model
to respond with "don't know" if confidence is low and develop a verification adapter that
checks the accuracy of the model's responses.
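The point about cosine similarity can be made concrete with a small experiment. The sketch below uses plain word counts as the vector representation, which is an assumption made to keep the example self-contained; learned embeddings behave differently in detail, but the effect is of the same kind. The two example sentences are hypothetical.

```python
import math
from collections import Counter


def bag_of_words(text: str) -> Counter:
    """Very simple vectorization: lowercase word counts (illustrative only)."""
    return Counter(text.lower().split())


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


statement = "the company reported revenue growth this year"
negated = "the company reported no revenue growth this year"

# The two sentences say opposite things, yet they differ in a single word,
# so their vectors point in almost the same direction.
score = cosine_similarity(bag_of_words(statement), bag_of_words(negated))
print(f"similarity: {score:.2f}")
```

Here the score comes out at roughly 0.94 even though the meaning is reversed, which is why a ranking step based on similarity alone may need an extra check for negations and near-identical names.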
Emerging trends and future directions in the LLM field.
Numerous research directions exist in the field of LLMs, and new scientific articles emerge weekly. These articles cover a range of
topics, including transformer/LLM optimization (robustness and efficiency, such as how to generalize
models without significantly increasing their size
or parameter count), typical optimization techniques (like distillation), and methods
for increasing the input (context) length. Among the various directions, prominent ones during
the recent period include mixture of tokens, mixture of experts, mixture of depth, skeleton
of thoughts, RoPE, and chain of thoughts prompting. Let's briefly describe what
each of these means.
1. The mixture of experts (MoE) is a different transformer architecture.
It typically has a dynamic layer consisting of several (eight in Mixtral) or many dense/flattened
layers representing different knowledge. This architecture includes switch or routing
methods, for example, a gating function that selects which tokens should be processed by which experts, reducing the number
of layers (experts) per token or group of tokens, down to one expert in the case of a switch layer. This allows for
efficient model scaling and improves performance by using different submodels (experts) for different input
parts, making it more effective than using one general and even larger layer.
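The routing idea behind a mixture of experts can be sketched in a few lines. This is a toy illustration under stated assumptions, not the implementation of any particular model: the class name `ToyMoELayer`, the random weights, and the use of a simple per-dimension scaling vector in place of a full feed-forward expert are all made up for the example.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    return sum(x * y for x, y in zip(u, v))


class ToyMoELayer:
    """Toy mixture-of-experts layer: a router picks top_k experts per token."""

    def __init__(self, dim, num_experts=8, top_k=2, seed=0):
        rng = random.Random(seed)
        self.top_k = top_k
        # Router: one weight vector per expert (random values, for illustration).
        self.router = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(num_experts)]
        # Each "expert" is only a scaling vector standing in for a feed-forward block.
        self.experts = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(num_experts)]

    def route(self, token):
        """Return [(expert_index, weight)] for the top_k experts; weights sum to 1."""
        logits = [dot(w, token) for w in self.router]
        top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[: self.top_k]
        weights = softmax([logits[i] for i in top])
        return list(zip(top, weights))

    def forward(self, token):
        """Combine the outputs of the selected experts only; the rest are never computed."""
        out = [0.0] * len(token)
        for idx, weight in self.route(token):
            expert_out = [s * x for s, x in zip(self.experts[idx], token)]
            out = [o + weight * e for o, e in zip(out, expert_out)]
        return out


layer = ToyMoELayer(dim=4)
print(layer.route([1.0, 0.5, -0.3, 2.0]))
```

Only `top_k` of the eight experts do any work for a given token, which is where the compute saving comes from. Setting `top_k=1` gives the switch-style behaviour, where each token goes to a single expert.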
2. The mixture of tokens is connected to the mentioned mixture of experts:
here we group tokens by their importance (softmax activation) for a specific expert.
3. The mixture of depth technique is also connected to the mentioned MoEs,
particularly in terms of routing. It aims to decrease the computing graph (compute budget), limiting it to the top tokens that will be used in the attention mechanism.
The tokens deemed less important (e.g., punctuation) for the specific sequence are skipped.
This results in dynamic token participation, but the number of tokens k (the top-k tokens) is static,
so we can decrease the sizes according
to the compute budget, or k, which we've chosen.
4. The skeleton of thoughts is
efficient for LLM scaling and allows the generation of parts of the completion (model response)
in parallel based on the primary skeleton request, which consists of points that can be parallelized.
5. There
are other challenges, for example, the input size. Users often want to provide an LLM with
large amounts of information, sometimes even whole books, while keeping the number of parameters
unchanged. Here there are two known methods, ALiBi (attention with linear biases) and RoPE
(rotary position embedding), that can extrapolate,
or possibly interpolate, the input embedding using dynamic positional encoding and a scaling factor,
allowing users to increase the context length compared to the one used for training.
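To make the RoPE idea more tangible, here is a minimal sketch of a rotary position embedding with a scaling factor. It rotates each pair of vector components by an angle that depends on the token position. The function name `rope`, the default `base` of 10000, and the way `scale` divides the position (a simple form of position interpolation) are assumptions chosen for illustration, not the recipe of a specific model.

```python
import math


def rope(vec, position, base=10000.0, scale=1.0):
    """Apply a rotary position embedding to `vec` (even length) at `position`.

    A `scale` above 1 compresses positions, so a longer sequence is mapped
    back into the position range that was seen during training.
    """
    dim = len(vec)
    if dim % 2 != 0:
        raise ValueError("vector length must be even")
    pos = position / scale
    out = []
    for i in range(0, dim, 2):
        # i is twice the pair index, so the frequency is base^(-2k/dim).
        theta = pos * base ** (-i / dim)
        cos_t, sin_t = math.cos(theta), math.sin(theta)
        x, y = vec[i], vec[i + 1]
        out.extend([x * cos_t - y * sin_t, x * sin_t + y * cos_t])
    return out


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


q = [1.0, 2.0, 3.0, 4.0]
k = [0.5, -1.0, 2.0, 0.0]

# The attention score depends only on the distance between positions (2 here).
near = dot(rope(q, 5), rope(k, 3))
far = dot(rope(q, 12), rope(k, 10))
print(abs(near - far) < 1e-9)
```

Because the score depends only on relative distance, dividing all positions by a factor is one way to fit a longer input into the range the model was trained on: with `scale=2.0`, position 4096 is encoded exactly like position 2048 without scaling.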
6. Chain of thoughts prompting, which is an example of few-shot prompting
(the user provides the supervision for the LLM in the
context), aims to decompose the question into several steps. Mostly, it is applied to reasoning
problems, such as when you can split the logic into some computational plan. The example from
the original paper: "Roger has five tennis balls. He buys two more cans of tennis balls. Each can
has three tennis balls. How many tennis balls does he have now?" Thoughts plan:
"Roger started with five balls. Two cans of three tennis balls each is six tennis balls.
Five plus six equals eleven. The answer is eleven."
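In practice, a chain of thoughts prompt is just text: one or more worked examples placed in front of the new question. The sketch below assembles such a prompt from the tennis ball example. The `Q:`/`A:` layout, the function name `build_cot_prompt`, and the follow-up question are assumptions for illustration; sending the prompt to a model is left out on purpose.

```python
# Worked example taken from the tennis ball problem above.
COT_EXAMPLE_QUESTION = (
    "Roger has 5 tennis balls. He buys 2 more cans of tennis balls. "
    "Each can has 3 tennis balls. How many tennis balls does he have now?"
)
COT_EXAMPLE_REASONING = (
    "Roger started with 5 balls. 2 cans of 3 tennis balls each is 6 tennis balls. "
    "5 + 6 = 11. The answer is 11."
)


def build_cot_prompt(question, examples):
    """Assemble a few-shot prompt: worked examples first, then the new question."""
    parts = [f"Q: {q}\nA: {reasoning}" for q, reasoning in examples]
    # The trailing "A:" invites the model to continue with its own reasoning steps.
    parts.append(f"Q: {question}\nA:")
    return "\n\n".join(parts)


# Hypothetical new question, made up for this sketch.
new_question = (
    "A shop has 4 boxes of pencils with 12 pencils in each box. "
    "It sells 10 pencils. How many pencils are left?"
)

prompt = build_cot_prompt(new_question, [(COT_EXAMPLE_QUESTION, COT_EXAMPLE_REASONING)])
print(prompt)
```

The worked example shows the model the shape of the answer that is expected: intermediate steps first, then the final result.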
Besides that, there are many other directions, and every week,
several new significant papers appear around them.
Sometimes, there is an additional problem
for data scientists in following all these challenges and achievements.
What can end-users expect from the latest AI developments?
There are also many trends here. To sum up, there may be stronger AI regulations that will limit different
solutions and will ultimately affect the generalization or field coverage of available models.
Other trends are mostly about
improving existing approaches, for example, decreasing the number of parameters and the memory
needed (e.g., quantization or even 1-bit LLMs, where each parameter is ternary and can take the values -1, 0,
or 1). So, we can expect offline LLMs or diffusion transformers (DiT, modern successors of diffusion models
and vision transformers, primarily for image generation tasks) running even
on our phones.
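As a rough illustration of what "ternary" means here, the sketch below maps a list of floating-point weights to the values -1, 0, and 1 plus one shared scale. The scaling rule (dividing by the mean absolute weight before rounding) is one possible recipe and is an assumption of this example, as are the function names and the sample weights. Real 1-bit LLMs are trained with the quantization in the loop rather than converted afterwards.

```python
def ternary_quantize(weights, eps=1e-8):
    """Map float weights to {-1, 0, 1} plus one shared float scale."""
    # Scale by the mean absolute value, then round and clip to the ternary range.
    scale = sum(abs(w) for w in weights) / len(weights) + eps
    quantized = [max(-1, min(1, round(w / scale))) for w in weights]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate float weights from the ternary values."""
    return [q * scale for q in quantized]


weights = [0.9, -0.05, 0.4, -1.2, 0.02]  # hypothetical weights
quantized, scale = ternary_quantize(weights)
print(quantized)  # -> [1, 0, 1, -1, 0]
print(dequantize(quantized, scale))
```

A value with three possible states carries about 1.58 bits of information, compared with 16 bits for a typical half-precision weight, which is where the memory saving comes from.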
Nowadays, there are several examples, such as Microsoft's Phi-2 model, with a
generation speed of about 3 to 10 tokens per second on modern Snapdragon-based Android devices.
Also, there will be more advanced
personalization, using all previous user experience and feedback to provide more suitable results,
even up to digital twins. Many other things that are available right now will be improved:
assistants, model customization and marketplaces, one model for everything (the multimodal direction),
security (a more efficient mechanism to work
with personal data, to encode it, etc.), and others.
Ready to unlock the potential of AI for your
business? Contact an ELEKS expert. Thank you for listening to this Hackernoon story, read by
Artificial Intelligence. Visit hackernoon.com to read, write, learn and publish.