The Good Tech Companies - Generative AI: Expert Insights on Evolution, Challenges, and Future Trends

Starting point is 00:00:00 This audio is presented by HackerNoon, where anyone can learn anything about any technology. Generative AI: expert insights on evolution, challenges, and future trends. By ELEKS. AI has captured the attention of tech enthusiasts and industry experts for quite some time. In this article, we delve into the evolution of AI, shedding light on the issues it poses and the emerging trends on the horizon. As we observe the exponential growth of AI technology, it becomes increasingly crucial to have a comprehensive understanding of its capabilities in order to maximize its potential benefits. Delving into this complex realm, Volodymyr Getmansky, the head of the data science office at ELEKS, shares his insights and expertise on this trending topic. AI versus Gen AI,

Starting point is 00:00:46 key differences explained. Firstly, generative AI is part of the AI field. While AI mainly focuses on automating or optimizing human tasks, generative AI focuses on creating different objects. Typical AI tasks such as building conversational or decision-making agents, intelligent automation, image recognition and processing, as well as translation, can be enhanced with Gen AI. It allows the generation of text and reports, images and designs, speech and music, and more. As a result, the integration of generative AI into everyday tasks and workflows has become increasingly seamless and impactful. One might wonder which type of data generation is the most popular. However, the answer is not straightforward. Multimodal

Starting point is 00:01:31 models allow the generation of different types of data based on diverse input. So, even if we had usage statistics, it would be challenging to determine the most popular type of data being generated. However, based on current business needs, large language models are among the most popular. These models can process both text and numerical information and can be used for tasks like question answering, text transformation, translation, spell checking, enrichment, and generating reports. This functionality is a significant part of operational activities for enterprises across industries, unlike image or video generation, which is less common. Large language models, from text generation to modern giants.

Starting point is 00:02:12 Large language models (LLMs) are huge transformers, which are a type of deep learning model or, to put it simply, specific neural networks. Generally, LLMs have anywhere from 8 billion to 70 billion parameters and are trained on vast amounts of data. For instance, Common Crawl, one of the largest datasets, contains webpages and information from the past decade, amounting to dozens of petabytes of data. To put it in perspective, the Titanic dataset, which consists of around 900 samples describing which passengers survived the Titanic shipwreck, is less than 1 megabyte in size, and a model that can efficiently predict the probability of survival may have around 25 to 100 parameters. LLMs also have a long history,

Starting point is 00:02:55 and they didn't suddenly appear. For example, the ELEKS data science department used GPT-2 for response generation in 2019, while the first GPT (Generative Pre-trained Transformer) model was released in 2018. However, even that wasn't the first appearance of text generation models. Before the transformer era started in 2017, tasks such as text generation had been addressed using different approaches, for example: generative adversarial networks, an approach where the generator trains based on feedback from another network, the discriminator; and autoencoders, a general and well-known approach where the model tries to reproduce the input.

Starting point is 00:03:35 In 2013, efficient vector word embeddings like Word2Vec were proposed, and even earlier, in the previous century, there were examples of probabilistic and pattern-based generation, such as the ELIZA chatbot in 1964. So, as we can see, natural language generation (NLG) tasks and attempts have existed for many years. Most users of current LLMs, such as ChatGPT, GPT, Gemini, Copilot, Claude, etc., are likely unaware of this because the results weren't as promising as they became after the first release of InstructGPT, when OpenAI offered public access and promoted it, followed by the first release of ChatGPT in November 2022, which received millions of mentions on social media. The AI regulation debate: balancing innovation and safety. Nowadays, the AI community is divided on the topic of AI risks and compliance needs, with some advocating for AI regulations and safety

Starting point is 00:04:32 control while others oppose them. Among the critics is Yann LeCun, chief AI scientist at Meta (formerly Facebook), who stated that such AI agents have intelligence that is not even similar to that of a dog. Meta AI Group, formerly Facebook AI Research, is one of the developers of free and publicly available AI models such as Detectron, Llama, Segment Anything, and ELF, which can be freely downloaded and used with only some commercial limitations. Open access has definitely been favorably received by the worldwide AI community. In LeCun's words: those systems are still very limited. They don't have any understanding of the underlying reality of the real world because

Starting point is 00:05:09 they are purely trained on text, a massive amount of text. Yann LeCun, chief AI scientist at Meta. Concerns regarding the regulations have also been raised by officials. For example, French President Emmanuel Macron warned that landmark EU legislation designed to tackle the development of artificial intelligence risks hampering European tech companies compared to rivals in the US, UK, and China. On the other hand, there are AI regulation supporters. According to Elon Musk, Tesla CEO, AI is one of the biggest risks to the future of civilization. Representatives of non-public, paid AI take the same position, but here, the real drivers

Starting point is 00:05:50 of such a position can be market competition, to limit the spread of competing AI models. Overview of the EU Artificial Intelligence Act. In 2023, the EU Parliament passed the AI Act, the first set of comprehensive rules governing the use of AI technologies within the European Union. This legislation sets a precedent for responsible and ethical AI development and implementation. Key issues addressed by the EU AI Act. Firstly, there are logical limitations on personal data, as already outlined by different standards, like GDPR (EU), APPI (Japan), HIPAA (US), and PIPEDA (Canada), which cover personal data processing, biometric identification, etc. Connected to this are scoring systems or any form of people categorization, where model bias can have a significant impact,

Starting point is 00:06:41 potentially leading to discrimination. Finally, there is behavioral manipulation, where some models can try to increase business KPIs (conversion rates, over-consumption). AI model preparation and usage: challenges and concerns. There are many issues and concerns connected to model preparation, usage, and other hidden activities. For example, the data used for model training can include personal data that wasn't authorized for such purposes. Global providers offer services focused on private correspondence (emails) or other private assets (photos, videos) that can be used for model training covertly, without any announcement. There was recently a question

Starting point is 00:07:21 addressed to OpenAI's CTO regarding the use of private videos for Sora training, a non-public OpenAI service for generating videos based on textual queries, but she could not provide a clear answer. Another issue can be related to data labeling and filtering. We don't know the personal characteristics, skills, stereotypes, and knowledge of the specialists involved there, and this can introduce unwanted statements or content into the data. There was also an ethical issue: reportedly, some of the global Gen AI providers involved labelers from Kenya and underpaid them. Model bias and so-called model hallucinations, in which the models provide incorrect or partially incorrect answers that appear to be perfect, are also problems. Recently, the ELEKS data science team was

Starting point is 00:08:05 working on improving our customers' retrieval-augmented generation (RAG) solution, in which the model is shown some data and then summarizes it or provides answers based on it. During the process, our team realized that many modern models, whether online (larger but paid) or offline (smaller and public), confuse enterprise names and numbers. We had data containing financial statements and audit information for a few companies, and the request was to show company A's revenue. However, the revenue for company A wasn't directly provided in the data and needed to be calculated. Most models, including leaders in the LLM Arena benchmark, responded with the wrong revenue level

Starting point is 00:08:45 that belonged to company B. This error occurred due to partially similar character combinations in the companies' names, such as "limited", "service", etc. Here, even prompt learning didn't help. Adding a statement like "if you aren't confident or some information is missing, please answer don't know" didn't resolve the issue. Another thing is about numerical representation. LLMs perceive numbers as tokens, or even as many tokens: for example, 0.3333 can be encoded as "0.3" and "333", according to the byte pair encoding approach, so it is hard to deal with complicated numerical transformations without additional adapters. The recent appointment of retired U.S. Army General Paul M. Nakasone to OpenAI's board of directors has sparked a mixed reaction.

Starting point is 00:09:36 On the one hand, Nakasone's extensive background in cybersecurity and intelligence is seen as a significant asset, likely to help implement robust strategies to defend against cyber attacks, which is crucial for a company dealing with AI research and development. On the other hand, there are concerns about the potential implications of Nakasone's appointment due to his military and intelligence background (he is a former head of the National Security Agency, NSA, and U.S. Cyber Command), which may lead to increased government surveillance and intervention. The fear is that Nakasone could facilitate more extensive access by government agencies to OpenAI's data and services. Thus, some fear that this appointment can affect both the use of the service,

Starting point is 00:10:16 including data and requests by government agencies, and the limitations of the service itself. Finally, there are other concerns, such as vulnerabilities in generated code, contradictory suggestions, inappropriate usage (passing exams or getting instructions on how to create a bomb), and more. How to improve LLM usage for more robust results? First, it's crucial to determine whether using an LLM is necessary and whether it should be a general foundational model. In some cases, the purpose and the decomposed task are not so complicated and can be resolved by simpler offline models, for example for misspelling correction, pattern-based generation, and parsing or information retrieval. Additionally,

Starting point is 00:10:55 the general model can answer questions not related to the intended purpose of the LLM integration. There are examples where a company adopted an online LLM integration (e.g., GPT, Gemini) without any additional adapters, pre- and post-processors, and encountered unexpected behavior. For example, a user asked a car dealer chatbot to write a Python script to solve the Navier-Stokes fluid flow equation, and the chatbot said, "Certainly. I'll do that." Next comes the question of which LLM to use: public and offline, or paid and online. The decision depends on the complexity of the task and the computing possibilities. Online and paid models are larger and have higher performance, while offline and public

Starting point is 00:11:39 models require significant expenditures for hosting, often needing at least 40 gigabytes of VRAM. When using online models, it's essential to have strict control of the sensitive data shared with the provider. Typically, for such things, we build a pre-processing module that can remove personal or sensitive information, such as financial details or private agreements, without significantly changing the query, so that the context is preserved, leaving information like the enterprise size or approximate location if needed. The initial step to decreasing the model's bias and avoiding hallucinations is to choose the right data or context or to rank the candidates (e.g., for RAG). Sometimes, vector representation and similarity metrics, such as cosine similarity,

Starting point is 00:12:21 may not be effective. This is because small variations, like the presence of the word "no" or slight differences in names (e.g., oracle vs. orich), can have a significant impact. As for the post-processing, we can instruct the model to respond with "don't know" if confidence is low and develop a verification adapter that checks the accuracy of the model's responses. Emerging trends and future directions in the LLM field. Numerous research directions exist in the field of LLMs, and new scientific articles emerge weekly. These articles cover a range of topics, including transformer and LLM optimization, robustness, and efficiency, such as how to generalize models without significantly increasing their size

Starting point is 00:13:05 or parameter count, typical optimization techniques like distillation, and methods for increasing the input (context) length. Among the various directions, prominent ones during the recent period include mixture of tokens, mixture of experts, mixture of depths, skeleton of thought, RoPE, and chain-of-thought prompting. Let's briefly describe what each of these means. 1. The mixture of experts (MoE) is a different transformer architecture. It typically has a dynamic layer consisting of several (eight in Mixtral) or many dense, flattened layers representing different knowledge. This architecture includes switch or routing methods, for example, a gating function that allows selecting which tokens should be processed by which experts, leading to the reduced number

Starting point is 00:13:49 of layers (experts) per token or group of tokens, down to one expert (a switch layer). This allows for efficient model scaling and improves performance by using different submodels (experts) for different input parts, making it more effective than using one general and even larger layer. 2. The mixture of tokens is connected to the mentioned mixture of experts: here we group tokens by their importance (softmax activation) for a specific expert. 3. The mixture of depths technique is also connected to the mentioned MoEs, particularly in terms of routing. It aims to decrease the computing graph (compute budget), limiting it to the top tokens that will be used in the attention mechanism. The tokens deemed less important (e.g., punctuation) for the specific sequence are skipped.
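The token routing described for mixture of depths can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the method from any specific paper: each token is a single number standing in for a hidden vector, the router scores are made up, and the names `route_top_k` and `mixture_of_depths_step` are hypothetical.

```python
from typing import Callable, List, Sequence


def route_top_k(scores: Sequence[float], k: int) -> List[int]:
    """Return the positions of the k highest-scoring tokens, in sequence order."""
    if not 0 <= k <= len(scores):
        raise ValueError("k must be between 0 and the sequence length")
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])


def mixture_of_depths_step(
    tokens: Sequence[float],
    scores: Sequence[float],
    k: int,
    block: Callable[[float], float],
) -> List[float]:
    """Apply `block` only to the top-k tokens; all other tokens skip it unchanged."""
    selected = set(route_top_k(scores, k))
    return [block(t) if i in selected else t for i, t in enumerate(tokens)]


# Toy data (assumed): five scalar "tokens" and hypothetical router scores.
tokens = [1.0, 2.0, 3.0, 4.0, 5.0]
scores = [0.9, 0.1, 0.05, 0.8, 0.4]

# The lambda stands in for an expensive attention/MLP block.
updated = mixture_of_depths_step(tokens, scores, k=2, block=lambda t: t * 10)
print(updated)  # [10.0, 2.0, 3.0, 40.0, 5.0]
```

Only the two highest-scoring tokens pass through the block; the other three are carried forward untouched.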

Starting point is 00:14:38 This results in dynamic token participation, but the number of tokens k (the top-k tokens) is static, so we can decrease the sizes according to the compute budget, or k, which we've chosen. 4. The skeleton of thought is efficient for LLM scaling and allows the generation of parts of the completion (the model response) in parallel, based on the primary skeleton request, which consists of points that can be parallelized. 5. There are other challenges, for example, the input size. Users often want to provide an LLM with large amounts of information, sometimes even whole books, while keeping the number of parameters unchanged. Here are two known methods, ALiBi, attention with linear biases, and RoPE,

Starting point is 00:15:23 rotary position embedding, that can extrapolate, or possibly interpolate, the input embedding using dynamic positional encoding and a scaling factor, allowing users to increase the context length beyond what was used for training. 6. Chain-of-thought prompting, which is an example of few-shot prompting (the user provides the supervision for the LLM in the context), aims to decompose the question into several steps. Mostly, it is applied to reasoning problems, such as when you can split the logic into some computational plan. The example from the original paper: Roger has five tennis balls. He buys two more cans of tennis balls. Each can

Starting point is 00:16:03 has three tennis balls. How many tennis balls does he have now? Thought plan: Roger started with five balls. Two cans of three tennis balls each is six tennis balls. Five plus six equals eleven. The answer is eleven. Besides that, there are many other directions, and every week, several new significant papers appear around them. Sometimes, keeping up with all these challenges and achievements is an additional problem for data scientists. What can end-users expect from the latest AI developments? There are also many trends,

Starting point is 00:16:34 just to sum up, there may be stronger AI regulations that will limit different solutions and will ultimately affect the generalization or field coverage of the available models. Other trends are mostly about improving existing approaches, for example, decreasing the number of parameters and the memory needed (e.g., quantization or even 1-bit LLMs, where each parameter is ternary and can take the values minus 1, 0, or 1). So, we can expect offline LLMs or diffusion transformers (DiT, modern successors of diffusion models and vision transformers, primarily for image generation tasks) running even on our phones.
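As a rough illustration of the ternary idea, here is a minimal Python sketch. It assumes absmean scaling (divide each weight by the mean absolute weight, then round and clip to the range minus 1 to 1), which is one published way to obtain ternary weights; the talk does not say which scheme is meant, and the names `ternary_quantize` and `dequantize` are hypothetical.

```python
from typing import List, Tuple


def ternary_quantize(weights: List[float], eps: float = 1e-8) -> Tuple[List[int], float]:
    """Map each weight to -1, 0, or 1 and return one shared scale (absmean scaling)."""
    if not weights:
        raise ValueError("weights must not be empty")
    scale = sum(abs(w) for w in weights) / len(weights)
    # Round to the nearest integer, then clip into the ternary range.
    quantized = [max(-1, min(1, round(w / (scale + eps)))) for w in weights]
    return quantized, scale


def dequantize(quantized: List[int], scale: float) -> List[float]:
    """Approximate the original weights from ternary values and the shared scale."""
    return [q * scale for q in quantized]


# Toy weights (assumed values, for illustration only).
weights = [0.9, -0.05, 0.4, -1.2, 0.02]
q, scale = ternary_quantize(weights)
print(q)                     # [1, 0, 1, -1, 0]
print(dequantize(q, scale))  # coarse approximation of the original weights
```

Each weight now needs fewer than two bits plus one shared floating-point scale, which is where the memory savings come from; the price is the approximation error visible in the dequantized values.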

Starting point is 00:17:12 Nowadays, there are several examples, such as Microsoft's Phi-2 model, whose generation speed is about 3 to 10 tokens per second on modern Snapdragon-based Android devices. Also, there will be more advanced personalization, using all previous user experience and feedback to provide more suitable results, even up to digital twins. Many other things that are available right now will be improved: assistants, model customization and marketplaces, one model for everything (the multimodal direction), security (a more efficient mechanism to work with personal data, to encode it, etc.), and others. Ready to unlock the potential of AI for your

Starting point is 00:17:51 business? Contact an ELEKS expert. Thank you for listening to this HackerNoon story, read by Artificial Intelligence. Visit hackernoon.com to read, write, learn and publish.
