A presentation at THAT Conference, WI 2023 in Wisconsin Dells, WI, USA by Matt Williams
The Cost of AI and how to solve it locally
…is experiencing…
…is experiencing… …a revolution. Everything is different. New people are excited, foaming at the mouth for the next thing. It didn't happen overnight; it took a long time to get here, and there are lots of places where you could start the story. Up until recently, it was about…
First there was better search Search has been getting better and better, and we have gotten better at asking questions.
We could use: And we have had a lot of options to use, some better and some worse.
We could use: Google The first to get folks really excited was Google. Sure, there were others, like Yahoo, but they all sucked.
We could use: Google, Bing Then there was Bing. Lots of money… and… well…
We could use: Google, Bing, DuckDuckGo And for the privacy-focused there was DuckDuckGo. There were many other options; my favorite was Neeva, which let you rank domain importance.
To get better answers for every question And search engines offered us help to get better answers for every question… as long as we asked a good question.
But search engines can only go so far They aren’t perfect. They try to find the answer to what we ask, but the answer is only as good as the question we ask.
Sometimes our question shows some bias Sometimes the question we ask carries a bias of its own.
and the answer may reflect that bias back to us And the answer reflects that bias back to us. In this case the bias is our own, and we control it. Interestingly, some have observed that the more educated someone is, the more bias shows up in their questions.
Search engines also focus on finding the source of information The other big problem with search engines is that they are built to find the authoritative source of information on what you searched for.
and not the answer to the specific question They do not answer the question, but rather point you to a place where you can find the answer.
but now there is AI
AI tools answer the question
Will AI replace the role of the search engine?
ChatGPT was the first to get everyone excited about AI
Can it answer everyone’s questions?
A lot of folks think so
Matt Williams evangelist @ infra @technovangelist Welcome to the session. My name is Matt Williams and I am an evangelist at InfraHQ. You can find me on all the socials, like Twitter, Threads, GitHub, and all the others, as technovangelist. In this session, I want to talk about how exciting all of these brand new AI services are, but I also want to look at some of the problems they introduce. At the end we will look at a solution to those problems: running these models locally. It turns out there are a few different ways to do that, and I work on a team focused on building the best option, a free and open source solution called Ollama. But before we get there, let’s talk about the problems.
AI is not new My first intro to AI was the Intro to AI Programming course at FSU. The current wave of AI started with the transformers paper in 2017. This is a long road that we are on.
and as with any long road That road is twisty, and you can’t always see what’s around the corner, but it’s getting exciting.
Sometimes a rock falls right in front of you, or even crushes your car.
Users are asking and trusting without verification Users are asking questions without context and getting crushed. I have heard from so many people who think AI is a fad because it gets the answers wrong so often. But AI is like any other source: you should always verify the answers you get. Not everyone wants to do that.
a lawyer looked for precedent & believed what he saw Recently, an airline passenger was hurt by the food cart on an Avianca flight, so he sued. The lawyer researched the issue and asked ChatGPT to write an affidavit with examples that showed precedent for his client. ChatGPT found six examples, and the lawyer submitted it. The only problem is that none of the examples existed; ChatGPT made them up. It answered the question it was given. “Lawyer Used ChatGPT In Court—And Cited Fake Cases. A Judge Is Considering Sanctions”
they are even submitting their data So if context is what is needed, folks are coming up with solutions that provide context. Obsidian has a plugin called Copilot, as well as others. They submit your markdown documents to help find more connections in your notes.
personal data means personalized results This is good, right? Personal data means personalized results. How could anything go wrong?
but that can also have bad outcomes Well, it did. Two engineers working at Samsung submitted information about a project they were working on internally. Someone later asked a question and learned about an internal project name at Samsung. Since then, Samsung has banned the use of ChatGPT, because the data in your prompts and the answers go back into making the model better.
and it can potentially skew future results Sometimes data is injected purposefully to direct users to misinformation: data poisoning.
scroll down… Another issue is who owns what you generate in AI services. One of the really great online tools is Midjourney; in fact, apart from website screenshots, every image in this deck was created in Midjourney. But if you aren’t paying, then you don’t own the assets. Read this.
I mentioned there may be a solution
it’s easier and harder than you think
the solution is local
If you are the type to think about going local
you have probably seen this…
this is a “simple” server example… with 1300 lines of code. llama.cpp was one of the first libraries created to make running Large Language Models locally possible. Most of the tools out there leverage it, but it would take more than our hour just to get started with it.
We are building another tool
and we are learning from every other tool out there
Let’s look at some of the alternatives
some projects are not long-lived
but first…
some definitions
LLM A Large Language Model: a model trained on huge amounts of text to predict the next token in a sequence.
tokens Tokens are the word parts a model actually works with; a single word may be split into several tokens.
weights The numbers that determine the relationships between tokens in a multi-dimensional space.
inference, quantization, batching, attention, and more
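To make tokens and quantization a little more concrete, here is a toy Python sketch. It is purely illustrative and made up for this deck: real tokenizers (BPE, SentencePiece) and real quantization schemes are far more sophisticated than this.

def toy_tokenize(text):
    # Split text into crude "word part" tokens by breaking long words in half.
    tokens = []
    for word in text.lower().split():
        if len(word) <= 4:
            tokens.append(word)
        else:
            mid = len(word) // 2
            tokens.extend([word[:mid], word[mid:]])
    return tokens

def toy_quantize(weights, levels=16):
    # Map float weights onto a small number of integer buckets (fewer bits each).
    lo, hi = min(weights), max(weights)
    scale = (hi - lo) / (levels - 1) or 1.0
    return [round((w - lo) / scale) for w in weights]

print(toy_tokenize("Llamas are wonderful animals"))
# ['lla', 'mas', 'are', 'wond', 'erful', 'ani', 'mals']
print(toy_quantize([-0.31, 0.02, 0.47, 1.2]))
# [0, 3, 8, 15]  -- 4 bits per weight instead of 32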
How to use with Python • If you are already using Python, go to the next slide • If you have done anything with Python in the past but are not an active Python dev, your environment is probably broken • Good luck fixing it
How to use with Python – slide 2 • If you already have Jupyter set up, skip to slide 3 • If you don’t, good luck • Choose an environment: conda, Anaconda, Miniconda, Miniforge • Find your favorite conflicting guide to install it • Find another, because the first few won’t work
How to use with Python – slide 3 • Find a random Python notebook that you can run through and hope there are no errors • There will be errors, and yes, they are cryptic
How to use with Python – slide 4 • If you like this process… congratulations, you are a Python developer
An easier method • One popular solution is openplayground • Make sure you have a good Python environment • pip install openplayground • Fix the errors
Now what?
Oobabooga text-generation-webui Started in December
you got your Python environment working, right?
Show finding models, editing the prompt, editing parameters
DEMO
DEMO
Introducing Ollama
Goals of the project • Easy for new users • Simple run command • Simple UI (coming soon) • Powerful for developers • Not just for Python • Easy install (Apple Silicon today; Intel, Windows, and Linux soon) • The easiest distribution model • Free and open source forever
Using Ollama • ollama run orca • ollama run llama2 • ollama run vicuna • ollama run orca "why is the sky blue"
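Ollama is not only a CLI; it also serves a local HTTP API (on port 11434 by default) that any language can call. Here is a minimal Python sketch of hitting the /api/generate endpoint; endpoint names and response fields can change between versions, so treat this as an illustration and check the current docs.

# Minimal sketch: assumes Ollama is running locally and llama2 has been pulled.
import json
import requests

def generate(prompt, model="llama2"):
    resp = requests.post(
        "http://localhost:11434/api/generate",
        json={"model": model, "prompt": prompt},
        stream=True,  # the answer streams back one JSON object per line
    )
    resp.raise_for_status()
    pieces = []
    for line in resp.iter_lines():
        if not line:
            continue
        chunk = json.loads(line)
        pieces.append(chunk.get("response", ""))
        if chunk.get("done"):
            break
    return "".join(pieces)

print(generate("why is the sky blue"))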
Based on Docker layers • Layer for the model weights • Layer for parameters • Layer for prompts • Layer for LoRA • Layer for other functionality • Layers are reused as needed • Updating a base layer doesn’t break the model
Customize with Modelfile FROM llama2
Create and run the model ollama create amazingimages -f ./amazingimages ollama run amazingimages
Customize with Modelfile FROM llama2 SYSTEM """ You are an artist with a way with words. Every prompt will be a simple idea. You will transform that simple idea into a prompt that an image generation tool like MidJourney can use to visualize the idea. You will expand on the idea, provide new and interesting details and add excitement to the description. Turn every simple idea into an explosion of amazing inspiration. Never just provide a definition of the phrase. Always produce a visual description of the idea, and prefer to describe an analogy rather than a literal description. Just output the text. Never include emojis in the output. Also include the colors of the image, and the style. Try to describe the most visually appealing image. """
Customize with Modelfile FROM llama2 PARAMETER temperature 0.9 SYSTEM """ You are an artist with a way with words. Every prompt will be a simple idea. You will transform that simple idea into a prompt that an image generation tool like MidJourney can use to visualize the idea. You will expand on the idea, provide new and interesting details and add excitement to the description. Turn every simple idea into an explosion of amazing inspiration. Never just provide a definition of the phrase. Always produce a visual description of the idea, and prefer to describe an analogy rather than a literal description. Just output the text. Never include emojis in the output. Also include the colors of the image, and the style. Try to describe the most visually appealing image. """
DEMO
Matt Williams evangelist @ infra @technovangelist https://ollama.ai YouTube.com/technovangelist
that.land/3NOV9zA 79
A new tool comes out daily offering magical answers to every question through AI. But at what cost? Samsung learned the hard way that anything you share, ChatGPT will share with the next person to find. Midjourney and other tools make it obvious that if you are not paying, you don’t own the results. Now companies are barring their employees from using these services, with dismissal as the penalty. So what are people to do? One option is to run the tools locally. But do you know how? In this session we will look at how Large Language Models work at a high level and the dangers they pose. Then we will review a number of the great tools out there for running models locally. We will end the session looking at Ollama, a new LLM runner that is changing the game by applying Docker technologies to this new world.
The following resources were mentioned during the presentation or are useful additional information.