July 1st, 2024 · #AI #LLM #Tokens
Do More With AI - LLMs With Big Token Counts
Discussion on using large language models with greater token counts to provide more context, allowing for better and more complex outputs to aid software development.
- Sentry experimental feature allows showing potential solutions for errors
- Context window is number of tokens model can access at a time, limiting previous questions
- More tokens give models a greater sense of context for better output
- Gemini 1.5 Pro has over 1 million token context window
- Generated Swagger docs from SQL schema for restaurant API
- Entire codebase in context allows focused questions about code
- Knowing expected output allows models to aid development
- Generated complex fake data for testing without much effort
- Summarized 8 hour conference transcript with timestamps
Transcript
Scott Tolinski
Welcome to Syntax on this Monday hasty treat. We're gonna be talking about LLMs, large language models, with a greater context window or or greater amount of tokens that you can use and why that makes a difference and what you can possibly do with that given you're a software developer. And, well, we kinda work with a lot of text. Right? So you might think that these tokens are advantageous to have more of, and that's definitely the case. So we're gonna be talking a little bit about, AI solutions that can help you in your development just because you can give them more data and why that matters. My name is Scott Tolinski. I'm a developer from Denver.
Scott Tolinski
And with me today, once again, is CJ. What's up, CJ?
Guest 1
Not much. Not much. How are you doing, Scott? Oh, I'm doing good. Just,
Scott Tolinski
you know, context here. We just came back out of our holiday weekend. I was doing some grilling, some some dad stuff. I was at the pool. I was grilling, made some ribs. I know you're not not a meat guy, but, man, CJ, I'm no Wes Bos myself. He's he's definitely a grill master. I'm a a grill novice here.
Scott Tolinski
But I made some pretty dang good ribs. So I'm a little proud of myself, for for doing that Memorial Day thing.
Scott Tolinski
How how how are you doing?
Guest 1
Pretty good. Yeah. We we had a little get together on Saturday, and I I grilled some plant based burgers and plant based,
Scott Tolinski
brats, which was really good. Mhmm. Yeah. The weather here has been awesome, and, it's fun to just just chill and grill for sure. Chill and grill. I love it. We're gonna be chilling and grilling today on on Syntax. But before we do, let's talk a little bit about some really neat AI features that Sentry has. This podcast is brought to you by Sentry.
Scott Tolinski
And believe it or not, AI is really good at doing a few things. And one of those things is solving bugs. In fact, before this feature existed in Sentry, I would just I have a I I have a giant, you know, vomit of error log, and I'd say, I have no idea what any of this is. Let me just copy this, paste it into ChatGPT, and say, hey. Can you make make sense of this? Give this to me in English, please. And Sentry actually has a really neat experimental feature now, which allows you to look at an error and say, give me a potential possible solution for this.
Scott Tolinski
Given that you know a lot about software errors, let's go ahead and see if you could potentially solve that for me. And it's it's really pretty neat. It it you know, it's an experimental feature, but it's one that I almost always try first before doing anything else. Because chances are, it gives you a proposed solution as well as a problem description that lets you kinda know what's going on and then tells you if Sentry can try to fix it for you, potentially lead you in the right direction. Bugs are gonna happen in your software. Sentry makes it easy to find and fix them. So let's get into LLMs with big tokens.
Sentry experimental feature allows showing potential solutions for errors
Scott Tolinski
1st and foremost, token.
Scott Tolinski
What does a token mean to a novice? Somebody who's coming in here. They've only used ChatGPT.
Guest 1
Yeah. So, it is a unit of text. So, I mean, I like to think about it as, like, a word. I guess it it depends on the LLM that you're using. Could be, like, a single character or a whole word. But you might think of any block of text. Count the words in it. That could be the token count. Or count the number of characters in it. That could be the token count.
Guest 1
And so, essentially, the model, when it's processing it, it's going to break down your prompt and anything you you provide in that text there into tokens so that it can process it. And you've probably seen any AI tool that you come across, like ChatGPT or Claude or Perplexity, or we're gonna be talking about, Gemini 1.5 Pro, they all have this this token limit. So that's essentially the maximum number of tokens that you can provide in both your input and the output that that comes from it.
Scott Tolinski
Yeah. And there's some really neat little tools if you wanna visualize this stuff online, and I'll post a link to one of them. But it depends on the the model, like, in terms of, like, what a token stands for. So if if you just type in token into the GPT-4 token counter that I've posted here, it's 5 characters, but it's actually 12 tokens, which is interesting. Right? Because it can just be a a simple small unit, but it isn't just that simple. So, you know, it depends on how the model itself is is understanding, you know, perhaps sub words or individual characters and how those maybe even full words, how that stuff all fits together. And it's really interesting. I didn't necessarily think about it that much because I'm usually just typing and hitting enter. And if the language model says, hey. You've given me too much.
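For reference, here's a minimal sketch of counting tokens programmatically, assuming the js-tiktoken npm package (a pure-JS port of OpenAI's tokenizer) and the cl100k_base encoding; the exact counts will depend on which encoding your model actually uses.

```ts
// Minimal sketch: count tokens vs. characters for a prompt, assuming the
// js-tiktoken package. Different models use different encodings, so treat
// the numbers as approximate.
import { getEncoding } from "js-tiktoken";

// cl100k_base is the encoding used by the GPT-3.5/GPT-4 family.
const enc = getEncoding("cl100k_base");

const prompt = "Can you JSDoc type this file for me?";
const tokens = enc.encode(prompt);

console.log(tokens.length); // number of tokens
console.log(prompt.length); // number of characters, for comparison
```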
Context window is number of tokens model can access at a time, limiting previous questions
Scott Tolinski
It will say, hey. I I I cannot accept anymore. Or when it's giving its output, it'll just cut off the output mid sentence. Gonna be left with being like, oh, great. Thank you, which is all great when you have, like, a big long code example and you say, hey. Can you do something with this code example? And it gives you, you know, a quarter of the the file or something to say, okay. Now give me this bit here. No. No. No. Give me this bit here. So the having a a larger context window or a larger amount of max tokens for any model, as as you're gonna be able to see in this episode, it's definitely advantageous to us as software developers.
Guest 1
Definitely. And I think the other thing to think about is this this context window whenever you're let's say, most people have probably interacted with ChatGPT a little bit. So the way to think about this is, let's say you're having a conversation with ChatGPT, and you give it a prompt. It gives you a response. You do another prompt. It gives you another response. The context window is basically the the token limit. And so for GPT 3.5, I believe, it's, like, roughly 4,000 tokens. We'll get we'll try to get a table of, like, exact token counts. But think of a a token limit of 4,000 tokens. That means that the model can only look at 4,000 tokens at a time. And so as you ask it each new question, technically, it can only go back in time 4,000 tokens. So it's kind of losing context for previous questions that you asked it or previous output that it gave. And so that's kind of what you have to worry about when you have a smaller token count in a in a smaller context window.
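A rough sketch of that sliding window in practice: older messages fall out of the budget first, which is exactly why the model "forgets" your early instructions. The Message type and countTokens heuristic here are illustrative stand-ins, not any particular SDK.

```ts
// Sketch: keep only as much recent conversation history as fits in the
// model's context window.
type Message = { role: "system" | "user" | "assistant"; content: string };

// Rough heuristic standing in for a real tokenizer (~4 chars per token).
const countTokens = (text: string) => Math.ceil(text.length / 4);

function fitToContextWindow(history: Message[], maxTokens: number): Message[] {
  const kept: Message[] = [];
  let used = 0;
  // Walk backwards from the newest message; the oldest messages drop first.
  for (let i = history.length - 1; i >= 0; i--) {
    const cost = countTokens(history[i].content);
    if (used + cost > maxTokens) break;
    kept.unshift(history[i]);
    used += cost;
  }
  return kept;
}
```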
Scott Tolinski
And understanding that can really prevent frustration when when working with these things. I know personally, like, I've given GPT-4 some context.
Scott Tolinski
And then, like, I've told it, here's what I'm working on. This is the type of thing I expect from you. I want, like, I want snake case. I want TypeScript. I want Svelte. I'm using a Node version, and then whatever.
Scott Tolinski
Like, 20 messages later, it starts spitting out require statements. I would say, ESM, come on. Like, just give me that. Like, that's I I actually do occasionally go off the handle a little bit when I'm I'm talking to these things. I'm gonna be first in line when the robots come for us because of the language that I've used against some of these things. But, you know, I I I think that even understanding that, knowing that you have a window of context here is is gonna be good for how you use them in in understanding. Because context does a lot of things for an LLM. Like like what I just said, if you if you tell it your codebase is Node version 20, it's TypeScript, whatever, it it has that context for when it's outputting your your your answer.
Scott Tolinski
Being able to store more things into that context gives the LLM a finer grain understanding.
More tokens give models a greater sense of context for better output
Scott Tolinski
Man, I keep using words like understanding and knowing, knowing that it doesn't know or understand anything. Yeah. But it it gives the LLM the ability to have that that greater sense of context when it's giving me the output. So you can get better results, more fine tuned.
Scott Tolinski
Just overall, the the larger token limit you have, the more context you can give the LLM, the better responses you can get.
Guest 1
Definitely. And it also ties into preventing hallucinations.
Guest 1
So that's when the output is it just completely made something up or it's, like, not even related to what you're asking it. So, like, an example that that Scott gave, if your very first prompt is, this is Node version 20 and ESM. And let's say so I just looked up the table here. And we have GPT 3.5 Turbo.
Guest 1
They have roughly a context window of 16,000 tokens. And so let's say you've had some back and forth with with this model, and now there's maybe, like, over 20,000 tokens even in this conversation, the next prompt no longer has that context of that very first message that said Node version 20 and, ESM. So now it might output, like, require statements, and it might do something from, like, Node version 16. Like, it's gonna it's not gonna have that that context anymore, so it's it's gonna, like, hallucinate things or do things that you didn't expect. Yeah. You could think of it as, like, a virtualization window. You have a, like, a window,
Scott Tolinski
and as you scroll down or in, like, video games, as you look left and right, the stuff outside of the window almost, like, does not exist.
Scott Tolinski
So the more token count you can have, the larger of a visibility window you have essentially for for the data of that if that tracks for how you visualize things.
Gemini 1.5 Pro has over 1 million token context window
Guest 1
Yeah. And, we'll share this table in the show notes. But, basically, it has all of the models that you can use with, specifically, OpenAI. I know that's one of the most popular ones that people use. But, like, specific specifically, GPT 3.5 Turbo can support, 16,385 tokens.
Guest 1
But GPT 4 Turbo can support a 128,000 tokens, so that's much larger.
Guest 1
And then the latest GPT 4 o, which is Omni, apparently, it has the same context window, but it's potentially fine tuned a little bit more. But with GPT 4, if you're paying for, like, the GPT Plus, your max context window is 128,000 tokens. I guess, in contrast, we can look at Claude. And have you used Claude before? I've only used it a couple of times. Yeah. I pay for
Scott Tolinski
Claude.
Scott Tolinski
So Claude and ChatGPT are the 2 that I pay for. So I use Claude 3 Opus, but I think I'm going to cancel because I I don't know. It you know, it's it's so funny with these things. You're you you get that feeling. Are these getting worse, or am I just getting used to it? Yeah. And and that's kinda how I've been feeling with Claude lately where I've found myself arguing with it a lot more. So I typically do use the Claude 3 Opus. It's the one you can get via chat.
Guest 1
And, yeah, and that one looks like on their pricing page, they're saying that it has 200,000 tokens.
Guest 1
If you're using, Claude 2.1, it also has 200,000. And then Claude 2.0 has a context window of about a 100,000.
Guest 1
So a fairly big, and I I think if you're at least using 2.1 or 3.0, it's it's almost twice the size of GPT 4. So it definitely gives you a much bigger window to make sure that you maintain that context in your in your prompts. Yeah. Yeah. I've always liked that you can paste in
Scott Tolinski
large files, and it will read the entirety of them, or you can paste in the PDF or something. No problem. Definitely. So, also, in addition to that, you might be wondering, like, okay. How does this relate to input length? And input length is really, like, how many things you can give it at once. Now you might not know this, but if you work with the APIs, and this isn't true for every LLM, but if you work with the APIs, you typically get a larger input length. They tend to limit the input length in the UI. For instance, ChatGPT, for GPT-4, and I'm speaking about not the the latest GPT-4. But as an example, GPT-4 has a context length of 8,192. Right? But they capped in the UI, they capped the input length to 2048 because the total input output cannot exceed the context length. So what they don't want is people giving it 8,000 tokens of input and then only having a 192 tokens for output. Right? Because, you know, there's there's just straight up, basic algebra there. Yeah. So 8192 minus 2048,
Guest 1
that's maths. And whatever that value is, I can't do in my head. That is the maximum output that the the AI could give you for that specific prompt if it was 2048.
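To finish the arithmetic they're joking about, here's the same budget math as a tiny sketch; the numbers are the ones from the episode, and 8192 minus 2048 works out to 6144.

```ts
// Whatever the input uses, the output has to fit in what's left of the
// context window.
const CONTEXT_LENGTH = 8192; // e.g. the original GPT-4
const UI_INPUT_CAP = 2048;   // example cap on input length in the UI

const maxOutputTokens = CONTEXT_LENGTH - UI_INPUT_CAP;
console.log(maxOutputTokens); // 6144 tokens left over for the response
```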
Scott Tolinski
Yep. So let's talk about models and services with big token counts. Now CJ just put me on to Gemini, which is funny because it's Google. Google's one of the biggest companies in the world. I had not even considered using Gemini.
Scott Tolinski
I don't know why, but Gemini 1.5 and it's not all Gemini one point five because there's different versions of Gemini 1.5. But the Gemini 1.5, I believe, the the Pro is that the one? Let me double check. Gemini 1.5 Pro.
Scott Tolinski
Yes. Gemini 1.5 Pro has a context window of 1,048,576 tokens, and that's a whole lot of tokens. That when you told me that, I think my hat was, like, like, it just shot off my head. I was like, oh, that's a a lot more than I'm used to. And since then, I've been I've been giving Gemini 1.5 a spin, and I gotta say, there's some coding tasks for this that I found to be really a joy to work in because they're simple tasks. I don't have to worry about doing too many too many things complexly with the large language model where, in fact, like, you can give it some mundane tasks with a lot of text and a lot of data, and it's stuff that LLMs do really well at. So I thought we would take some time or CJ thought we would take some time to go through and talk about maybe some stuff that we've done or stuff that this is well suited for.
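For anyone who wants to try this outside the AI Studio UI, here's a rough sketch of calling Gemini 1.5 Pro from Node, assuming Google's @google/generative-ai SDK; the model id string and environment variable name are placeholders, so check the current docs for exact values.

```ts
// Rough sketch, assuming the @google/generative-ai Node SDK.
import { GoogleGenerativeAI } from "@google/generative-ai";

const genAI = new GoogleGenerativeAI(process.env.GEMINI_API_KEY!); // placeholder env var
const model = genAI.getGenerativeModel({ model: "gemini-1.5-pro" }); // assumed model id

// The ~1M-token window means the prompt can carry an entire schema,
// codebase, or transcript alongside the actual question.
const result = await model.generateContent(
  "Here is my entire SQL schema...\n\nGenerate Swagger docs for a REST API over it."
);
console.log(result.response.text());
```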
Generated Swagger docs from SQL schema for restaurant API
Guest 1
Definitely. And, just to put this in context, like, a 1,000,000 tokens is 10 times more tokens than GPT 3.5 Turbo can handle or 5 times more than, like, Claude 3.0. So, like, we're dealing with, inputs and outputs that are, much much bigger than than than those models specifically. But one of the the first things I tried with this was I I wanted to generate some OpenAPI documentation or or Swagger docs, if you've ever heard of those, given a database schema.
Guest 1
So I had the SQL data definition language for a complex database, has, like, roughly 12 or so tables, and I wanted it to generate an API based on that data. And, it did really well. And so one of the things that kind of, like, floored me initially is just how big the responses can be.
Guest 1
Now I think there are sometimes, there are time out issues. Like, sometimes, it'll be responding, and then it'll just stop just like you you might get with ChatGPT where it's, like, it's giving you a response, and then all of a sudden it just stops. And then you have to say continue or tell it to to keep outputting. In my experience with Gemini 1.5 Pro, it was able to output for, like, almost 2 minutes straight of just, like, giving me all the all the stuff.
Guest 1
And they may or may not change that because that could technically be, like, an API rate limit thing where it, like, maybe at a certain point, they just completely stop it from generating. But I do wanna show you, what was generated here. So I have, it needs to describe all of the schemas, so all of the possible data and things that will be involved in inputs and outputs.
Guest 1
And so given the SQL code that I gave it, it output a schema for category and city and comment. And, specifically, this was around a restaurant API. So if you've watched the video that I did on the Syntax YouTube channel about, Drizzle, I showed a complex database there, but I wanted to see, could could Gemini output, some API documentation automatically based on that database schema. And so, it's pretty slick. If if you're watching the video pod, basically, within Swagger Docs, we can see for every type of thing, it has a specific schema. This is, like, in line with the schemas that are actually in the database. And then we can see all of the different route groupings. So we see endpoints for restaurants, for menu items, for offers.
Guest 1
And then what's cool about this is each one of these also has, like, validation for for input. So if we take a look at the post request for restaurants, restaurant ID, and then menu items, this has a specific schema that describes what a menu item would have. And then whenever we're making a post request to this endpoint, the Swagger UI is gonna basically give us that schema, give us some examples, and and we can kind of, like, play around with the API as well. So the API doc definition itself is roughly 1400 lines of code, and Gemini 1.5 Pro was able to output those 1400 lines in a single response. I did kind of do some back and forth because, initially, it gave me, like, a very flat schema that was just, like, one route grouping for every table. But then I was like, well, maybe we should group these together, like, menu items by restaurant. And so after it output the full one, it then after I asked it to reconfigure it, it outputted another full one, but with more context of, like, oh, okay. So let's actually, like, group these these endpoints together. I tried a similar thing in just ChatGPT 4, and it could never get past just, like, generating schemas and then, like, one endpoint. It would just stop. It would it it wouldn't Yep. It wouldn't generate anymore. And, so, yeah, this this was pretty cool for me because I'm just imagining working in much larger code bases. Right? So, like, this is giving me if I were building this app for a customer or something like that, this is a huge jumping off point. Right? Like, doing the the mundane work of, like, thinking of what are all the route endpoints and what are the schemas, it's done that. And then now I can go in and do the the more complex work of, like, implementing these endpoints or even further, asking the AI to implement some of these endpoints for me as well. Wow. Yeah.
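A sketch of the kind of prompt CJ describes: paste the full SQL schema into the context and ask for the OpenAPI spec in one shot. The file name, prompt wording, model id, and env var here are illustrative assumptions, not the exact prompt from the episode.

```ts
// Sketch: feed an entire SQL DDL file to a long-context model and ask for
// OpenAPI (Swagger) docs, assuming the @google/generative-ai SDK.
import { readFileSync } from "node:fs";
import { GoogleGenerativeAI } from "@google/generative-ai";

const ddl = readFileSync("schema.sql", "utf8"); // full data definition language (assumed path)

const prompt = [
  "Here is the SQL schema for a restaurant ordering database:",
  ddl,
  "Generate an OpenAPI 3 (Swagger) YAML document for a REST API over this schema.",
  "Group endpoints by resource, e.g. menu items nested under their restaurant,",
  "and include request/response schemas and input validation for every route.",
].join("\n\n");

// With a ~1M-token window, the whole schema plus a ~1400-line spec fits in a
// single request/response, which is what makes this workflow practical.
const genAI = new GoogleGenerativeAI(process.env.GEMINI_API_KEY!);
const model = genAI.getGenerativeModel({ model: "gemini-1.5-pro" });
const result = await model.generateContent(prompt);
console.log(result.response.text());
```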
Entire codebase in context allows focused questions about code
Scott Tolinski
I think these are the types of things that this is really well suited for. I could imagine giving it a whole code base and, like, having that in the context. Here is an entire library. The library is now in your context.
Scott Tolinski
Let me ask you a bunch of questions about this library. One thing I did recently was I've been working a lot in JS doc style typing because I've been doing I have a a big mono repo I'm working on, like a a SvelteKit starter.
Scott Tolinski
And the SvelteKit starter has, like, a really interesting little package system where I have local packages that you can pick and and throw into your project really easily as part of the the starter process of it. Right? If I'm building a blog, it's going to pick several packages and throw them into a local mono repo.
Scott Tolinski
But I didn't want those to be TypeScript. I didn't want there to have to be a build process there. So I've been getting into JS doc style typing. The only thing is is I'm not great at JS doc style typing. I've never really done it a whole lot.
Scott Tolinski
So I've been working with big files. The the repo, by the way, CJ, is called Drop In. Drop In? Okay. Yeah. Drop In is a skateboarding themed starter for SvelteKit, and it's, it's pretty neat. But, again, what I was able to do is take a local package, and I was able to run the whole file. And given that it has a lot of context and say, hey. Just JSDoc type this for me. And because I'm working in single individual big files, I didn't have to do it in a step by step breakdown.
Scott Tolinski
It output the types for me. The types were perfect, correct, looked great. And, not only did I get JS doc types out of it, I understood a little bit about that syntax and what's going on there. So it was a nice little learning experiment.
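For anyone who hasn't seen it, here's roughly what that JSDoc-style typing looks like: plain JavaScript with no build step, but editors (and tsc with checkJs) still get full type info. The shapes below are made up for illustration, not code from the actual Drop In repo.

```js
// Illustrative example of JSDoc-style typing in a plain .js file.

/**
 * @typedef {Object} BlogPost
 * @property {string} title
 * @property {string} slug
 * @property {Date}   publishedAt
 */

/**
 * Formats a post title for display.
 * @param {BlogPost} post - the post to format
 * @param {number} [maxLength=60] - optional cap on the title length
 * @returns {string}
 */
export function formatTitle(post, maxLength = 60) {
  return post.title.slice(0, maxLength);
}
```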
Guest 1
Definitely.
Guest 1
And, I mean, for me, that's huge. And because you mentioned this earlier, like, maybe you paste a library in the context. Or what I was thinking is you could even paste documentation for, like, newer libraries. I think one of the Yep. Issues that, like, ChatGPT has or any of these other AIs have is that they've been trained on older data. So if, for instance, you ask it for help generating a Drizzle schema, it's gonna hallucinate most of the time because it actually doesn't have the the docs or the latest info about it.
Guest 1
But if you can simply just paste in the documentation or the API examples as part of your prompt, because now you have, like, a huge context window, now you can ask it to do things that it wasn't necessarily even trained on, and you don't have to, like, generate the embeddings for it. You can literally just include all of the the things you wanted to know about in your prompt itself, which is slick. Yeah. It's super slick. And that to me is great because that's where these things should be, you know, should be existing for us right now in this space. I I think too oftentimes people look at this stuff as it's gonna take your job because it's gonna do everything for you. And, you know, I know this conversation has been had to death. But Yeah.
Scott Tolinski
The way I see this working really well is that you know what you want. You have all the context in the world. You can get help getting what you want faster.
Scott Tolinski
You can read through it. You can modify it. You can adjust it. You know what you're getting. But if you don't know the output that you're expecting, if it just outputs some code, you're like, yeah. Copy and paste. Throw it in there. Then it it it's typically not gonna go super well for you. So being able to know exactly, like, what you want out of these things, what the output should look like and should be in the types of code you want, being able to give it all the context in the world will only aid you in being able to better get that out.
Guest 1
Definitely. And I think that's probably the the for me, that's why I am getting benefit out of AI now. It's like for for these little, like, simple, like, solve this LeetCode or just, like Yeah. Build a to do app. Like, it's it's not useful for me, but I know what I wanna do. I just want it to give me a kick start. Right? So, like, to do a lot more that would, for me, just be, like, very mundane work that it could figure out very quickly.
Knowing expected output allows models to aid development
Scott Tolinski
Give me this regex, please.
Guest 1
Exactly.
Guest 1
Yeah. That's the big one. And, so the next example I have that I used was generating seed data for a complex database. So Yep. So if you saw over on the Syntax Channel, I did a a video about Drizzle and basically implementing a complex database schema.
Guest 1
And I I'll be honest. I wrote the schema myself. I didn't use AI for it. So I I actually, like, typed out all the code. But what I did use AI for was generating the seed data. So if if you take a look at this repo, it's, github.com/w3cj/bitedash.
Guest 1
And then if you go into the seeds directory under DB, there's a data folder, and then I have a bunch of JSON files. Each one of these JSON files was generated using the Gemini 1.5 Pro. So I, basically, I I pasted in my SQL schema.
Guest 1
I then told it that I was looking to seed data here, and I told it to output as JSON so that I could then write my own code to import these JSON files. But for instance, if we we look at like, the simplest one is categories dot JSON. It's very simple. It's just an array with 6 categories. We have appetizers, lunch, dinner, salads, sides, and desserts.
Guest 1
Easy enough. But it output that. I didn't have to think too hard about what are the different kinds of categories of food. But what was really cool is I told it to generate restaurants as well. And so, and then from there, generate restaurant items. So if you look at this restaurants dot JSON file, this is, over 1600 lines of code in here. And, basically, for each restaurant, it has a street address, a ZIP code, a city name. And I told it to come up with fake restaurant names. I didn't wanna use anything in the real world. So that's that's one thing that AI is good at is kind of just coming up with interesting things that you don't have to ideate yourself.
Guest 1
And then from there, it created menu items for each one of these restaurants that were, like, themed to that restaurant. And so in if we were to do this ourselves, there are there are tools like Faker, and there are other tools that can, like, generate this kind of data. But what's nice about this is it's like it looks like real data. Right? These look like legitimate menu items and prices and ingredient lists and stuff like that.
Guest 1
And, it was able to generate all of that for all of my restaurants all in a single go. I I think, actually, for this one, I did kinda do it, like, one restaurant at a time because trying to generate all of that at once, it potentially could've caused some some issues. I definitely had some back and forth, but I first had it generate a list of restaurants, and then I was like, alright. For this restaurant, generate 20 menu items that fall into these categories. And so you can take a look at this repo of all these JSON files. Each one of these was just generated using AI. And I think just having this much data generated automatically to be able to, like, test out your app without needing to, like, run everything through through Postman initially, it was it was pretty sweet.
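A sketch of how those generated JSON files can be loaded as seed data with Drizzle, along the lines of the seeds directory CJ describes. The db instance, table objects, and file paths here are assumptions for illustration rather than the exact code in the bitedash repo.

```ts
// Sketch: batch-insert AI-generated JSON seed files with Drizzle.
import { readFile } from "node:fs/promises";
import { db } from "./db";                        // your Drizzle database instance (assumed)
import { categories, restaurants } from "./schema"; // assumed table definitions

const categoryData = JSON.parse(
  await readFile("./seeds/data/categories.json", "utf8")
);
const restaurantData = JSON.parse(
  await readFile("./seeds/data/restaurants.json", "utf8")
);

// Drizzle's insert(...).values(...) accepts an array, so each generated
// JSON file can go in as a single batch insert.
await db.insert(categories).values(categoryData);
await db.insert(restaurants).values(restaurantData);
```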
Generated complex fake data for testing without much effort
Scott Tolinski
Yeah. Yeah. It it that's such a great use case because, like, even advanced usage of Faker, you're still having to code a lot to get good fake data out of it. You're having to have an understanding of of Faker, but also, like, the specific methods you need to get the specific types of data. And this too, you could give it information to say, I want a variety of lengths of descriptions. Right? Because sometimes when you're working with fake data, it gives you kind of uniform length things or maybe, perhaps, like you said, you want text that is not lorem ipsum, but it is filler text, but it can be about a fictional restaurant. AI is great at making that kind of stuff up. And then that way, when you're coding out your design, you're never gonna be in a situation where, you know, real data comes in and all of a sudden the design can't handle it because you've been able to throw in really good fake data.
Scott Tolinski
Another thing you mentioned is being able to summarize a massive amount of video. This one to me was like, this is the killer use case for me when you told me about this. Definitely. Because, I mean, we've talked about this a lot. Like, we have
Guest 1
episode well, summaries that get generated by AI. But I guess I don't know the exact details of it, but I know when Wes was talking about it a while back, he basically had to create summaries of summaries because the context window is only so big.
Guest 1
But for this, I literally pasted in an 8 hour transcript of React Conf day 1, and it spit out a a summary with chapter markers and summaries of, like, each talk that that were output. And this was crazy for me. So, so I have a YouTube channel called Coding Garden where I do live streams. And one of the things I've wanted to do for the longest time is to take a 4 hour or 5 hour stream and then easily summarize it, easily come up with, like, timestamps that I can post on YouTube. And I've never been able to find an easy way to do that with with AI.
Guest 1
And you can do this with Gemini 1.5. So let me pull up the example really quick. Jeez. If you're watching the video pod, you can see the example here where I literally so the the prompt I give it is, given the following autogenerated transcript of day 1 of React Conf 2024, and then I paste in the the transcript. And, you can see in the token count, this is 348,970 tokens Wow. For for my overall, like, prompt series here. So this is huge.
Summarized 8 hour conference transcript with timestamps
Guest 1
And, initially, I just I just tried this. I just said, generate a summary with timestamps given this transcript, and it hallucinated a bunch. Like so all I gave it was the transcript, and it would output, like, random timestamps. It would come up with people's names.
Guest 1
Like, it came up with, like, 3 different names for Dan Abramov. Yeah. Because I think, one of the issues here is you're dealing with auto captions from YouTube. So it doesn't necessarily have the right spellings or whatever else.
Guest 1
So to really make it work, I provided even more context. So, basically, we have the full transcript. And then right after that, I also plugged in let's see.
Guest 1
Yeah. So this is what I said. I said, the transcript might have names or technical topics wrong since it's auto generated.
Guest 1
The agenda, talks, and correct speaker names for the day were, and then I gave it Perfect. Things. Yes. Yeah. So I I basically pulled this from the React Conf site, and I said, these are the speakers. These are the talks that they gave. And so now it has a little more context for how to to fit this, like, unstructured transcript data into a more, like, structured summary. And that's why context matters because Yeah.
Scott Tolinski
actually be usable because without that, it wasn't in fact, one of the things we do with the syntax one is that Wes and I both had specific sayings that were unique to us that we use FFmpeg to attach onto an episode.
Scott Tolinski
And so the AI knows that Scott says something about cheese on the moon or something, and then Wes says something about purple something else that don't sound like each other. So that way, the context is there, that this is Scott. This is Wes. The one that we always have the hardest time with even with context is getting Sentry to be spelled Sentry, not like century. Like, this is not, like, a 100 years. You know? Yeah.
Guest 1
Definitely.
Guest 1
And I I again, I think I think it's it's, basically, we we've taken an output that had a lot of hallucinations and that was like, okay. I'll try my best, and we've kind of, like, started to align it. And with bigger context windows, this is this is something you can you can start to do definitely. And another piece of context I gave it were the timestamps of the start of each talk. I also tried it without giving it this, and it actually did a pretty good job of figuring out when a talk started.
Guest 1
But by giving it this context, we're basically saying the intro starts at 16 minutes, 13 seconds.
Guest 1
The talk on what's new in React starts at 2 hours and 28 minutes. That gave it even more of a box to fit into. So that way, in the actual output, each of the sections, it knows to start at that specific time stamp and then kind of, like, summarizes each thing that happened, within that section. And so if you're watching the video pod, I'm not gonna, like, read the transcript, but I I will link to the the generated summary that it gave me. But it's it's pretty insane because it has the starting timestamp of each talk. It then has bullet points of everything that happened in that talk and then a timestamp that links to that specific section in the talk. And so if you're someone that's trying to very easily review, like, 8 hours of a of a video, this is huge because now you have some some starting points. Right? You have some bullet points you can go by. You can dive into some of these time stamps. And, yeah, it just makes looking at a large amount of information to kind of, like, distill it, but in a way that isn't hallucinating, as far as I could tell so far by verifying it, and was generating, like, good data. Yeah. I think that it
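A sketch of the layered prompt CJ is describing for the React Conf transcript: the raw auto-generated captions plus corrective context with real speaker names, talk titles, and known start times. The file name, agenda placeholder, model id, and env var are illustrative; only the two timestamps mentioned in the episode are from the actual example.

```ts
// Sketch: summarize a huge auto-generated transcript with corrective context,
// assuming the @google/generative-ai SDK.
import { readFileSync } from "node:fs";
import { GoogleGenerativeAI } from "@google/generative-ai";

const transcript = readFileSync("react-conf-2024-day-1.txt", "utf8"); // assumed path

const prompt = [
  "Given the following auto-generated transcript of day 1 of React Conf 2024:",
  transcript,
  "The transcript might have names or technical topics wrong since it's auto-generated.",
  "The talks and correct speaker names for the day are: (agenda pasted from the conference site)",
  "Known start times: intro at 00:16:13, 'What's New in React' at 02:28:00.",
  "Generate a summary with a timestamped chapter marker and bullet points for each talk.",
].join("\n\n");

const genAI = new GoogleGenerativeAI(process.env.GEMINI_API_KEY!);
const model = genAI.getGenerativeModel({ model: "gemini-1.5-pro" });
const result = await model.generateContent(prompt);
console.log(result.response.text()); // chaptered summary with timestamps
```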
Scott Tolinski
again, it all comes back to you gotta know what you're expecting to see. Because if you wouldn't have, like, validated any of this stuff and it just outputted from the initial get go, you might have been like, hey. It looks like it looks like a summary of a conference. Right? But without going into it and really knowing what you're expecting to see out of it, you you might have, you know, missed that entirely.
Scott Tolinski
So, again, all about context, context, context. That's the the keyword here.
Scott Tolinski
Cool. So some stuff that we haven't necessarily tried ourselves, again, giving it more context of a code base. I I would really love to give this a try of giving it an entire library and then asking for code using that library or even more of my code base. Here's more bits of my code base as the context. Here's some examples of code that I would write. Another one is like a a personal AI. Right? An AI that works for you or an AI that could be you in general. And in fact, AIs have a general idea of who I am, which is very bizarre to me. But I asked for it to comment my code in the style of Scott Tolinski one time, and it said, yo yo yo. This is a variable here. And I'm just like, oh, it's like, yo, dog. Here's the constant. And I'm like, oh, come on. Why why would you think I would say that even though there's a chance I would? But, dang, you ain't gotta do me like that. I've done similar things. I was like
Guest 1
earlier on, I was asking it to generate, like, YouTube transcripts in the style of CJ. It thinks I'm I mean, I'm a pretty nice guy, but it thinks I'm just, like, way too nice and bubbly. I don't know.
Guest 1
But the some of the some of the ideas I had for, like, personal AI are basically asking it about things about your own life. Right? So, like, what if you could preface a prompt with your entire calendar and all of your agenda like, your meetings that are coming up or your previous meetings? What if you could preface a prompt with, like, every note you've taken on a specific topic? And so now it has context to answer in relation to your own notes and your own information about your own life. Yeah. I keep a lot of detailed notes in Obsidian, and that's all just straight markdown. It'd be really interesting to pass it in some of my Obsidian notes. Definitely. And I think, especially, like, if you're doing research on a topic and, like, you've taken your own notes and, like, pulled in from various resources, then you can start to distill down and ask it questions, based even on, like, the notes that you've taken. Word.
Guest 1
So with all of this, we haven't talked about cost. And so right now, as of the recording of this podcast, Gemini 1.5 Pro is technically free to use in the dev console. So if you go to AI studio dot google.com, you can use Gemini 1.5 Flash and Gemini 1.5 Pro. We haven't talked about the differences, but I think Flash, eventually, when they start charging, will incur less cost. It's a little bit faster. But in the dev console, you can just have a conversation with it. It's not gonna charge you anything. But when they do start charging, it does look like it's gonna be somewhat expensive. So we'll have to be careful with how many 400,000 token prompts that we give it. And it also I guess, I think it depends on, how often you're querying it as well. But if we look at the the pricing page, it is saying that for Gemini 1.5 Pro, if you're using it via the API, it will be a dollar 5 cents per 1,000,000 tokens, and that's for prompts up to 128,000 tokens.
Guest 1
So this summarizing an 8 hour transcript probably would have cost me $2 if I would've used the API. Like, I I guess we we don't know for sure because a lot of these models, they keep it free when you're at least using the API console because I think you're kind of, like, trying it out, making sure that it'll work before you actually make API calls with it. But, yeah, that is something to note that this probably woulda cost, like, $2 or so if they were charging for the API. Get your free queries in while you can, folks. Yeah. That's, that's the message.
Guest 1
Yeah. Yeah. But and I am excited to see if and when ChatGPT comes out with, bigger context windows and also Claude.
Guest 1
So that way, we can we can do some of these things with our already, like, subscription that we're paying for and not necessarily have to pay, like, a dollar per 1,000,000 tokens or whatever else. Word. Cool. Well, this has been really super neat, and and thank you for,
Scott Tolinski
turning me on to Gemini here in the developer console. And, man, I have found it to be very useful. So, hopefully, you found this useful as well. So as always, we will catch you on Wednesday. Peace.