Very interesting and insightful. Thank you very much, Sam.
This is such a cool example. I was looking for this for a long time. Cheers for that!
Thanks a lot. Great content!
You can save this text metadata back into your image files' EXIF, so it always travels hand-in-hand with the image without extra files lying around.
Great insight! Can you please provide more details for those of us getting started? Many thanks in advance!
A quick search shows that "EXIF metadata is restricted in size to 64 kB in JPEG images, because according to the specification, this information must be contained within a single JPEG APP1 segment." The relevant metadata tag is ImageDescription.
@@WhySoBroke

import piexif
import piexif.helper  # must be imported explicitly

def add_description_to_exif(image_file, description):
    # Load the existing EXIF data
    exif_dict = piexif.load(image_file)
    # Add or update the EXIF tag with your description,
    # for example using the UserComment tag
    exif_dict['Exif'][piexif.ExifIFD.UserComment] = piexif.helper.UserComment.dump(description)
    # Write the modified EXIF data back to the image
    exif_bytes = piexif.dump(exif_dict)
    piexif.insert(exif_bytes, image_file)

# Usage example
description = "Generated description of the image."
image_file = "path_to_your_image.jpg"
add_description_to_exif(image_file, description)
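Given the 64 kB APP1 limit mentioned above, a quick size check before inserting can avoid write errors. A minimal sketch (the constant reflects the single-APP1-segment cap from the spec; the check itself is library-agnostic):

```python
# JPEG EXIF data must fit in a single APP1 segment (64 kB per the spec)
APP1_LIMIT = 64 * 1024

def fits_in_app1(exif_bytes):
    """Return True if a dumped EXIF blob is within the APP1 size cap."""
    return len(exif_bytes) <= APP1_LIMIT
```

You would call this on the output of piexif.dump(exif_dict) before piexif.insert, and trim the description (or the thumbnail) if it comes back False.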
I am really struggling with getting the data into places where the tools I use will actually read or display it. Plus it appears that the Windows thumbnail uses up most of the available EXIF space, so I will need to drop that piece. On top of all that, the libraries like decompressing the images, which I really don't like.
That is exactly what I was looking for, thanks a lot!
This is right into the awesomeness space! Thanks for sharing this project! (Yesterday I was working on a similar solution using ComfyUI + Python exporting, but this is way cleaner.)
I really like the look of ComfyUI. I need to make some time to play with it.
Great video, thanks very much!
This is great! Combined with the idea of putting the result into the EXIF metadata, this would be awesome 😎
Thanks for sharing, this is very useful and it's a good resource that I keep coming back to.
These are the four questions I ask llava, and then I put the results manually into the comment section of the EXIF metadata:
describe this image in great detail
write the 10 most relevant questions for this image
answer the 10 above questions in the correct order
write the 20 most relevant tags for instagram
I will try to automate this workflow to keyword my photo collection, thanks for this tutorial!
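A starting sketch for that automation, using the ollama Python client (the model name and the offline placeholder fallback are my own assumptions; the four prompts are the ones listed above):

```python
PROMPTS = [
    "describe this image in great detail",
    "write the 10 most relevant questions for this image",
    "answer the 10 above questions in the correct order",
    "write the 20 most relevant tags for instagram",
]

def caption_image(image_path, client=None, model="llava"):
    """Run each prompt against one image and join the answers.

    If no ollama Client is passed, emit placeholders so the plumbing
    can be exercised without a running server.
    """
    answers = []
    for prompt in PROMPTS:
        if client is None:
            # no server available: keep a placeholder per prompt
            answers.append(f"[{prompt}]")
            continue
        with open(image_path, "rb") as f:
            resp = client.generate(model=model, prompt=prompt,
                                   images=[f.read()])
        answers.append(resp["response"])
    return "\n\n".join(answers)
```

With a server running it would be something like `from ollama import Client; text = caption_image("photo.jpg", Client())`, and the joined text can then go into the EXIF comment as discussed above.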
I am trying to do this myself. I am struggling with the EXIF writing; I keep getting space-limitation errors. I think it's due to the Windows thumbnails.
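If it helps, one workaround sketch for the thumbnail problem using piexif (assuming `pip install piexif`; piexif.load() returns a dict whose '1st' and 'thumbnail' entries hold the embedded thumbnail):

```python
def strip_thumbnail(exif_dict):
    """Drop the embedded thumbnail IFD to reclaim APP1 space.

    piexif.load() returns a dict with '0th', 'Exif', 'GPS', '1st' and
    'thumbnail' keys; the embedded thumbnail lives in the last two.
    """
    exif_dict["thumbnail"] = None
    exif_dict["1st"] = {}
    return exif_dict

def write_description(image_file, description):
    # piexif imported here so strip_thumbnail stays usable without it
    import piexif
    import piexif.helper
    exif_dict = strip_thumbnail(piexif.load(image_file))
    exif_dict["Exif"][piexif.ExifIFD.UserComment] = (
        piexif.helper.UserComment.dump(description))
    piexif.insert(piexif.dump(exif_dict), image_file)
```

Dropping the thumbnail before dumping should free most of the 64 kB segment for the description text.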
Excellent!!! Was just playing around with moondream. Perfect timing ;)
I was hoping they would put Moondream in here as well. I also played with that and was impressed by what it could do for its size.
Awesome video. Appreciate you demystifying the process and tying in queuing, dataframe, and RAG concepts; some powerful stuff. It will be interesting to do an apples-to-apples comparison with GPT Vision and Gemini Vision functionality.
Great video, really what I was looking for: some useful real-world cases for how to use LLMs locally (instead of paying a company to do this for us; of course it's more secure and private too). What I would love to see is how to integrate this example to create a tweet for us about the image, store it in the CSV file, and then post the image with that tweet at intervals directly, maybe using Twitter's API? Not very tech savvy myself, but very interested in putting LLMs to some real-world use and automation. Thanks for making these videos.
Good job, and thank you again for sharing your knowledge and showing us how to do useful stuff. I'd also be interested in seeing how you create a professional web user interface for this and other projects going forward. What are some good ways of doing this which are easy to make look good and modern, and which run on all major browsers?
Good to know this. I look forward to an example of how to create a professional front end using NextJS if you'd like to recommend a tutorial or create one here @@samwitteveenai.
I can't wait until these multimodal local models can read charts and graphs reliably.
just what the doctor ordered.
Thank you for sharing, Sam! I tried the same thing here, but nothing happens; the process seems to be stuck and shows only:
Processing ./images\1.png
Any idea why?
Any tips on getting a more consistent response with only the necessary text I want extracted from an image? I’ve played around with the prompt quite a bit and even provided an output example.
I have a loop where I generate a response, then have another prompt ask whether the response is correct for the image. If not, it tries again. I like the big llava for the first writing and a smaller llava or moondream for the checking. It can take a couple of minutes for the multiple attempts, but that's ok.
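The write-then-check loop described here can be sketched model-agnostically; the writer/checker callables are where the big-llava and moondream calls would go (names and structure are my own, not from the video):

```python
def caption_with_check(image_bytes, writer, checker, max_attempts=3):
    """Draft a caption, then verify it with a second model.

    writer(image)  -> caption string (e.g. a big llava generate call)
    checker(image, caption) -> bool   (e.g. moondream asked yes/no)
    Returns the first accepted caption, or the last attempt.
    """
    caption = ""
    for _ in range(max_attempts):
        caption = writer(image_bytes)
        if checker(image_bytes, caption):
            break
    return caption
```

With ollama, writer might call client.generate(model='llava:34b-v1.6', ...) and checker could prompt a smaller model with something like "Does this caption match the image? Answer yes or no." and test the reply.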
Why pass the file to ollama as bytes and not as an image file? Is it faster that way? Also, do you know any hacks to get ollama to return precisely a specific number of words (or a range) every time?
Can you do a tutorial about AI agents for image and video?
Please do some more examples of identifying difficult screen shots.
Have you also thought about how boxing could improve this process?
@@christopherd.winnan8701 I haven't tried it with this model, but I tried it using the Moondream model with red bounding boxes and it was able to work out what was inside. I've been working on getting it to give me bounding-box coordinates for things.
@@samwitteveenai - Thank you for the great research you are doing. Always looking forward to more of your excellent vids!
Is there any way to indicate the base model? It is not on localhost in my case. Thanks!
Ollama normally supports the instruction-tuned models rather than the base models. You can do a custom install of any model, including base models, if they are converted to the right format. If you mean the model that gets loaded, yes, you can set that in the API.
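For reference, a custom install along those lines is just a Modelfile pointing at a converted GGUF file (the file and model names here are made up for illustration; the GGUF conversion has to be done first):

```
# Modelfile -- point Ollama at a locally converted base model
FROM ./my-base-model.gguf
```

Then `ollama create my-base-model -f Modelfile` registers it and `ollama run my-base-model` loads it.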
Ollama's llava and bakllava handle PNG. What are you gaining by converting to bytes?
For me, I was having issues getting it to work with PNGs; that is the reason I added it. I'll have a look again and see if maybe I just had something set wrong the first time.
llava:34b-v1.6 is running very slowly and not using the GPU, whereas llava:13b-v1.6 is working fine.
My system specs:
RAM: 32 GB
GPU: NVIDIA 3060 12 GB
Are you using GPU? Or all on CPU RAM?
He is using a Mac Mini, which has a unified memory architecture. So while the GPUs are used, they do not have their own dedicated memory.
@mshonle is totally right, no NVIDIA GPU is used, just the built-in Mac one.
Can you try showing a video of an application and asking Gemini to code an application with similar functionality and design? Something simple.
Ur welcome 😂
Microsoft stole your idea
Just an FYI for others needing to reference their local Ollama instance:

from ollama import Client

client = Client(host='192.168.0.25:11434')
response = client.generate(
    model='llava:34b-v1.6',
    prompt='describe this image and make sure to include anything notable about it (include text you see in the image):',
    images=[image_bytes],
)
Being a Windows user... I am still waiting...
Windows sucks. It really really sucks 😂
Get off that Microsoft telemetry machine while you still can. (All in jest I don't actually care which OS you use. )
Yes, please do a tutorial building even more functionality onto this example! 😀