Pinned · Published in The Constellar Digital & Technology Blog
Geek Out Time: Simple Local Testing of Llama 3 on Its Release, Gemma, and Mistral
Upon the release of Llama 3, I conducted tests on three models locally on my 8 GB RAM M1 MacBook: gemma:2b (I would have preferred to use…
Apr 21, 2024
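As a rough illustration of this kind of side-by-side local test, here is a minimal sketch that sends the same prompt to several locally served models, assuming they are exposed through Ollama's default REST endpoint; the model tags, prompt, and endpoint are assumptions for illustration, not details taken from the article.

```python
# Minimal sketch: send one prompt to several locally served models and compare
# the replies. Assumes the models are pulled and served via Ollama's default
# endpoint (http://localhost:11434); model tags and prompt are illustrative.
import requests

MODELS = ["gemma:2b", "llama3", "mistral"]
PROMPT = "Explain retrieval-augmented generation in one sentence."

for model in MODELS:
    resp = requests.post(
        "http://localhost:11434/api/generate",
        json={"model": model, "prompt": PROMPT, "stream": False},
        timeout=300,
    )
    resp.raise_for_status()
    print(f"--- {model} ---")
    print(resp.json()["response"].strip())
```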
Pinned · Published in The Constellar Digital & Technology Blog
Geek Out Time: Play with LangChain 3 — Simulate Full RAG Locally with word2vec and Gemma
With the Apple researchers’ unveiling of ReALM, following Gemma from Google, Llama from Meta, and a couple of others from Microsoft…
Apr 6, 2024
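The retrieval half of such a pipeline can be approximated with plain word2vec embeddings; the sketch below, with an assumed toy corpus and hyperparameters, trains gensim's Word2Vec, embeds documents by averaging word vectors, and retrieves the closest document for a query before handing it to a local model (generation step omitted).

```python
# Minimal sketch of word2vec-based retrieval for a local RAG pipeline.
# The corpus, vector size, and prompt template are illustrative assumptions.
import numpy as np
from gensim.models import Word2Vec

docs = [
    "gemma is a family of lightweight open models from google",
    "llama is a family of open models released by meta",
    "word2vec learns dense vector representations of words",
]
tokenized = [d.split() for d in docs]
w2v = Word2Vec(sentences=tokenized, vector_size=64, window=5, min_count=1, seed=42)

def embed(tokens):
    """Average the word vectors of the in-vocabulary tokens."""
    vecs = [w2v.wv[t] for t in tokens if t in w2v.wv]
    return np.mean(vecs, axis=0) if vecs else np.zeros(w2v.vector_size)

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9))

query = "what is word2vec"
q_vec = embed(query.split())
best = max(range(len(docs)), key=lambda i: cosine(q_vec, embed(tokenized[i])))

# The retrieved document becomes the context of the prompt sent to a
# locally running Gemma model (generation step omitted here).
prompt = f"Context: {docs[best]}\n\nQuestion: {query}\nAnswer:"
print(prompt)
```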
Pinned · Published in The Constellar Digital & Technology Blog
Geek Out Time: Build a Facade API for OpenAI API and Local LLM API
In 2024, OpenAI leads the Generative AI sector, favored for its pioneering, easy-to-use API and the advanced GPT-4. However, factors like…
Mar 29, 2024
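The facade idea can be sketched as a single endpoint that forwards to either OpenAI or a local LLM server; in the sketch below, the `/chat` route, request fields, environment variable name, and the assumption of an Ollama-style local server are all illustrative choices, not the article's implementation.

```python
# Minimal facade sketch: one /chat endpoint that routes to OpenAI or a local
# LLM server depending on a "provider" field. Paths, payload shapes, and the
# OPENAI_API_KEY variable are assumptions.
import os
import httpx
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class ChatRequest(BaseModel):
    provider: str   # "openai" or "local"
    model: str      # e.g. "gpt-4" or "gemma:2b"
    prompt: str

@app.post("/chat")
async def chat(req: ChatRequest):
    messages = [{"role": "user", "content": req.prompt}]
    async with httpx.AsyncClient(timeout=120) as client:
        if req.provider == "openai":
            r = await client.post(
                "https://api.openai.com/v1/chat/completions",
                headers={"Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}"},
                json={"model": req.model, "messages": messages},
            )
            r.raise_for_status()
            text = r.json()["choices"][0]["message"]["content"]
        else:  # assume an Ollama-style server on localhost
            r = await client.post(
                "http://localhost:11434/api/chat",
                json={"model": req.model, "messages": messages, "stream": False},
            )
            r.raise_for_status()
            text = r.json()["message"]["content"]
    return {"provider": req.provider, "model": req.model, "answer": text}
```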
Published in The Constellar Digital & Technology Blog
Geek Out Time: AI in the Browser - Run WebLLM for Powerful, Local LLM Experiences
WebLLM brings Large Language Models (LLMs) directly into your browser, leveraging WebGPU for on-device GPU computation. In this updated…
Dec 21, 2024
Published in The Constellar Digital & Technology Blog
Geek Out Time: Exploring Opensource AnythingLLM — The All-in-One, Easy AI Platform for Local RAG…
What is AnythingLLM?
Dec 8, 2024
Published in The Constellar Digital & Technology Blog
Exploring LoRA on Google Colab: the Challenges of Base Model Upgrades
How do we address the need to retrain models whenever the base model is upgraded? Retraining can be computationally…
Dec 5, 2024
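For context on why base-model upgrades are painful, here is a minimal sketch of attaching a LoRA adapter with the PEFT library; the base model name (`distilgpt2`) and the LoRA hyperparameters are illustrative assumptions. The saved adapter is tied to this base's architecture and weight shapes, which is what typically forces retraining after an upgrade.

```python
# Minimal LoRA sketch with PEFT: only the small adapter matrices are trained
# and saved, and they only load back onto a compatible base model.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model, TaskType

base = AutoModelForCausalLM.from_pretrained("distilgpt2")

lora_cfg = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    r=8,                        # rank of the low-rank update matrices
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["c_attn"],  # attention projection in GPT-2-style blocks
)

model = get_peft_model(base, lora_cfg)
model.print_trainable_parameters()  # only the adapter weights are trainable

# After fine-tuning, only the adapter is saved; loading it later requires a
# base model with the same architecture and weight shapes.
model.save_pretrained("lora-adapter")
```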
Published in The Constellar Digital & Technology Blog
Geek Out Time: Building an Interactive Career Coach with Google Colab and Synthesizing Agents Using…
In my previous…
Nov 9, 2024
Published in The Constellar Digital & Technology Blog
Geek Out Time: Simulating High-Bandwidth and Cache-Like Memory on Google Colab’s T4 GPU
This week, we’re delving into GPU memory hierarchies and exploring how different types of memory impact the performance of transformer…
Nov 6, 2024
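A rough way to feel the memory-hierarchy effect on a T4 is to time the same matrix multiply with operands resident in GPU memory versus streamed from host RAM each step; the sketch below does that with assumed matrix sizes and iteration counts, and is not the article's exact experiment.

```python
# Minimal sketch: compare a matmul whose data already lives in GPU memory with
# one that copies its input from (pinned) host RAM on every iteration.
import torch

assert torch.cuda.is_available()
device = torch.device("cuda")
x_gpu = torch.randn(4096, 4096, device=device)
w_gpu = torch.randn(4096, 4096, device=device)
x_cpu = x_gpu.cpu().pin_memory()  # pinned host memory for faster transfers

def timed(fn, iters=20):
    start = torch.cuda.Event(enable_timing=True)
    end = torch.cuda.Event(enable_timing=True)
    torch.cuda.synchronize()
    start.record()
    for _ in range(iters):
        fn()
    end.record()
    torch.cuda.synchronize()
    return start.elapsed_time(end) / iters  # milliseconds per iteration

resident = timed(lambda: x_gpu @ w_gpu)
streamed = timed(lambda: x_cpu.to(device, non_blocking=True) @ w_gpu)
print(f"GPU-resident: {resident:.2f} ms, host-to-GPU each step: {streamed:.2f} ms")
```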
Published in The Constellar Digital & Technology Blog
Geek Out Time: Exploration of Model Pruning for Efficient Deployment
This week, I’m exploring model pruning — a technique that removes unimportant weights from a neural network to make it more efficient…
Nov 2, 2024
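Magnitude pruning of this kind can be sketched with PyTorch's built-in pruning utilities; the toy model and the 30% pruning ratio below are illustrative assumptions, not figures from the article.

```python
# Minimal sketch of magnitude-based (L1) unstructured pruning in PyTorch.
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

model = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 10))

# Zero out the 30% of weights with the smallest absolute value in each Linear layer.
for module in model.modules():
    if isinstance(module, nn.Linear):
        prune.l1_unstructured(module, name="weight", amount=0.3)
        prune.remove(module, "weight")  # make the pruning permanent

linears = [m for m in model.modules() if isinstance(m, nn.Linear)]
zeros = sum((m.weight == 0).sum().item() for m in linears)
total = sum(m.weight.numel() for m in linears)
print(f"Overall weight sparsity: {zeros / total:.1%}")
```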
Published in The Constellar Digital & Technology Blog
Geek Out Time: Experimenting with FP32 vs. FP16 Quantization on Google Colab’s Free T4 GPU
Today, I’m diving into quantization — a technique to optimize deep learning models by reducing their precision. Specifically, I’m…
Oct 31, 2024
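The core comparison can be sketched by running the same matrix multiply in float32 and float16 on the GPU and looking at time and memory per operand; the matrix size and iteration count below are arbitrary choices for illustration, not the article's settings.

```python
# Minimal FP32-vs-FP16 sketch: benchmark one matmul in both precisions on CUDA.
import torch

assert torch.cuda.is_available()
device = torch.device("cuda")

def bench(dtype, n=4096, iters=50):
    a = torch.randn(n, n, device=device, dtype=dtype)
    b = torch.randn(n, n, device=device, dtype=dtype)
    start = torch.cuda.Event(enable_timing=True)
    end = torch.cuda.Event(enable_timing=True)
    torch.cuda.synchronize()
    start.record()
    for _ in range(iters):
        a @ b
    end.record()
    torch.cuda.synchronize()
    mib_per_matrix = a.element_size() * a.nelement() / 2**20
    return start.elapsed_time(end) / iters, mib_per_matrix

for dtype in (torch.float32, torch.float16):
    ms, mib = bench(dtype)
    print(f"{dtype}: {ms:.2f} ms per matmul, {mib:.0f} MiB per matrix")
```

On a T4, the half-precision case benefits both from halved memory traffic and from tensor-core acceleration, which is why the gap is usually larger than the 2x that storage size alone would suggest.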