Web Distiller AI: A Chrome Extension for Summarizing Web Pages
I created a Chrome extension that summarizes web page content using Gemini Nano in Chrome. It also supports translation into Japanese. `window.ai` is awesome https://t.co/2WYo38RLMz (GIF plays at 3x speed) pic.twitter.com/HhhdZS4kuw
— Naoki Ainoya (@naokiainoya) July 6, 2024
I’ve been working on chrome-extension-web-distiller-ai, a Chrome extension that summarizes web page content using Chrome’s built-in Gemini Nano model. Because the model runs locally, summarization happens entirely in the browser and no page content is sent over the network.
Key Features
- Summarization: Extracts and summarizes the main content of the currently viewed web page.
- Translation Options: Offers translation of summaries into English or Japanese.
- Markdown Output: Converts the extracted content into Markdown so it is easy to read.
- Clipboard Copy: Provides a convenient button to copy the summary to the clipboard.
Development Challenges
During the development of this extension, I encountered several challenges:
- Context Length Issues: When the content exceeded the model’s context length, generation failed with `NotReadableError: The execution yielded a bad response.` The context length limit of Chrome’s built-in Gemini Nano model is not publicly documented at the moment. I hope these specifications will be published in the future.
- Unpredictable Errors: Certain content types caused similar errors during generation, and the exact cause remains unclear. This unpredictability required additional handling in the extension.
- Quality of Output: Initially, performing both summarization and translation in a single prompt significantly degraded the quality of the output. To resolve this, I separated the processes into distinct prompts, which improved the overall performance and quality.
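The three workarounds above can be sketched together as follows. This is a minimal illustration under stated assumptions, not the extension's actual code: `PromptFn` is a hypothetical wrapper around whatever prompt call the experimental built-in AI API exposes, `MAX_CHARS` is a guessed budget (the real limit is unpublished), and the prompt wording is made up. Long text is split into chunks, each chunk is retried once with a shorter input if the model throws, and translation runs as its own prompt after summarization.

```typescript
// Hypothetical abstraction over the on-device model: prompt in, text out.
// The real extension would wrap Chrome's experimental built-in AI API here.
type PromptFn = (prompt: string) => Promise<string>;

// Assumed per-prompt character budget; the actual context limit is not published.
const MAX_CHARS = 4000;

// Split text into pieces of at most maxChars, preferring paragraph boundaries.
function splitIntoChunks(text: string, maxChars: number = MAX_CHARS): string[] {
  const chunks: string[] = [];
  let current = "";
  for (const paragraph of text.split(/\n{2,}/)) {
    const p = paragraph.trim();
    if (p.length === 0) continue;
    if (p.length > maxChars) {
      // A single oversized paragraph is hard-split at the budget.
      if (current) { chunks.push(current); current = ""; }
      for (let i = 0; i < p.length; i += maxChars) chunks.push(p.slice(i, i + maxChars));
      continue;
    }
    const candidate = current ? current + "\n\n" + p : p;
    if (candidate.length > maxChars) {
      chunks.push(current);
      current = p;
    } else {
      current = candidate;
    }
  }
  if (current) chunks.push(current);
  return chunks;
}

// Run one prompt. On any error, retry once with only the first half of the
// input (so the result may cover less text); if that fails too, return null.
async function promptWithFallback(
  run: PromptFn, instruction: string, input: string
): Promise<string | null> {
  try {
    return await run(instruction + "\n\n" + input);
  } catch {
    try {
      return await run(instruction + "\n\n" + input.slice(0, Math.ceil(input.length / 2)));
    } catch {
      return null; // Skip this piece rather than fail the whole page.
    }
  }
}

// Summarize chunk by chunk, then translate each partial summary in a
// separate prompt instead of asking for both in one request.
async function summarizePage(
  run: PromptFn, pageText: string, translateToJapanese: boolean = false
): Promise<string> {
  const parts: string[] = [];
  for (const chunk of splitIntoChunks(pageText)) {
    const summary = await promptWithFallback(run, "Summarize the following text:", chunk);
    if (summary === null) continue;
    if (!translateToJapanese) { parts.push(summary); continue; }
    const translated = await promptWithFallback(
      run, "Translate the following text into Japanese:", summary
    );
    // Fall back to the untranslated summary if translation fails.
    parts.push(translated !== null ? translated : summary);
  }
  if (parts.length === 0) throw new Error("No chunk could be summarized");
  return parts.join("\n\n");
}
```

Passing the model call in as `run` keeps the sketch independent of the exact shape of the experimental API, which may change between Chrome versions, and makes it easy to exercise the chunking and fallback logic with a fake model.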
Reflections
Developing Web Distiller AI has been an enlightening experience. Here are some of my thoughts:
- Expanded Utility: Being able to use an LLM with nothing more than a properly configured Chrome browser is a significant step forward. It widens the range of practical use cases for such models.
- Potential for Web Services: While I developed this as a Chrome extension, it would also be interesting to implement it as a resident assistant on a web service platform. This could provide even more accessibility and convenience for users.
- Future of Gemini Nano: As a multimodal LLM, Gemini Nano holds tremendous potential. I look forward to the day when Chrome can handle not just text processing but also image OCR and voice recognition using this technology.
You can check out the Web Distiller AI repository on GitHub to learn more about the project and try it out yourself.