Your own AI Editor: Local, private, no network needed and free!

While my personal views on large language models (ChatGPT, Claude, Bing/Copilot, Gemini etc.) are overall pretty critical, I find them useful in several circumstances.

But I find the online, share-all-your-data-on-the-web aspect far from ideal. These models are massive by default, requiring GPUs and huge amounts of RAM, so they are not suitable to run locally and depend critically on the cloud. There are, however, open-source versions of these models optimised to fit on 8 to 16GB RAM machines and run on more modest CPU/GPU systems (i.e. local desktops), and there are free GUIs that download, configure and run these models locally:

- GPT4All
- LM Studio

You can copy and paste text or write directly; there are no limits or costs. I’ve used GPT4All for a while and recently added LM Studio, which has a nicer GUI. Both apps offer a built-in server that runs locally, which enables the chat model to be called from any other app.
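For example, here is a minimal sketch of calling a locally served model from Python over the OpenAI-compatible chat endpoint (assuming LM Studio’s default address, http://localhost:1234/v1; GPT4All’s server uses a different port, and both are configurable):

```python
# Minimal sketch: query a locally served model via the OpenAI-compatible
# chat completions endpoint. http://localhost:1234/v1 is LM Studio's
# default address; adjust the URL/port for your setup.
import requests

response = requests.post(
    "http://localhost:1234/v1/chat/completions",
    json={
        "model": "local-model",  # placeholder; LM Studio answers with whichever model is loaded
        "messages": [
            {"role": "system", "content": "You are a concise technical assistant."},
            {"role": "user", "content": "Explain what a context window is in one sentence."},
        ],
        "temperature": 0.7,
    },
    timeout=120,
)
print(response.json()["choices"][0]["message"]["content"])
```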

And as a final cherry on top, if you use the excellent BetterTouchTool, there is an action that can take a text selection, run it through the API, and replace the text: folivora.ai - Great Tools for your Mac! & Support GPT4All action? - Feature Requests - BetterTouchTool Community. As an example, I have two keys bound: one runs an AI engineer to answer technical questions written as text in any app, and the other runs an AI editor to suggest edits for selected text (pasting both the original and the suggested changes for perusal). You can tweak the actions to suit your needs by changing the prompt. I attach the AI Editor preset for anyone who wants to give this a go.

Personally, I have found these models useful for general questions about things I may have forgotten, and they can make useful suggestions for ways to rewrite a paragraph (at least for me as a scientist, the hit rate [the rewording is better than mine] is around 50%, enough to keep them around). These models sometimes get things very wrong, so it is important to keep a skeptical eye on their outputs. But given that they run offline, with no dependency on the web, their accumulated knowledge is truly amazing.

aieditor.bttpreset.zip (6.1 KB)
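For the curious, the action boils down to something like the sketch below (not the preset itself; the endpoint, model name and prompt wording here are placeholder assumptions):

```python
# Sketch of the "AI editor" pattern the BTT action implements: send the
# selected text to the local server with an editing system prompt, then
# return the original alongside the suggestion so you can pick and choose.
import requests

EDIT_PROMPT = (
    "You are a scientific copy editor. Rewrite the user's text for clarity "
    "and concision without changing its meaning. Return only the rewrite."
)

def edit_selection(selected_text: str) -> str:
    r = requests.post(
        "http://localhost:1234/v1/chat/completions",  # assumed local server address
        json={
            "model": "local-model",  # placeholder model name
            "messages": [
                {"role": "system", "content": EDIT_PROMPT},
                {"role": "user", "content": selected_text},
            ],
        },
        timeout=120,
    )
    suggestion = r.json()["choices"][0]["message"]["content"]
    # Paste back both versions for side-by-side perusal.
    return f"ORIGINAL:\n{selected_text}\n\nSUGGESTED:\n{suggestion}"
```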


The latest version of LM Studio for Apple Silicon now includes Apple’s optimised MLX inferencing engine. This speeds up local LLMs by an order of magnitude or more. The models are compressed to 4-bit, but still incredibly useful.

I’ve used Mistral Nemo recently to good effect: a 12B-parameter model (~6GB when loaded in memory) with a large context window (128k tokens, so you can include lots of text for it to draw on when it generates its response).

Nice. I’d like to try this sometime. I have used Ollama before, but only for experimentation.

I think the LM Studio GUI really makes a difference: it is easy to search for and inspect new models, manage them, organise chats, and serve models to other tools via an OpenAI-compatible API server. I’ve also tried Ollama, and while I generally prefer terminal tools, in this case I removed Ollama after a few days and went back to LM Studio.

Yes, LM Studio looks very slick. Even its web page is so nice! Adding Apple Silicon support is critical. Ollama is very memory-hungry, so any optimization will help.

One reason I have been experimenting with a local Ollama server (rather than going through LM Studio) is that I wanted to use the LLM programmatically in my own Python code. LlamaBot (GitHub) is an amazing project that provides a Pythonic interface to many LLMs, including those served by Ollama. It makes it very easy to interact with LLMs in your Python programs and even in Jupyter notebooks. I am testing out using LLMs to analyze text files. Another project, LlamaIndex (GitHub), seems useful for using LLMs to explore Pandas DataFrames.
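As a flavour of what that looks like, here is a minimal sketch assuming LlamaBot’s SimpleBot interface and an Ollama server already running locally (the model-name prefix and the response object’s shape are assumptions; check LlamaBot’s README for the current syntax):

```python
# Sketch using LlamaBot's SimpleBot against a local Ollama model.
# The "ollama_chat/..." model-name prefix is an assumption; consult
# LlamaBot's docs for the current routing syntax.
from llamabot import SimpleBot

bot = SimpleBot(
    "You summarize text files in three bullet points.",  # system prompt
    model_name="ollama_chat/mistral",  # routes the call to a local Ollama server
)

with open("notes.txt") as f:
    response = bot(f.read())

# The bot returns a message object; .content is assumed to hold the text.
print(response.content)
```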

As for where this might fit in with Scrivener… I could imagine adding a feature to Squarto for generating an abstract for a paper, or summaries of each chapter of a book. I’m more interested in using LLMs for summarizing/describing than for generating draft content. I don’t have strong feelings about this right now, as LLMs remain only a curiosity to me. But it’s neat to see “middleware” tools like LlamaBot and LlamaIndex starting to appear.

What are you trying to do with LLMs?

LM Studio offers a fully compliant OpenAI API server, so as long as your tool supports OpenAI-style API requests (most do, considering ChatGPT is the 400 lb gorilla in the room), you are good to go.
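For example, with the official openai Python client you only need to override the base URL (a minimal sketch; port 1234 is LM Studio’s default, and the API key is an unused placeholder):

```python
# Point the standard OpenAI client at LM Studio's local server instead of
# the cloud. Port 1234 is LM Studio's default; the key is a placeholder
# since the local server does not check it.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:1234/v1", api_key="not-needed")

reply = client.chat.completions.create(
    model="local-model",  # LM Studio serves whichever model you have loaded
    messages=[{"role": "user", "content": "Suggest one title for a grant summary."}],
)
print(reply.choices[0].message.content)
```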

I mostly use LLMs for bouncing ideas around when grant writing; they give quirky but sometimes insightful replies (though I suspect the insight is really my own interpretation of their statistical word-chains). I’d rather talk to a colleague or student, of course, but the LLM is ready to work 24/7… I’ve tried RAG (with either GPT4All or AnythingLLM) but found it a disappointing experience (speed-reading the papers themselves gives me much more useful insight). I tried Google’s NotebookLM yesterday, dumping 40 PDFs from Bookends into it on a specific topic. It worked really well, and it is probably the first time an LLM tool has given me a significant benefit, but that needs more exploration…

I also find them really useful for writing when I have a hairy sentence with lots of technical points to make. Explaining biological results that are contingent on other results elsewhere in a manuscript, or on previous findings in a complex causal chain, is a bane of clear scientific writing. I do this after the draft is complete, and my BTT AI editor preset keeps my original text alongside the alternative so I can pick and choose. The success rate has really improved over the last year in this regard. It does depend on technically sound source text, or they can get the causal chain mixed up and the meaning is lost (i.e. don’t trust them; supervise their output).

As far as analysis of results goes, I haven’t yet got a good sense of what added value they provide. More generally, machine learning is critical in my field, but LLMs are not yet anywhere near being a useful tool for understanding data, except in specific niches.
