GPU Hosting and Open Source AI Will Revolutionize or Kill WordPress
On the eve of WordCamp US 2024 we find ourselves in the midst of a revolution. It is perhaps the most profoundly transformative technology revolution our species has experienced in our short history in this Universe.
In fundamental terms, since computers have existed we have been programming them by writing individual functions by hand. We recently discovered how to train functions to solve problems so complex that all the programmers in all the world, working the entirety of their natural lives, would not be able to solve these problems using traditional programming.
But it goes beyond that. We’ve figured out how to train an artificial brain to have a conversation with us that includes reasoning capability and problem solving. The hilarious part is that it was as simple as creating an AI model that predicts the next part of a conversation, which is a solution so simple it has left swathes of academia bitterly disappointed in both computer science and our own species. Surely our power of reason is more complex, more nuanced, more… special?
We’ve even figured out how to train an artificial brain to be a world-class programmer, which will ultimately help us to create far more effective artificial brains. As the invention of computers created technology that accelerated innovation in hardware and software exponentially, so innovation in the field of AI will further accelerate the field.
WordPress is what powers the majority of websites that publish content on the World Wide Web – content that until now has mostly been created by humans. Large Language Models are spectacularly good at ingesting content, doing things with it and producing new content by combining, summarizing, reformatting and even reimagining the subject matter with a user-defined fresh take.
In this post I’ll discuss the powerful enabler that is agentic AI, how it will transform user experiences, how we are putting this technology to work at Wordfence, the chasm in the market around GPU enabled hosting for WordPress, why it is an important enabler for society, and the incredibly exciting business opportunity that GPU enabled WordPress hosting represents.
Note: If you’re new to AI, for clarity in this post I’ll mostly refer to AIs as “models”, which is what an AI is in its portable and deployable form.
The Future is Agentic
For the past year a concept known as “agentic AI”, “function calling” or “tool calling” has been gaining momentum. That is, the ability for a model to call functions that a developer has defined and given the model access to. This takes a model, which is an isolated electronic brain receiving inputs and producing outputs, and gives it arms, legs, hands and feet. Tool calling gives a model the ability to do stuff at will. Don’t worry, they don’t call functions directly. They express their intention to call a function with parameters, and the developer’s code decides if the function is actually called.
Tool calling capability in AI first showed up in the mainstream last year when OpenAI announced some limited capability. Since then we’ve seen it appear in many other models including Meta’s Llama. I use an application called vLLM to host our large language models in a highly scalable way. vLLM supports tool calling for Hermes and Mistral models, and I’ve added that capability for Llama 3.1, which we’re using internally – I’ll contribute a pull request at some point. The point is that tool calling – or “agentic” capability in models – is a rapidly growing field, and all the bits and pieces you need to do this out of the box haven’t been created yet but are rapidly emerging.
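To make the “intention, not execution” point concrete, here’s a minimal sketch of the gatekeeping pattern in Python. The tool names and the JSON shape are illustrative, not any particular model’s exact format – but the idea is the same in Llama 3.1 and OpenAI-style tool calling: the model emits a JSON description of the call it wants, and your code decides whether to run it.

```python
import json

# Hypothetical tool registry. The model never executes these directly;
# it only emits a JSON "intention" that our code validates and dispatches.
TOOLS = {
    "fetch_url": lambda url: f"<html for {url}>",              # stand-in implementation
    "get_posts": lambda count=5: [f"post-{i}" for i in range(count)],
}

def dispatch_tool_call(raw: str):
    """Parse a model-emitted tool call like
    {"name": "get_posts", "arguments": {"count": 2}}
    and run it only if it passes our checks."""
    call = json.loads(raw)
    name, args = call["name"], call.get("arguments", {})
    if name not in TOOLS:  # the safety gate: our code decides, not the model
        raise ValueError(f"model requested unknown tool: {name}")
    return TOOLS[name](**args)

result = dispatch_tool_call('{"name": "get_posts", "arguments": {"count": 2}}')
```

The allowlist check is where you’d also insert safety logic – rate limits, permission checks, or a second model evaluating whether the requested call looks malicious.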
UX Will be Transformed
The punchline is that you’re going to be able to expose the entire WordPress API to an LLM, make a few recommendations about how to use the API and what to avoid, and the future of publishing with WordPress will be you having a conversation with an AI that will go something like this (let’s call the AI Charlie):
- Hey Charlie, you up?
- Mark you know I’m always up. How can I help?
- Go read the latest aviation news and let me know if I’ve written about anything being discussed in the past 2 years.
- Done, and yes a plane nearly landed on a taxiway and you suggested ILS be made mandatory on visual approaches a year ago.
- OK go and see if there is any regulatory progress on that.
- There is. There’s a draft bill they’re trying to attach to the FAA reauthorization bill.
- Cool, write a post that covers the taxiway near-landing, mention my thoughts on ILS and bring in any relevant data in the draft bill and reference your sources. I also want ILS to be the focus keyword with a density of at least 20 and give me three catchy headline options.
- Done. Here it is.
- Looks good. Use the first headline, publish it, and then be fairly strict about how you moderate comments as they arrive. I only want comments that are adding new data to the story.
- Will do.
- And email the mailing list, but engaged contacts only and it’s time for us to clean house so delete all unengaged contacts after you back them up. We’ll do the final deletion after a few weeks once we’re sure we got it right, so go ahead and schedule that.
- Done.
I’ve taken a few liberties with the aviation facts in this post – the FAA bill has already been approved by the Senate – but you get the idea. There are quite a few functions that Charlie is calling in our example, including URL fetching, WordPress API functions to read old posts, publish new posts and moderate comments, and API functions from a mailing list provider like MailChimp, Aweber or Hubspot to send email and manage a mailing list.
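Those capabilities get exposed to the model as tool definitions – JSON-schema descriptions of each function, which is the format OpenAI- and Llama-style tool calling both expect. A sketch of what two of Charlie’s tools might look like (the function names and parameters here are hypothetical, not a real WordPress or mailing list API):

```python
# Hypothetical tool definitions in the JSON-schema style used for tool calling.
# The model reads these descriptions and emits calls matching the schemas.
wordpress_tools = [
    {
        "name": "publish_post",
        "description": "Create and publish a WordPress post.",
        "parameters": {
            "type": "object",
            "properties": {
                "title": {"type": "string"},
                "content": {"type": "string"},
                "status": {"type": "string", "enum": ["draft", "publish"]},
            },
            "required": ["title", "content"],
        },
    },
    {
        "name": "moderate_comment",
        "description": "Approve or reject a pending comment.",
        "parameters": {
            "type": "object",
            "properties": {
                "comment_id": {"type": "integer"},
                "action": {"type": "string", "enum": ["approve", "reject"]},
            },
            "required": ["comment_id", "action"],
        },
    },
]
```

The “few recommendations about how to use the API” would live in the description fields and the system prompt – that’s where you tell Charlie to be strict about which comments add new data to the story.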
Agentic Systems are Transforming Security Research
At Wordfence we have an internal agentic system called Murphy that assists with security research. Murphy uses Llama 3.1 405B quantized down to 8 bits, running on an 8 GPU cluster of Nvidia H100 GPUs. Murphy has a range of functions I’ve built including URL fetching and the ability to accept a ZIP file and scan the entire file for vulnerabilities in code. Llama 3.1 makes agentic calls – or tool calls if you prefer – to other AI systems to evaluate code for vulnerabilities. We’re exploring ways to make this capability available to vendors and security researchers so let me know at our booth at WCUS if you’re interested in this.
Our user interface for Murphy is Slack and Murphy exists as a slackbot. It turns out a platform that already facilitates conversation, including threading, is ideal for interacting with a system emulating a human.
The biggest limit on what we can do with AI models currently is the size of their context window. Models that are the industry leaders at generating and auditing code have a context window of 128,000 tokens, or roughly 80,000 to 100,000 words. Google’s Gemini Pro 1.5 supports a context window of up to 2 million tokens and is unique in that respect. Models tokenize words, and a word can be one or more tokens depending on the tokenizer used. That context window size limits the amount of code a model can ingest at once and hold in its electronic brain while evaluating the code for vulnerabilities or performing any kind of analysis.
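As a rough illustration of the constraint – the 1.3 tokens-per-word ratio below is a common rule of thumb for English text, not an exact figure for any particular tokenizer:

```python
def approx_token_count(text: str, tokens_per_word: float = 1.3) -> int:
    """Rough rule of thumb: one English word is ~1.3 tokens.
    The exact count depends on the model's tokenizer."""
    return round(len(text.split()) * tokens_per_word)

def fits_in_context(text: str, context_window: int = 128_000,
                    reserve_for_output: int = 4_000) -> bool:
    """Check whether a document leaves room in a 128K window
    for the model's reply."""
    return approx_token_count(text) + reserve_for_output <= context_window
```

A large plugin can easily blow past this budget, which is why we end up with the file-at-a-time workarounds described below.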
So right now we build workarounds like processing one file at a time or finding clever ways to combine files for processing. A larger context window would be transformative for not just security, but for all fields. A very large context window means you can feed “Save the Cat”, Robert McKee’s “Story” and Joseph Campbell’s “Hero with a Thousand Faces” into an LLM in a single query and ask it to write the best screenplay ever written. Well… it’ll probably just write Star Wars Episode IV.
If you’re having a conversation with a model and, let’s say, the model is providing companionship for you along with assisting you with various tasks, the context window is what limits the amount of conversation history – or relationship history – that you can have. Every time you prompt a model you have to provide the entire conversation thus far, or if you’re thinking long-term, the relationship thus far. Context window is a model’s memory about you, the world and itself. Larger context windows are critical if we are to have models that have current knowledge and up-to-date memory.
The future is large AI models with huge context windows calling functions and many of those functions will access other models which may in turn call their own functions to help fulfill not just individual tasks, but whatever your generalized goal is. That may be harvesting a healthy apple crop this fall and sharing your journey and learnings with others, or it may be positioning yourself as the preeminent source for breaking aviation news.
Where is GPU Enabled WordPress?
In the WordPress community we enable publishers. We democratize publishing – as the WordPress tagline goes. How is it possible that when I search for GPU enabled WordPress hosting, there is none available? How can this possibly be? The only explanation is that publishers who are beginning to be users of AI are using closed source APIs from OpenAI and other providers. As a proponent of open source, and as someone who has witnessed the positive impact of open source on the development of the World Wide Web over the past 30 years, I believe we need to seize this opportunity.
WordPress is open source. WordPress stands for the democratization of publishing. Why should we democratize publishing? Because if we only share approved thoughts and ideas, we won’t have any new ideas and it will create a concentration of power among the few who have the ability to decide which ideas are allowed to be shared. For example, if you can’t criticize people in power, they will probably stay in power even if they are doing very bad things. And if you can’t criticize old ideas and freely introduce new ideas, you will have an ossified society that never evolves and is stuck with antiquated values.
Just like WordPress democratized publishing, open source AI will democratize computation. But if the most powerful and capable models are kept behind lock and key and we are provided limited, monitored and metered access to them, most of us will be at a severe disadvantage to the wealthy and powerful who have unfettered access to the newest most powerful models. The best writing, the most advanced physics, the newest engineering tools and techniques and the newest trading tools and business applications will all be created behind closed doors using powerful closed source AIs.
For a brief period in the early history of the Web the dominant web server was closed source and produced by Netscape. I built my first web application on Netscape’s Commerce Server around 1995. Here’s Netscape’s server page in 1996 and here is a snapshot of Apache’s home page around the same time, right as Apache started to become the dominant web server. If Web serving had remained closed source, it would have choked out innovation before it got started, and we would never have seen the technology revolution, investment, product innovation and new wealth creation that we saw in the late 1990s extending deep into this century.
The Great Concentration of Power Has Started
If you need to navigate insurmountable regulatory hurdles to develop or host AI, it ensures that only wealthy and powerful companies who have large legal teams and budgets have the resources to develop and deploy AI. The wealthy and the powerful in the AI space are doing their very best to raise the drawbridge of their castles and construct regulatory moats that no one else can cross without great effort and expense.
We are at a point of inflection. Or as the cool kids would say, we’re at a “moment”. Now is when we decide if our society will be able to train and host our own models on our own machines which we can share with others, that they can further fine-tune and share back to us – or if AI will become outlawed and our only access will be to older approved models via monitored and metered access points.
That is why it is critically important for the WordPress community to act now, act fast, and to support and invest in open source AI. In the WordPress space we should be downloading open source models, modifying them and contributing them back to the community and developing open source code around those models and contributing that code to the community. By creating a vibrant and productive open source AI movement within WordPress, we add our weight to the grass roots movement behind open source AI and ensure that this critical computational resource remains democratized and accessible to all.
Open Source AI Hosting is a Massive Opportunity
The good news is that open source AI presents a massive business opportunity for the most profitable segment of the WordPress market which is hosting. I put the global WordPress hosting market at somewhere between 10 and 30 billion dollars a year. Not a single host has launched GPU hosting targeted at the WordPress market or as part of a WordPress hosting package. The future of WordPress is AI with local open source models running on WordPress websites.
Companies like Lambda Labs, RunPod and others can’t buy GPUs from Nvidia fast enough to satisfy their customers. WordPress hosting companies are missing the biggest opportunity the hosting space has ever seen. Nvidia’s market cap today is $2.8 trillion. Two years ago they were just over $300 billion. OpenAI has seen the fastest growing user-base in history. We’ve never seen growth and innovation in technology at this pace before.
The Universe just delivered WordPress hosting a fresh business model that gives their customers unbelievable value and a huge competitive advantage as publishers. WordPress hosting providers already have customers in the millions, and those customers are already sold on AI. Are a handful of AI hosting startups going to take it all?
Is a new Python based CMS going to emerge alongside a fresh hosting brand offering self-hosted open source AI models? Or is the WordPress community going to rise to the challenge and contribute our open source army to the movement supporting open source AI?
See you at the Wordfence booth in Portland at WordCamp US 2024 this week.
Mark Maunder – CTO @ Defiant Inc.
Comments
12:09 pm
Given how massively unprofitable the closed source AI models are, I don’t think their dominance is anything to worry about. https://www.wheresyoured.at/subprimeai/
2:12 pm
Computation will get cheaper. It always has and always will. Algorithms will get more efficient - we're seeing this happen in leaps and bounds with e.g. quantization massively reducing the amount of VRAM a model requires. Hosting will become more efficient - we're seeing a shootout between open source platforms like llama.cpp, vLLM and others to produce the best price per token. Then there is the innovation on the hardware side. These trends will continue for years and exponentially drive down AI costs. So what you're seeing is first movers and fast followers spending heavily to grab market share and lock in their customers so that as things become more efficient and application developers start printing money via the features that AI enables, they own huge chunks of the back-end AI market and get a share of the application revenue being produced. OpenAI has raised $11.3 billion at this point and you can think of that as an early one-off expense to grab marketshare rather than an ongoing cost to support operations. The ongoing operational costs will change radically as the tech and space evolves.
1:50 pm
Hi, thanks for the reply. If what you say is true, why did TSMC execs allegedly dismiss Sam Altman as a ‘podcasting bro’ after he made absurd requests for 36 fabs for 7 trillion USD? That is simply not feasible. If that is what Sam believes AI needs, their LLMs are already on a path to failure. Ed Zitron does a great job diving into the economics of the LLM bubble, and the figures do not lie.
2:03 pm
lol.
12:43 pm
What makes you think this statement is true, "Don’t worry, they don’t call functions directly. They express their intention to call a function with parameters and the developer’s code decides if the function is actually called"? The only reason the AI does not call functions directly is because publicly visible AIs don't currently enable it to do so. There is nothing that prevents someone from removing that block.
2:04 pm
Hi Nathan. I'm describing the actual mechanism of tool calling that exists in Llama 3.1 and many other models. The model responds with a JSON call to a function, and the developer code parses that and executes the function. What this provides is an additional opportunity to insert safety mechanisms, e.g. to evaluate if the call to the function is malicious in any way. I'm not sure what you mean by publicly visible AIs - perhaps you mean closed source AIs? Anyway, OpenAI works the same way when it comes to function calling - it's your application that calls the function, not their AI, and it's not being called on their cloud servers; it's being called on your server or computer if you choose to do that. Sure, we can speculate that some AI somewhere can call functions directly if it's trained to do that and built that way, but that's not the standard design we're seeing.
12:51 pm
This is an interesting, well written article. What is tragic about the implications of the future you describe (one that is not inevitable by any means) is that the internet, once a quirky, earnest and democratic network of human content may become a dystopian garbage dump of AI generated content, read by AIs for other AIs. It will become completely unusable for humans, just a feeding ground for bots hungry for more data. It has been projected that by 2040, 80% of the internet will be AI generated. Have you read AI generated poems? Short stories? Film scripts? It is absolute drivel.
For any company looking to the future of the internet, I think it's important to consider how we can retain and privilege the authentic human connection online.
I would also like to note that Artificial Neural Networks are not brains by any means. If anything they are crude approximations, and it's sensationalist to anthropomorphise them with that kind of language.
1:57 pm
The Web is already a mess of spam. Using Google to find a recipe is a great illustration of the absolute garbage that humans have generated. I think the feedback loop between Google's search and their ad business and the content generators who share that revenue have turned the Web from what was once a place to share ideas to a pretentious garbage dump. What we need is a new form of content discovery and we need to take Google out of the loop.
1:03 pm
Great post! Insightful to say the least, people / developers need to read this.
1:53 pm
Thanks Walter.
1:08 pm
Hey Mark,
glad someone has pointed this out. I have been considering my own GPU cluster for a year now, with whitelisted API endpoints for my servers with WP sites, so that it could:
- Crunch data from my clients' blogs and WooCo stores and draw trends for their benefit,
- use your excellent APIs for security patching/virtual patching,
- automate certain tasks not unlike those you have mentioned in the (hopefully not for long) fictitious dialogue with "Charlie"...
- ... and so on.
I also marvel that no one offers this yet. Even some smaller models are very capable and can be quantized to run on lesser systems.
Kudos to your system, so happy to see someone has done this the "endgame" way!
If you can share more about how you utilize Murphy (you can get very technical if you like), I am all ears, but respect either answer.
All the best and thanks for your products!
Miro
1:53 pm
Hey Miro! Thanks for the comment.
We're very much thinking along the same lines. Small models can definitely work on a cost-effective server providing GPU from a hosting provider. I think computation is expensive right now for large models - we're paying $24 an hour for an 8 GPU system with H100s, which is a fairly competitive price, but it's not near affordable for most WP publishers. However, that system is capable of producing around 2000 tokens per second or more, and using vLLM can do that for multiple concurrent users via an OpenAI REST interface. And so I think the trick is aggregation - in other words, many WP sites sharing access to GPUs and models. This removes, for example, the ability for a single site to load a fine-tuned Llama 405B, but it means that many sites can use a generic Llama 3.1 405B and share the cost, using the context window to customize output - in other words, custom system prompts with custom data in the context window. But of course it's limited to 128K right now, so you're limited in terms of how much "training" you can provide for a response if you're not doing actual fine-tuning.
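To make that aggregation idea concrete, here's roughly what a per-site request to a shared endpoint would look like. vLLM serves an OpenAI-compatible /v1/chat/completions route; the endpoint URL and model name here are placeholders, and each site's customization lives entirely in its system prompt:

```python
import json

# Sketch: many WP sites share one vLLM endpoint and differentiate
# via per-site system prompts instead of per-site fine-tuned models.
SHARED_ENDPOINT = "http://gpu-cluster.example/v1/chat/completions"  # hypothetical

def build_site_request(site_prompt: str, user_message: str,
                       model: str = "meta-llama/Llama-3.1-405B-Instruct") -> str:
    """Build the JSON body a site would POST to the shared endpoint."""
    return json.dumps({
        "model": model,
        "messages": [
            {"role": "system", "content": site_prompt},  # per-site customization
            {"role": "user", "content": user_message},
        ],
    })

body = build_site_request(
    "You are the assistant for an aviation news blog.",
    "Summarize this week's FAA news.",
)
```

Every token of that system prompt and any custom data counts against the 128K window, which is the "training via context" tradeoff I mentioned above.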
Computation will get cheaper. Context windows will get much larger. And so the model I think hosting providers will eventually pursue is providing a range of open source models local to the site with low latency which have large context windows, and for higher end customers they can load their own fine tuned large models onto a server and access them via the same REST interface. Just thinking out loud.
2:11 pm
Mark, thank you for such a quick reply!
This keeps me up at night (in a good way).
The shared API indeed sounds like a way to go (or THE way to go).
I am also looking forward to cheaper and/or more specialized hardware (NPUs, or AI-centric GPUs adopted for enthusiast or pro user market).
Also there was a cool article about whom I think was 17 YO ethical hacker, who tuned ChatGPT via API for pentesting/offensive red-teaming.
This gives me both a hope and goosebumps, as it, may I hazard a guess, should to us folks maintaining web worlds for a living.
No doubt that as I am asking AI, both closed and open sourced, how to better secure my site outside of all the existing info out there, there is someone asking it how to break into even much better secured ones.
And I guess your weekly intelligence summary has grown tenfold thanks to researchers using AI, too. ;-)
So good to hear the good guys (you) keep the torch lit.
And this is excellent conversation. I truly understand how little time you have, but gonna ask anyway if there is similar place to discuss this in more detail (and yeah, thinly disguised ask for this to be conversation with you).
Anyway, thanks for all you guys are doing!
Miro
2:14 pm
Thanks Miro. I'll be in Portland tomorrow for WCUS so if you're there let's chat for sure. You can't miss our booth! Otherwise I'll be blogging far more often now and will be available in the comments. I'll also consider a virtual chat of some sort. (Brad, if you're reading, we can look into this)
2:49 pm
Thank you, Mark. Sadly, I am half a world away (not that I could not have flown to see you guys, right? ;)), in Central Europe. However, thank you for the discussion.
Looking forward to more of your writing (not trying to flatter you, but you are rare combo of dev/writer, AFAIK from lurking here for years).
I am really looking for smart people from cybersecurity niche to chat up cool concepts with, or possibly interview them (I write tech blogs and case studies, so I love interviewing!). I understand how time consuming this is for everyone involved, though.
Anyway, hope your WCUS goes well!
3:47 pm
Very interesting read and in particular, the last paragraph about Python makes me wonder if Python can be hosted in a GPU enabled hosting provider. If it can, how long it'll take to ask an AI to reproduce the Wordpress codebase using Python instead of PHP.
4:10 pm
Hi Michael,
Python's libraries, and in particular PyTorch and transformers, are rock solid and mature for building AI products. Python is pretty much the standard when it comes to coding anything AI. Python isn't perfect though - it has a global interpreter lock which makes true threading not really possible, although a fix appears to be forthcoming. Coding AI in PHP isn't really feasible, so what you end up doing is writing a Python REST interface, or providing some other kind of interface, where your PHP can call Python to do the AI heavy lifting.
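A bare-bones sketch of that bridge using only the Python standard library - run_model() here is a hypothetical stand-in for a real PyTorch/transformers inference call, and in production you'd put this behind a proper server and authentication:

```python
import json

def run_model(prompt: str) -> str:
    """Hypothetical stand-in: replace with a real transformers/PyTorch call."""
    return f"echo: {prompt}"

def ai_bridge_app(environ, start_response):
    """Minimal WSGI app: PHP POSTs {"prompt": "..."} and gets a completion back."""
    size = int(environ.get("CONTENT_LENGTH") or 0)
    payload = json.loads(environ["wsgi.input"].read(size) or b"{}")
    answer = run_model(payload.get("prompt", ""))
    body = json.dumps({"completion": answer}).encode()
    start_response("200 OK", [("Content-Type", "application/json")])
    return [body]
```

On the PHP side it's just a cURL POST with a JSON body to wherever this app is served.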
Mark.
1:11 am
> "a new form of content discovery and we need to take Google out of the loop."
thats true but its happening anyways. lots of content is happening outside of the web, eg linkedin, medium, instagram or other networks that have their own algos. wordpress could be its own "medium" now that i think of it....
6:48 am
Yes good point.
9:08 pm
Great article, Mark! I wonder how the democratization of AI will impact small developers building plugins or themes. Will this market disappear altogether? If you can ask AI to create content using open-source programs, a much larger context window could make adding new features, like plugins, much easier. Regarding PHP, I’ve been developing a plugin that interacts with OpenAI’s Assistant APIs. Unfortunately, I haven’t seen support for streaming runs using native PHP or cURL. I believe for everything to come together, the open-source tools available to the average developer need to become more extendable. Perhaps this is already happening, but at a slower pace compared to the rapid advancements in AI. Unless, of course, we need an entirely new set of tools.
1:03 am
IR 4.0 Is Here, for sure. Great article & you should definitely open up a 2 way com. as this is the most important dev. since w.w.w. BTW you wrote this before the Reset on Open AI right 0/1? We have compared previous prompts/replies and the new release is quite frankly astounding. I think its a soft launch ChatGPT5 model.
7:09 am
I had a bit of trouble parsing this comment - maybe it's my pre-caffeinated brain. So I asked Murphy which is the internal bot I've been teasing in this post to help me out. It's a fun way to show off one of his agentic functions. He has the ability to fetch raw HTML from the web, fetch HTML and convert it to text, and to scan entire ZIP files for vulnerabilities - which I'll share in a future post. He decides what to call based on input. Anyway I asked him to fetch this blog post with all its comments and give me an interpretation of what IR means.
https://imgur.com/DiXUTRU
David I don't know what you mean by "the reset on open AI" but if you're referring to o1 and the CoT capability yes I'm aware and use it. Their models still can't do the most basic agentic action which is to fetch a URL from the Web. o1 is also horrendously slow. I guess I'm not as impressed as you and right now I think they're in a fight for their lives with other vendors.
4:02 am
Good writings, I wonder if you used any sort of AI to produce this content?
BTW, Hosting a local model & maintaining is way expensive than calling an API isn't it? Also sharing your data with others is scary.
We may end up using something in between.
6:47 am
Hi Abdullah. I'm not opposed to using AI and I use copilot heavily in my own coding. If I recall correctly the only thing I used AI for in this post is to look up synonyms.
6:31 am
This is honestly one of the most insightful articles I've read on AI anywhere and I've read a lot. Really excited to see what Wordpress can do to stay relevant in this new world. Would love to read more follow-up articles to this Mark, keep up the great work.
6:46 am
Thanks Brent. Very much appreciated.
6:32 am
Well that was really a quality read, thought-provoking as well as action oriented/inducing write-up, and thankfully not an AI regurgitated content. Clap Clap Clap
6:46 am
Thanks.
8:10 am
AI will play a critical role in our Wordpress ecosystem, I agree with your assessment. However, my burning question is: How will you use it to improve Wordfence and provide even better security for existing customers. As a customer, I'm more interested in that vision than anything else.
3:50 pm
We've started by developing a model agnostic agentic AI that is evaluating plugin source for vulnerabilities. Will post updates as we have them. Thanks for your comment.
3:38 pm
Quite an interesting read, i believe this is where software development and engineering is heading as well, just like how we create functions/classes for every functionality for a software, I believe a new methodology or technology will fill in that gap with AI. So, when building an application, each function would be replaced by specialized AIs that are easy to create with just prompts i.e "Create a Login AI agent, it should connect to MongoDB, here are the credentials. it should also have access points for other AIs to start a connection with the database". So, no more code, just full AI-based applications that can easily do whatever is needed. WordPress would just be the tip of the iceberg
3:09 pm
While I do agree with you on the path, Emmanuel, I think that some coding proficiency will always be needed in the near future - just like with AI assisted writing.
I am both a copywriter and coder and in writing, AI is still not nearly as good as humans, even with very good long prompts.
But I digress - I wanted to ask, have you tried new GPT o1 models? I know it is all hyped now, but to me personally, it really shifted my notion as to how much can AI help me write code as of year 2024. Very powerful indeed.
So we might be some way on the path you have mentioned already, but I believe always there will be needed context, and when models get even better, knowledge of context and emotion in making the right decisions.
But I am getting needlessly philosophical and did not intend to sound like a smartass.
Curious as to what you yourself think and/or what your experience is. :)
2:30 pm
If you're asking me about o1 - yes, I've tried them and they're pretty good. Slow, and I occasionally get 500 errors from the REST endpoint, but they seem to be a bit better. I'm a bit unimpressed with CoT being the main improvement in o1, rather than an entirely new model. They've also removed access to the system prompt, so it's almost as if they're using the system prompt themselves to enable the CoT reasoning. So it's essentially a GPT with an innovative system prompt that enables CoT, which is a bit disappointing.
11:38 am
I was asking Emmanuel originally, but yes, thank you for your input!
I agree in some respects the decisions that went into o1 were funny. It almost seems like the better way to use this one is through webapp UI! At least IMO, even though that limits further options...
But I WAS impressed with the thoroughness it presented when eg. coding plugin together - "thinking" not just about the desired outcome, but also about logging, validating and sanitizing inputs, separating logic into files, constants, and so forth.
For semi-developer like myself (really, I am more writer/blogger than developer), it certainly levels the ground and enables me to be more efficient and get done stuff that I would have to pay developer not that long ago. I am all for paid dev work, BTW - but I think we can agree that if devs (and writers, too) now get only medium and pro-level assignments and the baseline can be almost automated, yeah, that is a win for everyone (unpopular opinion maybe).
Of course this all goes out the window the moment we introduce the codebase into larger context of a company with multiple systems, users, ... :-)
That is where perhaps local LLMs like your GPU cluster, all kitted out with RAG, custom training, small contextual LORAs etc. (yeah, aware it all sounds buzzwordy) come into play!
4:38 am
Scientific journals use reviewers to vet articles. The complete lack of editorial control in the current internet world has led to all manner of nonsense, particularly the growth of "influencers". These are people with no qualifications who comment on complex issues (medical, nutritional and political) and with no editorial verification. I can only see AI making this worse.