If you’ve followed my previous posts, you’ll see that I’ve spent some time attempting to build my blog on WordPress and then finally make it static. This has resulted in lots of custom code and even more failed attempts to get things to publish correctly. I’ve finally been successful in building out my site with a combination of some of my failed attempts:
- Adding Nginx in Front of WordPress
- Building a Kubernetes Container That Synchs with Private Git Repo
- Building a Static WordPress
The third article above covers my most mixed attempt to date at making the WordPress site static: it produced just as many failures as successes. The good news is that I learned a lot along the way, and that attempt brought me to what appears to be a functional static site.
Cheating With Simply Static
I’ve got to give credit where credit is due. While attempting to fix my code, I stumbled upon the WordPress plugin called Simply Static. I did some research on it and the plugin has quite a few good ratings. I decided to drop it onto the site and take it for a spin. After a quick install, I was able to generate a static page and export it. I also did some testing and everything seemed to work great. The resulting site generated by Simply Static was WAY smaller than what I was manually mirroring with wget. Everything just seemed cleaner so I was sold.
I paid the annual fee and upgraded to Simply Static Pro because I wanted the GitHub integration to help me build off my previous configuration of having nginx serve up web content from a private GitHub repo. I was glad that I made this decision!
Simply Static Simply Wasn’t the Silver Bullet
The good news is that Simply Static did indeed generate a great quality export. However, I noticed two problems after doing a few exports. The first is that Simply Static doesn’t appear to mirror the site and generate a single commit containing all of the files. Instead, it appears to commit each file that it generates separately. I guess this is both good and bad, but in my case I didn’t want my nginx to get overloaded attempting to process nearly 1000 commits! That was reason enough to create a separate repo for Simply Static to export its static content into.
I didn’t notice the second problem until I decided to use the content share buttons on one of my articles. I noticed that the links were pointing to the internal server. As an example, my share to Twitter link looked like this:
<a href="https://twitter.com/share?url=https://10.x.x.x:8080/2019/12/16/general/stay-tuned/&text=Stay%20Tuned%E2%80%A6
Next, I needed to find ways to address both of these issues.
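To make the second problem easier to spot, a quick check like the one below can scan exported HTML for links that still point at an internal address. This is only a sketch: the pattern assumes the internal server is a 10.x.x.x host on port 8080, and both the `find_internal_links` name and the sample address are made up for illustration.

```python
import re

# Assumed pattern for the internal WordPress address (10.x.x.x on port 8080).
INTERNAL_HOST = re.compile(r"10\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}:8080")


def find_internal_links(html):
    """Return every href value in the HTML that contains an internal host."""
    hrefs = re.findall(r'href="([^"]*)"', html)
    return [href for href in hrefs if INTERNAL_HOST.search(href)]


# Hypothetical sample: 10.0.0.5 is a made-up address, not a real server.
sample = (
    '<a href="https://twitter.com/share?url=http://10.0.0.5:8080/2019/12/16/general/stay-tuned/">Share</a>'
    '<a href="https://blog.shellnetsecurity.com/about/">About</a>'
)
print(find_internal_links(sample))
```

Only the share link should be reported here, since the second link already uses the public hostname.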
Fixing The Static Content
The nice thing is that Simply Static allows users to configure a webhook that will receive an empty POST request. An empty request is unhelpful for knowing which article was just posted, but I’ve already got an idea for this in the future. On the other hand, it is very helpful for knowing that the static site has been successfully exported. The first step was to leverage this webhook. In order to do this, I decided to build a custom Flask server running next to my WordPress containers.
There’s nothing too special about the Flask server, so here’s that code:
```python
import logging

from flask import Flask

from modules import MirrorSite

logging.basicConfig(level=logging.INFO)
app = Flask(__name__)


@app.route('/status')
def status():
    # Simple health check endpoint
    return 'Ok'


@app.route('/wordpress-static', methods=['POST', 'GET'])
def process_wordpress():
    # Simply Static calls this webhook once an export has finished
    logging.info("We received a notification from WordPress")
    try:
        MirrorSite.run()
        return 'Ok'
    except Exception as e:
        logging.error(e)
        return 'Error'


if __name__ == '__main__':
    # debug=True would expose the interactive debugger on every interface
    app.run(host="0.0.0.0", port=8000, debug=False)
```
This is just very simple code to bring a server online, listening on port 8000 for the routes /status and /wordpress-static. I added the /status route for future-proofing, in case I want to add health checks to the pod. In the meantime, the /wordpress-static route is configured as the webhook in Simply Static. The MirrorSite module is where the bulk of the code lives and is part of some of my random learning along the way.
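Since Simply Static only sends an empty POST, the webhook is easy to exercise by hand. The sketch below builds that request with Python's standard library. The localhost:8000 address assumes the Flask server is running locally, and `build_webhook_request` is a name made up for this example.

```python
import urllib.request


def build_webhook_request(base_url):
    """Build an empty POST aimed at the /wordpress-static route."""
    return urllib.request.Request(
        f"{base_url}/wordpress-static",
        data=b"",  # empty body, like the Simply Static webhook
        method="POST",
    )


# Assumed address: adjust to wherever the Flask server is listening.
request = build_webhook_request("http://localhost:8000")
print(request.get_method(), request.full_url)
```

Passing the request to `urllib.request.urlopen()` while the server is running should come back with Ok or Error, depending on how the mirror run went. Keep in mind that doing so kicks off a full mirror run.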
Mirroring the Site With Python
My MirrorSite Python module looks like the following:
```python
import glob
import logging
import os
import re
import shutil
import time

from git import Repo

git_ssh_cmd = '/opt/api-handler/clone_repo.sh'
os.environ['GIT_SSH'] = git_ssh_cmd
os.environ['GIT_SSH_COMMAND'] = git_ssh_cmd


def is_binary(file_name):
    try:
        with open(file_name, 'rt') as check_file:  # try to open the file in text mode
            check_file.read()
        return False
    except (UnicodeDecodeError, OSError):  # not readable as text (binary or a directory)
        return True


def check_for_current_clone(directory=None):
    if directory is None:
        raise Exception("Need a directory to check")
    if not os.path.exists(directory):
        logging.debug("Directory did not exist so we can proceed")
        return True
    raise Exception("The directory existed so we must be publishing already")


def clone_source_repo(directory=None):
    git_ssh_cmd = '/opt/app/source_repo.sh'
    os.environ['GIT_SSH'] = git_ssh_cmd
    os.environ['GIT_SSH_COMMAND'] = git_ssh_cmd
    logging.debug("Cloning Source Static Repo")
    mode = 0o755
    os.mkdir(directory, mode)
    Repo.clone_from('git@github.com:this_guy/simply_static_repo', directory, env=dict(GIT_SSH_COMMAND=git_ssh_cmd))
    logging.debug("Finished cloning Source Static Repo")


def fix_urls(directory=None):
    logging.debug("Fixing all URL Paths")
    search_regex = r"10\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}:8080"
    replace_string = "blog.shellnetsecurity.com"
    for filepath in glob.iglob(f"{directory}/**/*", recursive=True):
        if not is_binary(filepath):
            logging.debug(filepath)
            with open(filepath) as file:
                s = file.read()
            # First replace all http with https
            s = s.replace('http:', 'https:')
            # Then swap the internal host:port for the public hostname
            s = re.sub(search_regex, replace_string, s)
            with open(filepath, "w") as file:
                file.write(s)


def clone_private_repo(directory=None):
    git_ssh_cmd = '/opt/app/write_repo.sh'
    os.environ['GIT_SSH'] = git_ssh_cmd
    os.environ['GIT_SSH_COMMAND'] = git_ssh_cmd
    logging.debug("Clone private wordpress repo")
    mode = 0o755
    os.mkdir(directory, mode)
    Repo.clone_from('git@github.com:this_guy/live_wordpress_content', directory, env=dict(GIT_SSH_COMMAND=git_ssh_cmd))
    logging.debug("Finished cloning private wordpress repo")


def mirror_site(directory=None, source_directory=None):
    logging.debug("Starting crawler to get site")
    # rm -rf html/blog.shellnetsecurity.com
    cleanup_previous_clone(directory=f"{directory}/html/blog.shellnetsecurity.com")
    # fix all https:\/\/10\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}:8080 -> https://blog.shellnetsecurity.com
    fix_urls(directory=source_directory)
    # mv source_directory html/blog.shellnetsecurity.com
    shutil.move(source_directory, f"{directory}/html/blog.shellnetsecurity.com")
    # rm -rf html/blog.shellnetsecurity.com/.git
    cleanup_previous_clone(directory=f"{directory}/html/blog.shellnetsecurity.com/.git")
    logging.debug("Crawler Finished")


def add_files_and_commit(repo=None, post=''):
    if not repo.is_dirty(untracked_files=True):
        raise Exception("No changes detected to repo")
    logging.debug("Adding any files that are new to our commit")
    repo.index.add(['html'])
    repo.index.commit('Publishing Article : ' + post)


def push_to_private_repo(repo=None):
    logging.debug("Pushing our changes to the static site")
    logging.debug(repo.remotes.origin.push())


def cleanup_lingering_clone(directory='/tmp/clone_*'):
    logging.debug("Executing cleanup of directories")
    for file_path in glob.glob(directory):
        try:
            logging.debug("Removing : %s", file_path)
            # Old clones are directories, so os.remove() alone is not enough
            if os.path.isdir(file_path):
                shutil.rmtree(file_path)
            else:
                os.remove(file_path)
        except OSError:
            logging.error("Error while deleting file : %s", file_path)


def cleanup_previous_clone(directory=None):
    if directory is None:
        raise Exception("Need a directory to delete")
    if os.path.exists(directory):
        logging.debug("Directory existed so cleaning it up")
        shutil.rmtree(directory)
    else:
        logging.debug("Directory did not exist")
        return False


def run(article_name=None):
    # First get rid of any previous clone attempts
    logging.debug("Removing any previous clones")
    cleanup_lingering_clone()
    # The webhook doesn't say which article was published, so use a placeholder
    if article_name is None:
        article_name = 'script testing'
    source_clone_directory = '/tmp/source_clone_' + str(time.time())
    clone_directory = '/tmp/clone_' + str(time.time())
    try:
        check_for_current_clone(directory=source_clone_directory)
        check_for_current_clone(directory=clone_directory)
        clone_source_repo(directory=source_clone_directory)
        clone_private_repo(directory=clone_directory)
        my_repo = Repo(clone_directory)
        mirror_site(directory=clone_directory, source_directory=source_clone_directory)
        add_files_and_commit(repo=my_repo, post=article_name)
        push_to_private_repo(repo=my_repo)
    except Exception:
        logging.error("We failed to mirror")
        raise
    finally:
        # Clean up both working directories whether or not the run succeeded
        cleanup_previous_clone(directory=source_clone_directory)
        cleanup_previous_clone(directory=clone_directory)
```
A brief table to explain the various functions in this module would probably be useful.
Function Name | Function Description |
---|---|
is_binary(file_name) | This is a simple test function to make sure we don’t try to search and replace in binary files |
check_for_current_clone(directory) | This is a helper function that will check to make sure we’re not currently running some type of clone. |
clone_source_repo(directory) | This makes use of the Repo functions from GitPython in order to use an ssh key to clone the simply static repo onto my worker machine |
fix_urls(directory) | This function is used to take any of those https://10.x.x.x:8080 URLs and replace them with https://blog.shellnetsecurity.com |
clone_private_repo(directory) | This makes use of the Repo functions from GitPython in order to use an ssh key to clone the production static WordPress content repo onto my worker machine |
add_files_and_commit() | This function goes through the target repo’s directory and adds any new or changed files to the commit. |
push_to_private_repo() | This function will push any changes up to the production static WordPress content repo. |
cleanup_lingering_clone() | I’m writing all of my directories out to /tmp so this will go through and attempt to clean any older mirroring attempts. |
cleanup_previous_clone() | This is a similar cleanup task, only it takes a specific directory and is intended to be used at the end of the run() function. |
mirror_site() | This function does the majority of the work for us by walking through the process needed to mirror the content. |
run() | This is the main function in this module and calling this function will trigger everything we need. |
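To show what fix_urls is meant to do without touching any files, here are its two replacements (http to https, then internal host to public hostname) applied to a plain string. Treat it as a sketch: `fix_text` is a hypothetical helper and 10.0.0.5 is a made-up internal address.

```python
import re

SEARCH_REGEX = r"10\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}:8080"
REPLACE_STRING = "blog.shellnetsecurity.com"


def fix_text(text):
    """Upgrade http to https, then swap the internal host for the public one."""
    text = text.replace("http:", "https:")
    return re.sub(SEARCH_REGEX, REPLACE_STRING, text)


# Hypothetical link as it might appear in an export before the fix.
before = '<a href="http://10.0.0.5:8080/2019/12/16/general/stay-tuned/">Stay Tuned</a>'
after = fix_text(before)
print(after)
```

Text that doesn't match the 10.x.x.x:8080 pattern, such as an address on a different port, is left alone.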
Basic Code Execution and Flow
If you don’t really feel like reading all of the code above, the logic starts with the run() function and follows this basic flow:
- Clone the Simply Static repo to our worker machine.
- Clone the production WordPress static content repo onto our worker machine.
- Remove all of the existing content for the blog.shellnetsecurity.com site from the production WordPress static content repo.
- Walk all files that are not binary, search for https://10.x.x.x:8080, and replace it with https://blog.shellnetsecurity.com.
- Move the contents of the Simply Static repo into the place of the removed content.
- Remove the .git directory that marks the Simply Static directory as a repo.
- Commit all of our changes.
- Push the changes to the production WordPress static content repo.
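The file-shuffling part of that flow can be tried out without any Git involved. Here's a rough sketch using throwaway temp directories in place of the two cloned repos. The `replace_site_content` helper and the file names are invented for the example. Only the html/blog.shellnetsecurity.com layout comes from the module above.

```python
import shutil
import tempfile
from pathlib import Path


def replace_site_content(source_dir, target_repo_dir, site="blog.shellnetsecurity.com"):
    """Swap the site's published content for a fresh export (no Git involved)."""
    site_dir = Path(target_repo_dir) / "html" / site
    # Remove whatever was published before.
    if site_dir.exists():
        shutil.rmtree(site_dir)
    site_dir.parent.mkdir(parents=True, exist_ok=True)
    # Move the export into place, then drop its .git directory.
    shutil.move(str(source_dir), str(site_dir))
    shutil.rmtree(site_dir / ".git", ignore_errors=True)
    return site_dir


# Throwaway directories standing in for the two cloned repos.
work = Path(tempfile.mkdtemp())
source = work / "source_clone"
(source / ".git").mkdir(parents=True)
(source / "index.html").write_text("new export")
target = work / "clone"
old_site = target / "html" / "blog.shellnetsecurity.com"
old_site.mkdir(parents=True)
(old_site / "stale.html").write_text("old export")

site_dir = replace_site_content(source, target)
published = sorted(p.name for p in site_dir.iterdir())
source_gone = not source.exists()
print(published)
shutil.rmtree(work)  # clean up the temp directories
```

After the swap, the stale file and the .git directory are gone and only the fresh export remains, which is the state the commit and push steps then pick up.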
Conclusion
There you have it! With a little manipulation, I’m now able to have a static WordPress site running without much interaction. With this in place, I’m now also able to begin making a few more tweaks to the site.