Static Site Search With Lunr.js

My site isn’t that old, but slowly and steadily I’ve been adding the bits and pieces that make a working website with all the features people expect. Because I’m running a static site built by Hugo, I didn’t think I’d have the option of my own search, so I used DuckDuckGo to search my site, along with some JavaScript that added site:sentamal.in to every query so that only my site’s results were displayed.

While using DuckDuckGo worked, it wasn’t ideal; I’d be at the mercy of a search engine’s crawlers to update its results with my pages. I wanted to provide something more current for my visitors, so I looked for other options.

Hugo’s documentation lists many search options for static sites, which led me to Lunr. I looked for blog posts about implementing it, but none of them satisfied me; many relied on other libraries that I didn’t want to add. So, I made my own!

Why Lunr?

Well… partly because it was the most prominent example I could find while searching for client-side static search. I wasn’t exactly sure what Solr referred to (I assumed it was Apache Solr), and Elasticsearch wasn’t free. After reading through the documentation, Lunr also seemed like something I’d be able to implement with my current skills.

Goals

Minimalist; few or no other dependencies
Fast; a search result should appear quickly
Small; keep downloads and device processing small
Use ?q= queries; allows for sitelink searches in SERPs

Step 1: Create the Document JSON

Lunr relies on an array of documents formatted in JSON to build its index. Thankfully, Hugo’s dict and jsonify functions and its ability to create custom output formats made it easy to build the relevant arrays for Lunr to use.

Document Archetype (JSON)

{
  "id": "https://some.url/",
  "title": "Page Title",
  "tag": "some tags",
  "summary": "A summary of the page",
  "content": "The page's full text content"
}

To actually have Hugo generate the JSON, I created two new pages in my /content directory: lunr-full.md and lunr-summary.md. Lunr only needs one source of documents, and I’m only using lunr-full.md for the index build. However, when Lunr returns search results, each result only carries a reference to the matching page. Lunr does not return things like the page’s title or summary, so that information has to come from “somewhere else”; lunr-summary.md provides that “somewhere else.” Note that the only thing in these files is front matter.
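As a rough illustration of why the summary data is needed, the sketch below joins a list of Lunr-style results to summary records. It assumes each result has a `ref` equal to a document's `id` plus a `score`; the names `indexSummaries` and `joinResults` and the sample data are hypothetical, not part of Lunr.

```javascript
// Build a lookup table from document id to its summary record.
function indexSummaries(pages) {
  return new Map(pages.map((page) => [page.id, page]));
}

// Attach title/summary to each result; drop results with no matching summary.
function joinResults(results, summariesById) {
  return results
    .filter((result) => summariesById.has(result.ref))
    .map((result) => ({ ...summariesById.get(result.ref), score: result.score }));
}

// Hypothetical data shaped like lunr-summary.json and a Lunr result list.
const pages = [
  { id: "/posts/hello/", title: "Hello", type: "posts", summary: "First post" },
  { id: "/posts/lunr/", title: "Lunr", type: "posts", summary: "Search notes" }
];
const results = [{ ref: "/posts/lunr/", score: 1.2 }];

console.log(joinResults(results, indexSummaries(pages)));
```

A `Map` keeps each lookup cheap, which may matter once the summary list grows.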

lunr-full.md (Markdown/YAML)

---
date: 2018-05-06T03:31:00-06:00
type: search
layout: full
url: /search/lunr-full.json
outputs:
  - json
---

lunr-summary.md (Markdown/YAML)

---
date: 2018-05-06T03:31:00-06:00
type: search
layout: summary
url: /search/lunr-summary.json
outputs:
  - json
---

(I have to admit that there’s something a little hacky I did with the dates in these pages. When I first made these I omitted a date parameter, but it was screwing up the JSON creation in that it wasn’t adding commas between documents. I changed the date to a time in the past to mitigate that.)

After making these files, I made /layouts/search/full.json and /layouts/search/summary.json. It took me a little bit of reading through the Hugo docs to figure this out, but because of the type, layout, and output choices in the front matter, Hugo will look for a suitable template in /layouts/[type]/[layout].[output]. The only difference between the two is that the full document (used for actually building the Lunr index) also includes the document’s tags and content.

full.json (Go Template/Hugo)

[{{- range where .Site.RegularPages "Type" "ne" "search" -}}
  {{- $scratch := newScratch -}}
  {{- $scratch.Set "tags" "" -}}
  {{- range .Params.tags -}}
    {{- $scratch.Add "tags" . -}}
    {{- $scratch.Add "tags" " " -}}
  {{- end -}}
  {{- dict "id" (.Permalink | relURL) "title" .Title
      "type" .Type "tag" ($scratch.Get "tags") "summary" (.Summary | plainify)
      "content" .Plain | jsonify -}}
  {{- if .Prev -}},{{- end -}}
{{- end -}}]

summary.json (Go Template/Hugo)

[{{- range where .Site.RegularPages "Type" "ne" "search" -}}
  {{- dict "id" (.Permalink | relURL) "title" .Title
      "type" .Type "summary" (.Summary | plainify) | jsonify -}}
  {{- if .Prev -}},{{- end -}}
{{- end -}}]

With these files made, a lunr-full.json and a lunr-summary.json will be created in /public/search whenever hugo is run. To save a step, these files are already minified to boot!

Step 2: Pre-Build the Lunr Index

Lunr’s documentation recommends pre-building the index, especially for “large indexes or with documents that are largely static, such as a static site.” This is to prevent blocking the browser while the index builds.

For this I used their documentation verbatim to create a Node.js script that builds an index from my full document array made earlier.

From there it was a matter of running cat /path/to/lunr-full.json | node /path/to/build-index.js > public/search/lunr-index.json after running hugo to add the pre-built index to my static site.
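A build script along those lines might look like the sketch below. It follows the shape of the pre-building example in Lunr's guide; the indexed field names are an assumption based on the document archetype above, the `--stdin` flag is a hypothetical guard so the file does nothing unless asked to read input, and it assumes `lunr` has been installed with npm.

```javascript
// build-index.js (sketch): read a JSON array of documents on stdin and
// write a serialized Lunr index to stdout.

// `lunr` is passed in as an argument so this function has no hard dependency.
function buildIndex(lunr, documents) {
  return lunr(function () {
    this.ref("id"); // value returned as `ref` in search results
    this.field("title");
    this.field("tag");
    this.field("summary");
    this.field("content");
    // The arrow function keeps `this` bound to the Lunr builder.
    documents.forEach((doc) => this.add(doc));
  });
}

function main() {
  const lunr = require("lunr"); // assumes `npm install lunr`
  let input = "";
  process.stdin.setEncoding("utf8");
  process.stdin.on("data", (chunk) => { input += chunk; });
  process.stdin.on("end", () => {
    const idx = buildIndex(lunr, JSON.parse(input));
    process.stdout.write(JSON.stringify(idx));
  });
}

// Hypothetical flag: only read stdin when explicitly asked to.
if (process.argv.includes("--stdin")) main();
```

With that guard in place the command would become cat /path/to/lunr-full.json | node /path/to/build-index.js --stdin > public/search/lunr-index.json.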

Step 3: Create the Search Page

With the index made and a document summary available, it’s just a matter of linking it all together. I decided to make the search page first because the JavaScript was the most daunting part of all of this. I made /content/search/index.md and /layouts/search/index.html to begin this process. For brevity I’ve cut the HTML below down to the bare minimum needed to follow along. Of course you can modify the template to suit your needs.

index.md (Markdown/YAML)

---
title: "Search"
date: 2020-01-02T16:05:06-06:00
type: search
layout: index
draft: false
---

index.html (Go Template/Hugo)

<html>
  <head>
    <title>Search</title>
    <script src="https://unpkg.com/lunr/lunr.js" defer></script>
    <script src="/scripts/page-search.js" defer></script>
  </head>
  <body>
    <div id="search-results"></div>
    <template id="search-item">
      <article>
        <h1><a></a></h1>
        <aside></aside>
        <p></p>
      </article>
    </template>
  </body>
</html>

Note that the <template> element holds the markup I’ll clone to, well, mark up each search result.

Step 4: Write the Rest of the Code

This took a bit of reading for me to do correctly. I knew about XHR, but I also knew about the Fetch API, and that the latter was a bit easier to implement and to read.

What I didn’t know initially was that I could use async and await, which I was already accustomed to from working in C#. Doing it the other way, I ended up with a bit of a hellish glob of callbacks, but once I figured out I could just use async, the code became a lot easier for me to read (and understand, and write).
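To show the shape that async and await give this kind of code, here is a small sketch that loads two JSON files. It is an illustration rather than the listing that follows: `loadJson` and `loadSearchData` are hypothetical names, the two requests run in parallel through `Promise.all`, and the `fetchFn` parameter is an assumption added so the functions can be tried outside a browser.

```javascript
// Fetch a URL and parse it as JSON, failing loudly on HTTP errors.
async function loadJson(url, fetchFn = fetch) {
  const response = await fetchFn(url);
  if (!response.ok) {
    throw new Error(`search: ${url} returned HTTP ${response.status}`);
  }
  return response.json();
}

// Request the index and the summaries at the same time, then wait for both.
async function loadSearchData(indexUrl, summaryUrl, fetchFn = fetch) {
  const [indexData, pages] = await Promise.all([
    loadJson(indexUrl, fetchFn),
    loadJson(summaryUrl, fetchFn)
  ]);
  return { indexData, pages };
}
```

In a browser, `loadSearchData("lunr-index.json", "lunr-summary.json")` would resolve to the raw index data, ready to hand to `lunr.Index.load`, along with the summaries.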

page-search.js (JavaScript)

/* Async Function to Start LunrJS */
async function startLunrJSAsync() {
  console.log("search: Starting Lunr...");

  let idx, pages;
  let ok = false;

  const lunrIndex = "lunr-index.json";
  const lunrSummary = "lunr-summary.json";

  /* Load the Pre-Built Index */
  console.log("search: Fetching Index...");
  let response = await fetch(lunrIndex);
  let data = await response.json();
  idx = lunr.Index.load(data);
  console.log("search: Index Loaded!");

  /* Load the Page Summaries */
  console.log("search: Fetching Summaries...");
  response = await fetch(lunrSummary);
  data = await response.json();
  pages = data;
  console.log("search: Summaries Loaded!");

  /* Lunr is Ready; Return */
  console.log("search: Lunr Is Ready!");
  ok = true;
  let obj = {
    idx: idx,
    pages: pages,
    ok: ok
  };
  return obj;
}

/* Clear the Search Results element then populate with search results */
function searchSite(search, query) {
  let template = document.querySelector("#search-item");
  let resultsContainer = document.querySelector("#search-results");

  resultsContainer.innerHTML = "";

  let allResults = search.idx.search(query);
  if (allResults.length === 0) resultsContainer.innerHTML = "<p>Nothing found; search for something else!</p>";
  else allResults.forEach(function (result) {
    let output = document.importNode(template.content, true);
    let title = output.querySelector("a");
    let breadcrumb = output.querySelector("aside");
    let summary = output.querySelector("p");
    let docRef;

    /* Find the requisite document summary for the search result */
    for (let i=0; i < search.pages.length; i++) {
      if (search.pages[i].id === result.ref) {
        docRef = search.pages[i];
        break;
      }
    }

    /* Skip results that have no summary (e.g. index and summaries out of sync) */
    if (!docRef) return;

    /* Titles are plain text, so set them as text rather than as HTML */
    title.textContent = docRef.title;
    title.setAttribute("href", result.ref);
    breadcrumb.textContent = "Don Geronimo » " + docRef.type.charAt(0).toUpperCase() + docRef.type.slice(1) + " » " + docRef.title;
    summary.innerHTML = docRef.summary;

    resultsContainer.appendChild(output);
  });
}

(async () => {
  /* Initialize Lunr */
  let Search = await startLunrJSAsync();

  /* If there is a query in the URL, use that as the search query */
  let params = new URLSearchParams(document.location.search);
  let query = params.get("q");
  if (query) {
    /* The search box is not part of the trimmed-down template above, so guard against it being absent */
    let searchBox = document.getElementById("ddg-search-value");
    if (searchBox) searchBox.value = query;
    searchSite(Search, query);
  }
})();

The last block lets people use my search page like literally any other search engine that accepts a query through a GET request. In practice no server-side processing happens; the client parses the URL to get the query.
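The query handling can also be isolated into a small helper, sketched here. `getQuery` is a hypothetical name; it relies on `URLSearchParams`, which ignores a leading `?` and decodes `+` as a space.

```javascript
// Extract a trimmed search query from a location.search-style string.
function getQuery(searchString) {
  const params = new URLSearchParams(searchString);
  const query = params.get("q");
  return query ? query.trim() : "";
}

console.log(getQuery("?q=static+site+search")); // "static site search"
console.log(getQuery("?page=2"));               // ""
```

Returning an empty string for a missing or blank query lets the caller decide with a single truthiness check whether to run a search.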

Considerations with Client-Side Search

After doing all this I ended up with a search service for my website that runs entirely on the client. However, that means there is a danger that the index may eventually get too big. Right now, lunr-index.json is about 160 KB while lunr-summary.json is about 21 KB. The page itself is around 12 KB. Obviously other resources (like images, style sheets, the actual Lunr.js script, etc.) make the total bigger. It’s a manageable size right now, but it can only grow as more posts are added.
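One way to gauge the real download cost is to compare raw and gzipped sizes, since repetitive JSON tends to compress well. The sketch below is a Node.js illustration with made-up sample data; `transferSizes` is a hypothetical helper, and whether a given host actually serves these files compressed is an assumption to verify.

```javascript
const zlib = require("zlib");

// Compare the raw byte size of some JSON text with its gzipped size.
function transferSizes(jsonText) {
  const raw = Buffer.byteLength(jsonText, "utf8");
  const gzipped = zlib.gzipSync(jsonText).length;
  return { raw, gzipped };
}

// Hypothetical stand-in for a summary file: many similar records.
const sample = JSON.stringify(
  Array.from({ length: 200 }, (_, i) => ({
    id: `/posts/post-${i}/`,
    title: `Post ${i}`,
    type: "posts",
    summary: "A summary of the page"
  }))
);

console.log(transferSizes(sample));
```

Running the same function over the contents of the generated index and summary files would show how much of their size actually goes over the wire when compression is enabled.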

Still, it’s nice to provide a search function that may give better results than a search engine does. As things grow I can think of ways to mitigate some of the problems of client-side search, but for now this is good enough.