How Web Apps Work: Client Development and Deployment

This is a post in the How Web Apps Work series.


An overview of the concepts, terms, and data flow used in web apps: JS client development environment, tooling, and deployment

Web development is a huge field with a vast array of concepts, terms, tools, and technologies. For people just getting started in web dev, this landscape is often bewildering - it's unclear what most of these pieces are, much less how they fit together.

This series provides an overview of fundamental web dev concepts and technologies, what these pieces are, why they're needed, and how they relate to each other. It's not a completely exhaustive reference to everything in web development, nor is it a "how to build apps" guide. Instead, it's a map of the territory, intended to give you a sense of what the landscape looks like, and enough information that you can go research these terms and topics in more depth if needed.

Some of the descriptions will be more oriented towards modern client-side app development with JavaScript, but most of the topics are fundamental enough that they apply to server-centric applications as well.

New terms will be marked in italics. I'll link references for some of them, but encourage you to search for definitions yourself. Also, some of the descriptions will be simplified to avoid taking up too much space or dealing with edge cases.

My JavaScript for Java Developers slides also cover much of this post's content.

JavaScript Development and Build Processes

It's important to understand the evolution of JavaScript's design and usage over time to understand why and how it gets used today.

This quote sums things up well:

The by-design purpose of JavaScript was to make the monkey dance when you moused over it. Scripts were often a single line. We considered ten line scripts to be pretty normal, hundred line scripts to be huge, and thousand line scripts were unheard of. The language was absolutely not designed for programming in the large, and our implementation decisions, performance targets, and so on, were based on that assumption.
- Eric Lippert, former IE/JS developer at Microsoft
http://programmers.stackexchange.com/a/221658/214387

Today, applications routinely consist of hundreds of thousands of lines of JavaScript. This introduces a completely different set of constraints on how developers write, build, and deliver client-side applications. Those constraints result in a very different development approach than the early days. Instead of writing a few lines of JS and inlining them into an HTML page, modern web clients use complex build toolchains that are directly equivalent to compilers for languages like C++ or Java.

JavaScript Module Formats

Almost all languages have built-in syntax for declaring encapsulated "modules" or "packages". For example, a Java file might declare it is part of package com.my.project, and then declare a dependency on another package with import some.other.project.SomeClass;. C#, Python, Go, and Swift all have their own package definition and import/export syntax.

JavaScript is not one of those languages.

Unlike all these other languages, JavaScript originally had no built-in module format syntax. For many years, JS code was written as inline <script> tags directly in HTML, or as small .js files with a few shared global variables.

As developers began writing larger application codebases, the community eventually began inventing their own module formats to help provide structure and encapsulation. Each of these formats was invented to solve differing use cases.

(Note: code snippets in this section are purely to illustrate the syntax differences between the various formats - the actual code is not intended to run or do anything useful.)

Legacy Module Formats

Adding numerous script tags to a page has several issues. It can be very hard to determine the dependencies between different script files and load them in the right order. Also, since all top-level variables occupy the same global namespace, it's really easy to accidentally have variables with the same name overriding each other:

<script src="jquery.min.js"></script>
<script src="jquery.someplugin.js"></script>
<script src="./components/dropdown.js"></script>
<script src="./components/modal.js"></script>
<script src="./application.js"></script>

// dropdown.js
var delay = 2000; // in ms

// modal.js
var delay = 4000; // in ms

// application.js
// Oops - is it 2000 or 4000?
console.log(delay)

Immediately Invoked Function Expressions (IIFEs) are a pattern that relies on JS variables being scoped to the nearest function. An IIFE involves defining a new function and then immediately calling it to get a result. This provides encapsulation, and was used as the basis of the "revealing module" pattern, where an IIFE returns an object that defines its public API (equivalent to a factory function or a class constructor):

// dropdown.js
(function(){
    var delay = 2000; // in ms
    APP.dropdown.delay = delay;
}());

// modal.js
const modalAPI = (function(){
    // No name clash - encapsulated in the IIFE
    var delay = 4000; // in ms
    APP.modal.delay = delay;

    $("#myModal").show();

    function hideModal() {
      $("#myModal").hide()
    }

    // return a "public API" for the modal by exposing methods
    return {
      hideModal : hideModal
    }
}());

The Asynchronous Module Definition format (AMD) was designed specifically to be used by browsers. A specialized AMD loader library first creates a global define function. AMD modules then call define() and pass in an array of module names they depend on, and a function that acts as the body of the module. The module body function receives all its requested dependencies as arguments, and may return any single value to act as its "export". The loader library then checks to see if all of the requested dependencies have been registered and loaded. If not, it recursively downloads other dependencies waterfall-style, and works its way back up the dependency chain to initialize each module function with its dependencies.

// moduleA.js
// Loader library adds a global `define()` function
define(["jquery", "myOtherModule"],
function($, myOtherModule) {
    // Body of the function is the module definition
    const a = 42;
    const b = 123;

    function someFunction() { }

    // Return value is the "exports" of the module
    // Can do "named exports" by returning object with many values
    return {a : a, publicName : b, someFunction : someFunction}
});

// moduleB.js
define(["backbone"],
function(Backbone) {
    const MyModel = Backbone.Model.extend({});

    // Can do a "default" export by just returning one thing
    // instead of an object with multiple things inside
    return MyModel;
});
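To make the loader's job a bit more concrete, here is a minimal sketch of just the "check the registry and initialize dependencies first" step. This is not how RequireJS or any other real AMD loader is implemented: it assumes every module has already been registered under an explicit name, it skips the script-downloading step entirely, and it does not handle circular dependencies. The names `registry`, `cache`, and `resolveModule` are made up for this illustration.

```javascript
// Toy AMD-style registry (illustration only, no script downloading)
const registry = {}; // module name -> { deps, factory }
const cache = {};    // module name -> initialized export value

// Register a module body along with the names it depends on
function define(name, deps, factory) {
  registry[name] = { deps: deps, factory: factory };
}

// Initialize a module, resolving its dependencies first
function resolveModule(name) {
  if (name in cache) return cache[name];
  const entry = registry[name];
  if (!entry) throw new Error("Module not registered: " + name);
  // Work down the dependency chain, then pass the results as arguments
  const args = entry.deps.map(resolveModule);
  cache[name] = entry.factory.apply(null, args);
  return cache[name];
}

define("config", [], function () {
  return { delay: 2000 };
});

define("dropdown", ["config"], function (config) {
  return { delay: config.delay };
});

console.log(resolveModule("dropdown").delay); // 2000
```

Each module body runs once, and its return value is cached and handed to any later module that asks for it.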

IIFEs and AMD modules are no longer actively being used for new development, but code using those patterns is still out there.

CommonJS Modules

The CommonJS module format was developed specifically for use with the Node.js runtime (a JS interpreter running outside the browser). Since Node has access to the filesystem, the CommonJS format was designed to load modules from disk synchronously as soon as they are imported.

The Node.js interpreter defines a global require function that accepts either relative paths, absolute paths, or library names. Node then follows a complex lookup formula to find a file matching the requested path/name, and if found, immediately reads and loads the requested file.

CommonJS modules do not need any outer wrapping function in their source. Node also provides each module with a module.exports object, and the module defines its exported values by assigning to it.

// moduleA.js
// Node runtime system adds `require()` function and infrastructure
const $ = require("jquery");
const myOtherModule = require("myOtherModule");

// The entire file is the module definition
const a = 42;
const b = 123;

function someFunction() { }

// Node runtime provides a `module.exports` object to define exports
// Can do "named exports" by assigning an object with many values
module.exports = {  
    a : a, 
    publicName : b, 
    someFunction : someFunction 
}


// moduleB.js
const Backbone = require("backbone");
const MyModel = Backbone.Model.extend({});

// Can do a "default" export by just assigning
// one value to `module.exports`
module.exports = MyModel;

CommonJS modules allow dynamically importing other modules at any time, and imports can be done conditionally.
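As a small sketch of what that looks like in practice, the snippet below only loads Node's built-in `crypto` module if a particular branch runs. The `createHasher` function and its `kind` argument are hypothetical names for this example.

```javascript
// Conditional require: `require` is an ordinary function call,
// so it can sit inside an `if` block or a nested function
function createHasher(kind) {
  if (kind === "sha256") {
    // Loaded only if this branch actually runs
    const crypto = require("crypto");
    return function (text) {
      return crypto.createHash("sha256").update(text).digest("hex");
    };
  }
  // Fallback that needs no extra module at all
  return function (text) {
    return String(text.length);
  };
}

console.log(createHasher("length")("hello")); // 5
console.log(createHasher("sha256")("hello").length); // 64 (hex characters)
```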

The flip side is that CommonJS modules cannot be used as-is in a browser - some kind of adapter or repackaging is needed.

Universal Module Definition

Some libraries need to be able to be used in multiple environments with the same build artifact: a plain global <script> tag in a browser, an AMD module in a browser, or a CommonJS file under Node. The community invented a bizarre-looking hack that allowed a module to work correctly in all three environments by feature-detecting capabilities, which was dubbed the Universal Module Definition (UMD) format. Today this is still semi-commonly used as a build output target for some libraries.

// File log.js
(function (global, factory) {
    if (typeof define === "function" && define.amd) {
      define(["exports"], factory);
    } else if (typeof exports !== "undefined") {
      factory(exports);
    } else {
      var mod = {
        exports: {}
      };
      factory(mod.exports);
      global.log = mod.exports;
    }
})(this, function (exports) {
  "use strict";
  
  function log() {
    console.log("Example of UMD module system");
  }
  // expose log to other modules
  exports.log = log;
});

ES Modules

The ES2015 language spec finally added an official module syntax to the JS language, which is now referred to as ES Modules or "ESM". It provides syntax for defining both named and default imports and exports. However, due to the differences between browsers and Node.js, the specification did not define how modules would actually be loaded by an interpreter, or what the import strings would refer to, and instead left it up to the differing environments to figure out how to load modules appropriately. Modern browsers have all implemented loading ES Modules based on URLs as the import strings. Node has had significantly more trouble determining a path forward, due to its reliance on CommonJS modules as the default format. As of Node 15, Node has some support for loading ES Modules, but there are still difficulties determining how CommonJS and ES Module files should interop with each other.

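ES Modules were designed to be statically analyzable. The tradeoff is that static import and export declarations must be at the top level of the file, so they cannot be executed conditionally. Dynamic loading is instead done with the separate import() syntax, which was added to the language later and returns a promise.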

// moduleA.js
// ES6 language spec defines import/export keywords
import $ from "jquery";
// Can do "default" imports - no curly braces around the variable name
import myOtherModule from "myOtherModule";

// Define "named exports" by adding `export` in front of a variable
export const a = 42;
export const b = 123;
export {b as publicName};

export function someFunction() { }


// moduleB.js
// Can do "named imports" from other modules
import {Model} from "backbone";
const MyModel = Model.extend({});

// Can do a "default" export with the `export default` keyword
export default MyModel;

Compiling

The JS language spec has added lots of additional syntax over the years. The ES2015 spec in particular effectively doubled the amount of syntax in the language.

Each time a browser ships a new version, that version has a fixed understanding of a certain subset of the JS language. Since that browser version may stay in wide use for many years, developers need to ship code using only syntax that is supported by the set of browser versions they intend to support. However, developers also want to be able to write and develop code using the latest and greatest syntax.

This means that developers need to compile the original "current" JS source code they've written into an equivalent version that only uses older syntax.

The Babel JS compiler is the standard tool used to cross-compile JS code into a different variation of JS. It has a wide array of plugins that support compiling specific pieces of newer JS language syntax into their older equivalent forms.

As an example, this ES2015-compatible snippet uses ES Module syntax, arrow functions, the const/let variable declaration keywords, and the shorthand object declaration syntax:

export const myFunc = () => {
  let longVariableName = 1;
  return {longVariableName};
}

When compiled by Babel with the ES2015 plugin enabled and targeting the CommonJS module format, it becomes:

"use strict";

Object.defineProperty(exports, "__esModule", {
  value: true
});
exports.myFunc = void 0;

var myFunc = function myFunc() {
  var longVariableName = 1;
  return {
    longVariableName: longVariableName
  };
};

exports.myFunc = myFunc;

In addition, there are many "compile-to-JS" languages in use in the industry. Some are obscure niche languages, and some were popular for a few years and have since died out (such as CoffeeScript). The most commonly used compile-to-JS language at this point is TypeScript, which is a statically-typed superset of JS created by Microsoft. The TypeScript compiler itself strips out type annotations at compile time and outputs plain JS. Similar to Babel, it can also compile newer syntax into varying older language versions.

// Input: TypeScript syntax is JS with type annotations
const add2 = (x: number, y: number) => {
  return x + y;
};

// Output: plain JS, no type annotations
const add2 = (x, y) => {
  return x + y;
};

Bundling

There are multiple reasons why the original JS source cannot be delivered as-is to the browser:

  • Code written in CommonJS format cannot be loaded by browsers
  • Code written in ES Module format can be, but requires careful work to have all the files and path URLs line up correctly
  • Target browsers used by consumers likely don't support all modern syntax
  • Codebases may consist of thousands of separate JS files, and downloading each file separately would take too long to load
  • Original source contains comments, whitespace, and longer variable names, and developers need to minimize the number of bytes sent to the browser to let pages load faster
  • Languages like TypeScript are not supported by JS interpreters - the original source has to be compiled to plain JS syntax

Because of this, JS code is also bundled to prepare it for use in a browser. This has to happen in both development and production environments.

The bundling process traces the tree of imports and dependencies starting from a set of entry point files (such as src/index.js). Any imported file is then added to the list of files to be processed. The bundler resolves all requested imports, determines the necessary loading order, and outputs the module sources wrapped in some scaffolding that initializes the application when the bundle file is loaded.
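The traversal step can be sketched in a few lines. This is only a toy: a real bundler parses each file to find its imports, while this example assumes the import lists are already known and stored in a hypothetical `importsByFile` lookup table. The file names are made up as well.

```javascript
// Hypothetical project: file path -> list of files it imports
const importsByFile = {
  "src/index.js": ["src/app.js", "src/utils.js"],
  "src/app.js": ["src/utils.js"],
  "src/utils.js": [],
};

// Walk the import tree from an entry point and work out a load order
function getLoadOrder(entryPoint) {
  const visited = new Set();
  const order = [];

  function visit(file) {
    if (visited.has(file)) return; // already processed
    visited.add(file);
    for (const dep of importsByFile[file] || []) {
      visit(dep);
    }
    // A file is emitted only after everything it imports
    order.push(file);
  }

  visit(entryPoint);
  return order;
}

console.log(getLoadOrder("src/index.js"));
// [ 'src/utils.js', 'src/app.js', 'src/index.js' ]
```

Note that `src/utils.js` is imported twice but only appears once in the output, which is the same deduplication a bundler performs.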

Bundling tools also typically support multiple additional processing steps during the bundling process. In particular, bundlers will usually be configured to:

  • Run a compiler like Babel or TypeScript on all JS/TS source files
  • If using TS, do typechecks with the TS compiler to verify the code actually compiles
  • Enable importing and processing additional assets like CSS and images
  • Optimize the size of the output to make it as small as possible, by minifying it (also known as uglifying)

Minifying JS source involves shrinking the code as much as possible, by stripping out whitespace and comments, replacing long variable names with shorter names, and using the shortest possible versions of syntax. Finally, minifiers can detect dead code and remove it, and JS code is often written with flags like if (process.env.NODE_ENV !== 'production') to add development-only checks that will be removed in a production build.
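Here is a rough sketch of what one of those development-only checks might look like. The `setDelay` function is a made-up example. The removal works because bundlers are typically configured to replace `process.env.NODE_ENV` with a literal string at build time, so in a production build the condition becomes `"production" !== "production"`, which a minifier can see is always false.

```javascript
function setDelay(ms) {
  if (process.env.NODE_ENV !== "production") {
    // Development-only validation; a production build can drop this block
    if (typeof ms !== "number") {
      throw new TypeError("setDelay expects a number, got " + typeof ms);
    }
  }
  return { delay: ms };
}

console.log(setDelay(2000).delay); // 2000
```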

The same Babel-compiled output above looks like this when minified:

"use strict";Object.defineProperty(exports,"__esModule",{value:!0}),exports.myFunc=void 0;var myFunc=function(){return{longVariableName:1}};exports.myFunc=myFunc;

Webpack is the most widely used JS bundler. Other tools such as Parcel, Snowpack, and ESBuild fulfill the same roles, but with different goals and constraints.

Source Maps

Because of these transformations, the code loaded by a browser has been mangled into a completely unrecognizable form that makes it impossible to actually debug as-is. To solve this, development tools also write out source maps, which map segments of the output file back to their original lines of source. This allows browser debuggers to show the original source code, even if it's a language not actually supported by the JS interpreter in the browser. The browser will show the "original source" in its individual files, and allows developers to debug that "original source" by setting breakpoints and viewing variable contents.
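To give a feel for what is inside one of these files: the commonly used Source Map v3 format is a JSON file whose `mappings` field is a string of Base64 VLQ-encoded numbers. The sketch below decodes a single segment of such a string. The `exampleMap` object is made up for illustration, and the decoder ignores the `;` and `,` separators that a real `mappings` string uses between lines and segments.

```javascript
const BASE64 =
  "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

// Decode one Base64 VLQ segment (such as "AAgBC") into an array of integers
function decodeSegment(segment) {
  const values = [];
  let value = 0;
  let shift = 0;
  for (const char of segment) {
    const digit = BASE64.indexOf(char);
    if (digit === -1) throw new Error("Invalid Base64 character: " + char);
    value += (digit & 31) << shift; // low 5 bits carry data
    if (digit & 32) {
      shift += 5; // continuation bit set: more digits follow
    } else {
      const negative = value & 1; // lowest bit of the result is the sign
      value = value >> 1;
      values.push(negative ? -value : value);
      value = 0;
      shift = 0;
    }
  }
  return values;
}

// Made-up example of the overall file shape
const exampleMap = {
  version: 3,
  file: "bundle.min.js",
  sources: ["src/index.js"],
  names: [],
  mappings: "AAgBC",
};

console.log(decodeSegment(exampleMap.mappings)); // [ 0, 0, 16, 1 ]
```

In a four-field segment, the numbers are the generated column, the index into `sources`, the original line, and the original column, each stored relative to the previous segment. So this segment says that column 0 of the output came from line 16, column 1 (both zero-based) of `src/index.js`.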

Development Environments and Tools

Node.js

Node.js is a runtime for executing JS outside of a browser environment. It's equivalent to the JRE for Java, the .NET Framework SDK, or the Python runtime. It consists of the V8 JS engine from Chrome repackaged for use as a standalone executable, along with a standard library of APIs for interacting with the file system, creating sockets and servers, and much more.

NPM

"NPM" has three meanings:

  • NPM is the publicly-hosted JS package registry that hosts third-party JS libraries and packages published by the community
  • npm is an open-source CLI client used for installing packages from that registry
  • NPM is a company that runs the registry and develops the CLI client (acquired in 2020 by GitHub, which is owned by Microsoft)

Libraries and packages installed off of NPM are put into a ./node_modules folder. So, npm install redux downloads a published archive from the NPM registry servers, and extracts the contents into a new ./node_modules/redux folder.

Node Build Tools

Since most JS build tools are written by JS developers for JS developers, the build tools themselves are typically written in JS. This includes widely used tools like Babel, Webpack, ESLint, and many others. So, to run them, you must have Node.js installed in your development environment. (As described below, you do not need Node.js installed on a server machine just to run your client code in a browser, unless you also have written your server application itself in JS.)

There's been a recent trend of alternative JS build tools being written in Rust or Go, with a goal of making compilation and bundling much faster through use of native code and parallelism. Most of these tools haven't hit mainstream usage yet, but the potential speedups are big enough that those tools are likely to pick up usage.

Dev Servers

Because the original source needs to be repeatedly re-compiled and re-bundled as a developer makes changes locally, the typical development process involves launching a dev server, a separate process that detects edits to the original source files and rebuilds the client code with the changes.

The dev server process typically acts as an HTTP proxy, and forwards requests for data and assets onwards to an actual application server.

A typical example might look like:

  • App server process listening on port 8080
  • Dev server process listening on port 3000

The developer would then browse to http://localhost:3000 to load the page. The dev server on port 3000 receives the request, loads the HTML host page and bundled JS source from its memory, and returns it. When the browser requests http://localhost:3000/images/avatar.png, the dev server forwards that to the app server at http://localhost:8080/images/avatar.png instead. Similarly, a request for data by the browser to GET http://localhost:3000/items would be forwarded to the app server at http://localhost:8080/items, and the response passed back through the dev server to the browser.
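That routing decision can be sketched with Node's built-in `http` module. This is a simplified illustration, not how the Webpack dev server is actually implemented: the `bundledFiles` object, `routeRequest`, and `createDevServer` are all hypothetical names, and the "bundle" is just a hard-coded string.

```javascript
const http = require("http");

const APP_SERVER = { host: "localhost", port: 8080 };

// Hypothetical in-memory build output held by the dev server
const bundledFiles = {
  "/": "<html><body><script src='/bundle.js'></script></body></html>",
  "/bundle.js": "console.log('bundled app code');",
};

// Decide whether a request is served from memory or forwarded
function routeRequest(urlPath) {
  if (Object.prototype.hasOwnProperty.call(bundledFiles, urlPath)) {
    return { type: "memory", body: bundledFiles[urlPath] };
  }
  const target =
    "http://" + APP_SERVER.host + ":" + APP_SERVER.port + urlPath;
  return { type: "proxy", target: target };
}

function createDevServer() {
  return http.createServer(function (req, res) {
    const route = routeRequest(req.url);
    if (route.type === "memory") {
      res.end(route.body);
      return;
    }
    // Forward the request to the app server and stream the response back
    const proxyReq = http.request(
      {
        host: APP_SERVER.host,
        port: APP_SERVER.port,
        path: req.url,
        method: req.method,
        headers: req.headers,
      },
      function (proxyRes) {
        res.writeHead(proxyRes.statusCode, proxyRes.headers);
        proxyRes.pipe(res);
      }
    );
    proxyReq.on("error", function () {
      res.writeHead(502);
      res.end("App server unavailable");
    });
    req.pipe(proxyReq);
  });
}

// To actually run it: createDevServer().listen(3000);
console.log(routeRequest("/images/avatar.png").target);
// http://localhost:8080/images/avatar.png
```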

Webpack has a prebuilt dev server package available, and other tools such as Create-React-App often wrap around the Webpack dev server to provide additional configuration and capabilities.

Hot Module Reloading

Normally, recompiling a web app requires completely reloading the page to see the changed code running. This wipes out any state loaded into the app when the page is refreshed.

Tools like Webpack offer a "hot module reloading" ability. When a file is edited, the dev server recompiles with the changes, then pushes a notification to the client code in the browser. The app code can then subscribe to "some file changed" notifications, re-import the new version of the code, and swap out the old code for the new code as the app is still running.

Other tools like React's "Fast Refresh" mode can then leverage that reloading capability to swap specific parts of the app, such as replacing individual React components in real time.

Deployment

Serving Build Output

The output of a normal bundler build process is a folder full of static JS, HTML, CSS, and image files. Here's the output structure of a typical React app:

/my-project/build
    - index.html
    /static
        /css
            - main.34928ada.chunk.css
            - 2.7110e618.chunk.css
        /js
            - 2.e5df1c81.chunk.js
            - 2.e5df1c81.chunk.js.map
            - main.caa84d88.chunk.js
            - main.caa84d88.chunk.js.map
            - runtime-main.d653cc00.js
            - runtime-main.d653cc00.js.map
        /media
            - image1.png
            - image2.jpg
            - fancy-font.woff2

These are simple static files that can be served up by any web server.

To deploy these files, they just need to be uploaded to an appropriate location on the machine that hosts the server application. This is often done using a file transfer protocol such as SFTP.

Polyfills

There are many browser APIs that do not exist in older browsers and cannot be handled by backwards-compiling syntax. This includes built-in functions, classes, and data types. A couple of examples of this are the String.padStart() method and the Map data structure.

However, some of these can still be polyfilled with developer-provided implementations. Polyfills are extra code that is executed when an app is loaded, detects if a given feature exists at runtime in the current environment, and adds an artificial-but-equivalent implementation dynamically. As an example, a polyfill for String.padStart() might look something like:

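if (!String.prototype.padStart) {
  String.prototype.padStart = function padStart(targetLength, padString) {
    // Simplified logic - a real polyfill handles more edge cases
    var str = String(this);
    var pad = padString === undefined ? " " : String(padString);
    if (str.length >= targetLength || pad.length === 0) return str;
    var needed = targetLength - str.length;
    return pad.repeat(Math.ceil(needed / pad.length)).slice(0, needed) + str;
  };
}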

Code Splitting

Even with minification, JS bundles can get much too large (250K, 1MB, or worse). This is often due to use of many third-party libraries for additional functionality.

Code splitting allows bundlers to break very large bundles into smaller chunks. The extra chunks are either added as additional <script> tags into the host HTML page, or dynamically downloaded as the app is running.

Some chunks might contain only third-party code, so that the "vendor chunks" can be cached by the browser and only downloaded the first time a user visits a site. Or, chunks might be common logic shared between multiple parts of an app, such as common utilities used by the main user-facing page and an admin page.

Some chunks might be "lazy loaded" only when the user activates a certain feature. For example, a rich text editor implementation might add an extra 500K to the bundle, but if it's only used in a specific modal dialog, the dialog code can dynamically import the text editor library. The bundler then detects that dynamic import, splits the editor code into a separate chunk, and the chunk is only downloaded when the user opens that modal.
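A lazy-loaded feature might look something like the sketch below. So that the example runs on its own under Node, it uses the built-in `node:zlib` module as a stand-in for a large editor library. In a real app the import string would point at the editor package instead, and `openEditorDialog` is a hypothetical function name.

```javascript
let editorModule = null;

async function openEditorDialog() {
  if (!editorModule) {
    // `import()` returns a promise. A bundler that sees this call can
    // split the imported code into its own chunk.
    // "node:zlib" is only a stand-in for a heavy third-party library.
    editorModule = await import("node:zlib");
  }
  return editorModule;
}

// Nothing is loaded until the user actually opens the dialog
openEditorDialog().then(function (mod) {
  console.log(typeof mod.gzipSync); // function
});
```

Caching the result in `editorModule` is optional, since module loaders cache imports themselves, but it keeps the "load once, on demand" intent explicit.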
