Module: pageManager

Overview

This module addresses several challenges for studying user engagement with web content.

  • Syncing Measurements and Interventions. A study that uses WebScience will often involve multiple measurements or interventions on a webpage. The pageManager module enables studies to sync these measurements and interventions by assigning a random unique identifier to each webpage.
  • Generating Page Lifecycle Events. Measurements and interventions are often linked to specific events in the webpage lifecycle. The pageManager module standardizes a set of webpage lifecycle events.
  • Tracking User Attention. Measurements and interventions often depend on user attention to web content. The pageManager module provides a standardized attention model that incorporates tab switching, window switching, application switching, locked screens, and user mouse and keyboard input.
  • Generating Audio Events. This module provides events for webpage audio, enabling measurements and interventions based on media playback.
  • Bridging the Background and Content Script Environments. WebExtensions includes two distinct execution environments: background scripts and content scripts. These execution environments are, unfortunately, only loosely bound together by tab IDs. As a result, there can be race conditions: the background and content environments can have mismatched states, such that messages arrive at the wrong webpage or are attributed to the wrong webpage. This module provides page lifecycle, user attention, and audio events that are bound to specific webpages.

Pages

This module creates an abstraction over webpages as perceived by users (i.e., when content loads with a new HTTP(S) URL in the browser bar or the page visibly reloads). Note that the History API enables web content to modify the URL without loading a new HTML document via HTTP(S) or creating a new Document object. This module treats a URL change via the History API as equivalent to traditional webpage navigation, because (by design) it appears identical to the user. Accounting for the History API is important, because it is used on some exceptionally popular websites (e.g., YouTube).

Page IDs

Each page ID is a random (version 4) UUID, consistent with RFC 4122.
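
The source does not say how the module generates these IDs. As a minimal sketch of what a version 4 UUID looks like under RFC 4122, the hypothetical helper uuidV4FromBytes below formats 16 random bytes and sets the version and variant bits. In a browser, the bytes could come from crypto.getRandomValues(new Uint8Array(16)); where it is available, crypto.randomUUID() produces the same format directly.

```javascript
// Hypothetical helper for illustration; not part of pageManager.
// Formats 16 bytes as an RFC 4122 version 4 UUID string.
function uuidV4FromBytes(randomBytes) {
  if (randomBytes.length !== 16) {
    throw new RangeError("A UUID requires exactly 16 bytes");
  }
  // Copy so the caller's buffer is not modified.
  const bytes = Uint8Array.from(randomBytes);
  // Version field: the high nibble of byte 6 is 0100 (version 4).
  bytes[6] = (bytes[6] & 0x0f) | 0x40;
  // Variant field: the two high bits of byte 8 are 10.
  bytes[8] = (bytes[8] & 0x3f) | 0x80;
  const hex = Array.from(bytes, (b) => b.toString(16).padStart(2, "0")).join("");
  return [
    hex.slice(0, 8),
    hex.slice(8, 12),
    hex.slice(12, 16),
    hex.slice(16, 20),
    hex.slice(20),
  ].join("-");
}
```

Taking the bytes as a parameter keeps the formatting step deterministic; the randomness is left entirely to the caller.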

Page Lifecycle

Each webpage has the following lifecycle events, which fire in both the background page and content script environments.

  • Page Visit Start - The browser has started to load a webpage in a tab. This event is fired early in content script execution (i.e., soon after document_start). For a webpage with a new Document, the event is timestamped with the time the window object was created (the time origin from the High Resolution Time Level 2 API, in ms). For a webpage that does not have a new Document (i.e., resulting from the History API), the event is timestamped with the URL change in the WebNavigation API.
  • Page Visit Stop - The browser is unloading the webpage. Ordinarily this event fires and is timestamped with the window unload event. When the page changes via the History API, this event fires and is timestamped with the URL change in the WebNavigation API.
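
The two lifecycle events above can be sketched as a small tracker. This is an illustration under stated assumptions, not the module's implementation: createPageVisitTracker, its method names, and the event object shape are all hypothetical. The point it shows is that a URL change via the History API ends one page visit and starts another with a new page ID, both stamped with the same URL-change time.

```javascript
// Hypothetical sketch; names and event shapes are assumptions.
function createPageVisitTracker(generatePageId) {
  let currentPage = null; // { pageId, url } while a visit is in progress
  const events = [];

  function startVisit(url, timeStamp) {
    currentPage = { pageId: generatePageId(), url };
    events.push({ type: "pageVisitStart", ...currentPage, timeStamp });
  }

  function stopVisit(timeStamp) {
    if (currentPage === null) return; // nothing to stop
    events.push({ type: "pageVisitStop", ...currentPage, timeStamp });
    currentPage = null;
  }

  return {
    events,
    // New Document: timestamp is the time origin of the new window.
    onDocumentStart(url, timeOrigin) {
      startVisit(url, timeOrigin);
    },
    // Ordinary unload: timestamp is the time of the unload event.
    onUnload(timeStamp) {
      stopVisit(timeStamp);
    },
    // History API navigation: stop the old page and start a new one,
    // both stamped with the time of the URL change.
    onHistoryUrlChange(url, timeStamp) {
      stopVisit(timeStamp);
      startVisit(url, timeStamp);
    },
  };
}
```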

Attention Tracking

Attention to a page is defined as satisfying all of the following conditions.

  • The tab is the active tab in its browser window.
  • The window containing the tab is the current browser window.
  • The current browser window has focus in the operating system.
  • The operating system is not displaying a lock screen or screen saver.
  • Optional: The user has provided mouse or keyboard input within a specified time interval.
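
The conditions above amount to a single predicate over browser and operating system state. The following is a minimal sketch; the function name, the state field names, and the 15-second default threshold are assumptions made for illustration and are not taken from the module.

```javascript
// Hypothetical predicate; field names and defaults are assumptions.
function pageHasAttention(state, options = {}) {
  const { requireRecentInput = false, idleThresholdMs = 15000 } = options;
  const required =
    state.tabIsActive &&      // active tab in its browser window
    state.windowIsCurrent &&  // that window is the current browser window
    state.browserHasFocus &&  // the browser window has focus in the OS
    !state.screenLocked;      // no lock screen or screen saver
  if (!required) return false;
  // Optional condition: recent mouse or keyboard input.
  if (!requireRecentInput) return true;
  return state.msSinceLastInput <= idleThresholdMs;
}
```

Every required condition must hold at once, so losing any one of them (for example, switching tabs) is enough to end attention to the page.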

In the content script environment, each page has an attention status, and an event fires when that status changes. Attention update events are timestamped with events from the WebExtensions tabs, windows, and idle APIs.

Audio Events

In the content script environment, each page has an audio status, and an event fires when that status changes. Audio update events fire and are timestamped with events from the WebExtensions tabs API.

Event Ordering

This module guarantees the ordering of page lifecycle, attention, and audio events.

  • Page visit start and page visit stop only fire once for each page, in that order.
  • Page attention and audio update events will only occur between page visit start and stop events.
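
The two ordering guarantees above can be expressed as a small state machine. The sketch below checks a sequence of event type names for a single page; the function name and the event type strings are hypothetical labels chosen for this example.

```javascript
// Hypothetical validator; the event type strings are assumptions.
function isValidEventSequence(eventTypes) {
  let state = "notStarted"; // -> "started" -> "stopped"
  for (const type of eventTypes) {
    if (type === "pageVisitStart") {
      if (state !== "notStarted") return false; // start fires only once, first
      state = "started";
    } else if (type === "pageVisitStop") {
      if (state !== "started") return false; // stop fires only once, after start
      state = "stopped";
    } else if (type === "attentionUpdate" || type === "audioUpdate") {
      if (state !== "started") return false; // only between start and stop
    } else {
      return false; // unknown event type
    }
  }
  return true;
}
```

A check like this could be useful in a study's own tests to confirm that its listeners observe events in the order they expect.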

Additional Implementation Notes

This module depends on the idle API, which has a couple of quirks in Firefox:

  • There is a five-second interval when polling idle status from the operating system.
  • Depending on the platform, the idle API reports either time since user input to the browser or time since user input to the operating system.

The polling interval coarsens the timing of page attention events related to idle state. As long as the polling interval is relatively short in comparison to the idle threshold, that should not be an issue.
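
To make the coarsening concrete, here is a simplified model that assumes polls happen at exact multiples of the polling interval; real polling times will differ, and the function name and the example values are hypothetical. Under this model, the idle transition is observed at the first poll at or after the moment the idle threshold is actually crossed, so the delay is always less than one polling interval.

```javascript
// Simplified model for illustration; assumes polls at t = 0, interval, 2 * interval, ...
function observedIdleTransition(lastInputMs, idleThresholdMs, pollIntervalMs) {
  // Moment at which the user has actually been idle for the threshold.
  const actual = lastInputMs + idleThresholdMs;
  // First poll at or after that moment.
  const observed = Math.ceil(actual / pollIntervalMs) * pollIntervalMs;
  return { actual, observed, delay: observed - actual };
}
```

For example, with last input at 2,000 ms, a hypothetical 15,000 ms threshold, and a 5,000 ms polling interval, the threshold is crossed at 17,000 ms and observed at 20,000 ms, a delay of 3,000 ms.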

The platform-specific meaning of idle state should also not be an issue. There is only a difference between the two meanings of idle state when the user is providing input to another application; if the user is providing input to the browser, or is not providing input at all, the two meanings are identical. In the scenario where the user is providing input to another application, the browser will lose focus in the operating system; this module will detect that with the windows API and fire a page attention event (if needed).

Some implementation quirks to be aware of for future development on this module:

  • Non-browser windows do not appear in the results of windows.getAll(), and calling windows.get() on a non-browser window throws an error. Switching focus to a non-browser window will, however, fire the windows.onFocusChanged event. The module assumes that if windows.onFocusChanged fires with an unknown window, that window is a non-browser window.
  • The module assumes that valid tab IDs and window IDs are always >= 0.
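
The two assumptions above can be sketched as a classification step for a focus change. The helper name and return values are hypothetical, and the set of known window IDs is assumed to be maintained elsewhere (for example, from windows.getAll() results). Negative IDs cover windows.WINDOW_ID_NONE, which is -1.

```javascript
// Hypothetical helper; not the module's actual logic.
function classifyFocusedWindow(windowId, knownBrowserWindowIds) {
  // Valid window IDs are assumed to be >= 0; WINDOW_ID_NONE is -1.
  if (windowId < 0) return "none";
  // An ID that is valid but unknown is assumed to be a non-browser window.
  return knownBrowserWindowIds.has(windowId) ? "browser" : "non-browser";
}
```

In either the "none" or "non-browser" case, no browser window has focus, so no page would satisfy the attention conditions.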

Known Issues

  • The background script sends update messages to tabs regardless of whether they are ordinary tabs or have the pageManager content script running, because the background script does not track window types or tab content. The errors generated by this issue are caught in messaging.sendMessageToTab, and the issue should not cause any problems for studies.

Possible Improvements

  • Rebuild a page attention update event in the background page environment.
  • Rebuild the capability to fire events for pages that are already open when the module loads.
  • Add logic to handle the situation where the content script execution environment crashes, so the page visit stop message doesn't fire from the associated content script.
  • Add an event in the content script for detecting when content has lazily loaded into the DOM after the various DOM loading events (e.g., on Twitter).

Namespaces

onPageVisitStart
onPageVisitStop

Methods

(static) initialize()

Initialize pageManager in the background and content script environments. If you are using pageManager events in content scripts but not background scripts, you must call this function. If you are using pageManager events in background scripts, this function is automatically called when adding a listener for an event. This function configures message passing between the pageManager background script and content script, registers browser event handlers, caches initial state, and registers the pageManager content script. It runs only once.
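
The run-once behavior and the automatic call when a listener is added can be sketched with a simple guard. This is a generic pattern shown under assumptions, not the pageManager source: createExampleModule, getSetupCount, and the internal structure are hypothetical, and the real setup work (message passing, browser event handlers, content script registration) is replaced by a counter.

```javascript
// Hypothetical sketch of a run-once initializer with lazy initialization.
function createExampleModule() {
  let initialized = false;
  let setupCount = 0;
  const listeners = new Set();

  function initialize() {
    if (initialized) return; // later calls are no-ops
    initialized = true;
    setupCount += 1; // stands in for the real setup work
  }

  const onPageVisitStart = {
    addListener(listener) {
      initialize(); // adding a listener initializes the module if needed
      listeners.add(listener);
    },
  };

  return { initialize, onPageVisitStart, getSetupCount: () => setupCount };
}
```

With this pattern, calling initialize() explicitly and then adding a listener (or the reverse) still performs the setup exactly once.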
