Module: pageTransition

This module enables observing webpage transitions, synthesizing a range of transition data that may be valuable for browser-based studies. See the onPageTransitionData event for details.

Types of Page Transition Data

This module supports several types of page transition data. Some types are supported and recommended, because the data is consistently available, has consistent meaning, and reflects discrete categories of user interactions. Other types of transition data are supported because they appear in prior academic literature, but we do not recommend them because of significant limitations.

  • Supported and Recommended Types of Page Transition Data
    • WebExtensions Transitions - This module reports the same webpage transition data provided by the WebExtensions webNavigation API. There are two types of transition data: TransitionType (e.g., "link" or "typed") and TransitionQualifier (e.g., "from_address_bar" or "forward_back"). Note that Firefox's support for these values is mostly but not entirely complete, and Firefox defaults to a "link" transition type. The MDN documentation for Firefox's implementation is also currently out of date; see https://github.com/mdn/browser-compat-data/issues/9019. We recommend checking click transition data to confirm whether the user clicked on a link.
    • Tab-based Transitions - This module reports the webpage that was previously loaded in a new webpage's tab. If the webpage is loading in a newly created tab, this module reports the webpage that was open in the opener tab. We recommend using tab-based transition data when the user has clicked a link (according to both WebExtensions and click data), when the user has navigated with forward and back buttons, and when the page has refreshed (due to user action or automatically). In these situations, there is a clear causal relationship between the previous and current pages. We do not otherwise recommend using tab-based transition data, because the user might be reusing a tab for reasons unrelated to the page loaded in the tab.
    • Click Transitions - This module reports when a click on a webpage is immediately followed by a new webpage loading in the same tab (or in a newly opened tab where that tab is the opener). This activity indicates the user likely clicked a link, and it compensates for limitations in how browsers detect link clicks for the webNavigation API.
  • Supported But Not Recommended Types of Page Transition Data
    • Referrers - This module reports the HTTP referrer for each new page. While referrers have long been a method for associating webpage loads with prior pages, they are not consistently available (webpages and browsers are increasingly limiting when referrers are sent), do not have consistent content (similarly, webpages and browsers are increasingly limiting referrers to just origins), and do not have consistent meaning (the rules for setting referrers are notoriously complex and can have nonintuitive semantics). Be especially careful with referrers for webpage loads via the History API: because there is no new document-level HTTP request, the referrer will not change when the URL changes.
    • Time-based Transitions - This module reports the most recent webpage that loaded in any tab. We do not recommend relying on this data, because a chronological ordering of webpage loads may have no relation to user activity or perception (e.g., a webpage might automatically reload in the background before a user navigates to a new page).
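
The recommendation for tab-based transition data can be expressed as a small predicate. The following is a minimal sketch, not part of this module's API: the function name and the clickDetected parameter are hypothetical, and it assumes that a page refresh surfaces as the "reload" transition type. The transition type and qualifier strings are the standard webNavigation values.

```javascript
// Hypothetical helper, not part of the pageTransition module.
// transitionType and transitionQualifiers hold webNavigation values;
// clickDetected stands in for this module's click transition data.
function isTabSourceRecommended({
  transitionType,
  transitionQualifiers = [],
  clickDetected = false
}) {
  // Forward and back navigation: the previous page in the tab led here.
  if (transitionQualifiers.includes("forward_back")) {
    return true;
  }
  // Refresh: assumed here to be reported as the "reload" transition type.
  if (transitionType === "reload") {
    return true;
  }
  // Link click: require WebExtensions data and click data to agree, because
  // Firefox falls back to "link" when it has no better value.
  return transitionType === "link" && clickDetected;
}
```

Requiring both signals for link clicks means a "link" transition type with no accompanying click is treated as unconfirmed, which matches the caution above about Firefox's default.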

Page Transition Data Sources

This module builds on the page tracking provided by the pageManager module and uses browser events, DOM events, and a set of heuristics to associate transition information with each page visit. The module relies on the following sources of data about page transitions, in addition to the page visit tracking, attention tracking, and URL normalization provided by pageManager:

  • Background Script Data Sources
    • webNavigation.onCommitted - provides tab ID, url, webNavigation.TransitionType, and webNavigation.TransitionQualifier values when a new page is loading in a tab.
    • webNavigation.onDOMContentLoaded - provides tab ID, url, and a timestamp approximating when the DOMContentLoaded event fired on a page.
    • webNavigation.onHistoryStateUpdated - provides tab ID, url, webNavigation.TransitionType, and webNavigation.TransitionQualifier values when a new page loads in a tab via the History API.
    • webNavigation.onCreatedNavigationTarget - provides tab ID, source tab ID, and url when a page loads in a tab newly created by another tab. Because of a regression, this event does not currently fire in Firefox for a click on a link with the target="_blank" attribute.
    • tabs.onCreated - provides tab ID and source tab ID when a page loads in a tab newly created by another tab, except if the new tab is in a different window.
  • Content Script Data Sources
    • The click event on the document element - detects possible link clicks via the mouse (e.g., left click).
    • The contextmenu event on the document element - detects possible link clicks via the mouse (e.g., right click or control + click).
    • The keyup event on the document element - detects possible link clicks via the keyboard.
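
As an illustration of the background script data sources, the sketch below registers a listener for each event and forwards the fields listed above. It is not this module's implementation: listenForTransitionSources and report are hypothetical names, and restricting to top-level frames (frameId 0) is an assumption. The event names and detail properties are those of the WebExtensions webNavigation and tabs APIs.

```javascript
// Sketch only. `api` is expected to expose the WebExtensions webNavigation and
// tabs namespaces (in an extension, pass the global `browser` object).
// `report` is a hypothetical callback that receives one record per event.
function listenForTransitionSources(api, report) {
  // Assumption: only top-level frames (frameId 0) matter for page transitions.
  const topFrameOnly = (handler) => (details) => {
    if (details.frameId === 0) {
      handler(details);
    }
  };

  api.webNavigation.onCommitted.addListener(topFrameOnly((details) => report({
    source: "onCommitted",
    tabId: details.tabId,
    url: details.url,
    transitionType: details.transitionType,
    transitionQualifiers: details.transitionQualifiers
  })));

  api.webNavigation.onDOMContentLoaded.addListener(topFrameOnly((details) => report({
    source: "onDOMContentLoaded",
    tabId: details.tabId,
    url: details.url,
    timeStamp: details.timeStamp
  })));

  api.webNavigation.onHistoryStateUpdated.addListener(topFrameOnly((details) => report({
    source: "onHistoryStateUpdated",
    tabId: details.tabId,
    url: details.url,
    transitionType: details.transitionType,
    transitionQualifiers: details.transitionQualifiers
  })));

  // This event describes the new tab (tabId) and the tab that opened it.
  api.webNavigation.onCreatedNavigationTarget.addListener((details) => report({
    source: "onCreatedNavigationTarget",
    tabId: details.tabId,
    sourceTabId: details.sourceTabId,
    url: details.url
  }));

  // tabs.onCreated passes a Tab object; openerTabId may be undefined.
  api.tabs.onCreated.addListener((tab) => report({
    source: "tabs.onCreated",
    tabId: tab.id,
    openerTabId: tab.openerTabId
  }));
}

// In a background script this might be called as:
//   listenForTransitionSources(browser, (record) => console.log(record));
```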

Combining Data Sources into a Page Transition

Merging these data sources into a page transition event poses several challenges.

  • We have to sync background script webNavigation events with content scripts. As with pageManager, we have to account for possible race conditions between the background script and content script environments. We use the same general approach as pageManager, converting background script events into messages posted to content scripts. We have to be more careful about race conditions here than in pageManager, though. If a tab property event handled in pageManager goes to the wrong content script, the consequences are minimal, because correct event data will quickly arrive afterward. In this module, by contrast, an error could mean incorrectly associating a pair of pages. We therefore also match the webNavigation URL and DOMContentLoaded timestamp against the content script's URL and DOMContentLoaded timestamp.
  • We have to sync background script webNavigation events for different stages in the webpage loading lifecycle, because we want properties of both webNavigation.onCommitted and webNavigation.onDOMContentLoaded: the former has transition types and qualifiers, while the latter has a timestamp that is comparable to an event in the content script and does not have the risk of firing before the content script is ready to receive messages. Unlike webRequest events, webNavigation events are not associated with unique identifiers. We accomplish syncing across events by assuming that when the webNavigation.onDOMContentLoaded event fires for a tab, it is part of the same navigation lifecycle as the most recent webNavigation.onCommitted event in the tab.
  • We have to sync content script data for a page with content script data for a prior page (either loaded in the same tab, loaded in an opener tab, or loaded immediately before in time). We accomplish this for ordinary page loads by maintaining a cache of page visit data in the background script. We accomplish this for History API page loads by passing information in the content script environment.
  • We have to account for a regression in Firefox where webNavigation.onCreatedNavigationTarget does not currently fire for a click on a link with the target="_blank" attribute. We accomplish this by using tabs.onCreated event data when webNavigation.onCreatedNavigationTarget event data is not available.
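
The syncing steps described in this list can be sketched as follows. This is an illustrative sketch under stated assumptions, not the module's actual code: the class name, method names, record shape, and page.domContentLoadedTime property are hypothetical, and the timestamp tolerance is an assumed value. It pairs each onDOMContentLoaded event with the most recent onCommitted event in the same tab, prefers onCreatedNavigationTarget for the opener tab with tabs.onCreated as the fallback, and checks a background update against a content script's URL and DOMContentLoaded time.

```javascript
// Hypothetical sketch; names are illustrative, not the module's internals.
class NavigationSync {
  constructor() {
    this.lastCommitted = new Map();              // tabId -> latest onCommitted data
    this.openerFromNavigationTarget = new Map(); // tabId -> sourceTabId
    this.openerFromTabCreated = new Map();       // tabId -> openerTabId
  }

  onCommitted({ tabId, transitionType, transitionQualifiers }) {
    this.lastCommitted.set(tabId, { transitionType, transitionQualifiers });
  }

  onCreatedNavigationTarget({ tabId, sourceTabId }) {
    this.openerFromNavigationTarget.set(tabId, sourceTabId);
  }

  onTabCreated({ id, openerTabId }) {
    if (openerTabId !== undefined) {
      this.openerFromTabCreated.set(id, openerTabId);
    }
  }

  // Returns a combined record, or null if no onCommitted was seen for the tab.
  onDOMContentLoaded({ tabId, url, timeStamp }) {
    const committed = this.lastCommitted.get(tabId);
    if (committed === undefined) {
      return null;
    }
    // Prefer onCreatedNavigationTarget; fall back to tabs.onCreated.
    let openerTabId = null;
    if (this.openerFromNavigationTarget.has(tabId)) {
      openerTabId = this.openerFromNavigationTarget.get(tabId);
    } else if (this.openerFromTabCreated.has(tabId)) {
      openerTabId = this.openerFromTabCreated.get(tabId);
    }
    return { tabId, url, timeStamp, openerTabId, ...committed };
  }
}

// Content script side: accept a background update only if it plausibly
// describes this page. The 500 ms tolerance is an assumed value.
function matchesContentScriptPage(update, page, toleranceMs = 500) {
  return update.url === page.url &&
    Math.abs(update.timeStamp - page.domContentLoadedTime) <= toleranceMs;
}
```

Keying the cache by tab ID reflects the fact that webNavigation events carry no unique navigation identifier, so the most recent onCommitted event in a tab is the best available match.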

Namespaces

onPageTransitionData