Module: linkResolution

This module provides functionality for resolving shortened and shimmed URLs.

Members

(static, constant) ampMatchPatternSet :matching.MatchPatternSet

A MatchPatternSet for AMP caches and viewers.

Type:
  • matching.MatchPatternSet

(static, constant) ampRegExp :RegExp

A RegExp that matches and parses AMP cache and viewer URLs. If there is a match, the RegExp provides several named capture groups.

  • AMP Cache Matches
    • ampCacheSubdomain - The subdomain, which should be either a reformatted version of the URL domain or a hash of the domain. If there is no subdomain, this capture group is undefined.
    • ampCacheDomain - The domain for the AMP cache.
    • ampCacheContentType - The content type, which is either c for an HTML document, i for an image, or r for another resource.
    • ampCacheIsSecure - Whether the AMP cache loads the resource via HTTPS. If it does, this capture group has the value s/. If it doesn't, this capture group is undefined.
    • ampCacheUrl - The underlying URL, without a specified scheme (i.e., http:// or https://).
  • AMP Viewer Matches
    • ampViewerDomainAndPath - The domain and path for the AMP viewer.
    • ampViewerUrl - The underlying URL, without a specified scheme (i.e., http:// or https://).
Type:
  • RegExp
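Here is a minimal sketch that makes the capture groups concrete. It uses a hypothetical, simplified pattern (simplifiedAmpCacheRegExp) that recognizes only AMP cache URLs on cdn.ampproject.org. It is not the module's actual ampRegExp, which also covers AMP viewer URLs. The hypothetical parseAmpCacheUrlSketch function shows how the ampCacheIsSecure and ampCacheUrl groups could be combined to rebuild the underlying URL.

```javascript
// Hypothetical, simplified pattern for illustration only: it recognizes AMP
// cache URLs on cdn.ampproject.org and no AMP viewer URLs.
const simplifiedAmpCacheRegExp =
  /^https?:\/\/(?:(?<ampCacheSubdomain>[a-z0-9-]+)\.)?(?<ampCacheDomain>cdn\.ampproject\.org)\/(?<ampCacheContentType>[cir])\/(?<ampCacheIsSecure>s\/)?(?<ampCacheUrl>.+)$/i;

// Rebuild the underlying URL from the named capture groups, or return the
// input unchanged if it does not look like an AMP cache URL.
function parseAmpCacheUrlSketch(url) {
  const match = simplifiedAmpCacheRegExp.exec(url);
  if (match === null) {
    return url;
  }
  const { ampCacheIsSecure, ampCacheUrl } = match.groups;
  // ampCacheIsSecure is "s/" when the cache loads the resource via HTTPS
  // and undefined otherwise.
  const scheme = ampCacheIsSecure === undefined ? "http://" : "https://";
  return scheme + ampCacheUrl;
}

const exampleAmpMatch = simplifiedAmpCacheRegExp.exec(
  "https://www-example-com.cdn.ampproject.org/c/s/www.example.com/article"
);
console.log(exampleAmpMatch.groups.ampCacheSubdomain); // "www-example-com"
console.log(exampleAmpMatch.groups.ampCacheContentType); // "c"
console.log(
  parseAmpCacheUrlSketch(
    "https://www-example-com.cdn.ampproject.org/c/s/www.example.com/article"
  )
); // "https://www.example.com/article"
```

When the URL has no subdomain (for example, https://cdn.ampproject.org/i/www.example.com/logo.png), the ampCacheSubdomain group is undefined. Because s/ is also absent, the sketch falls back to http://.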

(static, constant) facebookLinkShimRegExp :RegExp

A RegExp for matching URLs that have had Facebook's link shim applied.

Type:
  • RegExp

(static, constant) urlShortenerMatchPatternSet :matching.MatchPatternSet

A matching.MatchPatternSet for known URL shorteners, based on the match patterns loaded from urlShortenerMatchPatterns.js.

Type:
  • matching.MatchPatternSet

(static, constant) urlShortenerRegExp :RegExp

A RegExp for known URL shorteners, based on the match patterns loaded from urlShortenerMatchPatterns.js.

Type:
  • RegExp
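urlShortenerRegExp is built from the match patterns loaded from urlShortenerMatchPatterns.js. The sketch below shows one way that match patterns can be compiled into a single RegExp. It is not the module's implementation. The helper names and the two sample patterns are hypothetical, and the conversion handles only a simplified subset of WebExtension match-pattern syntax: the *, http, and https schemes, exact or *.-prefixed hosts, and * wildcards in the path.

```javascript
// Escape characters that have special meaning in a RegExp.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Convert one simplified match pattern ("<scheme>://<host><path>") into
// RegExp source. Only the "*", "http", and "https" schemes are supported.
function matchPatternToRegExpSource(pattern) {
  const parts = /^(\*|https?):\/\/(\*|(?:\*\.)?[^\/*]+)(\/.*)$/.exec(pattern);
  if (parts === null) {
    throw new Error(`Unsupported match pattern: ${pattern}`);
  }
  const [, scheme, host, path] = parts;
  const schemeSource = scheme === "*" ? "https?" : scheme;
  let hostSource;
  if (host === "*") {
    hostSource = "[^/]+"; // any host
  } else if (host.startsWith("*.")) {
    // the host itself or any of its subdomains
    hostSource = "(?:[^/]+\\.)?" + escapeRegExp(host.slice(2));
  } else {
    hostSource = escapeRegExp(host); // exactly this host
  }
  // "*" in the path matches any sequence of characters.
  const pathSource = path.split("*").map(escapeRegExp).join(".*");
  return `${schemeSource}://${hostSource}${pathSource}`;
}

// Combine several patterns into one anchored RegExp.
function matchPatternsToRegExp(patterns) {
  return new RegExp(`^(?:${patterns.map(matchPatternToRegExpSource).join("|")})$`);
}

// Hypothetical sample patterns, not the contents of urlShortenerMatchPatterns.js.
const shortenerRegExp = matchPatternsToRegExp(["*://bit.ly/*", "*://*.t.co/*"]);
console.log(shortenerRegExp.test("https://bit.ly/3abcDEF")); // true
console.log(shortenerRegExp.test("https://www.example.com/")); // false
```

The ^ and $ anchors prevent a match on a shortener URL that merely appears inside another URL, such as in a query parameter.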

Methods

(static) initialize()

Initialize the module by registering event listeners for resolveUrl and the built-in content scripts that parse and register URL mappings (currently for Twitter and Google News). Initialization runs only once. resolveUrl calls this function automatically, but you can call it separately if you want to use registered URL mappings without calling resolveUrl.
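The run-once behavior can be illustrated with a module-level flag. This is a minimal sketch, not the module's implementation. initializeSketch, registerListenersSketch, and resolveUrlSketch are hypothetical names, and a counter stands in for the real registration of listeners and content scripts.

```javascript
// Hypothetical module-level state, for illustration only.
let initialized = false;
let listenerRegistrationCount = 0;

// Stand-in for registering event listeners and built-in content scripts.
function registerListenersSketch() {
  listenerRegistrationCount += 1;
}

function initializeSketch() {
  if (initialized) {
    return; // Already initialized: later calls do nothing.
  }
  // Set the flag first so that a re-entrant call cannot register twice.
  initialized = true;
  registerListenersSketch();
}

// A resolveUrl-style entry point can call initializeSketch() unconditionally.
function resolveUrlSketch(url) {
  initializeSketch();
  return url; // Actual resolution is omitted from this sketch.
}

initializeSketch();
resolveUrlSketch("https://www.example.com/");
initializeSketch();
console.log(listenerRegistrationCount); // 1
```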

(static) parseAmpUrl(url) → {string}

Parse the underlying URL from an AMP cache or viewer URL, if the URL is an AMP cache or viewer URL.

Parameters:
  • url (string) - A URL that may be an AMP cache or viewer URL.

Returns:

If the URL is an AMP cache or viewer URL, the parsed underlying URL. Otherwise, the URL unchanged.

Type
string

(static) parseFacebookLinkShim(url) → {string}

Parse a URL from Facebook's link shim, if the shim was applied to the URL.

Parameters:
  • url (string) - A URL that may have Facebook's link shim applied.

Returns:

If Facebook's link shim was applied to the URL, the unshimmed URL. Otherwise, the URL unchanged.

Type
string
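Here is a minimal sketch of unshimming. It assumes the shim takes the form https://l.facebook.com/l.php?u=<percent-encoded destination>&h=<token>, along with an lm.facebook.com variant. The module itself matches shimmed URLs with facebookLinkShimRegExp, and that exact pattern is not documented here. FACEBOOK_SHIM_HOSTS and parseFacebookLinkShimSketch are hypothetical names.

```javascript
// Hosts assumed to serve Facebook's link shim (hypothetical list for this sketch).
const FACEBOOK_SHIM_HOSTS = new Set(["l.facebook.com", "lm.facebook.com"]);

function parseFacebookLinkShimSketch(url) {
  let parsed;
  try {
    parsed = new URL(url);
  } catch {
    return url; // Not a valid absolute URL; leave it unchanged.
  }
  if (!FACEBOOK_SHIM_HOSTS.has(parsed.hostname) || parsed.pathname !== "/l.php") {
    return url; // No shim applied.
  }
  // URLSearchParams.get() percent-decodes the destination URL.
  const destination = parsed.searchParams.get("u");
  return destination === null ? url : destination;
}

console.log(
  parseFacebookLinkShimSketch(
    "https://l.facebook.com/l.php?u=https%3A%2F%2Fwww.example.com%2Fpage%3Fid%3D1&h=AT0abc"
  )
); // "https://www.example.com/page?id=1"
```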

(static) registerUrlMappings(urlMappings, [pageId]) → {RegisteredUrlMappings}

Register known URL mappings for use in link resolution. This functionality allows studies to minimize HTTP requests for link resolution when a URL mapping can be parsed from page content.

Parameters:
  • urlMappings (Array.<UrlMapping>) - The URL mappings to register.
  • pageId (string, optional, default null) - The ID of the page that the URL mappings were parsed from. If a page ID is provided, the mappings will be automatically removed shortly after the page visit ends.

Returns:

An object that allows unregistering the URL mappings.

Type
RegisteredUrlMappings
Example
// A content script parses URL mappings from a Twitter page, then in the background script:
webScience.linkResolution.registerUrlMappings([
  {
    sourceUrl: "https://t.co/djogkKUD5y?amp=1",
    destinationUrl: "https://researchday.princeton.edu/",
    ignoreSourceUrlParameters: true
  },
  // Note that the following mapping involves a known URL shortener and would require further resolution
  {
    sourceUrl: "https://t.co/qQTRITLZKP?amp=1",
    destinationUrl: "https://mzl.la/3jh1VgZ",
    ignoreSourceUrlParameters: true
  }
]);
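The following sketch shows one way that registered mappings, the ignoreSourceUrlParameters flag, and the unregister() object described in RegisteredUrlMappings could fit together. It is an in-memory illustration, not the module's implementation. registerUrlMappingsSketch, applyUrlMappingsSketch, stripParameters, and the two Map-based stores are hypothetical. The optional pageId is not modeled. The sketch also assumes that ignoring parameters means comparing URLs without their query string and fragment.

```javascript
// Hypothetical in-memory stores, for illustration only.
const exactMappings = new Map(); // full source URL -> destination URL
const parameterlessMappings = new Map(); // source URL without query/fragment -> destination URL

// Assumption: ignoring parameters means comparing origin and path only.
function stripParameters(url) {
  const parsed = new URL(url);
  return parsed.origin + parsed.pathname;
}

function registerUrlMappingsSketch(urlMappings) {
  const registered = [];
  for (const { sourceUrl, destinationUrl, ignoreSourceUrlParameters } of urlMappings) {
    const store = ignoreSourceUrlParameters ? parameterlessMappings : exactMappings;
    const key = ignoreSourceUrlParameters ? stripParameters(sourceUrl) : sourceUrl;
    store.set(key, destinationUrl);
    registered.push([store, key]);
  }
  // Same shape as RegisteredUrlMappings: an object with an unregister() method.
  return {
    unregister() {
      for (const [store, key] of registered) {
        store.delete(key);
      }
    },
  };
}

// Apply a registered mapping if one matches; otherwise return the URL unchanged.
function applyUrlMappingsSketch(url) {
  if (exactMappings.has(url)) {
    return exactMappings.get(url);
  }
  const key = stripParameters(url);
  return parameterlessMappings.has(key) ? parameterlessMappings.get(key) : url;
}

const registration = registerUrlMappingsSketch([
  {
    sourceUrl: "https://t.co/djogkKUD5y?amp=1",
    destinationUrl: "https://researchday.princeton.edu/",
    ignoreSourceUrlParameters: true,
  },
]);
console.log(applyUrlMappingsSketch("https://t.co/djogkKUD5y?s=20"));
// "https://researchday.princeton.edu/"
registration.unregister();
console.log(applyUrlMappingsSketch("https://t.co/djogkKUD5y?s=20"));
// "https://t.co/djogkKUD5y?s=20"
```

Because the sketch keys mappings by URL, a later registration for the same source URL overwrites an earlier one, and unregistering either registration removes the entry. A fuller implementation would need to track which registration owns each mapping.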

(static) removeFacebookLinkDecoration(url) → {string}

Remove Facebook link decoration (the fbclid parameter) from a URL, if present.

Parameters:
  • url (string) - A URL that may have Facebook link decoration.

Returns:

The URL without Facebook link decoration.

Type
string
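Removing the fbclid parameter can be sketched with the standard URL and URLSearchParams APIs. removeFacebookLinkDecorationSketch is a hypothetical name, and this is not necessarily how the module implements removeFacebookLinkDecoration.

```javascript
function removeFacebookLinkDecorationSketch(url) {
  let parsed;
  try {
    parsed = new URL(url);
  } catch {
    return url; // Not a valid absolute URL; leave it unchanged.
  }
  if (!parsed.searchParams.has("fbclid")) {
    return url; // No decoration; return the input unchanged.
  }
  parsed.searchParams.delete("fbclid");
  return parsed.href;
}

console.log(
  removeFacebookLinkDecorationSketch("https://www.example.com/article?id=7&fbclid=IwAR0abc")
); // "https://www.example.com/article?id=7"
```

When fbclid is present, this approach re-serializes the query string. That can normalize the encoding of other parameters (for example, a space encoded as %20 becomes +). Returning the input untouched when fbclid is absent avoids rewriting URLs that need no change.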

(static) resolveUrl(url, [options]) → {Promise.<string>}

Resolve a shortened or shimmed URL to its underlying destination URL by recursively resolving the URL and removing shims.

Parameters:
  • url (string) - The URL to resolve.
  • options (Object, optional) - Options for resolving the URL, with the following properties:
    • parseAmpUrl (boolean, optional, default true) - If the resolved URL or the original URL is an AMP URL, parse it. See parseAmpUrl for details.
    • parseFacebookLinkShim (boolean, optional, default true) - If the resolved URL or the original URL has Facebook's link shim applied, parse it. See parseFacebookLinkShim for details.
    • removeFacebookLinkDecoration (boolean, optional, default true) - If the resolved URL or the original URL has Facebook link decoration, remove it. See removeFacebookLinkDecoration for details.
    • applyRegisteredUrlMappings (boolean, optional, default true) - If the original URL matches a registered URL mapping, apply the mapping. See registerUrlMappings for details.
    • request (string, optional, default "known_shorteners") - Whether to issue HTTP requests to resolve the URL, following HTTP 3xx redirects. Valid values are "always", "known_shorteners" (issue a request only if the original URL or a redirection target URL matches a known URL shortener), and "never". Setting this value to "always" could have performance implications, since it requires completely loading the destination URL.

Returns:

A Promise that is fulfilled with the resolved URL or rejected with an error.

Type
Promise.<string>
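The request option determines when resolveUrl issues HTTP requests while following 3xx redirects. The sketch below models only that decision and the resulting redirect chain. It is synchronous for simplicity, whereas resolveUrl returns a Promise. The shortener list, the redirect table, and the function names (isKnownShortenerSketch, shouldRequestSketch, followRedirectsSketch) are hypothetical. The lookupRedirect callback stands in for real HTTP requests.

```javascript
// Hypothetical shortener list standing in for the module's match patterns.
const KNOWN_SHORTENER_HOSTS = new Set(["bit.ly", "t.co"]);

function isKnownShortenerSketch(url) {
  return KNOWN_SHORTENER_HOSTS.has(new URL(url).hostname);
}

// Decide whether to issue a request for `url` under the given request mode.
function shouldRequestSketch(url, requestMode) {
  switch (requestMode) {
    case "always":
      return true;
    case "known_shorteners":
      return isKnownShortenerSketch(url);
    case "never":
      return false;
    default:
      throw new Error(`Invalid request value: ${requestMode}`);
  }
}

// Follow redirects using `lookupRedirect`, which returns the redirect target
// for a URL, or null if the URL does not redirect. The hop limit guards
// against redirect loops.
function followRedirectsSketch(url, requestMode, lookupRedirect, maxHops = 10) {
  let current = url;
  for (let hop = 0; hop < maxHops; hop++) {
    if (!shouldRequestSketch(current, requestMode)) {
      return current;
    }
    const next = lookupRedirect(current);
    if (next === null) {
      return current;
    }
    current = next;
  }
  throw new Error(`Exceeded ${maxHops} redirects`);
}

// Hypothetical redirect table: a t.co link that redirects to a bit.ly link.
const redirectTable = new Map([
  ["https://t.co/abc123", "https://bit.ly/xyz789"],
  ["https://bit.ly/xyz789", "https://www.example.com/article"],
]);
const lookupRedirect = (url) => redirectTable.get(url) ?? null;

console.log(followRedirectsSketch("https://t.co/abc123", "known_shorteners", lookupRedirect));
// "https://www.example.com/article"
console.log(followRedirectsSketch("https://t.co/abc123", "never", lookupRedirect));
// "https://t.co/abc123"
```

Under "known_shorteners", the chain stops at the first URL that is not a known shortener. Under "always", it continues until a URL no longer redirects. This matches the performance caveat above, since every URL in the chain is requested.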

(static) urlToPS1(url) → {string}

Extract the public suffix + 1 (the public suffix plus one more domain label) from a URL.

Parameters:
  • url (string) - The URL.

Returns:

The public suffix + 1.

Type
string
Example

Example usage of urlToPS1.

// returns "mozilla.org"
webScience.linkResolution.urlToPS1("https://www.mozilla.org/");
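Here is a rough sketch of how a public suffix + 1 can be computed. It finds the longest public suffix at the end of the hostname, then keeps one more label. The suffix set is a tiny, hypothetical subset of the Public Suffix List, and urlToPS1Sketch is a hypothetical name. A real implementation needs the full list, including its wildcard and exception rules.

```javascript
// Tiny, hypothetical subset of the Public Suffix List, for illustration only.
const PUBLIC_SUFFIXES_SAMPLE = new Set(["com", "org", "net", "uk", "co.uk", "github.io"]);

function urlToPS1Sketch(url) {
  const labels = new URL(url).hostname.split(".");
  // Scan from the longest candidate suffix to the shortest, so the first
  // match is the longest public suffix that ends the hostname.
  for (let i = 0; i < labels.length; i++) {
    const suffix = labels.slice(i).join(".");
    if (PUBLIC_SUFFIXES_SAMPLE.has(suffix)) {
      // Keep one more label; if the hostname is itself a public suffix,
      // return it as-is.
      return labels.slice(Math.max(i - 1, 0)).join(".");
    }
  }
  // No listed suffix matched: treat the last label as the suffix, as the
  // list's default "*" rule does.
  return labels.slice(-2).join(".");
}

console.log(urlToPS1Sketch("https://www.mozilla.org/")); // "mozilla.org"
console.log(urlToPS1Sketch("https://news.bbc.co.uk/")); // "bbc.co.uk"
```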

Type Definitions

RegisteredUrlMappings

Type:
  • Object
Properties:
  • unregister (function) - Unregister the URL mappings.

UrlMapping

Type:
  • Object
Properties:
  • sourceUrl (string) - The source URL for the mapping.
  • destinationUrl (string) - The destination URL for the mapping.
  • ignoreSourceUrlParameters (boolean) - Whether to ignore URL parameters when matching URLs against the source URL.
