API reference

exception lacuscore.CaptureError
class lacuscore.CaptureResponse

A capture made by Lacus, with the base64 encoded image and downloaded file decoded to bytes.

class lacuscore.CaptureResponseJson

A capture made by Lacus, with the base64 encoded image and downloaded file left undecoded.

class lacuscore.CaptureStatus(*values)

The status of the capture

class lacuscore.LacusCore(redis_connector: Redis[bytes], /, *, max_capture_time: int = 3600, expire_results: int = 36000, tor_proxy: dict[str, str] | str | None = None, i2p_proxy: dict[str, str] | str | None = None, only_global_lookups: bool = True, max_retries: int = 3, headed_allowed: bool = False, remote_headed_allowed: bool = False, remote_headed_backend_type: str | None = None, tt_settings: TrustedTimestampSettings | None = None, loglevel: str | int = 'INFO')

Capture URLs or web enabled documents using PlaywrightCapture.

Parameters:
  • redis_connector – Pre-configured connector to a redis instance.

  • max_capture_time – Maximum duration of a capture, in seconds. A capture that takes longer is aborted.

  • expire_results – The capture results are stored in redis. Expire them after they are done (in seconds).

  • tor_proxy – URL to a SOCKS5 tor proxy. If you have tor installed, this is the default: socks5://127.0.0.1:9050.

  • i2p_proxy – URL to an HTTP I2P proxy. If you have i2p installed, this is the default: http://127.0.0.1:4444.

  • only_global_lookups – Discard captures that point to non-public IPs.

  • max_retries – How many times to retry a capture if it failed.

  • headed_allowed – Allow launching captures in a local headed browser.

  • remote_headed_allowed – Allow triggering a capture in a remote headed browser.

  • remote_headed_backend_type – The backend type for the remote headed captures (currently, Xpra).

  • tt_settings – The settings for the Trusted Timestamps.
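The constructor parameters above can be assembled as in the following minimal sketch. The helper names lacus_core_kwargs and build_lacus_core and the Redis URL are hypothetical. The proxy URLs are the defaults quoted in the parameter list. The sketch assumes the redis and lacuscore packages are installed, and that the Redis client is created without decode_responses so that it returns bytes, as the Redis[bytes] annotation suggests.

```python
from typing import Any


def lacus_core_kwargs(*, use_tor: bool = False, use_i2p: bool = False,
                      max_capture_time: int = 3600,
                      expire_results: int = 36000) -> dict[str, Any]:
    """Build keyword arguments for LacusCore (hypothetical helper)."""
    if max_capture_time <= 0 or expire_results <= 0:
        raise ValueError("durations are in seconds and must be positive")
    kwargs: dict[str, Any] = {
        "max_capture_time": max_capture_time,
        "expire_results": expire_results,
        "only_global_lookups": True,
    }
    if use_tor:
        # Default SOCKS5 URL of a local tor installation, as documented above.
        kwargs["tor_proxy"] = "socks5://127.0.0.1:9050"
    if use_i2p:
        # Default HTTP URL of a local i2p installation, as documented above.
        kwargs["i2p_proxy"] = "http://127.0.0.1:4444"
    return kwargs


def build_lacus_core(redis_url: str = "redis://127.0.0.1:6379/0", **options: Any):
    """Create a LacusCore instance (hypothetical helper, assumed Redis URL)."""
    # Local imports: the helper above stays usable without these packages.
    from redis import Redis
    from lacuscore import LacusCore

    # redis_connector is positional-only (note the "/" in the signature).
    return LacusCore(Redis.from_url(redis_url), **lacus_core_kwargs(**options))
```

With both packages installed and a redis instance running, build_lacus_core(use_tor=True) would return an instance configured for tor captures.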

check_redis_up() bool

Check if redis is reachable

clear_capture(uuid: str, reason: str) None

Remove a capture from the list. This should not be needed unless the capture is in error.

async consume_queue(max_consume: int) AsyncIterator[Task[None]]

Trigger the capture for captures with the highest priority. Up to max_consume.

Yield:

Captures.
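Since consume_queue is an asynchronous iterator of tasks, a caller has to iterate over it and keep the tasks alive until they complete. The following is a minimal sketch of one such pass. The helper name run_captures_once is hypothetical, and lacus is assumed to be a LacusCore instance (typed as Any so the sketch does not import lacuscore).

```python
import asyncio
from typing import Any


async def run_captures_once(lacus: Any, max_consume: int) -> int:
    """Start up to max_consume captures, wait for them, return how many ran."""
    tasks: set[asyncio.Task[None]] = set()
    async for task in lacus.consume_queue(max_consume):
        # Keep a reference so the task is not garbage collected while running.
        tasks.add(task)
    if tasks:
        # return_exceptions=True: a failure in one task is collected
        # instead of being raised here.
        await asyncio.gather(*tasks, return_exceptions=True)
    return len(tasks)
```

A long-running worker would typically call such a function in a loop, with a short sleep when the queue is empty.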

enqueue(*, settings: CaptureSettings | dict[str, Any] | None = None) str
enqueue(*, url: str | None = None, document_name: str | None = None, document: str | None = None, depth: int = 0, browser: Literal['chromium', 'firefox', 'webkit'] | None = None, device_name: str | None = None, user_agent: str | None = None, proxy: str | dict[str, str] | None = None, socks5_dns_resolver: str | list[str] | None = None, general_timeout_in_sec: int | None = None, cookies: list[dict[str, Any]] | list[Cookie] | None = None, storage: dict[str, Any] | None = None, headers: dict[str, str] | None = None, http_credentials: dict[str, str] | HttpCredentialsSettings | None = None, geolocation: dict[str, str | int | float] | GeolocationSettings | None = None, timezone_id: str | None = None, locale: str | None = None, color_scheme: str | None = None, java_script_enabled: bool = True, viewport: dict[str, int | str] | ViewportSettings | None = None, referer: str | None = None, rendered_hostname_only: bool = True, with_screenshot: bool = True, with_favicon: bool = False, with_trusted_timestamps: bool = False, allow_tracking: bool = False, headless: bool = True, remote_headfull: bool = False, max_retries: int | None = None, init_script: str | None = None, force: bool = False, recapture_interval: int = 300, final_wait: int = 5, priority: int = 0, uuid: str | None = None) str

Enqueue settings.

Parameters:
  • settings – Settings as a dictionary

  • url – URL to capture (incompatible with document and document_name)

  • document_name – Filename of the document to capture (required if document is used)

  • document – Document to capture itself (requires a document_name), must be base64 encoded

  • depth – [Dangerous] Depth of the capture. If > 0, the URLs of the rendered document will be extracted and captured. It can take a very long time.

  • browser – The browser to use for the capture

  • device_name – The name of the device, must be something Playwright knows

  • user_agent – The user agent the browser will use for the capture

  • proxy – SOCKS5 proxy to use for capturing

  • socks5_dns_resolver – DNS resolver used to populate IPs in the HAR when a capture is done via a SOCKS5 proxy.

  • general_timeout_in_sec – The capture will raise a timeout if it takes more than that time

  • cookies – A list of cookies

  • storage – A storage state from another capture

  • headers – The headers to pass to the capture

  • http_credentials – HTTP Credentials to pass to the capture

  • geolocation – Geolocation of the browser to pass to the capture

  • timezone_id – The timezone of the browser to pass to the capture

  • locale – The locale of the browser to pass to the capture

  • color_scheme – The preferred color scheme of the browser to pass to the capture

  • java_script_enabled – If False, javascript will be disabled when rendering the page

  • viewport – The viewport of the browser used for capturing

  • referer – The referer URL for the capture

  • rendered_hostname_only – If depth > 0: only capture URLs with the same hostname as the rendered page

  • with_screenshot – If False, PlaywrightCapture won’t take a screenshot of the rendered URL

  • with_favicon – If True, PlaywrightCapture will attempt to get the potential favicons for the rendered URL. It is a dirty trick, see this issue for details: https://github.com/Lookyloo/PlaywrightCapture/issues/45

  • with_trusted_timestamps – If True, PlaywrightCapture will trigger calls to a remote timestamp service. For that to work, this class must have been initialized with tt_settings. See RFC3161 for details: https://www.rfc-editor.org/rfc/rfc3161

  • allow_tracking – If True, PlaywrightCapture will attempt to click through the cookie banners. It is totally dependent on the framework used on the website.

  • remote_headfull – If True, the capture will be handled as a remote headfull session.

  • headless – Whether to run the browser in headless mode. WARNING: setting it to False (headed mode) requires a graphical environment.

  • max_retries – The maximum amount of retries for this capture

  • init_script – A JavaScript that will be executed on each page of the capture.

  • final_wait – The very last wait time, after the instrumentation is done.

  • force – Force recapture, even if the same one was already done within the recapture_interval

  • recapture_interval – The time the enqueued settings are kept in memory to avoid duplicates

  • priority – The priority of the capture

  • uuid – The preset UUID of the capture, auto-generated if not present. Should only be used if the initiator couldn’t enqueue immediately. NOTE: it will be overwritten if the UUID already exists.

Returns:

UUID, reference to the capture for later use
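The constraints between url, document and document_name described above can be checked before calling enqueue. The following is a minimal sketch. The helper name capture_settings is hypothetical, and the rule that either a URL or a document must be given is an assumption that is not stated in the parameter list.

```python
import base64
from typing import Any, Optional


def capture_settings(*, url: Optional[str] = None,
                     document_name: Optional[str] = None,
                     document: Optional[bytes] = None,
                     **extra: Any) -> dict[str, Any]:
    """Build a settings dictionary for enqueue (hypothetical helper)."""
    if url is not None and (document is not None or document_name is not None):
        raise ValueError("url is incompatible with document and document_name")
    if document is not None and document_name is None:
        raise ValueError("document requires a document_name")
    if url is None and document is None:
        # Assumption: a capture needs either a URL or a document.
        raise ValueError("either url or document is required")
    settings: dict[str, Any] = dict(extra)
    if url is not None:
        settings["url"] = url
    else:
        settings["document_name"] = document_name
        # The document must be passed as a base64 encoded string.
        settings["document"] = base64.b64encode(document).decode("ascii")
    return settings
```

The result can be passed through the first overload, for example lacus.enqueue(settings=capture_settings(url="https://www.example.com", with_favicon=True)), where lacus is a LacusCore instance. The call returns the UUID of the capture.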

get_capture(uuid: str, *, decode: Literal[True] = True) CaptureResponse
get_capture(uuid: str, *, decode: Literal[False]) CaptureResponseJson

Get the results of a capture, either decoded or in a JSON-compatible format

Parameters:
  • uuid – The UUID of the capture (given by enqueue)

  • decode – Decode the capture result or not.

Returns:

The capture, decoded or not.
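The difference between the two return types can be illustrated by decoding a JSON-style capture by hand. This is only a sketch of the idea, not the library's implementation. The helper name decode_capture is hypothetical, and the field names png and downloaded_file are assumptions, since this reference only mentions "image and downloaded file".

```python
import base64
from typing import Any

# Assumed names of the base64 encoded fields; adjust to the actual response keys.
ASSUMED_BASE64_FIELDS = ("png", "downloaded_file")


def decode_capture(raw: dict[str, Any],
                   fields: tuple[str, ...] = ASSUMED_BASE64_FIELDS) -> dict[str, Any]:
    """Return a copy of a JSON-style capture with the given fields decoded to bytes."""
    decoded = dict(raw)
    for field in fields:
        value = decoded.get(field)
        if isinstance(value, str):
            # Only string values are decoded; missing or None fields are kept as-is.
            decoded[field] = base64.b64decode(value)
    return decoded
```

In practice, calling get_capture(uuid) with the default decode=True makes this step unnecessary. decode=False is useful when the result has to be serialized to JSON unchanged.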

get_capture_status(uuid: str) CaptureStatus

Get the status of a capture

Parameters:

uuid – The UUID of the capture (given by enqueue)

Returns:

The status
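A typical use of get_capture_status is to poll until the capture is finished before fetching the result. The following is a minimal sketch. The helper name wait_for_capture is hypothetical, and because the members of CaptureStatus are not listed in this reference, the caller supplies the is_finished predicate.

```python
import time
from typing import Any, Callable


def wait_for_capture(lacus: Any, uuid: str, *,
                     is_finished: Callable[[Any], bool],
                     timeout: float = 60.0, interval: float = 1.0) -> Any:
    """Poll get_capture_status until is_finished(status) is true or the timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        status = lacus.get_capture_status(uuid)
        if is_finished(status):
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"capture {uuid} not finished after {timeout} seconds")
        time.sleep(interval)
```

Assuming CaptureStatus has a member named DONE, a call could look like wait_for_capture(lacus, uuid, is_finished=lambda s: s.name == "DONE").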

get_session_backend_metadata(uuid: str) dict[str, Any] | None

Return backend-specific metadata for trusted session transport callers.

get_session_metadata(uuid: str) SessionMetadata | None

Return public session metadata for a capture UUID, or None if no session exists.

playwright_devices() dict[str, Any]

Get the devices exposed by Playwright

request_finish(uuid: str) bool

Mark a remote headfull session as ready for final capture.

Returns the updated metadata, or None if no session exists.

settings() dict[str, str | bool | int]

The public settings for the instance

exception lacuscore.LacusCoreException
exception lacuscore.RemoteHeadfullSessionError
exception lacuscore.RetryCapture
class lacuscore.SessionStatus(*values)

The status of a remote headfull session

class lacuscore.XpraSessionManager(redis: Redis[bytes], *, loglevel: str | int = 'INFO')

Manage xpra-based remote headed browser sessions over per-session unix sockets.

Each remote headed session starts its own xpra server bound to a local unix socket with HTML5 enabled. This keeps the session transport private to the Lacus deployment while allowing a separate reverse proxy or sidecar to expose a stable end-user route.

cleanup_expired_sessions() None

Stop remote headed sessions whose TTL has expired.

This method scans all remote headed session metadata keys and, for any session whose expires_at is in the past and whose status is not already terminal (“stopped” or “expired”), stops the underlying xpra process and marks the status as “expired”.

It is designed to be called periodically by an external scheduler.
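One possible external scheduler is a small asyncio loop, sketched below. The helper name cleanup_loop and its parameters are hypothetical, and manager is assumed to be an XpraSessionManager instance. Because cleanup_expired_sessions is a regular (non-async) method, the sketch runs it in a worker thread.

```python
import asyncio
from typing import Any


async def cleanup_loop(manager: Any, *, interval: float, stop: asyncio.Event) -> int:
    """Call manager.cleanup_expired_sessions() every `interval` seconds until `stop` is set."""
    runs = 0
    while not stop.is_set():
        # Run the synchronous cleanup in a thread to keep the event loop responsive.
        await asyncio.to_thread(manager.cleanup_expired_sessions)
        runs += 1
        try:
            # Sleep for `interval`, but wake up early if `stop` is set.
            await asyncio.wait_for(stop.wait(), timeout=interval)
        except asyncio.TimeoutError:
            pass
    return runs
```

A cron job or systemd timer calling the method once per run would work equally well. The loop form is convenient when the cleanup shares a process with the capture worker.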

get_capture_env(session: Session) Mapping[str, str | float | bool]

Return the environment variables to pass to the capture

start_session(*, session_name: str, ttl: int) tuple[XpraSession, SessionMetadata, dict[str, str]]

Start an xpra session with the given name and allocate a display dynamically.

Parameters:
  • session_name – Unique name for the remote headed session (capture UUID).

  • ttl – Time-to-live in seconds for the remote headed session.

Returns:

XpraSession describing the running xpra process and its internal transport details.

stop_session(session: Session, uuid: str, metadata: SessionMetadata, *, status: SessionStatus, expire_seconds: int) bool

Terminate a running xpra backend session.

The implementation delegates to the xpra stop command, targeting the display associated with this session. This keeps the shutdown logic in xpra itself and avoids relying on PIDs being stable across restarts.

If the session is already gone or xpra is not reachable, the call is treated as a best-effort no-op. The return value signals whether shutdown was confirmed or the backend already appeared to be gone. This method does not perform any Redis or higher-level cleanup; that must be handled by the caller.
