API reference
- exception lacuscore.CaptureError
- class lacuscore.CaptureResponse
A capture made by Lacus, with the base64-encoded image and downloaded file decoded to bytes.
- class lacuscore.CaptureResponseJson
A capture made by Lacus, with the base64-encoded image and downloaded file left undecoded.
- class lacuscore.CaptureStatus(*values)
The status of the capture.
- class lacuscore.LacusCore(redis_connector: Redis[bytes], /, *, max_capture_time: int = 3600, expire_results: int = 36000, tor_proxy: dict[str, str] | str | None = None, i2p_proxy: dict[str, str] | str | None = None, only_global_lookups: bool = True, max_retries: int = 3, headed_allowed: bool = False, remote_headed_allowed: bool = False, remote_headed_backend_type: str | None = None, tt_settings: TrustedTimestampSettings | None = None, loglevel: str | int = 'INFO')
Capture URLs or web-enabled documents using PlaywrightCapture.
- Parameters:
redis_connector – Pre-configured connector to a Redis instance.
max_capture_time – Maximum duration of a capture, in seconds. A capture that takes longer is aborted.
expire_results – The capture results are stored in Redis and expire this many seconds after the capture is done.
tor_proxy – URL to a SOCKS5 Tor proxy. If Tor is installed locally, its default address is socks5://127.0.0.1:9050.
i2p_proxy – URL to an HTTP I2P proxy. If I2P is installed locally, its default address is http://127.0.0.1:4444.
only_global_lookups – Discard captures that point to non-public IPs.
max_retries – How many times a failed capture is retried.
headed_allowed – Allow launching captures in a local headed browser.
remote_headed_allowed – Allow triggering a capture in a remote headed browser.
remote_headed_backend_type – The backend type for remote headed captures (currently, Xpra).
tt_settings – The settings for the Trusted Timestamps.
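The constructor parameters above can be assembled as in the following minimal sketch. The helper names `lacus_kwargs` and `make_lacus` and the Redis URL in the usage note are illustrative assumptions, not part of lacuscore; the keyword names, defaults, and proxy addresses are the ones listed above.

```python
from typing import Any


def lacus_kwargs(*, use_tor: bool = False, use_i2p: bool = False,
                 max_capture_time: int = 3600) -> dict[str, Any]:
    """Build keyword arguments for LacusCore (hypothetical helper)."""
    kwargs: dict[str, Any] = {
        "max_capture_time": max_capture_time,  # seconds
        "expire_results": 36000,               # seconds, the documented default
        "only_global_lookups": True,           # discard captures pointing to non-public IPs
    }
    if use_tor:
        # Address of a locally installed Tor SOCKS5 proxy, as documented above.
        kwargs["tor_proxy"] = "socks5://127.0.0.1:9050"
    if use_i2p:
        # Address of a locally installed I2P HTTP proxy, as documented above.
        kwargs["i2p_proxy"] = "http://127.0.0.1:4444"
    return kwargs


def make_lacus(redis_url: str, **kwargs: Any) -> Any:
    """Instantiate LacusCore; imports are deferred so this file loads without the packages."""
    from redis import Redis
    from lacuscore import LacusCore

    return LacusCore(Redis.from_url(redis_url), **kwargs)
```

With a reachable Redis instance, `core = make_lacus("redis://127.0.0.1:6379/0", **lacus_kwargs(use_tor=True))` followed by `core.check_redis_up()` would confirm the connection.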
- check_redis_up() → bool
Check if Redis is reachable.
- clear_capture(uuid: str, reason: str) → None
Remove a capture from the list. This should not happen unless the capture is in error.
- async consume_queue(max_consume: int) → AsyncIterator[Task[None]]
Trigger the captures with the highest priority, up to max_consume.
- Yield:
Captures.
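Since consume_queue is an asynchronous iterator of tasks, a caller has to collect the tasks and await them. The sketch below shows one possible way to do that; `consume_once` and `FakeCore` are hypothetical names, and `FakeCore` is only a stand-in so the example runs without Redis.

```python
import asyncio
from typing import Any, AsyncIterator


async def consume_once(core: Any, max_consume: int) -> int:
    """Start up to max_consume captures, wait for them, return how many were started."""
    tasks = [task async for task in core.consume_queue(max_consume)]
    if tasks:
        # return_exceptions=True: a failed capture task does not raise here.
        await asyncio.gather(*tasks, return_exceptions=True)
    return len(tasks)


class FakeCore:
    """Stand-in for LacusCore (not part of lacuscore): yields trivial tasks."""

    async def consume_queue(self, max_consume: int) -> AsyncIterator["asyncio.Task[None]"]:
        for _ in range(max_consume):
            yield asyncio.create_task(asyncio.sleep(0))


if __name__ == "__main__":
    print(asyncio.run(consume_once(FakeCore(), 3)))
```

A long-running worker would typically call such a function in a loop, choosing max_consume from the number of captures it is willing to run concurrently.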
- enqueue(*, settings: CaptureSettings | dict[str, Any] | None = None) → str
- enqueue(*, url: str | None = None, document_name: str | None = None, document: str | None = None, depth: int = 0, browser: Literal['chromium', 'firefox', 'webkit'] | None = None, device_name: str | None = None, user_agent: str | None = None, proxy: str | dict[str, str] | None = None, socks5_dns_resolver: str | list[str] | None = None, general_timeout_in_sec: int | None = None, cookies: list[dict[str, Any]] | list[Cookie] | None = None, storage: dict[str, Any] | None = None, headers: dict[str, str] | None = None, http_credentials: dict[str, str] | HttpCredentialsSettings | None = None, geolocation: dict[str, str | int | float] | GeolocationSettings | None = None, timezone_id: str | None = None, locale: str | None = None, color_scheme: str | None = None, java_script_enabled: bool = True, viewport: dict[str, int | str] | ViewportSettings | None = None, referer: str | None = None, rendered_hostname_only: bool = True, with_screenshot: bool = True, with_favicon: bool = False, with_trusted_timestamps: bool = False, allow_tracking: bool = False, headless: bool = True, remote_headfull: bool = False, max_retries: int | None = None, init_script: str | None = None, force: bool = False, recapture_interval: int = 300, final_wait: int = 5, priority: int = 0, uuid: str | None = None) → str
Enqueue settings.
- Parameters:
settings – Settings as a dictionary
url – URL to capture (incompatible with document and document_name)
document_name – Filename of the document to capture (required if document is used)
document – Document to capture itself (requires a document_name), must be base64 encoded
depth – [Dangerous] Depth of the capture. If > 0, the URLs of the rendered document will be extracted and captured. It can take a very long time.
browser – The browser to use for the capture
device_name – The name of the device, must be something Playwright knows
user_agent – The user agent the browser will use for the capture
proxy – SOCKS5 proxy to use for capturing
socks5_dns_resolver – DNS resolver used to populate IPs in the HAR when a capture is done via a SOCKS5 proxy.
general_timeout_in_sec – The capture will raise a timeout if it takes more than that time
cookies – A list of cookies
storage – A storage state from another capture
headers – The headers to pass to the capture
http_credentials – HTTP Credentials to pass to the capture
geolocation – Geolocation of the browser to pass to the capture
timezone_id – The timezone of the browser to pass to the capture
locale – The locale of the browser to pass to the capture
color_scheme – The preferred color scheme of the browser to pass to the capture
java_script_enabled – If False, JavaScript will be disabled when rendering the page
viewport – The viewport of the browser used for capturing
referer – The referer URL for the capture
rendered_hostname_only – If depth > 0: only capture URLs with the same hostname as the rendered page
with_screenshot – If False, PlaywrightCapture won’t take a screenshot of the rendered URL
with_favicon – If True, PlaywrightCapture will attempt to get the potential favicons for the rendered URL. It is a dirty trick, see this issue for details: https://github.com/Lookyloo/PlaywrightCapture/issues/45
with_trusted_timestamps – If True, PlaywrightCapture will trigger calls to a remote timestamp service. For that to work, this class must have been initialized with tt_settings. See RFC3161 for details: https://www.rfc-editor.org/rfc/rfc3161
allow_tracking – If True, PlaywrightCapture will attempt to click through the cookie banners. It is totally dependent on the framework used on the website.
remote_headfull – If True, the capture will be handled as a remote headfull session.
headless – Whether to run the browser in headless mode. WARNING: running headed (headless=False) requires a graphical environment.
max_retries – The maximum number of retries for this capture
init_script – A JavaScript snippet that will be executed on each page of the capture.
final_wait – The very last wait time, after the instrumentation is done.
force – Force recapture, even if the same one was already done within the recapture_interval
recapture_interval – The time the enqueued settings are kept in memory to avoid duplicates
priority – The priority of the capture
uuid – The preset UUID of the capture, auto-generated if not present. Should only be used if the initiator couldn’t enqueue immediately. NOTE: it will be overwritten if the UUID already exists.
- Returns:
UUID, reference to the capture for later use
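The url and document parameters are mutually exclusive, and a document must be base64 encoded and come with a document_name. A minimal sketch of building both kinds of settings dictionaries follows; the helper names `url_settings` and `document_settings` are hypothetical, and only the dictionary keys come from the parameter list above.

```python
import base64
from typing import Any


def url_settings(url: str, **options: Any) -> dict[str, Any]:
    """Settings for capturing a URL (no document or document_name)."""
    return {"url": url, **options}


def document_settings(name: str, content: bytes, **options: Any) -> dict[str, Any]:
    """Settings for capturing a document (no url)."""
    return {
        "document_name": name,
        # The document itself must be passed base64 encoded, as a string.
        "document": base64.b64encode(content).decode("ascii"),
        **options,
    }
```

Assuming `core` is a LacusCore instance, either dictionary could then be passed as `core.enqueue(settings=document_settings("page.html", b"<html></html>"))`, which returns the UUID of the capture.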
- get_capture(uuid: str, *, decode: Literal[True] = True) → CaptureResponse
- get_capture(uuid: str, *, decode: Literal[False]) → CaptureResponseJson
Get the results of a capture, either decoded or in a JSON-compatible format.
- Parameters:
uuid – The UUID of the capture (given by enqueue)
decode – Decode the capture result or not.
- Returns:
The capture, decoded or not.
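With decode=False the binary parts of the result stay base64 encoded, which keeps the response JSON compatible. The sketch below illustrates what decoding such fields amounts to; `decode_fields` is a hypothetical helper, and the key name `png` used in the usage note is an assumption, not something this reference specifies.

```python
import base64
from typing import Any, Iterable


def decode_fields(capture: dict[str, Any], keys: Iterable[str]) -> dict[str, Any]:
    """Return a copy of capture with the named base64 string fields decoded to bytes."""
    decoded = dict(capture)  # shallow copy, the input is left untouched
    for key in keys:
        value = decoded.get(key)
        if isinstance(value, str):
            decoded[key] = base64.b64decode(value)
    return decoded
```

For example, `decode_fields(core.get_capture(uuid, decode=False), ["png"])` would turn an assumed `png` field into bytes, while keys that are absent are simply skipped.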
- get_capture_status(uuid: str) → CaptureStatus
Get the status of a capture
- Parameters:
uuid – The UUID of the capture (given by enqueue)
- Returns:
The status
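A caller that enqueued a capture usually polls get_capture_status until the capture is finished, then fetches the result. The following is a minimal sketch of such a loop; `wait_for_capture` and `FakeCore` are hypothetical, and because this reference does not list the CaptureStatus members, the "finished" check is passed in as a callable.

```python
import time
from typing import Any, Callable, Iterable


def wait_for_capture(core: Any, uuid: str, is_done: Callable[[Any], bool], *,
                     poll_interval: float = 1.0, timeout: float = 120.0) -> Any:
    """Poll the capture status until is_done(status) is true, then return the capture."""
    deadline = time.monotonic() + timeout
    while True:
        status = core.get_capture_status(uuid)
        if is_done(status):
            return core.get_capture(uuid)
        if time.monotonic() >= deadline:
            raise TimeoutError(f"capture {uuid} not finished after {timeout}s")
        time.sleep(poll_interval)


class FakeCore:
    """Stand-in for LacusCore (not part of lacuscore) that replays a list of statuses."""

    def __init__(self, statuses: Iterable[Any]) -> None:
        self._statuses = iter(statuses)

    def get_capture_status(self, uuid: str) -> Any:
        return next(self._statuses)

    def get_capture(self, uuid: str) -> dict[str, Any]:
        return {"uuid": uuid}
```

With a real instance, `is_done` would compare the returned value against whichever CaptureStatus member marks a completed capture.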
- get_session_backend_metadata(uuid: str) → dict[str, Any] | None
Return backend-specific metadata for trusted session transport callers.
- get_session_metadata(uuid: str) → SessionMetadata | None
Return public session metadata for a capture UUID, or None if no session exists.
- playwright_devices() → dict[str, Any]
Get the devices exposed by Playwright
- request_finish(uuid: str) → bool
Mark a remote headfull session as ready for final capture.
Returns the updated metadata, or None if no session exists.
- settings() → dict[str, str | bool | int]
The public settings for the instance
- exception lacuscore.LacusCoreException
- exception lacuscore.RemoteHeadfullSessionError
- exception lacuscore.RetryCapture
- class lacuscore.SessionStatus(*values)
The status of a remote headfull session
- class lacuscore.XpraSessionManager(redis: Redis[bytes], *, loglevel: str | int = 'INFO')
Manage xpra-based remote headed browser sessions over per-session unix sockets.
Each remote headed session starts its own xpra server bound to a local unix socket with HTML5 enabled. This keeps the session transport private to the Lacus deployment while allowing a separate reverse proxy or sidecar to expose a stable end-user route.
- cleanup_expired_sessions() → None
Stop remote headed sessions whose TTL has expired.
This method scans all remote headed session metadata keys and, for any session whose expires_at is in the past and whose status is not already terminal (“stopped” or “expired”), stops the underlying xpra process and marks the status as “expired”.
It is designed to be called periodically by an external scheduler.
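Because cleanup_expired_sessions is meant to be driven by an external scheduler, the simplest possible scheduler is a loop that calls it at a fixed interval. The sketch below is one way to write that loop with the standard library; `run_cleanup_periodically` and `FakeManager` are hypothetical names, and `FakeManager` only stands in for XpraSessionManager so the example runs on its own.

```python
import threading
from typing import Any


def run_cleanup_periodically(manager: Any, interval: float,
                             stop: threading.Event) -> int:
    """Call manager.cleanup_expired_sessions() every `interval` seconds until `stop` is set."""
    runs = 0
    while not stop.is_set():
        manager.cleanup_expired_sessions()
        runs += 1
        stop.wait(interval)  # sleeps, but returns early as soon as stop is set
    return runs


class FakeManager:
    """Stand-in for XpraSessionManager (not part of lacuscore): stops the loop after max_calls."""

    def __init__(self, stop: threading.Event, max_calls: int) -> None:
        self.stop = stop
        self.max_calls = max_calls
        self.calls = 0

    def cleanup_expired_sessions(self) -> None:
        self.calls += 1
        if self.calls >= self.max_calls:
            self.stop.set()
```

In a deployment the loop could run in a background thread, for example `threading.Thread(target=run_cleanup_periodically, args=(manager, 60, stop), daemon=True).start()`, with `stop.set()` called on shutdown.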
- get_capture_env(session: Session) → Mapping[str, str | float | bool]
Return the environment variables to pass to the capture.
- start_session(*, session_name: str, ttl: int) → tuple[XpraSession, SessionMetadata, dict[str, str]]
Start an xpra session with the given name and allocate a display dynamically.
- Parameters:
session_name – Unique name for the remote headed session (capture UUID).
ttl – Time-to-live in seconds for the remote headed session.
- Returns:
XpraSession describing the running xpra process and its internal transport details.
- stop_session(session: Session, uuid: str, metadata: SessionMetadata, *, status: SessionStatus, expire_seconds: int) → bool
Terminate a running xpra backend session.
The implementation delegates to the xpra stop command, targeting the display associated with this session. This keeps the shutdown logic in xpra itself and avoids relying on PIDs being stable across restarts.
If the session is already gone or xpra is not reachable, the call is treated as a best-effort no-op. The return value signals whether shutdown was confirmed or the backend already appeared to be gone. This method does not perform any Redis or higher-level cleanup; that must be handled by the caller.