seesaw Package
ArchiveTeam seesaw kit
config Module
Configuration value manipulation.
class seesaw.config.ConfigValue(name, title='', description='', default=None, editable=True, advanced=True)
Bases: object
Configuration value validator.
The collection methods are useful for providing user-configurable settings at run time. For example, when the warrior executes a pipeline file, the additional config values are presented in the warrior's configuration panel.
collector = None
class seesaw.config.NumberConfigValue(*args, **kwargs)
Bases: seesaw.config.ConfigValue
class seesaw.config.StringConfigValue(*args, **kwargs)
Bases: seesaw.config.ConfigValue
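The page lists only the constructors, so the following is a standalone sketch of the idea rather than seesaw's code: a config value carries a name and a default, validates user input before accepting it, and registers itself in a shared collector so a UI can list every setting. The `*Sketch` class names, the `check_value`/`convert_value`/`set_value` methods, the `min_value`/`max_value` keyword arguments and the dictionary-based collector are all hypothetical.

```python
class ConfigValueSketch(object):
    """Hypothetical stand-in for seesaw.config.ConfigValue (not the real class)."""

    # Hypothetical shared registry of every declared value, keyed by name.
    collector = {}

    def __init__(self, name, title="", description="", default=None):
        self.name = name
        self.title = title
        self.description = description
        self.value = default
        ConfigValueSketch.collector[name] = self

    def check_value(self, value):
        """Return an error message, or None if the value is acceptable."""
        return None

    def convert_value(self, value):
        """Turn raw user input into the stored type."""
        return value

    def set_value(self, value):
        error = self.check_value(value)
        if error is not None:
            raise ValueError("%s: %s" % (self.name, error))
        self.value = self.convert_value(value)


class NumberConfigValueSketch(ConfigValueSketch):
    def __init__(self, *args, **kwargs):
        # Optional bounds (hypothetical keyword arguments).
        self.min_value = kwargs.pop("min_value", None)
        self.max_value = kwargs.pop("max_value", None)
        ConfigValueSketch.__init__(self, *args, **kwargs)

    def check_value(self, value):
        try:
            number = int(value)
        except (TypeError, ValueError):
            return "must be a number"
        if self.min_value is not None and number < self.min_value:
            return "must be at least %d" % self.min_value
        if self.max_value is not None and number > self.max_value:
            return "must be at most %d" % self.max_value
        return None

    def convert_value(self, value):
        return int(value)


class StringConfigValueSketch(ConfigValueSketch):
    def check_value(self, value):
        if not isinstance(value, str) or not value.strip():
            return "must be a non-empty string"
        return None


concurrency = NumberConfigValueSketch("concurrency", default=2, min_value=1, max_value=6)
concurrency.set_value("4")
print(concurrency.value)  # 4
```

Returning an error message from the check, instead of raising there, keeps validation reusable by a web form that shows the message next to the field; that split is a design assumption of the sketch.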
seesaw.config.realize(v, item=None)
Makes objects contain concrete values from an item.
A silly example:

    class AddExpression(object):
        def realize(self, item):
            # Compute the concrete value from the item's properties
            return item['x'] + item['y']

    # ComputeMath stands for an illustrative task
    pipeline = Pipeline(ComputeMath(AddExpression()))

In the example, we want to compute an addition expression whose values are defined in the Item.
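A minimal re-implementation of the idea behind `realize()`, assuming it walks dictionaries and lists recursively, calls `realize(item)` on any object that defines that method, and passes every other value through unchanged. `realize_sketch` is a hypothetical name and the real function may differ in detail.

```python
def realize_sketch(v, item=None):
    """Recursively replace objects that define realize(item) with their value."""
    if isinstance(v, dict):
        return {key: realize_sketch(value, item) for key, value in v.items()}
    if isinstance(v, (list, tuple)):
        return [realize_sketch(value, item) for value in v]
    if hasattr(v, "realize"):
        # Deferred value: resolve it against this particular item.
        return v.realize(item)
    # Plain values (strings, numbers, None) are already concrete.
    return v


class AddExpression(object):
    def realize(self, item):
        return item["x"] + item["y"]


item = {"x": 2, "y": 3}
args = ["compute", AddExpression(), {"extra": AddExpression()}]
print(realize_sketch(args, item))  # ['compute', 5, {'extra': 5}]
```

This deferral is what lets one pipeline definition serve many items: the arguments are written once and resolved per item at run time.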
event Module
Actor model.
externalprocess Module
Running subprocesses asynchronously.
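seesaw appears to build on Tornado (its web handlers subclass tornado.web.RequestHandler). The sketch below swaps in the standard library's `asyncio` to show the same technique: start a child process, let other work proceed while it runs, then collect its output and exit code. `run_process` is a hypothetical helper, not part of seesaw.

```python
import asyncio
import sys


async def run_process(args):
    """Run a child process without blocking the event loop; return (code, output)."""
    process = await asyncio.create_subprocess_exec(
        *args,
        stdout=asyncio.subprocess.PIPE,
        stderr=asyncio.subprocess.STDOUT,  # merge stderr into stdout
    )
    stdout, _ = await process.communicate()
    return process.returncode, stdout.decode()


async def main():
    # Both children run concurrently; neither waits for the other.
    results = await asyncio.gather(
        run_process([sys.executable, "-c", "print('first')"]),
        run_process([sys.executable, "-c", "import sys; sys.exit(3)"]),
    )
    for code, output in results:
        print(code, output.strip())


if __name__ == "__main__":
    asyncio.run(main())
```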
class seesaw.externalprocess.AsyncPopen(*args, **kwargs)
Bases: object
Asynchronous version of subprocess.Popen. Deprecated.
class seesaw.externalprocess.AsyncPopen2(*args, **kwargs)
Bases: object
Adapter for the legacy AsyncPopen.
stdin
class seesaw.externalprocess.CurlUpload(target, filename, connect_timeout='60', speed_limit='1', speed_time='900', max_tries=None)
Bases: seesaw.externalprocess.ExternalProcess
Upload with Curl process runner.
class seesaw.externalprocess.ExternalProcess(name, args, max_tries=1, retry_delay=30, accept_on_exit_code=None, retry_on_exit_code=None, env=None)
Bases: seesaw.task.Task
External subprocess runner.
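The constructor arguments suggest how a finished process is judged. The sketch below encodes one plausible policy, under these assumptions (not confirmed by this page): `accept_on_exit_code` defaults to `[0]`, `retry_on_exit_code=None` means any non-accepted code may be retried, and `max_tries` counts total attempts. `decide` and `run_with_retries` are hypothetical helpers.

```python
import time


def decide(exit_code, tries, max_tries=1, accept_on_exit_code=None, retry_on_exit_code=None):
    """Return 'complete', 'retry' or 'fail' for a finished attempt (assumed policy).

    `tries` is the number of attempts made so far, including this one.
    """
    accepted = accept_on_exit_code if accept_on_exit_code is not None else [0]
    if exit_code in accepted:
        return "complete"
    retryable = retry_on_exit_code is None or exit_code in retry_on_exit_code
    if retryable and (max_tries is None or tries < max_tries):
        return "retry"
    return "fail"


def run_with_retries(attempt, max_tries=1, retry_delay=30, **policy):
    """Call attempt() (which returns an exit code) until it completes or fails."""
    tries = 0
    while True:
        tries += 1
        outcome = decide(attempt(), tries, max_tries, **policy)
        if outcome != "retry":
            return outcome, tries
        time.sleep(retry_delay)  # wait before the next attempt


codes = iter([1, 1, 0])  # simulated exit codes of three attempts
print(run_with_retries(lambda: next(codes), max_tries=3, retry_delay=0))  # ('complete', 3)
```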
class seesaw.externalprocess.RsyncUpload(target, files, target_source_path='./', bwlimit='0', max_tries=None, extra_args=None)
Bases: seesaw.externalprocess.ExternalProcess
Upload with Rsync process runner.
class seesaw.externalprocess.WgetDownload(args, max_tries=1, accept_on_exit_code=None, retry_on_exit_code=None, env=None, stdin_data_function=None)
Bases: seesaw.externalprocess.ExternalProcess
Download with Wget process runner.
item Module
Managing work units.
class seesaw.item.Item(pipeline, item_id, item_number, properties=None, keep_data=False, prepare_data_directory=True)
Bases: object
A thing, or work unit, that needs to be downloaded.
It has properties that are filled in by the Task. An Item behaves like a mutable mapping.
Note: State belonging to an item should be stored on the item itself. That is, do not store variables on a Task unless you know what you are doing.
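To show what "behaves like a mutable mapping" buys, here is a hypothetical `ItemSketch` built on `collections.abc.MutableMapping`: implementing five methods yields `get`, `update`, `keys`, `in` and friends for free. seesaw's Item also tracks a pipeline and a data directory (see the constructor above); the sketch keeps only the mapping behaviour.

```python
from collections.abc import MutableMapping


class ItemSketch(MutableMapping):
    """Hypothetical work unit whose mapping access goes to a properties dict."""

    def __init__(self, item_id, properties=None):
        self.item_id = item_id
        self._properties = dict(properties or {})

    def __getitem__(self, key):
        return self._properties[key]

    def __setitem__(self, key, value):
        self._properties[key] = value

    def __delitem__(self, key):
        del self._properties[key]

    def __iter__(self):
        return iter(self._properties)

    def __len__(self):
        return len(self._properties)


item = ItemSketch("item-1", {"item_name": "example"})
item["warc_file"] = "example.warc.gz"  # per-item state lives on the item
print(item.get("missing", "default"))  # default
```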
pipeline Module
class seesaw.pipeline.Pipeline(*tasks)
Bases: object
The sequence of steps (Task objects) that complete a work unit.
Your pipeline will probably look something like this:
- Request an assignment from the tracker.
- Run Wget to download the file.
- Upload the downloaded file with rsync.
- Tell the tracker that the assignment is done.
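The four steps above can be sketched as a plain list of callables applied to one item in order. This is a deliberately synchronous simplification: seesaw schedules its tasks asynchronously, which the sketch leaves out, and `PipelineSketch` plus the four step functions are hypothetical stand-ins for tasks such as GetItemFromTracker, WgetDownload, UploadWithTracker and SendDoneToTracker.

```python
class PipelineSketch(object):
    """Hypothetical synchronous pipeline: each task is a callable that mutates the item."""

    def __init__(self, *tasks):
        self.tasks = tasks

    def run(self, item):
        for task in self.tasks:
            task(item)
        return item


def request_assignment(item):
    item["item_name"] = "example-item"


def download(item):
    item["files"] = [item["item_name"] + ".warc.gz"]


def upload(item):
    item["uploaded"] = list(item["files"])


def report_done(item):
    item["done"] = True


pipeline = PipelineSketch(request_assignment, download, upload, report_done)
result = pipeline.run({})
print(result["uploaded"])  # ['example-item.warc.gz']
```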
project Module
Project information.
class seesaw.project.Project(title=None, project_html=None, utc_deadline=None)
Bases: object
Briefly describes the project's metadata.
This class defines the project's title, a short description with an optional project logo, and an optional deadline. This information is shown in the web interface while the project is running.
runner Module
Pipeline execution.
task Module
Managing steps in a work unit.
class seesaw.task.ConditionalTask(condition_function, inner_task)
Bases: seesaw.task.Task
Runs a task optionally.
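A minimal sketch of conditional execution, assuming `condition_function` receives the item and returns a truthy value when `inner_task` should run. `ConditionalTaskSketch` and `mark_for_review` are hypothetical; seesaw's class works on its own Task interface rather than plain callables.

```python
class ConditionalTaskSketch(object):
    """Hypothetical: run inner_task only when condition_function(item) is true."""

    def __init__(self, condition_function, inner_task):
        self.condition_function = condition_function
        self.inner_task = inner_task

    def __call__(self, item):
        if self.condition_function(item):
            self.inner_task(item)
        # When the condition is false the item passes through unchanged.
        return item


def mark_for_review(item):
    item["needs_review"] = True


check_large = ConditionalTaskSketch(lambda item: item["size"] > 1000, mark_for_review)
print(check_large({"size": 5000}))  # {'size': 5000, 'needs_review': True}
print(check_large({"size": 10}))  # {'size': 10}
```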
class seesaw.task.LimitConcurrent(concurrency, inner_task)
Bases: seesaw.task.Task
Restricts the number of tasks of the same type that can be run at once.
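A common way to enforce such a limit is a counting semaphore. The sketch below uses `asyncio.Semaphore` as an assumed stand-in for however seesaw tracks running tasks; `LimitConcurrentSketch`, `Counter` and `slow_upload` are hypothetical. Six items are submitted at once, but at most two are ever inside the inner task.

```python
import asyncio


class LimitConcurrentSketch(object):
    """Hypothetical: allow at most `concurrency` runs of inner_task at once."""

    def __init__(self, concurrency, inner_task):
        # Created inside a running event loop (see main below).
        self.semaphore = asyncio.Semaphore(concurrency)
        self.inner_task = inner_task

    async def __call__(self, item):
        async with self.semaphore:  # waits while `concurrency` runs are active
            return await self.inner_task(item)


class Counter(object):
    running = 0
    peak = 0


async def slow_upload(item):
    Counter.running += 1
    Counter.peak = max(Counter.peak, Counter.running)
    await asyncio.sleep(0.01)  # stand-in for real upload work
    Counter.running -= 1
    return item


async def main():
    limited = LimitConcurrentSketch(2, slow_upload)
    await asyncio.gather(*(limited(n) for n in range(6)))
    return Counter.peak


if __name__ == "__main__":
    print(asyncio.run(main()))  # 2
```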
class seesaw.task.PrintItem
Bases: seesaw.task.SimpleTask
Outputs the name of the Item.
class seesaw.task.SetItemKey(key, value)
Bases: seesaw.task.SimpleTask
Sets a key to a value on the item.
class seesaw.task.SimpleTask(name)
Bases: seesaw.task.Task
A subclassable Task that should do one small thing well.

Example:

    from seesaw.task import SimpleTask

    class MyTask(SimpleTask):
        def process(self, item):
            # Store results on the item, not on the task
            item['my_message'] = 'hello world!'
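The contract the example relies on is small: subclass, give the task a name, override `process(item)` and mutate the item. The standalone sketch below mimics only that contract; `SimpleTaskSketch`, its `enqueue` method and the practice of recording failures on the item are hypothetical, and seesaw's base class additionally handles scheduling and failure reporting.

```python
class SimpleTaskSketch(object):
    """Hypothetical base class: subclasses override process(item)."""

    def __init__(self, name):
        self.name = name

    def enqueue(self, item):
        """Run process(item); return True on success, False on failure."""
        try:
            self.process(item)
        except Exception as error:
            # A failure is recorded on the item instead of propagating.
            item.setdefault("errors", []).append("%s: %s" % (self.name, error))
            return False
        return True

    def process(self, item):
        raise NotImplementedError


class HelloTask(SimpleTaskSketch):
    def __init__(self):
        SimpleTaskSketch.__init__(self, "HelloTask")

    def process(self, item):
        item["my_message"] = "hello world!"


item = {}
print(HelloTask().enqueue(item), item)  # True {'my_message': 'hello world!'}
```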
tracker Module
Contacting the work unit server.
A Tracker refers to the Universal Tracker (https://github.com/ArchiveTeam/universal-tracker).
class seesaw.tracker.GetItemFromTracker(tracker_url, downloader, version=None)
Bases: seesaw.tracker.TrackerRequest
Gets the information for a single work unit from the Tracker.
class seesaw.tracker.PrepareStatsForTracker(defaults=None, file_groups=None, id_function=None)
Bases: seesaw.task.SimpleTask
Apply statistical values on the item.
class seesaw.tracker.SendDoneToTracker(tracker_url, stats)
Bases: seesaw.tracker.TrackerRequest
Informs the Tracker that the work unit has been completed.
class seesaw.tracker.TrackerRequest(name, tracker_url, tracker_command, may_be_canceled=False)
Bases: seesaw.task.Task
Represents a request to a Tracker.
DEFAULT_RETRY_DELAY = 60
class seesaw.tracker.UploadWithTracker(tracker_url, downloader, files, version=None, rsync_target_source_path='./', rsync_bwlimit='0', rsync_extra_args=[], curl_connect_timeout='60', curl_speed_limit='1', curl_speed_time='900')
Bases: seesaw.tracker.TrackerRequest
Upload work unit results.
One of the following inner tasks is used, depending on the upload target given in the Tracker's response:
- RsyncUpload
- CurlUpload
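A sketch of how the choice between the two inner tasks might be made, assuming the Tracker answers with JSON containing an `upload_target` URL and that the URL scheme decides the runner (`rsync://` for RsyncUpload, HTTP(S) for CurlUpload). The field name, the scheme rule and `choose_upload_task` are assumptions for illustration, not a description of the real response format.

```python
import json


def choose_upload_task(response_body):
    """Return (runner_name, target) for a tracker response (assumed format)."""
    target = json.loads(response_body)["upload_target"]
    if target.startswith("rsync://"):
        return "RsyncUpload", target
    if target.startswith(("http://", "https://")):
        return "CurlUpload", target
    raise ValueError("unsupported upload target: %r" % target)


print(choose_upload_task('{"upload_target": "rsync://example.org/module/"}'))
# ('RsyncUpload', 'rsync://example.org/module/')
```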
util Module
Miscellaneous functions.
seesaw.util.find_executable(name, version, paths, version_arg='-V')
Returns the path of a matching executable.
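A hypothetical re-implementation of what the signature suggests: try each candidate in `paths`, run it with `version_arg`, and return the first one whose output contains an acceptable version string. Accepting either a string or a list for `version`, and ignoring `name` apart from display, are assumptions of this sketch.

```python
import shutil
import subprocess
import sys


def find_executable_sketch(name, version, paths, version_arg="-V"):
    """Return the first path whose version output contains `version`, else None.

    `name` is kept for signature parity (e.g. for log messages) and unused here.
    """
    accepted = [version] if isinstance(version, str) else list(version)
    for path in paths:
        resolved = shutil.which(path)  # None if missing or not executable
        if resolved is None:
            continue
        try:
            result = subprocess.run(
                [resolved, version_arg],
                stdout=subprocess.PIPE,
                stderr=subprocess.STDOUT,  # some tools print versions to stderr
                timeout=10,
            )
        except (OSError, subprocess.TimeoutExpired):
            continue
        output = result.stdout.decode(errors="replace")
        if any(v in output for v in accepted):
            return resolved
    return None


if __name__ == "__main__":
    print(find_executable_sketch("Python", "Python", [sys.executable], "--version"))
```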
warrior Module
The warrior server.
The warrior phones home to Warrior HQ (https://github.com/ArchiveTeam/warrior-hq).
class seesaw.warrior.BandwidthMonitor(device)
Bases: object
Extracts the bandwidth usage from the system stats.
devre = <_sre.SRE_Pattern object>
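`devre` is a compiled regular expression, presumably used to pick a device's line out of the system's network statistics. Assuming a Linux `/proc/net/dev` source (the page does not say which file is read), a sketch of the parsing and of turning two samples into a rate looks like this; `DEV_LINE`, `read_bytes` and `rate` are hypothetical names, and the sample text is made up.

```python
import re

# One /proc/net/dev data line: "iface: <8 receive fields> <8 transmit fields>".
# Group 1 = interface, group 2 = received bytes, group 3 = transmitted bytes.
DEV_LINE = re.compile(r"^\s*([\w.-]+):\s*(\d+)(?:\s+\d+){7}\s+(\d+)")

SAMPLE = """Inter-|   Receive                            |  Transmit
 face |bytes packets errs drop fifo frame compressed multicast|bytes packets
    lo: 1234 12 0 0 0 0 0 0 1234 12 0 0 0 0 0 0
  eth0: 5000 40 0 0 0 0 0 0 3000 30 0 0 0 0 0 0
"""


def read_bytes(proc_net_dev_text, device):
    """Return (received_bytes, sent_bytes) for `device`, or None if absent."""
    for line in proc_net_dev_text.splitlines():
        match = DEV_LINE.match(line)
        if match and match.group(1) == device:
            return int(match.group(2)), int(match.group(3))
    return None


def rate(before, after, seconds):
    """Bytes per second received and sent between two samples."""
    return tuple((b - a) / seconds for a, b in zip(before, after))


print(read_bytes(SAMPLE, "eth0"))  # (5000, 3000)
```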
class seesaw.warrior.Warrior(projects_dir, data_dir, warrior_hq_url, real_shutdown=False, keep_data=False)
Bases: object
The warrior god object.
class Warrior.Status
Bases: object
INVALID_SETTINGS = 'INVALID_SETTINGS'
NO_PROJECT = 'NO_PROJECT'
REBOOTING = 'REBOOTING'
RESTARTING_PROJECT = 'RESTARTING_PROJECT'
RUNNING_PROJECT = 'RUNNING_PROJECT'
SHUTTING_DOWN = 'SHUTTING_DOWN'
STARTING_PROJECT = 'STARTING_PROJECT'
STOPPING_PROJECT = 'STOPPING_PROJECT'
SWITCHING_PROJECT = 'SWITCHING_PROJECT'
UNINITIALIZED = 'UNINITIALIZED'
web Module
The warrior web interface.
class seesaw.web.ApiHandler(application, request, **kwargs)
Bases: tornado.web.RequestHandler
Processes API requests.
class seesaw.web.IndexHandler(application, request, **kwargs)
Bases: tornado.web.RequestHandler
Shows the index.html page.
class seesaw.web.ItemMonitor(item)
Bases: object
Pushes item states and information to the client.
class seesaw.web.SeesawConnection(session)
Bases: sockjs.tornado.conn.SockJSConnection
A WebSocket server that communicates the state of the warrior.
clients = set([])
instance_id = '22941-0.507595'
item_monitors = {}
project = None
runner = None
warrior = None
seesaw.web.start_runner_server(project, runner, bind_address='localhost', port_number=8001, http_username=None, http_password=None)
Starts a web interface for a manually run pipeline.
Unlike start_warrior_server(), this UI does not contain a configuration or project management panel.