boinc/todo

-----------------------
BUGS (arranged from high to low priority)
-----------------------
Matt's Bugs:
1. Graphics suddenly become slow: plots draw smoothly at first, but after
several "show graphics" they become really slow. They keep getting
slower over time, even after stop/restart.

2. after minimizing graphics window it behaves strangely - the only way
to maximize it again is to right click and select "maximize". Even then
the resize buttons, etc. don't work as they are supposed to.

3. After running all night (on Win98) I shook the mouse to wake up the
blank screen, and all I saw was that the top half of the screen was solid
gray, and the bottom half showed the bottom half of the Astropulse graphics.
They weren't moving. The computer was frozen. I had to ctrl-alt-del
to restart.

- Graphics window should open in front of others
- Corrupted floating hover text
- Resetting project should delete old project files
- Test BOINC error handling on UNIX if app can't start up (use bad path name)
- GUI client should display "Upload failed" and "Download failed"
    in transfers tab when failure occurs
- GUI: Result status should say "downloading files", "uploading files", etc.
- message window should reposition to bottom when new message
- Win GUI: line between menus and tabs
- win GUI: reduce flicker?
- labels on disk graph are not clear
- document and add to global prefs?
    run_minimized
    hangup_if_dialed

-----------------------
HIGH-PRIORITY (should do for beta test)
-----------------------

"Add project" behavior:
    Goal: give user timely feedback if bad project URL or account ID;
    don't leave bad project files sitting around

    A project addition is "successful" when
    1) the client fetches the master page,
    2) the master page has at least one scheduler URL
    3) the client contacts a scheduler and gets a user name back.

    The cmdline and GUI clients need to inform the user if a project
    add is not successful, since it probably means the master URL
    or account ID were typed in wrong.

    A project is "tentative" if the above hasn't happened yet.
    This is flagged in the project file (<tentative/>) and in memory.

    A "failure event" is
        - master fetch fails
        - master page has no scheduler URLs
        - scheduler RPC fails
        - scheduler RPC returns no user name
    A "success event" is
        - scheduler RPC returns user name

    cmdline client
        first time (no projects initially) or -add_new_project flag:
        new project is tentative (write flag to project file)
        if failure event occurs for tentative project:
            project_add_fail(PROJECT&)
                print "check URL and account ID" message
                delete project file
                exit
        if success event occurs for tentative project:
            project_add_succeed(PROJECT&)
                clear tentative flag, rewrite account file

    GUI client:
        first-time dialog or "attach to project" command:
        show a modal dialog
            ("verifying account" w/ "cancel" button)
        project_add_fail(PROJECT&)
            replace w/ modal error dialog w/ retry, cancel
        project_add_succeed(PROJECT&)
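
    Sketch of the tentative-project logic above, shared by cmdline and GUI.
    Assumptions: the PROJECT struct here is a stripped-down stand-in,
    and AddEvent, AddOutcome and handle_add_event() are hypothetical names.
    The caller would invoke project_add_fail() or project_add_succeed()
    according to the returned outcome.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for the client's PROJECT record; only the
// fields needed for this sketch are shown.
struct PROJECT {
    std::string master_url;
    bool tentative = true;   // mirrors the <tentative/> flag in the project file
};

// The four failure events and the one success event listed above.
enum class AddEvent {
    MASTER_FETCH_FAILED,
    NO_SCHEDULER_URLS,
    SCHEDULER_RPC_FAILED,
    NO_USER_NAME,
    USER_NAME_RETURNED
};

enum class AddOutcome { NONE, FAILED, SUCCEEDED };

// Only "scheduler RPC returns user name" counts as a success event.
bool is_success_event(AddEvent e) {
    return e == AddEvent::USER_NAME_RETURNED;
}

// Decide what an event means for a project. Events on a project that is
// no longer tentative do not affect the add-project outcome.
AddOutcome handle_add_event(PROJECT& p, AddEvent e) {
    if (!p.tentative) return AddOutcome::NONE;
    if (is_success_event(e)) {
        p.tentative = false;          // caller then rewrites the account file
        return AddOutcome::SUCCEEDED;
    }
    return AddOutcome::FAILED;        // caller then reports and deletes the project file
}
```

    Keeping the decision in one function means the cmdline client (print
    and exit) and the GUI client (error dialog) differ only in how they
    react to the outcome.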

Delete files if needed to honor disk usage constraint
    should include per-result constraints (e.g. giant stderr files)
    inform user if files deleted

implement server watchdogs

-----------------------
THINGS TO TEST (preferably with test scripts)
-----------------------
time-of-day limitation
Limit frequency of disk writes
    make sure it actually works
- Test suspend/resume functionality on Windows/UNIX
- verify that if file xfer is interrupted, it resumes at right place
- result reissue
- WU failure: too many errors
- WU failure: too many good results
- credit is granted even if result arrives very late
- multiple preference sets
- shared memory and CPU time measurement, with and without the BOINC API
- timezone on all platforms
- preference propagation between projects
- ensure cpu time doesn't reset if app is killed rather than quitting
- detach/reset project (especially in CLI)
- CPU accounting in the presence of checkpoint/restart

-----------------------
MEDIUM-PRIORITY (should do before public release)
-----------------------

more accurate time to completion calculation - perhaps use recent change in percent done?
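    One possible approach, as a sketch only: estimate from the rate of
    progress between two recent samples. ProgressSample and
    estimate_remaining() are hypothetical names, and a real version
    would probably want to smooth over more than two samples.

```cpp
#include <cassert>

// Hypothetical sample of an app's progress: a timestamp in seconds
// and the fraction done in [0, 1].
struct ProgressSample {
    double time;
    double fraction_done;
};

// Estimate seconds remaining from the progress rate between two samples.
// Returns a negative value when no estimate can be made (no time elapsed
// or no forward progress between the samples).
double estimate_remaining(const ProgressSample& older,
                          const ProgressSample& newer) {
    double dt = newer.time - older.time;
    double df = newer.fraction_done - older.fraction_done;
    if (dt <= 0 || df <= 0) return -1.0;
    double rate = df / dt;                      // fraction done per second
    return (1.0 - newer.fraction_done) / rate;
}
```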

change show_message to use vsprintf
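    Sketch of the formatting step, using vsnprintf (the bounded variant
    of vsprintf) so a long message is truncated instead of overrunning
    the buffer. format_message() and the 256-byte buffer are assumptions;
    the real show_message signature may differ.

```cpp
#include <cassert>
#include <cstdarg>
#include <cstdio>
#include <string>

// Hypothetical printf-style helper that a show_message() could call
// to build the text it displays.
std::string format_message(const char* fmt, ...) {
    char buf[256];
    va_list ap;
    va_start(ap, fmt);
    // vsnprintf writes at most sizeof(buf)-1 characters plus the NUL.
    int n = std::vsnprintf(buf, sizeof(buf), fmt, ap);
    va_end(ap);
    if (n < 0) return std::string();   // formatting error: return empty text
    return std::string(buf);
}
```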

add an RPC to verify an account ID (returns DB ID for user)
    needed for multi-project stats sites

implement a "fetch prefs" command (regular RPC w/o work request)

all RPCs should return a "user-specific project URL"
    to be used in GUI (might refer to user page)

in GUI, project name should hyperlink to a project-specified URL
    (typically user page for that project)

preference flag for confirm before accepting executable file

abort result if any file exceeds max_nbytes

per-WU limits
    max disk
    max CPU
    max VM size

let user choose language files in installation process

write general language file manipulation functions

use https to secure login pages; do we care about the authenticator
    being transmitted without encryption from the client?

abort app if excess memory used

write docs for project management
    how to start/stop server complex
    what needs to be backed up and how

account creation: show privacy/usage policies

finish SOCKS implementation, test

decide what to do with invalid result files in upload directory

make get_local_ip_addr() work in all cases

think about sh_fopen related functionality in BOINC client

Implement FIFO mechanism in scheduler for results that can't be sent

user profiles on web (borrow logic from SETI@home)

Devise system for porting applications
    password-protected web-based interface for
    uploading app versions and adding them to DB
    XXX should do this manually since need to sign

Add 2-D waterfall display to Astropulse

Deadline mechanism for results
    - use in result dispatching
    - use in file uploading (decide what to upload next)
    - use in deciding when to make scheduler RPC (done already?)

Testing framework
    better mechanisms to model server/client/communication failure
    better mechanisms to simulate large load
    do client/server on separate hosts?

Global preferences
    test propagation mechanism
        set up multi-project, multi-host test;
        change global prefs at one web site,
        make sure they propagate to all hosts

Per-project preferences
    test project-specific prefs
        make example web edit pages
        make app that uses them
    set up a test with multiple projects
        test "add project" feature, GUI and cmdline
        test resource share mechanism

CPU benchmarking
    review CPU benchmarks - do they do what we want?
    what to do when tests show hardware problem?
    How should we weight factors for credit?
    run CPU tests unobtrusively, periodically
    check that on/conn/active fracs are maintained correctly
    check that bandwidth is measured correctly
    measure disk/mem size on all platforms
    get timezone to work

Redundancy checking and validation
    test the validation mechanism
    make sure credit is granted correctly
    make sure average, total credit maintained correctly for user, host

Windows screensaver functionality
    idle-only behavior without screensaver - test

Data transfer
    make sure restart of downloads works
    make sure restart of uploads works
    test download/upload with multiple data servers
        make sure it tries servers in succession,
        does exponential backoff if all fail
    review and document prioritization of transfers
    review protocol; make sure error returns are possible and handled correctly
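
    Sketch of the "try servers in succession, back off if all fail"
    behavior to test against. XferRetry, on_failure() and the delay
    parameters are hypothetical; the client's actual retry policy
    (e.g. any randomization) may differ.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical retry bookkeeping for one file transfer that can use
// any of several data servers.
struct XferRetry {
    int num_servers = 1;
    int current_server = 0;
    int failed_passes = 0;   // times in a row that every server has failed
};

// Call when a transfer attempt fails. Returns the number of seconds to
// wait before the next attempt: 0 while untried servers remain in the
// current pass, otherwise a delay that doubles with each failed pass,
// capped at max_delay.
double on_failure(XferRetry& r, double min_delay, double max_delay) {
    r.current_server++;
    if (r.current_server < r.num_servers) return 0.0;   // try next server now
    r.current_server = 0;                               // start a new pass later
    r.failed_passes++;
    double delay = min_delay;
    for (int i = 1; i < r.failed_passes; i++) {
        delay *= 2;
        if (delay >= max_delay) return max_delay;
    }
    return std::min(delay, max_delay);
}
```

    A success would reset failed_passes to 0 (not shown).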

Scheduler
    Should dispatch results based on deadline?
    test that scheduler estimates WU completion time correctly
    test that scheduler sends right amount of work
    test that client estimates remaining work correctly,
        requests correct # of seconds
    test that hi/low water mark system works
    test that scheduler sends only feasible WUs

Scheduler RPC
    formalize notion of "permanent failure" (e.g. can't download file)
    report perm failures to scheduler, record in DB
    make sure RPC backoff is done for any perm failure
        (in general, should never make back-to-back RPCs to a project)
    make sure that client eventually reloads master URL

Application graphics
    finish design, implementation, doc, testing
        size, frame rate, whether to generate

Work generation
    generation of upload signature is very slow

prevent file_xfer->req1 from overflowing. This problem seems to
    happen when the file_upload_handler returns a large message to the
    client. This causes project->parsefile to get wrong
    input and so on.
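
    One way to guard the buffer, as a sketch: append with an explicit
    capacity and report truncation. This assumes req1 is a fixed-size
    NUL-terminated char buffer, which is a guess; append_bounded() is
    a hypothetical helper.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>

// Append up to n bytes of src to the NUL-terminated string in dst, whose
// total capacity is cap bytes. Never writes past dst[cap-1] and always
// leaves dst NUL-terminated. Returns false if the data had to be
// truncated or dst was not a valid string to begin with.
bool append_bounded(char* dst, std::size_t cap, const char* src, std::size_t n) {
    const void* end = std::memchr(dst, 0, cap);
    if (!end) return false;                      // no terminator within cap
    std::size_t used = static_cast<const char*>(end) - dst;
    std::size_t room = cap - 1 - used;           // keep one byte for the NUL
    std::size_t ncopy = std::min(n, room);
    std::memcpy(dst + used, src, ncopy);
    dst[used + ncopy] = 0;
    return ncopy == n;
}
```

    On a false return the caller can treat the reply as an error rather
    than passing a partial message on to the parser.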

test HTTP redirect mechanism for all types of ops

Add batch features to ops web

The Windows installer sometimes leaves boinc.# files in the BOINC
directory.  This is likely due to the installer not being able to
delete the old boinc.dll file.

If a client connects to the scheduling server using default prefs,
use the stored user prefs for determining how much work to send

Windows client crashes if application fails to start up (CreateProcess)

get preferences works, but is slightly confusing - you have to go to
projects, right click on "get preferences", and then exit/restart boinc
before you get to see your new pretty underwater colors.

"suspend" seems to suspend, but after restart the CPU time jumped up
by a significant amount.  This is because Windows 9x uses GetTickCount
for CPU time.

"Retry transfers now" feature, especially for dialup users

-----------------------
LONG-TERM IDEAS AND PROJECTS
-----------------------

use https for login (don't send account ID or password in clear)

CPU benchmarking
    This should be done by a pseudo-application
    rather than by the core client.
    This would eliminate the GUI-starvation problem,
    and would make it possible to have architecture-specific
    benchmarking programs (e.g. for graphics coprocessor)
    or project-specific programs.

investigate binary diff mechanism for updating persistent files

verify support for > 4 GB files everywhere

use FTP instead of HTTP for file xfer??
    measure speed diff

Local scheduling
    more intelligent decision about when/what to work on
    - monitor VM situation, run small-footprint programs
        even if user active
    - monitor network usage, do net xfers if network idle
        even if user active

The following would require client to accept connections:
    - clients can act as proxy scheduling server
    - exiting client can pass work to another client
    - client can transfer files to other clients

User/host "reputation"
    keep track of % results bad, % results claimed > 2x granted credit
    both per-host and per-user.
    Make these visible to project, to that user (only)

Storage validation
    periodic rehash of persistent files;
    compare results between hosts

Include account ID in URL for file xfers
    This would let you verify network xfers by scanning web logs
    (could use that to give credit for xfers)

WU/result sequence mechanism
    design/implement/document

Multiple application files
    document, test

Versioning
    think through issues involved in:
    compatibility of core client and scheduling server
    compatibility of core client and data server
    compatibility of core client and app version
    compatibility of core client and client state file?
    Need version numbers for protocols/interfaces?
    What messages to show user?  Project?

Persistent files
    test
    design/implement test reporting, retrieval mechanisms
    (do this using WU/results with null application?)

NET_XFER_SET
    review logic; prevent one stream from starving others

Other user preferences:
    memory restrictions
    process priority, CPU affinity
    show disk usage as two pie charts (one for overall, one for per project)