-----------------------
BUGS (arranged from high to low priority)
-----------------------

Matt's Bugs:

1. suddenly slow graphics: graphics plot smoothly at first, but after
   several "show graphics" the plots become really slow. Graphics keep
   getting slower over time, even after stop/restart.

3. after minimizing and quitting, when I double click on the system tray
   icon the window is still minimized - a bit confusing.

4. two Solaris "Sun Solaris 1.01" clients appear on the astropulse
   download page.

6. get preferences works, but is slightly confusing - I have to go to
   projects, right click on "get preferences", and then exit/restart boinc
   before I get to see my new pretty underwater colors.

7. after minimizing the graphics window it behaves strangely - the only way
   to maximize it again is to right click and select "maximize". Even then
   the resize buttons, etc. don't work as they are supposed to.

8. "suspend" seems to suspend, but after restart the CPU time had jumped
   up by a significant amount.

11. When trying to detach from "http://www.lebofsky.com" I got a seg fault
    and BOINC crashed. After restarting BOINC I was able to detach from
    lebofsky.com without any ado.

13. For kicks, while astropulse was running I chose to log on to a project
    and entered the astropulse URL again. I was expecting BOINC to
    immediately come back with an error or message saying I was already
    logged on to that project. Instead, the work page went haywire - the
    workunits I was working on went to 0% done with a status of
    "results uploaded", and many more workunits were greyed out (also with
    a status of "results uploaded").

14. So I detached from astropulse and logged in again. After downloading
    fresh work units, they showed 0% progress but the exact same CPU time
    as the work units from the project I had just detached from. I then
    tried detaching, quitting BOINC, and logging back in to astropulse, and
    once again the workunits showed the same CPU time as before, 0% done,
    and the status "results uploaded".
    So it looks like a state file somewhere is not being deleted.

15. After running all night (on Win98) I shook the mouse to wake up the
    blank screen, and all I saw was that the top half of the screen was
    solid gray and the bottom half showed the bottom half of the astropulse
    graphics. They weren't moving. The computer was frozen. I had to
    ctrl-alt-del to restart.

- If BOINC starts up before the taskbar is available, it is inaccessible
- GUI client should display "Upload failed" and "Download failed" in
  transfers tab when failure occurs
- GUI: Result status should say "downloading files", "uploading files", etc.
- message window should reposition to bottom when a new message arrives
- Win GUI: line between menus and tabs
- "show graphics" should not use right-click
- win GUI: reduce flicker?
- labels on disk graph are not clear
- document and add to global prefs?
    run_minimized
    hangup_if_dialed

-----------------------
HIGH-PRIORITY (should do for beta test)
-----------------------

"Add project" behavior:
    Goal: give user timely feedback if bad project URL or account ID;
    don't leave bad project files sitting around

    A project addition is "successful" when
    1) the client fetches the master page,
    2) the master page has at least one scheduler URL,
    3) the client contacts a scheduler and gets a user name back.

    The cmdline and GUI clients need to inform the user if a project add
    is not successful, since it probably means the master URL or account ID
    was typed in wrong.

    A project is "tentative" if the above hasn't happened yet.
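
The three success conditions above can be sketched as a small check. This is only an illustration, not the client's actual code: the `AddAttempt` struct and both function names are hypothetical and exist only to make the rule concrete.

```cpp
#include <string>
#include <vector>

// Hypothetical record of what has happened so far for one newly added
// project. None of these names come from the BOINC source.
struct AddAttempt {
    bool master_page_fetched = false;         // condition 1
    std::vector<std::string> scheduler_urls;  // condition 2: URLs found in the master page
    std::string user_name;                    // condition 3: returned by a scheduler RPC
};

// A project addition is "successful" only when all three conditions hold.
bool add_succeeded(const AddAttempt& a) {
    return a.master_page_fetched
        && !a.scheduler_urls.empty()
        && !a.user_name.empty();
}

// A project is "tentative" until the addition has succeeded.
bool is_tentative(const AddAttempt& a) {
    return !add_succeeded(a);
}
```

Under this sketch, a project whose master page was fetched but listed no scheduler URLs fails condition 2, so it never leaves the tentative state.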
    The tentative state is flagged in the project file () and in memory.

    A "failure event" is
    - master fetch fails
    - master page has no scheduler URLs
    - scheduler RPC fails
    - scheduler RPC returns no user name

    A "success event" is
    - scheduler RPC returns user name

    cmdline client
        first time (no projects initially) or -add_new_project flag:
            new project is tentative (write flag to project file)
        if failure event occurs for tentative project:
            project_add_fail(PROJECT&)
                print "check URL and account ID" message
                delete project file
                exit
        if success event occurs for tentative project:
            project_add_succeed(PROJECT&)
                clear tentative flag, rewrite account file

    GUI client:
        first-time dialog or "attach to project" command:
            show a modal dialog ("verifying account" w/ "cancel" button)
        project_add_fail(PROJECT&)
            replace w/ modal error dialog w/ retry, cancel
        project_add_succeed(PROJECT&)

Delete files if needed to honor disk usage constraint
    should include per-result constraints (e.g. giant stderr files)
    inform user if files deleted

implement server watchdogs

-----------------------
THINGS TO TEST (preferably with test scripts)
-----------------------

time-of-day limitation

Limit frequency of disk writes
    make sure it actually works

- Test suspend/resume functionality on Windows/UNIX
- verify that if file xfer is interrupted, it resumes at right place
- result reissue
- WU failure: too many errors
- WU failure: too many good results
- credit is granted even if result arrives very late
- multiple preference sets
- shared memory and CPU time measurement, with and without the BOINC API
- timezone on all platforms
- preference propagation between projects
- ensure cpu time doesn't reset if app is killed rather than quitting

-----------------------
MEDIUM-PRIORITY (should do before public release)
-----------------------

change show_message to use vsprintf

add an RPC to verify an account ID (returns DB ID for user)
    needed for multi-project stats sites

implement a "fetch prefs" command (regular RPC w/o work
    request)

all RPCs should return a "user-specific project URL"
    to be used in GUI (might refer to user page)

in GUI, project name should hyperlink to a project-specified URL
    (typically user page for that project)

preference flag for confirm before accepting executable file

abort result if any file exceeds max_nbytes

per-WU limits
    max disk
    max CPU
    max VM size

let user choose language files in installation process

write general language file manipulation functions

use https to secure login pages; do we care about the authenticator
    being transmitted without encryption from the client?

abort app if excess memory used

Windows 9x CPU time calculated incorrectly

write docs for project management
    how to start/stop server complex
    what needs to be backed up and how

account creation: show privacy/usage policies

finish SOCKS implementation, test

decide what to do with invalid result files in upload directory

make get_local_ip_addr() work in all cases

think about sh_fopen related functionality in BOINC client

Implement FIFO mechanism in scheduler for results that can't be sent

user profiles on web (borrow logic from SETI@home)

Devise system for porting applications
    password-protected web-based interface for
    uploading app versions and adding them to DB
    XXX should do this manually since need to sign

Add 2-D waterfall display to Astropulse

Deadline mechanism for results
    - use in result dispatching
    - use in file uploading (decide what to upload next)
    - use in deciding when to make scheduler RPC (done already?)

Testing framework
    better mechanisms to model server/client/communication failure
    better mechanisms to simulate large load
    do client/server on separate hosts?

Global preferences
    test propagation mechanism
    set up multi-project, multi-host test;
    change global prefs at one web site,
    make sure they propagate to all hosts

Per-project preferences
    test project-specific prefs
    make example web edit pages
    make app that uses them
    set up a test with multiple projects
    test "add project" feature, GUI and cmdline
    test resource share mechanism

CPU benchmarking
    review CPU benchmarks - do they do what we want?
    what to do when tests show hardware problem?
    How should we weight factors for credit?
    run CPU tests unobtrusively, periodically
    check that on/conn/active fracs are maintained correctly
    check that bandwidth is measured correctly
    measure disk/mem size on all platforms
    get timezone to work

CPU accounting in the presence of checkpoint/restart
    test

Redundancy checking and validation
    test the validation mechanism
    make sure credit is granted correctly
    make sure average, total credit maintained correctly for user, host

Windows screensaver functionality
    idle-only behavior without screensaver - test

Data transfer
    make sure restart of downloads works
    make sure restart of uploads works
    test download/upload with multiple data servers
    make sure it tries servers in succession,
    does exponential backoff if all fail
    review and document prioritization of transfers
    review protocol; make sure error returns are possible
    and handled correctly

Scheduler
    Should dispatch results based on deadline?
    test that scheduler estimates WU completion time correctly
    test that scheduler sends right amount of work
    test that client estimates remaining work correctly,
    requests correct # of seconds
    test that hi/low water mark system works
    test that scheduler sends only feasible WUs

Scheduler RPC
    formalize notion of "permanent failure" (e.g.
    can't download file)
    report perm failures to scheduler, record in DB
    make sure RPC backoff is done for any perm failure
    (in general, should never make back-to-back RPCs to a project)
    make sure that client eventually reloads master URL

Application graphics
    finish design, implementation, doc, testing
    size, frame rate, whether to generate

Work generation
    generation of upload signature is very slow

prevent file_xfer->req1 from overflowing.
    This problem seems to happen when the file_upload_handler
    returns a large message to the client.
    This causes project->parsefile to get wrong input and so on.

test HTTP redirect mechanism for all types of ops

Add batch features to ops web

The Windows installer sometimes leaves boinc.# files in the BOINC
    directory. This is likely due to the installer not being able to
    delete the old boinc.dll file

If a client connects to the scheduling server using default prefs,
    use the stored user prefs for determining how much work to send

Windows client crashes if application fails to start up (CreateProcess)

-----------------------
LONG-TERM IDEAS AND PROJECTS
-----------------------

use https for login (don't send account ID or password in clear)

CPU benchmarking
    This should be done by a pseudo-application
    rather than by the core client.
    This would eliminate the GUI-starvation problem,
    and would make it possible to have architecture-specific
    benchmarking programs (e.g. for graphics coprocessor)
    or project-specific programs.

investigate binary diff mechanism for updating persistent files

verify support for > 4 GB files everywhere

use FTP instead of HTTP for file xfer??
    measure speed diff

Local scheduling
    more intelligent decision about when/what to work on
    - monitor VM situation, run small-footprint programs
      even if user active
    - monitor network usage, do net xfers if network idle
      even if user active

The following would require client to accept connections:
    - clients can act as proxy scheduling server
    - exiting client can pass work to another client
    - client can transfer files to other clients

User/host "reputation"
    keep track of % results bad, % results claimed > 2x granted credit,
    both per-host and per-user.
    Make these visible to project, to that user (only)

Storage validation
    periodic rehash of persistent files;
    compare results between hosts

Include account ID in URL for file xfers
    This would let you verify network xfers by scanning web logs
    (could use that to give credit for xfers)

WU/result sequence mechanism
    design/implement/document

Multiple application files
    document, test

Versioning
    think through issues involved in:
    compatibility of core client and scheduling server
    compatibility of core client and data server
    compatibility of core client and app version
    compatibility of core client and client state file?
    Need version numbers for protocols/interfaces?
    What messages to show user? Project?

Persistent files
    test
    design/implement test reporting, retrieval mechanisms
    (do this using WU/results with null application?)

NET_XFER_SET
    review logic; prevent one stream from starving others

Kill app if there is a memory leak

Other user preferences:
    memory restrictions
    process priority/affinity
    show disk usage as two pie charts (one for overall, one for per project)
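
As a rough illustration of the User/host "reputation" idea listed above, the two percentages could be derived from simple per-host or per-user counters. This is a sketch under assumptions: the `Reputation` struct and its members are hypothetical names, and the notes do not say how a result with zero granted credit should be counted.

```cpp
// Hypothetical per-host or per-user counters; none of these names
// come from the BOINC source.
struct Reputation {
    long results_total = 0;
    long results_bad = 0;
    long results_overclaimed = 0;  // claimed credit > 2x granted credit

    // Record one finished result. Assumption: a result that claimed
    // credit but was granted none also counts as overclaimed.
    void record(bool valid, double claimed_credit, double granted_credit) {
        ++results_total;
        if (!valid) ++results_bad;
        if (claimed_credit > 2.0 * granted_credit) ++results_overclaimed;
    }

    // Percentage of results that were bad (0 when nothing is recorded).
    double pct_bad() const {
        return results_total ? 100.0 * results_bad / results_total : 0.0;
    }

    // Percentage of results that claimed more than 2x the granted credit.
    double pct_overclaimed() const {
        return results_total ? 100.0 * results_overclaimed / results_total : 0.0;
    }
};
```

One instance per host and one per user would cover the "both per-host and per-user" requirement; who gets to see the values is a separate access-control question.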