C implementation. See SF patch 474274, by Brett Cannon.
(As an experiment, I'm adding a line that #undefs HAVE_STRPTIME,
so that you'll always get the Python version. This is so that it
gets some good exercise. We should eventually delete that line.)
- The log reader now provides a "closed" attribute similar to the
profiler.
- Both the profiler and log reader now provide a fileno() method.
- Use METH_NOARGS where possible, allowing simpler code in the method
implementations.
write_header(): When we encounter a non-string object in sys.path, record
a fairly mindless placeholder rather than dying. Possibly could record
the repr of the object found, but not clear whether that matters.
The staticforward define was needed to support certain broken C
compilers (notably SCO ODT 3.0, perhaps early AIX as well) that botched
the static keyword when it was used with a forward declaration of a
static initialized structure. Standard C allows the forward declaration with
static, and we've decided to stop catering to broken C compilers. (In
fact, we expect that the compilers are all fixed eight years later.)
I'm leaving staticforward and statichere defined in object.h as
static. This is only for backwards compatibility with C extensions
that might still use them.
XXX I haven't updated the documentation.
Don't pass CREATE_NEW_CONSOLE to CreateProcess(), meaning our child process is in the same "console group" and therefore interrupted by the same Ctrl+C that interrupts the parent.
MSDN sample programs use it, apparently in error. The correct name
is WIN32_LEAN_AND_MEAN. After switching to the correct name, in two
cases more was needed because the code actually relied on things that
disappear when WIN32_LEAN_AND_MEAN is defined.
PyImport_ImportModule() is not guaranteed to return a module object.
When another type of object was returned, the PyModule_GetDict() call
returned NULL and the subsequent GetItem() segfaulted.
Bug fix candidate.
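
The same hazard can be illustrated at the Python level. This is only a
sketch, not the C fix itself: module_dict_or_none() and the name
"_not_really_a_module" are made up for the illustration, and it assumes
the non-module object got there by being placed in sys.modules.

```python
import importlib
import sys
import types


def module_dict_or_none(name):
    """Return the module's namespace dict, or None if the import
    machinery handed back something that is not a module object."""
    obj = importlib.import_module(name)
    if not isinstance(obj, types.ModuleType):
        # The C code crashed here because it assumed a module.
        return None
    return vars(obj)


# Hypothetical demonstration: sys.modules may hold arbitrary objects.
sys.modules["_not_really_a_module"] = object()
try:
    result = module_dict_or_none("_not_really_a_module")
finally:
    del sys.modules["_not_really_a_module"]
```

Checking the type before touching the namespace mirrors what the C code
has to do before calling PyModule_GetDict().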
mechanism is no longer evil: it no longer plays dangerous games with
the type pointer or refcounts, and objects in extension modules can play
along too without needing to edit the core first.
Rewrote all the comments to explain this, and (I hope) give clear
guidance to extension authors who do want to play along. Documented
all the functions. Added more asserts (it may no longer be evil, but
it's still dangerous <0.9 wink>). Rearranged the generated code to
make it clearer, and to tolerate either the presence or absence of a
semicolon after the macros. Rewrote _PyTrash_destroy_chain() to call
tp_dealloc directly; it was doing a Py_DECREF again, and that has all
sorts of obscure distorting effects in non-release builds (Py_DECREF
was already called on the object!). Removed Christian's little "embedded
change log" comments -- that's what checkin messages are for, and since
it was impossible to correlate the comments with the code that changed,
I found them merely distracting.
binascii_crc32(): The previous patch forced this to return the same
result across platforms. This patch deals with the fact that, on a
64-bit box, the *entry* value may have "unexpected" bits in the high
four bytes.
Bugfix candidate.
binascii_crc32(): Make this return a signed 4-byte result across
platforms. The other way to make this platform-independent would be to
make it return an unsigned unbounded int, but the evidence suggests
other code out there treats it like a signed 4-byte int (e.g., existing
code writing the result with struct.pack "l" format).
Bugfix candidate.
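
A rough Python rendering of the idea, for illustration only. The helper
names to_signed32() and crc32_signed() are invented here, and the sketch
assumes a modern binascii.crc32(), which returns an unsigned value; the
C code does the equivalent masking and sign handling itself.

```python
import binascii
import struct


def to_signed32(value):
    """Fold an integer into the signed 4-byte range."""
    value &= 0xFFFFFFFF                 # drop anything above 32 bits
    if value & 0x80000000:              # sign bit set: map to negative
        return value - 0x100000000
    return value


def crc32_signed(data, start=0):
    """CRC-32 as a signed 4-byte int, ignoring high bits in `start`."""
    return to_signed32(binascii.crc32(data, start & 0xFFFFFFFF))


# The signed result fits the standard 4-byte "l" struct format.
packed = struct.pack("<l", crc32_signed(b"hello world"))
```

Masking the entry value first is what keeps stray high bits on a 64-bit
box from changing the answer.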
This was mostly a matter of adding comments and light code rearrangement.
Upon untracking, gc_next is still set to NULL. It's a cheap way to
provoke memory faults if calling code is insane. It's also used in some
way by the trashcan mechanism.
object should now have a well-defined gc_refs value, with clear transitions
among gc_refs states. As a result, none of the visit_XYZ traversal
callbacks need to check IS_TRACKED() anymore, and those tests were removed.
(They were already looking for objects with specific gc_refs states, and
the gc_refs state of an untracked object can no longer match any other
gc_refs state by accident.)
Added more asserts.
I expect that the gc_next == NULL indicator for an untracked object is
now redundant and can also be removed, but I ran out of time for this.
in gc_refs, even at the cost of putting back a test+branch in
visit_decref.
The good news: since gc_refs became utterly tame then, it became
clear that another special value could be useful. The move_roots() and
move_root_reachable() passes have now been replaced by a single
move_unreachable() pass. Besides saving a pass over the generation, this
has a better effect: most of the time everything turns out to be
reachable, so we were breaking the generation list apart and moving it
into the reachable list, one element at a time. Now the reachable
stuff stays in the generation list, and the unreachable stuff is moved
instead. This isn't quite as good as it sounds, since sometimes we
guess wrongly that a thing is unreachable, and have to move it back again.
Still, overall, it yields a significant (but not dramatic) boost in
collection speed.
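
A toy model of the single-pass idea, in Python rather than C. Everything
here is an assumption made for illustration: the Node class, the link()
helper, the two sentinel values, and the use of plain lists in place of
the real doubly-linked gc lists. It is a sketch of the technique, not
the code in gcmodule.c.

```python
from collections import deque

# Hypothetical sentinel values; any distinct negative numbers would do.
REACHABLE = -1
TENTATIVELY_UNREACHABLE = -2


class Node:
    """Toy stand-in for a gc-tracked object."""

    def __init__(self, name, external_refs=0):
        self.name = name
        self.refcount = external_refs   # refs from outside the generation
        self.referents = []
        self.gc_refs = 0


def link(src, dst):
    """Make src refer to dst, bumping dst's reference count."""
    src.referents.append(dst)
    dst.refcount += 1


def move_unreachable(generation):
    # Copy the refcounts, then subtract every reference that comes from
    # inside the generation. What remains counts outside references.
    for node in generation:
        node.gc_refs = node.refcount
    for node in generation:
        for ref in node.referents:
            ref.gc_refs -= 1

    pending = deque(generation)
    survivors, unreachable = [], []     # survivors "stay in the list"
    while pending:
        node = pending.popleft()
        if node.gc_refs > 0:
            node.gc_refs = REACHABLE
            survivors.append(node)
            for ref in node.referents:
                if ref.gc_refs == 0:
                    ref.gc_refs = 1     # not scanned yet; now known live
                elif ref.gc_refs == TENTATIVELY_UNREACHABLE:
                    # We guessed wrong earlier: move it back.
                    unreachable.remove(ref)
                    ref.gc_refs = 1
                    pending.append(ref)
        else:
            node.gc_refs = TENTATIVELY_UNREACHABLE
            unreachable.append(node)
    return survivors, unreachable


# a is referenced from outside; a -> b -> c; d and e form a dead cycle.
a, b, c, d, e = (Node(n) for n in "abcde")
a.refcount = 1
link(a, b); link(b, c); link(d, e); link(e, d)
survivors, unreachable = move_unreachable([c, b, a, d, e])
```

Scanning c and b before a shows the wrong-guess case: both are first
moved to the unreachable list and then moved back once a is visited.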
1. You're not supposed to call this with a NULL argument, although the
docs could be clearer about that. The other visit_XYZ() functions
don't bother to check. This doesn't either now, although it does
assert non-NULL-ness now.
2. It doesn't matter whether the object is currently tracked, so don't
bother checking that either (if it isn't currently tracked, it may
have some nonsense value in gc_refs, but it doesn't hurt to
decrement gibberish, and it's cheaper to do so than to make everyone
test for trackedness).
It would be nice to get rid of the other tests on IS_TRACKED. Perhaps
trackedness should not be a matter of not being in any gc list, but
should be a matter of being in a new "untracked" gc list. This list
simply wouldn't be involved in the collection mechanism. A newly
created object would be put in the untracked list. Tracking would
simply unlink it and move it into the gen0 list. Untracking would do
the reverse. No test+branch needed then. visit_move() may be vulnerable
then, though, and I don't know how this would work with the trashcan.
"The regression" isn't a regression at all: 2.2.1 had a bug that
prevented the problem from showing up. What looks like a regression is
actually a glitch in cyclic gc that's been there forever.
As the generation being collected is analyzed, objects that can't be
collected (because, e.g., we find they're externally referenced, or
are in an unreachable cycle but have a __del__ method) are moved out
of the list of candidates. A tricksy scheme uses negative values of
gc_refs to mark such objects as being moved. However, the exact
negative value set at the start may become "more negative" over time
for objects not in the generation being collected, and the scheme was
checking for an exact match on the negative value originally assigned.
As a result, objects in generations older than the one being collected
could get scanned too, and yanked back into a younger generation. Doing
so doesn't lead to an error, but doesn't do any good, and can burn an
unbounded amount of time doing useless work.
A test case is simple (thanks to Kevin Jacobs for finding it!):
    x = []
    for i in xrange(200000):
        x.append((1,))
Without the patch, every gen0 collection scans all of x, every gen1
collection scans all of x twice, and x gets yanked back into gen1 on
every gen0 collection. With the patch, once x gets to gen2, it's never
scanned again until another gen2 collection, and stays in gen2.
Bugfix candidate, although the code has changed enough that I think I'll
need to port it by hand. 2.2.1 also has a different bug that causes
bound method objects not to get tracked at all (so the test case doesn't
burn absurd amounts of time in 2.2.1, but *should* <wink>).
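
For anyone who wants to watch collector activity while a loop like the
test case runs, here is a sketch for a current CPython. It is not part
of the patch. It assumes the gc.callbacks hook, which did not exist in
2.2, and it appends (i,) instead of (1,) because modern compilers fold
the constant tuple so nothing new would be allocated. The function name
count_collections() is made up.

```python
import gc


def count_collections(n=200000):
    """Run the test-case loop and count collections per generation."""
    counts = {}

    def on_gc(phase, info):
        # Called by the collector at the start and stop of each run.
        if phase == "start":
            gen = info["generation"]
            counts[gen] = counts.get(gen, 0) + 1

    was_enabled = gc.isenabled()
    gc.enable()
    gc.callbacks.append(on_gc)
    try:
        x = []
        for i in range(n):
            x.append((i,))
    finally:
        gc.callbacks.remove(on_gc)
        if not was_enabled:
            gc.disable()
    return counts


counts = count_collections()
```

The counts only show how often each generation was collected; they say
nothing about how much of x each collection scanned, which is what the
patch changes.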
Setting the buffer_text attribute to true causes the parser to collect
character data, waiting as long as possible to report it to the Python
callback. This can save an enormous number of callbacks from C to
Python, which can be a substantial performance improvement.
buffer_text defaults to false.
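
A small sketch of the effect, using the xml.parsers.expat interface.
The helper collect_chunks() and the sample document are made up for the
illustration; exactly how Expat splits unbuffered character data is up
to Expat, so the sketch only compares the two runs.

```python
import xml.parsers.expat

DOC = "<root>one\ntwo &amp; three\nfour</root>"


def collect_chunks(document, buffer_text):
    """Parse document; return each string passed to the data handler."""
    parser = xml.parsers.expat.ParserCreate()
    parser.buffer_text = buffer_text
    chunks = []
    parser.CharacterDataHandler = chunks.append
    parser.Parse(document, True)
    return chunks


unbuffered = collect_chunks(DOC, False)
buffered = collect_chunks(DOC, True)
# Same text either way; buffering just delivers it in fewer callbacks.
```

Comparing len(unbuffered) with len(buffered) shows how many C-to-Python
calls were saved for this input.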
The handlers array on each parser now has the invariant that None will
never be set as a handler; it will always be NULL or a Python-level
value passed in for the specific handler.
have_handler(): Return true if there is a Python handler for a
particular event.
get_handler_name(): Return a string object giving the name of a
particular handler. This caches the string object so it doesn't
need to be created more than once.
get_parse_result(): Helper to allow the Parse() and ParseFile()
methods to share the same logic for determining the return value
or exception state.
PyUnknownEncodingHandler(), PyModule_AddIntConstant():
Made these helpers static. (The latter is only defined for older
versions of Python.)
pyxml_UpdatePairedHandlers(), pyxml_SetStartElementHandler(),
pyxml_SetEndElementHandler(), pyxml_SetStartNamespaceDeclHandler(),
pyxml_SetEndNamespaceDeclHandler(), pyxml_SetStartCdataSection(),
pyxml_SetEndCdataSection(), pyxml_SetStartDoctypeDeclHandler(),
pyxml_SetEndDoctypeDeclHandler():
Removed. These are no longer needed with Expat 1.95.x.
handler_info:
Use the setter functions provided by Expat 1.95.x instead of the
pyxml_Set*Handler() functions which have been removed.
Minor code formatting changes for consistency.
Trailing whitespace removed.