Commit Graph

1201 Commits

Author SHA1 Message Date
Matthew Honnibal c301bebd33 Merge branch 'master' of https://github.com/honnibal/spaCy into develop 2015-09-09 10:55:39 +02:00
Matthew Honnibal 0e24d099a1 * Fix L/R edge bug, by ensuring l_edge and r_edge are preset, and fixing the way the edges are updated in del_arc. Bugs keep arising here because the edges are absolute positions, where everything else is relative. I'm also not 100% convinced that del_arc is handled correctly. Do we need to update the parents? 2015-09-09 03:40:44 +02:00
Matthew Honnibal 2be3620333 * Save morphological analyses in a cache 2015-09-08 15:39:24 +02:00
Matthew Honnibal 1def5a6cbe * Fix print statements in matcher 2015-09-08 15:38:19 +02:00
Matthew Honnibal 64d71f8893 * Fix lemmatizer 2015-09-08 15:38:03 +02:00
Matthew Honnibal 623329b19a Merge branch 'master' of ssh://github.com/honnibal/spaCy into develop 2015-09-08 14:27:01 +02:00
Matthew Honnibal 62a01dd41d * Fix issue #92: lexemes.bin read error on 32-bit platforms. 2015-09-08 14:23:58 +02:00
Matthew Honnibal ef58607a99 * Add spacy.it 2015-09-06 22:10:37 +02:00
Matthew Honnibal 2154a54f6b * Add spacy.de 2015-09-06 21:56:47 +02:00
Matthew Honnibal f6ec5bf1b0 * Use empty tag map in vocab if none supplied 2015-09-06 20:19:27 +02:00
Matthew Honnibal 4f8e38271d * Fix merge errors in lexeme.pxd 2015-09-06 20:19:08 +02:00
Matthew Honnibal 86c888667f * Merge in changes from de branch 2015-09-06 19:49:28 +02:00
Matthew Honnibal d2fc104a26 * Begin merge of Gazetteer and DE branches 2015-09-06 19:45:15 +02:00
Matthew Honnibal dbf8dce109 Merge branch 'gaz' of ssh://github.com/honnibal/spaCy into gaz 2015-09-06 18:44:14 +02:00
Matthew Honnibal 9eae9837c4 * Fix morphology look up 2015-09-06 17:53:39 +02:00
Matthew Honnibal 6427a3fcac * Temporarily import flag attributes in matcher 2015-09-06 17:53:12 +02:00
Matthew Honnibal 7cc56ada6e * Temporarily add py_set_flag attribute in Lexeme 2015-09-06 17:52:51 +02:00
Matthew Honnibal e35bb36be7 * Ensure Lexeme.check_flag returns a boolean value 2015-09-06 17:52:32 +02:00
Matthew Honnibal 7e4fea67d3 * Fix bug in token subtree, introduced by duplication of L/R code in Stateclass. Need to consolidate the two methods. 2015-09-06 10:48:36 +02:00
Matthew Honnibal 5edac11225 * Wrap self.parse in nogil, and break if an invalid move is predicted. The invalid break is a work-around that papers over likely bugs, but we can't easily break in the nogil block, and otherwise we'll get an infinite loop. Need to set this as an error flag. 2015-09-06 04:15:00 +02:00
Matthew Honnibal fd1eeb3102 * Add POS attribute support in get_attr 2015-09-06 04:13:03 +02:00
Matthew Honnibal 534e3dda3c * More work on language independent parsing 2015-08-28 03:44:54 +02:00
Matthew Honnibal c2307fa9ee * More work on language-generic parsing 2015-08-28 02:02:33 +02:00
Matthew Honnibal 86c4a8e3e2 * Work on new morphology organization 2015-08-27 23:11:51 +02:00
Matthew Honnibal 5b89e2454c * Improve error-reporting in tagger 2015-08-27 10:26:36 +02:00
Matthew Honnibal f0a7c99554 * Relax rule-requirement in lemmatizer 2015-08-27 10:26:19 +02:00
Matthew Honnibal 0af139e183 * Tagger training now working. Still need to test load/save of model. Morphology still broken. 2015-08-27 09:16:11 +02:00
Matthew Honnibal 1302d35dff * Rework interfaces in vocab 2015-08-26 19:21:46 +02:00
Matthew Honnibal 2d521768a3 * Store Morphology class in Vocab 2015-08-26 19:21:03 +02:00
Matthew Honnibal d30029979e * Avoid import of morphology in spans 2015-08-26 19:20:46 +02:00
Matthew Honnibal 119c0f8c3f * Hack out morphology stuff from tokenizer, while morphology being reimplemented. 2015-08-26 19:20:11 +02:00
Matthew Honnibal b4faf551f5 * Refactor language-independent tagger class 2015-08-26 19:19:21 +02:00
Matthew Honnibal a3d5e6c0dd * Reform constructor and save/load workflow in parser model 2015-08-26 19:19:01 +02:00
Matthew Honnibal 1d7f2d3abc * Hack on morphology structs 2015-08-26 19:18:36 +02:00
Matthew Honnibal f8f2f4e545 * Temporarily add PUNC name to parts_of_speech dictionary, until better solution 2015-08-26 19:18:19 +02:00
Matthew Honnibal 008b02b035 * Comment out enums in Morphology for now 2015-08-26 19:17:35 +02:00
Matthew Honnibal 378729f81a * Hack Morphology class towards usability 2015-08-26 19:17:21 +02:00
Matthew Honnibal 430affc347 * Fix missing n_patterns property in Matcher class. Fix from_dir method 2015-08-26 19:17:02 +02:00
Matthew Honnibal 3acf60df06 * Add missing properties in Lexeme class 2015-08-26 19:16:28 +02:00
Matthew Honnibal 76996f4145 * Hack on generic Language class. Still needs work for morphology, defaults, etc 2015-08-26 19:16:09 +02:00
Matthew Honnibal e2ef78b29c * Gut pos.pyx module, since functionality moved to spacy/tagger.pyx 2015-08-26 19:15:42 +02:00
Matthew Honnibal c4d8754385 * Specify LOCAL_DATA_DIR global in spacy.en.__init__.py 2015-08-26 19:15:07 +02:00
Matthew Honnibal c2d8edd0bd * Add PROB attribute in attrs.pxd 2015-08-26 19:14:19 +02:00
Matthew Honnibal c5a27d1821 * Move lemmatizer to spacy 2015-08-25 15:47:08 +02:00
Matthew Honnibal 82217c6ec6 * Generalize lemmatizer 2015-08-25 15:46:19 +02:00
Matthew Honnibal 8083a07c3e * Use language base class 2015-08-25 15:37:30 +02:00
Matthew Honnibal f2f699ac18 * Add language base class 2015-08-25 15:37:17 +02:00
Matthew Honnibal 5dd76be446 * Split EnPosTagger up into base class and subclass 2015-08-24 05:25:55 +02:00
Matthew Honnibal 5d5922dbfa * Begin laying out morphological features 2015-08-24 01:04:30 +02:00
Matthew Honnibal 6f1743692a * Work on language-independent refactoring 2015-08-23 20:49:18 +02:00
Matthew Honnibal 3879d28457 * Fix https for url detection 2015-08-23 02:40:35 +02:00
Matthew Honnibal cad0cca4e3 * Tmp 2015-08-22 22:04:34 +02:00
Matthew Honnibal bf38b3b883 * Hack on l/r reversal bug 2015-08-10 05:58:43 +02:00
Matthew Honnibal 6116413b47 * Fix label prediction in StepwiseState 2015-08-10 05:05:31 +02:00
Matthew Honnibal 2c9753eff2 * Whitespace 2015-08-10 00:09:02 +02:00
Matthew Honnibal 9de98f5a6f * Add Parser.stepthrough method, with context manager 2015-08-10 00:08:46 +02:00
Matthew Honnibal fe43f8cf39 * Whitespace 2015-08-09 02:31:53 +02:00
Matthew Honnibal 9c090945e0 * Add Parser.predict method, and clean up Parser.get_state 2015-08-09 02:29:58 +02:00
Matthew Honnibal 04fccfb984 * Fix get_state for parser prediction 2015-08-09 02:11:22 +02:00
Matthew Honnibal 55fde0e240 * Fix get_state 2015-08-09 01:45:30 +02:00
Matthew Honnibal f0f4fa9838 * Fix Parser.get_state 2015-08-09 01:40:13 +02:00
Matthew Honnibal 18331dca89 * Add continue_for argument to parser 'partial' function, which is now renamed to get_state 2015-08-09 01:31:54 +02:00
Matthew Honnibal 0653288fa5 * Fix stateclass.queue 2015-08-09 00:39:02 +02:00
Matthew Honnibal 9de218b7ba * Fix Parser.partial function 2015-08-08 23:45:18 +02:00
Matthew Honnibal 01be34d55a * Whitespace 2015-08-08 23:37:44 +02:00
Matthew Honnibal cc9deae960 * Add is_valid method to transition_system 2015-08-08 23:36:18 +02:00
Matthew Honnibal 2a46c77324 * Whitespace 2015-08-08 23:35:59 +02:00
Matthew Honnibal 7bafc789e7 * Add stack and queue properties to stateclass, for python access 2015-08-08 23:32:42 +02:00
Matthew Honnibal 3af938365f * Add function partial to Parser 2015-08-08 23:32:15 +02:00
Matthew Honnibal 76a1f0481a * Whitespace 2015-08-08 23:31:54 +02:00
Matthew Honnibal b0f5c39084 * Fix handling of exclusion entities 2015-08-06 17:28:43 +02:00
Matthew Honnibal 9f65879991 * Fix shape attr bug, and fix handling of false positive matches 2015-08-06 17:28:14 +02:00
Matthew Honnibal 10d869d102 * Don't allow conjunction between NPs in base NP chunks 2015-08-06 16:31:53 +02:00
Matthew Honnibal 383dfabd67 * Fix matcher setting of entities 2015-08-06 16:27:01 +02:00
Matthew Honnibal 59c3bf60a6 * Ensure entity recognizer doesn't over-write preset types 2015-08-06 16:09:08 +02:00
Matthew Honnibal cd7d1682cd * Fix loading of gazetteer.json file 2015-08-06 16:08:25 +02:00
Matthew Honnibal 9c667b7f15 * Set a value in attrs.pxd on the first flag, to reduce bugs 2015-08-06 16:08:04 +02:00
Matthew Honnibal c263577424 * Fix lower attribute in lexeme.pxd 2015-08-06 16:07:41 +02:00
Matthew Honnibal 5737115e1e * Work on gazetteer matching 2015-08-06 14:33:21 +02:00
Matthew Honnibal 9c1724ecae * Gazetteer stuff working, now need to wire up to API 2015-08-06 00:35:40 +02:00
Matthew Honnibal 5bc0e83f9a * Reimplement matching in Cython, instead of Python. 2015-08-05 01:05:54 +02:00
Matthew Honnibal 4c87a696b3 * Add draft dfa matcher, in Python. Passing tests. 2015-08-04 15:55:28 +02:00
Matthew Honnibal eb7138c761 * Add attr relation in base NP detection 2015-08-01 00:34:40 +02:00
Matthew Honnibal 4988356cf0 * Fix dependency type bug from merged tokens 2015-08-01 00:33:24 +02:00
Matthew Honnibal 78a9068319 * Fix spacy attr on merged tokens 2015-07-30 04:25:58 +02:00
Matthew Honnibal 430e2edb96 * Fix noun_chunks issue 2015-07-30 03:51:50 +02:00
Matthew Honnibal 9590968fc1 * Fix negative indices in Span 2015-07-30 02:30:24 +02:00
Matthew Honnibal 74d8cb3980 * Add noun_chunks iterator, and fix left/right child setting in Doc.merge 2015-07-30 02:29:49 +02:00
Matthew Honnibal d153f18969 * Fix negative indices on spans 2015-07-29 22:36:03 +02:00
Matthew Honnibal b5132bed7d * Set left and right children when loading parse from byte string 2015-07-28 21:03:18 +02:00
Matthew Honnibal 6609fcf4b2 * Make mem and vocab python-visible in Doc 2015-07-28 20:46:59 +02:00
Matthew Honnibal d42fe2e694 * Add unicode_literals to strings.pyx 2015-07-28 16:15:53 +02:00
Matthew Honnibal bb910cff92 * Fix Python3 problem in align_raw 2015-07-28 16:06:53 +02:00
Matthew Honnibal dcafb181b9 * Fix Python3 problem in align_raw 2015-07-28 15:52:10 +02:00
Matthew Honnibal c609ea18f0 * Increment version in download script 2015-07-28 15:22:17 +02:00
Matthew Honnibal 9c4d0aae62 * Switch to better Python2/3 compatible unicode handling 2015-07-28 14:45:37 +02:00
Matthew Honnibal 7606d9936f * Python3 correction for GoldParse 2015-07-28 14:44:53 +02:00
Matthew Honnibal ddc1a5cfe5 * Fix training under python3 2015-07-28 14:09:30 +02:00
Matthew Honnibal a8bbd7312c * Hackishly patch long dependencies problem 2015-07-28 00:14:29 +02:00
Matthew Honnibal bb583f7f09 * Hackishly patch long dependencies problem 2015-07-27 23:14:33 +02:00
Matthew Honnibal aa7a964a4f * Add a type declaration for doc.from_array 2015-07-27 22:57:22 +02:00
Matthew Honnibal 25a8774f42 * Fix regression in packer 2015-07-27 21:53:38 +02:00
Matthew Honnibal 1601e488ee * Fix bug in decoding non-ascii characters 2015-07-27 21:43:58 +02:00
Matthew Honnibal 6a95409cd2 * Fix type on bits 2015-07-27 21:16:49 +02:00
Matthew Honnibal a296d72b54 * Fix en/attrs 2015-07-27 21:16:33 +02:00
Matthew Honnibal 45460f505c * Fix data type on read32 in BitArray 2015-07-27 21:12:13 +02:00
Matthew Honnibal 3d43f49f69 * Revert prev change 2015-07-27 10:58:15 +02:00
Matthew Honnibal 6b586cdad4 * Change lexemes.bin format. Add a header specifying size of LexemeC and number of lexemes, and don't have the redundant orth information. 2015-07-27 08:31:51 +02:00
Matthew Honnibal af6ed18f2a * Ensure we don't use orth_encode on OOV words. 2015-07-27 02:12:01 +02:00
Matthew Honnibal 8535d872e8 * Set is_oov property in get_flags 2015-07-27 01:51:24 +02:00
Matthew Honnibal 8e4c69ee8c * Add is_oov property, and fix up handling of attributes 2015-07-27 01:50:06 +02:00
Matthew Honnibal fc268f03eb * Assert against null pointer exceptions in vocab 2015-07-27 01:00:10 +02:00
Matthew Honnibal 0f093fdb30 * Fix get_by_orth for py3 2015-07-26 19:26:41 +02:00
Matthew Honnibal ceeda5a739 * Fix get_by_orth for py3 2015-07-26 18:39:27 +02:00
Matthew Honnibal 6bb96c122d * Host IS_ flags in attrs.pxd, and add properties for them on Token and Lexeme objects 2015-07-26 16:37:16 +02:00
Matthew Honnibal eeaea25f0c * Check oov_prob file is present 2015-07-26 16:36:38 +02:00
Matthew Honnibal 7eb2446082 * Return empty lexeme on empty string 2015-07-26 00:18:30 +02:00
Matthew Honnibal 1b5d1da2a7 * Allow an OOV probability to be specified in get_lex_props 2015-07-26 00:03:43 +02:00
Matthew Honnibal cd6e25132b * Allow an OOV probability to be specified in get_lex_props 2015-07-26 00:01:46 +02:00
Matthew Honnibal fd525f0675 * Pass OOV probability around 2015-07-25 23:29:51 +02:00
Matthew Honnibal 3fe14b8ed6 * Fix CFile for Python2 2015-07-25 22:55:53 +02:00
Matthew Honnibal 823ef4a00b * Remove profile declarations 2015-07-25 18:13:06 +02:00
Matthew Honnibal f4809e562f * Allow json to be used as a fallback if ujson is not available 2015-07-25 18:11:36 +02:00
Matthew Honnibal 9da06671cf * Remove unused import 2015-07-25 18:11:16 +02:00
Matthew Honnibal 2060935cdb * Remove explicit bytes type in doc.from_bytes, to accept bytearray 2015-07-24 04:54:13 +02:00
Matthew Honnibal aa28e2e01d * Release the GIL around parse function 2015-07-24 04:53:27 +02:00
Matthew Honnibal d62eb34b76 * More Py 2/3 compatibility in bit strings 2015-07-24 04:52:06 +02:00
Matthew Honnibal 0bb839d299 * Fix string coercion for Python 3 2015-07-24 03:49:30 +02:00
Matthew Honnibal c4ff410fdb * Fix bytes problems for Python3 2015-07-24 03:48:23 +02:00
Matthew Honnibal 1ab25e4dad * Fix python3 type error 2015-07-24 02:45:34 +02:00
Matthew Honnibal f35ff173b0 * Fix bits.pyx unicode error 2015-07-23 20:37:57 +02:00
Matthew Honnibal 1406e24327 * Fix unicode error for Python3 2015-07-23 19:36:21 +02:00
Matthew Honnibal dbda6c27fa * Fix python3 error 2015-07-23 14:52:30 +02:00
Matthew Honnibal 99387f9572 * Fix python3 error 2015-07-23 14:30:29 +02:00
Matthew Honnibal b81ffe9032 * Fix typing on mode string in CFile 2015-07-23 13:24:43 +02:00
Matthew Honnibal 22028602a9 * Add unicode_literals declaration in vocab.pyx 2015-07-23 13:24:20 +02:00
Matthew Honnibal 5b41744270 * Check for directory presence before loading annotators 2015-07-23 09:27:37 +02:00
Matthew Honnibal df01a88763 Merge branch 'refactor' (and serialization)
Add Huffman-code serialization, and do a lot of
refactoring. Highlights include:

* Much more efficient StringStore
* Vocab maintains a by-orth mapping of Lexemes
* Avoid manually slicing Py_UNICODE buffers,
  simplifying tokenizer and vocab C APIs
* Remove various bits of dead code
* Work on removing GIL around parser
* Work on bridge to Theano

Conflicts:
	spacy/strings.pxd
	spacy/strings.pyx
	spacy/structs.pxd
2015-07-23 02:18:35 +02:00
Matthew Honnibal a7c4d72e83 * Add serializer property to Vocab, and lazy-load it. Add get_by_orth method. 2015-07-23 01:18:19 +02:00
Matthew Honnibal 6ab1696b15 * Remove read_encoding_freqs from util.py 2015-07-23 01:17:32 +02:00
Matthew Honnibal d5255aad77 * Update freqs for missing tags in ner, for serializer 2015-07-23 01:17:11 +02:00
Matthew Honnibal 12699a1152 * Set initial freqs, to avoid missing values in serializer 2015-07-23 01:16:27 +02:00
Matthew Honnibal 680bb47b55 * Write serializer freqs to single file, vocab/serializer.json 2015-07-23 01:15:25 +02:00
Matthew Honnibal a0e36e8efc * Add working to/from bytes API to Doc 2015-07-23 01:14:45 +02:00
Matthew Honnibal 1f31d96bf9 * Fix Packer API, so that it reads and writes bytes strings, instead of BitArray. Docs are always byte aligned anyway. 2015-07-23 01:13:02 +02:00
Matthew Honnibal 38ef986b29 * Update spacy/en/attrs.pxd 2015-07-23 01:10:58 +02:00
Matthew Honnibal 06eac32610 * Add cfile.pyx 2015-07-23 01:10:36 +02:00
Matthew Honnibal 0c507bd80a * Fix tokenizer 2015-07-22 14:10:30 +02:00
Matthew Honnibal c86dbe4944 * Update English.save_models for new Packer save/load stuff 2015-07-22 13:40:23 +02:00
Matthew Honnibal bf77bcd6b9 * Add comment explaining hash_string 2015-07-22 13:39:42 +02:00
Matthew Honnibal 815bda201d * Remove UniStr struct 2015-07-22 13:39:17 +02:00
Matthew Honnibal 2fc66e3723 * Use Py_UNICODE in tokenizer for now, while sort out Py_UCS4 stuff 2015-07-22 13:38:45 +02:00
Matthew Honnibal 4d61239eac * Reorganize the serialization functions on Doc 2015-07-22 04:53:01 +02:00
Matthew Honnibal 109106a949 * Replace UniStr, using unicode objects instead 2015-07-22 04:52:05 +02:00
Matthew Honnibal 424854028f * Fix decode_int32 2015-07-21 20:09:59 +00:00
Matthew Honnibal 304d0e2633 * Use decode_int32 in _orth_decode 2015-07-21 20:40:55 +02:00
Matthew Honnibal 9cfa59ec33 * Optimistically try orth encoding, with char as a back-off 2015-07-21 20:22:45 +02:00
Matthew Honnibal c8b89e37a5 * Bug fix to faster huffman decoding 2015-07-21 20:05:53 +02:00
Matthew Honnibal b166d1d2a2 * Use encode32 and decode32 2015-07-21 19:59:06 +02:00
Matthew Honnibal c6cd0ddce8 * Add faster encode_int32 and decode_int32 methods 2015-07-21 19:58:45 +02:00
Matthew Honnibal dd60594f41 * Fix double encoding error in strings.pyx 2015-07-20 13:52:56 +02:00
Matthew Honnibal 06639dc497 * Add length cap to word shape feature 2015-07-20 12:06:59 +02:00
Matthew Honnibal 128b6d9714 * Move Utf8Str struct to strings module, as that's the only place it's relevant 2015-07-20 12:06:41 +02:00
Matthew Honnibal 01a97b90f3 * Fix header for string store 2015-07-20 12:06:10 +02:00
Matthew Honnibal 52d538ea42 * Fix short string optimization in strings.pyx. StringStore tests now all pass. 2015-07-20 12:05:23 +02:00
Matthew Honnibal 09a3055630 * Work on short string optimization in Utf8Str 2015-07-20 11:26:46 +02:00
Matthew Honnibal bb0ba1f0cd * Improve serialization speed 2015-07-20 03:27:59 +02:00
Matthew Honnibal 8743a8c084 * Update Doc serialization for new Packer interface 2015-07-20 01:38:04 +02:00
Matthew Honnibal 1f7170e0e1 * Reinstate the fixed vocabulary --- words are only added to the lexicon in init_model, after that we create LexemeC structs with the Pool given to us. 2015-07-20 01:37:34 +02:00
Matthew Honnibal 5a7d060d9c * Switch between the orth and char codecs depending on which is shorter for that message. Mostly orth is shorter, except if there are OOV words. 2015-07-20 01:36:22 +02:00
Matthew Honnibal 5a042ee0d3 * Add function to predict number of bits needed to encode message 2015-07-20 01:35:11 +02:00
Matthew Honnibal b89b489bb4 * Implement both character and orth encoding in Packer, so that we can decide which to use per-text 2015-07-19 22:39:45 +02:00
Matthew Honnibal ae78c9e3ce * Implement character-based codec, so that we can do word/char backoff 2015-07-19 22:03:39 +02:00
Matthew Honnibal cd1d047cb8 * Delete out-dated HuffmanCodec comment 2015-07-19 18:28:14 +02:00
Matthew Honnibal b8086067d5 * Build Huffman codec from unsorted inputs 2015-07-19 17:58:44 +02:00
Matthew Honnibal 317cbbc015 * Serialization round trip now working with decent API, but with rough spots in the organisation and requiring vocabulary to be fixed ahead of time. 2015-07-19 15:18:17 +02:00
Matthew Honnibal 6b13e7227c * Remove duplicate get_lex_attr method from doc.pyx 2015-07-18 22:46:07 +02:00
Matthew Honnibal e49c7f1478 * Update oov check in tokenizer 2015-07-18 22:45:28 +02:00
Matthew Honnibal cfd842769e * Allow infix tokens to be variable length 2015-07-18 22:45:00 +02:00
Matthew Honnibal 5b4c78bbb2 * Use an AttributeCodec based on orth for words. Still no oov handling mechanism. 2015-07-18 22:43:18 +02:00
Matthew Honnibal 82d84b0f2b * Index lexemes by orth, instead of a lexemes vector. Breaks the mechanism for deciding not to own LexemeC structs during parsing. Need to reinstate this. 2015-07-18 22:42:15 +02:00
Matthew Honnibal 4dddc8a69b * Fix type declarations for attr_t. Remove unused id_t. 2015-07-18 22:39:57 +02:00
Matthew Honnibal ced59ab9ea * Make minor efficiency improvement in Doc.__iter__ 2015-07-18 04:10:53 +02:00
Matthew Honnibal cd91914dd8 * Fix hard-coded length 2015-07-18 04:09:56 +02:00
Matthew Honnibal b1d74ce60d * Remove unused joint.pyx and joint.pxd files 2015-07-17 23:31:44 +02:00
Matthew Honnibal c27514512b * Remove cruft ner/ directory 2015-07-17 23:24:32 +02:00
Matthew Honnibal f8d6d319f4 * Remove cruft module 2015-07-17 23:23:05 +02:00
Matthew Honnibal fb0a641a2d * Don't release the gil around Parser.parse. Does this indicate thread problems? 2015-07-17 23:07:37 +02:00
Matthew Honnibal e29daea85f * Fix bint/int typing problem in TransitionSystem. In C++ bint* means bool*, but in C it means int*. So, type-casting to bint* is unsafe. 2015-07-17 22:37:24 +02:00
Matthew Honnibal cf0c788892 * Tests passing on round-trip pack/unpack on basic example 2015-07-17 21:20:48 +02:00
Matthew Honnibal 44f39a876f * Add a blank attrs.pyx 2015-07-17 16:40:42 +02:00
Matthew Honnibal c2c83120d4 * Remove codec property from Vocab 2015-07-17 16:40:11 +02:00
Matthew Honnibal dfdf19f6a9 * Draft a from_orth method for Doc 2015-07-17 16:39:54 +02:00
Matthew Honnibal 9e3f17051b * Move to ORTH instead of ID for encoding lexemes. Basic tests of the codec wrappers now passing 2015-07-17 16:38:29 +02:00
Matthew Honnibal 15ff739996 * Fix passing of ID attribute in string store 2015-07-17 14:49:42 +02:00
Matthew Honnibal 95e57c2780 * Remove unnecessary key and id properties from Utf8String. 2015-07-17 01:40:18 +02:00
Matthew Honnibal 234c7e440a * Add spacy/serialize/__init__ files 2015-07-17 01:37:33 +02:00
Matthew Honnibal db9dfd2e23 * Major refactor of serialization. Nearly complete now. 2015-07-17 01:27:54 +02:00
Matthew Honnibal c8282f9934 * Work on serialization. Needs more reorganisation 2015-07-16 19:56:02 +02:00
Matthew Honnibal d8458d6a25 * Fix attr_id_t import in Spans 2015-07-16 19:55:21 +02:00