spaCy


Author	SHA1	Message	Date
Grégory Howard	cd974b32b7	Update _tokenizer_exceptions_list (adding cities)	2017-06-09 17:58:18 +02:00
Matthew Honnibal	55d0621532	Merge branch 'develop' of https://github.com/explosion/spaCy into develop	2017-06-04 15:53:25 -05:00
Matthew Honnibal	e28f90b672	Fix syntax iterators	2017-06-04 15:51:50 -05:00
Ines Montani	112c5787eb	Merge pull request #1101 from oroszgy/hu_tokenizer_fix: More robust Hungarian tokenizer.	2017-06-04 22:37:51 +02:00
ines	9254a3dd78	Import and add Spanish syntax iterators	2017-06-04 21:42:15 +02:00
Matthew Honnibal	7ca215bc26	Resolve lex_attr_getters conflict	2017-06-03 16:12:01 -05:00
ines	4c643d74c5	Add norm exceptions to other Language classes	2017-06-03 22:29:21 +02:00
ines	fa7e576c57	Change order of exception dicts	2017-06-03 21:52:06 +02:00
Matthew Honnibal	3f5c85d8de	Reorder setting of lex attrs, to avoid clobbering	2017-06-03 14:47:55 -05:00
Matthew Honnibal	aeb7520133	Make norm use lower-case	2017-06-03 14:47:38 -05:00
Matthew Honnibal	de3954843e	Populate norm exceptions with lower-case	2017-06-03 14:47:12 -05:00
ines	e47eef5e03	Update German tokenizer exceptions and tests	2017-06-03 21:07:44 +02:00
ines	0d6fa8b241	Add German norm exceptions	2017-06-03 20:54:18 +02:00
ines	5bd311c77e	Fix update of norm exceptions	2017-06-03 20:54:09 +02:00
ines	746653880c	Add English norm exceptions to lex_attrs	2017-06-03 20:27:28 +02:00
ines	095eeeb12f	Update English tokenizer exceptions and add norms	2017-06-03 20:27:16 +02:00
ines	e5d426406a	Add base norm exceptions	2017-06-03 20:27:05 +02:00
ines	2f1025a94c	Port over Spanish changes from #1096	2017-06-02 19:09:58 +02:00
Gyorgy Orosz	f0c3b09242	More robust Hungarian tokenizer.	2017-05-31 22:28:40 +02:00
Gyorgy Orosz	8c0b4b850e	Fixed emoji handling for Hungarian	2017-05-30 21:34:46 +02:00
ines	84189c1cab	Add 'xx' language ID for multi-language support. Allows models to specify their language ID as 'xx'.	2017-05-28 00:58:59 +02:00
ines	33e332e67c	Remove unused export	2017-05-28 00:57:59 +02:00
ines	a8e58e04ef	Add symbols class to punctuation rules to handle emoji (see #1088). Currently doesn't work for Hungarian, because of conflicts with the custom punctuation rules. Also doesn't take multi-character emoji like 👩🏽‍💻 into account.	2017-05-27 17:57:10 +02:00
Matthew Honnibal	5db89053aa	Merge docstrings	2017-05-21 13:46:23 -05:00
ines	924e8506de	Move Defaults subclass to module scope (necessary for pickling)	2017-05-20 19:02:27 +02:00
Matthew Honnibal	61fe55efba	Move EnglishDefaults class out of English	2017-05-20 02:18:19 -05:00
Matthew Honnibal	8815507f8e	Move SpanishDefaults out of Language class, for pickle	2017-05-18 04:28:51 -05:00
ines	1a05078c79	Add language-specific syntax iterators to en and de	2017-05-17 12:04:03 +02:00
Matthew Honnibal	4b9d69f428	Merge branch 'v2' into develop: move v2 parser into nn_parser.pyx; new TokenVectorEncoder class in pipeline.pyx; new spacy/_ml.py module. Currently the two parsers live side-by-side, until we figure out how to organize them.	2017-05-14 01:10:23 +02:00
ines	a4a37a783e	Remove import from non-existing module	2017-05-13 16:00:09 +02:00
ines	c13b3fa052	Add LEX_ATTRS	2017-05-12 15:37:45 +02:00
ines	bca2ea9c72	Update Portuguese lexical attributes	2017-05-12 15:37:39 +02:00
ines	2f870123bf	Fix formatting	2017-05-12 15:37:20 +02:00
ines	ca65993d59	Add basic Polish Language class	2017-05-12 09:25:37 +02:00
ines	48177c4f92	Add missing tokenizer exceptions	2017-05-12 09:25:24 +02:00
ines	bb8be3d194	Add Danish language data	2017-05-10 21:15:12 +02:00
ines	a0b00624bb	Make sure like_email returns bool	2017-05-09 11:37:29 +02:00
ines	ea60932e1b	Fix formatting	2017-05-09 11:08:14 +02:00
ines	02d0ac5cab	Remove redundant function and fix formatting	2017-05-09 11:06:04 +02:00
ines	b5ca50607e	Reorganise entity rules	2017-05-09 01:37:10 +02:00
ines	12c3d5fbba	Fix formatting	2017-05-09 01:15:28 +02:00
ines	2829a024ef	Re-add basic like_num check to global lex_attrs	2017-05-09 01:15:23 +02:00
ines	88adeee548	Add English lex_attrs overrides	2017-05-09 01:09:52 +02:00
ines	8f3fbbb147	Fix typos	2017-05-09 01:09:37 +02:00
ines	2216e5f326	Reorganise lex_attrs and add dict	2017-05-09 00:57:54 +02:00
ines	e666f14d20	Add global lex_attrs	2017-05-09 00:41:53 +02:00
ines	41972c43fe	Use consistent regex imports	2017-05-09 00:34:31 +02:00
ines	9f0fd5963f	Reorganise Hungarian punctuation rules	2017-05-09 00:01:59 +02:00
ines	fc0d793360	Reorganise Bengali punctuation rules	2017-05-09 00:01:52 +02:00
ines	e895d1afd7	Reorganise French punctuation rules	2017-05-09 00:00:54 +02:00
ines	014bda0ae3	Reorganise global punctuation rules	2017-05-09 00:00:46 +02:00
ines	a91278cb32	Rename _URL_PATTERN to URL_PATTERN	2017-05-09 00:00:00 +02:00
ines	604f299cf6	Add char classes to global language data	2017-05-08 23:59:33 +02:00
ines	f6f5d78cb9	Fix formatting	2017-05-08 23:59:17 +02:00
ines	3c0f85de8e	Remove imports in /lang/__init__.py	2017-05-08 23:58:07 +02:00
ines	614aa09582	Tidy up Bengali tokenizer exceptions	2017-05-08 22:29:49 +02:00
ines	73b577cb01	Fix relative imports	2017-05-08 22:29:04 +02:00
ines	ae99990f63	Fix formatting	2017-05-08 22:23:48 +02:00
ines	f46ffe3e89	Move language data to /lang module	2017-05-08 20:00:40 +02:00

759 Commits