Commit Graph

443 Commits

Author SHA1 Message Date
jeannefukumaru 6cdb7b2e04 added tag_map for indonesian (#3515)
* added tag_map for indonesian

* changed tag map from .py to .txt to see if tests pass

* added symbols import

* added utf8 encoding flag

* added missing SCONJ symbol

* Auto-format

* Remove unused imports

* Make tag map available in Indonesian defaults
2019-04-01 12:27:48 +02:00
Ines Montani c23e234d65 Auto-format 2019-04-01 12:11:27 +02:00
Duygu Altinok 5a7bc6b39d Fix/irreg adverbs extension (#3499)
* extended list of irreg adverbs

* added test to exceptions

* fixed typo
2019-03-28 13:23:33 +01:00
Wannaphong Phatthiyaphaibun 297a051992 Update Thai tag map (#3480)
* Update Thai tag map

Update Thai tag map

* Create wannaphongcom.md
2019-03-25 16:53:26 +01:00
Matthew Honnibal c66bd61e88 Fix lemmas 2019-03-21 14:22:12 +01:00
Matthew Honnibal 04395ffa49 Bring English tag_map in line with UD Treebank
I wrote a small script to read the UD English training data and check
that our tag map and morph rules were resulting in the best POS map.
This hadn't been done for some time, and there have been various changes
to the UD schema since it has been done. After these changes we should
see much better agreement between our POS assignments and the UD POS
tags.
2019-03-21 13:53:44 +01:00
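Note on 04395ffa49: the script mentioned in this message is not part of the log. The check it describes might look roughly like the sketch below, which is an illustration and not the author's script. It assumes the training data is in CoNLL-U format (UPOS in column 4, XPOS in column 5) and that the tag map is simplified to a plain dict from fine-grained tag to UPOS string. The names best_pos_by_tag, find_disagreements, SAMPLE and TAG_MAP are hypothetical.

```python
from collections import Counter, defaultdict


def best_pos_by_tag(conllu_text):
    """Map each fine-grained tag (XPOS) to its most frequent UPOS in the data."""
    counts = defaultdict(Counter)
    for line in conllu_text.splitlines():
        line = line.strip()
        # Skip blank sentence separators and comment lines.
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) != 10:
            continue
        token_id, upos, xpos = cols[0], cols[3], cols[4]
        # Skip multiword token ranges (e.g. "1-2") and empty nodes (e.g. "1.1").
        if "-" in token_id or "." in token_id:
            continue
        counts[xpos][upos] += 1
    return {tag: c.most_common(1)[0][0] for tag, c in counts.items()}


def find_disagreements(tag_map, conllu_text):
    """Return {tag: (mapped_pos, best_pos)} where the tag map disagrees with the data."""
    best = best_pos_by_tag(conllu_text)
    return {
        tag: (tag_map.get(tag), upos)
        for tag, upos in best.items()
        if tag_map.get(tag) != upos
    }


# Hypothetical three-token sample in CoNLL-U format.
SAMPLE = "\n".join([
    "# text = Dogs bark loudly",
    "1\tDogs\tdog\tNOUN\tNNS\t_\t2\tnsubj\t_\t_",
    "2\tbark\tbark\tVERB\tVBP\t_\t0\troot\t_\t_",
    "3\tloudly\tloudly\tADV\tRB\t_\t2\tadvmod\t_\t_",
])

# Hypothetical simplified tag map with one deliberately wrong entry (RB).
TAG_MAP = {"NNS": "NOUN", "VBP": "VERB", "RB": "PART"}

if __name__ == "__main__":
    for tag, (mapped, best) in find_disagreements(TAG_MAP, SAMPLE).items():
        print(tag, "mapped to", mapped, "but data suggests", best)
```

Tags missing from the tag map are reported with a mapped value of None, so the same pass also surfaces gaps in coverage.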
Mehdi Hamoumi 9211f30ee3 Tiny correction in french lookup dictionary (#3427) 2019-03-19 13:00:19 +01:00
Ines Montani 2912ddc9a6 Don't set extension attribute in Japanese (closes #3398) 2019-03-12 13:30:33 +01:00
Ines Montani cdd418b93e Auto-format [ci skip] 2019-03-11 17:10:50 +01:00
Matthew Honnibal 39a4741e26 Add support for vocab.writing_system property (#3390)
* Add xfail test for vocab.writing_system

* Add vocab.writing_system property

* Set Language.Defaults.writing_system

* Set default writing system

* Remove xfail on test_vocab_writing_system
2019-03-11 15:23:20 +01:00
Ines Montani ee4f312e89 Add writing_system to ArabicDefaults (experimental) 2019-03-11 14:22:23 +01:00
Ines Montani ef80cfde6f Fix pickling of Japanese (closes #3191) 2019-03-11 13:34:23 +01:00
Matthew Honnibal 5d25ee52fb Fix English tag map 2019-03-11 01:06:02 +01:00
Matthew Honnibal 7503e1e505 Improve English tag map. Re #593, #3311 2019-03-10 23:50:00 +01:00
Ines Montani 610fb306bd Revert hyphens 2019-03-09 12:51:53 +01:00
Ines Montani bbabb6aaae Escape more hyphens 2019-03-09 12:41:05 +01:00
Ines Montani b8db219850 Auto-format 2019-03-09 12:40:58 +01:00
Ines Montani a145bfe627 Try escaping hyphens again 2019-03-09 03:06:50 +01:00
Ines Montani b9c71fc0f0 Fix flags 2019-03-09 02:46:04 +01:00
Ines Montani ae09b6a6cf Try fixing unicode inconsistencies on Python 2 2019-03-09 02:37:50 +01:00
Ines Montani d957d7a697 Auto-format 2019-03-09 02:37:41 +01:00
Ines Montani 65402c3d02 Revert "Experiment with escaping hyphens"
This reverts commit 9b42e2d5dd.
2019-03-09 02:13:00 +01:00
Ines Montani 9b42e2d5dd Experiment with escaping hyphens 2019-03-09 02:05:26 +01:00
Ines Montani 6bd34e9d54 Expose Japanese stop words (closes #3346) 2019-03-06 14:21:15 +01:00
Ines Montani 85deb96278 Fix whitespace 2019-03-06 14:20:34 +01:00
Ines Montani 23f6ebf0f3 Add missing " (closes #3343) 2019-02-27 16:37:03 +01:00
Ines Montani 48a2046d1c Remove stray print statement (closes #3342) 2019-02-27 15:35:04 +01:00
Ines Montani 07d7c0a1af Fix whitespace 2019-02-27 15:34:21 +01:00
Ines Montani 76ce8b2662 Merge branch 'master' into develop 2019-02-25 15:54:55 +01:00
Julia Makogon f1c3108d52 Fixing pymorphy2 dependency issue (#3329) (closes #3327)
* Classes for Ukrainian; small fix in Russian.

* Contributor agreement

* pymorphy2 initialization split for ru and uk (#3327)

* stop-words fixed

* Unit-tests updated
2019-02-25 15:48:17 +01:00
Ines Montani 2982f82934 Auto-format 2019-02-24 14:09:15 +01:00
Matthew Honnibal c5f947f194 Fix regex deprecation warnings 2019-02-21 11:56:47 +01:00
Sofie 9a478b6db8 Clean up of char classes, few tokenizer fixes and faster default French tokenizer (#3293)
* splitting up latin unicode interval

* removing hyphen as infix for French

* adding failing test for issue 1235

* test for issue #3002 which now works

* partial fix for issue #2070

* keep the hyphen as infix for French (as it was)

* restore french expressions with hyphen as infix (as it was)

* added succeeding unit test for Issue #2656

* Fix issue #2822 with custom Italian exception

* Fix issue #2926 by allowing numbers right before infix /

* remove duplicate

* remove xfail for Issue #2179 fixed by Matt

* adjust documentation and remove reference to regex lib
2019-02-20 22:10:13 +01:00
Ines Montani 3fdcdec6a0 Merge branch 'master' into develop 2019-02-18 10:03:32 +01:00
Roshni Biswas e09f1347fa updates for Bengali language (#3286)
* Update morph_rules.py

* contributor agreement for roshni-b

* created example sentences
2019-02-18 10:02:28 +01:00
Ines Montani 043e8186f3 Merge branch 'master' into develop 2019-02-17 17:51:17 +01:00
Marc Puig 51268e9f21 Typo error fixed (#3284) 2019-02-17 17:51:02 +01:00
Ines Montani 19a002bfd3 Merge branch 'master' into develop 2019-02-17 12:22:54 +01:00
Roshni Biswas e26d923726 Update morph_rules.py (#3283) 2019-02-17 12:21:47 +01:00
Ines Montani c31a9dabd5 💫 Add en/em dash to prefixes and suffixes (#3281)
* Auto-format

* Add en/em dash to prefixes and suffixes
2019-02-15 10:29:59 +01:00
Ines Montani 2e31921d0a 💫 Add base Language classes for more languages (#3276)
* Add base classes for more languages

* Add test for language class initialization

Make sure language can be initialized – otherwise, it's difficult to catch serious errors in the test suite, because languages are lazy-loaded
2019-02-15 01:31:19 +11:00
Ines Montani 106d95b01a Fix typo 2019-02-14 12:26:56 +01:00
Ines Montani 11d6b874db Update stop_words.py 2019-02-14 12:25:19 +01:00
Ines Montani 4d2438f985 Tidy up and auto-format 2019-02-13 15:29:08 +01:00
Ines Montani 2f45bd94c0 Auto-formatting 2019-02-12 18:30:11 +01:00
Ines Montani 0184a95340 Merge branch 'master' into develop 2019-02-12 18:29:24 +01:00
Akhilesh a78db10941 add kannada support (#3264)
* add kannada support

* add few more stop words

* add support for Kannada Language
2019-02-12 18:28:39 +01:00
Ines Montani 25602c794c Tidy up and fix small bugs and typos 2019-02-08 14:14:49 +01:00
Ines Montani 9e652afa4b Merge branch 'master' into develop 2019-02-08 13:28:09 +01:00
Björn Lennartsson 647f0140c7 Fixed tag map for Swedish Talbanken (#3186) 2019-02-08 14:28:59 +11:00