# spaCy/spacy/lang/tr/lex_attrs.py

from ...attrs import LIKE_NUM


# Thirteen, fifteen, etc. are written as separate words: "on üç", "on beş"

_num_words = [
    "bir",
    "iki",
    "üç",
    "dört",
    "beş",
    "altı",
    "yedi",
    "sekiz",
    "dokuz",
    "on",
    "yirmi",
    "otuz",
    "kırk",
    "elli",
    "altmış",
    "yetmiş",
    "seksen",
    "doksan",
    "yüz",
    "bin",
    "milyon",
    "milyar",
    "trilyon",
    "katrilyon",
    "kentilyon",
]


_ordinal_words = [
    "birinci",
    "ikinci",
    "üçüncü",
    "dördüncü",
    "beşinci",
    "altıncı",
    "yedinci",
    "sekizinci",
    "dokuzuncu",
    "onuncu",
    "yirminci",
    "otuzuncu",
    "kırkıncı",
    "ellinci",
    "altmışıncı",
    "yetmişinci",
    "sekseninci",
    "doksanıncı",
    "yüzüncü",
    "bininci",
    "milyonuncu",
    "milyarıncı",
    "trilyonuncu",
    "katrilyonuncu",
    "kentilyonuncu",
]

_ordinal_endings = ("inci", "ıncı", "nci", "ncı", "uncu", "üncü")


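def like_num(text):
    """Check whether a Turkish token looks like a number: digits (optionally
    signed, or with "," and "." separators), a simple fraction, a cardinal or
    ordinal number word, or digits followed by an ordinal suffix ("3üncü").
    """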
    if text.startswith(("+", "-", "±", "~")):
        text = text[1:]
    text = text.replace(",", "").replace(".", "")
    if text.isdigit():
        return True
    if text.count("/") == 1:
        num, denom = text.split("/")
        if num.isdigit() and denom.isdigit():
            return True
    text_lower = text.lower()
    # Check cardinal number
    if text_lower in _num_words:
        return True
    # Check ordinal number
    if text_lower in _ordinal_words:
        return True
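    # Check digits followed by an ordinal suffix, e.g. "3üncü" or "2nci".
    # Only the suffix that actually matched is stripped, so a token such as
    # "1xnci" is not accepted.
    for ending in _ordinal_endings:
        if text_lower.endswith(ending) and text_lower[: -len(ending)].isdigit():
            return True
    return False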


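# Hypothetical sketch, not used by like_num above: Python's str.lower() is
# locale-independent, so the dotless capital "I" becomes "i" instead of the
# Turkish "ı", and the dotted capital "İ" becomes "i" followed by U+0307
# (combining dot above). Uppercase forms such as "ALTI" or "İKİ" therefore do
# not match _num_words after text.lower(). Assuming Turkish casing rules are
# wanted, one option is to map both capitals explicitly before lowercasing.
# The name _tr_lower is illustrative only and is not part of spaCy.
def _tr_lower(text):
    # Map "I" -> "ı" and "İ" -> "i" first; lower() then handles the rest.
    return text.replace("I", "ı").replace("İ", "i").lower()


# Example: _tr_lower("ALTI") returns "altı" and _tr_lower("İKİ") returns
# "iki", whereas "ALTI".lower() returns "alti". Note the trade-off: under
# Turkish rules "BIR" lowercases to "bır", not "bir".

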
LEX_ATTRS = {LIKE_NUM: like_num}