2017-11-01 22:04:28 +00:00
# encoding: utf8
from __future__ import unicode_literals

# Source: https://github.com/stopwords-iso/stopwords-ro
💫 Tidy up and auto-format .py files (#2983)
<!--- Provide a general summary of your changes in the title. -->
## Description
- [x] Use [`black`](https://github.com/ambv/black) to auto-format all `.py` files.
- [x] Update flake8 config to exclude very large files (lemmatization tables etc.)
- [x] Update code to be compatible with flake8 rules
- [x] Fix various small bugs, inconsistencies and messy stuff in the language data
- [x] Update docs to explain new code style (`black`, `flake8`, when to use `# fmt: off` and `# fmt: on` and what `# noqa` means)
Once #2932 is merged, which auto-formats and tidies up the CLI, we'll be able to run `flake8 spacy` and actually get meaningful results.
At the moment, the code style and linting aren't applied automatically, but I'm hoping that the new [GitHub Actions](https://github.com/features/actions) will let us auto-format pull requests and post comments with relevant linting information.
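A minimal sketch of the markers mentioned above, assuming a hypothetical hand-aligned table (`NUMBER_WORDS` is illustrative and not part of spaCy): `# fmt: off` and `# fmt: on` ask `black` to leave the enclosed lines as written, and `# noqa` asks `flake8` to skip warnings on that line.

```python
# flake8 would normally report F401 (imported but unused) for this import;
# the trailing marker tells it to skip that check on this line only.
import os  # noqa: F401

# fmt: off
# black leaves everything between the two markers exactly as written,
# so the manual column alignment below survives auto-formatting.
NUMBER_WORDS = {
    "zero":  0,
    "trei":  3,
    "zece": 10,
}
# fmt: on
```

Restricting `# noqa` to a specific code, as in `# noqa: F401`, keeps other warnings on the same line visible.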
### Types of change
enhancement, code style
## Checklist
<!--- Before you submit the PR, go over this checklist and make sure you can
tick off all the boxes. [] -> [x] -->
- [x] I have submitted the spaCy Contributor Agreement.
- [x] I ran the tests, and all new and existing tests passed.
- [x] My changes don't require a change to the documentation, or if they do, I've added all required information.
2018-11-30 16:03:03 +00:00
STOP_WORDS = set(
    """
a
abia
acea
aceasta
această
aceea
aceeasi
acei
aceia
acel
acela
acelasi
acele
acelea
acest
acesta
aceste
acestea
acestei
acestia
acestui
aceşti
aceştia
2018-05-10 10:16:56 +00:00
acești
aceștia
acolo
acord
acum
adica
ai
aia
aibă
aici
aiurea
al
ala
alaturi
ale
alea
alt
alta
altceva
altcineva
alte
altfel
alti
altii
altul
alături
am
anume
apoi
ar
are
as
asa
asemenea
asta
astazi
astea
astfel
astăzi
asupra
atare
atat
atata
atatea
atatia
ati
atit
atita
atitea
atitia
atunci
au
avea
avem
aveţi
aveți
avut
azi
aş
aşadar
aţi
aș
așadar
ați
b
ba
bine
bucur
bună
c
ca
cam
cand
capat
care
careia
carora
caruia
cat
catre
caut
ce
cea
ceea
cei
ceilalti
cel
cele
celor
ceva
chiar
ci
cinci
cind
cine
cineva
cit
cita
cite
citeva
citi
citiva
conform
contra
cu
cui
cum
cumva
curând
curînd
când
cât
câte
câtva
câţi
câți
cînd
cît
cîte
cîtva
cîţi
cîți
că
căci
cărei
căror
cărui
către
d
da
daca
dacă
dar
dat
datorită
dată
dau
de
deasupra
deci
decit
degraba
deja
deoarece
departe
desi
despre
deşi
deși
din
dinaintea
dintr
dintr-
dintre
doar
doi
doilea
două
drept
dupa
după
dă
e
ea
ei
el
ele
era
eram
este
eu
exact
eşti
ești
f
face
fara
fata
fel
fi
fie
fiecare
fii
fim
fiu
fiţi
fiți
foarte
fost
frumos
fără
g
geaba
graţie
grație
h
halbă
i
ia
iar
ieri
ii
il
imi
in
inainte
inapoi
inca
incit
insa
intr
intre
isi
iti
j
k
l
la
le
li
lor
lui
lângă
lîngă
m
ma
mai
mare
mea
mei
mele
mereu
meu
mi
mie
mine
mod
mult
multa
multe
multi
multă
mulţi
mulţumesc
mulți
mulțumesc
mâine
mîine
mă
n
ne
nevoie
ni
nici
niciodata
nicăieri
nimeni
nimeri
nimic
niste
nişte
niște
noastre
noastră
noi
noroc
nostri
nostru
nou
noua
nouă
noştri
noștri
nu
numai
o
opt
or
ori
oricare
orice
oricine
oricum
oricând
oricât
oricînd
oricît
oriunde
p
pai
parca
patra
patru
patrulea
pe
pentru
peste
pic
pina
plus
poate
pot
prea
prima
primul
prin
printr-
putini
puţin
puţina
puţină
puțin
puțina
puțină
până
pînă
r
rog
s
sa
sa-mi
sa-ti
sai
sale
sau
se
si
sint
sintem
spate
spre
sub
sunt
suntem
sunteţi
sunteți
sus
sută
sînt
sîntem
sînteţi
sînteți
să
săi
său
t
ta
tale
te
ti
timp
tine
toata
toate
toată
tocmai
tot
toti
totul
totusi
totuşi
totuși
toţi
toți
trei
treia
treilea
tu
tuturor
tăi
tău
u
ul
ului
un
una
unde
undeva
unei
uneia
unele
uneori
unii
unor
unora
unu
unui
unuia
unul
v
va
vi
voastre
voastră
voi
vom
vor
vostru
vouă
voştri
voștri
vreme
vreo
vreun
vă
x
z
zece
zero
zi
zice
îi
îl
îmi
împotriva
în
înainte
înaintea
încotro
încât
încît
între
întrucât
întrucît
îţi
îți
ăla
ălea
ăsta
ăstea
ăştia
ăștia
şapte
şase
şi
ştiu
ţi
ţie
șapte
șase
și
știu
ți
ție
    """.split()
)
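The set built above can be used for simple membership tests. The following is a minimal sketch, not spaCy's own API: `SAMPLE_STOP_WORDS` is a hypothetical ten-word excerpt of the list and `remove_stop_words` is an illustrative helper, both introduced here only to keep the example self-contained.

```python
# Hypothetical excerpt of the stop-word list above, built the same way
# (one whitespace-separated string split into a set).
SAMPLE_STOP_WORDS = set(
    """
a acest ca de este în la pe și un
""".split()
)


def remove_stop_words(tokens, stop_words=SAMPLE_STOP_WORDS):
    """Return the tokens whose lowercased form is not in stop_words."""
    return [token for token in tokens if token.lower() not in stop_words]


tokens = "Acest text este un exemplu".split()
filtered = remove_stop_words(tokens)
print(filtered)  # ['text', 'exemplu']
```

Because the lookup is an exact string match, the list carries both the cedilla spellings (for example `şi`) and the comma-below spellings (`și`); they are different code points and would otherwise not match each other.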