Package pyarabic :: Module arabicchars :: Class arabicchars

Class arabicchars

The arabicchars class contains all Arabic letters. It is a subclass of unicode.

Instance Methods

__init__()

hasHaraka(self, word)
Checks if the Arabic word contains harakat (FATHA, DAMMA, KASRA, ...).

hassHadda(self, word)
Checks if the Arabic word contains shadda.

isAlef(self, archar)
Checks for Arabic Alef forms.

isArabicstring(self, text)
Checks whether all characters of the text are in the Arabic Unicode block.
Returns: Boolean

isArabicword(self, word)
Checks for a valid Arabic word.
Returns: Boolean
 
isHamza(self, archar)
Checks for Arabic Hamza forms.

isHaraka(self, archar)
Checks for Arabic Harakat marks (FATHA, DAMMA, KASRA, SUKUN, TANWIN).

isLigature(self, archar)
Checks for Arabic ligatures such as LamAlef.

isShadda(self, archar)
Checks for the Arabic Shadda mark.

isShortharaka(self, archar)
Checks for Arabic short Harakat marks (FATHA, DAMMA, KASRA, SUKUN).

isSmall(self, archar)
Checks for Arabic small letters.

isSukun(self, archar)
Checks for the Arabic Sukun mark.

isTanwin(self, archar)
Checks for Arabic Tanwin marks (FATHATAN, DAMMATAN, KASRATAN).

isTashkeel(self, archar)
Checks for Arabic Tashkeel marks (FATHA, DAMMA, KASRA, SUKUN, SHADDA, FATHATAN, DAMMATAN, KASRATAN).
 
isTatweel(self, archar)
Checks for the Arabic Tatweel letter modifier.

isTeh(self, archar)
Checks for Arabic Teh forms.

isVocalized(self, word)
Checks if the Arabic word is vocalized.

isVocalizedtext(self, text)
Checks if the Arabic text is vocalized.

isWawlike(self, archar)
Checks for Arabic Waw-like forms.

isWeak(self, archar)
Checks for Arabic weak letters.

isYehlike(self, archar)
Checks for Arabic Yeh forms.

stripHarakat(self, word)
Strips Harakat from an Arabic word, except Shadda.

stripTashkeel(self, word)
Strips Tashkeel from an Arabic word.

stripTatweel(self, word)
Strips Tatweel (Kashida) from an Arabic word.

Class Variables
  AIN = u'ع'
  ALEF = u'ا'
  ALEFAT = (u'ا', u'آ', u'أ', u'إ', u'ٱ', u'ى', u'ٰ')
  ALEF_HAMZA_ABOVE = u'أ'
  ALEF_HAMZA_BELOW = u'إ'
  ALEF_MADDA = u'آ'
  ALEF_MAKSURA = u'ى'
  ALEF_WASLA = u'ٱ'
  BEH = u'ب'
  BYTE_ORDER_MARK = u'\ufeff'
  COMMA = u'،'
  DAD = u'ض'
  DAL = u'د'
  DAMMA = u'ُ'
  DAMMATAN = u'ٌ'
  DECIMAL = u'٫'
  EIGHT = u'٨'
  FATHA = u'َ'
  FATHATAN = u'ً'
  FEH = u'ف'
  FIVE = u'٥'
  FOUR = u'٤'
  FULL_STOP = u'۔'
  GHAIN = u'غ'
  HAH = u'ح'
  HAMZA = u'ء'
  HAMZAT = (u'ء', u'ؤ', u'ئ', u'ٔ', u'ٕ', u'إ', u'أ')
  HAMZA_ABOVE = u'ٔ'
  HAMZA_BELOW = u'ٕ'
  HARAKAT = re.compile(r'^[\u064b\u064c\u064d\u064e\u064f\u0650\...
  HARAKAT_NO_SHADDA_pat = re.compile(r'^[\u064b\u064c\u064d\u064...
  HARAKAT_pat = re.compile(r'^[\u064b\u064c\u064d\u064e\u064f\u0...
  HEH = u'ه'
  JEEM = u'ج'
  KAF = u'ك'
  KASRA = u'ِ'
  KASRATAN = u'ٍ'
  KHAH = u'خ'
  LAM = u'ل'
  LAM_ALEF = u'\ufefb'
  LAM_ALEF_HAMZA_ABOVE = u'\ufef7'
  LAM_ALEF_HAMZA_BELOW = u'\ufef9'
  LAM_ALEF_MADDA_ABOVE = u'\ufef5'
  LIGUATURES = (u'\ufefb', u'\ufef7', u'\ufef9', u'\ufef5')
  MADDA_ABOVE = u'ٓ'
  MEEM = u'م'
  MINI_ALEF = u'ٰ'
  MOON = (u'ت', u'ة')
  NINE = u'٩'
  NOON = u'ن'
  ONE = u'١'
  PERCENT = u'٪'
  QAF = u'ق'
  QUESTION = u'؟'
  REH = u'ر'
  SAD = u'ص'
  SEEN = u'س'
  SEMICOLON = u'؛'
  SEVEN = u'٧'
  SHADDA = u'ّ'
  SHEEN = u'ش'
  SHORTHARAKAT = (u'َ', u'ُ', u'ِ', u'ْ')
  SIX = u'٦'
  SMALL = (u'ت', u'ة')
  SMALL_ALEF = u'ٰ'
  SMALL_WAW = u'ۥ'
  SMALL_YEH = u'ۦ'
  STAR = u'٭'
  SUKUN = u'ْ'
  SUN = (u'ت', u'ة')
  TAH = u'ط'
  TANWIN = (u'ً', u'ٌ', u'ٍ')
  TASHKEEL = (u'ً', u'ٌ', u'ٍ', u'َ', u'ُ', u'ِ', u'ْ', u'ّ')
  TASHKEEL_pat = re.compile(r'^[\u064b\u064c\u064d\u064e\u064f\u...
  TATWEEL = u'ـ'
  TEH = u'ت'
  TEHLIKE = (u'ت', u'ة')
  TEH_MARBUTA = u'ة'
  THAL = u'ذ'
  THEH = u'ث'
  THOUSANDS = u'٬'
  THREE = u'٣'
  TWO = u'٢'
  WAW = u'و'
  WAWLIKE = (u'و', u'ؤ', u'ۥ')
  WAW_HAMZA = u'ؤ'
  WEAK = (u'ا', u'و', u'ي', u'ى')
  YEH = u'ي'
  YEHLIKE = (u'ي', u'ئ', u'ى', u'ۦ')
  YEH_HAMZA = u'ئ'
  ZAH = u'ظ'
  ZAIN = u'ز'
  ZERO = u'٠'
  simple_LAM_ALEF = u'لا'
  simple_LAM_ALEF_HAMZA_ABOVE = u'لأ'
  simple_LAM_ALEF_HAMZA_BELOW = u'لإ'
  simple_LAM_ALEF_MADDA_ABOVE = u'لآ'
Method Details

hasHaraka(self, word)

Checks if the Arabic word contains harakat (FATHA, DAMMA, KASRA, ...).

Parameters:
  • word (unicode) - Arabic Unicode word
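
A minimal sketch of how such a check could work, assuming that harakat are the seven code points listed in HARAKAT_pat (U+064B to U+0650 plus U+0652, with SHADDA excluded). The name has_haraka_sketch is hypothetical and is not part of pyarabic; the library's own implementation may differ.

```python
import re

# Assumption: harakat are FATHATAN..KASRA (U+064B..U+0650) and SUKUN (U+0652).
# SHADDA (U+0651) is deliberately left out of the character class.
_ANY_HARAKA = re.compile(u'[\u064b-\u0650\u0652]')


def has_haraka_sketch(word):
    """Return True if at least one haraka occurs anywhere in word."""
    return _ANY_HARAKA.search(word) is not None


vocalized = u'\u0643\u064e\u062a\u064e\u0628\u064e'  # KAF, TEH, BEH, each followed by FATHA
plain = u'\u0643\u062a\u0628'                        # the same letters without marks
print(has_haraka_sketch(vocalized), has_haraka_sketch(plain))
```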

hassHadda(self, word)

Checks if the Arabic word contains shadda.

Parameters:
  • word (unicode) - Arabic Unicode word

isAlef(self, archar)

Checks for Arabic Alef forms. ALEFAT = (ALEF, ALEF_MADDA, ALEF_HAMZA_ABOVE, ALEF_HAMZA_BELOW, ALEF_WASLA, ALEF_MAKSURA).

Parameters:
  • archar (unicode) - arabic unicode char
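
The character-class checks on this page (isAlef, isHamza, isTeh, isWeak and so on) can be read as tuple membership tests. The sketch below assumes that reading and uses the ALEFAT value shown under Class Variables; is_alef_sketch is a hypothetical name, not the library's method.

```python
# Values copied from the Class Variables listing, written as escapes:
# ALEF, ALEF_MADDA, ALEF_HAMZA_ABOVE, ALEF_HAMZA_BELOW,
# ALEF_WASLA, ALEF_MAKSURA, MINI_ALEF.
ALEFAT = (u'\u0627', u'\u0622', u'\u0623', u'\u0625',
          u'\u0671', u'\u0649', u'\u0670')


def is_alef_sketch(archar):
    """Return True if archar is one of the Alef forms in ALEFAT."""
    return archar in ALEFAT


print(is_alef_sketch(u'\u0623'))  # ALEF_HAMZA_ABOVE
print(is_alef_sketch(u'\u0628'))  # BEH
```

Because the test is membership in a tuple, a multi-character string never matches, so the argument is expected to be a single character.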

isArabicstring(self, text)

Checks whether all characters of the text are in the Arabic Unicode block.

Parameters:
  • text (unicode) - input text
Returns: Boolean
True if all characters are in the Arabic block
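
One way to express this check is a regular expression that looks for any character outside the block. The sketch assumes "Arabic block" means U+0600 to U+06FF; the exact range used by the library is not stated on this page, and is_arabic_string_sketch is a hypothetical name.

```python
import re

# Assumption: the Arabic block is U+0600..U+06FF.
_OUTSIDE_ARABIC = re.compile(u'[^\u0600-\u06ff]')


def is_arabic_string_sketch(text):
    """Return True if no character of text falls outside the assumed block."""
    return _OUTSIDE_ARABIC.search(text) is None


print(is_arabic_string_sketch(u'\u0643\u062a\u0628'))      # Arabic letters only
print(is_arabic_string_sketch(u'\u0643\u062a\u0628 abc'))  # contains a space and Latin letters
```

Under this assumption a space is not an Arabic block character, so multi-word text is rejected unless it is split into words first.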

isArabicword(self, word)

Checks for a valid Arabic word.

Parameters:
  • word (unicode) - input word
Returns: Boolean
True if all characters are in the Arabic block

isHamza(self, archar)

Checks for Arabic Hamza forms. HAMZAT = (HAMZA, WAW_HAMZA, YEH_HAMZA, HAMZA_ABOVE, HAMZA_BELOW, ALEF_HAMZA_BELOW, ALEF_HAMZA_ABOVE).

Parameters:
  • archar (unicode) - arabic unicode char

isHaraka(self, archar)

Checks for Arabic Harakat marks (FATHA, DAMMA, KASRA, SUKUN, TANWIN).

Parameters:
  • archar (unicode) - arabic unicode char

isLigature(self, archar)

Checks for Arabic ligatures such as LamAlef: LAM_ALEF, LAM_ALEF_HAMZA_ABOVE, LAM_ALEF_HAMZA_BELOW, LAM_ALEF_MADDA_ABOVE.

Parameters:
  • archar (unicode) - arabic unicode char

isShadda(self, archar)

Checks for Arabic Shadda Mark.

Parameters:
  • archar (unicode) - arabic unicode char

isShortharaka(self, archar)

Checks for Arabic short Harakat marks (FATHA, DAMMA, KASRA, SUKUN).

Parameters:
  • archar (unicode) - arabic unicode char

isSmall(self, archar)

Checks for Arabic small letters. Small letters: SMALL_ALEF, SMALL_WAW, SMALL_YEH.

Parameters:
  • archar (unicode) - arabic unicode char

isSukun(self, archar)

Checks for Arabic Sukun Mark.

Parameters:
  • archar (unicode) - arabic unicode char

isTanwin(self, archar)

Checks for Arabic Tanwin Marks (FATHATAN, DAMMATAN, KASRATAN).

Parameters:
  • archar (unicode) - arabic unicode char

isTashkeel(self, archar)

Checks for Arabic Tashkeel marks (FATHA, DAMMA, KASRA, SUKUN, SHADDA, FATHATAN, DAMMATAN, KASRATAN).

Parameters:
  • archar (unicode) - arabic unicode char
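
The mark-related checks (isTanwin, isShortharaka, isShadda, isTashkeel) partition the same eight code points. The sketch below rebuilds the TANWIN, SHORTHARAKAT and TASHKEEL tuples from the Class Variables listing to make that relationship explicit; classify_mark_sketch is a hypothetical helper introduced only for illustration.

```python
FATHATAN, DAMMATAN, KASRATAN = u'\u064b', u'\u064c', u'\u064d'
FATHA, DAMMA, KASRA = u'\u064e', u'\u064f', u'\u0650'
SHADDA, SUKUN = u'\u0651', u'\u0652'

TANWIN = (FATHATAN, DAMMATAN, KASRATAN)
SHORTHARAKAT = (FATHA, DAMMA, KASRA, SUKUN)
# Same order as the TASHKEEL class variable: tanwin, short harakat, shadda.
TASHKEEL = TANWIN + SHORTHARAKAT + (SHADDA,)


def classify_mark_sketch(archar):
    """Return the category of a tashkeel mark, or None for other characters."""
    if archar in TANWIN:
        return 'tanwin'
    if archar in SHORTHARAKAT:
        return 'short haraka'
    if archar == SHADDA:
        return 'shadda'
    return None


for mark in TASHKEEL:
    print(repr(mark), classify_mark_sketch(mark))
```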

isTatweel(self, archar)

Checks for Arabic Tatweel letter modifier.

Parameters:
  • archar (unicode) - arabic unicode char

isTeh(self, archar)

Checks for Arabic Teh forms. Teh forms: TEH, TEH_MARBUTA.

Parameters:
  • archar (unicode) - arabic unicode char

isVocalized(self, word)

Checks if the Arabic word is vocalized. The word must not contain any spaces or punctuation.

Parameters:
  • word (unicode) - Arabic Unicode word
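
This page does not state the exact criterion for "vocalized". The sketch below assumes the simplest reading, namely that a word is vocalized when it carries at least one Tashkeel mark; is_vocalized_sketch is a hypothetical name and the library may apply a stricter rule.

```python
# The eight marks from the TASHKEEL class variable, written as escapes.
TASHKEEL = (u'\u064b', u'\u064c', u'\u064d', u'\u064e',
            u'\u064f', u'\u0650', u'\u0652', u'\u0651')


def is_vocalized_sketch(word):
    """Return True if word contains at least one tashkeel mark (assumed criterion)."""
    return any(ch in TASHKEEL for ch in word)


print(is_vocalized_sketch(u'\u0643\u064e\u062a\u064e\u0628\u064e'))  # marks present
print(is_vocalized_sketch(u'\u0643\u062a\u0628'))                    # bare letters
```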

isVocalizedtext(self, text)

Checks if the Arabic text is vocalized. The text can contain many words and spaces.

Parameters:
  • text (unicode) - Arabic Unicode text

isWawlike(self, archar)

Checks for Arabic Waw-like forms. Waw forms: WAW, WAW_HAMZA, SMALL_WAW.

Parameters:
  • archar (unicode) - arabic unicode char

isWeak(self, archar)

Checks for Arabic weak letters. Weak letters: ALEF, WAW, YEH, ALEF_MAKSURA.

Parameters:
  • archar (unicode) - arabic unicode char

isYehlike(self, archar)

Checks for Arabic Yeh forms. Yeh forms: YEH, YEH_HAMZA, SMALL_YEH, ALEF_MAKSURA.

Parameters:
  • archar (unicode) - arabic unicode char

stripHarakat(self, word)

Strips Harakat from an Arabic word, except Shadda. Harakat do not include Shadda; to strip both Harakat and Shadda, use the stripTashkeel function.

Parameters:
  • word (unicode) - Arabic Unicode word
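
A minimal sketch of this behaviour, assuming the harakat set is the one in HARAKAT_NO_SHADDA_pat (U+064B to U+0650 plus U+0652). The name strip_harakat_sketch is hypothetical; it only illustrates that Shadda (U+0651) survives the substitution.

```python
import re

# Assumption: same code points as HARAKAT_NO_SHADDA_pat, but unanchored
# so the class can match marks anywhere inside a word.
_HARAKAT_NO_SHADDA = re.compile(u'[\u064b-\u0650\u0652]')


def strip_harakat_sketch(word):
    """Remove harakat from word and leave shadda in place."""
    return _HARAKAT_NO_SHADDA.sub(u'', word)


# MEEM+DAMMA, HAH+FATHA, MEEM+SHADDA+FATHA, DAL
word = u'\u0645\u064f\u062d\u064e\u0645\u0651\u064e\u062f'
stripped = strip_harakat_sketch(word)
print(u'\u0651' in stripped)  # the shadda is still present
```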

stripTashkeel(self, word)

Strips Tashkeel from an Arabic word. Tashkeel comprises Harakat and Shadda; to strip Harakat only and keep Shadda, use the stripHarakat function.

Parameters:
  • word (unicode) - Arabic Unicode word
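
The difference from stripHarakat is a single code point. The sketch assumes the Tashkeel set in TASHKEEL_pat, which is the contiguous range U+064B to U+0652 and therefore includes Shadda (U+0651); strip_tashkeel_sketch is a hypothetical name.

```python
import re

# Assumption: the eight code points of TASHKEEL_pat, i.e. U+064B..U+0652,
# unanchored so the class matches marks anywhere inside a word.
_TASHKEEL = re.compile(u'[\u064b-\u0652]')


def strip_tashkeel_sketch(word):
    """Remove every tashkeel mark, shadda included, from word."""
    return _TASHKEEL.sub(u'', word)


# MEEM+DAMMA, HAH+FATHA, MEEM+SHADDA+FATHA, DAL
word = u'\u0645\u064f\u062d\u064e\u0645\u0651\u064e\u062f'
print(len(word), len(strip_tashkeel_sketch(word)))
```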

stripTatweel(self, word)

Strips Tatweel (Kashida) from an Arabic word.

Parameters:
  • word (unicode) - Arabic Unicode word
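
Tatweel is a single code point (U+0640), so removing it needs no regular expression. The sketch below is one straightforward way to do this; strip_tatweel_sketch is a hypothetical name rather than the library's method.

```python
TATWEEL = u'\u0640'


def strip_tatweel_sketch(word):
    """Remove every tatweel (kashida) character from word."""
    return word.replace(TATWEEL, u'')


# KAF, two TATWEEL characters, then TEH, ALEF, BEH
stretched = u'\u0643\u0640\u0640\u062a\u0627\u0628'
print(len(stretched), len(strip_tatweel_sketch(stretched)))
```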

Class Variable Details

HARAKAT

Value:
re.compile(r'^[\u064b\u064c\u064d\u064e\u064f\u0650\u0652]$')

HARAKAT_NO_SHADDA_pat

Value:
re.compile(r'^[\u064b\u064c\u064d\u064e\u064f\u0650\u0652]$')

HARAKAT_pat

Value:
re.compile(r'^[\u064b\u064c\u064d\u064e\u064f\u0650\u0652]$')

TASHKEEL_pat

Value:
re.compile(r'^[\u064b\u064c\u064d\u064e\u064f\u0650\u0652\u0651]$')
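
All four patterns are anchored with ^ and $ around a single character class, so they match exactly one mark and not a longer string. The sketch below recompiles two of them to show this, along with the only difference between them (SHADDA, U+0651). It assumes Python 3.3 or later, where the re module itself interprets \uXXXX escapes inside a raw string.

```python
import re

# Patterns copied from the values above.
HARAKAT_pat = re.compile(r'^[\u064b\u064c\u064d\u064e\u064f\u0650\u0652]$')
TASHKEEL_pat = re.compile(r'^[\u064b\u064c\u064d\u064e\u064f\u0650\u0652\u0651]$')

FATHA = u'\u064e'
SHADDA = u'\u0651'

print(bool(HARAKAT_pat.match(FATHA)))          # a single haraka matches
print(bool(HARAKAT_pat.match(SHADDA)))         # shadda is not in HARAKAT_pat
print(bool(TASHKEEL_pat.match(SHADDA)))        # but it is in TASHKEEL_pat
print(bool(HARAKAT_pat.match(FATHA + FATHA)))  # two marks do not match
```

Because of the anchors, these patterns suit per-character tests such as isHaraka; scanning or stripping marks inside a word needs an unanchored character class.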