16.5.12 Adding support for new languages

You can define new languages in a custom file by using the Language tag. Defining a language lets GPS highlight the syntax of its files, explore their contents (for example, through the project view), and find the files associated with that language.

As described previously for menu items, GPS loads every file in the plug-ins directory at startup. You can therefore define new languages either in a separate file or in a file where you already define actions and menus.

The following tags are available in a Language section:

Name
A short string describing the name of the language.
Parent
If set to the name of an existing language (e.g. Ada, C++) or of another custom language, the new language inherits all its properties from that language by default. Any field explicitly defined for the new language overrides the inherited setting.
Spec_Suffix
A string describing the suffix of spec/definition files for this language. If the language has no notion of spec or definition file, you can omit this tag and use the Extension tag instead. This tag must be unique.
Body_Suffix
A string describing the suffix of body/implementation files for this language. This tag works together with Spec_Suffix, so that the user can easily switch from one file to the other. This tag must be unique.
Extension
A string describing one of the valid extensions for this language. There can be several Extension children. Each extension must start with a '.' character.
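
To illustrate the difference between the suffix tags and Extension, the fragment below sketches two hypothetical languages: one with separate spec and body files, and one that only lists its extensions. The language names and suffixes are invented for the example, and the sketch assumes that a single custom file may contain several Language nodes.

     <?xml version="1.0"?>
     <Custom>
       <!-- Hypothetical language with separate spec and body files -->
       <Language>
         <Name>My_Lang</Name>
         <Spec_Suffix>.mls</Spec_Suffix>
         <Body_Suffix>.mlb</Body_Suffix>
       </Language>

       <!-- Hypothetical language with no spec/body distinction:
            several Extension children, each starting with '.' -->
       <Language>
         <Name>My_Script</Name>
         <Extension>.mys</Extension>
         <Extension>.myscript</Extension>
       </Language>
     </Custom>
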
Keywords
A V7-style regular expression for recognizing and highlighting keywords. Multiple Keywords tags can be specified; they are concatenated into a single regular expression. If the regular expression needs to match characters other than letters, digits and underscores, you must also set the Wordchars node. If a parent language has been specified for the current language definition, you can append to the parent's Keywords by setting the mode attribute to append. The default value, override, means that this keywords definition replaces the parent's one.

The full grammar of the regular expression can be found in the spec of the file g-regpat.ads in the GNAT run time.
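
The sketch below combines Parent with the append mode: a hypothetical language derived from Ada that adds two keywords to Ada's list instead of replacing it. The language name, suffixes and keywords are assumptions. The appended expression is written as a complete expression on its own, on the assumption that GPS joins it to the parent's expression as an alternative; check the resulting highlighting in your GPS version.

     <?xml version="1.0"?>
     <Custom>
       <Language>
         <Name>Ada_Extended</Name>
         <!-- Inherit all properties from the predefined Ada language -->
         <Parent>Ada</Parent>
         <!-- Explicit fields override the inherited ones, so these
              suffixes do not clash with Ada's own files -->
         <Spec_Suffix>.axs</Spec_Suffix>
         <Body_Suffix>.axb</Body_Suffix>
         <!-- Add to Ada's keywords instead of overriding them -->
         <Keywords mode="append">^(contract|invariant)\b</Keywords>
       </Language>
     </Custom>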

Wordchars
Most languages have keywords that only contain letters, digits and underscore characters. However, if you want to also include other special characters (for instance '<' and '>' in XML), you need to use this tag to let GPS know. The value of this node is a string made of all the special word characters. You do not need to include letters, digits or underscores.
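
For example, a hypothetical language whose keywords contain hyphens (such as end-if) needs both a keyword expression that matches the hyphen and a Wordchars node declaring it as a word character. The language name, extension and keywords below are invented for illustration.

     <?xml version="1.0"?>
     <Custom>
       <Language>
         <Name>Hyphen_Lang</Name>
         <Extension>.hyl</Extension>
         <!-- Longer alternatives come first, so end-if is tried before if -->
         <Keywords>^(end-if|end-loop|if|loop)\b</Keywords>
         <!-- '-' is not a letter, digit or underscore, so declare it -->
         <Wordchars>-</Wordchars>
       </Language>
     </Custom>
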
Engine
The name of a dynamic library providing one or several of the functions described below.

The name can be a full pathname or a short name. For example, on most Unix systems, if you specify custom, GPS looks for libcustom.so in the LD_LIBRARY_PATH run-time search path. You can also specify the name explicitly, e.g. libcustom.so or /usr/lib/libcustom.so.

For each of the following items (Comment_Line, Parse_Constructs, Format_Buffer and Parse_Entities), GPS looks for the corresponding symbol in Engine and, if found, calls this symbol when needed. Otherwise, it falls back to the static behavior defined by the other language-related items.

The specifications required to implement the following functions in C or Ada are provided in the directory <prefix>/share/examples/gps/language of your GPS installation: language_custom.ads is the Ada spec file and language_custom.h is the C spec file. The files gpr_custom.ad? show a possible Ada implementation of the Comment_Line function for GPS project files (.gpr files) or any other Ada-like language, and gprcustom.c is the C version of gpr_custom.adb.

Comment_Line
Name of a symbol in the specified shared library corresponding to a function that will comment or uncomment a line (used to implement the menu Edit->Un/Comment Lines).
Parse_Constructs
Name of a symbol in the specified shared library corresponding to a function that will parse constructs of a given buffer.

GPS uses this procedure to implement several capabilities, such as listing constructs in the project view, highlighting the current block of code, and going to the next or previous procedure.

Format_Buffer
Name of a symbol in the specified shared library corresponding to a function that will indent and format a given buffer.

This procedure is used to implement auto-indentation when pressing the <enter> key, or when using the format key on the current selection or the current line.

Parse_Entities
Name of a symbol in the specified shared library corresponding to a function that will parse entities (e.g. comments and keywords) of a given buffer. This procedure is used to highlight the syntax of a file, and overrides the Context node described below.
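
Putting the Engine tag and the symbol tags together, a language backed by a shared library might be declared as sketched below. The library path, language name and every symbol name are hypothetical; the symbols must match functions actually exported by your library, following the specifications in language_custom.ads or language_custom.h.

     <?xml version="1.0"?>
     <Custom>
       <Language>
         <Name>My_Lang</Name>
         <Extension>.myl</Extension>
         <!-- Full path; a short name such as mylang would instead make
              GPS search for libmylang.so on most Unix systems -->
         <Engine>/usr/local/lib/libmylang.so</Engine>
         <!-- Hypothetical exported symbols; if one is not found, GPS
              falls back to the static behavior for that feature -->
         <Comment_Line>mylang_comment_line</Comment_Line>
         <Parse_Constructs>mylang_parse_constructs</Parse_Constructs>
         <Format_Buffer>mylang_format_buffer</Format_Buffer>
         <Parse_Entities>mylang_parse_entities</Parse_Entities>
       </Language>
     </Custom>
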
Context
Describes the context used to highlight the syntax of a file.
Comment_Start
A string defining the beginning of a multiple-line comment.
Comment_End
A string defining the end of a multiple-line comment.
New_Line_Comment_Start
A regular expression defining the beginning of a single-line comment (which ends at the next end of line). This regular expression may contain several possible comment starts, such as ;|# for comments starting after a semicolon or after a hash sign. If a parent language has been specified for the current language definition, you can append to the parent's New_Line_Comment_Start by setting the mode attribute to append. The default value, override, means that this New_Line_Comment_Start definition replaces the parent's one.
String_Delimiter
A character defining the string delimiter.
Quote_Character
A character defining the quote character, used for example to cancel the meaning of a string delimiter (\ in C).
Constant_Character
A character defining the beginning of a character literal.
Can_Indent
A boolean indicating whether indentation should be enabled for this language. The indentation mechanism used will be the same for all languages: the number of spaces at the beginning of the current line is used when indenting the next line.
Syntax_Highlighting
A boolean indicating whether the syntax should be highlighted/colorized.
Case_Sensitive
A boolean indicating whether the language (and in particular the identifiers and keywords) is case sensitive.
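
As a sketch, a Context node for a hypothetical C-like language could combine multiple-line and single-line comments with string and character literals as follows. The language name, extension and choice of delimiters are assumptions for the example; as in the project file example, XML special characters such as quotes are written as entities.

     <?xml version="1.0"?>
     <Custom>
       <Language>
         <Name>My_C_Like</Name>
         <Extension>.mcl</Extension>
         <Context>
           <Comment_Start>/*</Comment_Start>
           <Comment_End>*/</Comment_End>
           <!-- Regular expression: // or # starts a single-line comment -->
           <New_Line_Comment_Start>//|#</New_Line_Comment_Start>
           <String_Delimiter>&quot;</String_Delimiter>
           <!-- Backslash cancels the meaning of a following delimiter -->
           <Quote_Character>\</Quote_Character>
           <Constant_Character>&apos;</Constant_Character>
           <Can_Indent>True</Can_Indent>
           <Syntax_Highlighting>True</Syntax_Highlighting>
           <Case_Sensitive>True</Case_Sensitive>
         </Context>
       </Language>
     </Custom>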

Categories
Optional node to describe the categories supported by the project view for the current language. This node contains a list of Category nodes, each describing the characteristics of a given category, with the following nodes:
Name
Name of the category, which can be either one of the following predefined categories: package, namespace, procedure, function, task, method, constructor, destructor, protected, entry, class, structure, union, type, subtype, variable, local_variable, representation_clause, with, use, include, loop_statement, case_statement, if_statement, select_statement, accept_statement, declare_block, simple_block, exception_handler, or any arbitrary name, which will create a new category.
Pattern
Regular expression used to detect a language category. As for the Keywords node, multiple Pattern tags can be specified and will be concatenated into a single regular expression.
Index
Index in the pattern used to extract the name of the entity contained in this category.
End_Index
Optional attribute that indicates the index in the pattern used to start the next search. Default value is the end of the pattern.
Icon
Name of a stock icon that should be used for that category (see Adding stock icons). This attribute is currently ignored and is reserved for future use.
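
The fragment below sketches two categories for a hypothetical language. The first splits its pattern over two Pattern tags, which are concatenated into ^[ \t]*(func|fn)[ \t]+(\w+); because group 1 captures the keyword, Index points at group 2, the entity name. The second uses a name that is not predefined, which creates a new category. The language, keywords and the rule category are invented for illustration.

     <?xml version="1.0"?>
     <Custom>
       <Language>
         <Name>My_Lang</Name>
         <Extension>.myl</Extension>
         <Categories>
           <Category>
             <Name>function</Name>
             <!-- Group 1 is the keyword, group 2 the entity name -->
             <Pattern>^[ \t]*(func|fn)[ \t]+</Pattern>
             <Pattern>(\w+)</Pattern>
             <Index>2</Index>
           </Category>
           <Category>
             <!-- Not a predefined name: creates a new category -->
             <Name>rule</Name>
             <Pattern>^[ \t]*rule[ \t]+(\w+)</Pattern>
             <Index>1</Index>
           </Category>
         </Categories>
       </Language>
     </Custom>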

Here is an example of a possible language definition for the GPS project files:

     <?xml version="1.0"?>
     <Custom>
       <Language>
         <Name>Project File</Name>
         <Spec_Suffix>.gpr</Spec_Suffix>
         <Keywords>^(case|e(nd|xte(nds|rnal))|for|is|</Keywords>
         <Keywords>limited|null|others|</Keywords>
         <Keywords>p(ackage|roject)|renames|type|use|w(hen|ith))\b</Keywords>
     
         <Context>
           <New_Line_Comment_Start>--</New_Line_Comment_Start>
           <String_Delimiter>&quot;</String_Delimiter>
           <Constant_Character>&apos;</Constant_Character>
           <Can_Indent>True</Can_Indent>
           <Syntax_Highlighting>True</Syntax_Highlighting>
           <Case_Sensitive>False</Case_Sensitive>
         </Context>
     
         <Categories>
           <Category>
             <Name>package</Name>
             <Pattern>^[ \t]*package[ \t]+((\w|\.)+)</Pattern>
             <Index>1</Index>
           </Category>
           <Category>
             <Name>type</Name>
             <Pattern>^[ \t]*type[ \t]+(\w+)</Pattern>
             <Index>1</Index>
           </Category>
         </Categories>
     
         <Engine>gpr</Engine>
         <Comment_Line>gpr_comment_line</Comment_Line>
       </Language>
     </Custom>