jEdit

Programmer's Text Editor

jEdit Home

As I was saying, jEdit is a nice cross-platform programmer's text editor. I use it for quick regular expression processing of text files. I use it for other things as well, but regular expressions are something few Windows programs support.

Regular Expressions

I'll write an introduction to regular expressions at some point. If you are curious, you can search the web for regular expressions1) 2) or for “regex” as they are sometimes referred to. But let me give you a short example of something you can do with regular expressions.

Regex Example Search/Replace

I had a web sitemap list with URLs followed by information such as date and time last modified. I wanted to create a list of just the pages without the complete domain name. That meant deleting the “http://www.somedomain.com” at the beginning of each line (not difficult with most editors) and deleting everything after the end of the page name - all the date/time info to the end of the line.

Prepare yourself. Looking at regex if you aren't used to it can make your eyes bleed. Or at least give you a headache. Here's a search that will match the entire line but save only the bit I want.

^.*\.com(/\w*).*$

Here are some example lines I'm trying to match3):

http://www.example.com/about.html	2009-07-11T21:28:11Z	weekly	0.5
http://www.example.com/contact.html	2009-07-11T21:28:11Z	weekly	0.5

This is what I want the results to look like.

about.html
contact.html

Here's what the Search/Replace dialog box looks like in jEdit to get that result.

jEdit example Regex Search Replace
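
If you don't have jEdit handy, the same search/replace can be sketched in a few lines of Python. This is only an approximation of what the dialog does, and it rests on a few assumptions: Python's re module writes the captured text as \1 where jEdit uses $1, the tab-separated sample lines are typed in by hand, and the names PATTERN, VARIANT, as_written and with_variant are made up for this illustration. Run as written, the pattern keeps the leading slash and stops at the period, because \w does not match a period. The second pattern is a hypothetical variant of mine that moves the slash outside the parentheses and allows periods inside them.

```python
import re

# Pattern copied from above; note Python uses \1 where jEdit uses $1.
PATTERN = re.compile(r"^.*\.com(/\w*).*$")

# Hypothetical variant (not from the original dialog): the slash sits
# outside the group and the group also accepts periods.
VARIANT = re.compile(r"^.*\.com/([\w.]*).*$")

lines = [
    "http://www.example.com/about.html\t2009-07-11T21:28:11Z\tweekly\t0.5",
    "http://www.example.com/contact.html\t2009-07-11T21:28:11Z\tweekly\t0.5",
]

# Replace each whole line with whatever group 1 captured.
as_written = [PATTERN.sub(r"\1", line) for line in lines]
with_variant = [VARIANT.sub(r"\1", line) for line in lines]

print(as_written)    # ['/about', '/contact']
print(with_variant)  # ['about.html', 'contact.html']
```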

Regex expression decoded

What does it mean? The replacement expression, “$1”, takes whatever text was matched inside the first set of parentheses (the only parentheses in this example) and inserts it into the replacement string. Using () to get matching text is referred to as “capturing” the match. If you look at the Notepad++ entry you will see that Notepad++ uses a different notation for inserting captured text.

I'll repeat the search term below and then explain the meaning of the symbols:

^.*\.com(/\w*).*$
  • The “^” means start matching at the beginning of the line.
  • The “.” means match any character. The combination of “.*” says to match any number of characters (from none to all).
  • The “\” before the period is an escape character which means treat the next character in the string as a normal character, that is a period. Otherwise the period would mean to match any character again.
  • “com” will be matched exactly.
  • The () surrounding the next characters allows you to capture whatever is matched between the parentheses.
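  • “\w” is a shortcut meaning to match any “word” character: a letter, a digit, or an underscore. The “*” after it tells the regex to match any number of contiguous word characters. Note that “\w” does not match a period, so on the example lines the capture stops at “/about”; writing this part as “/([\w.]*)” instead would capture “about.html” without the leading slash.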
  • “.*” matches any number of characters until “$”, which is the symbol for the end of the line.
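
The same breakdown can be written into the pattern itself. Here is a small sketch using Python's re.VERBOSE flag, which ignores whitespace inside the pattern and lets you put a comment next to each piece. Python is my choice for illustration only (jEdit's search box has no such layout), and the names ANNOTATED, line and captured are hypothetical.

```python
import re

# The search term from above, spread over several lines with a comment per piece.
ANNOTATED = re.compile(
    r"""
    ^        # start matching at the beginning of the line
    .*       # any characters, as many as possible
    \.com    # a literal .com (the backslash escapes the period)
    (/\w*)   # group 1: a slash followed by any number of word characters
    .*       # everything else on the line
    $        # the end of the line
    """,
    re.VERBOSE,
)

line = "http://www.example.com/about.html\t2009-07-11T21:28:11Z\tweekly\t0.5"

match = ANNOTATED.match(line)
captured = match.group(1)  # the text inside the parentheses
print(captured)            # /about
```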

One note that could confuse you (if you're not already confused): if you were searching more complex text, the regex shown could cause problems. By default regex quantifiers such as “*” are “greedy”, which means they match the biggest chunk of text that fits the expression. If “.com” appeared more than once on a line I was searching, the “^.*\.com” would match everything up to the last occurrence of “.com”. To make the expression “non-greedy”, so that it stops at the first “.com”, we would change it to “^.*?\.com”.
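
To see the difference, here is a minimal sketch in Python (again my choice for illustration, not something jEdit runs). The sample line with two occurrences of “.com” is invented for the purpose, and the names line, greedy and lazy are hypothetical.

```python
import re

# An invented line in which ".com" appears twice.
line = "http://www.example.com/links/other.com/page.html"

# Greedy: .* grabs as much as it can, so the match runs to the LAST ".com".
greedy = re.match(r"^.*\.com", line).group(0)

# Non-greedy: .*? grabs as little as it can, so the match stops at the FIRST ".com".
lazy = re.match(r"^.*?\.com", line).group(0)

print(greedy)  # http://www.example.com/links/other.com
print(lazy)    # http://www.example.com
```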

Testing Regex Expressions with jEdit

If you are just learning regex, you can test your expressions with jEdit. The next image is a screen shot where I've searched for

\.com/\w*

This matches the first “.com” (the “\.” escapes the period, which is a special character in regex), the slash after it, and then the next contiguous run of word characters (which is what “\w*” means).

Testing a regex in jEdit

jEdit highlights the first match. If I hit RETURN while in the search box, the next match would be highlighted.
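
Stepping from match to match can be imitated outside the editor too. The sketch below uses Python's re.finditer, which yields the matches one after another, roughly like pressing RETURN repeatedly in the search box. The two sample lines are the ones from earlier on this page, joined by hand into one block of text; the names text and found are hypothetical.

```python
import re

# The two sample lines, joined into one block of text like an open file.
text = "\n".join([
    "http://www.example.com/about.html\t2009-07-11T21:28:11Z\tweekly\t0.5",
    "http://www.example.com/contact.html\t2009-07-11T21:28:11Z\tweekly\t0.5",
])

# finditer yields each match in order, first to last.
found = [m.group(0) for m in re.finditer(r"\.com/\w*", text)]

for hit in found:
    print(hit)
# .com/about
# .com/contact
```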


1) Regular-Expressions.info has a good introduction to regular expressions. Beware, the author sells commercial products for regex and a text editor.
2) Regular Expressions is the Wikipedia article on regular expressions
3) If the list were only two items long I would just manually edit the contents. You can assume I had a much longer list where the regex search/replace saved me time.
 
kb/tools/jedit.txt · Last modified: 2009-07-12 12:35 pm by admin