Programming: C, bash, Python, Perl, PHP, Java, you name it.
|
sorting special characters
Hello everybody!
Here I am again with my formatting and sorting problems. How do I tell sort to put "š" just after "s" and not at the end after "z"? Or is there a better way than using sort?
|
The standard tools handle standard usage. Anything beyond that is better done with more sophisticated tools.
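Why š ends up after z: without locale-aware collation, sorting compares raw codes. In the C/POSIX locale, sort(1) compares bytes, and š encoded as UTF-8 begins with byte 0xC5, which is greater than z (0x7A). Perl's built-in sort likewise compares by code point, and š is U+0161. The minimal Perl sketch below assumes the script is saved as UTF-8 (hence `use utf8`). The word list is made up for illustration.

```perl
use strict;
use warnings;
use utf8;                               # the literals below are characters, not bytes
binmode(STDOUT, ':encoding(UTF-8)');    # encode characters when printing

# Hypothetical sample words.
my @words = qw(zzz sss ššš aaa);

# Without a custom comparison, sort orders strings by code point.
my @by_codepoint = sort @words;
print "@by_codepoint\n";                # prints: aaa sss zzz ššš

# 's' is U+0073 and 'z' is U+007A, but 'š' is U+0161, so it comes last.
printf "%s = U+%04X\n", $_, ord($_) for qw(s z š);
```

Putting š right after s therefore needs either locale-aware collation or a custom comparison.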
|
Thanks, I guess it's time to start learning awk and Perl.
|
|
Given your recent questions, any of these languages (Python included...) can do the job, since their functionality overlaps. What you should do next is look at a little of each & decide which seems more intuitive in terms of syntax, usage, script construction, etc.
Recognize that if you simply want custom sorting, awk(1) may very well be your best choice (for now...). However, if you keep going down this path of wanting custom scripts for this or that need, you should start assessing which language best meets your longer-term goals & go with that choice. It takes time & effort to climb the learning curve of any language, & continually flipping from one choice to the next is counterproductive.
|
Isn't it wonderful how XKCD always has something applicable?
Perl: http://xkcd.com/208/
Python: http://xkcd.com/353/
|
Final comment (on this subject...).
I am not aware of any awk-specific mailing lists or help sites, but then, I have never had need of one myself, so I haven't done extensive searching. However, if you choose Perl and/or Python, consider the following.
As for Python, O'Reilly has some good titles, but it did not capture the Python book market as it did with Perl. Python came out after Perl, & the industry was at a different point in its maturation; these may be contributing factors to the difference.
Last edited by ocicat; 20th March 2009 at 09:14 PM.
|
Thanks a lot for your suggestions.
I think I'll start with awk, which from the outside seems "smaller" and "simpler", but after that I'll have to learn at least Perl. In fact, yesterday I found a conversion tool (Encode::HanConvert) that I'll need very often to convert simplified Chinese characters to traditional ones and vice versa. This tool is written in Perl, so I guess Perl has all I need. As for Python, I can't yet see the difference between the two; maybe with time I will.
|
jggmi, the comics are really nice
|
|
How this "ease of use" carries over for Chinese speakers is unknown to me. Maybe the simplicity doesn't carry over at all. As for the goals of both languages, they are very similar, but Perl inherits the syntax & mindset of both shell & C programming, whereas Python doesn't share this lineage. For what it's worth, awk also inherits various idiosyncrasies from both shell & C programming. awk has a lot of power & was a prominent scripting-language alternative until Perl (& later Python...) arrived on the scene.
|
Well, my mother tongue is neither English nor Chinese, so the "Englishness" doesn't make a big difference to me. Maybe with time I'll learn all three languages, but for now I'll go first for awk and then Perl. If Perl's syntax is similar to shell and C, I think it will also help me understand Unix better.
|
|
Hi.
I use awk mostly for field-related, single-shot programs. If I needed advice, I would ask at http://www.unix.com/shell-programming-scripting/ ; that's a hotbed of awk questions and answers. I have seen some very complex and creative solutions there, as well as gentle answers for novice users. As usual, it is in one's best interest to try to solve a problem first and then, if necessary, post sample input, desired results, and actual results. That forum is also good for Perl questions.
Best wishes ... cheers, drl
|
I see, thank you for the explanation.
In the meantime, if anyone could point me to the relevant part of awk or Perl I should study first to solve my sorting problem, I'd be very grateful (I couldn't find it on Google).
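On the Perl side, the part to study is sort with a custom comparison block (see `perldoc -f sort`). This is a minimal sketch, not a definitive implementation. It assumes a hypothetical alphabet string with š placed after s, gives each letter its position as a rank, and compares words rank by rank. When one word is a prefix of the other, the shorter word comes first. The subroutine name `by_alphabet` is made up for this example.

```perl
use strict;
use warnings;
use utf8;                               # š in the literals is one character
binmode(STDOUT, ':encoding(UTF-8)');

# Hypothetical custom alphabet: š placed right after s.
my $alphabet = 'abcdefghijklmnopqrsštuvwxyz';
my %rank;
@rank{ split //, $alphabet } = 1 .. length $alphabet;   # a => 1, ..., š => 20, ..., z => 27

# Compare two words letter by letter by rank. Characters missing from the
# alphabet get rank 0. If one word is a prefix of the other, it sorts first.
sub by_alphabet {
    my ($x, $y) = @_;
    my @x = split //, $x;
    my @y = split //, $y;
    my $last = $#x < $#y ? $#x : $#y;
    for my $i (0 .. $last) {
        my $c = ($rank{ $x[$i] } // 0) <=> ($rank{ $y[$i] } // 0);
        return $c if $c;
    }
    return @x <=> @y;    # arrays in numeric context: compare lengths
}

# Tie-break with plain cmp so words differing only in unknown characters stay ordered.
my @sorted = sort { by_alphabet($a, $b) or $a cmp $b } qw(zzz ššš sss aaa tat);
print "@sorted\n";    # aaa sss ššš tat zzz
```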
|
awk is great; the syntax is a lot like C, only less finicky about declarations. So if you know C you can get started quickly (and perhaps vice versa).
But long ago, in my first brushes with awk, I was very confused and bogged down in the command-line syntax, patterns and predefined variables. The big picture was missing, and it didn't really start to click until I hit on a simple analogy that made it clear. So here's my mini-contribution to awk 101. (For those who know awk, allow me the leniency of oversimplification in describing this analogy.)
In a language like C, functions have names. The code within a function block gets executed when the function is called by name, either from another such function or from main(). The analogy is that awk is like this, except that the "functions" don't have a name: instead, they have a pattern associated with them. The code in a "function" block gets executed when the pattern matches (part of) an input line. To me, that's awk in a nutshell; the rest is details. (Of course, these "functions" are really called "action statements", and awk does have named functions of its own, just like C.)
Happy awking!
Last edited by IdOp; 21st March 2009 at 06:42 PM.
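To make the analogy concrete, here is a minimal Perl sketch (not awk itself) of the same execution model. Each rule pairs a pattern with an unnamed action. An implicit loop runs, for every input line, each action whose pattern matches, and code after the loop plays the role of awk's END block. The input lines and rules are hypothetical and chosen only for illustration.

```perl
use strict;
use warnings;

# Hypothetical input lines.
my @lines = ('alice 42', '# a comment', 'bob 17');
my $total = 0;
my @out;

# Each rule pairs a pattern with an anonymous (nameless) action, roughly like
# the awk program:
#   /[0-9]+$/ { total += $2; print $1 }
#   /^#/      { print "skipped comment" }
#   END       { print "total", total }
my @rules = (
    [ qr/\d+$/, sub { my @f = split ' ', $_[0]; $total += $f[1]; push @out, $f[0] } ],
    [ qr/^#/,   sub { push @out, 'skipped comment' } ],
);

# The implicit awk main loop: for every line, run each action whose pattern matches.
for my $line (@lines) {
    for my $rule (@rules) {
        my ($pattern, $action) = @$rule;
        $action->($line) if $line =~ $pattern;
    }
}
push @out, "total $total";    # what awk's END block would do
print "$_\n" for @out;        # alice, skipped comment, bob, total 59
```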
|
sorting non ascii chars
OK guys, I've started reading tutorials and so on. I've also found a module that should help with my sorting: Unicode::Collate (from CPAN).
Now I have this test file:
Code:
abc
aab
bbc
mmn
lmn
aaa
ššš
sss
zzz

sort puts ššš at the end:
Code:
$ sort test
aaa
aab
abc
bbc
lmn
mmn
sss
zzz
ššš

while my Perl script puts it right after aaa:
Code:
aaa
ššš
aab
abc
bbc
lmn
mmn
sss
zzz

Do I have to explicitly tell Perl where to put them? How? Here's the script (don't laugh too loud):
Code:
use Unicode::Collate;
$Collator = Unicode::Collate->new(%tailoring);
open (NAMES_FILE, "< path-to-my-file") or die "Failed to read file : $! ";
my @not_sorted = <NAMES_FILE>; # read entire file in the array
@sorted = $Collator->sort(@not_sorted);
print @sorted;
close (NAMES_FILE);

The Unicode::Collate documentation gives this synopsis:
Code:
use Unicode::Collate;

#construct
$Collator = Unicode::Collate->new(%tailoring);

#sort
@sorted = $Collator->sort(@not_sorted);

#compare
$result = $Collator->cmp($a, $b); # returns 1, 0, or -1.
# If %tailoring is false (i.e. empty),
# $Collator should do the default collation.
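A likely cause of the "aaa ššš aab" order, assuming the file is UTF-8: the script reads it as raw bytes, so Unicode::Collate sees each š as the two characters Å and ¡ instead of one š. Å sorts as an accented A. ¡ is punctuation, which the default collation effectively ignores (its default variable weighting is 'shifted'). As a result, ššš lands right after aaa. The fix is to decode the input and encode the output. This is a minimal sketch in which an in-memory list stands in for the file.

```perl
use strict;
use warnings;
use utf8;                               # literals in this file are UTF-8 text
use Unicode::Collate;
binmode(STDOUT, ':encoding(UTF-8)');    # encode characters on output

# With a real file, decode it while reading, e.g.:
#   open(my $fh, '<:encoding(UTF-8)', $path) or die "Failed to read $path: $!";
#   chomp(my @not_sorted = <$fh>);
my @not_sorted = qw(abc aab bbc mmn lmn aaa ššš sss zzz);

# No tailoring: the default Unicode Collation Algorithm order.
my $collator = Unicode::Collate->new();
my @sorted   = $collator->sort(@not_sorted);

print "$_\n" for @sorted;    # aaa aab abc bbc lmn mmn sss ššš zzz (one per line)
```

With the default collation, š counts as an accented s rather than a separate letter, so šas, for example, would sort before sss. If š must follow s as a letter of its own, you need a tailoring. Recent versions ship Unicode::Collate::Locale alongside Unicode::Collate with language-specific tailorings; check its documentation for the supported locales.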
|
perl sorting non ascii chars SOLVED
I know, you want me to study and work it out by myself.
Actually, I've finally found a good tutorial page on this; I simply hadn't searched Google with the right keywords before. Here's the link: http://interglacial.com/~sburke/tpj/as_html/tpj14.html
In my case, I have two extra letters to sort: š and ū. I made this test file:
Code:
abc
aab
bbc
mmn
lmn
aaa
ššš
sss
zzz
ccc
ggg
uuu
šas
saš
cab
uuū
ūuu
ūūū

and this script:
Code:
use strict;
use warnings;

open (_file_, "< absolute-path-to-file") or die "Failed to read file : $! ";
my @not_sorted = <_file_>;

sub normalize {
    my $in = $_[0];
    $in = lc($in);
    $in =~ tr<aeiouū> <aeiouu>;   # treat ū as u
    $in =~ tr<abcdefghijklmnopqrsštuvwxyz> <\x01-\x1B>;   # map the 27 letters, in the desired order, to the codes \x01..\x1B
    return $in;
}

# sort by normalized form; ties (e.g. uuu vs uuū) fall back to a plain string comparison
my @sorted = sort { normalize($a) cmp normalize($b) or $a cmp $b } @not_sorted;
print @sorted;
close (_file_);

which gives:
Code:
aaa
aab
abc
bbc
cab
ccc
ggg
lmn
mmn
saš
sss
šas
ššš
uuu
uuū
ūuu
ūūū
zzz
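One caveat, assuming the script and data are saved as UTF-8: without `use utf8` and a decoding layer on the filehandle, Perl treats š and ū as two bytes each, so the two tr lists no longer line up letter for letter. The test file above happens to come out right, but words starting with š would, for example, sort after words starting with t. Below is a minimal sketch of the same normalize() idea with character semantics. It also computes each key once (a Schwartzian transform) instead of twice per comparison. The in-memory list stands in for the file.

```perl
use strict;
use warnings;
use utf8;                               # š and ū below are single characters
binmode(STDOUT, ':encoding(UTF-8)');

# Same idea as normalize() above: fold ū into u, then map the 27 letters,
# in the desired order (š right after s), to the codes \x01..\x1B so that
# cmp compares words by position in this alphabet.
sub normalize {
    my ($word) = @_;
    my $in = lc $word;
    $in =~ tr/ū/u/;
    $in =~ tr/abcdefghijklmnopqrsštuvwxyz/\x01-\x1B/;
    return $in;
}

# With a real file, decode it while reading, e.g.:
#   open(my $fh, '<:encoding(UTF-8)', $path) or die "Failed to read $path: $!";
#   chomp(my @not_sorted = <$fh>);
my @not_sorted = qw(abc aab bbc mmn lmn aaa ššš sss zzz ccc ggg uuu šas saš cab uuū ūuu ūūū);

# Schwartzian transform: build [key, word] pairs once, sort on the key,
# break ties (e.g. uuu vs uuū) with a plain string comparison, then unwrap.
my @sorted =
    map  { $_->[1] }
    sort { $a->[0] cmp $b->[0] or $a->[1] cmp $b->[1] }
    map  { [ normalize($_), $_ ] } @not_sorted;

print "$_\n" for @sorted;
```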
|
|