DaemonForums  

  #1
Old 20th March 2009
gosha gosha is offline
Spam Deminer
 
Join Date: Jun 2008
Location: China
Posts: 256
Default sorting special characters

Hello everybody!
Here I am again with my formatting and sorting problems.

How do I tell sort to put "š" just after "s" and not at the end after "z"?
Or is there a better way than using sort?
  #2
Old 20th March 2009
ocicat ocicat is offline
Administrator
 
Join Date: Apr 2008
Posts: 3,318
Default

Quote:
Originally Posted by gosha View Post
How do I tell sort to put "š" just after "s" and not at the end after "z"?
According to the manpage for sort(1), it only sorts lexicographically. The only other knob is to ignore case.
Quote:
Or is there a better way than using sort?
Use awk(1), perl(1), or some other scripting language which will allow writing your own custom sorting routine.

The standard tools allow for standard usage. Anything beyond this is better done through more sophisticated options.
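
For what it's worth, here is a minimal sketch of what such a custom sorting routine could look like. It is written in Python purely for illustration (awk or Perl would follow the same idea), and the names `ALPHABET`, `RANK` and `sort_key` are made up for this example, not something sort(1) offers. Each letter is ranked by its position in an alphabet where "š" sits right after "s":

```python
# Hypothetical alphabet: the 26 basic letters with "š" placed right after "s".
ALPHABET = "abcdefghijklmnopqrsštuvwxyz"
RANK = {letter: position for position, letter in enumerate(ALPHABET)}

def sort_key(word):
    """Turn a word into a list of alphabet positions; unknown characters go last."""
    return [RANK.get(ch, len(ALPHABET)) for ch in word.lower()]

words = ["zzz", "ššš", "sss", "aaa", "ttt"]
result = sorted(words, key=sort_key)
print(result)
```

Since the order lives entirely in the alphabet string, adding another special letter should only mean inserting it at the right place in that string.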
  #3
Old 20th March 2009
gosha gosha is offline
Spam Deminer
 
Join Date: Jun 2008
Location: China
Posts: 256
Default

Thanks, I guess it's time to start awk and perl
  #4
Old 20th March 2009
ocicat ocicat is offline
Administrator
 
Join Date: Apr 2008
Posts: 3,318
Default

Quote:
Originally Posted by gosha View Post
Thanks, I guess it's time to start awk and perl
Given many of your recent questions, either language (as well as Python...) can do the job as there is overlap in their functionalities. What you should do next is look at a little of each & determine which seems more intuitive in terms of syntax, usage, script construction, etc.

Recognize that if you are simply wanting custom sorting, then awk(1) may very well be your best choice (for now...). However, if you continue down this path of wanting custom scripts for this or that need, then you should begin assessing which language meets your more long term goals & go with the best choice. It takes time & effort to mount the learning curve of any language, & continually flipping from one choice to the next is counterproductive.
  #5
Old 20th March 2009
jggimi's Avatar
jggimi jggimi is offline
More noise than signal
 
Join Date: May 2008
Location: USA
Posts: 7,975
Default

Isn't it wonderful how XKCD always has something applicable?

perl: http://xkcd.com/208/
python: http://xkcd.com/353/
  #6
Old 20th March 2009
ocicat ocicat is offline
Administrator
 
Join Date: Apr 2008
Posts: 3,318
Default

Final comment (on this subject...).

I am not aware of any awk-specific mailing lists or help sites, but then, I have never had need of one myself, so I haven't done extensive searching.

However, if you choose Perl and/or Python, consider the following. Although this partial/simple book list is O'Reilly-centric, O'Reilly cornered the market when it comes to Perl titles. Other good non-O'Reilly titles exist, but when starting out with the language, staying with the animal books is a reasonable choice.

As for Python, O'Reilly has some good titles, but they did not capture the Python book market as they did with Perl. Python came out after Perl, & the industry was at a different point in its maturation. These may be contributing factors as to the difference.

Last edited by ocicat; 20th March 2009 at 09:14 PM.
  #7
Old 20th March 2009
gosha gosha is offline
Spam Deminer
 
Join Date: Jun 2008
Location: China
Posts: 256
Default

Thanks a lot for your suggestions.
I think right now I might first use awk, which from the outside seems "smaller" and "simpler", but then I'll have to learn at least Perl. In fact, yesterday I found a conversion tool (Encode::HanConvert) which I will need very often to convert Simplified Chinese characters to Traditional ones and vice versa. This tool is in Perl, so I guess it has all I need. As far as Python goes, I presently cannot understand the difference between the two, so maybe with time I will.
  #8
Old 20th March 2009
gosha gosha is offline
Spam Deminer
 
Join Date: Jun 2008
Location: China
Posts: 256
Default

jggimi, the comics are really nice
  #9
Old 20th March 2009
ocicat ocicat is offline
Administrator
 
Join Date: Apr 2008
Posts: 3,318
Default

Quote:
Originally Posted by gosha View Post
As far as Python goes, I presently cannot understand the difference between the two, so maybe with time I will.
From the perspective of the English speaking hordes, Python's syntax is more "English"-like without the plethora of special characters & special nuisances required by other languages (specifically Perl). Some find this minimized amount of computer science cruft makes Python easier to write than other languages modeled on C (like Perl). Personally, I don't have such misgivings about Perl, but I know many that do.

How this "ease of use" translates to those speaking Chinese is unknown to me. Maybe the simplicity doesn't translate at all.

As for the goals of both languages, they are very similar, but Perl comes from a heritage inheriting the syntax & mindset of both shell & C programming. Python doesn't duplicate this lineage.

And for what it is worth, awk also inherits various idiosyncrasies from both shell & C programming. awk has a lot of power & served as a prominent scripting language alternative until Perl (& later Python...) arrived on the scene.
Old 20th March 2009
gosha gosha is offline
Spam Deminer
 
Join Date: Jun 2008
Location: China
Posts: 256
Default

Well, I'm neither English nor Chinese mother tongue, so the "Englishness" does not make a big difference to me. Maybe with time I might learn all the three languages, but now I'll go first for awk and then Perl, and if its syntax is similar to shell and C, it will also help me understand Unix better, I think.
Old 21st March 2009
drl's Avatar
drl drl is offline
Port Guard
 
Join Date: May 2008
Posts: 19
Default

Hi.
Quote:
Originally Posted by ocicat View Post
Final comment (on this subject...).

I am not aware of any awk-specific mailing lists or help sites, but then, I have never had need of one myself, so I haven't done extensive searching.

...
There is a lot of information at http://awk.info/

I use awk mostly for field-related, single-shot programs. If I needed advice, I would ask at http://www.unix.com/shell-programming-scripting/ -- that's a hot-bed of awk questions and answers. I have seen some very complex and creative solutions there, as well as gentle answers for novice users. As usual, it is in one's best interest to try to solve a problem first, then -- as necessary -- post sample input, desired results, and actual results.

That forum is also good for perl questions.

Best wishes ... cheers, drl
Old 21st March 2009
Carpetsmoker's Avatar
Carpetsmoker Carpetsmoker is offline
Real Name: Martin
Tcpdump Spy
 
Join Date: Apr 2008
Location: Netherlands
Posts: 2,243
Default

Quote:
Originally Posted by gosha View Post
Well, I'm neither English nor Chinese mother tongue, so the "Englishness" does not make a big difference to me. Maybe with time I might learn all the three languages, but now I'll go first for awk and then Perl, and if its syntax is similar to shell and C, it will also help me understand Unix better, I think.
This is not what ocicat meant. He meant that Python is more like a natural language (ANY language) and has less syntax. For example, Python doesn't require a semicolon (;) at the end of each statement, and it doesn't require curly braces ({ }) or parentheses ( () ) in many places where most other languages do, and so forth.

This is very different from other languages, which sometimes require excessive parentheses (*cough* lisp *cough*).
The syntax of many languages seems to be designed so that the parser/compiler can easily read and understand the language; Python's syntax is designed so that it is easier for humans to read and understand ... This may make the compiler slightly harder to write, but you only write a compiler once, and you write code many times.
__________________
UNIX was not designed to stop you from doing stupid things, because that would also stop you from doing clever things.
Old 21st March 2009
gosha gosha is offline
Spam Deminer
 
Join Date: Jun 2008
Location: China
Posts: 256
Default

I see, thank you for the explanation.
In the meantime, if anyone could direct me to the relevant part of awk or perl I should study first to solve my sorting problem, I'd be very grateful (could not find it on google).
Old 21st March 2009
IdOp's Avatar
IdOp IdOp is offline
Too dumb for a smartphone
 
Join Date: May 2008
Location: twisting on the daemon's fork(2)
Posts: 1,027
Default

awk is great; the syntax is a lot like C, only less finicky about declarations. So if you know C you can get started quickly (and perhaps vice versa).

But long ago in my first brushes with awk, I was very confused and bogged down in the command-line syntax, patterns and pre-defined variables. The big picture was missing, and it really didn't start to click until I realized a simple analogy that made it clear.

So here's my mini-contribution to awk 101. (For those who know awk, allow me the leniency of over-simplification in describing this analogy.) In a language like C, the functions have names. The code within the function block gets executed when the function is called by name, either from another such function or from main().

The analogy is that awk is like this, except the "functions" don't have a name: instead they have a pattern associated with them. The code in a "function" block gets executed when the pattern matches (part of) an input-data line.

To me, that's awk in a nutshell, the rest is details. (Of course, the "functions" are called "action statements" and awk does have named functions of its own just like in C.)
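
To make the analogy concrete, here is a rough sketch of that pattern/action loop. It is Python rather than awk, so it only imitates the idea, and the rules and sample lines are invented for the illustration:

```python
import re

# Each rule pairs a pattern with an action, like awk's  /pattern/ { action }.
# The rules and the sample lines are made up for this illustration.
rules = [
    (re.compile(r"error"), lambda line: f"E: {line}"),
    (re.compile(r"^#"),    lambda line: f"comment: {line}"),
]

def run(lines):
    """Feed every input line to every rule; a rule fires when its pattern matches."""
    output = []
    for line in lines:
        for pattern, action in rules:
            if pattern.search(line):
                output.append(action(line))
    return output

sample = ["# config", "all good", "disk error", "# error log"]
result = run(sample)
print("\n".join(result))
```

Note that a line matching several patterns triggers several actions, in the order the rules are written, and a line matching none is silently skipped.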

Happy awking!

Last edited by IdOp; 21st March 2009 at 06:42 PM.
Old 7th April 2009
gosha gosha is offline
Spam Deminer
 
Join Date: Jun 2008
Location: China
Posts: 256
Default sorting non ascii chars

ok guys, I started to read tutorials and all. I've also found a tool which should help with this sorting of mine: Unicode::Collate (from cpan).
now I have this test file:
Code:
abc
aab
bbc
mmn
lmn
aaa
ššš
sss
zzz
if I sort it I get this:
Code:
$ sort test
aaa
aab
abc
bbc
lmn
mmn
sss
zzz
ššš
if I sort it with this Perl script I worked out with the usage indications of Unicode::Collate, I get this (and it's really slow!):
Code:
aaa
ššš
aab
abc
bbc
lmn
mmn
sss
zzz
As you see, the "ššš" are not after "z" which is already an improvement, but they should be right after "s".
Do I have to explicitly tell Perl where to put them? How?
Here's the script (don't laugh too loud):
Code:
use Unicode::Collate;
$Collator = Unicode::Collate->new(%tailoring);
open (NAMES_FILE, "< path-to-my-file")  or  die "Failed to read file : $! ";
my @not_sorted = <NAMES_FILE>;  # read entire file in the array
@sorted  = $Collator->sort(@not_sorted);
print @sorted;
close (NAMES_FILE);
This is the synopsis of Unicode::Collate, but I'm not grasping it very well yet:
Code:
  use Unicode::Collate;

  #construct
  $Collator = Unicode::Collate->new(%tailoring);

  #sort
  @sorted = $Collator->sort(@not_sorted);

  #compare
  $result = $Collator->cmp($a, $b); # returns 1, 0, or -1.

  # If %tailoring is false (i.e. empty),
  # $Collator should do the default collation.
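
A guess at what may be going on with the odd placement, not something confirmed anywhere in this thread: the script reads the file as raw bytes, so if the file is UTF-8, each "š" would reach the collator as two separate characters instead of one letter. The Python sketch below only illustrates that effect. It assumes the test file is UTF-8 encoded and that the undecoded bytes end up being interpreted as Latin-1:

```python
raw = "ššš".encode("utf-8")        # the bytes as stored in a UTF-8 file
misread = raw.decode("latin-1")    # the same bytes taken one byte per character
print(len(raw), repr(misread))

decoded = raw.decode("utf-8")      # decoding properly gives back three letters
print(len(decoded), decoded)
```

If that is the cause, the "Å" in the misread string would be collated as a variant of "a", which would fit with "ššš" landing right next to "aaa". Decoding the input as UTF-8 before sorting would be the first thing to try.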
Old 9th April 2009
gosha gosha is offline
Spam Deminer
 
Join Date: Jun 2008
Location: China
Posts: 256
Default perl sorting non ascii chars SOLVED

I know, you want me to study and work it out by myself.
Actually I've finally found a good tutorial page on this, I simply did not search with the right key before on google. Here's the link: http://interglacial.com/~sburke/tpj/as_html/tpj14.html

In my personal case, I have two extra letters to sort: š and ū.
I've made this test file:
Code:
abc
aab
bbc
mmn
lmn
aaa
ššš
sss
zzz
ccc
ggg
uuu
šas
saš
cab
uuū
ūuu
ūūū
Here's the code:
Code:
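use strict;
use warnings;
use utf8;                             # the script itself contains š and ū (assumes it is saved as UTF-8)
use open ':std', ':encoding(UTF-8)';  # assumes the data file and the terminal are UTF-8 too

open(my $fh, '<', "absolute-path-to-file")  or  die "Failed to read file : $! ";
my @not_sorted = <$fh>;               # read the entire file into the array
close($fh);

# Build a sort key: fold ū into u, then give every letter (š included) its own slot.
sub normalize {
   my $in = $_[0];
   $in = lc($in);
   $in =~ tr<aeiouū>
   <aeiouu>;
   $in =~ tr<abcdefghijklmnopqrsštuvwxyz>
   <\x01-\x1B>; #hexadecimal numbers to tell Perl you have 27 letters to sort
   return $in;
}

# Compare the keys first; fall back to the raw strings to break ties (uuu vs. uuū).
my @sorted  = sort { normalize($a) cmp normalize($b) or $a cmp $b } @not_sorted;
print @sorted;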
I still don't completely understand why ū sorts in the proper order without being treated as an extra letter, the way š has to be, but I eventually will. Anyway, it gives the expected result:
Code:
aaa
aab
abc
bbc
cab
ccc
ggg
lmn
mmn
saš
sss
šas
ššš
uuu
uuū
ūuu
ūūū
zzz
Hope it will be helpful to someone.
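
On the ū question, one reading of the script above (a sketch, not a definitive explanation): the first tr folds ū into u, so both letters get the same sort key and no extra slot is needed. The final `$a cmp $b` only breaks ties between words whose keys come out equal. š is not folded into anything, so it needs its own position between s and t. The same idea in Python, with `ALPHABET`, `FOLD`, `RANK` and `normalize` as illustrative names:

```python
ALPHABET = "abcdefghijklmnopqrsštuvwxyz"   # 27 letters, š right after s
FOLD = str.maketrans("ū", "u")             # ū is treated as a variant of u
RANK = {ch: i + 1 for i, ch in enumerate(ALPHABET)}

def normalize(word):
    """Fold ū into u, then replace each letter by its position in ALPHABET."""
    folded = word.lower().translate(FOLD)
    return [RANK.get(ch, 0) for ch in folded]

words = ["uuū", "šas", "zzz", "ūuu", "sss", "saš", "uuu", "ššš", "ūūū"]
# Primary key: the normalized ranks; tie-breaker: the raw string (like `or $a cmp $b`).
result = sorted(words, key=lambda w: (normalize(w), w))
print(result)
```

All four u-words share one key here, so their relative order comes only from the raw comparison, where ū happens to have a higher code point than u.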
Content copyright © 2007-2010, the authors
Daemon image copyright ©1988, Marshall Kirk McKusick