Awk is cool. It's a full-fledged programming language that's there in anything remotely unix-flavored, but I mostly see it used in one-liners to grab bits of text from piped stdout.
But you can use awk as a general-purpose scripting language [1], and in many ways it's nicer than bash for this purpose. I wonder why you don't see more awk scripts in the wild. I suppose perl came along and tried to combine the good features of shell, awk, and sed into one language, and then people decided perl was bad and moved on from that.
You nailed it. Perl replaced awk and then turned out to be counterproductive in a lot of cases because there was no simple and broadly understood way for people to write Perl code that was 1) readable for other programmers and 2) scalable to medium-to-large programs.
Which is not to say that nobody ever figured out those things and did them well, just that the success rate was low enough across the industry to earn Perl a really bad reputation.
I'd like to see a revival of awk. It's less easy to scale up, so there's very little risk that starting a project with a little bit of awk results in the next person inheriting a multi-thousand line awk codebase. Instead, you get an early-ish rewrite into a more scalable and maintainable language.
> I'd like to see a revival of awk. It's less easy to scale up, so there's very little risk that starting a project with a little bit of awk results in the next person inheriting a multi-thousand line awk codebase. Instead, you get an early-ish rewrite into a more scalable and maintainable language.
Taco Bell programming is the way to go.
This is the thinking I use when putting together prototypes. You can do a lot with awk, sed, join, xargs, parallel (GNU), etc. But abstraction takes real effort in a bash script, so the code stays compact. I've built many data engineering/ML systems with this technique. Those command line tools are SO WELL debugged and have such reasonable error behavior that you don't have to worry about the complexities of exception handling, etc.
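For flavor, a hedged sketch of the kind of throwaway step I mean (the file name and column numbers are made up):

    # total bytes per user from a TSV of events, top 10
    awk -F'\t' '{ bytes[$1] += $3 } END { for (u in bytes) print bytes[u], u }' events.tsv \
      | sort -rn | head -10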
The problem Perl and the like have to contend with is that they have to compete with Python. If a dependency needs to be installed to do something, you have to convince me that whatever language and script is worthwhile to maintain over Python, which is the next de facto thing people reach for after bash. The nice thing about awk is that it’s baked in. So it has an advantage. You can convince me awk is better because I don’t have to deal with dependency issues, but it’s a harder sell for anything I have to install before I can use it.
And it’s not even that Python is a great language. Or has a great package manager or install situation. It doesn’t have any of those things. It does, however, have the likelihood of the next monkey after me understanding it. Which is unfortunately more than can be said about Perl
> The problem Perl and the like have to contend with is that they have to compete with Python. If a dependency needs to be installed to do something, you have to convince me that whatever language and script is worthwhile to maintain over Python, which is the next de facto thing people reach for after bash
A historical note: Perl was that language before Python was, and it lost that status to Python through direct competition. For a while, if you had to do anything larger than a shell script but not big enough to need a "serious" C++ or Java codebase, Perl was the natural choice, and nobody would argue with it (unless they were arguing for shell or C.) That's why Perl 5 is installed on so many systems by default.
When I first started using Python, I felt a little scared for liking it too much. I thought I should be smart enough to prefer Perl. Then Eric Raymond's article about Python[1] came out in Linux Journal in 2000, and I felt massive relief that a smart person (or someone accomplished enough that their opinions got published in Linux Journal) felt the same way I did. But I still made a couple more serious attempts to force Perl into my brain because I thought Perl was going to be the big dog forever and every professional would need to know it.
But Perl was doomed: if Python didn't exist, it would have lost to Ruby, and if Ruby didn't exist, it would have eventually lost to virtually any language that popped up in the same niche.
Perl is installed by default on most Unix systems, FreeBSD being the exception. Python isn’t. Although Python is popular, if we’re comparing the probability of someone having the interpreter installed already, it’s greater for Perl, even if people aren’t aware they already have it.
Though one would probably never be able to work with an assumed install of Python anyway, because one would not be able to assume a specific version. I am guessing this is a lesser problem for Perl, since it’s been frozen at some version of 5 for the past 25-30 years, correct?
What Perl nailed was being useful to write cross platform shell scripts. Agree that it didn’t scale up but you had a chance of delivering n platforms with minimal pain.
awk v gawk doesn’t make me want to relive those days.
I’m curious: what are some problems where awk was (presumably) a reasonable choice at first but then the implementation grew into a behemoth? Did the solution need to grow as the problem grew? Or was awk just the wrong choice from the beginning?
One case I saw it used was for processing genomics data. It was kinda ok at first but when we needed to add a new sequencing type it was laborious.
Personally I don’t think awk is a good choice for anything beyond one liners and personal scripts. Here it was fine because it was (initially) some write-once academic code that needed to not be insanely slow.
> then people decided perl was bad and moved on from that.
Screw what people think. I found out I like perl. The last thing I wrote is a programmatic partition editor [1] - like how you use sfdisk to zero out partitions, except I wanted to do more than zap: I wanted to combine the MBR and GPT partition tables and make hybrids.
It was fun, and I will use perl again (I may also use awk at some point now that I see how cool it is).
Valid comment, unless $is_bad_example. I genuinely really like the use of unless in Perl. There are lots of times it's nicer to express inverse logic. You could rename a variable to have inverse truthiness so you don't end up with "if not" everywhere. Or you could accept that you often need to deal with inverse logic on something and use the right language for it.
Awk is incredibly useful. I wrote a script this week to parse Postgres logs (many, many GB) to answer the question, "what were the top users making queries in the first few minutes at the top of every hour?" [0] Took a couple of functions, maybe 20 LOC in total, plus some pipes through sort and uniq [1]. Also quite fast, especially if you prefix it with LC_ALL=C.
[0]: If you're wondering why there wasn't adequate observability in place to not have to do this, you're not wrong, but sometimes you must live with the reality you have.
[1]: Yes, I know gawk can do a unique sort while building the list. It was late into an incident and I was tired, and | sort | uniq -c | sort -rn is a lot easier to remember.
[1].a: Yes, I know sort has a -u arg. It doesn't provide a count, and its unique function is also quite a bit slower than uniq's implementation.
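Not the actual script, but a rough sketch of the shape, assuming a log_line_prefix along the lines of '%m [%p] %u@%d' (adjust the field numbers to whatever your prefix produces):

    # lines like: 2024-06-01 14:01:23.456 UTC [4242] alice@prod LOG:  duration: ...
    awk '$2 ~ /^..:0[0-2]:/ { split($5, a, "@"); hits[a[1]]++ }
         END { for (u in hits) print hits[u], u }' postgresql.log \
      | sort -rn | head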
I suspect the performance part is only true if you're familiar with Python's stdlib performance quirks, like how `datetime.fromisoformat()` is orders of magnitude faster than `datetime.strptime()` (which would likely be the function you'd reach for if not familiar), or at the very least, that directly casting slices of the string into ints is in between the two. This example is parsing a file of 10,000,000 ISO8601 datetimes, then counting those between `HH:00:SS – HH:02:SS` inclusive. The count method is the same, and likely benefits from some OS caching, but the parse times remained constant even with repeated runs.
> then people decided perl was bad and moved on from that.
That's a large part of what's driving Awk's renaissance: devs who never learned Perl to begin with want something to fill the gap between shell and Python, plus devs like me who (reluctantly) abandoned Perl because it was deemed "uncool" by HN types, which put an expiration date on Perl and all code written in it. But since Awk is a POSIX standard, HN types can't get rid of it.
"HN types" can't get rid of perl either. So just use perl if you want to. Personally I think perl is a terrible language and that anything which is too complex for a shell script (which is most things) should just be done in python. But if you disagree, it's not like anyone can stop you. If your issue is "my teammates hate it and want me to use something else", I promise you they will be just as annoyed if you use awk.
I'm pretty sure people have read more perl code than awk code, so they'll roll their eyes but will be able to cover for perl-required tasks, but won't build up the courage to touch the awk.
To me, hell is having to debug perl scripts other people wrote. Based on experiences in the 1998 time frame.
Hell is having to debug perl scripts you wrote yourself long enough ago that you've forgotten and the developers of the packages you depend on have literally died of old age.
What is an "HN" type? HN has smart, dumb, and in-between people of every variety. That's the byproduct you get of encouraging curiosity. I don't think HN was even around in any sort of prominence when Perl died.
I stopped using Perl because my eggdrop bots got laborious, along with my expect scripts. They were novel early on, but maintaining them became a chore I wasn't inclined to do anymore. Other things started to do that better.
Personally, I think Python won because its syntax was much more readable. It had nothing to do with technical merits. DX is a strong subliminal motivator.
Then: sed -i -e 's@perl $(srctree)/$(src)/build_OID_registry@awk -f $(srctree)/$(src)/build_OID_registry@' ./lib/Makefile
They also removed the perl dependency for ca-certificates since one of the goals was to remove perl dependencies from the core system including its toolchain and kernel. It's not needed at all now.
This Aho project is neat because it has the potential to remove the perl dependency for having a git client, which was a problem before.
I love AWK. Its stringly-typedness would make a Javascript programmer blush: 0, “”, and actually-nothing are identically falsey. Numbers are no different from strings representing numbers, like Lua. Somehow, I don’t mind — if you really need to keep your numbers numbers and strings strings, sigilate the value (prepend with #/$) and peel it off with a substr() later.
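A toy illustration of what I mean (made-up values, not from any real script):

    awk 'BEGIN {
        if (!x) print "unset, 0 and \"\" are all falsey"
        v = "#" 42               # tag the value as numeric with a sigil
        n = substr(v, 2) + 0     # peel the sigil back off and force a number
        print n + 1              # 43
    }'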
I have written scripts in awk (what seems a lifetime ago!), bash, then perl, ruby and python - in that chronological order. I think awk scripting didn't take off for the masses because while it was good for its goal, (1) it was a bit niche; the common knowledge people came in with to work on unix systems was bash and awk/grep/sed one-liners - learning awk would have been work that was seen to have specialized gains, and (2) yes, Perl sort of provided a sane alternative to the mix of shell scripts, magic one-liners and awk scripts. Of course, later it was supplanted by Python (transitioning through a brief period of Ruby).
Reading legacy scripts was wild back then - you had to be somewhat good at bash, unix tools like awk, C, make, Perl:-)
awk isn't readable. Modern programming languages can do things much better than awk in far fewer lines of code. All of them have better facilities for working with text and strings.
Doubtful. awk has a lot of implicit behaviour which allows the programmer to write very terse scripts. An equivalent Python program is usually several times longer.
For the problem domain that Awk targets, it's close to as good as it gets. A lot of the line reading, delimiting, chunking etc. is already done for you. You, the programmer, don't have to deal with re-implementing that same old ceremony. You get straight to the point of writing the transformations that you want.
If you break out into Python for that, it's a bit like being at your best and most formal self, with your best table manners, the first time you are meeting your fiancé's parents. Awk, on the other hand, is like being with your old childhood pal, partner in crime, from your old street, where the need for such ceremonies does not exist.
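For example, "mean of column 3" is the whole program (file name made up), with the reading and splitting done for you:

    awk '{ sum += $3; n++ } END { if (n) print sum / n }' measurements.txt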
With modern awks you can write extensions for them. Mawk is plenty fast too.
One thing that still grates though is that the 'split' function does not come with a corresponding 'join' and one has to iterate through the array explicitly to join.
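So you end up carrying around a little helper like this (one typical hand-rolled version, give or take):

    function join(arr, n, sep,    s, i) {
        s = arr[1]
        for (i = 2; i <= n; i++)
            s = s sep arr[i]
        return s
    }
    # n = split("a:b:c", parts, ":"); print join(parts, n, "-")   prints a-b-c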
I mean, Russian is also unreadable, until you know how to read it.
awk's power for me isn't the LOC needed to accomplish a task, its power is that I can express the business logic I need very easily and very quickly, and the resulting code is really fast. I am by no means great with awk, but I can go months without touching awk, encounter some problem where awk shines, and in a few minutes or less I have exactly what I need.
you can learn to be extremely productive with awk in a few hours and it's very comforting to have this in your toolset moving forward. Essential? Probably not, but I like that I don't need to break my thought process when working with awk because it's just so natural to express what I want awk to do that I don't really "think" about writing awk, I just write it.
The original authors have done next to nothing to improve Awk in those years; it's embarrassing to be writing another book on a subject that they have not advanced.
Awk could use improvement in numerous areas. Oh, for instance, you can pass associative arrays into functions, but not return them. Functions that filter array to array have to take an output array parameter.
Using extra parameters as the only way to get local variables is also a smell.
a[i] syntax cannot index into strings, what the hell?
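For example, a simple filter ends up shaped like this (a sketch, not from any real codebase):

    # no way to return an array, so the caller has to pass 'out' in
    function keep_gt(src, n, min, out,    i, m) {
        split("", out)               # portable idiom to clear the output array
        m = 0
        for (i = 1; i <= n; i++)
            if (src[i] > min) out[++m] = src[i]
        return m                     # only the element count can be returned
    }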
I feel they have a point, gawk has already too many differences compared to awk. If you introduce even more distinct syntax, it is better just to fork it and call it something else.
When people ask me why I say that the linux command line is the best dev environment, Awk is one of the tools I often point to. When you know even basic awk, you can do a lot with a little. IDEs actually start to feel clunky.
If you're looking to get into Awk, and you learn well from a lecture style, I put together a talk for Linux Fest Northwest some years ago and recorded it for Youtube: https://youtu.be/E5aQxIdjT0M
At the same time, unicode support was added super late (like 2 years ago?). Like any other shell tool, passing content to awk by piping has edge cases (anything resembling a CTRL+D sequence will cut the feed), and there must be dozens of other edge cases I have no idea about that will only bite me at the worst time.
Awk is an impressive tool, but putting it on a pedestal blinds people to the weak spots and to why they should probably move serious tasks to more specialized, modern, and better adapted tools.
How does awk replace an IDE? I love awk, I love how powerful it is if you spend the time to learn it, but if I didn’t have an IDE I would be significantly less productive. Most of what an IDE does is help you understand and change code, not text editing operations. Not trying to say you’re wrong, just curious what your angle is with that statement.
Thanks, to clarify, I don't mean that just awk replaces an IDE. I mean that the Linux command line replaces an IDE, specifically the functions like finding references, substitutions, transformations, etc. It doesn't replace syntax highlighting and stuff of course, though I get that from Vim.
It's never going to be as powerful as a specialized IDE would for a specific language, but the Linux CLI is language agnostic and even works on just text files, so it's universally applicable and doesn't change depending on the language of the project. For me it's better than an IDE, but YMMV of course because everybody is different.
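For example, a quick "references per file" pass looks something like this (identifier chosen arbitrarily):

    grep -rn 'run_command' . \
      | awk -F: '{ hits[$1]++ } END { for (f in hits) print hits[f], f }' \
      | sort -rn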
Ok yes, point taken, I just meant it as a general expression. I don't mean to imply that it's exclusive to Linux, although I do specifically prefer the GNU flavor of most of the tools (awk being an exception, where there are multiple interesting implementations), which do tend to be more closely associated with Linux than macos or BSD. A standard installation of any Linux distro will have them all available, whereas on Mac and Windows you have to specifically install them as they aren't part of the standard command line.
Macos does not come with the GNU utilities. You have to explicitly install them if you want them on macos. The included versions are BSD and similar licensed and are not all compatible with the GNU versions.
At first I thought this was named for aho (アホ), Japanese slang for "stupid", then I remembered that Alfred Aho is the 'a' in 'awk'. Or maybe it's both?
The pun is there for sure, but "git" is British slang for an unpleasant or annoying person. I was introduced to it by The Beatles: "and curse Sir Walter Raleigh, he was such a stupid git".
I believe it was Linus himself who quipped (paraphrasing) that he is so arrogant he named two software projects after himself: Linux, and git.
Great project and great idea. Understanding the basics gives one different perspectives for other projects and problems.
Back in the day I created a Web-based wiki using awk. Why? Because I was using a Linksys router with minimal memory.
It was a great way to learn both how wikis work and what can be done with awk. And since there were no libraries to fall back on, I had to implement the basics and gain all the understanding myself.
Exactly, but having branches kind of allows you to not have to create local clones. And I legit use rsync to clone git repos locally. The question is: what is best?
BUGS
    Multiple checkout in general is still experimental, and the support for submodules is incomplete. It is NOT recommended to make multiple checkouts of a superproject.
It complicates git with more cruft. A separate clone is more understandable and independent. If you trash something in its .git/ subdirectory, only that repo is affected.
I'd hardly call it "cruft". Git has a lot of features. Not all of them are useful for all combinations of projects, workflows, and people.
In my use case, independence would be an anti-feature. git-worktree fulfills a specific desire that I cannot fulfill with any known alternative. Therefore I use it.
As for actual bugs, I certainly never encountered any. But I also don't use it with repos that contain submodules, as per the warning.
First of all, GitHub is a service built upon Git. There is no requirement that you need to use GitHub. And there are other options, what you call "hubs", such as GitLab or SourceHut.
Git as a VCS, has the feature to sync with other repositories elsewhere. These can be somewhere else on your computer, someone else's computer, or a server. GitHub happens to be one of those servers.
There's this lingering thought in my head that with a bunch of GNU utils/programs, and probably not much more, one could create these omnipotent databases and processing tools that would surpass specialized tools in performance and capabilities. Anyone else feel like that?
I found an awesome-awk[1] page on github and it seems to be a little empty.
Maybe we should contribute to it and bring some examples, like the subject of this HN post or ahrf[2], a dedicated markup language for static site generators based on awk.
I've started by adding the one true awk and bioawk implementations.
If this used cppawk (which didn't exist when this was developed), it could use #include. This is nicely relative to the file; no AWKPATH. Also you can just "build" the preprocessed program into a single file which then doesn't need cppawk.
I thought the same thing. And it is super common in manga and anime, so a lot of people know words like aho, baka, kuso to have a bad meaning in Japanese.
Neat project. It’s always fun to see tools pushed beyond their normal use cases.
That said, it should be a criminal offense to write any tool this large and complex in any language that can’t be used in a powerful step debugger.
TBH I’m increasingly frustrated by the amount of code written in Bash. I kind of hate Python for various reasons. But if 100% of Bash was replaced with Python I think the world would be a better place.
Only if python gained the ability to execute externals and handle stdin/stdout/stderr, return value, & environment directly instead of through exec().
And ditched the meaningful indentation.
And was backwards compatible so that old scripts work today and today scripts work tomorrow.
Bash has its own uglies, so I guess it's fair enough to compare two imperfect things, but the problem is these are two different jobs, and python just doesn't do the job bash does, and that job needs doing, so python can't be the replacement.
I'm mostly in concordance, but one thing I think every scripting language got wrong is how painful it is to run external binaries. Sure it can be done, but it should be as easy as it is in a shell script.
Ruby is nice in every regard :) Most languages have some form of backticks to run an external binary. When I left my original comment I got a phone call in the middle of it and rushed it, sorry about that. What I really meant was having the versatility of the big 3 file descriptors along with return codes, etc. You can use Open3 (in the ruby case) but it's unwieldy compared to, say bash.
That said, stdout/stderr is such a bloody, inconsistent nightmare. I’m not totally convinced that “chain small binary programs together” is better than “one language with useful libraries”.
Bash is admittedly nice for small things. But it always spirals out of control. And rarely gets ported.
Also my life is primarily Windows and if you want everything to “just work” across mac/linux/win it’s easier to just use Python or sometimes even Rust. I often wish I could easily write and run single-file rust scripts.
Not the most interesting of questions I have here, but is this indentation style on function definitions a thing or is it just accidental? It's in a few places, mostly before the first arg but sometimes before others.
eg:
function run_command( c, shortopts, longopts, quiet, directory, path, errors)
Just asking as this project has kind of resparked my interest in awk.
Awk doesn't have a way to define function-local variables. All variables are global, except for function parameters.
This spacing convention is meant to clearly separate mandatory parameters and optional parameters that are sometimes only introduced to "declare" a local variable.
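i.e. something like this, where the caller only ever supplies the first argument:

    # 'i' acts as a local purely because the caller never passes a second argument
    function first_word(s,    i) {
        i = index(s, " ")
        return i ? substr(s, 1, i - 1) : s
    }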
I don't understand how you can write more than a one-line program in a language where all variables are global by default, even if they are inside a scope delimited by brackets. You can have local variables but the syntax is weird. https://www.gnu.org/software/gawk/manual/html_node/Variable-...
No, Git's object store was not designed to hold large binary blobs, and no implementation of Git in any language can change this. It's a reasonable request; I mean, Git doesn't even deal with "pretty small" binary files very well, either. But it's all simply a consequence of its design that was thought up all those years ago.
The core object storage model and data format (and many, many things on top of those) have to be changed/extended/fixed first, but it's realistically an immense change, so git-lfs and other various solutions are about as good as it'll get in the mean time.
If you are using awk for more than 5 lines of code, just use Python. It is painfully slow and unreadable; there is no benefit to using it at all. If you are saying awk is fast, you have never benchmarked it against other programming languages. Even Python is 10-20x faster.
[1] Random excerpt from NetBSD's source code https://github.com/NetBSD/src/blob/trunk/sys/dev/eisa/devlis...