Awk is cool. It's a full-fledged programming language that's there in anything remotely unix-flavored, but I mostly see it used in one-liners to grab bits of text from piped stdout.
But you can use awk as a general-purpose scripting language [1], and in many ways it's nicer than bash for this purpose. I wonder why you don't see more awk scripts in the wild. I suppose perl came along and tried to combine the good features of shell, awk, and sed into one language, and then people decided perl was bad and moved on from that.
You nailed it. Perl replaced awk and then turned out to be counterproductive in a lot of cases because there was no simple and broadly understood way for people to write Perl code that was 1) readable for other programmers and 2) scalable to medium-to-large programs.
Which is not to say that nobody ever figured out those things and did them well, just that the success rate was low enough across the industry to earn Perl a really bad reputation.
I'd like to see a revival of awk. It's less easy to scale up, so there's very little risk that starting a project with a little bit of awk results in the next person inheriting a multi-thousand line awk codebase. Instead, you get an early-ish rewrite into a more scalable and maintainable language.
> I'd like to see a revival of awk. It's less easy to scale up, so there's very little risk that starting a project with a little bit of awk results in the next person inheriting a multi-thousand line awk codebase. Instead, you get an early-ish rewrite into a more scalable and maintainable language.
Taco Bell programming is the way to go.
This is the thinking I use when putting together prototypes. You can do a lot with awk, sed, join, xargs, parallel (GNU), etc. But abstraction takes real effort in a bash script, so the code stays compact. I've built many data engineering/ML systems with this technique. Those command line tools are SO WELL debugged and have such reasonable error behavior that you don't have to worry about the complexities of exception handling, etc.
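For flavor, a hedged sketch of the kind of throwaway step I mean (the file name and column numbers are made up):

    # total bytes per user from a TSV of events, top 10
    awk -F'\t' '{ bytes[$1] += $3 } END { for (u in bytes) print bytes[u], u }' events.tsv \
      | sort -rn | head -10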
The problem Perl and the like have to contend with is that they have to compete with Python. If a dependency needs to be installed to do something, you have to convince me that whatever language and script is worthwhile to maintain over Python, which is the next de facto thing people reach for after bash. The nice thing about awk is that it’s baked in. So it has an advantage. You can convince me awk is better because I don’t have to deal with dependency issues, but it’s a harder sell for anything I have to install before I can use it.
And it’s not even that Python is a great language. Or has a great package manager or install situation. It doesn’t have any of those things. It does, however, have the likelihood of the next monkey after me understanding it. Which is unfortunately more than can be said about Perl
> The problem Perl and the like have to contend with is that they have to compete with Python. If a dependency needs to be installed to do something, you have to convince me that whatever language and script is worthwhile to maintain over Python, which is the next de facto thing people reach for after bash
A historical note: Perl was that language before Python was, and it lost that status to Python through direct competition. For a while, if you had to do anything larger than a shell script but not big enough to need a "serious" C++ or Java codebase, Perl was the natural choice, and nobody would argue with it (unless they were arguing for shell or C.) That's why Perl 5 is installed on so many systems by default.
When I first started using Python, I felt a little scared for liking it too much. I thought I should be smart enough to prefer Perl. Then Eric Raymond's article about Python[1] came out in Linux Journal in 2000, and I felt massive relief that a smart person (or someone accomplished enough that their opinions got published in Linux Journal) felt the same way I did. But I still made a couple more serious attempts to force Perl into my brain because I thought Perl was going to be the big dog forever and every professional would need to know it.
But Perl was doomed: if Python didn't exist, it would have lost to Ruby, and if Ruby didn't exist, it would have eventually lost to virtually any language that popped up in the same niche.
Perl is installed by default on most Unix systems, FreeBSD being the exception. Python isn’t. Although Python is popular, if we’re comparing the probability of someone having the interpreter installed already, it’s greater for Perl, even if people aren’t aware they already have it.
Though one would probably never be able to work with an assumed install of Python anyway, because one would not be able to assume a specific version. I am guessing this is a lesser problem for Perl, since it’s been frozen at some version of 5 for the past 25-30 years, correct?
What Perl nailed was being useful to write cross platform shell scripts. Agree that it didn’t scale up but you had a chance of delivering n platforms with minimal pain.
awk v gawk doesn’t make me want to relive those days.
I’m curious: what are some problems where awk was (presumably) a reasonable choice at first but then the implementation grew into a behemoth? Did the solution need to grow as the problem grew? Or was awk just the wrong choice from the beginning?
One case I saw it used was for processing genomics data. It was kinda ok at first but when we needed to add a new sequencing type it was laborious.
Personally I don’t think awk is a good choice for anything beyond one liners and personal scripts. Here it was fine because it was (initially) some write-once academic code that needed to not be insanely slow.
> then people decided perl was bad and moved on from that.
Screw what people think. I found out I like perl. The last thing I wrote is a programmatic partition editor [1] - like how you use sfdisk to zero out partitions, except I wanted to do more than zap: I wanted to combine the MBR and GPT partition tables and make hybrids.
It was fun, and I will use perl again (I may also use awk at some point now that I see how cool it is).
Valid comment, unless $is_bad_example. I genuinely really like the use of unless in Perl. There are lots of times it's nicer to express inverse logic. You could rename a variable to have inverse truthiness so you don't end up with "if not" everywhere. Or you could accept that you often need to deal with inverse logic on something and use the right language for it.
Awk is incredibly useful. I wrote a script this week to parse Postgres logs (many, many GB) to answer the question, "what were the top users making queries in the first few minutes at the top of every hour?" [0] Took a couple of functions, maybe 20 LOC in total, plus some pipes through sort and uniq [1]. Also quite fast, especially if you prefix it with LC_ALL=C.
[0]: If you're wondering why there wasn't adequate observability in place to not have to do this, you're not wrong, but sometimes you must live with the reality you have.
[1]: Yes, I know gawk can do a unique sort while building the list. It was late into an incident and I was tired, and | sort | uniq -c | sort -rn is a lot easier to remember.
[1].a: Yes, I know sort has a -u arg. It doesn't provide a count, and its unique function is also quite a bit slower than uniq's implementation.
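Not the actual script, but a rough sketch of the shape, assuming a log_line_prefix along the lines of '%m [%p] %u@%d' (adjust the field numbers to whatever your prefix produces):

    # lines like: 2024-06-01 14:01:23.456 UTC [4242] alice@prod LOG:  duration: ...
    awk '$2 ~ /^..:0[0-2]:/ { split($5, a, "@"); hits[a[1]]++ }
         END { for (u in hits) print hits[u], u }' postgresql.log \
      | sort -rn | head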
I suspect the performance part is only true if you're familiar with Python's stdlib performance quirks, like how `datetime.fromisoformat()` is orders of magnitude faster than `datetime.strptime()` (which would likely be the function you'd reach for if not familiar), or at the very least, that directly casting slices of the string into ints is in between the two. This example is parsing a file of 10,000,000 ISO8601 datetimes, then counting those between `HH:00:SS – HH:02:SS` inclusive. The count method is the same, and likely benefits from some OS caching, but the parse times remained constant even with repeated runs.
> then people decided perl was bad and moved on from that.
That's a large part of what's driving Awk's renaissance: devs who never learned Perl to begin with want something to fill the gap between shell and Python, plus devs like me who (reluctantly) abandoned Perl because it was deemed "uncool" by HN types, which put an expiration date on Perl and all code written in it. But since Awk is a POSIX standard, HN types can't get rid of it.
"HN types" can't get rid of perl either. So just use perl if you want to. Personally I think perl is a terrible language and that anything which is too complex for a shell script (which is most things) should just be done in python. But if you disagree, it's not like anyone can stop you. If your issue is "my teammates hate it and want me to use something else", I promise you they will be just as annoyed if you use awk.
I'm pretty sure people have read more perl code than awk code, so they'll roll their eyes but will be able to cover for perl-required tasks, but won't build up the courage to touch the awk.
To me, hell is having to debug perl scripts other people wrote. Based on experiences in the 1998 time frame.
Hell is having to debug perl scripts you wrote yourself long enough ago that you've forgotten and the developers of the packages you depend on have literally died of old age.
What is an "HN" type? HN has smart, dumb, and in-between people of every variety. That's the byproduct you get of encouraging curiosity. I don't think HN was even around in any sort of prominence when Perl died.
I stopped using Perl because my eggdrop bots got laborious, along with my expect scripts. They were novel early on, but maintaining them became a chore I wasn't inclined to do anymore. Other things started to do that better.
Personally, I think Python won because its syntax was much more readable. It had nothing to do with technical merits. DX is a strong subliminal motivator.
Then: sed -i -e 's@perl $(srctree)/$(src)/build_OID_registry@awk -f $(srctree)/$(src)/build_OID_registry@' ./lib/Makefile
They also removed the perl dependency for ca-certificates since one of the goals was to remove perl dependencies from the core system including its toolchain and kernel. It's not needed at all now.
This Aho project is neat because it has the potential to remove the perl dependency for having a git client, which was a problem before.
I love AWK. Its stringly-typedness would make a Javascript programmer blush: 0, “”, and actually-nothing are identically falsey. Numbers are no different from strings representing numbers, like Lua. Somehow, I don’t mind — if you really need to keep your numbers numbers and strings strings, sigilate the value (prepend with #/$) and peel it off with a substr() later.
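A toy illustration of what I mean (made-up values, not from any real script):

    awk 'BEGIN {
        if (!x) print "unset, 0 and \"\" are all falsey"
        v = "#" 42               # tag the value as numeric with a sigil
        n = substr(v, 2) + 0     # peel the sigil back off and force a number
        print n + 1              # 43
    }'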
I have written scripts in awk (what seems a lifetime ago!), bash, then perl, ruby and python - in that chronological order. I think awk scripting didn't take off for the masses because while it was good for its goal, (1) it was a bit niche; the common knowledge people came in with to work on unix systems was bash and awk/grep/sed one-liners - learning awk would have been work that was seen to have specialized gains, and (2) yes, Perl sort of provided a sane alternative to the mix of shell scripts, magic one-liners and awk scripts. Of course, later it was supplanted by Python (transitioning through a brief period of Ruby).
Reading legacy scripts was wild back then - you had to be somewhat good at bash, unix tools like awk, C, make, Perl:-)
awk isn't readable. Modern programming languages can do things much better than awk in far fewer lines of code. All of them have better facilities for working with text and strings.
Doubtful. awk has a lot of implicit behaviour which allows the programmer to write very terse scripts. An equivalent Python program is usually several times longer.
For the problem domain that Awk targets, it's close to as good as it gets. A lot of the line reading, delimiting, chunking etc. is already done for you. You, the programmer, don't have to deal with re-implementing that same old ceremony. You get straight to the point of writing the transformations that you want.
If you break out into Python for that, it's a bit like being at your best and most formal self, with your best table manners, the first time you are meeting your fiancé's parents. Awk, on the other hand, is like being with your old childhood pal, partner in crime, from your old street, where the need for such ceremonies does not exist.
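For example, "mean of column 3" is the whole program (file name made up), with the reading and splitting done for you:

    awk '{ sum += $3; n++ } END { if (n) print sum / n }' measurements.txt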
With modern awks you can write extensions for them. Mawk is plenty fast too.
One thing that still grates though is that the 'split' function does not come with a corresponding 'join' and one has to iterate through the array explicitly to join.
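So you end up carrying around a little helper like this (one typical hand-rolled version, give or take):

    function join(arr, n, sep,    s, i) {
        s = arr[1]
        for (i = 2; i <= n; i++)
            s = s sep arr[i]
        return s
    }
    # n = split("a:b:c", parts, ":"); print join(parts, n, "-")   prints a-b-c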
I mean, Russian is also unreadable, until you know how to read it.
awk's power for me isn't the LOC needed to accomplish a task, its power is that I can express the business logic I need very easily and very quickly, and the resulting code is really fast. I am by no means great with awk, but I can go months without touching awk, encounter some problem where awk shines, and in a few minutes or less I have exactly what I need.
you can learn to be extremely productive with awk in a few hours and it's very comforting to have this in your toolset moving forward. Essential? Probably not, but I like that I don't need to break my thought process when working with awk because it's just so natural to express what I want awk to do that I don't really "think" about writing awk, I just write it.
The original authors have done next to nothing to improve Awk in those years; it's embarrassing to be writing another book on a subject that they have not advanced.
Awk could use improvement in numerous areas. Oh, for instance, you can pass associative arrays into functions, but not return them. Functions that filter array to array have to take an output array parameter.
Using extra parameters as the only way to get local variables is also a smell.
a[i] syntax cannot index into strings, what the hell?
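For example, a simple filter ends up shaped like this (a sketch, not from any real codebase):

    # no way to return an array, so the caller has to pass 'out' in
    function keep_gt(src, n, min, out,    i, m) {
        split("", out)               # portable idiom to clear the output array
        m = 0
        for (i = 1; i <= n; i++)
            if (src[i] > min) out[++m] = src[i]
        return m                     # only the element count can be returned
    }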
I feel they have a point, gawk has already too many differences compared to awk. If you introduce even more distinct syntax, it is better just to fork it and call it something else.
When people ask me why I say that the linux command line is the best dev environment, Awk is one of the tools I often point to. When you know even basic awk, you can do a lot with a little. IDEs actually start to feel clunky.
If you're looking to get into Awk, and you learn well from a lecture style, I put together a talk for Linux Fest Northwest some years ago and recorded it for Youtube: https://youtu.be/E5aQxIdjT0M
At the same time, unicode support was added super late (like 2 years ago?). Like any other shell tool, passing content to awk by piping has edge cases (anything resembling a CTRL+D sequence will cut the feed), and there must be dozens of other edge cases I have no idea about that will only bite me at the worst time.
Awk is an impressive tool, but putting it on a pedestal blinds people to the weak spots and to why they should probably move serious tasks to more specialized, modern, and better adapted tools.
How does awk replace an IDE? I love awk, I love how powerful it is if you spend the time to learn it, but if I didn’t have an IDE I would be significantly less productive. Most of what an IDE does is help you understand and change code, not text editing operations. Not trying to say you’re wrong, just curious what your angle is with that statement.
Thanks, to clarify, I don't mean that just awk replaces an IDE. I mean that the Linux command line replaces an IDE, specifically the functions like finding references, substitutions, transformations, etc. It doesn't replace syntax highlighting and stuff of course, though I get that from Vim.
It's never going to be as powerful as a specialized IDE would for a specific language, but the Linux CLI is language agnostic and even works on just text files, so it's universally applicable and doesn't change depending on the language of the project. For me it's better than an IDE, but YMMV of course because everybody is different.
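For example, a quick "references per file" pass looks something like this (identifier chosen arbitrarily):

    grep -rn 'run_command' . \
      | awk -F: '{ hits[$1]++ } END { for (f in hits) print hits[f], f }' \
      | sort -rn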
Ok yes, point taken, I just meant it as a general expression. I don't mean to imply that it's exclusive to Linux, although I do specifically prefer the GNU flavor of most of the tools (awk being an exception, where there are multiple interesting implementations), which do tend to be more closely associated with Linux than macos or BSD. A standard installation of any Linux distro will have them all available, whereas on Mac and Windows you have to specifically install them as they aren't part of the standard command line.
Macos does not come with the GNU utilities. You have to explicitly install them if you want them on macos. The included versions are BSD and similar licensed and are not all compatible with the GNU versions.
At first I thought this was named for aho (アホ), Japanese slang for "stupid", then I remembered that Alfred Aho is the 'a' in 'awk'. Or maybe it's both?
The pun is there for sure, but "git" is British slang for an unpleasant or annoying person. I was introduced to it by The Beatles: "and curse Sir Walter Raleigh, he was such a stupid git".
I believe it was Linus himself who quipped (paraphrasing) that he is so arrogant he named two software projects after himself: Linux, and git.
Great project and great idea. Understanding the basics gives one different perspectives for other projects and problems.
Back in the day I created a Web-based wiki using awk. Why? Because I was using a Linksys router with minimal memory.
It was a great way to learn both how wikis work and what can be done with awk. And since there were no libraries to fall back on, I had to implement the basics and gain all the understanding myself.
Exactly, but having branches kind of allows you to not have to create local clones. And I legit use rsync to clone git repos locally. The question is: what is best?
BUGS
    Multiple checkout in general is still experimental, and the support for submodules is incomplete. It is NOT recommended to make multiple checkouts of a superproject.
It complicates git with more cruft. A separate clone is more understandable and independent. If you trash something in its .git/ subdirectory, only that repo is affected.
I'd hardly call it "cruft". Git has a lot of features. Not all of them are useful for all combinations of projects, workflows, and people.
In my use case, independence would be an anti-feature. git-worktree fulfills a specific desire that I cannot fulfill with any known alternative. Therefore I use it.
As for actual bugs, I certainly never encountered any. But I also don't use it with repos that contain submodules, as per the warning.
First of all, GitHub is a service built upon Git. There is no requirement that you need to use GitHub. And there are other options, what you call "hubs", such as GitLab or SourceHut.
Git as a VCS, has the feature to sync with other repositories elsewhere. These can be somewhere else on your computer, someone else's computer, or a server. GitHub happens to be one of those servers.
There's this lingering thought in my head that with a bunch of GNU utils/programs, and probably not much more, one could create these omnipotent databases and processing tools that would surpass specialized tools in performance and capabilities. Anyone else feel like that?
I found an awesome-awk[1] page on github and it seems to be a little empty.
Maybe we should contribute to it and bring some examples, like the subject of this HN post or ahrf[2], a dedicated markup language for static site generators based on awk.
I've started by adding the one true awk and bioawk implementations.
If this used cppawk (which didn't exist when this was developed), it could use #include. This is nicely relative to the file; no AWKPATH. Also you can just "build" the preprocessed program into a single file which then doesn't need cppawk.
I thought the same thing. And it is super common in manga and anime, so a lot of people know words like aho, baka, kuso to have a bad meaning in Japanese.
Neat project. It’s always fun to see tools pushed beyond their normal use cases.
That said, it should be a criminal offense to write any tool this large and complex in any language that can’t be used in a powerful step debugger.
TBH I’m increasingly frustrated by the amount of code written in Bash. I kind of hate Python for various reasons. But if 100% of Bash was replaced with Python I think the world would be a better place.
Only if python gained the ability to execute externals and handle stdin/stdout/stderr, return value, & environment directly instead of through exec().
And ditched the meaningful indentation.
And was backwards compatible so that old scripts work today and today scripts work tomorrow.
Bash has its own uglies, so I guess it's fair enough to compare two imperfect things, but the problem is these are two different jobs, and python just doesn't do the job bash does, and that job needs doing, so python can't be the replacement.
I'm mostly in concordance, but one thing I think every scripting language got wrong is how painful it is to run external binaries. Sure it can be done, but it should be as easy as it is in a shell script.
Ruby is nice in every regard :) Most languages have some form of backticks to run an external binary. When I left my original comment I got a phone call in the middle of it and rushed it, sorry about that. What I really meant was having the versatility of the big 3 file descriptors along with return codes, etc. You can use Open3 (in the ruby case) but it's unwieldy compared to, say bash.
That said, stdout/stderr is such a bloody, inconsistent nightmare. I’m not totally convinced that “chain small binary programs together” is better than “one language with useful libraries”.
Bash is admittedly nice for small things. But it always spirals out of control. And rarely gets ported.
Also my life is primarily Windows and if you want everything to “just work” across mac/linux/win it’s easier to just use Python or sometimes even Rust. I often wish I could easily write and run single-file rust scripts.
Not the most interesting of questions I have here, but is this indentation style on function definitions a thing or is it just accidental? It's in a few places, mostly before the first arg but sometimes before others.
eg:
function run_command( c, shortopts, longopts, quiet, directory, path, errors)
Just asking as this project has kind of resparked my interest in awk.
Awk doesn't have a way to define function-local variables. All variables are global, except for function parameters.
This spacing convention is meant to clearly separate mandatory parameters and optional parameters that are sometimes only introduced to "declare" a local variable.
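i.e. something like this, where the caller only ever supplies the first argument:

    # 'i' acts as a local purely because the caller never passes a second argument
    function first_word(s,    i) {
        i = index(s, " ")
        return i ? substr(s, 1, i - 1) : s
    }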
I don't understand how you can write more than a one-line program in a language where all variables are global by default, even if they are inside a scope delimited by brackets. You can have local variables but the syntax is weird. https://www.gnu.org/software/gawk/manual/html_node/Variable-...
No, Git's object store was not designed to hold large binary blobs, and no implementation of Git in any language can change this. It's a reasonable request; I mean, Git doesn't even deal with "pretty small" binary files very well, either. But it's all simply a consequence of its design that was thought up all those years ago.
The core object storage model and data format (and many, many things on top of those) have to be changed/extended/fixed first, but it's realistically an immense change, so git-lfs and other various solutions are about as good as it'll get in the mean time.
If you are using awk for more than 5 lines of code, just use Python. It is painfully slow and unreadable; there is no benefit to using it at all. If you are saying awk is fast, you have never benchmarked it against other programming languages. Even Python is 10-20x faster.
[1] Random excerpt from NetBSD's source code https://github.com/NetBSD/src/blob/trunk/sys/dev/eisa/devlis...