Archive for General programming

PowerShell — my points of interest

I had never used PowerShell until quite recently. I successfully solved problems with a bunch of other scripting languages, e.g. Python, Perl, Bash, AWK. They all served the purpose really well and I did not feel like I needed yet another scripting language. Furthermore, PowerShell looks nothing like any of the technologies I am familiar with, so I put off learning it many times.

However, when you work as a .NET developer, chances are that sooner or later you will come across a solution implemented in PowerShell. It could be, for instance, a deployment script, and you will have to maintain it. This happened to me a while ago. Although the modification I committed was relatively simple and I worked it out rather quickly with a little help from Google, I decided to dig into the subject and check a few more things out. What I found after a bit of random research was quite impressive to me. I would like to share the three main features I have found so far that I consider valuable in a scripting technology. At the bottom of this post I have also put some code snippets for quick reference on how to accomplish particular tasks.

1. Out-GridView

In PowerShell you can manipulate the format of the output in many ways. You can generate HTML, CSV, whitespace-formatted text tables etc. But there is also an option to view the output of a command in a WPF grid that has a built-in filter. Look at the effect of the Get-Process | Out-GridView command: this is functionality you get out of the box with just a few keystrokes!

[Screenshot: Out-GridView]

2. Embedding C# code

This feature seems quite powerful. If you need more advanced techniques in your script, you can basically implement them inline in C# and then just invoke them.

Add-Type @'
using System;
using System.IO;
using System.Text;
      
public static class Program
{
    public static void Main()
    {
        Console.WriteLine("Hello World!");
    }
}
'@
 
[Program]::Main()

3. XML parsing done simply right

Any time I had to do some XML parsing in my scripts using other languages, I always felt somewhat confused. This is not the sort of thing that you just recall from memory and type in as code. You have to use specific APIs, call them in a specific way, in a specific order etc. I do not mean that this is complicated in any way, it is not, but it is cumbersome in many languages. I always had to look things up in a cheat sheet. Not any more 🙂 From now on, I will always lean toward the simplest, and perhaps simply the best, implementation of XML parsing:

$d = [xml] "<a><b>1</b><c>2</c></a>"
$d.a.b

This outputs 1. Yes, it is as simple as that. You basically access member properties whose names match the XML nodes.
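For comparison, here is a minimal sketch of the same lookup in Python with the standard library's xml.etree.ElementTree. The sample document is an assumption made for illustration (a root element a with children b and c). It is only meant to show the kind of explicit API calls that the PowerShell property syntax avoids.

```python
import xml.etree.ElementTree as ET

# Assumed sample document: root element <a> with children <b> and <c>.
doc = ET.fromstring("<a><b>1</b><c>2</c></a>")

# fromstring() returns the root element <a>, so <b> is looked up relative to it.
b_text = doc.find("b").text
print(b_text)
```

Note that the value comes back as a string, and that find() returns None for a missing element, so real code would need a check before reading .text.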

I am sharing these features because I did not imagine a scripting language could offer something this powerful. And this is possibly only the tip of the iceberg, as I have just scratched the surface of the PowerShell world. I also suggest checking out a little script I wrote to explore PowerShell functionality: managesites.ps1. It may be useful for ASP.NET developers, as it allows you to delete sites from the IIS Express config file.

Miscellaneous code snippets:

  • if (test-path "c:\deploy"){ "aaa" }
  • $f="\file.txt";(gc $f) -replace "a","d" | out-file $f (this one is particularly important, because the equivalent in-place editing functionality in the MinGW builds of Perl and sed does not seem to work correctly)
  • foreach ($line in [System.IO.File]::ReadLines($filename)){ }
  • -match regex
  • ( Invoke-WebRequest URL | select content | ft -autosize -wrap | out-string )
  • [reflection.assembly]::LoadWithPartialName("Microsoft.VisualBasic") | Out-Null
    $input = [Microsoft.VisualBasic.Interaction]::InputBox("Prompt", "Title", "Default", -1, -1);
  • foreach ($file in dir *.vhd) { }
  • Set-ExecutionPolicy unrestricted

My configuration files

This post is mostly for my personal reference, as it is useful to have one easily accessible place for quick lookups of configuration files for commonly used tools. There is also a special link to this post: https://blog.pjsen.eu/conf


.gitconfig

https://github.com/przemsen/main/blob/master/configs/.gitconfig (raw)

.bashrc

https://github.com/przemsen/main/blob/master/configs/.bashrc (raw)

.vimrc

https://github.com/przemsen/main/blob/master/configs/.vimrc (raw)

git-prompt.sh — Git Bash for Windows

https://github.com/przemsen/main/blob/master/configs/git-prompt.sh (raw)


Main GitHub repository

https://github.com/przemsen/main

My GitHub + first simple project published

I have finally set up my GitHub account and published some of my code. The URL of the account is:

https://github.com/przemsen

And the first, very basic project is:

https://github.com/przemsen/WebThermometer

[Screenshot: WebThermometer]

WebThermometer is a WPF application to be used as a desktop gadget. It periodically downloads (the default interval is 5 minutes) the current temperature from an arbitrary web site and displays it. I personally find it useful, as I like to observe current weather conditions right from my computer. I tried to write it in such a way that it can easily be modified for use with other data sources. You can also download an already compiled, ready-to-run version from my Polish blog.

My plan is to gradually select some of my complete projects and some code snippets which in my opinion are, or will be, worth showing. You can freely modify and recompile all of the published code, provided that you state it was originally authored by me.

PS. Today the auto-update mechanism of my WordPress failed (apparently this sometimes happens) and I ended up with the entire installation damaged. I restored it from backup, and I apologize for losing a few comments from the last 2 months.

The basics do matter

Recently I spotted the following method in a large C# code base:

This code works and does what it is supposed to do. However, I had a slight inconvenience while debugging it. I frequently use the Visual Studio DEBUG->Exceptions->CLR Exceptions Thrown functionality (check this out!), which to me is an invaluable tool for diagnosing the actual source of an exception. The code base relied heavily on this very ConvToInt method, so it generated lots of exceptions and caused Visual Studio to break into the debugger over and over again. I then had to disable CLR Exceptions Thrown to protect myself from being hit by flying exceptions all the time. Having switched it off, I ended up with somewhat incomplete diagnostic capabilities. It is bad either way. So, what I did was a basically simple refactoring:

This method also works. One can even argue that this code performs better, because throwing exceptions is considered slow. And this is also correct. Although performance was not a key factor here (for line-of-business applications it rarely is), I measured it anyway. I ran both methods in a for loop 5 million times in release mode, wrapped in appropriate calls to the methods of the Stopwatch class. The results are not surprising. For valid string representations of a number, the former method (i.e. the one using System.Convert) gave an average result of

663 milliseconds

and the latter (i.e. the one using TryParse) gave an average result of

642 milliseconds

We can safely assume both methods have the same performance in this case. Then I ran the test with an invalid string representation of a number (i.e. passing “x” as an argument). Now the TryParse version gave an average result of:

546 milliseconds

And the System.Convert version, which indeed repeatedly threw exceptions, gave the (not average, I ran this once) result of

233739 milliseconds

That is a huge difference of more than two orders of magnitude (roughly 430 times slower). At that point I was fairly convinced my small and undoubtedly unimpressive refactoring was right and justified. Except that it was not correct. It worked well and was successfully tested. But after a few weeks, when a different set of use cases was being tested, the application called ConvToInt with -1 as the second argument. It turned out that the method returned 0, not -1, for invalid string representations of a number. What I want to convey here is:

TryParse sets its out argument to 0, even if it returns false and did not successfully convert a string value to a number.

I scanned the code base and found this pattern a few times. Apparently I was not the only programmer who did not know this little fact about the TryParse method. Of course, it is well documented (http://msdn.microsoft.com/en-us/library/f02979c7.aspx). To me, the problem with this very API seems even more serious. The value 0 is probably the most frequently used default when a string conversion fails. However, in a construct like the one above, the 0 comes from TryParse itself and not from the caller, even though the caller provides a default and primarily expects that default to be used in case of failure. One can easily get into trouble when expecting (and passing as an argument) a different default value, e.g. -1, and still receiving 0, because TryParse works this way by definition. Obviously the solution here is to add an if statement:

The performance does not get significantly worse because of this one conditional statement; I measured it and it is roughly the same.

The lessons learned here:

  • Exceptions actually ARE EXPENSIVE. This is NOT a myth.
  • Do not rely on the value passed back through the out variable of the TryParse method in case of a failure. Always back yourself up with an if statement and check for the failure.
  • A more general one: learn the APIs, go to the documentation, do not simply assume you know what a method does. Even when it comes to the basics. A descriptive and somewhat verbose method name can still turn evil when run under edge cases. Always be 100% sure about the contract between the API authors and you, i.e. what the API actually does.
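The second lesson can be sketched outside C# as well. The following is a Python analogue of the pattern, not the original ConvToInt code: try_parse_int is a hypothetical stand-in that imitates the TryParse contract (a success flag plus a value that is 0 on failure), and both conv_to_int_* functions are illustrative names.

```python
def try_parse_int(text):
    """Hypothetical stand-in for the TryParse contract: (ok, value), value is 0 on failure."""
    try:
        return True, int(text)
    except ValueError:
        # Exceptions are used here only to emulate the contract in Python.
        return False, 0


def conv_to_int_buggy(text, default):
    # Mirrors the flawed refactoring: the caller's default is overwritten
    # by whatever the parser hands back, which is 0 on failure.
    result = default
    ok, result = try_parse_int(text)
    return result


def conv_to_int_fixed(text, default):
    # Check the success flag and fall back to the caller's default explicitly.
    ok, value = try_parse_int(text)
    return value if ok else default


print(conv_to_int_buggy("x", -1))   # 0, the caller's -1 is lost
print(conv_to_int_fixed("x", -1))   # -1
print(conv_to_int_fixed("42", -1))  # 42
```

With a default of 0 both variants behave identically, which is why a bug like this can survive testing until some caller passes a different default.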

One thing cmd.exe is better at than *nix shells (with default configuration)

I know the statement might be considered controversial. I even encourage you to try to prove me wrong, because I wish I knew a better solution.

In my everyday work I tend to use the command prompt a lot. I have both cmd.exe and bash (from Git for Windows) open all the time. My typical environment comprises numerous directories, I mean more than one hundred. Many of them share parts of their names. The names are long, dozens of characters. I have to move between them over and over again. The problem is that it is not feasible to type a longish directory name manually many times.

Now, let’s suppose we have the following (shortened, of course) subdirectory structure in the directory we are currently in:

..
aa
bbaa
ccaadd
eeaaff

When I want to change the directory to bbaa in cmd.exe, I type cd *aa* <Tab> <Tab>, I get the second auto-completion result, I press <Enter> and I have moved to bbaa. If I press <Tab> three times, I get ccaadd. And when I press <Tab> four times, I get eeaaff. This feature is brilliant. The auto-completion works with wildcards and matches more than just the beginning of a name. Last but not least, it lets you cycle through suggestions while editing a command in-line.

The most important part here is: matching not only the beginning of a name (which is, as far as I know, the behavior of a typical Unix shell) AND the ability to have the suggestions inserted in place, not merely displayed below the command prompt.

A Unix shell does match wildcards as well, but it only displays the matched names. It does not offer (or I am not aware of it) a way to immediately pass a matched name to a command. It only lists relevant suggestions, and the user then has to manually re-edit the command so that it has the desired argument. cmd.exe is better in that it allows the user to cycle through suggestions while editing the command argument in-line. This is great when it comes to long names of which only some parts can be conveniently memorized by a human.

I propose the following function which could be appended to .bashrc.

function cdg() { ls -d */ | grep -i "$1" | awk "{printf(\"%d : %s\n\", NR, \$0)}"; read choice; if [ "$choice" == "0" ]; then : ; else cd "`ls -d */ | grep -i \"$1\" | awk \"NR==$choice\"`"; fi; }

It is a simplistic function that filters the ls results with grep, numbers them with awk, reads your choice and finally calls cd with the selected directory. Entering 0 cancels.

Now we can type cdg aa and we get all possible choices:

1 : aa/
2 : bbaa/
3 : ccaadd/
4 : eeaaff/

We simply type the number and we are moved to the desired directory, without the need to manually re-enter the cd command with the proper argument. Obviously, in cmd.exe we get this nice auto-completion for every command typed in the interpreter, and my solution only solves the change-directory use case in bash.
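The selection logic of cdg can also be sketched in Python, which makes the individual steps easier to see. This is an illustration only: match_dirs and pick are hypothetical names, the directory list is the example from above, and a plain case-insensitive substring test stands in for grep -i (which matches regular expressions).

```python
def match_dirs(names, pattern):
    """Return (number, name) pairs for names containing pattern, ignoring case."""
    needle = pattern.lower()
    matches = [name for name in names if needle in name.lower()]
    return list(enumerate(matches, start=1))


def pick(names, pattern, choice):
    """Return the chosen name, or None when choice is 0 or out of range."""
    numbered = dict(match_dirs(names, pattern))
    return numbered.get(choice)


dirs = ["aa", "bbaa", "ccaadd", "eeaaff"]

# Same listing format as the shell function prints.
for number, name in match_dirs(dirs, "AA"):
    print(f"{number} : {name}/")

print(pick(dirs, "aa", 2))  # bbaa
```

A real replacement would still have to be a shell function, because a separate Python process cannot change the working directory of the shell that started it.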

2014.02.02 UPDATE 1: After some deeper research, it turned out that the behavior of cmd.exe can be achieved in bash as well. The following line should be included in .bashrc:

bind '"\t":menu-complete'

However, my solution still serves its purpose, as it uses grep -i, which makes it case-insensitive and thus still useful.

2014.02.06 UPDATE 2: I have found a second reason why my solution is still relevant. It is much faster than pressing <Tab> and waiting for the shell to suggest names. This can be observed in an environment with more than a few directories, where MinGW tooling tends to be slow in general.

Python script executed by cron crashes when printing Unicode string

I have written a little Python script for my personal purposes and scheduled it to be run by cron on a Raspberry Pi. After some polishing work I was pretty sure it worked well and had been successfully tested. What I mean by tested is that I executed it manually from the shell and observed the expected results. So far so good.

However, when the script was run by cron, it failed at the line where it prints a string containing Unicode characters. The same line executes normally when run from the shell. I suspected there was some issue with the standard output of processes run by cron, because in that case there is no meaningful notion of standard output.

As it turns out, programs executed by cron have no terminal attached to their standard output. According to this Stack Exchange post, the standard output is sent to the system’s mail system and thus delivered to the user this way. One can easily verify this by issuing the tty command from cron and redirecting the output to a file. Something similar to "not a terminal" (message translated directly from my system with a Polish locale) should be observed.

The further explanation goes as follows: if there is no terminal attached to the process, the Python interpreter cannot detect the encoding of the terminal (it has nothing to do with the system’s environment variables describing the locale. They are all global and are part of the process’ environment, but they do not affect the non-terminal device attached to standard output). You can verify this by running a Python script that prints the encoding of standard output: print sys.stdout.encoding. None can be observed. So, the interpreter falls back to the ASCII encoding and crashes when printing Unicode characters.

The solution in this case was to force the interpreter to use a UTF-8 encoder for standard output.

import codecs
import sys

UTF8Writer = codecs.getwriter('utf8')
sys.stdout = UTF8Writer(sys.stdout)

The output is discarded anyway, but these lines prevent the interpreter from using the default ASCII encoder, which is not appropriate for printing Unicode strings.
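The snippet above targets Python 2. As a rough sketch of the same failure mode in Python 3 terms, the following simulates a standard output stream with an in-memory buffer and writes the same text through an ASCII and a UTF-8 encoder. The helper write_text and the sample string are assumptions made for illustration; the real script wrote to sys.stdout.

```python
import io


def write_text(text, encoding):
    """Write text through a text stream with the given encoding and return the raw bytes."""
    raw = io.BytesIO()
    stream = io.TextIOWrapper(raw, encoding=encoding)
    stream.write(text)
    stream.flush()
    return raw.getvalue()


sample = "za\u017c\u00f3\u0142\u0107"  # a Polish word with non-ASCII letters

try:
    write_text(sample, "ascii")
except UnicodeEncodeError as error:
    # This is the kind of crash the script hit when run by cron.
    print("ascii encoder failed:", error.reason)

# With a UTF-8 encoder the same text is written without errors.
print(write_text(sample, "utf-8"))
```

In Python 3.7 and later, a comparable fix for the real stream would be sys.stdout.reconfigure(encoding="utf-8"), though whether it is needed depends on how the interpreter picked the encoding in the first place.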