January 10, 2014

msed - multiple search and replace on a file from the command line

msed is a simple command-line program that searches for and replaces multiple words in a file. You put the words to be searched for and their replacements in a pattern file, one pair per line, and then run msed with "msed Pattern-file Target-file".

msed replaces only at word boundaries. When you ask it to replace "kit" with "kat", it leaves "kitty" and "kit0" untouched.
msed writes the result to standard output, so you can redirect it to a file.
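To make the word-boundary behavior concrete, here is a minimal sketch. The helper name replace_whole_word is hypothetical and not part of msed; it uses the same \b anchors as the script and adds preg_quote() so the search word is taken literally. Note that a hyphen or a period counts as a boundary, so "my-kit" and "kit." are replaced.

```php
<?php
// Illustrative helper (not part of msed): replace $search only where it
// stands as a whole word, using the same \b anchors as the script.
function replace_whole_word(string $text, string $search, string $replace): string
{
    // preg_quote() escapes regex metacharacters; '/' is the delimiter.
    $pattern = '/\b' . preg_quote($search, '/') . '\b/';
    return preg_replace($pattern, $replace, $text);
}

echo replace_whole_word("kit kitty kit0 my-kit kit.", "kit", "kat"), "\n";
// prints: kat kitty kit0 my-kat kat.
```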
The code:


#!/usr/bin/php
<?php
# Perform multiple search and replace on the target file

function usage(){
    echo "\n" .
         "Usage:  msed <pattern-file> <target-file> [-r]\n" .
         "pattern-file contains lines of Search and Replace, example\n" .
         "   Pig Dog\n" .
         "   Cat Kitty\n" .
         "-r: reversed pattern file, i.e. Replace is first, Search is second\n";
}

if (count($argv)!=3 && count($argv)!=4){
    usage();
    exit(1);
}

$pfile=$argv[1];
$tfile=$argv[2];
$reverse=false;
if (count($argv)==4){
    # the only optional argument accepted is -r
    if ($argv[3]!=="-r"){
        usage();
        exit(1);
    }
    $reverse=true;
    fprintf(STDERR,"reversed search and replace patterns\n");
}

## check files; errors go to STDERR so they do not end up in redirected output
$pat_str=file_get_contents($pfile);
if ($pat_str===FALSE){
    fprintf(STDERR,"Error opening pattern file $pfile\n");
    exit(1);
}
$t_str=file_get_contents($tfile);
if ($t_str===FALSE){
    fprintf(STDERR,"Error opening target file $tfile\n");
    exit(1);
}
# split the pattern file into lines; empty lines are skipped below
$pat_arr=explode("\n",$pat_str);

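## read patterns and build the search and replace lists
$pat=array("s"=>array(),"r"=>array());
foreach($pat_arr as $pline){
    $pline=trim($pline);
    if (strlen($pline)<1) continue;
    $parts=preg_split('/\s+/',$pline);
    if (count($parts)!=2){
        fprintf(STDERR,"skipping invalid pattern line:$pline\n");
        continue;
    }
    # with -r the columns are swapped: Replace first, Search second
    if ($reverse){
        list($replace,$search)=$parts;
    }else{
        list($search,$replace)=$parts;
    }
    # preg_quote() escapes regex metacharacters so the word is matched literally
    $pat["s"][]='/\b'.preg_quote($search,'/').'\b/';
    $pat["r"][]=$replace;
}
if (count($pat["s"])<1){
    fprintf(STDERR,"Error, no pattern found in file.\n");
    exit(1);
}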

## do search and replacement
$t_str=preg_replace($pat["s"],$pat["r"],$t_str);
echo($t_str);
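
One behavior worth knowing: when preg_replace() is given arrays, it applies each pattern in turn to the result of the previous one, so a later rule can rewrite text that an earlier rule produced. The sketch below illustrates this with a hypothetical pair of rules and shows one possible single-pass alternative built on preg_replace_callback(). The function name replace_single_pass and the sample rules are assumptions for illustration, not part of msed.

```php
<?php
// Hypothetical rules: Pig -> Dog, then Dog -> Cat.
$rules = array("Pig" => "Dog", "Dog" => "Cat");
$text  = "Pig and Dog";

// Sequential, as in msed: each pattern runs on the previous result.
$search = array();
foreach (array_keys($rules) as $word) {
    $search[] = '/\b' . preg_quote($word, '/') . '\b/';
}
$sequential = preg_replace($search, array_values($rules), $text);

// Single pass (illustrative alternative): one alternation over all
// search words, with the replacement looked up once per match.
function replace_single_pass(array $rules, string $text): string
{
    $words = array_map(function ($w) {
        return preg_quote($w, '/');
    }, array_keys($rules));
    $pattern = '/\b(?:' . implode('|', $words) . ')\b/';
    return preg_replace_callback($pattern, function ($m) use ($rules) {
        return $rules[$m[0]];
    }, $text);
}

echo $sequential, "\n";                        // prints: Cat and Cat
echo replace_single_pass($rules, $text), "\n"; // prints: Dog and Cat
```

If cascading replacements are what you want, the sequential form is fine as it is; the single-pass form only matters when the rules overlap.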
                
